Add rate limiting on auth and invite endpoints #1

Open
opened 2026-04-02 11:20:03 +02:00 by marcel · 5 comments
Owner

Problem

There is no rate limiting anywhere in the application. /v1/auth/login is wide open for credential stuffing and brute-force attacks. /v1/invites/{code}/accept can be used for invite code enumeration. /v1/auth/signup enables account spam.

Attack scenario

An attacker can attempt thousands of login requests per second against /v1/auth/login to brute-force credentials, or enumerate invite codes against the accept endpoint.

Affected files

  • SecurityConfig.java — no rate limiting filter configured

Recommended fix

Add rate limiting at minimum on:

  • /v1/auth/login (e.g., 5 attempts per minute per IP)
  • /v1/auth/signup (e.g., 3 per minute per IP)
  • /v1/invites/*/accept (e.g., 10 per minute per IP)

Consider using a servlet filter with Bucket4j or Spring Boot's built-in rate limiting support.

Severity

Critical — without this, brute-force and credential stuffing are trivially possible.

marcel added the kind/security and priority/critical labels 2026-04-02 11:20:53 +02:00
Author
Owner

👨‍💻 Kai — Frontend Engineer

Rate limiting is a backend concern, but it surfaces directly in the frontend UX when limits are hit. Here's what I need to handle.

Frontend implications of 429 responses:

  • When /v1/auth/login returns 429 Too Many Requests, the login form must show a clear, user-friendly message — not a generic error. Something like "Too many login attempts. Please wait a minute before trying again." I'll wire this up in the +page.server.ts login action, catching 429 specifically.
  • The 429 response should ideally include a Retry-After header. If it does, I can display a countdown: "Try again in 45 seconds." That's a better UX than "wait a minute" with no feedback.
  • For signup (/v1/auth/signup), same pattern — 429 should show "Too many signup attempts. Please try again later."
  • For invite acceptance, the user is likely coming in from a link, so a 429 there would be confusing. What's the UX if an invite link hits rate limiting? "This invite link is temporarily unavailable" feels wrong if it's a legitimate user clicking once. Is the rate limit per-IP strict enough that a legitimate user would ever hit it?

Questions:

  • Does the rate limit reset per-IP or per-account? Per-IP is standard, but I want to know for the error message copy.
  • Will the 429 response body include a message field with a human-readable reason, or just the status code? I'll use whatever is in the response body for the error display.
  • Is there a Retry-After header in the plan? If yes, I'll build the countdown UI. If no, the message will be generic.

No code blockers — just need the response contract defined before I implement the error states.


🏗️ Backend Engineer — Spring Boot / PostgreSQL Specialist

Agreed this is critical. Let me lay out the implementation options with the tradeoffs clearly, since the issue mentions a few approaches.

Option 1: Bucket4j (recommended for v1)

Bucket4j is a Java rate-limiting library based on the token bucket algorithm. It integrates cleanly with Spring Boot as a servlet filter or Spring MVC interceptor, and supports in-memory (no Redis required) or distributed (Redis/Hazelcast) backends.

```java
// Simplified per-IP bucket store using Bucket4j + Caffeine cache
Cache<String, Bucket> buckets = Caffeine.newBuilder()
    .expireAfterAccess(1, TimeUnit.MINUTES) // evict idle IPs (expireAfterWrite would also evict a still-active attacker's bucket)
    .build();
// per request: buckets.get(ip, k -> newBucket()).tryConsume(1); false means 429
```

For v1 (single instance), Caffeine-backed in-memory buckets are sufficient. If we later go multi-instance, we switch to Redis-backed Bucket4j — the API is the same.

Option 2: Spring's built-in rate limiting

The issue mentions "Spring Boot's built-in rate limiting support" — worth clarifying: Spring Boot doesn't have native rate limiting out of the box as of Spring Boot 3.x/4.x. Spring Cloud Gateway has rate limiting, but we're not using that. The recommendation here is Bucket4j or a servlet filter, not a built-in Spring feature.

Implementation plan:

  • Add bucket4j-core + caffeine dependencies to pom.xml
  • Write a RateLimitFilter that extends OncePerRequestFilter, extracts the client IP from X-Forwarded-For or RemoteAddr, and applies per-endpoint limits
  • Register the filter in SecurityFilterChain before the auth processing filters
  • Return 429 Too Many Requests with a Retry-After header on limit exceeded
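The filter's core decision logic can be sketched framework-free. The servlet wiring (extending OncePerRequestFilter, registering it in the SecurityFilterChain) and real Bucket4j buckets would wrap something like this; the class name, the flat path keys, and the fixed one-minute window are illustrative assumptions, not a decided design:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Core of the proposed RateLimitFilter: one counter per (endpoint, client IP),
// reset each window. In the real filter this runs in doFilterInternal(), and a
// non-zero result becomes a 429 response with a Retry-After header.
// Note: exact-path keys for brevity; /v1/invites/*/accept needs pattern matching.
class RateLimitGate {
    private static final Map<String, Integer> LIMITS = Map.of(
            "/v1/auth/login", 5,
            "/v1/auth/signup", 3,
            "/v1/invites/accept", 10);

    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();
    private final long windowMillis;
    private long windowStart;

    RateLimitGate(long windowMillis, long startMillis) {
        this.windowMillis = windowMillis;
        this.windowStart = startMillis;
    }

    /** Returns 0 if the request may proceed, otherwise the Retry-After value in seconds. */
    synchronized long check(String endpoint, String clientIp, long nowMillis) {
        Integer limit = LIMITS.get(endpoint);
        if (limit == null) return 0;                    // endpoint not rate limited
        if (nowMillis - windowStart >= windowMillis) {  // window rolled over
            counters.clear();
            windowStart = nowMillis;
        }
        int used = counters
                .computeIfAbsent(endpoint + "|" + clientIp, k -> new AtomicInteger())
                .incrementAndGet();
        if (used <= limit) return 0;
        return Math.max(1, (windowMillis - (nowMillis - windowStart)) / 1000);
    }
}
```

Bucket4j replaces these naive fixed-window counters with proper token buckets, but the filter's contract (key by endpoint plus IP, reject with 429 and Retry-After) stays the same.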

Questions:

  • Are we behind a reverse proxy (nginx/Traefik)? If so, the real client IP comes from X-Forwarded-For — we must trust this header only from known proxy IPs, or attackers can spoof it to bypass per-IP limits.
  • Are the suggested limits (5/min login, 3/min signup, 10/min invite accept) product decisions or just starting points? I can implement whatever values are decided, but I don't want to pick them unilaterally.
  • Should the rate limiter persist across app restarts? In-memory Caffeine buckets reset on restart — is that acceptable for v1?

🧪 QA Engineer

Rate limiting is deceptively tricky to test correctly. Here's the full test plan.

Unit tests:

  • Rate limiter allows N requests within the window → all succeed
  • Rate limiter blocks request N+1 within the window → 429 returned
  • Rate limiter resets after the window expires → requests succeed again
  • Rate limiter is per-IP: different IPs get independent buckets (IP A hitting the limit doesn't block IP B)
  • Rate limiter is per-endpoint: hitting the login limit doesn't affect the signup limit

Integration tests:

  • POST /v1/auth/login — send 6 rapid requests from the same IP → first 5 succeed (or return auth error), 6th returns 429
  • POST /v1/auth/signup — send 4 rapid requests → first 3 succeed/fail normally, 4th returns 429
  • POST /v1/invites/*/accept — send 11 rapid requests → first 10 succeed/fail normally, 11th returns 429
  • 429 response includes Retry-After header with a numeric value
  • After the rate limit window expires, requests succeed again (requires controlling time or using a short test window)

Testing challenges I want to flag:

  • Time dependency: Rate limit tests are time-sensitive. Inject a clock or use a configurable window duration (e.g., 1 second in tests, 1 minute in production) to avoid slow tests with Thread.sleep().
  • IP spoofing in tests: When using MockMvc or TestRestTemplate, the "client IP" needs to be set consistently. Confirm how the filter extracts IP in tests — MockHttpServletRequest.setRemoteAddr() should work.
  • Test isolation: Rate limit state must be reset between tests. If using in-memory Caffeine, ensure the bucket cache is cleared between test runs (either via a test-scoped bean or by using a short TTL in tests).
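The clock-injection point can be sketched with java.time.Clock: the limiter reads time from a clock passed in by the caller, so tests advance time with Clock.fixed values instead of sleeping. The class below is an illustrative stand-in, not the real limiter:

```java
import java.time.Clock;

// A window-based limiter that takes the clock per call. Production code passes
// Clock.systemUTC(); tests pass Clock.fixed(...) instants to "advance" time,
// so no test ever needs Thread.sleep().
class ClockedLimiter {
    private final int limit;
    private final long windowMillis;
    private long windowStart;
    private int count;

    ClockedLimiter(int limit, long windowMillis, Clock clock) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.windowStart = clock.millis();
    }

    synchronized boolean tryConsume(Clock clock) {
        long now = clock.millis();
        if (now - windowStart >= windowMillis) {  // window expired: reset
            windowStart = now;
            count = 0;
        }
        return ++count <= limit;
    }
}
```

The same effect is available via a configurable window duration (one second in the test profile), but an injected clock keeps production and test configuration identical.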

E2E consideration: Rate limiting should not be verified in E2E tests — that's the integration layer's job. But E2E tests should be written to not accidentally trigger rate limits (e.g., don't call login 10 times in a test without thinking about it).


🔒 Sable — Security Engineer

This is the foundational defense layer. Without it, issues #2 (invite brute force) and the credential stuffing vector are wide open. Let me add the threat model precision and implementation security requirements.

Why all three endpoints are critical:

  • /v1/auth/login — credential stuffing and password brute force. Automated tools (Hydra, Burp Intruder) can attempt thousands of login combinations per second. Without rate limiting, a leaked credential list becomes a working attack.
  • /v1/auth/signup — account spam and resource exhaustion. Spamming signup creates junk accounts (household pollution, DB growth) and, combined with issue #3's orphaned session bug, is a DoS vector on the session store.
  • /v1/invites/*/accept — invite code enumeration. Even after fixing issue #2 (UUIDv4), rate limiting here is defense-in-depth. Rate limiting also provides a signal for detecting probing behavior.

Security implementation requirements:

  1. IP extraction must be safe from spoofing. If behind a proxy, use X-Forwarded-For — but only trust it if the request comes from a known proxy IP. Accepting arbitrary X-Forwarded-For values lets attackers rotate IPs with a header change. If in doubt, use RemoteAddr only.

  2. Rate limit by IP AND by username for login. Per-IP alone can be bypassed with IP rotation. Per-username rate limiting (e.g., lock out an account after 10 failed attempts regardless of IP) adds a second layer. Username enumeration is a separate concern — the lockout response should be the same whether the account exists or not.

  3. The 429 response must not leak information. Don't say "account locked" vs "IP rate limited" differently — same response in both cases to prevent enumeration.

  4. Log every 429 event. IP, endpoint, timestamp, username (if applicable, and only if it doesn't reveal account existence). These logs are an early warning system for attacks in progress.

  5. Alert threshold: Consider logging a warning when a single IP exceeds the rate limit by 10x in a single window — that's an active attack, not an accidental retry.
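Requirement 1 can be sketched as a small helper: honor X-Forwarded-For only when the TCP peer is one of our own proxies, and take the client IP from the entry that proxy appended. The trusted-proxy addresses and the class name are placeholder assumptions, not our real topology:

```java
import java.util.List;
import java.util.Set;

// Resolve the client IP used for rate-limit keying. X-Forwarded-For is
// attacker-controlled unless the direct peer (remoteAddr) is a known proxy.
class ClientIpResolver {
    private final Set<String> trustedProxies;

    ClientIpResolver(Set<String> trustedProxies) {
        this.trustedProxies = trustedProxies;
    }

    String resolve(String remoteAddr, String xForwardedFor) {
        // Peer is not a known proxy: ignore the header entirely (spoofable).
        if (xForwardedFor == null || !trustedProxies.contains(remoteAddr)) {
            return remoteAddr;
        }
        // XFF is "client, proxy1, proxy2, ...". The last element is the address
        // our own proxy actually saw; earlier entries are client-supplied.
        List<String> chain = List.of(xForwardedFor.split("\\s*,\\s*"));
        return chain.get(chain.size() - 1);
    }
}
```

With more than one trusted hop, walk the chain from the right and skip entries that are themselves trusted proxies; the left-most entry is always client-controlled and must never be trusted.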

Interaction with other issues: This issue should be resolved before or alongside issues #2 and #3. Rate limiting is the safety net that makes the other fixes meaningful at scale.


🎨 Atlas — UI/UX Designer

Rate limiting is invisible to users when it works — it only becomes visible when they hit a limit. That's the design challenge: communicating a limit clearly without alarming legitimate users.

Error states I need to design:

  • Login rate limit hit: The login form should show an inline message — not a toast, not a modal. Something like: "Too many login attempts. Please wait 60 seconds before trying again." If there's a Retry-After header, the message can count down: a subtle countdown timer below the submit button. The form should disable the submit button during the lockout window.

  • Signup rate limit hit: Less common for legitimate users, so a simpler message works: "Too many accounts created from this connection. Please try again later." Don't make it feel accusatory.

  • Invite accept rate limit hit: This is the most UX-sensitive case. A legitimate user clicks an invite link and gets a 429 — they have no idea what "rate limit" means. The message should be: "Something went wrong. Please try again in a moment." with a retry button. Technical language (429, rate limit, IP) must never surface to the user.

Design principles for error states:

  • Use --color-error for the message text, not a full red background — it's informational, not catastrophic
  • The error message should appear inline near the relevant action (below the submit button), not as a page-level alert
  • Button text during lockout: disable the button and change its label to "Try again in 58s..." (countdown) — this is clearer than greying it out with no explanation

Questions:

  • Is there a standard error message component in the design system already? I want to make sure the rate limit error states use consistent patterns with other form errors (validation errors, server errors).
  • For the countdown timer on the login form — is that a feature Kai should implement, or is the generic "wait 60 seconds" message sufficient for v1? I'd lean toward the countdown as it's a meaningfully better experience, but it's extra implementation work.