API Rate Limiting Bypass: When Throttling Fails
Rate limiting occupies a specific role in API security: it is the control that bounds how quickly an attacker can iterate. Password brute force requires many attempts. Account enumeration requires many requests. Credential stuffing requires many logins. Scraping requires many fetches. In each case, rate limiting is the mechanism that makes the attack slow enough to detect, stop, or render economically unviable.
When rate limiting can be bypassed, this bound disappears. The attacker can iterate as fast as the API can respond. The practical consequence depends on what the endpoint does — but for authentication endpoints, user enumeration endpoints, or any resource with per-user sensitivity, removing the iteration limit changes the threat model significantly.
Rate limiting bypass is not a single technique. It is a category of implementation weaknesses, each exploiting a different assumption the rate limiter makes about its inputs. Understanding which assumptions a given rate limiter makes is the first step in assessing whether those assumptions hold under adversarial conditions.
How Rate Limiting Is Typically Implemented
Most rate limiters work by associating a request with a key, incrementing a counter for that key, and rejecting or delaying requests when the counter exceeds a threshold within a given window.
The key is what matters. Common choices include:
- Source IP address — The IP address of the connecting client
- IP address from a header — The value of X-Forwarded-For, X-Real-IP, or a similar header when the application sits behind a proxy
- API key or token — The credential supplied in the request
- User ID — The authenticated identity, after session parsing
- Endpoint path — Combined with the IP or identity to produce per-endpoint limits
Each key choice creates specific bypass opportunities. A rate limiter keyed purely on source IP can be defeated by rotating IP addresses. A rate limiter keyed on a header can be defeated by forging that header. A rate limiter that applies only to specific endpoint paths can be defeated by accessing the same resource through a different path.
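The mechanism above can be sketched as a minimal in-memory fixed-window counter. This is a simplification for illustration: production limiters add sliding windows, distributed storage, and counter expiry, but the key-increment-threshold core is the same.

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Minimal fixed-window rate limiter: one counter per (key, window)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)  # (key, window index) -> request count

    def allow(self, key):
        bucket = (key, int(time.time() // self.window))
        self.counters[bucket] += 1
        return self.counters[bucket] <= self.limit

limiter = FixedWindowLimiter(limit=3, window_seconds=3600)
decisions = [limiter.allow("203.0.113.45") for _ in range(5)]
# the fourth and fifth requests for the same key exceed the limit
```

Everything that follows in this article is, in one way or another, an attack on how `key` is derived.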
IP Header Manipulation
The most widely documented rate limiting bypass exploits trust in client-supplied HTTP headers that indicate the original source IP address.
When an application runs behind a reverse proxy, load balancer, or CDN, the TCP connection originates from the infrastructure component rather than the end user. To preserve the actual client IP, the proxy adds a header to forwarded requests:
X-Forwarded-For: 203.0.113.45
X-Real-IP: 203.0.113.45
CF-Connecting-IP: 203.0.113.45
True-Client-IP: 203.0.113.45
The application reads one of these headers to determine the client's IP address. If the rate limiter uses this header value as its key, the rate limiter is controlled by whatever value is in the header — not the actual TCP source IP.
An attacker sending requests directly to the application, or through a proxy that does not strip the header, can set any value:
POST /api/login HTTP/1.1
Host: api.example.com
X-Forwarded-For: 192.168.1.1
Content-Type: application/json
{"username": "target@example.com", "password": "attempt1"}

Next request:
POST /api/login HTTP/1.1
Host: api.example.com
X-Forwarded-For: 192.168.1.2
{"username": "target@example.com", "password": "attempt2"}

The rate limiter sees each request as originating from a different IP address and applies a fresh counter. The attacker increments the last octet and makes unlimited attempts from the same actual connection.
The exploit works whenever the rate limiter trusts the IP header value without verifying that it was set by trusted infrastructure. If the application is supposed to sit behind a CDN, requests that bypass the CDN and reach the application directly can carry forged headers without any intermediary stripping them.
Variants exist across header names. Some applications check X-Forwarded-For first but fall back to X-Real-IP. Others check a priority list. Systematically testing different header names with different values identifies which headers influence the rate limiter's key selection.
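A minimal simulation of the vulnerability, assuming a hypothetical limiter that keys on the raw X-Forwarded-For value when present:

```python
from collections import defaultdict

counters = defaultdict(int)
LIMIT = 5

def allow(headers, remote_addr):
    # Vulnerable: trusts the client-supplied header over the TCP source.
    key = headers.get("X-Forwarded-For", remote_addr)
    counters[key] += 1
    return counters[key] <= LIMIT

# 20 requests from one TCP connection, each forging a different last octet:
results = [allow({"X-Forwarded-For": f"192.168.1.{i}"}, "198.51.100.7")
           for i in range(20)]
# every request passes: each forged IP value gets a fresh counter
```

Keying on `remote_addr` instead would block this attacker after five requests, which is why the trusted-proxy validation discussed in the prevention section matters.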
Endpoint Variation
Rate limiters that enforce per-endpoint limits track request counts against a URL path. This works correctly only if requests to the same resource consistently produce the same key.
Applications frequently handle equivalent URL forms without canonicalizing the path before rate limit evaluation. Common variations that produce different keys in a naive rate limiter:
Case sensitivity. Some frameworks normalize URL paths to lowercase; others are case-sensitive at the routing layer. If the rate limiter applies before routing, /api/Login, /api/login, and /API/LOGIN may each have independent counters.
Trailing slashes. /api/login and /api/login/ may route to the same handler but appear as different paths to the rate limiter.
URL encoding. /api/login and /api/%6Cogin and /api/l%6Fgin are equivalent after URL decoding but different as raw strings.
Path parameters. Endpoints that embed identifiers in the URL may be tracked as a distinct path for each parameter value: a limiter intended to cap /api/users/{id}/reset as a single endpoint instead gives /api/users/1001/reset and /api/users/1002/reset independent counters.
Query string presence. Adding arbitrary query parameters that the application ignores — /api/login?utm_source=x, /api/login?_=1 — may produce different rate limit keys if the key includes the full URL including query string.
Each variation that produces a distinct rate limit key can be rotated through to distribute requests across independent counters, all while performing the same underlying operation.
The bypass is most effective when combined: varying both the path form and the header values produces a large space of unique keys, each with an independent counter starting at zero.
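To illustrate, a small generator of equivalent URL forms. This is a sketch: real testing would also cover version-path alternatives, mixed-case permutations, and encodings of other characters.

```python
def path_variants(path):
    """Enumerate URL forms that may route to the same handler but
    produce distinct keys in a non-canonicalizing rate limiter."""
    variants = {path, path + "/", path.upper()}
    # Percent-encode each letter in turn: /api/login -> /api/%6Cogin, ...
    for i, ch in enumerate(path):
        if ch.isalpha():
            variants.add(path[:i] + "%{:02X}".format(ord(ch)) + path[i + 1:])
    # Ignored query parameters that may still enter the rate-limit key.
    for n in range(3):
        variants.add(f"{path}?_={n}")
    return sorted(variants)

variants = path_variants("/api/login")
# includes /api/login/, /API/LOGIN, /api/%6Cogin, /api/login?_=0, ...
```

Each element of `variants` is a candidate for an independent counter; replaying the same operation across them distributes the request volume.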
Parameter Cycling
Some rate limiters key on request content in addition to or instead of source IP. A rate limiter protecting a password reset endpoint might track the email address being reset, to prevent an attacker from flooding a single address.
Parameter cycling bypasses content-keyed rate limits by varying the content of each request while the underlying goal remains the same.
For email-based password resets:
- target@example.com and TARGET@EXAMPLE.COM may be treated as different email addresses by the rate limiter but equivalent by the mail delivery system
- target+tag1@example.com and target+tag2@example.com may both deliver to target@example.com if the email provider supports subaddressing
- Submitting a valid-looking but non-existent email address on alternating requests can exhaust the rate limit on real addresses while generating noise
For username-based authentication:
- Varying target accounts across multiple attempts can stay under per-account limits while conducting credential stuffing at scale against many accounts simultaneously
This technique is less about bypassing the rate limiter's key lookup and more about exploiting the gap between what the rate limiter considers equivalent and what the application considers equivalent.
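The gap can be made concrete by comparing the two notions of equivalence. The sketch below assumes a limiter that keys on the raw submitted string and a mail provider with case-insensitive addresses and plus-subaddressing; provider behavior varies.

```python
def limiter_key(email):
    # Hypothetical content-keyed limiter: uses the raw submitted string.
    return email

def mailbox_key(email):
    # Where the message actually lands, assuming a provider with
    # case-insensitive addresses and plus-subaddressing (varies by provider).
    local, _, domain = email.partition("@")
    return local.split("+", 1)[0].lower() + "@" + domain.lower()

attempts = ["target@example.com", "TARGET@EXAMPLE.COM",
            "target+tag1@example.com", "target+tag2@example.com"]
limiter_keys = {limiter_key(a) for a in attempts}   # four independent counters
mailboxes = {mailbox_key(a) for a in attempts}      # one real mailbox
```

Four distinct rate-limit counters, one actual target: the defense is to rate limit on something closer to `mailbox_key` than `limiter_key`.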
Rate Limit Scope Gaps
Rate limiting applied at the API gateway or CDN edge protects only the paths that traverse those layers. Applications with multiple entry points may have gaps where rate limiting is absent entirely.
Direct backend access. If the API gateway enforces rate limits but backend services are reachable directly from the internal network — or from a cloud environment where internal IPs are accessible to a compromised host — requests to those backend services bypass the gateway rate limits entirely.
Mobile API endpoints. Applications sometimes maintain separate endpoints for mobile clients, distinguished by URL prefix, subdomain, or API version. If mobile endpoints are rate limited independently from web endpoints, or not at all, they provide an alternative path.
Legacy or versioned APIs. APIs that maintain multiple versions (/api/v1/login and /api/v2/login) may have different rate limiting configurations, or may have rate limiting on the current version but not on older versions that remain functional.
Partner or internal APIs. Internal APIs intended for service-to-service communication sometimes have relaxed or absent rate limiting under the assumption that only trusted callers reach them. If an attacker can reach these endpoints, the relaxed limits apply.
Identifying scope gaps requires understanding the full set of paths into the API — not just the primary public-facing endpoints.
Race Conditions in Limit Enforcement
Rate limiters that track counts in shared storage (a database, cache, or in-memory store) can be susceptible to race conditions when the check-and-increment operation is not atomic.
A non-atomic rate check follows this sequence:
1. Read current count for key K
2. Compare count to limit
3. If below limit, increment count
4. Proceed with request
If two requests arrive simultaneously, both may read the count before either has incremented it. Both see a count below the limit. Both proceed. Both increment. The actual count after both requests is limit + 1, but neither was blocked.
This race condition is most exploitable when the rate limit threshold is low (a limit of 5 requests per minute can be exceeded by firing a burst of concurrent requests that all read the counter before any increment lands) and when the rate limiter's storage has high read latency relative to request processing time.
The bypass technique is to issue a burst of concurrent requests — not sequential requests — at the exact moment the rate limit counter would otherwise block the next request. At low limits, this can double or triple effective throughput.
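The interleaving can be reproduced deterministically in-process. In the sketch below, the barrier stands in for unlucky timing: it forces every worker to read the counter before any of them writes, which is exactly the window a concurrent burst exploits.

```python
import threading

LIMIT = 5
count = 0                        # shared counter for one rate-limit key
barrier = threading.Barrier(10)  # forces all reads before any write
results = []

def handle_request():
    global count
    observed = count            # 1. read current count for key K
    barrier.wait()              # every worker has read before any increments
    if observed < LIMIT:        # 2. compare count to limit
        count = observed + 1    # 3. non-atomic increment (updates are lost)
        results.append(True)    # 4. request proceeds
    else:
        results.append(False)

threads = [threading.Thread(target=handle_request) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# all 10 concurrent requests pass a limit of 5: each read a stale count of 0
```

In production the window is much narrower, but a large enough concurrent burst against a slow counter store reproduces the same outcome probabilistically.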
Impact
The consequence of a rate limiting bypass depends on what the rate-limited endpoint does.
Authentication endpoints. A bypassed rate limit on a login endpoint allows brute force and credential stuffing at the speed of the API. Against accounts with common or reused passwords, success rates can be significant.
Account enumeration. Rate-limited enumeration endpoints (password reset, user search, registration) leak account existence data slowly enough to be impractical at protected speeds. Bypassing the limit makes enumeration fast enough to build target account lists from email address lists.
Resource-intensive operations. Endpoints that trigger expensive backend operations (report generation, bulk exports, complex searches) rely on rate limits to prevent server resource exhaustion. Bypassing these limits can degrade service for legitimate users.
SMS and email sending. Endpoints that trigger outbound messages are often rate-limited to prevent using the application as an SMS spam relay or phishing delivery mechanism. Bypassed limits allow unbounded message generation charged to the application's account.
Prevention
Effective rate limiting requires matching the rate limit key to the operation being protected, not to an input that can be trivially varied.
Use authenticated identifiers where available. After login, rate limit by user ID or session token rather than or in addition to IP address. An authenticated attacker with many sessions can still be blocked, but legitimate users with dynamic IPs are not penalized.
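A sketch of that key selection, assuming a hypothetical request object exposing the authenticated user ID (when present) and the verified remote address:

```python
def rate_limit_key(request):
    """Prefer the authenticated identity; fall back to the verified
    network address for anonymous traffic (hypothetical request shape)."""
    user_id = request.get("user_id")
    if user_id is not None:
        return f"user:{user_id}"
    return f"ip:{request['remote_addr']}"

authenticated = rate_limit_key({"user_id": 42, "remote_addr": "198.51.100.7"})
anonymous = rate_limit_key({"user_id": None, "remote_addr": "198.51.100.7"})
```

Namespacing the key (`user:` versus `ip:`) keeps the two populations from colliding and lets each carry its own threshold.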
Validate and strip IP override headers. The application should trust IP headers only when they are set by verified infrastructure. In practice this means: only accept X-Forwarded-For and similar headers when the request arrives from a known proxy IP range. Requests arriving directly from external IPs should use the TCP source address, not header values.
TRUSTED_PROXIES = {"10.0.0.1", "10.0.0.2"}  # Load balancer IPs

def get_client_ip(request):
    if request.remote_addr in TRUSTED_PROXIES:
        # X-Forwarded-For may hold a comma-separated chain; take the
        # entry appended by the trusted proxy (the rightmost one).
        forwarded = request.headers.get("X-Forwarded-For", "")
        if forwarded:
            return forwarded.split(",")[-1].strip()
    return request.remote_addr  # Ignore header from untrusted sources

Canonicalize URLs before rate limit evaluation. Lowercase the path, strip trailing slashes, decode percent-encoded characters, and ignore or sort query parameters that are not relevant to the operation. Apply rate limits to the canonical form.
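A canonicalization helper along those lines might look like the following sketch. Note that decoding before matching needs care with characters like %2F, which can reintroduce path separators.

```python
from urllib.parse import unquote, urlsplit

def canonical_path(raw_path):
    """Reduce equivalent URL forms to one rate-limit key: drop the query
    string, decode percent-encoding, lowercase, strip trailing slashes."""
    path = urlsplit(raw_path).path  # discard query string and fragment
    path = unquote(path).lower()    # caution: %2F decoding needs handling
    return path.rstrip("/") or "/"

forms = ["/api/login", "/api/login/", "/API/LOGIN",
         "/api/%6Cogin", "/api/login?utm_source=x"]
keys = {canonical_path(f) for f in forms}
# all five forms collapse to the single key "/api/login"
```

Applied before the counter lookup, this collapses the endpoint-variation key space back to one counter per operation.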
Implement application-layer rate limiting. Gateway rate limiting is a useful first layer but should not be the only one. Application-layer rate limiting catches requests that bypass the gateway and allows enforcement to incorporate application context (authenticated identity, operation type) that the gateway does not have.
Make enforcement atomic. Use atomic increment-and-check operations in the rate limit counter. Redis INCR followed by TTL comparison, or distributed locking around the check-and-increment cycle, prevents race condition bypasses.
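The atomic pattern can be sketched in-process with a lock standing in for Redis's single-threaded INCR: the critical section makes the read and the increment indivisible, so concurrent bursts cannot slip past the threshold.

```python
import threading
import time
from collections import defaultdict

class AtomicLimiter:
    """Check-and-increment under a lock: the in-process analogue of
    Redis INCR with an expiry set on the first increment."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)
        self._lock = threading.Lock()

    def allow(self, key):
        bucket = (key, int(time.time() // self.window))
        with self._lock:  # read + increment form one critical section
            self.counters[bucket] += 1
            return self.counters[bucket] <= self.limit

limiter = AtomicLimiter(limit=5, window_seconds=3600)
results = []
threads = [threading.Thread(target=lambda: results.append(limiter.allow("K")))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# exactly 5 of the 10 concurrent requests are allowed
```

Unlike the non-atomic version, a burst of 10 concurrent requests against a limit of 5 lets exactly 5 through, regardless of timing.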
Apply rate limits uniformly across all endpoint variants. Ensure that rate limits cover all API versions, mobile endpoints, and internal endpoints that perform sensitive operations — not just the primary production path.
Testing Rate Limiting
When assessing an API, a systematic rate limit test covers several dimensions:
- Identify the key. Send requests from the same IP with varying header values. Send requests with the same header value from different IPs. Observe which changes reset the counter.
- Test header injection. Try X-Forwarded-For, X-Real-IP, CF-Connecting-IP, True-Client-IP, X-Originating-IP, Forwarded, and X-Cluster-Client-IP with arbitrary values.
- Try endpoint variations. Test case variations, trailing slashes, URL-encoded equivalents, and version-path alternatives for the same operation.
- Check concurrent bursts. At limit boundaries, send concurrent requests and observe whether any succeed beyond the stated limit.
- Verify all entry points. If documentation or discovery reveals multiple paths to the same underlying operation, verify that rate limiting is applied consistently across all of them.
- Observe the response. Confirm that the rate limiter blocks requests effectively after the threshold is hit, and that legitimate users can continue operating normally after a limit triggers — some implementations block the IP permanently, which creates denial-of-service risk against legitimate users.
Rate limiting is easy to get mostly right and difficult to get entirely right. The gap between mostly right and entirely right is where attackers operate.
Need your API rate limiting assessed against real-world bypass techniques? Get in touch.