Welcome!
Ever wondered how websites and apps stay stable and fair for everyone, even with tons of users? One key secret is Rate Limiting. Let's dive in!
A presentation by Peyman Khosravi.
What is Rate Limiting?
Rate limiting is like a traffic cop for digital services. It controls how many requests a user (or a service) can make to a server within a specific time window.
Think of it like:
- An ATM allowing only a few transactions before asking you to wait.
- A library allowing you to borrow a limited number of books at a time.
The goal? To ensure the service remains stable, responsive, and fair for all users.
Why is Rate Limiting Crucial?
Prevent Abuse
Mitigates malicious traffic, such as DDoS attacks and brute-force login attempts, before it overwhelms the system.
Ensure Fair Usage
Prevents a single user from hogging all resources, ensuring everyone gets a fair chance to use the service.
Manage Resources
Helps control server load and operational costs by preventing unexpected spikes in traffic.
Improve Security
Can limit attempts to guess passwords or scrape sensitive data.
Maintain Service Quality
Keeps the service responsive and available for legitimate users by avoiding overload.
How It Works: Fixed Window Demo
One common method is the "Fixed Window Counter". Imagine you can make 5 requests every 10 seconds.
How it works: We count requests within a fixed time period. If the count exceeds the limit, new requests are blocked until the next window starts.
This is a simplified example. Real-world systems might use more complex algorithms like Token Bucket or Leaky Bucket for smoother traffic shaping.
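The demo's fixed window counter can be sketched in a few lines of Python. This is a minimal single-process sketch; the `FixedWindowLimiter` name and the injectable `clock` parameter are illustrative, not a real library API.

```python
import time

class FixedWindowLimiter:
    """Allows up to `limit` requests per `window` seconds (single process)."""

    def __init__(self, limit=5, window=10.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock            # injectable so tests can control time
        self.window_start = clock()
        self.count = 0

    def allow(self):
        now = self.clock()
        if now - self.window_start >= self.window:
            # Old window expired: start a fresh one.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False                  # caller should respond with HTTP 429
```

With the demo's numbers (5 requests per 10 seconds), the first five calls to `allow()` return `True`; the sixth returns `False` until the window rolls over.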
How Would You Build a Basic One?
Implementing a rate limiter involves a few key ingredients. Here's a simplified conceptual overview:
1. Tracking User Requests
- Identifier: You need to know *who* is making the request (e.g., IP address, API key, User ID).
- Storage: A place to store request counts and window start times for each identifier.
- In-memory (e.g., a dictionary, or a cache like Redis): Fast, but a plain in-process store is lost on restart and works only for a single server; Redis adds persistence options and can be shared across servers.
- Database: Persistent and shareable across servers, but potentially slower.
2. The Logic (Fixed Window Example)
When a request comes in:
- Get the `identifier` (e.g., IP address).
- Look up their `request_count` and `window_start_time`.
- If `current_time > window_start_time + WINDOW_DURATION`:
(Old window expired)
→ Reset `request_count` to 1 and `window_start_time` to `current_time`. Allow request.
- Else (still in current window):
→ If `request_count < MAX_REQUESTS`: Increment `request_count`. Allow request.
→ Else: Reject request (HTTP 429).
Simplified Pseudo-code
function handleRequest(identifier):
    userData = storage.get(identifier)
    currentTime = now()
    WINDOW_DURATION = 60  // seconds
    MAX_REQUESTS_PER_WINDOW = 100

    // If no record or window expired
    if not userData or currentTime > userData.windowStart + WINDOW_DURATION:
        storage.set(identifier, { count: 1, windowStart: currentTime })
        return ALLOW_REQUEST
    else:
        // Still in current window
        if userData.count < MAX_REQUESTS_PER_WINDOW:
            userData.count += 1
            storage.set(identifier, userData)
            return ALLOW_REQUEST
        else:
            return REJECT_REQUEST_429
Note: This is a very basic fixed window approach. Real-world systems often use more advanced algorithms (Token Bucket, Leaky Bucket) and need to handle distributed environments carefully.
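For comparison, the Token Bucket idea can be sketched as follows. This is a minimal single-process sketch with illustrative names: tokens refill continuously at a fixed rate, so short bursts are absorbed and traffic is smoothed rather than reset at hard window boundaries.

```python
import time

class TokenBucket:
    """Token bucket: tokens refill continuously at `rate` per second,
    up to `capacity`; each allowed request spends one token."""

    def __init__(self, rate=1.0, capacity=5, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock            # injectable so tests can control time
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Unlike the fixed window, a client who exhausts the bucket only has to wait `1 / rate` seconds for the next token, not for a whole window to expire.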
Uh Oh! The 429 Error
If you make too many requests and hit a rate limit, the server will often respond with an HTTP status code:
429 Too Many Requests
What to do as a developer when you see this?
- Check for a Retry-After header: This header (if present) tells you how long to wait before trying again, either in seconds (Retry-After: 60) or as an HTTP date (Retry-After: Fri, 31 Dec 2025 23:59:59 GMT).
- Implement Exponential Backoff: If there is no Retry-After header, wait a small amount of time, then retry. If it fails again, wait longer, then retry, and so on. This prevents hammering the server.
- Review API Documentation: Understand the specific rate limits of the API you're using.
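That retry strategy can be sketched in Python. This is a hedged sketch, not tied to any HTTP library: `send` is any caller-supplied function returning a `(status, headers, body)` tuple, and only the numeric form of Retry-After is handled here (a real client should also parse the HTTP-date form).

```python
import random
import time

def request_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call `send()` and retry on HTTP 429, honoring a numeric Retry-After
    header when present, otherwise backing off exponentially with jitter."""
    for attempt in range(max_retries):
        status, headers, body = send()
        if status != 429:
            return status, body
        retry_after = headers.get("Retry-After")
        if retry_after and retry_after.isdigit():
            delay = int(retry_after)  # server told us exactly how long to wait
        else:
            # Exponential backoff with jitter: ~1s, 2s, 4s, ...
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
        sleep(delay)
    raise RuntimeError("rate limited: retries exhausted")
```

The jitter spreads retries out so that many clients rate-limited at the same moment don't all retry in lockstep.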
Where You'll See Rate Limiting
Rate limiting is everywhere! Here are a few common places:
- Public APIs: Services like Twitter, GitHub, Google Maps limit how many API calls you can make to prevent abuse and ensure availability.
- Login Attempts: Limiting login attempts (e.g., 5 tries per 15 minutes) helps prevent brute-force password attacks.
- Password Resets & Email Verifications: Prevents spamming users with too many requests.
- Search Engines & Web Scrapers: Search engines might temporarily block IPs that make too many automated queries too quickly.
- E-commerce Sites: During flash sales, rate limits can help manage traffic and prevent inventory issues.
Key Takeaways for Developers
Even if you're not implementing rate limiting on the backend, understanding it is vital:
- Rate limiting is a crucial mechanism for service stability, fairness, and security.
- Be aware of API rate limits when integrating third-party services. Always read the documentation!
- Implement graceful error handling for 429 Too Many Requests errors, including respecting Retry-After headers and using exponential backoff.
- Design your applications to be resilient and to anticipate potential rate limits.
- While often a backend concern, front-end developers need to understand how to react to rate limits.
Understanding rate limiting helps you build more robust and considerate applications!
Thank You & Q&A
Hopefully, this gave you a good introduction to the world of Rate Limiting!