Welcome!
Ever wondered how websites and apps stay stable and fair for everyone, even with tons of users? One key secret is Rate Limiting. Let's dive in!
A presentation by Peyman Khosravi.
What is Rate Limiting?
Rate limiting is like a traffic cop for digital services. It controls how many requests a user (or a service) can make to a server within a specific time window.
Think of it like:
- An ATM allowing only a few transactions before asking you to wait.
- A library allowing you to borrow a limited number of books at a time.
The goal? To ensure the service remains stable, responsive, and fair for all users.
Why is Rate Limiting Crucial?
Prevent Abuse
Mitigates malicious traffic, such as DDoS attacks and brute-force login attempts, before it overwhelms the system.
Ensure Fair Usage
Prevents a single user from hogging all resources, ensuring everyone gets a fair chance to use the service.
Manage Resources
Helps control server load and operational costs by preventing unexpected spikes in traffic.
Improve Security
Can limit attempts to guess passwords or scrape sensitive data.
Maintain Service Quality
Keeps the service responsive and available for legitimate users by avoiding overload.
How It Works: Fixed Window Demo
One common method is the "Fixed Window Counter". Imagine you can make 5 requests every 10 seconds.
How it works: We count requests within a fixed time period. If the count exceeds the limit, new requests are blocked until the next window starts.
This is a simplified example. Real-world systems might use more complex algorithms like Token Bucket or Leaky Bucket for smoother traffic shaping.
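The demo's fixed window counter can be sketched in a few lines of Python. This is a minimal single-process sketch; the `FixedWindowLimiter` name and the injectable `clock` parameter are illustrative, not a real library API.

```python
import time

class FixedWindowLimiter:
    """Allows up to `limit` requests per `window` seconds (single process)."""

    def __init__(self, limit=5, window=10.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock            # injectable so tests can control time
        self.window_start = clock()
        self.count = 0

    def allow(self):
        now = self.clock()
        if now - self.window_start >= self.window:
            # Old window expired: start a fresh one.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False                  # caller should respond with HTTP 429
```

With the demo's numbers (5 requests per 10 seconds), the first five calls to `allow()` return `True`; the sixth returns `False` until the window rolls over.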
How Would You Build a Basic One?
Implementing a rate limiter involves a few key ingredients. Here's a simplified conceptual overview:
1. Tracking User Requests
- Identifier: You need to know *who* is making the request (e.g., IP address, API key, User ID).
- Storage: A place to store request counts and window start times for each identifier.
- In-memory (e.g., a dictionary, or a cache like Redis): Fast, but a plain in-process store is lost on restart and works only for a single server; Redis adds persistence options and can be shared across servers.
- Database: Persistent and shareable across servers, but potentially slower.
2. The Logic (Fixed Window Example)
When a request comes in:
- Get the `identifier` (e.g., IP address).
- Look up their `request_count` and `window_start_time`.
- If `current_time > window_start_time + WINDOW_DURATION`:
(Old window expired)
→ Reset `request_count` to 1 and `window_start_time` to `current_time`. Allow request.
- Else (still in current window):
→ If `request_count < MAX_REQUESTS`: Increment `request_count`. Allow request.
→ Else: Reject request (HTTP 429).
Simplified Pseudo-code
function handleRequest(identifier):
    userData = storage.get(identifier)
    currentTime = now()
    WINDOW_DURATION = 60  // seconds
    MAX_REQUESTS_PER_WINDOW = 100

    // If no record or window expired
    if not userData or currentTime > userData.windowStart + WINDOW_DURATION:
        storage.set(identifier, { count: 1, windowStart: currentTime })
        return ALLOW_REQUEST
    else:
        // Still in current window
        if userData.count < MAX_REQUESTS_PER_WINDOW:
            userData.count += 1
            storage.set(identifier, userData)
            return ALLOW_REQUEST
        else:
            return REJECT_REQUEST_429
Note: This is a very basic fixed window approach. Real-world systems often use more advanced algorithms (Token Bucket, Leaky Bucket) and need to handle distributed environments carefully.
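For comparison, the Token Bucket idea can be sketched as follows. This is a minimal single-process sketch with illustrative names: tokens refill continuously at a fixed rate, so short bursts are absorbed and traffic is smoothed rather than reset at hard window boundaries.

```python
import time

class TokenBucket:
    """Token bucket: tokens refill continuously at `rate` per second,
    up to `capacity`; each allowed request spends one token."""

    def __init__(self, rate=1.0, capacity=5, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock            # injectable so tests can control time
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Unlike the fixed window, a client who exhausts the bucket only has to wait `1 / rate` seconds for the next token, not for a whole window to expire.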
Uh Oh! The 429 Error
If you make too many requests and hit a rate limit, the server will often respond with an HTTP status code:
429 Too Many Requests
What to do as a developer when you see this?
- Check for a Retry-After header: This header (if present) tells you how long to wait before trying again, either in seconds (Retry-After: 60) or as an HTTP date (Retry-After: Fri, 31 Dec 2025 23:59:59 GMT).
- Implement Exponential Backoff: If there is no Retry-After header, wait a small amount of time, then retry. If it fails again, wait longer, then retry, and so on. This prevents hammering the server.
- Review API Documentation: Understand the specific rate limits of the API you're using.
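That retry strategy can be sketched in Python. This is a hedged sketch, not tied to any HTTP library: `send` is any caller-supplied function returning a `(status, headers, body)` tuple, and only the numeric form of Retry-After is handled here (a real client should also parse the HTTP-date form).

```python
import random
import time

def request_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call `send()` and retry on HTTP 429, honoring a numeric Retry-After
    header when present, otherwise backing off exponentially with jitter."""
    for attempt in range(max_retries):
        status, headers, body = send()
        if status != 429:
            return status, body
        retry_after = headers.get("Retry-After")
        if retry_after and retry_after.isdigit():
            delay = int(retry_after)  # server told us exactly how long to wait
        else:
            # Exponential backoff with jitter: ~1s, 2s, 4s, ...
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
        sleep(delay)
    raise RuntimeError("rate limited: retries exhausted")
```

The jitter spreads retries out so that many clients rate-limited at the same moment don't all retry in lockstep.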
Where You'll See Rate Limiting
Rate limiting is everywhere! Here are a few common places:
- Public APIs: Services like Twitter, GitHub, Google Maps limit how many API calls you can make to prevent abuse and ensure availability.
- Login Attempts: Limiting login attempts (e.g., 5 tries per 15 minutes) helps prevent brute-force password attacks.
- Password Resets & Email Verifications: Prevents spamming users with too many requests.
- Search Engines & Web Scrapers: Search engines might temporarily block IPs that make too many automated queries too quickly.
- E-commerce Sites: During flash sales, rate limits can help manage traffic and prevent inventory issues.
Key Takeaways for Developers
Even if you're not implementing rate limiting on the backend, understanding it is vital:
- Rate limiting is a crucial mechanism for service stability, fairness, and security.
- Be aware of API rate limits when integrating third-party services. Always read the documentation!
- Implement graceful error handling for 429 Too Many Requests errors, including respecting Retry-After headers and using exponential backoff.
- Design your applications to be resilient and to anticipate potential rate limits.
- While often a backend concern, front-end developers need to understand how to react to rate limits.
Understanding rate limiting helps you build more robust and considerate applications!
Thank You & Q&A
Hopefully, this gave you a good introduction to the world of Rate Limiting!