Rate Limits

Overview

Rate limits control how many API requests you can make per hour. They prevent abuse, ensure fair usage, and maintain platform stability for all users.

Rate Limit Tiers

Tier	Requests/Hour	Burst Limit	Videos/Month
Free	100	10	10
Starter	1,000	50	100
Pro	5,000	100	500
Enterprise	50,000	500	3,000+

The burst limit allows short bursts of requests that exceed the average rate, which is useful for batch operations.

How Rate Limiting Works

Sliding Window

Bluma uses a sliding window algorithm:

Time:     10:00   10:15   10:30   10:45   11:00
Requests:  [250] → [180] → [220] → [200] → [150]
Limit:     1000/hour

At 11:00, just before the 11:00 requests are sent:
  Requests in last hour = 180 + 220 + 200 = 600 (the 250 from 10:00 have aged out)
  Remaining = 1000 - 600 = 400 ✓

This is more accurate than fixed windows and allows for smoother usage patterns.
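To make the window behaviour concrete, here is a minimal sketch of a sliding-window limiter that keeps one timestamp per request. It is an illustration only: the class name `SlidingWindowLog` and its methods are hypothetical, and Bluma's server-side implementation may differ (for example, it may keep bucketed counts rather than a full log).

```typescript
// Hypothetical illustration of a sliding-window limiter; not Bluma's actual implementation.
class SlidingWindowLog {
  private timestamps: number[] = []; // accepted request times, in milliseconds
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Forget requests that are no longer inside the window ending at `now`.
  private prune(now: number): void {
    const cutoff = now - this.windowMs;
    this.timestamps = this.timestamps.filter(t => t > cutoff);
  }

  // Requests still available at time `now`.
  remaining(now: number): number {
    this.prune(now);
    return Math.max(0, this.limit - this.timestamps.length);
  }

  // Record a request if the window has room; returns false when limited.
  tryAcquire(now: number): boolean {
    this.prune(now);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

Because old requests leave the window one at a time, capacity returns gradually instead of all at once at the top of the hour.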

Rate Limit Headers

Every API response includes rate limit information:

HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 987
X-RateLimit-Reset: 1699127600

Header	Description
`X-RateLimit-Limit`	Maximum requests allowed per hour
`X-RateLimit-Remaining`	Requests remaining in current window
`X-RateLimit-Reset`	Unix timestamp when limit resets

Reading Headers

const response = await fetch(url, options);

// Each value is NaN if the corresponding header is missing
const rateLimit = {
  limit: parseInt(response.headers.get('X-RateLimit-Limit') ?? '', 10),
  remaining: parseInt(response.headers.get('X-RateLimit-Remaining') ?? '', 10),
  reset: parseInt(response.headers.get('X-RateLimit-Reset') ?? '', 10)
};

console.log(`${rateLimit.remaining}/${rateLimit.limit} requests remaining`);

// Check if approaching limit
if (rateLimit.remaining < 10) {
  console.warn('Approaching rate limit!');
  // Slow down or pause requests
}
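`X-RateLimit-Reset` is a Unix timestamp, so it has to be compared with the current time before it can be used to pause. A small sketch, assuming the value is expressed in seconds (as the example value 1699127600 suggests); the helper name `msUntilReset` is hypothetical.

```typescript
// Hypothetical helper: milliseconds to wait until the time given in X-RateLimit-Reset.
// Assumes the header carries a Unix timestamp in seconds.
function msUntilReset(resetHeader: string | null, nowMs: number = Date.now()): number {
  const resetSeconds = Number.parseInt(resetHeader ?? "", 10);
  if (Number.isNaN(resetSeconds)) return 0; // header missing or malformed: no known wait
  return Math.max(0, resetSeconds * 1000 - nowMs); // never negative
}
```

Passing `response.headers.get('X-RateLimit-Reset')` gives the delay to use with `setTimeout` when `remaining` reaches zero.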

Rate Limit Exceeded (429)

When you exceed your rate limit, you’ll receive a 429 Too Many Requests response:

{
  "error": {
    "type": "rate_limit_exceeded",
    "title": "Rate Limit Exceeded",
    "status": 429,
    "detail": "You have exceeded the rate limit of 1,000 requests per hour.",
    "metadata": {
      "limit": 1000,
      "retry_after": 3600,
      "current_usage": 1000
    },
    "links": {
      "docs": "https://docs.getbluma.com/concepts/rate-limits",
      "upgrade": "https://getbluma.com/billing"
    }
  }
}

Additional Header:

Retry-After: 3600

The Retry-After header indicates how many seconds to wait before retrying.
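The example above uses the delay-in-seconds form. The HTTP specification also allows `Retry-After` to carry an HTTP date, so a defensive client may want to accept both. A minimal sketch; the function name `retryAfterToMs` is hypothetical, and whether Bluma ever sends the date form is not stated here.

```typescript
// Hypothetical helper: convert a Retry-After header value to a wait in milliseconds.
// Accepts delay-seconds ("3600") and HTTP-date ("Wed, 21 Oct 2015 07:28:00 GMT").
// Returns null when the header is absent or cannot be interpreted.
function retryAfterToMs(header: string | null, nowMs: number = Date.now()): number | null {
  if (header === null) return null;
  const value = header.trim();

  // Delay-seconds form: a non-negative integer
  if (/^\d+$/.test(value)) {
    return Number.parseInt(value, 10) * 1000;
  }

  // HTTP-date form: wait until that moment (or not at all if it has passed)
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, dateMs - nowMs);
}
```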

Handling Rate Limits

1. Exponential Backoff

async function apiCallWithBackoff(url: string, options: RequestInit, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);

    if (response.status === 429) {
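      // Wait at least as long as the server asks (Retry-After, in seconds),
      // and back off exponentially if repeated attempts keep failing
      const retryAfter = parseInt(response.headers.get('Retry-After') || '60', 10);
      const backoffTime = Math.max(retryAfter * 1000, Math.pow(2, attempt) * 1000);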

      console.log(`Rate limited. Retrying in ${backoffTime}ms...`);
      await new Promise(resolve => setTimeout(resolve, backoffTime));
      continue;
    }

    return response;
  }

  throw new Error('Max retries exceeded');
}
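When many clients are limited at the same moment, identical backoff schedules make them all retry in lockstep. A common refinement is to cap the delay and add random jitter. The sketch below shows one way to compute such a delay; `backoffDelayMs`, its parameters, and the default base and cap values are hypothetical choices, not Bluma recommendations.

```typescript
// Hypothetical helper: capped exponential backoff with "full jitter".
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,               // 0 for the first retry
  serverWaitMs: number = 0,      // minimum wait requested by the server, if any
  baseMs: number = 1000,
  capMs: number = 60000,
  random: () => number = Math.random,
): number {
  // Exponential growth, limited to capMs
  const exponential = Math.min(capMs, baseMs * Math.pow(2, attempt));
  // Pick a random point in [0, exponential)
  const jittered = Math.floor(random() * exponential);
  // Never retry sooner than the server asked
  return Math.max(serverWaitMs, jittered);
}
```

Inside a retry loop, the result can replace a fixed `backoffTime`, with `serverWaitMs` taken from the `Retry-After` header.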

2. Request Queue

class RateLimitedQueue {
  private queue: Array<() => Promise<any>> = [];
  private processing = false;
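  // Client-side fixed one-hour window: simpler than the server's sliding window,
  // so treat it as a rough safeguard and still handle 429 responses
  private requestsThisHour = 0;
  private hourStart = Date.now();
  private limit = 1000; // set to your tier's hourly limit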

  async enqueue<T>(fn: () => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      this.queue.push(async () => {
        try {
          const result = await fn();
          resolve(result);
        } catch (error) {
          reject(error);
        }
      });
      this.processQueue();
    });
  }

  private async processQueue() {
    if (this.processing || this.queue.length === 0) return;

    this.processing = true;

    while (this.queue.length > 0) {
      // Reset counter if hour has passed
      if (Date.now() - this.hourStart > 3600000) {
        this.requestsThisHour = 0;
        this.hourStart = Date.now();
      }

      // Check if under limit
      if (this.requestsThisHour >= this.limit) {
        const waitTime = 3600000 - (Date.now() - this.hourStart);
        console.log(`Rate limit reached. Waiting ${waitTime}ms...`);
        await new Promise(resolve => setTimeout(resolve, waitTime));
        continue;
      }

      // Process next request
      const task = this.queue.shift();
      if (task) {
        await task();
        this.requestsThisHour++;
      }

      // Small delay between requests
      await new Promise(resolve => setTimeout(resolve, 100));
    }

    this.processing = false;
  }
}

// Usage
const queue = new RateLimitedQueue();

for (let i = 0; i < 100; i++) {
  queue.enqueue(() => fetch(url, options));
}

3. Monitoring Usage

// Assumption: `metrics` is your own monitoring client (for example, a StatsD wrapper)
declare const metrics: { gauge(name: string, value: number): void };

async function monitorRateLimit(response: Response) {
  const remaining = parseInt(response.headers.get('X-RateLimit-Remaining') || '0', 10);
  const limit = parseInt(response.headers.get('X-RateLimit-Limit') || '0', 10);

  // Skip responses without usable rate limit headers (avoids dividing by zero)
  if (!limit) return;

  const percentUsed = ((limit - remaining) / limit) * 100;

  if (percentUsed > 90) {
    console.error('⚠️ CRITICAL: 90%+ of rate limit used!');
    // Alert, slow down, or pause
  } else if (percentUsed > 75) {
    console.warn('⚠️ WARNING: 75%+ of rate limit used');
  }

  // Log to monitoring service
  metrics.gauge('api.rate_limit.remaining', remaining);
  metrics.gauge('api.rate_limit.percent_used', percentUsed);
}
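Beyond logging, the same headers can drive the request rate directly. The sketch below spreads the remaining requests evenly over the time left until the reset. It assumes `X-RateLimit-Reset` is a Unix timestamp in seconds and that the full budget is available again at that time, which a sliding window may only approximate; `pacingDelayMs` is a hypothetical helper, not part of any Bluma SDK.

```typescript
// Hypothetical helper: delay to insert before the next request so that the
// remaining budget lasts until the reset time.
function pacingDelayMs(
  remaining: number,
  resetUnixSeconds: number,
  nowMs: number = Date.now(),
): number {
  const msLeft = Math.max(0, resetUnixSeconds * 1000 - nowMs);
  if (remaining <= 0) return msLeft; // budget exhausted: wait for the reset
  return Math.floor(msLeft / remaining); // spread what is left evenly
}
```

For example, with 100 requests remaining and 100 seconds until the reset, the function suggests waiting one second between requests.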

Per-Key vs Account-Wide

Rate limits are applied per API key, not per account. This allows you to:

Create separate keys for different applications
Isolate production from development traffic
Scale horizontally with multiple keys

Example: Multiple Keys

// Production key (high traffic)
const prodKey = 'bluma_live_prod_key';

// Background jobs key (batch operations)
const batchKey = 'bluma_live_batch_key';

// Development key (testing)
const devKey = 'bluma_test_dev_key';

Each key has its own independent rate limit.

Upgrading Limits

Increase Your Tier

Higher tiers get higher rate limits:

Free → Starter:    100 → 1,000 req/hr (10x)
Starter → Pro:     1,000 → 5,000 req/hr (5x)
Pro → Enterprise:  5,000 → 50,000 req/hr (10x)

Upgrade at getbluma.com/billing
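When sizing a plan, the comparison against the tier table can be automated. The sketch below picks the smallest tier whose hourly limit covers an expected peak load; the limits are copied from the table on this page, while `TIER_LIMITS` and `smallestTierFor` are hypothetical names.

```typescript
// Hourly limits copied from the tier table on this page.
const TIER_LIMITS = [
  { tier: "Free", requestsPerHour: 100 },
  { tier: "Starter", requestsPerHour: 1000 },
  { tier: "Pro", requestsPerHour: 5000 },
  { tier: "Enterprise", requestsPerHour: 50000 },
];

// Hypothetical helper: smallest tier whose limit covers the expected peak.
// Returns null when the peak exceeds the standard Enterprise limit,
// in which case a custom limit would be needed.
function smallestTierFor(peakRequestsPerHour: number): string | null {
  const match = TIER_LIMITS.find(t => t.requestsPerHour >= peakRequestsPerHour);
  return match ? match.tier : null;
}
```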

Custom Limits

Enterprise customers can request custom rate limits based on their specific needs. Contact sales@getbluma.com.

Best Practices

Check Headers

Monitor rate limit headers and adjust request rate dynamically

Implement Backoff

Use exponential backoff when receiving 429 responses

Cache Responses

Cache frequently accessed data (templates list, etc.) to reduce API calls

Batch Operations

Combine multiple operations when possible to reduce request count
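The caching practice above can be sketched with a small in-memory cache that expires entries after a fixed time. This is a minimal illustration: the class name `TtlCache` is hypothetical, and a suitable time-to-live depends on how often your data changes.

```typescript
// Hypothetical in-memory cache with a time-to-live, for data that changes rarely.
class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns the cached value, or undefined when absent or expired.
  get(key: string, nowMs: number = Date.now()): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= nowMs) {
      this.entries.delete(key); // expired: drop it so the caller refetches
      return undefined;
    }
    return entry.value;
  }

  // Stores a value that stays valid for ttlMs from `nowMs`.
  set(key: string, value: T, nowMs: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: nowMs + this.ttlMs });
  }
}
```

Before requesting a templates list, check the cache and call the API only on a miss, then store the result with `set`.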

Exceptions

Rate limits do not apply to:

✅ Webhook deliveries (server-initiated)
✅ OAuth token refresh (authentication)
✅ Health check endpoints

Rate limits do apply to:

❌ All /v1/* API endpoints
❌ OpenAPI spec endpoint (/v1/openapi.json)

Testing Rate Limits

Simulate Rate Limiting

Test your backoff logic using test keys with artificially low limits:

curl -X POST https://api.getbluma.com/api/v1/api-keys \
  -H "Authorization: Bearer YOUR_SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Rate Limit Test Key",
    "environment": "test",
    "rate_limit_per_hour": 10
  }'

Then make more than 10 requests within an hour using that key to trigger rate limiting.
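For unit tests that should not touch the network at all, a scripted stand-in for the request function can play back a fixed sequence of status codes. The sketch below assumes nothing about the API beyond the 429 status and the `Retry-After` header described on this page; `StubResponse` and `makeScriptedCall` are hypothetical names.

```typescript
// Minimal response shape for the stub; real code would use the Fetch API's Response.
interface StubResponse {
  status: number;
  headers: Record<string, string>;
}

// Hypothetical test double: returns the scripted statuses in order and repeats
// the final one once the script runs out. 429 responses carry a Retry-After header.
function makeScriptedCall(
  statuses: number[],
  retryAfterSeconds: number = 1,
): () => StubResponse {
  if (statuses.length === 0) throw new Error("statuses must not be empty");
  let index = 0;
  return () => {
    const status = statuses[Math.min(index, statuses.length - 1)] as number;
    index++;
    const headers: Record<string, string> =
      status === 429 ? { "Retry-After": String(retryAfterSeconds) } : {};
    return { status, headers };
  };
}
```

Passing `makeScriptedCall([429, 429, 200])` in place of the real request function lets a test assert that a retry wrapper makes exactly three calls before succeeding.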

Frequently Asked Questions

Can I purchase additional rate limit capacity?

Yes! Upgrade your tier or contact sales@getbluma.com for custom limits (Enterprise only).

Do rate limits reset at a specific time?

No, rate limits use a sliding window. They reset continuously based on your request pattern.

What counts as a request?

Every HTTP request to /v1/* endpoints counts, regardless of success or failure.

Can I get rate limited in test mode?

Yes, test keys have the same rate limits as production keys of your tier. This helps you test rate limit handling logic.

Will WebSocket connections count against rate limits?

Bluma currently doesn’t support WebSockets. All communication is via HTTP REST API.

Troubleshooting

Issue: Constant 429 Errors

Causes:

Making too many requests too quickly
Several applications or workers sharing one API key (limits are applied per key)
Batch operations without rate limiting

Solutions:

Implement request queueing
Add delays between requests
Upgrade to a higher tier
Use exponential backoff

Issue: Unexpected Rate Limit

Causes:

Previous requests in the sliding window
Shared API key across multiple services
Clock skew in reset time calculation

Solutions:

Check X-RateLimit-Remaining header
Use separate API keys per service
Monitor usage in dashboard

Monitoring

Track rate limit metrics in your usage dashboard:

Current usage vs limit
Historical rate limit hits
Per-key usage breakdown
Average requests per hour

Next Steps

View Usage Dashboard

Monitor your API usage in real-time

Upgrade Plan

Increase your rate limits

Error Handling

Learn how to handle 429 errors

Best Practices

Build rate-limit-aware integrations
