Overview
Rate limits control how many API requests you can make per hour. They prevent abuse, ensure fair usage, and maintain platform stability for all users.
Rate Limit Tiers
| Tier | Requests/Hour | Burst Limit | Videos/Month |
|---|---|---|---|
| Free | 100 | 10 | 10 |
| Starter | 1,000 | 50 | 100 |
| Pro | 5,000 | 100 | 500 |
| Enterprise | 50,000 | 500 | 3,000+ |
The burst limit allows short bursts of requests that exceed the average rate, which is useful for batch operations.
How Rate Limiting Works
Sliding Window
Bluma uses a sliding window algorithm:
Time: 10:00 10:15 10:30 10:45 11:00
Requests: [250] → [180] → [220] → [200] → [150]
Limit: 1000/hour
At 10:45:
Requests in last hour = 250 + 180 + 220 + 200 = 850
Remaining = 1000 - 850 = 150 ✓
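Because capacity frees up gradually as older requests age out of the window, this tracks actual usage more accurately than a fixed window that resets all at once, and it allows smoother usage patterns.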
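To make the sliding window idea concrete, here is a minimal client-side sketch. It is an illustration only, not Bluma's server implementation: the class name `SlidingWindowCounter` and its methods are hypothetical, and it assumes a request leaves the window exactly `windowMs` after it was made.

```typescript
// Illustrative sliding-window counter (hypothetical names, not Bluma's server code).
class SlidingWindowCounter {
  private timestamps: number[] = [];
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Drop timestamps that have aged out of the window ending at `now`.
  private prune(now: number): void {
    const cutoff = now - this.windowMs;
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
  }

  // Number of requests still allowed at time `now`.
  remaining(now: number): number {
    this.prune(now);
    return Math.max(0, this.limit - this.timestamps.length);
  }

  // Record a request if capacity allows; returns false when the limit is reached.
  tryAcquire(now: number): boolean {
    if (this.remaining(now) === 0) return false;
    this.timestamps.push(now);
    return true;
  }
}

// Example: 3 requests per 1,000 ms. The fourth call is rejected, and capacity
// returns one request at a time as earlier requests age out.
const counter = new SlidingWindowCounter(3, 1000);
const results = [0, 100, 200, 300, 1050, 1150].map(t => counter.tryAcquire(t));
console.log(results);
```

Passing the current time in as an argument, rather than reading the clock inside the class, keeps the sketch deterministic and easy to test.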
Every API response includes rate limit information:
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 987
X-RateLimit-Reset: 1699127600
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed per hour |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp when limit resets |
const response = await fetch(url, options);

// Header values are strings (or null if absent); parse them as base-10 integers.
const rateLimit = {
  limit: parseInt(response.headers.get('X-RateLimit-Limit') ?? '', 10),
  remaining: parseInt(response.headers.get('X-RateLimit-Remaining') ?? '', 10),
  reset: parseInt(response.headers.get('X-RateLimit-Reset') ?? '', 10)
};

console.log(`${rateLimit.remaining}/${rateLimit.limit} requests remaining`);

// Check if approaching limit
if (rateLimit.remaining < 10) {
  console.warn('Approaching rate limit!');
  // Slow down or pause requests
}
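If you decide to pause when few requests remain, the X-RateLimit-Reset value can tell you for how long. The helper below is a small sketch: `msUntilReset` is a hypothetical name, and it assumes the header carries a Unix timestamp in seconds, as the sample value 1699127600 suggests.

```typescript
// Milliseconds to wait until the reset time in X-RateLimit-Reset.
// Assumption: the header is a Unix timestamp in seconds.
function msUntilReset(resetHeader: string | null, nowMs: number = Date.now()): number {
  if (resetHeader === null || resetHeader.trim() === '') {
    return 0; // Header missing: nothing to wait for.
  }
  const resetSeconds = Number(resetHeader);
  if (!Number.isFinite(resetSeconds)) {
    return 0; // Header malformed: do not block on a bad value.
  }
  // Clamp at zero in case the reset time is already in the past.
  return Math.max(0, resetSeconds * 1000 - nowMs);
}
```

A caller could pass `response.headers.get('X-RateLimit-Reset')` directly and sleep for the returned duration before sending the next request.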
Rate Limit Exceeded (429)
When you exceed your rate limit, you’ll receive a 429 Too Many Requests response:
{
"error": {
"type": "rate_limit_exceeded",
"title": "Rate Limit Exceeded",
"status": 429,
"detail": "You have exceeded the rate limit of 1,000 requests per hour.",
"metadata": {
"limit": 1000,
"retry_after": 3600,
"current_usage": 1000
},
"links": {
"docs": "https://docs.getbluma.com/concepts/rate-limits",
"upgrade": "https://getbluma.com/billing"
}
}
}
Additional Header:
The Retry-After header indicates how many seconds to wait before retrying.
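As a rough sketch of how a client might read the wait time from a 429, the helper below prefers the Retry-After header and falls back to `metadata.retry_after` from the body. The names `RateLimitErrorBody` and `retryDelayMs` are hypothetical, the interface only mirrors the sample response above, and treating `metadata.retry_after` as seconds is an assumption based on the header's unit.

```typescript
// Shape mirrors the sample 429 body; no fields beyond the sample are assumed.
interface RateLimitErrorBody {
  error: {
    type: string;
    status: number;
    detail: string;
    metadata: { limit: number; retry_after: number; current_usage: number };
  };
}

// Returns the wait in milliseconds, or null if neither source is usable.
function retryDelayMs(retryAfterHeader: string | null, body: unknown): number | null {
  if (retryAfterHeader !== null && retryAfterHeader.trim() !== '') {
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  }
  // Assumption: metadata.retry_after uses seconds, like the header.
  const meta = (body as Partial<RateLimitErrorBody> | null)?.error?.metadata;
  const fromBody = meta?.retry_after;
  return typeof fromBody === 'number' && fromBody >= 0 ? fromBody * 1000 : null;
}
```

Returning null instead of a default leaves the choice of fallback delay to the caller, for example an exponential backoff.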
Handling Rate Limits
1. Exponential Backoff
async function apiCallWithBackoff(url: string, options: RequestInit, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.status === 429) {
      // Retry-After is in seconds; treat a missing or non-numeric header as 0.
      const retryAfter = parseInt(response.headers.get('Retry-After') ?? '', 10) || 0;
      // Wait at least as long as the server asks; otherwise back off exponentially.
      const backoffTime = Math.max(retryAfter * 1000, Math.pow(2, attempt) * 1000);
      console.log(`Rate limited. Retrying in ${backoffTime}ms...`);
      await new Promise(resolve => setTimeout(resolve, backoffTime));
      continue;
    }
    return response;
  }
  throw new Error('Max retries exceeded');
}
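When many clients back off on the same schedule they tend to retry at the same moment. One common remedy, not something the Bluma API requires, is to add random jitter to the delay. The sketch below uses the "full jitter" variant; `backoffDelayMs` and its parameters are hypothetical names, and the 60-second cap is an arbitrary assumption.

```typescript
// Full-jitter backoff delay. `random` is injectable so the result can be tested.
function backoffDelayMs(
  attempt: number,
  retryAfterMs: number = 0,
  baseMs: number = 1000,
  capMs: number = 60000,
  random: () => number = Math.random,
): number {
  // Exponential ceiling: base * 2^attempt, capped at capMs.
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  // Pick a random delay in [0, ceiling).
  const jittered = Math.floor(random() * ceiling);
  // Never retry earlier than the server asked.
  return Math.max(retryAfterMs, jittered);
}
```

The returned value could replace the fixed `backoffTime` calculation in a retry loop like the one above, with `retryAfterMs` taken from the Retry-After header.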
2. Request Queue
class RateLimitedQueue {
  private queue: Array<() => Promise<any>> = [];
  private processing = false;
  private requestsThisHour = 0;
  private hourStart = Date.now();
  private limit = 1000;

  async enqueue<T>(fn: () => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      this.queue.push(async () => {
        try {
          const result = await fn();
          resolve(result);
        } catch (error) {
          reject(error);
        }
      });
      this.processQueue();
    });
  }

  private async processQueue() {
    if (this.processing || this.queue.length === 0) return;
    this.processing = true;
    while (this.queue.length > 0) {
      // Reset counter if hour has passed
      if (Date.now() - this.hourStart > 3600000) {
        this.requestsThisHour = 0;
        this.hourStart = Date.now();
      }
      // Check if under limit
      if (this.requestsThisHour >= this.limit) {
        const waitTime = 3600000 - (Date.now() - this.hourStart);
        console.log(`Rate limit reached. Waiting ${waitTime}ms...`);
        await new Promise(resolve => setTimeout(resolve, waitTime));
        continue;
      }
      // Process next request
      const task = this.queue.shift();
      if (task) {
        await task();
        this.requestsThisHour++;
      }
      // Small delay between requests
      await new Promise(resolve => setTimeout(resolve, 100));
    }
    this.processing = false;
  }
}

// Usage
const queue = new RateLimitedQueue();
for (let i = 0; i < 100; i++) {
  queue.enqueue(() => fetch(url, options));
}
3. Monitoring Usage
// `metrics` is assumed to be your monitoring client, exposing gauge(name, value).
async function monitorRateLimit(response: Response) {
  const remaining = parseInt(response.headers.get('X-RateLimit-Remaining') ?? '', 10);
  const limit = parseInt(response.headers.get('X-RateLimit-Limit') ?? '', 10);
  // Skip responses without usable headers to avoid dividing by zero or logging NaN.
  if (!Number.isFinite(remaining) || !Number.isFinite(limit) || limit <= 0) return;
  const percentUsed = ((limit - remaining) / limit) * 100;
  if (percentUsed > 90) {
    console.error('⚠️ CRITICAL: 90%+ of rate limit used!');
    // Alert, slow down, or pause
  } else if (percentUsed > 75) {
    console.warn('⚠️ WARNING: 75%+ of rate limit used');
  }
  // Log to monitoring service
  metrics.gauge('api.rate_limit.remaining', remaining);
  metrics.gauge('api.rate_limit.percent_used', percentUsed);
}
Per-Key vs Account-Wide
Rate limits are applied per API key, not per account. This allows you to:
- Create separate keys for different applications
- Isolate production from development traffic
- Scale horizontally with multiple keys
Example: Multiple Keys
// Production key (high traffic)
const prodKey = 'bluma_live_prod_key';
// Background jobs key (batch operations)
const batchKey = 'bluma_live_batch_key';
// Development key (testing)
const devKey = 'bluma_test_dev_key';
Each key has its own independent rate limit.
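Because each key is limited independently, it can help to track remaining capacity per key rather than per application. The following is a minimal sketch under that assumption; `Workload`, `recordRemaining`, and `remainingFor` are hypothetical names, and the key strings are the placeholders from the example above (real keys would come from secure configuration).

```typescript
type Workload = 'production' | 'batch' | 'development';

// Placeholder keys from the example above, not real credentials.
const apiKeys: Record<Workload, string> = {
  production: 'bluma_live_prod_key',
  batch: 'bluma_live_batch_key',
  development: 'bluma_test_dev_key',
};

// Latest X-RateLimit-Remaining value seen for each key.
const remainingByKey = new Map<string, number>();

// Call after each response with the parsed X-RateLimit-Remaining header.
function recordRemaining(workload: Workload, remaining: number): void {
  remainingByKey.set(apiKeys[workload], remaining);
}

// Returns undefined until a response has been recorded for that workload's key.
function remainingFor(workload: Workload): number | undefined {
  return remainingByKey.get(apiKeys[workload]);
}
```

With this in place, a batch job that exhausts its own key leaves the recorded capacity of the production key untouched.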
Upgrading Limits
Increase Your Tier
Higher tiers get higher rate limits:
Free → Starter: 100 → 1,000 req/hr (10x)
Starter → Pro: 1,000 → 5,000 req/hr (5x)
Pro → Enterprise: 5,000 → 50,000 req/hr (10x)
Upgrade at getbluma.com/billing
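To estimate which tier fits a workload, you can compare your expected peak hourly request count against the table values. This is a small sketch: `Tier`, `TIERS`, and `smallestTierFor` are hypothetical names, the numbers are copied from the Rate Limit Tiers table, and burst and monthly video limits are ignored.

```typescript
interface Tier {
  name: string;
  requestsPerHour: number;
}

// Values copied from the Rate Limit Tiers table, in ascending order.
const TIERS: Tier[] = [
  { name: 'Free', requestsPerHour: 100 },
  { name: 'Starter', requestsPerHour: 1000 },
  { name: 'Pro', requestsPerHour: 5000 },
  { name: 'Enterprise', requestsPerHour: 50000 },
];

// Smallest tier whose hourly limit covers the expected peak, or null if none does.
function smallestTierFor(peakRequestsPerHour: number): Tier | null {
  return TIERS.find(t => t.requestsPerHour >= peakRequestsPerHour) ?? null;
}
```

A null result means the workload exceeds the standard Enterprise limit, which is the case the custom limits option is meant for.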
Custom Limits
Enterprise customers can request custom rate limits based on their specific needs. Contact sales@getbluma.com.
Best Practices
Check Headers
Monitor rate limit headers and adjust request rate dynamically
Implement Backoff
Use exponential backoff when receiving 429 responses
Cache Responses
Cache frequently accessed data (templates list, etc.) to reduce API calls
Batch Operations
Combine multiple operations when possible to reduce request count
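As one way to apply the caching practice above, the sketch below keeps values in memory for a fixed time so repeated lookups (a templates list, for example) do not each cost an API request. `TtlCache` is a hypothetical helper, not part of any Bluma SDK, and the right TTL depends on how stale your data may be.

```typescript
// Minimal in-memory cache with a fixed time-to-live per entry.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined if absent or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // Expired: evict and report a miss.
      return undefined;
    }
    return entry.value;
  }

  // Stores a value that expires ttlMs from now.
  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

A caller would check `get` first and only call the API, followed by `set`, on a miss.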
Exceptions
Rate limits do not apply to:
✅ Webhook deliveries (server-initiated)
✅ OAuth token refresh (authentication)
✅ Health check endpoints
Rate limits do apply to:
❌ All /v1/* API endpoints
❌ OpenAPI spec endpoint (/v1/openapi.json)
Testing Rate Limits
Simulate Rate Limiting
Test your backoff logic using test keys with artificially low limits:
curl -X POST https://api.getbluma.com/api/v1/api-keys \
  -H "Authorization: Bearer YOUR_SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Rate Limit Test Key",
    "environment": "test",
    "rate_limit_per_hour": 10
  }'
Then make more than 10 requests within an hour to trigger rate limiting.
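For unit tests that should not touch the network, a local stand-in can mimic the same behavior. The sketch below is a hypothetical test double, not a Bluma tool: `fakeRateLimitedEndpoint` returns HTTP status codes only, and it does not model the sliding window, so capacity never comes back once the limit is reached.

```typescript
// Returns a fake endpoint that answers 200 until `limit` calls have been made,
// then 429 on every later call. Window expiry is intentionally not simulated.
function fakeRateLimitedEndpoint(limit: number): () => number {
  let calls = 0;
  return () => {
    calls += 1;
    return calls > limit ? 429 : 200;
  };
}

// Mirrors the test key above: 10 allowed requests, and the 11th is rejected.
const endpoint = fakeRateLimitedEndpoint(10);
const statuses = Array.from({ length: 11 }, () => endpoint());
console.log(statuses[9], statuses[10]);
```

Feeding these status codes into your retry logic lets you check that it backs off on the first 429 without waiting on a real limit.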
Frequently Asked Questions
Can I purchase additional rate limit capacity?
Do rate limits reset at a specific time?
No, rate limits use a sliding window. They reset continuously based on your request pattern.
What counts as a request?
Every HTTP request to /v1/* endpoints counts, regardless of success or failure.
Can I get rate limited in test mode?
Yes, test keys have the same rate limits as production keys of your tier. This helps you test rate limit handling logic.
Will WebSocket connections count against rate limits?
Bluma currently doesn’t support WebSockets. All communication is via HTTP REST API.
Troubleshooting
Issue: Constant 429 Errors
Causes:
- Making too many requests too quickly
- Multiple workers or services sharing one API key (limits are applied per key)
- Batch operations without rate limiting
Solutions:
- Implement request queueing
- Add delays between requests
- Upgrade to a higher tier
- Use exponential backoff
Issue: Unexpected Rate Limit
Causes:
- Previous requests in the sliding window
- Shared API key across multiple services
- Clock skew in reset time calculation
Solutions:
- Check the X-RateLimit-Remaining header
- Use separate API keys per service
- Monitor usage in dashboard
Monitoring
Track rate limit metrics in your usage dashboard:
- Current usage vs limit
- Historical rate limit hits
- Per-key usage breakdown
- Average requests per hour