DoiT Cloud Intelligence™

Firestore scaling: the 500/50/5 rule and how to test it

By Matthias Baetens · Oct 28, 2024 · 6 min read

In the world of NoSQL cloud databases, Firestore stands out as a flexible and scalable solution for mobile, web, and server development. However, even with its impressive capabilities, there’s a common misconception that Firestore can handle any load thrown at it without breaking a sweat. While it can scale to very high loads, it can’t do so instantly, so the reality is a bit more nuanced. Imagine you’re launching your latest and greatest feature and have experienced huge traffic spikes in the past. Or you know your usage patterns and expect a severely increased load at a specific time of day. In this post, we’ll explore a common Firestore scaling problem and introduce a practical way to test for and avoid it.

The 500/50/5 Rule: A Gentle Introduction to Firestore

Firestore is designed for scale, but like any elastic and distributed system, it needs time to adjust to increasing loads. This is where the 500/50/5 rule comes into play:

Start with a maximum of 500 operations per second to a new collection, then increase traffic by 50% every 5 minutes.

This guideline ensures that Firestore’s internal scaling mechanisms can keep up with your growth, preventing common issues like high latency or DEADLINE_EXCEEDED errors.
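To make the rule concrete, here is a small sketch (plain JavaScript, outside of k6) that computes the maximum traffic the rule allows after a given number of minutes of warm-up, assuming you start at the 500 ops/s baseline and compound by 50% every completed 5-minute window:

```javascript
// Maximum operations per second the 500/50/5 rule allows after
// `minutes` of warm-up: 500 * 1.5^(number of completed 5-minute windows).
function maxAllowedRPS(minutes) {
  const windows = Math.floor(minutes / 5);
  return 500 * Math.pow(1.5, windows);
}

console.log(maxAllowedRPS(0));  // 500
console.log(maxAllowedRPS(10)); // 1125
console.log(maxAllowedRPS(20)); // 2531.25
```

In other words, reaching 1500 RPS safely takes roughly 15–20 minutes of gradual ramping, which is exactly what the experiment below demonstrates.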

Enter k6: Your Load Testing Ally

To illustrate the importance of the 500/50/5 rule, we’ve created a script using k6, an open-source load-testing tool. k6 is an excellent choice for several reasons:

  • It’s simple to use with a JavaScript-based scripting language.
  • It provides real-time performance metrics and detailed insights.
  • It’s highly scalable, capable of generating 100,000–300,000 requests per second from a single instance.

The script

The script can be found here. An overview of what it does:

Initial and Target Load:

  • Starts at 500 requests per second (RPS)
  • Aims to reach 1500 RPS (you can, of course, adjust this target)

Ramping Strategy:

  • Maintains each load level for 5 minutes (300 seconds)
  • Increases load by 50% over 1-minute periods
  • Continues this pattern until reaching or exceeding the target RPS

Dynamic Stage Generation:

  • Automatically calculates the number of stages needed
  • Creates a series of alternating ‘stable’ and ‘ramp-up’ stages
  • Logs each stage’s target RPS and duration for clarity
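The stage-generation logic described above can be sketched as follows. This is a minimal illustration, not the exact code from the script; the parameter names (`initialRPS`, `targetRPS`, etc.) mirror the ones used later in this post:

```javascript
// Build alternating 'stable' and 'ramp-up' stages: hold each level for
// stableSeconds, then ramp by 50% over rampSeconds, until the target is hit.
function buildStages(initialRPS, targetRPS, stableSeconds, rampSeconds) {
  const stages = [];
  let rps = initialRPS;
  stages.push({ target: rps, duration: `${stableSeconds}s` }); // initial hold
  while (rps < targetRPS) {
    rps = Math.min(Math.ceil(rps * 1.5), targetRPS);
    stages.push({ target: rps, duration: `${rampSeconds}s` });   // ramp-up
    if (rps < targetRPS) {
      stages.push({ target: rps, duration: `${stableSeconds}s` }); // hold
    }
  }
  return stages;
}

// For 500 → 1500 RPS this yields the six stages seen in the success run below.
console.log(buildStages(500, 1500, 300, 60));
```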

Document Selection:

  • Reads document IDs from a file (‘orders.txt’)
  • Randomly selects a document ID for each request
  • Requires you to supply document IDs for your own use case (the IDs used here come from a generated fake dataset)
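The document-selection step amounts to picking a random ID per request from a pre-loaded list. A minimal sketch (the IDs below are placeholders; in the actual script they are read from `orders.txt`):

```javascript
// Stand-in for the IDs loaded from 'orders.txt' in the real script.
const docIds = ['order-001', 'order-002', 'order-003'];

// Pick one ID uniformly at random for each simulated request.
function randomDocId(ids) {
  return ids[Math.floor(Math.random() * ids.length)];
}

console.log(randomDocId(docIds));
```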

Request Execution:

  • Performs GET requests to the Firestore REST API
  • Includes authentication via a bearer token
  • I have included a script that gets a token for you as well
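For reference, a single-document GET against the Firestore REST API targets a URL of the following shape. This is a sketch with placeholder project and collection names; in k6 the request would be issued with `http.get(url, { headers: { Authorization: `Bearer ${token}` } })`:

```javascript
// Build the Firestore REST URL for fetching one document by ID.
// projectId, collection, and docId below are illustrative placeholders.
function firestoreDocUrl(projectId, collection, docId) {
  return `https://firestore.googleapis.com/v1/projects/${projectId}` +
         `/databases/(default)/documents/${collection}/${docId}`;
}

console.log(firestoreDocUrl('my-project', 'orders', 'order-001'));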

Performance Monitoring:

  • Tracks successful reads and errors
  • Logs any non-200 status codes with details

You can run the script using `k6 run warm-up.js` after installing k6 (using e.g. `brew` if you are on a Mac). You can get a token for the script using `generate-firebase-token.py`. For both scripts there are a few variables to update; use “Find” in your editor and search for `INSERT`.

The Experiment: Success vs. Failure

We ran two experiments to demonstrate the impact of following (or ignoring) the 500/50/5 rule, allowing you to see the difference in action:

Experiment 1: Setting Up for Failure

In this scenario, we started with 2000 requests per second (RPS) and ramped up to 2500 RPS over 5 minutes, completely disregarding the 500/50/5 rule.

```js
// Warmup parameters
const initialRPS = 2000;
const targetRPS = 2500;
const stablePeriodSeconds = 300; // 5 minutes
const rampPeriodSeconds = 0;
const stageCount = Math.ceil(Math.log(targetRPS / initialRPS) / Math.log(1.5));
```
Ran on 1/1/10 between 0110 and 0115 CEST.
Results:
```bash
INFO[0335] Warmup Stages:                                source=console
INFO[0335] Stage 1: Target RPS: 2500, Duration: 300s     source=console
     ✗ status is 200
      ↳  4% — ✓ 6408 / ✗ 123474
     checks.........................: 4.93%  ✓ 6408       ✗ 123474
     data_received..................: 23 MB  70 kB/s
     data_sent......................: 4.6 MB 14 kB/s
     dropped_iterations.............: 1      0.003028/s
     errors.........................: 123474 373.866077/s
     http_req_blocked...............: avg=473.49ms min=0s       med=0s     max=59.9s  p(90)=0s     p(95)=0s
     http_req_connecting............: avg=287.6ms  min=0s       med=0s     max=38.39s p(90)=0s     p(95)=0s
     http_req_duration..............: avg=806.08ms min=0s       med=0s     max=1m3s   p(90)=0s     p(95)=2.01s
       { expected_response:true }...: avg=12.62s   min=311.44ms med=9.48s  max=1m0s   p(90)=30.92s p(95)=36.56s
     http_req_failed................: 95.06% ✓ 123474     ✗ 6409
     http_req_receiving.............: avg=82.85ms  min=0s       med=0s     max=59.4s  p(90)=0s     p(95)=30µs
     http_req_sending...............: avg=651.35µs min=0s       med=0s     max=8.82s  p(90)=0s     p(95)=92µs
     http_req_tls_handshaking.......: avg=261.72ms min=0s       med=0s     max=57.6s  p(90)=0s     p(95)=0s
     http_req_waiting...............: avg=722.58ms min=0s       med=0s     max=1m2s   p(90)=0s     p(95)=1.89s
     http_reqs......................: 129883 393.271845/s
     iteration_duration.............: avg=32.65s   min=2.58µs   med=33.98s max=1m12s  p(90)=48.47s p(95)=51.64s
     iterations.....................: 129883 393.271845/s
     successful_reads...............: 4.93%  ✓ 6408       ✗ 123474
     vus............................: 47     min=0        max=25000
     vus_max........................: 25000  min=4179     max=25000
running (5m30.3s), 00000/25000 VUs, 129882 complete and 21 interrupted iterations
firestore_warmup ✓ [======================================] 00021/25000 VUs  5m0s  2105.47 iters/s
```

What this looks like in Key Visualiser:

The results? A success rate under 5%. Ouch.

Experiment 2: Setting Up for Success

For this test, we adhered to the 500/50/5 rule, starting at 500 RPS and gradually increasing to 1500 RPS over about 20 minutes.

```js
// Warmup parameters
const initialRPS = 500;
const targetRPS = 1500;
const stablePeriodSeconds = 300; // 5 minutes
const rampPeriodSeconds = 60; // 1 minute
const stageCount = Math.ceil(Math.log(targetRPS / initialRPS) / Math.log(1.5));
```
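As a quick sanity check on the “about 20 minutes” claim, the schedule these parameters produce (three 300-second holds at 500, 750, and 1125 RPS, plus three 60-second ramps) works out as follows:

```javascript
// Total warm-up time implied by the success-run parameters:
// stages 500 → 750 → 1125 → 1500 give three 300s holds and three 60s ramps.
const stableStages = 3;
const rampStages = 3;
const totalSeconds = stableStages * 300 + rampStages * 60;

console.log(totalSeconds);      // 1080
console.log(totalSeconds / 60); // 18 minutes, matching the ~18m30s run time below
```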

Ran on 1/1/10 between 0140 and 0158 CEST.

Results:
```bash
INFO[1111] Warmup Stages:                                source=console
INFO[1111] Stage 1: Target RPS: 500, Duration: 300s      source=console
INFO[1111] Stage 2: Target RPS: 750, Duration: 60s       source=console
INFO[1111] Stage 3: Target RPS: 750, Duration: 300s      source=console
INFO[1111] Stage 4: Target RPS: 1125, Duration: 60s      source=console
INFO[1111] Stage 5: Target RPS: 1125, Duration: 300s     source=console
INFO[1111] Stage 6: Target RPS: 1500, Duration: 60s      source=console

     ✗ status is 200
      ↳  99% — ✓ 863739 / ✗ 231

     checks.........................: 99.97% ✓ 863739     ✗ 231
     data_received..................: 1.5 GB 1.4 MB/s
     data_sent......................: 173 MB 156 kB/s
     dropped_iterations.............: 20999  18.915827/s
     errors.........................: 231    0.208084/s
     http_req_blocked...............: avg=50.75ms  min=0s       med=0s       max=41.09s p(90)=1µs      p(95)=1µs
     http_req_connecting............: avg=35.83ms  min=0s       med=0s       max=29.93s p(90)=0s       p(95)=0s
     http_req_duration..............: avg=554.84ms min=0s       med=334.66ms max=1m0s   p(90)=728.85ms p(95)=1.29s
       { expected_response:true }...: avg=554.07ms min=304.44ms med=334.66ms max=59.46s p(90)=728.84ms p(95)=1.29s
     http_req_failed................: 0.02%  ✓ 231        ✗ 863739
     http_req_receiving.............: avg=68.05ms  min=0s       med=6.92ms   max=59.42s p(90)=21.82ms  p(95)=160.12ms
     http_req_sending...............: avg=288.4µs  min=0s       med=32µs     max=12.11s p(90)=89µs     p(95)=150µs
     http_req_tls_handshaking.......: avg=16.93ms  min=0s       med=0s       max=46.68s p(90)=0s       p(95)=0s
     http_req_waiting...............: avg=486.5ms  min=0s       med=327.66ms max=1m0s   p(90)=615.6ms  p(95)=943.67ms
     http_reqs......................: 863970 778.261194/s
     iteration_duration.............: avg=609.81ms min=2.2µs    med=334.94ms max=1m0s   p(90)=748.42ms p(95)=1.35s
     iterations.....................: 863970 778.261194/s
     successful_reads...............: 99.97% ✓ 863739     ✗ 231
     vus............................: 14     min=14       max=5720
     vus_max........................: 5849   min=1000     max=5849

running (18m30.1s), 00000/05849 VUs, 863970 complete and 14 interrupted iterations
firestore_warmup ✓ [======================================] 00014/05849 VUs  18m0s  1499.93 iters/s
```

What this looks like in Key Visualiser:

Start of scaling

End of scaling

The outcome? An impressive 99.97% success rate.

Running Your Own Tests

We can run the script with k6 locally, which has a few advantages: easy set-up, no cost, and so on. However, you might be limited by your local machine’s resources, have only a single instance, and see results skewed by your own network constraints. For more accurate results, it can make sense to run the script on a (Google Cloud) VM.

The 500/50/5 rule isn’t just a suggestion — it’s a crucial guideline for ensuring your Firestore implementation scales smoothly and efficiently. By following this rule and using tools like k6 to test your scaling strategies, you can avoid performance pitfalls and keep your application running smoothly as it grows.

Remember, when it comes to database scaling, slow and steady wins the race. Happy scaling!

---

Want to dive deeper into Firestore scaling or need help optimizing your cloud infrastructure? Visit doit.com to learn how we can assist you in maximizing your cloud potential.