Database Optimization for Time-Series Data: Handling Billions of Pings

Uptime monitoring generates a deceivingly large amount of data.

1 Monitor * 1 Minute = 1,440 rows/day.
10,000 Monitors = 14.4 Million rows/day.
1 Year = 5.2 Billion rows.

If you throw this into a standard MySQL

with a B-Tree index, it will explode. The index won’t fit in RAM. Inserts will crawl. Deletes will lock the table for hours.

Here is how Cluster Uptime handles this volume responsibly.

1. The Write Pattern: Append-Only

Time-series data is (mostly) append-only. We rarely update old records. Traditional DBs optimize for random updates (ACID). We don’t need that overhead.

Optimization: Use an LSM (Log-Structured Merge-tree) based storage engine if possible (like RocksDB or SQLite’s WAL mode heavily tuned). Or, insert buffers. Buffer 5 seconds of metrics in RAM, then do a single INSERT INTO metrics VALUES (...), (...), (...). Batching reduces the transaction overhead by 100x.

2. Partitioning by Time

Deleting old data is the hardest operation for a database. DELETE FROM pings WHERE date < '2024-01-01' has to scan the index and mark pages as empty, causing fragmentation.

Solution: Table Partitioning. Create a new table for every month.

pings_2025_01
pings_2025_02

When January 2025 is “expired” (data retention policy), you simply DROP TABLE pings_2025_01. This is an instant O(1) file system operation. No row scanning. No fragmentation.

3. Downsampling (Rollups)

Do you need to know that your server responded in 45ms at 10:04:23 AM three years ago? No. You need high precision for recent data, and aggregates for old data.

The Strategy:

Raw Data (1-min resolution): Keep for 30 days.
Hourly Rollup (Average/P95): Keep for 1 year.
- Compress 60 rows -> 1 row.
Daily Rollup: Keep forever.
- Compress 24 rows -> 1 row.

This reduces your long-term storage requirements by 99.9% while keeping the trend data intact.

Conclusion

Data is heavy. Gravity applies. By using partitioning and downsampling, you can keep your monitoring database fast and lightweight, preventing it from collapsing under its own weight.

👨‍💻

Jesus Paz

Founder

Previous ← Migrating from UptimeRobot to Cluster Uptime: The Complete Guide Next Why "Simple" is Better for System Reliability: Simplicity as a Feature →

Database Optimization for Time-Series Data: Handling Billions of Pings

1. The Write Pattern: Append-Only

2. Partitioning by Time

3. Downsampling (Rollups)

Conclusion

Jesus Paz

Read Next

Scaling Your Monitoring Stack Horizontally: Infinite Growth

Monitoring Microservices: Sidecars, Daemons, and Centralized Checks

Why "Simple" is Better for System Reliability: Simplicity as a Feature

Join 1,000+ FinOps and platform leaders