Compose Platform Monitoring
The following is a non-exhaustive list of the events and metrics that the Compose platform monitors. These events are reported to the operations team and can trigger a response or corrective action.
Platform Monitoring
These events can occur in the underlying infrastructure that hosts Compose database deployments.
- Failed deployments
- Failed backups
- Failed backup restores
- Failed version changes
- Platform service failures (either underlying services or the application services)
- Failed deployment scaling
- Volume alerts
- Host up/down
- Host load
- Cluster capacity for deployments
Database Monitoring
For all databases, we check:
- Cluster nodes are available and healthy
- Capacity thresholds
- Replication lag is within acceptable limits
- The database service is running
- Capsule connection status
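As a rough illustration of how checks like these can be turned into alerts, the sketch below evaluates a snapshot of per-node readings against thresholds. It is not Compose's implementation: the `NodeReading` fields, the 80% capacity threshold, and the 30-second lag limit are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeReading:
    """Hypothetical snapshot of one database node (all fields are assumptions)."""
    name: str
    service_running: bool
    disk_used_pct: float
    replication_lag_s: float


def evaluate(readings: List[NodeReading],
             disk_limit_pct: float = 80.0,
             lag_limit_s: float = 30.0) -> List[str]:
    """Return one alert string per failed check; an empty list means healthy."""
    alerts = []
    for r in readings:
        if not r.service_running:
            alerts.append(f"{r.name}: service not running")
        if r.disk_used_pct > disk_limit_pct:
            alerts.append(
                f"{r.name}: capacity threshold exceeded ({r.disk_used_pct:.0f}%)")
        if r.replication_lag_s > lag_limit_s:
            alerts.append(
                f"{r.name}: replication too slow ({r.replication_lag_s:.0f}s behind)")
    return alerts
```

Returning a plain list of alert strings keeps the evaluation separate from whatever notifies the operations team, so the same function could feed a pager, a dashboard, or a log.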
Elasticsearch tests
- Heap status
- Cluster node status
- Missing shards
- Number of nodes (HA)
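Checks of this kind can be approximated from Elasticsearch's own APIs. The sketch below inspects already-parsed JSON from `GET _cluster/health` (fields `status`, `number_of_nodes`, `unassigned_shards`) and `GET _nodes/stats/jvm` (field `jvm.mem.heap_used_percent`). The function names, the expected node count, and the 85% heap limit are assumptions for illustration, not Compose's actual tests.

```python
from typing import Dict, List


def check_cluster_health(health: Dict, expected_nodes: int) -> List[str]:
    """List problems found in a parsed GET _cluster/health response."""
    problems = []
    status = health.get("status")
    if status != "green":
        problems.append(f"cluster status is {status}")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        problems.append(f"{unassigned} unassigned shards")
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        problems.append(f"{nodes} of {expected_nodes} nodes present")
    return problems


def check_heap(node_stats: Dict, limit_pct: int = 85) -> List[str]:
    """List nodes whose JVM heap usage exceeds limit_pct (an assumed threshold)."""
    problems = []
    for node in node_stats.get("nodes", {}).values():
        used = node["jvm"]["mem"]["heap_used_percent"]
        if used > limit_pct:
            problems.append(f"{node.get('name', 'unknown')}: heap at {used}%")
    return problems
```

Both functions take dictionaries rather than making HTTP calls, so they can be exercised with recorded responses and paired with any HTTP client.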
PostgreSQL tests
- Connection limits
- Governor warnings (our high availability solution)
- Replication lag
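For connection limits and replication lag, the raw numbers are available from standard PostgreSQL queries. The sketch below lists those queries as constants and applies thresholds to their results. The 90% connection ratio, the 60-second lag limit, and the function name are assumptions for illustration. Governor's own warnings are not modeled here.

```python
from typing import List, Optional

# Standard PostgreSQL queries a monitor could run to collect the inputs below.
CONNECTIONS_SQL = "SELECT count(*) FROM pg_stat_activity"
MAX_CONNECTIONS_SQL = "SHOW max_connections"  # returned as text; cast to int
# Run on a replica; yields NULL (None) on a primary.
LAG_SQL = "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())"


def postgres_warnings(current_connections: int,
                      max_connections: int,
                      lag_seconds: Optional[float],
                      warn_ratio: float = 0.9,
                      lag_limit_s: float = 60.0) -> List[str]:
    """Return warnings for connection pressure and replica lag (assumed limits)."""
    warnings = []
    if current_connections >= warn_ratio * max_connections:
        warnings.append(
            f"connections at {current_connections}/{max_connections}")
    if lag_seconds is not None and lag_seconds > lag_limit_s:
        warnings.append(f"replica is {lag_seconds:.0f}s behind")
    return warnings
```

Note that the timestamp-based lag query overstates lag when the primary is idle, since no new transactions arrive to replay. A production check would usually account for that.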
MongoDB tests
- Mongo process is down
- Replication lag
- Missing shards
Redis tests
- Sentinels missing / offline
MySQL tests
- Data container availability (HA)
- Replication health