Compose Platform Monitoring

The following is a non-exhaustive list of the events and metrics that the Compose platform monitors. These events are reported to the operations team and can trigger a response or corrective action.

Platform Monitoring

These are events that can occur in the underlying infrastructure that hosts Compose database deployments.

  • Failed deployments
  • Failed backups
  • Failed backup restores
  • Failed version changes
  • Platform service failures (either underlying services or the application services)
  • Failed deployment scaling
  • Volume alerts
  • Host up/down
  • Host load
  • Cluster capacity for deployments

Database Monitoring

For all databases, we check:

  • Cluster nodes are available and healthy
  • Capacity thresholds
  • Replication is not too slow
  • The database service is running
  • Capsule connection status
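
As an illustration only, the checks above can be thought of as comparing each node's reported state against thresholds. The sketch below is not Compose's implementation: the NodeStatus fields, the find_alerts function, and the default threshold values (90% capacity, 30 seconds of lag) are all hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeStatus:
    """Hypothetical snapshot of one database node (not a Compose data structure)."""
    name: str
    healthy: bool
    disk_used_fraction: float       # 0.0 to 1.0
    replication_lag_seconds: float


def find_alerts(nodes: List[NodeStatus],
                capacity_threshold: float = 0.9,   # assumed value
                max_lag_seconds: float = 30.0      # assumed value
                ) -> List[str]:
    """Return one human-readable alert per failed check."""
    alerts = []
    for node in nodes:
        if not node.healthy:
            alerts.append(f"{node.name}: node unavailable or unhealthy")
        if node.disk_used_fraction >= capacity_threshold:
            alerts.append(f"{node.name}: capacity threshold exceeded")
        if node.replication_lag_seconds > max_lag_seconds:
            alerts.append(f"{node.name}: replication is too slow")
    return alerts
```

A node that passes every check produces no alerts, so an empty list means there is nothing to send to the operations team.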

Elasticsearch tests

  • Heap status
  • Cluster node status
  • Missing shards
  • Number of nodes (HA)
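
Several of these Elasticsearch conditions can be read from the response of Elasticsearch's cluster health API (GET _cluster/health), which includes the fields status, number_of_nodes, and unassigned_shards. The following is a minimal sketch of how such a response might be evaluated once it has been parsed into a dictionary. The function name and the expected node count of 3 are assumptions for the example, not a description of how Compose performs the check.

```python
from typing import Any, Dict, List


def evaluate_cluster_health(health: Dict[str, Any],
                            expected_nodes: int = 3  # assumed HA node count
                            ) -> List[str]:
    """Return a list of problems found in a parsed _cluster/health response."""
    problems = []

    status = health.get("status")
    if status != "green":
        problems.append(f"cluster status is {status}")

    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        problems.append(f"{unassigned} shards are unassigned")

    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        problems.append(f"only {nodes} of {expected_nodes} nodes are present")

    return problems
```

Heap status is not part of the cluster health response, so it would have to come from a separate source such as per-node statistics.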

PostgreSQL tests

  • Connection limits
  • Governor warnings (our high availability solution)
  • Replication lag

MongoDB tests

  • Mongo process is down
  • Replication lag
  • Missing shards

Redis tests

  • Sentinels missing / offline

MySQL tests

  • Data container availability (HA)
  • Replication health

