Each Compose database has a Metrics tab that displays how each capsule in the database's deployment is behaving. The graphs are interactive, so you can scan along a chart to read the values at a particular point in time. You can set the view to a range of time periods, from the last 30 minutes to the last seven days.
There are graphs for both the HAProxy portals and the data member nodes on a deployment. Each graph is labeled in the upper-left corner with the name of the specific capsule it describes.
The screenshot here shows a MongoDB database on which three import operations were started in succession. After the imports, the collections were deleted, and a new import operation was then run. As we discuss the various metrics, we'll refer to these graphs to put them in context.
In this screenshot, the top graph is the capsule running the primary database server, which is receiving the imported data, and the lower graph is the capsule running the secondary database server, which is replicating it. In general, though, the order of the graphs does not reflect whether a capsule is running a primary or a secondary. Check the deployment topology on the overview page, along with the database configuration, to confirm what roles the capsules are playing.
Values may appear more or less important in the graph
The data for capsules is presented for guidance. Because of the way the Compose platform operates, various behaviors visible in the graphs can be expected behavior and do not necessarily indicate a problem.
The memory limit is the absolute maximum memory that is available to this capsule. By design, we allocate as much memory as possible to any deployment, in excess of its actual provisioned levels. This is a hard limit. Databases may expand to fill that memory.
In the example above, the limit stays at just about 1GB of memory throughout all the operations.
Memory usage tracks the overall memory in use within the capsule. Note that by design this will tend to be as full as possible; applications, cache, and any other memory usage are all included in the memory usage count.
In the example, the memory available rapidly gets consumed under the workload, making maximum use of the RAM while the operations are running.
Over the lifetime of the capsule, there will be times when data is written to swap. This indicator shows the amount of available swap space being used by the process. We set the swappiness value of capsules so that a capsule may swap out memory before its memory limit has been hit; this makes for improved reliability and better resource use. It also means that swap usage with capsules is normal and should only be of concern if the percentage used is very high.
The example shows the swap expanding slightly as various components are safely taken out of RAM. After the operations, swap usage will drop as memory is freed up or as more memory is allocated to the capsule through scaling.
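As a rough illustration of judging swap by the percentage used, the sketch below turns a used/total pair into a percentage and compares it with a cutoff. The function names and the 80% threshold are hypothetical; the article does not say what counts as "very high", so treat the number as a placeholder to tune for your own deployment.

```python
# Hypothetical cutoff for "very high" swap usage; not a Compose-defined value.
HIGH_SWAP_PERCENT = 80.0


def swap_used_percent(swap_used_bytes: int, swap_total_bytes: int) -> float:
    """Return swap usage as a percentage of the available swap space."""
    if swap_total_bytes <= 0:
        # No swap available, so nothing can be in use.
        return 0.0
    return 100.0 * swap_used_bytes / swap_total_bytes


def swap_is_concerning(swap_used_bytes: int,
                       swap_total_bytes: int,
                       threshold: float = HIGH_SWAP_PERCENT) -> bool:
    """True only when the percentage of swap in use reaches the threshold."""
    return swap_used_percent(swap_used_bytes, swap_total_bytes) >= threshold
```

With these assumptions, a capsule using a quarter of its swap is treated as normal, in line with the guidance that some swap usage is expected.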
The failcnt is the number of requests for more memory that were denied due to memory limits being hit. The value shown is the failcnt per second. This rate can rise and fall as a normal part of the operation of a capsule. It can peak during or just before a rescaling operation. A sustained high failcnt may represent an issue. Where a database deployment is scaled based on the amount of disk storage it has, a sustained high failcnt can mean that it needs more disk allocated to it, which will in turn increase RAM available to the capsules.
In the example at the top of the page, you can see a steady low level of failcnt as the application consumes memory and then gently pushes some data to swap. There's then a brief burst of a high failcnt as the second import begins, after which the memory pressure is relieved. Shortly after that, the imports and the delete operations finish and the system stabilizes with the memory being effectively used, a slightly larger swap and no failcnt being recorded. The same process re-occurs when another import is started.
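To make the distinction between a brief burst and a sustained high failcnt concrete, here is a minimal sketch. It assumes you have your own series of cumulative failcnt samples as `(timestamp_seconds, count)` pairs, similar to the cumulative counter Linux exposes for a memory cgroup; how Compose collects its data is not described here, and the function names, threshold, and window length are all hypothetical.

```python
from typing import List, Tuple


def failcnt_rates(samples: List[Tuple[float, int]]) -> List[float]:
    """Convert cumulative (timestamp, count) samples into per-second rates."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        delta = c1 - c0
        if delta < 0:
            # Assumption: a drop means the counter was reset to zero.
            delta = c1
        rates.append(delta / elapsed)
    return rates


def is_sustained(rates: List[float], threshold: float, min_consecutive: int) -> bool:
    """True if the rate stays above threshold for min_consecutive intervals in a row."""
    run = 0
    for rate in rates:
        run = run + 1 if rate > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# A short burst followed by recovery, sampled every 10 seconds.
samples = [(0, 0), (10, 0), (20, 50), (30, 60), (40, 60)]
rates = failcnt_rates(samples)
```

In this sample the rate rises for two intervals and then returns to zero, so whether it counts as sustained depends entirely on the window you choose; a longer window filters out the kind of short peak that can accompany an import or a rescaling operation.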
Updated almost 3 years ago