rtsrvc

Monitoring-service overview

The Ratatosk Monitoring-service is an AURORA service that aggregates relevant metrics from the AURORA system and serves them.

The service exports its metrics in the Prometheus simple text format. It is reachable at the host address and port defined in the AURORA config file, and it only answers GET requests on the path /metrics.

In addition to the metrics it aggregates from other services, the Monitoring-service collects some metrics itself, such as CPU, memory and disk usage, where relevant and available. These are not written to the metrics process folders. They are gathered directly from the system and are only available in the Prometheus export of the Monitoring-service. Examples of these are:

aurora_restsrvc_percent_cpu_utilization 0
aurora_restsrvc_seconds_cpu_utilization 1
aurora_restsrvc_percent_allocated_physical_memory 0.3
aurora_restsrvc_bytes_private_memory_used 58908672
system_percent_storage_use 56

As the examples show, the CPU and memory metrics apply to specific services, while the disk usage metric applies to the system as a whole and is therefore prefixed with the “system_” string.
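To make the naming convention concrete, here is a minimal Python sketch that parses lines like the ones above and separates system-wide metrics from per-service ones using the “system_” prefix. It assumes label-free sample lines (no `{...}` part), as in the examples; the function names are hypothetical and not part of Ratatosk.

```python
def parse_exposition(text: str) -> dict[str, float]:
    """Parse label-free lines of the Prometheus simple text format.

    Assumes every sample line is "<name> <value> [timestamp]" with no
    {label="..."} part. Comment lines (# HELP, # TYPE) and blank lines
    are skipped.
    """
    metrics: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, value, *_timestamp = line.split()
        metrics[name] = float(value)
    return metrics


def split_by_scope(metrics: dict[str, float]) -> tuple[dict[str, float], dict[str, float]]:
    """Separate system-wide metrics ("system_" prefix) from per-service ones."""
    system = {k: v for k, v in metrics.items() if k.startswith("system_")}
    per_service = {k: v for k, v in metrics.items() if not k.startswith("system_")}
    return system, per_service


# The example export lines, used as input.
sample = """\
aurora_restsrvc_percent_cpu_utilization 0
aurora_restsrvc_seconds_cpu_utilization 1
aurora_restsrvc_percent_allocated_physical_memory 0.3
aurora_restsrvc_bytes_private_memory_used 58908672
system_percent_storage_use 56
"""
system, per_service = split_by_scope(parse_exposition(sample))
```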

Folder Structure

The metrics that the Monitoring-service aggregates are read from the monitoring folder location defined in the AURORA config file. All relevant services of the AURORA system export their metrics to this location, each in a subfolder named after its process id. Assuming process id 1234 and the location “/media/ratatosk”, the service exports its metrics to:

/media/ratatosk/1234/
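A minimal sketch of this layout, assuming the location has already been read from the AURORA config file and is passed in as a string. The helpers `process_folder` and `list_process_folders` are hypothetical illustrations, not Ratatosk functions.

```python
from pathlib import Path


def process_folder(location: str, pid: int) -> Path:
    """Return the export folder of one process: <location>/<pid>."""
    return Path(location) / str(pid)


def list_process_folders(location: str) -> list[Path]:
    """List subfolders of the monitoring location whose names are process ids.

    Assumes (hypothetically) that nothing else in the location is a folder
    with a purely numeric name. Results are ordered by numeric process id.
    """
    base = Path(location)
    folders = [p for p in base.iterdir() if p.is_dir() and p.name.isdigit()]
    return sorted(folders, key=lambda p: int(p.name))
```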

In each process folder that exports metrics, one will find metrics unique to that service, but also some shared identifiers. One of these is the “id” tag, which tells the Monitoring-service which service exported the metrics in this process folder. All such folders also have the “start”, “uptime”, “alive” and “stop” metrics in common. For example:

-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33  alive_number_1747816322.16009
-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33  avg-time-per-request_gauge_0.0486917421221733
-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33 'id_string_aurora restsrvc'
-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33  no-of-client-authentication-error_counter_1
-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33  no-of-client-input-error_counter_0
-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33  no-of-errors-total_counter_1
-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33  no-of-forks_counter_19
-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33  no-of-logins-failed_counter_1
-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33  no-of-logins-success_counter_6
-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33  no-of-requests-bad_counter_0
-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33  no-of-requests-total_counter_12
-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33  no-of-rest-calls-errors-total_counter_0
-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33  no-of-rest-calls-success-total_counter_6
-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33  no-of-rest-calls-unknown_counter_0
-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33  no-of-success-total_counter_11
-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33  start_number_1747809185.6006
-rw-r--r-- 1 sys-aurora sys_aurora    0 mai   21 06:33  uptime_number_1747809185.6006

The files themselves are empty: each filename carries all the information needed. It consists of the metric name, followed by the metric type (counter, gauge, number, string etc.) and then the value of the metric, separated by underscores.
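Because only the first two underscores separate the fields, a filename can be split at most twice, which keeps values such as “aurora restsrvc” intact. The Python sketch below assumes metric names and types never contain underscores (true for the listing above) and treats counter, gauge and number as numeric; `Metric`, `parse_metric_filename` and `read_process_folder` are hypothetical names.

```python
from pathlib import Path
from typing import NamedTuple, Union

NUMERIC_TYPES = {"counter", "gauge", "number"}  # assumption: these hold numbers


class Metric(NamedTuple):
    name: str
    kind: str  # counter, gauge, number, string, ...
    raw: str   # value exactly as written in the filename

    def value(self) -> Union[float, str]:
        """Numeric types become floats; anything else stays a string."""
        return float(self.raw) if self.kind in NUMERIC_TYPES else self.raw


def parse_metric_filename(filename: str) -> Metric:
    # Split on the first two underscores only, so the value itself may
    # contain underscores or spaces (e.g. "id_string_aurora restsrvc").
    parts = filename.split("_", 2)
    if len(parts) != 3:
        raise ValueError(f"not a metric filename: {filename!r}")
    return Metric(*parts)


def read_process_folder(folder: Union[str, Path]) -> dict[str, Metric]:
    """Parse every (empty) file in a process folder into a Metric, keyed by name."""
    files = (f for f in Path(folder).iterdir() if f.is_file())
    return {m.name: m for m in (parse_metric_filename(f.name) for f in files)}
```

With it, the service id of a folder is `read_process_folder(folder)["id"].value()`, which returns `"aurora restsrvc"` for the folder in the listing above.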

In the example above, the id of the process folder is “aurora restsrvc”, that is, the AURORA REST server.

When a process folder has the end tag written, its metrics are harvested and the folder is then removed from the metrics folder location. This rests on the assumption that a service which ends its metrics export has either been replaced by a new process, with a new process id, that takes over its role, or has stopped entirely. In both cases the old folder is no longer needed and can be removed.
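The harvest-then-remove step could be sketched as follows. The text calls the final metric both “stop” and “end”, so the tag name is a parameter (defaulting to the hypothetical `"end"`), and `harvest` is a caller-supplied function standing in for however the service actually reads the final values.

```python
import shutil
from pathlib import Path
from typing import Callable


def has_end_tag(folder: Path, end_tag: str = "end") -> bool:
    """True if a file named "<end_tag>_<type>_<value>" exists in the folder.

    The tag name is a parameter because the actual filename is not
    confirmed here.
    """
    return any(f.is_file() and f.name.split("_", 1)[0] == end_tag for f in folder.iterdir())


def harvest_and_remove(folder: Path, harvest: Callable[[Path], None], end_tag: str = "end") -> bool:
    """Harvest a finished process folder, then delete it.

    Returns True if the folder was harvested and removed, False if the
    process has not written its end tag yet (folder left untouched).
    """
    if not has_end_tag(folder, end_tag):
        return False
    harvest(folder)        # read the final metric values first ...
    shutil.rmtree(folder)  # ... then remove the folder from the location
    return True
```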

If a process ends abruptly and no “end” tag is written, the Ratatosk Monitoring-service uses a heuristic based on the alive tag and the process id to decide when the folder can be removed. If multiple folders exist for the same “id” tag (that is, the same service), the Monitoring-service checks the alive tag in each of them and uses the folder with the most recently updated information, until the heuristic has removed the old, no longer relevant process folders.
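The “most recently updated” rule for duplicate ids could look like the sketch below. It assumes the alive value is a Unix timestamp, which the values in the listing above appear to be, and it covers only the selection step; the heuristic that eventually removes stale folders is not shown. The function names are hypothetical.

```python
from pathlib import Path


def _fields(folder: Path) -> dict[str, str]:
    """Map metric name -> raw value for every "<name>_<type>_<value>" file."""
    fields = {}
    for f in folder.iterdir():
        parts = f.name.split("_", 2)
        if f.is_file() and len(parts) == 3:
            fields[parts[0]] = parts[2]
    return fields


def select_current_folders(location: str) -> dict[str, Path]:
    """For each service id, pick the process folder with the newest alive value.

    Assumes the alive value is a Unix timestamp, so a larger value means
    the folder was updated more recently.
    """
    best: dict[str, tuple[float, Path]] = {}
    for folder in Path(location).iterdir():
        if not (folder.is_dir() and folder.name.isdigit()):
            continue
        fields = _fields(folder)
        if "id" not in fields or "alive" not in fields:
            continue  # incomplete folder: nothing to compare yet
        alive = float(fields["alive"])
        current = best.get(fields["id"])
        if current is None or alive > current[0]:
            best[fields["id"]] = (alive, folder)
    return {service: folder for service, (_, folder) in best.items()}
```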

In most cases one does not need to deal with this folder hierarchy of the Monitoring-service directly, but a general understanding of it is useful.

Export endpoint

The Ratatosk Monitoring-service endpoint is reachable at the host name and port number set in the AURORA config file. Assuming the host name “aurora.mydomain.org” and the default Prometheus port 9090, it is available at:

http://aurora.mydomain.org:9090/metrics
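A client can fetch the export with the Python standard library alone. This is a minimal sketch; the host and port would come from the AURORA config file, and the function names are illustrative only.

```python
import urllib.request


def metrics_url(host: str, port: int) -> str:
    """Build the export URL; the service speaks plain HTTP only."""
    return f"http://{host}:{port}/metrics"


def fetch_metrics(host: str, port: int, timeout: float = 10.0) -> str:
    """Send GET /metrics and return the Prometheus text body as a string."""
    with urllib.request.urlopen(metrics_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For the example above, `fetch_metrics("aurora.mydomain.org", 9090)` requests `http://aurora.mydomain.org:9090/metrics`.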

It only accepts unencrypted HTTP (not HTTPS), only GET requests and only the path “/metrics”. A valid request returns the metrics of the system in the Prometheus simple text format as the body of the HTTP response.
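For local testing of a client, the request rules above can be mimicked with a small stand-in server. This is a hypothetical sketch, not the Ratatosk implementation: the 404 status for other paths, the 501 status that `http.server` sends for methods other than GET, the content type and the static payload are all assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical static payload standing in for the aggregated metrics.
EXAMPLE_METRICS = (
    "aurora_restsrvc_percent_cpu_utilization 0\n"
    "system_percent_storage_use 56\n"
)


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics only; every other path gets 404.

    Only do_GET is defined, so http.server itself rejects other methods
    (POST, PUT, HEAD, ...) with 501 Not Implemented.
    """

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404, "only /metrics is served")
            return
        body = EXAMPLE_METRICS.encode("utf-8")
        self.send_response(200)
        # Content type conventionally used for the Prometheus text format.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the sketch quiet


def serve(host: str = "127.0.0.1", port: int = 9090) -> None:
    """Serve plain HTTP (no TLS) until interrupted."""
    HTTPServer((host, port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the stand-in on port 9090; note that the path check is an exact match, so a query string such as `/metrics?x=1` also gets 404 in this sketch.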


For further questions, contact hjelp.ntnu.no