The Deadliest Duo of Prometheus and Grafana

Observability with two popular open-source tools

Sigrid Jin
5 min read · Feb 18, 2023
The deadliest duo in Premier League history! Heung-Min Son & Harry Kane combined to break the all-time Premier League record for goal combinations, and the same kind of partnership exists in the cloud monitoring scene: Prometheus and Grafana. https://twitter.com/spursofficial/status/1497577949247295490?lang=ko

Prometheus and Grafana are widely used open-source tools that aid in monitoring and observability. Prometheus functions as a time-series database, collecting metrics and storing them in its TSDB format. It was initially created at SoundCloud in 2012 and has since become one of the most widely adopted monitoring tools in the tech industry, thanks to its scalability and efficiency: it can gather and store millions of metrics from a wide range of sources. Prometheus's pull-based approach lets it actively scrape its configured targets for the data DevOps teams need, which allows it to adapt flexibly to different server environments.
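
As a minimal sketch of this pull model, a scrape configuration like the one below tells Prometheus which endpoints to poll and how often; the file name, interval, and target address here are illustrative, not taken from any particular setup.

# prometheus.yml -- minimal pull-based scrape configuration (illustrative values)
global:
  scrape_interval: 15s                    # how often Prometheus polls each target

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["node-exporter:9100"]   # hypothetical target exposing /metrics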

Prometheus

Prometheus utilizes an internal TSDB for storing data. Although the TSDB was originally a separate GitHub repository, it has since been merged into the Prometheus project. The TSDB module defines how Prometheus manages storage; to understand how it works, one can examine the db.go file in the tsdb directory.

To better understand Prometheus storage, it is worth looking at how the data is laid out on disk.

// Reference: https://blog.naver.com/PostView.nhn?blogId=alice_k106&logNo=221829384846

root@sigridjin:/home/ubuntu/prometheus$ tree
.
...
├── 01E24YV8YFHGSES1WJ92FYMG6K        -> chunk block (chunks & metadata)
│   ├── chunks
│   │   └── 000001                    -> block chunk file
│   ├── index                         -> label index & inverted index file
│   ├── meta.json                     -> block metadata
│   └── tombstones                    -> deletion markers
├── 01E24YX3HF1GTZ4CPF24XZ3MKR
│   ├── chunks
│   │   └── 000001
│   ├── index
│   ├── meta.json
│   └── tombstones
├── lock                              -> lock file
├── queries.active                    -> queries currently in progress
└── wal                               -> WAL (write-ahead logging) files
    ├── 00006966                      -> WAL segment file
    ├── 00006967
    ├── 00006968
    ├── 00006969
    ├── 00006970
    └── checkpoint.006965             -> restore checkpoint files
        └── 00000000

16 directories, 36 files

Prometheus stores data in two main ways (a quick way to inspect both is sketched after this list):

  1. Local file system: chunk block files
  2. In memory: an in-memory buffer backed by WAL (write-ahead logging) files
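
As a quick way to look at both layers on a running server, recent Prometheus releases ship promtool with tsdb subcommands; the data directory path below is illustrative, and the exact subcommands and flags may differ between versions.

# Summarize the blocks and head of an existing data directory (path is illustrative)
promtool tsdb analyze /home/ubuntu/prometheus

# List the blocks (ULID, time range, samples) stored on disk
promtool tsdb list /home/ubuntu/prometheus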

Unlike databases such as InfluxDB, Prometheus does not write each record to storage immediately. Instead, incoming data is kept in an in-memory buffer until a new record would exceed the current 32KB memory page, at which point the page is flushed to the WAL file. The data thus lives primarily in memory and is backed up periodically to the WAL. This storage area is commonly referred to as the head block.

Each WAL file where the data is backed up can grow to a maximum of 128MB, and a new WAL file is created once this size is exceeded. The purpose of the WAL is to prevent the loss of in-memory data in case of an abnormal termination or crash. When that happens, the checkpoint.XXXXX file in the WAL directory serves as the reference point for reading the WAL files and recovering the original data via a replay operation.

It is important to note that network storage can corrupt the checkpoint or WAL files, which can lead to significant problems: Prometheus may repeatedly fail to restart, and new chunk blocks may no longer be created correctly. In such cases, the only option may be to delete the data and start over. Using NVMe or other local storage is therefore a wise choice for stability.

Prometheus stores data in memory and in the WAL, periodically flushing it to chunk blocks and deleting the oldest WAL files. Each chunk block is a directory named with a ULID such as "01E24Y." The directory stores an inverted index file along with the records for a specific time window. When Prometheus is queried, the indexes of multiple chunk blocks are combined to produce the results for the user. Because each index file and block contains labels and time ranges, a full scan is unnecessary, and the index and block data are memory-mapped as required.
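
For illustration, a block's meta.json looks roughly like the following (the timestamps and counts here are made up): minTime and maxTime bound the block's time window, and the compaction section records which source blocks were merged into it.

{
  "ulid": "01E24YV8YFHGSES1WJ92FYMG6K",
  "minTime": 1583884800000,
  "maxTime": 1583892000000,
  "stats": {
    "numSamples": 4234567,
    "numSeries": 12345,
    "numChunks": 98765
  },
  "compaction": {
    "level": 1,
    "sources": ["01E24YV8YFHGSES1WJ92FYMG6K"]
  },
  "version": 1
}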

Interestingly, the blocks do not simply keep increasing in number over time. Depending on the configuration options set for Prometheus, blocks are merged or deleted according to retention policies. Two options govern this: --storage.tsdb.min-block-duration, the time window for data stored in a single block, and --storage.tsdb.max-block-duration, the maximum time window a single block can cover. The default for the latter is 10% of the retention time set by --storage.tsdb.retention.time.
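
A hedged example of how these options might be passed when starting the server (the path and durations are illustrative, and the min/max block duration flags are advanced options whose availability can vary by version):

prometheus \
  --storage.tsdb.path=/home/ubuntu/prometheus \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.min-block-duration=2h \
  --storage.tsdb.max-block-duration=36h   # roughly 10% of the 15d retention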

When provisioning Prometheus, deciding how much memory and storage capacity to allocate is the most critical decision. It is initially difficult to estimate the exact capacity required, since it depends on the size of each data sample, the number of scrapes, and label cardinality. The task is made harder by the fact that, while the Prometheus configuration controls targets directly, Kubernetes also allows scraping to be enabled through annotations, making it difficult to see the number of scrape targets at a glance.
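
The annotation-based scraping mentioned above is a common convention implemented through kubernetes_sd_configs relabeling rather than a built-in Prometheus feature; a pod typically opts in with annotations like these (the port and path values are illustrative):

metadata:
  annotations:
    prometheus.io/scrape: "true"    # opt this pod in to scraping
    prometheus.io/port: "8080"      # port exposing the metrics endpoint
    prometheus.io/path: "/metrics"  # path to scrape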

However, once an estimate is made, Prometheus' memory requirements can be roughly calculated using Robust Perception's guidance, which accounts for the head block data residing in memory in addition to chunk block data read through mmap. It is therefore generally recommended to allocate a generous amount of memory. If memory usage leads to OOM errors, sharding scrape targets with hashmod relabeling can be considered.
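
A sketch of hashmod-based sharding, assuming two Prometheus instances that each keep only the targets whose address hashes to their own shard number (here, shard 0); the job name and modulus are illustrative:

scrape_configs:
  - job_name: "sharded-targets"
    relabel_configs:
      - source_labels: [__address__]
        modulus: 2                  # total number of Prometheus shards
        target_label: __tmp_hash
        action: hashmod
      - source_labels: [__tmp_hash]
        regex: "0"                  # this instance keeps only shard 0
        action: keep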

Required Capacity ≈ Retention Time × Ingested Samples per Second × Bytes per Sample (typically about 1–2 bytes), per the Prometheus official documentation.
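
As a rough worked example under those assumptions: with a 15-day retention (about 1,296,000 seconds), 100,000 ingested samples per second, and roughly 2 bytes per sample, the estimate comes to about 1,296,000 × 100,000 × 2 ≈ 259 GB of disk, before any headroom for the WAL, compaction, or snapshots.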

Grafana

Grafana, on the other hand, is a visualization tool that lets users build dashboards and charts from the data stored in Prometheus, although it is not limited to a single data source. Grafana is frequently used alongside Prometheus, its major feature being informative dashboards that display real-time metrics data as graphs, charts, and tables.

Used together, the two tools provide a comprehensive, end-to-end solution for monitoring and analyzing real-time data in a microservices environment. Prometheus can collect and store vast amounts of data, while Grafana makes it easy to visualize and analyze the data Prometheus provides. This makes it possible to quickly identify trends and anomalies and to act on issues. Prometheus also exposes its own /metrics endpoint, which can be used to collect metrics indicating that Prometheus itself is functioning properly. This endpoint provides access to metrics such as block creation and truncation, which are valuable to monitor. Once understood, these metrics can be visualized in Grafana for further analysis and observation.
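
For example, the self-monitoring endpoint can be inspected directly, and a few TSDB metrics (names may vary slightly between Prometheus versions) make good starting points for a Grafana panel:

# Inspect Prometheus' own metrics endpoint
curl -s http://localhost:9090/metrics | grep prometheus_tsdb

# Example PromQL for a Grafana panel: compactions and WAL truncations per hour
rate(prometheus_tsdb_compactions_total[1h])
rate(prometheus_tsdb_wal_truncations_total[1h])

# Series currently held in the head block (the in-memory data)
prometheus_tsdb_head_series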

In addition to these primary capabilities, both tools offer many other features, such as built-in alerting that lets users define rules which trigger notifications when particular conditions are met. Grafana also has a rich ecosystem of open-source plugins and themes contributed by fellow DevOps engineers, which broaden its functionality and integrate it with other tools and services.
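
A minimal sketch of a Prometheus alerting rule that watches for the WAL corruption scenario discussed earlier, assuming the standard prometheus_tsdb_wal_corruptions_total counter; the group name, threshold, and labels are illustrative:

groups:
  - name: prometheus-self-monitoring
    rules:
      - alert: PrometheusWALCorruptions
        expr: increase(prometheus_tsdb_wal_corruptions_total[1h]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus detected WAL corruptions on {{ $labels.instance }}"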
