Kubernetes logging at scale — PLG Stack
The 6th PKOS Kubernetes Study Excerpt
In modern microservice architectures, monitoring and analyzing application logs has become essential for smooth operation and rapid troubleshooting. For simple cases, it is often sufficient to view log output in real time with a tool such as the kubectl logs --tail command.
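For instance, tailing logs straight from a workload looks like this (the deployment and pod names here are placeholders):

```shell
# Stream the last 100 lines from a pod of the deployment,
# following new output as it arrives.
kubectl logs deploy/my-app --tail=100 --follow

# The same, scoped to one container of a multi-container pod.
kubectl logs my-app-7c9d6f4b5-x2k8q -c sidecar --tail=50
```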
However, in a production environment, managing and analyzing the large volume of log messages generated by multiple containerized applications running across many pods is challenging. In cloud-native environments, where microservices are standard, millions of logs from different sources must be collected, understood, and investigated to gain a comprehensive picture of an application's runtime behaviour.
Loki is a lightweight log aggregator that is particularly suited to collecting and storing application logs. It integrates easily with Grafana, a popular visualization tool, to provide a comprehensive solution for monitoring and alerting. Unlike Prometheus, which focuses primarily on performance metrics such as request counts, Loki is designed specifically for capturing logs.
In addition, Loki supports multiple deployment modes that allow for flexibility in managing the system based on specific requirements.
The default deployment mode for Loki is monolithic mode, in which all of Loki's components run together in a single process or Docker container. For high scalability and efficient performance, Loki also supports a microservices deployment mode. In this mode, the Loki components are distributed across multiple systems, providing better resource utilization and more flexibility in scaling the system.
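As a sketch, the mode is chosen with Loki's -target flag (verify the flag and target names against your Loki version):

```shell
# Monolithic: run every component in one process (the default).
loki -config.file=/etc/loki/config.yaml -target=all

# Microservices: run each component as its own process or deployment.
loki -config.file=/etc/loki/config.yaml -target=distributor
loki -config.file=/etc/loki/config.yaml -target=ingester
loki -config.file=/etc/loki/config.yaml -target=querier
```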
Loki is designed with a distributed architecture that consists of five different components, each with a specific role in the log management process.
The first component is the distributor, a stateless service that receives log data and forwards it to the ingesters. The distributor pre-processes the data, checks its validity, and ensures that it originates from a configured tenant, which helps the system scale and protects it from potential denial-of-service attacks.
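The distributor accepts log entries over Loki's HTTP push API (the /loki/api/v1/push endpoint). As a minimal Python sketch of the JSON body that endpoint expects — each stream pairs a label set with nanosecond-timestamped lines (the label values here are illustrative):

```python
import time

def loki_push_payload(labels: dict, lines: list[str]) -> dict:
    """Build a request body for Loki's /loki/api/v1/push endpoint.

    Each stream is a label map plus a list of
    [nanosecond-timestamp-as-string, log line] pairs.
    """
    ts = str(time.time_ns())
    return {
        "streams": [
            {"stream": labels, "values": [[ts, line] for line in lines]}
        ]
    }

# Example: two log lines labelled as coming from a hypothetical app.
payload = loki_push_payload({"app": "demo"}, ["hello", "world"])
```

Agents like Promtail construct exactly this kind of payload on our behalf, so in practice we rarely call the API directly.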
To push application logs to Loki, we can use Promtail, the recommended log collection agent for Loki, designed specifically to gather log records from various sources and forward them to the distributor.
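A minimal Promtail configuration might look like the following (the Loki URL and label values are illustrative):

```yaml
server:
  http_listen_port: 9080

# Promtail records how far it has read in each file here.
positions:
  filename: /tmp/positions.yaml

# Where to push the collected log streams.
clients:
  - url: http://loki:3100/loki/api/v1/push

# What to scrape and which labels to attach.
scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs
          __path__: /var/log/*log
```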
The second component is the ingester, the central component of the Loki architecture. It writes log data received from distributors to long-term storage, typically a cloud object store such as Amazon S3. It also collaborates with queriers to return in-memory data in response to read requests.
The querier component interprets LogQL query requests and fetches the data from the ingesters (for recent, in-memory data) and from long-term storage. It can handle distributed queries across multiple ingesters and provides a unified view of log data to the user. In addition, the ruler performs Loki's alerting and recording features: it continually evaluates a set of queries and takes a defined action based on the results, such as sending an alert.
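The ruler evaluates Prometheus-style rule files whose expressions are written in LogQL. A hypothetical alerting rule might look like this (label values and thresholds are illustrative):

```yaml
groups:
  - name: app-logs
    rules:
      - alert: HighErrorLogRate
        # Fire when the app logs more than 10 "error" lines/sec,
        # averaged over 5 minutes, for at least 10 minutes.
        expr: sum(rate({app="my-app"} |= "error" [5m])) > 10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Elevated error log rate for my-app
```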
The optional query frontend component provides API endpoints that accelerate read processing. It optimizes reads by queuing requests, splitting large requests into multiple smaller ones, and caching data.
One key differentiating feature of Loki is that it does not perform full-text indexing on log data. Instead, it leverages the concept of labels, which it borrows from Prometheus, to extract and tag information from the log data. Loki then indexes only the labels themselves, which results in significant performance improvements on both the write and read paths.
By using labels, Loki can provide a consistent label taxonomy regardless of the input source, which can be invaluable in cloud-native environments where log data may be generated from multiple sources. With labels, users can more easily filter and search for relevant log data and gain valuable insights into the performance and behaviour of their applications.
Another key feature of Loki is that it stores log data in a cloud storage service in two parts: chunks, which contain the raw log data, and indexes, which contain the normalized labels and metadata extracted from log records. With this approach, queriers can use the compact indexes to locate the requested chunks efficiently, which improves query processing.
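In configuration terms, this split shows up as separate chunk and index stores. A sketch assuming S3 for chunks and the boltdb-shipper index type — storage options vary significantly between Loki versions, so treat this only as an illustration:

```yaml
storage_config:
  aws:
    # Chunk storage; region and bucket name are placeholders.
    s3: s3://us-east-1/my-loki-chunks
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
    # Ship index files to the same object store as the chunks.
    shared_store: s3
```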
To query log data in Loki, the system implements a log query language called LogQL, which is heavily inspired by Prometheus’ PromQL language. LogQL can be used directly or via a Grafana front-end dashboard, providing users with a consistent query language for both logs and metrics.
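For example (the stream selector and label values are illustrative):

```logql
# All log lines from the "my-app" streams that contain "error".
{app="my-app"} |= "error"

# Per-second rate of error lines over the last five minutes,
# a metric query in the PromQL style.
sum(rate({app="my-app"} |= "error" [5m]))
```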
The combination of Promtail, Loki, and Grafana is often referred to as the PLG stack — Promtail for log stream acquisition, Loki for aggregation, storage and querying, and Grafana for visualization.
Promtail operates in a manner similar to Prometheus' own scraper, and its scrape configuration uses the same syntax as Prometheus'. In addition, it can extract logs from Docker container log files and send them to Loki, where they can be further queried and visualized in Grafana. When we start the services using docker-compose, Promtail automatically starts collecting logs and sending them to Loki. In a Kubernetes cluster, Promtail is designed to tail the node and pod log files and forward them to the core Loki system.
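A minimal docker-compose sketch of the PLG stack (image tags, mounted paths, and the Promtail config filename are assumptions to adapt):

```yaml
version: "3"
services:
  loki:
    image: grafana/loki:2.9.0
    ports:
      - "3100:3100"
  promtail:
    image: grafana/promtail:2.9.0
    volumes:
      - /var/log:/var/log                             # host logs to tail
      - ./promtail-config.yaml:/etc/promtail/config.yaml
    command: -config.file=/etc/promtail/config.yaml
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
```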
To view the logs in Grafana, select Loki as the data source in the Explore panel and use the log browser to filter the logs by label and isolate the relevant information. This setup allows for seamless and efficient log monitoring and analysis, which is essential in microservice architectures.
By leveraging LogQL and the efficient indexing of log data, Loki offers a powerful and flexible log management solution that enables users to gain deeper insights into their applications’ performance and behaviour. This can be particularly valuable in cloud environments, where complex and distributed applications generate vast amounts of log data, and efficient log management and analysis are critical for maintaining system health and performance.