Many new Loki users have no idea where to start with the distributor, ingester, querier, and the various third-party stores they depend on. The official documentation for cluster deployment is also sketchy, which makes deployment difficult for beginners. Besides the official Helm chart, there is a cluster deployment pattern for production environments tucked away in the `production` directory of the Loki repository.
The community uses docker-compose there to quickly bring up a Loki cluster. Obviously we are not going to run docker-swarm on a single node just to use that docker-compose file in production, but the Loki architecture and configuration files it contains are well worth studying.
So what makes this solution special compared to a plain distributed Loki cluster? First, take a look at the following architecture diagram:
As you can see, there are three obvious differences:
- The Loki core services distributor, ingester, and querier are not separated; they run together in a single instance;
- External KV stores such as Consul and etcd are dropped; memberlist maintains cluster state directly in memory;
- boltdb-shipper is used instead of other log index stores.
As a result, the overall architecture of the Loki cluster is clearer and depends on fewer external systems. To summarize: apart from S3 storage for chunks and indexes, only a caching service is needed to speed up log queries and writes.
Since Loki 2.0, the boltdb index store has been substantially reworked. The new boltdb-shipper mode can store Loki's index on S3 and do away with Cassandra or Google Bigtable entirely, which makes horizontal scaling of the services much easier. For more details about boltdb-shipper, see: grafana.com/docs/loki/l…
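The single-process layout described above corresponds to Loki's monolithic deployment target. A minimal sketch of the relevant setting (assuming an otherwise complete Loki config file):

```yaml
# Run distributor, ingester, querier, etc. together in one process.
# "all" is Loki's default target when none is specified.
target: all
```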
With all that said, let's take a look at what is different about this solution's configuration.
The native part
memberlist
```yaml
memberlist:
  join_members: ["loki-1", "loki-2", "loki-3"]
  dead_node_reclaim_time: 30s
  gossip_to_dead_nodes_time: 15s
  left_ingesters_timeout: 30s
  bind_addr: ['0.0.0.0']
  bind_port: 7946
```
Loki's memberlist uses the Gossip protocol to achieve eventual consistency among all nodes in the cluster. This part of the configuration is almost entirely gossip frequency and timeout tuning, so the defaults are fine.
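As a toy illustration of how gossip converges (a simplification for intuition only, not Loki's actual memberlist implementation), each node periodically pushes what it knows to a random peer, and receivers keep the newest version of each entry, until every node shares the same view:

```python
import random

def gossip_round(states, peers=1):
    """One gossip round: every node pushes its known entries to `peers`
    random other nodes; receivers merge by keeping the newer version."""
    n = len(states)
    for i in range(n):
        for j in random.sample([k for k in range(n) if k != i], peers):
            for key, (ver, val) in states[i].items():
                if key not in states[j] or states[j][key][0] < ver:
                    states[j][key] = (ver, val)

# Three nodes; initially only node 0 knows that ingester loki-1 is ACTIVE.
random.seed(42)
states = [{"loki-1": (1, "ACTIVE")}, {}, {}]
rounds = 0
while any("loki-1" not in s for s in states):
    gossip_round(states)
    rounds += 1
print(f"converged after {rounds} round(s)")
```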
ingester
```yaml
ingester:
  lifecycler:
    join_after: 60s
    observe_period: 5s
    ring:
      replication_factor: 2
      kvstore:
        store: memberlist
    final_sleep: 0s
```
The ingesters' state is synchronized to all members of the cluster via the Gossip protocol, and the ingester replication factor is set to 2. That is, each log stream is written to two ingester instances simultaneously to ensure data redundancy.
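With `replication_factor: 2`, the distributor picks two distinct ingesters from a consistent hash ring for each stream. A toy sketch of that token-ring selection (a simplification for intuition, not Loki's actual ring code):

```python
import hashlib
from bisect import bisect

class Ring:
    """Toy hash ring: each ingester owns several tokens; a stream is
    replicated to the next `rf` distinct ingesters clockwise from its hash."""
    def __init__(self, ingesters, tokens_per_node=16, rf=2):
        self.rf = rf
        self.tokens = sorted(
            (self._hash(f"{name}-{i}"), name)
            for name in ingesters for i in range(tokens_per_node)
        )

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def replicas(self, stream):
        idx = bisect(self.tokens, (self._hash(stream),))
        out, n = [], len(self.tokens)
        while len(out) < self.rf:
            name = self.tokens[idx % n][1]
            if name not in out:
                out.append(name)
            idx += 1
        return out

ring = Ring(["loki-1", "loki-2", "loki-3"])
print(ring.replicas('{app="nginx"}'))  # two distinct ingesters for this stream
```

The same stream always hashes to the same pair of ingesters, which is what makes reads and deduplication predictable.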
extension
The native cluster-mode configuration from the community is still not quite enough: apart from the reasonably complete memberlist section, the rest does not meet production requirements. Below are the simple modifications I made, shared here for reference.
storage
Index and chunk storage are unified into S3 object storage, so that Loki can completely shed its third-party dependencies.
```yaml
schema_config:
  configs:
    - from: 2021-04-25
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    shared_store: aws
    active_index_directory: /loki/index
    cache_location: /loki/boltdb-cache
  aws:
    s3: s3://<S3_ACCESS_KEY>:<S3_SECRET_KEY>@<S3_URL>/<S3_BUCKET>
    s3forcepathstyle: true
    insecure: true
```
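One pitfall with the `s3://<key>:<secret>@<endpoint>/<bucket>` DSN shape used above: if the access key or secret key contains characters such as `/` or `+`, they must be percent-encoded or the URL will not parse correctly. A small helper sketch (the key values here are made up for illustration):

```python
from urllib.parse import quote

def s3_dsn(access_key: str, secret_key: str, endpoint: str, bucket: str) -> str:
    # Credentials embedded in a URL must be percent-encoded; otherwise
    # characters like '/' or '+' in the secret break URL parsing.
    return "s3://{}:{}@{}/{}".format(
        quote(access_key, safe=""), quote(secret_key, safe=""), endpoint, bucket
    )

# Hypothetical credentials containing URL-unsafe characters:
print(s3_dsn("AKIAXXXX", "abc/def+ghi", "minio:9000", "loki-data"))
```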
It's worth noting that the log stream index type used is boltdb-shipper, whose shared_store setting lets the index be written to S3. active_index_directory is the local directory holding the active index files before they are shipped, and cache_location is where Loki caches index files downloaded back from the object store.
In fact, the index path the ingester uploads to S3 is /index/.
redis
The native solution provides no caching, so here we introduce Redis for query and write caching. For those wondering whether to use a single Redis instance or several: it depends on the size of your cluster. For small clusters, one Redis instance is enough.
```yaml
query_range:
  results_cache:
    cache:
      redis:
        endpoint: redis:6379
        expiration: 1h
  cache_results: true

index_queries_cache_config:
  redis:
    endpoint: redis:6379
    expiration: 1h

chunk_store_config:
  chunk_cache_config:
    redis:
      endpoint: redis:6379
      expiration: 1h
  write_dedupe_cache_config:
    redis:
      endpoint: redis:6379
      expiration: 1h
```
ruler
Since Loki is deployed as a cluster, the ruler service naturally has to be clustered as well. Unfortunately this part is missing from the community configuration, so we have to complete it ourselves. We know that the ruler's alerting rules can be stored on S3, and that each ruler instance is assigned its own rules via a consistent hash ring. The configuration can therefore look like this:
```yaml
ruler:
  storage:
    type: s3
    s3:
      s3: s3://<S3_ACCESS_KEY>:<S3_SECRET_KEY>@<S3_URL>/<S3_RULES_BUCKET>
      s3forcepathstyle: true
      insecure: true
      http_config:
        insecure_skip_verify: true
  enable_api: true
  enable_alertmanager_v2: true
  alertmanager_url: "http://<alertmanager>"
  ring:
    kvstore:
      store: memberlist
```
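For reference, a rules file the ruler would load from the rules bucket might look like the following (a hypothetical example: the app label and thresholds are made up, and rules are conventionally stored under a per-tenant path such as /rules/<tenant_id>/ in the bucket):

```yaml
groups:
  - name: example-log-alerts
    rules:
      - alert: HighErrorRate
        # LogQL: per-app rate of error lines over the last 5 minutes
        expr: |
          sum(rate({app="myapp"} |= "error" [5m])) by (app) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate for {{ $labels.app }}"
```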
Support kubernetes
Finally, and most importantly, this official Loki cluster solution needs to be deployable on Kubernetes, otherwise all of the above is moot. Due to space constraints, I have pushed the manifests to GitHub; you can clone them and deploy directly.
GitHub address: github.com/CloudXiaoba…
The manifests depend on only one S3 object store, so before deploying to production make sure you have the object store's access key and secret key ready. After filling them into installation.sh, run the script directly to start the installation.
The ServiceMonitor in the manifests provides Prometheus Operator metrics service discovery for Loki; you can deploy it optionally.
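For reference, a minimal ServiceMonitor for Loki might look like this (a sketch only; the label selector and port name are assumptions and must match how the Loki Service in the manifests is actually defined):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: loki
  labels:
    release: prometheus    # must match your Prometheus Operator's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: loki            # assumed label on the Loki Service
  endpoints:
    - port: http-metrics   # assumed metrics port name; Loki exposes /metrics
      path: /metrics
```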
conclusion
This article introduced an official cluster deployment solution for Loki in production, extended it with configuration such as caching and S3 object storage, and adapted the official docker-compose deployment to Kubernetes. The official solution effectively simplifies the complex structure of a distributed Loki deployment and is well worth studying.
“Cloud Born white”