This is the 7th day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021.
Hello everyone, I’m Zhang Jintao.
Prometheus, which has become almost the de facto standard choice for monitoring in the cloud native era, is the second CNCF project to graduate.
Currently, Prometheus can meet the monitoring requirements of almost any scenario or service. I've written about Prometheus and its ecosystem before, but in this article we'll focus on the Agent mode released in the latest version of Prometheus, and I'll skim over concepts and usage that are not related to this topic.
Pull and Push modes
Prometheus is known as a Pull monitoring system, which is distinct from the traditional push-based monitoring system.
What is Pull mode?
The monitored service exposes its metrics, either by itself or through an exporter, on one or more endpoints, and Prometheus proactively and periodically scrapes them. This is called Pull mode. That is, the monitoring system actively pulls the metrics from the target.
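To make the target side of Pull mode concrete, here is a minimal sketch of a service that exposes counters in the Prometheus text exposition format using only the Python standard library. The counter names, the `COUNTERS` dictionary, the `serve` helper, and port 8000 are hypothetical choices for illustration. A real service would normally use an official client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real service would update these as it works.
COUNTERS = {"demo_requests_total": 3, "demo_errors_total": 0}


def render_metrics(counters):
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name in sorted(counters):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {counters[name]}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current counters on /metrics and 404s everything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, waiting for Prometheus (or curl) to pull /metrics."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `serve()` and adding `127.0.0.1:8000` as a scrape target would let Prometheus pull these values on its own schedule. The service never initiates a connection to the monitoring system.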
The corresponding mode is Push mode.
In Push mode, the application proactively reports its metrics, and the monitoring system processes them as they arrive. If you want to use Push mode to monitor certain applications, for example because it is not easy for them to implement a metrics endpoint, consider using Pushgateway.
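As a rough sketch of what pushing looks like, the snippet below builds the HTTP request a short-lived job could send to a Pushgateway. The gateway address (Pushgateway's default port 9091), the job name `demo_batch`, and the metric name are assumptions for illustration. The request is only constructed here, not sent.

```python
import urllib.request
from urllib.parse import quote


def build_push_request(gateway, job, metrics):
    """Build (but do not send) a request that replaces the metrics pushed for `job`."""
    lines = [f"{name} {value}" for name, value in sorted(metrics.items())]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    return urllib.request.Request(
        url=f"{gateway}/metrics/job/{quote(job, safe='')}",
        data=body,
        method="PUT",  # PUT replaces the whole group for this job
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# Hypothetical values: a batch job reporting when it last succeeded.
req = build_push_request(
    "http://127.0.0.1:9091",
    "demo_batch",
    {"demo_last_success_timestamp_seconds": 1638039808},
)
# To actually push: urllib.request.urlopen(req)
```

Prometheus then scrapes the Pushgateway like any other target, so even with Pushgateway in the picture, Prometheus itself still pulls.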
There’s a debate going on about which is better: Pull versus Push. If you’re interested, do a search.
So far the focus has been on how a single Prometheus instance interacts with application services. In this article, we'll look at how Prometheus currently handles HA, persistence, and clustering from a higher-level, global perspective.
Prometheus HA / persistence / clustering schemes
In large-scale production environments, it is rare for a system to have a single instance of Prometheus. Running multiple instances of Prometheus is common, whether for high availability, for data persistence, or to give users an easier global view.
At present, Prometheus has three main methods for aggregating data from multiple Prometheus instances and providing users with a unified global view.
- Federation: The earliest data aggregation scheme built into Prometheus. In this scheme, a central Prometheus instance scrapes metrics from leaf Prometheus instances. The original timestamps of the metrics are retained, and the overall setup is relatively simple;
- Prometheus Remote Read: Raw metrics can be read from a remote store (note that there are several options for remote stores). Once the data is read, it can be aggregated and presented to the user;
- Prometheus Remote Write: Metrics collected by Prometheus are written to remote storage. Users can read data directly from the remote storage, which provides the global view.
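For the Federation option, the central instance scrapes a leaf's `/federate` endpoint and selects series with one or more `match[]` parameters. The sketch below only builds such a URL. The leaf address and the selector are assumptions for illustration.

```python
from urllib.parse import urlencode


def federate_url(leaf, selectors):
    """Build the URL a central Prometheus would scrape on a leaf instance."""
    # Each selector becomes its own match[] parameter.
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{leaf}/federate?{query}"


# Hypothetical leaf instance and selector.
url = federate_url("http://127.0.0.1:9090", ['{job="prometheus"}'])
```

Remote Read and Remote Write use a different wire protocol and are not covered by this sketch.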
Prometheus Agent mode
Prometheus Agent is a feature introduced in Prometheus v2.32.0 that builds on the Prometheus Remote Write approach described above. A Prometheus instance with Agent mode enabled writes its data to remote storage, and the remote storage provides the global view.
Prerequisites
Since Agent mode uses Prometheus Remote Write, we need to prepare a "remote store" for centralized storage of metrics. Here we use Thanos to provide this capability. Of course, you can also use other solutions, such as Cortex or InfluxDB.
Preparing remote storage
Here we deploy it directly from the container image of the latest Thanos version, using the host network to make testing easier.
After running these commands, Thanos Receive listens at http://127.0.0.1:10908/api/v1/receive to accept remote writes.
➜  cd prometheus
➜  prometheus docker run -d --rm \
    -v $(pwd)/receive-data:/receive/data \
    --net=host \
    --name receive \
    quay.io/thanos/thanos:v0.23.1 \
    receive \
    --tsdb.path "/receive/data" \
    --grpc-address 127.0.0.1:10907 \
    --http-address 127.0.0.1:10909 \
    --label "receive_replica=\"0\"" \
    --label "receive_cluster=\"moelove\"" \
    --remote-write.address 127.0.0.1:10908
59498d43291b705709b3f360d28af81d5a8daba11f5629bb11d6e07532feb8b6
➜  prometheus docker ps -l
CONTAINER ID   IMAGE                           COMMAND                  CREATED          STATUS          PORTS     NAMES
59498d43291b   quay.io/thanos/thanos:v0.23.1   "/bin/thanos receive…"   21 seconds ago   Up 20 seconds             receive
Preparing query Components
Next we start a Thanos Query component that connects to the Receive component to query the written data.
➜  prometheus docker run -d --rm \
    --net=host \
    --name query \
    quay.io/thanos/thanos:v0.23.1 \
    query \
    --http-address "0.0.0.0:39090" \
    --store "127.0.0.1:10907"
10c2b1bf2375837dbda16d09cee43d95787243f6dcbee73f4159a21b12d36019
➜  prometheus docker ps -l
CONTAINER ID   IMAGE                           COMMAND                  CREATED         STATUS         PORTS     NAMES
10c2b1bf2375   quay.io/thanos/thanos:v0.23.1   "/bin/thanos query -…"   4 seconds ago   Up 3 seconds             query
Note: Here we set the --store flag, which points to the gRPC address of the receive component started earlier.
Open http://127.0.0.1:39090/stores in a browser. If all is well, you should see that receive has been registered as a store.
Deploying Prometheus in Agent mode
Here I downloaded the latest version of Prometheus, v2.32.0 (a beta build at the time of writing), as a binary directly from its Release page. When you unpack it, you'll find that the contents of the directory are the same as in the previous version.
This is because Agent mode is now built into the Prometheus binary and can be enabled by adding the --enable-feature=agent option.
Preparing configuration Files
We need to prepare a configuration file for it. Note that remote_write must be configured, and that the file must not contain alerting or similar configuration.
global:
  scrape_interval: 15s
  external_labels:
    cluster: moelove
    replica: 0

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

remote_write:
  - url: 'http://127.0.0.1:10908/api/v1/receive'
Save the configuration file as prometheus.yml.
Starting Prometheus
We set the log level to debug to see some of the details.
➜  ./prometheus --enable-feature=agent --log.level=debug --config.file="prometheus.yml"
ts=2021-11-27T19:03:15.861Z caller=main.go:195 level=info msg="Experimental agent mode enabled."
ts=2021-11-27T19:03:15.861Z caller=main.go:515 level=info msg="Starting Prometheus" version="(version=2.32.0-beta.0, branch=HEAD, revision=c32725ba7873dbaa39c223410043430ffa5a26c0)"
ts=2021-11-27T19:03:15.861Z caller=main.go:520 level=info build_context="(go=go1.17.3, user=root@da630543d231, date=20211116-11:23:14)"
ts=2021-11-27T19:03:15.861Z caller=main.go:521 level=info host_details="(… x86_64 #1 SMP Fri Nov 12 16:48:10 UTC 2021 x86_64 moelove (none))"
ts=2021-11-27T19:03:15.861Z caller=main.go:522 level=info fd_limits="(soft=1024, hard=524288)"
ts=2021-11-27T19:03:15.861Z caller=main.go:523 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2021-11-27T19:03:15.862Z caller=web.go:546 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2021-11-27T19:03:15.862Z caller=main.go:980 level=info msg="Starting WAL storage ..."
ts=2021-11-27T19:03:15.863Z caller=tls_config.go:195 level=info component=web msg="TLS is disabled." http2=false
ts=2021-11-27T19:03:15.864Z caller=db.go:306 level=info msg="replaying WAL, this may take a while" dir=data-agent/wal
ts=2021-11-27T19:03:15.864Z caller=db.go:357 level=info msg="WAL segment loaded" segment=0 maxSegment=0
ts=2021-11-27T19:03:15.864Z caller=main.go:1001 level=info fs_type=9123683e
ts=2021-11-27T19:03:15.864Z caller=main.go:1004 level=info msg="Agent WAL storage started"
ts=2021-11-27T19:03:15.864Z caller=main.go:1005 level=debug msg="Agent WAL storage options" WALSegmentSize=0B WALCompression=true StripeSize=0 TruncateFrequency=0s MinWALTime=0s MaxWALTime=0s
ts=2021-11-27T19:03:15.864Z caller=main.go:1129 level=info msg="Loading configuration file" filename=prometheus.yml
ts=2021-11-27T19:03:15.865Z caller=dedupe.go:112 component=remote level=info remote_name=e6fa2a url=http://127.0.0.1:10908/api/v1/receive msg="Starting WAL watcher" queue=e6fa2a
ts=2021-11-27T19:03:15.865Z caller=dedupe.go:112 component=remote level=info remote_name=e6fa2a url=http://127.0.0.1:10908/api/v1/receive msg="Starting scraped metadata watcher"
ts=2021-11-27T19:03:15.865Z caller=dedupe.go:112 component=remote level=info remote_name=e6fa2a url=http://127.0.0.1:10908/api/v1/receive msg="Replaying WAL" queue=e6fa2a
ts=2021-11-27T19:03:15.865Z caller=dedupe.go:112 component=remote level=debug remote_name=e6fa2a url=http://127.0.0.1:10908/api/v1/receive msg="Tailing WAL" currentSegment=0
ts=2021-11-27T19:03:15.865Z caller=dedupe.go:112 component=remote level=debug remote_name=e6fa2a url=http://127.0.0.1:10908/api/v1/receive msg="Processing segment" currentSegment=0
ts=2021-11-27T19:03:15.877Z caller=manager.go:196 level=debug component="discovery manager scrape" msg="Starting provider" provider=static/0 subs=[prometheus]
ts=2021-11-27T19:03:15.877Z caller=main.go:1166 level=info msg="Completed loading of configuration file" filename=prometheus.yml totalDuration=12.433099ms db_storage=361ns remote_storage=323.413µs web_handler=247ns query_engine=157ns scrape=11.609215ms scrape_sd=248.024µs notify=3.216µs notify_sd=6.338µs rules=914ns
ts=2021-11-27T19:03:15.877Z caller=main.go:897 level=info msg="Server is ready to receive web requests."
ts=2021-11-27T19:03:15.877Z caller=manager.go:214 level=debug component="discovery manager scrape" msg="Discoverer channel closed" provider=static/0
ts=2021-11-27T19:03:28.196Z caller=dedupe.go:112 component=remote level=info remote_name=e6fa2a url=http://127.0.0.1:10908/api/v1/receive msg="Done replaying WAL" duration=12.331255772s
ts=2021-11-27T19:03:30.867Z caller=dedupe.go:112 component=remote level=debug remote_name=e6fa2a url=http://127.0.0.1:10908/api/v1/receive msg="runShard timer ticked, sending buffered data" samples=230 exemplars=0 shard=0
ts=2021-11-27T19:03:35.865Z caller=dedupe.go:112 component=remote level=debug remote_name=e6fa2a url=http://127.0.0.1:10908/api/v1/receive msg=QueueManager.calculateDesiredShards dataInRate=23 dataOutRate=23 dataKeptRatio=1 dataPendingRate=0 dataPending=0 dataOutDuration=0.0003201718 timePerSample=1.3920513043478261e-05 desiredShards=0.0003201718 highestSent=1.638039808e+09 highestRecv=1.638039808e+09
ts=2021-11-27T19:03:35.865Z caller=dedupe.go:112 component=remote level=debug remote_name=e6fa2a url=http://127.0.0.1:10908/api/v1/receive msg=QueueManager.updateShardsLoop lowerBound=0.7 desiredShards=0.0003201718 upperBound=1.3
ts=2021-11-27T19:03:45.866Z caller=dedupe.go:112 component=remote level=debug remote_name=e6fa2a url=http://127.0.0.1:10908/api/v1/receive msg=QueueManager.calculateDesiredShards dataInRate=23.7 dataOutRate=18.4 dataKeptRatio=1 dataPendingRate=5.300000000000001 dataPending=355.5 dataOutDuration=0.00025613744 timePerSample=1.3920513043478263e-05 desiredShards=0.00037940358300000006 highestSent=1.638039808e+09 highestRecv=1.638039823e+09
ts=2021-11-27T19:03:45.866Z caller=dedupe.go:112 component=remote level=debug remote_name=e6fa2a url=http://127.0.0.1:10908/api/v1/receive msg=QueueManager.updateShardsLoop lowerBound=0.7 desiredShards=0.00037940358300000006 upperBound=1.3
ts=2021-11-27T19:03:45.871Z caller=dedupe.go:112 component=remote level=debug remote_name=e6fa2a url=http://127.0.0.1:10908/api/v1/receive msg="runShard timer ticked, sending buffered data" samples=265 exemplars=0 shard=0
You can see from the log that it sends data to http://127.0.0.1:10908/api/v1/receive, that is, to the Thanos Receive we deployed at the start.
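The QueueManager.calculateDesiredShards lines in the debug log can be checked by hand. The sketch below is my reading of how the logged fields relate to each other. The 10-second update interval and the resulting 0.01 catch-up gain are assumptions inferred from the two log lines (their timestamps are 10 seconds apart and the arithmetic matches), not something the log states.

```python
# Assumed: shards are re-evaluated every 10 s, giving a catch-up gain of 0.01.
SHARD_UPDATE_SECONDS = 10
INTEGRAL_GAIN = 0.1 / SHARD_UPDATE_SECONDS


def desired_shards(data_in_rate, data_out_rate, data_kept_ratio,
                   data_out_duration, highest_sent, highest_recv):
    """Recompute desiredShards from the other fields of one log line."""
    time_per_sample = data_out_duration / data_out_rate
    # Samples still waiting: seconds of lag times the rate of kept samples.
    delay = highest_recv - highest_sent
    data_pending = delay * data_in_rate * data_kept_ratio
    backlog_catchup = INTEGRAL_GAIN * data_pending
    return time_per_sample * (data_in_rate * data_kept_ratio + backlog_catchup)


# Values copied from the 19:03:35 and 19:03:45 log lines above.
first = desired_shards(23, 23, 1, 0.0003201718, 1.638039808e9, 1.638039808e9)
second = desired_shards(23.7, 18.4, 1, 0.00025613744, 1.638039808e9, 1.638039823e9)
```

Both results agree with the logged desiredShards values (about 0.00032 and 0.00038). Both are far below the logged lowerBound of 0.7, which is consistent with the single shard=0 seen in the runShard lines for a workload this small.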
Query data
Open the Thanos Query we deployed earlier and enter any metric name to query it and see the expected results.
However, accessing the UI of the Prometheus instance with Agent mode enabled reports an error. This is because Prometheus disables UI queries, alerting, and local storage when Agent mode is enabled.
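Queries against Thanos Query can also be scripted, since it serves the Prometheus-compatible HTTP API. The sketch below builds an instant-query URL and parses a response. The sample payload is hypothetical and only mirrors the shape of a successful instant-query response, with the cluster label taken from the external_labels configured earlier.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base, promql):
    """Build an instant-query URL for a Prometheus-compatible API."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def extract_samples(payload):
    """Turn an instant-query JSON response into (labels, value) pairs."""
    doc = json.loads(payload)
    if doc.get("status") != "success":
        raise ValueError(f"query failed: {doc.get('error', 'unknown error')}")
    return [(r["metric"], float(r["value"][1])) for r in doc["data"]["result"]]


url = instant_query_url("http://127.0.0.1:39090", "up")

# Hypothetical response body, for illustration only.
sample = (
    '{"status":"success","data":{"resultType":"vector","result":['
    '{"metric":{"__name__":"up","cluster":"moelove"},"value":[1638039808,"1"]}]}}'
)
samples = extract_samples(sample)
```

Fetching `url` with `urllib.request.urlopen` and passing the body to `extract_samples` would return live data from the setup above. Pointing the same request at the Agent-mode instance is expected to fail, since queries are disabled there.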
Conclusion
This article was mainly a hands-on walkthrough of Prometheus Agent: Thanos Receive accepts the metrics reported by Prometheus Agent, and the results are then queried through Thanos Query.
Prometheus Agent does not fundamentally change how Prometheus collects metrics; it still uses Pull mode.
It is used primarily for Prometheus HA, data persistence, or clustering. It overlaps architecturally with some existing solutions, but it has some advantages:
- Agent mode is a built-in feature of Prometheus;
- Prometheus instances in Agent mode consume fewer resources and have a reduced feature set, which makes them more convenient to extend to edge scenarios;
- With Agent mode enabled, Prometheus instances can be treated almost as stateless applications, making them easy to scale.
The official version will be released in a few days. Will you try it?
Please feel free to subscribe to my official account [MoeLove]