Load balancing comes up frequently when working with the Elastic Stack. If we don’t take it into account, the failure of a single link can become a single point of failure that stops data collection entirely. Load balancing also lets us make better use of existing resources when multiple instances are deployed. In this article, we will cover how to use load balancing when collecting or ingesting data.
Typical Elastic Stack architecture diagram
Let’s start by looking at a typical Elastic Stack diagram:
Above, we can see that Beats connects directly to Logstash, which processes the data for us and eventually imports it into Elasticsearch.
In the absence of load balancing, it looks like this:
We can usually configure Beats like this:
output.logstash:
  hosts: ["mylogstash"]
Once this TCP connection is established, it is very reliable as long as nothing goes wrong. The trouble starts when our Logstash fails: this is the single point of failure mentioned above. If too many Beats are connected to a single Logstash, all data collection will be affected.
So how can we avoid this situation?
The solution is to add one more Logstash server and change the configuration to:
On top of the previous setup, we added an extra Logstash server. Now, if one of the Logstash servers dies, data collection can continue through the other one. So how do we configure this in Beats?
Method 1:
Here’s what we do in the Beats configuration file:
output.logstash:
  hosts: ["Logstash1", "Logstash2"]
With this configuration, Beats picks one of the Logstash hosts at random and sends its data to it. If that host fails, Beats switches to the other one. Note that this approach does not use load balancing.
Method 2:
In this case, we will use a load-balancing configuration. See the article at www.elastic.co/guide/en/be…
output.logstash:
  hosts: ["logstash1", "logstash2"]
  loadbalance: true
Currently for Filebeat, the load balancing option is available for Redis, Logstash and Elasticsearch outputs. The Kafka output handles load balancing internally.
In this case, Beats distributes the data across the configured Logstash hosts according to load. If one of the connections breaks, Beats removes that host from its pool and does not use it again until it can reconnect, backing off exponentially between retry attempts.
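If needed, the retry behavior can be tuned further; here is a sketch using the worker and backoff options of the Logstash output (the option names follow the Beats reference for recent 7.x versions, so verify them against the version you run):

output.logstash:
  hosts: ["logstash1", "logstash2"]
  loadbalance: true
  worker: 2            # number of workers per configured host
  backoff.init: 1s     # initial wait before retrying a broken connection
  backoff.max: 60s     # upper bound for the exponential backoff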
There is a big problem with the approach above. Whenever we add a new Logstash, we have to modify our configuration file so that Beats knows it is there. Likewise, if we remove one of the Logstash servers, we also need to modify the Beats configuration file. If we are only maintaining one or two Beats, this might not be a problem, since it is not a lot of work.
However, if we have a lot of Beats, the workload becomes very large. What can we do about that?
Use load balancing to import data
As the number of Beats grows, one possible solution is to use a dedicated load balancer:
As shown above, each Beat sends its data to a dedicated load balancer, which then forwards it to Logstash. After this change, our Beats output becomes very simple:
output.logstash:
  hosts: ["loadbalancer"]
Now, every time we add a new Beat or a new Logstash, we no longer need to touch the Beats configuration. All of the routing is handled at the load balancer.
Hands-on practice
In our practice, I use the following configuration:
Load balancing diagram:
Above, the data Beats collects is sent to Nginx, then to Logstash, then to Elasticsearch, and finally to Kibana.
Installation
Elasticsearch
If you haven’t already installed your own Elasticsearch, please refer to my previous article “How to Install Elasticsearch on Linux, MacOS, and Windows”. To make Elasticsearch reachable from the Logstash instance running on Ubuntu OS, we make the following changes to the config/elasticsearch.yml file:
network.host: 0.0.0.0
discovery.type: single-node
With this, Elasticsearch on Mac OS binds to every network interface. We can check the output at http://localhost:9200/ and at http://192.168.0.3:9200/ respectively.
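To verify this quickly from the command line, we can query both addresses (the same ones shown above):

curl http://localhost:9200/
curl http://192.168.0.3:9200/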
Kibana
If you haven’t already installed your own Kibana, see my previous article “How to Install Kibana in an Elastic Stack on Linux, MacOS, and Windows” to install it. No changes are needed. After the installation is complete, enter http://localhost:5601/ in the address bar of the browser.
Nginx
Nginx is available in Ubuntu’s default repository, so installation is very simple.
Since this is our first interaction with the APT packaging system in this session, we will update the local package index so that we can access the latest package list. After that, we can install nginx:
sudo apt-get update
sudo apt-get install nginx
Once nginx has been successfully installed, we can check whether the nginx service has been successfully started by using the following command:
sudo service nginx status
$ sudo service nginx status
● nginx.service - nginx - High performance Web server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enable>
     Active: active (running) since Wed 2020-06-17 16:44:00 CST; 5h 5min ago
       Docs: http://nginx.org/en/docs/
    Process: 1761 ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf (code=exited, sta>
   Main PID: 1781 (nginx)
      Tasks: 2 (limit: 18985)
     Memory: 3.7M
     CGroup: /system.slice/nginx.service
             ├─1781 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
             └─082 nginx: worker process
It shows that nginx has been successfully installed and is running.
To configure nginx as a load balancer, edit /etc/nginx/nginx.conf as follows:
/etc/nginx/nginx.conf
user nginx;
worker_processes 1;

error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

stream {
    upstream stream_backend {
        server 192.168.0.4:5044;
    }
    server {
        listen 12345;
        proxy_pass stream_backend;
    }
}
Here, 192.168.0.4 is the address of the Ubuntu OS machine. Nginx listens on port 12345 and forwards the traffic to 192.168.0.4:5044. After updating nginx.conf, let’s restart the nginx service:
sudo service nginx restart
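Optionally, we can validate the configuration and confirm that Nginx is now listening on port 12345 (a quick check using standard tools; adjust to whatever is available on your system):

sudo nginx -t                 # validate the nginx configuration syntax
sudo ss -tlnp | grep 12345    # confirm nginx is listening on the load-balancing port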
Install Logstash2
To install Logstash, please refer to my previous article “How to Install a Logstash in an Elastic Stack”. Following the architecture above, we install Logstash2 on the Ubuntu OS machine. In our case, we can simply download the archive locally and extract it.
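If you don’t have the archive yet, it can be downloaded directly from Elastic; the URL below follows the standard Elastic artifacts naming for version 7.7.1, so verify it against the version you actually want:

wget https://artifacts.elastic.co/downloads/logstash/logstash-7.7.1.tar.gz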
tar xzf logstash-7.7.1.tar.gz
cd logstash-7.7.1/
Next, we create the following logstash.conf configuration file:
logstash.conf
input {
  beats {
    port => 5044
  }
}
output {
  stdout {
    codec => dots
  }
}
Above, Logstash listens on port 5044. Whenever data arrives, it simply prints a dot (.) for each event.
We start Logstash as follows:
./bin/logstash -f logstash.conf
In the following exercise, we will also install Logstash1.
Metricbeat
We can open Kibana:
Click Add metric data:
Select System metrics
Then follow the installation instructions for your platform. We need to modify metricbeat.yml.
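As a minimal sketch, the relevant part of metricbeat.yml looks like this (the Elasticsearch lines are the commented-out defaults, and the host is the Nginx address used in this setup):

# Disable the default Elasticsearch output...
#output.elasticsearch:
#  hosts: ["localhost:9200"]

# ...and send the data to the Nginx load balancer instead
output.logstash:
  hosts: ["192.168.0.4:12345"]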
We launch MetricBeat:
./metricbeat -e
As shown above, the connection between MetricBeat and Nginx was successful.
Let’s go back to the Logstash console:
We see a lot of dots appearing. This shows that data is flowing successfully from MetricBeat through Nginx to Logstash.
Install Logstash1
The installation of Logstash1 on Mac OS is the same as the installation of Logstash2 on Ubuntu OS. We also create the following configuration file:
logstash.conf
input {
  beats {
    port => 5044
  }
}
output {
  stdout {
    codec => dots
  }
}
Let’s run this Logstash:
./bin/logstash -f logstash.conf
We’ve got Logstash1 up and running, but we haven’t told Nginx to forward traffic to this Logstash yet. Let’s reopen the nginx.conf file and add the Mac OS IP address:
/etc/nginx/nginx.conf
user nginx;
worker_processes 1;

error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

stream {
    upstream stream_backend {
        server 192.168.0.4:5044;
        server 192.168.0.3:5044;
    }
    server {
        listen 12345;
        proxy_pass stream_backend;
    }
}
Note the line added above:
server 192.168.0.3:5044;
That is, traffic arriving on port 12345 is now load-balanced across the Logstash instances at 192.168.0.3 and 192.168.0.4.
After this change, we restart nginx:
sudo service nginx restart
At this point, if we go back to the Logstash1 console on Mac OS, we see that there is in fact no output. Is there something wrong with our configuration? The answer is simple: the connection between MetricBeat and Nginx is a long-lived TCP connection. Once established, it is not torn down, and Nginx does not re-balance an already established connection. We need to adjust the MetricBeat configuration. Let’s repeat the exercise as follows:
1) Remove the server 192.168.0.3:5044; line from /etc/nginx/nginx.conf (back to the original single-server configuration), then restart nginx.
2) Stop MetricBeat, edit metricbeat.yml, and add a ttl setting to the output.logstash configuration (ttl only takes effect when pipelining is set to 0):
output.logstash:
  # The Logstash hosts
  hosts: ["192.168.0.4:12345"]
  ttl: "30s"
  pipelining: 0
Restart MetricBeat:
./metricbeat -e
3) This should now behave the same as before: only Logstash2 shows output, and Logstash1 shows none.
4) Modify the /etc/nginx/nginx.conf file and add server 192.168.0.3:5044.
stream {
    upstream stream_backend {
        server 192.168.0.4:5044;
        server 192.168.0.3:5044;
    }
    server {
        listen 12345;
        proxy_pass stream_backend;
    }
}
After the modification, restart the nginx service:
sudo service nginx restart
5) Let’s revisit Logstash1’s console:
At this point, we can see dots starting to appear. This shows that our Nginx load balancing is working. And with the MetricBeat configuration above, every time we add a new Logstash, we don’t need to do any extra configuration on the Beats side; load balancing takes effect automatically.
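For example, if a third Logstash instance were added later (the 192.168.0.5 address below is purely hypothetical), only the Nginx upstream block would need to change:

stream {
    upstream stream_backend {
        server 192.168.0.4:5044;
        server 192.168.0.3:5044;
        server 192.168.0.5:5044;   # hypothetical new Logstash instance
    }
    server {
        listen 12345;
        proxy_pass stream_backend;
    }
}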
In this exercise, I did not add Elasticsearch to the Logstash output. I’ll leave that to you as an exercise.
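As a starting point, here is a minimal sketch of what that output section might look like, assuming the Elasticsearch instance from this setup at 192.168.0.3:9200 and an index name of your own choosing:

output {
  elasticsearch {
    hosts => ["http://192.168.0.3:9200"]    # Elasticsearch address used in this setup
    index => "metricbeat-%{+YYYY.MM.dd}"    # example index name; adjust as needed
  }
  stdout {
    codec => dots
  }
}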