In practice, we sometimes want to import RSS data and search it directly. In a world of microservices, a lot of data is published as RSS feeds, for example by common review sites. Is there a way to import this data into Elasticsearch and search it? One answer is the RSS input plugin for Logstash. In this article, I will walk through an example.
Let’s start by finding an RSS feed. Elastic’s official website has a blog page at elastic.co/blog. Open this page and click the RSS link on it; the feed itself is served at https://www.elastic.co/blog/feed.
The RSS response contains a title, a description, and so on for each entry. These can be extracted using the RSS input plugin.
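To make the structure of a feed concrete, here is a minimal Python sketch that pulls the title, link, and description out of an RSS 2.0 document. The sample feed is hand-written for illustration (it is not the real Elastic feed), and parse_items is a hypothetical helper; the Logstash plugin does its own parsing, so this only shows the kind of fields it has to work with.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written RSS 2.0 feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/blog/first-post</link>
      <description>&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return one dict per <item>, holding its title, link and description."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # The description is usually HTML, escaped inside the XML.
            "description": item.findtext("description"),
        })
    return items


items = parse_items(SAMPLE_FEED)
```

Note that the description comes back as HTML markup, which is why the pipeline later in this article strips tags before searching.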
Installation
Since the RSS input plugin is not bundled with Logstash by default, we have to install it manually. To see which input plugins are currently installed, use the following command:
./bin/logstash-plugin list --group input
The command above shows:
logstash-input-azure_event_hubs
logstash-input-beats
└── logstash-input-elastic_agent (alias)
logstash-input-couchdb_changes
logstash-input-elasticsearch
logstash-input-exec
logstash-input-file
logstash-input-ganglia
logstash-input-gelf
logstash-input-generator
logstash-input-graphite
logstash-input-heartbeat
logstash-input-http
logstash-input-http_poller
logstash-input-imap
logstash-input-jms
logstash-input-pipe
logstash-input-redis
logstash-input-rss
logstash-input-s3
logstash-input-snmp
logstash-input-snmptrap
logstash-input-sqs
logstash-input-stdin
logstash-input-syslog
logstash-input-tcp
logstash-input-twitter
logstash-input-udp
logstash-input-unix
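On a default installation, logstash-input-rss is missing from this list. (The listing above does include it, presumably because it was captured after the plugin had been installed.) We can install the plugin with the following command: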
./bin/logstash-plugin install logstash-input-rss
$ ./bin/logstash-plugin install logstash-input-rss
Using JAVA_HOME defined java: /Library/Java/JavaVirtualMachines/jdk-15.0.2.jdk/Contents/Home
WARNING, using JAVA_HOME while Logstash distribution comes with a bundled JDK
Validating logstash-input-rss
Installing logstash-input-rss
Installation successful
Once it is installed, running the list command above again shows that logstash-input-rss is included.
Import RSS feed data
Next, we use the RSS input plugin installed above to import the data from https://www.elastic.co/blog/feed. Let’s start by creating the following configuration file in the Logstash installation root directory:
logstash.conf
input {
  rss {
    url => "https://www.elastic.co/blog/feed"
    # Polling interval in seconds
    interval => 120
    tags => ["rss", "elastic"]
  }
}
filter {
  # Keep the original HTML in blog_html and work on a copy in blog_text
  mutate {
    rename => [ "message", "blog_html" ]
    copy => { "blog_html" => "blog_text" }
    copy => { "published" => "@timestamp" }
  }
  # Strip HTML tags, then replace newlines and tabs with a space
  mutate {
    gsub => [
      "blog_text", "<.*?>", "",
      "blog_text", "[\n\t]", " "
    ]
    remove_field => [ "published", "author" ]
  }
}
output {
  # Print each event to the console for inspection
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "elastic_blog"
  }
}
In the input section above, we define an rss input. Its url is https://www.elastic.co/blog/feed, and interval is 120, which means the feed is fetched every 120 seconds (two minutes). We also add tags to make the data easier to search. In the filter section, the first mutate renames message to blog_html, copies blog_html to blog_text, and copies published to @timestamp. The second mutate removes everything enclosed in angle brackets (the HTML tags) from blog_text by replacing it with an empty string, and replaces newline and tab characters with a space. Finally, we delete the published and author fields, although deleting the author field is not strictly necessary.
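The two gsub substitutions can be tried out in isolation. The following Python sketch applies the same two patterns to a made-up HTML string; clean_blog_html and the sample input are hypothetical names used for illustration, and the sketch assumes that Python's regex engine treats these simple patterns the same way Logstash does.

```python
import re


def clean_blog_html(blog_html):
    """Mirror the two gsub rules from the pipeline on a plain string."""
    # Rule 1: remove anything that looks like an HTML tag (non-greedy match).
    blog_text = re.sub(r"<.*?>", "", blog_html)
    # Rule 2: replace each newline or tab character with a single space.
    blog_text = re.sub(r"[\n\t]", " ", blog_text)
    return blog_text


# Made-up sample input, not taken from the real feed.
sample = "<p>Hello <b>Elastic</b></p>\n\t<p>Blog</p>"
cleaned = clean_blog_html(sample)
```

Because each newline and each tab becomes its own space, consecutive whitespace characters in the source leave consecutive spaces in the result.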
We run the following command in the Logstash installation directory:
./bin/logstash -f logstash.conf
In the console output, we can see that the <> tags have been removed from blog_text.
Now let’s check how many documents the elastic_blog index in Elasticsearch contains:
GET elastic_blog/_count
{
  "count" : 20,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
We can even search for it:
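As a rough illustration of such a search, the Python sketch below builds a match query against the blog_text field using only the standard library. The search term "kibana" and the helper names are hypothetical, and the sketch assumes Elasticsearch is reachable at localhost:9200 without authentication, as in the pipeline configuration above. Only the request is built here; run_search would actually send it.

```python
import json
import urllib.request


def build_search_request(term, host="http://localhost:9200", index="elastic_blog"):
    """Build (but do not send) a _search request with a match query on blog_text."""
    body = {"query": {"match": {"blog_text": term}}}
    return urllib.request.Request(
        url=f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(term):
    """Send the request; requires a running, unsecured Elasticsearch (assumption)."""
    with urllib.request.urlopen(build_search_request(term)) as response:
        return json.load(response)


# Hypothetical search term, chosen only for illustration.
request = build_search_request("kibana")
```

Calling run_search("kibana") against a live cluster should return the usual search response, with matching blog entries under hits.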
Two minutes later, we look at the imported data again:
This time, the count is 40, double the 20 we saw the first time. This is because the plugin re-reads the whole feed every two minutes, and each entry is indexed again as a new document. If we wait another two minutes, the count rises to 60.
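The growth from 20 to 40 to 60 can be reproduced with a small in-memory simulation. The Python sketch below is not Elasticsearch; it uses a dict as a stand-in for an index and compares random document IDs with IDs derived from each entry's link. The entries, the index_feed helper, and the choice of a SHA-1 hash are all assumptions made for illustration.

```python
import hashlib
import uuid


def index_feed(store, entries, use_link_as_id):
    """Write every entry into store, a dict standing in for an index."""
    for entry in entries:
        if use_link_as_id:
            # Deterministic ID: the same link always maps to the same key.
            doc_id = hashlib.sha1(entry["link"].encode("utf-8")).hexdigest()
        else:
            # Random ID: every write creates a new document.
            doc_id = uuid.uuid4().hex
        store[doc_id] = entry


# Twenty made-up entries, matching the count seen in the article.
entries = [{"link": f"https://example.com/blog/post-{i}"} for i in range(20)]

auto_ids, stable_ids = {}, {}
counts_auto, counts_stable = [], []
for _ in range(3):  # three polling cycles
    index_feed(auto_ids, entries, use_link_as_id=False)
    index_feed(stable_ids, entries, use_link_as_id=True)
    counts_auto.append(len(auto_ids))
    counts_stable.append(len(stable_ids))
```

With random IDs the simulated count grows by 20 on each cycle, while with link-derived IDs it stays at 20. In a real pipeline, a comparable effect might be achieved by setting the document_id option of the elasticsearch output to a value derived from a stable field; that configuration is not tested here.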
That is all for today’s exercise. We showed how to import RSS feed data using the Logstash RSS input plugin. I hope it helps you in your work.