Is there an easy way to index email into Elasticsearch? Logstash is the answer. Logstash is an open source, server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and sends it to your favorite “stash.” In this case, “stash” refers to products such as Elasticsearch, PagerDuty, email, Nagios, and Jira.

The Logstash event processing pipeline consists of three stages: input → filter → output. Inputs generate events, filters modify them, and outputs ship them elsewhere. Inputs and outputs support codecs, which let you encode or decode data as it enters or exits the pipeline without using a separate filter.

If you haven’t dabbled with Logstash yet, check out my previous article “How to Install A Logstash in an Elastic stack.”

 

The tutorial

To index email into Elasticsearch, we need a Logstash input plug-in called “logstash-input-imap”. The plug-in periodically reads email from an IMAP server. Logstash ships with many plug-ins by default, and imap (logstash-input-imap) is one of them.

Let’s start Logstash with a basic pipeline, as follows:

./bin/logstash -e 'input { stdin { } } output { stdout {} }'

Open a browser and visit http://localhost:9600/_node/plugins?pretty to check the list of installed plug-ins. The response lists the plug-ins that are active in the current Logstash instance. Scroll down to confirm that the logstash-input-imap plug-in is installed.

You can also run the following command to check:

./bin/logstash-plugin list --group input

The result displayed is:

logstash-input-azure_event_hubs
logstash-input-beats
logstash-input-couchdb_changes
logstash-input-elasticsearch
logstash-input-exec
logstash-input-file
logstash-input-ganglia
logstash-input-gelf
logstash-input-generator
logstash-input-graphite
logstash-input-heartbeat
logstash-input-http
logstash-input-http_poller
logstash-input-imap
logstash-input-jdbc
logstash-input-jms
logstash-input-pipe
logstash-input-redis
logstash-input-s3
logstash-input-snmp
logstash-input-snmptrap
logstash-input-sqs
logstash-input-stdin
logstash-input-syslog
logstash-input-tcp
logstash-input-twitter
logstash-input-udp
logstash-input-unix

The list above includes logstash-input-imap, which means it is bundled with our Logstash installation and needs no separate installation.
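As a rough alternative to scrolling through the browser output, the same check can be scripted. The Python sketch below is only an illustration: it assumes the plug-ins endpoint returns JSON with a "plugins" array whose entries carry a "name" key, and the helper names has_plugin and fetch_plugins, as well as the sample payload, are made up for this example.

```python
import json
from urllib.request import urlopen

# Endpoint mentioned in the article; adjust host and port for your own setup.
PLUGINS_URL = "http://localhost:9600/_node/plugins"


def has_plugin(payload, name):
    """Return True if `name` appears in the "plugins" list of the API response."""
    return any(p.get("name") == name for p in payload.get("plugins", []))


def fetch_plugins(url=PLUGINS_URL):
    """Fetch and decode the JSON plug-in listing from a running Logstash node."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


# Hand-written payload in the ASSUMED response shape, so the check can be
# tried without a running Logstash. The version strings are placeholders.
SAMPLE_RESPONSE = {
    "total": 2,
    "plugins": [
        {"name": "logstash-input-imap", "version": "0.0.0"},
        {"name": "logstash-input-stdin", "version": "0.0.0"},
    ],
}

print(has_plugin(SAMPLE_RESPONSE, "logstash-input-imap"))
```

Against a running Logstash node, has_plugin(fetch_plugins(), "logstash-input-imap") would perform the same check on the live response.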

Next, we need to configure a Logstash pipeline that uses “logstash-input-imap” as its input. The only settings this plug-in requires are “host”, “password”, and “user”.

Depending on what your IMAP server requires, you may also need to set other options (such as “port” and “secure”). “host” specifies the IMAP server, while “user” and “password” are the credentials used to authenticate against it.

In today’s exercise, we use Hotmail. I also tried China’s 163 and QQ mailboxes. For these, you first need to enable POP3/SMTP/IMAP in the mailbox settings so that third-party software can retrieve mail. More importantly, third-party applications that retrieve mail from them via IMAP must enter a verification code, so these mailboxes don’t work well with logstash-input-imap.
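Because mailbox-side restrictions like these are easy to confuse with Logstash problems, it can help to test the IMAP login on its own first. The following Python sketch uses the standard imaplib module to log in over SSL and open a folder read-only. The function name count_messages and its connect parameter (present only so a stand-in connection can be swapped in) are assumptions of this example, not part of Logstash.

```python
import imaplib


def count_messages(host, user, password, port=993, folder="Inbox",
                   connect=imaplib.IMAP4_SSL):
    """Log in to an IMAP server over SSL and return the message count of `folder`.

    `connect` defaults to imaplib.IMAP4_SSL; it is a parameter only so that a
    stand-in connection class can be used when no real server is available.
    """
    conn = connect(host, port)
    try:
        # Raises imaplib.IMAP4.error if the server rejects the credentials.
        conn.login(user, password)
        # readonly=True so that checking the folder does not alter message flags.
        status, data = conn.select(folder, readonly=True)
        if status != "OK":
            raise RuntimeError(f"cannot open folder {folder!r}: {data}")
        # On success the server reports the message count as bytes.
        return int(data[0])
    finally:
        conn.logout()
```

With real credentials, a call such as count_messages("imap-mail.outlook.com", "YourEmailAddress", "YourPassword") either returns a number or raises an error, which indicates whether the account accepts plain IMAP logins at all.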

In my case, I use the following Logstash configuration file:

logstash_email.conf

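input {
	imap {
		host => "imap-mail.outlook.com"
		user => "YourEmailAddress"
		password => "YourPassword"
		content_type => "text/plain"
		# Connect over SSL on the standard IMAPS port
		secure => true
		port => 993
		# Poll the server every 10 seconds instead of the default 300
		check_interval => 10
		folder => "Inbox"
	}
}

output {
	# Print each event to the console for debugging
	stdout { codec => rubydebug }

	elasticsearch {
		index => "emails"
		# Mapping types are deprecated in Elasticsearch 7 and removed in 8; omit this on newer versions
		document_type => "_doc"
		hosts => "localhost:9200"
	}
}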

Above, we configure our Hotmail settings. In your case, you’ll need to enter your own email address and password in the user and password fields. Each email is sent to Elasticsearch as a document and stored in the emails index.

By default, the logstash-input-imap plug-in reads from the folder INBOX and polls the IMAP server every 300 seconds. In the configuration above, I override both of these settings and also set the port explicitly. We can start Logstash as follows:

./bin/logstash -f ~/data/email/logstash_email.conf --config.reload.automatic

Change the path of the .conf file above to match your setup.

Note: In development, it’s best to enable automatic configuration reloading (--config.reload.automatic) so that you don’t have to restart Logstash every time you change the pipeline configuration.

Note that logstash-input-imap currently has a bug that reports “Error: Can not decode an entire message”. According to the bug description, an email that contains an attachment may fail to be read, which causes Logstash to raise an error.

Now, let’s start Logstash so that it begins polling the IMAP server for incoming email.

Next, send a message to this mailbox from another account.

The Logstash console then prints the received message as an event, and in Kibana the message appears in the emails index.

As you can see, we have successfully imported mail sent to Hotmail into Elasticsearch.

 

Conditional tags

Now that we have the basic settings for indexing emails into Elasticsearch, we can add new fields, filters, conditional tags, and so on. Suppose we want to tag every email whose subject contains the keyword “critical” or “error”. Depending on the tag, we can then perform actions in the output plug-ins, such as sending an email to the support team, creating a Jira issue, or sending a PagerDuty event. The possibilities are endless.

Here is an example configuration that demonstrates conditional tagging:

logstash_email_tag.conf

input {
	imap {
		host => "imap-mail.outlook.com"
		user => "YourEmailAddress"
		password => "YourPassword"
		content_type => "text/plain"
		secure => true
		port => 993
		check_interval => 10
		folder => "Inbox"
		add_field => {"parser" => "logstash"}
	}
}

filter {
	if "critical" in [subject] {
		mutate { add_tag => "critical" }
	} else if "error" in [subject] {
		mutate { add_tag => "error" }
	}
}

output {
	stdout { codec => rubydebug }

	elasticsearch {
		index => "emails"
		document_type => "_doc"
		hosts => "localhost:9200"
	}
}
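To make the branching in the filter block easier to reason about, here is a small Python sketch that mirrors the same if / else if chain on a plain subject string. It is an illustration only: the function name tags_for_subject and the sample subjects are invented for this example, and Logstash itself is not involved.

```python
from typing import List


def tags_for_subject(subject: str) -> List[str]:
    """Mirror the filter's if / else if chain for a single subject line."""
    if "critical" in subject:
        return ["critical"]
    elif "error" in subject:
        # Only reached when "critical" did not match, as with `else if` in the config.
        return ["error"]
    return []


# Made-up subjects that exercise each branch.
for subject in ["critical: disk full",
                "error and critical at once",
                "Critical outage",
                "weekly report"]:
    print(f"{subject!r} -> {tags_for_subject(subject)}")
```

Two consequences of this structure are worth noting. A subject containing both keywords receives only the “critical” tag, because the second branch is skipped. The substring test in this sketch is also case-sensitive, and as far as I know the Logstash conditional behaves the same way for string fields, so a subject such as “Critical outage” would go untagged unless it is normalized first.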

We can run Logstash with the following command:

./bin/logstash -f ~/data/email/logstash_email_tag.conf --config.reload.automatic

From another mail client, send a message whose subject contains one of the keywords.

The Logstash console then shows the event with the corresponding tag, and we can confirm the same result in Kibana.

When a critical email arrives, we can use logstash-output-email to forward its details to our own mailbox. I’ll leave the implementation to you.

Conclusion

As you can see, the Logstash IMAP plug-in makes it very easy to send email to Elasticsearch or any other output. Now that we have email in Elasticsearch, we can either write a simple search client or call the Elasticsearch search endpoints directly from curl or a REST client to start mining and analyzing it. Better yet, we can apply various metrics and aggregations to the indexed email to produce useful visualizations in Kibana. Adding email to the data analysis pipeline in the ELK stack can benefit your business intelligence.
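As a starting point for such a search client, the sketch below prepares a request against the _search endpoint of the emails index using only the Python standard library. It assumes Elasticsearch is reachable at localhost:9200 as in the output configuration, and that the documents carry a tags field as produced by the conditional tag example. The function names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

ES_URL = "http://localhost:9200"  # same host as the Logstash output in the article
INDEX = "emails"


def build_tag_query(tag, size=10):
    """Build a search body matching documents whose `tags` field contains `tag`."""
    return {"size": size, "query": {"match": {"tags": tag}}}


def make_search_request(tag, base_url=ES_URL, index=INDEX):
    """Prepare (but do not send) a POST request to the index's _search endpoint."""
    body = json.dumps(build_tag_query(tag)).encode("utf-8")
    return Request(
        f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_by_tag(tag):
    """Send the request and return the decoded JSON response from Elasticsearch."""
    with urlopen(make_search_request(tag), timeout=10) as resp:
        return json.load(resp)
```

Calling search_by_tag("critical") against a running cluster would return the raw search response. Keeping the query construction separate from the network call makes it possible to inspect or adjust the query body without a live Elasticsearch.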

 

Reference:

[1] qbox.io/blog/indexi…