Content for search can be collected in many ways. Elastic App Search already lets users ingest content by uploading or pasting JSON and through API endpoints. With Elastic Enterprise Search 7.11, users can now also gather content through a powerful web crawler that retrieves information from publicly accessible websites, making that content easy to search in your App Search engine. As with any other ingestion method in App Search, the schema is inferred at ingest time and updated in near real time. Without writing any code, users can customize crawl rules: entry points specify where a crawl begins, while exclusion rules tell the crawler to avoid certain pages, paths, and content.

Elastic web crawler

 

In today’s exercise, we’ll describe in detail how to deploy the Elastic Stack to crawl a specific website. Please note: the App Search web crawler is only available in versions 7.11 and later.

 

Installation

Install Elasticsearch

To install Elasticsearch on Linux, macOS, or Windows, refer to my previous article “How to install Elasticsearch on Linux, macOS, and Windows”.

Install Kibana

Let’s install Kibana next. We can refer to my previous article “How to Install Kibana in an Elastic Stack on Linux, macOS, and Windows” for the installation steps.

Install Java

Enterprise Search requires Java: install either Java 8 or Java 11.

Install App Search

Find the version we need at www.elastic.co/downloads/a… , download it, and follow the corresponding installation instructions. If you want to install a previous version, refer to www.elastic.co/downloads/p… .

If we don’t configure anything for Elasticsearch or App Search, we’ll get the following error message:

 

Configure Elasticsearch

According to www.elastic.co/downloads/e… , we need to configure secure access for Elasticsearch. You can configure security by referring to the previous article “Elasticsearch: Setting Elastic Account Security”. In particular, xpack.security.authc.api_key.enabled must be set to true:

config/elasticsearch.yml

xpack.security.enabled: true
xpack.security.authc.api_key.enabled: true

After modifying the above configuration, we start Elasticsearch in a terminal:

$ bin/elasticsearch

Then, enter the following command in another terminal to set the password:

$ ./bin/elasticsearch-setup-passwords interactive

For convenience, we’ll set all of our passwords to password.

 

Configure Elastic Enterprise Search

In the Enterprise Search installation directory, find the config/enterprise-search.yml file and add the following configuration:

config/enterprise-search.yml

ent_search.auth.source: standard
elasticsearch.username: elastic
elasticsearch.password: ELASTIC_USER_PASSWORD
allow_es_settings_modification: true

Above, replace ELASTIC_USER_PASSWORD with your own Elasticsearch password; in my case, I set it to password. We must also set at least one encryption key. When we first ran enterprise-search, one of the errors returned was:

Invalid config file (/Users/liuxg/elastic1/enterprise-search-7.11.0/config/enterprise-search.yml): The setting '#/secret_management/encryption_keys' is not valid. No secret management encryption keys were provided. Your secrets cannot be stored unencrypted. You can use the following generated encryption key in your config file to store new encrypted secrets: secret_management.encryption_keys: [99c52330f78f2b669ebacb58ed65c6289e1d7b18f779175b0ea715f6bf14451c]

We add the above key to the config/enterprise-search.yml file:
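For example, the relevant line in config/enterprise-search.yml then looks like the following (using the key generated above; your generated key will differ):

```yaml
secret_management.encryption_keys: [99c52330f78f2b669ebacb58ed65c6289e1d7b18f779175b0ea715f6bf14451c]
```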

 

Start enterprise-search

At startup, we can set a password for the enterprise_search user. We can start it like this:

$ ENT_SEARCH_DEFAULT_PASSWORD=passwordexample bin/enterprise-search

In my case, I set the password to password:

$ ENT_SEARCH_DEFAULT_PASSWORD=password bin/enterprise-search

After it has been running for a while, we can see:

Above, we can see that the password for the enterprise_search user is password. This is the password we defined on the command line. We can also see the following information:

This indicates that enterprise-search is running successfully and can be accessed at http://localhost:3002. If we see something like this, then our Enterprise Search installation was successful:

 

Configuring web crawlers

On the page above, click the Continue to Login button:

We enter the username and password we set earlier; in my configuration, that is enterprise_search/password. Click the Log In button:

The above shows that it will crawl the www.elastic.co/ page as well as all the links contained in that page:

At the same time, since www.elastic.co/jobs is not linked from the home page, we add an entry point for it. We also customize some crawl rules: here, we don’t want to crawl all of the pages under www.elastic.co/jobs.
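Conceptually, an exclusion rule wins over an entry point for the pages it matches. The minimal sketch below illustrates that idea with simple path-prefix matching; it is an illustration only, and the actual App Search web crawler applies its own rule semantics (policy, rule type, and path pattern) internally. The paths used are hypothetical examples.

```python
# Illustrative only: how entry points and exclusion rules might interact.
ENTRY_POINTS = ["/", "/jobs"]       # pages where the crawl starts
EXCLUSION_PREFIXES = ["/jobs"]      # paths the crawler should skip

def should_crawl(path: str) -> bool:
    """Return False for any URL path matched by an exclusion prefix."""
    return not any(path.startswith(prefix) for prefix in EXCLUSION_PREFIXES)

print(should_crawl("/blog/whats-new"))    # True
print(should_crawl("/jobs/engineering"))  # False
```

Note that even though /jobs is an entry point, pages under it are discovered but then filtered out by the exclusion rule in this sketch.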

Once the rules are defined, we click the Start a Crawl button in the upper right:

The top right corner shows that a crawl is in progress. We can stop it by clicking Cancel Crawl, or wait until all the entry points have been crawled.

After the whole site has been crawled, we can search for articles:
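The same search can also be performed programmatically through the App Search search API. The sketch below only builds the JSON request body; the engine name my-crawler-engine and the localhost URL are assumptions for illustration, so substitute your own engine name, host, and search key before executing.

```python
import json

# Hypothetical values; replace with your own engine name and host.
ENGINE = "my-crawler-engine"
SEARCH_URL = f"http://localhost:3002/api/as/v1/engines/{ENGINE}/search"

def build_search_request(query: str, page_size: int = 10) -> str:
    """Build the JSON body for an App Search search call."""
    body = {"query": query, "page": {"size": page_size}}
    return json.dumps(body)

payload = build_search_request("elastic stack")
print(payload)
# To execute, POST the payload to SEARCH_URL with your search key, e.g.:
# curl -X POST "$SEARCH_URL" -H "Authorization: Bearer search-xxxx" \
#      -H "Content-Type: application/json" -d "$payload"
```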

For more information on how to set synonyms, tune relevance, and use Curations, see my previous article “Enterprise: Elastic App Search Primer”.