Published with permission by Big Data Digest
Project developer: Ke Zhenxu
It's that time of year again: peak house-hunting season, when the dizzying variety of rental listings makes it hard to know where to look. How do you find a reliable place quickly and efficiently?
A tech geek recently built a Scrapy-based crawler project that aggregates rental listings from hundreds of cities across sites such as Douban, Lianjia (Homelink), and 58.com, letting you search for listings of interest and work around the sites' frustratingly limited search features.
Armed with this "secret weapon," the geek used the crawler to find himself a suitable home.
Not only that, he also generously cleaned up the project code and published it on GitHub.
Project link:
https://github.com/kezhenxu94/house-renting
Click "Read the original article" to view the project introduction, or reply "rent" to the Big Data Digest account to download the source code.
Now follow along to see this wave of cool operations.
The deployment environment
Python version: Python 2 / Python 3
Crawler framework: Scrapy
Operating system: Linux / Mac / Windows
Service engine: Docker
Get the source code
$ git clone https://github.com/kezhenxu94/house-renting
$ cd house-renting
You can also download the source code by replying "rent" to the Big Data Digest account.
Start the services
Using Docker (recommended)
$ docker-compose up --build -d
Environment and version: Docker CE for Mac 18.03.1-ce-mac65 (24312).
To make the project easier to use, the author provides a docker-compose.yml file that deploys all the services the project needs. However, due to limitations of Docker itself, non-Professional editions of Windows must use Docker Toolbox, which introduces a number of problems. For details, see:
http://support.divio.com/local-development/docker/how-to-use-a-directory-outside-cusers-with-docker-toolbox-on-windowsdocker-for-windows
If you run into such a problem, you can submit an Issue here; if you run into one and manage to solve it yourself, you are welcome to submit a Pull Request to help improve the project!
Issue:
https://github.com/kezhenxu94/house-renting/issues
Pull Request:
https://github.com/kezhenxu94/house-renting/pulls
Manual deployment (not recommended)
Install Elasticsearch 5.6.9 and Kibana 5.6.9, then start them.
Download and install Elasticsearch and Kibana from:
https://www.elastic.co/downloads/past-releases
Install Redis and start it
Download and install Redis from:
https://redis.io/download
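After installing, you can confirm Redis is reachable with a quick sanity check. A minimal sketch, assuming the redis-py package is installed and Redis runs locally on the default port 6379:

import redis

# Ping the local Redis server; raises a ConnectionError if it is not running.
client = redis.StrictRedis(host='127.0.0.1', port=6379)
print(client.ping())  # prints True when Redis is up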
Configure the relevant hosts and ports in the crawler/house_renting/settings.py file:
# Elasticsearch nodes; multiple nodes (a cluster) can be configured.
# Defaults to None, in which case data will not be stored in ES.
ELASTIC_HOSTS = [
    {'host': 'elastic', 'port': 9200},
]
REDIS_HOST = 'redis'  # defaults to None
REDIS_PORT = 6379     # defaults to 6379
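To give a sense of how these settings are consumed, here is a minimal illustrative sketch of a Scrapy item pipeline that indexes scraped listings into Elasticsearch. This is not the project's actual pipeline code; the pipeline class and the 'house_renting' index name are assumptions for illustration only.

from elasticsearch import Elasticsearch


class ElasticsearchPipeline(object):
    # Hypothetical pipeline sketch; the real project's pipeline may differ.

    def __init__(self, hosts):
        # When ELASTIC_HOSTS is None, indexing is disabled.
        self.es = Elasticsearch(hosts) if hosts else None

    @classmethod
    def from_crawler(cls, crawler):
        # Read the ELASTIC_HOSTS setting configured in settings.py.
        return cls(hosts=crawler.settings.get('ELASTIC_HOSTS'))

    def process_item(self, item, spider):
        if self.es is not None:
            # Index each scraped listing into the assumed 'house_renting' index.
            self.es.index(index='house_renting', doc_type='item', body=dict(item))
        return item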
Install Python dependencies
$ cd crawler
$ pip install -r requirements.txt
Select the cities to scrape (currently Lianjia and 58.com are supported):
To select the cities to scrape from Lianjia:
Open the crawler/house_renting/spider_settings/lianjia.py file and follow the comments to complete the city selection:
# ...
cities = (u'guangzhou', )
# cities = (u'guangzhou', u'beijing')
# ...
To select the cities to scrape from 58.com:
Open the crawler/house_renting/spider_settings/a58.py file and follow the comments to complete the city selection:
# ...
cities = (u'guangzhou', )
# cities = (u'guangzhou', u'beijing')
# ...
Start the crawler
Start the crawler for each site you want to scrape, each in its own command-line window:
$ scrapy crawl douban
$ scrapy crawl lianjia
$ scrapy crawl 58
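If juggling several terminal windows is inconvenient, Scrapy's CrawlerProcess API can also run all three spiders from a single script. A minimal sketch, to be run from the crawler directory so the project settings are picked up; the spider names mirror the commands above:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Schedule all three spiders in one process, then block until they finish.
process = CrawlerProcess(get_project_settings())
for spider_name in ('douban', 'lianjia', '58'):
    process.crawl(spider_name)
process.start()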
Congratulations! The housing information has now been crawled successfully; let's take a look at the results!
View the results
Pick a home by browsing photos
Once the crawlers start collecting data, a house_renting/data directory is created, and its images subfolder stores the photos downloaded from the rental listings. Browse this folder with any image viewer; when a photo looks promising, search Kibana for its file name to find the details of the corresponding listing.
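If you prefer to scan the download folder from a script, this small sketch (assuming the default house_renting/data/images path) prints the newest photos first, so their file names can then be looked up in Kibana:

import os

# List downloaded listing photos, newest first (default data path assumed).
image_dir = os.path.join('house_renting', 'data', 'images')
files = [os.path.join(image_dir, name) for name in os.listdir(image_dir)]
for path in sorted(files, key=os.path.getmtime, reverse=True)[:20]:
    print(path)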
Search keywords
Open your browser and navigate to http://127.0.0.1:5601 (adjust the Kibana URL to your Docker host's IP address if necessary).
Set the index pattern
Enter house_renting in the Index pattern input box and press TAB; the Create button becomes available. Click Create. If the Create button stays unavailable, the crawlers have not yet delivered any data to Elasticsearch: wait a little longer, and if nothing changes, check whether the crawler services started successfully.
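To check from the command line whether the crawlers have delivered anything to Elasticsearch yet, you can query the document count directly. A minimal sketch, assuming the index is named house_renting and Elasticsearch listens on the default port 9200:

from elasticsearch import Elasticsearch

# A count of zero means the crawlers have not stored any data yet.
es = Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])
print(es.count(index='house_renting')['count'])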
Switch to the Discover page
Add fields
Sort by time
Search for a keyword
Search for multiple keywords
Expand details
A friendly reminder
If your environment is configured correctly but the results still look wrong, the target sites may have been updated; check the project introduction page for updated code and try again. The author will keep updating the project as time and energy allow, so interested readers can keep following it.