Preface
The text and pictures in this article come from the internet and are for learning and exchange only, with no commercial purpose. If you have any questions, please contact us and we will deal with them.
Basic development environment
- Python 3.6
- PyCharm
Modules used
import requests
import parsel
import csv
import concurrent.futures
import urllib
Target Page Analysis
On the site, click "Second-hand housing" to enter the listings.
I just want to say, it's damn expensive.
List page
Details page
The pages are static: requesting a URL returns the page source, and the data can be parsed straight from it, so this is not difficult.
Data to collect:
- Total price
- Unit price
- Floor area
- Floor
- House orientation
- House type
- Decoration condition
- Elevator
- Building structure
- Community name
- Transaction ownership
- House use
Running result
Points to note:
- The search term you enter has to be URL-encoded (percent-encoded). Take this for example: the link actually requested is
'https://bj.lianjia.com/ershoufang/rs%E7%8F%A0%E6%B1%9F%E7%BD%97%E9%A9%AC%E5%98%89%E5%9B%AD%E8%A5%BF%E5%8C%BA/'
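The percent-encoding can be reproduced with the standard library. The sketch below is an illustration, not the article's own code: `build_list_url` is a hypothetical helper, and the search term is the one that decodes from the link quoted in this section. The URL pattern is the one the main program uses.

```python
from urllib.parse import quote, unquote


def build_list_url(city: str, page: int, keyword: str) -> str:
    """Build a list-page URL, percent-encoding the keyword as UTF-8.

    Hypothetical helper for illustration; the pattern mirrors the
    'pg{}rs{}' format string used in the main program.
    """
    return "https://{}.lianjia.com/ershoufang/pg{}rs{}/".format(
        city, page, quote(keyword)
    )


keyword = "珠江罗马嘉园西区"
encoded = quote(keyword)           # UTF-8 bytes written as %XX escapes
print(encoded)
print(unquote(encoded) == keyword)  # decoding gives the keyword back
print(build_list_url("bj", 1, keyword))
```

`quote` leaves ASCII letters and digits alone, so a city code such as `bj` needs no encoding; only the community name does.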
How to package the script as an exe:
1. Install the PyInstaller module: pip install pyinstaller
(Press Windows key + R and enter cmd to open a command prompt.)
I've already installed it here.
2. Run the packaging command: pyinstaller -F xxxx.py
There are two ways to run it:
- In the same cmd window, change to the folder that contains the .py file and type the command.
- In the folder that contains the file, hold Shift and right-click, choose "Open PowerShell window here", and enter the command.
I chose the second option for convenience.
There is a lot more to packaging an exe, which is not covered here.
3. Running the exe
Enter the city and community you want to crawl, and how many pages of data to crawl.
When the crawl finishes, you can choose whether to crawl again.
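In a packaged exe, a typo at one of these prompts can end the program: `int()` raises `ValueError` on non-numeric text, and an exact string comparison rejects answers such as `yes`. A minimal sketch of more forgiving input handling follows; both helper names are hypothetical and not part of the article's code.

```python
def parse_page_count(text: str, default: int = 1) -> int:
    """Return a positive page count, falling back to `default` on bad input."""
    try:
        pages = int(text.strip())
    except ValueError:
        return default
    return pages if pages > 0 else default


def wants_to_continue(answer: str) -> bool:
    """Accept 'yes' or 'y' in any letter case, ignoring surrounding spaces."""
    return answer.strip().lower() in {"yes", "y"}


# The helpers take plain strings, so they can be tried without a prompt.
print(parse_page_count(" 3 "))
print(parse_page_count("three"))
print(wants_to_continue("Yes"))
print(wants_to_continue("no"))
```

In the main loop these would wrap the values returned by `input()`.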
4. Possible improvements
- The interface is too plain; you could write a GUI for it, using Tk or Qt.
- Crawl by district. A single community has too few listings, so the crawl could be split up by the districts of each city.
- Nothing else comes to mind for now; suggestions in the comments are welcome.
If you only copy, paste, and run the code given to you, it has no practical value.
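- Get the page source and parse it
def get_response(html_url):
    # Note: `headers` is not defined anywhere in this article; define it
    # (a dict of HTTP request headers) before calling this function.
    response = requests.get(url=html_url, headers=headers).text
    return response

def get_parsing(html_data):
    # Wrap the HTML text in a parsel Selector so it can be queried with CSS.
    selector = parsel.Selector(html_data)
    return selector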
- Get the URL of each listing
def get_page_url(page_url):
    html_data = get_response(page_url)
    selector = get_parsing(html_data)
    # Collect the detail-page link of every listing on the list page.
    page_url = selector.css('.sellListContent li .title a::attr(href)').getall()
    return page_url
- Parse the detail pages for the data
def main(url):
    lis = get_page_url(url)
    for li in lis:
        html_data = get_response(li)
        selector = get_parsing(html_data)
        title = selector.css('.main::attr(title)').get()  # title
        all_price = selector.css('div.price .total::text').get() + ' wan'  # total price, in units of 10,000 yuan
        one_price = selector.css('.unitPriceValue::text').get() + ' yuan/m2'  # unit price
        area = selector.css('div.area .mainInfo::text').get()  # floor area
        floor = selector.css('#introduction .base .content ul li:nth-child(2)::text').get()  # floor
        face = selector.css('#introduction .base .content ul li:nth-child(7)::text').get()  # house orientation
        unit_type = selector.css('#introduction .base .content ul li:nth-child(1)::text').get()  # house type
        decoration = selector.css('#introduction .base .content ul li:nth-child(9)::text').get()  # decoration condition
        elevator = selector.css('#introduction .base .content ul li:nth-child(11)::text').get()  # elevator
        building = selector.css('#introduction .base .content ul li:nth-child(8)::text').get()  # building structure
        ownership = selector.css('#introduction .transaction .content ul li:nth-child(2) span:nth-child(2)::text').get()  # transaction ownership
        use = selector.css('#introduction .transaction .content ul li:nth-child(4) span:nth-child(2)::text').get()  # house use
        community = selector.css('.aroundInfo .communityName .info::text').get()  # community name
        # The keys must match the fieldnames passed to csv.DictWriter.
        dit = {
            'Total price': all_price,
            'Unit price': one_price,
            'Area': area,
            'Floor': floor,
            'Orientation': face,
            'House type': unit_type,
            'Decoration': decoration,
            'Elevator': elevator,
            'Building structure': building,
            'Transaction ownership': ownership,
            'House use': use,
            'Community name': community,
            'Details page URL': li,
        }
        csv_writer.writerow(dit)
        print(dit)
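One thing to watch in the parsing step: parsel's `.get()` returns `None` when a selector matches nothing, so appending a unit string directly to its result raises `TypeError` on any listing that lacks the field. A minimal sketch of a guard, assuming a hypothetical helper named `with_unit` that is not in the article's code:

```python
from typing import Optional


def with_unit(value: Optional[str], unit: str, missing: str = "") -> str:
    """Append `unit` to a scraped value, tolerating a missing match.

    `value` is whatever `.get()` returned: a string, or None when the
    CSS selector matched nothing on the page.
    """
    if value is None:
        return missing
    return value.strip() + unit


print(with_unit(" 650 ", "/m2"))  # a matched value, whitespace trimmed
print(repr(with_unit(None, "/m2")))  # a missing value becomes an empty string
```

Wrapping the total-price and unit-price lookups this way lets one incomplete listing produce an empty cell instead of stopping the whole page.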
if __name__ == '__main__':
    while True:
        city_word = input('Enter the city to search (as it appears in the URL, e.g. bj): ')
        key_word = input('Enter the community to search: ')
        key_page = int(input('Enter the number of pages to crawl: '))
        f = open('{}{}.csv'.format(city_word, key_word), mode='a', encoding='utf-8-sig', newline='')
        csv_writer = csv.DictWriter(f, fieldnames=[
            'Total price', 'Unit price', 'Area', 'Floor', 'Orientation',
            'House type', 'Decoration', 'Elevator', 'Building structure',
            'Transaction ownership', 'House use', 'Community name',
            'Details page URL'])
        csv_writer.writeheader()
        # Crawl the list pages concurrently, one task per page.
        executor = concurrent.futures.ThreadPoolExecutor(max_workers=10)
        for page in range(1, key_page + 1):
            url = 'https://{}.lianjia.com/ershoufang/pg{}rs{}/'.format(city_word, page, key_word)
            executor.submit(main, url)
        executor.shutdown()  # wait for all pages to finish
        f.close()  # flush and close the CSV before the next round
        a = input('Continue crawling (Yes or No): ')
        if a == 'Yes':
            continue
        else:
            break
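Two details of the thread pool are easy to miss. `executor.submit` stores any exception inside the returned `Future`, so a failing task stays silent unless `.result()` is called, and several threads share one CSV writer. The sketch below shows one way to handle both; it is not the article's implementation. `crawl_pages`, `fake_fetch_rows`, and the shortened `FIELDNAMES` are hypothetical, and no network request is made.

```python
import csv
import io
import threading
from concurrent.futures import ThreadPoolExecutor

FIELDNAMES = ["Community name", "Total price"]  # shortened for the sketch


def crawl_pages(pages, fetch_rows, out_file, max_workers=10):
    """Write the rows of every page to `out_file`, one page per task."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDNAMES)
    writer.writeheader()
    lock = threading.Lock()

    def work(page):
        for row in fetch_rows(page):
            with lock:  # only one thread writes a row at a time
                writer.writerow(row)

    # The with-block waits for all tasks before leaving.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(work, page) for page in pages]
        for future in futures:
            future.result()  # re-raises an exception from the task, if any


def fake_fetch_rows(page):
    """Stand-in for the real scraping step (assumption: returns dict rows)."""
    return [{"Community name": "demo", "Total price": "{} wan".format(page)}]


buffer = io.StringIO()  # in-memory file, so the sketch touches no disk
crawl_pages(range(1, 4), fake_fetch_rows, buffer)
print(buffer.getvalue())
```

Row order in the output depends on which thread finishes first, so it can differ between runs.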
The code will raise an error as written; that was left in on purpose ~