1. Import the libraries required by the project
# -*- coding: UTF-8 -*-

# Import urllib for encoding the request parameters
import urllib
# Import urllib2 to perform the core crawler requests
import urllib2

# Import UserAgent to generate random User-Agent strings
from fake_useragent import UserAgent
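fake_useragent is a third-party package, so it needs to be installed first (for example, pip install fake-useragent). As a quick illustration of what the import provides, each access to UserAgent().random returns a different browser identification string:

# A minimal sketch of fake_useragent usage (not part of the original listing)
from fake_useragent import UserAgent

ua = UserAgent()
# Prints a random User-Agent string such as 'Mozilla/5.0 (Windows NT 10.0; ...) ...'
print ua.random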
2. Define the page request function
# Perform the web request
def req_url(self, full_url):
    # Construct the request headers
    headers = {
        # Generate a random user-agent
        'User-Agent': self.user_agent.random,
        # Accepted response formats
        "Accept": "application/json, text/plain, */*",
        # Accepted response languages
        "Accept-Language": "zh-CN,zh;q=0.8"
    }
    # Request parameters
    params = {
        'start': self.begin,
        'tags': self.name
    }
    # Encode the Chinese parameters
    params = urllib.urlencode(params)
    # Construct the Request object
    request = urllib2.Request(url=full_url, headers=headers, data=params)
    # Execute the request
    response = urllib2.urlopen(request)
    return response.read()
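Note that req_url is written as an instance method, but the class it belongs to never appears in the listing. Below is a minimal sketch of the assumed class skeleton, with the attribute names (url, name, begin, user_agent) inferred from the methods in this article; the class name DoubanSpider is a placeholder:

class DoubanSpider(object):
    # Hypothetical constructor; attributes inferred from the methods shown above
    def __init__(self, url, name, begin):
        self.url = url                  # API address to request
        self.name = name                # movie type, sent as the 'tags' parameter
        self.begin = begin              # start offset, sent as the 'start' parameter
        self.user_agent = UserAgent()   # random User-Agent generator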
3. Save the file
# Save the crawled HTML source code to a file
def save_doc(self, html_doc, file_name):
    print "Start saving file:", file_name
    with open(file_name, 'w') as f:
        f.write(html_doc)
    print "File saved:", file_name
4. Assemble and run the crawler
# Set up the crawler environment and execute it
def run_spider(self):
    # Build the file name from the movie type
    file_name = str(self.name) + '.html'
    # Perform the crawler web request
    html_doc = self.req_url(self.url)
    # Save the file
    self.save_doc(html_doc, file_name)
5. User-defined input
# User-defined input parameters
url = 'https://movie.douban.com/j/new_search_subjects?sort=T&range=0,10'
type_name = raw_input('Please enter the Douban movie type: ')
begin = int(raw_input('Please enter the number of pages to crawl: '))
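With the assumed DoubanSpider skeleton sketched in step 2, the user input above can be wired to the crawler like this (an illustrative sketch, not part of the original listing):

# Instantiate the spider with the user-supplied parameters and run it
spider = DoubanSpider(url, type_name, begin)
spider.run_spider()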

More exciting content is coming on the WeChat public account "Python Concentration Camp", which focuses on the Python technology stack, resource sharing, and community exchange. We look forward to having you join us!