Hi ~ here's a book recommendation for everyone! This Python crawler book has been reprinted four times in two months! It is “Python3 Web Crawler Development in Practice” by Cui Qingcai, the blogger behind the Jingmi blog!! There is also a book giveaway at the end of this article, so don't miss it!!
About the book
“Python3 Web Crawler Development in Practice” is a comprehensive introduction to building web crawlers with Python3. It starts with a detailed walkthrough of environment configuration and crawler fundamentals, then covers request libraries such as urllib and requests, parsing tools such as Beautiful Soup, XPath and pyquery, and ways to store data in text files and various databases. Next, it works through several real, up-to-date cases to show how to analyze and crawl Ajax data, and uses Selenium and Splash to demonstrate crawling dynamically rendered pages. It then shares practical crawler techniques: crawling through proxies and maintaining a dynamic proxy pool, using ADSL dial-up proxies, cracking various CAPTCHAs (image, Geetest slider, click-based, grid, etc.), simulating login to crawl websites, and maintaining a Cookies pool.
The book goes well beyond that. With the mobile Internet in mind, the author also shows how to use tools such as Charles, mitmdump and Appium for App packet capture and analysis, crawling interfaces with encrypted parameters, and crawling WeChat Moments. The book also introduces the pyspider and Scrapy frameworks and distributed crawling. On the optimization and deployment side, it covers Bloom Filter efficiency optimization, deploying crawlers with Docker and Scrapyd, and Gerapy, a distributed crawler management framework.
The book has 604 pages and weighs 2 jin. It is priced at 99 yuan.
About the author
Before reading a book, it helps to know who wrote it. Let's meet the author ~
Cui Qingcai is the blogger behind Jingmi (https://cuiqingcai.com), where his Python crawler posts have been read over one million times. He holds a master's degree from Beijing University of Aeronautics and Astronautics, is a lecturer at Tianshan Intelligent and NetEase Cloud, and is a big data engineer on Microsoft Xiaoice. He has worked on several large distributed crawler projects, enjoys sharing technical knowledge, and writes articles that are easy to follow ^_^
Here's a photo of him ~(@^_^@)~
Illustrated introduction
Here is the infographic I painstakingly designed
Expert reviews
Whether a book is good or not is best judged by experts
Crawler engineers hold an important place among Internet software development roles. Crawler work is often the foundation of a company's core business: only after data has been captured can it be processed and finally presented. The scale, stability, timeliness and accuracy of data capture are therefore all critical. In the early days of the Internet, data was easy to obtain. As companies pay more and more attention to data assets, anti-crawler measures keep improving, and new technologies keep creating new problems for crawler software. The author of this book has studied every area of crawling in depth. The book discusses advanced topics such as Ajax data capture, crawling dynamically rendered pages, CAPTCHA recognition and simulated login, and also covers App crawling with the mobile Internet in mind. More importantly, it provides a wealth of source code to help readers better understand the material. Highly recommended to all technology enthusiasts!
— Liang Bin, general manager of Bayou Technology
Data is not only the prerequisite for today's big data analysis but also the foundation of all kinds of AI applications. Whoever has the data has the world, and whoever can crawl has nothing to fear anywhere! With this book in hand, everyone from beginner to veteran will get something out of it!
— Li Zhoujun, professor and doctoral supervisor at Beihang University
This book explains the key points of crawler technology in detail, from getting started to distributed crawling, and offers solutions for different scenarios. It also uses a large number of examples to help readers learn crawler technology. It is easy to understand and packed with substance. Highly recommended!
— Ruihua Song, chief scientist at Microsoft Xiaoice
Some people say that the bandwidth of China's Internet is taken up by crawlers of all kinds, which shows both the importance of web crawlers and the current state of data monopoly on China's Internet. Crawling is a capability, and we crawl so that one day we will not have to.
— Shi Shuicai, President, Beijing TRS Information Technology Co., Ltd
Table of contents
Here is the full table of contents ~
- 1 Development environment configuration
- 1.1 Installing Python3
- 1.2 Installing request libraries
- 1.3 Installing parsing libraries
- 1.4 Installing databases
- 1.5 Installing storage libraries
- 1.6 Installing web libraries
- 1.7 Installing App crawling libraries
- 1.8 Installing crawler frameworks
- 1.9 Installing deployment libraries
- 2 Crawler basics
- 2.1 HTTP fundamentals
- 2.2 Web page basics
- 2.3 Basic principles of crawlers
- 2.4 Sessions and Cookies
- 2.5 Proxy fundamentals
- 3 Using the basic libraries
- 3.1 Using urllib
- 3.1.1 Sending requests
- 3.1.2 Handling exceptions
- 3.1.3 Parsing links
- 3.1.4 Analyzing the Robots protocol
- 3.2 Using requests
- 3.2.1 Basic usage
- 3.2.2 Advanced usage
- 3.3 Regular expressions
- 3.4 Crawling the Maoyan movie rankings
- 4 Using parsing libraries
- 4.1 Using XPath
- 4.2 Using Beautiful Soup
- 4.3 Using pyquery
- 5 Data storage
- 5.1 File storage
- 5.1.1 TXT text storage
- 5.1.2 JSON file storage
- 5.1.3 CSV file storage
- 5.2 Relational database storage
- 5.2.1 MySQL storage
- 5.3 Non-relational database storage
- 5.3.1 MongoDB storage
- 5.3.2 Redis storage
- 6 Ajax data crawling
- 6.1 What is Ajax
- 6.2 Ajax analysis methods
- 6.3 Ajax result extraction
- 6.4 Analyzing Ajax to crawl Toutiao street photos
- 7 Crawling dynamically rendered pages
- 7.1 Using Selenium
- 7.2 Using Splash
- 7.3 Splash load balancing configuration
- 7.4 Using Selenium to crawl Taobao products
- 8 CAPTCHA recognition
- 8.1 Recognizing image CAPTCHAs
- 8.2 Recognizing Geetest slider CAPTCHAs
- 8.3 Recognizing click CAPTCHAs
- 8.4 Recognizing Weibo grid CAPTCHAs
- 9 Using proxies
- 9.1 Proxy setup
- 9.2 Maintaining a proxy pool
- 9.3 Using paid proxies
- 9.4 ADSL dial-up proxies
- 9.5 Using proxies to crawl WeChat official account articles
- 10 Simulated login
- 10.1 Simulating login to crawl GitHub
- 10.2 Building a Cookies pool
- 11 App crawling
- 11.1 Using Charles
- 11.2 Using mitmproxy
- 11.3 Using mitmdump to crawl e-book information from the “Dedao” App
- 11.4 Basic use of Appium
- 11.5 Using Appium to crawl WeChat Moments
- 11.6 Using Appium and mitmdump to crawl JD.com products
- 12 Using the pyspider framework
- 12.1 Introduction to the pyspider framework
- 12.2 Basic use of pyspider
- 12.3 pyspider usage in detail
- 13 Using the Scrapy framework
- 13.1 Introduction to the Scrapy framework
- 13.2 Getting started with Scrapy
- 13.3 Using Selector
- 13.4 Using Spider
- 13.5 Using Downloader Middleware
- 13.6 Using Spider Middleware
- 13.7 Using Item Pipeline
- 13.8 Integrating Scrapy with Selenium
- 13.9 Integrating Scrapy with Splash
- 13.10 Scrapy generic crawlers
- 13.11 Using Scrapyrt
- 13.12 Integrating Scrapy with Docker
- 13.13 Scrapy
- 14 Distributed crawlers
- 14.1 Principles of distributed crawlers
- 14.2 Scrapy-Redis source code analysis
- 14.3 Implementing distributed crawling with Scrapy
- 14.4 Integrating Bloom Filter
- 15 Deploying distributed crawlers
- 15.1 Distributed deployment with Scrapyd
- 15.2 Using Scrapyd-Client
- 15.3 Integrating Scrapyd with Docker
- 15.4 Batch deployment with Scrapyd
- 15.5 Distributed management with Gerapy
Where to buy
Many of you have been waiting a long time: the pre-sale ran for a long while with stock slow to arrive, and many online stores sold out. There is no need to worry now!
The book is now fully stocked at JD.com, Tmall, Dangdang and other online stores. Copy a link into your browser or scan the QR code to buy!
JD.com
https://item.jd.com/12333540.html
Tmall
https://detail.tmall.com/item.htm?id=566699703917
Dangdang
http://product.dangdang.com/25249602.html
Happy shopping O(∩_∩)O
Free preview
Not convinced yet? Want to see what's inside first? No problem! Look here:
Free trial chapters (copy and paste the link into your browser):
https://cuiqingcai.com/5052.html
The first 7 chapters will always be free to read!
OK, next up are the bonuses ~
Bonus 1: book giveaway!!
Congratulations on reading this far! Now it's bonus time! There are two more benefits you can't miss.
This is the second round of our book giveaway (with more rounds to come): the official account is raffling off 30 copies signed by the author!!
How to take part (important, please read carefully):
The deadline is 22:00 on June 24, 2018; entries after that are invalid. Please remember your lottery code. After the activity ends, 30 winners will be drawn from the participants, weighted by lucky value, and announced on the WeChat official account, so watch for the account's announcement of the results! Each winner will receive an autographed copy of “Python3 Web Crawler Development in Practice”.
Bonus 2: exclusive discount!!
Wait, you thought that was all? Of course not! Besides the book giveaway, we have also secured an exclusive discount from Cloud Cube, a well-known dial-up VPS brand. Reply “coupon” in the official account (Advance Coder) to get a free 50 yuan Cloud Cube host coupon. Quantities are limited, first come, first served! The coupon can be used as a cash deduction when buying a dynamic IP dial-up VPS on the Cloud Cube official website (www.yunlifang.cn). With it, setting up crawler proxies is easy!
Wondering what a dynamic dial-up VPS can do, and how to use one in a crawler? Check out this article:
Easily obtain a large number of stable proxies! Building an ADSL dial-up proxy
Bonus 3: video course!!
Of course, besides the book there is also a companion video course, also by Cui Qingcai. Studying the two together works even better! It is discounted for a limited time! Scan the QR code below to learn more!
Last and most important, here is where to join the activity!! Scan the code to claim your benefits!!
Official account: Advance Coder
Special thanks
Finally, special thanks to Cloud Cube and Tianshan Intelligent for their strong support of this activity!