The open source project I'm sharing today is a WeChat official account crawler. If you're interested, take a look and see whether it gives you some new ideas.

Project introduction


wechatsogou is a crawler interface for WeChat official accounts built on Sogou WeChat search. It can also be extended into a crawler based on Sogou search.

Installation

pip install wechatsogou --upgrade

Usage

    import wechatsogou

    # Configurable parameters

    # Direct connection
    ws_api = wechatsogou.WechatSogouAPI()

    # Number of retries after a wrong captcha input; the default is 1
    ws_api = wechatsogou.WechatSogouAPI(captcha_break_time=3)

    # All parameters of the requests library can be set here.
    # For example, proxies: the proxy list must include at least one
    # HTTPS proxy, and the proxies must actually be usable
    ws_api = wechatsogou.WechatSogouAPI(proxies={
        "http": "127.0.0.1:8888",
        "https": "127.0.0.1:8888",
    })

    # For example, a timeout
    ws_api = wechatsogou.WechatSogouAPI(timeout=0.1)
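Because the proxy mapping has to contain at least one HTTPS entry, it can help to check the configuration before creating the client. The following is only a minimal sketch: `build_api_kwargs` is a hypothetical helper that is not part of wechatsogou, and it merely assembles the keyword arguments shown above.

```python
from typing import Any, Dict, Optional


def build_api_kwargs(
    proxies: Optional[Dict[str, str]] = None,
    timeout: Optional[float] = None,
    captcha_break_time: int = 1,
) -> Dict[str, Any]:
    """Assemble keyword arguments for WechatSogouAPI (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"captcha_break_time": captcha_break_time}
    if proxies is not None:
        # The article asks for at least one HTTPS proxy in the mapping.
        if not any(scheme.lower() == "https" for scheme in proxies):
            raise ValueError("proxies must include an 'https' entry")
        kwargs["proxies"] = proxies
    if timeout is not None:
        if timeout <= 0:
            raise ValueError("timeout must be positive")
        kwargs["timeout"] = timeout
    return kwargs


if __name__ == "__main__":
    kwargs = build_api_kwargs(
        proxies={"http": "127.0.0.1:8888", "https": "127.0.0.1:8888"},
        timeout=5,
    )
    print(kwargs)
    # With the package installed, the client could then be created as:
    # ws_api = wechatsogou.WechatSogouAPI(**kwargs)
```

Validating up front means a missing HTTPS proxy is reported as a clear error instead of surfacing later as a failed request.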

Get information about a specific official account – get_gzh_info

    In [5]: import wechatsogou
       ...:
       ...: ws_api = wechatsogou.WechatSogouAPI()
       ...: ws_api.get_gzh_info('NUAA Young Volunteers')
       ...:
    Out[5]:
    {
        'authentication': 'Nanjing University of Aeronautics and Astronautics',
        'headimage': 'http://img01.sogoucdn.com/app/a/100520090/oIWsFt1tmWoG6vO6BcsS7St61bRE',
        'introduction': 'The pacesetter of volunteer activities at NUAA, bringing you volunteer resources and exciting news from on and off campus.',
        'post_perm': 26,
        'view_perm': 1000,
        'profile_url': 'http://mp.weixin.qq.com/profile?src=3&timestamp=1501140102&ver=1&signature=OpcTZp20TUdKHjSqWh7m73RWBIzwYwINpib2ZktBkLG8NyHamTvK2jtzl7mf-VdpE246zXAq18GNm*S*bq4klw==',
        'qrcode': 'http://mp.weixin.qq.com/rr?src=3&timestamp=1501140102&ver=1&signature=-DnFampQflbiOadckRJaTaDRzGSNfisIfECELSo-lN-GeEOH8-ASdavl0xuavw-bmAEQXOa1T39XTtM**EIsjzxz30LjyBNkjmgbT6bGnZM=',
        'wechat_id': 'nanhangqinggong',
        'wechat_name': 'NUAA Young Volunteers'
    }
  • Return data structure
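    {
        'profile_url': '',     # link to the page of the 10 most recent mass-sent posts
        'headimage': '',       # avatar
        'wechat_name': '',     # name
        'wechat_id': '',       # WeChat id
        'post_perm': int,      # number of mass-sent posts in the last month
        'view_perm': int,      # number of views in the last month
        'qrcode': '',          # QR code
        'introduction': '',    # introduction
        'authentication': ''   # authentication
    }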
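To show how the returned dictionary might be consumed, here is a small sketch that turns it into a one-line summary. `summarize_gzh` is a hypothetical helper, not part of wechatsogou, and the sample values are taken from the example output above, so no network call is needed.

```python
from typing import Any, Dict


def summarize_gzh(info: Dict[str, Any]) -> str:
    """Build a one-line summary from a get_gzh_info-style dict (hypothetical helper)."""
    # Use .get() so a missing or empty field does not raise KeyError.
    name = info.get("wechat_name") or "(unknown)"
    wechat_id = info.get("wechat_id") or "?"
    posts = info.get("post_perm", 0)
    views = info.get("view_perm", 0)
    return f"{name} ({wechat_id}): post_perm={posts}, view_perm={views}"


if __name__ == "__main__":
    sample = {
        "wechat_name": "NUAA Young Volunteers",
        "wechat_id": "nanhangqinggong",
        "post_perm": 26,
        "view_perm": 1000,
    }
    print(summarize_gzh(sample))
```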

Search for official accounts – search_gzh

    In [6]: import wechatsogou
       ...:
       ...: ws_api = wechatsogou.WechatSogouAPI()
       ...: ws_api.search_gzh('Nanjing University of Aeronautics and Astronautics')
       ...:
    Out[6]:
    [
        {
            'authentication': '...',
            'headimage': 'http://img01.sogoucdn.com/app/a/100520090/oIWsFt1MvjqspMDVvZjpmxyo36sU',
            'introduction': 'Official WeChat account of Nanjing University of Aeronautics and Astronautics',
            'post_perm': 0,
            'view_perm': 0,
            'profile_url': 'http://mp.weixin.qq.com/profile?src=3&timestamp=1501141990&ver=1&signature=S-7U131D3eQERC8yJGVAg2edySXn*qGVi5uE8QyQU034di*2mS6vGJVnQBRB0It9t9M-Qn7ynvjRKZNQrjBMEg==',
            'qrcode': 'http://mp.weixin.qq.com/rr?src=3&timestamp=1501141990&ver=1&signature=Tlp-r0AaBRxtx3TuuyjdxmjiR4aEJY-hjh0kmtV6byVu3QIQYiMlJttJgGu0hwtZMZCCntdfaP5jD4JXipTwoGecAze8ycEF5KYZqtLSsNE=',
            'wechat_id': 'NUAA_1952',
            'wechat_name': '...'
        },
        {
            'authentication': 'Nanjing University of Aeronautics and Astronautics',
            'headimage': 'http://img01.sogoucdn.com/app/a/100520090/oIWsFtwVmjdK_57vIKeMceGXF5BQ',
            'introduction': '...',
            'post_perm': 0,
            'view_perm': 0,
            'profile_url': 'http://mp.weixin.qq.com/profile?src=3&timestamp=1501141990&ver=1&signature=aXFQrSDOiZJHedlL7vtAkvFMckxBmubE9VGrVczTwS601bOIT5Nrr8Pcgs6bQ-oEd6jdQ0aK5WCQjNwMAhJnyQ==',
            'qrcode': 'http://mp.weixin.qq.com/rr?src=3&timestamp=1501141990&ver=1&signature=7Cpbd9CVQsXJkExRcU5VM6NuyoxDQQfVfF7*CGI-PTR0y6stHPtdSDqzAzvPMWz67Xz9IMF2TDfu4Cndj5bKxlsFh6wGhiLH0b9ZKqgCW5k=',
            'wechat_id': 'nuaa_tw',
            'wechat_name': 'Youth League Committee of Nanjing University of Aeronautics and Astronautics'
        },
        ...
    ]
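  • Return data structure (each item in the returned list)
    {
        'profile_url': '',     # link to the page of the 10 most recent mass-sent posts
        'headimage': '',       # avatar
        'wechat_name': '',     # name
        'wechat_id': '',       # WeChat id
        'post_perm': int,      # number of mass-sent posts in the last month
        'view_perm': int,      # number of views in the last month
        'qrcode': '',          # QR code
        'introduction': '',    # introduction
        'authentication': ''   # authentication
    }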
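A search returns a list of candidate accounts, so a common next step is picking the one you actually want. The sketch below selects an entry by its `wechat_id`; `find_by_wechat_id` is a hypothetical helper rather than a wechatsogou function, and the sample list only mimics the shape of the search result shown above.

```python
from typing import Any, Dict, List, Optional


def find_by_wechat_id(
    results: List[Dict[str, Any]], wechat_id: str
) -> Optional[Dict[str, Any]]:
    """Return the first search result whose 'wechat_id' matches exactly, else None."""
    wanted = wechat_id.strip()
    for item in results:
        # Compare after stripping whitespace; the match is case-sensitive.
        if str(item.get("wechat_id", "")).strip() == wanted:
            return item
    return None


if __name__ == "__main__":
    results = [
        {"wechat_id": "NUAA_1952", "post_perm": 0},
        {"wechat_id": "nuaa_tw", "post_perm": 0},
    ]
    print(find_by_wechat_id(results, "nuaa_tw"))
```

Matching on `wechat_id` rather than `wechat_name` is a design choice in this sketch: several accounts in the example share very similar names, while their ids differ.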

Parse the recent article page – get_gzh_article_by_history

  • Usage
    In [1]: import wechatsogou
       ...:
       ...: ws_api = wechatsogou.WechatSogouAPI()
       ...: ws_api.get_gzh_article_by_history('NUAA Young Volunteers')
       ...:
    Out[1]:
    {
        'article': [
            {
                'abstract': "What we've done can't change anything immediately --\n"
                            'But the journey of a thousand miles will never stop.\n'
                            "We won't stop here, then we'll start again.\n"
                            'Minqin, goodbye.\n'
                            'Green qin, no more.',
                'author': '',
                'content_url': 'http://mp.weixin.qq.com/s?timestamp=1501143158&src=3&ver=1&signature=B-*tqUrFyO7OqpFeJZwTA7JJtsHpz6BgC8ugyfgpOnyWLtPb85R5Zmu0JuZRbZKG72x4bQjMCcsfA5mC3GSSOPbYd-9tzvTgmroGRmc4Tzk8090KCiEu6EjA0YMHeytWJWpxr51M2FUYQhTWJ01pTmNnXLVAG6Ex6AG52uvvmQA=',
                'copyright_stat': 100,
                'cover': 'http://mmbiz.qpic.cn/mmbiz_jpg/icFYWMxnmxHDYgXNjAle7szYLgQmicbaQlb1eVFuwp2vxEu5eNVwYacaHah2N5W8dKAm725vxv5aM6DFlM59Wftg/0?wx_fmt=jpeg',
                'datetime': 1501072594,
                'fileid': 502326199,
                'main': 1,
                'send_id': 1000000306,
                'source_url': '',
                'title': "green line frequently, don't say goodbye",
                'type': '49'
            },
            {
                'abstract': 'did not miscellaneous, love no past, volunteer not old, we do not come loose!',
                'author': '',
                'content_url': 'http://mp.weixin.qq.com/s?timestamp=1501143158&src=3&ver=1&signature=B-*tqUrFyO7OqpFeJZwTA7JJtsHpz6BgC8ugyfgpOnyWLtPb85R5Zmu0JuZRbZKG72x4bQjMCcsfA5mC3GSSOGUrM*jg*EP1jU-Dyf2CVqmPnOgBiET2wlitek4FcRbXorAswWHm*1rqODcN52NtfKD-OcRTazQS*t5SnJtu3ZA=',
                'copyright_stat': 100,
                'cover': 'http://mmbiz.qpic.cn/mmbiz_jpg/icFYWMxnmxHCoY44nPUXvkSgpZI1LaEsZfkZvtGaiaNW2icjibCp6qs93xLlr9kXMJEP3z1pmQ6TbRZNicHibGzRwh1w/0?wx_fmt=jpeg',
                'datetime': 1500979158,
                'fileid': 502326196,
                'main': 1,
                'send_id': 1000000305,
                'source_url': '',
                'title': 'stead fast | environmental protection service work summary from 2016 to 2017',
                'type': '49'
            },
            ...
        ],
        'gzh': {
            'authentication': 'Nanjing University of Aeronautics and Astronautics',
            'headimage': 'http://wx.qlogo.cn/mmhead/Q3auHgzwzM4xV5PgPjK5XoPaaQoxnWJAFicibMvPAnsoybawMBFxua1g/0',
            'introduction': 'The pacesetter of volunteer activities at NUAA, bringing you volunteer resources and exciting news from on and off campus.',
            'wechat_id': 'nanhangqinggong',
            'wechat_name': 'NUAA Young Volunteers'
        }
    }
  • Return data structure
    {
        'gzh': {
            'wechat_name': '',      # name
            'wechat_id': '',        # WeChat id
            'introduction': '',     # introduction
            'authentication': '',   # authentication
            'headimage': ''         # avatar
        },
        'article': [
            {
                'send_id': int,         # mass-send id
                'datetime': int,        # mass-send time, 10-digit timestamp
                'type': '',             # message type, always 49
                'main': int,            # whether this is the first message of a mass send, 1 or 0
                'title': '',            # article title
                'abstract': '',         # abstract
                'fileid': int,          #
                'content_url': '',      # article link
                'source_url': '',       # link behind "read the original"
                'cover': '',            # cover image
                'author': '',           # author
                'copyright_stat': int,  # article type, e.g. original
            },
            ...
        ]
    }
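The article list is easier to work with once the 10-digit timestamps are converted and the entries are grouped. The sketch below works on plain dictionaries shaped like the structure above. The helper names are hypothetical and not part of wechatsogou, and it assumes that articles sent together share one `send_id` and that `main == 1` marks the lead article of a send.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Any, Dict, List


def published_at(article: Dict[str, Any]) -> datetime:
    """Convert the 10-digit 'datetime' field to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(article["datetime"], tz=timezone.utc)


def group_by_send_id(articles: List[Dict[str, Any]]) -> Dict[int, List[Dict[str, Any]]]:
    """Group articles by 'send_id' (assumed to be shared within one mass send)."""
    groups: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for article in articles:
        groups[article["send_id"]].append(article)
    return dict(groups)


def lead_articles(articles: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the articles flagged with main == 1."""
    return [a for a in articles if a.get("main") == 1]


if __name__ == "__main__":
    # Sample values copied from the example output above.
    articles = [
        {"send_id": 1000000306, "datetime": 1501072594, "main": 1},
        {"send_id": 1000000305, "datetime": 1500979158, "main": 1},
    ]
    for send_id, items in group_by_send_id(articles).items():
        print(send_id, [published_at(a).isoformat() for a in items])
```

Using an explicit UTC timezone keeps the conversion independent of the machine's local settings; convert to local time afterwards if that is what you need.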

That's it for the introduction. The project offers many more features, so if you're interested, give it a try.

Open source repository: github.com/chyroc/Wech…

Did you like today's recommendation? If so, please leave a comment or a like at the bottom of the article to show your support. Your comments, likes, shares and follows are what keep me updating!
