Today's open source project is a WeChat official account crawler. If you are interested, take a look at the project; it may give you some new ideas.

Project introduction

A crawler interface for WeChat official accounts built on Sogou WeChat search. It can be extended into a more general crawler based on Sogou search.

Installation

pip install wechatsogou --upgrade

    import wechatsogou

    # Configurable parameters

    # Direct connection
    ws_api = wechatsogou.WechatSogouAPI()

    # Number of retries after a wrong captcha input; the default is 1
    ws_api = wechatsogou.WechatSogouAPI(captcha_break_time=3)

    # All parameters of the requests library can be passed here.
    # For example, to configure proxies: the proxy list must include at least
    # one HTTPS proxy, and the proxies must actually be available
    ws_api = wechatsogou.WechatSogouAPI(proxies={
        "http": "",
        "https": "",
    })

    # For example, to set a timeout
    ws_api = wechatsogou.WechatSogouAPI(timeout=0.1)
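The constructor options above can be collected in one place before the client is created. The following is only a sketch: `build_api_kwargs` is a hypothetical helper (it is not part of wechatsogou), the proxy address is a placeholder, and the only rule it enforces is the one stated above, namely that the proxy mapping contains an HTTPS entry.

```python
def build_api_kwargs(captcha_break_time=1, proxies=None, timeout=None):
    """Collect keyword arguments for wechatsogou.WechatSogouAPI.

    Hypothetical helper for illustration; not part of the library.
    """
    kwargs = {"captcha_break_time": captcha_break_time}
    if proxies is not None:
        # The proxy mapping must include at least one HTTPS proxy.
        if not proxies.get("https"):
            raise ValueError("proxies must include a non-empty 'https' entry")
        kwargs["proxies"] = proxies
    if timeout is not None:
        kwargs["timeout"] = timeout
    return kwargs


# Placeholder proxy address, for illustration only.
kwargs = build_api_kwargs(
    captcha_break_time=3,
    proxies={"http": "127.0.0.1:8888", "https": "127.0.0.1:8888"},
    timeout=5,
)
print(kwargs)

# With the library installed, the client would then be created as:
#   import wechatsogou
#   ws_api = wechatsogou.WechatSogouAPI(**kwargs)
```

Validating the proxy mapping up front means a misconfigured proxy fails immediately instead of surfacing later as a failed request.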

Get information about a specific official account – get_gzh_info

    In [5]: import wechatsogou
       ...:
       ...: ws_api = wechatsogou.WechatSogouAPI()
       ...: ws_api.get_gzh_info('NUAA Youth Volunteers')
       ...:
    Out[5]:
    {
        'authentication': 'Nanjing University of Aeronautics and Astronautics',
        'headimage': '',
        'introduction': 'A leader in volunteer activities at NUAA, bringing you volunteer resources and news from on and off campus.',
        'post_perm': 26,
        'view_perm': 1000,
        'profile_url': '…NyHamTvK2jtzl7mf-VdpE246zXAq18GNm*S*bq4klw==',
        'qrcode': '…XTtM*ASdavl0xuavw-bmAEQXOa1T39*EIsjzxz30LjyBNkjmgbT6bGnZM=',
        'wechat_id': 'nanhangqinggong',
        'wechat_name': 'NUAA Youth Volunteers'
    }
  • Return data structure
    {
        'profile_url': '',     # link to the page of the 10 most recent broadcast posts
        'headimage': '',       # avatar
        'wechat_name': '',     # name
        'wechat_id': '',       # WeChat ID
        'post_perm': int,      # number of posts in the last month
        'view_perm': int,      # number of views in the last month
        'qrcode': '',          # QR code
        'introduction': '',    # introduction
        'authentication': ''   # verification
    }
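Because the result is a plain dict, it can be checked and summarized without any extra tooling. The sketch below is an illustration only: `missing_keys` and `summarize` are hypothetical helpers, the key set is taken from the example output above, and the sample dict is a trimmed stand-in for a real response rather than live data.

```python
# Keys seen in the example output above.
EXPECTED_KEYS = {
    "profile_url", "headimage", "wechat_name", "wechat_id", "post_perm",
    "view_perm", "qrcode", "introduction", "authentication",
}


def missing_keys(info):
    """Return the expected keys that are absent from an account dict."""
    return sorted(EXPECTED_KEYS - set(info))


def summarize(info):
    """Build a one-line summary, tolerating absent fields."""
    return "{} ({}): post_perm={}, view_perm={}".format(
        info.get("wechat_name", "?"),
        info.get("wechat_id", "?"),
        info.get("post_perm", "?"),
        info.get("view_perm", "?"),
    )


# Trimmed sample modelled on the example output; not live data.
sample = {
    "wechat_name": "NUAA Youth Volunteers",
    "wechat_id": "nanhangqinggong",
    "authentication": "Nanjing University of Aeronautics and Astronautics",
    "post_perm": 26,
    "view_perm": 1000,
}

print(summarize(sample))
print(missing_keys(sample))
```

Checking for missing keys before using the dict is a cheap guard in case the upstream page changes and a field stops being returned.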

Search for official accounts – search_gzh

    In [6]: import wechatsogou
       ...:
       ...: ws_api = wechatsogou.WechatSogouAPI()
       ...: ws_api.search_gzh('Nanjing University of Aeronautics and Astronautics')
       ...:
    Out[6]:
    [
        {
            'authentication': '…',
            'headimage': '',
            'introduction': 'Official WeChat account of Nanjing University of Aeronautics and Astronautics',
            'post_perm': 0,
            'view_perm': 0,
            'profile_url': '…*qGVi5uE8QyQU034 di*2mS6vGJVnQBRB0It9t9M-Qn7ynvjRKZNQrjBMEg==',
            'qrcode': '…iMlJttJgGu0hwtZMZCCntdfaP5jD4JXipTwoGecAze8ycEF5KYZqtLSsNE=',
            'wechat_id': 'NUAA_1952',
            'wechat_name': '…'
        },
        {
            'authentication': 'Nanjing University of Aeronautics and Astronautics',
            'headimage': '',
            'introduction': '…',
            'post_perm': 0,
            'view_perm': 0,
            'profile_url': '…1bOIT5Nrr8Pcgs6bQ-oEd6jdQ0aK5WCQjNwMAhJnyQ==',
            'qrcode': '…*CGI-PTR0y6stH PtdSDqzAzvPMWz67Xz9IMF2TDfu4Cndj5bKxlsFh6wGhiLH0b9ZKqgCW5k=',
            'wechat_id': 'nuaa_tw',
            'wechat_name': 'Youth League Committee of Nanjing University of Aeronautics and Astronautics'
        },
        ...
    ]
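  • Return data structure (a list of dicts, each with the fields below)
    {
        'profile_url': '',     # link to the page of the 10 most recent broadcast posts
        'headimage': '',       # avatar
        'wechat_name': '',     # name
        'wechat_id': '',       # WeChat ID
        'post_perm': int,      # number of posts in the last month
        'view_perm': int,      # number of views in the last month
        'qrcode': '',          # QR code
        'introduction': '',    # introduction
        'authentication': ''   # verification
    }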
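A search can return several accounts with similar names, so a common next step is to pick one entry by its WeChat ID. The sketch below assumes only that each result is a dict with a `wechat_id` key, as in the example output; `find_by_wechat_id` is a hypothetical helper, and the sample list is a trimmed stand-in for real search results.

```python
def find_by_wechat_id(results, wechat_id):
    """Return the first result whose wechat_id matches (case-insensitive), else None."""
    wanted = wechat_id.lower()
    for item in results:
        if item.get("wechat_id", "").lower() == wanted:
            return item
    return None


# Trimmed sample modelled on the example output; not live data.
results = [
    {
        "wechat_id": "NUAA_1952",
        "introduction": "Official WeChat account of Nanjing University of Aeronautics and Astronautics",
    },
    {
        "wechat_id": "nuaa_tw",
        "wechat_name": "Youth League Committee of Nanjing University of Aeronautics and Astronautics",
    },
]

match = find_by_wechat_id(results, "nuaa_tw")
print(match["wechat_name"] if match else "not found")
```

Matching on `wechat_id` rather than `wechat_name` avoids ambiguity when several accounts share a similar display name.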

Parse the recent articles page – get_gzh_article_by_history

  • Usage
    In [1]: import wechatsogou
       ...:
       ...: ws_api = wechatsogou.WechatSogouAPI()
       ...: ws_api.get_gzh_article_by_history('NUAA Youth Volunteers')
       ...:
    Out[1]:
    {
        'article': [
            {
                'abstract': "What we've done can't change anything immediately --\nBut a journey of a thousand miles never stops.\nWe won't stop here; we will set out again.\nMinqin, goodbye.\nGreen Qin, no more.",
                'author': '',
                'content_url': '…*tqUrFyO7OqpFeJZwTA7JJtsHpz6BgC8ugyfgpOnyWLtPb85 R5Zmu0JuZRbZKG72x4bQjMCcsfA5mC3GSSOPbYd-9tzvTgmroGRmc4Tzk8090KCiEu6EjA0YMHeytWJWpxr51M2FUYQhTWJ01pTmNnXLVAG6Ex6AG52uvvmQ A=',
                'copyright_stat': 100,
                'cover': '…/0?wx_fmt=jpeg',
                'datetime': 1501072594,
                'fileid': 502326199,
                'main': 1,
                'send_id': 1000000306,
                'source_url': '',
                'title': "Green Qin Trip: Don't Say Goodbye",
                'type': '49'
            },
            {
                'abstract': 'Volunteering never grows old, and we will not drift apart!',
                'author': '',
                'content_url': '…*tqUrFyO7OqpFeJZwTA7JJtsHpz6BgC8ugyfgpOnyWLtPb85 R5Zmu0JuZRbZKG72x4bQjMCcsfA5mC3GSSOGUrM*jg*EP1jU-Dyf2CVqmPnOgBiET2wlitek4FcRbXorAswWHm*1rqODcN52NtfKD-OcRTazQS*t5SnJtu3Z A=',
                'copyright_stat': 100,
                'cover': '…wh1w/0?wx_fmt=jpeg',
                'datetime': 1500979158,
                'fileid': 502326196,
                'main': 1,
                'send_id': 1000000305,
                'source_url': '',
                'title': 'Steadfast | Summary of environmental protection service work, 2016 to 2017',
                'type': '49'
            },
            ...
        ],
        'gzh': {
            'authentication': 'Nanjing University of Aeronautics and Astronautics',
            'headimage': '',
            'introduction': 'A leader in volunteer activities at NUAA, bringing you volunteer resources and news from on and off campus.',
            'wechat_id': 'nanhangqinggong',
            'wechat_name': 'NUAA Youth Volunteers'
        }
    }
  • Return data structure
    {
        'gzh': {
            'wechat_name': '',      # name
            'wechat_id': '',        # WeChat ID
            'introduction': '',     # introduction
            'authentication': '',   # verification
            'headimage': ''         # avatar
        },
        'article': [
            {
                'send_id': int,         # broadcast (send) ID
                'datetime': int,        # send time, 10-digit Unix timestamp
                'type': '',             # message type, always '49'
                'main': int,            # 1 or 0
                'title': '',            # article title
                'abstract': '',         # abstract
                'fileid': int,          #
                'content_url': '',      # link to the article
                'source_url': '',       # link behind "read the original"
                'cover': '',            # cover image
                'author': '',           # author
                'copyright_stat': int,  # article type, e.g. original
            },
            ...
        ]
    }
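The `datetime` field is a 10-digit Unix timestamp, and several articles may carry the same `send_id`, so two small post-processing steps are often useful: converting the timestamp and grouping articles by `send_id`. The sketch below is an illustration under those assumptions; `to_utc` and `group_by_send_id` are hypothetical helpers, and the sample data is a trimmed stand-in for the example output above.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_utc(timestamp):
    """Convert a 10-digit Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def group_by_send_id(articles):
    """Group article dicts by their send_id, preserving input order within a group."""
    groups = defaultdict(list)
    for article in articles:
        groups[article["send_id"]].append(article)
    return dict(groups)


# Trimmed sample modelled on the example output; not live data.
history = {
    "article": [
        {"send_id": 1000000306, "datetime": 1501072594, "title": "Green Qin Trip: Don't Say Goodbye"},
        {"send_id": 1000000305, "datetime": 1500979158, "title": "Steadfast | Summary of environmental protection service work, 2016 to 2017"},
    ],
}

for send_id, items in group_by_send_id(history["article"]).items():
    for item in items:
        print(send_id, to_utc(item["datetime"]).isoformat(), item["title"])
```

Using a timezone-aware datetime keeps the result independent of the machine's local time zone; convert to local time only at display time if needed.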

That concludes the introduction. The project has many more features; if you are interested, give it a try.

Open source: …

Did you like today's recommendation? If so, please leave a comment or a like at the end of the article to show your support. Your comments, likes, shares, and follows are what keep me updating!
