This is the 28th day of my participation in Gwen Challenge

A while ago, some readers asked me to scrape the hot search (trending) topics on the Internet. Following their suggestion, I scraped the hot search topics of different platforms and built a real-time, web-wide hot search "running horse lantern" (marquee) visualization.

Features: Real-time, visual browsing

The hot search data here comes mainly from Weibo and Zhihu. The reasons for choosing these two platforms are as follows: 1. 2. A direct hot search data API is available.

Take a look at the results:

Dynamic chart:

1. Get data

1. Collect Weibo data

The Weibo hot search page is at the following address:

https://s.weibo.com/top/summary/


Page analysis

Take a look at the page source

The data list sits in the div with id pl_top_realtimehot. Inside it, the tbody holds one tr per hot search entry, and each tr contains an a tag that carries the hot search title and the corresponding link.

import requests
from bs4 import BeautifulSoup as bs

url = "https://s.weibo.com/top/summary/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:85.0) Gecko/20100101 Firefox/85.0",
    "Host": "s.weibo.com"
}
r = requests.get(url, headers=headers)
soup = bs(r.text, "lxml")
# Locate the hot search table body, then collect its rows
div = soup.find("div", {"id": "pl_top_realtimehot"}).find("tbody")
tr_tags = div.find_all("tr")


This requests the page and extracts the hot search rows from its source (the BeautifulSoup library is used here to parse the HTML).
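
To try the div, tbody, tr, a traversal without touching the network, it can be sketched on a small hand-written sample. The markup, titles, and links below are made up for illustration, and the sketch uses the standard library's html.parser rather than BeautifulSoup so that it runs with no extra packages; the real page may differ in detail.

```python
from html.parser import HTMLParser

# Hand-written sample mimicking the structure described above (made-up data).
SAMPLE_HTML = """
<div id="pl_top_realtimehot">
  <table>
    <tbody>
      <tr><td><a href="/weibo?q=topic-one">Topic one</a></td></tr>
      <tr><td><a href="/weibo?q=topic-two">Topic two</a></td></tr>
    </tbody>
  </table>
</div>
<div><a href="/elsewhere">Not a hot search entry</a></div>
"""


class HotListParser(HTMLParser):
    """Collect the first <a> of every <tr> inside div#pl_top_realtimehot."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0      # > 0 while inside the target div
        self.in_row = False     # inside a <tr> of the target div
        self.row_done = False   # first <a> of the current row already taken
        self.capture = False    # inside the <a> whose text is being recorded
        self.titles = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Track nesting so we know when the target div ends
            if self.div_depth or attrs.get("id") == "pl_top_realtimehot":
                self.div_depth += 1
        elif tag == "tr" and self.div_depth:
            self.in_row, self.row_done = True, False
        elif tag == "a" and self.in_row and not self.row_done:
            self.row_done = self.capture = True
            self.links.append(attrs.get("href", ""))
            self.titles.append("")

    def handle_data(self, data):
        if self.capture:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self.capture = False
        elif tag == "tr":
            self.in_row = False
        elif tag == "div" and self.div_depth:
            self.div_depth -= 1


def parse_hot_list(html):
    """Return (titles, links) found in the hot search table of html."""
    parser = HotListParser()
    parser.feed(html)
    return parser.titles, parser.links


titles, links = parse_hot_list(SAMPLE_HTML)
print(titles)
print(links)
```

The link outside the target div is ignored, which is the same effect the id lookup has in the BeautifulSoup version.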

The complete code

### Scrape Weibo hot search data
import requests
from bs4 import BeautifulSoup as bs

def get_weibo():
    url = "https://s.weibo.com/top/summary/"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:85.0) Gecko/20100101 Firefox/85.0",
        "Host": "s.weibo.com"
    }
    r = requests.get(url, headers=headers)
    soup = bs(r.text, "lxml")
    div = soup.find("div", {"id": "pl_top_realtimehot"}).find("tbody")
    tr_tags = div.find_all("tr")
    # Lists that hold the collected data
    hot_text = []
    hot_link = []
    for tr in tr_tags:
        a = tr.find("a")
        if a is None:
            # Skip rows without a link
            continue
        hot_text.append(a.text)
        # Build the full link from the relative href
        hot_link.append("https://s.weibo.com" + a.get("href"))
    return hot_text, hot_link


The code for scraping the Weibo hot search data is wrapped in the function get_weibo so that the visualization code can call it easily. hot_text holds the hot search titles and hot_link holds the corresponding links.
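
One detail worth a second look is how the relative href becomes a full link. get_weibo uses string concatenation; a slightly more defensive option, sketched here as a suggestion rather than the author's method, is urllib.parse.urljoin from the standard library, which also copes with an href that is already absolute. The helper name to_absolute and the sample hrefs are made up for illustration.

```python
from urllib.parse import urljoin

BASE = "https://s.weibo.com"


def to_absolute(href):
    """Resolve a possibly relative href against the Weibo search host."""
    return urljoin(BASE, href)


print(to_absolute("/weibo?q=example"))
print(to_absolute("https://example.com/page"))
```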

2. Collect Zhihu data

The Zhihu hot search API endpoint is as follows:

https://api.zhihu.com/topstory/hot-list?limit=10&reverse_order=0


Extract the data

The endpoint returns JSON directly, so no page analysis is needed. You only need to know which keys in the JSON hold the hot search title and the corresponding link.

The entries are in data, and the title and link of each entry are under target. The hot search title is in title, and its link is in url.
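
The key layout can be tried offline on a hand-written payload. This is a minimal sketch: the sample JSON below contains only the fields named above and its values are made up, and extract_hot_items is a hypothetical helper that skips incomplete entries instead of raising KeyError.

```python
import json

# Hypothetical payload trimmed to the fields described above.
SAMPLE_JSON = """
{
  "data": [
    {"target": {"title": "First question", "url": "https://api.zhihu.com/questions/1"}},
    {"target": {"title": "Second question", "url": "https://api.zhihu.com/questions/2"}},
    {"target": {"title": "Entry without a link"}}
  ]
}
"""


def extract_hot_items(payload):
    """Return (titles, links) from a hot-list style payload."""
    titles, links = [], []
    for item in payload.get("data", []):
        target = item.get("target", {})
        # Keep an entry only when both fields are present
        if "title" in target and "url" in target:
            titles.append(target["title"])
            links.append(target["url"])
    return titles, links


titles, links = extract_hot_items(json.loads(SAMPLE_JSON))
print(titles)
print(links)
```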

### Scrape Zhihu hot search data
def get_zhihu():
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0'}
    url = "https://api.zhihu.com/topstory/hot-list?limit=10&reverse_order=0"
    text = requests.get(url, headers=headers).json()
    # Lists that hold the collected data
    hot_text = []
    hot_link = []
    for i in text['data']:
        hot_text.append(i['target']['title'])
        hot_link.append(i['target']['url'])
    return hot_text, hot_link


Similarly, the code for scraping the Zhihu hot search data is wrapped in the function get_zhihu so that the visualization code can call it easily. hot_text holds the hot search titles and hot_link holds the corresponding links.
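
Because both scrapers return the same (hot_text, hot_link) pair, a caller can treat them uniformly. The sketch below shows one possible way to call several such functions and keep going when one platform fails. collect_all and the stand-in fetchers are hypothetical names, not part of the original code.

```python
def collect_all(fetchers):
    """Call each fetcher and concatenate the (titles, links) pairs.

    A fetcher that raises is skipped so that one failing platform
    does not take the others down with it.
    """
    titles, links = [], []
    for fetch in fetchers:
        try:
            t, u = fetch()
        except Exception as exc:  # broad on purpose: network, parsing, JSON errors
            print(f"skipping {fetch.__name__}: {exc}")
            continue
        titles.extend(t)
        links.extend(u)
    return titles, links


# Stand-ins for get_weibo / get_zhihu so the sketch runs offline.
def fake_weibo():
    return ["weibo topic"], ["https://s.weibo.com/weibo?q=example"]


def fake_broken():
    raise RuntimeError("request failed")


result = collect_all([fake_weibo, fake_broken])
print(result)
```

In the real application the list would be [get_weibo, get_zhihu].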

2. Flask backend

To serve both the data collection and the visualization page from one place, the site is built with the Flask framework.

Page route

from flask import Flask, Response, render_template

app = Flask(__name__)

# Serve the visualization page
@app.route('/')
def index():
    return render_template('view.html')


Create an API route that fetches the data and returns it as JSON

import json

### Return the hot search data of Weibo and Zhihu
@app.route('/getdata')
def alldata():
    wb_t, wb_u = get_weibo()
    zh_t, zh_u = get_zhihu()
    # Weibo entries first, then Zhihu entries
    res = {
        'title': wb_t + zh_t,
        'url': wb_u + zh_u,
    }
    return Response(json.dumps(res), mimetype='application/json')

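
The response keeps titles and links in two parallel lists, so a consumer pairs them up by index. Here is a small sketch of building that payload and pairing it back up. The sample values are made up, build_payload is a hypothetical helper, and ensure_ascii=False is an optional choice shown here so that non-ASCII titles stay readable in the JSON text.

```python
import json


def build_payload(wb, zh):
    """Merge two (titles, links) pairs into the {'title': [...], 'url': [...]} shape."""
    wb_t, wb_u = wb
    zh_t, zh_u = zh
    return {"title": wb_t + zh_t, "url": wb_u + zh_u}


payload = build_payload(
    (["微博话题"], ["https://s.weibo.com/weibo?q=example"]),
    (["Zhihu question"], ["https://api.zhihu.com/questions/1"]),
)
body = json.dumps(payload, ensure_ascii=False)

# A consumer can zip the parallel lists back into (title, url) pairs.
decoded = json.loads(body)
pairs = list(zip(decoded["title"], decoded["url"]))
print(pairs)
```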

So that you can run it as is, without changing the IP, the app uses the default local address (once you have the source code, you can just run it). The port is 80.

if __name__ == "__main__":
    # Start the development server with debug mode enabled
    app.run(host='127.0.0.1', port=80, debug=True)

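
Binding to port 80 can require elevated privileges on some systems (Linux, for instance), so it may help to make the port configurable. This is a minimal sketch, assuming a hypothetical HOT_PORT environment variable that is not part of the original project.

```python
import os


def resolve_port(env=None, default=80):
    """Read the port from HOT_PORT (hypothetical variable), falling back to default."""
    env = os.environ if env is None else env
    raw = env.get("HOT_PORT", "")
    try:
        port = int(raw)
    except ValueError:
        # Missing or non-numeric value
        return default
    # Reject values outside the valid TCP port range
    return port if 0 < port < 65536 else default


print(resolve_port({"HOT_PORT": "5000"}))
print(resolve_port({}))
```

The result could then be passed to app.run(host='127.0.0.1', port=resolve_port(), debug=True). If the port changes, the URL requested by the page has to use the same port.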

3. Marquee ("running horse lantern") display

An HTML page produces the scrolling marquee effect. The core code is as follows:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <script src="../static/js/jquery-2.1.4.min.js"></script>
    <title>Real-time hot search topic - Li Yunchen (public id: Python researcher)</title>
    <style>
    a {
        text-decoration: none;
    }
    .f1 {
        color: red;
    }
    </style>
</head>
<body>
<div id="textdata">

</div>
<!-- Obtain hot search data of Weibo and Zhihu -->
<script type="text/javascript">
    function getdata() {
        $.ajax({
            type: 'GET',
            url: "http://127.0.0.1/getdata",
            dataType: 'json',
            success: function (data) {
                // update #textdata with data.title and data.url here
            }
        });
    }
    // Poll every 15000 ms (1000 ms = 1 second)
    setInterval(getdata, 15000);
</script>
</body>
</html>


The page requests fresh data every 15 seconds, which gives the near real-time effect.
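
Since every poll from the page triggers a fresh scrape of both sites, one optional refinement is to cache the result on the server for a short time. The sketch below is an assumption about how that could look, not something from the original project. TTLCache and the fake fetcher are hypothetical names, and the clock is injectable so the behavior can be checked without waiting.

```python
import time


class TTLCache:
    """Cache the result of a zero-argument function for ttl seconds."""

    def __init__(self, func, ttl, clock=time.monotonic):
        self.func = func
        self.ttl = ttl
        self.clock = clock
        self._value = None
        self._expires = None  # None means nothing is cached yet

    def get(self):
        now = self.clock()
        if self._expires is None or now >= self._expires:
            # Cache is empty or stale: call the wrapped function again
            self._value = self.func()
            self._expires = now + self.ttl
        return self._value


calls = []


def fake_fetch():
    """Stand-in for a scraper; records how often it is called."""
    calls.append(1)
    return ["topic"], ["https://example.com"]


fake_time = [0.0]
cache = TTLCache(fake_fetch, ttl=15, clock=lambda: fake_time[0])
cache.get()          # first call fetches
cache.get()          # served from the cache
fake_time[0] = 16.0
cache.get()          # expired, fetches again
print(len(calls))    # prints 2
```

In the Flask route, such a cache could wrap the combined scrape so that several browser tabs polling at once share one upstream request.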

4. Start

Run main.py directly

Then open this address in the browser:

http://127.0.0.1


Then wait a few seconds for the "running horse lantern" visualization to appear

GIF version:

5. Summary

To make it easier to learn from, I (Brother Chen) have uploaded the complete source code for this article. To get it, reply "hot search run lantern" in the official account of the same name.

This article was written at readers' request: I scraped hot search topics and ended up with a real-time hot search "running horse lantern" visualization.

Features: Real-time, visual browsing