This is the 11th day of my participation in the August More Text Challenge

Hello, I’m Brother Chen

Today, I will teach you how to obtain and collect (some degree encyclopedia) celebrity relationship data, and visualize the atlas.

Highlights (Difficult points) :

1. Dynamic query (enter any star name to query the celebrity relationship).

2. Atlas display (and key-value form)

Specific introduction will not talk about, first on the effect:

Demo video

Visualization | teach you practical teaching with 30 lines of python code query star

You can input the corresponding star name in the link to get the corresponding figure relationship atlas (also support drag), such as: Li Yifeng

1. Collect data

Search inside Baidu: Li Yifeng

You can see the star relationship in the star column

Let’s start locating the page TAB

You can see that the data is the LI label of ul under the id slider_relations

relations = selector.xpath('//*[@id="slider_relations"]/ul/li')
Copy the code

After obtaining the li label, you need to parse key-value, key mapping (partner, friend, etc.), and value corresponding to the star name


for i in relations:
    re = i.xpath('.//div[@class="name"]/text()') [0]
    name = i.xpath('.//div[@class="name"]/em/text()') [0]
Copy the code

2. Web page creation

In order to combine the atlas and dynamically query the relationship between any stars, it is written in the form of ** website (webpage) **

By the Flask framework to write the background, HTML as the front end, because the front-end code is more here will not show (will provide source code later).

First of all, the code to collect the relationship between stars is encapsulated as a function.


### Get information
def getlist(name_i) :
    url_name = "https://baike.baidu.com/search/word?word="+str(name_i)
    s = requests.Session()
    response = s.get(url_name, headers=headers)
    text = response.text
    # here is the parsing code
    
    links = []
    for i in relations:
        re = i.xpath('.//div[@class="name"]/text()') [0]
        name = i.xpath('.//div[@class="name"]/em/text()') [0]
        print(re + "-" + name)
        dict = {'source': str(name_i), 'target': str(name), 'rela': str(re), 'type': 'resolved'}
        links.append(dict)
    return links

Copy the code

Where name_i is the name of the search star, the wrapped function name is getList, and the data returned by the function is links

Flask’s route (getData in the browser)


# Fetch data
@app.route('/getdata')
def getdata() :
    name_i = request.args.get('name')
    # Collect data
    links = getlist(name_i)
    print(links)
    #return Response(json.dumps(links), mimetype='application/json')
    return render_template('index.html', linkss=json.dumps(links))

Copy the code

3. Start

If __name__ = = "__main__" : "" "initialization "is" "app. The run (host =" '+ IP, port = 80, implementing = True)Copy the code

The port here is 80, and IP is the default native IP (when you run code access, just enter your own native IP)

If the preceding page is displayed after the py code is run, the system is successfully started

Then access it in the browser

http://127.0.0.1/getdata?name= Star nameCopy the code

The name of the star here is any star, such as Li Yifeng

Li yi feng http://127.0.0.1/getdata?name=Copy the code

http://127.0.0.1/getdata?name= Jackie chanCopy the code

4. Summary

In this paper, the data of seven national censuses are obtained and visualized.