This is the 11th day of my participation in the August More Text Challenge
Hello, I’m Brother Chen
Today I will show you how to collect celebrity relationship data from Baidu Baike and visualize it as a relationship graph.
Highlights (difficult points):
1. Dynamic query: enter any star's name to look up their relationships.
2. Graph display (plus the relationships as key-value pairs).
I'll skip the detailed introduction and show the result first:
Demo video: "Visualization | Hands-on: querying star relationships with 30 lines of Python code"
Enter a star's name in the link to get that person's relationship graph (the graph also supports dragging), for example: Li Yifeng.
1. Collect data
Search Baidu Baike for Li Yifeng.
The star's relationships are shown in the star relationship section of the page.
Next, locate the corresponding tags in the page source.
The data is in the li tags of the ul under the element with id slider_relations:
# selector is the parsed page, e.g. selector = etree.HTML(text) with lxml
relations = selector.xpath('//*[@id="slider_relations"]/ul/li')
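To check what this XPath selects without requesting the live page, here is a minimal sketch on a hand-written HTML fragment. The fragment's structure is an assumption inferred from the XPath expressions in this article (the real page may differ), and the sketch swaps lxml for the standard library's xml.etree.ElementTree so it runs without extra installs. ElementTree supports only a subset of XPath and needs well-formed markup, so for the real page lxml, as used here, remains the better fit.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mimicking the assumed structure of the relation slider
SAMPLE_HTML = """
<div>
  <div id="slider_relations">
    <ul>
      <li><a><div class="name">partner<em>Star A</em></div></a></li>
      <li><a><div class="name">friend<em>Star B</em></div></a></li>
    </ul>
  </div>
</div>
"""

def find_relation_items(html):
    """Return the li elements of the ul under the element whose id is slider_relations."""
    root = ET.fromstring(html.strip())
    # ElementTree's equivalent of //*[@id="slider_relations"]/ul/li
    return root.findall(".//*[@id='slider_relations']/ul/li")

items = find_relation_items(SAMPLE_HTML)
print(len(items))  # 2
```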
After getting the li tags, parse each one into a key-value pair: the key is the relationship (partner, friend, etc.) and the value is the related star's name.
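for i in relations:
    # Text directly inside div.name is the relationship (partner, friend, ...)
    rela = i.xpath('.//div[@class="name"]/text()')[0]
    # Text of the em inside div.name is the related star's name
    name = i.xpath('.//div[@class="name"]/em/text()')[0]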
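The two lookups above can be sketched with ElementTree on a hypothetical fragment (structure assumed, as before): the text directly inside div.name, before the em, plays the role of text()[0], and the em text plays the role of em/text()[0]. Grouping the pairs by relationship gives the key-value view mentioned in the highlights; parse_relations and group_by_relation are illustrative names, not part of the original project.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment; here the slider div itself is the root element
SAMPLE_HTML = """<div id="slider_relations"><ul>
<li><div class="name">partner<em>Star A</em></div></li>
<li><div class="name">friend<em>Star B</em></div></li>
<li><div class="name">friend<em>Star C</em></div></li>
</ul></div>"""

def parse_relations(html):
    """Return (relationship, name) pairs from each li."""
    root = ET.fromstring(html)
    pairs = []
    for li in root.findall("./ul/li"):
        div = li.find(".//div[@class='name']")
        rela = (div.text or "").strip()              # text before <em>
        name = (div.find("em").text or "").strip()   # text inside <em>
        pairs.append((rela, name))
    return pairs

def group_by_relation(pairs):
    """Map each relationship to the list of names that have it."""
    grouped = defaultdict(list)
    for rela, name in pairs:
        grouped[rela].append(name)
    return dict(grouped)

pairs = parse_relations(SAMPLE_HTML)
print(pairs)  # [('partner', 'Star A'), ('friend', 'Star B'), ('friend', 'Star C')]
print(group_by_relation(pairs))  # {'partner': ['Star A'], 'friend': ['Star B', 'Star C']}
```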
2. Creating the web page
To combine the graph display with dynamic queries for any star, the project is written as a **website (web page)**.
The backend is written with the Flask framework and the front end in HTML. The front-end code is long, so it is not shown here (the source code will be provided later).
First, wrap the scraping code in a function:
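import requests
from lxml import etree

# headers is not shown in the article; a browser-like User-Agent is assumed here
headers = {"User-Agent": "Mozilla/5.0"}

### Get information
def getlist(name_i):
    url_name = "https://baike.baidu.com/search/word?word=" + str(name_i)
    s = requests.Session()
    response = s.get(url_name, headers=headers)
    text = response.text
    # Parse the page and select the relationship entries
    selector = etree.HTML(text)
    relations = selector.xpath('//*[@id="slider_relations"]/ul/li')
    links = []
    for i in relations:
        rela = i.xpath('.//div[@class="name"]/text()')[0]      # relationship
        name = i.xpath('.//div[@class="name"]/em/text()')[0]   # related star
        print(rela + "-" + name)
        # One link per relationship: searched star -> related star
        link = {'source': str(name_i), 'target': str(name), 'rela': str(rela), 'type': 'resolved'}
        links.append(link)
    return links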
Here name_i is the name of the star to search for, the function is named getlist, and it returns links, a list of dictionaries with one entry per relationship.
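One way to make this function easier to test is to split the pure parts away from the network request. The sketch below is a hypothetical refactoring, not the article's code: build_search_url percent-encodes the name (requests already encodes non-ASCII URLs, so this only makes the encoding explicit), and build_links produces the same dictionary layout that getlist returns from (relationship, name) pairs.

```python
from urllib.parse import quote

BASE = "https://baike.baidu.com/search/word?word="

def build_search_url(name_i):
    # Percent-encode the name so spaces and Chinese characters are URL-safe
    return BASE + quote(str(name_i))

def build_links(name_i, pairs):
    # Same layout as getlist: one dict per (relationship, name) pair
    return [
        {'source': str(name_i), 'target': str(name), 'rela': str(rela), 'type': 'resolved'}
        for rela, name in pairs
    ]

print(build_search_url("Li Yifeng"))
# https://baike.baidu.com/search/word?word=Li%20Yifeng
print(build_links("Li Yifeng", [("partner", "Star A")]))
# [{'source': 'Li Yifeng', 'target': 'Star A', 'rela': 'partner', 'type': 'resolved'}]
```

With this split, getlist only has to fetch the page, extract the pairs, and call build_links.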
Next, the Flask route (accessed as /getdata in the browser):
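import json
from flask import Flask, request, render_template

app = Flask(__name__)

# Fetch data
@app.route('/getdata')
def getdata():
    name_i = request.args.get('name')
    # Collect data with getlist (defined above)
    links = getlist(name_i)
    print(links)
    # return Response(json.dumps(links), mimetype='application/json')
    # Render templates/index.html, passing the links as a JSON string
    return render_template('index.html', linkss=json.dumps(links))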
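The front-end code is not shown, so it is unknown whether index.html builds its node list itself. If a force-directed graph library needs explicit nodes, they can be derived from links on the server. The sketch below assumes that; nodes_from_links and the {"nodes", "links"} payload shape are hypothetical.

```python
import json

def nodes_from_links(links):
    # Collect each distinct source/target name, keeping first-seen order
    seen = {}
    for link in links:
        for key in ("source", "target"):
            seen.setdefault(link[key], {"name": link[key]})
    return list(seen.values())

links = [
    {'source': 'Li Yifeng', 'target': 'Star A', 'rela': 'partner', 'type': 'resolved'},
    {'source': 'Li Yifeng', 'target': 'Star B', 'rela': 'friend', 'type': 'resolved'},
]
nodes = nodes_from_links(links)
print(nodes)  # [{'name': 'Li Yifeng'}, {'name': 'Star A'}, {'name': 'Star B'}]

# ensure_ascii=False keeps Chinese names readable instead of \uXXXX escapes
payload = json.dumps({"nodes": nodes, "links": links}, ensure_ascii=False)
```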
3. Starting the app
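if __name__ == "__main__":
    # Initialization
    IP = '127.0.0.1'  # replace with your machine's own IP address if needed
    # threaded=True (assumed) lets the development server handle requests in threads
    app.run(host=IP, port=80, threaded=True)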
The port here is 80, and IP is the local machine's IP address; when you access the app, enter your own machine's IP.
If the Flask startup message appears after running the .py file, the server has started successfully.
Then open it in the browser:
http://127.0.0.1/getdata?name=<star name>
The star name can be any star, for example Li Yifeng:
http://127.0.0.1/getdata?name=Li Yifeng
http://127.0.0.1/getdata?name=Jackie Chan
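Browsers encode spaces and Chinese characters in the address bar automatically. When building the query URL from a script, urllib.parse.urlencode does the same encoding. A minimal sketch; getdata_url and its defaults (host 127.0.0.1, port 80, matching the setup above) are hypothetical.

```python
from urllib.parse import urlencode

def getdata_url(name, host="127.0.0.1", port=80):
    # Port 80 is the HTTP default, so it can be left out of the URL
    netloc = host if port == 80 else f"{host}:{port}"
    return f"http://{netloc}/getdata?" + urlencode({"name": name})

print(getdata_url("Jackie Chan"))  # http://127.0.0.1/getdata?name=Jackie+Chan
```

urlencode encodes spaces as "+", which Flask decodes back to spaces when reading request.args.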
4. Summary
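This article collected celebrity relationship data from Baidu Baike and visualized it as an interactive relationship graph, served by Flask so that any star's name can be queried dynamically.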