1. Foreword
Recently, He became the first Asian to top TOP BEAUTY WORLD's list of the world's 100 most beautiful men and women, and the news immediately drew a lot of traffic.
I wanted to look at the matching list of the most beautiful women, but it has not been published yet. There is, however, data on the previous Top 100 most beautiful women in the world, including names, regions, and occupations.
With the data for the world's Top 100 most beautiful women available, how could we not explore it? Let's use a Python crawler to fetch the list and then visualize the data.
2. Crawling the data
First, we need to obtain the data, including each woman's name, region, occupation, and other information. Inspection shows that the page is static, so we can parse the page source directly and extract the fields we want.
The Python code is as follows:
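To make the extraction step concrete, here is a minimal sketch that runs offline. It uses the standard library's xml.etree.ElementTree rather than lxml so that it needs no extra packages, and the markup in SNIPPET is a hypothetical, simplified stand-in for the real page (the class names follow the crawler's XPath expressions, everything else is assumed).

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page's structure may differ.
SNIPPET = """
<ul>
  <li>
    <div class="avatar"><img src="https://example.com/a.jpg" /></div>
    <div class="info"><a><h3>Example Name</h3></a><span>American Actress</span></div>
  </li>
  <li>
    <div class="avatar"><img src="https://example.com/b.jpg" /></div>
    <div class="info"><a><h3>Another Name</h3></a><span>Korean Singer</span></div>
  </li>
</ul>
"""


def parse_entries(markup):
    """Return one dict per <li>: rank, name, region, occupation, image URL."""
    root = ET.fromstring(markup)
    entries = []
    for rank, li in enumerate(root.findall("li"), start=1):
        name = li.find(".//div[@class='info']/a/h3").text
        # The span holds "region occupation"; split only on the first space
        # so that multi-word occupations stay in one piece.
        span_text = li.find(".//div[@class='info']/span").text
        region, occupation = span_text.split(" ", 1)
        src = li.find(".//div[@class='avatar']/img").get("src")
        entries.append({"rank": rank, "name": name, "region": region,
                        "occupation": occupation, "src": src})
    return entries


entries = parse_entries(SNIPPET)
for entry in entries:
    print(entry)
```

On a real page, which is rarely well-formed XML, an HTML-tolerant parser such as lxml's etree.HTML is the more practical choice; the extraction logic stays the same.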
# -*- coding: UTF-8 -*-
"""
@File: spiders.py
@Author: Ye Tingyun
@CSDN: https://yetingyun.blog.csdn.net/
"""
import requests
from lxml import etree
import logging
from fake_useragent import UserAgent
import openpyxl

wb = openpyxl.Workbook()
sheet = wb.active
sheet.append(['ranking', 'name', 'country', 'occupation', 'up_score', 'down_score'])
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s: %(message)s')
ua = UserAgent(verify_ssl=False, path='fake_useragent.json')
headers = {
    "accept-encoding": "gzip",
    "upgrade-insecure-requests": "1",
    "user-agent": ua.random,
}
url = "https://kingchoice.me/topic-the-100-most-beautiful-women-in-the-world-2020-close-jan-29-2021-1255.html?option=40924"
response = requests.get(url, headers=headers)
# print(response.status_code)
# print(response.text)
html = etree.HTML(response.text)
lis = html.xpath('//div[@class="channel-box3-body box3-body"]/ul/li')
logging.info(len(lis))    # 100 entries
for index_, li in enumerate(lis, start=1):
    src = li.xpath('.//div[@class="avatar"]/img/@src')[0]    # image URL
    name = li.xpath('.//div[@class="info"]/a/h3/text()')[0]  # name
    country, occupation = li.xpath('.//div[@class="info"]/span/text()')[0].split(' ', 1)   # region, occupation
    up_score = li.xpath('.//div[@class="des"]/div[1]/ul/li[1]/span/text()')[0]    # up score
    down_score = li.xpath('.//div[@class="des"]/div[1]/ul/li[2]/span/text()')[0]  # down score
    img = requests.get(src, headers=headers).content
    # The Top100_beauty_img folder must already exist
    with open(r'.\Top100_beauty_img\{}.jpg'.format(name), 'wb') as f:
        f.write(img)
    sheet.append([index_, name, country, occupation, up_score, down_score])
    logging.info([index_, name, country, occupation, up_score, down_score])
    logging.info('Saved {} information'.format(name))
wb.save(filename='datas.xlsx')
The results are as follows:
3. Data visualization
Let's first look at the scores of the world's Top 100 most beautiful women.
# -*- coding: UTF-8 -*-
"""
@File: scoring.py
@Author: Ye Tingyun
@CSDN: https://yetingyun.blog.csdn.net/
"""
import pandas as pd
import pyecharts.options as opts
from pyecharts.charts import Line
from pyecharts.datasets import register_files
from pyecharts.globals import CurrentConfig

# Register the custom theme file and render with local JS resources
register_files({"myTheme": ["themes/myTheme", "js"]})
CurrentConfig.ONLINE_HOST = 'D:/python/pyecharts-assets-master/assets/'
df = pd.read_excel('datas.xlsx')
up_score = list(df['up_score'])
down_score = list(df['down_score'])
x_data = [i for i in range(1, 101)]
c = (
    Line(init_opts=opts.InitOpts(theme='myTheme'))
    .add_xaxis(xaxis_data=x_data)
    .set_colors(['#7FFF00', 'red'])    # colors of the two lines
    .add_yaxis('up_score', y_axis=up_score,
               label_opts=opts.LabelOpts(is_show=False)
               )
    .add_yaxis('down_score', y_axis=down_score,
               label_opts=opts.LabelOpts(is_show=False)
               )
    .set_global_opts(    # axis names and chart title
        xaxis_opts=opts.AxisOpts(name='Rank'),
        yaxis_opts=opts.AxisOpts(name='score'),
        title_opts=opts.TitleOpts('Score situation')
    )
    .render('score.html')
)
The results are as follows:
Lalisa Manoban and Taylor Swift, in first and second place, scored far higher than the women ranked after them.
Top 100 women by region
The women on the list come from all over the world. We count how many come from each region and take the Top 10. Note that some of them have mixed heritage, such as the "English-American" entry above; these are counted twice, once as English and once as American.
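The double-counting rule can be sketched on its own with a few made-up values. The helper name count_regions and the sample data are hypothetical and only illustrate the idea; they are not taken from the real list.

```python
from collections import Counter


def count_regions(regions, top_n=10):
    """Count regions, crediting a hyphenated entry to each of its parts."""
    parts = []
    for item in regions:
        # "English-American" -> ["English", "American"]; plain values pass through
        parts.extend(item.split("-"))
    return Counter(parts).most_common(top_n)


# Made-up sample values, not taken from the real list
sample = ["American", "English-American", "Korean", "American"]
# "American" is counted three times: twice on its own,
# once through "English-American".
print(count_regions(sample))
```

One consequence of this rule is that the per-region counts can add up to more than 100, so they are best read as counts of appearances rather than as shares of the list.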
# -*- coding: UTF-8 -*-
"""
@File: goddess.py
@Author: Ye Tingyun
@CSDN: https://yetingyun.blog.csdn.net/
"""
import pandas as pd
from collections import Counter
from pyecharts import options as opts
from pyecharts.charts import Bar
from pyecharts.globals import ThemeType, CurrentConfig
import random

CurrentConfig.ONLINE_HOST = 'D:/python/pyecharts-assets-master/assets/'
df = pd.read_excel('datas.xlsx')
areas = df['country']
area_list = []
for item in areas:
    if '-' in item:
        # Hyphenated regions count once for each part
        item = item.split('-')
        for i in item:
            area_list.append(i)
    else:
        area_list.append(item)
area_count = Counter(area_list).most_common(10)
print(area_count)
area = [x[0] for x in area_count]
nums = [y[1] for y in area_count]
bar = Bar(init_opts=opts.InitOpts(theme=ThemeType.MACARONS))
colors = ['red', '#0000CD', '#000000', '#008000', '#FF1493',
          '#FFD700', '#FF4500', '#00FA9A', '#191970', '#9932CC']
random.shuffle(colors)
# Build one BarItem per region so that each bar gets its own color
y = []
for i in range(10):
    y.append(
        opts.BarItem(
            name=area[i],
            value=nums[i],
            itemstyle_opts=opts.ItemStyleOpts(color=colors[i])
        )
    )
bar.add_xaxis(xaxis_data=area)
bar.add_yaxis("Number of beautiful women on the list", y)
bar.set_global_opts(xaxis_opts=opts.AxisOpts(
                        name='countries',
                        axislabel_opts=opts.LabelOpts(rotate=45)
                    ),
                    yaxis_opts=opts.AxisOpts(
                        name='Number of beautiful women on the list', min_=0, max_=55
                    ),
                    title_opts=opts.TitleOpts(
                        title="Number of beautiful women by region",
                        title_textstyle_opts=opts.TextStyleOpts(
                            font_family="KaiTi", font_size=25, color="black"
                        )
                    ))
bar.set_series_opts(label_opts=opts.LabelOpts(is_show=False),
                    markpoint_opts=opts.MarkPointOpts(
                        data=[
                            opts.MarkPointItem(type_="max", name="Maximum"),
                            opts.MarkPointItem(type_="min", name="Minimum"),
                            opts.MarkPointItem(type_="average", name="Average")]),
                    markline_opts=opts.MarkLineOpts(
                        data=[
                            opts.MarkLineItem(type_="average", name="Average")]))
bar.render("Goddess Area distribution.html")
The results are as follows:
As you can see, women from the United States and the United Kingdom account for more than half of the list, followed by those from South Korea and China.
import pandas as pd
df = pd.read_excel('datas.xlsx')
data = df[df['country'].str.contains('Chinese')]
data.to_excel('test.xlsx', index=False)
Filtering for the Chinese entries shows that the women from China on the list are all actresses.
import pandas as pd
df = pd.read_excel('datas.xlsx')
data = df['occupation'].value_counts()
print(data)
The results are as follows:
Actress 69
Singer 18
Model 10
Atress 1
model 1
TV Actress 1
Name: occupation, dtype: int64
Finally, let's look at the occupations of the women on the list.
Checking the data on the website reveals a few inconsistencies: one entry is written as lowercase "model" while the others use "Model", one "Actress" is misspelled as "Atress", and one entry is listed as "TV Actress". Since the dataset is small, we simply find these cells in the spreadsheet and correct them by hand.
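As an alternative to editing the spreadsheet by hand, the same cleanup could be scripted. The sketch below starts from the counts printed by value_counts() above and merges the inconsistent labels. The FIXES mapping is an assumption, in particular the decision to fold "TV Actress" into "Actress", and the helper name normalize_counts is hypothetical.

```python
from collections import Counter

# Counts as printed by value_counts() above
raw_counts = {"Actress": 69, "Singer": 18, "Model": 10,
              "Atress": 1, "model": 1, "TV Actress": 1}

# Assumed corrections; folding "TV Actress" into "Actress" is a judgment call
FIXES = {"Atress": "Actress", "model": "Model", "TV Actress": "Actress"}


def normalize_counts(counts, fixes):
    """Merge counts whose labels map to the same corrected label."""
    merged = Counter()
    for label, n in counts.items():
        merged[fixes.get(label, label)] += n
    return merged


cleaned = normalize_counts(raw_counts, FIXES)
print(cleaned.most_common())
```

With pandas, the same dictionary could be passed to Series.replace on the occupation column before counting, which would leave datas.xlsx itself untouched.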
# -*- coding: UTF-8 -*-
"""
@File: professional distribution.py
@Author: Ye Tingyun
@CSDN: https://yetingyun.blog.csdn.net/
"""
import pandas as pd
from collections import Counter
from pyecharts.charts import Pie
from pyecharts import options as opts
from pyecharts.globals import ThemeType, CurrentConfig

# Render using local JS resources
CurrentConfig.ONLINE_HOST = 'D:/python/pyecharts-assets-master/assets/'
df = pd.read_excel('datas.xlsx')
data = list(df['occupation'])
job_count = Counter(data).most_common()
pie = Pie(init_opts=opts.InitOpts(theme=ThemeType.MACARONS))
# Ring chart with rich-text labels
pie.add('career', data_pair=job_count, radius=["40%", "55%"],
        label_opts=opts.LabelOpts(
            position="outside",
            formatter="{a|{a}}{abg|}\n{hr|}\n {b|{b}: }{c} {per|{d}%} ",
            background_color="#eee",
            border_color="#aaa",
            border_width=1,
            border_radius=4,
            rich={
                "a": {"color": "#999", "lineHeight": 22, "align": "center"},
                "abg": {
                    "backgroundColor": "#e3e3e3",
                    "width": "100%",
                    "align": "right",
                    "height": 22,
                    "borderRadius": [4, 4, 0, 0],
                },
                "hr": {
                    "borderColor": "#aaa",
                    "width": "100%",
                    "borderWidth": 0.5,
                    "height": 0,
                },
                "b": {"fontSize": 16, "lineHeight": 33},
                "per": {
                    "color": "#eee",
                    "backgroundColor": "#334455",
                    "padding": [2, 4],
                    "borderRadius": 2,
                },
            },
        ),)
pie.set_global_opts(title_opts=opts.TitleOpts(title='Occupation proportion'))
pie.set_colors(['red', 'orange', 'purple'])    # colors of the sectors
pie.render('Beautiful women occupation distribution.html')
There are three main occupations among the women on the list: actress, model, and singer. All three place certain demands on appearance for anyone who wants to do well in them. Actresses account for the largest share, since looks are an important calling card for an actor, and in today's appearance-conscious era looks also weigh most heavily in the scoring. It is therefore not surprising that actresses dominate the list.
Author: yetingyun CSDN:yetingyun.blog.csdn.net/
Passion can outlast the years. Discover the joy of learning, keep learning and improving, and let's encourage each other along the way.