I used Python to crawl my WeChat friends like this...

As WeChat has grown in popularity, more and more people use it. WeChat has gradually evolved from a simple social app into a way of life: people rely on it for both everyday and work communication, and each WeChat friend represents a different role that a person plays in society.

This article analyzes my WeChat friends' data with Python. The dimensions covered are gender, profile picture, signature and location, and the results are presented in two forms: charts and word clouds. As the saying goes, to do a good job, one must first sharpen one's tools. Before we start, here is a brief introduction to the third-party modules used in this article:

itchat: a Python wrapper around the WeChat Web interface, used in this article to fetch WeChat friend information.
jieba: a Chinese word segmentation module for Python, used in this article to segment text.
Matplotlib: a plotting module for Python, used in this article to draw bar and pie charts.
SnowNLP: a Python library for Chinese text processing, used in this article to judge the sentiment of text.
PIL: an image processing module for Python, used in this article for image processing.
NumPy: a numerical computing module for Python, used in this article together with the WordCloud module.
WordCloud: a Python word cloud module, used in this article to draw word cloud images.
TencentYoutuyun: the Python SDK provided by Tencent Youtu, used in this article to detect faces and extract image tags.

All of the above modules can be installed with pip; please consult each module's documentation for detailed usage instructions.

1. Data analysis

Before we can analyze WeChat friends' data, we need to fetch the friends' information. The itchat module makes this very simple; it takes just two lines of code:

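import itchat

# Log in by scanning the QR code; hotReload caches the session between runs
itchat.auto_login(hotReload=True)
# Fetch the friend list; update=True refreshes it from the server
friends = itchat.get_friends(update=True)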

As usual, we log in to the web version of WeChat by scanning the QR code with our phone. The returned friends object is a list-like collection whose first element is the current user, so throughout the analysis below we use friends[1:] as the input data. Each element is a dictionary; looking at my own entry, the fields we care about are Sex, City, Province, HeadImgUrl and Signature. The analysis below starts from these fields.
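To make the friends[1:] convention concrete, here is a minimal sketch that keeps only the analysed fields for each friend. The records are hypothetical dictionaries shaped like itchat friend entries (only the keys used in this article, with made-up values and placeholder URLs), and `select_fields` is an illustrative helper, not part of itchat.

```python
# Hypothetical sample shaped like the list itchat.get_friends() returns:
# element 0 is the logged-in user, the rest are friends (values are made up)
friends = [
    {'NickName': 'me', 'Sex': 1, 'Province': 'Ningxia', 'City': 'Yinchuan',
     'HeadImgUrl': '/hypothetical/head/0', 'Signature': 'Hello'},
    {'NickName': 'A', 'Sex': 2, 'Province': 'Shaanxi', 'City': "Xi'an",
     'HeadImgUrl': '/hypothetical/head/1', 'Signature': 'Keep going'},
    {'NickName': 'B', 'Sex': 0, 'Province': '', 'City': '',
     'HeadImgUrl': '/hypothetical/head/2', 'Signature': ''},
]

# The fields used in the sections that follow
FIELDS = ('Sex', 'City', 'Province', 'HeadImgUrl', 'Signature')

def select_fields(friends, fields=FIELDS):
    """Keep only the analysed fields, skipping friends[0] (the current user)."""
    return [{field: friend.get(field) for field in fields} for friend in friends[1:]]

rows = select_fields(friends)
print(len(rows))       # 2
print(rows[0]['Sex'])  # 2
```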

2. Gender of friends

To analyze the gender of friends, we first need the gender information of every friend. We extract the Sex field from each friend's record, count the number of Unknown, Male and Female entries, assemble the three counts into a list, and then draw a pie chart with Matplotlib. The implementation is as follows:

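from collections import Counter

import matplotlib.pyplot as plt

def analyseSex(friends):
    # Sex is 0 (unknown), 1 (male) or 2 (female); skip friends[0], the current user
    sexs = list(map(lambda x: x['Sex'], friends[1:]))
    counter = Counter(sexs)
    # Read the counts in key order 0, 1, 2 so they line up with the labels
    counts = [counter.get(key, 0) for key in (0, 1, 2)]
    labels = ['Unknown', 'Male', 'Female']
    colors = ['red', 'yellowgreen', 'lightskyblue']
    plt.figure(figsize=(8, 5))
    plt.axes(aspect=1)             # equal aspect ratio so the pie is round
    plt.pie(counts,                # gender counts
            labels=labels,         # gender labels
            colors=colors,         # slice colors
            labeldistance=1.1,     # distance of labels from the center
            autopct='%3.1f%%',     # percentage format on each slice
            shadow=False,          # no shadow
            startangle=90)         # start drawing at 12 o'clock
    plt.legend(loc='upper right')
    plt.title(u"%s's WeChat friends by gender" % friends[0]['NickName'])
    plt.show()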

A brief explanation of this code: WeChat's Sex field takes three values, 0, 1 and 2, meaning Unknown, Male and Female respectively. We count these values with Counter from the collections module, whose items() method returns (key, count) tuples: the first element is the key (0, 1 or 2) and the second is the number of friends with that value.

Note that Counter does not keep its keys sorted (in Python 3.7+ it follows first-insertion order), so mapping over items() only lines up with the labels by chance. The counts should be read explicitly in key order 0, 1, 2, for example with [counter.get(k, 0) for k in (0, 1, 2)], before being passed to Matplotlib, which computes the percentage of each slice. The resulting gender distribution of my friends is shown in the figure below.
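To make the ordering issue concrete, here is a small standalone sketch (standard library only, with a made-up list of Sex codes) showing that Counter.items() follows insertion order, and how reading the counts by key keeps them aligned with the labels. `counts_in_key_order` and `percentages` are illustrative helper names, not part of the original code.

```python
from collections import Counter

# WeChat Sex codes 0, 1, 2 and their labels, as described above
SEX_LABELS = ['Unknown', 'Male', 'Female']

def counts_in_key_order(sexs):
    """Return the counts of Sex codes 0, 1, 2 in that order.

    Counter iterates in first-insertion order (Python 3.7+), so we look up
    each code explicitly; codes that never occur count as 0.
    """
    counter = Counter(sexs)
    return [counter.get(code, 0) for code in range(len(SEX_LABELS))]

def percentages(counts):
    """Share of each slice in percent, the value Matplotlib's autopct formats."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]

# Hypothetical sample in which the first friend listed is female (code 2)
sample = [2, 1, 1, 0, 2, 2]
print(list(Counter(sample).items()))  # [(2, 3), (1, 2), (0, 1)]: not key order
counts = counts_in_key_order(sample)
print(dict(zip(SEX_LABELS, counts)))  # {'Unknown': 1, 'Male': 2, 'Female': 3}
print(['%3.1f%%' % p for p in percentages(counts)])  # ['16.7%', '33.3%', '50.0%']
```

Indexing by key also handles the case where one gender never appears, which a plain map over items() would silently drop.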

3. Friend profile pictures

We analyze friends' profile pictures from two angles: first, what proportion of friends use a profile picture that contains a face; second, what meaningful keywords can be extracted from the profile pictures.

To do this, we download each friend's profile picture locally (the code below fetches it through itchat.get_head_img rather than requesting HeadImgUrl directly), then use the face recognition APIs provided by Tencent Youtu to detect whether the picture contains a face and to extract tags from it. The former is a classification summary, which we present as a pie chart; the latter is text analysis, which we present as a word cloud. The key code is as follows:

import os
import time

import itchat
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from wordcloud import WordCloud

def analyseHeadImage(friends):
    # Create a HeadImages folder under the current directory
    baseFolder = os.path.join(os.path.abspath('.'), 'HeadImages')
    if not os.path.exists(baseFolder):
        os.makedirs(baseFolder)
    # FaceAPI is the author's wrapper around the Tencent Youtu SDK (not shown here)
    faceApi = FaceAPI()
    use_face = 0
    not_use_face = 0
    image_tags = ''
    # Start at 1 to skip friends[0], the current user
    for index in range(1, len(friends)):
        friend = friends[index]
        # Save the profile picture locally if it is not cached yet
        imgFile = os.path.join(baseFolder, 'Image%s.jpg' % index)
        if not os.path.exists(imgFile):
            imgData = itchat.get_head_img(userName=friend['UserName'])
            with open(imgFile, 'wb') as file:
                file.write(imgData)
        # Detect faces, pausing between requests to the online API
        time.sleep(1)
        if faceApi.detectFace(imgFile):
            use_face += 1
        else:
            not_use_face += 1
        # Extract tags; the trailing comma keeps tags of different pictures apart
        result = faceApi.extractTags(imgFile)
        image_tags += ','.join(map(lambda x: x['tag_name'], result)) + ','
    # Pie chart: face vs. no-face profile pictures
    labels = [u'Using face', u'Not using face']
    counts = [use_face, not_use_face]
    colors = ['red', 'yellowgreen']
    plt.figure(figsize=(8, 5))
    plt.axes(aspect=1)             # equal aspect ratio so the pie is round
    plt.pie(counts,
            labels=labels,         # slice labels
            colors=colors,         # slice colors
            labeldistance=1.1,     # distance of labels from the center
            autopct='%3.1f%%',     # percentage format on each slice
            shadow=False,          # no shadow
            startangle=90)         # start drawing at 12 o'clock
    plt.legend(loc='upper right')
    plt.title(u"%s's WeChat friends: face vs. no-face profile pictures" % friends[0]['NickName'])
    plt.show()
    # Kept from the original: re-decodes tag text that arrived mis-decoded
    image_tags = image_tags.encode('iso8859-1').decode('utf-8')
    # Word cloud of the tags, shaped by the face.jpg mask
    back_coloring = np.array(Image.open('face.jpg'))
    wordcloud = WordCloud(
        font_path='simfang.ttf',
        background_color="white",
        max_words=1200,
        mask=back_coloring,
        max_font_size=75,
        random_state=45,
        width=800,
        height=480,
        margin=15
    )
    wordcloud.generate(image_tags)
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()

The code creates a HeadImages directory under the current directory to store all friends' profile pictures. It then uses a class called FaceAPI, a wrapper built on the Tencent Youtu SDK, to call two APIs: face detection and image tag recognition. Face detection is used to count the friends with and without face profile pictures, while the tags extracted from each profile picture are accumulated for the word cloud. The results are shown in the figure below:

As the chart shows, roughly a quarter of my WeChat friends use a profile picture with a face, while nearly three quarters do not. In other words, only about 25% of my friends are confident enough about their looks to show their face, and the remaining 75% lean toward a low-key style and prefer not to use a face as their WeChat profile picture.
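The counting and tag-accumulation loop in analyseHeadImage can be separated from the network calls and checked offline. The sketch below assumes only what that code relies on: an object with a detectFace method returning a truthy or falsy result, and an extractTags method returning a list of dicts with a 'tag_name' key. `FaceApiLike`, `summarize_head_images` and `FakeFaceAPI` are hypothetical names; FakeFaceAPI is a stand-in driven by a lookup table, not the Tencent Youtu SDK.

```python
from typing import Dict, List, Protocol, Tuple

class FaceApiLike(Protocol):
    """Hypothetical interface matching how the code above uses FaceAPI."""
    def detectFace(self, img_file: str) -> bool: ...
    def extractTags(self, img_file: str) -> List[Dict[str, str]]: ...

def summarize_head_images(img_files: List[str], face_api: FaceApiLike) -> Tuple[int, int, str]:
    """Count face / non-face profile pictures and join all tags with commas."""
    use_face = not_use_face = 0
    tags: List[str] = []
    for img_file in img_files:
        if face_api.detectFace(img_file):
            use_face += 1
        else:
            not_use_face += 1
        tags.extend(tag['tag_name'] for tag in face_api.extractTags(img_file))
    # Joining once avoids gluing the last tag of one picture to the first of the next
    return use_face, not_use_face, ','.join(tags)

class FakeFaceAPI:
    """Hypothetical offline stand-in: maps file name -> (has_face, tag names)."""
    def __init__(self, table):
        self.table = table

    def detectFace(self, img_file):
        return self.table[img_file][0]

    def extractTags(self, img_file):
        return [{'tag_name': name} for name in self.table[img_file][1]]

fake = FakeFaceAPI({
    'Image1.jpg': (True, ['girl', 'sky']),
    'Image2.jpg': (False, ['tree']),
    'Image3.jpg': (False, ['cartoon', 'text']),
})
print(summarize_head_images(['Image1.jpg', 'Image2.jpg', 'Image3.jpg'], fake))
# (1, 2, 'girl,sky,tree,cartoon,text')
```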

Second, because Tencent Youtu cannot always reliably recognize a "face", we also extract the tags from friends' profile pictures to understand the keywords they contain. The results are shown in the figure:

The word cloud shows that the most frequent keywords in my friends' profile pictures are: girl, tree, house, text, screenshot, cartoon, group photo, sky and sea. This suggests that my friends' profile pictures come mainly from four sources: daily life, travel, scenery and screenshots.

The style of my friends' profile pictures is mainly cartoon, and common elements include sky, sea, houses and trees. After looking through all of my friends' profile pictures, I found that 15 friends use personal photos, 53 use pictures from the Internet, 25 use cartoon images, 3 use photo images, 5 use pictures, 13 use landscape pictures, and 18 use photos of girls. This broadly agrees with the results of the image tag extraction.
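As a quick arithmetic check on these manual counts, the sketch below tabulates them with collections.Counter and prints each category's share of the total. The category names paraphrase the paragraph above; the numbers are the author's own tallies, not new data.

```python
from collections import Counter

# The author's manual tally of profile-picture categories (from the text above)
tally = Counter({
    'personal photo': 15,
    'Internet picture': 53,
    'cartoon image': 25,
    'photo image': 3,
    'picture': 5,
    'landscape picture': 13,
    'photo of a girl': 18,
})

total = sum(tally.values())
# Print categories from most to least common with their share of the total
for category, count in tally.most_common():
    print('%-18s %3d  %5.1f%%' % (category, count, 100.0 * count / total))
print('total', total)  # total 132
```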

4. Friend signatures

Next we analyze friends' signatures. The signature is the richest text field in a friend's information. Following the familiar human habit of "labeling", a signature can reveal a person's state of mind during a certain period: people laugh when they are happy and cry when they are sad, and the two labels "laugh" and "cry" express happiness and sadness respectively.

We process the signatures in two ways. The first is to segment them with jieba and generate a word cloud, to find the keywords in friends' signatures and which of them appear most often. The second is to use SnowNLP to analyze the sentiment of friends' signatures, that is, whether they are positive, negative or neutral overall, and in what proportion. We extract the Signature field for this; the core code is as follows:

import re

import jieba.analyse
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from snownlp import SnowNLP
from wordcloud import WordCloud

def analyseSignature(friends):
    signatures = ''
    emotions = []
    # Skip friends[0], the current user
    for friend in friends[1:]:
        signature = friend['Signature']
        if signature is not None:
            # Remove emoji HTML such as <span class="emoji emoji1f604"></span>;
            # the original greedy pattern 1f(\d.+) also deleted any text after an emoji
            signature = re.sub(r'<span class="emoji emoji[0-9a-f]+"></span>', '', signature).strip()
            if len(signature) > 0:
                # Sentiment score in [0, 1]; higher means more positive
                nlp = SnowNLP(signature)
                emotions.append(nlp.sentiments)
                # Top 5 keywords per signature, space-separated for the word cloud
                signatures += ' '.join(jieba.analyse.extract_tags(signature, topK=5)) + ' '
    with open('signatures.txt', 'wt', encoding='utf-8') as file:
        file.write(signatures)
    # Signature word cloud, shaped by the flower.jpg mask
    back_coloring = np.array(Image.open('flower.jpg'))
    wordcloud = WordCloud(
        font_path='simfang.ttf',
        background_color="white",
        max_words=1200,
        mask=back_coloring,
        max_font_size=75,
        random_state=45,
        width=960,
        height=720,
        margin=15
    )
    wordcloud.generate(signatures)
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()
    wordcloud.to_file('signatures.jpg')
    # Signature sentiment: > 0.66 positive, 0.33-0.66 neutral, < 0.33 negative
    count_good = len(list(filter(lambda x: x > 0.66, emotions)))
    count_normal = len(list(filter(lambda x: 0.33 <= x <= 0.66, emotions)))
    count_bad = len(list(filter(lambda x: x < 0.33, emotions)))
    labels = [u'Negative', u'Neutral', u'Positive']
    values = (count_bad, count_normal, count_good)
    # SimHei lets Chinese labels render; unicode_minus avoids a broken minus sign
    plt.rcParams['font.sans-serif'] = ['SimHei']
    plt.rcParams['axes.unicode_minus'] = False
    plt.ylabel(u'Frequency')
    plt.xticks(range(3), labels)
    plt.bar(range(3), values, color=['r', 'g', 'b'])
    plt.show()
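The emoji cleanup step is easy to get wrong, so it helps to test it in isolation. This sketch compares the article's original cleanup with a targeted regex. It assumes itchat leaves emoji in signatures as HTML of the form `<span class="emoji emoji1f604"></span>`; the sample signature and the helper names `clean_signature` and `clean_signature_original` are hypothetical.

```python
import re

# Assumption: emoji appear in signatures as <span class="emoji emoji1f604"></span>
EMOJI_SPAN = re.compile(r'<span class="emoji emoji[0-9a-f]+"></span>')

def clean_signature(signature):
    """Remove emoji <span> tags and surrounding whitespace from a signature."""
    if signature is None:
        return ''
    return EMOJI_SPAN.sub('', signature).strip()

def clean_signature_original(signature):
    """The article's original cleanup, kept for comparison."""
    signature = signature.strip().replace('span', '').replace('class', '').replace('emoji', '')
    # Greedy: removes everything from the emoji code to the end of the string
    return re.sub(r'1f(\d.+)', '', signature)

sample = 'Keep going<span class="emoji emoji1f604"></span> every day'
print(clean_signature(sample))           # Keep going every day
print(clean_signature_original(sample))  # the text after the emoji is lost
```

The targeted pattern also leaves ordinary words like "class" or "span" in English signatures untouched, which the chained replace() calls would strip.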

The word cloud shows that the most frequent keywords in my friends' signatures are: effort, grow up, good, happy, life, happiness, life, distance, time and walk.

As the bar chart below shows, positive, neutral and negative judgments account for 55.56%, 32.10% and 12.35% of my friends' signatures respectively. This is broadly consistent with the word cloud: about 87.66% of the signatures (positive plus neutral) convey a positive or at least non-negative attitude.
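The bucketing behind these percentages can be sketched without SnowNLP. The thresholds below follow the sentiment code above (scores below 0.33 are negative, above 0.66 positive, the rest neutral); the positive threshold is inferred from the other two because that line of the original was garbled. The scores are hypothetical, and `bucket_sentiments` and `shares` are illustrative helper names. The 87.66% in the text is simply the positive share plus the neutral share.

```python
def bucket_sentiments(scores, low=0.33, high=0.66):
    """Count negative / neutral / positive scores.

    Same thresholds as analyseSignature: score < low is negative,
    score > high is positive, and low <= score <= high is neutral.
    """
    negative = sum(1 for s in scores if s < low)
    positive = sum(1 for s in scores if s > high)
    neutral = len(scores) - negative - positive
    return negative, neutral, positive

def shares(counts):
    """Percentage of each bucket, rounded to two decimals."""
    total = sum(counts)
    return [round(100.0 * c / total, 2) for c in counts]

# Hypothetical SnowNLP-style scores in [0, 1]; 0.33 and 0.66 fall in the neutral bucket
scores = [0.9, 0.1, 0.5, 0.7, 0.33, 0.66]
counts = bucket_sentiments(scores)
print(counts)                        # (1, 3, 2)
negative, neutral, positive = shares(counts)
print(negative, neutral, positive)   # 16.67 50.0 33.33
print('non-negative share:', round(neutral + positive, 2))  # 83.33
```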

5. Friend location

We analyze friends' locations by extracting the Province and City fields. Map visualization in Python has mainly been done with the Basemap module, which requires downloading map data from foreign websites and is therefore cumbersome to use.

The community also provides the pyecharts project, but I noticed that ECharts no longer supports exporting maps due to policy changes, so customizing maps remains a problem for now. The mainstream workaround is to configure JSON data for the country's provinces and cities yourself.

Here I use a zero-programming approach instead: export a CSV file from Python, upload it to BDP, and create the visual map with simple drag and drop.

import csv

def analyseLocation(friends):
    headers = ['NickName', 'Province', 'City']
    with open('location.csv', 'w', encoding='utf-8', newline='') as csvFile:
        writer = csv.DictWriter(csvFile, headers)
        writer.writeheader()
        # Skip friends[0], the current user
        for friend in friends[1:]:
            row = {}
            row['NickName'] = friend['NickName']
            row['Province'] = friend['Province']
            row['City'] = friend['City']
            writer.writerow(row)
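Before uploading to BDP, it can help to sanity-check the CSV locally. This sketch (standard library only) reads a file in the layout analyseLocation writes and counts friends per province. Grouping empty provinces under 'Unknown' is an assumption of this sketch, the rows are made up, and `count_by_province` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

def count_by_province(csv_file):
    """Count friends per province from a location.csv-style file object.

    Empty Province values (friends with no location set) are grouped
    under 'Unknown'.
    """
    reader = csv.DictReader(csv_file)
    return Counter((row['Province'] or 'Unknown') for row in reader)

# Hypothetical data in the same layout that analyseLocation writes
sample = io.StringIO(
    'NickName,Province,City\n'
    'Alice,Ningxia,Yinchuan\n'
    "Bob,Shaanxi,Xi'an\n"
    'Carol,Ningxia,Guyuan\n'
    'Dave,,\n'
)
print(count_by_province(sample).most_common())
# [('Ningxia', 2), ('Shaanxi', 1), ('Unknown', 1)]
```

For the real file, open location.csv with `encoding='utf-8', newline=''` and pass the file object to the same function.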

The map below, generated in BDP, shows the geographical distribution of my WeChat friends: they are mainly concentrated in Ningxia and Shaanxi provinces.


6. Summary

This article is another of my attempts at data analysis. It performs a simple analysis of my WeChat friends along four dimensions (gender, profile picture, signature and location) and presents the results in two forms: charts and word clouds. In short, data visualization is a means rather than an end. What matters is not what we did here, but what we can learn from the patterns these charts reveal. I hope this article inspires you.

Source: Internet. If this infringes your rights, please contact us to have it removed.

