This is day 8 of my participation in the Gwen Challenge.

Today we will show you how to extract character names from a novel, count how often pairs of names appear together in the same paragraph, and use those relationships to draw a character relationship graph with Gephi.

Core knowledge points:

  1. Extract the names of people in the text

  2. Count the relationships between people in the text

  3. Draw a network diagram

Take a look at the results:

01 A small jieba example

Before we start analyzing Douluo Continent, let's use a small example to give you a feel for how jieba is used.

Description:

After jieba segments the text, each word comes with a part-of-speech tag. To get person names, we only need to keep the words tagged `nr` (person name) and count how many times each one occurs.
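A minimal sketch of the counting step, assuming the segmented text is already available as `(word, flag)` pairs like the ones jieba produces. The function name `count_person_names` and the `min_len` filter are hypothetical additions for illustration; the only thing taken from jieba is its `nr` tag for person names.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_person_names(tagged_words: Iterable[Tuple[str, str]], min_len: int = 2) -> Counter:
    """Count words whose part-of-speech tag is 'nr' (jieba's person-name tag).

    `min_len` is a hypothetical filter that drops single-character hits,
    which are often segmentation noise rather than real names.
    """
    counts = Counter()
    for word, flag in tagged_words:
        if flag == "nr" and len(word) >= min_len:
            counts[word] += 1
    return counts


# Illustrative input shaped like jieba's output, not real segmentation results.
sample = [("唐三", "nr"), ("看着", "v"), ("小舞", "nr"), ("唐三", "nr"), ("笑", "v")]
print(count_person_names(sample).most_common())  # [('唐三', 2), ('小舞', 1)]
```

With jieba installed, the real input would come from `jieba.posseg.cut(text)`, whose items expose `.word` and `.flag`, for example `count_person_names((p.word, p.flag) for p in pseg.cut(text))` after `import jieba.posseg as pseg`. Keeping the counter separate from the tokenizer makes it easy to test without jieba.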

That is a small example of using jieba to extract names from text. Below, we use the novel Douluo Continent to show how to extract the characters' names, build a directed graph of their relationships, and finally draw the character relationship graph.

02 Extraction of character relationships

1. Make a name dictionary

We extracted all the names in the novel, took the top 100, and wrote them to a TXT file in order of frequency, from highest to lowest; a partial result is shown in the figure above. However, some entries turned out to be incomplete or useless, and some characters appeared under more than one name. After cleaning up the list, it looks as follows.
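The cleanup described above can be sketched as an alias map plus a drop list. `clean_name_counts`, `top_names_text`, and the sample aliases are hypothetical; the entries you would actually merge or drop depend on your own extraction results, and the counts below are made up, not results from the novel.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def clean_name_counts(raw_counts: Mapping[str, int],
                      aliases: Dict[str, str],
                      drop: Iterable[str]) -> Counter:
    """Merge alternative names into one canonical name and drop useless entries."""
    dropped = set(drop)
    cleaned = Counter()
    for name, count in raw_counts.items():
        if name in dropped:
            continue  # incomplete or useless entry
        cleaned[aliases.get(name, name)] += count  # fold aliases into one character
    return cleaned


def top_names_text(counts: Counter, n: int = 100) -> str:
    """Return the n most frequent names, one per line, highest frequency first."""
    return "".join(f"{name}\n" for name, _ in counts.most_common(n))


# Illustrative counts only.
raw = Counter({"唐三": 5, "小三": 2, "三": 3, "小舞": 4})
cleaned = clean_name_counts(raw, aliases={"小三": "唐三"}, drop={"三"})
print(top_names_text(cleaned), end="")
```

To produce the TXT file, write the string out, e.g. `pathlib.Path("names.txt").write_text(top_names_text(cleaned), encoding="utf-8")`; the file name is an assumption.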

2. Build directed character relationships

For each paragraph, find the names from the dictionary that appear in it and count every pair of names that appear together. From these counts we get the character nodes and the relationships (edges) between characters.
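Here is a minimal sketch of the pair counting. It assumes the text is split into paragraphs on line breaks and that a name "appears" in a paragraph if it occurs there as a substring; this simple matching is a stand-in, not necessarily the original method (re-segmenting each paragraph with jieba after loading the name dictionary is another option). Each co-occurring pair is counted in both directions, so the edge table is directed but symmetric. The helper names and the sample text are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Iterable, List, Tuple


def split_paragraphs(text: str) -> List[str]:
    """Treat every non-empty line as one paragraph."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def count_cooccurrences(paragraphs: Iterable[str],
                        names: List[str]) -> Tuple[Counter, Counter]:
    """Count, per name, the paragraphs it appears in, and per ordered pair,
    the paragraphs where both names appear together."""
    nodes = Counter()
    edges = Counter()
    for para in paragraphs:
        present = [name for name in names if name in para]
        nodes.update(present)
        # permutations yields both (a, b) and (b, a): one directed edge each way
        for source, target in permutations(present, 2):
            edges[(source, target)] += 1
    return nodes, edges


# Made-up sample text for illustration.
text = "唐三和小舞走进学院。\n唐三独自修炼。\n小舞、唐三和戴沐白一起出发。"
nodes, edges = count_cooccurrences(split_paragraphs(text), ["唐三", "小舞", "戴沐白"])
print(edges[("唐三", "小舞")], edges[("小舞", "唐三")])  # 2 2
```

Note that substring matching will also match a short name inside a longer one, so overlapping names in the dictionary may need extra handling.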

3. Save the results to CSV files

The results are as follows:
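The CSV export can be sketched with the standard `csv` module. The column names follow Gephi's spreadsheet-import conventions (`Id`/`Label` for nodes, `Source`/`Target`/`Type`/`Weight` for edges); the helper functions, the use of UTF-8, and passing the node and edge counts as plain mappings are assumptions for illustration.

```python
import csv
from typing import List, Mapping, Tuple


def node_rows(node_counts: Mapping[str, int]) -> List[list]:
    """Header plus one row per character, most frequent first."""
    rows = [["Id", "Label", "Weight"]]
    for name, weight in sorted(node_counts.items(), key=lambda kv: kv[1], reverse=True):
        rows.append([name, name, weight])
    return rows


def edge_rows(edge_counts: Mapping[Tuple[str, str], int]) -> List[list]:
    """Header plus one directed, weighted edge per ordered name pair."""
    rows = [["Source", "Target", "Type", "Weight"]]
    for (source, target), weight in edge_counts.items():
        rows.append([source, target, "Directed", weight])
    return rows


def write_csv(path: str, rows: List[list]) -> None:
    # newline="" is what the csv module expects for files it writes to
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)
```

Usage, given node and edge counts from the previous step: `write_csv("node.csv", node_rows(nodes))` and `write_csv("edge.csv", edge_rows(edges))`. If Chinese labels look garbled in Gephi, check the charset selected in its import dialog.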

03 Character network diagram

We use Gephi to draw the graph. It can be downloaded from:

https://gephi.org/

After installing Gephi, import the node.csv and edge.csv files we just saved to draw the character network diagram.

1. Create a project and import data

Click New Project, switch to the Data Laboratory tab, and click Import Spreadsheet to add the node and edge CSV files.

2. Adjust the styles

3. Change the font and display the node labels

4. Select an automatic layout, preview the layout, and adjust related parameters

5. Finally, click the export button in the lower-left corner to export the image

04 Summary

This article showed how to extract the names of people from a text, count the relationships between them, and finally draw a network relationship map. If anything is unclear, feel free to leave a comment below.