First, the result




Today, while I was slacking off as usual, a forum friend recommended a novel called Jian Lai ("The Sword"). After work I went to download it and found that the TXT file alone is 9 MB. At my slow reading speed, I doubt I could finish it even by next year. I have recently been doing some NLP-related work, so I decided to practice on this novel.


Building a simple character relationship graph from the novel's text

The idea is:

1. Create a list of character names

2. Count how many times each character appears in the novel

3. Count how many times each pair of characters appears together (in the same sentence)

4. Draw a simple chart from the results of steps 1 to 3
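Steps 1 to 3 can be tried on a toy string before touching the 9 MB file. The sketch below is a minimal illustration, assuming sentences end with the Chinese full stop "。" and that a name counts as present whenever it occurs as a substring. The function name `character_stats`, the sample text, and the names are hypothetical; drawing (step 4) is left to the full script.

```python
from itertools import combinations


def character_stats(text, names):
    """Steps 1 to 3 on an in-memory string: per-name counts and per-pair sentence co-occurrence."""
    # Step 3 works per sentence, so split on the Chinese full stop and drop empty pieces
    sentences = [s for s in text.split("。") if s]
    # Step 2: total occurrences of each name in the whole text (plain substring count)
    name_counts = {n: text.count(n) for n in names}
    # Step 3: number of sentences that contain both names of each unordered pair
    pair_counts = {
        (a, b): sum(1 for s in sentences if a in s and b in s)
        for a, b in combinations(names, 2)
    }
    return name_counts, pair_counts


# Step 1: a hypothetical name list and sample text, for illustration only
text = "Chen met Ning。Chen read a book。Ning and Qi talked。"
names = ["Chen", "Ning", "Qi"]
print(character_stats(text, names))
```

Pairs that never meet are kept with a count of 0 here; the full script filters them out before drawing.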

#! /usr/bin/env python
# -*- coding: utf-8 -*-

from itertools import combinations

# pyecharts 0.5.x API; in pyecharts 1.x, Graph lives in pyecharts.charts and takes different options
from pyecharts import Graph


# Split the text into sentences on the Chinese full stop "。"
def cut_text():
    with open("jianlai.txt", "r", encoding="utf-8") as f:
        reader = f.read()
    sentence_list = reader.split("。")
    return sentence_list


# Count the number of times each name appears in the full text
def count_name(names):
    with open("jianlai.txt", "r", encoding="utf-8") as f:
        reader = f.read()
    output = []
    for i in names:
        count = reader.count(i)
        # The protagonist appears so often that his node would dwarf all the others,
        # so his count is scaled down as a small fix
        if i == "Ping an Chen":
            count = count / 5
        row = {"name": i, "symbolSize": count / 100}
        output.append(row)
    return output


# Count the number of sentences in which two characters appear together
def get_rel(name1, name2, sentences):
    counter = 0
    for s in sentences:
        if name1 in s and name2 in s:
            counter += 1
    row = {"source": name1, "target": name2, "weight": counter / 100}
    return row


# Count co-occurrences for every pair in the character list,
# keeping only pairs that appear together at least once
def count_re(names, sentences):
    output = []
    for i, m in combinations(names, 2):
        row = get_rel(i, m, sentences)
        if row["weight"] > 0:
            output.append(row)
    return output


# Draw the graph with pyecharts
def paint_graph(nodes, links):
    # Print the data for a quick sanity check
    print(nodes)
    print(links)
    graph = Graph("Heat Map of Character Relationships in The Sword", height=800)
    graph.add("",
              nodes,
              links,
              graph_repulsion=200,
              graph_edge_length=400,
              graph_layout="force",
              is_label_show=True,
              line_opacity=0.2,
              line_curve=0.5)
    graph.use_theme("dark")
    graph.render()


if __name__ == '__main__':
    # Step 1: the character list. Matching is a plain substring search,
    # so each entry must be spelled exactly as it appears in jianlai.txt
    name = ['Ping an Chen', 'Qi Jingchun', 'ning yao', 'RuanXiu', 'Li Bao bottle',
            'Song set salary', 'young keigo', 'The old scholar', 'good', 'about',
            'cui 瀺', 'nguyen Qiong', 'Old Yang Tau', 'bathygenic', 'Big Dick',
            'Long Mirror of song', 'Li Er', 'Zheng Dafeng', 'w widow', 'Old Man Cui',
            'pei cup', 'Cao Ci', 'pei money']

    sentence = cut_text()
    name_count = count_name(name)
    rel_count = count_re(name, sentence)
    paint_graph(name_count, rel_count)
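`count_re` calls `get_rel` once per pair, and each call rescans every sentence, so the work grows with (number of pairs) × (number of sentences). A possible alternative, sketched below under the same substring-matching assumption, scans each sentence once, collects the names it contains, and increments a counter for every pair among them. `cooccurrence_counts` is a hypothetical name, and it returns raw counts rather than the `/100`-scaled weights used for drawing.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences, names):
    """Count, in a single pass over the sentences, how many sentences contain each pair of names."""
    counts = Counter()
    for sentence in sentences:
        # Names present in this sentence, kept in the order of the input list
        present = [n for n in names if n in sentence]
        # Every unordered pair among them co-occurs once in this sentence
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Hypothetical sentences and names, for illustration only
sentences = ["Chen met Ning", "Chen read a book", "Ning and Qi talked", "Chen, Ning and Qi ate"]
print(cooccurrence_counts(sentences, ["Chen", "Ning", "Qi"]))
```

Only pairs that actually meet end up in the counter, which matches the `weight > 0` filter, and pair keys keep the order of the name list. The result can be turned into link dicts with `[{"source": a, "target": b, "weight": c / 100} for (a, b), c in counts.items()]`.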


Optimizations that can be made next

1. The characters are not classified at all yet. A next step is a simple classification by gender and by the school (sect) each character belongs to.

2. Only after finishing did I notice that this version does not use jieba at all. Next, the sentences could be segmented into words and tagged with parts of speech to see whether anything interesting can be extracted.

3. The chart does not show how often two characters co-occur, even though the script computes this as the link weight. This can be improved next time.

4. At present there is only one relationship type between characters, "co-occurrence", so other relationship types could be considered. Once the relationships are expanded, Neo4j or D3 could be used for presentation.
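Items 1, 3 and 4 mostly come down to post-processing the `nodes` and `links` lists before drawing or exporting them. The sketch below is one possible way to do that, under several assumptions. The name-to-group mapping is hand-made and hypothetical. The per-node `category`, per-link `value` and `lineStyle.width` keys follow ECharts graph-series options, and whether a given pyecharts version forwards them unchanged needs checking. The JSON shape (nodes with an `id`, links with `source`/`target`) is the convention used by D3 force-layout examples together with `d3.forceLink().id(d => d.id)`. All function names and the `type` field are hypothetical.

```python
import json


def add_categories(nodes, groups, default="Unclassified"):
    """Item 1: attach an ECharts-style category index to each node dict.

    groups is a hypothetical hand-made mapping {character name: group label}.
    Returns a new node list and the matching list of {"name": label} categories.
    """
    labels = []
    new_nodes = []
    for node in nodes:
        label = groups.get(node["name"], default)
        if label not in labels:
            labels.append(label)
        # Copy each node so the caller's list is left unchanged
        new_nodes.append({**node, "category": labels.index(label)})
    return new_nodes, [{"name": label} for label in labels]


def style_links(links, min_width=0.5, max_width=8.0):
    """Item 3: expose each link's weight as "value" and map it linearly to a line width."""
    if not links:
        return []
    weights = [link["weight"] for link in links]
    lo, hi = min(weights), max(weights)
    styled = []
    for link in links:
        # When all weights are equal, draw every link at max_width
        t = (link["weight"] - lo) / (hi - lo) if hi > lo else 1.0
        width = round(min_width + t * (max_width - min_width), 2)
        styled.append({**link, "value": link["weight"], "lineStyle": {"width": width}})
    return styled


def to_d3_json(nodes, links, relation="co-occurrence"):
    """Item 4: serialize nodes/links in the {"nodes": [...], "links": [...]} shape used by D3 force examples."""
    data = {
        "nodes": [{"id": node["name"], "size": node["symbolSize"]} for node in nodes],
        # "type" is a hypothetical field leaving room for more relationship kinds
        "links": [{"source": link["source"], "target": link["target"],
                   "value": link["weight"], "type": link.get("type", relation)}
                  for link in links],
    }
    # ensure_ascii=False keeps Chinese names readable in the output
    return json.dumps(data, ensure_ascii=False, indent=2)


# Hypothetical data in the shape produced by count_name and count_re
nodes = [{"name": "Ping an Chen", "symbolSize": 10.0}, {"name": "ning yao", "symbolSize": 5.0},
         {"name": "Cao Ci", "symbolSize": 3.0}]
links = [{"source": "Ping an Chen", "target": "ning yao", "weight": 5},
         {"source": "Ping an Chen", "target": "Cao Ci", "weight": 1},
         {"source": "ning yao", "target": "Cao Ci", "weight": 3}]
groups = {"Ping an Chen": "male", "ning yao": "female"}  # hypothetical labels

nodes, categories = add_categories(nodes, groups)
links = style_links(links)
print(categories)
print(to_d3_json(nodes, links))
```

Scaling link widths linearly between a minimum and a maximum keeps rare pairs visible while still letting frequent pairs stand out; a log scale would be a reasonable alternative if a few pairs dominate.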