
Foreword

As a longtime One Piece fan who has watched the anime several times, I still find that many details don't stick the way they do in a structured project like this one. In this article I walk through a knowledge graph covering all the characters of One Piece. First, get a feel for it:

This is an open-source project; the repository address is at the end of the article. Let's take a look at how it is built with Python.

Note:

  • Python 3.6+ is required
  • This article contains many resource links; they are good references if you want to build a knowledge base of your own

What you will get from this article

  • The design logic of a knowledge graph
  • How a complete knowledge graph is formed by effectively combining data from multiple channels
  • By analogy, you should be able to build a knowledge graph for Naruto or any other anime you like

Introduction to One Piece

One Piece is a manga by the Japanese artist Eiichiro Oda, serialized in Shueisha's Weekly Shonen Jump since July 22, 1997. The TV anime adaptation premiered on Fuji TV on October 20, 1999. On May 11, 2012, One Piece won the 41st Japan Cartoonists Association Award.

The book was officially certified by Guinness World Records as “the world’s highest circulation comic series by a single author.” On July 21, 2017, the Japan Memorial Day Association approved the observance of July 22 as ONE PIECE Day.

Project overview

The directory tree is a bit long; feel free to skip this part via the table of contents. It is here for anyone who wants the full layout.

   |--visualization
   |   |--html
   |   |   |--index.html
   |   |   |--test_vizdata_vivrecard_relation.json
   |   |   |--alignment_vizdata_vivrecard_relation.json
   |   |   |--vizdata_vivrecard_relation.json
   |   |   |--vizdata_vivrecard_avpair.json
   |--vivirecard-KB_query
   |   |--question_temp.py
   |   |--query_main.py
   |   |--test_sparql.py
   |   |--jena_sparql_endpoint.py
   |   |--external_dict
   |   |   |--csv2txt.py
   |   |   |--__init__.py
   |   |   |--vivre_zhpname.csv
   |   |   |--person_name.txt
   |   |   |--movie_title.csv
   |   |   |--onepiece_hierarchy_place_terminology.txt
   |   |   |--movie_title.txt
   |   |   |--vivire_ntriples2zhpname.py
   |   |   |--person_name.csv
   |   |   |--vivre_zhpname.txt
   |   |   |--onepiece_place_terminology.txt
   |   |--data
   |   |   |--talkop_vivre_card
   |   |   |   |--14-(201809初始套装+1张追加卡)-entities_avpair.json
   |   |   |   |--11-(201906水之都CP9+德岛竞技场)-predicate_key_list.txt
   |   |   |   |--12-(201903初始套装Vol2-16张主卡)-entities_avpair.json
   |   |   |   |--preprocessed-5-(201902空岛住民+新鱼人海贼团).txt
   |   |   |   |--13-(201811可可亚西村+大监狱)-entities_id_name_list.txt
   |   |   |   |--preprocessed-8-(201905水之都+德岛).txt
   |   |   |   |--file_prefix.json
   |   |   |   |--preprocessed-4-(201908杰尔马66+大妈团).txt
   |   |   |   |--skill.txt
   |   |   |   |--7-(201904空岛+PH岛)-entities_id_name_list.txt
   |   |   |   |--6-(201907恐怖船+象岛)-entities_avpair.json
   |   |   |   |--8-(201905水之都+德岛)-predicate_key_list.txt
   |   |   |   |--13-(201811可可亚西村+大监狱).txt
   |   |   |   |--13-(201811可可亚西村+大监狱)-predicate_key_list.txt
   |   |   |   |--9-(201812阿拉巴斯坦+白胡子海贼团)-entities_avpair.json
   |   |   |   |--preprocessed-7-(201904空岛+PH岛).txt
   |   |   |   |--9-(201812阿拉巴斯坦+白胡子海贼团)-entities_id_name_list.txt
   |   |   |   |--9-(201812阿拉巴斯坦+白胡子海贼团)-predicate_key_list.txt
   |   |   |   |--preprocessed-10-(201901鱼人岛居民+巴洛克社).txt
   |   |   |   |--preprocessed-6-(201907恐怖船+象岛).txt
   |   |   |   |--6-(201907恐怖船+象岛)-entities_id_name_list.txt
   |   |   |   |--preprocessed-12-(201903初始套装Vol2-16张主卡).txt
   |   |   |   |--summary_predicate_set.txt
   |   |   |   |--4-(201908杰尔马66+大妈团)-entities_id_name_list.txt
   |   |   |   |--14-(201809初始套装+1张追加卡)-entities_id_name_list.txt
   |   |   |   |--14-(201809初始套装+1张追加卡).txt
   |   |   |   |--ntriples_talkop_vivre_card.nt
   |   |   |   |--9-(201812阿拉巴斯坦+白胡子海贼团).txt
   |   |   |   |--3-(201810东海的猛者们+超新星集结)-predicate_key_list.txt
   |   |   |   |--8-(201905水之都+德岛).txt
   |   |   |   |--5-(201902空岛住民+新鱼人海贼团).txt
   |   |   |   |--6-(201907恐怖船+象岛)-predicate_key_list.txt
   |   |   |   |--11-(201906水之都CP9+德岛竞技场)-entities_id_name_list.txt
   |   |   |   |--10-(201901鱼人岛居民+巴洛克社)-entities_id_name_list.txt
   |   |   |   |--12-(201903初始套装Vol2-16张主卡).txt
   |   |   |   |--4-(201908杰尔马66+大妈团)-entities_avpair.json
   |   |   |   |--8-(201905水之都+德岛)-entities_id_name_list.txt
   |   |   |   |--7-(201904空岛+PH岛)-entities_avpair.json
   |   |   |   |--5-(201902空岛住民+新鱼人海贼团)-predicate_key_list.txt
   |   |   |   |--8-(201905水之都+德岛)-entities_avpair.json
   |   |   |   |--summary_entities_id_name_list.txt
   |   |   |   |--3-(201810东海的猛者们+超新星集结)-entities_avpair.json
   |   |   |   |--5-(201902空岛住民+新鱼人海贼团)-entities_id_name_list.txt
   |   |   |   |--10-(201901鱼人岛居民+巴洛克社)-predicate_key_list.txt
   |   |   |   |--preprocessed-13-(201811可可亚西村+大监狱).txt
   |   |   |   |--4-(201908杰尔马66+大妈团)-predicate_key_list.txt
   |   |   |   |--preprocessed-9-(201812阿拉巴斯坦+白胡子海贼团).txt
   |   |   |   |--4-(201908杰尔马66+大妈团).txt
   |   |   |   |--3-(201810东海的猛者们+超新星集结)-entities_id_name_list.txt
   |   |   |   |--11-(201906水之都CP9+德岛竞技场)-entities_avpair.json
   |   |   |   |--7-(201904空岛+PH岛).txt
   |   |   |   |--preprocessed-11-(201906水之都CP9+德岛竞技场).txt
   |   |   |   |--3-(201810东海的猛者们+超新星集结).txt
   |   |   |   |--preprocessed-14-(201809初始套装+1张追加卡).txt
   |   |   |   |--5-(201902空岛住民+新鱼人海贼团)-entities_avpair.json
   |   |   |   |--10-(201901鱼人岛居民+巴洛克社)-entities_avpair.json
   |   |   |   |--12-(201903初始套装Vol2-16张主卡)-entities_id_name_list.txt
   |   |   |   |--14-(201809初始套装+1张追加卡)-predicate_key_list.txt
   |   |   |   |--6-(201907恐怖船+象岛).txt
   |   |   |   |--11-(201906水之都CP9+德岛竞技场).txt
   |   |   |   |--12-(201903初始套装Vol2-16张主卡)-predicate_key_list.txt
   |   |   |   |--7-(201904空岛+PH岛)-predicate_key_list.txt
   |   |   |   |--13-(201811可可亚西村+大监狱)-entities_avpair.json
   |   |   |   |--10-(201901鱼人岛居民+巴洛克社).txt
   |   |--question2sparql.py
   |   |--word_tagging.py
   |--LICENSE.md
   |--cndbpedia
   |   |--filter_moelgirl_cndbpedia_entities_mapping_file.py
   |   |--get_onepiece_cndbpedia_avpair.py
   |   |--avpair2ntriples_onepiece_cndbpedia.py
   |   |--get_onepiece_cndbpedia_entities.py
   |   |--parse_raw_moegirl_onepiece_entries.py
   |   |--data
   |   |   |--moelgirl_cndbpedia_entities_mapping.json
   |   |   |--processed_moegirl_onepiece_entries.txt
   |   |   |--query_avpair_entities_list.txt
   |   |   |--raw_moegirl_onepiece_entries.txt
   |   |   |--cndbpedia_onepiece_entities_list.txt
   |   |   |--query_avpair_cndbpedia_onepiece_results.json
   |   |   |--query_avpair_entities_mapping.json
   |   |   |--query_avpair_keys_list_file.txt
   |   |   |--moelgirl_cndbpedia_api_no_results_mention_name_list.txt
   |   |   |--filter_out_entities_mapping.json
   |   |   |--ntriples_cndbpedia_onepiece.nt
   |--index.html
   |--deepke-master
   |   |--metrics.py
   |   |--vocab.py
   |   |--LICENSE
   |   |--requirements.txt
   |   |--test
   |   |   |--test_cnn.py
   |   |   |--test_serializer.py
   |   |   |--test_embedding.py
   |   |   |--test_attention.py
   |   |   |--test_rnn.py
   |   |   |--test_vocab.py
   |   |   |--test_transformer.py
   |   |--preprocess.py
   |   |--images
   |   |   |--APCNN.jpg
   |   |   |--Capsule.png
   |   |   |--Bert.png
   |   |   |--CNN.png
   |   |   |--GCN.png
   |   |   |--Transformer2.png
   |   |   |--LSTM.jpg
   |   |   |--Transformer1.png
   |   |   |--PCNN.jpg
   |   |--module
   |   |   |--Attention.py
   |   |   |--GCN.py
   |   |   |--Capsule.py
   |   |   |--Embedding.py
   |   |   |--__init__.py
   |   |   |--__pycache__
   |   |   |   |--Embedding.cpython-37.pyc
   |   |   |   |--GCN.cpython-37.pyc
   |   |   |   |--Transformer.cpython-37.pyc
   |   |   |   |--Capsule.cpython-37.pyc
   |   |   |   |--RNN.cpython-37.pyc
   |   |   |   |--Attention.cpython-37.pyc
   |   |   |   |--CNN.cpython-37.pyc
   |   |   |   |--__init__.cpython-37.pyc
   |   |   |--CNN.py
   |   |   |--Transformer.py
   |   |   |--RNN.py
   |   |--predict.py
   |   |--utils
   |   |   |--discoveralign_related_entity.py
   |   |   |--convert_vivrecard2deepke.py
   |   |   |--__init__.py
   |   |   |--__pycache__
   |   |   |   |--nnUtils.cpython-37.pyc
   |   |   |   |--ioUtils.cpython-37.pyc
   |   |   |   |--__init__.cpython-37.pyc
   |   |   |--get_vivrecard_rawdata.py
   |   |   |--nnUtils.py
   |   |   |--convert_baiduke2deepke.py
   |   |   |--check_data.py
   |   |   |--ioUtils.py
   |   |--models
   |   |   |--PCNN.py
   |   |   |--GCN.py
   |   |   |--Capsule.py
   |   |   |--BasicModule.py
   |   |   |--__init__.py
   |   |   |--__pycache__
   |   |   |   |--GCN.cpython-37.pyc
   |   |   |   |--Transformer.cpython-37.pyc
   |   |   |   |--LM.cpython-37.pyc
   |   |   |   |--BiLSTM.cpython-37.pyc
   |   |   |   |--Capsule.cpython-37.pyc
   |   |   |   |--PCNN.cpython-37.pyc
   |   |   |   |--BasicModule.cpython-37.pyc
   |   |   |   |--__init__.cpython-37.pyc
   |   |   |--Transformer.py
   |   |   |--LM.py
   |   |   |--BiLSTM.py
   |   |--__pycache__
   |   |   |--preprocess.cpython-37.pyc
   |   |   |--dataset.cpython-37.pyc
   |   |   |--serializer.cpython-37.pyc
   |   |   |--metrics.cpython-37.pyc
   |   |   |--vocab.cpython-37.pyc
   |   |   |--trainer.cpython-37.pyc
   |   |--README.md
   |   |--dataset.py
   |   |--.gitignore
   |   |--.github
   |   |   |--CODE_OF_CONDUCT.md
   |   |   |--CONTRIBUTING.md
   |   |   |--ISSUE_TEMPLATE
   |   |   |   |--feature_request.md
   |   |   |   |--bug_report.md
   |   |   |   |--question_consult.md
   |   |--serializer.py
   |   |--trainer.py
   |   |--main.py
   |   |--tutorial-notebooks
   |   |   |--GCN.ipynb
   |   |   |--PCNN.ipynb
   |   |   |--img
   |   |   |   |--Bert.png
   |   |   |   |--GCN.png
   |   |   |   |--PCNN.jpg
   |   |   |--LM.ipynb
   |   |   |--data
   |   |   |   |--valid.csv
   |   |   |   |--test.csv
   |   |   |   |--train.csv
   |   |   |   |--relation.csv
   |   |--data
   |   |   |--vivrecard
   |   |   |   |--alignment
   |   |   |   |   |--json_entity_mapping.json
   |   |   |   |   |--alignment_vizdata_vivrecard_relation.json
   |   |   |   |   |--raw_entity_mapping.txt
   |   |   |   |--origin
   |   |   |   |   |--valid.csv
   |   |   |   |   |--test.csv
   |   |   |   |   |--train.csv
   |   |   |   |   |--relation.csv
   |   |   |   |--annot
   |   |   |   |   |--fuseki_vivrecard_sentence_item.txt
   |   |   |   |   |--outputs
   |   |   |   |   |   |--fuseki_vivrecard_sentence_item.ann
   |   |   |   |   |   |--fuseki_vivrecard_sentence_item.json
   |   |   |   |   |   |--formatted_fuseki_vivrecard_sentence_item.json
   |   |   |   |--raw
   |   |   |   |   |--fuseki_vivrecard_sentence_dict.json
   |   |   |   |   |--fuseki_vivrecard_sentence_item.txt
   |   |   |   |   |--fuseki_vivrecard.csv
   |   |   |   |--summary
   |   |   |   |   |--annot_relation_sent.txt
   |   |   |   |   |--entities_type_name_dict.json
   |   |   |   |   |--unannot_relation_sent.txt
   |   |   |   |   |--all_sent.txt
   |   |   |   |   |--vizdata_vivrecard_relation.json
   |   |   |   |   |--vivrecard_ntriples.nt
   |   |   |   |   |--unannot_entity_sent.txt
   |   |   |   |   |--relation.csv
   |   |   |   |   |--annot_entity_sent.txt
   |   |--conf
   |   |   |--embedding.yaml
   |   |   |--config.yaml
   |   |   |--train.yaml
   |   |   |--model
   |   |   |   |--lm.yaml
   |   |   |   |--cnn.yaml
   |   |   |   |--transformer.yaml
   |   |   |   |--capsule.yaml
   |   |   |   |--rnn.yaml
   |   |   |   |--gcn.yaml
   |   |   |--preprocess.yaml
   |   |   |--hydra
   |   |   |   |--output
   |   |   |   |   |--custom.yaml
   |   |--pretrained
   |   |   |--readme.md
   |--docs
   |   |--CHANGELOG.md
   |   |--images
   |   |   |--graph (3).png
   |   |   |--graph (2).png
   |   |   |--soogif-5m.gif
   |   |   |--graph.png
   |   |   |--graph (1).png
   |   |   |--relation-freq.png
   |   |   |--viz1.jpg
   |   |   |--viz3.jpg
   |   |   |--viz2.jpg
   |   |--report.md
   |   |--report.pdf
   |--README.md
   |--talkop
   |   |--parse_vivire_card_webpage.py
   |   |--preprocess_vivre_card
   |   |   |--preprocess_8_vivre_card.py
   |   |   |--preprocess_EX_CHARACTERS.py
   |   |   |--preprocess_12_vivre_card.py
   |   |   |--preprocess_5_vivre_card.py
   |   |   |--preprocess_11_vivre_card.py
   |   |   |--preprocess_6_vivre_card.py
   |   |   |--preprocess_14_vivre_card.py
   |   |   |--preprocess_7_vivre_card.py
   |   |   |--preprocess_10_vivre_card.py
   |   |   |--preprocess_4_vivre_card.py
   |   |   |--preprocess_13_vivre_card.py
   |   |   |--preprocess_9_vivre_card.py
   |   |--parse_vivire_card_catalog.py
   |   |--summary_talkop_vivre_card.py
   |   |--parse_processed_manual_talkop_vivre_card.py
   |   |--data
   |   |   |--original_manual_talkop_vivre_card
   |   |   |   |--13-(201811可可亚西村+大监狱).txt
   |   |   |   |--14-(201809初始套装+1张追加卡).txt
   |   |   |   |--9-(201812阿拉巴斯坦+白胡子海贼团).txt
   |   |   |   |--8-(201905水之都+德岛).txt
   |   |   |   |--5-(201902空岛住民+新鱼人海贼团).txt
   |   |   |   |--12-(201903初始套装Vol2-16张主卡).txt
   |   |   |   |--4-(201908杰尔马66+大妈团).txt
   |   |   |   |--7-(201904空岛+PH岛).txt
   |   |   |   |--3-(201810东海的猛者们+超新星集结).txt
   |   |   |   |--6-(201907恐怖船+象岛).txt
   |   |   |   |--11-(201906水之都CP9+德岛竞技场).txt
   |   |   |   |--10-(201901鱼人岛居民+巴洛克社).txt
   |   |   |--talkop_vivre_card_webpage
   |   |   |   |--talkop_vivire_card_catalog.json
   |   |   |   |--8-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201905水之都+德岛)
   |   |   |   |   |--8-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201905水之都+德岛).html
   |   |   |   |--13-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201811可可亚西村+大监狱)
   |   |   |   |   |--13-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201811可可亚西村+大监狱).html
   |   |   |   |--9-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201812阿拉巴斯坦+白胡子海贼团)
   |   |   |   |   |--9-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201812阿拉巴斯坦+白胡子海贼团).html
   |   |   |   |--3-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201810东海的猛者们+超新星集结)
   |   |   |   |   |--3-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201810东海的猛者们+超新星集结).txt
   |   |   |   |   |--3-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201810东海的猛者们+超新星集结).html
   |   |   |   |--4-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201908杰尔马66+大妈团)
   |   |   |   |   |--4-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201908杰尔马66+大妈团).html
   |   |   |   |   |--4-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201908杰尔马66+大妈团)_files
   |   |   |   |   |   |--zhounianqing1.gif
   |   |   |   |   |   |--avatar(5).php
   |   |   |   |   |   |--2018new_zuohengheng_thumb.png
   |   |   |   |   |   |--userinfo.gif
   |   |   |   |   |   |--sleepy.gif
   |   |   |   |   |   |--thread-prev.png
   |   |   |   |   |   |--2018new_taikaixin_org.png
   |   |   |   |   |   |--avatar(14).php
   |   |   |   |   |   |--share.js.下载
   |   |   |   |   |   |--upload.js.下载
   |   |   |   |   |   |--2018new_kuxiao_thumb.png
   |   |   |   |   |   |--shutup.gif
   |   |   |   |   |   |--avatar(9).php
   |   |   |   |   |   |--avatar(22).php
   |   |   |   |   |   |--2018new_yinxian_org.png
   |   |   |   |   |   |--2018new_xixi_thumb.png
   |   |   |   |   |   |--kiss.gif
   |   |   |   |   |   |--namepost.small.gif
   |   |   |   |   |   |--TB1_3FrKVXXXXbdXXXXXXXXXXXX-129-128.png
   |   |   |   |   |   |--avatar(18).php
   |   |   |   |   |   |--nc.js.下载
   |   |   |   |   |   |--jquery-1.8.3.min.js.下载
   |   |   |   |   |   |--caomao.gif
   |   |   |   |   |   |--slide_share.css
   |   |   |   |   |   |--avatar(19).php
   |   |   |   |   |   |--handshake.gif
   |   |   |   |   |   |--avatar(23).php
   |   |   |   |   |   |--shangfangbaojian.gif
   |   |   |   |   |   |--avatar(8).php
   |   |   |   |   |   |--qq.gif
   |   |   |   |   |   |--thread-next.png
   |   |   |   |   |   |--2018new_xiaoku_thumb.png
   |   |   |   |   |   |--2018new_xiaoerbuyu_org.png
   |   |   |   |   |   |--f(2).txt
   |   |   |   |   |   |--avatar(15).php
   |   |   |   |   |   |--call.gif
   |   |   |   |   |   |--bump.small.gif
   |   |   |   |   |   |--avatar(4).php
   |   |   |   |   |   |--shocked.gif
   |   |   |   |   |   |--avatar(24).php
   |   |   |   |   |   |--none.gif
   |   |   |   |   |   |--putong.png
   |   |   |   |   |   |--pn_post.png
   |   |   |   |   |   |--faq.gif
   |   |   |   |   |   |--at.js.下载
   |   |   |   |   |   |--avatar(3).php
   |   |   |   |   |   |--hot_2.gif
   |   |   |   |   |   |--024515qk9jvria44ysd322.png
   |   |   |   |   |   |--wenshen.gif
   |   |   |   |   |   |--avatar(12).php
   |   |   |   |   |   |--meilihao.gif
   |   |   |   |   |   |--fav.gif
   |   |   |   |   |   |--2018new_leimu_org.png
   |   |   |   |   |   |--094711mzmnonn42d200dmv.gif
   |   |   |   |   |   |--rec_add.gif
   |   |   |   |   |   |--home.php
   |   |   |   |   |   |--avatar(13).php
   |   |   |   |   |   |--titter.gif
   |   |   |   |   |   |--smilies.js.下载
   |   |   |   |   |   |--star_level1.gif
   |   |   |   |   |   |--2018new_xinsui_thumb.png
   |   |   |   |   |   |--avatar(2).php
   |   |   |   |   |   |--smile.gif
   |   |   |   |   |   |--oshr.png
   |   |   |   |   |   |--2018new_ku_org.png
   |   |   |   |   |   |--star_level3.gif
   |   |   |   |   |   |--saved_resource(1).html
   |   |   |   |   |   |--oculus.css
   |   |   |   |   |   |--avatar(25).php
   |   |   |   |   |   |--style_14_widthauto.css
   |   |   |   |   |   |--star_level2.gif
   |   |   |   |   |   |--10.png
   |   |   |   |   |   |--biggrin.gif
   |   |   |   |   |   |--qq_big.gif
   |   |   |   |   |   |--2018new_doge02_org.png
   |   |   |   |   |   |--guanliyuan.png
   |   |   |   |   |   |--kx.png
   |   |   |   |   |   |--checkonline.small.gif
   |   |   |   |   |   |--print.png
   |   |   |   |   |   |--fengche.gif
   |   |   |   |   |   |--avatar.php
   |   |   |   |   |   |--2018new_shuai_thumb.png
   |   |   |   |   |   |--2018new_jiyan_org.png
   |   |   |   |   |   |--common_smilies_var.js.下载
   |   |   |   |   |   |--shaoshao.gif
   |   |   |   |   |   |--logo3.png
   |   |   |   |   |   |--hug.gif
   |   |   |   |   |   |--nc.css
   |   |   |   |   |   |--tongue.gif
   |   |   |   |   |   |--avatar(10).php
   |   |   |   |   |   |--funk.gif
   |   |   |   |   |   |--lol.gif
   |   |   |   |   |   |--xiong.gif
   |   |   |   |   |   |--2018new_zuoyi_org.png
   |   |   |   |   |   |--dizzy.gif
   |   |   |   |   |   |--forum_viewthread.js.下载
   |   |   |   |   |   |--zrt_lookup.html
   |   |   |   |   |   |--avatar(1).php
   |   |   |   |   |   |--hm.js.下载
   |   |   |   |   |   |--seditor.js.下载
   |   |   |   |   |   |--avatar(11).php
   |   |   |   |   |   |--2018new_ye_thumb.png
   |   |   |   |   |   |--kele.gif
   |   |   |   |   |   |--ajax.js.下载
   |   |   |   |   |   |--fj_btn.png
   |   |   |   |   |   |--style.css
   |   |   |   |   |   |--sad.gif
   |   |   |   |   |   |--cry.gif
   |   |   |   |   |   |--f.txt
   |   |   |   |   |   |--yinghua.gif
   |   |   |   |   |   |--sweat.gif
   |   |   |   |   |   |--loveliness.gif
   |   |   |   |   |   |--2018new_chongjing_org.png
   |   |   |   |   |   |--style_14_forum_viewthread.css
   |   |   |   |   |   |--2018new_wu_thumb.png
   |   |   |   |   |   |--victory.gif
   |   |   |   |   |   |--f(1).txt
   |   |   |   |   |   |--collection.png
   |   |   |   |   |   |--qq_share.png
   |   |   |   |   |   |--avatar(16).php
   |   |   |   |   |   |--2018new_guolai_thumb.png
   |   |   |   |   |   |--2018new_touxiao_org.png
   |   |   |   |   |   |--colorroger2018 .gif
   |   |   |   |   |   |--2018new_ruo_thumb.png
   |   |   |   |   |   |--pn_reply.png
   |   |   |   |   |   |--2018new_nu_thumb.png
   |   |   |   |   |   |--avatar(7).php
   |   |   |   |   |   |--curse.gif
   |   |   |   |   |   |--forum.js.下载
   |   |   |   |   |   |--shy.gif
   |   |   |   |   |   |--2018new_good_thumb.png
   |   |   |   |   |   |--2018new_ok_org.png
   |   |   |   |   |   |--huffy.gif
   |   |   |   |   |   |--avatar(20).php
   |   |   |   |   |   |--nu.png
   |   |   |   |   |   |--rec_subtract.gif
   |   |   |   |   |   |--mad.gif
   |   |   |   |   |   |--common.js.下载
   |   |   |   |   |   |--12344279923_utf8.js.下载
   |   |   |   |   |   |--2.jpg
   |   |   |   |   |   |--saved_resource.html
   |   |   |   |   |   |--avatar(21).php
   |   |   |   |   |   |--forumlink.gif
   |   |   |   |   |   |--font_992399_xgto9646zx.css
   |   |   |   |   |   |--oculus_nc.js.下载
   |   |   |   |   |   |--style_14_common.css
   |   |   |   |   |   |--ng.png
   |   |   |   |   |   |--3.jpg
   |   |   |   |   |   |--avatar(6).php
   |   |   |   |   |   |--colorroger2019.gif
   |   |   |   |   |   |--2018new_tianping_thumb.png
   |   |   |   |   |   |--avatar(17).php
   |   |   |   |   |   |--html5notification.js.下载
   |   |   |   |   |   |--ym.png
   |   |   |   |   |   |--arw_r.gif
   |   |   |   |--14-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201809初始套装+1张追加卡)
   |   |   |   |   |--14-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201809初始套装+1张追加卡).html
   |   |   |   |--5-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201902空岛住民+新鱼人海贼团)
   |   |   |   |   |--5-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201902空岛住民+新鱼人海贼团).html
   |   |   |   |--6-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201907恐怖船+象岛)
   |   |   |   |   |--6-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201907恐怖船+象岛).html
   |   |   |   |--talkop_vivire_card_catalog.html
   |   |   |   |--2-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴完全版汉化(201810东海+超新星篇)
   |   |   |   |   |--2-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴完全版汉化(201810东海+超新星篇).html
   |   |   |   |--7-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201904空岛+PH岛)
   |   |   |   |   |--7-【TalkOP汉化】海贼王资料集生命卡ONEPIECE图鉴全图翻译(201904空岛+PH岛).html
   |   |   |--processed_manual_talkop_vivre_card
   |   |   |   |--14-(201809初始套装+1张追加卡)-entities_avpair.json
   |   |   |   |--11-(201906水之都CP9+德岛竞技场)-predicate_key_list.txt
   |   |   |   |--12-(201903初始套装Vol2-16张主卡)-entities_avpair.json
   |   |   |   |--preprocessed-5-(201902空岛住民+新鱼人海贼团).txt
   |   |   |   |--13-(201811可可亚西村+大监狱)-entities_id_name_list.txt
   |   |   |   |--preprocessed-8-(201905水之都+德岛).txt
   |   |   |   |--file_prefix.json
   |   |   |   |--preprocessed-4-(201908杰尔马66+大妈团).txt
   |   |   |   |--skill.txt
   |   |   |   |--7-(201904空岛+PH岛)-entities_id_name_list.txt
   |   |   |   |--6-(201907恐怖船+象岛)-entities_avpair.json
   |   |   |   |--8-(201905水之都+德岛)-predicate_key_list.txt
   |   |   |   |--13-(201811可可亚西村+大监狱).txt
   |   |   |   |--13-(201811可可亚西村+大监狱)-predicate_key_list.txt
   |   |   |   |--9-(201812阿拉巴斯坦+白胡子海贼团)-entities_avpair.json
   |   |   |   |--preprocessed-7-(201904空岛+PH岛).txt
   |   |   |   |--9-(201812阿拉巴斯坦+白胡子海贼团)-entities_id_name_list.txt
   |   |   |   |--9-(201812阿拉巴斯坦+白胡子海贼团)-predicate_key_list.txt
   |   |   |   |--preprocessed-10-(201901鱼人岛居民+巴洛克社).txt
   |   |   |   |--preprocessed-6-(201907恐怖船+象岛).txt
   |   |   |   |--6-(201907恐怖船+象岛)-entities_id_name_list.txt
   |   |   |   |--preprocessed-12-(201903初始套装Vol2-16张主卡).txt
   |   |   |   |--summary_predicate_set.txt
   |   |   |   |--4-(201908杰尔马66+大妈团)-entities_id_name_list.txt
   |   |   |   |--14-(201809初始套装+1张追加卡)-entities_id_name_list.txt
   |   |   |   |--14-(201809初始套装+1张追加卡).txt
   |   |   |   |--ntriples_talkop_vivre_card.nt
   |   |   |   |--9-(201812阿拉巴斯坦+白胡子海贼团).txt
   |   |   |   |--3-(201810东海的猛者们+超新星集结)-predicate_key_list.txt
   |   |   |   |--8-(201905水之都+德岛).txt
   |   |   |   |--5-(201902空岛住民+新鱼人海贼团).txt
   |   |   |   |--6-(201907恐怖船+象岛)-predicate_key_list.txt
   |   |   |   |--11-(201906水之都CP9+德岛竞技场)-entities_id_name_list.txt
   |   |   |   |--10-(201901鱼人岛居民+巴洛克社)-entities_id_name_list.txt
   |   |   |   |--12-(201903初始套装Vol2-16张主卡).txt
   |   |   |   |--4-(201908杰尔马66+大妈团)-entities_avpair.json
   |   |   |   |--8-(201905水之都+德岛)-entities_id_name_list.txt
   |   |   |   |--7-(201904空岛+PH岛)-entities_avpair.json
   |   |   |   |--5-(201902空岛住民+新鱼人海贼团)-predicate_key_list.txt
   |   |   |   |--8-(201905水之都+德岛)-entities_avpair.json
   |   |   |   |--summary_entities_id_name_list.txt
   |   |   |   |--3-(201810东海的猛者们+超新星集结)-entities_avpair.json
   |   |   |   |--5-(201902空岛住民+新鱼人海贼团)-entities_id_name_list.txt
   |   |   |   |--10-(201901鱼人岛居民+巴洛克社)-predicate_key_list.txt
   |   |   |   |--preprocessed-13-(201811可可亚西村+大监狱).txt
   |   |   |   |--4-(201908杰尔马66+大妈团)-predicate_key_list.txt
   |   |   |   |--preprocessed-9-(201812阿拉巴斯坦+白胡子海贼团).txt
   |   |   |   |--vizdata_vivrecard_avpair.json
   |   |   |   |--4-(201908杰尔马66+大妈团).txt
   |   |   |   |--3-(201810东海的猛者们+超新星集结)-entities_id_name_list.txt
   |   |   |   |--11-(201906水之都CP9+德岛竞技场)-entities_avpair.json
   |   |   |   |--7-(201904空岛+PH岛).txt
   |   |   |   |--preprocessed-11-(201906水之都CP9+德岛竞技场).txt
   |   |   |   |--3-(201810东海的猛者们+超新星集结).txt
   |   |   |   |--preprocessed-14-(201809初始套装+1张追加卡).txt
   |   |   |   |--5-(201902空岛住民+新鱼人海贼团)-entities_avpair.json
   |   |   |   |--10-(201901鱼人岛居民+巴洛克社)-entities_avpair.json
   |   |   |   |--12-(201903初始套装Vol2-16张主卡)-entities_id_name_list.txt
   |   |   |   |--14-(201809初始套装+1张追加卡)-predicate_key_list.txt
   |   |   |   |--6-(201907恐怖船+象岛).txt
   |   |   |   |--11-(201906水之都CP9+德岛竞技场).txt
   |   |   |   |--12-(201903初始套装Vol2-16张主卡)-predicate_key_list.txt
   |   |   |   |--7-(201904空岛+PH岛)-predicate_key_list.txt
   |   |   |   |--13-(201811可可亚西村+大监狱)-entities_avpair.json
   |   |   |   |--10-(201901鱼人岛居民+巴洛克社).txt
   |   |--avpair2ntriples_talkop_vivre_card.py
   |--.gitignore
   |--.gitattributes

Project introduction

OnePiece-KG is a knowledge graph project built on One Piece domain data.

The contents of this project include data collection, knowledge storage, knowledge extraction, knowledge computing and knowledge application

  1. Data collection

    This project collects and constructs two knowledge graphs and one relation extraction data set:

    • Character knowledge graph: contains attribute information about each character
    • Relation extraction data set: annotates the entities that appear in natural language text and the relations between them
    • Entity relationship knowledge graph: a knowledge graph of the relationships between entities in One Piece
  2. Knowledge storage

    We tried both the RDF triple store Apache Jena and the native graph database Neo4j, querying the knowledge graph with the RDF structured query language SPARQL and the property-graph query language Cypher, respectively.

  3. Knowledge extraction

    Based on the relation extraction data set we built, we used the tools provided by DeepKE to practice relation extraction and tested models including PCNN, GCN and BERT on our data set.

  4. Knowledge computing

    • Graph computation: graph mining on the entity relationship knowledge graph in Neo4j, including shortest path queries, authority node discovery, community detection, etc.
    • Knowledge reasoning: performed knowledge reasoning on the relation knowledge graph in Apache Jena and completed part of the missing data
  5. Knowledge application

    • Intelligent question answering: a knowledge base question answering (KBQA) system for One Piece characters, based on REfO.
    • Visualization: entity relationship graphs visualized with D3, integrated with information from the character knowledge graph for display.

Annotated directory tree explaining what each part does

|--visualization                       # knowledge graph visualization
|   |--html
|--vivirecard-KB_query                 # intelligent question answering
|   |--external_dict
|   |--data
|   |   |--talkop_vivre_card
|--cndbpedia                           # raw semi-structured entry data
|   |--data
|--deepke-master                       # PyTorch-based deep-learning Chinese relation extraction toolkit
|   |--test
|   |--images
|   |--module
|   |   |--__pycache__
|   |--utils
|   |   |--__pycache__
|   |--models
|   |   |--__pycache__
|   |--__pycache__
|   |--.github
|   |   |--ISSUE_TEMPLATE
|   |--tutorial-notebooks
|   |   |--img
|   |   |--data
|   |--data
|   |   |--vivrecard
|   |   |   |--alignment
|   |   |   |--origin
|   |   |   |--annot
|   |   |   |   |--outputs
|   |   |   |--raw
|   |   |   |--summary
|   |--conf
|   |   |--model
|   |   |--hydra
|   |   |   |--output
|   |--pretrained
|--docs                                # documentation
|   |--images
|--talkop                              # semi-structured knowledge acquisition from the TalkOP forum
|   |--preprocess_vivre_card
|   |--data
|   |   |--original_manual_talkop_vivre_card
|   |   |--talkop_vivre_card_webpage
|   |   |--processed_manual_talkop_vivre_card

1. Data collection

1.1. Sources of data collection

There are two main sources of data:

  • Obtain existing knowledge information from other knowledge graphs
  • Crawl and parse semi-structured natural language text information from relevant web pages

For the character knowledge graph we use CN-DBpedia, developed and maintained by the Knowledge Works Laboratory at Fudan University, as the source of general-domain structured encyclopedia knowledge.

1.2. Building the character knowledge graph

1.2.1. Construct entity vocabulary (names/places, etc.)

The raw entry data (a navigation template from Moegirlpedia) lives in cndbpedia/data/raw_moegirl_onepiece_entries.txt:

| group1 = Straw Hat Pirates
| list1 = [[Monkey D. Luffy]] • [[Roronoa Zoro]] • [[Nami (One Piece)|Nami]] • [[Usopp]] • [[Sanji]] • [[Tony Tony Chopper]] • [[Nico Robin]] • [[Franky]] • [[Brook]]
| group2 = Seven Warlords of the Sea
| list2 = {{Navbox subgroup | groupstyle = background:#00FFFF;
  | group1 = Current
  | list1 = [[Dracule Mihawk]] • [[Bartholomew Kuma]] • [[Boa Hancock]] • [[Buggy]] • [[Edward Weevil]] ...

Parse command:

python cndbpedia/parse_raw_moegirl_onepiece_entries.py

The results are written to cndbpedia/data/processed_moegirl_onepiece_entries.txt, 509 entries in total:

Baby-5  G1 branch  G8 fortress  Miss Doublefinger  Miss Valentine  Miss Monday  Miss Friday  Miss Father's Day  Mr.11  Mr.13  Mr.4  Mr.5  Mr.7  Mr.9  T-Bone  X Drake  Water Seven  Triangle Current  World Government  East Lee  East Git  ...
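The core of parse_raw_moegirl_onepiece_entries.py is pulling the entry names out of the [[...]] wiki links in the navigation template. A minimal sketch of that idea, where the function and regex are my own illustration rather than the project's actual code:

import re

def extract_entries(raw_text):
    """Pull entry names out of MediaWiki-style [[target|display]] links."""
    entries = []
    for link in re.findall(r"\[\[(.+?)\]\]", raw_text):
        # For piped links such as [[Nami (One Piece)|Nami]],
        # keep the display name after the last pipe.
        entries.append(link.split("|")[-1].strip())
    # Deduplicate while preserving insertion order.
    return list(dict.fromkeys(entries))

with open("cndbpedia/data/raw_moegirl_onepiece_entries.txt", encoding="utf-8") as f:
    names = extract_entries(f.read())
print(len(names))  # the article reports 509 entries after parsing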
1.2.2. Get the list of entities (names/places, etc.)

Using the API provided by Knowledge Works, for example: shuyantech.com/api/cndbped… It returns the candidate entity names for a mention (here, five spelling variants of Monkey D. Luffy):

{
    "status": "ok",
    "ret": [
        "Monkey D. Luffy", "Monkey D. Luffy", "Monkey D. Luffy", "Monkey D. Luffy", "Monkey D. Luffy"
    ]
}

A script then runs through every name in the entity vocabulary built above:

python cndbpedia/get_onepiece_cndbpedia_entities.py

A total of 1014 different entity names are retrieved, and the results are stored in the cndbpedia/data folder:

  • cndbpedia_onepiece_entities_list.txt: saves all the entity names that were found, with their disambiguation suffixes, for example:
Edward Newgate (character in the manga "One Piece")   Aisha (character in "One Piece")
  • moelgirl_cndbpedia_entities_mapping.json: maps each mention to its candidate entity names (distinct Chinese spellings collapse to the same English rendering here), for example:
"Edward Newgate": ["Edward Newgate", "Edward Newgate"],
"Aisha": ["Aisha Chida", "Aisha", "Aisha", "Aisha", "Aisha"],
1.2.3. Filter the list of entities (names/places, etc.)

Many different entities share the same name, for example:

{
    "status": "ok",
    "ret": [
        "Kelly Brook",
        "Brook (2010 Spanish film)",
        "Brook",
        "Brook (advertising strategist)",
        "Brook",
        "Brook (song by Wen Liming)",
        "Brook",
        "Brook"
    ]
}

With so many Brooks, we only want the One Piece one. By default, an entity whose name or description contains keywords such as onepiece, ONE PIECE, 海贼 (pirate), 航海 (voyage), 动画 (animation) or 漫画 (manga) is treated as target data:

python cndbpedia/filter_moelgirl_cndbpedia_entities_mapping_file.py

The results are likewise saved in the cndbpedia/data folder.
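The filtering itself boils down to a keyword test over each candidate entity name. A minimal sketch, with an illustrative keyword list and helper name (the project's actual defaults are the ones listed above):

# Illustrative keyword list; the project's defaults include terms such as
# onepiece, 海贼 (pirate), 航海 (voyage), 动画 (animation), 漫画 (manga).
KEYWORDS = ("onepiece", "ONEPIECE", "海贼", "航海", "动画", "漫画")

def is_target(entity_name):
    """Keep only entity names that look related to One Piece."""
    return any(kw in entity_name for kw in KEYWORDS)

mapping = {"布鲁克": ["凯莉·布鲁克", "布鲁克(《海贼王》中的人物)", "布鲁克(广告策划师)"]}
filtered = {m: [e for e in ents if is_target(e)] for m, ents in mapping.items()}
print(filtered)  # {'布鲁克': ['布鲁克(《海贼王》中的人物)']}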

The outcome: 162 of the 509 entries have no corresponding entity name; these entries are stored in moelgirl_cndbpedia_api_no_results_mention_name_list.txt. For example, shuyantech.com/api/cndbped… returns no entity name:

{
    "status": "ok",
    "ret": []
}
  • There are 11 entries that do return entity names, but none that match our preset keywords; they are saved in filter_out_entities_mapping.json

Such as shuyantech.com/api/cndbped…

{
    "status": "ok",
    "ret": [
        "Maria Callas",
        "Callas",
        "Callas (fictional character in StarCraft II)",
        "Kallas (Czech football player)"
    ]
}
  • The remaining 336 entries all have corresponding entity names, 357 in total, since one mention can map to several entities

List: query_avpair_entities_list.txt

Baby-5
Miss Valentine
Mr.1
Mr.11
Mr.13
...

Dictionary: query_avpair_entities_mapping.json

{
    "Baby-5": [
        "baby-5"
    ],
    "Miss Valentine": [
        "miss valentine"
    ],
    "Mr.11": [
        "mr.11"
    ],
    "Mr.13": [
        "mr.13"
    ],
    ...
1.2.4. Obtain the triplet knowledge of the corresponding entities in the graph

We again use the API provided by Knowledge Works to fetch the triple knowledge of each entity in the graph, based on the query_avpair_entities_list.txt list from the previous step:

python cndbpedia/get_onepiece_cndbpedia_avpair.py

The results are saved in cndbpedia/data/query_avpair_cndbpedia_onepiece_results.json, for example:

..."Munch D. Luffy": {
        "Munch D. Luffy.": {
            "Chinese name": "Munch D. Luffy"."Name in Foreign Language": "Monkey D. Luffy"."Other names": "The Fifth Emperor of the Sea."."The voice": "Yang Tianxiang (Mainland China)"."Entry works": "ROMANCE DAWN comics."."Birthday": "May 5th (Children's Day in Japan)"."Age": "17 to 19."."Gender": "Male"."Type": "F"."Height": "172 cm to 174 cm"."Weight": "45kg".Devil's Fruit: "Rubber fruit"."Home": "East Sea - Kingdom of Goya - Windmill Village."."Representative district": "Okinawa prefecture"."Represents animals.": "Monkey"."Represents the color.": "Red"."Represents the number": "01, 56 (Japanese 56 is the same as rubber), 09"."On behalf of the nation.": "Brazil"."For flowers.": Cosmos.."Reward money": "1.5 billion berry."."Favorite island.": "Islands with meat."."The season of love": "Summer"."Favorite foods": "Of all the delicacies, meat is the first."."Lousy food.": "Cherry Pie on The island of Gaya"."Body smell": "The smell of roast meat."."CATEGORY_ZH": "Character"."DESC": "Monky D. Luffy" Luffy "Straw Hat" luffy, the main character in the Japanese comic book "Aquaman" and its derivatives, captain of the Straw Hat Crew and the Straw Hat Ship Crew, one of the most evil generations. The rubber man of the rubber fruit power man, a reward of 1.5 billion berry. The dream is to find the legendary One Piece and become the King of One Piece. \n Luffy positive and optimistic personality, love and hate clearly, and attaches great importance to partners, unwilling to be inferior to others, any dangerous things are super interested in. Unlike other traditional pirates, he does not kill in pursuit of wealth but enjoys the adventure and freedom of being a pirate.}},...Copy the code

cndbpedia/data/query_avpair_keys_list_file.txt holds the list of all attribute names.

1.2.5. Extract semi-structured knowledge from web pages

Vivre Card is the official One Piece character databook and contains rich character information. Chinese fans have translated it and posted it on the TalkOP forum.

Extraction process

Because the format is fairly fixed, we use direct pattern matching (regular expressions). The workflow is: manually delete irrelevant text, collect the data, match it with regexes, and verify manually, looping over these steps and adjusting the template for each predicate:

cd talkop
python parse_processed_manual_talkop_vivre_card.py

The results are stored in the talkop/data/processed_manual_talkop_vivre_card folder; each page yields three output files:

  • xxx-predicate_key_list.txt: all the predicates obtained by parsing

  • xxx-entities_id_name_list.txt: all the parsed IDs and entity names

  • xxx-entities_avpair.json: the extracted attribute knowledge of every entity, saved as JSON
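The parsing step is essentially key-value pattern matching. A minimal sketch, assuming attribute lines of the form 【key】value; the real templates in parse_processed_manual_talkop_vivre_card.py are tuned predicate by predicate:

import re

# Assumed line format: 【attribute】value, e.g. 【身高】174cm
AVPAIR_RE = re.compile(r"【(?P<key>[^】]+)】(?P<value>.*)")

def parse_avpairs(lines):
    """Collect attribute-value pairs from one entity's text block."""
    avpair = {}
    for line in lines:
        m = AVPAIR_RE.match(line.strip())
        if m:
            avpair[m.group("key")] = m.group("value").strip()
    return avpair

print(parse_avpairs(["【身高】174cm", "【生日】5月5日"]))
# {'身高': '174cm', '生日': '5月5日'}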

Summary results

The steps above extracted per-entity attribute information from each web page; now we aggregate this information:

cd talkop
python summary_talkop_vivre_card.py

This yields 660 different entities and 164 different predicates.

Two summary files are stored in the talkop/data/processed_manual_talkop_vivre_card folder:

  • summary_predicate_set.txt: The summary of all the predicates
  • summary_entities_id_name_list.txt: The summary of all extracted entity names and corresponding ids

1.3. Building the relation extraction data set

  • Annotated data source: in the character knowledge graph built above, an important attribute is the history information, which records each character's timeline in the story and the corresponding events. Since a character's history mentions interactions with other entities, we can use it to build a relation extraction data set for our vertical domain

  • Annotation tool: Sprite Annotation Assistant

  • Construction method: Build from the bottom up, and build the schema of the whole graph step by step during the construction process

  • Data annotation format: the Sprite Annotation Assistant exports JSON in the form below, where T and R hold the annotated entity and relation information respectively

    { "content": "xxxx" "labeled": true, "outputs": { "annotation": { "A": [""], "E": [""], "R": ["",{ "arg1": Arg1, arg2 ":" arg2 ", "from" : 1, "name" : "to", "to" : 2}], "T" : [", "{" attributes" : [], "the end" : 7, "id" : 1, "name" : "People", "start" : 0, "type" : "T", "value" : "his d. luffy"},]}}, "path" : "D:\\annot\\fuseki_vivrecard_sentence_item.txt", "time_labeled": 1578072175246 }Copy the code
  • Data storage locations: the raw data to be annotated is stored in deepke-master/data/vivrecard/raw/fuseki_vivrecard_sentence_item.txt, and the raw annotation results are saved in deepke-master/data/vivrecard/annot/outputs/fuseki_vivrecard_sentence_item.json.

    To make it easier for the relation extraction model to process later, we converted the annotated data into the format expected by the DeepKE project (see the sketch below)

    and stored it in deepke-master/data/vivrecard/origin
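Conceptually, the conversion walks each annotated relation, looks up its two argument entities, and emits one training row. A minimal sketch of that idea; the column names follow my reading of DeepKE's origin/*.csv format and should be treated as an assumption:

import csv
import json

def annot2rows(annot_file):
    """Turn one exported annotation JSON into (sentence, relation, head, tail) rows."""
    with open(annot_file, encoding="utf-8") as f:
        doc = json.load(f)
    sent = doc["content"]
    ann = doc["outputs"]["annotation"]
    entities = {t["id"]: t for t in ann["T"] if isinstance(t, dict)}
    rows = []
    for r in ann["R"]:
        if not isinstance(r, dict):
            continue  # skip the empty placeholder strings
        head, tail = entities[r["from"]], entities[r["to"]]
        rows.append([sent, r["name"], head["value"], head["start"],
                     tail["value"], tail["start"]])
    return rows

with open("train.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["sentence", "relation", "head", "head_offset",
                     "tail", "tail_offset"])
    writer.writerows(annot2rows("fuseki_vivrecard_sentence_item.json"))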

Data set statistics
  • Entity types: seven in total: 'event', 'organization', 'ship', 'place', 'position', 'devil fruit', 'person'

  • Relationship types: 22 relationships in total

    head_type     tail_type     relation         index  freq
    None          None          None             0      0
    person        event         participate      1      36
    person        person        alliance         2      1
    person        person        couple           3      3
    person        person        fight            4      38
    person        person        mother           5      3
    person        person        father           6      4
    person        person        teacher          7      6
    person        person        meet             8      100
    person        place         birthplace       9      3
    person        place         visited          10     145
    person        devil fruit   possesses fruit  11     10
    person        organization  found            12     23
    person        organization  join             13     66
    person        organization  belong to        14     38
    person        organization  fight            15     20
    person        organization  leave            16     18
    person        organization  meet             17     14
    person        organization  lead             18     15
    person        position      hold             19     70
    person        ship          build            20     2
    organization  organization  fight            21     1

    The frequency bar chart of these relations is shown in the figure below; they clearly follow a long-tailed distribution

  • Number of positive training samples: 616

1.4. Construction of entity relationship knowledge graph

During the annotation of the relation extraction data set, the annotated entities and relations are exported separately to build the One Piece entity relationship data set

In the above process, 307 different entities and 569 relationships between different nodes were labeled

cd deepke-master
python utils/convert_vivrecard2deepke.py

The exported entity relationship data is stored in deepke-master/data/vivrecard/summary/vizdata_vivrecard_relation.json and can be used later for knowledge graph visualization; see the visualization section for details

2. Knowledge storage

2.1. RDF triple store: Apache Jena

2.1.1. About Jena

Jena is an Apache top-level project, formerly the Jena toolkit developed by HP Labs. It is a major open-source framework and RDF triple store in the Semantic Web field. It follows W3C standards well, and its functions include RDF data management, RDFS and OWL ontology management, SPARQL query processing, etc. Jena has a native storage engine for disk- or memory-based storage management of RDF triples, as well as a rule-based inference engine that performs RDFS and OWL ontology reasoning tasks.

2.1.2. Project practice

avpair to triple

Taking the Vivre Card character attribute knowledge graph as an example, we first convert the previously obtained data into the N-Triples format that Jena can parse, with a namespace prefix

cd talkop
python avpair2ntriples_talkop_vivre_card.py

The exported N-Triples data is stored in talkop/data/processed_manual_talkop_vivre_card/ntriples_talkop_vivre_card.nt: 14,055 triples in total, of which 12,863 are non-empty
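The conversion itself is mechanical: each attribute-value pair becomes one triple under the project's namespace prefix. A minimal sketch with simplified URI escaping; the helper is illustrative, not the project's code:

from urllib.parse import quote

PREFIX = "http://kg.course/talkop-vivre-card/"

def avpair_to_ntriples(entity_id, avpair):
    """Serialize one entity's attribute-value pairs as N-Triples lines."""
    subj = "<{}{}>".format(PREFIX, quote(entity_id))
    lines = []
    for key, value in avpair.items():
        if not value:  # empty values produce the "empty" triples the article counts
            continue
        pred = "<{}{}>".format(PREFIX, quote(key))
        obj = '"{}"'.format(value.replace('"', '\\"'))
        lines.append("{} {} {} .".format(subj, pred, obj))
    return lines

print("\n".join(avpair_to_ntriples("0001", {"身高": "174cm", "生日": "5月5日"})))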

NOTE:

  1. While building the project, we also converted the knowledge obtained from CN-DBpedia into N-Triples format, with the namespace prefix <http://kg.course/onepiece/>

    python cndbpedia/avpair2ntriples_onepiece_cndbpedia.py

    The results are saved in cndbpedia/data/ntriples_cndbpedia_onepiece.nt, 4,691 triples in total

Starting Fuseki

Following the documentation provided by Chen Huajun: github.com/zjunlp/kg-c…

Configure Fuseki and upload the data set for query

2.1.3. SPARQL Query Example

SPARQL is the W3C-standard query language for RDF knowledge graphs, with syntax borrowed from SQL. The basic unit of a SPARQL query is the triple pattern; multiple triple patterns form a basic graph pattern. SPARQL supports several operators that extend basic graph patterns into complex graph patterns. SPARQL 1.1 introduced the property path mechanism to support navigational queries over RDF graphs.

The following is an example of using SPARQL to query on a database we built

  1. Query the heights of the first five characters
PREFIX : <http://kg.course/talkop-vivre-card/>
select ?s ?name ?zhname ?height ?o
where {
    ?s ?height ?o .
    FILTER(?height = :身高)
    OPTIONAL { ?s :名字 ?name . }
    OPTIONAL { ?s :外文名 ?zhname . }
}
limit 5

The results:

"S", "name", "height", "o", ":0001", "[monky D Luffy]", "Monkey D Luffy", ": height", "174cm", ":0004", "Usopp", "Usopp", ": height ", "174cm", ":0511", "Qiao Ali Bonney", "Jewelry Bonney", ": Height"," "174cm", ":0002", "[Roronoa Zoro]", "Roronoa Zoro", ": height ", "181cm", ":0224", "Hina", ": height ", "181cm",Copy the code
  2. Filter by birthday range
PREFIX : <http://kg.course/talkop-vivre-card/>
select ?s ?name ?o
where {
    ?s :生日 ?o .
    ?s :名字 ?name .
    filter(?o > '4月1日' && ?o < '5月1日')
}
limit 5

The results:

"S", "name", "o", ":" 0009, "[brooke/Brook]", "on April 3," : "0660", "[burr jamie/Porchemy]", "on April 3," : "0010", "" very flat/Jinbe 】 【, On April 2, ":", "0076", "[ZheFu/Zeff]", "on April 2," : "0028", "[g than/Koby]", "on May 13,"Copy the code

2.2. Native graph database: Neo4j

2.2.1. About Neo4j

Neo4j is a graph database developed by Neo Technology and arguably the most popular graph database product. Neo4j is based on the property graph model, and its storage layer designs dedicated storage schemes for the elements of a property graph: nodes, node properties, edges and edge properties. This makes Neo4j more efficient than relational databases when accessing graph data at the storage layer.

2.2.2. Project practice

relation to triple

Taking the entity relationship knowledge graph as an example, we first convert the previously obtained data about relationships between entities into the N-Triples format that Jena can parse, with a namespace prefix

cd deepke-master
python utils/convert_vivrecard2deepke.py

The exported N-Triples data is stored in deepke-master/data/vivrecard/summary/vivrecard_ntriples.nt, 1,848 triples in total

Starting Neo4j

Neo4j can be downloaded and installed at neo4j.com/download-th…

cd D:\neo4j\bin
neo4j.bat console

Then visit: http://localhost:7474/

The default user name and password are neo4j

2.2.3. Cypher Query Example

Cypher was originally the query language for property graph data implemented in the Neo4j graph database. It is a declarative language: users only declare what to look for, not how to find it.

The following is an example of using Cypher to query on a database we built

  1. Import the data

    CREATE INDEX ON :Resource(uri)
                              
    CALL semantics.importRDF("file:///${PROJECT_PATH}/deepke-master/data/vivrecard/summary/vivrecard_ntriples.nt","N-Triples")
  2. Look at the schema

    call db.schema()

    After hiding the Resource label, the schema can be seen clearly

  3. View the first 100 person nodes

    MATCH (n:ns0__人) RETURN n LIMIT 100

  4. Query person nodes whose URI contains 'viv' (Vivi)

    MATCH (n:`ns0__人`) WHERE n.uri CONTAINS 'viv' RETURN n.uri
    n.uri
    "kg.course/talkop-vivr…"
    "kg.course/talkop-vivr…"
    "kg.course/talkop-vivr…"
  5. Find the shortest path between two characters, filtering by name via the URI

    MATCH p=shortestPath((n1)-[*]-(n2)) WHERE n1.uri CONTAINS '斯摩格' AND n2.uri CONTAINS '罗宾' RETURN p

  6. Find the 4-hop paths between Enies Lobby and Dressrosa, filtering by name

    // 9312 Drophy (Miss Merry Christmas)
    // 9306 Benn Beckman
    MATCH p=((n1)-[*4]-(n2)) WHERE n1.uri CONTAINS '司法岛' AND n2.uri CONTAINS '德雷斯罗萨' RETURN p

    You can see that there are some loop cases, where the same node appears twice in the path

3. Knowledge extraction

DeepKE is a PyTorch-based deep learning toolkit for Chinese relation extraction. In this part, we use the relation extraction data set built earlier together with DeepKE to practice Chinese relation extraction.

3.1. Data conversion & Annotation statistics

In this section, we need to complete the following three parts:

  1. Convert our annotation results into the format DeepKE accepts
  2. Shuffle the data randomly to ensure an even distribution of relations
  3. Split the data into training, test and validation sets, currently at a 7:2:1 ratio (a sketch of steps 2 and 3 follows this list)
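Steps 2 and 3 amount to a seeded shuffle followed by a 7:2:1 slice. A minimal sketch; the helper is illustrative, and the real logic lives in convert_vivrecard2deepke.py:

import random

def split_dataset(samples, seed=42):
    """Shuffle, then split 7:2:1 into train/test/valid."""
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train, n_test = int(n * 0.7), int(n * 0.2)
    return (samples[:n_train],                   # train
            samples[n_train:n_train + n_test],   # test
            samples[n_train + n_test:])          # valid

train, test, valid = split_dataset(list(range(616)))
print(len(train), len(test), len(valid))  # 431 123 62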

Use deepke-master/utils/convert_vivrecard2deepke.py for data format conversion

cd deepke-master
python utils/convert_vivrecard2deepke.py

The output

There are a total of 616 positive training samples, among which train, test and valid are 431/123/62 respectively

The output files are saved in the origin and summary folders under deepke-master/data/vivrecard/

├── annot
│   └── outputs
├── origin
│   ├── train.csv
│   ├── test.csv
│   ├── valid.csv
│   └── relation.csv
└── summary
    ├── all_sent.txt                  # all sentences
    ├── annot_entity_sent.txt         # sentences with entity annotations
    ├── annot_relation_sent.txt       # sentences with relation annotations
    ├── entities_type_name_dict.json  # all entity types and names in the annotated data
    ├── unannot_entity_sent.txt       # sentences without entity annotations
    └── unannot_relation_sent.txt     # sentences without relation annotations

3.2. Training

During training we tried the PCNN, RNN, GCN, Capsule, Transformer and BERT models provided by DeepKE, setting epoch to 50 and adjusting num_relations to match our data set. Note that for the BERT-based language model, the pre-trained model must be downloaded from the relevant pages before training.

The new data set has 22 relation types (including None), so num_relations is set to 22:

cd deepke-master

python main.py show_plot=False data_path=data/vivrecard/origin out_path=data/vivrecard/out num_relations=22 epoch=50 model=cnn

python main.py show_plot=False data_path=data/vivrecard/origin out_path=data/vivrecard/out num_relations=22 epoch=50 model=rnn 

python main.py show_plot=False data_path=data/vivrecard/origin out_path=data/vivrecard/out num_relations=22 epoch=50 model=gcn

python main.py show_plot=False data_path=data/vivrecard/origin out_path=data/vivrecard/out num_relations=22 epoch=50 model=capsule

python main.py show_plot=False data_path=data/vivrecard/origin out_path=data/vivrecard/out num_relations=22 epoch=50 model=transformer

# lm bert layer=1
python main.py show_plot=False data_path=data/vivrecard/origin out_path=data/vivrecard/out num_relations=22 epoch=50 model=lm lm_file=~/ZJU_study/Knowledge_Graph/deepke/pretrained/ num_hidden_layers=1

# lm bert layer=2
python main.py show_plot=False data_path=data/vivrecard/origin out_path=data/vivrecard/out num_relations=22 epoch=50 model=lm lm_file=~/ZJU_study/Knowledge_Graph/deepke/pretrained/ gpu_id=0 num_hidden_layers=2


# lm bert layer=3
python main.py show_plot=False data_path=data/vivrecard/origin out_path=data/vivrecard/out num_relations=22 epoch=50 model=lm lm_file=/home/zenghao/ZJU_study/Knowledge_Graph/deepke/pretrained/ gpu_id=1 num_hidden_layers=3

3.3. Training results

        PCNN   RNN    GCN    CAPSULE  TRANSFORMER  LM(BERT) LAYER=1  LM(BERT) LAYER=2  LM(BERT) LAYER=3
VALID   80.11  83.87  55.91  75.27    82.26        89.79             90.86             89.78
TEST    86.18  85.64  63.15  82.66    86.18        91.87             91.33             92.14

The BERT-based language model clearly performs best, noticeably ahead of the other models, while GCN performs worst. This shows that a pre-trained language model can still extract good features from small-scale data.

However, when later predicting on real data, we found that the language model generalized worse than the PCNN model.

We suspect that, because of the long-tailed distribution in our data, the models may tend to cheat by predicting the most frequent relations in order to boost accuracy.

4. Knowledge computing

4.1. Graph computation

A very important feature of knowledge graph is its graph structure. The structure of different entities itself contains a lot of implicit information, which can be further mined and used.

In this section, referring to others' practice in similar fields (www.macalester.edu/~abeverid/t…, www.maa.org/sites/defau…), we use the graph algorithms provided by Neo4j to analyze our entity relationship knowledge graph, including shortest paths, pivotal nodes, node centrality and community detection.

4.1.1. Character network analysis

Number of characters

Start simple: how many characters are in the graph?

MATCH (c:`ns0__人`) RETURN count(c)
count(c)
134

Summary statistics

Count how many other characters each character connects to:

MATCH (c:`ns0__人`)-[]->(:`ns0__人`)
WITH c, count(*) AS num
RETURN min(num) AS min, max(num) AS max, avg(num) AS avg_characters, stdev(num) AS stdev
min max avg_characters stdev
1 6 1.8374999999999997 1.1522542572790615

The diameter of the graph (network)

The diameter of the network, i.e. the longest shortest path:

// Find maximum diameter of network
// maximum shortest path between two nodes
MATCH (a:`ns0__人`), (b:`ns0__人`) WHERE id(a) > id(b)
MATCH p=shortestPath((a)-[*]-(b))
RETURN length(p) AS len, extract(x IN nodes(p) | split(x.uri, 'http://kg.course/talkop-vivre-card/deepke')[-1]) AS path
ORDER BY len DESC LIMIT 4
len  path
10   ["/person/Klahadore", "/position/butler", "/person/Kuro", "/position/captain", "/person/Jinbe", "/event/Paramount War", "/person/Tina", "/event/World Conference", "/person/Dr. Kureha", "/person/Chopper", "/person/Dr. Hiriluk"]
9    ["/person/Dr. Hiriluk", "/person/Chopper", "/person/Dr. Kureha", "/event/World Conference", "/person/Igaram", "/organization/Straw Hat Pirates", "/person/Crocus", "/place/Grand Line", "/person/Gol D. Roger", "/person/Shiki"]
9    ["/person/Dr. Hiriluk", "/person/Chopper", "/person/Dr. Kureha", "/event/World Conference", "/person/Tina", "/organization/Straw Hat Pirates", "/person/Nami", "/organization/Arlong Pirates", "/person/Mohmoo", "/person/Kabu"]
9    ["/person/Klahadore", "/position/butler", "/person/Kuro", "/position/captain", "/person/Dorry", "/person/Luffy", "/person/Krieg", "/place/Grand Line", "/person/Gol D. Roger", "/person/Shiki"]

We can see that there are many paths of length 9 in the network.

The shortest path

Use Cypher's shortestPath function to find the shortest path between any two characters in the graph. Let's find the shortest path between Crocodile and Galdino (Mr.3):

MATCH p=shortestPath((n1)-[*]-(n2)) WHERE n1.uri CONTAINS '克洛克达尔' AND n2.uri CONTAINS '加尔迪诺' RETURN p

You can also place some restrictions on the nodes in the path, such that the path cannot contain certain types of nodes

MATCH p=shortestPath((n1)-[*]-(n2))
WHERE n1.uri CONTAINS '克洛克达尔' AND n2.uri CONTAINS '加尔迪诺' AND id(n2) > id(n1)
  AND NONE(n IN nodes(p) WHERE n:`ns0__组织`)
RETURN p

A path can contain only certain types of nodes

Example: among all 1-to-3-hop paths from Zoro to Johnny, keep only the paths that pass exclusively through person nodes

MATCH p=(n1)-[*1..3]-(n2)
WHERE n1.uri CONTAINS '索隆' AND n2.uri CONTAINS '乔尼'
  AND all(x IN nodes(p) WHERE 'ns0__人' IN LABELS(x))
RETURN p

All shortest paths

There may be other shortest paths between Crocodile and Galdino, which we can find with Cypher's allShortestPaths function:

MATCH (n1:`ns0__人`), (n2:`ns0__人`)
WHERE n1.uri CONTAINS '克洛克达尔' AND n2.uri CONTAINS '加尔迪诺' AND id(n2) > id(n1)
MATCH p=allShortestPaths((n1)-[*]-(n2))
RETURN p
4.1.2. Key nodes

In a network, a node is called a pivotal node if it lies on every shortest path between two other nodes. Let's identify all the pivotal nodes in the network:

// Find all pivotal nodes in network
MATCH (a:`ns0__人`), (b:`ns0__人`) WHERE id(a) > id(b)
MATCH p=allShortestPaths((a)-[*]-(b)) WITH collect(p) AS paths, a, b
MATCH (c:`ns0__人`) WHERE all(x IN paths WHERE c IN nodes(x)) AND NOT c IN [a,b]
RETURN a.uri, b.uri, c.uri AS PivotalNode SKIP 490 LIMIT 10
a.uri                      b.uri                      PivotalNode
"kg.course/talkop-vivr…"   "kg.course/talkop-vivr…"   "kg.course/talkop-vivr…"
"kg.course/talkop-vivr…"   "kg.course/talkop-vivr…"   "kg.course/talkop-vivr…"
"kg.course/talkop-vivr…"   "kg.course/talkop-vivr…"   "kg.course/talkop-vivr…"
"kg.course/talkop-vivr…"   "kg.course/talkop-vivr…"   "kg.course/talkop-vivr…"

An interesting result appears in the table: Nami and Luffy are pivotal nodes for Sakis and Nojiko. This means that every shortest path linking Sakis and Nojiko has to pass through Nami and Luffy. We can verify this by visualizing all the shortest paths between the two:

MATCH (n1:`ns0__人`), (n2:`ns0__人`)
WHERE n1.uri CONTAINS '萨奇斯' AND n2.uri CONTAINS '诺琪高' AND id(n1) <> id(n2)
MATCH p=shortestPath((n1)-[*]-(n2))
RETURN p
4.1.3. Node centrality

Node centrality gives a relative measure of the importance of nodes in a network. There are many different ways to measure centrality, each representing a different type of “importance.”

Degree Centrality

Degree centrality is the simplest measure: the number of connections a node has in the network. In the One Piece graph, a character's degree centrality is the number of other characters they touch. We compute degree centrality with Cypher:

MATCH (c:`ns0__人`)-[]-()
RETURN split(c.uri, 'http://kg.course/talkop-vivre-card/deepke')[-1] AS character, count(*) AS degree
ORDER BY degree DESC
character          degree
"/person/Luffy"    33
"/person/Tina"     20
"/person/Nami"     19
"/person/Sanji"    15

As seen above, Luffy touches the most characters in the One Piece network. Given that he is the protagonist of the manga, this makes sense.

Betweenness Centrality

Betweenness centrality: the betweenness centrality of a node is the number of shortest paths between pairs of other nodes that pass through it. It is an important metric because it identifies the "information brokers" in the network, the nodes that bridge clusters.

The red nodes in the figure, with high betweenness centrality, are the joints between network clusters.

Calculating betweenness centrality requires installing the ALGO library

CALL algo.betweenness.stream('ns0__人', 'ns1__相遇', {direction:'both'})
YIELD nodeId, centrality
MATCH (user:`ns0__人`) WHERE id(user) = nodeId
RETURN user.uri AS user, centrality
ORDER BY centrality DESC;

CALL algo.betweenness.stream('ns0__人', null, {direction:'both'})
YIELD nodeId, centrality
MATCH (user:`ns0__人`) WHERE id(user) = nodeId
RETURN user.uri AS user, centrality
ORDER BY centrality DESC;
user                       centrality
"kg.course/talkop-vivr…"   759.0
"kg.course/talkop-vivr…"   335.0
"kg.course/talkop-vivr…"   330.0

NOTE: {direction:’both’} If you think about the direction, yes

  • loading incoming relationships: ‘INCOMING’,’IN’,’I’ or ‘<‘
  • loading outgoing relationships: ‘OUTGOING’,’OUT’,’O’ or ‘>’

Closeness Centrality

Closeness centrality is the reciprocal of the average distance to all other nodes in the network. Nodes with high closeness centrality are highly connected within their cluster, though not necessarily outside it.

Nodes with high closeness centrality are highly connected to the rest of the network:

CALL algo.closeness.stream('ns0__人', null)
YIELD nodeId, centrality
RETURN algo.asNode(nodeId).uri AS node, centrality
ORDER BY centrality DESC LIMIT 20;
node                       centrality
"kg.course/talkop-vivr…"   1.0
"kg.course/talkop-vivr…"   1.0
"kg.course/talkop-vivr…"   1.0
4.1.4. Community discovery
CALL algo.beta.louvain.stream(null, null, {
 graph: 'huge',
 direction: 'BOTH'
}) YIELD nodeId, community, communities
RETURN algo.asNode(nodeId).uri as name, community, communities
ORDER BY community ASC
name                       community  communities
"kg.course/talkop-vivr…"   151        null
"kg.course/talkop-vivr…"   151        null
"kg.course/talkop-vivr…"   151        null
"kg.course/talkop-vivr…"   151        null
"kg.course/talkop-vivr…"   151        null
"kg.course/talkop-vivr…"   151        null
"kg.course/talkop-vivr…"   151        null
"kg.course/talkop-vivr…"   151        null
"kg.course/talkop-vivr…"   151        null

As you can see, a series of communities is detected; the one shown here is Wapol's community, covering Drum Island and the Black Drum Kingdom

4.1.5. PageRank
CALL algo.pageRank.stream('ns0__人', null, {iterations: 20, dampingFactor: 0.85})
YIELD nodeId, score
RETURN algo.asNode(nodeId).uri AS page, score
ORDER BY score DESC
page                       score
"kg.course/talkop-vivr…"   2.9112886658942436
"kg.course/talkop-vivr…"   1.4952359730610623
"kg.course/talkop-vivr…"   1.1878799288533628

5. Knowledge application

5.1. Intelligent q&A

In this part, drawing on previous work and research, we implemented a REfO-based KBQA system. The main flow parses the input natural-language question into a SPARQL query, then sends it to the Apache Jena Fuseki service backed by the TDB knowledge base to get the answer. The code and data are stored in the vivirecard-KB_query directory.
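The core idea is a set of question templates, each paired with a SPARQL skeleton. The sketch below illustrates that idea with plain regular expressions standing in for the REfO rules the real system uses; the property names follow the SPARQL examples from section 2.1.3:

import re

# Template rules: (question pattern, SPARQL skeleton). Regexes here stand in
# for the REfO rules of the actual question_temp.py / question2sparql.py.
RULES = [
    (re.compile(r"(?P<name>.+?)的身高"),
     "SELECT ?o WHERE {{ ?s :名字 ?n . ?s :身高 ?o . FILTER(regex(?n, '{name}')) }}"),
    (re.compile(r"(?P<name>.+?)的生日"),
     "SELECT ?o WHERE {{ ?s :名字 ?n . ?s :生日 ?o . FILTER(regex(?n, '{name}')) }}"),
]

def question2sparql(question):
    """Map a natural-language question to a SPARQL query, or None."""
    for pattern, skeleton in RULES:
        m = pattern.search(question)
        if m:
            return ("PREFIX : <http://kg.course/talkop-vivre-card/>\n"
                    + skeleton.format(name=m.group("name")))
    return None  # the real system then answers "I can't understand :("

print(question2sparql("雷利的身高是多少"))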

5.1.1. Supported question types
  1. Queries about birthday / foreign name / blood type / constellation / Haki / height
  2. Where someone was born / who was born in a given place
5.1.2. Query Example

Run python query_main.py to start the Q&A process

cd vivirecard-KB_query
python query_main.py

Type a question and press Enter to get the answer. The system returns "I don't know :(" when the answer is not in the knowledge base, and "I can't understand :(" when it cannot parse the question.

  1. What's Rayleigh's height?

    188cm

  2. Roger’s blood type

    “S”

  3. Who was born in Windmill Village?

    Monkey D. Luffy, Makino, Uncle Joe & Aunt Chicken, Woop Slap

  4. Who was born in Cocoyasi Village?

    Nami, Nojiko, Genzo, Bell-mère, Dr. Nako, Sam

  5. I want to know Smoker's birthday

    On March 14

  6. What’s Trump’s birthday

    I don’t know. 🙁

  7. sasdasdasd

    I can’t understand. 🙁

5.2. Visualization of knowledge graph

In this part, referring to others' work, we use D3 to provide visual interaction for the entity relationship knowledge graph built earlier, including visualizing node connections and querying node information. It also integrates the character attribute knowledge graph built before, providing an infobox display. The related data and code are stored in the visualization directory. The interaction of the whole visualization page is shown in the GIF below:

The visualization page is stored at visualization/html/index.html and can be opened directly in the Microsoft Edge browser

If you open it in another browser, the visualization may not load: most browsers forbid cross-origin requests, so the JSON data cannot be fetched from the local file system. In that case you need to set up a web server environment such as WAMP/LAMP/MAMP.
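A lighter-weight alternative, assuming Python is available anyway, is to serve the directory with the built-in static file server so the JSON files are fetched over HTTP instead of file://:

cd visualization/html
python -m http.server 8000

Then open http://localhost:8000/index.html in any browser.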

Once opened, the visualization looks as follows. Different colors represent different entity types, and related entities are connected by thin white lines. It is obvious that some entities have far more connections than others

Clicking the mode-switch button in the upper-left corner changes the node display from circle mode to text mode for a more detailed view

When a node is selected, only that node and the nodes directly connected to it are displayed. In particular, if the node is a person, an infobox for that person appears on the right side of the page

In addition, a search box on the left makes it easy to look up node information

GitHub address