FoolNLTK
A Chinese Text Processing Toolkit

An open-source Chinese text processing toolkit built on a bidirectional LSTM (BiLSTM). It performs word segmentation, part-of-speech tagging, and named entity recognition, and segmentation accuracy can be improved with a user-defined dictionary.
Features
- May not be the fastest open-source Chinese word segmenter, but it is probably the most accurate one
- Trained on a BiLSTM model
- Word segmentation, part-of-speech tagging, and entity recognition, all with relatively high accuracy
- Supports user-defined dictionaries
- You can train your own models
- Supports batch processing
Dependencies (tested on Windows):
- Python 3.5+
- TensorFlow >= 1.0.0
Installation
```shell
pip install foolnltk
```
Usage

Word segmentation
```python
import fool

text = "A fool in Beijing."
print(fool.cut(text))
# ['a', 'fool', 'in', 'Beijing']
```
Command-line segmentation. The -b parameter sets how many lines are processed per batch, which can speed up segmentation:
```shell
python -m fool [filename]
```
User-defined dictionary
The dictionary format is as follows: the higher a word's weight and the longer the word, the more likely it is to appear in the segmentation output. Weights should be greater than 1:
```
uncomfortable shiitake mushroom 10
what ghost 10
word segmentation tool 10
Beijing 10
Beijing Tiananmen 10
```
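As a sketch, a dictionary file in this format can be generated and sanity-checked with plain Python (the file name `my_dict.txt` and the entries are illustrative, not part of fool's API):

```python
# Write a user dictionary: one "word weight" pair per line,
# with every weight greater than 1 as the format requires.
entries = {
    "Beijing Tiananmen": 10,
    "word segmentation tool": 10,
}

with open("my_dict.txt", "w", encoding="utf-8") as f:
    for word, weight in entries.items():
        f.write(f"{word} {weight}\n")

# Sanity-check the file: split each line on the LAST space,
# so entries containing spaces keep their internal spaces.
with open("my_dict.txt", encoding="utf-8") as f:
    for line in f:
        word, weight = line.rstrip("\n").rsplit(" ", 1)
        assert int(weight) > 1, f"weight for {word!r} must be > 1"
```

The resulting path can then be passed to `fool.load_userdict`.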
Loading the dictionary
```python
import fool

fool.load_userdict(path)
text = ["I watched you suffer shiitake mushrooms at Tiananmen Square in Beijing.",
        "I was sunning myself in Beijing and you were watching snow in Africa."]
print(fool.cut(text))
# [['I', 'in', 'Beijing', 'Tiananmen', 'watched', 'you', 'uncomfortable', 'shiitake mushroom'],
#  ['I', 'in', 'Beijing', 'sunning', 'you', 'in', 'Africa', 'watching', 'snow']]
```
Delete the dictionary
```python
fool.delete_userdict()
```
Part-of-speech tagging
```python
import fool

text = ["A fool in Beijing."]
print(fool.pos_cut(text))
# [[('a', 'm'), ('fool', 'n'), ('in', 'p'), ('Beijing', 'ns')]]
```
Entity recognition
```python
import fool

text = ["A fool in Beijing.", "Hello?"]
words, ners = fool.analysis(text)
print(ners)
# [[(5, 8, 'location', 'Beijing')]]
```
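The tuples in `ners` follow the pattern shown above: span offsets, an entity type, and the entity text. A small helper like the following (a sketch; `group_entities` is a made-up name, not part of fool's API) can group the recognized entities by type:

```python
from collections import defaultdict

def group_entities(ners):
    """Group (start, end, type, text) tuples by entity type,
    producing one {type: [texts]} dict per input sentence."""
    grouped = []
    for sentence_ents in ners:
        by_type = defaultdict(list)
        for start, end, ent_type, ent_text in sentence_ents:
            by_type[ent_type].append(ent_text)
        grouped.append(dict(by_type))
    return grouped

# Using the sample output from the snippet above:
sample = [[(5, 8, 'location', 'Beijing')]]
print(group_entities(sample))
# [{'location': ['Beijing']}]
```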
Note: if any model files are missing, check under `sys.prefix` (for example `/usr/local/`). Source code: github.com/rockyzhengw…
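To see where to look, a quick sketch (the `fool` subdirectory name is an assumption about the install layout, not confirmed by the source):

```python
import os
import sys

# Installed package data usually lives under sys.prefix;
# print candidate locations to check for the model files.
print(sys.prefix)
candidate = os.path.join(sys.prefix, "fool")  # assumed subdirectory name
print(candidate, "exists:", os.path.isdir(candidate))
```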