FoolNLTK
A Chinese Text Processing Toolkit

An open-source Chinese text processing toolkit built on a bidirectional LSTM (BiLSTM). It performs word segmentation, part-of-speech tagging, and named entity recognition, and segmentation accuracy can be improved with a user-defined dictionary.
Features
- May not be the fastest open-source Chinese word segmenter, but it is probably the most accurate one
- Trained on a BiLSTM model
- Word segmentation, part-of-speech tagging, and entity recognition, all with relatively high accuracy
- Supports user-defined dictionaries
- You can train your own models
- Supports batch processing
Dependencies (tested on Windows):
- Python 3.5+
- TensorFlow >= 1.0.0
Installation
```shell
pip install foolnltk
```
Usage

Word segmentation
```python
import fool

text = "A fool in Beijing."
print(fool.cut(text))
# ['a', 'fool', 'in', 'Beijing']
```
Command-line segmentation. The -b parameter sets how many lines are processed per batch, which can speed up segmentation:
```shell
python -m fool [filename]
```
User-defined dictionary
The dictionary format is as follows: the higher a word's weight and the longer the word, the more likely it is to appear in the segmentation output. Weights should be greater than 1:
```
uncomfortable shiitake mushroom 10
what ghost 10
word segmentation tool 10
Beijing 10
Beijing Tiananmen 10
```
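As a sketch, a dictionary file in this format can be generated and sanity-checked with plain Python (the file name `my_dict.txt` and the entries are illustrative, not part of fool's API):

```python
# Write a user dictionary: one "word weight" pair per line,
# with every weight greater than 1 as the format requires.
entries = {
    "Beijing Tiananmen": 10,
    "word segmentation tool": 10,
}

with open("my_dict.txt", "w", encoding="utf-8") as f:
    for word, weight in entries.items():
        f.write(f"{word} {weight}\n")

# Sanity-check the file: split each line on the LAST space,
# so entries containing spaces keep their internal spaces.
with open("my_dict.txt", encoding="utf-8") as f:
    for line in f:
        word, weight = line.rstrip("\n").rsplit(" ", 1)
        assert int(weight) > 1, f"weight for {word!r} must be > 1"
```

The resulting path can then be passed to `fool.load_userdict`.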
Loading the dictionary
```python
import fool

fool.load_userdict(path)
text = ["I watched you suffer shiitake mushrooms at Tiananmen Square in Beijing.",
        "I was sunning myself in Beijing and you were watching snow in Africa."]
print(fool.cut(text))
# [['I', 'in', 'Beijing', 'Tiananmen', 'watched', 'you', 'uncomfortable', 'shiitake mushroom'],
#  ['I', 'in', 'Beijing', 'sunning', 'you', 'in', 'Africa', 'watching', 'snow']]
```
Delete the dictionary
```python
fool.delete_userdict()
```
Part-of-speech tagging
```python
import fool

text = ["A fool in Beijing."]
print(fool.pos_cut(text))
# [[('a', 'm'), ('fool', 'n'), ('in', 'p'), ('Beijing', 'ns')]]
```
Entity recognition
```python
import fool

text = ["A fool in Beijing.", "Hello?"]
words, ners = fool.analysis(text)
print(ners)
# [[(5, 8, 'location', 'Beijing')]]
```
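The tuples in `ners` follow the pattern shown above: span offsets, an entity type, and the entity text. A small helper like the following (a sketch; `group_entities` is a made-up name, not part of fool's API) can group the recognized entities by type:

```python
from collections import defaultdict

def group_entities(ners):
    """Group (start, end, type, text) tuples by entity type,
    producing one {type: [texts]} dict per input sentence."""
    grouped = []
    for sentence_ents in ners:
        by_type = defaultdict(list)
        for start, end, ent_type, ent_text in sentence_ents:
            by_type[ent_type].append(ent_text)
        grouped.append(dict(by_type))
    return grouped

# Using the sample output from the snippet above:
sample = [[(5, 8, 'location', 'Beijing')]]
print(group_entities(sample))
# [{'location': ['Beijing']}]
```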
Note: if any model files are missing, check under `sys.prefix` (for example `/usr/local/`). Source code: github.com/rockyzhengw…
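To see where to look, a quick sketch (the `fool` subdirectory name is an assumption about the install layout, not confirmed by the source):

```python
import os
import sys

# Installed package data usually lives under sys.prefix;
# print candidate locations to check for the model files.
print(sys.prefix)
candidate = os.path.join(sys.prefix, "fool")  # assumed subdirectory name
print(candidate, "exists:", os.path.isdir(candidate))
```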