My boss gave me a word segmentation task. I wanted to start with jieba ("stutter") segmentation plus regular expressions. The results turned out poor, and the data had to be filtered over and over [the first rule filters about 80% of the data, then a second rule is developed, then a third, and so on].

I had used jieba segmentation before, but I had tried its recognition of person names, place names, and organization names only once. In actual use it could not separate organization names very well, so I switched to HanLP segmentation.

The downside of HanLP is that it only runs on Java, which has always been my weak point. So I am writing this blog post to describe how to use HanLP from start to finish.

Besides, Chubby locked my computer in a cabinet in the North Division Library. I don't have a computer at work, so I am using Chubby's computer. That means I have to configure all the basic environment variables myself, so this is the whole process from a blank slate to HanLP.

Step 1: Download a JDK

Go to the OpenJDK website, download a JDK, and install it.

After installation, there are three environment variables to configure:

1. JAVA_HOME: the JDK install directory, e.g. C:\Program Files\Java\jdk1.8.0_73

2. CLASSPATH: the lib directory under the JDK directory

3. PATH: the bin directory under the JDK directory

After the configuration is complete, run java -version in CMD on Windows to check that it reports the expected version.
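As a cross-check from inside Java itself, a minimal sketch like the one below prints which runtime is actually running and what JAVA_HOME is set to. The class name EnvCheck is made up for this illustration.

```java
// EnvCheck is a hypothetical name used only for this illustration.
class EnvCheck {
    // Describe the running JVM: its version and its install directory.
    static String describeRuntime() {
        return "java.version=" + System.getProperty("java.version")
             + ", java.home=" + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describeRuntime());
        // JAVA_HOME prints as null if the environment variable was never set.
        System.out.println("JAVA_HOME=" + System.getenv("JAVA_HOME"));
    }
}
```

If java.home points somewhere other than the JDK you just installed, an older Java is probably being picked up first.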

I ran into a small problem here. Chubby's computer already had a JRE 1.6 from something she had installed earlier, while I had just installed 1.8, and CMD gave an error saying it was looking for JRE 1.6. From what I read online, other software may bring its own Java environment, so you can end up with several copies, and by default the system finds Java by searching the paths in the environment variables configured above, in order. So the fix is to move the newest entry to the front of the environment variable list. Give that a try.
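To see which Java installations the system can find, and in what order, a small sketch like this may help. It assumes the launcher is named java.exe (Windows) or java (elsewhere); PathJavaFinder is a made-up name, not part of any tool.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// PathJavaFinder is a hypothetical helper name, not part of any tool.
class PathJavaFinder {
    // Return the PATH entries that contain a java launcher, in search order.
    static List<String> javaDirs(String pathValue) {
        List<String> hits = new ArrayList<>();
        if (pathValue == null) {
            return hits;
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            // Windows uses java.exe; other systems use plain "java".
            if (new File(dir, "java.exe").isFile() || new File(dir, "java").isFile()) {
                hits.add(dir);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The first directory printed is the one the command line will use.
        for (String dir : javaDirs(System.getenv("PATH"))) {
            System.out.println(dir);
        }
    }
}
```

If an old JRE directory is printed before the new JDK's bin directory, that matches the problem described above.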

Step 2: Download Eclipse

Go to the official website, and remember that x86 means 32-bit and x64 means 64-bit. After installing, set the workspace location for your projects.

Step 3: Download the HanLP files

Method 1: Maven, which needs zero configuration. But I don't know how to use it.

Method 2: Download the jar package hanlp-1.2.8.jar

Hanlp.linrunsoft.com/services.ht…

Then download the data.zip package. You can choose the standard data, the mini data, or the full data; they differ in size. I downloaded the standard version, about 40 MB.

There is also a file ending in .properties. I had never seen one before, but it opens fine in a plain text editor.

Step 4: Import the downloads into Eclipse (Build Path)

1. Import the jar package into the project's lib directory in Eclipse and add it to the build path

Jingyan.baidu.com/article/ca4…

2. Create a package in src and create a class in the package. The package ends up under the root directory I set (D:/). Does the class name have to begin with a capital letter? That is the Java convention, and Eclipse complained when I did not capitalize it.

3. Unpack the data package and place it in a directory of your choice (D://py/ in my case). Then, in the hanlp.properties file, change root to that directory, that is, the parent directory of the data folder.

4. Drag hanlp.properties into the src directory
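A wrong root value is an easy mistake in step 3, so here is a minimal sketch that reads the root key and checks whether a data folder really sits under it. RootCheck and the default path src/hanlp.properties are assumptions for illustration; pass your own path as the first argument.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

// RootCheck is a hypothetical name used only for this illustration.
class RootCheck {
    // Return the data folder implied by the "root" key, or null if the key is missing.
    static File dataDir(Properties props) {
        String root = props.getProperty("root");
        if (root == null) {
            return null;
        }
        return new File(root.trim(), "data");
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the edited config; adjust to where your copy lives.
        String path = args.length > 0 ? args[0] : "src/hanlp.properties";
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        File data = dataDir(props);
        System.out.println(data + " exists: " + (data != null && data.isDirectory()));
    }
}
```

One detail worth knowing: in a .properties file the backslash is an escape character, so writing the path with forward slashes (D:/py/) is safer than D:\py\.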

Then I tried a demo, got an error, and fixed it by letting Eclipse add the missing import: import com.hankcs.hanlp.HanLP;

Then I ran the program (test0320), and it worked.

The article comes from tianbwin2995’s blog