1. Keyword extraction
If you need to build a sensitive-word detection feature, what is your first thought?
Of course: open the browser, search Baidu, Google, or Bing, and switch into copy-and-paste ("CV engineer") mode.
What if you want to implement one yourself?
Find a list of sensitive words and match the text against it directly, or traverse the list and test each word with the string method indexOf.
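The indexOf traversal can be sketched as follows. This is a minimal illustration, not a production filter: the function name `detectSensitiveWords` and the sample word list are hypothetical and chosen only for this example.

```javascript
// Hypothetical helper: scan `text` for every word in `sensitiveWords`
// using String.prototype.indexOf, and report each hit with its first position.
function detectSensitiveWords(text, sensitiveWords) {
  const found = [];
  for (const word of sensitiveWords) {
    const index = text.indexOf(word);
    if (index !== -1) {
      found.push({ word, index });
    }
  }
  return found;
}

console.log(detectSensitiveWords('buy cheap pills now', ['pills', 'casino']));
// [ { word: 'pills', index: 10 } ]
```

Plain substring matching like this also fires when a listed word happens to sit inside a longer, harmless word (a list containing "hell" would flag "hello"), which is one reason to look at word-level approaches.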
Is there any other way?
If you have read this far, you have probably run into the same problem, right?
nodejieba is a word segmentation library for Node.js, aimed at Chinese text.
The library is small and comes with a default set of word weights. Basic usage looks like this:
```javascript
var nodejieba = require("nodejieba");

// Segment a Chinese sentence ("Nanjing Yangtze River Bridge") into words
var result = nodejieba.cut("南京市长江大桥");
console.log(result);
// ["南京市", "长江大桥"]  i.e. "Nanjing City", "Yangtze River Bridge"
```
After requiring the library, pass in the sentence to be segmented. cut returns the segmentation result as an array of words, which can be printed directly.
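To tie segmentation back to sensitive-word detection, here is a minimal sketch that checks whole tokens instead of raw substrings. The name `findSensitiveTokens` and the sample word list are hypothetical. The `cut` parameter is assumed to be any function that turns a string into an array of tokens, such as `nodejieba.cut`; a whitespace-splitting stand-in is used here so the sketch runs without installing anything.

```javascript
// Hypothetical helper: return the tokens that appear in the sensitive-word list.
// `cut` is any segmenter mapping a string to an array of tokens
// (for example nodejieba.cut); it is injected so the sketch runs on its own.
function findSensitiveTokens(text, sensitiveWords, cut) {
  const blocked = new Set(sensitiveWords);
  return cut(text).filter((token) => blocked.has(token));
}

// Stand-in segmenter for illustration only: splits on whitespace.
const whitespaceCut = (text) => text.split(/\s+/).filter(Boolean);

const hits = findSensitiveTokens('this is a scam offer', ['scam', 'fraud'], whitespaceCut);
console.log(hits); // [ 'scam' ]
```

With nodejieba installed, passing `nodejieba.cut` as the third argument should work the same way, assuming the sensitive words are listed in the form in which the segmenter emits them.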
2. Content segmentation
In real development the input is usually a whole sentence, a paragraph, or an article, and segmenting it produces a lot of connectives and other function words.
```javascript
const nodejieba = require('nodejieba')

// Chinese sentence meaning "The imperialists want to divide up our land"
var result = nodejieba.cut('帝国主义要把我们的地瓜分掉')
console.log(result)
/* Tokens as listed in the original, glossed in English:
[ 'imperialists', 'want', 'take', 'our', 'sweet potato', 'share', 'drop' ]
*/
```
You can handle this in the following way.
3. Keyword extraction
The following code extracts the meaningful words together with their weights. Because extracting only keywords may leave some words out, choose between full segmentation and keyword extraction according to your actual needs.
```javascript
var nodejieba = require('nodejieba')

// Load a custom user dictionary
nodejieba.load({ userDict: './user.utf8' })

var article = 'a large string of Chinese characters'

// Lowercase the text so the same word in different cases is not weighted separately,
// then extract the top 4 keywords
var result = nodejieba.extract(article.toLowerCase(), 4)
console.log(result)
/* Words shown in English translation:
[
  { word: 'person', weight: 716.091462732096 },
  { word: 'prototype', weight: 551.7426024329264 },
  { word: 'prototype', weight: 335.0089885136 },
  { word: 'constructor', weight: 305.21931198417207 }
]
*/
```
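One way to use such a result for the sensitive-word feature from the beginning of the article is to compare the extracted keywords with a word list. The sketch below assumes the `{ word, weight }` shape shown in the output above; the function name `flagSensitiveKeywords`, the sample keywords, and their weights are all made up for illustration.

```javascript
// Hypothetical helper: keep only the extracted keywords that are on the
// sensitive-word list, comparing case-insensitively.
function flagSensitiveKeywords(keywords, sensitiveWords) {
  const blocked = new Set(sensitiveWords.map((w) => w.toLowerCase()));
  return keywords.filter(({ word }) => blocked.has(word.toLowerCase()));
}

// Sample data in the same shape as the extract result; the values are invented.
const sampleKeywords = [
  { word: 'lottery', weight: 12.5 },
  { word: 'prototype', weight: 8.1 },
];

console.log(flagSensitiveKeywords(sampleKeywords, ['Lottery']));
// [ { word: 'lottery', weight: 12.5 } ]
```

Because only the top keywords are checked, a sensitive word with a low weight can slip through, so this fits best as a quick first pass rather than a complete check.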