Before we talk about technology, you don’t understand the world of foodies
Articles on Zhongcheng Translation carry tags. Users can quickly filter for articles that interest them by tag, and related articles can be recommended through tag association. At present, however, tags are set by hand when an article is recommended and are all in English, so they are inevitably inconsistent and incomplete. Posts can be edited manually, but we cannot expect users or administrators to keep the tags appropriate all the time, so we need a tool that generates tags automatically.
Among the current open source word segmentation tools, Jieba is powerful and fast, and fortunately a Node.js version, nodejieba, is available.
nodejieba is simple to install and use:
npm install nodejieba

var nodejieba = require("nodejieba");
var result = nodejieba.cut("The imperialists want to carve up our land.");
console.log(result);
// ['imperialism', 'to', 'the', 'we', 'the', 'to', 'grab', 'off']
result = nodejieba.cut('The earth, where is my great staff?');
console.log(result);
// ['land', 'and', 'I', 'old', 'sun', 'the', 'great', 'in', 'where', '?']
result = nodejieba.cut('Great Sage, your cudgel is perfect for your head!');
console.log(result);
// ['risk', ',', 'you', 'the', 'great', 'is', 'good', 'in', 'special', 'and', 'you', 'the', 'head type', '!']
We can load our own dictionary and assign weight and part of speech to each word in the dictionary:
Edit user.utf8:
Sweet potato 9999 n
Gold hoop 9999 n
Good at 9999
Then load the dictionary via nodejieba.load:
var nodejieba = require("nodejieba");
// Load the custom dictionary before segmenting
nodejieba.load({
  userDict: './user.utf8'
});
var result = nodejieba.cut("The imperialists want to carve up our land.");
console.log(result);
// ['imperialist', 'want', 'our', 'share', 'drop']
result = nodejieba.cut('The earth, where is my great staff?');
console.log(result);
// ['land', 'and', 'I', 'old', 'sun', 'the', 'great', 'in', 'where', '?']
result = nodejieba.cut('Great Sage, your cudgel is perfect for your head!');
console.log(result);
// ['risk', ',', 'you', 'the', 'golden collar', 'stick stick on', 'special', 'and', 'you', 'the', 'head type', '!']
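Maintaining the dictionary file by hand gets tedious as the list grows. The following is a minimal sketch of generating it from code. It assumes each entry is one line of space-separated fields, `word [weight] [tag]`, so a word cannot itself contain whitespace. `formatDictEntry` and `writeUserDict` are hypothetical helper names, not part of nodejieba.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper, not part of nodejieba: renders one dictionary line.
// Assumes a space-separated "word [weight] [tag]" layout.
function formatDictEntry(entry) {
  const { word, weight, tag } = entry;
  if (typeof word !== "string" || word.length === 0 || /\s/.test(word)) {
    throw new Error("dictionary word must be a non-empty string without whitespace");
  }
  const fields = [word];
  if (weight !== undefined) fields.push(String(weight));
  if (tag !== undefined) fields.push(tag);
  return fields.join(" ");
}

// Hypothetical helper: writes all entries to a UTF-8 file, one per line,
// and returns the text that was written.
function writeUserDict(filePath, entries) {
  const body = entries.map(formatDictEntry).join("\n") + "\n";
  fs.writeFileSync(filePath, body, "utf8");
  return body;
}

// Write to the temp directory so the sketch does not touch a real project file.
const dictPath = path.join(os.tmpdir(), "user.utf8");
const written = writeUserDict(dictPath, [
  { word: "HTTP/2", weight: 9999, tag: "n" },
  { word: "performance" },
]);
console.log(written);
```

The resulting file path could then be passed as `userDict` to `nodejieba.load`, in the same way as the hand-written file.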
In addition to word segmentation, we can use nodejieba to extract keywords:
const content = `The purpose of this article is to tell you why you should migrate from HTTP to HTTPS and why you should add support for HTTP/2 by comparing them. Before comparing HTTP to HTTP/2, let's take a look at what HTTP is. What is HTTP HTTP is a set of rules for communicating on the World Wide Web. HTTP is an application-layer protocol that runs on top of TCP/IP. When a user requests a Web page through a browser, HTTP handles the request and establishes a connection between the Web server and the client. With HTTP/2, performance can be improved without Sprite graphics, compression, and concatenation. However, this does not mean that these technologies should not be used. However, this makes it clear that we need to move from HTTP/1.1 to HTTP/2. `;
const nodejieba = require("nodejieba");
const result = nodejieba.extract(content, 20);
console.log(result);
The output should look something like this:
[ { word: 'HTTP', weight: 140.8704516850025 },
  { word: 'request', weight: 14.23018001394 },
  { word: 'should', weight: 14.052171126120001 },
  { word: 'World Wide Web', weight: 12.2912397395 },
  { word: 'TCP', weight: 11.739204307083542 },
  { word: '1.1', weight: 11.739204307083542 },
  { word: 'Web', weight: 11.739204307083542 },
  { word: 'Sprite', weight: 11.739204307083542 },
  { word: 'HTTPS', weight: 11.739204307083542 },
  { word: 'IP', weight: 11.739204307083542 },
  { word: 'Application layer', weight: 11.2616203224 },
  { word: 'Client', weight: 11.1926274509 },
  { word: 'browser', weight: 10.8561552143 },
  { word: 'together', weight: 9.85762638414 },
  { word: 'compare', weight: 9.5435285574 },
  { word: 'pages', weight: 9.53122979951 },
  { word: 'server', weight: 9.41204128224 },
  { word: 'use', weight: 9.03259988558 },
  { word: 'necessity', weight: 8.81927328699 },
  { word: 'add', weight: 8.0484751722 } ]
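The raw weights are hard to compare at a glance. One way to read them, sketched here on the assumption that the result is an array of `{ word, weight }` objects as printed above, is to scale each weight by the largest one. `relativeWeights` is a hypothetical helper, not a nodejieba function, and the sample rows are copied from the output above.

```javascript
// Hypothetical helper: expresses each weight as a share of the top weight (0..1).
function relativeWeights(keywords) {
  const max = Math.max(...keywords.map((k) => k.weight));
  // Math.max() of an empty list is -Infinity, so this also covers empty input.
  if (!(max > 0)) return [];
  return keywords.map((k) => ({ word: k.word, share: k.weight / max }));
}

// A few rows copied from the extraction output.
const sample = [
  { word: "HTTP", weight: 140.8704516850025 },
  { word: "request", weight: 14.23018001394 },
  { word: "browser", weight: 10.8561552143 },
];

for (const { word, share } of relativeWeights(sample)) {
  console.log(word.padEnd(10), share.toFixed(3));
}
```

Seen this way, the top keyword stands far above the rest, which are all roughly a tenth of its weight or less.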
Let’s add some new keywords to the dictionary:
performance
HTTP/2
The following output is displayed:
[ { word: 'HTTP', weight: 105.65283876375187 },
  { word: 'HTTP/2', weight: 58.69602153541771 },
  { word: 'request', weight: 14.23018001394 },
  { word: 'should', weight: 14.052171126120001 },
  { word: 'performance', weight: 12.61259281884 },
  { word: 'World Wide Web', weight: 12.2912397395 },
  { word: 'IP', weight: 11.739204307083542 },
  { word: 'HTTPS', weight: 11.739204307083542 },
  { word: '1.1', weight: 11.739204307083542 },
  { word: 'TCP', weight: 11.739204307083542 },
  { word: 'Web', weight: 11.739204307083542 },
  { word: 'Sprite', weight: 11.739204307083542 },
  { word: 'Application layer', weight: 11.2616203224 },
  { word: 'Client', weight: 11.1926274509 },
  { word: 'browser', weight: 10.8561552143 },
  { word: 'together', weight: 9.85762638414 },
  { word: 'compare', weight: 9.5435285574 },
  { word: 'pages', weight: 9.53122979951 },
  { word: 'server', weight: 9.41204128224 },
  { word: 'use', weight: 9.03259988558 } ]
Based on this, we whitelisted some words that could be used as tags:
const content = `The purpose of this article is to tell you why you should migrate from HTTP to HTTPS and why you should add support for HTTP/2 by comparing them. Before comparing HTTP to HTTP/2, let's take a look at what HTTP is. What is HTTP HTTP is a set of rules for communicating on the World Wide Web. HTTP is an application-layer protocol that runs on top of TCP/IP. When a user requests a Web page through a browser, HTTP handles the request and establishes a connection between the Web server and the client. With HTTP/2, performance can be improved without Sprite graphics, compression, and concatenation. However, this does not mean that these technologies should not be used. However, this makes it clear that we need to move from HTTP/1.1 to HTTP/2. `;
const nodejieba = require("nodejieba");
nodejieba.load({
  userDict: './user.utf8'
});
const result = nodejieba.extract(content, 20);
// Whitelist of words that are allowed to become tags
const tagList = ['HTTPS', 'HTTP', 'HTTP/2', 'Web', 'browser', 'performance'];
console.log(result.filter(item => tagList.indexOf(item.word) >= 0));
Finally we get:
[ { word: 'HTTP', weight: 105.65283876375187 },
  { word: 'HTTP/2', weight: 58.69602153541771 },
  { word: 'performance', weight: 12.61259281884 },
  { word: 'HTTPS', weight: 11.739204307083542 },
  { word: 'Web', weight: 11.739204307083542 },
  { word: 'browser', weight: 10.8561552143 } ]
That’s what we want.
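The whitelist filter can be wrapped in a small reusable function. The sketch below assumes the extraction result is an array of `{ word, weight }` objects already sorted by weight, as in the output above. `pickTags` and its `maxTags` parameter are hypothetical, introduced only for illustration, and the sample rows are copied from that output so the example runs without nodejieba installed.

```javascript
// Hypothetical helper: keeps whitelisted keywords in their ranked order
// and returns at most maxTags tag names.
function pickTags(keywords, whitelist, maxTags = 5) {
  const allowed = new Set(whitelist); // Set lookup instead of indexOf on every item
  return keywords
    .filter((item) => allowed.has(item.word))
    .slice(0, maxTags)
    .map((item) => item.word);
}

// Rows copied from the extraction output shown earlier.
const extracted = [
  { word: "HTTP", weight: 105.65283876375187 },
  { word: "HTTP/2", weight: 58.69602153541771 },
  { word: "request", weight: 14.23018001394 },
  { word: "performance", weight: 12.61259281884 },
  { word: "HTTPS", weight: 11.739204307083542 },
  { word: "Web", weight: 11.739204307083542 },
  { word: "browser", weight: 10.8561552143 },
];
const tagList = ["HTTPS", "HTTP", "HTTP/2", "Web", "browser", "performance"];

console.log(pickTags(extracted, tagList, 3));
// → [ 'HTTP', 'HTTP/2', 'performance' ]
```

Capping the number of tags keeps a long article from collecting every whitelisted word it happens to mention; the cap of 3 here is an arbitrary choice.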
That covers the basic usage of nodejieba. In the future, we can use it to analyze the translations published on Zhongcheng Translation automatically and attach the corresponding tags, giving translators and readers a better experience.
If you have any questions, feel free to raise them in the discussion section.