Scheme 1: Based on Elasticsearch

See the article below for example Elasticsearch implementations. The idea is to aggregate on a field, or to retrieve term vectors from Elasticsearch fielddata, via plain HTTP requests, the High Level REST Client, Spring Data Elasticsearch repositories, ElasticsearchRestTemplate, etc.

Java Elasticsearch 7.10 tutorial – Word frequency statistics

[zhuanlan.zhihu.com](https://zhuanlan.zhihu.com/p/315888125)
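As a sketch of the aggregation approach: a `terms` aggregation buckets documents by field value and counts them, which yields word frequencies when the field holds individual tokens. The index and field names below (`articles`, `content.keyword`) are illustrative assumptions; the request body itself is plain JSON and can be sent with any of the clients mentioned above.

```python
import json

# Hypothetical index/field names; the shape of the aggregation body is what matters.
query = {
    "size": 0,  # we only want the aggregation buckets, not the matching documents
    "aggs": {
        "word_counts": {
            "terms": {
                "field": "content.keyword",  # assumed field name
                "size": 10,                  # top 10 most frequent terms
            }
        }
    },
}

# This is the body you would POST to /articles/_search
print(json.dumps(query, indent=2))
```

Elasticsearch returns the result as `aggregations.word_counts.buckets`, a list of `{"key": <term>, "doc_count": <count>}` entries sorted by count.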

Scheme 2: Based on Spark

Spark is a memory-based distributed computing engine, and it ships with a JavaWordCount example. For details, see the following article.

Word frequency statistics: several pitfalls with remote calls

[zhuanlan.zhihu.com](https://zhuanlan.zhihu.com/p/329967589)
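Spark's JavaWordCount is a flatMap → mapToPair → reduceByKey pipeline. A minimal sketch of the same pattern in plain Python (no cluster required) makes the data flow clear; with PySpark the same three steps would run distributed over RDD partitions.

```python
from functools import reduce

lines = ["spark spark hadoop", "hadoop spark"]

# flatMap: split each line into individual words
words = [w for line in lines for w in line.split()]

# map: pair each word with an initial count of 1
pairs = [(w, 1) for w in words]

# reduceByKey: sum the counts for each word
def reduce_by_key(acc, pair):
    word, count = pair
    acc[word] = acc.get(word, 0) + count
    return acc

counts = reduce(reduce_by_key, pairs, {})
print(counts)  # {'spark': 3, 'hadoop': 2}
```

The per-key reduction is what lets Spark parallelize the count: partial sums are computed per partition and then merged.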

Scheme 3: Based on Python

If you only need simple word frequency statistics over your data, Python is the easiest option, in just a few lines of code:

text = "http request highclient springboot"
data = text.lower().split()
words = {}
for word in data:
    if word not in words:
        words[word] = 1
    else:
        words[word] = words[word] + 1
# Sort by count, descending (sorting the items directly would sort by word, not frequency)
result = sorted(words.items(), key=lambda item: item[1], reverse=True)
print(result)
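For reference, the standard library's `collections.Counter` does the counting and the frequency sort in one step; a minimal sketch on sample input:

```python
from collections import Counter

text = "spark spark hadoop spark hadoop flink"
counts = Counter(text.lower().split())

# most_common() returns (word, count) pairs sorted by frequency, descending
print(counts.most_common())  # [('spark', 3), ('hadoop', 2), ('flink', 1)]
```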

Of course, for the splitting above to support Chinese text, a Chinese word segmenter is required. For Chinese word segmentation, refer to:

How to choose among jieba, IK Analyzer, ansj_seg, and HanLP

[blog.csdn.net](Link.zhihu.com/?target=htt…)