Scheme 1: Based on Elasticsearch

See the article below for example Elasticsearch implementations. The idea is to aggregate on a field, or to retrieve term vectors from Elasticsearch fielddata, via plain HTTP requests, the High Level REST Client, Spring Data Elasticsearch repositories, ElasticsearchRestTemplate, etc.

Java Elasticsearch 7.10 tutorial – Word frequency statistics

[zhuanlan.zhihu.com](https://zhuanlan.zhihu.com/p/315888125)
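As a sketch of the aggregation approach: a `terms` aggregation buckets documents by field value and counts them, which yields word frequencies when the field holds individual tokens. The index and field names below (`articles`, `content.keyword`) are illustrative assumptions; the request body itself is plain JSON and can be sent with any of the clients mentioned above.

```python
import json

# Hypothetical index/field names; the shape of the aggregation body is what matters.
query = {
    "size": 0,  # we only want the aggregation buckets, not the matching documents
    "aggs": {
        "word_counts": {
            "terms": {
                "field": "content.keyword",  # assumed field name
                "size": 10,                  # top 10 most frequent terms
            }
        }
    },
}

# This is the body you would POST to /articles/_search
print(json.dumps(query, indent=2))
```

Elasticsearch returns the result as `aggregations.word_counts.buckets`, a list of `{"key": <term>, "doc_count": <count>}` entries sorted by count.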

Scheme 2: Based on Spark

Spark is a memory-based distributed computing engine, and it ships with a JavaWordCount example. For details, see the following article.

Word frequency statistics: several pitfalls with remote calls

[zhuanlan.zhihu.com](https://zhuanlan.zhihu.com/p/329967589)
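Spark's JavaWordCount is a flatMap → mapToPair → reduceByKey pipeline. A minimal sketch of the same pattern in plain Python (no cluster required) makes the data flow clear; with PySpark the same three steps would run distributed over RDD partitions.

```python
from functools import reduce

lines = ["spark spark hadoop", "hadoop spark"]

# flatMap: split each line into individual words
words = [w for line in lines for w in line.split()]

# map: pair each word with an initial count of 1
pairs = [(w, 1) for w in words]

# reduceByKey: sum the counts for each word
def reduce_by_key(acc, pair):
    word, count = pair
    acc[word] = acc.get(word, 0) + count
    return acc

counts = reduce(reduce_by_key, pairs, {})
print(counts)  # {'spark': 3, 'hadoop': 2}
```

The per-key reduction is what lets Spark parallelize the count: partial sums are computed per partition and then merged.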

Scheme 3: Based on Python

If you only need simple word frequency statistics over your data, Python is the easiest option, in just a few lines of code:

text = "http request highclient springboot"
data = text.lower().split()
words = {}
for word in data:
    if word not in words:
        words[word] = 1
    else:
        words[word] = words[word] + 1
# Sort by count, descending (sorting the items directly would sort by word, not frequency)
result = sorted(words.items(), key=lambda item: item[1], reverse=True)
print(result)
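For reference, the standard library's `collections.Counter` does the counting and the frequency sort in one step; a minimal sketch on sample input:

```python
from collections import Counter

text = "spark spark hadoop spark hadoop flink"
counts = Counter(text.lower().split())

# most_common() returns (word, count) pairs sorted by frequency, descending
print(counts.most_common())  # [('spark', 3), ('hadoop', 2), ('flink', 1)]
```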

Of course, for the splitting above to support Chinese text, a Chinese word segmenter is required. For Chinese word segmentation, refer to:

How to choose among jieba, IK Analyzer, ansj_seg, and HanLP

[blog.csdn.net](Link.zhihu.com/?target=htt…)