Preface
- With the development of cloud computing and big data, distributed architectures have become the norm. In a distributed system, logs are scattered across many servers, so using them to troubleshoot problems or to analyze business data costs far more than in a traditional single-machine system
- From the perspective of big data, the main sources of data include:
- Databases
- Log files
- Crawlers
- Log files are the most common and largest data source. Crawlers also often store their preliminarily processed output as files, so crawler output can likewise be classified as log files. Collecting, parsing, and analyzing log files is therefore a common requirement in the era of big data, as the sketch below illustrates
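To make the parsing step concrete, here is a minimal sketch that extracts structured fields from raw log lines. The file name `app.log` and the `timestamp level component message` layout are assumptions made for this example, not taken from any particular system.

```python
import re

# Assumed line layout for illustration: "<timestamp> <level> <component> <message>"
# e.g. "2023-05-01T12:00:00Z ERROR payment failed to connect to db"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<component>\S+)\s+(?P<message>.*)$"
)

def parse_line(line: str) -> dict | None:
    """Turn one raw log line into a dict of fields, or None if it doesn't match."""
    m = LINE_RE.match(line.rstrip("\n"))
    return m.groupdict() if m else None

if __name__ == "__main__":
    # "app.log" is a hypothetical file name used only for this sketch.
    with open("app.log") as f:
        errors = [r for line in f if (r := parse_line(line)) and r["level"] == "ERROR"]
    print(f"{len(errors)} ERROR lines found")
```

In a distributed system the hard part is that such files live on many machines, which is exactly the problem the unified collection described next addresses.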
- Therefore, in the era of cloud computing and big data, it is a common requirement to collect the logs scattered across many servers and to store, parse, search, and analyze them in a unified manner. In the open source world, the ELK technology stack is a popular solution to this logging problem. In addition to traditional search based on inverted indexes, it introduces DocValues, a column-oriented storage format, which gives it good analytical capability
- But using this open source stack requires a deep understanding of the details of each component and a technically competent team to maintain and develop it
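To illustrate how the two capabilities combine, the sketch below sends a single Elasticsearch request that pairs a full-text match query (answered from the inverted index) with a terms aggregation (answered from DocValues). The endpoint, the index name `logs`, and the field names are assumptions for illustration only.

```python
import json
import urllib.request

# One request combining full-text search with an analytical aggregation.
query = {
    # Full-text search: served by the inverted index.
    "query": {"match": {"message": "timeout"}},
    # Per-host counts: served by DocValues (column-oriented storage).
    "aggs": {"by_host": {"terms": {"field": "host.keyword"}}},
    "size": 0,  # only the aggregation buckets are needed, not the documents
}

# "localhost:9200" and the "logs" index are hypothetical for this sketch.
req = urllib.request.Request(
    "http://localhost:9200/logs/_search",
    data=json.dumps(query).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

for bucket in result["aggregations"]["by_host"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```

Even this small example touches index mappings, keyword versus text fields, and cluster endpoints, hinting at why operating the full stack demands real expertise.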