The practical path to trillion-scale multidimensional retrieval and real-time analysis with Lucene

Lucene is the most popular search engine library in the industry; both Solr and Elasticsearch are built on it. But as data volumes keep growing, every routine operation comes under enormous pressure once the data reaches trillions of records. The challenge is twofold: keep Lucene's efficient full-text retrieval working at that scale, and overcome the single-purpose nature of individual big-data stack components so the system can handle complex problems. With that in mind, at QCon we will share the problems we have run into and the solutions we have developed over the years while tackling the trillion-record challenge with Lucene.

01 About the Speaker

Zheng Qihua, Technical Director of Xinshu Software

Former senior engineer at FNST (Fujitsu Nanda) and project manager for Fujitsu's system monitoring middleware products; more than 10 years of software development and maintenance experience
Certified expert in Fujitsu middleware lifecycle management and job management
Responsible for maintaining Huawei's RTOS (real-time embedded operating system); extensive experience with the Linux kernel, system monitoring, and related areas
Invited speaker at the 2020 Automotive Enterprise Digitization Seminar of the China Automotive Research Institute

02 Talk Overview

Challenges and implementation at trillion-record scale

Trillion challenge one: data storage

How can we resolve unbalanced reads and writes and balance disks automatically?

How can we keep data safe so that disk failures and accidental deletions do not affect production?

How can we cut the high cost of data storage and the heavy dependence on SSDs?

Trillion challenge two: retrieval performance

How can we achieve second-level response times for full-text retrieval over trillions of records?

Trillion challenge three: multidimensional statistics

How can we reduce I/O consumption and export millions of records instantly?

Trillion challenge four: region search

How can we improve the capability and accuracy of geolocation retrieval?

Trillion challenge five: computing frameworks

How can we improve Spark performance to greatly reduce system response time?

See you in Beijing on May 29th!

QCon Global Software Development Conference (Beijing)
