On November 7, Hacking Camp 2021 Ecology, co-hosted by the TiDB Community and Matrix China and sponsored by Chuxin Capital, Mingzhi Capital, Jiyuan Capital, and JuiceFS, held its project defense session, where each team presented its progress and its outlook for future work.

Hacking Camp draws some of its projects from star entries in the TiDB Hackathon and others from new ideas proposed by ecosystem partners. With ecology as its theme, Hacking Camp aims to help partners incubate their projects. The six Hacking Camp projects have essentially achieved their goals, and after graduation the teams will keep improving features and iterating on new versions until the projects are more stable, with mentors providing guidance and helping to refine them.

The six projects in this session of Hacking Camp are:

  • JuiceFS, a distributed POSIX file system that uses TiKV as its metadata engine

  • ServerlessDB for HTAP, which builds a serverless database service on top of TiDB

  • TiDB for PostgreSQL, which adds PostgreSQL compatibility to TiDB

  • TiBigData, TiDB’s one-stop solution for the big data ecosystem

  • HugeGraph with TiKV as its back-end storage

  • Doris Connector, which uses TiDB as an upstream data source for Doris

The judges evaluated the projects on completion, application value, contribution to the TiDB ecosystem, and quality of the defense. In the end, ServerlessDB for HTAP received unanimously high marks from the jury and won both the “Outstanding Graduate” and “Best Application” awards.

Special thanks to the judges: Zhihao Xu, Executive Manager of MZ Capital; Yang Liu, co-founder and CTO of Flomesh; Cong Wang, TiDB Team Tech Leader; Jian Zhang, R&D Director of PingCAP; and Jianjun Li, TiKV Maintainer.

Let’s take a look at each project’s graduation results.

JuiceFS:

JuiceFS is a cloud-native, POSIX-compatible distributed file system. With TiKV as its metadata engine, JuiceFS can handle tens of billions of files and EB-scale data while keeping latency and stability under control at large scale. In metadata operation benchmarks, the average latency of the TiKV engine is roughly 2 to 4 times that of Redis, and slightly better than a local MySQL.

The main features have been developed and released in v0.16 and pass the pjdfstest suite, and users are already running this setup in test and production environments. Going forward, JuiceFS will treat TiKV as a first-choice metadata engine for large-scale production environments and will actively adopt new TiKV features while maintaining compatibility.
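To make the idea of a POSIX metadata engine on top of a key-value store more concrete, here is a minimal, hypothetical sketch of how inode attributes and directory entries could be laid out as TiKV keys. The key layout, class, and method names below are invented for illustration and are not JuiceFS’s actual encoding.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of how a POSIX metadata engine can map file-system
 * objects onto a sorted key-value store such as TiKV. The key layout is
 * invented for illustration; JuiceFS's real encoding differs.
 */
public class MetaKeySketch {

    /** Key holding the attributes (mode, size, mtime, ...) of one inode. */
    static byte[] inodeKey(long inode) {
        return ("i/" + inode + "/attr").getBytes(StandardCharsets.UTF_8);
    }

    /** Key mapping a directory entry (parent inode + child name) to the child inode. */
    static byte[] dentryKey(long parentInode, String name) {
        return ("d/" + parentInode + "/" + name).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A stat() maps to a point get on the inode key; listing a directory maps
        // to a prefix scan over "d/<parent>/", both cheap, scalable TiKV operations.
        System.out.println(new String(inodeKey(2), StandardCharsets.UTF_8));
        System.out.println(new String(dentryKey(2, "report.pdf"), StandardCharsets.UTF_8));
    }
}
```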

ServerlessDB for HTAP

The ultimate goal of the project is to turn the cloud database service into a black box, so that application developers only need to focus on how their business logic maps to SQL, and no longer have to worry about data volume, workload, whether a statement is AP or TP, or other concerns unrelated to the business itself.

Development work:

  • Business load module:

The business load module evaluates whether current service resources match the current workload and builds a load model that drives scale-out and scale-in decisions.

  • Serverless module:

The Serverless module monitors the CPU usage and underlying storage capacity of all compute nodes in real time and expands or shrinks compute and storage resources accordingly (see the sketch after this list).

  • Database middleware:

The middleware decouples user connections from the back-end database service nodes, so that even if an application uses connection pooling, the middleware can still balance traffic across all nodes added by a scale-out.

  • Rule system:

The rule system lets you pin resource allocation within a specific time window, so you can set rules that allocate resources in advance of an expected traffic increase.

  • Serverless service orchestration module:

The service orchestration module handles the creation and release of TiDB clusters and dynamically scales TiDB components up and down. It also manages K8s local disks to cope with private deployments that cannot provide cloud disks.

When the admission webhook is used to scale TiDB components, the corresponding middleware registry records are deleted in advance so that scaling is imperceptible to users.
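As a rough illustration of how the Serverless module and the rule system could interact, the hypothetical sketch below derives a target number of compute nodes from observed CPU usage while letting a time-window rule pre-allocate capacity. All thresholds, types, and names are invented for this example; this is not the project’s actual code.

```java
import java.time.LocalTime;
import java.util.List;

/** Hypothetical scale-out/scale-in decision, not the project's actual code. */
public class ScaleDecisionSketch {

    /** A rule that pins a minimum replica count inside a time window. */
    record Rule(LocalTime from, LocalTime to, int minReplicas) {
        boolean active(LocalTime now) {
            return !now.isBefore(from) && now.isBefore(to);
        }
    }

    /**
     * Decide the desired number of compute nodes.
     *
     * @param current       current number of compute nodes
     * @param avgCpuPercent average CPU usage across compute nodes (0-100)
     * @param rules         operator-defined rules that pre-allocate capacity
     */
    static int desiredReplicas(int current, double avgCpuPercent, List<Rule> rules, LocalTime now) {
        int desired = current;
        if (avgCpuPercent > 70) {        // busy: add a node
            desired = current + 1;
        } else if (avgCpuPercent < 20) { // idle: remove a node, but keep at least one
            desired = Math.max(1, current - 1);
        }
        // Rules win over the load-based decision: never drop below a pinned minimum.
        for (Rule r : rules) {
            if (r.active(now)) {
                desired = Math.max(desired, r.minReplicas);
            }
        }
        return desired;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(new Rule(LocalTime.of(9, 0), LocalTime.of(12, 0), 4));
        System.out.println(desiredReplicas(2, 85.0, rules, LocalTime.of(10, 30))); // 4
        System.out.println(desiredReplicas(2, 10.0, rules, LocalTime.of(15, 0)));  // 1
    }
}
```

In the real system, a decision like this would be carried out by the service orchestration module, while the middleware removes a node’s registry record in advance so that scaling stays invisible to connected applications.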

Follow-up R&D plan:

Hints and a rules module are planned to distinguish TP from AP statements more accurately, which is expected to cut the middleware’s CPU utilization by more than half (see the sketch below for a rough illustration).

Richer load-balancing algorithms will be provided, such as balancing based on the runtime cost of SQL statements.

The middleware will also add business traffic control: if the business load grows faster than Serverless can react, the back-end service can become unstable, and traffic control lets such surges be absorbed gracefully.
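As a rough illustration of what hint- and rule-based TP/AP distinction might look like in the middleware, the hypothetical sketch below trusts an explicit hint first and otherwise falls back to simple keyword heuristics. The hint format and heuristics are invented for this example, not the team’s planned implementation.

```java
/** Hypothetical TP/AP classification for routing, not the project's planned implementation. */
public class QueryClassifierSketch {

    enum QueryClass { TP, AP }

    /**
     * Classify a SQL statement: an explicit hint is trusted, otherwise
     * aggregation- or join-heavy statements are treated as AP and can be
     * routed to analytic replicas instead of transactional nodes.
     */
    static QueryClass classify(String sql) {
        String s = sql.toUpperCase();
        if (s.contains("/*+ TP */")) return QueryClass.TP;   // explicit hint wins
        if (s.contains("/*+ AP */")) return QueryClass.AP;
        boolean heavy = s.contains("GROUP BY") || s.contains("JOIN")
                || s.contains("SUM(") || s.contains("COUNT(");
        return heavy ? QueryClass.AP : QueryClass.TP;
    }

    public static void main(String[] args) {
        System.out.println(classify("SELECT * FROM orders WHERE id = 1"));                       // TP
        System.out.println(classify("SELECT region, SUM(amount) FROM orders GROUP BY region"));  // AP
    }
}
```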

The project won Hacking Camp’s “Outstanding Graduate” and “Best Application” awards; the judges were clearly impressed by its vision and the team’s engineering capability.

Project address: github.com/tidb-incuba…

TiDB for PostgreSQL

The project, initiated by Digital China, makes TiDB compatible with PostgreSQL while preserving TiDB’s high availability, flexibility, and scalability. It lets users connect existing PostgreSQL clients to TiDB and use PostgreSQL-specific syntax.

Currently completed development:

  • Adjustments to the DELETE syntax

  • Support for the PostgreSQL-specific RETURNING keyword (see the sketch after this list)

  • Completed sysbench-tpcc testing over the PgSQL protocol and compared the results with native TiDB on the same version

  • Completed BenchmarkSQL benchmarking over the PgSQL protocol and compared the results with native TiDB on the same version
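To illustrate what the added compatibility enables, the hedged sketch below connects the stock PostgreSQL JDBC driver to a TiDB for PostgreSQL instance and uses the RETURNING keyword mentioned above. The host, port, database, and credentials are placeholder assumptions; adjust them to your deployment.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/** Connects with the stock PostgreSQL JDBC driver; connection details are illustrative. */
public class ReturningExample {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint: adjust host/port/database/user to your deployment.
        String url = "jdbc:postgresql://127.0.0.1:4000/test";
        try (Connection conn = DriverManager.getConnection(url, "root", "")) {
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO users (name) VALUES (?) RETURNING id")) {
                ps.setString(1, "alice");
                // RETURNING turns the INSERT into a statement that yields a result set,
                // so the generated id can be read back without a second query.
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        System.out.println("inserted id = " + rs.getLong("id"));
                    }
                }
            }
        }
    }
}
```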

Benchmark test results comparison:

Future plans include support for the system databases and tables, graphical clients, and an abstract protocol layer that can switch between different protocols at any time. You’re welcome to come and play with it!

Project address: github.com/DigitalChin…

TiBigData

TiBigData provides TiDB connectors for various OLAP computing engines, including Flink, Presto, and MapReduce. During Hacking Camp, the work focused mainly on Flink-related feature development.

  • We implemented a snapshot source and a TiCDC streaming source in Flink; combining the two sources gives TiDB unified stream and batch processing.

  • The second area is data interworking: using TiKV’s cross-data-center deployment together with the Flink connector’s Follower Read capability, offline data can genuinely be shared across sites.

  • Finally, computation pushdown: all of the connectors can push computation down to TiKV, which greatly improves data scanning and computation efficiency.
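To show what using the Flink connector looks like in practice, here is a hedged Flink SQL sketch that registers a TiDB-backed table and runs a simple aggregation. The connector option keys and connection details are assumptions from memory, so please check the TiBigData documentation for the exact names.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

/** Flink SQL sketch; the WITH options below are illustrative and may not match TiBigData exactly. */
public class TiDBFlinkSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Register a table backed by TiDB through the TiBigData Flink connector.
        tEnv.executeSql(
            "CREATE TABLE orders (" +
            "  id BIGINT," +
            "  amount DECIMAL(10, 2)" +
            ") WITH (" +
            "  'connector' = 'tidb'," +                                      // assumed connector name
            "  'tidb.database.url' = 'jdbc:mysql://127.0.0.1:4000/test'," +  // assumed option keys
            "  'tidb.username' = 'root'," +
            "  'tidb.password' = ''," +
            "  'tidb.database.name' = 'test'," +
            "  'tidb.table.name' = 'orders'" +
            ")");

        // A simple aggregation; filters and some computation can be pushed down to TiKV.
        tEnv.executeSql("SELECT COUNT(*) FROM orders").print();
    }
}
```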

TiBigData core enhancements:

The TiDB Java client has been enhanced with general-purpose capabilities. We implemented TiDB encoders whose code is decoupled from TiSpark, so they can be used by other OLAP engines and serve as a general-purpose tool for any community partner who needs one.

  • Implemented data type conversion tools that map between Flink/Presto data types and TiDB data types (see the sketch after this list).

  • Implemented a distributed TiKV client whose API is a better fit for distributed computing frameworks.
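As a rough idea of what such a type conversion utility does, the hypothetical sketch below maps a few TiDB column type names onto Flink Table API data types. The mapping is simplified for illustration and is not TiBigData’s actual converter.

```java
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;

/** Simplified, illustrative TiDB-to-Flink type mapping; not TiBigData's real converter. */
public class TypeMappingSketch {

    static DataType toFlinkType(String tidbType) {
        switch (tidbType.toUpperCase()) {
            case "TINYINT":   return DataTypes.TINYINT();
            case "INT":       return DataTypes.INT();
            case "BIGINT":    return DataTypes.BIGINT();
            case "FLOAT":     return DataTypes.FLOAT();
            case "DOUBLE":    return DataTypes.DOUBLE();
            case "VARCHAR":
            case "TEXT":      return DataTypes.STRING();
            case "DATETIME":
            case "TIMESTAMP": return DataTypes.TIMESTAMP(6);
            default:
                throw new IllegalArgumentException("unmapped TiDB type: " + tidbType);
        }
    }

    public static void main(String[] args) {
        System.out.println(toFlinkType("BIGINT"));   // BIGINT
        System.out.println(toFlinkType("DATETIME")); // TIMESTAMP(6)
    }
}
```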

In the future, we will continue to work on Change Log Write, TiDB x Presto/Trino, Flink State Backend in TiKV, and more. Interested students are welcome to join the community and play with us!

Project address: github.com/tidb-incuba…

HugeGraph on TiKV

HugeGraph on TiKV is suitable for scenarios that need a large-scale graph database with high read and write performance, and where a team is available to operate and maintain TiKV storage.

Implemented features:

  • Support for a single graph instance

  • Supports adding, deleting, modifying, and querying schemas

  • Supports adding, deleting, modifying, and searching vertices and edges

  • Support for traversal algorithms

  • Support for Gremlin queries

  • Support for index queries (incomplete)
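Since HugeGraph exposes a Gremlin endpoint, the hedged sketch below submits a Gremlin traversal through the Apache TinkerPop driver. The host, port, and query are placeholder assumptions for a local HugeGraph server.

```java
import java.util.List;

import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.Result;

/** Submits a Gremlin query to a HugeGraph server; endpoint details are illustrative. */
public class GremlinQuerySketch {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint: HugeGraph's Gremlin server, commonly exposed on port 8182.
        Cluster cluster = Cluster.build("127.0.0.1").port(8182).create();
        Client client = cluster.connect();
        try {
            // Fetch a few vertices to confirm the graph (backed by TiKV) is reachable.
            List<Result> results = client.submit("g.V().limit(5)").all().get();
            for (Result r : results) {
                System.out.println(r.getObject());
            }
        } finally {
            client.close();
            cluster.close();
        }
    }
}
```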

Demo:

After importing the COVID-19 data set of Xinyu City, the graph can be explored through the HugeGraph-Hubble interface:

Performance test results:

Import speed (write)

Query by ID (random read)

Follow-up plan:

  • Features

Support advanced features such as multiple graph instances, truncate/clear of graph data, metrics monitoring, and TTL.

  • Performance optimization

Write performance optimization: commit mode, batch size tuning, etc.

Query performance optimization: data encoding optimization, paging optimization, etc.

Project address: github.com/tidb-incuba…

Doris Connector:

TiDB is used as the data source to provide Doris with a native connector that opens up the data flow in TP-to-AP scenarios. It supports DML/DDL synchronization and can filter data by specified conditions. The project is currently about 70% complete.

Design ideas

  • Stream Load: An independent service in TiDB periodically reads and parses TiDB binlog files, assembles the rows into CSV files, and imports them into Doris through Stream Load (see the sketch after this list).

  • Routine Load: TiDB’s Drainer synchronizes the binlog to Kafka, and Doris adds support for the TiDB binlog data format to synchronize the data.

  • TiDB native protocol synchronization: Implement TiDB’s replication protocol in Doris and disguise Doris as a node in the TiDB cluster.
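As a rough sketch of the Stream Load path described in the first item above, the hypothetical code below assembles a couple of rows into CSV and PUTs them to a Doris Stream Load endpoint. The host, port, credentials, label, and database/table names are placeholder assumptions, and a production implementation would also handle FE-to-BE redirects and label management for exactly-once loading.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Minimal Stream Load sketch; endpoint, credentials, and table are illustrative placeholders. */
public class StreamLoadSketch {
    public static void main(String[] args) throws Exception {
        // Rows parsed from the TiDB binlog would be assembled into CSV like this.
        String csv = "1,alice,100.0\n2,bob,250.5\n";

        // Placeholder endpoint: http://<doris_host>:<http_port>/api/<db>/<table>/_stream_load
        URL url = new URL("http://127.0.0.1:8030/api/demo_db/orders/_stream_load");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);

        String auth = Base64.getEncoder()
                .encodeToString("root:".getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + auth);
        conn.setRequestProperty("label", "tidb-binlog-batch-0001"); // idempotency label
        conn.setRequestProperty("column_separator", ",");

        try (OutputStream out = conn.getOutputStream()) {
            out.write(csv.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}
```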

Follow-up planning:

The project will continue to iterate, guided by users’ real scenarios, to make the data pipeline flow more smoothly. At a later stage the project will be merged into the Doris main branch.

Project address: github.com/apache/incu…

This session of Hacking Camp concluded with six excellent projects, but maintaining an ecosystem is a long-term effort, and we will continue to support these projects to keep them vital. If you are interested in any of them, stay tuned for the follow-up posts, in which the founding teams will explain each project’s value to the whole TiDB ecosystem from an application perspective. A dedicated Meetup is also being planned, so please look forward to it!

From planet Ti to the cosmic vault, we use hacking to connect a wider ecosystem. The 2021 TiDB Hackathon will also launch soon; come and explore the secrets of database technology with us!