Druid enables fast queries and data analysis, with high availability and high scalability.
It has been just over 20 days since the last update, and just over three months since the 0.17 release; Druid is getting another major update, and the project is going strong.
Apache Druid 0.18.0 ships more than 200 new features, performance enhancements, bug fixes, and documentation improvements from 42 contributors.
New features
Join support
Join is a key operation in data analysis. Prior to 0.18.0, Druid supported some join-related features, such as Lookups or semi-joins in SQL. However, the use cases for these capabilities are very limited, and for other join use cases, users must normalize the data source when ingesting data, rather than adding it to the query, which can lead to data volume surges and longer ingestion times.
Druid 0.18.0 supports real joins for the first time. Druid currently supports INNER, LEFT, and CROSS joins. For native queries, joins are introduced as a new data source to represent joins of two data sources.
Currently, only left-deep joins are allowed. This means that the left-hand data source only allows one table or another join data source. For the data source on the right, lookup, inline, or Query data sources are allowed.
Druid SQL also supports joins. Under the hood, SQL JOIN queries are translated into one or more native queries containing join datasources.
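As a hedged sketch of what such a SQL join looks like in practice, the snippet below builds a request body for the Broker's SQL endpoint. The datasource `store_sales` and the lookup `store_names` are hypothetical stand-ins; lookup tables in Druid SQL expose `k`/`v` columns.

```python
import json

# Hypothetical datasource ("store_sales") and lookup ("store_names").
# Lookups are joined via the "lookup" schema and expose "k"/"v" columns.
sql = """
SELECT s.store_id, n.v AS store_name, SUM(s.revenue) AS revenue
FROM store_sales s
INNER JOIN lookup.store_names n ON s.store_id = n.k
GROUP BY s.store_id, n.v
"""

# The SQL API takes a JSON body with a "query" field.
payload = json.dumps({"query": sql})

# The actual request would look something like:
#   requests.post("http://broker:8082/druid/v2/sql/",
#                 data=payload, headers={"Content-Type": "application/json"})
```

Note that the join key on both sides is a string, matching the performance guidance below.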
Joins can affect query performance, so keep the following in mind:

- Joins that can be expressed with the LOOKUP function perform better, so consider using LOOKUP if it fits your use case.
- When using joins in Druid SQL, remember that the planner may generate subqueries that are not explicitly present in your query.
- A common cause of subquery generation is a type mismatch between the two sides of an equality condition. For example, since lookup keys are always strings, `druid.d JOIN lookup.l ON d.field = l.field` performs best when `d.field` is a string.
- As of Druid 0.18.0, the join operator must evaluate the condition for each row. In the future we hope to implement both early and deferred condition evaluation, which should significantly improve performance in common cases.
Future work:

- RIGHT OUTER and FULL OUTER joins
- Performance improvements
Inline queries
Druid can now perform nested queries via inline subqueries. Any type of subquery can sit on top of another type of subquery, as in the following example:
```
        topN
         |
  (join datasource)
     /         \
(table datasource)  groupBy
```
To execute this query, the Broker first evaluates the groupBy subquery: it sends the subquery to the data nodes and collects the results, which are materialized in the Broker's memory. Once the Broker has collected all results of the groupBy query, it rewrites the topN query, replacing the groupBy with an inline datasource holding the groupBy results. Finally, the rewritten query is sent to the data nodes to execute the topN.
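The query tree in the diagram can be sketched as a native query payload. This is a hedged illustration: the datasource `wikipedia` and its columns are hypothetical, while the dataSource `type` values (`table`, `query`, `join`) follow the native query format described in this section.

```python
# Hypothetical example: a topN over a join of a table and a groupBy subquery.
group_by = {
    "queryType": "groupBy",
    "dataSource": {"type": "table", "name": "wikipedia"},
    "intervals": ["2020-01-01/2020-02-01"],
    "granularity": "all",
    "dimensions": ["cityName"],
    "aggregations": [{"type": "count", "name": "edits"}],
}

top_n = {
    "queryType": "topN",
    "dataSource": {
        "type": "join",
        "left": {"type": "table", "name": "wikipedia"},  # left: table datasource
        "right": {"type": "query", "query": group_by},   # right: groupBy subquery
        "rightPrefix": "r.",
        "condition": 'cityName == "r.cityName"',
        "joinType": "INNER",
    },
    "intervals": ["2020-01-01/2020-02-01"],
    "granularity": "all",
    "dimension": "channel",
    "metric": "edits_total",
    "threshold": 10,
    "aggregations": [{"type": "longSum", "name": "edits_total",
                      "fieldName": "r.edits"}],
}
```

The Broker would evaluate `group_by` first, then rewrite `top_n` with an inline datasource holding its results, as described above.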
Query laning and prioritization
When running more than one query at a time, you may sometimes want to control the resource allocation of the query based on the priority of the query. For example, you might want to limit the resources allocated to less important queries so that important queries can be executed in a timely manner and not be interrupted by less important queries.
With query laning, you can control how capacity is allocated across query workloads. The relevant settings are:

| Property | Description | Default |
|---|---|---|
| `druid.query.scheduler.numThreads` | Maximum number of HTTP threads dedicated to query processing. To reserve HTTP thread capacity, this should be lower than `druid.server.http.numThreads`. Note that, as when `druid.server.http.enableRequestLimit` is set, query requests over this limit are rejected rather than waiting in the Jetty HTTP request queue. | Unbounded |
| `druid.query.scheduler.laning.strategy` | Laning strategy | `none` |
| `druid.query.scheduler.prioritization.strategy` | Prioritization strategy | `manual` |
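As a sketch (not taken from the release notes), a Broker `runtime.properties` using the `hilo` laning strategy might look like the following; `maxLowPercent` is an option specific to that strategy, and all values here are illustrative:

```properties
# Dedicate at most 40 HTTP threads to query processing
# (keep this below druid.server.http.numThreads).
druid.query.scheduler.numThreads=40
# "hilo" lanes low-priority queries separately; maxLowPercent caps the
# share of threads they may use (strategy-specific option).
druid.query.scheduler.laning.strategy=hilo
druid.query.scheduler.laning.maxLowPercent=20
druid.query.scheduler.prioritization.strategy=manual
```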
New query metric dimension

- `subQueryId`: each subquery has a distinct subQueryId but shares the queryId of its parent query.
New configuration

- `druid.server.http.maxSubqueryRows`: the maximum number of subquery rows the Broker may materialize in memory.
SQL GROUPING SETS
GROUPING SETS is now supported, allowing you to express multiple GROUP BY groupings in a single query.
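A hedged sketch of what this looks like in Druid SQL; the `wikipedia` datasource and its columns are hypothetical stand-ins.

```python
# GROUPING SETS computes several groupings in one pass.
sql = """
SELECT channel, cityName, COUNT(*) AS edits
FROM wikipedia
GROUP BY GROUPING SETS ((channel, cityName), (channel), ())
"""
# This is equivalent to unioning three GROUP BY queries: by
# (channel, cityName), by channel alone, and a grand total.
```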
SQL dynamic parameters
Druid now supports dynamic parameters in SQL. To use dynamic parameters, replace literal values in the query with question mark (?) placeholders; the values are then bound at execution time.
Important changes
applyLimitPushDownToSegments
Disabled by default
applyLimitPushDownToSegments was added in 0.17.0, but it can degrade performance when a query processes many segments: limit push-down to the segment scan initializes an aggregation buffer per segment, and this overhead is not negligible. Enable this configuration only when queries involve a relatively small number of segments per Historical or real-time task.
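A minimal sketch of re-enabling the behavior per query through the query context (the context key is the one named above; the groupBy body and datasource are illustrative):

```python
# Hypothetical groupBy query that opts back in to segment-level limit
# push-down via the query context.
query = {
    "queryType": "groupBy",
    "dataSource": "wikipedia",
    "intervals": ["2020-01-01/2020-02-01"],
    "granularity": "all",
    "dimensions": ["channel"],
    "limitSpec": {"type": "default", "limit": 10,
                  "columns": [{"dimension": "channel"}]},
    "aggregations": [{"type": "count", "name": "rows"}],
    "context": {"applyLimitPushDownToSegments": True},
}
```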
New lag metrics for Kinesis
The Kinesis indexing service now reports the new lag metrics listed below:

- `ingest/{supervisor type}/lag/time`: total time in milliseconds that ingestion lags behind the latest offsets of the stream
- `ingest/{supervisor type}/maxLag/time`: maximum time in milliseconds behind the latest offsets of the stream
- `ingest/{supervisor type}/avgLag/time`: average time in milliseconds behind the latest offsets of the stream
Roaring bitmaps by default
Druid supports two bitmap index types, Roaring and CONCISE; for performance reasons, the default has been switched to Roaring.
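If you want to pin the bitmap type explicitly (for example to keep CONCISE), it is set in the `indexSpec` of an ingestion spec's tuning config. A hedged sketch; the task type is illustrative:

```python
# indexSpec.bitmap.type selects the bitmap index implementation.
index_spec_roaring = {"bitmap": {"type": "roaring"}}   # new default
index_spec_concise = {"bitmap": {"type": "concise"}}   # previous default

tuning_config = {
    "type": "index_parallel",   # illustrative task type
    "indexSpec": index_spec_roaring,
}
```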
Array expression syntax changed
Druid expressions now support typed constructors for creating arrays, so arrays can be defined with an explicit type. For example, `<LONG>[1, 2, null]` creates an array of type LONG containing 1, 2, and null. You can still create arrays without an explicit type: `[1, 2, null]` remains valid syntax for an equivalent array, in which case Druid infers the array type from its elements. The new syntax also applies to empty arrays: `<STRING>[]`, `<DOUBLE>[]`, and `<LONG>[]` create empty arrays of STRING, DOUBLE, and LONG type respectively.
customTransform
An extension interface has been released that lets developers implement custom Transforms. Details: druid.apache.org/docs/0.18.0…
chunkPeriod removed
chunkPeriod had been deprecated since 0.14.0 due to its limited usefulness, and it has now been removed in 0.18.0.
Support Java 11
Druid now supports Java 11. The same Druid binary package runs on both Java 8 and Java 11. Our tests on Travis include:
- Compile and run unit tests using Java 11
- Compile using Java 8 and run integration tests using Java 11
Starting with Java 9, the JVM issues warnings when libraries use reflection to illegally access the JDK's internal APIs. These warnings will be addressed in future releases by modifying Druid code or updating library versions. For now, they can be suppressed by adding JVM options such as `--add-exports` or `--add-opens`.
```
2020-01-22T21:30:08,893 WARN [main] org.apache.druid.java.util.metrics.AllocationMetricCollectors - Cannot initialize org.apache.druid.java.util.metrics.AllocationMetricCollector
java.lang.reflect.InaccessibleObjectException: Unable to make public long[] com.sun.management.internal.HotSpotThreadImpl.getThreadAllocatedBytes(long[]) accessible: module jdk.management does not "exports com.sun.management.internal" to unnamed module @6955cb39
```
This warning can be suppressed by adding `--add-exports java.base/jdk.internal.perf=ALL-UNNAMED`.
```
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
```
This warning can be suppressed by adding `--add-opens java.base/java.lang=ALL-UNNAMED`.
```
2020-01-22T21:30:08,902 WARN [main] org.apache.druid.java.util.metrics.JvmMonitor - Cannot initialize GC counters. If running JDK11 and above, add --add-exports java.base/jdk.internal.perf=ALL-UNNAMED to the JVM arguments to enable GC counters.
```
This warning can be suppressed by adding `--add-exports java.base/jdk.internal.perf=ALL-UNNAMED`.
Update Kafka client to 2.2.2
Kafka client library has been updated to 2.2.2
Bug fixes
Druid 0.18.0 includes 40 bug fixes. For the complete list, see https://github.com/apache/druid/pulls?page=1&q=is%3Apr+milestone%3A0.18.0+is%3Aclosed+label%3ABug
- Fixed superBatch merging of the last partition boundaries (#9448)
- Reuse transformer in stream indexing (#9625)
- Preserve null values in compressed numeric columns regardless of size (#9622)
- DruidInputSource can now add new dimensions during re-ingestion (#9590)
- Error on value counter overflow instead of writing a corrupt segment (#9559)
- Fixed some issues with filters on numeric columns containing null values (#9251)
- Fixed the timestamp_format expr (#9282)
- Fixed KIS task failure when segmentGranularity has a time zone (#8690)
- Fixed grouping issues on extractionFn, expressions, joins, etc. (#9662)