Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”. This article has participated in the “Digitalstar Project” and won a creative gift package to challenge the creative incentive money.
That’s easy, but we’re far from solving the whole problem. What does the winning trip actually look like? At this point, we need to introduce arrays. Let’s start with just part of the problem and add the departure time for each jump.
SELECT Carrier, TailNum, groupArray(DepTime) as Departures, length(groupArray(DepTime)) as Hops FROM ontime WHERE (FlightDate = toDate('2017-01-15')) AND (DepTime < ArrTime) GROUP BY Carrier, TailNum ORDER BY Hops DESC, Carrier, TailNum LIMIT 5 SQL answer / * * / ┌ ─ Carrier ─ ┬ ─ TailNum ─ ┬ ─ Departures ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┬ ─ Hops ─ ┐ │ HA │ N488HA │ [1157181], 3192, 1104, 2658544170... 14 │ │ │ HA │ N492HA │ [718824938151, 2162, 2105, 6174 3...]. 14 │ │ │ HA │ N493HA │ [627728101, 2164, 7180, 1904152 3...]. 14 │ │ │ HA │ N483HA │ [...] 845725180 0190 4125 9140 7113 12 │ │ │ HA │ N489HA │ [2047193, 6103, 1160, 2143, 8125, 6, 9... 12 │ │ └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┴ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┴ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┴ ─ ─ ─ ─ ─ ─ ┘Copy the code
This query introduces groupArray(), which is an aggregation function.
It collects the values of each GROUP BY subset and adds them to the array in the order in which the rows are processed. If you call groupArray() on different columns, it will create an array for each column. Most importantly, the generated array will always be the same size and in the same order. In other words, groupArray() is the reverse operation of the ARRAY JOIN operator that we learned in our last blog post. (This is important: one column to row, one row to column)
There’s a lot of work to build the flight path. First of all, we only have departure time, no airport. Second, the departure time is not sorted. Let’s solve this problem in the next example, which adds more arrays and sorts them by departure time:
SELECT Carrier, TailNum, arraySort(groupArray(DepTime)) as Departures, arraySort((x,y) -> y, groupArray(Origin), groupArray(DepTime)) as Origins, length(groupArray(DepTime)) as Hops FROM ontime WHERE (FlightDate = toDate('2017-01-15')) AND (DepTime < ArrTime) GROUP BY Carrier, TailNum ORDER BY Hops DESC, Carrier, TailNum LIMIT 5 \G /* SQL answer*/ Row 1: ── Carrier: HA N488HA Departures: [544658818930104 2115 7132 2144 3155 6170 8181 3192 1202 5212 6] Origins: ['HNL','KOA','HNL','LIH','OGG','LIH','HNL','OGG','LIH','OGG','HNL','OGG','HNL','OGG'] Hops: 14 ...Copy the code
As shown above, we rely on a new and very generic function: arraySort().
The previous query illustrates the different ways it can be used. In its simplest form, arraySort() takes an array parameter and returns the sorted value. Here’s an example.
SELECT arraySort([1224, 1923, 1003, 745]) AS SQL answer s / * * / ┌ ─ s ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ │ │ [745100, 3122, 4192 3] └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘Copy the code
We can make things even more interesting by adding a lambda expression that instructs arraySort to process each array value it encounters.
Lambda expressions are always the first argument, if any. The following example uses a lambda expression to reverse the sort order. This lambda expression is applied to each successive array value, and ClickHouse uses the resulting value as the sorting key for the values in the array.
SELECT arraySort(x -> (-x), [1224, 1923, 1003, 745]) AS SQL answer s / * * / ┌ ─ s ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ │ │ [1923122, 4100, 3745] └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘Copy the code
Lambda is not always necessary. You can use another method: arrayReverseSort(). The following query returns exactly the same answer:
SELECT arrayReverseSort([1224, 1923, 1003, 745]) AS s
Copy the code
Finally, we can add a different key entirely (the other key is the one that specifies the order). Can understand the alternative sort key). The following example has two arrays and uses a lambda expression to pick the second array as the sort key. ClickHouse will sort the first array using the value of the second array as the key.
SELECT arraySort((x, y) -> y, [1224, 1923, 1003, 745], [2, 1, 3, 4]) AS SQL answer s / * * / ┌ ─ s ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ │ │ [1923122, 4100, 3745] └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘Copy the code