This is the 10th day of my participation in the August More Text Challenge

Statement:

The database is impala

Requirements:

1. The three data sources have the same field “batCH_no”. According to the same field, the data from the three data sources are displayed in a report

2. The “account period” field to be displayed does not exist in the data source. Judging from the start date and end date of the account period in the data source, the earliest start date and the latest end date are selected

Account period: the earliest account period is 2020-05-20 00:00:00, and the latest account period is 2020-07-10 00:00:00

The account period is 202005/202006/202007 or 202005-202007

3. If a certain field, such as “invoice Number”, “voucher number”, “industry” and “channel name”, contains multiple contents, it will be displayed in the same line. In addition, if the invoice number or voucher number is consecutive, use “~”, if not related, use “/” to separate, duplicate data only one (i.e., deduplication).

For example: invoice number: 1,2,3,4,2; Is continuous and 2 is repeated, the results are 1 to 4

Invoice no. : 1,3,4,8,3; Non-continuous and 3 is repeated, the result is 1/4/4/8 instead of 1/4/8/3

Solution steps:

Step 1: View the source table data as follows

What the customer wants to show:

Step 2: The group_concat(column) method of the impala is used to group the data of a certain column together. Group_concat (column); group_concat (column);

First, let’s merge multiple rows into one row

select b.batch_no,

GROUP_CONCAT(b.invoice_no, '/') invoice_no,

GROUP_CONCAT(b.voucher_no, '/') voucher_no

from receive_data.CVP_V_GET_BATCH_NBMS a

left join receive_data.cvp_v_get_invoice_nbms b

on a.batch_no=b.batch_no

where a.deleted_flag='0'

and b.deleted_flag='0'

and a.batch_no='INBMS202005290100'

group by b.batch_no;
Copy the code

The results are as follows:

If you are careful, you will notice that invoice_no is continuous and uses “~” instead of “/”

Step 3: connect continuous “~” and discontinuous “/”

select b.batch_no,

case when max(invoice_no) = min(invoice_no) then max(invoice_no)

     when cast(max(invoice_no) as int)+1 - cast(min(invoice_no) as int)=count(invoice_no) then concat(min(invoice_no),'~',max(invoice_no))

     else GROUP_CONCAT(b.invoice_no, '/')

     end invoice_no,

case when max(voucher_no) = min(voucher_no) then max(voucher_no)

     when cast(max(voucher_no) as int)+1 - cast(min(voucher_no) as int)=count(voucher_no) then concat(min(voucher_no),'~',max(voucher_no))

     else GROUP_CONCAT(b.voucher_no, '/')

     end voucher_no

from receive_data.CVP_V_GET_BATCH_NBMS a

left join receive_data.cvp_v_get_invoice_nbms b

on a.batch_no=b.batch_no

where a.deleted_flag='0'

and b.deleted_flag='0'

and a.batch_no='INBMS202005290100'

group by b.batch_no;
Copy the code

Step 4: Now we are one step closer to our goal, but the requirements still need to be reweighted. You might want to use Distinct. Let’s take a look at the results:

select b.batch_no,

case when max(invoice_no) = min(invoice_no) then max(invoice_no)

     when cast(max(invoice_no) as int)+1 - cast(min(invoice_no) as int)=count(distinct invoice_no) then concat(min(invoice_no),'~',max(invoice_no))

     else GROUP_CONCAT(distinct b.invoice_no, '/')

     end invoice_no,

case when max(voucher_no) = min(voucher_no) then max(voucher_no)

     when cast(max(voucher_no) as int)+1 - cast(min(voucher_no) as int)=count(distinct voucher_no) then concat(min(voucher_no),'~',max(voucher_no))

     else GROUP_CONCAT(distinct b.voucher_no, '/')

     end voucher_no

from receive_data.CVP_V_GET_BATCH_NBMS a

left join receive_data.cvp_v_get_invoice_nbms b

on a.batch_no=b.batch_no

where a.deleted_flag='0'

and b.deleted_flag='0'

and a.batch_no='INBMS202005290100'

group by b.batch_no;
Copy the code

Alas, error!

Here I post the error message:

[Code: 500051, SQL State: HY000]  [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT invoice_no); deviating function: count(DISTINCT voucher_no)

), Query: select b.batch_no,

case when max(invoice_no) = min(invoice_no) then max(invoice_no)

     when cast(max(invoice_no) as int)+1 - cast(min(invoice_no) as int)=count(distinct invoice_no) then concat(min(invoice_no),'~',max(invoice_no))

     else GROUP_CONCAT(distinct b.invoice_no, '/')

     end invoice_no,

case when max(voucher_no) = min(voucher_no) then max(voucher_no)

     when cast(max(voucher_no) as int)+1 - cast(min(voucher_no) as int)=count(distinct voucher_no) then concat(min(voucher_no),'~',max(voucher_no))

     else GROUP_CONCAT(distinct b.voucher_no, '/')

     end voucher_no

from receive_data.CVP_V_GET_BATCH_NBMS a

left join receive_data.cvp_v_get_invoice_nbms b

on a.batch_no=b.batch_no

where a.deleted_flag='0'

and b.deleted_flag='0'

and a.batch_no='INBMS202005290100'

group by b.batch_no.
Copy the code

We only need to look at this part:

[Code: 500051, SQL State: HY000]  [Simba]ImpalaJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT invoice_no); deviating function: count(DISTINCT voucher_no)

All different aggregate functions need to have the same set of arguments

Reason: DISTINCT is only allowed on a column combination.

Step 5: Resolve the use of multiple GROUP_CONCAT in a select. We can use join instead.

select i.batch_no, invoice_no, voucher_no from ( select b.batch_no, max(invoice_no) invoice_no_max, Invoice_no_min (invoice_no) invoice_no_min (invoice_no) invoice_no_min (invoice_no) count(distinct Invoice_no) num) Case when Max (invoice_NO) = min(invoice_NO) then Max (invoice_NO) -- Determine whether there is only one invoice_NO in batch_NO when cast(max(invoice_no) as int)+1 - cast(min(invoice_no) as int)=count(distinct invoice_no) then Concat (min(invoice_no),'~', Max (invoice_no)) If the connection is continuous, use "~" invoice_no else GROUP_CONCAT(distinct B.invoice_no, '/') -- if the "/" concatenation invoice_no end invoice_no from receive_data.CVP_V_GET_BATCH_NBMS a left join is not used continuously receive_data.cvp_v_get_invoice_nbms b on a.batch_no=b.batch_no where a.deleted_flag='0' and b.deleted_flag='0' and a.batch_no='INBMS202005290100' group by b.batch_no) i join (select b.batch_no, max(voucher_no) voucher_no_max, min(voucher_no) voucher_no_min, count(distinct voucher_no) num, case when max(voucher_no) = min(voucher_no) then max(voucher_no) when cast(max(voucher_no) as int)+1 - cast(min(voucher_no) as int)=count(distinct voucher_no) then concat(min(voucher_no),'~',max(voucher_no)) else GROUP_CONCAT(distinct b.voucher_no, '/') end voucher_no from receive_data.CVP_V_GET_BATCH_NBMS a left join receive_data.cvp_v_get_invoice_nbms b on a.batch_no=b.batch_no where a.deleted_flag='0' and b.deleted_flag='0' and a.batch_no='INBMS202005290100' group by b.batch_no ) v on i.batch_no = v.batch_no;Copy the code

The results are as follows:

If you are careful, you may find that we still have “credit period” to deal with. The next step is to deal with the debit period.

Let’s look at the debit period.

Take a look at the source table:

select c.batch_no, sheet_start_date, -- Start sheet_end_date -- End time from receive_data.CVP_V_GET_BATCH_NBMS a left join Receive_data a.batch_no=c.batch_no where a.deleted_flag='0' and c.deleted_flag='0' and a.batch_no='INBMS202005290100';Copy the code

The results are as follows: a total of 85 entries, only partial data are displayed

The results the customer wants

I’m going to post the SQL directly here, just like above, so I don’t want to talk nonsense here.

select batch_no,sheet_start_date,sheet_end_date, Elseconcat (start_date,'~',end_date) end account_date -- If not, Use "~" to connect from (select C.batch_no, min(c.sheet_start_date) sheet_start_date, -- Get the earliest sheet_date Max (C.sheet_end_date) sheet_END_date, Concat (substr(min(c.scheet_start_date),1,4),substr(min(c.scheet_start_date),6,2)) - take the first payment days years concat (substr (Max (c.s. heet_end_date), 1, 4), substr (Max (c.s. heet_end_date), 6, 2)) end_date - take the from monthly payment days at the latest receive_data.CVP_V_GET_BATCH_NBMS a left join receive_data.cvp_v_get_sheet_nbms c on a.batch_no=c.batch_no where a.deleted_flag='0' and c.deleted_flag='0' and a.batch_no='INBMS202005290100' group by c.batch_no ) tCopy the code

To this point, we put the customer demand perfect solution.

Front desk presentation: FineReport is used here.

By the way, I would like to show my personal credentials:

Follow me and take you through more real cases.