Background

While processing database exports with awk, I needed to take a column that holds JSON data, extract and parse its values, and append them after the other columns, joined with | in the output. The original data format is:

335970|115|{"key1":value,"key2":value}

Output:

335970|115|0421000841sgm6p|-13075235|703160|0|703160

Implementation analysis

First, define a data file data.log with the following contents:

335970|115|{"traceid":"0421000841sgm6p","pixeldata":-13075235,"pixelcoordinate":"703160","pixelabnormaldata":0,"collectedpixeldata":"703160"}
335971|116|{"traceid":"0421000666sgm6p","pixeldata":-12325235,"pixelcoordinate":"733144","pixelabnormaldata":1,"collectedpixeldata":"333132"}

Next, write the processing command: split each line on | and join the output fields with | again. The important part is reprocessing the third column. awk is remarkably capable here, since a single command can carry fairly long logic.

awk -F "|" -vOFS="|" '{l=split($3,arr,",\""); $3=""; for(i=1;i<=l;i++){ll=split(arr[i],arr2,":"); if(i!=1) $3=$3"|"; $3=$3 arr2[2]}; gsub(/\"/,"",$3); gsub(/\}/,"",$3); print}' ./data.log

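Executing the command against data.log prints one line per record, with the parsed JSON values appended as |-separated columns.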
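As a quick check, the following sketch recreates data.log and runs the command end to end. The heredoc and the variable name out are illustrative choices rather than part of the original workflow, and -v OFS="|" is written with a space, which behaves the same as -vOFS="|".

```shell
# Recreate the sample file (assumes the two-record data.log described above).
cat > data.log <<'EOF'
335970|115|{"traceid":"0421000841sgm6p","pixeldata":-13075235,"pixelcoordinate":"703160","pixelabnormaldata":0,"collectedpixeldata":"703160"}
335971|116|{"traceid":"0421000666sgm6p","pixeldata":-12325235,"pixelcoordinate":"733144","pixelabnormaldata":1,"collectedpixeldata":"333132"}
EOF

# Run the one-liner and capture its output in a (hypothetical) variable.
out=$(awk -F "|" -v OFS="|" '{l=split($3,arr,",\""); $3=""; for(i=1;i<=l;i++){ll=split(arr[i],arr2,":"); if(i!=1) $3=$3"|"; $3=$3 arr2[2]}; gsub(/\"/,"",$3); gsub(/\}/,"",$3); print}' ./data.log)
printf '%s\n' "$out"
# prints:
# 335970|115|0421000841sgm6p|-13075235|703160|0|703160
# 335971|116|0421000666sgm6p|-12325235|733144|1|333132
```

Note that this approach relies on the JSON being flat and on no value containing ," or : characters; nested or free-text JSON would need a real JSON parser.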

Process analysis

Written as a one-liner, the awk program is long and hard to read. Here it is broken down step by step, with comments:

{
   l=split($3,arr,",\"");          # split column 3 on ," to get the JSON key-value pairs
   $3="";                          # reset column 3
   for(i=1; i<=l; i++){
      ll=split(arr[i],arr2,":");   # split each pair into key and value
      if(i!=1) $3=$3"|";           # every value except the first needs a | in front of it
      $3=$3 arr2[2]                # append the value
   }
   gsub(/\"/,"",$3);               # remove the double quotes around the values
   gsub(/\}/,"",$3);               # remove the closing brace
   print
}

Without the two gsub substitutions, the values in the output keep their JSON double quotes, and the last value keeps its closing brace.
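To see this, here is a small sketch that pipes one sample record into the same program with the gsub calls left out. The variable name raw is only for illustration.

```shell
# Same logic as the full command, minus the two gsub() clean-up calls.
raw=$(printf '%s\n' '335970|115|{"traceid":"0421000841sgm6p","pixeldata":-13075235,"pixelcoordinate":"703160","pixelabnormaldata":0,"collectedpixeldata":"703160"}' |
  awk -F "|" -v OFS="|" '{l=split($3,arr,",\""); $3=""; for(i=1;i<=l;i++){split(arr[i],arr2,":"); if(i!=1) $3=$3"|"; $3=$3 arr2[2]}; print}')
printf '%s\n' "$raw"
# prints: 335970|115|"0421000841sgm6p"|-13075235|"703160"|0|"703160"}
```

The opening brace never shows up because it belongs to the key half of the first pair, which the program discards.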

Programming takeaways

An awk command can carry quite long processing logic, which makes it very powerful. A few things to note:

  1. -vOFS="|" sets the output field separator to |;
  2. Array subscripts produced by split start at 1, unlike the zero-based indexing of most programming languages;
  3. When stitching the JSON values together, the first value needs no leading |, because awk itself inserts the output separator between the second and third columns;
  4. To remove unwanted characters from the final result, use gsub for global substitution. Here the characters to remove are " and }; both have special meaning in awk syntax, so they are escaped with a backslash in the regular expressions;
  5. The final print statement outputs the rebuilt line.
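Notes 1 to 3 can be checked in isolation. The following is a minimal sketch using made-up inputs (a:b:c and x|y|z) and illustrative variable names, not data from the original task.

```shell
# split() fills arr[1]..arr[n]; index 0 is never created.
first=$(awk 'BEGIN { n = split("a:b:c", arr, ":"); print n, arr[1], (0 in arr) }')
printf '%s\n' "$first"        # prints: 3 a 0

# Assigning to a field makes awk rebuild the line with OFS between fields,
# so the new $3 needs no leading | of its own.
rebuilt=$(printf 'x|y|z\n' | awk -F "|" -v OFS="|" '{ $3 = "p|q"; print }')
printf '%s\n' "$rebuilt"      # prints: x|y|p|q

# Without -v OFS, the rebuilt line falls back to the default separator, a space.
default_ofs=$(printf 'x|y|z\n' | awk -F "|" '{ $3 = "p|q"; print }')
printf '%s\n' "$default_ofs"  # prints: x y p|q
```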