Background
While processing database output with awk, I needed to take one column that holds JSON, parse out its values, append them after the other columns, and join everything with |. The original data format is:
335970|115|{"key1":value,"key2":value}
Output:
335970|115|0421000841sgm6p|-13075235|703160|0|703160
Implementation analysis
First, create a data file, data.log, with the following contents:
335970|115|{"traceid":"0421000841sgm6p","pixeldata":-13075235,"pixelcoordinate":"703160","pixelabnormaldata":0,"collectedpixeldata":"703160"}
335971|116|{"traceid":"0421000666sgm6p","pixeldata":-12325235,"pixelcoordinate":"733144","pixelabnormaldata":1,"collectedpixeldata":"333132"}
Next, write the processing command: split each line on |, and join the output with |. The important part is reprocessing the third column. awk is remarkably capable here, since a single command can hold a fairly long piece of logic.
awk -F "|" -vOFS="|" '{l=split($3,arr,",\""); $3=""; for(i=1;i<=l;i++){split(arr[i],arr2,":"); if(i!=1) $3=$3 "|"; $3=$3 arr2[2]} gsub(/\"/,"",$3); gsub(/\}/,"",$3); print}' ./data.log
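Running the command produces:
335970|115|0421000841sgm6p|-13075235|703160|0|703160
335971|116|0421000666sgm6p|-12325235|733144|1|333132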
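To reproduce this end to end, here is a minimal sketch that recreates data.log from the sample above and wraps the one-liner in a shell function. The function name flatten_json_column is hypothetical; the awk program is the same as in the command above.

```shell
#!/bin/sh
# Sketch: recreate the sample data.log and run the one-liner on it.
# flatten_json_column is a hypothetical wrapper name used only for this example.
cat > data.log <<'EOF'
335970|115|{"traceid":"0421000841sgm6p","pixeldata":-13075235,"pixelcoordinate":"703160","pixelabnormaldata":0,"collectedpixeldata":"703160"}
335971|116|{"traceid":"0421000666sgm6p","pixeldata":-12325235,"pixelcoordinate":"733144","pixelabnormaldata":1,"collectedpixeldata":"333132"}
EOF

flatten_json_column() {
    # Split on |, rebuild column 3 from the JSON values, join the output with |.
    awk -F '|' -vOFS='|' '{l=split($3,arr,",\""); $3=""; for(i=1;i<=l;i++){split(arr[i],arr2,":"); if(i!=1) $3=$3 "|"; $3=$3 arr2[2]} gsub(/\"/,"",$3); gsub(/\}/,"",$3); print}' "$@"
}

flatten_json_column ./data.log
```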
Process analysis
Because the whole program sits on one line without any formatting, the command is long and hard to read. Here it is broken down step by step:
{
    l = split($3, arr, ",\"")     # split column 3 on ," to get the JSON key-value pairs
    $3 = ""                       # reset column 3
    for (i = 1; i <= l; i++) {
        split(arr[i], arr2, ":")  # split each pair into key (arr2[1]) and value (arr2[2])
        if (i != 1) $3 = $3 "|"   # every value except the first needs a leading |
        $3 = $3 arr2[2]           # append the value
    }
    gsub(/\"/, "", $3)            # remove the double quotes around values
    gsub(/\}/, "", $3)            # remove the closing }
    print                         # print the rebuilt record
}
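Since the one-liner is hard to read, one option is to keep the formatted version in its own file and run it with awk -f; setting FS and OFS in a BEGIN block then replaces the -F and -vOFS options. A minimal sketch, where the file name flatten.awk is a hypothetical choice and the input is the first sample record:

```shell
#!/bin/sh
# Sketch: store the readable program in a file and run it with `awk -f`.
# flatten.awk is a hypothetical file name.
cat > flatten.awk <<'EOF'
BEGIN { FS = OFS = "|" }          # replaces -F "|" -vOFS="|" on the command line
{
    l = split($3, arr, ",\"")     # JSON key-value pairs
    $3 = ""
    for (i = 1; i <= l; i++) {
        split(arr[i], arr2, ":")  # arr2[2] is the value
        if (i != 1) $3 = $3 "|"
        $3 = $3 arr2[2]
    }
    gsub(/\"/, "", $3)
    gsub(/\}/, "", $3)
    print
}
EOF

# Feed one record from the sample data on standard input.
record='335970|115|{"traceid":"0421000841sgm6p","pixeldata":-13075235,"pixelcoordinate":"703160","pixelabnormaldata":0,"collectedpixeldata":"703160"}'
printf '%s\n' "$record" | awk -f flatten.awk
```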
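Without the two gsub substitutions, the output would still carry the JSON double quotes and the closing brace:
335970|115|"0421000841sgm6p"|-13075235|"703160"|0|"703160"}
335971|116|"0421000666sgm6p"|-12325235|"733144"|1|"333132"}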
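To see where those leftover characters come from, the sketch below prints what the two split() calls produce for each key-value pair of the first sample record, one line per pair in the form "index: pair => value". The helper name show_pairs is hypothetical and exists only for this illustration.

```shell
#!/bin/sh
# Sketch: inspect the output of both split() calls for one sample record.
# show_pairs is a hypothetical helper name.
record='335970|115|{"traceid":"0421000841sgm6p","pixeldata":-13075235,"pixelcoordinate":"703160","pixelabnormaldata":0,"collectedpixeldata":"703160"}'

show_pairs() {
    awk -F '|' '{
        l = split($3, arr, ",\"")        # same split as in the main command
        for (i = 1; i <= l; i++) {
            split(arr[i], arr2, ":")     # arr2[2] still carries any quotes or brace
            print i ": " arr[i] " => " arr2[2]
        }
    }'
}

printf '%s\n' "$record" | show_pairs
```

For the first record the values come out as "0421000841sgm6p", -13075235, "703160", 0 and "703160"}: string values keep their JSON quotes and the last pair also keeps the closing brace, which is exactly what the two gsub calls clean up.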
Takeaways
A single awk command can hold fairly long processing logic, which makes it very powerful. A few points to note:
- The option -vOFS="|" sets the output field separator to |.
- Array subscripts in awk start at 1 (split() fills arr[1], arr[2], ...), unlike the 0-based indexing used by most programming languages.
- When joining the JSON values, the first value needs no leading |, because awk itself inserts OFS between fields when it rebuilds the record (here, between column 2 and the new column 3).
- To remove unwanted characters from the final result, use gsub for global substitution. Here we strip " and }, and both are escaped in the patterns (\" and \}) so that they are matched as literal characters.
- The final print outputs the rebuilt record directly.
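The awk behaviours behind these notes can be checked with a few tiny inputs. This is a minimal sketch; the sample strings are made up for illustration and are not from the original data.

```shell
#!/bin/sh
# Sketch: small checks of the awk behaviours listed above (sample inputs are made up).

# Assigning to a field rebuilds $0 with OFS, so column 2 and the new column 3
# are joined by | automatically; only the values inside $3 need an explicit |.
echo 'a|b|{json}' | awk -F '|' -vOFS='|' '{ $3 = "v1" "|" "v2"; print }'
# prints: a|b|v1|v2

# split() fills the array from index 1; index 0 is never created.
echo 'key:value' | awk '{ n = split($0, parts, ":"); print n, parts[1], parts[2], (0 in parts) }'
# prints: 2 key value 0

# gsub() with escaped " and } strips both characters from the whole record.
echo '"703160"}' | awk '{ gsub(/\"/, ""); gsub(/\}/, ""); print }'
# prints: 703160
```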