This is the second day of my participation in the November Writing Challenge. For details, see: The Last Writing Challenge of 2021.

Preface

Recently, while working on data ingestion, I had to add two new kinds of JSON log data whose formats differ in only one field name. The data is stored in Kafka, parsed into CSV, written out to files, and finally loaded into the database as tables. I decided to use Flume to implement this.

The plan was to write a Flume interceptor in Java that parses the JSON. The usual approach would be to define two entity classes and two interceptors, one per log type. But I felt there had to be a better way, and having recently ditched FastJSON, I decided to embrace Gson and handle both formats with a single entity class and a single interceptor.

The entity class code is as follows:

```java
import com.google.gson.annotations.SerializedName;

public class NlkfReqDomain {
    private String logday;
    // One log type sends "reqcontent", the other "rspcontent"; both land in this field
    @SerializedName(value = "reqcontent", alternate = {"rspcontent"})
    private String content;
    private String reqdate;
    @SerializedName("@version")
    private String version;
    private String type;
    @SerializedName("@timestamp")
    private String timestamp;
}
```
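The preface says each parsed record ends up as a CSV line. As a minimal sketch, assuming RFC 4180-style quoting and a hypothetical helper named `toCsvRow` (the original interceptor's column order and delimiter are not shown), the fields of a parsed NlkfReqDomain could be joined like this. Quoting matters because content can be long free text that contains commas, quotes or newlines.

```java
import java.util.StringJoiner;

public class CsvRow {
    // Hypothetical helper: joins fields into one CSV line with RFC 4180-style quoting.
    static String toCsvRow(String... fields) {
        StringJoiner row = new StringJoiner(",");
        for (String f : fields) {
            if (f == null) {
                row.add("");                                   // null becomes an empty column
            } else if (f.contains(",") || f.contains("\"") || f.contains("\n") || f.contains("\r")) {
                row.add("\"" + f.replace("\"", "\"\"") + "\""); // wrap and double embedded quotes
            } else {
                row.add(f);                                    // plain value, no quoting needed
            }
        }
        return row.toString();
    }

    public static void main(String[] args) {
        // In the interceptor the arguments would be the entity's fields in a fixed column order.
        System.out.println(toCsvRow("2021-11-02", "a,b", "say \"hi\"", null));
    }
}
```

Leaving null as an empty column keeps the column count stable, which is usually what a bulk loader expects.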

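As you can see from the code above, the @SerializedName annotation is applied to the content field, so a single field can receive either reqcontent or rspcontent. It is also used on version and timestamp, whose JSON keys (@version and @timestamp) are not valid Java identifiers.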

The value attribute of @SerializedName is the JSON name used for both serialization and deserialization. alternate is an array of additional names that are accepted only during deserialization; serialization always writes the value name.
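To make the value/alternate rule concrete without depending on Gson itself, here is a small stdlib-only sketch that mimics it. The class and method names are hypothetical, and the JSON is assumed to be already parsed into an ordered key/value map. When reading, every name in value plus alternate fills the same field; Gson reads keys in document order, so if the input contains more than one of them, the last one wins. When writing, only value is used.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AliasedField {
    private final String value;           // name used for both reading and writing
    private final List<String> alternate; // extra names accepted only when reading

    AliasedField(String value, String... alternate) {
        this.value = value;
        this.alternate = Arrays.asList(alternate);
    }

    // Mimics deserialization: scan entries in input order; each matching name overwrites,
    // so the last match wins. Returns null if no name matches.
    String read(Map<String, String> parsedJson) {
        String result = null;
        for (Map.Entry<String, String> e : parsedJson.entrySet()) {
            if (e.getKey().equals(value) || alternate.contains(e.getKey())) {
                result = e.getValue();
            }
        }
        return result;
    }

    // Mimics serialization: always emits the primary name, never an alternate.
    Map<String, String> write(String fieldValue) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put(value, fieldValue);
        return out;
    }

    public static void main(String[] args) {
        AliasedField content = new AliasedField("reqcontent", "rspcontent");
        Map<String, String> rsp = new LinkedHashMap<>();
        rsp.put("rspcontent", "Hello World");
        String v = content.read(rsp);
        System.out.println(v);                // Hello World
        System.out.println(content.write(v)); // {reqcontent=Hello World}
    }
}
```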

Gson version problem

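For example, take two JSON strings, {"inContent": "Hello World"} and {"outContent": "Hello World"}, and a field annotated with @SerializedName(value = "inContent", alternate = {"outContent"}).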

Converting a JSON string into an entity object is deserialization, and converting the entity object back into JSON is serialization.

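The result is easy to read: both strings deserialize into the same field, but serializing the object always writes inContent. In other words, the value name inContent takes part in both serialization and deserialization, while the alternate name outContent is used only for deserialization.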

Just when I was fully confident about how @SerializedName works, the outContent field was parsed as null. I thought I had forgotten to recompile, so I cleaned and repackaged, but the output was still null.

Because the outContent field is very long and I remembered that Flume limits the maximum event size, I checked the Flume logs, but there was no error!!

I wrote a demo on my own machine, and outContent was parsed correctly. Was my code haunted???? After searching through multiple sources, I found:

The alternate attribute is only available since Gson 2.4.
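Because the failure was silent (the alternate name was simply ignored instead of causing an error), one defensive option is to check at startup whether the annotation class the JVM actually loaded declares an alternate element. A minimal stdlib-only sketch with a hypothetical helper name, `hasElement`: annotation elements are compiled as methods of the annotation interface, so `Class.getMethod` can probe for them. The demo runs on a JDK annotation; the assumed usage in the interceptor would be `hasElement(com.google.gson.annotations.SerializedName.class, "alternate")`, which should return false when an old Gson such as 2.2.2 is the one loaded.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;

public class AnnotationFeatureCheck {
    // Returns true if the loaded version of the annotation type declares the named element.
    static boolean hasElement(Class<? extends Annotation> annotationType, String element) {
        try {
            annotationType.getMethod(element); // elements are no-arg methods on the annotation
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasElement(Retention.class, "value"));     // true
        System.out.println(hasElement(Retention.class, "alternate")); // false
    }
}
```

Failing fast (for example, throwing from the interceptor's initialize step when the check returns false) would have surfaced the version conflict immediately instead of producing null fields.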

Locally I was using Gson 2.8 with all dependencies packaged into my jar, but Flume's lib directory already contains gson-2.2.2.jar. When the same class exists in two jars on the classpath, the class loader loads the first copy it finds and never loads the other one, so the Gson 2.2 classes were used and the duplicate Gson 2.8 classes were ignored. After deleting gson-2.2.2.jar, I restarted Flume with `jps -m | grep XXX | awk '{print $1}' | xargs kill -9` followed by `sh XXX.sh`, and the parsing finally worked!!!!!
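A quick way to confirm this kind of conflict before deleting anything is to ask the class loader for every copy of a class file it can see. The sketch below uses only the JDK (`ClassLoader.getResources` and `ProtectionDomain.getCodeSource`); the class name is hypothetical, and it assumes the interceptor jar and Flume's lib directory share one classpath, as the symptom here suggests. More than one URL for `com/google/gson/Gson.class` means two Gson versions are competing, and `loadedFrom` shows which one won.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

public class DuplicateClassFinder {
    // Lists every location visible to the loader that contains the given class file.
    static List<URL> findAllCopies(String className, ClassLoader loader) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reports where an already-loaded class came from; null for bootstrap (JDK) classes.
    static URL loadedFrom(Class<?> c) {
        CodeSource source = c.getProtectionDomain().getCodeSource();
        return source == null ? null : source.getLocation();
    }

    public static void main(String[] args) {
        ClassLoader loader = DuplicateClassFinder.class.getClassLoader();
        // Inside the Flume JVM the interesting (hypothetical) calls would be
        // findAllCopies("com.google.gson.Gson", loader) and loadedFrom(com.google.gson.Gson.class).
        System.out.println(findAllCopies("java.lang.String", loader));
        System.out.println(loadedFrom(DuplicateClassFinder.class));
        System.out.println(loadedFrom(String.class)); // null: String comes from the bootstrap loader
    }
}
```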

Conclusion

In Python, JSON can be parsed with the built-in json module without defining entity classes beforehand, which is really convenient. Sadly, Flume does not support Python.

```python
import json

str1 = '{"inContent": "Hello World"}'
str2 = '{"outContent": "Hello World"}'

# json.loads parses a JSON string into a dict; no entity class is needed
print(json.loads(str1)['inContent'])
print(json.loads(str2)['outContent'])
```