Introduction to layering
Requirements analysis and implementation approach
As discussed earlier when introducing the concept of a real-time data warehouse, the main purpose of layering is to increase the reuse of computed results: each new statistical requirement is calculated not from the raw data, but from semi-finished intermediate results.
- Raw data is collected into Kafka directly as the ODS layer.
- User behavior logs and business data are read from Kafka's ODS layer, lightly processed, and written back to Kafka as the DWD layer (see the sketch after this list).
- Computing DWS directly from DWD can lead to duplicate calculations, so a DWM layer is extracted in between.
- DWS performs light aggregation to serve the many real-time query requirements.
- ADS performs simple aggregations and provides external interfaces.
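To make the ODS-to-DWD step concrete, here is a minimal Flink sketch: read raw log lines from an ODS topic, do simple cleaning, and write the result back to Kafka as a DWD topic. The topic names, broker address, and filter rule are illustrative assumptions, not the project's actual code.

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

import com.alibaba.fastjson.JSON;

public class OdsToDwdSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "hadoop102:9092"); // assumed broker address
        props.setProperty("group.id", "ods_to_dwd_sketch");

        // Read raw log lines from the ODS topic (topic name is an assumption).
        env.addSource(new FlinkKafkaConsumer<>("ods_base_log", new SimpleStringSchema(), props))
                // "Simple processing": parse each line and drop malformed records.
                .map(line -> JSON.parseObject(line))
                .filter(json -> json.containsKey("ts"))
                .map(json -> json.toJSONString())
                // Write the cleaned stream back to Kafka as a DWD topic.
                .addSink(new FlinkKafkaProducer<>("hadoop102:9092", "dwd_page_log", new SimpleStringSchema()));

        env.execute("ods-to-dwd-sketch");
    }
}
```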
Specific responsibilities of each layer
Layer | Data description | Computing tool | Storage medium |
---|---|---|---|
ODS | Raw data: logs and business data. | Log server, Maxwell | Kafka |
DWD | Detail data split by data object, such as orders and page visits. | Flink | Kafka |
DIM | Dimension data. | Flink | HBase |
DWM | Further processing of some data objects, such as unique visitors and bounce behavior; still detail-level data. | Flink | Kafka |
DWS | Topic-wide tables formed by lightly aggregating multiple fact streams around a dimension topic. | Flink | ClickHouse |
ADS | Filters and aggregates ClickHouse data for specific visualization needs. | ClickHouse SQL | Visual presentation |
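To make the DWS row concrete, here is a minimal sketch of "light aggregation" plus a ClickHouse sink through the flink-connector-jdbc dependency added below. The VisitorStats bean, the visitor_stats table, the window size, and the connection URL are all illustrative assumptions; in the real job the input would come from DWD/DWM Kafka topics rather than fromElements.

```java
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class DwsToClickHouseSketch {

    // Hypothetical aggregate bean; the field set is an assumption.
    public static class VisitorStats {
        public String channel;
        public long pvCount;
        public VisitorStats() { }
        public VisitorStats(String channel, long pvCount) {
            this.channel = channel;
            this.pvCount = pvCount;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for a DWD/DWM detail stream.
        DataStream<VisitorStats> detail = env.fromElements(
                new VisitorStats("appstore", 1L),
                new VisitorStats("appstore", 1L),
                new VisitorStats("web", 1L));

        // "Light aggregation": sum page views per channel in 10-second windows.
        DataStream<VisitorStats> aggregated = detail
                .keyBy(v -> v.channel)
                .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
                .reduce((a, b) -> new VisitorStats(a.channel, a.pvCount + b.pvCount));

        // Write the topic-wide rows to ClickHouse via the JDBC connector.
        aggregated.addSink(JdbcSink.sink(
                "INSERT INTO visitor_stats (channel, pv_count) VALUES (?, ?)", // hypothetical table
                (ps, v) -> {
                    ps.setString(1, v.channel);
                    ps.setLong(2, v.pvCount);
                },
                JdbcExecutionOptions.builder().withBatchSize(5).build(),
                new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                        .withDriverName("ru.yandex.clickhouse.ClickHouseDriver")
                        .withUrl("jdbc:clickhouse://hadoop102:8123/default") // assumed address
                        .build()));

        env.execute("dws-to-clickhouse-sketch");
    }
}
```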
Computing project preparation
- Create a new project: Tmallt-RealTime
- Create the packages:
  - app: all Flink computation programs
    - app.dwd: Flink DWD-layer computation programs
    - app.dwm: Flink DWM-layer computation programs
    - app.dws: Flink DWS-layer computation programs
    - app.func: computation functions
  - bean: entity classes
  - common: public constants
  - enums: enumeration classes
  - utils: utility classes (a hypothetical example appears at the end of this list)
- Add the dependencies to pom.xml:
```xml
<!-- Version configuration -->
<properties>
    <java.version>1.8</java.version>
    <maven.compiler.source>${java.version}</maven.compiler.source>
    <maven.compiler.target>${java.version}</maven.compiler.target>
    <flink.version>1.12.0</flink.version>
    <scala.version>2.12</scala.version>
    <hadoop.version>3.1.3</hadoop.version>
</properties>

<!-- Dependencies -->
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_${scala.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka_${scala.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-elasticsearch6_${scala.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_${scala.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-cep_${scala.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-json</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.68</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.7.25</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.25</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-to-slf4j</artifactId>
        <version>2.14.0</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.20</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.47</version>
    </dependency>
    <dependency>
        <groupId>com.alibaba.ververica</groupId>
        <artifactId>flink-connector-mysql-cdc</artifactId>
        <version>1.2.0</version>
    </dependency>
    <dependency>
        <groupId>commons-beanutils</groupId>
        <artifactId>commons-beanutils</artifactId>
        <version>1.9.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix-spark</artifactId>
        <version>5.0.0-HBase-2.0</version>
        <exclusions>
            <exclusion>
                <groupId>org.glassfish</groupId>
                <artifactId>javax.el</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>redis.clients</groupId>
        <artifactId>jedis</artifactId>
        <version>3.3.0</version>
    </dependency>
    <dependency>
        <groupId>ru.yandex.clickhouse</groupId>
        <artifactId>clickhouse-jdbc</artifactId>
        <version>0.3.0</version>
        <exclusions>
            <exclusion>
                <groupId>com.fasterxml.jackson.core</groupId>
                <artifactId>jackson-databind</artifactId>
            </exclusion>
            <exclusion>
                <groupId>com.fasterxml.jackson.core</groupId>
                <artifactId>jackson-core</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-jdbc_${scala.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
</dependencies>

<!-- Packaging plugin -->
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.0.0</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```
- Add the log4j configuration file:
```properties
log4j.rootLogger=info,stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
```
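As for the utils package mentioned above: it usually collects small helpers shared by every job. Here is one hypothetical example; the class name MyKafkaUtil, the broker address, and the method signatures are assumptions, not the project's actual code.

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

// Hypothetical helper for the utils package: centralizes Kafka
// source/sink creation so every layer's job builds them the same way.
public class MyKafkaUtil {
    private static final String BROKERS = "hadoop102:9092"; // assumed broker address

    // Build a string-deserializing consumer for the given topic and group.
    public static FlinkKafkaConsumer<String> getKafkaConsumer(String topic, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", BROKERS);
        props.setProperty("group.id", groupId);
        return new FlinkKafkaConsumer<>(topic, new SimpleStringSchema(), props);
    }

    // Build a string-serializing producer for the given topic.
    public static FlinkKafkaProducer<String> getKafkaProducer(String topic) {
        return new FlinkKafkaProducer<>(BROKERS, topic, new SimpleStringSchema());
    }
}
```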
Next installment: concrete implementation of the ODS and DWD layers.