Recs   FlinkCommodityRecommendationSystem (recommendation system based on Flink)

1. Introduction

The System is named Recs, inspired by Recommendation System. The logo is made on an online logo website. The author developed this project in order to learn Flink and related big data middleware. For presentation purposes, a matching Web was developed using Springboot + Vue. The author has experience in Web development of Python + Django + JavaScript. Considering that the project uses Java for development, in order to unify the technology stack, I have learned Springboot framework and Vue.

This project uses the ECommerceRecommendSystem open source learning project for reference, and the front-end part is much used for reference, which is optimized on the basis of the framework built by the author. Modified UI and some bugs, and some new functions. After the development and exercise of this project, the author has a systematic understanding of the technology related to big data, and has gained a lot. During the development process, we encountered many problems, but solved them one by one. The author’s experience is that the best way to solve the problem is to read official documents and actively use Google. Finally, the relevant technologies are all learned and applied, and the knowledge is relatively one-sided, so there are many areas to be optimized in this project. Welcome everyone to issue, learn together and make progress together.

2. Project Introduction

2.1 Recs System Architecture

Main workflow of the system:

  • User login/registration system.

  • Users rate items.

  • The rating data is sent to the real-time recommendation task of the recommendation module through Kafka.

  • The system performs real-time recommendation tasks and stores data in the hbase rating and userProduct tables. Real-time tasks include real-time topN and recommendations based on user behavior.

  • Real-time topN Stores the calculated results in the hbase onlineHot table. Based on user behaviors, it is recommended to store the calculated results in the hbase onlineRecommend table.

  • You can query hbase on the Web to obtain data required by related modules and display the results.

2.2 the home page

There are four modules:

  • Based on user behavior recommendation, when the user scores the product, Flink will score the product according to the user’s history and calculate the recommendation result combined with itemCF.

  • Hot commodities: Historical hot commodities

  • Good reviews: Goods with high ratings

  • Real-time hot items: Use the Flink time slide window to tally the hot items in the past hour, sliding every 5 minutes.

2.3 Product Details

  • Display product details

  • People who have seen the item have also seen: Recommendations based on itemCF

2.4 the login

3. Module description

3.1 Recommendation Modules

Development environment: IDEA + Maven + Git + Windows && WSL

** Software architecture: ** Flink + hbase + Kafka + mysql + Redis

Development guidance: The calculation tasks of Flink are stored in the task package, DataLoader is the data loading task, OfflineRecommender is the offline recommendation task, OnlineRecommender is the real-time recommendation task. Read the code module by module.

3.1.1 Guess you like it

Real-time recommendations:

  • Query the list of recently rated items from redist with redis key “ONLINE_PREFIX_” + userId

  • Query the product list of historical user scores in the hbase table userProduct.

  • Query related product lists in the hbase table itemCFRecommend based on the productId scored by the user

  • Filter the list of related goods according to the list of recently rated goods and the list of historical rated goods.

  • Re-order the recommended items according to the similarity of the recently rated items to this item and the user’s historical rating.

3.1.2 Hot commodities

The products rated by users at all times are sorted in reverse order according to the scoring times, and the popular products are selected.

  • Flink loads the hbase rating table into the memory, calculates the occurrence times according to the productId Group

  • Sort in reverse order by number of occurrences.

3.1.3 Favorable products

In reverse order of average product score,

3.1.4 Real-time Hot Items

Use Flink timeWindow to sort the data of the past hour and select the hot items. The time window slides every five minutes.

3.1.5 People who have seen the product have also seen it

Item Based Recommendation (itemCF)

3.1.6 Data loading module

Consume data of kafka topic as rating and store data in hbase rating table. To ensure data uniqueness, rowKey format is:

userId_productId_timestamp

3.2 the back-end (recommend_backend)

Development environment: IDEA + Maven + Git + Windows && WSL (Ubuntu 20.4) + PostWomen

Technical architecture: Springboot + Hibernate + mysql + hbase

Development guidance: Controller module is the core of the back end, starting from restFul API.

Project structure:

3.3 the front (recommend_front)

Development environment: VScode + NodeJS + Windows && WSL

Technical architecture: Vue + typescript + element-UI

4. Develop operation procedures

4.1 Environment Construction

  • mysql

  • hbase

  • flink

  • redis

  • kafka

  • zookeeper

4.2 Creating a Data table

  • mysql

There are two tables, one is Product, which stores the details of an item, and the other is User, which stores the user information.

Build table SQL script in the recommendation/SRC/main/resources/mysql in SQL

  • hbase

    • rating

    • userProduct

    • itemCFRecommend

    • goodProducts

    • historyHotProducts

    • onlineRecommend

    • onlineHot

Build table statements in the recommendation/SRC/main/resources/hbase. TXT

4.3 Data Storage

Commodity information stored in the recommendation/SRC/main/resources/product in the CSV file, we run a flink task load the data to mysql. The corresponding table is the Product table we created earlier

  • Start Flink, run Recommendation /… /task/DataLoader/DataLoaderTask.java

  • Product information is stored in mysql

4.4 Starting the Development Environment

  • Executing the startup script

The startup script is used to start previously deployed hbase, Kafka, Flink, Redis, and ZooKeeper with one click

To facilitate the development, the author wrote a shell script to start and stop environment, on the recommendation/main/resources directory, startAll respectively. Sh and stopAll. Sh

  • Start the Springboot back-end project

  • Start the VUE front end

  • Start a real-time recommendation task

  • Offline tasks start periodically

Finally, the author is going through the autumn 2020 recruitment, if you think this project is good, welcome to give a star!