The artificial intelligence industry needs data sets in huge quantities, and those data sets are hard to annotate. They also suffer from a serious “island” problem. Across both the horizontal and vertical dimensions of an industry, data is hard to pool for a variety of reasons, so no party can easily improve its models. The resulting scarcity of data in turn makes every party guard its own data more closely, which creates a vicious cycle that is hard to break. As the saying goes, “even a clever housewife cannot cook a meal without rice”: this is the biggest “pain point” of artificial intelligence.

Faced with this problem, AI practitioners have been working hard. In 2016, Google introduced the concept of “federated learning”, which offers a new way to solve such data problems. Federated learning is essentially an encrypted, distributed machine learning technique in which the parties build a model together without disclosing the underlying data or its encrypted form. In 2018, WeBank submitted a proposal to the IEEE Standards Association to establish a federated learning standard, and the proposal was approved. It was also the first international project to set a standard for a collaborative AI technology framework.

Since then, the WeBank AI team has continued to refine both the content and the implementation of federated learning, laying a solid foundation for the joint use of data.

Federated learning, bridging the gap

So how can models be built jointly while data privacy is protected? This is the problem federated learning is designed to solve.

To this end, WeBank has extended the scope of federated learning into three categories: vertical federated learning, horizontal federated learning, and federated transfer learning.

The three categories are distinguished along two dimensions: users and user features. The user dimension refers to the user ID, typically a set of identifiers such as phone number and ID number that distinguishes one user from another. The user feature dimension refers to data that describes the user’s profile, such as financial data, travel data, and hobby data.

Next, we look at the three kinds of federated learning and how each is defined along the user and user feature dimensions.

Vertical federated learning

Vertical federated learning applies when two data sets share many users but few user features. The data sets are split vertically (that is, along the feature dimension), and the portion with the same users but different user features is taken out for training.

For example, a bank holding financial data and a social media company holding user profiles can hardly share their data to build a model. This is where vertical federated learning comes in: the parties first align their samples, then decompose the algorithm so that each party computes on its own features, and finally obtain a federated model that keeps the data private, an outcome that satisfies everyone.
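The sample alignment step described above can be illustrated with a toy sketch. The records, field names, and helper functions below are all hypothetical, and the plain set intersection only stands in for the result of alignment: a real system would compute it with a privacy-preserving protocol so that non-shared user IDs are never revealed.

```python
# Illustrative only: toy records held by two parties. All names and values are made up.
bank = {
    "u1": {"balance": 1200, "loans": 1},
    "u2": {"balance": 300, "loans": 0},
    "u3": {"balance": 5400, "loans": 2},
}
social = {
    "u2": {"posts": 14, "friends": 120},
    "u3": {"posts": 3, "friends": 45},
    "u4": {"posts": 52, "friends": 800},
}


def align_samples(party_a, party_b):
    """Return the sorted user IDs present in both parties' data.

    Assumption: a plain set intersection is used here only to show the
    outcome; it is not a privacy-preserving protocol.
    """
    return sorted(party_a.keys() & party_b.keys())


def vertical_slice(party, shared_ids):
    """Keep only the rows for shared users; the party keeps its own features."""
    return {uid: party[uid] for uid in shared_ids}


shared_ids = align_samples(bank, social)
bank_part = vertical_slice(bank, shared_ids)      # same users, financial features
social_part = vertical_slice(social, shared_ids)  # same users, profile features
```

After alignment both parties hold rows for the same users, but each still holds only its own feature columns, which is the "same users, different features" slice the text describes.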

Horizontal federated learning

Horizontal federated learning applies when two data sets share many user features but few users. The data sets are split horizontally (that is, along the user dimension), and the portion with the same user features but different users is taken out for training.

The most typical example is a joint anti-money-laundering model built by several banks. The banks’ user features overlap heavily, but the available samples are very sparse. The goal is to build a joint anti-money-laundering model while keeping each bank’s data private, and to have that model perform better than one trained on a single bank’s data.
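The text does not say how the banks' locally trained models are combined. One common choice in horizontal settings is a sample-weighted average of the parameters (often called federated averaging), and the sketch below shows only that arithmetic under stated assumptions: the function name, the sample counts, and the weight vectors are hypothetical, and the encryption or secure aggregation a real deployment would need is left out.

```python
def federated_average(updates):
    """Average parameter vectors, weighting each party by its sample count.

    `updates` is a list of (num_samples, weights) pairs, where `weights`
    is a list of floats with the same length for every party.
    """
    total = sum(n for n, _ in updates)
    if total == 0:
        raise ValueError("no samples to aggregate")
    dim = len(updates[0][1])
    if any(len(w) != dim for _, w in updates):
        raise ValueError("all parties must share the same feature space")
    return [sum(n * w[i] for n, w in updates) / total for i in range(dim)]


# Hypothetical local models from three banks: same features, different users.
bank_updates = [
    (100, [0.2, -1.0]),
    (300, [0.4, -0.8]),
    (600, [0.1, -0.5]),
]
global_weights = federated_average(bank_updates)
```

Because every party uses the same feature space, the parameter vectors line up position by position, which is what makes this kind of averaging possible in the horizontal case.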

Federated transfer learning

Federated transfer learning applies when two data sets overlap little in both users and user features. Instead of splitting the data, it uses transfer learning to overcome the shortage of data or labels.

For example, in smart retail, data on user purchasing behavior, user preferences, and product characteristics may be scattered across three different companies; federated transfer learning can help them build models together.

All three types of federated learning achieve joint model building without sharing data, which amounts to bridging the gap between data islands. Moving from theory to practice, however, still requires pioneers to take the lead.
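The way the three types are distinguished by user overlap and feature overlap can be summarized in a small sketch. The overlap measure (intersection over union) and the 0.5 threshold are assumptions chosen for illustration; the text only says "more" or "less" overlap and gives no numeric rule.

```python
def overlap_ratio(a, b):
    """Intersection-over-union of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def choose_setting(users_a, users_b, feats_a, feats_b, threshold=0.5):
    """Map user and feature overlap to one of the three federated settings.

    Assumption: `threshold` is an arbitrary cut-off between "more" and
    "less" overlap; it does not come from the text.
    """
    user_overlap = overlap_ratio(set(users_a), set(users_b))
    feat_overlap = overlap_ratio(set(feats_a), set(feats_b))
    if user_overlap >= threshold and feat_overlap < threshold:
        return "vertical"
    if feat_overlap >= threshold and user_overlap < threshold:
        return "horizontal"
    if user_overlap < threshold and feat_overlap < threshold:
        return "transfer"
    return "not covered"  # heavy overlap on both dimensions is not one of the three cases


bank_vs_social = choose_setting(
    {"u1", "u2", "u3"}, {"u1", "u2", "u3", "u4"}, {"balance"}, {"posts"}
)
bank_vs_bank = choose_setting(
    {"u1", "u2"}, {"u3", "u4"}, {"age", "amount"}, {"age", "amount"}
)
retail = choose_setting({"u1"}, {"u2"}, {"purchases"}, {"preferences"})
```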

Fortunately, the WeBank AI team finished building the FATE platform and open-sourced it in January 2019 (GitHub: https://github.com/WeBankFinTech/FATE); the first external code contributor appeared in March. An open source, industrial-grade platform for building federated learning applications has finally arrived.

2. Understand FATE and build federated learning applications

In terms of architecture, the core functions of FATE are divided into four layers:

1. FATE Workflow: defines the workflow of a federated learning algorithm as a DAG.
2. FATE FederatedML Functions: includes all functional components of the federated learning algorithms.
3. EggRoll: abstraction of distributed computing and storage.
4. Federated Network: abstraction of cross-site network communication.
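To make the idea of a DAG-defined workflow concrete, here is a minimal sketch that uses only the Python standard library. It is not FATE's workflow format or API: the step names and the `execution_order` helper are hypothetical, and the sketch only shows how declaring dependencies between steps determines a valid execution order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each key is a step, each value is the set of
# steps it depends on. The names are illustrative, not FATE components.
workflow = {
    "read_data": set(),
    "align_samples": {"read_data"},
    "train_model": {"align_samples"},
    "evaluate": {"train_model"},
}


def execution_order(dag):
    """Return one valid order in which the steps can run.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
```

Describing the workflow as a graph rather than a fixed script means steps can be added or rewired by changing the dependency table alone.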

The core part is FederatedML Functions, which is itself divided into five layers. Among them, the EggRoll & Federation API layer wraps the computation and storage engine and provides a friendly API for the algorithms to call.

Understanding FATE’s architecture and how its functional modules are divided is only the first step. Development engineers learn best by combining practice with theory, and on this point the FATE open source project provides very detailed documentation.

Installation and deployment command:

    git clone https://github.com/WeBankFinTech/FATE.git

Test command:

    sh ./federatedml/test/run_test.sh

The examples folder of the project also contains many worked examples for reference, such as homo_logistic_regression (https://github.com/WeBankFinTech/FATE/tree/master/examples/homo_logistic_regression).

This example is a handy way to try Homo Logistic Regression, a horizontally federated logistic regression method, with the command sh run_logistic_regression_standalone. The logs for the run can be found under:

    your_install_path/logs/homo_logistic_regression_{timestamp}

Once you are familiar with the main features and modules of the FATE open source platform through the code, you can start building a federated learning application in four main steps:

1. Select a machine learning algorithm and design a multi-party secure computation protocol for it.
2. Define the information variables exchanged between the parties.
3. Build the algorithm’s execution workflow.
4. Implement each functional component in the workflow on top of the EggRoll & Federation API.

Once you start coding and set out on the path of learning FATE and federated learning, artificial intelligence is no longer a lofty, out-of-reach concept but a technical problem you can actually touch and solve.

FinTechathon: challenge yourself

Anyone embarking on this journey has many difficulties and obstacles to overcome, and wandering without direction only leads to getting lost. For this reason, the first FinTechathon college technology competition was established to guide theoretical study and point out the right path of growth through competition.

FinTechathon aims to become the most influential college technology event in the fintech sector. It is a team competition for students in the cutting-edge fields of artificial intelligence (AI) and blockchain. The competition is committed to encouraging college students at home and abroad to explore technological breakthroughs and application innovation at the frontier of fintech, to promoting exchanges between universities and between universities and enterprises in related disciplines, and to improving students’ innovation ability, practical skills, and employability across the board.

The competition has two tracks, AI and blockchain. The AI track is based on horizontal and vertical federated learning scenarios and asks teams to design innovative product applications using the algorithms supported by FATE, the open source federated learning platform. The blockchain track centers on the FISCO BCOS open source platform and places no limits on the scenario: teams design and develop a blockchain system of their own. Entries will be judged on product completeness, commercial value, and innovation, opening the door to the mysteries of fintech.

In addition, you will have access to:

  • Prizes and certificates: event rewards of up to 100,000
  • Employment assistance: priority for internship and employment opportunities
  • Communication: close contact with top experts in the technology field
  • Peak showdown: compete against elite technical students from universities at home and abroad
  • Project incubation: outstanding projects can get venture capital opportunities
  • Continuous exposure: close attention from across the fintech sector

Each team must have 2 to 5 members. For students who register individually, the organizing committee will provide a team-matching platform. The preliminary stage of the competition has now officially opened; follow the link to the original article to register and find more details. We look forward to working with you to build the future of FinTech!