Vertical federated learning: modeling scenarios and requirements
Fintech – Risk management for Small and Micro Business Credit
Pain points
The ideal is rich; the reality is lean
Banks want detailed, comprehensive information about enterprises and their controllers
In practice, banks usually have only the central bank's credit reports
As a result, banks lack a comprehensive view of their customers, and the data distribution is severely skewed
Federated learning-based solution
The bank cooperates with a billing company
The two parties jointly model the billing amounts from the last 3/6 months together with the central bank's credit-rating label attribute to predict the expected result
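The data layout behind this kind of joint modeling can be sketched as follows. This is a minimal illustration, not the method used in the scenario above: the enterprise IDs, feature values, labels, and the helper name align_ids are all hypothetical, and the plaintext set intersection stands in for the privacy-preserving ID alignment a real vertical federated system would use.

```python
# Hypothetical toy data: both parties key their records by a shared enterprise ID.
bank_labels = {"E001": 1, "E002": 0, "E004": 1}  # label held only by the bank
billing_features = {
    "E001": [120.0, 260.0],  # billing amount over the last 3 and 6 months
    "E002": [15.0, 40.0],
    "E003": [75.0, 90.0],
}


def align_ids(party_a, party_b):
    """Return the sorted IDs known to both parties.

    Plaintext stand-in (assumption): a real system would align IDs with a
    private set intersection protocol so that non-shared IDs stay hidden.
    """
    return sorted(set(party_a) & set(party_b))


shared_ids = align_ids(bank_labels, billing_features)

# Each party keeps only its own columns, ordered by the shared IDs.
# The features and the labels are never merged into one table.
bank_view = [bank_labels[i] for i in shared_ids]
billing_view = [billing_features[i] for i in shared_ids]

print(shared_ids)
print(bank_view)
print(billing_view)
```

The point of the sketch is the partitioning: the two parties cover the same enterprises but hold different columns, which is what makes the setting vertical.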
Insurtech – Personalized pricing
Pain points
The insurer's ideal is rich
- Accurate, personalized user profiles (hundreds of dimensions)
- Comprehensive data coverage
The insurer's reality is lean
- Lack of comprehensive understanding of customers
- Severely skewed data distribution
Personalized insurance pricing based on vertical federated learning
Age, occupation, car rental, and other label attributes are modeled jointly through federated learning to predict the probability of a claim and decide whether to accept the risk
Horizontal federated learning: modeling scenarios and requirements
WeBank and a partner bank jointly build an anti-money-laundering model, hoping to improve on the models each could build alone
Setup
- Y indicates whether money laundering occurred
- The partner bank and WeBank each hold their own (X, Y)
- Neither party exposes its (X, Y)
Problems with the traditional modeling approach
Neither WeBank nor the partner bank has enough samples on its own
Expected results
- A joint model is built while preserving privacy
- The federated model outperforms models trained on a single party's data
Horizontal federated learning
Characteristics
- All participants share the same data features (including the labels)
Data is traditionally viewed as a table
Horizontal partitioning splits the data by rows
Each row contains the same features
- Participants are not required to exchange information
- Algorithms such as FedAvg are available
- Good support for deep learning (deep neural networks)
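The FedAvg algorithm named in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: the one-feature linear model, the toy data (generated from y = 2x + 1), the learning rate, and the function names fedavg and local_update are all hypothetical choices for the example, not taken from any particular framework.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def local_update(weights, data, lr=0.1, epochs=1):
    """Run local gradient descent on the model y = w * x + b (squared error)."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return [w, b]


# Two parties hold different rows with the same columns (x, y).
party_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
party_b = [(3.0, 7.0), (4.0, 9.0)]

global_weights = [0.0, 0.0]
for _ in range(200):
    # Each party trains locally; only parameters leave the party, never rows.
    updates = [local_update(global_weights, d) for d in (party_a, party_b)]
    global_weights = fedavg(updates, [len(party_a), len(party_b)])

print(global_weights)
```

Weighting by sample count means a party with more rows has proportionally more influence on the shared model, which is the usual choice when the parties' data volumes differ.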
Horizontal federated application scenarios
Security field
Computer-vision services deployed at different sites: pedestrian detection, travel detection, area detection, equipment anomaly detection, helmet detection, flame detection, smoke detection
Pain points
- Few labeled samples
- Data is scattered, and centralized management is costly
- Model updates and feedback are offline and delayed
Federated Learning Solutions
- Online model update and feedback
- Centralized data upload is not required
- Data stays protected, with strong privacy
- Algorithm accuracy improves further compared with local modeling
- A network effect forms, lowering the cost of long-tail applications and improving the overall profit margin of the vision business
Horizontal federated learning for healthcare big data
Pain points
- Medical data is highly private, and data custodians strictly control how patient data is managed and used
- Data is scattered and there are not enough samples available for a single organization
A natural fit for medical big data scenarios
- A secure data-sharing mechanism that effectively protects user privacy
- Securely connects disparate data sources to build models
- Secure federated modeling is almost lossless
Multi-institution joint stroke prediction
Federated learning is used to build a stroke probability prediction model
- Three grade A hospitals + two small hospitals
- Patient hospitalization process data and physical signs data
Results
- Joint modeling with federated learning outperforms independent modeling on any single hospital's data
- The model trained with federated learning performs almost as well as a model trained on centralized data
Samples from each hospital
AUC was computed for models trained separately at each hospital, for a model trained centrally on all the data, and for the federated model, and the results were compared
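For readers unfamiliar with the metric used in this comparison, AUC can be computed as the fraction of (positive, negative) pairs that the model ranks correctly. The sketch below is only an illustration: the function name auc and the labels and scores are hypothetical toy values, not results from the hospitals above.

```python
def auc(labels, scores):
    """Return the probability that a positive sample outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example: 4 patients, 2 of them with the positive label.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))
```

Running the same function on each model's predictions over a common test set would give directly comparable numbers for the local, centralized, and federated models.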
Epoch: one complete pass of training over all the data in the training set is called an epoch
Related concepts:
Batch: the small subset of training samples used for one backpropagation update of the model weights is called a batch
Iteration: one update of the model parameters using one batch of data is called an iteration
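The relationship between the three terms is simple arithmetic: the number of iterations per epoch is the number of samples divided by the batch size, rounded up. A minimal sketch, with hypothetical sample counts and helper names:

```python
import math


def iterations_per_epoch(num_samples, batch_size):
    """Number of parameter updates needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


def iter_batches(samples, batch_size):
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


samples = list(range(10))  # 10 training samples (toy stand-ins)
batch_size = 4
epochs = 3

total_iterations = 0
for epoch in range(epochs):
    for batch in iter_batches(samples, batch_size):
        total_iterations += 1  # one parameter update would happen here

print(iterations_per_epoch(len(samples), batch_size))  # 3 (batches of 4, 4, 2)
print(total_iterations)  # 9 = 3 epochs * 3 iterations
```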
FATE
- Industrial-grade federated learning system
- Helps multiple organizations use data and build models jointly while complying with data security requirements and government regulations
Design principles
- Supports a variety of mainstream algorithms: high-performance federated learning mechanisms for machine learning, deep learning, and transfer learning
- Supports a variety of secure multi-party computation protocols: homomorphic encryption, secret sharing, hashing, etc.
- Friendly cross-domain interaction management that solves the information security audit problem of federated learning
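Of the protocols listed in the design principles, secret sharing is the easiest to illustrate in a few lines. The sketch below shows additive secret sharing used to compute a sum without revealing any single input. It is a toy illustration, not FATE's implementation: the modulus, the function names share and reconstruct, and the private values are all assumptions made for the example.

```python
import secrets

PRIME = 2**61 - 1  # modulus chosen for this sketch; values must be in [0, PRIME)


def share(value, n_parties):
    """Split value into n random-looking shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recover the secret by adding all shares mod PRIME."""
    return sum(shares) % PRIME


# Three parties each hold one private value (hypothetical numbers).
private_values = [12, 30, 7]

# Each party splits its value and sends share j to party j.
all_shares = [share(v, 3) for v in private_values]

# Party j adds up the shares it received; a partial sum reveals nothing alone.
partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]

# Only the combination of all partial sums reveals the total.
total = reconstruct(partial_sums)
print(total)
```

Because the shares are added modulo a fixed prime, any subset of fewer than all shares is uniformly random, so no party learns another party's input from what it receives.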
Technical architecture
Federated ML
Federated learning algorithms: Federated Feature Engineering, Federated Statistics, Federated LR, GBDT, DNN
FATE-Board
Federated modeling visualization:
A. Visualization of the federated modeling task lifecycle
B. Visualization of federated models and evaluation reports
FATE-Flow
End-to-end federated modeling pipeline scheduling
A. Multi-task scheduling for federated modeling
B. Fault tolerance and automatic error recovery
FATE-Serving
Online inference service for production environments
A. Online model prediction
B. Online model management
FATE-Cloud Manager
Basic management infrastructure for building a data cooperation network
Multi-party association
KubeFATE
Manages FATE workloads with cloud-native technology
Deploys FATE rapidly on Kubernetes (K8s)
End-to-end federated modeling pipeline scheduling and management
A DAG defines the federated learning pipeline
- Multi-party asymmetric pipeline DAG
- General JSON-format DAG DSL, with a DSL parser
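The idea of a JSON DSL that a parser turns into a DAG can be sketched as follows. The schema here is hypothetical and deliberately simplified; it is not FATE's actual DSL format, and the component names, the depends_on key, and the function parse_pipeline are assumptions made for illustration. The parser reads the JSON, builds the dependency graph, and returns an execution order in which every component runs after its dependencies.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, simplified pipeline description (not FATE's real schema).
dsl_text = """
{
  "components": {
    "reader":       {"module": "Reader",             "depends_on": []},
    "intersection": {"module": "Intersection",       "depends_on": ["reader"]},
    "train":        {"module": "LogisticRegression", "depends_on": ["intersection"]},
    "evaluate":     {"module": "Evaluation",         "depends_on": ["train"]}
  }
}
"""


def parse_pipeline(text):
    """Parse the JSON DSL and return component names in a valid run order."""
    components = json.loads(text)["components"]
    # Map each component to the set of components it must wait for.
    graph = {name: set(spec["depends_on"]) for name, spec in components.items()}
    unknown = {dep for deps in graph.values() for dep in deps} - set(graph)
    if unknown:
        raise ValueError(f"undefined components referenced: {sorted(unknown)}")
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    return list(TopologicalSorter(graph).static_order())


run_order = parse_pipeline(dsl_text)
print(run_order)
```

Describing the pipeline as data rather than code lets each party validate and schedule the same graph independently, which is one reason a declarative DAG suits multi-party scheduling.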
Federated task cooperative scheduling
- Multi-task queue
- Task distribution
- State synchronization and other cooperative scheduling
Federated model Management
- Federated model access, consistency, versioning, release management
Federated task lifecycle management
- Multi-party start and stop, state detection
Real-time tracking of federated state input and output
- Data, models, and custom indicator logs are recorded and stored in real time
Federated modeling Pipeline scheduling and management
FATE-Serving: high-performance federated online inference service
Helps customers solve the problems of complex model deployment and inefficient manual resource scaling
- High performance: based on the gRPC protocol, with batched federated requests and multi-level caching of federated participants' model results
- High availability: stateless design, with degradation on failure
- High elasticity: model and data-processing apps are loaded dynamically
Architecture diagram