Abstract: The Huawei Cloud TICS service, together with the Trusted Technology Laboratory in Munich, Germany, has introduced a differential privacy algorithm for multi-party SQL jobs. It protects the data of individuals inside large-scale aggregate computations.

This article is shared from the Huawei Cloud community post “Multi-party computing, every time there is a huge hidden danger, this article tells you how to solve it!” by breakDawn.

The leakage problem in multi-party computation

In a federated multi-party computing scenario, no party can see another party's data while the job runs. The computation itself can be secured with MPC algorithms or TEE secure hardware.

However, once the computation is complete, the results themselves can still carry security risks.

Suppose an organization wants to obtain the total tax paid by all citizens of a province from the government affairs database, in order to analyze and compare the economic strength of each province. Such a request is normally considered legitimate, because the aggregate conceals the tax amount of each individual citizen.

select sum(xxx) from ...

However, suppose the organization runs the statistic a second time with one person excluded from the intersection and finds that the total decreases by X yuan. That person's tax amount can then be inferred immediately as X, and neither the computing party nor the data provider can easily detect that this has happened.
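The differencing attack described above can be sketched in a few lines. This is a minimal illustration with made-up data: the `taxes` records, the ids, and the `total_tax` helper are all hypothetical and only stand in for an exact `SUM` query.

```python
# Hypothetical tax records: citizen id -> tax amount (illustrative values only).
taxes = {"A001": 1200, "A002": 950, "A003": 274, "A004": 3100}

def total_tax(records, exclude=None):
    """Exact aggregate, analogous to SELECT SUM(tax) ... WHERE id <> exclude."""
    return sum(amount for cid, amount in records.items() if cid != exclude)

first = total_tax(taxes)                   # aggregate over everyone
second = total_tax(taxes, exclude="A003")  # same aggregate with one person removed
leaked = first - second                    # exactly A003's individual tax

print(leaked)
```

Each query looks harmless on its own; only the pair of exact answers reveals the individual value.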

Huawei Differential Privacy solution

To solve this problem, the Huawei Cloud TICS service, in cooperation with the Trusted Technology Laboratory in Munich, Germany, has introduced a differential privacy algorithm for multi-party SQL jobs, which protects the data of individuals inside large-scale aggregate computations.

The following steps for applying the differential privacy algorithm are taken from the official product documentation:

1. The TICS data alliance administrator turns on “Result Differential Privacy” in alliance management.

2. When a data provider in the alliance publishes a data set, it classifies the sensitive numerical fields as “sensitive” before publishing.

The TICS product documentation explains the field classifications as follows:

  • Unique identifier: a field that identifies an entity, such as an ID card number, employee number, or company code. When this classification is selected, the IDs in the data set are protected by syntax restrictions and run-time verification so that they cannot be inferred from job results.

  • Sensitive: sensitive data that takes part in statistics and calculation, such as salaries, taxes, or electricity consumption. When this classification is selected, other participants may apply only the following to sensitive fields: arithmetic (the four basic operations) that cannot be reversed to recover the original value, aggregate calculation (SUM/AVG), and conditional filtering (WHERE). TICS prevents unique identifiers and sensitive data from being disclosed together in plaintext as pairs, and adds differential noise to SUM results over sensitive data so that individual values are not disclosed.

  • Non-sensitive: data that does not take part in numerical analysis and is not associated with a unique identity, such as rank or company type.
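The documentation quoted above says only that differential noise is added to sums of sensitive data; it does not say which mechanism TICS uses. As a rough illustration of the general idea, here is a minimal sketch that assumes the classic Laplace mechanism. The function names, the `sensitivity` bound of 5000, and `epsilon = 1.0` are all hypothetical choices for this example, not TICS parameters.

```python
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def noisy_sum(values, sensitivity, epsilon, rng):
    """Exact sum plus Laplace noise of scale sensitivity / epsilon."""
    return sum(values) + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical inputs: every value is assumed to lie in [0, 5000], so
# removing one person changes the sum by at most 5000 (the sensitivity).
taxes = [1200.0, 950.0, 274.0, 3100.0]
rng = random.Random(0)  # fixed seed only to make the sketch repeatable

with_person = noisy_sum(taxes, sensitivity=5000.0, epsilon=1.0, rng=rng)
without_person = noisy_sum(taxes[:2] + taxes[3:], sensitivity=5000.0, epsilon=1.0, rng=rng)

# The difference is no longer exactly 274: it is 274 plus the
# difference of two independent noise draws.
print(with_person - without_person)
```

With only four records the noise dominates the total, which is why this kind of protection suits large aggregations: the noise scale depends on one individual's maximum contribution, not on the size of the sum.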

3. The job initiator runs an aggregation job. Taking the total tax revenue of each industry as an example, the SQL can be written as follows:

select industry, sum(tax_bal), sum(electric_bal)
from data_provider.tax a
join job_initiator.power_data b on a.id = b.id
group by industry

After rule verification and approval, the job initiator runs the computation on the platform and obtains the per-industry results.

The job initiator then executes a second SQL statement that filters out one ID, attempting to derive that individual's tax value from the difference between the two results.

select industry, sum(tax_bal), sum(electric_bal)
from data_provider.tax a
join job_initiator.power_data b on a.id = b.id
where a.id <> '123400558'
group by industry

The ID 123400558 belongs to an individual in the Internet industry, whose actual tax value is 274.

Let's see what happens when the initiator executes the job a second time:

Taking the difference of the two tax totals gives 66078.857559963717677 - 66539.583321490225131 ≈ -461. The initiator does not obtain the real difference of 274 as expected, but a negative number. The noise introduces an error in the aggregate totals, but the error stays within an acceptable range.
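The arithmetic can be checked directly. The sketch below takes the two totals quoted above and subtracts them in the order that yields the negative value the article reports; the variable names are labels chosen for this example. It also shows how far the observed difference lies from the true value of 274, which is the combined effect of the noise in the two results.

```python
from decimal import Decimal

# The two totals quoted in the article (exact decimal strings).
total_a = Decimal("66078.857559963717677")
total_b = Decimal("66539.583321490225131")
true_value = Decimal("274")  # actual tax value of ID 123400558

observed = total_a - total_b         # -460.725761526507454, about -461
noise_shift = observed - true_value  # -734.725761526507454
relative = abs(noise_shift) / max(total_a, total_b)

print(observed)
print(noise_shift)
print(f"{relative:.2%}")  # shift relative to the larger total
```

The shift of roughly 735 swamps the individual's 274, while it is only about one percent of either total, which matches the article's point that the error in the aggregate is acceptable.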

As the example shows, the differential privacy algorithm lets TICS protect individual data while still supporting statistics over large volumes of data.

Huawei Trusted Intelligent Computing Service TICS website link:

www.huaweicloud.com/product/tic…

You are welcome to try the latest version.

TICS service discussion community:

bbs.huaweicloud.com/forum/forum…