**Abstract:** Every manager of enterprise big data assets has to answer the same question: how can enterprise big data meet the data access needs of every business department while keeping that access secure and preventing data leakage?
Enterprise data assets now play an increasingly important role in business processes such as decision support, user profiling, and recommendation systems. Managers of these assets must therefore make sure the data serves each business department's access needs, that access stays secure, and that data does not leak.
Drawing on the technical experience accumulated in the Huawei Cloud Data Lake Insight (DLI) service and on extensive experience in enterprise data security management, the author discusses how to secure enterprise big data at a fine granularity, covering the following points:
1. Security challenges of enterprise big data
2. General practice of data asset permission management
3. Taking Huawei Cloud DLI as an example, the practice of data asset management & case analysis
4. Future prospects
Data isolation, hierarchical access, authorization: a long list of problems
As big data accumulates over time, enterprises inevitably face security challenges. Data comes from many sources and from different business units, and it has to serve many business units in turn. Employees at different levels need different permissions. How can an enterprise prevent unauthorized users from accessing its data, manage data sharing among business units, and isolate sensitive data? The typical challenges are as follows:
1.1 Data Isolation
Different projects need to isolate their business data. Take game operations data as an example: when designing its big data analysis platform, an enterprise may expect game A's business data to support only game A's operations analysis, and game B's business data to support only game B's. Business data therefore has to be isolated by project, so that game A's operations staff can access only game A's operations data and game B's operations staff can access only game B's.
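As a minimal sketch of project-level isolation, the check below allows access only when the user and the table belong to the same project. The names `Table`, `USER_PROJECT`, and `can_access`, as well as the sample users, are hypothetical and chosen for illustration; they are not part of any real platform's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    project: str  # the project (here: a game) that owns the data


# Hypothetical mapping of operations staff to the single project they work on.
USER_PROJECT = {"alice": "game_a", "bob": "game_b"}


def can_access(user: str, table: Table) -> bool:
    """Allow access only when the user's project matches the table's project."""
    return USER_PROJECT.get(user) == table.project


ops_a = Table("daily_active_users", project="game_a")
ops_b = Table("daily_active_users", project="game_b")

print(can_access("alice", ops_a))  # True
print(can_access("alice", ops_b))  # False
```

Tagging every resource with its project keeps the check independent of table names, so two projects can hold identically named tables without seeing each other's data.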
1.2 Hierarchical Data Access
Service departments at different levels have different data access permissions. Upper-level departments can access the data of lower-level departments, but lower-level departments cannot access the data of higher-level departments. For example, a provincial department can access prefecture-level data, while a prefecture-level department can access only its own city's data and cannot access data from other regions or from the provincial department. This requires the ability to manage data permissions hierarchically.
1.3 Column-Level Data Authorization
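The provincial/prefecture rule can be sketched as an ancestor check on an organization tree. The `PARENT` table and the unit names below are assumptions made for illustration only.

```python
# Hypothetical org tree: each unit maps to its parent (None marks the root).
PARENT = {
    "province": None,
    "city_x": "province",
    "city_y": "province",
}


def is_same_or_ancestor(unit: str, other: str) -> bool:
    """Return True if `unit` is `other` or sits above it in the tree."""
    node = other
    while node is not None:
        if node == unit:
            return True
        node = PARENT.get(node)
    return False


def can_read(reader_unit: str, data_owner_unit: str) -> bool:
    """A unit may read its own data and the data of the units below it."""
    return is_same_or_ancestor(reader_unit, data_owner_unit)


print(can_read("province", "city_x"))  # True: upper level reads lower level
print(can_read("city_x", "province"))  # False: lower level cannot read upward
print(can_read("city_x", "city_y"))    # False: no cross-region access
```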
Different service departments have different access permissions on the same data, so fine-grained data authorization is required. In a banking system, for example, the national ID number in the user table is sensitive information. The counter system may query a user's ID number, whereas the recommendation system needs only the user ID and has no need for the ID number. In this scenario, the user table must be configured with different permissions for different service units.
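A column-level check for the banking example might look like the sketch below. The grant table, the column names, and the `select` helper are hypothetical; a real platform would enforce this inside its query engine rather than in application code.

```python
from __future__ import annotations

# Hypothetical column grants: (service unit, table) -> readable columns.
COLUMN_GRANTS = {
    ("counter_system", "user"): {"user_id", "name", "id_number"},
    ("recommendation_system", "user"): {"user_id"},
}


def select(unit: str, table: str, columns: list[str], rows: list[dict]) -> list[dict]:
    """Return the requested columns, or raise if any column is not granted."""
    allowed = COLUMN_GRANTS.get((unit, table), set())
    denied = [c for c in columns if c not in allowed]
    if denied:
        raise PermissionError(f"{unit} may not read {denied} from {table}")
    return [{c: row[c] for c in columns} for row in rows]


USER_ROWS = [{"user_id": 1, "name": "Li", "id_number": "ID-0001"}]

print(select("recommendation_system", "user", ["user_id"], USER_ROWS))
# [{'user_id': 1}]
```

Rejecting the whole query when one column is denied, instead of silently dropping that column, makes a missing grant visible to the caller.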
1.4 Batch Authorization
As an enterprise grows, its headcount can become very large, and authorizing employees by department or in batches is a common business scenario. For example, if the sales department has many employees, authorizing each salesperson individually is tedious, and revoking authorization when staff move on is just as cumbersome. What is needed is an authorization model that supports batch or role-based authorization, so that a single grant makes the data available to every employee in the department.
Which of the four permission models is better?
Popular big data analysis platforms today include Hadoop, Hive, and Spark. Between them they use four permission models: POSIX, ACL, SQL Standard, and RBAC. Hadoop manages data with the POSIX and ACL permission models, while Hive and Spark use the ACL and RBAC permission models.
The POSIX permission model is file-based, similar to Linux file system permissions. Each file has an OWNER and a GROUP. Permissions can be set only for the OWNER, the GROUP, and all other users, and the only permissions available are read, write, and execute.
This model is not suitable for enterprise users. Its obvious drawback is that a file has only one GROUP, so different groups cannot be given different permissions on it, and fine-grained permission management is out of reach. Authorization works only at the file level and is limited to read, write, and execute permissions.
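The limitation is visible in the mode bits themselves. The sketch below uses Python's standard `stat` module to decode an example mode; the mode value 640 is picked arbitrarily for illustration.

```python
import stat

# A regular file with mode 640: owner rw-, group r--, others ---.
mode = stat.S_IFREG | 0o640


def class_bits(mode: int) -> dict:
    """Split a POSIX mode into the only three classes it can express."""
    return {
        "owner": (mode >> 6) & 0o7,
        "group": (mode >> 3) & 0o7,
        "other": mode & 0o7,
    }


print(stat.filemode(mode))  # -rw-r-----
print(class_bits(mode))     # {'owner': 6, 'group': 4, 'other': 0}
```

There is exactly one slot for a group, so giving a second group a different permission on the same file cannot be expressed in this model.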
ACL stands for Access Control List. The ACL permission model makes up for the shortcomings of the POSIX permission model and supports fine-grained permission management. With access control lists, multiple permissions can be granted to one user, and different permissions can be granted to different users. ACLs have a significant drawback of their own, however: as the number of users grows, the lists become large and hard to maintain, especially in large enterprises.
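A minimal ACL can be sketched as a mapping from resource to user to permissions. The helper names and the sample resources are hypothetical. `entry_count` illustrates why maintenance gets harder as users multiply: every (resource, user) pair is a separate entry.

```python
# Access control list: resource -> user -> set of permissions.
acl = {}


def grant(resource: str, user: str, *perms: str) -> None:
    """Add permissions for one user on one resource."""
    acl.setdefault(resource, {}).setdefault(user, set()).update(perms)


def is_allowed(resource: str, user: str, perm: str) -> bool:
    return perm in acl.get(resource, {}).get(user, set())


def entry_count() -> int:
    """Number of (resource, user) entries an administrator must maintain."""
    return sum(len(users) for users in acl.values())


grant("transactions", "alice", "SELECT", "INSERT")
grant("transactions", "bob", "SELECT")
grant("accounts", "alice", "SELECT")

print(is_allowed("transactions", "bob", "INSERT"))  # False
print(entry_count())                                # 3
```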
The role-based access control (RBAC) model is also common in the industry. One or more permissions are first granted to a role, and the role is then bound to a user to authorize that user. A user can be bound to one or more roles, and the user's permissions are the union of the permissions of those roles. RBAC is a popular permission management model because it supports batch authorization and makes user permissions easy to maintain.
The SQL Standard model is one of the Hive/Spark permission models. In essence, it manages permissions through SQL authorization syntax. The Hive permission model is itself based on the ACL and RBAC models: users can be authorized directly or through roles.
How does Data Lake Insight (DLI) manage data assets?
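The role mechanism, including department-wide authorization and revocation when someone leaves, can be sketched as follows. The role name, member names, and helper functions are hypothetical.

```python
# Hypothetical roles: role -> set of (resource, permission) pairs.
role_perms = {"sales": {("orders", "SELECT")}}
user_roles = {}  # user -> set of role names


def bind(user: str, role: str) -> None:
    user_roles.setdefault(user, set()).add(role)


def unbind(user: str, role: str) -> None:
    user_roles.get(user, set()).discard(role)


def permissions(user: str) -> set:
    """A user's permissions are the union of the permissions of all bound roles."""
    perms = set()
    for role in user_roles.get(user, set()):
        perms |= role_perms.get(role, set())
    return perms


# Authorize the whole department by binding each member to one role.
for member in ("ann", "ben", "cho"):
    bind(member, "sales")

# A single grant to the role reaches every member at once.
role_perms["sales"].add(("customers", "SELECT"))

# When someone leaves the department, one unbind removes all of their access.
unbind("ben", "sales")

print(sorted(permissions("ann")))  # [('customers', 'SELECT'), ('orders', 'SELECT')]
print(permissions("ben"))          # set()
```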
Huawei Cloud DLI combines ACL and RBAC to manage user permissions. The concepts involved in DLI include:
DLI user: A DLI user is an IAM account or one of its sub-users. "Users" in the permission descriptions below refers to IAM accounts and their sub-users.
DLI resources: DLI resources are classified into databases, tables, views, jobs, and queues. Resources are isolated by project, and resources in one project cannot be accessed from another. Tables and views are child resources of a database.
DLI permissions: DLI permissions are the permissions required to perform DLI operations. Each operation has its own permission, such as CREATE_TABLE for creating a table, DROP_TABLE for deleting a table, and SELECT for querying a table.
DLI manages resource access permissions with Identity and Access Management (IAM) policies together with its own access control lists (ACLs). IAM policies control project-level resource isolation and define whether a user is a project administrator or an ordinary user. The ACLs control access permissions and authorization management for queues, databases, tables, views, and columns.
DLI relies on IAM for user authentication and user role management. The roles predefined for DLI in IAM are Tenant Administrator, DLI Service Admin (DLI administrator), and DLI Service User (DLI common user). A tenant administrator or DLI administrator is an administrator in DLI and can operate all resources of the project, including creating databases, creating queues, and operating the databases, tables, views, queues, and jobs under the project. Common users cannot create databases or queues. Once authorized by an administrator, they can perform operations such as creating and querying tables.
DLI uses the ACL and RBAC models to manage user permissions. An administrator or the owner of a resource can grant one or more permissions directly to another user, or can create roles, grant permissions to those roles, and then bind the roles to users.
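The combination of direct grants and role grants can be illustrated with a small model. This is a sketch of the general idea only: the `Authorizer` class, its methods, and the user and resource names are hypothetical and do not represent DLI's API or internal implementation. Only the permission names CREATE_TABLE and SELECT come from the description above.

```python
class Authorizer:
    """Effective permissions = direct (ACL) grants plus role (RBAC) grants."""

    def __init__(self, admins):
        self.admins = set(admins)
        self.direct = {}      # (user, resource) -> set of permissions
        self.role_perms = {}  # (role, resource) -> set of permissions
        self.user_roles = {}  # user -> set of roles

    def grant_user(self, user, resource, *perms):
        self.direct.setdefault((user, resource), set()).update(perms)

    def grant_role(self, role, resource, *perms):
        self.role_perms.setdefault((role, resource), set()).update(perms)

    def bind(self, user, role):
        self.user_roles.setdefault(user, set()).add(role)

    def is_allowed(self, user, resource, perm):
        if user in self.admins:  # administrators can operate every resource
            return True
        if perm in self.direct.get((user, resource), set()):
            return True
        return any(
            perm in self.role_perms.get((role, resource), set())
            for role in self.user_roles.get(user, set())
        )


auth = Authorizer(admins={"dli_admin"})
auth.grant_user("carol", "db1", "CREATE_TABLE")     # direct, ACL-style grant
auth.grant_role("analyst", "db1.orders", "SELECT")  # grant to a role
auth.bind("dave", "analyst")                        # bind the role to a user

print(auth.is_allowed("carol", "db1", "CREATE_TABLE"))  # True
print(auth.is_allowed("dave", "db1.orders", "SELECT"))  # True
print(auth.is_allowed("dave", "db1", "CREATE_TABLE"))   # False
```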
DLI provides both APIs and SQL statements for the permission management described above, which makes authorization flexible. For details, see DLI Permission Management.
Case analysis
We use a bank's big data practice to show how DLI can be used to manage data permissions. Banks have accumulated large amounts of user data, including user information, transaction information, and account information. Banking is also a complex business with many business lines, such as the teller system, the supervision department, the operations department, and the marketing department. Each business line has different data requirements and different access rights. We take the anti-money laundering business and the user profiling business as examples to briefly show how the DLI platform supports big data analysis and data asset permission management.
A typical anti-money laundering workflow relies on large-amount alerts and a blacklist mechanism. Large transactions and transactions by blacklisted persons are screened out of the massive transaction data and passed to supervisors for further analysis. The data involved includes transaction data, account information, and blacklist information.
User profiling generally analyzes a user's transaction types and transaction data to infer the user's interests and hobbies, builds a profile of the user, and tags the user's points of interest. It involves the transaction type in the transaction information and the account information.
In DLI, the data administrator creates the user information table, the transaction data table, the account information table, and the blacklist table, and imports the corresponding data.
In the anti-money laundering business, the anti-money laundering department or its staff are granted query permission on the account information table, the transaction data table, and the blacklist table. By running join queries across these three tables, they identify abnormal transactions and the persons involved, and report them to the anti-money laundering supervisors.
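The screening join can be sketched with Python's built-in `sqlite3` module standing in for the analysis engine. The schema, the sample rows, and the `LARGE_AMOUNT` threshold are all assumptions made for illustration; the real tables and alert rules would differ.

```python
import sqlite3

LARGE_AMOUNT = 50_000  # hypothetical large-amount alert threshold

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (account_id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE txn (txn_id INTEGER PRIMARY KEY, account_id INTEGER, amount REAL);
CREATE TABLE blacklist (user_id INTEGER PRIMARY KEY);
INSERT INTO account VALUES (1, 100), (2, 200), (3, 300);
INSERT INTO txn VALUES (10, 1, 120.0), (11, 2, 80000.0), (12, 3, 45.0);
INSERT INTO blacklist VALUES (300);
""")

# Flag a transaction if it is large or if its account holder is blacklisted.
rows = conn.execute(
    """
    SELECT t.txn_id, a.user_id, t.amount
    FROM txn AS t
    JOIN account AS a ON a.account_id = t.account_id
    LEFT JOIN blacklist AS b ON b.user_id = a.user_id
    WHERE t.amount >= ? OR b.user_id IS NOT NULL
    ORDER BY t.txn_id
    """,
    (LARGE_AMOUNT,),
).fetchall()

print(rows)  # [(11, 200, 80000.0), (12, 300, 45.0)]
```

The query touches all three tables, which is why the department needs query permission on each of them before it can run.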
In the user profiling business, the data administrator grants the business unit or its staff query permission on the user information table, on columns of the transaction data table such as transaction type, amount, and merchant, and on the account ID and user ID columns of the account information table. Through join and aggregation queries over these tables, they derive each user's common transaction information, including transaction type, amount, and location.
Future prospects
Traditional enterprise data assets face several challenges. Data is generated by every service department, so data standards are inconsistent and maintenance is complex. Each business department stores its data in a different system, so data islands form easily and the data cannot be mined and used effectively. Data sharing between departments is complex and tends to grow into a tangled mesh of authorizations, which drives up maintenance costs.
The DLI data lake solution addresses these problems with a unified data management platform, unified data storage, and unified data standards, which together enable unified data asset management and authorization management.
During the Huawei Cloud 828 Enterprise Cloud Festival, Data Lake Insight (DLI) is among the products on promotion, and enterprises with data analysis needs can take advantage of the offer to try it.