Hive Permission Management
Permission management verifies whether a user is allowed to perform an operation, whereas security authentication verifies that a user is who they claim to be. This article covers permission management rather than security authentication.
Requirements for cluster security:
- Supports multiple components, preferably the major components of the current big data stack, such as HDFS, HBase, Hive, YARN, and Kafka
- Supports fine-grained permission control, including Hive columns, HDFS directories, HBase columns, and YARN queues
- Open source, with an active community, minimal changes to existing clusters, and in line with industry trends
Existing schemes:
- Hadoop and Hive permission control
- Kerberos Security Authentication
- Apache Ranger Permission management solution
Hadoop and Hive permission control
Hadoop permissions:
- Hadoop distributed file system implements a file and directory permission model similar to POSIX systems
- Each file and directory has an owner and a group
- A file or directory has different permissions for its owner, other users in the group, and all other users
- Operations on a file or directory pass the path name to the NameNode, which checks permissions along the path
- The user who starts the NameNode is the superuser and passes all permission checks
- A specific group of users can be designated as superusers through configuration
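The owner/group/other rule described in the list above can be sketched in a few lines of Java. This is a toy model, not Hadoop code: the class name PosixCheckSketch, the method isAllowed, and the example users, group, and mode are all assumptions made for illustration. It only shows that exactly one permission class (owner, then group, then other) is consulted, and that a superuser bypasses the check.

```java
import java.util.Collections;
import java.util.Set;

// Toy model of an HDFS-style permission check. Illustrative only; this is not Hadoop code.
class PosixCheckSketch {
    static final int READ = 4;
    static final int WRITE = 2;
    static final int EXECUTE = 1;

    /**
     * Returns true if the user may perform the action on an object with the
     * given owner, group and octal mode (for example 0750).
     */
    static boolean isAllowed(String user, Set<String> userGroups, boolean superUser,
                             String owner, String group, int mode, int action) {
        if (superUser) {
            return true;                 // the superuser passes every check
        }
        int bits;
        if (user.equals(owner)) {
            bits = (mode >> 6) & 7;      // owner class
        } else if (userGroups.contains(group)) {
            bits = (mode >> 3) & 7;      // group class
        } else {
            bits = mode & 7;             // all other users
        }
        return (bits & action) == action;
    }

    public static void main(String[] args) {
        Set<String> etl = Collections.singleton("etl");
        Set<String> none = Collections.emptySet();
        // Hypothetical directory owned by hive:etl with mode 750 (rwxr-x---)
        System.out.println("owner write: " + isAllowed("hive", etl, false, "hive", "etl", 0750, WRITE));
        System.out.println("group read:  " + isAllowed("alice", etl, false, "hive", "etl", 0750, READ));
        System.out.println("group write: " + isAllowed("alice", etl, false, "hive", "etl", 0750, WRITE));
        System.out.println("other read:  " + isAllowed("bob", none, false, "hive", "etl", 0750, READ));
    }
}
```

Note that a user who matches the owner is judged only by the owner bits, even if the group or other bits would be more permissive.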
Hive can rely on Hadoop's file system permissions by checking whether the current user is allowed to operate on the underlying files.
Hive's default permission control is not completely secure: it is meant to prevent users from performing inappropriate operations by mistake, not to stop malicious access.
Three authorization models
1. Storage Based Authorization in the Metastore Server – Protects Metastore metadata, but does not provide more granular access control (e.g., column level, row level).
2. SQL Standards Based Authorization in HiveServer2 – Fully compatible with the SQL authorization model, and supports granting privileges to both users and roles.
A role is a set of privileges, and a user can have one or more roles. When Hive data is accessed through HiveServer2, two roles are provided by default: public and admin. All users belong to the public role, and only users with the admin role can perform arbitrary authorization; ordinary users can only grant the privileges they hold themselves to other users. Therefore, at least one user must be given the admin role. Users and groups are the users and groups of the Linux machines, whereas roles must be created in Hive itself.
Users in the public role can perform the operations they have been authorized for, but by default they do not have permission to create tables.
3. Default Hive Authorization (Legacy Mode) – Designed only to prevent user misoperations, not to prevent malicious users from accessing unauthorized data.
Default Hive Authorization (Legacy Mode)
Hive's default permission control is not completely secure. It is intended to prevent users from performing inappropriate operations by accident, not to prevent unauthorized users from accessing data.
This is because the mechanism performs no verification. For example, when you run a grant operation, it does not check whether you are allowed to grant that permission.
Storage Based Authorization in the Metastore Server
In earlier Hive versions, access was controlled only through Linux users and user groups, so the CREATE, SELECT, and DROP operations on Hive tables could not be controlled. Hive data consists of metadata and data files: the metadata is stored in a database such as MySQL, and the data files are stored on HDFS. Storage based authorization controls access to the metadata according to whether the user can access the corresponding data files.
Clients such as MapReduce, Impala, Pig, Spark SQL, and the Hive command line access Hive data through the Metastore, for example via the HCatalog API. This permission control therefore takes place when a client interacts with the Metastore service: the permission check is performed when the Metastore API is called. It mainly prevents malicious users from accessing and modifying Metastore data, but does not provide more fine-grained access control (for example, column level or row level).
Here is the configuration:
<property>
<name>hive.security.metastore.authorization.manager</name>
<value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider</value>
<description>authorization manager class name to be used in the metastore for authorization.
The user defined authorization class should implement interface
org.apache.hadoop.hive.ql.security.authorization.HiveMetastoreAuthorizationProvider.
</description>
</property>
<property>
<name>hive.security.metastore.authenticator.manager</name>
<value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
<description>authenticator manager class name to be used in the metastore for authentication.
The user defined authenticator should implement interface
org.apache.hadoop.hive.ql.security.HiveAuthenticationProvider.
</description>
</property>
<property>
<name>hive.metastore.pre.event.listeners</name>
<value> </value>
<description>pre-event listener classes to be loaded on the metastore side to run code
whenever databases, tables, and partitions are created, altered, or dropped.
Set to org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener
if metastore-side authorization is desired.
</description>
</property>
SQL Standards Based Authorization in HiveServer2
- Fully compatible with SQL authorization model
- In addition to granting privileges to users, granting privileges to roles is supported
- A role is a set of permissions that can be used to authorize users
- A user can have one or more roles
The admin users can be configured in the configuration file. As in SQL, role names are case-insensitive, but user names are case-sensitive.
<property>
<name>hive.users.in.admin.role</name>
<value>root</value>
</property>
- After this authorization mode is enabled, the commands dfs, add, delete, compile, and reset are disabled. Changing Hive configuration with the set command is restricted; the hive.security.authorization.sqlstd.confwhitelist property in hive-site.xml controls which configuration parameters users may set.
- Creating and dropping functions and macros is only available to users with the admin role. To make a custom function available, the admin user can create a permanent function, which other users can then use.
- The TRANSFORM clause is disabled.
Public role
All users belong to the public role by default, and authorization can be completed only by users with the admin role (ordinary users can only grant their own privileges to other users).
Public users have permission to perform authorized operations, but by default public users do not have permission to create tables
The admin role
All of a user's roles except admin are current roles by default. That is, even if you have been granted the admin role, it does not appear when you run show current roles; you need to run set role admin; to obtain the privileges of this role.
In other words, the admin role is not in the user's current roles list by default.
You can run set hive.users.in.admin.role; to view which users have admin rights.
The basic configuration
<!-- Enable authorization (disabled by default) -->
<property>
<name>hive.security.authorization.enabled</name>
<value>true</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
<!-- User list of admin role -->
<property>
<name>hive.users.in.admin.role</name>
<value>root</value>
</property>
<property>
<name>hive.security.authorization.manager</name>
<value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>
<property>
<name>hive.security.authenticator.manager</name>
<value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
</property>
Note: Users with the admin role need to run the set role admin; command to obtain the privileges of the admin role. Even if you are in the admin user list, you still need to activate the role explicitly.
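The difference between a granted role and a current role can be illustrated with a small Java model. This is only a sketch of the behavior described above, not Hive's implementation: the class RoleSessionSketch and its methods are hypothetical, and the model assumes that naming a role in set role makes it the only current role.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Toy model of granted roles versus current roles in one session. Not Hive code.
class RoleSessionSketch {
    private final Set<String> grantedRoles = new TreeSet<>();
    private final Set<String> currentRoles = new TreeSet<>();

    RoleSessionSketch(Collection<String> granted) {
        for (String role : granted) {
            grantedRoles.add(role.toLowerCase(Locale.ROOT)); // role names are case-insensitive
        }
        // At login every granted role except admin becomes a current role.
        for (String role : grantedRoles) {
            if (!role.equals("admin")) {
                currentRoles.add(role);
            }
        }
    }

    /** Models "set role role_name": the named role becomes the only current role (assumption). */
    void setRole(String role) {
        String name = role.toLowerCase(Locale.ROOT);
        if (!grantedRoles.contains(name)) {
            throw new IllegalArgumentException("role not granted to this user: " + role);
        }
        currentRoles.clear();
        currentRoles.add(name);
    }

    boolean isAdminActive() {
        return currentRoles.contains("admin");
    }

    Set<String> currentRoles() {
        return Collections.unmodifiableSet(currentRoles);
    }

    public static void main(String[] args) {
        RoleSessionSketch session = new RoleSessionSketch(Arrays.asList("public", "admin"));
        System.out.println(session.currentRoles() + " admin active: " + session.isAdminActive());
        session.setRole("admin");
        System.out.println(session.currentRoles() + " admin active: " + session.isAdminActive());
    }
}
```

The point of the model is that being listed in hive.users.in.admin.role only makes the admin role available; admin-only actions are checked against the current roles.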
Permission to create files in HIVE
Hive uses a configuration property to set the default permissions of newly created files. Setting the umask to 0002 gives new files the permission 664. For details, see the Hadoop and Hive user configuration.
<property>
<name>hive.files.umask.value</name>
<value>0002</value>
<description>The dfs.umask value for the hive created folders</description>
</property>
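The arithmetic behind "umask 0002 gives 664" can be checked with a short Java sketch. The class and method names here are made up for illustration; the only rule used is the standard POSIX one, where the umask bits are cleared from the requested mode (666 for files, 777 for directories).

```java
// Standard umask arithmetic: effective mode = requested mode with the umask bits cleared.
class UmaskSketch {
    static int applyUmask(int requestedMode, int umask) {
        return requestedMode & ~umask & 0777;
    }

    static String toOctal(int mode) {
        return String.format("%03o", mode);
    }

    public static void main(String[] args) {
        // Files are requested with 666, directories with 777.
        System.out.println("file, umask 0002: " + toOctal(applyUmask(0666, 0002)));
        System.out.println("dir,  umask 0002: " + toOctal(applyUmask(0777, 0002)));
        System.out.println("file, umask 0022: " + toOctal(applyUmask(0666, 0022)));
    }
}
```

With a umask of 0002 the group keeps write access, which is what lets several users in the same group share the directories that Hive creates.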
Hive storage authorization check
When the hive.metastore.authorization.storage.checks property is set to true, Hive prevents users without the required permissions from dropping tables. The default value of this property is false; it should be set to true.
<property>
<name>hive.metastore.authorization.storage.checks</name>
<value>true</value>
<description>Should the metastore do authorization checks against the underlying storage for operations like drop-partition (disallow the drop-partition if the user in question doesn't have permissions to delete the corresponding directory on the storage).</description>
</property>
The table creator has full permissions on the table
<property>
<name>hive.security.authorization.createtable.owner.grants</name>
<value>ALL</value>
<description>The privileges automatically granted to the owner whenever a table gets created.
An example like "select,drop" will grant select and drop privilege to the owner of the table
</description>
</property>
This property is empty by default. It is recommended to set it to ALL so that users can access the tables they create; otherwise the creator of a table cannot access it, which is clearly unreasonable.
The user who creates a Hive table is the owner of that table and also the owner of the table's HDFS directory, so the user's permissions on it follow the same owner rules as in the Linux operating system.
Hive permission operation
Before we start, one thing to note about the HiveServer2 command-line client: if you log in without specifying a user, the Hive user is not your current operating system user but an anonymous user. This caused me headaches for a while, so it is worth calling out before we begin.
1. Authorize users
First I create a new user, kingcall, switch to that user, and connect to Hive as that user:
beeline -u jdbc:hive2://localhost:10000/ods -u root -p www1234 -n kingcall
Next, we use a user with admin privileges to create a table:
create table role_test(id int,name string);
Then the kingcall user tries to query it:
0: jdbc:hive2://localhost:10000/ods> select * from role_test;
Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: Principal [name=kingcall, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=ods.role_test]] (state=42000,code=40000)
Next, the table owner (the admin user) grants the SELECT privilege to kingcall:
GRANT SELECT ON table role_test to user kingcall;
Now the kingcall user can run the query again, and this completes the basic authorization.
A delete operation will still fail, because we only granted the SELECT privilege. You can look at the user's current privileges with:
SHOW GRANT USER KINGCAL;
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error showing privileges:
User : liuwenqiang is not allowed check privileges of another user : KINGCAL. User has to belong to ADMIN role and have it as current role, for this action. (state=08S01,code=1)
Although the liuwenqiang user is in the admin user list, it cannot view the privileges of other users because admin is not its current role. We therefore need to execute set role admin; first in order to view the privileges of other users.
This is a simple demonstration of how to assign privileges to users. In any system we should follow the principle of least privilege, and we recommend the same strategy when assigning privileges to users. Many privileges can be granted; the permission assignment table later in this article lists the privileges required by each Hive operation. Besides granting privileges on a specific table, we can also grant privileges on an entire database.
2. Authorize groups
When N tables and N users are in use in Hive, it becomes unmanageable for the administrator to grant privileges on each table to each user individually. In that case, you can grant privileges to a group. Hive user groups are equivalent to POSIX user groups.
-- Database based
grant select on database default to group admin;
-- Based on a table
grant select on table ppdata to group admin;
3. Role management
Roles come into play when authorization by user group becomes inflexible. Users can be placed in a role, and privileges can then be granted to the role. Roles are managed internally by Hive, unlike users and user groups, which are managed by the operating system.
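How a privilege reaches a user through a role can be sketched as a simple lookup. The Java model below is hypothetical and is not how Hive stores grants: the class RoleGrantSketch, the principal prefixes, and the object key ods.role_test are assumptions for illustration. It only shows that a check succeeds if the privilege was granted to the user directly or to any role the user holds.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of role-based privilege lookup. Not Hive's internal data model.
class RoleGrantSketch {
    // principal ("user:name" or "role:name") -> object -> privileges
    private final Map<String, Map<String, Set<String>>> grants = new HashMap<>();
    // user -> roles granted to that user
    private final Map<String, Set<String>> userRoles = new HashMap<>();

    void grantToUser(String user, String object, String privilege) {
        add("user:" + user, object, privilege);
    }

    void grantToRole(String role, String object, String privilege) {
        add("role:" + role, object, privilege);
    }

    void grantRole(String role, String user) {
        userRoles.computeIfAbsent(user, k -> new HashSet<>()).add(role);
    }

    /** True if the privilege was granted directly or through one of the user's roles. */
    boolean hasPrivilege(String user, String object, String privilege) {
        if (holds("user:" + user, object, privilege)) {
            return true;
        }
        for (String role : userRoles.getOrDefault(user, Collections.emptySet())) {
            if (holds("role:" + role, object, privilege)) {
                return true;
            }
        }
        return false;
    }

    private void add(String principal, String object, String privilege) {
        grants.computeIfAbsent(principal, k -> new HashMap<>())
              .computeIfAbsent(object, k -> new HashSet<>())
              .add(privilege);
    }

    private boolean holds(String principal, String object, String privilege) {
        return grants.getOrDefault(principal, Collections.emptyMap())
                     .getOrDefault(object, Collections.emptySet())
                     .contains(privilege);
    }

    public static void main(String[] args) {
        RoleGrantSketch acl = new RoleGrantSketch();
        acl.grantToRole("role_ods_select", "ods.role_test", "SELECT");
        System.out.println("before role grant: " + acl.hasPrivilege("kingcall", "ods.role_test", "SELECT"));
        acl.grantRole("role_ods_select", "kingcall");
        System.out.println("after role grant:  " + acl.hasPrivilege("kingcall", "ods.role_test", "SELECT"));
    }
}
```

The benefit of the indirection is that adding a new analyst means one role grant rather than one grant per table.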
Create the role
CREATE ROLE role_name;
A descriptive role name such as role_database_select makes the purpose of a role clear. Combined with user groups, roles can then cover the whole permission scheme. For example, we can create a role with query privileges on the report-layer database and give it to the data analysts on the business side, or subdivide the privileges on the report-layer database by business line.
Grant privileges to a role
CREATE ROLE role_ods_select;
-- Syntax: grant select on [table] table_name to role role_name; grant select on database database_name to role role_name
grant select on database ods to role role_ods_select;
-- Queries the permissions of a role
show grant role role_ods_select on database ods;
+-----------+--------+------------+---------+------------------+-----------------+------------+---------------+----------------+--------------+
| database  | table  | partition  | column  | principal_name   | principal_type  | privilege  | grant_option  | grant_time     | grantor      |
+-----------+--------+------------+---------+------------------+-----------------+------------+---------------+----------------+--------------+
| ods       |        |            |         | role_ods_select  | ROLE            | SELECT     | false         | 1610504539000  | liuwenqiang  |
+-----------+--------+------------+---------+------------------+-----------------+------------+---------------+----------------+--------------+
Assign roles to users
We first revoke the SELECT privilege on the role_test table that we granted to the kingcall user in the previous demo
revoke select on table role_test from user kingcall;
Then we grant the role role_ods_select to the kingcall user
grant role role_ods_select to user kingcall;
Now view the roles the user has. Note that show current roles may still return the old roles in an existing session; use set role to pick up the new roles.
show role grant user kingcall;
+------------------+---------------+----------------+--------------+
| role             | grant_option  | grant_time     | grantor      |
+------------------+---------------+----------------+--------------+
| public           | false         | 0              |              |
| role_ods_select  | false         | 1610504936000  | liuwenqiang  |
+------------------+---------------+----------------+--------------+
-- then perform the query
select * from role_test;
show current roles; can be executed by all users
show roles; displays all roles in Hive and can be executed only by admin
show role grant user kingcall; ordinary users can only view their own roles, while an admin user can view everyone's
show principals role_ods_create; shows which users have been assigned a role
show grant user kingcall on all;
show grant role role_ods_select on all;
show grant on table test_role;
set role
set role is an interesting command. It does not give the user a new role, because roles are assigned to users by the administrator. It only selects which of the roles already granted to the user are current, and therefore whose privileges are in effect.
CREATE ROLE role_ods_create;
grant ALL on database ods to role role_ods_create;
Let's take a look at what permissions this role now has
show grant role role_ods_create on database ods;
+-----------+--------+------------+---------+------------------+-----------------+------------+---------------+----------------+--------------+
| database  | table  | partition  | column  | principal_name   | principal_type  | privilege  | grant_option  | grant_time     | grantor      |
+-----------+--------+------------+---------+------------------+-----------------+------------+---------------+----------------+--------------+
| ods       |        |            |         | role_ods_create  | ROLE            | DELETE     | false         | 1610506484000  | liuwenqiang  |
| ods       |        |            |         | role_ods_create  | ROLE            | INSERT     | false         | 1610506484000  | liuwenqiang  |
| ods       |        |            |         | role_ods_create  | ROLE            | SELECT     | false         | 1610506484000  | liuwenqiang  |
| ods       |        |            |         | role_ods_create  | ROLE            | UPDATE     | false         | 1610506484000  | liuwenqiang  |
+-----------+--------+------------+---------+------------------+-----------------+------------+---------------+----------------+--------------+
Then run show current roles as the kingcall user, and you will see that kingcall does not have this role
SET ROLE (role_name|ALL|NONE);
4. Hive permission management commands
set role admin; -- set the current user's role to admin
-- Add, delete, view, and set roles
-- Create a role
CREATE ROLE role_name;
-- Delete a role
DROP ROLE role_name;
-- Set role (set role for current user)
SET ROLE (role_name|ALL|NONE);
-- View the current role
SHOW CURRENT ROLES;
SHOW ROLE GRANT USER liuwenqiang;
-- View all existing roles
SHOW ROLES;
-- Check user permissions
SHOW GRANT USER root ON DATABASE ods;
-- View the user's role
show role grant user user_name;
-- Revoke privileges from a role
revoke create on database database_name from role role_name;
revoke select on [table] table_name from role role_name;
-- View the privileges of a role or user
show grant [role|user] role_name on database database_name;
show grant [role|user] role_name on [table] table_name;
Hive permission assignment table
Action | Select | Insert | Update | Delete | Ownership | Admin | URL Privilege (RWX Permission + Ownership) |
---|---|---|---|---|---|---|---|
ALTER DATABASE |  |  |  |  |  | Y |  |
ALTER INDEX PROPERTIES |  |  |  |  | Y |  |  |
ALTER INDEX REBUILD |  |  |  |  | Y |  |  |
ALTER PARTITION LOCATION |  |  |  |  | Y |  | Y (for new partition location) |
ALTER TABLE (all of them except the ones above) |  |  |  |  | Y |  |  |
ALTER TABLE ADD PARTITION |  | Y |  |  |  |  | Y (for partition location) |
ALTER TABLE DROP PARTITION |  |  |  | Y |  |  |  |
ALTER TABLE LOCATION |  |  |  |  | Y |  | Y (for new location) |
ALTER VIEW PROPERTIES |  |  |  |  | Y |  |  |
ALTER VIEW RENAME |  |  |  |  | Y |  |  |
ANALYZE TABLE | Y | Y |  |  |  |  |  |
CREATE DATABASE |  |  |  |  |  |  | Y (if custom location specified) |
CREATE FUNCTION |  |  |  |  |  | Y |  |
CREATE INDEX |  |  |  |  | Y (of table) |  |  |
CREATE MACRO |  |  |  |  |  | Y |  |
CREATE TABLE |  |  |  |  | Y (of database) |  | Y (for create external table: the location) |
CREATE TABLE AS SELECT | Y (of input) |  |  |  | Y (of database) |  |  |
CREATE VIEW | Y + G |  |  |  |  |  |  |
DELETE |  |  |  | Y |  |  |  |
DESCRIBE TABLE | Y |  |  |  |  |  |  |
DROP DATABASE |  |  |  |  | Y |  |  |
DROP FUNCTION |  |  |  |  |  | Y |  |
DROP INDEX |  |  |  |  | Y |  |  |
DROP MACRO |  |  |  |  |  | Y |  |
DROP TABLE |  |  |  |  | Y |  |  |
DROP VIEW |  |  |  |  | Y |  |  |
DROP VIEW PROPERTIES |  |  |  |  | Y |  |  |
EXPLAIN | Y |  |  |  |  |  |  |
INSERT |  | Y |  | Y (for OVERWRITE) |  |  |  |
LOAD |  | Y (output) |  | Y (output) |  |  | Y (input location) |
MSCK (metastore check) |  |  |  |  |  | Y |  |
SELECT | Y |  |  |  |  |  |  |
SHOW COLUMNS | Y |  |  |  |  |  |  |
SHOW CREATE TABLE | Y + G |  |  |  |  |  |  |
SHOW PARTITIONS | Y |  |  |  |  |  |  |
SHOW TABLE PROPERTIES | Y |  |  |  |  |  |  |
SHOW TABLE STATUS | Y |  |  |  |  |  |  |
TRUNCATE TABLE |  |  |  |  | Y |  |  |
UPDATE |  |  | Y |  |  |  |  |
ALL: grants all privileges
ALTER: allows modifying the metadata of an object, that is, the table information
UPDATE: allows modifying the physical data of an object
CREATE: allows the CREATE operation
DROP: allows the DROP operation
LOCK: allows users to LOCK and UNLOCK objects when they are used concurrently
SELECT: allows users to perform SELECT operations
SHOW_DATABASE: allows users to view the available databases
Extension
How do you determine which permissions are required for an SQL execution
We have seen how to assign privileges to users, but how do we know which privileges are required when we execute a more complex SQL statement? Remember the optional parameter of EXPLAIN from when we studied execution plans? That is the one to use: AUTHORIZATION.
EXPLAIN AUTHORIZATION select * from role_test;
| hdfs://kingcall:9000/tmp/hive/liuwenqiang/5b3a7a3a-fb84-442e-a91b-855ed826b9ef/hive_2021-01-12_21-58-26_399_5494553353081655039-5/-mr-10001 |
| CURRENT_USER: |
| kingcall |
| OPERATION: |
| QUERY |
Implementation of super administrator
As mentioned earlier, Hive's default authorization has no super administrator: any user can run GRANT/REVOKE, create tables, or create databases, which makes permission management meaningless. To solve this problem, we need to develop our own permission control class that ensures only designated superusers can perform these operations.
The following dependency is required:
<dependencies>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>3.1.0</version>
  </dependency>
</dependencies>
Next, implement the custom hook:
package com.kingcall.bigdata.HiveAccess;

import com.google.common.base.Joiner;
import org.apache.hadoop.hive.ql.parse.*;
import org.apache.hadoop.hive.ql.session.SessionState;

/**
 * Customize the Hive superuser
 *
 * @author 01
 * @date 2020-11-09
 */
public class HiveAdmin extends AbstractSemanticAnalyzerHook {

    /** Superusers; more than one can be defined */
    private static final String[] ADMINS = {"root"};

    /** Token types of the statements that only superusers may run */
    private static final int[] TOKEN_TYPES = {
            HiveParser.TOK_CREATEDATABASE, HiveParser.TOK_DROPDATABASE,
            HiveParser.TOK_CREATEROLE, HiveParser.TOK_DROPROLE,
            HiveParser.TOK_GRANT, HiveParser.TOK_REVOKE,
            HiveParser.TOK_GRANT_ROLE, HiveParser.TOK_REVOKE_ROLE,
            HiveParser.TOK_CREATETABLE
    };

    /**
     * Get the current login user name
     *
     * @return user name, or null if it cannot be determined
     */
    private String getUserName() {
        boolean hasUserName = SessionState.get() != null
                && SessionState.get().getAuthenticator().getUserName() != null;
        return hasUserName ? SessionState.get().getAuthenticator().getUserName() : null;
    }

    /** Whether the statement type is one of the restricted token types */
    private boolean isInTokenTypes(int type) {
        for (int tokenType : TOKEN_TYPES) {
            if (tokenType == type) {
                return true;
            }
        }
        return false;
    }

    /** Whether the user is one of the configured superusers (a null user never is) */
    private boolean isAdmin(String userName) {
        for (String admin : ADMINS) {
            if (admin.equalsIgnoreCase(userName)) {
                return true;
            }
        }
        return false;
    }

    @Override
    public ASTNode preAnalyze(HiveSemanticAnalyzerHookContext context, ASTNode ast) throws SemanticException {
        // Statements outside the restricted list pass through unchanged
        if (!isInTokenTypes(ast.getToken().getType())) {
            return ast;
        }
        String userName = getUserName();
        if (isAdmin(userName)) {
            return ast;
        }
        // Reject the restricted statement for everyone who is not a superuser
        throw new SemanticException(userName +
                " is not Admin, except " +
                Joiner.on(",").join(ADMINS));
    }
}
Add the package to the Hive lib directory:
cp target/original-hiveudf-0.0.4.jar /usr/local/hive-3.1.2/lib/
Then register the hook in hive-site.xml:
<property>
<name>hive.semantic.analyzer.hook</name>
<value>com.kingcall.bigdata.HiveAccess.HiveAdmin</value>
<description>The hook program is used to identify the super administrator for authorization control</description>
</property>
Restart the HiveServer2 service
Then try to grant a privilege as a user who is not a superuser
grant select on table role_test to user kingcall;
You then get the following error, which shows that we have achieved our goal of controlling permissions
Error: Error while compiling statement: FAILED: SemanticException hive is not Admin, except root (state=42000,code=40000)
Conclusion
- Finer-grained permission management, such as field-level permission control, can be achieved through views
- Ownership of an object (table, view, database) generally belongs to its creator, including the permission to grant privileges on it
- The admin users can be configured in the configuration file. As in SQL, role names are case-insensitive, but user names are case-sensitive
Refer to the article: cwiki.apache.org/confluence/…