Hive Permission Management

The purpose of permission management is to verify whether a user is allowed to perform an operation, whereas the purpose of security authentication is to verify that a user is who they claim to be. This article focuses on permission management rather than security authentication.

Requirements for cluster security:

  • Supports multiple components, preferably the major components of the current big data stack, such as HDFS, HBase, Hive, YARN, and Kafka
  • Supports fine-grained permission control, including Hive columns, HDFS directories, HBase columns, and YARN queues
  • Open source, with an active community, requiring minimal changes to existing clusters, and in line with industry trends

Existing schemes:

  • Hadoop and Hive permission control
  • Kerberos Security Authentication
  • Apache Ranger Permission management solution

Hadoop and Hive permission authentication

Hadoop permissions:

  • Hadoop distributed file system implements a file and directory permission model similar to POSIX systems
  • Each file and directory has an owner and a group
  • A file or directory has different permissions for its owner, other users in the group, and all other users
  • File and directory operations pass the path name to the NameNode, which checks permissions along the path
  • The user who starts the NameNode is the superuser and passes all permission checks
  • A specific group of users can be designated as superusers through configuration
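
The rules above can be sketched as a small stand-alone model. This is only an illustration of the owner/group/other check, not Hadoop's actual implementation: the class `HdfsPermissionSketch`, the user name `hdfs`, and the group name `supergroup` are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of a POSIX-style check; these are not Hadoop's real classes. */
final class HdfsPermissionSketch {
    static final int READ = 4;
    static final int WRITE = 2;
    static final int EXECUTE = 1;

    /** Assumed name of the user that started the NameNode. */
    static final String NAMENODE_USER = "hdfs";
    /** Assumed name of the group configured as the superuser group. */
    static final String SUPER_GROUP = "supergroup";

    /** Minimal stand-in for a file or directory entry. */
    static final class Entry {
        final String owner;
        final String group;
        final int mode; // nine permission bits, e.g. 0640 = rw-r-----

        Entry(String owner, String group, int mode) {
            this.owner = owner;
            this.group = group;
            this.mode = mode;
        }
    }

    static boolean isAllowed(Entry entry, String user, Set<String> groups, int wanted) {
        // Superusers pass every permission check.
        if (NAMENODE_USER.equals(user) || groups.contains(SUPER_GROUP)) {
            return true;
        }
        int bits;
        if (entry.owner.equals(user)) {
            bits = (entry.mode >> 6) & 7; // owner bits
        } else if (groups.contains(entry.group)) {
            bits = (entry.mode >> 3) & 7; // group bits
        } else {
            bits = entry.mode & 7;        // all other users
        }
        return (bits & wanted) == wanted;
    }

    static Set<String> groups(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        Entry table = new Entry("hive", "analysts", 0640);
        System.out.println(isAllowed(table, "hive", groups("hive"), WRITE));
        System.out.println(isAllowed(table, "alice", groups("analysts"), READ));
        System.out.println(isAllowed(table, "bob", groups("guests"), READ));
    }
}
```

With mode 0640, the owner can read and write, members of the owning group can only read, and everyone else is denied.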

Hive can rely on Hadoop’s file system permissions by checking whether the current user is allowed to operate on the underlying files.

Hive’s default permission control is not completely secure. It is intended to prevent users from performing inappropriate operations by mistake.

Three authorization models

  1. Storage Based Authorization in the Metastore Server: protects Metastore metadata, but does not provide finer-grained access control (for example, column level or row level).
  2. SQL Standards Based Authorization in HiveServer2: fully compatible with the SQL authorization model, and supports both user authorization and role authorization.

A role is a set of privileges, and a user can hold one or more roles.

By default, two roles are provided for accessing Hive data through HiveServer2: public and admin. All users belong to the public role by default, and only users with the admin role can grant privileges freely. Ordinary users can only grant privileges that they themselves hold to other users.

Therefore, at least one user must be added to the admin role. Users and groups are the users and groups of the Linux machine, whereas roles must be created (and dropped) in Hive by ourselves.

Users in the public role can perform the operations they have been granted, but by default they do not have permission to create tables.

  3. Default Hive Authorization (Legacy Mode): designed only to prevent user mistakes, not to prevent malicious users from accessing unauthorized data.

Default Hive Authorization (Legacy Mode)

Hive’s default permission control is not completely secure. It is designed to prevent users from performing inappropriate operations by accident, not to prevent malicious users from accessing data they are not authorized to see.

This is because the permission mechanism is incomplete and has no verification step. For example, when you run a grant statement, Hive does not check whether you yourself have the permission to grant.

Storage Based Authorization in the Metastore Server

In earlier Hive versions, permissions were controlled only through Linux users and user groups, so the CREATE, SELECT, and DROP operations on Hive tables could not be controlled. Hive now manages multiple users and controls permissions based on the metadata database. Hive data is divided into metadata and data files: the metadata is stored in a relational database such as MySQL, and the data files are stored in HDFS. Controlling access to the metadata therefore controls which data files can be accessed.

Clients such as MapReduce, Impala, Pig, Spark SQL, and the Hive command line access Hive data through the Metastore metadata, for example via the HCatalog API. This permission control therefore takes place when interacting with the Metastore service: permissions are verified when the Metastore API is called. It mainly prevents malicious users from accessing and modifying Metastore data, but does not provide finer-grained access control (for example, column level or row level).

Here is the relevant configuration. The values shown are the defaults; to enable storage based authorization, set hive.security.metastore.authorization.manager to org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider and hive.metastore.pre.event.listeners to org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener:

<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider</value>
  <description>authorization manager class name to be used in the metastore for authorization.
  The user defined authorization class should implement interface
  org.apache.hadoop.hive.ql.security.authorization.HiveMetastoreAuthorizationProvider.
  </description>
 </property>
 
<property>
  <name>hive.security.metastore.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
  <description>authenticator manager class name to be used in the metastore for authentication.
  The user defined authenticator should implement interface 
  org.apache.hadoop.hive.ql.security.HiveAuthenticationProvider.
  </description>
</property>
 
<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value> </value>
  <description>pre-event listener classes to be loaded on the metastore side to run code
  whenever databases, tables, and partitions are created, altered, or dropped.
  Set to org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener
  if metastore-side authorization is desired.
  </description>
</property>

SQL Standards Based Authorization in HiveServer2

  • Fully compatible with SQL authorization model
  • In addition to user authorization, role authorization is supported
  • A role is a set of permissions that can be used to authorize users
  • A user can have one or more roles

The admin user can be configured in the configuration file. The role name is case-insensitive, which is the same as SQL, but the user name is case-sensitive.

<property>
  <name>hive.users.in.admin.role</name>
  <value>root</value>
</property>

  • After this authorization mode is enabled, the dfs, add, delete, compile, and reset commands are disabled. Changing Hive configuration with the set command is restricted to a safe subset of parameters, which can be adjusted through hive.security.authorization.sqlstd.confwhitelist in hive-site.xml.
  • Adding and dropping functions and macros is available only to users with the admin role.
  • To let other users use custom functions, a user with the admin role can create a permanent function, which other users can then use.
  • The TRANSFORM clause is disabled.

Public role

By default, all users belong to the public role, and only users with the admin role can grant privileges freely (ordinary users can only grant privileges they themselves hold to other users).

Users in the public role can perform the operations they have been granted, but by default they do not have permission to create tables.

The admin role

All of a user’s roles except admin are active by default. That is, even if you have been granted the admin role, it does not appear in the output of show current roles; you need to run set role admin; to obtain the privileges of this role.

In other words, the admin role is not in the user’s current roles list by default.

You can run set hive.users.in.admin.role; to see which users are configured as admins.

The basic configuration

<!-- Enable authorization (disabled by default) -->
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
<!-- User list of admin role -->
<property>
  <name>hive.users.in.admin.role</name>
  <value>root</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>
<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
</property>

Note: users in the admin role need to run the “set role admin” command to obtain the privileges of the admin role. In other words, even if you are in the admin role list, you still need to activate the role explicitly.

Permissions for files created by Hive

Hive has a setting that controls the default permissions of newly created files. Setting the file mode creation mask (umask) to 0002 yields 664 permissions for new files. For details, see the Hadoop and Hive user configurations.

<property>  
  <name>hive.files.umask.value</name>  
  <value>0002</value>  
  <description>The dfs.umask value for the hive created folders</description>  
</property> 
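
The relationship between the mask 0002 and the resulting 664 permission is plain bit arithmetic: the bits set in the mask are cleared from the requested mode. The sketch below illustrates this with a hypothetical `UmaskSketch` class; it does not call any Hive or Hadoop API, and the base modes 0666 for files and 0777 for directories are the usual POSIX conventions, assumed here.

```java
/** Illustrates how a umask turns a requested mode into the effective mode. */
final class UmaskSketch {

    /** Clears every bit of the requested mode that is set in the umask. */
    static int applyUmask(int requestedMode, int umask) {
        return requestedMode & ~umask;
    }

    /** Formats a mode as a four-digit octal string such as 0664. */
    static String toOctal(int mode) {
        return String.format("%04o", mode);
    }

    public static void main(String[] args) {
        int umask = 0002;
        // Files are conventionally requested with 0666, directories with 0777.
        System.out.println("file:      " + toOctal(applyUmask(0666, umask)));
        System.out.println("directory: " + toOctal(applyUmask(0777, umask)));
    }
}
```

A mask of 0002 only removes the write bit for "other" users, so the owner and the group keep read and write access.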

Hive authorization storage checks

When the hive.metastore.authorization.storage.checks property is set to true, Hive prevents users without the necessary permissions from dropping tables. The default value of this setting is false; it is recommended to set it to true.

<property>  
  <name>hive.metastore.authorization.storage.checks</name>  
  <value>true</value>  
  <description>Should the metastore do authorization checks against the underlying storage for operations like drop-partition (disallow  the drop-partition if the user in question doesn't have permissions to delete the corresponding directory on the storage).</description>  
</property>

The table creator has full permissions on the table

<property>  
  <name>hive.security.authorization.createtable.owner.grants</name>  
  <value>ALL</value>  
  <description>The privileges automatically granted to the owner whenever a table gets created.
    An example like "select,drop" will grant select and drop privilege to the owner of the table
  </description>  
</property>  

This setting is NULL by default. It is recommended to set it to ALL so that users can access the tables they create; otherwise the creator of a table cannot access it, which is clearly unreasonable.

The user who creates a Hive table is the owner of that table. In practice, the owner of the table’s HDFS directory is the owner of the Hive table, and this user’s permissions on the directory work the same way as in the Linux operating system.

Hive permission operation

Before we start, one thing to note about the HiveServer2 command-line client: if you log in without specifying a user, the Hive user is not your current operating system user but an anonymous user. This gave me a headache for a while, so it is worth calling out before we begin.

1. Authorize users

First, create a new user ‘kingcall’, switch to that user, and connect to Hive as that user:

beeline -u jdbc:hive2://localhost:10000/ods -n kingcall -p www1234

Next, use a user with admin privileges to create a table:

create table role_test(id int,name string);

Then let the kingcall user query it:

0: jdbc:hive2://localhost:10000/ods> select * from role_test;
Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: Principal [name=kingcall, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=ods.role_test]] (state=42000,code=40000)

The query is denied because kingcall has no privileges on the table. Next, as the table owner (the admin user), grant the SELECT privilege to kingcall:

GRANT SELECT ON table role_test to user kingcall;

Now the kingcall user can run the query again, and it succeeds. That is the basic procedure for granting a privilege.

A delete still fails, because we only granted the SELECT privilege. You can inspect a user’s privileges with:

SHOW GRANT USER KINGCAL;

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error showing privileges:
User : liuwenqiang is not allowed check privileges of another user : KINGCAL. User has to belong to ADMIN role and have it as current role, for this action. (state=08S01,code=1)

Although the liuwenqiang user is in the admin role list, it cannot view the privileges of other users because admin is not its current role. We therefore need to run set role admin; first and then view the privileges of other users.

This is a simple demonstration of how to grant privileges to a user. As in any system, follow the principle of least privilege when granting privileges to users. Many privileges can be granted; the “Hive permission assignment table” section lists the privileges required for each operation in Hive. Besides granting privileges on a specific table, we can also grant privileges on an entire database.

2. Authorize groups

With N tables and N users in Hive, the administrator would be overwhelmed by granting privileges on each table to each user individually. In this case, you can grant privileges to a group instead. Hive user groups are equivalent to POSIX user groups.

-- Database based
grant select on database default to  group  admin;
-- Based on a table
grant select on table ppdata  to  group  admin;

3. Role management

Roles come into play when granting privileges to user groups is not flexible enough. Users can be placed in a role, and privileges can then be granted to the role. Roles are managed internally by Hive, whereas users and user groups are managed by the operating system.
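
The indirection described above (privileges are granted to roles, and roles are granted to users) can be sketched as a small in-memory model. This is only an illustration, not Hive's implementation: the class `RoleGrantSketch` and its methods are hypothetical. It does follow the rule stated earlier that role names are case-insensitive while user names are case-sensitive.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of role-based grants; not a Hive API. */
final class RoleGrantSketch {
    // role name (lower-cased) -> privileges, each stored as "PRIVILEGE:object"
    private final Map<String, Set<String>> rolePrivileges = new HashMap<>();
    // user name (case-sensitive) -> roles granted to that user
    private final Map<String, Set<String>> userRoles = new HashMap<>();

    private static String key(String privilege, String object) {
        return privilege.toUpperCase(Locale.ROOT) + ":" + object;
    }

    /** Models: grant privilege on object to role roleName. */
    void grantToRole(String privilege, String object, String roleName) {
        rolePrivileges
            .computeIfAbsent(roleName.toLowerCase(Locale.ROOT), r -> new HashSet<>())
            .add(key(privilege, object));
    }

    /** Models: grant role roleName to user userName. */
    void grantRoleToUser(String roleName, String userName) {
        userRoles
            .computeIfAbsent(userName, u -> new HashSet<>())
            .add(roleName.toLowerCase(Locale.ROOT));
    }

    /** A user holds a privilege if any of the user's roles holds it. */
    boolean hasPrivilege(String userName, String privilege, String object) {
        Set<String> roles = userRoles.getOrDefault(userName, Collections.<String>emptySet());
        for (String role : roles) {
            Set<String> granted = rolePrivileges.getOrDefault(role, Collections.<String>emptySet());
            if (granted.contains(key(privilege, object))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RoleGrantSketch model = new RoleGrantSketch();
        model.grantToRole("select", "database ods", "role_ods_select");
        model.grantRoleToUser("ROLE_ODS_SELECT", "kingcall");
        System.out.println(model.hasPrivilege("kingcall", "SELECT", "database ods"));
        System.out.println(model.hasPrivilege("kingcall", "INSERT", "database ods"));
    }
}
```

Granting a privilege once to the role makes it available to every user who holds the role, which is what keeps the number of grant statements manageable.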

Create the role

CREATE ROLE role_name;

A naming pattern such as role_database_select (the role prefix, then the database, then the privilege) keeps roles easy to identify. Combined with user groups, roles can cover the whole permission control scheme. For example, we can create a role with query permission on the report-layer database and grant it to the data analysts on the business side, or we can subdivide the permissions on the report-layer database by business line.

Grant privileges to roles

CREATE ROLE role_ods_select;
-- Syntax: grant privilege on [table] table_name to role role_name;
--         grant privilege on database database_name to role role_name;
grant select on database ods to role role_ods_select;
-- Query the privileges of a role
show grant role role_ods_select on database ods;
+-----------+--------+------------+---------+------------------+-----------------+------------+---------------+----------------+--------------+
| database  | table  | partition  | column  |  principal_name  | principal_type  | privilege  | grant_option  |   grant_time   |   grantor    |
+-----------+--------+------------+---------+------------------+-----------------+------------+---------------+----------------+--------------+
| ods       |        |            |         | role_ods_select  | ROLE            | SELECT     | false         | 1610504539000  | liuwenqiang  |
+-----------+--------+------------+---------+------------------+-----------------+------------+---------------+----------------+--------------+

Assign roles to users

-- Revoke the select privilege on role_test that we granted to kingcall in the previous demo
revoke select on table role_test from user kingcall;
-- Then grant the role role_ods_select to the kingcall user
grant role role_ods_select to user kingcall;
-- View the roles the user has. Note that show current roles may still return the old roles;
-- run set role to pick up newly granted roles
show role grant user kingcall;
+------------------+---------------+----------------+--------------+
|       role       | grant_option  |   grant_time   |   grantor    |
+------------------+---------------+----------------+--------------+
| public           | false         | 0              |              |
| role_ods_select  | false         | 1610504936000  | liuwenqiang  |
+------------------+---------------+----------------+--------------+
-- Then run the query
select * from role_test;

SHOW CURRENT ROLES; can be run by all users.

show roles; displays all roles in Hive and can be run only by admin.

show role grant user kingcall; ordinary users can only view their own roles, while an admin user can view everyone’s.

SHOW PRINCIPALS role_ods_create; shows which users have been granted a role.

show grant user kingcall on all;

show grant role role_ods_select on all;

show grant on table role_test;

set role

set role is an interesting command. Its purpose is to activate the privileges of a role. The role itself has already been granted to the user by an administrator, so set role does not obtain the role; it only makes the role one of the user’s current roles.

CREATE ROLE role_ods_create;
grant ALL on database ods to role role_ods_create;
-- Let's take a look at what privileges this role now has
show grant role role_ods_create on database ods;
+-----------+--------+------------+---------+------------------+-----------------+------------+---------------+----------------+--------------+
| database  | table  | partition  | column  |  principal_name  | principal_type  | privilege  | grant_option  |   grant_time   |   grantor    |
+-----------+--------+------------+---------+------------------+-----------------+------------+---------------+----------------+--------------+
| ods       |        |            |         | role_ods_create  | ROLE            | DELETE     | false         | 1610506484000  | liuwenqiang  |
| ods       |        |            |         | role_ods_create  | ROLE            | INSERT     | false         | 1610506484000  | liuwenqiang  |
| ods       |        |            |         | role_ods_create  | ROLE            | SELECT     | false         | 1610506484000  | liuwenqiang  |
| ods       |        |            |         | role_ods_create  | ROLE            | UPDATE     | false         | 1610506484000  | liuwenqiang  |
+-----------+--------+------------+---------+------------------+-----------------+------------+---------------+----------------+--------------+
-- Then run show current roles; the kingcall user does not have this role among its current roles

SET ROLE (role_name|ALL|NONE);
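
The difference between the roles granted to a user and the user's current roles can be sketched as follows. This is a simplified model under stated assumptions, not Hive's code: the class `CurrentRolesSketch` is hypothetical, and it assumes that SET ROLE role_name makes that role the only current role, that ALL restores the default list (every granted role except admin), and that NONE clears the list.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of a session's current roles; not a Hive API. */
final class CurrentRolesSketch {
    private final Set<String> granted = new TreeSet<>(); // roles granted by an administrator
    private Set<String> current;                          // roles active in this session

    CurrentRolesSketch(String... grantedRoles) {
        granted.add("public"); // every user belongs to public
        for (String role : grantedRoles) {
            granted.add(role.toLowerCase(Locale.ROOT)); // role names are case-insensitive
        }
        current = defaultRoles();
    }

    /** Default list: everything that was granted, except admin. */
    private Set<String> defaultRoles() {
        Set<String> roles = new TreeSet<>(granted);
        roles.remove("admin");
        return roles;
    }

    /** Models SET ROLE (role_name|ALL|NONE) under the assumptions named above. */
    void setRole(String roleName) {
        String name = roleName.toLowerCase(Locale.ROOT);
        if (name.equals("all")) {
            current = defaultRoles();
        } else if (name.equals("none")) {
            current = new TreeSet<>();
        } else if (granted.contains(name)) {
            current = new TreeSet<>(Collections.singleton(name));
        } else {
            throw new IllegalArgumentException("role not granted to this user: " + roleName);
        }
    }

    /** Models SHOW CURRENT ROLES. */
    Set<String> currentRoles() {
        return Collections.unmodifiableSet(current);
    }

    public static void main(String[] args) {
        CurrentRolesSketch session = new CurrentRolesSketch("admin", "role_ods_select");
        System.out.println(session.currentRoles()); // admin is not active yet
        session.setRole("admin");
        System.out.println(session.currentRoles());
    }
}
```

In this model a user in the admin role list starts without admin among the current roles, which matches the behavior described in the note about set role admin.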

4. Hive permission management commands

-- Set the current role to admin
set role admin;

-- Add, delete, view, and set roles
-- Create a role
CREATE ROLE role_name;
-- Delete a role
DROP ROLE role_name;
-- Set the role for the current user
SET ROLE (role_name|ALL|NONE);
-- View the current roles
SHOW CURRENT ROLES;
SHOW ROLE GRANT USER liuwenqiang;
-- View all existing roles
SHOW ROLES;
-- Check a user's privileges
SHOW GRANT USER root ON DATABASE ods;
-- View a user's roles
show role grant user user_name;
-- Revoke privileges from a role
revoke create on database database_name from role role_name;
revoke select on [table] table_name from role role_name;
-- View the privileges of a role or user on a database or table
show grant [role|user] role_name on database database_name;
show grant [role|user] role_name on [table] table_name;

Hive permission assignment table

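The privilege columns are Select, Insert, Update, Delete, Ownership, Admin, and URL Privilege (RWX permission + ownership of the location). “+ G” means the privilege is required with grant option.

Action | Required privileges
ALTER DATABASE | Admin
ALTER INDEX PROPERTIES | Ownership
ALTER INDEX REBUILD | Ownership
ALTER PARTITION LOCATION | Ownership; URL Privilege (for new partition location)
ALTER TABLE (all of them except the ones above) | Ownership
ALTER TABLE ADD PARTITION | Insert; URL Privilege (for partition location)
ALTER TABLE DROP PARTITION | Delete
ALTER TABLE LOCATION | Ownership; URL Privilege (for new location)
ALTER VIEW PROPERTIES | Ownership
ALTER VIEW RENAME | Ownership
ANALYZE TABLE | Select; Insert
CREATE DATABASE | URL Privilege (if custom location specified)
CREATE FUNCTION | Admin
CREATE INDEX | Ownership (of table)
CREATE MACRO | Admin
CREATE TABLE | Ownership (of database); URL Privilege (for create external table, the location)
CREATE TABLE AS SELECT | Select (of input); Ownership (of database)
CREATE VIEW | Select + G
DELETE | Delete
DESCRIBE TABLE | Select
DROP DATABASE | Ownership
DROP FUNCTION | Admin
DROP INDEX | Ownership
DROP MACRO | Admin
DROP TABLE | Ownership
DROP VIEW | Ownership
DROP VIEW PROPERTIES | Ownership
EXPLAIN | Select
INSERT | Insert; Delete (for OVERWRITE)
LOAD | Insert (output); Delete (output); URL Privilege (input location)
MSCK (metastore check) | Admin
SELECT | Select
SHOW COLUMNS | Select
SHOW CREATE TABLE | Select + G
SHOW PARTITIONS | Select
SHOW TABLE PROPERTIES | Select
SHOW TABLE STATUS | Select
TRUNCATE TABLE | Ownership
UPDATE | Update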

ALL: grants all privileges

ALTER: allows modifying the metadata of an object, that is, the table definition

UPDATE: allows modifying the physical data of an object

CREATE: allows the CREATE operation

DROP: allows the DROP operation

LOCK: allows users to LOCK and UNLOCK objects when they are used concurrently

SELECT: allows users to perform SELECT operations

SHOW_DATABASE: allows users to view the available databases

Extension

How do you determine which permissions an SQL statement requires?

We have seen how to grant privileges to users, but how do we know which privileges a more complex SQL statement requires? Remember that EXPLAIN accepts optional parameters, as we saw when learning about execution plans? AUTHORIZATION is the one we need here.

EXPLAIN AUTHORIZATION select * from role_test;

| hdfs://kingcall:9000/tmp/hive/liuwenqiang/5b3a7a3a-fb84-442e-a91b-855ed826b9ef/hive_2021-01-12_21-58-26_399_5494553353081655039-5/-mr-10001 |
| CURRENT_USER: |
| kingcall |
| OPERATION: |
| QUERY |

Implementation of super administrator

As mentioned earlier, there is no super administrator in Hive’s default authorization. Any user can run GRANT and REVOKE (as well as create tables or databases), which makes permission management meaningless. To solve this problem, we need to implement our own permission control class so that only designated superusers can perform these operations.

First, add the dependency:

<dependencies>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>3.1.0</version>
  </dependency>
</dependencies>

Next, implement the custom hook:

package com.kingcall.bigdata.HiveAccess;

import com.google.common.base.Joiner;
import org.apache.hadoop.hive.ql.parse.*;
import org.apache.hadoop.hive.ql.session.SessionState;

/**
 * Customizes the Hive superuser.
 *
 * @author 01
 * @date 2020-11-09
 */
public class HiveAdmin extends AbstractSemanticAnalyzerHook {

    /** The superusers; more than one can be listed. */
    private static final String[] ADMINS = {"root"};

    /** Token types of the operations that only superusers may perform. */
    private static final int[] TOKEN_TYPES = {
            HiveParser.TOK_CREATEDATABASE, HiveParser.TOK_DROPDATABASE,
            HiveParser.TOK_CREATEROLE, HiveParser.TOK_DROPROLE,
            HiveParser.TOK_GRANT, HiveParser.TOK_REVOKE,
            HiveParser.TOK_GRANT_ROLE, HiveParser.TOK_REVOKE_ROLE,
            HiveParser.TOK_CREATETABLE
    };

    /**
     * Gets the name of the currently logged-in user.
     *
     * @return the user name, or null if it cannot be determined
     */
    private String getUserName() {
        boolean hasUserName = SessionState.get() != null
                && SessionState.get().getAuthenticator().getUserName() != null;

        return hasUserName ? SessionState.get().getAuthenticator().getUserName() : null;
    }

    /** Returns true if the statement type is one of the restricted operations. */
    private boolean isInTokenTypes(int type) {
        for (int tokenType : TOKEN_TYPES) {
            if (tokenType == type) {
                return true;
            }
        }
        return false;
    }

    /** Returns true if the user is listed in ADMINS (case-insensitive). */
    private boolean isAdmin(String userName) {
        for (String admin : ADMINS) {
            if (admin.equalsIgnoreCase(userName)) {
                return true;
            }
        }
        return false;
    }

    @Override
    public ASTNode preAnalyze(HiveSemanticAnalyzerHookContext context, ASTNode ast) throws SemanticException {
        // Statements outside the restricted list pass through unchanged.
        if (!isInTokenTypes(ast.getToken().getType())) {
            return ast;
        }

        String userName = getUserName();
        if (isAdmin(userName)) {
            return ast;
        }

        // Reject the restricted operation for everyone who is not a superuser.
        throw new SemanticException(userName +
                " is not Admin, except " +
                Joiner.on(",").join(ADMINS));
    }
}
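
The allow/deny decision made by the hook can be checked in isolation, without a running Hive. The sketch below is a hypothetical stand-alone version of that decision: the class `AdminHookDecisionSketch` is not part of Hive, and plain strings stand in for the HiveParser.TOK_* constants so that it compiles without the hive-exec dependency.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Stand-alone sketch of the hook's decision logic; operation names are placeholders. */
final class AdminHookDecisionSketch {
    private static final String[] ADMINS = {"root"};

    // Strings standing in for the restricted HiveParser token types (an assumption).
    private static final Set<String> RESTRICTED = new HashSet<>(Arrays.asList(
            "CREATEDATABASE", "DROPDATABASE",
            "CREATEROLE", "DROPROLE",
            "GRANT", "REVOKE",
            "GRANT_ROLE", "REVOKE_ROLE",
            "CREATETABLE"));

    /** Same comparison as the hook: case-insensitive, and null is never an admin. */
    static boolean isAdmin(String userName) {
        for (String admin : ADMINS) {
            if (admin.equalsIgnoreCase(userName)) {
                return true;
            }
        }
        return false;
    }

    /** Unrestricted operations are open to everyone; restricted ones only to admins. */
    static boolean isPermitted(String operation, String userName) {
        return !RESTRICTED.contains(operation) || isAdmin(userName);
    }

    public static void main(String[] args) {
        System.out.println(isPermitted("QUERY", "kingcall"));
        System.out.println(isPermitted("GRANT", "kingcall"));
        System.out.println(isPermitted("GRANT", "root"));
    }
}
```

Keeping the decision in a pure function like this makes it easy to unit test changes to the admin list or the restricted operations before deploying the jar.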

Package the class and copy the jar to Hive’s lib directory:

cp target/original-hiveudf-0.0.4.jar /usr/local/hive-3.1.2/lib/

Then register the hook in hive-site.xml:

<property>
    <name>hive.semantic.analyzer.hook</name>
    <value>com.kingcall.bigdata.HiveAccess.HiveAdmin</value>
    <description>The hook program is used to identify the super administrator for authorization control</description>
</property>

Restart the HiveServer2 service.

Then try to grant a privilege:

grant select on table role_test to user kingcall;

This produces the following error, which shows that we have achieved our goal of controlling who can grant privileges:

Error: Error while compiling statement: FAILED: SemanticException hive is not Admin, except root (state=42000,code=40000)

Conclusion

  1. Finer-grained permission management, such as column-level control, can be achieved through views
  2. Objects (tables, views, databases) are generally owned by their creator, and ownership includes the permission to grant privileges on the object
  3. The admin user can be configured in the configuration file. The role name is case-insensitive, as in SQL, but the user name is case-sensitive

Refer to the article: cwiki.apache.org/confluence/…