On October 12, 2021, the Japanese security vendor Flatt Security disclosed CVE-2021-34866, a Linux kernel privilege escalation vulnerability. On November 5, @HexRabbit published an exploit for the bug on GitHub along with a highly technical and concise analysis. As this is the author's first write-up on this research, some more basic background material is supplemented below.

eBPF

eBPF is a relatively new technology for accessing kernel services and hardware. Before it existed, running custom code in the Linux kernel meant one of two things. The first was submitting the code upstream to the kernel itself. The second was using a kernel module, which extends the kernel with code that can be dynamically loaded and unloaded at run time. Kernel modules, however, have obvious drawbacks: they must be adapted for each kernel version, and a bug in the module can crash the kernel.

eBPF solves the problem of running custom code in kernel space. It is a highly flexible and efficient virtual machine-like facility in the Linux kernel, with its own bytecode format and a dedicated compiler, and it allows bytecode to be executed safely at various hook points. It is used by many Linux kernel subsystems, most notably networking, tracing, and security.

In terms of implementation, an eBPF program is created by calling bpf_prog_load() (the BPF_PROG_LOAD command of the bpf(2) syscall), passing in the eBPF instructions as an array of struct bpf_insn.
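
To make this concrete, here is a minimal user-space sketch (illustrative only, not taken from the original write-up) that loads a trivial "r0 = 0; exit" program through the raw bpf(2) syscall; the struct bpf_insn array is exactly what the verifier will later inspect:

#include <linux/bpf.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hedged sketch: load a do-nothing socket-filter program via BPF_PROG_LOAD. */
int main(void)
{
    struct bpf_insn insns[] = {
        { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 }, /* r0 = 0 */
        { .code = BPF_JMP | BPF_EXIT },                                          /* exit   */
    };
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
    attr.insns     = (unsigned long)insns;
    attr.insn_cnt  = sizeof(insns) / sizeof(insns[0]);
    attr.license   = (unsigned long)"GPL";

    int prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
    if (prog_fd < 0)
        perror("BPF_PROG_LOAD");
    return prog_fd < 0;
}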

Verifier

Security is a prominent feature of eBPF. Before an eBPF program can be loaded into the kernel, it must pass the verifier, which checks eBPF programs for infinite loops, excessive program size, out-of-bounds accesses, invalid arguments, and so on. The main checking function of the verifier is bpf_check(). Inside it, the function check_map_func_compatibility() checks helper function arguments against map types; this is the function where the vulnerability lives.

Helper functions

eBPF programs cannot call arbitrary kernel functions, since doing so would tie them to a particular kernel version. Instead, eBPF provides a set of common, stable APIs called helper functions that let eBPF programs interact with the kernel. All helper names and descriptions are declared in bpf.h, and each eBPF program type has its own set of available helpers. Below are the prototypes and descriptions of the helper functions analyzed later.

/*
 * long bpf_ringbuf_output(void *ringbuf, void *data, u64 size, u64 flags)
 *     Description
 *         Copy *size* bytes from *data* into a ring buffer *ringbuf*.
 *         If **BPF_RB_NO_WAKEUP** is specified in *flags*, no notification
 *         of new data availability is sent.
 *         If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
 *         of new data availability is sent unconditionally.
 *         If **0** is specified in *flags*, an adaptive notification
 *         of new data availability is sent.
 *
 *         An adaptive notification is a notification sent whenever the
 *         user-space process has caught up and consumed all available
 *         payloads. In case the user-space process is still processing a
 *         previous payload, then no notification is needed as it will
 *         process the newly added payload automatically.
 *     Return
 *         0 on success, or a negative error in case of failure.
 *
 * void *bpf_ringbuf_reserve(void *ringbuf, u64 size, u64 flags)
 *     Description
 *         Reserve *size* bytes of payload in a ring buffer *ringbuf*.
 *         *flags* must be 0.
 *     Return
 *         Valid pointer with *size* bytes of memory available; NULL,
 *         otherwise.
 *
 * u64 bpf_ringbuf_query(void *ringbuf, u64 flags)
 *     Description
 *         Query various characteristics of provided ring buffer. What
 *         exactly is queried is determined by *flags*:
 *
 *         * **BPF_RB_AVAIL_DATA**: Amount of data not yet consumed.
 *         * **BPF_RB_RING_SIZE**: The size of ring buffer.
 *         * **BPF_RB_CONS_POS**: Consumer position (can wrap around).
 *         * **BPF_RB_PROD_POS**: Producer(s) position (can wrap around).
 *
 *         Data returned is just a momentary snapshot of actual values
 *         and could be inaccurate, so this facility should be used to
 *         power heuristics and for reporting, not to make 100% correct
 *         calculation.
 *     Return
 *         Requested value, or 0, if *flags* are not recognized.
 */

#define __BPF_FUNC_MAPPER(FN)   \
    FN(ringbuf_output),         \
    FN(ringbuf_reserve),        \
    FN(ringbuf_query),          \

Helper functions are not called directly. In the kernel they are defined with the macros BPF_CALL_0 through BPF_CALL_5, and eBPF instructions invoke them through call instructions. A struct bpf_func_proto records information about each helper, including its return type and argument types. An example definition looks like this:

BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key,
           void *, value, u64, flags)
{
    WARN_ON_ONCE(!rcu_read_lock_held());
    return map->ops->map_update_elem(map, key, value, flags);
}

const struct bpf_func_proto bpf_map_update_elem_proto = {
    .func       = bpf_map_update_elem,
    .gpl_only   = false,
    .ret_type   = RET_INTEGER,
    .arg1_type  = ARG_CONST_MAP_PTR,
    .arg2_type  = ARG_PTR_TO_MAP_KEY,
    .arg3_type  = ARG_PTR_TO_MAP_VALUE,
    .arg4_type  = ARG_ANYTHING,
};

Maps

A map is a key-value store that eBPF programs can use to share data with the kernel or with user space.

eBPF programs manipulate maps through helper functions. There are different types of maps, and each type has its own data structure. The 31 map types currently supported are listed below, followed by a small user-space sketch of creating and updating a map:

enum bpf_map_type {
BPF_MAP_TYPE_UNSPEC,
BPF_MAP_TYPE_HASH,
BPF_MAP_TYPE_ARRAY,
BPF_MAP_TYPE_PROG_ARRAY,
BPF_MAP_TYPE_PERF_EVENT_ARRAY,
BPF_MAP_TYPE_PERCPU_HASH,
BPF_MAP_TYPE_PERCPU_ARRAY,
BPF_MAP_TYPE_STACK_TRACE,
BPF_MAP_TYPE_CGROUP_ARRAY,
BPF_MAP_TYPE_LRU_HASH,
BPF_MAP_TYPE_LRU_PERCPU_HASH,
BPF_MAP_TYPE_LPM_TRIE,
BPF_MAP_TYPE_ARRAY_OF_MAPS,
BPF_MAP_TYPE_HASH_OF_MAPS,
BPF_MAP_TYPE_DEVMAP,
BPF_MAP_TYPE_SOCKMAP,
BPF_MAP_TYPE_CPUMAP,
BPF_MAP_TYPE_XSKMAP,
BPF_MAP_TYPE_SOCKHASH,
BPF_MAP_TYPE_CGROUP_STORAGE,
BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
BPF_MAP_TYPE_QUEUE,
BPF_MAP_TYPE_STACK,
BPF_MAP_TYPE_SK_STORAGE,
BPF_MAP_TYPE_DEVMAP_HASH,
BPF_MAP_TYPE_STRUCT_OPS,
BPF_MAP_TYPE_RINGBUF,
BPF_MAP_TYPE_INODE_STORAGE,
BPF_MAP_TYPE_TASK_STORAGE,
BPF_MAP_TYPE_BLOOM_FILTER,
};
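
As mentioned above, here is a minimal user-space sketch (illustrative only, not taken from the original write-up): it creates a BPF_MAP_TYPE_ARRAY map through the raw bpf(2) syscall and updates one element, which an eBPF program could then read through the map helpers.

#include <linux/bpf.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the bpf(2) syscall. */
static long sys_bpf(int cmd, union bpf_attr *attr)
{
    return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

int main(void)
{
    union bpf_attr attr;
    int key = 0;
    long value = 0x1337;

    /* Create an ARRAY map: 4-byte keys, 8-byte values, a single entry. */
    memset(&attr, 0, sizeof(attr));
    attr.map_type    = BPF_MAP_TYPE_ARRAY;
    attr.key_size    = sizeof(key);
    attr.value_size  = sizeof(value);
    attr.max_entries = 1;
    int map_fd = sys_bpf(BPF_MAP_CREATE, &attr);
    if (map_fd < 0) {
        perror("BPF_MAP_CREATE");
        return 1;
    }

    /* Share data with eBPF programs by updating the element at key 0. */
    memset(&attr, 0, sizeof(attr));
    attr.map_fd = map_fd;
    attr.key    = (unsigned long)&key;
    attr.value  = (unsigned long)&value;
    sys_bpf(BPF_MAP_UPDATE_ELEM, &attr);
    return 0;
}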

Ringbuf

Ringbuf is a buffer shared across CPUs that can be used to send data from the kernel to user space. The map type that manages a ringbuf is BPF_MAP_TYPE_RINGBUF, and its data structures are:

struct bpf_ringbuf_map {
    struct bpf_map map;
    struct bpf_ringbuf *rb;
};

struct bpf_ringbuf {
    wait_queue_head_t waitq;
    struct irq_work work;
    u64 mask;
    struct page **pages;
    int nr_pages;
    spinlock_t spinlock ____cacheline_aligned_in_smp;
    /* Consumer and producer counters are put into separate pages to allow
     * mapping consumer page as r/w, but restrict producer page to r/o.
     * This protects producer position from being modified by user-space
     * application and ruining in-kernel position tracking.
     */
    unsigned long consumer_pos __aligned(PAGE_SIZE);
    unsigned long producer_pos __aligned(PAGE_SIZE);
    char data[] __aligned(PAGE_SIZE);
};
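
For contrast with the abuse described later, the sketch below shows roughly what legitimate ringbuf usage looks like from the eBPF program side. It is a libbpf-style illustration; the tracepoint, map name, and record size are assumptions made for the example, not taken from the original write-up.

// Illustrative libbpf-style eBPF program, not part of the exploit.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 4096);   /* must be a power-of-2 multiple of PAGE_SIZE */
} rb SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_execve")
int log_execve(void *ctx)
{
    /* Reserve 8 bytes of payload in the ring buffer. */
    __u64 *e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
    if (!e)
        return 0;

    *e = bpf_get_current_pid_tgid() >> 32;  /* record the calling pid */
    bpf_ringbuf_submit(e, 0);               /* hand the record to user space */
    return 0;
}

char LICENSE[] SEC("license") = "GPL";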

Root cause

The vulnerability lies in the check_map_func_compatibility() function, which checks whether the helper function being called matches the corresponding map type. This is a two-way check: the first switch checks whether the created map->map_type is allowed to call the requested helper function, and the second switch checks whether the called helper function can handle the given map type.

static int check_map_func_compatibility(struct bpf_verifier_env *env,
                                        struct bpf_map *map, int func_id)
{
    if (!map)
        return 0;

    /* We need a two way check, first is from map perspective ... */
    switch (map->map_type) {
    case BPF_MAP_TYPE_PROG_ARRAY:
        if (func_id != BPF_FUNC_tail_call)
            goto error;
        break;
    ...
    default:
        break;
    }

    /* ... and second from the function itself. */
    switch (func_id) {
    case BPF_FUNC_tail_call:
        if (map->map_type != BPF_MAP_TYPE_PROG_ARRAY)
            goto error;
        break;
    ...
    }

    return 0;
error:
    verbose(env, "cannot pass map_type %d into func %s#%d\n",
            map->map_type, func_id_name(func_id), func_id);
    return -EINVAL;
}

The first switch checks 22 cases, but this does not cover all map types. The map types that are not subject to this check are:

BPF_MAP_TYPE_PERCPU_HASH
BPF_MAP_TYPE_PERCPU_ARRAY
BPF_MAP_TYPE_LPM_TRIE
BPF_MAP_TYPE_STRUCT_OPS
BPF_MAP_TYPE_LRU_HASH
BPF_MAP_TYPE_ARRAY
BPF_MAP_TYPE_LRU_PERCPU_HASH
BPF_MAP_TYPE_HASH
BPF_MAP_TYPE_UNSPEC

Similarly, in the second switch, not all helpers are checked.

The gaps in the two-way check above are what produce the vulnerability. Start by looking at the commit that fixes it. As the diff shows, the map type involved is BPF_MAP_TYPE_RINGBUF, and the second check gains cases for the BPF_FUNC_ringbuf_output, BPF_FUNC_ringbuf_reserve, and BPF_FUNC_ringbuf_query helpers, which all operate on a ringbuf. Before the fix, a map of a type other than BPF_MAP_TYPE_RINGBUF could be created and passed to one of these three helpers, and a structure that is not a BPF_MAP_TYPE_RINGBUF map would then be handled as if it were one, resulting in type confusion.

@@ ... @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 	case BPF_MAP_TYPE_RINGBUF:
 		if (func_id != BPF_FUNC_ringbuf_output &&
 		    func_id != BPF_FUNC_ringbuf_reserve &&
-		    func_id != BPF_FUNC_ringbuf_submit &&
-		    func_id != BPF_FUNC_ringbuf_discard &&
 		    func_id != BPF_FUNC_ringbuf_query)
 			goto error;
 		break;
@@ ... @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		if (map->map_type != BPF_MAP_TYPE_PERF_EVENT_ARRAY)
 			goto error;
 		break;
+	case BPF_FUNC_ringbuf_output:
+	case BPF_FUNC_ringbuf_reserve:
+	case BPF_FUNC_ringbuf_query:
+		if (map->map_type != BPF_MAP_TYPE_RINGBUF)
+			goto error;
+		break;
 	case BPF_FUNC_get_stackid:
 		if (map->map_type != BPF_MAP_TYPE_STACK_TRACE)
 			goto error;

For this vulnerability, a BPF_MAP_TYPE_LPM_TRIE map can be created in place of a BPF_MAP_TYPE_RINGBUF map and passed to BPF_FUNC_ringbuf_reserve to trigger the type confusion:

int vuln_mapfd = bpf_create_map(BPF_MAP_TYPE_LPM_TRIE, key_size, 0x3000, 1,
                                BPF_F_NO_PREALLOC);
...
struct bpf_insn prog[] = {
    BPF_LD_MAP_FD(BPF_REG_1, vuln_mapfd),
    ...
    BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_ringbuf_reserve),
};

Next, the checks that __bpf_ringbuf_reserve() performs on the bpf_ringbuf structure's size, consumer_pos, and producer_pos fields need to be bypassed. The exploit then leaks the kernel base and heap addresses by overwriting a bpf_array object that was laid out in advance via heap spraying, and forges bpf_array.map.ops as a fake bpf_map_ops table in which the map_delete_elem and map_fd_put_ptr pointers are replaced with fd_array_map_delete_elem and commit_creds, respectively. Finally, bpf_map_delete_elem() is called to trigger the modified function pointers, so that commit_creds(&init_cred) runs and privileges are escalated.

Remediation

  1. The vulnerability has been fixed in kernel version 5.13.14; update promptly.
  2. As a mitigation, set /proc/sys/kernel/unprivileged_bpf_disabled to 1 (for example, sysctl -w kernel.unprivileged_bpf_disabled=1) to forbid unprivileged users from using eBPF.

References

flatt.tech/cve/cve-202…

github.com/HexRabbit/C…

blog.hexrabbit.io/2021/11/03/…

blog.hexrabbit.io/2021/02/07/…

ebpf.io/what-is-ebp…

docs.cilium.io/en/stable/b…

For more information, follow the public account “Moyun Security”