This is a study note, mainly an analysis of the imp2 source code from the article “Implementation and Analysis of Communication between Kernel Space and User Space in Linux System”. The source code can be downloaded from the following URL:

Www-128.ibm.com/developerwo…




Implementation and Analysis of Communication between Kernel Space and user space in Linux System

Www-128.ibm.com/developerwo…

Ways to exchange data between user space and kernel space under Linux

Www-128.ibm.com/developerwo…




In Linux kernels from version 2.4 onward, almost all communication between interrupt context and user-mode processes is implemented with netlink sockets. For example, the iproute2 network management tools talk to the kernel entirely through netlink, and Netfilter, the well-known kernel packet filtering framework, has also switched to netlink in its latest versions, so netlink will undoubtedly be one of the main ways for Linux user space and kernel space to communicate. Communication is addressed by an identifier associated with each endpoint, generally the process ID; when one end of the communication is in interrupt context, this identifier is 0. When both parties are user-mode processes, netlink sockets are used much like message queues, but when one end is in interrupt context the usage is different. The most distinctive feature of netlink sockets is their support for interrupt context: the kernel side no longer needs to start a kernel thread to receive data from user space, because a soft interrupt calls a receive function that the user specified in advance. The working principle is shown in the following figure:



 



As shown in the figure, soft interrupts are used instead of kernel threads to receive data, thus ensuring real-time data reception.

When a NetLink socket is used for communication between kernel space and user space, it is created in user space in a similar way as a socket, but in kernel space it is created in a different way. The following figure shows how a Netlink socket is created for such communication:



 





User space

User-mode applications use standard sockets to communicate with the kernel. The standard socket API functions, socket(), bind(), sendmsg(), recvmsg(), and close(), can be easily applied to NetLink sockets.

To create a netlink socket, the user calls socket() with the following arguments:

socket(AF_NETLINK, SOCK_RAW, netlink_type)

The netlink protocol family is AF_NETLINK. The second parameter must be SOCK_RAW or SOCK_DGRAM. The third parameter specifies the netlink protocol type:

#define NETLINK_ROUTE          0  /* Routing/device hook */
#define NETLINK_W1             1  /* 1-wire subsystem */
#define NETLINK_USERSOCK       2  /* Reserved for user mode socket protocols */
#define NETLINK_FIREWALL       3  /* Firewalling hook */
#define NETLINK_INET_DIAG      4  /* INET socket monitoring */
#define NETLINK_NFLOG          5  /* netfilter/iptables ULOG */
#define NETLINK_XFRM           6  /* ipsec */
#define NETLINK_SELINUX        7  /* SELinux event notifications */
#define NETLINK_ISCSI          8  /* Open-iSCSI */
#define NETLINK_AUDIT          9  /* auditing */
#define NETLINK_FIB_LOOKUP     10
#define NETLINK_CONNECTOR      11
#define NETLINK_NETFILTER      12 /* netfilter subsystem */
#define NETLINK_IP6_FW         13
#define NETLINK_DNRTMSG        14 /* DECnet routing messages */
#define NETLINK_KOBJECT_UEVENT 15 /* Kernel messages to userspace */
#define NETLINK_GENERIC        16

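As with ordinary sockets, socket() returns a descriptor that can then be passed to bind() and the other socket functions. The program creates its socket like this (NL_IMP2 is the custom protocol type defined by imp2):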

static int skfd;
skfd = socket(PF_NETLINK, SOCK_RAW, NL_IMP2);
if(skfd < 0)
{
      printf("can not create a netlink socket\n");
      exit(0);
}

The socket address of netlink is struct sockaddr_nl

struct sockaddr_nl
{
  sa_family_t    nl_family;
  unsigned short nl_pad;
  __u32          nl_pid;
  __u32          nl_groups;
};

The nl_pad member is not currently used, so it is always set to 0. The nl_pid member is the ID of the process that receives or sends the message. If the message is to be handled by the kernel or is a multicast message, set this field to 0; otherwise set it to the ID of the process that handles the message. The nl_groups member specifies multicast groups: bind() adds the calling process to the groups given in this field, and if it is 0 the caller does not join any multicast group:

struct sockaddr_nl local;
memset(&local, 0, sizeof(local));
local.nl_family = AF_NETLINK;
local.nl_pid = getpid();     /* set pid to its own pid */
local.nl_groups = 0;
/* bind the socket */
if(bind(skfd, (struct sockaddr*)&local, sizeof(local)) != 0)
{
      printf("bind() error\n");
      return -1;
}
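The nl_groups field is a bitmask with one bit per multicast group, so group n (counting from 1) corresponds to bit n-1. The following is a minimal sketch of filling in a local address with that in mind. The helper names nl_group_mask and make_local_addr are made up for this note; they are not part of imp2 or of the netlink API.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/netlink.h>

/* Hypothetical helper: bitmask for multicast group `group` (1..32).
 * Group 0 means "no group" and yields an empty mask. */
static __u32 nl_group_mask(int group)
{
    assert(group >= 0 && group <= 32);
    return group == 0 ? 0 : (__u32)1 << (group - 1);
}

/* Hypothetical helper: build the address that bind() expects.
 * pid is usually getpid(); groups is a mask built with nl_group_mask(). */
static struct sockaddr_nl make_local_addr(__u32 pid, __u32 groups)
{
    struct sockaddr_nl addr;

    memset(&addr, 0, sizeof(addr));   /* also clears nl_pad */
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = pid;
    addr.nl_groups = groups;
    return addr;
}
```

With these helpers, the address used above would be make_local_addr(getpid(), 0), and a process that wanted to listen on group 3 would pass nl_group_mask(3) as the second argument.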

User space can call the send family of functions, such as sendto and sendmsg, to send messages to the kernel. The send function needs a remote address, which is likewise described by a struct sockaddr_nl. It differs slightly from the local address: because the remote end is the kernel, the nl_pid member must be set to 0:

struct sockaddr_nl kpeer;
memset(&kpeer, 0, sizeof(kpeer));
kpeer.nl_family = AF_NETLINK;
kpeer.nl_pid = 0;
kpeer.nl_groups = 0;

Another issue is the composition of the message sent to the kernel. When we send an IP packet, its structure is “IP header + IP data”. Similarly, the structure of a netlink message is “netlink header + data”. The netlink message header is described by struct nlmsghdr:

struct nlmsghdr
{
  __u32 nlmsg_len;   /* Length of message */
  __u16 nlmsg_type;  /* Message type*/
  __u16 nlmsg_flags; /* Additional flags */
  __u32 nlmsg_seq;   /* Sequence number */
  __u32 nlmsg_pid;   /* Sending process PID */
};

The nlmsg_len field specifies the total length of the message, that is, the size of the header structure plus the length of the data that immediately follows it. Normally we use the netlink macro NLMSG_LENGTH to compute this length: we simply pass it the length of the data to send, and it adds the size of the header after alignment:

#define NLMSG_LENGTH(len) ((len)+NLMSG_ALIGN(sizeof(struct nlmsghdr)))
/* byte alignment */
#define NLMSG_ALIGN(len) ( ((len)+NLMSG_ALIGNTO-1) & ~(NLMSG_ALIGNTO-1) )
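As a small worked example of these macros, the sketch below computes both the value that goes into nlmsg_len and the padded size the message occupies in a buffer (NLMSG_SPACE, which the kernel code later in this note also uses). It assumes the usual <linux/netlink.h>, where the header is 16 bytes and NLMSG_ALIGNTO is 4; the struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/netlink.h>

/* Sizes derived from a payload length, for illustration only. */
struct nl_lengths {
    size_t msg_len;   /* value to store in nlmsg_len: header + payload   */
    size_t space;     /* bytes the message occupies in a buffer (padded) */
};

/* Hypothetical helper wrapping NLMSG_LENGTH and NLMSG_SPACE. */
static struct nl_lengths nl_lengths_for(size_t payload_len)
{
    struct nl_lengths l;

    /* nlmsg_len is a 32-bit field, so the payload must be far smaller. */
    assert(payload_len < 0x10000000u);
    l.msg_len = NLMSG_LENGTH(payload_len);
    l.space = NLMSG_SPACE(payload_len);
    return l;
}
```

For a message with no data, both values are 16. For a 5-byte payload, nlmsg_len is 21 while the message takes up 24 bytes in the buffer, because the total is rounded up to a multiple of 4.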

Netlink provides many macros like these, and they make writing netlink programs much more convenient. The nlmsg_type field holds an application-defined message type; it is transparent to the netlink kernel implementation and is therefore set to 0 in most cases. The nlmsg_flags field sets the message flags; only advanced applications such as Netfilter and routing daemons need it for more complex operations, and the example here sets it to 0. The nlmsg_seq and nlmsg_pid fields are used by applications to track messages, the former holding the sequence number and the latter the source process ID.

struct msg_to_kernel      /* custom message: a netlink header with no data */
{
      struct nlmsghdr hdr;
};

struct msg_to_kernel message;
memset(&message, 0, sizeof(message));
message.hdr.nlmsg_len = NLMSG_LENGTH(0);  /* compute the message length; the data length is 0 */
message.hdr.nlmsg_flags = 0;
message.hdr.nlmsg_type = IMP2_U_PID;      /* set the custom message type */
message.hdr.nlmsg_pid = local.nl_pid;     /* set the PID of the sender */

Now that we have the local address, the peer address and the data to send, we can call a send function to send the message to the kernel:

/* send a request */
sendto(skfd, &message, message.hdr.nlmsg_len, 0, (struct sockaddr*)&kpeer, sizeof(kpeer));
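The request above carries no data. For a message that does carry data, the header fields are filled in the same way and the payload is copied to the address given by NLMSG_DATA. The sketch below shows this under the assumption that the caller supplies a buffer suitably aligned for struct nlmsghdr; nl_build_msg is a name invented for this note, not an imp2 or libc function.

```c
#include <assert.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical helper: write one netlink message (header + payload) into
 * buf and return the number of buffer bytes used, or 0 if it does not fit.
 * buf must be aligned for struct nlmsghdr. */
static size_t nl_build_msg(void *buf, size_t buf_len, __u16 type, __u32 pid,
                           const void *payload, size_t payload_len)
{
    struct nlmsghdr *nlh = buf;

    assert(buf != NULL);
    if (NLMSG_SPACE(payload_len) > buf_len)
        return 0;

    memset(buf, 0, NLMSG_SPACE(payload_len));    /* zero header and padding */
    nlh->nlmsg_len = NLMSG_LENGTH(payload_len);  /* header + payload        */
    nlh->nlmsg_type = type;                      /* application-defined     */
    nlh->nlmsg_flags = 0;
    nlh->nlmsg_seq = 0;
    nlh->nlmsg_pid = pid;                        /* sender's PID            */
    if (payload_len > 0)
        memcpy(NLMSG_DATA(nlh), payload, payload_len);
    return NLMSG_SPACE(payload_len);
}
```

The length passed to sendto would then be nlh->nlmsg_len, as in the request above.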

Once the request has been sent, the recv family of functions can be called to receive data from the kernel. The received data contains the netlink header followed by the data being transferred:

/* received message: a netlink header followed by the data */
struct u_packet_info
{
      struct nlmsghdr hdr;
      struct packet_info icmp_info;
};

struct u_packet_info info;
while(1)
{
      kpeerlen = sizeof(struct sockaddr_nl);
      /* receive data sent by the kernel */
      rcvlen = recvfrom(skfd, &info, sizeof(struct u_packet_info), 0, (struct sockaddr*)&kpeer, &kpeerlen);
      /* process the received data */
      ......
}
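The loop above trusts that each recvfrom returns exactly one well-formed message. A more defensive receiver can validate what it got with the standard NLMSG_OK and NLMSG_NEXT macros from <linux/netlink.h>. The following is only a sketch of that idea, not what imp2 does; nl_count_msgs is a hypothetical helper that counts the valid messages of a given type in a receive buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/netlink.h>

/* Hypothetical helper: walk a receive buffer and count the well-formed
 * messages whose type equals `type`. `len` is the byte count returned by
 * recvfrom(); the buffer must be aligned for struct nlmsghdr. */
static int nl_count_msgs(void *buf, int len, __u16 type)
{
    struct nlmsghdr *nlh = buf;
    int count = 0;

    assert(buf != NULL);
    /* NLMSG_OK checks that a whole header fits and that nlmsg_len is
     * neither shorter than a header nor longer than the remaining data. */
    while (NLMSG_OK(nlh, len)) {
        if (nlh->nlmsg_type == type)
            count++;
        nlh = NLMSG_NEXT(nlh, len);   /* also decrements len */
    }
    return count;
}
```

These are the same two conditions that the kernel-side receive function in this note checks by hand on skb->len and nlmsg_len.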

Likewise, the close function is used to close an open netlink socket. Because the program loops forever receiving and processing messages from the kernel, it exits only when it receives the user's termination signal, so the socket is closed in the custom signal handler sig_int:

/* signal handler: tell the kernel we are exiting, then close the socket */
static void sig_int(int signo)
{
      struct sockaddr_nl kpeer;
      struct msg_to_kernel message;

      memset(&kpeer, 0, sizeof(kpeer));
      kpeer.nl_family = AF_NETLINK;
      kpeer.nl_pid = 0;
      kpeer.nl_groups = 0;

      memset(&message, 0, sizeof(message));
      message.hdr.nlmsg_len = NLMSG_LENGTH(0);
      message.hdr.nlmsg_flags = 0;
      message.hdr.nlmsg_type = IMP2_CLOSE;
      message.hdr.nlmsg_pid = getpid();

      /* send a message to the kernel; nlmsg_type says that the application is closing */
      sendto(skfd, &message, message.hdr.nlmsg_len, 0, (struct sockaddr *)(&kpeer), sizeof(kpeer));

      close(skfd);
      exit(0);
}

This handler sends the kernel a message saying “I have exited”, then calls close to close the netlink socket and exits the program.

Kernel space

Like the user-space application, kernel space mainly does three things: it creates a netlink socket, receives and processes data sent from user space, and sends data to user space.

The API function netlink_kernel_create creates the netlink socket and at the same time registers a callback function that receives and processes messages from user space:

struct sock *
netlink_kernel_create(int unit, void (*input)(struct sock *sk, int len));

The unit parameter is the netlink protocol type, such as NL_IMP2, and the input parameter is the netlink message handler defined by the kernel module. When a message arrives at the netlink socket, the function that input points to is called. The sk parameter of the input function is in fact the struct sock pointer returned by netlink_kernel_create. struct sock is the kernel's representation of a socket; sockets created by user-mode applications are also represented by a struct sock in the kernel.

static int __init init(void)
{
      rwlock_init(&user_proc.lock);
      /* Create a netlink socket. The protocol type is NL_IMP2 and
         kernel_receive is the receive handler */
      nlfd = netlink_kernel_create(NL_IMP2, kernel_receive);
      if(!nlfd)
      {
            printk("can not create a netlink socket\n");
            return -1;
      }
      /* Register a Netfilter hook */
      return nf_register_hook(&imp2_ops);
}
module_init(init);

User space sends two custom message types to the kernel: IMP2_U_PID and IMP2_CLOSE, which are the request and the close notification, respectively. The kernel_receive function handles each of them:

DECLARE_MUTEX(receive_sem);      /* initialize the semaphore */
static void kernel_receive(struct sock *sk, int len)
{
      do
      {
            struct sk_buff *skb;
            if(down_trylock(&receive_sem))      /* acquire the semaphore */
                  return;
            /* take messages off the receive queue */
            while((skb = skb_dequeue(&sk->receive_queue)) != NULL)
            {
                  struct nlmsghdr *nlh = NULL;
                  if(skb->len >= sizeof(struct nlmsghdr))
                  {
                        nlh = (struct nlmsghdr *)skb->data;
                        /* check that the message is valid */
                        if((nlh->nlmsg_len >= sizeof(struct nlmsghdr))
                              && (skb->len >= nlh->nlmsg_len))
                        {
                              if(nlh->nlmsg_type == IMP2_U_PID)        /* request */
                              {
                                    write_lock_bh(&user_proc.lock);
                                    user_proc.pid = nlh->nlmsg_pid;
                                    write_unlock_bh(&user_proc.lock);
                              }
                              else if(nlh->nlmsg_type == IMP2_CLOSE)   /* application closed */
                              {
                                    write_lock_bh(&user_proc.lock);
                                    if(nlh->nlmsg_pid == user_proc.pid)
                                          user_proc.pid = 0;
                                    write_unlock_bh(&user_proc.lock);
                              }
                        }
                  }
                  kfree_skb(skb);
            }
            up(&receive_sem);      /* release the semaphore */
      }while(nlfd && nlfd->receive_queue.qlen);
}

Because the kernel module may be called by several processes at the same time, the function uses a semaphore and a lock for mutual exclusion. skb_dequeue(&sk->receive_queue) takes a message off the receive queue of socket sk and returns it as a struct sk_buff; skb->data points to the actual netlink message.

The program registers a Netfilter hook whose hook function is get_icmp. It intercepts ICMP packets and calls send_to_user to pass the data to the user-space process. The data sent is the variable info, of type struct packet_info, which has two members: the source and destination addresses. The Netfilter hook is not the focus of this note and is skipped here.

send_to_user sends data to the user-space process by calling the API function netlink_unicast:

int netlink_unicast(struct sock *sk, struct sk_buff *skb, u32 pid, int nonblock);

The sk parameter is the socket returned by netlink_kernel_create(). The skb parameter holds the message to be sent: its data field points to the netlink message to send, and its control block holds the address information of the message. The pid parameter is the PID of the process that is to receive the message. The nonblock parameter says whether the function is non-blocking: if it is 1, the function returns immediately when no receive buffer is available; if it is 0, the function sleeps when no receive buffer is available.

A message sent to a user-space process consists of three parts: the netlink message header, the data, and the control field. The control field contains the destination and source addresses that the kernel has to set when sending a netlink message. Messages in the kernel are managed through sk_buff, and linux/netlink.h defines the NETLINK_CB macro to make setting a message's addresses convenient:

#define NETLINK_CB(skb)         (*(struct netlink_skb_parms*)&((skb)->cb))

For example:

NETLINK_CB(skb).pid = 0;
NETLINK_CB(skb).dst_pid = 0;
NETLINK_CB(skb).dst_group = 1;

pid is the process ID of the message sender, that is, the source address; for the kernel it is 0. dst_pid is the process ID of the message receiver, that is, the destination address. dst_group is the destination multicast group; for a message sent to a single process, as in send_to_user, it should be set to 0.

static int send_to_user(struct packet_info *info)
{
      int ret;
      int size;
      unsigned char *old_tail;
      struct sk_buff *skb;
      struct nlmsghdr *nlh;
      struct packet_info *packet;

      /* Compute the total message length: header plus data */
      size = NLMSG_SPACE(sizeof(*info));
      /* Allocate a new socket buffer */
      skb = alloc_skb(size, GFP_ATOMIC);
      old_tail = skb->tail;
      /* Initialize the netlink message header */
      nlh = NLMSG_PUT(skb, 0, 0, IMP2_K_MSG, size-sizeof(*nlh));
      /* Skip the message header, point to the data */
      packet = NLMSG_DATA(nlh);
      /* Zero the data area */
      memset(packet, 0, sizeof(struct packet_info));
      /* Fill in the data to be sent */
      packet->src = info->src;
      packet->dest = info->dest;
      /* Record the message length */
      nlh->nlmsg_len = skb->tail - old_tail;
      /* Set the control field */
      NETLINK_CB(skb).dst_groups = 0;
      /* Send the data */
      read_lock_bh(&user_proc.lock);
      ret = netlink_unicast(nlfd, skb, user_proc.pid, MSG_DONTWAIT);
      read_unlock_bh(&user_proc.lock);
      return ret;

nlmsg_failure:      /* target of the goto inside NLMSG_PUT */
      if(skb)
            kfree_skb(skb);
      return -1;
}

The send_to_user function initializes the netlink message header, fills in the data area and sets the control field, all three of which are contained in the sk_buff, and finally calls netlink_unicast to send the data. It uses an important netlink macro, NLMSG_PUT, to initialize the netlink message header:

#define NLMSG_PUT(skb, pid, seq, type, len) \
({ if (skb_tailroom(skb) < (int)NLMSG_SPACE(len)) goto nlmsg_failure; \
   __nlmsg_put(skb, pid, seq, type, len); })
static __inline__ struct nlmsghdr *
__nlmsg_put(struct sk_buff *skb, u32 pid, u32 seq, int type, int len)
{
	struct nlmsghdr *nlh;
	int size = NLMSG_LENGTH(len);
	nlh = (struct nlmsghdr*)skb_put(skb, NLMSG_ALIGN(size));
	nlh->nlmsg_type = type;
	nlh->nlmsg_len = size;
	nlh->nlmsg_flags = 0;
	nlh->nlmsg_pid = pid;
	nlh->nlmsg_seq = seq;
	return nlh;
}
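The way NLMSG_PUT relies on a label in the calling function can be tried out in user space. The sketch below is a toy model, not kernel code: struct toy_skb, toy_tailroom, toy_nlmsg_put, TOY_NLMSG_PUT and toy_append are all invented here, and a fixed 64-byte buffer with a tail offset stands in for the sk_buff. Only the NLMSG_* macros come from <linux/netlink.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/netlink.h>

/* Toy stand-in for struct sk_buff: a fixed buffer plus a tail offset. */
struct toy_skb {
    union {
        struct nlmsghdr align;        /* forces suitable alignment */
        unsigned char bytes[64];
    } data;
    size_t tail;                      /* bytes already used */
};

/* Like skb_tailroom(): bytes still free at the end of the buffer. */
static size_t toy_tailroom(const struct toy_skb *skb)
{
    return sizeof(skb->data.bytes) - skb->tail;
}

/* Like __nlmsg_put(): reserve the space and fill in the header. */
static struct nlmsghdr *toy_nlmsg_put(struct toy_skb *skb, __u32 pid,
                                      __u32 seq, __u16 type, size_t len)
{
    struct nlmsghdr *nlh = (struct nlmsghdr *)(skb->data.bytes + skb->tail);

    skb->tail += NLMSG_SPACE(len);    /* plays the role of skb_put() */
    nlh->nlmsg_type = type;
    nlh->nlmsg_len = NLMSG_LENGTH(len);
    nlh->nlmsg_flags = 0;
    nlh->nlmsg_pid = pid;
    nlh->nlmsg_seq = seq;
    return nlh;
}

/* Same shape as NLMSG_PUT: jumps to a label the CALLER must define. */
#define TOY_NLMSG_PUT(nlh, skb, pid, seq, type, len)          \
    do {                                                      \
        if (toy_tailroom(skb) < NLMSG_SPACE(len))             \
            goto nlmsg_failure;                               \
        (nlh) = toy_nlmsg_put(skb, pid, seq, type, len);      \
    } while (0)

/* Append one message; returns 0 on success, -1 if it does not fit. */
static int toy_append(struct toy_skb *skb, __u16 type,
                      const void *payload, size_t len)
{
    struct nlmsghdr *nlh;

    assert(skb != NULL);
    TOY_NLMSG_PUT(nlh, skb, 0, 0, type, len);
    if (len > 0)
        memcpy(NLMSG_DATA(nlh), payload, len);
    return 0;

nlmsg_failure:                        /* the label the macro jumps to */
    return -1;
}
```

If the nlmsg_failure label were removed from toy_append, the function would no longer compile, which is exactly the constraint the real macro places on its callers.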

One thing to note about this macro is that it jumps to the nlmsg_failure label, so you must define this label in your program.

In the kernel, the function sock_release releases the netlink socket created by netlink_kernel_create():

void sock_release(struct socket * sock);

The netlink socket and the Netfilter hook are released in the module's exit function:

static void __exit fini(void)
{
      if(nlfd)
      {
            sock_release(nlfd->socket);      /* release the netlink socket */
      }
      nf_unregister_hook(&imp2_ops);         /* unregister the netfilter hook */
}

Source: http://www.chinaunix.net/jh/4/822500.html

Code excerpted from: http://www-128.ibm.com/developerworks/cn/linux/l-netlink/imp2.tar.gz

flw2 modified the code above so that it runs on 2.6.25: http://linux.chinaunix.net/bbs/viewthread.php?tid=1015818&extra=&page=2

Code that runs on 2.6.9:

imp2_k.c

#ifndef __KERNEL__
#define __KERNEL__
#endif

#ifndef MODULE
#define MODULE
#endif

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/types.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/netfilter_ipv4.h>
#include <linux/inet.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/netlink.h>
#include <linux/spinlock.h>
#include <asm/semaphore.h>
#include <net/sock.h>
#include "imp2.h"

DECLARE_MUTEX(receive_sem);

static struct sock *nlfd;

struct
{
      __u32 pid;
      rwlock_t lock;
}user_proc;

static void kernel_receive(struct sock *sk, int len)
{
      do
      {
            struct sk_buff *skb;
            if(down_trylock(&receive_sem))
                  return;

            while((skb = skb_dequeue(&sk->sk_receive_queue)) != NULL)
            {
                  struct nlmsghdr *nlh = NULL;
                  if(skb->len >= sizeof(struct nlmsghdr))
                  {
                        nlh = (struct nlmsghdr *)skb->data;
                        if((nlh->nlmsg_len >= sizeof(struct nlmsghdr))
                              && (skb->len >= nlh->nlmsg_len))
                        {
                              if(nlh->nlmsg_type == IMP2_U_PID)
                              {
                                    write_lock_bh(&user_proc.lock);
                                    user_proc.pid = nlh->nlmsg_pid;
                                    write_unlock_bh(&user_proc.lock);
                              }
                              else if(nlh->nlmsg_type == IMP2_CLOSE)
                              {
                                    write_lock_bh(&user_proc.lock);
                                    if(nlh->nlmsg_pid == user_proc.pid)
                                          user_proc.pid = 0;
                                    write_unlock_bh(&user_proc.lock);
                              }
                        }
                  }
                  kfree_skb(skb);
            }
            up(&receive_sem);
      }while(nlfd && nlfd->sk_receive_queue.qlen);
}

static int send_to_user(struct packet_info *info)
{
      int ret;
      int size;
      unsigned char *old_tail;
      struct sk_buff *skb;
      struct nlmsghdr *nlh;
      struct packet_info *packet;

      size = NLMSG_SPACE(sizeof(*info));

      skb = alloc_skb(size, GFP_ATOMIC);
      old_tail = skb->tail;

      nlh = NLMSG_PUT(skb, 0, 0, IMP2_K_MSG, size-sizeof(*nlh));
      packet = NLMSG_DATA(nlh);
      memset(packet, 0, sizeof(struct packet_info));

      packet->src = info->src;
      packet->dest = info->dest;

      nlh->nlmsg_len = skb->tail - old_tail;
      NETLINK_CB(skb).dst_groups = 0;

      read_lock_bh(&user_proc.lock);
      ret = netlink_unicast(nlfd, skb, user_proc.pid, MSG_DONTWAIT);
      read_unlock_bh(&user_proc.lock);

      return ret;

nlmsg_failure:
      if(skb)
            kfree_skb(skb);
      return -1;
}

static unsigned int get_icmp(unsigned int hook,
                             struct sk_buff **pskb,
                             const struct net_device *in,
                             const struct net_device *out,
                             int (*okfn)(struct sk_buff *))
{
      struct iphdr *iph = (*pskb)->nh.iph;
      struct packet_info info;

      if(iph->protocol == IPPROTO_ICMP)
      {
            read_lock_bh(&user_proc.lock);
            if(user_proc.pid != 0)
            {
                  read_unlock_bh(&user_proc.lock);
                  info.src = iph->saddr;
                  info.dest = iph->daddr;
                  send_to_user(&info);
            }
            else
                  read_unlock_bh(&user_proc.lock);
      }

      return NF_ACCEPT;
}

static struct nf_hook_ops imp2_ops =
{
      .hook = get_icmp,
      .pf = PF_INET,
      .hooknum = NF_IP_PRE_ROUTING,
      .priority = NF_IP_PRI_FILTER -1,
};

static int __init init(void)
{
      rwlock_init(&user_proc.lock);

      nlfd = netlink_kernel_create(NL_IMP2, kernel_receive);
      if(!nlfd)
      {
            printk("can not create a netlink socket\n");
            return -1;
      }

      return nf_register_hook(&imp2_ops);
}

static void __exit fini(void)
{
      if(nlfd)
      {
            sock_release(nlfd->sk_socket);
      }
      nf_unregister_hook(&imp2_ops);
}

module_init(init);
module_exit(fini);

Code that runs on 2.6.29:

imp2_k.c

#ifndef __KERNEL__
#define __KERNEL__
#endif

#ifndef MODULE
#define MODULE
#endif

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/types.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/netfilter_ipv4.h>
#include <linux/inet.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/netlink.h>
#include <linux/spinlock.h>
#include <linux/semaphore.h>
#include <net/sock.h>
#include "imp2.h"

DECLARE_MUTEX(receive_sem);

static struct sock *nlfd;

struct
{
      __u32 pid;
      rwlock_t lock;
}user_proc;

static void kernel_receive(struct sk_buff *skb)
{
      struct nlmsghdr *nlh = NULL;
      //printk("%s: skb->user: %d\n", __func__, atomic_read(&skb->users)); return;

      if(down_trylock(&receive_sem))
            return;

      if(skb->len < sizeof(struct nlmsghdr))
            goto out;

      nlh = nlmsg_hdr(skb);

      if((nlh->nlmsg_len >= sizeof(struct nlmsghdr))
            && (skb->len >= nlh->nlmsg_len))
      {
            if(nlh->nlmsg_type == IMP2_U_PID)
            {
                  write_lock_bh(&user_proc.lock);
                  user_proc.pid = nlh->nlmsg_pid;
                  write_unlock_bh(&user_proc.lock);
            }
            else if(nlh->nlmsg_type == IMP2_CLOSE)
            {
                  write_lock_bh(&user_proc.lock);
                  if(nlh->nlmsg_pid == user_proc.pid)
                        user_proc.pid = 0;
                  write_unlock_bh(&user_proc.lock);
            }
      }
      //kfree_skb(skb);
out:
      up(&receive_sem);
}

static int send_to_user(struct packet_info *info)
{
      int ret;
      int size;
      unsigned char *old_tail;
      struct sk_buff *skb;
      struct nlmsghdr *nlh;
      struct packet_info *packet;

      size = NLMSG_SPACE(sizeof(*info));

      skb = alloc_skb(size, GFP_ATOMIC);
      old_tail = skb->tail;

      nlh = NLMSG_PUT(skb, 0, 0, IMP2_K_MSG, size-sizeof(*nlh));
      packet = NLMSG_DATA(nlh);
      memset(packet, 0, sizeof(struct packet_info));

      packet->src = info->src;
      packet->dest = info->dest;

      nlh->nlmsg_len = skb->tail - old_tail;
      NETLINK_CB(skb).dst_group = 0;

      read_lock_bh(&user_proc.lock);
      ret = netlink_unicast(nlfd, skb, user_proc.pid, MSG_DONTWAIT);
      read_unlock_bh(&user_proc.lock);

      return ret;

nlmsg_failure:
      if(skb)
            kfree_skb(skb);
      return -1;
}

static unsigned int get_icmp(unsigned int hook,
                             struct sk_buff *pskb,
                             const struct net_device *in,
                             const struct net_device *out,
                             int (*okfn)(struct sk_buff *))
{
      printk("%s\n", __func__);
      struct iphdr *iph = (struct iphdr *)skb_network_header(pskb);
      struct packet_info info;

      if(iph->protocol == IPPROTO_ICMP)
      {
            read_lock_bh(&user_proc.lock);
            if(user_proc.pid != 0)
            {
                  read_unlock_bh(&user_proc.lock);
                  info.src = iph->saddr;
                  info.dest = iph->daddr;
                  send_to_user(&info);
            }
            else
                  read_unlock_bh(&user_proc.lock);
      }

      return NF_STOLEN;
}

static struct nf_hook_ops imp2_ops =
{
      .hook = get_icmp,
      .pf = PF_INET,
      .hooknum = NF_INET_PRE_ROUTING,
      .priority = NF_IP_PRI_FIRST,
};

static int __init init(void)
{
      rwlock_init(&user_proc.lock);

      //nlfd = netlink_kernel_create(NL_IMP2, kernel_receive);
      nlfd = netlink_kernel_create(&init_net, NL_IMP2, 0, kernel_receive, NULL, THIS_MODULE);
      if(!nlfd)
      {
            printk("can not create a netlink socket\n");
            return -1;
      }

      return nf_register_hook(&imp2_ops);
}

static void __exit fini(void)
{
      if(nlfd)
      {
            sock_release(nlfd->sk_socket);
      }
      nf_unregister_hook(&imp2_ops);
}

module_init(init);
module_exit(fini);