Key points of this article

  • Introduction

  • Ethernet interface

    • leintr function

    • leread function

    • ether_input function

    • ether_output function

    • lestart function

  • ioctl system call

    • ifioctl function

    • ifconf function

    • General interface ioctl commands

    • if_down and if_up functions

    • Ethernet, SLIP, and loopback

Introduction

Chapter 3 of TCP/IP Illustrated, Volume 2 (the interface layer) discussed the data structures used by all interfaces and how they are initialized. This article shows how an Ethernet device driver receives and transmits frames once it has been initialized. It also covers the general ioctl commands used to configure network devices.

We will not walk through the source code of the entire Ethernet driver: it is about 1,000 lines of code, and roughly half of it deals with the hardware details of one particular interface card. Instead, we look at the hardware-independent parts of the Ethernet code and at how the driver interacts with the rest of the kernel.

A network device driver is accessed through seven function pointers in the ifnet structure. Figure 1 lists the entry points of our three sample drivers.

Input functions are not included in Figure 1 because network device input is interrupt-driven. The configuration of interrupt service routines is hardware dependent and beyond the scope of this article. We want to understand the functions that handle device interrupts, not the mechanism by which they are called.

Figure 1. Interface functions of the example drivers

   

Ethernet interface

Net/3 Ethernet device drivers all follow the same design. As with most Unix device drivers, a driver for a new interface card is usually written by modifying an existing driver. Here is a brief overview of the Ethernet standard and of the design of an Ethernet driver, illustrated with the LANCE driver.

Figure 2 illustrates the Ethernet encapsulation of an IP packet.

Figure 2. Ethernet encapsulation of an IP packet

An Ethernet frame begins with a 48-bit destination address and a 48-bit source address, followed by a 16-bit type field that identifies the format of the data carried by the frame. For IP packets, the type is 0x0800. The frame ends with a 32-bit CRC (cyclic redundancy check), which detects errors in the frame.

We use a 48-bit Ethernet address as the hardware address. ARP is used to translate IP addresses into hardware addresses, and RARP is used to translate hardware addresses into IP addresses. There are two types of Ethernet addresses: unicast and multicast. A unicast address describes a single Ethernet interface, while a multicast address describes a group of Ethernet interfaces. Ethernet broadcast is a multicast received by all interfaces. Ethernet unicast addresses are assigned by the device manufacturer, but some device addresses can be changed by software.

Figure 3 illustrates the data structure and functions of the Ethernet interface.

Figure 3. Ethernet device driver

In Figure 3, ellipses represent functions (leintr), boxes represent data structures (le_softc[0]), and rounded boxes identify groups of functions (the ARP protocol).

The upper left corner of Figure 3 shows the input queues for the OSI connectionless network layer protocol, IP, and ARP. ether_input demultiplexes incoming Ethernet frames into these protocol queues.

1. leintr function

We start with the reception of Ethernet frames. In normal operation, an Ethernet interface receives frames whose destination address is its own unicast address or the Ethernet broadcast address. When a complete frame is available, the interface generates an interrupt and the kernel calls leintr.

leintr examines the hardware and, if a frame has arrived, calls leread to transfer the frame from the interface to an mbuf chain (with m_devget). If the hardware reports that a frame has been transmitted or that an error was detected (such as a bad checksum), leintr updates the corresponding interface statistics, resets the hardware, and calls lestart to try to transmit another frame.

All Ethernet device drivers pass the frames they receive to ether_input for further processing. The mbuf chain built by the device driver does not include the Ethernet header, which is passed to ether_input as a separate argument. The ether_header structure is shown in Figure 4.

Figure 4. ether_header structure

Lines 38 to 42: the Ethernet CRC is generally not available to the driver. It is computed and checked by the interface hardware, which discards incoming frames that have a CRC error. The Ethernet device driver is responsible for converting ether_type between network and host byte order.
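To make the header layout and the byte-order conversion concrete, here is a minimal sketch in C. It is not the Net/3 definition: the names eth_hdr_sketch and parse_eth_header are invented for illustration, and the code assumes a POSIX system that provides ntohs in <arpa/inet.h>.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohs */

#define ETH_ADDR_LEN 6
#define ETH_HDR_LEN  14

/* Illustrative mirror of the 14-byte Ethernet header (hypothetical name). */
struct eth_hdr_sketch {
    uint8_t  dhost[ETH_ADDR_LEN];  /* destination address */
    uint8_t  shost[ETH_ADDR_LEN];  /* source address */
    uint16_t type;                 /* frame type, in host byte order after parsing */
};

/* Copy the header out of a raw frame and convert the type field.
 * Returns 0 on success, -1 if the frame is shorter than a header. */
int parse_eth_header(const uint8_t *frame, size_t len, struct eth_hdr_sketch *out)
{
    uint16_t net_type;

    if (len < ETH_HDR_LEN)
        return -1;
    memcpy(out->dhost, frame, ETH_ADDR_LEN);
    memcpy(out->shost, frame + ETH_ADDR_LEN, ETH_ADDR_LEN);
    /* memcpy avoids an unaligned 16-bit read from the frame buffer */
    memcpy(&net_type, frame + 2 * ETH_ADDR_LEN, sizeof(net_type));
    out->type = ntohs(net_type);   /* network byte order to host byte order */
    return 0;
}
```

For an IP frame, the two type bytes 0x08 0x00 on the wire become the host value 0x0800 regardless of the byte order of the machine.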

2. leread function

The function leread (Figure 5) starts with a contiguous memory buffer passed to it by leintr and constructs an ether_header structure and an mbuf chain. The chain holds the data from the Ethernet frame. leread also passes the incoming frame to BPF.

Figure 5. Function leread

Lines 528 to 539: leintr passes three arguments to leread: unit, which identifies the interface card that received the frame; buf, which points to the received frame; and len, the number of bytes in the frame (including the header and the CRC).

et points to the beginning of the buffer buf. Converting the Ethernet type field from network byte order to host byte order completes the ether_header structure.

Lines 540 to 551: the number of data bytes is obtained by subtracting the lengths of the Ethernet header and the CRC from len. A runt packet, a frame too short to be a legal Ethernet frame, is logged, counted, and discarded.

Lines 552 to 557: the destination address is examined to determine whether it is an Ethernet broadcast or multicast address. The Ethernet broadcast address is a special case of an Ethernet multicast address in which every bit is set to 1. etherbroadcastaddr is an array defined as follows:

    u_char etherbroadcastaddr[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

bcmp compares etherbroadcastaddr with ether_dhost, and the flag M_BCAST is set if they are the same. An Ethernet multicast address is identified by the low-order bit of the first byte of the address, as shown in Figure 6.

Figure 6. Testing for an Ethernet multicast address

Not all Ethernet multicast frames carry IP multicast datagrams, so IP must examine the packet further. If the multicast bit of the address is set, M_MCAST is set in the mbuf header. The order of the tests is important: the address is first compared with the Ethernet broadcast address and, only if it differs, is the low-order bit of the first byte checked for an Ethernet multicast address.
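The order of the two tests can be sketched as follows. This is an illustration rather than the leread code: classify_dest and the SK_ flag values are hypothetical, and memcmp stands in for the kernel's bcmp.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative flag values; they are not the Net/3 M_BCAST and M_MCAST constants. */
#define SK_BCAST 0x1
#define SK_MCAST 0x2

static const uint8_t sk_broadcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

/* Return SK_BCAST, SK_MCAST, or 0 (unicast) for a 6-byte destination address. */
int classify_dest(const uint8_t dhost[6])
{
    /* Test broadcast first: the broadcast address also has the multicast bit set. */
    if (memcmp(dhost, sk_broadcast, 6) == 0)
        return SK_BCAST;
    /* The low-order bit of the first byte marks a multicast (group) address. */
    if (dhost[0] & 1)
        return SK_MCAST;
    return 0;
}
```

If the tests were swapped, a broadcast frame would be reported as multicast, because ff:ff:ff:ff:ff:ff passes the multicast bit test as well.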

If the interface is tapped by BPF, bpf_tap is called to pass the frame directly to BPF. For SLIP and loopback interfaces a special BPF frame is constructed, because these networks do not have a link-level header (unlike Ethernet).

If the frame is addressed to a unicast address that does not match the address of the interface, leread discards it. Such frames can arrive when the interface has been placed in promiscuous mode on behalf of BPF.

m_devget copies the data from the buffer passed to leread into an mbuf chain that it allocates, and returns a pointer to the chain. The first argument to m_devget points to the first byte after the Ethernet header, which is the first data byte in the frame. If m_devget cannot obtain memory, leread returns immediately. Otherwise the broadcast and multicast flags are set in the first mbuf of the chain, and ether_input processes the packet.

3. ether_input function

The function ether_input, shown in Figure 7, examines the ether_header structure to determine the type of data received and then queues the received packet for processing.

Figure 7. Function ether_input

A. Broadcast and multicast recognition

Lines 196 to 209: the arguments to ether_input are: ifp, a pointer to the ifnet structure of the interface that received the packet; eh, a pointer to the Ethernet header of the received packet; and m, a pointer to the received packet (excluding the Ethernet header).

Any packets that arrive on an interface that is not up are discarded. The interface may not have been configured with a protocol address, or it may have been explicitly disabled by the ifconfig program.

Lines 210 to 218: the variable time is a global timeval structure that the kernel uses to maintain the current time and date as the number of seconds and microseconds since the Unix epoch (00:00:00 January 1, 1970). The timeval structure appears throughout the Net/3 sources:

    struct timeval {
        long tv_sec;     /* seconds since the Unix epoch */
        long tv_usec;    /* and microseconds */
    };

if_lastchange is set to the current time, and if_ibytes is incremented by the size of the incoming frame (the packet length plus the 14-byte Ethernet header).

ether_input then repeats the tests that determine whether the packet is a broadcast or multicast packet.

B. Link-level demultiplexing

Lines 219 to 227: ether_input switches on the Ethernet type field. For an IP packet, schednetisr schedules an IP software interrupt and the IP input queue, ipintrq, is selected. For an ARP packet, the ARP software interrupt is scheduled and arpintrq is selected. (isr stands for interrupt service routine.)

Lines 228 to 307: the default case processes frames with unrecognized Ethernet types and frames encapsulated according to the 802.3 standard (such as the OSI connectionless transport).

Note: The Ethernet type field and the 802.3 length field occupy the same position in an Ethernet frame. The two encapsulations can be distinguished because the range of Ethernet type values does not overlap the range of 802.3 length values, as shown in Figure 8.
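The type-versus-length decision can be sketched in C as below. The sketch assumes the boundary is the maximum Ethernet payload of 1500 bytes, so any value up to 1500 is read as an 802.3 length; the enum, the function name, and the SK_ constants are hypothetical, and the field is assumed to be in host byte order already.

```c
#include <stdint.h>

#define SK_ETHERMTU       1500    /* assumed boundary: maximum Ethernet payload */
#define SK_ETHERTYPE_IP   0x0800
#define SK_ETHERTYPE_ARP  0x0806

enum frame_kind { KIND_IP, KIND_ARP, KIND_8023, KIND_UNKNOWN };

/* Classify a frame from its 16-bit type/length field (host byte order). */
enum frame_kind classify_type_field(uint16_t field)
{
    switch (field) {
    case SK_ETHERTYPE_IP:
        return KIND_IP;        /* would be queued for IP input */
    case SK_ETHERTYPE_ARP:
        return KIND_ARP;       /* would be queued for ARP input */
    default:
        /* A value no larger than the MTU cannot be a type, so it is a length. */
        return field <= SK_ETHERMTU ? KIND_8023 : KIND_UNKNOWN;
    }
}
```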

Figure 8. Ethernet type field and 802.3 length field

The frame formats of the Ethernet and 802.3 standards are shown in Figure 9.

Figure 9. Ethernet and 802.3 standard frame formats

C. Queueing the packet

Lines 308 to 315: ether_input places the packet on the selected queue. If the queue is full, the packet is discarded. The default limit for the IP and ARP queues is 50 packets each.
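The queue-or-drop behavior can be illustrated with a small bounded queue. This is a sketch under our own naming (sk_queue, sk_enqueue, sk_dequeue are hypothetical); Net/3 implements the same idea with macros on its interface queue structure, and those are not reproduced here.

```c
#include <stddef.h>

#define SK_QMAXLEN 50   /* default limit mentioned in the text */

struct sk_pkt {
    struct sk_pkt *next;
};

struct sk_queue {
    struct sk_pkt *head, *tail;
    int len;      /* packets currently queued */
    int maxlen;   /* queue limit */
    int drops;    /* packets discarded because the queue was full */
};

/* Append p to q. Returns 0 on success, or -1 if the queue is full
 * (the drop is counted and the caller remains responsible for p). */
int sk_enqueue(struct sk_queue *q, struct sk_pkt *p)
{
    if (q->len >= q->maxlen) {
        q->drops++;
        return -1;
    }
    p->next = NULL;
    if (q->tail == NULL)
        q->head = p;
    else
        q->tail->next = p;
    q->tail = p;
    q->len++;
    return 0;
}

/* Remove and return the packet at the head, or NULL if the queue is empty. */
struct sk_pkt *sk_dequeue(struct sk_queue *q)
{
    struct sk_pkt *p = q->head;

    if (p != NULL) {
        q->head = p->next;
        if (q->head == NULL)
            q->tail = NULL;
        p->next = NULL;
        q->len--;
    }
    return p;
}
```

A queue initialized as { NULL, NULL, 0, SK_QMAXLEN, 0 } accepts 50 packets and counts every later arrival as a drop until something is dequeued.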

When ether_input returns, the device driver tells the hardware that it is ready to receive the next packet, which may already be present in the device. The packet input queues are processed when the software interrupt scheduled by schednetisr occurs; specifically, ipintr is called to process the packets on the IP input queue.

4. ether_output function

We now look at the output of Ethernet frames, which starts when a network-layer protocol such as IP calls the if_output function specified in the ifnet structure of the interface. For all Ethernet devices, if_output is ether_output. ether_output prepends a 14-byte Ethernet header to the data to form an Ethernet frame and places the frame on the interface’s send queue. This is a long function, which we describe in four parts:

  • verification

  • protocol-specific processing

  • frame construction

  • interface queueing

Figure 10 includes the first part of the function.

Figure 10. Function ether_output: verification

The arguments to ether_output are: ifp, which points to the ifnet structure of the output interface; m0, the packet to send; dst, the destination address of the packet; and rt0, routing information.

Lines 65 to 67: the macro senderr is used throughout ether_output:

    #define senderr(e) { error = (e); goto bad; }

senderr saves the error code and jumps to bad at the end of the function, where the packet is discarded and ether_output returns error.
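The single-exit error pattern built on senderr can be shown in a small user-space function. Only the macro comes from the text; sk_output, its arguments, and the SK_ flag values are hypothetical, and free stands in for discarding the packet.

```c
#include <errno.h>
#include <stdlib.h>

#define senderr(e) { error = (e); goto bad; }

/* Illustrative flag values, not the system's IFF_* constants. */
#define SK_IFF_UP      0x1
#define SK_IFF_RUNNING 0x2

/* Hypothetical output routine: validate, then "send" buf.
 * Every failure funnels through the label bad, where buf is released once. */
int sk_output(int if_flags, char *buf)
{
    int error = 0;

    if ((if_flags & (SK_IFF_UP | SK_IFF_RUNNING)) != (SK_IFF_UP | SK_IFF_RUNNING))
        senderr(ENETDOWN);
    if (buf == NULL)
        senderr(EINVAL);

    /* A real driver would queue the frame here and give up ownership of it. */
    free(buf);      /* stand-in for handing the frame to the interface */
    return error;

bad:
    free(buf);      /* discard the packet; free(NULL) is a no-op */
    return error;
}
```

Keeping the cleanup in one place means that a check added later cannot forget to release the packet.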

If the interface is up and running, the last update time of the interface is updated; otherwise, ENETDOWN is returned.

A. Host route

Lines 68 to 74: rt0 points to the routing entry located by ip_output and passed to ether_output. rt0 can be null if ether_output is called from BPF, in which case control passes to the code in Figure 11. Otherwise, the route is verified. If the route is not valid, the routing tables are consulted to find a new route. If no route is found, EHOSTUNREACH is returned. At this point, rt0 and rt point to a valid route to the next-hop destination.

Figure 11. Function ether_output: network protocol processing

B. Gateway routes

Lines 75 to 85: if the next hop for the packet is a gateway (rather than the final destination), a route to the gateway is located and rt is set to point to it. If a gateway route cannot be found, EHOSTUNREACH is returned. At this point, rt points to the route for the next-hop destination, which may be a gateway or the final destination.

C. Avoiding ARP flooding

Lines 86 to 90: when a destination does not respond to ARP requests, the ARP code sets RTF_REJECT in the route so that packets sent to that destination are discarded.

ether_output continues processing according to the destination address of the packet. Because Ethernet devices respond only to Ethernet addresses, ether_output must find the Ethernet address that corresponds to the IP address of the next-hop destination before it can send a packet. ARP performs this translation, and Figure 11 shows how the driver calls ARP.

D. IP output

Lines 91 to 101: ether_output switches on sa_family in the destination address. Figure 11 covers the cases AF_INET, AF_ISO, and AF_UNSPEC, but the code for the AF_ISO case is omitted.

For the AF_INET case, arpresolve is called to determine the Ethernet address that corresponds to the destination IP address. If the Ethernet address is already in the ARP cache, arpresolve returns 1 and ether_output proceeds. Otherwise ARP takes over the IP packet; once ARP has determined the address, it calls ether_output from the function in_arpinput.

Assuming the ARP cache contains the hardware address, ether_output checks whether the packet is to be broadcast and whether the interface is simplex (that is, it cannot receive its own transmissions). If both are true, m_copy makes a copy of the packet. After the switch, the copy is queued as if it had arrived on the Ethernet interface. This is required by the definition of broadcasting: the sending host must receive a copy of the packet.

E. Explicit Ethernet output

Lines 142 to 146: some protocols, such as ARP, need to specify the Ethernet destination and type explicitly. The address family constant AF_UNSPEC indicates that dst points to an Ethernet header. bcopy copies the destination address into edst, and the Ethernet type is assigned to type. There is no need to call arpresolve, because the Ethernet destination address has been supplied explicitly by the caller.

F. Unrecognized address families

An unrecognized address family generates a console message, and ether_output returns EAFNOSUPPORT. Figure 12 shows the next part of ether_output: constructing the Ethernet frame.

Figure 12. Function ether_output: constructing the Ethernet frame

G. Ethernet header

Lines 152 to 167: if the code in the switch made a copy of the packet, the copy is processed by a call to looutput, as if it had been received on the output interface.

M_PREPEND ensures that there is room for the 14-byte header at the front of the packet. Most protocols arrange to leave space at the front of the mbuf chain, so M_PREPEND usually needs only to adjust some pointers.

ether_output forms the Ethernet header from type, edst, and ac_enaddr. ac_enaddr is the Ethernet unicast address associated with the output interface and is the source address of every frame transmitted on the interface. ether_output overwrites any source address that the caller may have specified in the ether_header structure with ac_enaddr, which makes it harder to forge the source address of an Ethernet frame.

At this point, the mbuf contains a complete Ethernet frame except for the 32-bit CRC, which is computed by the Ethernet hardware during transmission. The code shown in Figure 13 queues the frame for transmission by the device.
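As a rough user-space analogue of the header construction, the sketch below writes the 14 header bytes into space reserved in front of the payload. sk_fill_header and its parameter names are hypothetical; a flat byte buffer replaces the mbuf chain, and htons from <arpa/inet.h> is assumed to be available.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htons */

#define SK_HDR_LEN 14

/* Write a 14-byte Ethernet header into room, which must point at SK_HDR_LEN
 * reserved bytes immediately before the payload (the role M_PREPEND plays
 * for an mbuf chain). The source address is always the interface's own
 * address, never a value supplied by the caller. */
void sk_fill_header(uint8_t *room, const uint8_t edst[6],
                    const uint8_t if_addr[6], uint16_t type)
{
    uint16_t net_type = htons(type);   /* the type travels in network byte order */

    memcpy(room, edst, 6);                            /* destination address */
    memcpy(room + 6, if_addr, 6);                     /* source address */
    memcpy(room + 12, &net_type, sizeof(net_type));   /* type field */
}
```

The CRC is deliberately absent: as in the text, it is left to the hardware.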

Figure 13. Function ether_output: output queueing

Lines 168 to 185: if the output queue is full, ether_output discards the frame and returns ENOBUFS. Otherwise the frame is placed on the interface’s send queue and, if the interface is not already active, the interface’s if_start function is called to start transmitting.

The senderr macro jumps to bad, where the frame is discarded and an error is returned.

5. lestart function

The function lestart dequeues frames from the interface output queue and hands them to the LANCE Ethernet card for transmission. If the device is idle, the function is called to start transmitting frames. For example, at the end of ether_output in Figure 13, lestart is called through the interface’s if_start function pointer.

If the device is busy, it generates an interrupt when it finishes transmitting the current frame. The driver then calls lestart to dequeue and transmit the next frame. Once transmission has started, the protocol layer can queue frames without calling lestart, because the driver continues to dequeue and transmit frames until the queue is empty.

Figure 14 shows the function lestart. lestart assumes that splimp has already been called to block all device interrupts.

Figure 14. Function lestart

A. The interface must be initialized

Lines 325 to 333: if the interface is not initialized, lestart returns immediately.

B. Dequeue a frame from the output queue

Lines 335 to 342: if the interface is initialized, the next frame is removed from the queue. If the interface output queue is empty, lestart returns.

C. Transmit the frame and pass it to BPF

Lines 343 to 350: leput copies the frame in m to the hardware buffer pointed to by the first argument to leput. If the interface is tapped by BPF, the frame is passed to bpf_tap. The device-specific code that initiates transmission of the frame from the hardware buffer is not discussed here.

D. Repeat while the device can accept more frames

Line 359: lestart stops passing frames to the device when le->sc_txcnt equals LETBUF. Some Ethernet interfaces can queue more than one outgoing frame. For the LANCE driver, LETBUF is the number of hardware transmit buffers available to the driver, and le->sc_txcnt tracks how many of them are in use.

E. Mark the device as busy

Finally, lestart sets IFF_OACTIVE in the ifnet structure to indicate that the device is busy transmitting frames.
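The control flow of steps A to E can be simulated without any hardware. The sketch below follows the order of the steps as described above, with counters in place of the real queue and transmit buffers; sk_if, sk_start, SK_NTBUF, and the flag value are hypothetical, and the real driver does considerably more.

```c
#include <stddef.h>

#define SK_NTBUF       2      /* hypothetical number of hardware transmit buffers */
#define SK_IFF_OACTIVE 0x1    /* illustrative flag value */

struct sk_if {
    int initialized;   /* nonzero once the interface has been set up */
    int queued;        /* frames waiting in the output queue */
    int txcnt;         /* hardware transmit buffers in use */
    int flags;
};

/* Move frames from the output queue into free transmit buffers.
 * Returns the number of frames handed to the simulated hardware. */
int sk_start(struct sk_if *ifp)
{
    int sent = 0;

    if (!ifp->initialized)            /* A: interface must be initialized */
        return 0;
    while (ifp->txcnt < SK_NTBUF) {   /* D: stop when every buffer is in use */
        if (ifp->queued == 0)         /* B: output queue is empty */
            return sent;
        ifp->queued--;                /* B: dequeue the next frame */
        ifp->txcnt++;                 /* C: copy it into a transmit buffer */
        sent++;
    }
    ifp->flags |= SK_IFF_OACTIVE;     /* E: device is busy transmitting */
    return sent;
}
```

With three frames queued and two buffers, one call sends two frames and leaves one queued for the transmit-complete interrupt to pick up.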

ioctl system call

The ioctl system call provides a generic command interface that a process uses to access device features not supported by the standard system calls. The prototype for ioctl is:

    int ioctl(int fd, unsigned long com, ...);

fd is a descriptor, usually for a device or a network connection. Each type of descriptor supports its own set of ioctl commands, specified by the second argument, com. The third argument appears in the prototype as "..." because it is a pointer whose type depends on the ioctl command being invoked. If the command retrieves information, the third argument must point to a buffer large enough to hold the data. In this article we discuss only the ioctl commands that apply to socket descriptors; Figure 15 lists the commonly used ones.

Figure 15. ioctl commands for the interface

The first column shows the symbolic constant that identifies the ioctl command (the second argument, com). The second column shows the type of the third argument passed to the system call for the command in the first column. The third column names the function that implements the command.

Figure 16 shows the organization of the functions that process ioctl commands. The shaded functions are described in this article.

Figure 16. Function organization of ioctl commands

1. ifioctl function

The ioctl system call routes the five commands listed in Figure 15 to the ifioctl function shown in Figure 17.

Figure 17. ifioctl: overview and SIOCGIFCONF

Lines 394 to 405: for the SIOCGIFCONF command, ifioctl calls ifconf to construct a table of variable-length ifreq structures.

For the other ioctl commands, the data argument is a pointer to an ifreq structure. ifunit searches the ifnet list for an interface with the text name that the process supplied in ifr->ifr_name (for example "sl0" or "le1"). If there is no matching interface, ifioctl returns ENXIO. The remaining code depends on cmd and is shown in Figure 24.

Lines 447 to 454: if the interface ioctl command is not recognized, ifioctl forwards the command to the user-request function of the protocol associated with the requesting socket. If control falls out of the switch statement, 0 is returned.

2. ifconf function

ifconf provides a standard way for a process to discover the interfaces present and the addresses configured on a system. Interface information is represented by the ifreq and ifconf structures shown in Figures 18 and 19.

Figure 18. ifreq structure

Figure 19. ifconf structure

Lines 262 to 279: in the ifreq structure, ifr_name holds the interface name. The other members of the union are accessed by the various ioctl commands. Macros are often used to simplify access to the members of the union.

Lines 292 to 300: in the ifconf structure, ifc_len is the size in bytes of the buffer pointed to by ifc_buf. The buffer is allocated by the process but filled in by ifconf with an array of variable-length ifreq structures. For the ifconf function, ifr_addr is the relevant member of the union in the ifreq structure. The ifreq structures are variable length because the length of ifr_addr (a sockaddr structure) varies with the type of address. The end of each entry must be located with the sa_len member of the sockaddr structure. Figure 20 illustrates the data structures maintained by ifconf.

Figure 20. ifconf data structures

In Figure 20, the data on the left is in the kernel, while the data on the right is in a process. We use this diagram to discuss the ifconf function shown in Figure 21.

Figure 21. Function ifconf

The two arguments to ifconf are cmd, which is ignored, and data, which points to a copy of the ifconf structure specified by the process.

ifc is data cast to an ifconf structure pointer. ifp traverses the interface list starting at ifnet (the head of the list), and ifa traverses the address list of each interface. cp and ep control the construction of the text interface name within ifr, an ifreq structure that holds an interface name and address before they are copied to the process’s buffer. ifrp points to this buffer and is advanced after each address is copied. space is the number of bytes remaining in the process’s buffer, cp is used to search for the end of the name, and ep marks the last possible location for the numeric portion of the interface name.

Lines 475 to 488: the for loop traverses the list of interfaces. For each interface, the text name is copied to ifr_name, followed by the text representation of the if_unit number. If no addresses have been assigned to the interface, an address of all 0s is constructed, the resulting ifreq structure is copied to the process, space is decreased, and ifrp is advanced.

Lines 489 to 515: if the interface has one or more addresses, a for loop processes each address. The address is added to the interface name in ifr, and ifr is then copied to the process. Addresses longer than a standard sockaddr structure do not fit in ifr and are copied directly to the process. After each address is copied, space and ifrp are adjusted. After all the interfaces are processed, the buffer length is updated (ifc->ifc_len) and ifconf returns. The ioctl system call takes care of copying the new contents of the ifconf structure back to the ifconf structure in the process.

Figure 22 shows the interface structures after the Ethernet, SLIP, and loopback interfaces have been initialized.

Figure 22. Interface and address data structure

Figure 23 shows the contents of ifc and buffer after the following code is executed.

    struct ifconf ifc;        /* SIOCGIFCONF request */
    char buffer[144];         /* filled in by the kernel */
    int s;                    /* must refer to an open socket */

    ifc.ifc_len = 144;
    ifc.ifc_buf = buffer;
    if (ioctl(s, SIOCGIFCONF, &ifc) < 0) {
        perror("ioctl failed");
        exit(1);
    }

The SIOCGIFCONF command places no restriction on the type of socket it is issued on and, as we will see, returns the addresses of all protocol families.

In Figure 23, ioctl has changed ifc_len from 144 to 108 because the three addresses returned in the buffer occupy only 108 (36*3) bytes. Three sockaddr_dl addresses are returned, and the last 36 bytes of the buffer are unused. The first 16 bytes of each entry contain the text name of the interface, of which only the first 3 bytes are used here.

Figure 23. Data returned from the SIOCGIFCONF command

ifr_addr has the form of a sockaddr structure, so the first value is the length (20 bytes) and the second value is the address type (18, AF_LINK). The next value is sdl_index, which is different for each interface, as is sdl_type (IFT_ETHER, IFT_SLIP, and IFT_LOOP correspond to the values 6, 28, and 24).

The next three values are sdl_nlen (the length of the text name), sdl_alen (the length of the hardware address), and sdl_slen (unused, always 0).

Finally comes the text name of the interface, followed by the hardware address (Ethernet only). Neither the SLIP nor the loopback interface stores a hardware-level address in the sockaddr_dl structure.

In this example only sockaddr_dl addresses are returned, because no other address types were configured in Figure 22, so every entry in the buffer is the same size. If other addresses (such as IP addresses) were configured for the interfaces, they would be returned along with the sockaddr_dl addresses, and the size of each entry would vary with the type of address returned.
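Because the entries are variable length, a process has to step through the returned buffer using the length byte of each address. The following sketch shows one way to do that on a raw byte buffer. It assumes the BSD layout described above (a 16-byte name followed by an address whose first byte is sa_len) and assumes that an address shorter than a generic 16-byte sockaddr still occupies 16 bytes; sk_entry_size and sk_count_entries are hypothetical helpers, not system functions.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_NAMESIZE  16   /* bytes reserved for the interface name */
#define SK_MIN_ADDR  16   /* assumed size of a generic sockaddr */

/* Size of one entry: the name plus the address, whose real length is
 * stored in its first byte (sa_len). */
size_t sk_entry_size(const char *entry)
{
    size_t alen = (uint8_t)entry[SK_NAMESIZE];

    if (alen < SK_MIN_ADDR)
        alen = SK_MIN_ADDR;
    return SK_NAMESIZE + alen;
}

/* Count the complete entries in the first len bytes of buf. */
int sk_count_entries(const char *buf, size_t len)
{
    size_t off = 0;
    int n = 0;

    while (off + SK_NAMESIZE + SK_MIN_ADDR <= len) {
        size_t sz = sk_entry_size(buf + off);

        if (off + sz > len)
            break;              /* truncated final entry */
        off += sz;
        n++;
    }
    return n;
}
```

For the example above, three entries with a 20-byte address each take 36 bytes apiece, so walking the 108 returned bytes yields exactly three entries.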

3. General interface ioctl commands

The four remaining interface commands from Figure 15 (SIOCGIFFLAGS, SIOCGIFMETRIC, SIOCSIFFLAGS, and SIOCSIFMETRIC) are all handled by the ifioctl function. Figure 24 shows the case statements that process these commands.

Figure 24. Function ifioctl: flags and metrics

A. SIOCGIFFLAGS and SIOCGIFMETRIC

Lines 410 to 416: for the two SIOCGxxx commands, ifioctl copies the if_flags or if_metric value of the interface into the ifreq structure. The flags are returned in the union member ifr_flags, and the metric in ifr_metric.

B. SIOCSIFFLAGS

To change the interface flags, the calling process must have superuser privileges. If the process is shutting down a running interface or bringing up an interface that is not running, if_down or if_up is called, respectively.

C. Ignoring IFF_CANTCHANGE flags

Lines 430 to 434: some interface flags cannot be changed by a process. The expression (ifp->if_flags & IFF_CANTCHANGE) clears the interface flags that a process is allowed to change, and the expression (ifr->ifr_flags & ~IFF_CANTCHANGE) clears the flags in the request that a process is not allowed to change. The two expressions are ORed together and the result is stored as the new value of ifp->if_flags. Before returning, the request is passed to the if_ioctl function associated with the device (for example, leioctl for the LANCE driver, Figure 26).
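The masking step is easy to check in isolation. In the sketch below the two expressions from the text are combined in one function; the SK_ flag values and the choice of which flag is protected are made up for the example and are not the system's IFF_* definitions.

```c
/* Illustrative flag values; the real IFF_* constants are defined by the system. */
#define SK_IFF_UP          0x01
#define SK_IFF_RUNNING     0x02
#define SK_IFF_PROMISC     0x04
#define SK_IFF_CANTCHANGE  SK_IFF_RUNNING   /* flags a process may not modify */

/* Keep the protected bits from cur and take every other bit from req. */
int sk_merge_flags(int cur, int req)
{
    return (cur & SK_IFF_CANTCHANGE) | (req & ~SK_IFF_CANTCHANGE);
}
```

A request that tries to clear a protected flag leaves it set, and a request that tries to set one leaves it clear, while all other bits follow the request.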

D. SIOCSIFMETRIC

Lines 435 to 439: changing the interface metric is simpler. The process must again have superuser privileges, and ifioctl copies the new metric for the interface into if_metric.

4. if_down and if_up functions

An administrator enables or disables an interface by setting or clearing the IFF_UP flag with the SIOCSIFFLAGS command. Figure 25 shows the code for the if_down and if_up functions.

Figure 25. Functions if_down and if_up

Lines 292 to 302: when an interface is shut down, the IFF_UP flag is cleared and the PRC_IFDOWN command is issued by pfctlinput for each address associated with the interface. This gives each protocol an opportunity to respond to the interface being shut down. Some protocols, such as OSI, terminate connections that use the interface. IP attempts to reroute connections through other interfaces if possible. TCP and UDP ignore failing interfaces and rely on the routing protocols to find alternate paths for the packets.

if_qflush discards any packets queued for the interface. rt_ifmsg notifies the routing system of the change. TCP retransmits the lost packets automatically; UDP applications must detect and respond to this condition explicitly.

Lines 308 to 315: when an interface is enabled, the IFF_UP flag is set and rt_ifmsg notifies the routing system that the interface status has changed.

5. Ethernet, SLIP, and loopback

The SIOCSIFFLAGS code in Figure 24 shows that ifioctl calls the if_ioctl function of the interface. In our three sample interfaces, the slioctl and loioctl functions return EINVAL for the SIOCSIFFLAGS command. Figure 26 shows the leioctl function and the SIOCSIFFLAGS processing of the LANCE Ethernet driver.

Figure 26. Function leioctl

Lines 614 to 623: leioctl casts the third argument, data, to a pointer to an ifaddr structure and saves it in ifa. The le pointer references the le_softc structure indexed by ifp->if_unit. The switch statement on cmd forms the main body of the function.

Only the SIOCSIFFLAGS case is shown in Figure 26. By the time ifioctl calls leioctl, the interface flags have already been changed. The code forces the physical interface into the state that matches the flags. If the interface is being shut down but is still operating, it is shut down. If the interface is being brought up but is not operating, it is initialized and restarted.

Lines 672 to 677: the default case, for unrecognized commands, posts EINVAL, which is returned at the end of the function.