Before discussing byte alignment, let's first look at how addresses are arranged in memory.

Multi-module interleaved memory

Main memory is often built from several modules that together form one linearly addressed space. There are two ways to distribute these addresses across the modules: sequential and interleaved.

In the conventional design, addresses are assigned sequentially. As shown in the figure, a 32-word memory is divided into four modules, M0 to M3, of 8 words each. Consecutive addresses fill one module completely before moving on to the next. The 32 words can therefore be addressed with a 5-bit address register. The upper two bits select one of the four modules, and the lower three bits select one of the eight words within that module.
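The bit split for the sequential scheme can be sketched as follows. The function name `sequential_decode` is hypothetical. The sizes (32 words, four modules of 8 words) come from the example above. This only illustrates the address arithmetic and does not model real hardware.

```python
# Sequential (high-order) addressing: 32 words in 4 modules of 8 words each.
# The 5-bit address splits into a 2-bit module number (high bits)
# and a 3-bit word offset inside that module (low bits).
NUM_MODULES = 4
WORDS_PER_MODULE = 8

def sequential_decode(addr: int) -> tuple[int, int]:
    """Return (module, offset) for a 5-bit address under the sequential layout."""
    assert 0 <= addr < NUM_MODULES * WORDS_PER_MODULE
    module = addr >> 3        # upper 2 bits select M0..M3
    offset = addr & 0b111     # lower 3 bits select one of the 8 words
    return module, offset

if __name__ == "__main__":
    # Print which linear addresses end up in each module.
    for m in range(NUM_MODULES):
        print(f"M{m}:", [a for a in range(32) if sequential_decode(a)[0] == m])
```

Running it lists addresses 0 to 7 in M0, 8 to 15 in M1, and so on. Consecutive addresses stay in the same module.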

The advantage of this approach is that when one module fails, the others can keep working, and capacity is easy to expand by adding modules. The disadvantage is that consecutive addresses fall in the same module, so the modules are accessed one after another and memory bandwidth is limited.

The other approach is interleaving. As shown in the figure, the memory again holds 32 words in four modules of 8 words each, but addresses are assigned in rotation. Linear addresses 0, 1, 2, and 3 go to M0, M1, M2, and M3, then 4, 5, 6, and 7 go to M0, M1, M2, and M3, and so on until all addresses are assigned. When the memory is addressed, the lower two bits of the address register select one of the four modules, and the upper three bits select one of the eight words within that module.

The low-order field is decoded to select a module, and the high-order field points to the word inside that module. Consecutive linear addresses are thus spread across adjacent modules, while the addresses within a single module are not contiguous. When a block of consecutive words is transferred, an interleaved memory can access the modules in parallel, in a pipelined fashion, which greatly increases memory bandwidth.
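A small sketch makes the bandwidth argument concrete. It decodes addresses under both layouts and counts how many distinct modules a burst of consecutive words touches. The helper names (`interleaved_decode`, `sequential_decode`, `modules_touched`) are hypothetical. The code does not model timing. It only shows that an interleaved burst spreads across modules, which is what allows the accesses to overlap.

```python
# Interleaved (low-order) vs sequential (high-order) addressing
# for a 32-word memory built from 4 modules of 8 words each.
def interleaved_decode(addr: int) -> tuple[int, int]:
    """(module, offset): low 2 bits pick the module, high 3 bits pick the word."""
    assert 0 <= addr < 32
    return addr & 0b11, addr >> 2

def sequential_decode(addr: int) -> tuple[int, int]:
    """(module, offset): high 2 bits pick the module, low 3 bits pick the word."""
    assert 0 <= addr < 32
    return addr >> 3, addr & 0b111

def modules_touched(start: int, length: int, decode) -> set[int]:
    """Set of modules hit by the burst start, start+1, ..., start+length-1."""
    return {decode(a)[0] for a in range(start, start + length)}

if __name__ == "__main__":
    # Interleaved layout: M0 holds 0, 4, 8, ..., M1 holds 1, 5, 9, ...
    for m in range(4):
        print(f"M{m}:", [a for a in range(32) if interleaved_decode(a)[0] == m])
    # A 4-word burst starting at address 8.
    print("sequential :", modules_touched(8, 4, sequential_decode))   # {1}
    print("interleaved:", modules_touched(8, 4, interleaved_decode))  # {0, 1, 2, 3}
```

Under the sequential layout the whole burst lands in M1 and must be served one word at a time. Under the interleaved layout each word comes from a different module.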

Byte alignment

Computers usually use the second, interleaved approach.

The connection between a 32-bit Pentium CPU and memory is shown below:

D0 to D31 form the 32-bit data bus. The 32-bit address bus can address up to 4 GB. This 4 GB space is made up of four memory chips of 1 GB each, and the CPU's address lines A31 to A2 are connected to every chip (not shown in the figure). The CPU does not actually output address lines A1 and A0. Instead, it outputs four byte-enable signals, BE3 to BE0, which control whether the selected location in each chip takes part in the read or write. The hexadecimal addresses on the left of the figure are the external addresses seen by the CPU. Inside each memory chip, the locations are still numbered 0, 1, 2, 3, and so on.
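The sketch below shows how a CPU byte address maps onto this arrangement. It assumes chip k holds byte lane k and is enabled by BEk. The figure is not reproduced here, so that lane-to-chip mapping is an assumption, and `split_address` is a hypothetical name. The row value is what appears on A31 to A2. The lane is the value A1 and A0 would have carried, which the CPU signals through the byte enables instead.

```python
# 32-bit data bus with four byte-wide chips.
# Assumption: chip k holds byte lane k (data lines D[8k+7:8k]) and is enabled by BEk.
def split_address(addr: int) -> tuple[int, int]:
    """Split a 32-bit byte address into (value on A31..A2, byte lane 0..3)."""
    assert 0 <= addr < 2**32
    row = addr >> 2      # A31..A2: the same internal address in all four chips
    lane = addr & 0b11   # which chip holds this byte; conveyed via BE3..BE0
    return row, lane

if __name__ == "__main__":
    for a in (0x0, 0x1, 0x2, 0x3, 0x4, 0x7):
        row, lane = split_address(a)
        print(f"addr {a:#x}: A31..A2 = {row}, chip {lane} (BE{lane})")
```

Addresses 0x0 to 0x3 share row 0 and differ only in which chip they select. Address 0x4 starts row 1 back in chip 0.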

With this connection, the CPU can access memory by double word (32 bits), word (16 bits), or byte (8 bits).

Access by double word

A31 to A2 from the CPU select the same internal location in all four memory chips, even though the four bytes have different CPU addresses. Asserting all of BE3 to BE0 then reads a contiguous 32-bit double word in a single access.

Access by word

This works like double-word access, except that only two adjacent byte-enable signals are asserted. They select the two chips that hold consecutive addresses.

Byte access

This also works like double-word access, except that only one of BE3 to BE0 is asserted, selecting a single memory chip.
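The three access modes can be summarized by the byte-enable pattern each one produces in a single bus cycle. The sketch assumes the lane-to-BE mapping of the previous sketch, where BEk enables byte lane k. It also treats a 1 bit as "asserted", although real x86 byte-enable pins are active-low. `byte_enables` is a hypothetical helper, and it rejects accesses that would not fit in one double word.

```python
# Byte-enable pattern for one bus cycle: bit k of the result = BEk asserted.
# Assumption: the access fits inside a single 4-byte-aligned double word.
def byte_enables(addr: int, size: int) -> int:
    """Return a 4-bit mask of the byte enables for `size` bytes at `addr`."""
    assert size in (1, 2, 4)
    lane = addr & 0b11
    if lane + size > 4:
        raise ValueError("access crosses a double-word boundary; needs two cycles")
    return ((1 << size) - 1) << lane

if __name__ == "__main__":
    print(f"{byte_enables(0x100, 4):04b}")  # double word: BE3..BE0 all asserted -> 1111
    print(f"{byte_enables(0x102, 2):04b}")  # word: BE3 and BE2 -> 1100
    print(f"{byte_enables(0x101, 1):04b}")  # byte: BE1 only -> 0010
```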


Byte alignment means that data is stored at an address that is an integer multiple of its size in bytes.
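As a minimal illustration of that definition, the check is a single modulo test. `is_aligned` is a hypothetical name. For power-of-two sizes, the equivalent bit test `addr & (size - 1) == 0` is what low-level code commonly uses.

```python
def is_aligned(addr: int, size: int) -> bool:
    """True if `addr` is an integer multiple of the data size in bytes."""
    return addr % size == 0

if __name__ == "__main__":
    for addr, size in [(0x1000, 4), (0x1002, 4), (0x1002, 2), (0x1003, 1)]:
        print(f"{size}-byte data at {addr:#x}: aligned = {is_aligned(addr, size)}")
```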

Given how the CPU reads memory, a word or double word that is not aligned can span two adjacent 4-byte-aligned double words. Fetching it then takes two bus cycles.
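With the connection described above, one bus cycle covers one 4-byte-aligned double word. The number of cycles an access needs is therefore the number of such double words it overlaps. The following is a rough sketch of that count, ignoring caches and every other detail of a real CPU. `bus_cycles` is a hypothetical name.

```python
def bus_cycles(addr: int, size: int) -> int:
    """Number of 4-byte-aligned double words touched by bytes addr..addr+size-1."""
    first = addr >> 2               # double word holding the first byte
    last = (addr + size - 1) >> 2   # double word holding the last byte
    return last - first + 1

if __name__ == "__main__":
    for addr, size in [(0x0, 4), (0x1, 4), (0x2, 2), (0x3, 2), (0x1, 2)]:
        print(f"{size} bytes at {addr:#x}: {bus_cycles(addr, size)} cycle(s)")
```

Under this simplified model, a misaligned 2-byte value at address 0x1 still fits inside one double word and needs only one cycle. The extra cycle appears when the data crosses a 4-byte boundary, as a double word at 0x1 or a word at 0x3 does.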


Conclusion

This follows from the way the CPU and memory are connected. Because a 32-bit CPU has no A1 and A0 address lines, the addresses it places on the bus always refer to 4-byte boundaries. If data is not stored at an aligned address, programs still run correctly, but performance suffers. When the data straddles a 4-byte boundary, the CPU needs two bus cycles to read both parts and must then combine them. If the data is stored at an aligned address, the CPU can read a whole double word in one cycle.
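The "read twice, then concatenate" step can be simulated as follows. Memory is modeled as a Python `bytes` object, and the only fetch allowed is an aligned 4-byte row, standing in for one bus cycle. The data is assumed to be little-endian, as on x86. `read_row` and `read_u32` are hypothetical names, and this is an illustration of the idea rather than how the CPU's bus unit is actually built.

```python
# Memory modeled as bytes; the "bus" can only fetch aligned 4-byte rows.
def read_row(mem: bytes, row: int) -> int:
    """One bus cycle: fetch the little-endian double word at byte address 4*row."""
    return int.from_bytes(mem[4 * row : 4 * row + 4], "little")

def read_u32(mem: bytes, addr: int) -> int:
    """Read a little-endian 32-bit value at any byte address using aligned fetches."""
    row, lane = addr >> 2, addr & 0b11
    lo = read_row(mem, row)
    if lane == 0:
        return lo                    # aligned: a single cycle is enough
    hi = read_row(mem, row + 1)      # misaligned: a second cycle is needed
    # Drop the unwanted low bytes of the first row, take the needed
    # low bytes of the second row, and concatenate the two parts.
    shift = 8 * lane
    return ((lo >> shift) | (hi << (32 - shift))) & 0xFFFFFFFF

if __name__ == "__main__":
    mem = bytes(range(16))           # bytes 0x00, 0x01, ..., 0x0f
    print(hex(read_u32(mem, 0)))     # 0x3020100 (one fetch)
    print(hex(read_u32(mem, 1)))     # 0x4030201 (two fetches, then merged)
```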

