• First published on the public account Rand_cs; follows and support are appreciated

Scroll Rendering (Basic)

This article continues the PPU topic with scrolling. Anyone who played these games as a kid knows that the NES supported pixel-level scrolling, which was advanced for its time and one of the reasons the FC/NES was so popular.

How does the PPU support pixel-level scrolling? To answer that, we start with some of the PPU’s hardware.

Memory-mapped registers

Let’s start with the registers that are mapped into the CPU’s address space; they are the ports through which the CPU communicates with the PPU. From this article on, I will write hexadecimal numbers with the 0x prefix, because the $ prefix causes too many formatting problems. In the previous article I escaped every $ with a \ and then had to rely on mdnice to fix the formatting. Hex numbers do have to be written with $ if you are writing NES assembly, but almost no one reading this will do that. Without further ado, let’s look at the PPU registers:

PPUCTRL

Control register (0x2000):

  • Bit0-1: select the NameTable, 00: 0x2000, 01: 0x2400, 10: 0x2800, 11: 0x2C00
  • Bit2: VRAM address increment. A dedicated register records the address used for VRAM access, and it is incremented after every access. 0: add 1 (move horizontally), 1: add 32 (move vertically)
  • Bit3: which PatternTable the sprites use, 0: 0x0000, 1: 0x1000
  • Bit4: which PatternTable the background uses, 0: 0x0000, 1: 0x1000
  • Bit5: sprite size, 0: 8×8, 1: 8×16
  • Bit7: generate an NMI at the start of V_Blank
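As a quick illustration, the bit fields listed above can be decoded with a few masks. This is only a sketch in Python; the function name decode_ppuctrl and the dictionary keys are made up for this example and do not come from any emulator.

```python
def decode_ppuctrl(value):
    """Split a PPUCTRL (0x2000) byte into the fields described above."""
    return {
        "nametable_base": 0x2000 + (value & 0x03) * 0x400,               # Bit0-1
        "vram_increment": 32 if value & 0x04 else 1,                     # Bit2
        "sprite_pattern_table": 0x1000 if value & 0x08 else 0x0000,      # Bit3
        "background_pattern_table": 0x1000 if value & 0x10 else 0x0000,  # Bit4
        "sprite_height": 16 if value & 0x20 else 8,                      # Bit5
        "nmi_on_vblank": bool(value & 0x80),                             # Bit7
    }


# 0x90 = 1001 0000: NMI enabled, background uses the PatternTable at 0x1000
fields = decode_ppuctrl(0x90)
```

With 0x90, the NameTable base stays at 0x2000 and the VRAM increment is 1, because the low three bits are all zero.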

Most of this was covered in earlier articles, and the details will come up again later.

PPUMASK

As the name suggests, a mask decides what is hidden, so this register controls what is rendered and what is not:

  • Bit0, 0: display normal colors, 1: display a black and white image
  • Bit1, 1: render the background in the leftmost 8 columns of pixels, 0: do not render it there
  • Bit2, 1: render sprites in the leftmost 8 columns of pixels, 0: do not render them there
  • Bit3, 1: render the background, 0: do not render it
  • Bit4, 1: render sprites, 0: do not render them
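A program typically builds the PPUMASK value by OR-ing together the bits it wants. The constant names below are invented for this sketch; only the bit positions come from the list above.

```python
# Hypothetical names for the PPUMASK bits described above
GRAYSCALE = 0x01          # Bit0
SHOW_BG_LEFT = 0x02       # Bit1
SHOW_SPRITES_LEFT = 0x04  # Bit2
SHOW_BACKGROUND = 0x08    # Bit3
SHOW_SPRITES = 0x10       # Bit4


def rendering_enabled(mask):
    """True when the background or the sprites (or both) are being rendered."""
    return bool(mask & (SHOW_BACKGROUND | SHOW_SPRITES))


# Show everything, including the leftmost 8 pixel columns, in normal color
mask = SHOW_BG_LEFT | SHOW_SPRITES_LEFT | SHOW_BACKGROUND | SHOW_SPRITES
```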

PPUSTATUS

The status register mainly records three states:

Bit5: sprite overflow. Set to 1 when more than 8 sprites fall on the current scanline.

Bit6: sprite 0 hit. Set to 1 when an opaque pixel of sprite 0 overlaps an opaque pixel of the background. This is mainly used to split the screen for special effects.

Bit7: whether the PPU is in V_Blank. Set to 1 if it is.

OAMADDR & OAMDATA & DMA

OAMADDR selects an address, and OAMDATA then reads or writes the byte at that address; these two ports are how the OAM space is accessed. Note that wherever a 16-bit address has to pass through the 8-bit data port, as with PPUADDR below, the address must be written in two consecutive operations. The OAM itself is only 256 bytes, so a single byte written to OAMADDR is enough.

However, the registers are rarely used this way, because pushing every byte through OAMDATA is too slow. Usually OAMADDR and OAMDMA (0x4014) are used together. The sprite data is prepared in a page of CPU memory (usually at 0x0200); writing 0x00 to OAMADDR and then the high 8 bits of that address (0x02) to OAMDMA makes the DMA copy the whole 256-byte page into the OAM automatically. Since the program no longer has to move each byte itself, this is much faster.
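The effect of the DMA transfer can be modelled in a few lines. This is a simplified sketch rather than cycle-accurate behaviour: the function name oam_dma and the flat cpu_memory array are assumptions for illustration, and the 4-byte sprite entry (Y, tile index, attributes, X) follows the usual OAM layout.

```python
def oam_dma(cpu_memory, oam, page, oam_addr=0):
    """Copy the 256 bytes of CPU page `page` (0xXX00-0xXXFF) into the OAM."""
    base = page << 8
    for i in range(256):
        # The OAM address wraps around within the 256-byte OAM
        oam[(oam_addr + i) & 0xFF] = cpu_memory[base + i]


cpu_memory = bytearray(0x10000)
# One sprite entry prepared at 0x0200: Y, tile index, attributes, X
cpu_memory[0x0200:0x0204] = bytes([0x50, 0x01, 0x00, 0x80])

oam = bytearray(256)
oam_dma(cpu_memory, oam, page=0x02)  # same effect as writing 0x02 to 0x4014
```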

Also, the OAM should be left alone most of the time. Normally it should only be changed during V_Blank, because the rest of the time the PPU is busy rendering. Otherwise a frame could, for example, start rendering with a sprite on the ground and finish with the same sprite in the sky.

Scroll

The scroll register (0x2005) is write-only. Writing to it twice in a row determines which pixel ends up in the upper left corner of the screen. To give an intuitive example, here are the two nametables from Mario:

If I write 24 and then 16 to the scroll register, rendering starts from the position shown below:

PPUADDR & PPUDATA

The PPUADDR register is at 0x2006 and the PPUDATA register is at 0x2007. The memory in question is the PPU address space: these two registers give access to the PPU’s RAM, the PatternTables, the palette, and so on. Usage is basically the same as above: select an address, then read or write the data.

Internal registers

This section describes the PPU’s internal registers, which are not visible from outside. The memory-mapped registers above can be accessed by our code, but the following registers cannot be accessed directly.

v

Current VRAM address, 15 bits. As the name says, v holds the VRAM address currently being accessed.

t

Temporary VRAM address, 15 bits. It temporarily holds the VRAM address about to be accessed, or holds the scroll address.

x

Fine X scroll, 3 bits. It stores the fine X offset used when scrolling.

w

Write toggle. Because an address has 16 bits and the data bus only 8, an address has to be written in two consecutive writes, so a toggle is needed to record whether the next write is the first or the second.

Scrolling analysis

Before analysing scrolling, let’s look at the scroll register in a bit more detail, and at how the memory-mapped registers relate to the internal ones.

Earlier we said that writing to the scroll register twice in a row (the X address, then the Y address) sets which pixel of a NameTable sits in the upper left corner of the screen. A NameTable actually holds the tile indices for one screen, but logically we can think of it as a screen of tiles.

Which NameTable is used is set by writing the low 2 bits of PPUCTRL (0x2000).

The X address is divided into a coarse X and a fine X. Translated literally, coarse means rough and fine means precise. The Y address is split the same way into coarse Y and fine Y.

Coarse is the coordinate of a tile, and fine is the exact position of a pixel within that tile.
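Since a tile is 8 pixels wide and 8 pixels high, the split is just the upper 5 bits and the lower 3 bits of the 8-bit scroll value. A small sketch (split_scroll is a name invented here):

```python
def split_scroll(value):
    """Split an 8-bit scroll value into (coarse, fine)."""
    coarse = value >> 3    # which tile (0-31)
    fine = value & 0x07    # which pixel inside that tile (0-7)
    return coarse, fine


x_parts = split_scroll(24)   # the X value from the Mario example
y_parts = split_scroll(16)   # the Y value from the Mario example
odd_parts = split_scroll(100)
```

For the earlier example of writing 24 and 16, the screen starts at tile column 3, tile row 2, with no fine offset. A value such as 100 starts 4 pixels into tile 12.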

So what does this have to do with t, v, x, and w?

If t and v represent scrolling addresses, they have the following structure:

The diagram is clear enough that I won’t explain it, but note that fine X scroll is missing from it: fine X is stored separately in the x register.
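For readers without the diagram at hand, here is the layout as it is commonly documented for the NES PPU, written as a Python sketch: bits 0-4 hold coarse X, bits 5-9 coarse Y, bits 10-11 the NameTable select, and bits 12-14 fine Y. The helper names pack_scroll and unpack_scroll are made up for illustration.

```python
def pack_scroll(fine_y, nametable, coarse_y, coarse_x):
    """Build a 15-bit t/v value laid out as yyy NN YYYYY XXXXX."""
    return (((fine_y & 0x07) << 12)
            | ((nametable & 0x03) << 10)
            | ((coarse_y & 0x1F) << 5)
            | (coarse_x & 0x1F))


def unpack_scroll(reg):
    """Split a 15-bit t/v value back into its four fields."""
    return {
        "fine_y": (reg >> 12) & 0x07,
        "nametable": (reg >> 10) & 0x03,
        "coarse_y": (reg >> 5) & 0x1F,
        "coarse_x": reg & 0x1F,
    }


# NameTable 1, tile row 2, tile column 3, no fine Y offset
reg = pack_scroll(fine_y=0, nametable=1, coarse_y=2, coarse_x=3)
fields = unpack_scroll(reg)
```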

The low 2 bits of the data written to 0x2000 go into the corresponding position of t and indicate which NameTable to use.

When w = 0, that is, on the first write to the scroll register, the upper 5 bits of the X address are written into the lower 5 bits of t and the lower 3 bits go into x. After the write, w is set to 1, meaning that the next write will be the second one.

When w = 1, that is, on the second write to the scroll register, the Y address is written into the corresponding positions of t, and w is cleared to 0 afterwards.

This places the chosen pixel of a NameTable in the upper left corner of the screen. It is usually done during V_Blank, while the CPU is handling the NMI. Increasing the X value by 1 each time scrolls the screen horizontally.

To sum up: write the NameTable selection into the low 2 bits of 0x2000, then write X and Y to 0x2005 in two writes to choose the pixel for the upper left corner, and do this once every V_Blank.
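The write sequence can be sketched as a small state machine. This is a simplified Python model, not taken from a specific emulator; the class name ScrollRegisters is invented, and it assumes the usual bit layout of t (coarse X in bits 0-4, coarse Y in bits 5-9, NameTable in bits 10-11, fine Y in bits 12-14).

```python
class ScrollRegisters:
    """Models how writes to 0x2000 and 0x2005 update t, x and w."""

    def __init__(self):
        self.t = 0  # temporary VRAM address, 15 bits
        self.x = 0  # fine X scroll, 3 bits
        self.w = 0  # write toggle

    def write_ctrl(self, data):
        # Low 2 bits of PPUCTRL select the NameTable (bits 10-11 of t)
        self.t = (self.t & ~0x0C00) | ((data & 0x03) << 10)

    def write_scroll(self, data):
        data &= 0xFF
        if self.w == 0:
            # First write: X. Upper 5 bits -> coarse X, lower 3 bits -> x
            self.t = (self.t & ~0x001F) | (data >> 3)
            self.x = data & 0x07
            self.w = 1
        else:
            # Second write: Y. Upper 5 bits -> coarse Y, lower 3 bits -> fine Y
            self.t = (self.t & ~0x73E0) | ((data & 0x07) << 12) | ((data >> 3) << 5)
            self.w = 0


regs = ScrollRegisters()
regs.write_ctrl(0x01)   # use the NameTable at 0x2400
regs.write_scroll(24)   # X
regs.write_scroll(16)   # Y
```

After these three writes t holds 0x0443 (NameTable 1, tile row 2, tile column 3), x is 0, and w is back to 0.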

This is only the simple, general way of scrolling. Advanced techniques such as screen splitting will be discussed later. Also, this was the programmer’s point of view; how the hardware actually does it is covered in the rendering part.

Hardware thrift

I have mentioned a lot of penny-pinching in the NES before, but it was all on the software side. Now let’s talk about the penny-pinching in the hardware.

Writing data to 0x2005 actually writes t, and writing an address to 0x2006 also actually writes t; only at the end is t copied to v. A 16-bit address likewise takes two writes, so a toggle is needed to record which write is next, and this toggle is the same w mentioned above.

In other words, writes to 0x2005 and 0x2006 share the two registers t and w.

On the first write to 0x2006, the high byte of the address, only the lower 6 bits of the data are valid. They are written to bits 8-13 of t, the highest bit of t is cleared to zero, and w is set to 1.

On the second write to 0x2006, the low byte of the address, all 8 bits of the data are valid. They are written to the lower 8 bits of t, and t is immediately copied to v. This is the difference between writing 0x2005 and writing 0x2006: t is not copied to v after a write to 0x2005, but it is after the second write to 0x2006. As before, w is cleared to 0 after the second write.

In addition, every read or write of VRAM automatically increases the value in v by 1 or by 32, as selected by bit 2 of PPUCTRL. An increment of 1 moves to the next tile horizontally and an increment of 32 moves to the next tile vertically.
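The 0x2006 and 0x2007 behaviour described above can be sketched the same way. The class name VramPort is invented, and the flat 16KB vram array is a simplification that ignores mirroring; treat it as a model of the register logic only.

```python
class VramPort:
    """Models writes to PPUADDR (0x2006) and PPUDATA (0x2007)."""

    def __init__(self, increment=1):
        self.v = 0
        self.t = 0
        self.w = 0
        self.increment = increment       # 1 or 32, from PPUCTRL bit 2
        self.vram = bytearray(0x4000)    # simplification: flat 16KB, no mirroring

    def write_addr(self, data):
        data &= 0xFF
        if self.w == 0:
            # High byte: only the lower 6 bits are used, bit 14 of t is cleared
            self.t = (self.t & 0x00FF) | ((data & 0x3F) << 8)
            self.w = 1
        else:
            # Low byte: all 8 bits are used, then t is copied to v
            self.t = (self.t & 0x7F00) | data
            self.v = self.t
            self.w = 0

    def write_data(self, data):
        self.vram[self.v & 0x3FFF] = data & 0xFF
        self.v = (self.v + self.increment) & 0x7FFF


port = VramPort(increment=1)
port.write_addr(0x20)   # high byte of 0x2000
port.write_addr(0x00)   # low byte of 0x2000
port.write_data(0x41)   # stored at 0x2000
port.write_data(0x42)   # stored at 0x2001, the next tile to the right
```

With increment=32 the second byte would land at 0x2020 instead, which is the tile directly below, because a NameTable row is 32 tiles wide.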

To finish this part, let’s sort out the difference between the scroll address written to 0x2005 and the ordinary address written to 0x2006.

The value written to 0x2006 is an address in the PPU address space. A 16-bit address could cover 64KB, but the PPU address space is only 16KB, so 14 bits are enough. That is why only the lower 6 bits of the first byte written to 0x2006 are valid.

The scroll address written to 0x2005 is not technically an address: t and x together describe a pixel position.

Looking at this diagram, it is obviously not an address format; an address cannot be split up like this. However, the lower 12 bits of t, NNYYYYYXXXXX, can be regarded as an address.

12 bits can index 4KB, which is exactly the size of the 4 NameTables with their AttributeTables, and the address divides up exactly as NNYYYYYXXXXX: NN selects the NameTable, YYYYY is the tile’s Y coordinate, and XXXXX is the tile’s X coordinate.

Of course, this 12-bit address is not an absolute address but an offset relative to 0x2000.
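In other words, the address of the tile index is 0x2000 plus the lower 12 bits of the register. A one-line sketch (tile_address is a name invented here):

```python
def tile_address(v):
    """Address of the tile index that v points at: 0x2000 + NNYYYYYXXXXX."""
    return 0x2000 | (v & 0x0FFF)


# fine Y = 5, NameTable 1, tile row 2, tile column 3
v = (5 << 12) | (1 << 10) | (2 << 5) | 3
address = tile_address(v)
```

The result is 0x2443, which matches counting by hand: NameTable 1 starts at 0x2400, and row 2, column 3 is 2 × 32 + 3 = 0x43 bytes into it. The fine Y bits do not affect the address.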

Rendering

Rendering is divided into two parts, background rendering and sprite rendering, and it works pixel by pixel. Every clock cycle the PPU obtains the color information of a background pixel and of a sprite pixel, and the two compete on priority to decide which one is output.

To make this clear, we need to know some of the hardware inside the PPU:

Background

  • First, the registers described above: v (VRAM address), t (temporary VRAM address), x (fine X scroll), and w (toggle)

  • Two 16-bit shift registers, which I’ll call pattern_shifter from here on, hold the 2 tiles about to be rendered. Remember that a tile’s pattern is stored as a high bit plane and a low bit plane, so one register holds the high bits of the 2 tiles and the other holds the low bits. A full tile pattern is 64 pixels, or 128 bits of information, but since rendering goes row by row only one row of each tile is held at a time, so 16 bits per shifter is enough.

  • Two 8-bit shift registers, which I’ll call attribute_shifter, hold the corresponding attribute information.
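To make the high and low bit planes concrete: each row of a tile is stored as one byte per plane, and the two bits at the same position combine into a 2-bit pixel value. A sketch, with decode_tile_row as an invented helper name:

```python
def decode_tile_row(low_plane, high_plane):
    """Return the 8 two-bit pixel values of one tile row, leftmost pixel first."""
    row = []
    for bit in range(7, -1, -1):          # bit 7 is the leftmost pixel
        low = (low_plane >> bit) & 1
        high = (high_plane >> bit) & 1
        row.append((high << 1) | low)
    return row


row = decode_tile_row(0b11110000, 0b11001100)
```

Here the row decodes to the pixel values 3, 3, 1, 1, 2, 2, 0, 0. These 2 bits are the half of the 4-bit color information that comes from the PatternTable.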

Here is how this hardware works during rendering:

As mentioned before, rendering goes pixel by pixel in a Z-shaped order: left to right along a scanline, then down to the next one. A typical NTSC system has 262 scanlines, of which 240 are visible, and each scanline lasts 341 clock cycles, during which data is constantly fetched and rendered. Rather than explaining what every scanline and clock cycle does, let’s look at the overall process of rendering the background.

Rendering a background pixel requires 4 bits of color information, and rendering essentially comes down to obtaining those 4 bits. How are they obtained?

The PPU takes from v the address of the tile index for the pixel, fetches that tile’s pattern, and stores it in pattern_shifter. The tile’s attribute information is then placed in attribute_shifter, which completes the 4 bits of color information for a pixel.

During the first 256 cycles of each scanline, the shifters shift left by 1 bit every cycle, the next tile is loaded into the shifters every 8 cycles, and the pixel to render is selected according to fine_x. Here’s an example:

The example shows how the scroll address set through 0x2005 works together with pattern_shifter; attribute_shifter works the same way. This part is based on emulator source code, as there seems to be no detailed manual covering it. If anything is wrong, please point it out.
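The shifting and the fine_x selection can be sketched in Python as follows. This is a simplified model of the description above, not real PPU timing; the class name PatternShifter and its methods are invented for illustration.

```python
class PatternShifter:
    """Two 16-bit shift registers, one per bit plane."""

    def __init__(self):
        self.low = 0
        self.high = 0

    def load_next_tile(self, low_plane, high_plane):
        # The new tile row enters the lower 8 bits, behind the tile being drawn
        self.low = (self.low & 0xFF00) | (low_plane & 0xFF)
        self.high = (self.high & 0xFF00) | (high_plane & 0xFF)

    def pixel(self, fine_x):
        # fine_x = 0 picks the top bit; larger values start further into the tile
        bit = 15 - fine_x
        return (((self.high >> bit) & 1) << 1) | ((self.low >> bit) & 1)

    def shift(self):
        self.low = (self.low << 1) & 0xFFFF
        self.high = (self.high << 1) & 0xFFFF


shifter = PatternShifter()
shifter.load_next_tile(0xFF, 0x00)   # tile A: every pixel in this row has value 1
for _ in range(8):
    shifter.shift()                  # tile A moves into the upper 8 bits
shifter.load_next_tile(0x00, 0xFF)   # tile B: every pixel in this row has value 2

pixels = []
for _ in range(8):
    pixels.append(shifter.pixel(fine_x=3))
    shifter.shift()
```

With fine_x = 3 the first three pixels of tile A are skipped, so the eight output pixels are five from tile A followed by three from tile B. That is exactly what a 3-pixel horizontal scroll should look like.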

As said above, writing to 0x2005 twice in a row selects the pixel of a NameTable that goes in the upper left corner of the screen.

When we write to 0x2005 twice, we are essentially writing the address of a pixel of a NameTable into t, which is copied to v during rendering (more on that later). So the first tile fetched through v after the writes to 0x2005 is the one we chose, and it lands in the upper left corner of the screen. Each time a tile index is read through the address in v, v automatically advances by 1 to the next tile, and so on in a loop until all 960 tiles, one frame of background, have been rendered.

To summarize background rendering: based on the tile address recorded in v, 2 bits of color information are fetched from the PatternTable into pattern_shifter, then 2 bits of color information are fetched from the AttributeTable into attribute_shifter, and finally the pixel’s color information is picked out of the shifters according to fine_x.

Sprites

Sprites have the following hardware:

  • Primary OAM: 256 bytes, holding the 64 sprites of a frame
  • Secondary OAM: holds the 8 sprites on the scanline currently being rendered
  • 8 pairs of 8-bit shift registers, holding the tile rows of the sprites on the scanline currently being rendered
  • 8 latches, holding the attributes of the 8 sprites
  • 8 counters, holding the X coordinates of the 8 sprites

Tile pattern data goes into the shift registers and attribute data into the latches, the same as for the background; apart from the name latch there is basically no difference, and there is no need to know it in great detail. If you are interested, reply NES in the public account backend to get the PPU manual.

Rendering goes row by row, and the X coordinates of the pixels in a row range over [0, 255]. The X coordinate stored in each counter decreases by 1 every cycle, so when a counter reaches 0 it is time to render that sprite.
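The countdown can be sketched like this. It is a simplified model of the description above: the function name first_active_cycle is invented, and real hardware timing has more detail than shown here.

```python
def first_active_cycle(sprite_xs, width=256):
    """For each sprite, return the cycle at which its counter has reached 0."""
    counters = list(sprite_xs)              # each counter starts at the sprite's X
    first_active = [None] * len(counters)
    for cycle in range(width):
        for i, count in enumerate(counters):
            if count == 0:
                if first_active[i] is None:
                    first_active[i] = cycle  # sprite starts rendering here
            else:
                counters[i] = count - 1      # still counting down
    return first_active


starts = first_active_cycle([0, 10, 200])
```

Each sprite becomes active at the cycle equal to its X coordinate, which is why loading X into a down-counter is enough to position the sprite on the scanline.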

For sprites the overall process is much the same as for the background: the goal is still to obtain 4 bits of color information per pixel. The difference is that a sprite’s shifters stay inactive until its counter reaches zero, and only then start shifting left every cycle.

Fetching data into the shifters requires an address, which comes not from v but from the sprite’s OAM entry (the Secondary OAM during rendering). With the tile index from the entry we fetch the tile pattern and store it in the sprite’s shift registers. The attribute information is simpler still: it is read directly from the OAM entry.

Now we have 4 bits of color information for the sprite and 4 bits for the background, and the two compete to decide which is output. Of course, there is only a competition where the background and a sprite overlap:

If there is only a background pixel, the background is output.

If a background pixel and a sprite pixel overlap:

The number indicates which color in the palette to use. Color 0 is shared by the background and the sprites: it can be regarded as the universal background color for the background and as the transparent color for sprites. Priority is an attribute bit in the sprite’s OAM entry.
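The competition follows the commonly documented rule: a transparent sprite pixel always loses, a sprite pixel over a transparent background always wins, and when both are opaque the priority bit decides. A sketch in Python, where choose_pixel and the parameter sprite_behind_background (standing for the priority bit being set) are names invented for this example:

```python
def choose_pixel(bg_pixel, sprite_pixel, sprite_behind_background):
    """Pick the output for one pixel; values are 2-bit color numbers (0 = transparent)."""
    if sprite_pixel == 0:
        return ("background", bg_pixel)      # nothing of the sprite to show
    if bg_pixel == 0:
        return ("sprite", sprite_pixel)      # background is transparent here
    if sprite_behind_background:
        return ("background", bg_pixel)      # both opaque, sprite is behind
    return ("sprite", sprite_pixel)          # both opaque, sprite is in front


in_front = choose_pixel(bg_pixel=2, sprite_pixel=1, sprite_behind_background=False)
behind = choose_pixel(bg_pixel=2, sprite_pixel=1, sprite_behind_background=True)
```

When both pixels are 0, the background branch is taken with color 0, which is the universal background color.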

That’s enough for this article. It mainly covered several memory-mapped registers and several internal registers, plus a brief analysis of scrolling and rendering. Later articles will go into what happens in each rendering cycle, as well as some advanced scrolling techniques.
