

		Graphics: Tile Accelerator and 3D rendering unit
		------------------------------------------------
		                v0.3, 2001-03-16


Note: This article is very beta right now.



Table of Contents
-----------------

0.	Introduction
1.	Device operation
1.1	TA operation
1.2	3D rendering unit operation
2.	TA Commands
2.1	TA Command 0: End-of-List
2.2	TA Command 1: Tileclip
2.3	TA Command 4: Striphead
2.4	TA Command 7: Vertex



0. Introduction
---------------

  The 3D rendering unit wants the rendering primitives sorted in two levels:
Translucent and opaque primitives should be separated, and for each
32x32-pixel region (tile) in the output buffer, there should be a list of
potentially covering primitives.
  A tile contains one such list of potentially covering primitives for each
primitive type (Primitive Pointer-list, or PP-list). The set of tiles to
be rendered is organized as a Tile Array.
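
  To make the tile arithmetic concrete, here is a small sketch in C. The
helper names tiles_for and tile_index are made up for illustration, and the
row-major ordering of tiles is an assumption; the actual Tile Array layout
is not described here.

```c
#include <assert.h>

#define TILE_SIZE 32  /* tiles are 32x32 pixels */

/* Number of tiles needed to cover `pixels` pixels along one axis,
   rounding up so that a partial tile at the edge is still counted. */
unsigned tiles_for(unsigned pixels)
{
    assert(pixels > 0);
    return (pixels + TILE_SIZE - 1) / TILE_SIZE;
}

/* Index of the tile containing pixel (x, y), assuming (hypothetically)
   that tiles are numbered row by row from the top left. */
unsigned tile_index(unsigned x, unsigned y, unsigned screen_width)
{
    assert(x < screen_width);
    return (y / TILE_SIZE) * tiles_for(screen_width) + (x / TILE_SIZE);
}
```

  With these definitions a 640x480 output buffer needs 20 x 15 = 300 tiles.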

  The 3D rendering unit takes a Tile Array, and renders the contents tile
by tile to the output buffer. Since each tile contains a set of lists of
which primitives potentially cover the tile, the 3D rendering unit will only
consider those primitives when rendering the tile.

  The Tile Accelerator (TA) accepts rendering primitives from the main CPU,
and performs most of the necessary preprocessing: the primitives are
reformatted and stored in graphics memory, and tile-coverage tests are
performed before PP-lists pointing to the primitives are built.
  The only restriction is that all primitives of one type (opaque or
translucent) must be submitted in one go, before primitives of another type
can be submitted.
  Normally, the Tile Array would be set up (by the CPU) such that it points
to the PP-lists created by the TA. Once that is done, the CPU can submit
primitives to the TA, wait for completion, and then start the 3D rendering
unit to render the scene.

  It is not necessary to use the TA to build all the lists; the CPU can
write the PP-lists and the primitives to graphics memory itself, and then
tell the 3D rendering unit to render the scene, but using the TA is
(barring exceptional circumstances) faster.
(In old PowerVR hardware, there was no TA... the graphics card's driver
 built the Tile Arrays using the CPU instead.)



1. Device operation
-------------------


1.1 TA operation
----------------

  The user sends commands to the TA by submitting 32-byte blocks of data to
a certain address range, either using Store Queues or using DMA. These blocks
can either contain control commands, which change the internal TA state,
or vertex commands, which cause the TA to write out triangle strips.

  A striphead command sets render modes for subsequent triangle strips.
Once a striphead has been submitted to the TA, zero or more triangle strips
can be submitted.

  Submitting a triangle strip is done by repeatedly submitting the Vertex
command, the last time with the End-of-strip flag set. Each vertex command
contains data for one vertex.

  The TA will reformat the vertex data before storing it in graphics
memory. Each triangle strip will be prepended with a reformatted striphead
as well.
  If triangle strips that are too long are submitted, the TA will
automatically split them into separate, smaller triangle strips. The
maximum number of vertices in a triangle strip is selectable.

  Once a full triangle strip has been stored in graphics memory, the TA
calculates the bounding box of the strip, and adds the strip to the
PP-lists covered by the bounding box. (Tiles can be excluded using the
tileclip function.)
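
  The bounding-box step can be pictured roughly as follows. This is only a
sketch of the idea, not the TA's actual algorithm: the types vert2d and
tile_rect and the function strip_tile_bounds are hypothetical, and how the
hardware rounds or treats off-screen strips is not known.

```c
#include <assert.h>

#define TILE_SIZE 32  /* tiles are 32x32 pixels */

/* Hypothetical types for illustration; not the TA's internal format. */
struct vert2d { float x, y; };
struct tile_rect { int xmin, ymin, xmax, ymax; };  /* inclusive, in tiles */

/* Clamp v to the range [lo, hi]. */
int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Bounding box of a strip, converted to a range of tiles and clamped to a
   tiles_x by tiles_y screen.  A strip lying entirely off screen is not
   rejected here; it simply ends up clamped to the nearest edge tiles. */
struct tile_rect strip_tile_bounds(const struct vert2d *v, int n,
                                   int tiles_x, int tiles_y)
{
    struct tile_rect r;
    float xmin, xmax, ymin, ymax;
    int i;

    assert(n > 0 && tiles_x > 0 && tiles_y > 0);
    xmin = xmax = v[0].x;
    ymin = ymax = v[0].y;
    for (i = 1; i < n; i++) {
        if (v[i].x < xmin) xmin = v[i].x;
        if (v[i].x > xmax) xmax = v[i].x;
        if (v[i].y < ymin) ymin = v[i].y;
        if (v[i].y > ymax) ymax = v[i].y;
    }
    r.xmin = clamp_int((int)(xmin / TILE_SIZE), 0, tiles_x - 1);
    r.xmax = clamp_int((int)(xmax / TILE_SIZE), 0, tiles_x - 1);
    r.ymin = clamp_int((int)(ymin / TILE_SIZE), 0, tiles_y - 1);
    r.ymax = clamp_int((int)(ymax / TILE_SIZE), 0, tiles_y - 1);
    return r;
}
```

  Every tile inside the returned rectangle would get a pointer to the strip,
whether or not a triangle actually touches it, which is why the lists hold
"potentially covering" primitives.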

  When all primitives of the current type have been submitted, an
end-of-list command should be submitted. The TA will then flush its
internal buffers and append end-markers to all the lists for that type.
After that, another list type can be begun by submitting a striphead
with the new type selected.
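
  The submission order described above (striphead, vertices, end-of-list)
can be sketched as a routine that fills a buffer with 32-byte blocks. The
names ta_block, make_block and build_list are hypothetical; the vertex
payload is left zeroed because its layout depends on the striphead, and the
remaining striphead longwords are left zeroed as well. Actually sending the
blocks (Store Queues or DMA) is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TA_CMD_END_OF_LIST 0u
#define TA_CMD_STRIPHEAD   4u
#define TA_CMD_VERTEX      7u
#define TA_END_OF_STRIP    (1u << 28)  /* 'e' bit of the vertex command */

typedef struct { uint32_t w[8]; } ta_block;  /* one 32-byte command */

/* A zero-filled block whose first longword is `control`. */
ta_block make_block(uint32_t control)
{
    ta_block b;
    memset(&b, 0, sizeof b);
    b.w[0] = control;
    return b;
}

/* Write one complete list holding a single strip of `nverts` vertices.
   Returns the number of blocks written to `out`. */
size_t build_list(ta_block *out, unsigned list_type, unsigned nverts)
{
    size_t n = 0;
    unsigned i;

    assert(list_type <= 4 && nverts >= 3);
    out[n++] = make_block((TA_CMD_STRIPHEAD << 29) | (list_type << 24));
    for (i = 0; i < nverts; i++) {
        uint32_t control = TA_CMD_VERTEX << 29;
        if (i == nverts - 1)
            control |= TA_END_OF_STRIP;   /* last vertex closes the strip */
        out[n++] = make_block(control);   /* vertex payload omitted */
    }
    out[n++] = make_block(TA_CMD_END_OF_LIST << 29);
    return n;
}
```

  A four-vertex strip therefore takes six blocks: one striphead, four
vertices and one end-of-list.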

  Several areas need to be specified to the TA before submitting
any primitives: one allocation area for primitives, another
allocation area for PP-blocks (PPs are allocated in blocks of
8, 16 or 32 PPs at a time), and a base address for an array of starting
PP-blocks (the PP-matrix).
  The PP-matrix contains the leading PP-block for each tile for each
type, so that the Tile Array can point straight into the PP-matrix and
everything works automagically.



1.2 3D rendering unit operation
-------------------------------

  Once the 3D rendering unit is told where the Tile Array is located
and started, it will iterate through all the tiles in the Tile Array
and render them one by one.

  Each tile has pointers to five different PP-lists: one with opaque
primitives, one with translucent primitives, and three with as yet
unknown primitives. The 3D rendering unit will walk each of these
five lists (unless they are empty), and process the encountered
primitives.

  Vertical scaling is applied to the Y-coordinates of the primitives
at this stage.

  The area of each triangle is calculated. If it is smaller than the
minimum allowed triangle area and small culling is enabled,
or if the triangle is facing the wrong way, the triangle is ignored.
Otherwise, it proceeds to the second phase in the 3D rendering
unit where it is sorted against other triangles. Pixels hidden by
opaque primitives are also removed. (This is probably accomplished
using an S-buffer-like algorithm.)
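
  The area and facing test can be illustrated with the usual signed-area
formula. This is a sketch under assumptions, not a description of the
hardware: the names are hypothetical, the mode numbers follow the culling
field of the striphead (section 2.3), and treating a positive signed area
as "counterclockwise" is an arbitrary choice made for the example.

```c
#include <assert.h>

struct vert2d { float x, y; };

/* Culling modes, numbered as in the striphead culling field. */
enum cull_mode { CULL_NONE = 0, CULL_SMALL = 1,
                 CULL_SMALL_CCW = 2, CULL_SMALL_CW = 3 };

/* Twice the signed area of triangle abc; the sign encodes the winding. */
float signed_area2(struct vert2d a, struct vert2d b, struct vert2d c)
{
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

/* Returns 1 if the triangle would be ignored, 0 if it proceeds.
   Assumption: positive area2 means counterclockwise. */
int is_culled(enum cull_mode mode, float area2, float min_area)
{
    float area = 0.5f * (area2 < 0.0f ? -area2 : area2);

    assert(min_area >= 0.0f);
    if (mode == CULL_NONE)
        return 0;
    if (area < min_area)
        return 1;                          /* too small */
    if (mode == CULL_SMALL_CCW && area2 > 0.0f)
        return 1;                          /* facing the wrong way */
    if (mode == CULL_SMALL_CW && area2 < 0.0f)
        return 1;
    return 0;
}
```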

  Fragments which survive through the culling, depth sorting, and
hidden surface removal are rendered with a traditional pixel pipeline
to an internal pixel buffer, 32x32 pixels in size (probably ARGB8888
format).
  If any pixels in the internal pixel buffer are not covered by any
fragments, the background plane will be rendered there.
  The pixel pipeline is supposedly able to handle at least Gouraud
shading, texture mapping/bump mapping, fogging, and some alpha blending.

  Finally, the contents of the internal pixel buffer are copied
out to external graphics memory.
  If horizontal scaling is enabled, the contents of the internal
pixel buffer are compressed to half their width, thereby giving 2X
supersampling horizontally.
  The area to be written is clipped against the output window
coordinates.
  If any part of the output area passes the output window clipping
test, pixels from the internal pixel buffer are converted to the
destination pixelformat and written to graphics memory.
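
  As an illustration of what compressing the buffer to half its width
could look like, the sketch below averages each pair of horizontally
adjacent ARGB8888 pixels. The box filter is an assumption (the filter the
hardware really uses is not known), and avg_argb and halve_row are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Average two ARGB8888 pixels channel by channel. */
uint32_t avg_argb(uint32_t p, uint32_t q)
{
    uint32_t out = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8) {
        uint32_t a = (p >> shift) & 0xFF;
        uint32_t b = (q >> shift) & 0xFF;
        out |= ((a + b) / 2) << shift;
    }
    return out;
}

/* Shrink one row of `width` pixels to width/2 pixels by averaging
   each pair of neighbours. */
void halve_row(const uint32_t *src, uint32_t *dst, int width)
{
    int x;

    assert(width > 0 && width % 2 == 0);
    for (x = 0; x < width / 2; x++)
        dst[x] = avg_argb(src[2 * x], src[2 * x + 1]);
}
```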


  If vertical scaling is set to anything other than 1.0, a smoothing
filter is applied vertically over the data. The filter removes much
of the flickering of PAL/NTSC interlace modes, so it is a good idea
to set vertical scaling slightly higher or lower than 1.0 in 15kHz
modes.



2. TA Commands
--------------

  Commands are sent in 32-byte chunks. The first longword contains
TA control information, of which the uppermost 3 bits determine
the command.

	ccc----- -------- -------- --------

	c - Command:
	    0 - End of List
	    1 - Tileclip specification
	    4 - Striphead
	    7 - Vertex
	    others are yet unknown
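
  Extracting the command from the first longword is a plain shift. A
minimal sketch (the enum and function names are made up for this example):

```c
#include <assert.h>
#include <stdint.h>

/* Known command numbers, from the table above. */
enum ta_command {
    TA_END_OF_LIST = 0,
    TA_TILECLIP    = 1,
    TA_STRIPHEAD   = 4,
    TA_VERTEX      = 7
};

/* The command lives in the uppermost 3 bits of the control longword. */
unsigned ta_command_of(uint32_t control)
{
    return control >> 29;
}

/* Human-readable name for a command number. */
const char *ta_command_name(unsigned cmd)
{
    assert(cmd <= 7);
    switch (cmd) {
    case TA_END_OF_LIST: return "End of List";
    case TA_TILECLIP:    return "Tileclip";
    case TA_STRIPHEAD:   return "Striphead";
    case TA_VERTEX:      return "Vertex";
    default:             return "unknown";
    }
}
```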



2.1 TA Command 0: End-of-List
-----------------------------

	000----- -------- -------- --------
	(7 longwords, ignored)

  This command informs the TA that there will be no more primitives of the
current type. The TA will flush its internal buffers, and write the part of
the PP-matrix which corresponds to the current type.

  Do not send this command without first having sent a striphead which selects
a primitive type.



2.2 TA Command 1: Tileclip
--------------------------

	001----- -------- -------- --------
	(3 longwords, ignored)
	-------- -------- -------- xxxxxxxx	xmin [in tiles]
	-------- -------- -------- yyyyyyyy	ymin [in tiles]
	-------- -------- -------- xxxxxxxx	xmax [in tiles]
	-------- -------- -------- yyyyyyyy	ymax [in tiles]

  This command specifies a new tileclip-region. (xmin,ymin)-(xmax,ymax) form
a rectangular area of tiles. Triangle strips which have tileclip enabled
will be clipped against this rectangle.
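
  A sketch of filling in a Tileclip command, following the layout above.
The function name ta_make_tileclip is hypothetical, and whether xmax and
ymax are inclusive is an assumption; the example treats them as the last
tile inside the region.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a 32-byte Tileclip command in cmd[0..7].
   All coordinates are in tiles, not pixels. */
void ta_make_tileclip(uint32_t cmd[8], unsigned xmin, unsigned ymin,
                      unsigned xmax, unsigned ymax)
{
    assert(xmin <= xmax && ymin <= ymax);
    assert(xmax <= 0xFF && ymax <= 0xFF);   /* fields are 8 bits wide */

    memset(cmd, 0, 8 * sizeof cmd[0]);      /* longwords 1..3 are ignored */
    cmd[0] = 1u << 29;                      /* command 1: Tileclip */
    cmd[4] = xmin;
    cmd[5] = ymin;
    cmd[6] = xmax;
    cmd[7] = ymax;
}
```

  Under the inclusive assumption, a region covering a whole 640x480 screen
would be (0,0)-(19,14).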



2.3 TA Command 4: Striphead
---------------------------

       100--ttt ---sssaa -------- uuccuug-

       t - List type (0 .. 4), 0 = Opaque, 2 = Translucent
       s - Tristrip-splitting mode:
	   0..3 - Don't change
	   4    - Split after 3 stripverts [Default]
	   5	- Split after 4 stripverts
	   6	- Split after 6 stripverts
	   7	- Split after 8 stripverts
       a - Tile-clipping Accept mode:
	   0 - ALL - Accept for all tiles
	   1 - NONE - Reject for all tiles (nothing will be put into PP-lists)
	   2 - INSIDE - Accept for tiles inside tileclip region
	   3 - OUTSIDE - Accept for tiles outside tileclip region
       u - Unknown, must be 0 (otherwise polygon disappears)
       c - Vertex colour specification:
           0 - Packed ARGB8888 in each vertex
	   1 - Floating-point ARGB in each vertex
	   2 - Intensity, specified as floating-point ARGB in striphead (turn off Gouraud, otherwise * floating-point G in each vertex)
	   3 - Intensity from previous striphead (turn off Gouraud, otherwise * floating-point G in each vertex)
       g - Gouraud enable; set to 1 if doing gouraud, or 0 if doing intensity
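
  The fields of this first longword can be packed as in the sketch below.
The function name ta_striphead_word0 is made up; the shift amounts are read
directly off the bit diagram, and the unknown bits are left at 0.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the first striphead longword:  100--ttt ---sssaa -------- uuccuug-
   list_type: 0..4, split: 0..7, accept: 0..3, colour: 0..3, gouraud: 0/1 */
uint32_t ta_striphead_word0(unsigned list_type, unsigned split,
                            unsigned accept, unsigned colour,
                            unsigned gouraud)
{
    assert(list_type <= 4 && split <= 7 && accept <= 3);
    assert(colour <= 3 && gouraud <= 1);

    return (4u << 29)           /* command 4: Striphead */
         | (list_type << 24)    /* ttt */
         | (split << 18)        /* sss */
         | (accept << 16)       /* aa  */
         | (colour << 4)        /* cc  */
         | (gouraud << 1);      /* g   */
}
```

  For example, an opaque list (0) that splits after 3 stripverts (4),
accepts all tiles (0), and uses packed ARGB8888 (0) with Gouraud enabled
gives 0x80100002.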

       dddccz-- -------- -------- --------

       d - 1/z Depth compare operation:
           0 - never
	   1 - less
	   2 - equal
	   3 - less-or-equal
	   4 - greater
	   5 - notequal
	   6 - greater-or-equal
	   7 - always
       c - Culling operation:
           0 - none
	   1 - Small
	   2 - Small + Counterclockwise
	   3 - Small + Clockwise
       z - Disable Z-writes
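
  A matching sketch for this longword. The function name is hypothetical
and the depth compare value is passed as a raw number from the table above;
only a few values are given names here.

```c
#include <assert.h>
#include <stdint.h>

#define TA_DEPTH_NEVER            0u
#define TA_DEPTH_GREATER_OR_EQUAL 6u
#define TA_DEPTH_ALWAYS           7u

/* Pack the depth/culling longword:  dddccz-- -------- -------- --------
   depth_cmp: 0..7, cull: 0..3, disable_zwrite: nonzero to disable. */
uint32_t ta_striphead_word1(unsigned depth_cmp, unsigned cull,
                            int disable_zwrite)
{
    assert(depth_cmp <= 7 && cull <= 3);

    return (depth_cmp << 29)                   /* ddd */
         | (cull << 27)                        /* cc  */
         | ((disable_zwrite ? 1u : 0u) << 26); /* z   */
}
```

  Since the compared value is 1/z, nearer surfaces have larger values, so
greater-or-equal is the natural choice for ordinary depth testing.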


       sssddduu ffca---- -------- --------

       s - Source blending mode:
	   0 - Zero
	   1 - One
	   4 - SrcAlpha
	   5 - InvSrcAlpha
	   other - Unknown
       d - Destination blending mode:
	   0 - Zero
	   1 - One
	   4 - SrcAlpha
	   5 - InvSrcAlpha
	   other - Unknown
       u - Unknown, must be 0
       f - Fog mode:
           0 - Table fog
	   1 - No fog?
	   2 - No fog
	   3 - Fog?
       c - Enable colour clamp against (colorclampmin, colorclampmax)
       a - Disable interpolated alpha channel (override with 1.0)
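
  And a sketch for the blending/fog longword, again with hypothetical
names. Only the blending modes listed above are named; the unknown bits are
left at 0.

```c
#include <assert.h>
#include <stdint.h>

/* Known blending modes, from the tables above. */
enum ta_blend {
    TA_BLEND_ZERO          = 0,
    TA_BLEND_ONE           = 1,
    TA_BLEND_SRC_ALPHA     = 4,
    TA_BLEND_INV_SRC_ALPHA = 5
};

/* Pack the blend/fog longword:  sssddduu ffca---- -------- --------
   src, dst: 0..7, fog: 0..3, clamp and no_alpha: nonzero to set. */
uint32_t ta_striphead_word2(unsigned src, unsigned dst, unsigned fog,
                            int clamp, int no_alpha)
{
    assert(src <= 7 && dst <= 7 && fog <= 3);

    return (src << 29)                     /* sss */
         | (dst << 26)                     /* ddd */
         | (fog << 22)                     /* ff  */
         | ((clamp ? 1u : 0u) << 21)       /* c   */
         | ((no_alpha ? 1u : 0u) << 20);   /* a   */
}
```

  Conventional alpha blending would be source SrcAlpha (4) and destination
InvSrcAlpha (5); with fog mode 2 (no fog) that packs to 0x94800000.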



2.4 TA Command 7: Vertex
------------------------

	111e---- -------- -------- --------
	(format of vertex depends on striphead)

	e - Last vertex in triangle strip if set


  Specifies one stripvertex. X and Y coordinates are given as 2d screen
coordinates; the Z value should really be 1/Z. ARGB can be given either
as ARGB8888, or as four floats.
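
  The pieces of a vertex that are described here can be sketched as
follows. All names are hypothetical. The positions of the coordinates and
colours within the 32-byte block are not shown because they depend on the
striphead, and the rounding used when packing floats to ARGB8888 is just
one reasonable choice.

```c
#include <assert.h>
#include <stdint.h>

/* Control longword of a Vertex command:  111e---- ...
   `last` is nonzero for the final vertex of a triangle strip. */
uint32_t ta_vertex_control(int last)
{
    return (7u << 29) | ((last ? 1u : 0u) << 28);
}

/* The value submitted as "Z" is really 1/z. */
float ta_depth_from_z(float z)
{
    assert(z > 0.0f);
    return 1.0f / z;
}

/* Convert one colour channel in [0, 1] to 0..255, clamping and rounding. */
uint32_t channel_to_byte(float c)
{
    if (c < 0.0f) c = 0.0f;
    if (c > 1.0f) c = 1.0f;
    return (uint32_t)(c * 255.0f + 0.5f);
}

/* Pack four float channels into one ARGB8888 longword. */
uint32_t pack_argb8888(float a, float r, float g, float b)
{
    return (channel_to_byte(a) << 24) | (channel_to_byte(r) << 16)
         | (channel_to_byte(g) << 8)  |  channel_to_byte(b);
}
```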
