The 8237 DMA can be operated in several modes. The main ones are:
In Single Transfer Mode, a single byte (or word) is transferred. The DMA must release and re-acquire the bus for each additional byte. This mode is commonly used by devices that cannot transfer the entire block of data immediately. The peripheral will request the DMA each time it is ready for another transfer.
The floppy disk controller only has a one-byte buffer, so it uses this mode.
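The transfer mode is selected per channel through the 8237 mode register. The sketch below builds that byte from its bit fields as laid out in the 8237A datasheet; the constant names and the `mode_byte` helper are made up for illustration and do not come from any particular driver.

```python
# Bit fields of the 8237 mode register (written once per channel).
MODE_DEMAND = 0b00 << 6
MODE_SINGLE = 0b01 << 6
MODE_BLOCK = 0b10 << 6
MODE_CASCADE = 0b11 << 6

ADDR_DECREMENT = 1 << 5   # clear = address increments after each transfer
AUTOINIT = 1 << 4         # reload address and count when the count runs out

XFER_VERIFY = 0b00 << 2
XFER_WRITE = 0b01 << 2    # device -> memory (the DMA writes to memory)
XFER_READ = 0b10 << 2     # memory -> device (the DMA reads from memory)


def mode_byte(channel, mode, xfer, autoinit=False, decrement=False):
    """Build the mode register value for channel 0-3 of one 8237."""
    if not 0 <= channel <= 3:
        raise ValueError("channel is relative to one controller: 0-3")
    value = mode | xfer | channel
    if autoinit:
        value |= AUTOINIT
    if decrement:
        value |= ADDR_DECREMENT
    return value


# Floppy controller on channel 2, Single mode, disk -> memory.
print(hex(mode_byte(2, MODE_SINGLE, XFER_WRITE)))   # prints 0x46
```

Note that the channel number here is local to one controller, so only two bits are needed for it.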
In Block and Demand Modes, once the DMA acquires the system bus, an entire block of data is transferred, up to a maximum of 64K. If the peripheral needs additional time, it can assert the READY signal. READY should not be used excessively; for slow peripheral transfers, Single Transfer Mode should be used instead.
The difference between Block and Demand is that once a Block transfer is started, it runs until the transfer count reaches zero; DRQ only needs to be asserted until -DACK is asserted. Demand Mode will transfer one or more bytes until DRQ is de-asserted; when DRQ is asserted again later, the transfer resumes where it was suspended.
Older hard disk controllers used Demand Mode until CPU speeds increased to the point that it was more efficient to read the data using the CPU.
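The difference between Single, Block and Demand transfers can be made concrete with a toy model. This is only a sketch: it treats DRQ as a list of samples, one per possible transfer slot, and ignores READY, bus arbitration delays and real timing. The function name `simulate` and its parameters are invented for this example.

```python
def simulate(mode, count, drq):
    """Return the burst sizes (bytes moved per bus acquisition).

    mode  -- "single", "block" or "demand"
    count -- programmed transfer count in bytes
    drq   -- list of booleans: DRQ level sampled once per transfer slot
    """
    bursts = []
    remaining = count
    current = 0                     # bytes moved in the burst in progress
    for asserted in drq:
        if remaining == 0:
            break
        if mode == "block":
            if asserted:
                # DRQ only starts the transfer; it then runs to completion.
                bursts.append(remaining)
                remaining = 0
        elif mode == "single":
            if asserted:
                bursts.append(1)    # the bus is released after every byte
                remaining -= 1
        elif mode == "demand":
            if asserted:
                current += 1
                remaining -= 1
            if current and (not asserted or remaining == 0):
                bursts.append(current)   # DRQ dropped or count exhausted
                current = 0
        else:
            raise ValueError(mode)
    if current:
        bursts.append(current)      # DRQ was still high when sampling ended
    return bursts


drq = [True, True, True, False, False, True, True, True]
print(simulate("single", 5, drq))   # [1, 1, 1, 1, 1]
print(simulate("block", 5, drq))    # [5]
print(simulate("demand", 5, drq))   # [3, 2]
```

With the same DRQ pattern, Single mode needs five bus acquisitions, Block mode needs one, and Demand mode splits the transfer at the point where the peripheral dropped DRQ.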
Cascade Mode allows a DMA channel to request the bus, but then the attached peripheral device is responsible for placing addressing information on the bus. This is also known as ``Bus Mastering''.
When a DMA channel in Cascade Mode receives control of the bus, the DMA does not place addresses and I/O control signals on the bus like it normally does. Instead, the DMA only asserts the -DACK signal for this channel.
Now it is up to the device connected to that DMA channel to provide address and bus control signals. The peripheral has complete control over the system bus, and can do reads and/or writes to any address below 16 MB. When the peripheral is finished with the bus, it de-asserts the DRQ line, and the DMA controller can return control to the CPU or to some other DMA channel.
Cascade Mode can be used to chain multiple DMA controllers together, and this is exactly what DMA Channel 4 is used for in the PC. When a peripheral requests the bus on DMA channels 0, 1, 2 or 3, the slave DMA controller asserts HLDREQ, but this wire is actually connected to DRQ4 on the primary DMA controller. The primary DMA controller then requests the bus from the CPU using HLDREQ. Once the bus is granted, -DACK4 is asserted, and that wire is actually connected to the HLDA signal on the slave DMA controller. The slave DMA controller then transfers data for the DMA channel that requested it, or the slave DMA may grant the bus to a peripheral that wants to perform its own bus-mastering.
Because of this wiring arrangement, only DMA channels 0, 1, 2, 3, 5, 6 and 7 are usable on PC/AT systems.
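The channel numbering that results from this cascade can be written down as a small lookup. The sketch uses the same ``slave'' and ``primary'' labels as the text above; the `locate` function and the `USABLE` list are names chosen for this example only.

```python
CASCADE_CHANNEL = 4   # DRQ4/-DACK4 carry the slave's HLDREQ/HLDA


def locate(channel):
    """Map a PC/AT DMA channel (0-7) to (controller, local channel 0-3)."""
    if not 0 <= channel <= 7:
        raise ValueError("PC/AT systems have DMA channels 0-7")
    controller = "slave" if channel < 4 else "primary"
    return controller, channel % 4


# Every channel except the one consumed by the cascade link.
USABLE = [ch for ch in range(8) if ch != CASCADE_CHANNEL]

print(locate(2))   # ('slave', 2)
print(locate(6))   # ('primary', 2)
print(USABLE)      # [0, 1, 2, 3, 5, 6, 7]
```

Channel 4 maps to local channel 0 of the primary controller, which is why that input is not available to peripherals.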
Note: DMA channel 0 was reserved for refresh operations in early IBM PC computers, but is generally available for use by peripherals in modern systems.
When a peripheral is performing Bus Mastering, it is important that the peripheral transmit data to or from memory constantly while it holds the system bus. If the peripheral cannot do this, it must release the bus frequently so that the system can perform refresh operations on memory.
Since memory read and write cycles ``count'' as refresh cycles (a refresh cycle is actually an incomplete memory read cycle), as long as the peripheral controller continues reading or writing data to sequential memory locations, that action will refresh all of memory.
Bus-mastering is found in some SCSI adapters and other high-performance peripheral cards.
Autoinitialize Mode causes the DMA to perform Single, Block or Demand transfers, but when the DMA transfer counter reaches zero, the counter and address are set back to where they were when the DMA channel was originally programmed. This means that as long as the device requests transfers, they will be granted. It is up to the CPU to move new data into the fixed buffer ahead of where the DMA is about to transfer it for output operations, and to read new data out of the buffer behind where the DMA is writing on input operations. This technique is frequently used on audio devices that have small or no hardware ``sample'' buffers. There is additional CPU overhead to manage this ``circular'' buffer, but in some cases this may be the only way to eliminate the latency that occurs when the DMA counter reaches zero and the DMA stops until it is reprogrammed.
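The reload behaviour can be illustrated with a small model of one channel. This is a simplified sketch that follows the description above (the count runs down to zero, then address and count are restored); a real 8237 is programmed with the count minus one and signals terminal count when the counter rolls over. The `Channel` class and the `addresses` helper are hypothetical names.

```python
class Channel:
    """Toy model of one DMA channel with optional autoinitialize."""

    def __init__(self, base_addr, base_count, autoinit):
        self.base_addr = base_addr      # values written when programmed
        self.base_count = base_count
        self.autoinit = autoinit
        self.addr = base_addr           # current address and count
        self.count = base_count
        self.stopped = False

    def transfer(self):
        """Move one byte; return the address used, or None if stopped."""
        if self.stopped:
            return None
        used = self.addr
        self.addr += 1
        self.count -= 1
        if self.count == 0:
            if self.autoinit:
                # Restore the originally programmed address and count.
                self.addr = self.base_addr
                self.count = self.base_count
            else:
                # Channel stays idle until the CPU reprograms it.
                self.stopped = True
        return used


def addresses(channel, requests):
    """Addresses touched by a number of consecutive device requests."""
    return [channel.transfer() for _ in range(requests)]


print(addresses(Channel(0x1000, 4, autoinit=True), 6))
# [4096, 4097, 4098, 4099, 4096, 4097]
print(addresses(Channel(0x1000, 4, autoinit=False), 6))
# [4096, 4097, 4098, 4099, None, None]
```

With autoinitialize enabled, the fifth request wraps back to the start of the buffer, which is the ``circular'' behaviour the CPU has to keep up with. Without it, the last two requests go unserved.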