The following was posted in comp.os.cpm by Fred Scipione in January
2005. Fred is referring to Bruce's floppy disk routines, in this package,
which is distributed by me, Herb Johnson. I will forward any questions or
comments to Fred or Bruce if you send them to me.

For more information, please check my Web site:

http://retrotechnology.com/herbs_stuff/s_drives.html

Herb Johnson

---------------------------------------------------------------

Reading Bruce's code has inspired me to share the details on
using loop un-rolling to accommodate 2 MHz Z80's.  This allows
'wait-on-read' programmed I/O floppy controllers to be run (with
the proper timing margins to accommodate the speed variations
when different drives are used for writing and reading).  No code
changes are needed for faster clocks :-).  The sibling code for
write transfers and formatting is left as an exercise for the
reader.  Similar arguments apply for 1MHz CPUs and single density
or other 250k bps arrangements.
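
The timing argument behind this approach comes down to simple arithmetic,
and a rough Python sketch can check it.  The figures used (16 uS/byte,
+/-3% drive speed, 4 uS of controller overhead, 8 CPU clocks of bus
allowance, and the 115/56-clock loop totals with 65- and 33-clock
worst-case read delays) are taken from the comments and timing tables in
the listing.  The 3-byte and 2-byte spans are inferred from the 37.00 and
22.00 uS "max allowed" rows, and the function names are hypothetical, for
illustration only.

```python
# Illustrative check of the read-timing budget; all names are hypothetical.

def worst_case_byte_us(nominal_us=16.0, tolerance=0.03):
    """Shortest byte interval: written on a slow drive, read on a fast one."""
    return nominal_us * (1.0 - tolerance) / (1.0 + tolerance)

def max_allowed_delay_us(n_bytes, clock_mhz, byte_us=15.0,
                         fdc_overhead_us=4.0, bus_clocks=8):
    """n*15uS - 4uS - 8/clock_rate, as stated in the listing's comments."""
    return n_bytes * byte_us - fdc_overhead_us - bus_clocks / clock_mhz

def underrun_margin_us(delay_clocks, n_bytes, clock_mhz):
    """Margin left when a read is delayed by delay_clocks over n_bytes."""
    return max_allowed_delay_us(n_bytes, clock_mhz) - delay_clocks / clock_mhz

if __name__ == "__main__":
    print(f"worst-case byte interval: {worst_case_byte_us():.2f} uS")
    # 4x loop: 115 clocks per 4 bytes, worst delay 65 clocks over 3 bytes
    print(f"4x avg @2MHz: {115 / 4 / 2.0:.2f} uS/byte, "
          f"margin {underrun_margin_us(65, 3, 2.0):.2f} uS")
    # 2x loop: 56 clocks per 2 bytes, worst delay 33 clocks over 2 bytes
    print(f"2x avg @2MHz: {56 / 2 / 2.0:.2f} uS/byte, "
          f"margin {underrun_margin_us(33, 2, 2.0):.2f} uS")
```

The exact ratio 0.97/1.03 gives a little over 15.0 uS/byte; the post rounds
this to 94% and then budgets against 15.0 uS, which is slightly
conservative.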

The following assembly code is best viewed with a fixed-width font -

; Suppose you want the same (ROM) code for both 8080 and Z80 CPUs.
; The necessary tight loop timing for DD disk reads on a 2.0 MHz
; Z80 (w/ 1 clock min. wait for input) can be accomplished through
; loop un-rolling and careful attention to timing.
;
; Using a +/-3% speed tolerance, the read rate for a disk written
; on a slow drive and read on a fast drive will be 16uS/byte * 94%
; (0.97/1.03) = 15.04uS/byte.  Any loop that averages 15.0 uS/byte
; or better will stay synched through the input read 'wait' signal.
;
; The floppy controller buffers the read bytes, allowing nearly
; one full byte interval for read margin before under-runs.  It
; is prudent to subtract 2 floppy bit times (or 4uS) and 8 CPU
; clocks from the minimum byte interval to allow for the
; controller overhead and the CPU bus read state duration w/
; wait variations.  Thus, with 4x loop un-rolling, the maximum
; allowed interval from a synched read to a delayed read over
; 'n' bytes is n*15uS - 4uS - 8/clock_rate.
;
; To accommodate 4x loop un-rolling, sector size must be a
; multiple of 4 (which is always the case).  Thus, register 'B'
; is loaded w/ 128 or with 0 for 256 bytes per sector if the
; 'Normal:' entry is used.  For 512 or 1024 bytes per sector,
; set B to 127 or 255 and enter at 'Special:'.
;
; For 128 bytes per sector, this code can be used to bypass a
; BIOS buffer and transfer directly to a user DMA target at any
; address.
;
Normal:           ; enter here for 128 or 256 bytes/sector
; Adjust B for 4 pre-loop reads and loop unroll to 4x -
  MOV  A,B
  RAR
  RAR
  DCR  A
  ANI  03Fh
  MOV  B,A
Special:          ; enter here w/ special values for B
; 4 pre-loop reads to ensure CPU synched to floppy on loop entry -
  IN   fdcport    ;FDC port
  MOV  M,A        ;sector buffer
  INX  H          ;next loc.
Strange3:         ; enter here w/ B set for 4x + 3 bytes
  IN   fdcport    ;FDC port
  MOV  M,A        ;sector buffer
  INX  H          ;next loc.
Strange2:         ; enter here w/ B set for 4x + 2 bytes
  IN   fdcport    ;FDC port
  MOV  M,A        ;sector buffer
  INX  H          ;next loc.
Strange1:         ; enter here w/ B set for 4x + 1 bytes
  IN   fdcport    ;FDC port
  MOV  M,A        ;sector buffer
;Timing w/ 1 clock min. wait on input reads -
Floppy$byte:      ;              clocks     @2mhz      @4mhz
  IN   fdcport    ;FDC port      11 D      5.50 uS    2.75 uS
  INX  H          ;next loc.      7 E      3.50 uS    1.75 uS
  MOV  M,A        ;sector buffer  7 F      3.50 uS    1.75 uS
  INX  H          ;next loc.      7 G      3.50 uS    1.75 uS
  IN   fdcport    ;FDC port      11 H      5.50 uS    2.75 uS
  MOV  M,A        ;sector buffer  7        3.50 uS    1.75 uS
  INX  H          ;next loc.      7        3.50 uS    1.75 uS
  IN   fdcport    ;FDC port      11+w      5.50 uS    2.75 uS
  MOV  M,A        ;sector buffer  7        3.50 uS    1.75 uS
  INX  H          ;next loc.      7        3.50 uS    1.75 uS
  IN   fdcport    ;FDC port      11+w      5.50 uS    2.75 uS
  MOV  M,A        ;sector buffer  7 A      3.50 uS    1.75 uS
  DCR  B          ;byte counter   5 B      2.50 uS    1.25 uS
  JNZ  Floppy$byte ;             10 C      5.00 uS    2.50 uS
;
;Loop Total                     115       57.50 uS   28.75 uS
;Loop per-byte avg.              28.75    14.38 uS    7.19 uS
;Max delay of read (A..H/A..D)   65/33    32.50 uS    8.25 uS
;Max allowed @ 15uS/byte-4uS-8clks        37.00 uS      N.A.
;Min under-run margin of delayed reads     4.50 uS      N.A.
;
; end/tail of read routine goes here
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Note that 2x loop un-rolling can be used for 128 and 256
; byte sectors, if the BIOS sector buffer is aligned to end on
; a page boundary and 'INR L' is used to replace both 'INX H'
; and 'DCR B' (to decrease the loop overhead).  Sectors of
; 512 and 1024 bytes can be handled by a series of 2 or 4 such
; loops with a little 'glue' code between loops.  The net code
; size is about 1/2 for 128+256 byte sectors, slightly larger if
; 512 bytes is added, and about 2x larger w/ 1024 bytes included.
; Hardware with weird sector sizes can be accommodated through
; special entry values for HL at the proper entry points.
;
; A 4x loop un-roll with this method can be used for a 1/2
; color-crystal CPU clock speed of 1.79MHz.
;
; At 23 clocks per byte, total loop un-rolling would support a
; 1.64MHz clock rate, but require 1024 bytes of code for 128
; byte sectors.
;
Rd1024:            ; enter here for 1024 byte sectors
; 2 pre-loop reads to ensure CPU synched to floppy on loop entry -
  IN   fdcport    ;FDC port
  MOV  M,A        ;sector buffer
  INR  L          ;next loc.
Weird3:           ; entry for odd sizes 771..1023
  IN   fdcport    ;FDC port
  MOV  M,A        ;sector buffer
  INR  L          ;next loc.
Loop8x:
   IN  fdcport    ;FDC port
   MOV M,A        ;sector buffer
   INR L          ;next loc.
Weird3a:          ; entry for odd size 769?
   IN  fdcport    ;FDC port
   MOV M,A        ;sector buffer
   INR L          ;byte counter
  JNZ  Loop8x
  ; use 2 reads in-line to limit delay while advancing H -
Weird2a:          ; entry for even sizes 516..768?
  IN   fdcport    ;FDC port (33 clock delay here)
  INR  H          ;next page
  MOV  M,A        ;sector buffer
Weird2:           ; entry for odd sizes 515..767
  IN   fdcport    ;FDC port (33 + 23 ==> re-synched)
  INR  L          ;next loc.
  MOV  M,A        ;sector buffer
  INR  L          ;next loc.
Loop6x:
   IN  fdcport    ;FDC port
   MOV M,A        ;sector buffer
   INR L          ;next loc.
Weird2b:          ; entry for odd size 513?
   IN  fdcport    ;FDC port
   MOV M,A        ;sector buffer
   INR L          ;byte counter
  JNZ  Loop6x
  ; use 2 reads in-line to limit delay while advancing H -
  IN   fdcport    ;FDC port (33 clock delay here)
  INR  H          ;next page
  MOV  M,A        ;sector buffer
  IN   fdcport    ;FDC port (33 + 23 ==> re-synched)
  INR  L          ;next loc.
  MOV  M,A        ;sector buffer
  INR  L          ;next loc.
  ; fall through to 512 byte entry -
;
Rd512:            ; enter here for 512 byte sectors
; 2 pre-loop reads to ensure CPU synched to floppy on loop entry -
  IN   fdcport    ;FDC port
  MOV  M,A        ;sector buffer
  INR  L          ;next loc.
Weird1:           ; entry for odd sizes 259..511
  IN   fdcport    ;FDC port
  MOV  M,A        ;sector buffer
  INR  L          ;next loc.
Loop4x:
   IN  fdcport    ;FDC port
   MOV M,A        ;sector buffer
   INR L          ;next loc.
Weird1a:          ; entry for odd size 257?
   IN  fdcport    ;FDC port
   MOV M,A        ;sector buffer
   INR L          ;byte counter
  JNZ  Loop4x
  ; use 2 reads in-line to limit delay while advancing H -
  IN   fdcport    ;FDC port (33 clock delay here)
  INR  H          ;next page
  MOV  M,A        ;sector buffer
  IN   fdcport    ;FDC port (33 + 23 ==> re-synched)
  INR  L          ;next loc.
  MOV  M,A        ;sector buffer
  INR  L          ;next loc.
  ; fall through to 256 byte entry -
;
Read2x:           ; enter here for 128 and 256 byte sectors
; 2 pre-loop reads to ensure CPU synched to floppy on loop entry -
  IN   fdcport    ;FDC port
  MOV  M,A        ;sector buffer
  INR  L          ;next loc.
OddSize:          ; enter here with HL set for odd sizes 3..255
  IN   fdcport    ;FDC port
  MOV  M,A        ;sector buffer
  INR  L          ;next loc.
;Timing w/ 1 clock min. wait on input reads -
Loop2x:           ;              clocks     @2mhz      @4mhz
  IN   fdcport    ;FDC port      11 D      5.50 uS    2.75 uS
  MOV  M,A        ;sector buffer  7        3.50 uS    1.75 uS
  INR  L          ;next loc.      5        2.50 uS    1.25 uS
  IN   fdcport    ;FDC port      11+w      5.50 uS    2.75 uS
  MOV  M,A        ;sector buffer  7 A      3.50 uS    1.75 uS
  INR  L          ;byte counter   5 B      2.50 uS    1.25 uS
  JNZ  Loop2x     ;              10 C      5.00 uS    2.50 uS
;
;Loop Total                      56       28.00 uS   14.00 uS
;Loop per-byte avg.              28       14.00 uS    7.00 uS
;Max delay of read (A..D)        33       16.50 uS    8.25 uS
;Max allowed @ 15uS/byte-4uS-8clks        22.00 uS      N.A.
;Min under-run margin of delayed reads     5.50 uS      N.A.
;
; end/tail of read routine goes here
; 
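
As a cross-check on the byte counts, the following Python sketch models
the counting done by the two routines.  The first part models the
'Normal:' adjustment of B (two rotates through carry, a decrement, and a
6-bit mask), followed by 4 pre-loop reads and 4 reads per pass.  The
second part models the page-aligned variant, where the low address byte L
doubles as the byte counter.  It models the counting only, not the timing,
and the function names are hypothetical.

```python
# Model of the byte-count arithmetic only; all names are hypothetical.

def rar(a, carry):
    """8080 RAR: rotate A right through carry; returns (new_a, new_carry)."""
    return ((a >> 1) | (carry << 7)) & 0xFF, a & 1

def normal_adjust(b, carry=0):
    """Model of MOV A,B / RAR / RAR / DCR A / ANI 03Fh / MOV B,A."""
    a, carry = rar(b, carry)
    a, carry = rar(a, carry)
    a = (a - 1) & 0xFF              # DCR A
    return a & 0x3F                 # ANI 03Fh

def bytes_read_4x(b):
    """4 pre-loop reads, then 4 reads per pass (B=0 runs 256 passes)."""
    passes = b if b != 0 else 256
    return 4 + 4 * passes

def bytes_read_2x(start_l):
    """Reads from the 2-pre-loop-read entry until L wraps to 0."""
    if start_l & 1:
        raise ValueError("this entry point needs an even starting L")
    l, count = start_l, 0
    for _ in range(2):              # 2 pre-loop reads, no zero test
        count += 1
        l = (l + 1) & 0xFF
    while True:                     # 2 reads per pass, test after the second
        count += 2
        l = (l + 2) & 0xFF
        if l == 0:
            return count

if __name__ == "__main__":
    for size, b in ((128, 128), (256, 0)):
        for carry in (0, 1):        # incoming carry is masked off
            assert bytes_read_4x(normal_adjust(b, carry)) == size
    assert bytes_read_4x(127) == 512 and bytes_read_4x(255) == 1024
    assert bytes_read_2x(0x80) == 128 and bytes_read_2x(0x00) == 256
    print("byte counts agree with the listing's comments")
```

The incoming carry flag does not matter at 'Normal:' because any bit it
rotates into A is removed by the 6-bit mask.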