Multiply; Divide; Square Root; Abs(a-b); Negate - 503 bytes (3%)
- §1. Unsigned divide (24 bit / 16 bit)
- §2. Absolute difference (16 bit)
- §3. Negation (16 bit)
- §4. 16 bit multiply
- §5. 24 bit multiply
- §6. 32 bit square root
- §7. 48 bit square root
§1. Unsigned divide (24 bit / 16 bit).
24 bit dividend over 16 bit divisor, result is 24 bit quotient.
On Entry:
    X: VDU variable for 24 bit dividend
    Y: VDU variable for 16 bit divisor
On Exit:
    24 bit quotient result has overwritten the dividend (at VDU variable X)
    carry: if set, means a divide by zero error
.divide24by16bits = $bdd0
    ; Copy variables into zero page workspace
    STX .vduTempStoreDE             ; remember VDU variable for dividend

    ; Copy 24 bit dividend into zero page
    LDA .vduVariablesStart,X        ; read dividend
    STA .dividend0                  ; store in zp
    LDA .vduVariablesStart+1,X      ; read dividend
    STA .dividend1                  ; store in zp
    LDA .vduVariablesStart+2,X      ; read dividend
    STA .dividend2                  ; store in zp

    ; Copy 16 bit divisor into zero page
    LDA .vduVariablesStart,Y        ; read divisor
    STA .divisor1                   ; store in zp
    LDA .vduVariablesStart+1,Y      ; read divisor
    STA .divisor2                   ; store in zp

    ; Keep doubling divisor until the top bit is set
    LDX #8                          ; X is 8 + number of times divisor is doubled
    LDY #16                         ; loop counter, loop up to 16 times
    LDA .divisor2
    BMI .divideMain                 ; branch if done already
-
    INX
    ASL .divisor1                   ; } scale up 16 bit divisor
    ROL .divisor2                   ; }
    BMI .divideMain                 ; branch if done
    DEY
    BNE -

    ; If we get this far, it must be a zero divisor - so error out!
    SEC                             ; carry set = divide by zero error
    JMP .divideCopyBackQuotientResult

.divideMain = $be09
    ; At this point, X is the loop counter, where
    ; X = 8 + number of times the divisor has been scaled up.
    ; The loop below runs X+1 times (DEX / BPL).
    LDA #0
    STA .divisor0
    STA .quotient0                  ; zero quotient (result)
    STA .quotient1
    STA .quotient2

.divideLoop = $be17
    ; temp = dividend - divisor
    SEC
    LDA .dividend0                  ; dividend
    SBC .divisor0                   ; subtract divisor
    STA .divideTemp0                ; store temp
    LDA .dividend1                  ; dividend
    SBC .divisor1                   ; subtract divisor
    STA .divideTemp1                ; store temp
    LDA .dividend2                  ; dividend
    SBC .divisor2                   ; subtract divisor
    STA .divideTemp2                ; store temp
    BCC .divideSkip

    ; dividend = temp
    LDY #2                          ; loop counter, loop 3 times to set 3 byte dividend
-
    LDA .divideTemp0,Y              ; temp
    STA .dividend0,Y
    DEY
    BPL -

.divideSkip = $be40
    ROL .quotient0                  ; shift quotient
    ROL .quotient1
    ROL .quotient2
    LSR .divisor2                   ; shift divisor
    ROR .divisor1
    ROR .divisor0
    DEX
    BPL .divideLoop
    CLC                             ; carry clear = valid result

.divideCopyBackQuotientResult = $be56
    ; Write quotient back into dividend VDU variables
    LDX .vduTempStoreDE             ; recall VDU variable for dividend
    LDA .quotient0                  ; and write quotient into it
    STA .vduVariablesStart,X
    LDA .quotient1
    STA .vduVariablesStart+1,X
    LDA .quotient2
    STA .vduVariablesStart+2,X
    RTS
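As a cross-check on the divide routine above, here is a minimal Python sketch of the same normalise-then-subtract scheme. It is only a model: the function name `divide24by16` and the `(quotient, carry)` return shape are assumptions made for illustration, and on a zero divisor it returns `None` for the quotient, whereas the routine copies back whatever was left in the quotient workspace.

```python
def divide24by16(dividend, divisor):
    """Model of the 24 bit / 16 bit unsigned divide. Returns (quotient, carry)."""
    dividend &= 0xFFFFFF
    divisor &= 0xFFFF

    x = 8    # models X: 8 + number of times the divisor is doubled
    y = 16   # models Y: at most 16 doublings are attempted
    while not divisor & 0x8000:
        x += 1
        divisor = (divisor << 1) & 0xFFFF
        if divisor & 0x8000:
            break
        y -= 1
        if y == 0:
            return None, 1          # carry set: divide by zero

    # The 16 bit divisor lives in divisor1/divisor2 with divisor0 zeroed,
    # so it starts out a further 8 bits up.
    scaled = divisor << 8
    quotient = 0
    for _ in range(x + 1):          # DEX / BPL runs X+1 times
        bit = 0
        if dividend >= scaled:      # SBC leaves carry set when no borrow
            dividend -= scaled
            bit = 1
        quotient = ((quotient << 1) | bit) & 0xFFFFFF
        scaled >>= 1
    return quotient, 0              # carry clear: valid result
```

For example, `divide24by16(1000000, 7)` gives `(142857, 0)`. The value left in `dividend` at the end of the loop is the division remainder, which the routine above does not copy back.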
§2. Absolute difference (16 bit).
Get the absolute value of the difference between VDU variables Y and X, and store it in VDU variable A.
    vdu[A] = abs(vdu[Y] - vdu[X])
On Entry:
    A: VDU variable for the result
    X,Y: VDU variables
On Exit:
    vdu[A]: the result
    carry is set if vdu[Y] < vdu[X]
.absDifference16 = $be6b
    STA .vduTempStoreDF             ; remember VDU variable for the result
    SEC
    LDA .vduVariablesStart,Y
    SBC .vduVariablesStart,X
    STA .vduTempStoreDE             ; .vduTempStoreDE = vdu[Y]-vdu[X] (low)
    LDA .vduVariablesStart+1,Y
    SBC .vduVariablesStart+1,X
    LDX .vduTempStoreDF             ; recall VDU variable for the result
    STA .vduVariablesStart+1,X      ; store high byte of result
    ROL                             ; carry = top bit (negative flag)
    LDA .vduTempStoreDE
    STA .vduVariablesStart,X        ; store low byte of result
    BCC .return11                   ; if not negative then branch (return)

    ; Negate result
    LDA #0
    SBC .vduVariablesStart,X
    STA .vduVariablesStart,X
    LDA #0
    SBC .vduVariablesStart+1,X
    STA .vduVariablesStart+1,X
    SEC
.return11 = $be9a
    RTS
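The routine above takes its sign from the top bit of the 16 bit difference rather than from the borrow of the subtraction. A small Python sketch of that behaviour follows; the function name `abs_difference16` and the `(result, carry)` return shape are assumptions for illustration.

```python
def abs_difference16(y_value, x_value):
    """Model of the 16 bit absolute difference. Returns (result, carry)."""
    diff = (y_value - x_value) & 0xFFFF   # 16 bit wrap-around subtract
    carry = diff >> 15                    # ROL moves the top bit into carry
    if carry:
        diff = (0 - diff) & 0xFFFF        # negate: 0 - diff, 16 bit
    return diff, carry
```

Because the test is on the top bit, "vdu[Y] < vdu[X]" holds in the signed sense, provided the difference fits in 16 bits: `abs_difference16(0xFFFF, 1)` treats `0xFFFF` as -1 and returns `(2, 1)`.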
§3. Negation (16 bit).
Negate the 16 bit value in VDU variable X and place the result in VDU variable Y.
On Entry:
    X: offset into the VDU variables of the value to negate
    Y: offset into the VDU variables that receives the result
.negateVDUVariableXIntoY = $be9b
    SEC
    LDA #0
    SBC .vduVariablesStart,X
    STA .vduVariablesStart,Y
    LDA #0
    SBC .vduVariablesStart+1,X
    STA .vduVariablesStart+1,Y
    RTS
§4. 16 bit multiply.
16 bit x 16 bit unsigned multiply, 32 bit result. This is implemented by calling a 24x24 bit multiply with the top bytes zero.
[NOTE: This is far from efficient in performance!]
On Entry:
    .multiplicand01: 16 bit number
    .multiplier01: 16 bit number
On Exit:
    product: 48 bit result (top two bytes zero), shares memory with multiplier
.multiply16x16 = $bead
    LDA #0
    STA .multiplier2
    STA .multiplicand2
    ; fall through...
§5. 24 bit multiply.
24 bit x 24 bit unsigned multiply, 48 bit result.
On Entry:
    .multiplicand012: 24 bit number
    .multiplier012: 24 bit number
On Exit:
    product: 48 bit result, shares memory with multiplier
.multiply24x24 = $beb5
    LSR .multiplier2                ; }
    ROR .multiplier1                ; } rotate 24 bit multiplier right
    ROR .multiplier0                ; }
    ; Carry contains the lowest bit of the multiplier rotated out
    LDA #0                          ; }
    STA .product5                   ; } zero the top half of the result
    STA .product4                   ; }
    STA .product3                   ; }
    LDY #23                         ; loop counter: loops 24 times

.multiplyLoop = $becb
    BCC .skipAdd                    ; carry contains the lowest bit from the multiplier. branch if not set
    CLC
    LDA .multiplicand0
    ADC .product3
    STA .product3
    LDA .multiplicand1
    ADC .product4                   ; product345 += multiplicand
    STA .product4
    LDA .multiplicand2
    ADC .product5
    STA .product5

.skipAdd = $bee9
    ; Shift the 48 bit result right
    CLC                             ; [BUG: this CLC instruction should be removed to make top bit set numbers produce the correct result]
    LDX #5                          ; loop counter: loops 6 times
-
    ROR .product0,X
    DEX
    BPL -
    ; Carry contains the lowest bit of the multiplier rotated out
    ; (remember that the multiplier shares memory with product)
    DEY
    BPL .multiplyLoop
    RTS
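To make the effect of the CLC noted in the listing concrete, the sketch below models the shift-and-add loop in Python. The function name `multiply24x24` and the `keep_clc` flag are assumptions introduced for illustration: with `keep_clc=True` the carry out of the 24 bit add is discarded before the shift, as in the routine above; with `keep_clc=False` it is rotated into the top of the product.

```python
def multiply24x24(multiplicand, multiplier, keep_clc=True):
    """Model of the 24 x 24 bit shift-and-add multiply (48 bit product)."""
    multiplicand &= 0xFFFFFF
    # The multiplier occupies the low 24 bits of the 48 bit product.
    product = multiplier & 0xFFFFFF
    carry = product & 1             # initial LSR/ROR/ROR rotates out bit 0
    product >>= 1

    for _ in range(24):             # LDY #23 ... DEY / BPL: 24 passes
        if carry:
            # product345 += multiplicand, keeping the carry out of the add
            high = (product >> 24) + multiplicand
            carry = high >> 24
            product = ((high & 0xFFFFFF) << 24) | (product & 0xFFFFFF)
        if keep_clc:
            carry = 0               # the CLC throws the add's carry away
        # 48 bit rotate right through carry
        next_carry = product & 1
        product = (carry << 47) | (product >> 1)
        carry = next_carry
    return product
```

In this model the two variants agree whenever the multiplicand is below `0x800000`, which covers every call through the 16 bit entry point since it zeroes the multiplicand's top byte. They diverge for inputs such as `multiply24x24(0xFFFFFF, 3)`, where the add overflows 24 bits.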
§6. 32 bit square root.
Find square root of a 32 bit unsigned integer.
[NOTE: This just falls through into the 48 bit square root function with the high bytes zero. This is not a good strategy for the best performance.]
On Entry:
    .sqrtNumber0123: number to sqrt (32 bits)
On Exit:
    .sqrtResult01: root (16 bits)
    (.sqrtRemainder0123: 32 bit remainder is unused)
.sqrt32 = $bef6
    LDA #0
    STA .sqrtNumber4
    STA .sqrtNumber5
    ; fall through...
§7. 48 bit square root.
Find square root of a 48 bit unsigned integer.
On Entry:
    .sqrtNumber012345: number to sqrt (48 bits)
On Exit:
    .sqrtResult012: root (24 bits)
    (.sqrtRemainder0123: 32 bit remainder is unused)
.sqrt48 = $befe
    LDA #0
    STA .sqrtResult0
    STA .sqrtResult1
    STA .sqrtResult2
    STA .sqrtRemainder0
    STA .sqrtRemainder1
    STA .sqrtRemainder2
    STA .sqrtRemainder3
    LDX #24                         ; loop counter

.sqrt_loop = $bf17
    ; Shift number into remainder
    ASL .sqrtNumber0
    ROL .sqrtNumber1
    ROL .sqrtNumber2
    ROL .sqrtNumber3
    ROL .sqrtNumber4
    ROL .sqrtNumber5
    ROL .sqrtRemainder0
    ROL .sqrtRemainder1
    ROL .sqrtRemainder2
    ROL .sqrtRemainder3

    ; Shift number into remainder
    ASL .sqrtNumber0
    ROL .sqrtNumber1
    ROL .sqrtNumber2
    ROL .sqrtNumber3
    ROL .sqrtNumber4
    ROL .sqrtNumber5
    ROL .sqrtRemainder0
    ROL .sqrtRemainder1
    ROL .sqrtRemainder2
    ROL .sqrtRemainder3

    ; temp = root*2
    LDA .sqrtResult0
    ASL
    STA .sqrtTemp0
    LDA .sqrtResult1
    ROL
    STA .sqrtTemp1
    LDA .sqrtResult2
    ROL
    STA .sqrtTemp2
    LDA #0
    ROL
    STA .sqrtTemp3

    ; temp = temp*2 +1
    SEC
    ROL .sqrtTemp0
    ROL .sqrtTemp1
    ROL .sqrtTemp2
    ROL .sqrtTemp3

    ; Which is bigger, temp or remainder?
    SEC                             ; temp = remainder - temp
    LDA .sqrtRemainder0
    SBC .sqrtTemp0
    STA .sqrtTemp0
    LDA .sqrtRemainder1
    SBC .sqrtTemp1
    STA .sqrtTemp1
    LDA .sqrtRemainder2
    SBC .sqrtTemp2
    STA .sqrtTemp2
    LDA .sqrtRemainder3
    SBC .sqrtTemp3
    STA .sqrtTemp3

    ; If temp > remainder then branch
    BCC .sqrt_next                  ; if (carry clear) then branch (forward)

    ; remainder = temp
    STA .sqrtRemainder3
    LDA .sqrtTemp2
    STA .sqrtRemainder2
    LDA .sqrtTemp1
    STA .sqrtRemainder1
    LDA .sqrtTemp0
    STA .sqrtRemainder0

.sqrt_next = $bfb7
    ; root = root * 2 + carry
    ROL .sqrtResult0                ; shift result left and add carry
    ROL .sqrtResult1
    ROL .sqrtResult2
    DEX
    BEQ +                           ; if (done 24 times) then branch
    JMP .sqrt_loop                  ; loop back
+
    RTS

!if MACHINE = ELECTRON {
    ; Make it relatively obvious this isn't an original 1980s Acorn version for the Electron.
    !text "Steve 2020"
}

.unused = $bfc7
    ; skip to $bfdb, filling with zeros
    !align $ffff, $bfdb, 0
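The 48 bit square root routine above appears to be the usual digit-by-digit (two bits per pass) integer square root. The Python sketch below models that loop so it can be checked against known values; the function name `sqrt48` is an assumption, and the model returns only the root because the routine's remainder is documented as unused.

```python
def sqrt48(number):
    """Model of the 48 bit integer square root: returns the 24 bit root."""
    number &= 0xFFFFFFFFFFFF
    root = 0
    remainder = 0
    for _ in range(24):                       # LDX #24: one root bit per pass
        # Shift the top two bits of the number into the remainder.
        remainder = ((remainder << 2) | (number >> 46)) & 0xFFFFFFFF
        number = (number << 2) & 0xFFFFFFFFFFFF

        temp = (root << 2) | 1                # temp = (root*2)*2 + 1
        if remainder >= temp:                 # carry set after the subtract
            remainder -= temp                 # remainder = remainder - temp
            root = (root << 1) | 1            # root = root*2 + 1
        else:
            root <<= 1                        # root = root*2 + 0
    return root
```

For example, `sqrt48(144)` returns `12`, and `sqrt48(10)` returns `3` since the result is rounded down. The 32 bit entry point corresponds to calling the same function with the top 16 bits of `number` zero.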