So far we've
- Reversed a string
- Shifted bits left and right (mostly right)
- Brute forced a known simple cipher
- Channeled our inner John Connor by cracking ATM machines
- Searched through a large number looking for matches
- Learned how to use XOR to recover data
- Created a Linear Feedback Shift Register to generate better random numbers
- Implemented a "real" but outdated cipher: RC4
Disclaimer: I'm sure I'll get an angry email from a mathematician about this, but we're going for understanding here. No one expects to be looking for large primes on a Commodore 64. Calm down.
A primer on primes
A prime number is a natural number that has exactly two distinct natural number divisors: the number 1 and itself.
Primes are used in all of the modern secure ciphers I can think of. But why?
Short short version: Primes are useful for cryptography because two large primes are computationally "easy" to multiply together, but extremely difficult to determine the factors when you want to go the other direction.
Many methods use pre storage of an array to mark results on. We're limited to 64k of RAM unless we use an REU, and even the largest of those is 16MB (limited by the number of banks which is stored in a single byte so
256*64k=16MB). We'll need to do a little more work and just loop through values and store them once we find them unless we want to limit the total number we can look for. We'll lose some speed but gain the ability to go as high as we want assuming we had enough time to wait for a 1Mhz processor to do this work.
Since we're in learning all the things mode, let's just try and work it out on our own and see where we end up.
Let's find the primes between 0 and 224.
What do we know?
- We know that 2 is the only even prime, so let's just count it and process from 3 and up.
- We can skip all even numbers other than 2.
- We can look through all divisors from 3 to the number we're testing for and check if there is a remainder. If we find none that divide evenly, our number is a prime.
- In step 3 we can save a lot of these checks by reducing them to the square root. Anything greater than the root would be a multiple of an integer we already checked.
This is easy to understand and it's obviously pretty computationally labor intensive. Let's do it in BASIC so it'll be super slow!
We're in Simon's BASIC again. I actually really like it and we don't need much RAM for any of these problems so far, so the missing 8K it takes up isn’t an issue. Plus it was written by a teenager, David Simons which just makes me love it more. Someone suggested I do a mod function and load it into the USR vector instead of using Simon's BASIC for these.
You CAN do
This looks really messy to me. You'd have to GOSUB this after assigning A and B every time. And GOSUB is pretty slow. But if you need to be pure BASIC 2.0, then that's one option.
Implement our labor intensive, er of love program
Using the 4 steps above, we get something like:
100 C=0:N=0 110 FOR N=1TO10000 200 IF N < 1 THEN PRINT N;:GOTO 300 210 IF N = 2 THEN PRINT N;:C=C+1:GOTO300 220 IF N > 2 AND MOD(N,2)=0 THEN GOTO 300 230 P=INT(SQR(N)) 240 FOR I= 3 TO P STEP 2 250 IF MOD(N,I)=0 THEN GOTO 300 260 NEXT I 270 C=C+1:PRINT N 300 NEXT N 400 PRINT:PRINT"PRIMES: ";:PRINT C
Let's break it down in case it's not clear.
Line 210 - 2 is the only even prime. Count it.
Line 220 - Skip all even numbers greater than 2
Line 230 - We are iterating over all of the possible odd factors but we can skip anything greater than the square root of N since those would be multiples of what we've already checked.
It's slow, but sorta satisfying isn't it?
SIMON'S BASIC Limits
As a funny result of working on the benchmark table later down the post, I found out that the mod function in SIMON'S BASIC is 16 bits. It can only do modulus to 65535. I suppose I should have expected that.
Assembly Version, still overly labor intensive but faster
We already know from this silly benchmarking post that these machines take about a day to COUNT to 232 so we're going to stick to 24 bits here to keep this from taking TOO long.
We need a Square Root routine
There is an excellent and easy to read 24 Bit square-root routine by Verz on Codebase64. Back to "let's not reinvent the wheel" so I'm using this one with credit to the author.
Let's do this
We're going to try very hard to make this as close to the BASIC version as possible. I think the end result is pretty close and easy to understand. At least as easy to understand as Assembly code can be.
findprimes .block loop lda n sta dividend lda n+1 sta dividend+1 lda n+2 sta dividend+2 lda #2 sta divisor jsr divide24 lda remainder bne cont jmp notprime ;divides by 2 cont lda n sta square lda n+1 sta square+1 lda n+2 sta square+2 jsr sqrt24 lda p clc adc #1 sta p bcc aheadp lda p+1 adc #0 sta p+1 aheadp lda #2 sta i lda #0 sta i+1 inner lda n sta dividend lda n+1 sta dividend+1 lda n+2 sta dividend+2 lda i+1 lda i sta divisor lda i+1 sta divisor+1 lda i+2 sta divisor+2 jsr divide24 lda remainder beq notprime clc lda i adc #1 sta i bcc aheadc lda i+1 adc #0 sta i+1 aheadc lda i+1 cmp p+1 bne again lda i cmp p bcs doneinner again jmp inner doneinner prime lda n+2 sta dvalue+2 lda n+1 sta dvalue+1 lda n sta dvalue jsr printdec lda #32 jsr $ffd2 notprime clc lda n adc #1 sta n bcc ahead1 lda n+1 adc #0 sta n+1 bcc ahead1 lda n+2 adc #0 sta n+2 ahead1 ;lda n+2 ;cmp #$ff ;bne againc lda n+1 cmp #$ff bne againc lda n cmp #$ff bcs done againc jmp loop done ; how long did this take? lda $a2 sta dvalue lda $a1 sta dvalue+1 lda $a0 sta dvalue+2 jsr printdec ;prints value as decimal rts .bend
i .byte 0,0,0 n .byte 0,0,0 p = sqrt
This isn't the fastest way to do this and there are several areas that could be improved pretty simply.
We'll do A LOT of optimization when we go through my SHA-256 implementation, but I've kept it readable here as much as possible since there are probably 100+ algorithms to do this, many of which are dramatically more efficient for a computer.
I'd love to chat about these if you're so inclined. mike at imapenguin dot com is a good place to start.
And now we get to the benchmarking part of the show. I'll stick to NTSC here since this is all raw math and it will be slightly faster.
I also did exactly the same functions in Python 3.9 on a Macbook just to give some context to an average (but very popular) programming language on a mid 2021 laptop for speed comparisons. Not an Apples to Apples by any stretch, but interesting anyway.
|Platform||From 1 to:||Time.|
|Simon's BASIC||256||17 seconds|
|Simon's BASIC||65536||9819 (2.7 hours)|
|Assembly||65536||3733 seconds ( 1 hourish)|
|Python||4294967296||407516 seconds (4 days 17 hours)|
Our method works okayish on the Commodore up to numbers of about a million, but after that having to loop through so many iterations becomes pretty inefficient. Later in the series we'll explore a few methods to make this faster using some pretty clever shortcuts.
Here's the python version to 65535, happens so fast you can't really see much.
Let's compare them from slowest to fastest
- Make simple improvements to the Commodore versions to get some speed out of it
- Add a counter for the number of primes like in the BASIC version
This method really is inefficient
I let the python program run to 232 and it took a while...