So far in this 8 part series on doing weird things with Ciphers on old computers we've:
- Reversed a string
- Shifted bits left and right (mostly right)
- Brute forced a known simple cipher
- Channeled our inner John Connor by cracking ATM machines
- Searched through a large number looking for matches
- Learned how to use XOR to recover data
- Created a Linear Feedback Shift Register to generate better random numbers
"In September, 1994 someone posted source code to the Cyberpunks mailing list-anonymously. It quickly spread to the Usenet newsgroup sci.crypt, and via the Internet to ftp sites around the world. Readers with legal copies of RC4 confirmed compatibility. RDA Data Security, Inc. tried to put the genie back into the bottle, claiming that it was still a trade secret even through it was public, it was too late. It has since been discussed and dissected on Usenet, distributed at conferences, and taught in cryptography courses." - Applied Cryptography, Bruce Schneier, 1997
We’re going to turn to the above book for this one. If there was ever a Crypto 101 bible, Applied Cryptography by the mighty Bruce has to be it.
We’re going to tackle RC4 today that was in widespread use for quite a long time.
RC4 was written by Ron Rivest in 1987 for RSA Security and is both simple in design and very fast.
It was the basis for encryption in:
and is/was optional in many others including
- Secure shell
- Remote Desktop Protocol
Warning: RC4 was once a premier cipher, used in millions of systems, but has been broken for quite some time. Use this for learning, not actual cryptography.
How it works
The concept is simple enough (for an encryption protocol):
This is word for word from Applied Cryptography p 397-398 with very minor edits to make it play nice on a web page:
It has a 8 * 8 S-box:
s0,s1,...,S255The entries are a permutation of the numbers 0 through 255, and the permutation is a function of the variable-length key. It has two counters, i and j, initialized to zero.
To generate a random byte, do the following:
i = (i + 1) mod 256 j = (j + Si) mod 256 swap Si and Sj t=(Si +Sj)mod256 K = St
The byte K is XORed with the plaintext to produce ciphertext or XORed with the ciphertext to produce plaintext. Encryption is fast—about 10 times faster than DES.
Initializing the S-box is also easy. First, fill it linearly so:
S0=0,S1=1,...,S255=255 Then fill another 256-byte array with the key, repeating the key as necessary to fill the entire array
K0,K1,...,K255 Set the index j to zero. Then:
for i = 0 to 255: j=(j+Si +Ki)mod256 swap Si and Sj
We're going to simply encrypt the phrase "ARE YOU KEEPING UP?" with the key "SECRET".
Note that I'm using uppercase here, this will make the already complicated Assembly code easier since the Commodore is going to store these codes anyway. We want the output in hexadecimal. We've tackled printing hex in previous parts of this series, but it will present a challenge in the BASIC version in a few minutes.
This will be easier to visualize with some Python code. The pseudo-code SEEMS complicated, but it's really not. Python has a way of making concepts like this easy to understand. It's no mystery it has become one of the most popular programming languages and stayed at the top for decades.
#!/usr/bin/env python def rc4encrypt(key, data): # First initialize the S-Box j = 0 S = list(range(256)) for i in list(range(256)): j = (j + S[i] + ord(key[i % len(key)])) % 256 S[i], S[j] = S[j], S[i] #swap places i = 0 j = 0 for char in data: i = (i + 1) % 256 j = (j + S[i]) % 256 S[i], S[j] = S[j], S[i] # swap places print(hex(ord(char) ^ S[(S[i] + S[j]) % 256]), end =" ") if __name__ == '__main__': rc4encrypt('SECRET','ARE YOU KEEPING UP?')
So we create the S-Box and then iterate over it, swapping positions before the XOR for each of the characters in the message. If you're not familiar
ord(char) just returns the ASCII number of the character we're currently working in. That
ord will byte me later as we'll see.
This outputs the encrypted message:
0xdd 0x3c 0x29 0x5f 0x56 0x6f 0x76 0xe8 0xa9 0x23 0xf9 0xe3 0x28 0x4c 0xa2 0x27 0xf9 0x39 0x3e
Wasn't too difficult right?
Just for fun, I implemented this in Commodore C64 BASIC. I don't have an easy way to copy the code into a post without retyping it all, so you'll just have to live with video.
The BASIC version works fine but it's very very slow. It takes around 10 seconds to actually do the S-Box generation and then encrypt.
Since we want to match the Assembly version below as much as possible, the outputs need to be in hexadecimal. BASIC 2.0 does not natively support printing in hex, so there's a GOSUB at the end to convert and print them. We're using Simons BASIC here again for the XOR and MOD functions, but even it doesn't have a decimal to hex conversion. Our GOSUB at the end adds a TON of processing time to this. It DOES look cool though, and it works so we'll just go with it.
Note: This works... but don’t go using it as a model for elegant code. We could probably spend a few weeks refining it if you wanted to publish it for widespread use commercially or something like that. That's not the point here.
Understanding the cipher is the core concept and working code that's “easy” to follow is the goal.
We'll be using the
divide24 routine from Part 7 of this series to get our modulus function. Get it from there if you're not familiar.
We're also using a couple zero page labels, zp1 and zp2. These could certainly be better named, but they are temporary. Call them what you want. There are MANY parts of this that could be optimized where I did it "the long way" to keep the readability in tact. I've tried to keep everything as close to the naming conventions Mr Shneier uses in Applied Cryptography.
We'll need some places to hold things:
;divide24 remainder .byte 0,0,0 dividend .byte 0,0,0 divisor .byte 0,0,0 answer = dividend message .null "are you keeping up?" messagelen .byte 0 cipher .null " " key .null "secret" keylen .byte 0 s .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 .byte 0,0,0,0,0,0,0,0 i .byte 0 j .byte 0 count .byte 0
And then we've got a little routine to clear the memory where the division will take place:
cleardiv lda #0 sta dividend sta dividend+1 sta dividend+2 sta divisor sta divisor+1 sta divisor+2 rts
RC4 makes use of a position swap between
S[i] and S[j] twice. It's pretty straightforward:
swap ldx i ldy j lda s,x sta zp2 lda s,y sta s,x lda zp2 sta s,y
Filling the S-Box is first.
dosbox ;it's already initialized with zeros, but we'll do it again ;to match the book .block ldx #$00 filloop txa sta s,x inx bne filloop lda #$00 sta i sboxloop jsr cleardiv lda i sta dividend lda keylen sta divisor jsr divide24 lda remainder tay lda key,y sta zp2 clc ldx i lda s,x adc zp2 sta zp2 adc #0 sta zp2+1 lda j clc adc zp2 sta dividend adc zp2+1 sta dividend+1 lda #$00 sta divisor lda #$01 sta divisor+1 jsr divide24 lda remainder sta j swap ldx i ldy j lda s,x sta zp2 lda s,y sta s,x lda zp2 sta s,y skip inc i ldx i cpx #0 bne sboxloop done rts .bend
You have got to be kidding me
As I was writing this version, I got really really stuck. For days. I started to question if I should even bother to make an Assembly version of these things at all. The BASIC version took 30 minutes after all and it does work on a vintage computer.
Why am I even here? What's the meaning of it all?!? Okay, pull yourself together and stop being a baby Michael.
If you code long enough you’ll run into what I did.
My python version above uses
ord() to get the ASCII representation of the character during the last real line of work. As I was writing this Assembly version, I must have transposed
ord to ora in my mind. I banged my head on the rest of the week and started to doubt my suitability in doing this series.
On a Saturday morning I went through it character by character, explaining it to the cat (my rubber duck) and then finally saw it.
OR is quite different from
XOR when you're looking for accurate results.
Anyway, try and avoid this sort of mistake. And WHEN you make one like it, remember that there is a reason you’re getting the result you’re getting. It’s not magic. Mostly. Or is it?
Now for the encryption loop and position swap.
doencrypt .block ldx #0 stx i stx j stx count loop jsr cleardiv clc lda i adc #1 sta dividend adc #0 sta dividend+1 lda #$00 sta divisor lda #$01 sta divisor+1 jsr divide24 lda remainder sta i doj jsr cleardiv ldx i clc lda j adc s,x sta dividend adc #0 sta dividend+1 lda #$00 sta divisor lda #$01 sta divisor+1 jsr divide24 lda remainder sta j swap ldx i ldy j lda s,x sta zp2 lda s,y sta s,x lda zp2 sta s,y doxor jsr cleardiv ldx i ldy j clc lda s,x adc s,y tax lda s,x sta zp2 ldx count lda message,x eor zp2 sta dividend lda #$00 sta divisor lda #$01 sta divisor+1 jsr divide24 lda remainder ldx count sta cipher,x skip inc count lda count cmp messagelen ;this is actually more than 128 away to bne back to the loop ;at the top, so we'll test for the end and do a jmp if ;we're not there beq done jmp loop done rts .bend
Benchmarking for fun and profit
Reiterating once again that NO EFFORT has been made to optimize the code for speed. So this is just for funzies:
I made a 128 bit key, which was a realistic key size when RC4 was in widespread use and we can do:
Not great, but not terrible. It'll be interesting to compare to the more complicated ciphers on this platform in a completely non scientific manner.
- Optimize parts of the code and benchmark. How much speed can you squeeze out of the 6510/6502 to do this sort of work?
- Implement this on a different 8 Bit machine