Making and breaking cyphers on Commodore Part 15 - Letter frequency analysis
By Michael Doornbos
The frequency distribution of letters in English text varies with the source, but a common reference is the letter frequency analysis attributed to information theorist Claude Shannon. Here are approximate frequency percentages for the 26 English letters, listed from most to least common:
| Letter | Frequency |
|---|---|
| E | 12.02 |
| T | 9.10 |
| A | 8.12 |
| O | 7.68 |
| I | 7.31 |
| N | 6.95 |
| S | 6.28 |
| R | 6.02 |
| H | 5.92 |
| D | 4.32 |
| L | 3.98 |
| U | 2.88 |
| C | 2.71 |
| M | 2.61 |
| F | 2.30 |
| Y | 2.11 |
| W | 2.09 |
| G | 2.03 |
| P | 1.82 |
| B | 1.49 |
| V | 1.11 |
| K | 0.69 |
| X | 0.17 |
| Q | 0.11 |
| J | 0.10 |
| Z | 0.07 |
The same data as a bar chart with percentages, using Plotly:
import plotly.graph_objects as go

# Letters and their approximate frequencies (%), most to least common
x = ['E', 'T', 'A', 'O', 'I', 'N', 'S', 'R', 'H', 'D', 'L', 'U', 'C', 'M', 'F', 'Y', 'W', 'G', 'P', 'B', 'V', 'K', 'X', 'Q', 'J', 'Z']
y = [12.02, 9.10, 8.12, 7.68, 7.31, 6.95, 6.28, 6.02, 5.92, 4.32, 3.98, 2.88, 2.71, 2.61, 2.30, 2.11, 2.09, 2.03, 1.82, 1.49, 1.11, 0.69, 0.17, 0.11, 0.10, 0.07]

# One bar per letter, with the percentage printed on each bar
fig = go.Figure([go.Bar(
    x=x,
    y=y,
    text=y,
    textposition='auto',
)])
fig.update_layout(
    title_text='Letter Frequency in English',
    xaxis_title="Letter",
    yaxis_title="% Frequency",
)
fig.show()
TK Pic of graph
These are averages; the exact frequencies depend on the text being analyzed. An author's style may favor some letters over others, and genres such as technical manuals or poetry can shift the distribution because of their specialized vocabulary.
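To see how far a particular text drifts from the averages, you can count its letters yourself. Here is a minimal Python sketch of that idea. The function name `letter_percentages` and the sample sentence are my own placeholders for illustration, not anything from the references.

```python
from collections import Counter
import string


def letter_percentages(text: str) -> dict[str, float]:
    """Return each letter A-Z as a percentage of all letters found in text."""
    # Fold to upper case and keep only the 26 English letters
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    counts = Counter(letters)
    total = len(letters)
    if total == 0:
        # No letters at all: report zero for everything instead of dividing by zero
        return {letter: 0.0 for letter in string.ascii_uppercase}
    return {letter: 100 * counts[letter] / total for letter in string.ascii_uppercase}


# Hypothetical sample text; swap in any passage you want to analyze
sample = "The quick brown fox jumps over the lazy dog"
percentages = letter_percentages(sample)

# Show the five most common letters in the sample
for letter, pct in sorted(percentages.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{letter}: {pct:.2f}")
```

A sample this short will not look much like the table. The longer the text, the closer its percentages tend to settle toward the averages.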
Super Chart from the Inner Space Anthology, pp. 29-30
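10 dim fr(25): rem one counter per letter, index 0-25
20 for i=0 to 25: fr(i)=0: next i
30 open 1,8,2,"benfrank2"
40 get#1,a$: s=st: rem save st; it is 64 on the final byte
45 c=asc(a$+chr$(0)): rem get# returns "" for a zero byte
50 l=c-65: if l>=0 and l<=25 then fr(l)=fr(l)+1: rem unshifted letters
60 l=c-193: if l>=0 and l<=25 then fr(l)=fr(l)+1: rem shifted letters
65 if s=0 then goto 40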
70 close 1
80 for i=0 to 5: print chr$(i+65);":";fr(i);: next i:?
90 for i=6 to 11: print chr$(i+65);":";fr(i);: next i:?
100 for i=12 to 17: print chr$(i+65);":";fr(i);: next i:?
110 for i=18 to 23: print chr$(i+65);":";fr(i);: next i:?
120 for i=24 to 25: print chr$(i+65);":";fr(i);: next i
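If the BASIC is hard to follow, the counting logic can be mirrored in Python. This is only a sketch of the same idea, not a port of the program: it assumes the file's bytes are already in memory, and the name `tally_petscii` and the sample bytes are made up for illustration. The two ranges are the PETSCII codes the listing checks: 65-90 for unshifted letters and 193-218 for shifted ones.

```python
def tally_petscii(data: bytes) -> list[int]:
    """Count letters in PETSCII bytes; index 0 is A and index 25 is Z."""
    counts = [0] * 26
    for byte in data:
        if 65 <= byte <= 90:
            # Unshifted letters, the same test as asc(a$)-65 in the listing
            counts[byte - 65] += 1
        elif 193 <= byte <= 218:
            # Shifted letters, the same test as asc(a$)-193 in the listing
            counts[byte - 193] += 1
        # Every other byte (spaces, punctuation, zero bytes) is ignored
    return counts


# Hypothetical sample: H, shifted E, L, shifted L, O, a space, and a zero byte
sample = bytes([72, 197, 76, 204, 79, 32, 0])
counts = tally_petscii(sample)

# Print only the letters that actually occurred
for index, count in enumerate(counts):
    if count:
        print(chr(index + 65), count)
```

Both ranges feed the same 26 counters, so a letter is tallied once regardless of which code the file uses for it.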
TK Pic of running
Resources
https://www.atarimagazines.com/compute/gazette/198705-speedscript.html
Gazette May 1987