QWERTY has been by far the most popular keyboard layout in the world. However, it was not optimized for typing English. Several key placements are clearly suboptimal. For example, "j" is placed in the home position of right index finger, one of the most comfortable positions, yet "j" is the least used key in English texts.
A lot of alternative keyboard layouts have been designed in the past 150 years since QWERTY was created. The most popular ones are Drovak and Colemak. Many of those alternative layouts are designed for a more comfortable English typing experience. However, they were mostly designed manually and there were very few quantitative or automatic approaches to this problem.
In this note, we provide a combinatorial algorithm for designing keyboard layouts optimized for English texts. We collect real text frequency data using publicly available books from Project Gutenberg, and optimized a quantitative objective with the following two goals in mind:
(1) The more frequently used keys should be closer to the home positions and pressed using stronger fingers.
(2) Alternating the fingers as much as possible (i.e. avoid typing two consecutive keys using the same finger).
We assume the keyboard has
A keyboard layout
# required packages:
import os
import pandas as pd
import numpy as np
from scipy.optimize import quadratic_assignment
alphabet="abcdefghijklmnopqrstuvwxyz,./' "
We introduce a simple model for quantifying the difficulty of typing a given text. This model was inspired by (and simplified from) the analysis in Colemak-DH.
Define the total effort for typing
Here,
This effort model has two components: BaseEffort and Penalty. From a high level, BaseEffort encourages the more frequently used keys to be closer to the home positions / pressed using stronger fingers; Penalty discourages pressing two consecutive keys using the same finger.
The definitions of these two components are both slight variants from Colemak-DH. However, Colemak-DH did not provide any algorithmic way to optimize their objective. To our best knowledge, all of the attempts to improve Colemak on the Colemak forum, has been done manually.
This is a measurement of the difficulty to press each key on the keyboard. See Here for details. For mathematical convenience, we also introduce an extra "space" key which has zero effort to press.
Recall that Penalty discourages pressing two consecutive keys using the same finger. We let
Specifically, when
We note that the scoring values chosen in Penalty and BaseEffort are highly subjective, as noted by the author of Colemak-DH. We decided to use their scoring values as is, for two reasons: (1) designing the scoring system is not the focus of this project; (2) the author has put some quite significant effort to design the system, which has also been accepted by the Colemak users community. Our algorithmic approach generalizes to other choice of the scoring system in a straightforward manner.
def load_config(kb="matrix30"):
# load the config of keyboard layout
if kb == "matrix30": # 10x3 matrix keyboard
# Total number of keys, the additional key is a space key
num_keys = 30 +1
num_rows = 3 # number of rows on the keyboard
keys_to_ignore = "" # the characters to be ignored in the text
effort_grid = [3.0, 2.4, 2.0, 2.2, 3.2, 3.2, 2.2, 2.0, 2.4, 3.0,
1.6, 1.3, 1.1, 1.0, 2.9, 2.9, 1.0, 1.1, 1.3, 1.6,
3.2, 2.6, 2.3, 1.6, 3.0, 3.0, 1.6, 2.3, 2.6, 3.2,
0]
# for each key, specify which finger is used to type this key
finger = [0,1,2,3,3, 6,6,7,8,9,
0,1,2,3,3, 6,6,7,8,9,
0,1,2,3,3, 6,6,7,8,9,
5]
# for each key, specify which row is it on the keyboard
row = [0,0,0,0,0,0,0,0,0,0,
1,1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2,2,
2]
# same finger bigram penalty
sfb_pen = [2.5,3,4]
return num_keys,num_rows,keys_to_ignore,effort_grid,finger,row,sfb_pen
def compute_effort_matices(num_keys,effort_grid,finger,row,sfb_pen):
# compute base_effort[i][j] , same_finger[i][j] and penalty[i][j]
base_effort_matrix = np.zeros([num_keys,num_keys])
same_finger_bigram_matrix = np.zeros([num_keys,num_keys])
penalty_matrix = np.zeros([num_keys,num_keys])
for i in range(num_keys):
for j in range(num_keys):
base_effort_matrix[i][j] = (effort_grid[i] + effort_grid[j])/2
# the special case for space
if finger[i] == 5:
base_effort_matrix[i][j] = effort_grid[j]
if finger[j] == 5:
base_effort_matrix[i][j] = effort_grid[i]
row_diff = np.abs(row[i] - row[j])
if finger[i] == finger[j]:
if i != j:
same_finger_bigram_matrix[i][j] = 1
penalty_matrix[i][j] = sfb_pen[row_diff]
return base_effort_matrix, same_finger_bigram_matrix, penalty_matrix
In this section we show how the proposed objective can be formulated as a Quadratic Assignment Problem, which leads to efficient optimization.
Since we assumed
Denote
This is an instance of Quadratic Assignment Problem. Here, the problem is to find a bijection
It is exactly the definition of TotalEffort if we choose
def char_to_int(c):
# return i iff alphabet[i] == c, otherwise treat it as a space key
char_to_id = dict(zip(alphabet,list(range(len(alphabet))) ))
return char_to_id[c] if c in alphabet else num_keys-1
def compute_bigram_freq(keys_to_ignore):
# compute the letter frequency.
# bigram_matrix[i][j] = N(alpha[i], alpha[j], s)
df_bigrams = pd.read_csv(os.path.join(os.path.dirname(os.path.realpath('__file__')), 'stats.csv'), header=0, sep=',', index_col=0)
bigram_matrix = np.zeros([num_keys,num_keys])
df_bigrams = df_bigrams / df_bigrams.sum(axis=0)
# ignore keys
for c in keys_to_ignore:
alphabet=alphabet.replace(c, "")
for key in df_bigrams['books'].keys():
i = char_to_int(key[0])
j = char_to_int(key[1])
bigram_matrix[i][j] += df_bigrams['books'][key]
# count total unigram/bigrams
total_nonspace_unigram = 1 - bigram_matrix[num_keys-1][num_keys-1]
total_nonspace_bigram = 1 - np.sum(bigram_matrix,0)[num_keys-1] - np.sum(bigram_matrix,1)[num_keys-1] + bigram_matrix[num_keys-1][num_keys-1]
return bigram_matrix, total_nonspace_unigram , total_nonspace_bigram
def print_layout(perm, show = True):
# print the layout in a nice looking way
ans = ""
disp = ["" for i in range(num_rows)]
for i in range(num_keys):
c = alphabet[perm[i]]
row_c = row[i]
ans= ans + c
disp[row_c] = disp[row_c] + c + ' '
if show:
for i in range(num_rows):
print(disp[i])
return ans
def permute_matrix(p):
# turns a permutation into corresponding permutation matrix
P = np.zeros([num_keys,num_keys])
for i in range(num_keys):
P[i][p[i]] = 1
return P
def layout_to_perm(s):
#turns a layout into corresponding permutation pi
ans = np.zeros(num_keys, dtype= np.int8)
for i in range(num_keys):
ans[i] = char_to_int(s[i])
return ans
# initialize all of the quantities required
num_keys,num_rows,keys_to_ignore,effort_grid,finger,row,sfb_pen = load_config()
base_effort_matrix, same_finger_bigram_matrix, penalty_matrix = compute_effort_matices(num_keys,effort_grid,finger,row,sfb_pen)
bigram_matrix, total_nonspace_unigram , total_nonspace_bigram = compute_bigram_freq(keys_to_ignore)
coeff_matrix = base_effort_matrix/total_nonspace_unigram + penalty_matrix/ total_nonspace_bigram
While Quadratic Assignment Problem is NP-Hard in general, scipy has provided a very efficient optimization algorithm for it. This algorithm is based on a first order (gradient-descent-like) algorithm over a continuous relaxation of the objective, followed by a greedy combinatorial local swap step. With a few thousands of random initializations, our algorithm runs in seconds on a 10-year-old Intel Core i7 2600 desktop.
The algorithm also supports fixing some of the key placements. This is very useful since it is desirable to keep some of the punctuation keys or common shortcuts in the original place, like ctrl-a, ctrl-q, ctrl-c, ctrl-v, etc.
Here we generate a few candidate layouts based on different levels of constraint: one with fixed qwazxcv and punctuations (like colemak), one with fixed qwazxc and punctuations (like colemak-dh), fixed punctuations only, or no constraints at all.
def search_layout(fixed, n_rand_init):
# fixed: the fixed assignment of keys
# n_rand_init: number of random initializations in the optimization step
# process the constraints
for i in range(len(fixed)):
fixed[i][1] = char_to_int(fixed[i][1])
fixed.append([num_keys - 1, num_keys - 1])
fixed.sort()
# step 1: optimize with random initialization and grad descent on continuous relaxation
options1 = {"P0": "randomized", "partial_match": fixed}
sol = min([quadratic_assignment(coeff_matrix, bigram_matrix, options=options1) for i in range(n_rand_init)], key=lambda x: x.fun)
# step 2: optimize with greedy swaps
options2 = {"partial_guess": np.array([np.arange(num_keys), sol.col_ind]).T, "partial_match":fixed}
sol = quadratic_assignment(coeff_matrix, bigram_matrix, method="2opt", options=options2)
layout = print_layout(sol.col_ind,show = True)
return layout
#generate layouts
fixed_qwazxc =[[0,'q'] ,
[1,'w'],
[9,'\''],
[10,'a'],
[13,'t'],
[20,'z'],
[21,'x'],
[22,'c'],
[27, ','],
[28, '.'],
[29, '/']]
fixed_qwazxcv =[[0,'q'] ,
[1,'w'],
[9,'\''],
[10,'a'],
[13,'t'],
[20,'z'],
[21,'x'],
[22,'c'],
[23,'v'],
[27, ','],
[28, '.'],
[29, '/']]
fixed_punc =[ [9,'\''],
[27, ','],
[28, '.'],
[29, '/']]
fixed_none =[]
n_rand_init = 1000
print("generated with fixed qwazxcv and punctuations:")
layout_qwazxcv = search_layout(fixed_qwazxcv, n_rand_init)
print("generated with fixed qwazxc and punctuations:")
layout_qwazxc = search_layout(fixed_qwazxc, n_rand_init)
print("generated with fixed punctuations only:")
layout_punc = search_layout(fixed_punc, n_rand_init)
print("generated with no restrictions:")
layout_none = search_layout(fixed_none, n_rand_init)
generated with fixed qwazxcv and punctuations:
q w l d k j f u y '
a r s t m b n e i o
z x c v g p h , . /
generated with fixed qwazxc and punctuations:
q w l m k j f u y '
a r s t g b n e i o
z x c d v p h , . /
generated with fixed punctuations only:
x c l f z j m u y '
a s r n b k t e i o
q g w h p v d , . /
generated with no restrictions:
v f u , q z w l c y
r n e a . b t h s i
x p ' o / k d m g j
We compare our algorithmically generated layouts with existing manually designed ones, including QWERTY, Drovak, Colemak and more.
These comparisons are based on our total effort, base effort, penalty scores and same finger bigram percentage (i.e. how often is it needed to press two consecutive keys with the same finger).
def evaluate(layout, print_score = True):
# evaluate the layout based on score function, base effort, penalty and same_finger_bigrams
perm = layout_to_perm(layout)
P = permute_matrix(perm)
penalty = np.trace(penalty_matrix.T @ P @ bigram_matrix @ P.T) / total_nonspace_bigram
base_effort = np.trace(base_effort_matrix.T @ P @ bigram_matrix @ P.T) /total_nonspace_unigram
overall_score = base_effort + penalty
sfb = np.trace(same_finger_bigram_matrix.T @ P @ bigram_matrix @ P.T) / total_nonspace_bigram * 100
if print_score:
print('score:', overall_score)
print('same_finger_bigram:', sfb, "%")
print('base effort:', base_effort)
print('penalty:', penalty)
return [overall_score, sfb,base_effort,penalty]
# print and evaluate the candidate layouts
layouts = [["qwerty", "qwertyuiopasdfghjkl'zxcvbnm,./ "],
["colemak-dh", "qwfpbjluy'arstgmneiozxcdvkh,./ "],
["colemak", "qwfpgjluy'arstdhneiozxcvbkm,./ "],
["drovak", "',.pyfgcrlaoeuidhtns/qjkxbmwvz "],
["handsdown", "qchpvkyoj'rsntgwueiaxmldbzf,./ "],
["soul", "qwldpkmuy'asrtgfneiozxcvjbh,./ "],
["aset", "qwjfgypul'asetdhniorzxcvbkm,./ "],
["minimak8 ", "qwdrkyuilpastfghneo'zxcvbjm,./ "],
["ours-qwazxcv", layout_qwazxcv],
["ours-qwazxc", layout_qwazxc],
["ours-punct", layout_punc],
["ours-none", layout_none]]
for i in range(len(layouts)):
layouts[i] = layouts[i] + evaluate(layouts[i][1], print_score = False)
layouts.sort(key = lambda x: x[2])
for i in range(len(layouts)):
print("rank",i, ":",layouts[i][0], layouts[i][2])
print_layout(layout_to_perm(layouts[i][1]))
print("-------------------")
df = pd.DataFrame(layouts, columns =['name', 'layout', 'totaleffort', 'same_finger_bigram', 'base', 'penalty'])
df
rank 0 : ours-none 1.6589118258928328
v f u , q z w l c y
r n e a . b t h s i
x p ' o / k d m g j
-------------------
rank 1 : ours-punct 1.6801091955890144
x c l f z j m u y '
a s r n b k t e i o
q g w h p v d , . /
-------------------
rank 2 : ours-qwazxc 1.6905006857296292
q w l m k j f u y '
a r s t g b n e i o
z x c d v p h , . /
-------------------
rank 3 : colemak-dh 1.704741478170455
q w f p b j l u y '
a r s t g m n e i o
z x c d v k h , . /
-------------------
rank 4 : ours-qwazxcv 1.728803412528172
q w l d k j f u y '
a r s t m b n e i o
z x c v g p h , . /
-------------------
rank 5 : soul 1.7423222997847656
q w l d p k m u y '
a s r t g f n e i o
z x c v j b h , . /
-------------------
rank 6 : handsdown 1.7542377505320454
q c h p v k y o j '
r s n t g w u e i a
x m l d b z f , . /
-------------------
rank 7 : colemak 1.7987685793082813
q w f p g j l u y '
a r s t d h n e i o
z x c v b k m , . /
-------------------
rank 8 : aset 1.8667974476688138
q w j f g y p u l '
a s e t d h n i o r
z x c v b k m , . /
-------------------
rank 9 : minimak8 1.923265759408603
q w d r k y u i l p
a s t f g h n e o '
z x c v b j m , . /
-------------------
rank 10 : drovak 1.9248213849510105
' , . p y f g c r l
a o e u i d h t n s
/ q j k x b m w v z
-------------------
rank 11 : qwerty 2.401630225079672
q w e r t y u i o p
a s d f g h j k l '
z x c v b n m , . /
-------------------
name | layout | totaleffort | same_finger_bigram | base | penalty | |
---|---|---|---|---|---|---|
0 | ours-none | vfu,qzwlcyrnea.bthsixp'o/kdmgj | 1.658912 | 0.905721 | 1.630558 | 0.028353 |
1 | ours-punct | xclfzjmuy'asrnbkteioqgwhpvd,./ | 1.680109 | 1.403423 | 1.635438 | 0.044672 |
2 | ours-qwazxc | qwlmkjfuy'arstgbneiozxcdvph,./ | 1.690501 | 1.541221 | 1.640971 | 0.049530 |
3 | colemak-dh | qwfpbjluy'arstgmneiozxcdvkh,./ | 1.704741 | 1.631484 | 1.653080 | 0.051661 |
4 | ours-qwazxcv | qwldkjfuy'arstmbneiozxcvgph,./ | 1.728803 | 1.541221 | 1.679006 | 0.049798 |
5 | soul | qwldpkmuy'asrtgfneiozxcvjbh,./ | 1.742322 | 1.888932 | 1.681498 | 0.060824 |
6 | handsdown | qchpvkyoj'rsntgwueiaxmldbzf,./ | 1.754238 | 1.398541 | 1.710238 | 0.044000 |
7 | colemak | qwfpgjluy'arstdhneiozxcvbkm,./ | 1.798769 | 1.631484 | 1.746691 | 0.052077 |
8 | aset | qwjfgypul'asetdhniorzxcvbkm,./ | 1.866797 | 3.064345 | 1.769551 | 0.097247 |
9 | minimak8 | qwdrkyuilpastfghneo'zxcvbjm,./ | 1.923266 | 4.090840 | 1.793553 | 0.129712 |
10 | drovak | ',.pyfgcrlaoeuidhtns/qjkxbmwvz | 1.924821 | 2.665626 | 1.843624 | 0.081197 |
11 | qwerty | qwertyuiopasdfghjkl'zxcvbnm,./ | 2.401630 | 6.588611 | 2.186692 | 0.214938 |
From the above comparison, we can see that our algorithmically generated layouts scores better than almost all of the manually designed layouts. We can see that in our score model, colemak-dh is a very well-designed layout (perhaps not so surprising because the scoring model we used was, for the most part, designed by the author of colemak-dh). It's only marginally worse than the algorithmically generated layout ours-qwazxc, and better than ours-qwazxcv (which has one extra constraint on the placement of 'v'). We also note that ours-qwazxc layout is extremely similar to colemak-dh: they are only different on 6 key assignments, mostly infrequently used ones. All layouts improved over QWERTY by a very big margin.