The Science of Word Recognition
or how I learned to stop worrying and love the bouma
Kevin Larson
Advanced Reading Technology, Microsoft Corporation
July 2004
Introduction
Evidence from the last
20 years of work in cognitive psychology indicates that we use the
letters within a word to recognize a word. Many typographers and
other text enthusiasts I’ve met insist that words are recognized by
the outline made around the word shape. Some have used the term
bouma as a synonym for word shape, though I was unfamiliar with the
term. The term bouma appears in Paul Saenger’s 1997 book Space
Between Words: The Origins of Silent Reading. There I learned to my
chagrin that we recognize words from their word shape and that
“Modern psychologists call this image the ‘Bouma shape.’”
This paper is written
from the perspective of a reading psychologist. The data from dozens
of experiments all come from peer reviewed journals where the
experiments are well specified so that anyone could reproduce the
experiment and expect to achieve the same result. This paper was
originally presented as a talk at the ATypI conference in Vancouver in
September, 2003.
The goal of this paper
is to review the history of why psychologists moved from a word
shape model of word recognition to a letter recognition model, and
to help others to come to the same conclusion. This paper will cover
many topics in relatively few pages. Along the way I will present
experiments and models that I couldn’t hope to cover completely
without boring the reader. If you want more details on an
experiment, all of the references are at the end of the paper as
well as suggested readings for those interested in more information
on some topics. Most papers are widely available at academic
libraries.
I will start by
describing three major categories of word recognition models: the
word shape model, and serial and parallel models of letter
recognition. I will present representative data that was used as
evidence to support each model. After all the evidence has been
presented, I will evaluate the models in terms of their ability to
account for the data. And finally I will describe some recent
developments in word recognition and a more detailed model that is
currently popular among psychologists.
Model #1: Word Shape
The word recognition
model that says words are recognized as complete units is the oldest
model in the psychological literature, and is likely much older than
the psychological literature. The general idea is that we see words
as complete patterns rather than the sum of letter parts. Some
claim that the information used to recognize a word is the pattern
of ascending, descending, and neutral characters. Another
formulation is to use the envelope created by the outline of the
word. The word patterns are recognizable to us as an image because
we have seen each of the patterns many times before. James Cattell
(1886) was the first psychologist to propose this as a model of word
recognition. Cattell is recognized as an influential founder of the
field of psycholinguistics, which includes the scientific study of
reading.
 
Figure 1: Word shape recognition using the pattern of ascending, descending,
and neutral characters
 
Figure 2: Word shape recognition using the envelope around the word
Cattell supported the
word shape model because it provided the best explanation of the
available experimental evidence. Cattell had discovered a
fascinating effect that today we call the Word Superiority Effect.
He presented letter and word stimuli to subjects for a very brief
period of time (5-10ms), and found that subjects were more accurate
at recognizing the words than the letters. He concluded that
subjects were more accurate at recognizing words in a short period
of time because whole words are the units that we
recognize.
Cattell’s study was
sloppy by modern standards, but the same effect was replicated in
1969 by Reicher. He presented strings of letters – half the time real
words, half the time not – for brief periods. The
subjects were asked which of two letters, for example D or K, was
contained in the string. Reicher found that
subjects were more accurate at recognizing D when it was in the
context of WORD than when in the
context of ORWD. This supports the
word shape model because the word allows the subject to quickly
recognize the familiar shape. Once the shape has been recognized,
then the subject can deduce the presence of the correct letter long
after the stimulus presentation.
The second key piece
of experimental data to support the word shape model is that
lowercase text is read faster than uppercase text. Woodworth (1938)
was the first to report this finding in his influential textbook
Experimental Psychology. This finding has been confirmed more
recently by Smith (1969) and Fisher (1975). Participants were asked
to read comparable passages of text, half completely in uppercase
text and half presented in standard lowercase text. In each study,
participants read reliably faster with the lowercase text by a 5-10%
speed difference. This supports the word shape model because
lowercase text enables unique patterns of ascending, descending, and
neutral characters. When text is presented in all uppercase, all
letters have the same height, so word shapes are less distinctive and
slower to read.
The patterns of errors
that are missed while proofreading text provide the third key piece
of experimental evidence to support the word shape model. Subjects
were asked to carefully read passages of text for comprehension and
at the same time mark any misspelling they found in the passage. The
passage had been carefully designed to have an equal number of two
kinds of misspellings: misspellings that are consistent with word
shape, and misspellings that are inconsistent with word shape. A
misspelling that is consistent with word shape is one that contains
the same patterns of ascenders, descenders, and neutral characters,
while a misspelling that is inconsistent with word shape changes the
pattern of ascenders, descenders, and neutral characters.
If test is the correctly
spelled word, tesf would be an example
of a misspelling consistent with word shape and tesc would be an example
of a misspelling inconsistent with word shape. The word shape model
would predict that misspellings with a consistent word shape would be
caught less often than those with an inconsistent word shape because
words are more confusable if
they have the same shape. Haber & Schindler (1981) and Monk
& Hulme (1983) found that misspellings consistent with word
shape were twice as likely to be missed as misspellings inconsistent
with word shape.
Target word: test | Error rate |
Consistent word shape (tesf) | 13% |
Inconsistent word shape (tesc) | 7% |
Figure 3: Misspellings that are consistent with word shape are missed more
often
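The notion of a consistent or inconsistent misspelling can be made concrete with a short Python sketch. The sets of ascending and descending letters below are my own assumption (the studies cited here do not list them), and the function names are hypothetical. The sketch only covers the ascender/descender formulation of word shape, not the envelope formulation, and it expects lowercase input.

```python
# Assumed letter classes for a typical lowercase Latin typeface.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def shape_pattern(word: str) -> str:
    """Return one symbol per lowercase letter: a=ascender, d=descender, n=neutral."""
    symbols = []
    for letter in word:
        if letter in ASCENDERS:
            symbols.append("a")
        elif letter in DESCENDERS:
            symbols.append("d")
        else:
            symbols.append("n")
    return "".join(symbols)


def same_word_shape(word: str, misspelling: str) -> bool:
    """True when the misspelling keeps the ascender/descender pattern of the word."""
    return shape_pattern(word) == shape_pattern(misspelling)


for candidate in ("tesf", "tesc"):
    print(candidate, shape_pattern(candidate), same_word_shape("test", candidate))
```

Under these assumed letter classes, tesf keeps the pattern of test while tesc does not, which is the distinction the proofreading studies manipulated.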
The fourth piece of
evidence supporting the word shape model is that it is difficult to
read text in alternating case. AlTeRnAtInG case is where the letters
of a word change from uppercase to lowercase multiple times within a
word. The word shape model predicts that this is difficult because
it gives a pattern of ascending, descending, and neutral characters
that is different from the one the word has in its natural all-lowercase
form. Alternating case has been shown to be more difficult than
either lowercase or uppercase text in a variety of studies. Smith
(1969) showed that it slowed the reading speed of a passage of text,
Mason (1978) showed that the time to name a word was slowed,
Pollatsek, Well, & Schindler (1975) showed that same-different
matching was hindered, and Meyer & Gutschera (1975) showed that
category decision times were slowed.
Model #2: Serial Letter Recognition
The shortest lived
model of word recognition is that words are read letter-by-letter
serially from left to right. Gough (1972) proposed this model
because it was easy to understand, and far more testable than the
word shape model of reading. In essence, recognizing a word in the
mental lexicon was analogous to looking up a word in a dictionary.
You start off by finding the first letter, then the second, and so
on until you recognize the word.
This model is
consistent with Sperling’s (1963) finding that letters can be
recognized at a rate of 10-20ms per letter. Sperling showed
participants strings of random letters for brief periods of time,
asking if a particular letter was contained in the string. He found
that if participants were given 10ms per letter, they could
successfully complete the task. For example, if the target letter
was in the fourth position and the string was presented for 30ms,
the participant couldn’t complete the task successfully, but if
the string was presented for 40ms, they could complete the task
successfully. Gough noted that a rate of 10ms per letter would be
consistent with a typical reading rate of 300 wpm.
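The prediction of the serial model in the Sperling example can be written as a small Python sketch. It assumes a strict left-to-right scan at a fixed 10ms per letter with no other processing costs, which is a simplification of my own; the function names are hypothetical.

```python
MS_PER_LETTER = 10  # rate used in the Sperling example above


def serial_scan_time_ms(position: int, ms_per_letter: int = MS_PER_LETTER) -> int:
    """Time needed to reach the letter at a 1-based position, scanning left to right."""
    return position * ms_per_letter


def can_report_letter(position: int, exposure_ms: int,
                      ms_per_letter: int = MS_PER_LETTER) -> bool:
    """True when the exposure is long enough for the scan to reach the target letter."""
    return exposure_ms >= serial_scan_time_ms(position, ms_per_letter)


print(can_report_letter(position=4, exposure_ms=30))
print(can_report_letter(position=4, exposure_ms=40))
```

The same function shows why the model predicts a word length effect: under this assumption every additional letter adds a fixed amount of scanning time.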
The serial letter
recognition model is also able to successfully predict that shorter
words are recognized faster than longer words. It is a very robust
finding that word recognition takes more time with longer words. It
takes more time to recognize a 5-letter word than a 4-letter word,
and 6-letter words take more time to recognize than 5-letter words.
The serial letter recognition model predicts that this should
happen, while a word shape model does not make this prediction. In
fact, the word shape model should expect longer words with more
unique patterns to be easier to recognize than shorter words.
The serial letter
recognition model fails because it cannot explain the Word
Superiority Effect. The Word Superiority Effect showed that readers
are better able to identify letters in the context of a word than in
isolation, while the serial letter recognition model would expect
that a letter in the third position in a word should take three
times as long to recognize as a letter in isolation.
Model #3: Parallel Letter Recognition
The model that most
psychologists currently accept as most accurate is the parallel
letter recognition model. This model says that the letters within a
word are recognized simultaneously, and the letter information is
used to recognize the words. This is a very active area of research
and there are many specific models that fit into this general
category. I will only discuss one popular formulation of this
model.
Figure 4 shows a
generic activation based parallel letter recognition model. In this
example, the reader is seeing the word work. Each of the stimulus
letters is processed simultaneously. The first step of processing
is recognizing the features of the individual letters, such as
horizontal lines, diagonal lines, and curves. The details of this
level are not critical for our purposes. These features are then
sent to the letter detector level, where each of the letters in the
stimulus word is recognized simultaneously. The letter level then
sends activation to the word detector level. The W in the first letter
detector position sends activation to all the words that have a
W in the first position
(WORD and WORK). The O in the second letter
detector position sends activation to all the words that have an
O in the second
position (FORK, WORD, and WORK). While
FORK and WORD have activation from
three of the four letters, WORK has the most
activation because it has all four letters activated, and is thus
the recognized word.

Figure 4: Parallel Letter Recognition
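The activation count described above can be sketched in Python. This is a minimal illustration rather than a published model: it assumes every matching letter position contributes exactly one unit of activation, it uses only the three words named in the text as the lexicon, and the function names are hypothetical.

```python
LEXICON = ["FORK", "WORD", "WORK"]  # the three candidate words named in the text


def word_activation(stimulus: str, lexicon=LEXICON) -> dict:
    """Count, for each word, the letter positions that match the stimulus."""
    return {
        word: sum(1 for seen, stored in zip(stimulus, word) if seen == stored)
        for word in lexicon
    }


def recognize(stimulus: str, lexicon=LEXICON) -> str:
    """Return the word with the most activation."""
    activation = word_activation(stimulus, lexicon)
    return max(activation, key=activation.get)


print(word_activation("WORK"))
print(recognize("WORK"))
```

All letter positions are compared in one pass with no left-to-right dependency, which is the sense in which the letters are processed in parallel.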
Much of the evidence
for the parallel letter recognition model comes from the eye
movement literature. A great deal has been learned about how we read
with the advent of fast eye trackers and computers. We now have the
ability to make changes to text in real time while people read,
which has provided insights into reading processes that weren’t
previously possible.
It has been known for
over 100 years that when we read, our eyes don’t move smoothly
across the page, but rather make discrete jumps from word to word.
We fixate on a word for a period of time, roughly 200-250ms, then
make a ballistic movement to another word. These movements are
called saccades and usually take 20-35ms. Most saccades are forward
movements from 7 to 9 letters,*
but 10-15% of all saccades are regressive or backwards movements.
Most readers are completely unaware of the frequency of regressive
saccades while reading. The location of the fixation is not random.
Fixations never occur between words, and usually occur just to the
left of the middle of a word. Not all words are fixated; short words
and particularly function words are frequently skipped. Figure 5
shows a diagram of the fixation points of a typical
reader.

Figure 5: Saccadic eye movements
During a single
fixation, there is a limit to the amount of information that can be
recognized. The fovea, which is the clear center point of our
vision, can only see three to four letters to the left and right of
fixation at normal reading distances. Visual acuity decreases
quickly in the parafovea, which extends out as far as 15 to 20
letters to the left and right of the fixation point.
Eye movement studies
that I will discuss shortly indicate that there are three zones of
visual identification. Readers collect information from all three
zones during the span of a fixation. Closest to the fixation point
is where word recognition takes place. This zone is usually large
enough to capture the word being fixated, and often includes smaller
function words directly to the right of the fixated word. The next
zone extends a few letters past the word recognition zone, and
readers gather preliminary information about the next letters in
this zone. The final zone extends out to 15 letters past the
fixation point. Information gathered out this far is used to
identify the length of upcoming words and to identify the best
location for the next fixation point. For example, in Figure 5, the
first fixation point is on the s in Roadside. The reader is able
to recognize the word Roadside, beginning letter
information from the first few letters in joggers, as well as complete
word length information about the word joggers. A more interesting
fixation in Figure 5 is the word sweat. In this fixation
both the words sweat and pain are short enough to
be fully recognized, while beginning letter information is gathered
for and. Because
and is a high frequency
function word, this is enough information to skip this word as well.
Word length information is gathered all the way out to
angry, which becomes the
location of the next fixation.
There are two
experimental methodologies that have been critical for understanding
the fixation span: the moving window paradigm and the boundary study
paradigm. These methodologies make it possible to study readers
while they are engaged in ordinary reading. Both rely on fast eye
trackers and computers to perform clever text manipulations while a
reader is making a saccade. While making a saccade, the reader is
functionally blind. The reader will not perceive that text has
changed if the change is completed before the saccade has
finished.
Moving Window Study
In the moving window
technique we restrict the amount of text that is visible to a
certain number of letters around the fixation point, and replace all
of the other letters on a page with the letter x. The reader’s task is
simply to read the page of text. Interestingly it is also possible
to do the reverse and just replace the letters at the fixation point
with the letter x, but this is very
frustrating to the reader. If just the three letters to the left and
right of the fixation point are replaced with x, then reading rate
drops to 11 words per minute. McConkie & Rayner (1975) examined
how many letters around the fixation point are needed to provide a
normal reading experience. Figure 6 shows a snapshot of what a
reader would see if they are reading a passage and fixated on the
second e in experiment. If the reader is
provided three letters past the fixation point, then they won’t see
the entire word experiment, and their average reading rate will
be a slow 207 words per minute. If the reader is given 9 letters
past the fixation point, they will see the entire word
experiment, and part of the word
was. With 9 letters,
reading rate is moderately slowed. If the reader is given 15 letters
past the fixation point, reading speed is just as fast as if there
were no moving window present. Up to 15 letters there is a linear
relation between the number of letters that are available to the
reader and the speed of reading.
Window Size | Sentence | Reading Rate |
3 letters | An experimxxx xxx xxxxxxxxx xx | 207 wpm |
9 letters | An experiment wax xxxxxxxxx xx | 308 wpm |
15 letters | An experiment was condxxxxx xx | 340 wpm |
Figure 6: Linear relationship between letters available in moving window and
reading rate.
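The masking shown in Figure 6 can be reproduced with a short Python sketch. Several details are assumptions inferred from the figure rather than taken from McConkie & Rayner's description: the window is counted in character positions including spaces, text to the left of fixation stays visible, and spaces are never masked. The final word "in" is a hypothetical stand-in, since that word is masked in every row of the figure.

```python
def moving_window(text: str, fixation: int, letters_right: int) -> str:
    """Replace every non-space character beyond the window with the letter x."""
    last_visible = fixation + letters_right
    return "".join(
        ch if index <= last_visible or ch == " " else "x"
        for index, ch in enumerate(text)
    )


SENTENCE = "An experiment was conducted in"  # "in" is a hypothetical final word
FIXATION = SENTENCE.index("experiment") + 3   # the second e in "experiment"

for window in (3, 9, 15):
    print(window, moving_window(SENTENCE, FIXATION, window))
```

In a real experiment this function would be re-evaluated on every new fixation, with the display updated while the reader's eyes are still in motion.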
From this study we
learned that our perceptual span is roughly 15 letters. This is
interesting as the average saccade length is 7-9 letters, or roughly
half our perceptual span. This indicates that while readers are
recognizing words closer to the fovea, we are using additional
information further out to guide our reading. It should be noted
that we’re only using information to the right of our fixation
point, and that we don’t use any letters to the left of the word
that is currently being fixated. In figure 6, where the user’s
fixation point is on the second e in experiment, if the word
An is removed, it will
not further slow reading rate.
The moving window
study demonstrates the importance of letters in reading, but is not
airtight. The word shape model of reading would also expect that
reading speed would decrease as word shape information disappears.
The word shape model would make the additional prediction that
reading would be significantly improved if information on the whole
word shape were always retained. This turns out to be false.
Figure 7 shows the
reading rate when three letters are available. It is roughly
equivalent to the reading rate when the fixated word is entirely
there. That’s true even though the entire word has 0.7 more letters
available on average. When the fixated word and the following word
are entirely available, reading rate is equivalent to when 9
letters are available. Reading rate is also equivalent when three
words or 15 letters are available. This means that reading is not
necessarily faster when entire subsequent words are available;
similar reading speeds can be found when only a few letters are
available.
Window Size | Sentence | Reading Rate |
3 letters | An experimxxx xxx xxxxxxxxx xx | 207 wpm |
1 word (3.7 letters) | An experiment xxx xxxxxxxxx xx | 212 wpm |
| | |
9 letters | An experiment wax xxxxxxxxx xx | 308 wpm |
2 words (9.6 letters) | An experiment was xxxxxxxxx xx | 309 wpm |
| | |
15 letters | An experiment was condxxxxx xx | 340 wpm |
3 words (15.0 letters) | An experiment was conducted xx | 339 wpm |
Figure 7: Full word information does not improve reading rate.
Pollatsek & Rayner
(1982) used the moving window paradigm to compare reading when the
word spaces were present with reading when they were replaced with an x.
They found that saccade length is shorter when word space
information is not available.
Boundary Study
The boundary study
(Rayner, 1975) is another innovative paradigm that eye trackers and
computers made possible. With the boundary study we can examine what
information the reader is using inside the perceptual span (15
letters), but outside of the word that is being fixated. Figure 8
illustrates what the reader sees in this kind of study. While
reading the words The old
captain, the reader will be
performing ordinary reading. When the reader reaches the word
put, the key word of
interest becomes available within the reader’s fixation span. In
this example the key word is ebovf. When the reader
saccades from put to ebovf, the saccade will
cross an invisible boundary which triggers a change in the text.
Before the saccade finishes, the text will change to the correct
text for the sentence, in this case chart. The reader will
always fixate on the correct word for the sentence.

Figure 8: The string of letters ebovf after the boundary
changes to chart during the saccade.
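The display logic of a boundary trial can be sketched in Python. This is an illustration under assumptions of my own: gaze position is given as a character index, the invisible boundary sits at the space before the critical word, and once the change has been triggered the display stays changed. The sentence fragment ends at the critical word because the rest of the sentence is not given here, and the function name is hypothetical.

```python
def run_boundary_trial(gaze_samples, boundary_index, before, preview, target):
    """Return the text on screen for each gaze sample in a boundary trial."""
    changed = False
    frames = []
    for gaze in gaze_samples:
        if gaze >= boundary_index:
            changed = True  # assumed: the change is permanent once triggered
        frames.append(before + (target if changed else preview))
    return frames


BEFORE = "The old captain put "
BOUNDARY = len(BEFORE) - 1  # assumed position: the space before the critical word

# Hypothetical gaze samples: two before the boundary, one past it, one regression.
for frame in run_boundary_trial([4, 17, 21, 18], BOUNDARY, BEFORE, "ebovf", "chart"):
    print(frame)
```

The measure of interest is then how long the reader fixates the critical word after the change, compared across different preview strings.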
The critical word in
this study is presented in different conditions including an
identical control condition (chart), similar word shape
and some letters in common (chovt), dissimilar word
shape with some letters in common (chyft), and similar word
shape with no letters in common (ebovf). The fixation times
for the words both before and after the boundary are measured. The
fixation times before the boundary are the same for the control
condition and the three experimental conditions. After the boundary,
readers were fastest reading with the control condition
(chart), next fastest
reading with similar word shape and some letters in common
(chovt), third fastest with
the condition with only some letters in common (chyft), and slowest with
the condition with only similar word shape (ebovf). This demonstrates
that letter information is being collected within the fixation span
even when the entire word is not being recognized.
chart | Identical word (control) | 210ms |
chovt | Similar word shape, some letters in common | 240ms |
chyft | Dissimilar word shape, some letters in common | 280ms |
ebovf | Similar word shape, no letters in common | 300ms |
Figure 9: Relative speed of boundary study conditions
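The letter overlap in each condition can be checked with a few lines of Python. Counting letters that match in the same position is my assumption about what "letters in common" means here; the fixation times are copied from Figure 9 and the function name is hypothetical.

```python
TARGET = "chart"

# (preview string, fixation time in ms), as listed in Figure 9
CONDITIONS = [("chart", 210), ("chovt", 240), ("chyft", 280), ("ebovf", 300)]


def letters_in_common(preview: str, target: str = TARGET) -> int:
    """Count letters that match the target in the same position."""
    return sum(1 for p, t in zip(preview, target) if p == t)


for preview, fixation_ms in CONDITIONS:
    print(preview, letters_in_common(preview), fixation_ms)
```

Under this count, chovt and chyft share the same number of letters with chart and differ only in word shape, while ebovf shares none.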
Having letters in
common played a greater role than word shape in fixation times in this
study. But this does not eliminate the role of word shape, because the
combination of word shape and letters in common facilitates word
recognition more than letters in common alone.
Rayner (1975) further investigated what happens with a capitalized
form of the critical word (CHART). This eliminates the
role of word shape, but retains perfect letter information. He
found that the fixation times are the same as the control condition!
This demonstrates that it is not visual information about either
word shape or even letter shape that is being retained from saccade
to saccade, but rather abstracted information about which letters
are coming up.
The eye movement
literature demonstrates that we are using letter information to
recognize words, as we are better able to read when more letters are
available to us. We combine abstracted letter information across
saccades to help facilitate word recognition, so it is letter
information that we are gathering in the periphery. And finally we
are using word space information to program the location of our next
saccade.
Evidence for Word Shape Revisited
So far I’ve presented
evidence that supports the word shape model, evidence that
contradicts the serial letter recognition model, and eye tracking data
that contradicts the word shape model while supporting the parallel
letter recognition model. In this section I will reexamine the data
used to support the word shape model to see if it is incongruent
with the parallel letter recognition model.
The strongest evidence
for the word shape model is perhaps the word superiority effect
which showed that letters can be more accurately recognized in the
context of a word than in isolation. For example, subjects are more
accurate at recognizing D in the context of
WORD than in the context
of ORWD (Reicher, 1969). This
supports word shape because subjects are able to quickly recognize
the familiar word shape and deduce the presence of letter
information after the stimulus presentation has finished, while the
nonword can only be read letter by letter. McClelland & Johnson
(1977) demonstrated that the reason for the word superiority effect
wasn’t the recognition of word shapes, but rather the existence of
regular letter combinations. Pseudowords are not words in the
English language, but have the phonetic regularity that makes them
easily pronounceable. Mave and rint are two examples of
pseudowords. Because pseudowords do not have semantic content and
have not been seen previously by the subjects, they should not have
a familiar word shape. McClelland & Johnson found that letters
are recognized faster in the context of pseudowords (mave) than in the context
of nonwords (amve). This demonstrates
that the word superiority effect is caused by regular letter
combinations and not word shape.
The weakest evidence
in support of word shape is that lowercase text is read faster than
uppercase text. This is entirely a practice effect. Most readers
spend the bulk of their time reading lowercase text and are
therefore more proficient at it. When readers are forced to read
large quantities of uppercase text, their reading speed will
eventually increase to the rate of lowercase text. Even text
oriented as if you were seeing it in a mirror will quickly increase
in reading speed with practice (Kolers & Perkins,
1975).
Haber & Schindler
(1981) found that readers were twice as likely to fail to notice a
misspelling in a proofreading task when the misspelling was
consistent with word shape (tesf, 13% missed) than
when it was inconsistent with word shape (tesc, 7% missed). This is
seemingly a convincing result until you realize that word shape and
letter shape are confounded. The study compared errors that were
consistent both in word and letter shape to errors that were
inconsistent both in word and letter shape. Paap, Newsome, &
Noel (1984) determined the relative contribution of word shape and
letter shape and found that the entire effect is driven by letter
shape.
Figure 10 shows
misspellings of the example word than in each of the four
permutations of same and different word shape, and same and
different letter shape. As with Haber & Schindler, subjects fail
to notice misspellings with the same word shape and same letter
shape (tban, 15% missed) far more
often than when there is a different word shape and letter shape
(tman, 10% missed). The two
in-between conditions of different word shape with same letter shape
(tnan, 19% missed) and same
word shape with different letter shape (tdan, 8% missed) are
illuminating. There is a statistically reliable difference between
the larger number of proofreading errors when the letter shape is
the same (tban and tnan) than when the letter
shape is different (tdan and tman). While there is no
statistically reliable difference between conditions with same word
shape (tban and tdan) and different word
shape (tnan and tman), more errors are
missed when the word shape is different. This trend sharply
contradicts the conclusions of the earlier studies.
than | Same word shape | Different word shape |
Same letter shape | tban, 15% missed | tnan, 19% missed |
Different letter shape | tdan, 8% missed | tman, 10% missed |
Figure 10: Word shape and letter shape contributions to proofreading
errors.
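The two contributions can be separated by averaging over the cells of Figure 10. The Python sketch below is only an arithmetic illustration: it assumes the four conditions had equal numbers of items, so that a simple mean of the percentages is meaningful, and it says nothing about statistical reliability. The names are hypothetical.

```python
# Percent of misspellings missed, keyed by (word shape, letter shape), from Figure 10.
MISSED = {
    ("same", "same"): 15,            # tban
    ("different", "same"): 19,       # tnan
    ("same", "different"): 8,        # tdan
    ("different", "different"): 10,  # tman
}


def mean_missed(factor: str, level: str) -> float:
    """Average miss rate over the cells where one factor is held at one level."""
    position = 0 if factor == "word" else 1
    cells = [rate for key, rate in MISSED.items() if key[position] == level]
    return sum(cells) / len(cells)


letter_effect = mean_missed("letter", "same") - mean_missed("letter", "different")
word_effect = mean_missed("word", "same") - mean_missed("word", "different")
print(letter_effect, word_effect)
```

Holding letter shape the same raises the miss rate by 8 percentage points on average, while holding word shape the same lowers it by 3, which is the opposite of what the word shape model predicts.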
The final source of
evidence supporting the word shape model is that text written in
alternating case is read more slowly than either text in lowercase or
uppercase. This supports the word shape model because subjects are
able to quickly recognize the familiar pattern of a word written
entirely in lowercase or uppercase, while words written in
alternating case will have an entirely novel word shape. Adams
(1979) showed that this is not the case by examining the effect of
alternating case on words, which should have a familiar pattern when
written in lowercase or uppercase, and pseudowords, which
should not have a familiar pattern in any form because the subjects
would never have come across that sequence of letters before. Adams
found that both words and pseudowords are equally hurt by
alternating case. Since pseudowords are also impacted by alternating
case, then the effect is not caused by word shape.
Further examination of
the evidence used to support the word shape model has demonstrated
that the case for the word shape model was not as strong as it
seemed. The word superiority effect is caused by familiar letter
sequences and not word shapes. Lowercase is faster than uppercase
because of practice. Letter shape similarities rather than word
shape similarities drive mistakes in the proofreading task. And
pseudowords also suffer from decreased reading speed with
alternating case text. All of these findings make more sense with
the parallel letter recognition model of reading than the word shape
model.
In the next section I
will describe an active area of research within the parallel letter
recognition model of reading. There are many models of reading
within parallel letter recognition, but it is beyond the scope of
this paper to discuss them all. Neural network modeling, sometimes
called connectionist modeling or parallel distributed processing,
has been particularly successful in advancing our understanding of
reading processes.
Neural Network Modeling
In neural network
modeling we use simple, low-level mechanisms that we know to exist
in the brain in order to model complex, human behavior. Two of the
core biological principles have been known for a long time.
McCulloch & Pitts (1943, 1947) showed that neurons sum data from
other neurons. Figure 11 shows a tiny two dimensional field of
neurons (the dark triangles) and more importantly the many, many
input and output connections for each neuron. Current estimates say
that every neuron in the cerebral cortex has 4,000 synapses. Every
synapse has a baseline rate of communication between neurons and can
either increase that rate of communication to indicate an excitatory
event or decrease the rate of communication to indicate an
inhibitory event. When a neuron gets more excitatory information
than inhibitory information, it will become active. The other core
biological principle is that learning is based on the modification
of synaptic connections (Hebb, 1949). When the information coming
from a synapse is important the connection between the two neurons
will become physically stronger, and when information from a synapse
is less important the synapse will weaken or even die
off.

Figure 11: A field of neurons and synapses in the cerebral cortex
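The two principles can be sketched in Python. This is a deliberately crude illustration, not the formulation of McCulloch & Pitts or of Hebb: the neuron simply compares summed excitation against summed inhibition, and the learning rule strengthens a connection when both neurons are active together and otherwise lets it decay toward zero. The function names and the rate and decay values are hypothetical.

```python
def neuron_is_active(excitatory, inhibitory) -> bool:
    """A neuron becomes active when excitation outweighs inhibition."""
    return sum(excitatory) > sum(inhibitory)


def hebbian_update(weight: float, pre_active: bool, post_active: bool,
                   rate: float = 0.1, decay: float = 0.05) -> float:
    """Strengthen a synapse used by two co-active neurons; otherwise weaken it."""
    if pre_active and post_active:
        return weight + rate
    return max(0.0, weight - decay)  # a weight of zero stands for a synapse that died off


print(neuron_is_active(excitatory=[1, 1], inhibitory=[1]))
print(hebbian_update(0.5, pre_active=True, post_active=True))
print(hebbian_update(0.02, pre_active=True, post_active=False))
```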
The first well-known
neural network model of reading was McClelland & Rumelhart’s
Interactive Activation model (1981). Figure 12 diagrams how this
model works. The reader here is processing the letter
T in the first position
in a word. The flow of information here starts at the bottom where
there are visual feature detectors. The two nodes on the left are
active because they match the features of an uppercase
T, while the three
nodes on the right are not active because they don’t match. Every
node in the visual feature detector level is connected to every node
in the letter detector level. The letters seen here apply only to
the first letter of a word. The connections between the visual
feature detector level and the letter level are all either
excitatory (represented with an arrow at the end of the connection)
or inhibitory (represented with a circle at the end of the
connection). The letters A, T, and S all received some
excitatory activation from the two left feature detectors because
all three have a crossbar at the top of the letter (at least in this
font). The inhibitory connections between each of the letters will
result in the T being the most
activated letter node because it has the most incoming excitatory
activation. The letter node for T will then send
excitatory activation to all the words that start with
T and inhibitory
activation to all the other words. As word nodes gain in activation,
they will send inhibitory activation to all other words, excitatory
activation back to letter nodes from letters in the word, and
inhibitory activation to all other letter nodes. Letters in
positions other than the first are needed in order to figure out
which of the words that start with T is being
read.

Figure 12: McClelland & Rumelhart’s Interactive Activation model: A few of
the neighbors of the node for the letter T in the first position in
a word, and their interconnections.
One of the joys of
neural network modeling is that it’s specific enough to be
programmed into a computer and tested. The interactive activation
model is able to explain human behaviors that it was not
specifically designed for. For example, a human shown the
degraded stimulus in figure 13 can easily figure out that
WORK is the degraded word,
and the computer simulation of this model can solve this
problem as well.

Figure 13: This degraded stimulus is easily read as WORK by human
readers.
The computer
simulation does not attempt to solve the visual perception problem,
but rather is told which of the visual feature detectors are on for
each letter position. For the fourth letter position the computer
simulation is told that there is a vertical line on the left, a
crossbar in the middle, and a diagonal pointing towards the bottom
right. Figures 14 and 15 show the activation levels of certain
letter and word nodes over time. Time in the computer is measured in
epochs of activation events. Figure 14 shows activation initially
rising equally for the k and r letter nodes. This is
because the visual feature information supports both of those
letters, while the d letter node is unsupported. During the
early epochs the letter nodes are only receiving activation from the
visual feature nodes, but later activation is provided by the word
nodes. Figure 15 shows the activation among four words:
work, word, weak, and wear. Since the first
three letters of the word are not degraded, the letter nodes easily
recognized them as w, o, and r for the first three
positions respectively. These letters provide early activation for
the words work and word, but not for
weak and wear. The word nodes then
start to send activation back down to the letter node level
indicating that the fourth letter could be k or d. Since
k is already an active
letter node while d is an inactive node,
the k node is further
strengthened. This allows the k letter node and the
word work to continuously
increase in activation and send inhibitory activation to their
competitors, the letter r and the word
word. Similar activation
patterns can also explain the word superiority effect.

Figure 14: The activation level over time for letter nodes in the fourth
position of a word.

Figure 15: The activation level over time for four word nodes.
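The WORK example can be approximated with a small simulation. The sketch below is a simplification under stated assumptions, not the published simulation: only the fourth-position letter nodes k, r, and d and the four word nodes are modeled, the first three letters are treated as fully recognized, and every weight and the update rule are hypothetical values chosen for illustration.

```python
WORDS = ["work", "word", "weak", "wear"]
SEEN = "wor"  # first three letters, assumed clearly recognized
# Degraded fourth position: the features fit k and r equally and do not fit d.
FEATURE_INPUT = {"k": 0.10, "r": 0.10, "d": -0.10}

# All weights are hypothetical illustration values.
LETTER_TO_WORD_EXC = 0.07
LETTER_TO_WORD_INH = 0.04
WORD_TO_LETTER_EXC = 0.30
WORD_TO_WORD_INH = 0.20
LETTER_TO_LETTER_INH = 0.20
DECAY = 0.05
MAX_ACT, MIN_ACT = 1.0, -0.2


def step(act, net):
    # Positive input pushes toward MAX_ACT, negative input toward MIN_ACT,
    # and decay pulls the node back toward zero.
    room = (MAX_ACT - act) if net > 0 else (act - MIN_ACT)
    new = act + net * room - DECAY * act
    return min(MAX_ACT, max(MIN_ACT, new))


def out(act):
    # Only nodes with positive activation influence other nodes.
    return max(act, 0.0)


def simulate(epochs=40):
    letters = {c: 0.0 for c in FEATURE_INPUT}
    words = {w: 0.0 for w in WORDS}
    history = []
    for _ in range(epochs):
        word_net = {}
        for w in WORDS:
            # Support from the three clearly seen letters.
            net = sum(LETTER_TO_WORD_EXC if w[i] == ch else -LETTER_TO_WORD_INH
                      for i, ch in enumerate(SEEN))
            # Support from the uncertain fourth-position letter nodes.
            for c, a in letters.items():
                weight = LETTER_TO_WORD_EXC if w[3] == c else -LETTER_TO_WORD_INH
                net += weight * out(a)
            # Competition from the other word nodes.
            net -= WORD_TO_WORD_INH * sum(out(words[o]) for o in WORDS if o != w)
            word_net[w] = net
        letter_net = {}
        for c in letters:
            net = FEATURE_INPUT[c]  # bottom-up feature evidence
            net += WORD_TO_LETTER_EXC * sum(
                out(words[w]) for w in WORDS if w[3] == c)  # word feedback
            net -= LETTER_TO_LETTER_INH * sum(
                out(letters[o]) for o in letters if o != c)  # letter rivals
            letter_net[c] = net
        words = {w: step(words[w], word_net[w]) for w in WORDS}
        letters = {c: step(letters[c], letter_net[c]) for c in letters}
        history.append((dict(letters), dict(words)))
    return letters, words, history


letters, words, history = simulate()
print(letters)
print(words)
```

In this sketch k and r start out tied because the features support them equally. Once the word node for work becomes active it feeds activation back to k, so k pulls ahead of r and work pulls ahead of word, which is the same qualitative pattern described for figures 14 and 15.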
Seidenberg &
McClelland (1989) and Plaut, McClelland, Seidenberg, & Patterson
(1996) have made great progress in developing neural network models
of reading that can account for more human reading behaviors. Both
of these models concentrate on the reading processes that start
after each of the letters in a word has been recognized. The
internal representations for these models convert the letter
information to phonemic information, which is seen as a mandatory
step for word recognition. It is well known that words that have a
consistent spelling to sound correspondence such as mint, tint, and hint are recognized faster
than words that have an inconsistent spelling to sound
correspondence such as pint (Glushko, 1979).
These models are able to generate correct word pronunciations (i.e.
read) without the use of specific word nodes. The more recent model
is also able to read pseudowords at a near human rate and account
for consistency and frequency effects.
The Seidenberg &
McClelland and Plaut et al. models can simulate not only
adult reading but also a child learning to read.
Initially the neural network model starts out with no knowledge
about the relationship between letters and pronunciations, only that
letters and sounds exist. The neural net goes through a training
phase where the network is given examples of correct pronunciations
for different words. After seeing a correct sample, the network
calculates the error in its guess of the pronunciation and then
modifies the strength of each of the connections that contributed to
that guess so that the error will be slightly smaller next time. This
is analogous to what the brain does. After a few rounds of training,
the model may be able to read a few of the highest-frequency, regular words.
After many rounds of training the model will be able to read not
only words it has seen before, but words it hasn’t seen before as
well.
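The training loop described above can be illustrated with a toy example. This sketch makes several assumptions that are not in the original models: it uses a single layer of connections and a simple error-correcting update, whereas the published models use hidden units and backpropagation; the input units (one for the first letter, one for the ending), the learning rate, and the four-word training set are all hypothetical.

```python
import math

# Toy training set: 1.0 means the vowel is pronounced as in "pint",
# 0.0 means it is pronounced as in "mint". Hypothetical, for illustration.
TRAINING = [("mint", 0.0), ("tint", 0.0), ("hint", 0.0), ("pint", 1.0)]
LEARNING_RATE = 0.5  # hypothetical value


def units_for(word):
    # Input units that switch on for a word: its first letter and its ending.
    return ["onset:" + word[0], "rime:" + word[1:]]


def guess(weights, word):
    # Sum the connection strengths of the active units and squash to 0..1.
    total = sum(weights.get(u, 0.0) for u in units_for(word))
    return 1.0 / (1.0 + math.exp(-total))


def total_error(weights):
    return sum((target - guess(weights, w)) ** 2 for w, target in TRAINING)


def train(rounds):
    weights = {}  # no knowledge at the start: every connection is zero
    for _ in range(rounds):
        for word, target in TRAINING:
            error = target - guess(weights, word)
            # Nudge each active connection so the error shrinks next time.
            for u in units_for(word):
                weights[u] = weights.get(u, 0.0) + LEARNING_RATE * error
    return weights


untrained = {}
trained = train(200)
print(total_error(untrained), total_error(trained))
print(guess(trained, "pint"), guess(trained, "mint"), guess(trained, "lint"))
```

In this toy the shared ending learns the regular pronunciation and the connection for the letter p learns the exception. A word that was never in the training set, such as lint, is then read with the regular pronunciation because it activates only the shared ending.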
Conclusions
Given that all the
reading research psychologists I know support some version of the
parallel letter recognition model of reading, how is it that all the
typographers I know say that we read by matching whole word shapes?
It appears to be a grand misunderstanding. The paper by Bouma that
is most frequently cited does not support a word shape model of
reading. Bouma (1973) presented words and unpronounceable letter
strings to subjects away from the fixation point and measured their
ability to name the first and last letters. He found
that:
A) Subjects are more successful at naming letters to the right of
fixation than to the left of fixation.
B) When distance to the right of the fixation point is controlled,
subjects are better able to recognize the last letter of a word than
the first letter of a word. This data explains why it is that we tend
to fixate just to the left of the middle of a word.
Bouwhuis & Bouma
(1979) extended the Bouma (1973) paper by not only finding the
probability of recognizing the first and last letters of a word, but
also the middle letters. They used this data to develop a model of
word recognition based on the probability of recognizing each of the
letters within a word. They conclude that ‘word shape … might be
satisfactorily described in terms of the letters in their
positions.’ This model of word recognition clearly influenced the
McClelland & Rumelhart neural network model discussed earlier,
which also used letters in their positions to probabilistically
recognize words.
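The idea of recognizing a word from the probability of recognizing each of its letters can be sketched as follows. This is not claimed to be Bouwhuis & Bouma’s exact formulation: the small lexicon, the per-position accuracies, and the assumption that misread letters are spread evenly over the rest of the alphabet are all hypothetical choices made for illustration.

```python
from math import prod

# Hypothetical lexicon of three-letter words.
LEXICON = ["cat", "cot", "cut", "car", "bat"]
# Hypothetical probability of reading the letter in each position correctly.
P_CORRECT = [0.80, 0.60, 0.85]
ALPHABET_SIZE = 26


def letter_prob(perceived, shown, position):
    # Probability of perceiving one letter given the letter actually shown.
    # Errors are assumed to be spread evenly over the other 25 letters.
    p = P_CORRECT[position]
    return p if perceived == shown else (1.0 - p) / (ALPHABET_SIZE - 1)


def word_probs(stimulus):
    # Score each candidate word by multiplying its letter probabilities,
    # then normalize the scores over the lexicon.
    raw = {
        word: prod(letter_prob(word[i], stimulus[i], i)
                   for i in range(len(stimulus)))
        for word in LEXICON
    }
    total = sum(raw.values())
    return {word: score / total for word, score in raw.items()}


probs = word_probs("cat")
print(max(probs, key=probs.get), probs)
```

Under these assumptions the shown word is the most probable response, and a neighbor that differs in the hard-to-see middle letter (cot) is a more likely confusion than one that differs in the first letter (bat). No outline of the whole word enters into the calculation.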
Word shape is no
longer a viable model of word recognition. The bulk of scientific
evidence says that we recognize a word’s component letters, then use
that visual information to recognize a word. In addition to
perceptual information, we also use contextual information to help
recognize words during ordinary reading, but that has no bearing on
the word shape versus parallel letter recognition debate. It is
hopefully clear that the readability and legibility of a typeface
should not be evaluated on its ability to generate a good bouma
shape.
Why I wrote this paper
I am a psychologist
who has been working for Microsoft in different capacities since
1996. In 2000 I completed my PhD in cognitive psychology from the
University of Texas at Austin studying word recognition and reading
acquisition. I joined the ClearType team in 2002 to help get a
better scientific understanding of the benefits of ClearType and
other reading technologies with the goal of achieving a great
on-screen reading experience.
During my first year
with the team I gave a series of talks on relevant psychological
topics, some of which instigated strong disagreement. The crux of
the disagreement was that the team believed that we recognize words
by looking at the outline that goes around a whole word, while I
believed that we recognize individual letters. In my young career as
a reading psychologist I had never encountered a model of reading
that used word shape as perceptual units, and knew of no
psychologists who were working on such a model. But it turns out
that the model had a very long history that I was unfamiliar
with.
References
Adams, M.J. (1979).
Models of word recognition. Cognitive Psychology, 11,
133-176.
Bouma, H. (1973).
Visual interference in the parafoveal recognition of initial and
final letters of words. Vision Research, 13,
762-782.
Bouwhuis, D. &
Bouma, H. (1979). Visual word recognition of three letter words as
derived from the recognition of the constituent letters.
Perception and Psychophysics, 25, 12-22.
Cattell, J. (1886).
The time taken up by cerebral operations. Mind, 11, 277-282,
524-538.
Fisher, D.F. (1975).
Reading and visual search. Memory and Cognition, 3,
188-196.
Glushko, R.J. (1979).
The organization and activation of orthographic knowledge in reading
aloud. Journal of Experimental Psychology: Human Perception and
Performance, 5, 674-691.
Gough, P.B. (1972).
One second of reading. In Kavanagh & Mattingly’s Language by
ear and by eye. Cambridge, MA: MIT Press.
Haber, R.N. &
Schindler, R.M. (1981). Errors in proofreading: Evidence of
syntactic control of letter processing? Journal of Experimental
Psychology: Human Perception and Performance, 7,
573-579.
Hebb, D.O. (1949).
The organization of behavior. New York: Wiley.
Mason, M. (1978). From
print to sound in mature readers as a function of reader ability and
two forms of orthographic regularity. Memory and Cognition,
6, 568-581.
Kolers, P.A. &
Perkins, D.N. (1975). Spatial and ordinal components of form
perception and literacy. Cognitive Psychology, 7,
228-267.
McClelland, J.L. &
Johnson, J.C. (1977). The role of familiar units in perception of
words and nonwords. Perception and Psychophysics, 22,
249-261.
McClelland, J.L. &
Rumelhart, D.E. (1981). An interactive activation model of context
effects in letter perception: Part 1. An account of basic findings.
Psychological Review, 88, 375–407.
McCulloch, W.S. &
Pitts, W. (1943). A logical calculus of the ideas immanent in
nervous activity. Bulletin of Mathematical Biophysics, 5,
115-133.
McConkie, G.W. & Rayner, K. (1975). The span of the
effective stimulus during a fixation in reading. Perception and
Psychophysics, 17, 578-586.
Meyer, D.E. &
Gutschera, K.D. (1975). Orthographic versus phonemic processing of
printed words. Psychonomic Society Presentation.
Monk, A.F. &
Hulme, C. (1983). Errors in proofreading: Evidence for the use of
word shape in word recognition. Memory and Cognition, 11,
16-23.
Paap, K.R., Newsome,
S.L., & Noel, R.W. (1984). Word shape’s in poor shape for the
race to the lexicon. Journal of Experimental Psychology: Human
Perception and Performance, 10, 413-428.
Pitts, W. &
McCulloch, W.S. (1947). How we know universals: The perception of
auditory and visual form. Bulletin of Mathematical Biophysics,
9, 127-147.
Plaut, D.C., McClelland, J.L., Seidenberg, M.S., &
Patterson, K. (1996). Understanding normal
and impaired word reading: Computational principles in quasi-regular
domains. Psychological Review, 103, 56–115.
Pollatsek, A. &
Rayner, K. (1982). Eye movement control in reading: The role of word
boundaries. Journal of Experimental Psychology: Human Perception
and Performance, 8, 817-833.
Pollatsek, A., Well,
A.D., & Schindler, R.M. (1975). Effects of segmentation and
expectancy on matching time for words and nonwords. Journal of
Experimental Psychology: Human Perception and Performance, 1,
328-338.
Rayner, K. (1975). The
perceptual span and peripheral cues in reading. Cognitive
Psychology, 7, 65-81.
Rayner, K., McConkie, G.W., & Zola, D. (1980).
Integrating
information across eye movements. Cognitive Psychology, 12,
206-226.
Reicher, G.M. (1969).
Perceptual recognition as a function of meaningfulness of stimulus
material. Journal of Experimental Psychology, 81,
275-280.
Seidenberg, M.S.,
& McClelland, J.L. (1989). A distributed, developmental model of
word recognition and naming. Psychological Review, 96,
523–568.
Smith, F. (1969).
Familiarity of configuration vs. discriminability of features in the
visual identification of words. Psychonomic Science, 14,
261-262.
Sperling, G. (1963). A
model for visual memory tasks. Human Factors, 5,
19-31.
Woodworth, R.S.
(1938). Experimental psychology. New York: Holt.
Suggested Readings
If you’re just looking
for a couple of papers on reading psychology, I recommend these
four:
1. Rayner, K. (1998).
Eye movements in reading and information processing: 20 years of
research. Psychological Bulletin, 124(3), 372-422. This paper is
an account of the eye movement field from the premier eye tracking
researcher.
2. Plaut, D.C., McClelland, J.L., Seidenberg, M.S., &
Patterson, K. (1996). Understanding normal
and impaired word reading: Computational principles in quasi-regular
domains. Psychological Review, 103, 56–115. This is
the most recent of the major neural network papers, and is available
on David Plaut’s website. http://www.cnbc.cmu.edu/~plaut/
3. Stanovich, K.E.
(1986). Matthew effects in reading: Some consequences of individual
differences in the acquisition of literacy. Reading Research
Quarterly, 21, 360-407. This is one of the most cited reading
papers of all time. If you are interested in reading acquisition
this is the place to start.
4. Hoover, W.A. &
Gough, P.B. (1990). The simple view of reading. Reading &
Writing, 2(2), 127-160. This paper demonstrates that word
recognition and context are two separate skills that are both
necessary for reading.