Baseball Box Scores in the Newspaper: Helpful Statistics
or Sports Hieroglyphics?
Numerous researchers and writers for the popular press have examined the
readability of newspaper copy and have pointed to difficult copy as a reason why
readership continues to decline. The decline in readership has prompted changes
in style and content at newspapers across the country (Gillman, 1994).
Readability is a concern to on-line or "real-time" journalists as well, and
research has shown no significant difference between newspaper articles and
real-time articles (Aronson and Sylvie-Todd, 1996). Both forms are generally
deemed more difficult than standard reading.
Because mass media outlets attempt to reach large heterogeneous audiences, the
managers of those institutions typically strive to present as few challenges to
easy communication as possible (Nolan, 1991). Fusaro and Conover (1983) note
that reader satisfaction will plummet if stories are too difficult to read. But
the readability of articles is a continuing concern. Though some writers
consider "readability" a code word for watering down their work, major newspaper
chains have wondered for years about a literacy problem in the United States and
whether newspapers are too difficult to read for many potential customers.
According to a research report from the American Society of Newspaper Editors'
Journalism Values Institute, some people turn away from newspapers because they
consider them dense, irrelevant, boring, and written in the often confusing
technical language of experts (Hart, 1997). In the five decades since Flesch
(1949) developed the first readability scale, numerous academic studies have
documented ponderous leads (Cantalo, 1990), long, rambling sentences (Stapler,
1985), and readability scores consistently as high as 16 (Danielson and Bryan,
1964; Hoskins, 1973; Wanta and Gao, 1994). A news story with a readability
score of 16 would require the reader to have completed college to fully
comprehend the story. Difficult articles are found in both major metropolitan
newspapers and smaller dailies (Gillman, 1994). Some argue that by writing
articles in the "difficult-to-read" range, newspapers lose their effectiveness
(Burgoon, Burgoon, and Wilkinson, 1981) and isolate large segments of the
population (Smith, 1983).
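To make the grade-level scale concrete, the following is a minimal sketch of one grade-level formula, the Gunning Fog index. The studies cited above do not all use the same scale, and the choice of Fog here is an assumption made for illustration. The syllable counter is a crude vowel-group heuristic and the function names are hypothetical. The full index also excludes proper nouns and common suffixes when counting complex words, which this sketch does not attempt.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of vowels (a heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" usually adds no syllable ("score", "base"),
    # but a trailing "le" usually does ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def gunning_fog(text: str) -> float:
    """Fog index = 0.4 * (average sentence length + percentage of complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    # Simplification: any word of three or more estimated syllables is "complex".
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_len = len(words) / len(sentences)
    pct_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_len + pct_complex)


easy = "The cat sat. The dog ran."
hard = "Institutional communication managers systematically evaluate heterogeneous audiences."
print(f"easy: {gunning_fog(easy):.1f}  hard: {gunning_fog(hard):.1f}")
```

Under this formula, longer sentences and a higher share of polysyllabic words both push the score up, which is why ponderous leads and rambling sentences are associated with high readability scores.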
Readability experts say that most people prefer their leisure reading to be
about three grade levels below their educational level (Hart, 1993) and
newspaper reading is something that most people engage in because they want to,
not because they have to. The average American adult has completed one year of
college, and the median educational attainment in the United States is 12.9 years.
Half of the American population has a high school diploma or less (U.S.
Department of Commerce, 1997).
The story-telling style of sports reporting has been characterized as more
free-wheeling than news writing (Gillman, 1994) and some think the popularity of
the sports section has slowed the atrophy of newspaper readership (Rambo, 1989).
Others have found that dense prose dominates all sections of the newspaper,
including sports (Stempel, 1981). One study documented the readability scores
of sports articles at 18.7 (McAdams, 1992). The reading level required for an
article scoring 18.7 is equivalent to that of a second-year doctoral student.
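The gap between what readers prefer and what newspapers reportedly deliver can be worked out from the figures quoted above. The sketch below treats the median educational attainment as a stand-in for a typical reader, which is a simplifying assumption, and the variable and function names are hypothetical.

```python
# Figures quoted in the text above.
MEDIAN_EDUCATION_YEARS = 12.9  # median attainment (U.S. Department of Commerce, 1997)
PREFERRED_GAP = 3.0            # leisure reading about 3 grade levels below education (Hart, 1993)

reported_scores = {
    "news stories (upper range)": 16.0,  # Danielson and Bryan, 1964, and others
    "sports articles": 18.7,             # McAdams, 1992
}


def preferred_level(education_years: float, gap: float = PREFERRED_GAP) -> float:
    """Grade level a reader with the given education would prefer for leisure reading."""
    return education_years - gap


target = preferred_level(MEDIAN_EDUCATION_YEARS)
for label, score in reported_scores.items():
    excess = score - target
    print(f"{label}: score {score:.1f} is about {excess:.1f} grade levels "
          f"above the preferred {target:.1f}")
```

On these figures the preferred level is about 9.9, so copy scoring 16 sits roughly six grade levels above it and copy scoring 18.7 nearly nine.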
In addition to articles that are sometimes difficult to read, the sports section
of the newspaper contains one or more pages of sports minutiae that might be
compared to financial tables in another section of the newspaper - "a
never-never land filled with arcane codes and hidden formulas that are not
designed for mere mortals to comprehend or act upon" (Frailey, 1990, p. 83).
The purpose of this research is to examine readers' ability to decipher sports
statistics found in newspapers, specifically, the baseball box score. The box
score is the basic chart used to give statistics for a game, and includes
sections for pitching, hitting, and fielding. It can include as many as 12
player designations and more than 30 other items.
The author of a popular sports writing text notes that if statistics are
reported in a consistent manner day after day, readers learn the code (Garrison,
1993). However, although some publications have dropped box scores from their
pages in favor of more analysis and opinion (Moore, 1992; Elliott, 1992), others
have continued to add categories and data, leaving only a complete play-by-play
lacking (W's, L's, RBIs. . . , 1990). The box score is now nearly as detailed
as a Melville novel (Lopez, 1997). Baseball has the most statistics of any
sport reported and newspapers are continually coming up with new ones (Feinberg,
1992), putting readers in the position of having to learn the new abbreviations.
Some of the new statistics are incredibly minute, particularly to the casual fan
(Kaszuba, 1997). There is anecdotal evidence to suggest that those for whom
English is not the native language have particular difficulty. Immigrant
Chinese whom the New York Daily News tries to attract as readers call baseball
box scores the most mystifying thing about American culture (Lopez, 1997).
Baseball is a sport well known for its accumulation of data and its aficionados
are noted for their thirst for numbers and sometimes curious ways of
interpreting the data (Casella and Berger, 1994). American sports reporting is
heavily quantitative and a degree of spectator enjoyment is derived from being
able to interpret the wealth of statistics that have evolved to describe the
game (Guthrie, 1994). However, there is some disagreement about the importance
of including so many statistics in the sports section of the newspaper. Some
writers adhere to the adage that readers want sports pages filled with
statistics, noting that some serious sports fans turn to the agate page to get
their statistical fix for the day before reading stories about games (Feinberg,
1992). Others write that some readers feel a void if unable to read box scores
in the morning paper (Vecsey, 1994) and find them as essential as a morning cup
of coffee (Lopez, 1997). But others equate reading through a plethora of stats
with reading an algebra textbook, and doubt that would be appealing to most
print media consumers (Gleisser, 1994).
Though most American newspapers devote a large amount of space to sports
statistics, they offer readers only infrequent help in interpreting the data - an
occasional article explaining all the abbreviations and notations (Jones, 1986;
How to read a box score, 1998).1 The man credited with inventing the box score
in the early 1850s, Henry Chadwick (Kaszuba, 1997), might barely recognize its
"mutant descendant" (Lopez, 1997).
The literature leads the investigators to the following research questions:
R1: What is the level of ability of average baseball fans (and potential
newspaper readers) to recognize and understand the abbreviations and notations
in a modern-day expanded box score?
R2: Which demographic factors, if any, affect that ability?
In the summer of 1998, researchers distributed questionnaires at two Major
League Baseball games in South Florida. Student volunteers asked fans attending
the games to fill out the questionnaires. The first stadium visit netted 153
usable questionnaires; 154 fans filled out questionnaires during the
researchers' second visit, for a total of 307 responses.
The questionnaires contained two sample box scores taken from a daily
national newspaper. The box scores were chosen because between them they
contained all the categories utilized by the publication on a daily basis. The
respondents were asked to identify 30 abbreviations and notations used in the
box scores. The first set of fans received no assistance. On the second
stadium visit, the researchers supplied fans with a guide of where to find items
within the box scores. For example, the guide noted that the abbreviation ab
could be found in column 1, row 3. No other assistance was offered to the
second group of respondents.
The questionnaires also contained several items to ascertain demographic
information about the respondents, including age, sex, whether English was a
respondent's native language, and whether a respondent played sports in high
school. Fans were also asked to rate their level of understanding of baseball
terminology, to indicate how readable they consider box scores to be, to
indicate the degree to which they consider themselves sports fans, and to state
how many box scores they read on an average day. All self-ratings were on a
10-point scale.
The fans who completed questionnaires ranged in age from eight to 73. The
median age was 31.1 years. Most of the respondents (73.9%) were males. Fifty
(16.3%) indicated that English is not their native language. Nearly seven in 10
(68.7%) played high school sports; 43.6 percent played either baseball or
softball in high school. A significant majority (76.5%) indicated that they
understand baseball terminology quite well (eight or higher on a 10-point scale)
and nearly as many (70.7%) categorized themselves as avid sports fans.
Nearly a third of the respondents (30.9%) reported reading all or nearly all of
the box scores in the newspaper each day. More than a third (35.9%) indicated
that they read five or fewer box scores on a given day and 22.8 percent said they
do not read box scores at all. Many (40.1%) of the respondents indicated that
they taught themselves to read box scores, with another 30.3% being taught by
their fathers. The rest were taught by a coach, another family member, or a
friend. Many (37.4%) indicated that they consider box scores easy to read, but
nearly as many (35.2%) said that box scores are very difficult to decipher.
The respondents exhibited a wide range of knowledge of the items contained in
the sample box scores. Of the 307 fans who completed questionnaires, none
correctly identified all 30 abbreviations/notations and only one scored 29.
Twenty correctly identified 28 of the items and another 20 were correct on 27.
More than four in 10 (44.6%) were correct on 21 or more of the items, with 29.6%
answering correctly on 10 or fewer. Table 1 summarizes the number of items
correctly identified.
Insert Table 1 about here
Many of the individual items contained in the box scores were known to most of
the respondents, but several of the abbreviations were problematic for most of
the fans. Table 2 shows the percentage of respondents correctly identifying
each of the listed items in the sample box scores.
Insert Table 2 about here
A number of variables affected how well respondents scored on the identification
portion of the questionnaire. Males (M = 18.4) scored significantly higher than
females (M = 10.7), F(1,296) = 45.92, p < .001. Language was also a significant
variable, as the 50 respondents who indicated that English is not their native
language had more difficulty identifying the items in the box scores (M = 13.4)
than did those for whom English is the first language (M = 17.1), F(1,305) =
7.21, p < .009. Those who played sports in high school (M = 18.02) scored
significantly higher than those who did not engage in scholastic competition (M
= 12.47), F(1,284) = 6.09, p < .02. The age of the respondents had no
significant effect on scores.
There was a significant correlation between the self-ratings on two scales and
the scores achieved. Those who reported that they understand sports terminology
well tended to score well, adjusted R² = .59, p < .001, as did those who
characterized themselves as avid sports fans, adjusted R² = .42, p < .001. Regarding
the relative difficulty of reading baseball box scores, there was no significant
difference between those who report understanding terminology and those who do
not, nor between those who describe themselves as avid fans and those who do
not.
The session during which a respondent took part in the study was also highly
significant, as those for whom a guide was provided scored higher (M = 18.9) than
those in the first group (M = 14.4), who were not told where to locate particular
items, F(1,306) = 20.41, p < .001. Both males and females in the second session
scored higher than their counterparts in the first session, with females
benefiting the most from having the guide available. Table 3 shows that males in
the latter session outscored those in the first session by more than three and a
half points. The average score for females in the second session was more than
four and a half points higher than for females in session one.
Insert Table 3 about here
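The cell means in Table 3 can be pooled to check them against the group means reported in the text. The following is a minimal sketch of that arithmetic; the data are copied from Table 3, while the function names and the dictionary layout are hypothetical.

```python
# (sex, session) -> (mean score, N), copied from Table 3.
TABLE_3 = {
    ("male", 1): (16.5, 107),
    ("male", 2): (20.1, 120),
    ("female", 1): (9.0, 43),
    ("female", 2): (13.6, 27),
}


def pooled_mean(cells):
    """N-weighted mean over an iterable of (mean, n) pairs."""
    cells = list(cells)
    total_n = sum(n for _, n in cells)
    return sum(mean * n for mean, n in cells) / total_n


def mean_for(sex=None, session=None):
    """Pooled mean over every Table 3 cell matching the given sex and/or session."""
    selected = [
        value
        for (cell_sex, cell_session), value in TABLE_3.items()
        if (sex is None or cell_sex == sex)
        and (session is None or cell_session == session)
    ]
    return pooled_mean(selected)


for sex in ("male", "female"):
    gain = TABLE_3[(sex, 2)][0] - TABLE_3[(sex, 1)][0]
    print(f"{sex}: pooled mean {mean_for(sex=sex):.2f}, gain with guide {gain:.1f}")
for session in (1, 2):
    print(f"session {session}: pooled mean {mean_for(session=session):.2f}")
```

Pooling by N gives roughly 18.4 for males and 10.8 for females, and roughly 14.4 and 18.9 for the two sessions, which agrees with the means reported in the text to within about a tenth of a point. Table 3 accounts for 297 of the 307 respondents, so small discrepancies are to be expected.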
In answering open-ended questions, some respondents indicated that they do not
read box scores because they get all the information they require from
television or the Internet, or simply do not have the time to read box scores.
General thoughts from fans about box scores are that they are confusing, too
complex, and boring. Several fans suggested including explanations of the
abbreviations in the form of a legend.
The majority of respondents to this survey labeled themselves as avid sports
fans, think they understand baseball terminology well, and read at least a few
box scores each day. Essentially, it was a sample of knowledgeable people. Yet
more than half of the respondents said box scores were moderately to very
difficult to read. The readability ratings that respondents assigned to box
scores were not correlated with the scores achieved, nor with level of
fandom. Many of those who scored well and many of those who classified
themselves as avid fans said it is difficult to decipher baseball box scores.
During the study, researchers observed some fans spending as much as 45 minutes
trying to figure out the meaning of certain items, perhaps trying to prove to
themselves or to someone else that they are as knowledgeable about baseball as
they claim to be. We suspect that most readers will not spend 45 seconds trying
to decipher something they do not understand.
Some people simply cannot crack the code, and even avid baseball fans have
difficulty with some of the abbreviations and notations, particularly those that
are recent additions. The items in the sample box score that seemed most
problematic for the widest range of fans were items that have not been in use
for as long as items such as ERA or 1B. That no one correctly identified
all of the items listed in the sample box scores indicates that there seems to
be a learning curve involved when a new item is introduced, even for those most
knowledgeable about the sport. The results also indicate that those who have
"grown up around" sports have a decided advantage over those who are newer fans.
This finding is consistent with earlier readability research, which showed that
familiarity with a subject leads people to rate articles about that subject in
the easy-to-read range.
In modern sports (and media coverage of them), there seems to be a statistic for
everything: a player's performance on grass vs. Astroturf, in domes, on
particular days of the week, against left-handed middle relievers. There
certainly is nothing wrong with including any and all of that information in the
newspaper, if readers know what they are looking at.
One reason behind including all the detail might be fantasy baseball leagues
that allow fans to construct their own teams "on paper" and pit them against one
another, at times with wagering involved. But by continually adding information
to box scores, newspapers might be catering to the needs of a few at the expense of
attracting the many. However, it is possible to serve fantasy leaguers, avid
fans, and novices alike. This study showed that offering the slightest bit of
direction to help fans decipher the box scores markedly improved their scores.
Merely pointing out where an abbreviation could be found offered enough context
for fans to be able to figure out some of the more difficult items.
Based on these findings, we suggest that newspapers include a legend on the
agate page. It would not be necessary to define items or tell readers how to
calculate slugging percentage, but the legend could indicate what the
abbreviations stand for. This might be helpful for more than the novice fan.
The web master at the Tallahassee Democrat who created the page on how to read a
box score notes that many fans are sheepish about admitting that they really do
not know what a particular abbreviation stands for and appreciate the assistance
(L.K. Mirrer, personal conversation, January 5, 1999). We suspect that many
readers would react similarly to the inclusion of a legend on the agate page -
even those readers who would be reluctant to admit needing to refer to it.
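A legend of the kind suggested here could be generated from a simple lookup table. The sketch below uses a subset of the abbreviations and meanings listed in Table 2. The line width, the separator, and the function name are assumptions made for illustration; the recommendation itself specifies only that the legend say what each abbreviation stands for.

```python
# Abbreviation -> meaning, taken from Table 2 (a subset, for brevity).
LEGEND = {
    "ab": "at bats",
    "bi": "(runs) batted in",
    "lob": "left on base",
    "ip": "innings pitched",
    "er": "earned runs",
    "bf": "batters faced",
    "WP": "wild pitch",
    "IBB": "intentional base on balls",
    "GIDP": "grounded into double play",
    "CS": "caught stealing",
    "SF": "sacrifice fly",
    "HBP": "hit by pitch",
}


def format_legend(legend: dict[str, str], width: int = 60) -> str:
    """Pack 'abbr: meaning' entries, sorted alphabetically, into lines up to `width` wide."""
    lines = []
    current = ""
    for abbr, meaning in sorted(legend.items(), key=lambda item: item[0].lower()):
        entry = f"{abbr}: {meaning}"
        candidate = f"{current}; {entry}" if current else entry
        if current and len(candidate) > width:
            # The entry does not fit on the current line, so start a new one.
            lines.append(current)
            current = entry
        else:
            current = candidate
    if current:
        lines.append(current)
    return "\n".join(lines)


print(format_legend(LEGEND))
```

Because the legend lists only what each abbreviation stands for, it stays compact enough to fit in a few lines of agate type.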
1 At least one publication, the Tallahassee (Fla.) Democrat, includes a section
on its web page that explains the abbreviations contained in a box score and
describes how some statistics are figured.
Aronson, Karla. (1996). Real-time journalism: Implications for news writing.
Newspaper Research Journal, 17(3-4), pp. 53-67.
Burgoon, J., Burgoon, M., and Wilkinson, M. (1981). Writing style as a
predictor of newspaper readership, satisfaction and image. Journalism
Quarterly, 58, pp. 225-231.
Casella, George and Berger, Roger. (1994). Estimation with selected binomial
information or do you really believe that Dave Winfield is batting .471?
Journal of the American Statistical Association, 89(427), pp. 1080-1090.
Danielson, W. and Bryan, S. (1964). Readability of wire stories in eight news
categories. Journalism Quarterly, 41, pp. 105-106.
Elliott, Stuart. (1992, February 24). In sporting news, the box scores lose.
New York Times, p. C8.
Feinberg, Jeremy. (1992). Reading the sports page: A guide to understanding
sports statistics. New York: Macmillan.
Flesch, R. (1949). The Art of Readable Writing. New York: Harper and Row.
Frailey, Fred. (1990, March). How to decode the financial pages. Changing
Times, 44(3), pp. 83-86.
Fusaro, J., and Conover, W. (1983). Readability of two tabloid and two
non-tabloid newspapers. Journalism Quarterly, 60, pp. 142-144.
Garrison, Bruce. (1993). Sports Reporting. Ames, Iowa: Iowa State University
Press.
Gillman, Timothy. (1994). The problem of long leads in news and sports
stories. Newspaper Research Journal, 15(4), pp. 29-39.
Gleisser, Ben. (1994, August). How to hit a home run in sports reporting.
Writer's Digest, 78(8), pp. 38-40.
Guthrie, Don. (1994). Statistics in sports. Journal of the American
Statistical Association, 89(427), pp. 1064-1065.
Hart, Jack. (1997, April 19). Why worry about words? Editor & Publisher, p.
Hart, Jack. (1993, November 6). Writing to be read. Editor & Publisher, p. 5.
Haynes, Steve. (1989, May-June). See spot write. Columbia Journalism Review,
28(1), p. 17.
Hoskins, R. (1973). A readability study of AP and UPI wire copy. Journalism
Quarterly, 50, pp. 360-363.
How to read a box score. (1998, March 31). St. Petersburg (FL) Times, p. 16H.
Jones, Del. (1986). Baseball box tells a lot. El Paso Times, p. 1C.
Kaszuba, Dave. (1997, June 14). Inventor of the box score. Editor &
Publisher, pp. 14, 15, 43.
Lopez, Steve. (1997, April 21). Agate addiction. Sports Illustrated, 86(16)
McAdams, K. (1992). Readability reconsidered: A study of reader reactions to
fog indexes. Newspaper Research Journal, 13, 14(4,1), pp. 50-59.
Moore, Rob. (1992, April 16). Readers cry "foul" as Sporting News drops box
scores. St. Louis Business Journal, 12(29), p. 9A.
Nolan, Jack. (1991). Effects of cuing familiar and unfamiliar acronyms in
newspaper stories, an experiment. Journalism Quarterly, 68(1-2), pp. 188-194.
Smith, R. (1983). How consistently do readability tests measure the difficulty
of newswriting? Newspaper Research Journal, 5, pp. 1-8.
Stapler, H. (1985). The one sentence/long sentence habits of writing leads and
how it hurts readership. Newspaper Research Journal, 7, pp. 17-27.
Stempel, G. (1981, Fall). Readability of six kinds of content in newspapers.
Newspaper Research Journal, pp. 32-37.
U.S. Department of Commerce, Bureau of the Census. (1997). Highest level of
education attained by persons 25 years and older: March, 1996.
Vecsey, George. (1992, November 4). Antidote for reality: Box scores. New
York Times, p. B9.
W's, L's, RBIs, even the wind. (1990, April 30). Newsweek, 115(18), p. 67A.
Wanta, W., and Gao, D. (1994). Young readers and the newspaper: Information
recall and perceived enjoyment, readability, and attractiveness. Journalism
Quarterly, 71(4), pp. 926-935.
Table 1. Number of Items Correctly Identified
Table 2. Identification of Box Score Items

Item                  Correct response                      % answering correctly
ab                    at bats                               92.2
e                     error                                 85.7
so                    strike outs                           80.8
ERA                   earned run average                    79.2
ph                    pinch hitter                          76.2
lob                   left on base                          73.3
DP                    double play                           69.1
ip                    innings pitched                       68.4
er                    earned runs                           68.1
36 (Anaheim)          total at bats for team                67.1
2B                    double                                66.8
bi                    (runs) batted in                      65.8
HBP                   hit by pitch                          65.8
X                     home team did not bat in the          61.9
                      bottom of the ninth inning
Att.                  attendance                            59.9
SF                    sacrifice fly                         59.0
CS                    caught stealing                       55.4
W, 3-1 (Chicago)      pitcher who got the win, his          50.5
T                     time (length) of contest              48.5
GIDP                  grounded into double play             45.0
nickname, visitors    White Sox                             44.3
RBI: Baines (7)       his seventh run batted in for season  41.7
lo                    left on                               41.4
IBB                   intentional base on balls             41.0
WP                    wild pitch                            33.2
Alicea (5) lo         Alicea responsible for leaving        30.9
                      five runners on base
bf                    batters faced                         29.6
S (batting)           sacrifice                             26.4
balls by Stottlemyre  total pitches minus strikes, 48       20.5
H (pitching)          hold by middle reliever               1.3
Table 3. Scores for Males and Females by Session

Sex       Session   Mean    N
Males     One       16.5    107
Males     Two       20.1    120
Females   One        9.0     43
Females   Two       13.6     27
A survey of fans in attendance at two Major League baseball games reveals that
many fans experience difficulty when trying to decipher box scores in the
newspaper. Several factors affected the ability to read box scores, including
having played sports in high school, the sex of the respondent, and whether
English was the person's native language. The authors suggest that publications
include a legend on the agate page that explains the meanings of the
abbreviations.