Computerized Content Analysis to Measure Affective Tone:
An Amplification of "Sensationalism" by Comparing the Tonal Values of the
New York Times to the New York Post Using Whissell's Dictionary of Affect
Paul Crandon, Ph.D.
University of North Alabama
John J. Lombardi, Ph.D.
Frostburg State University
"Choices of words and their organization into news stories are not trivial
matters. They hold great power in setting the context for debate, defining
issues under consideration, summoning a variety of mental representations, and
providing the basic tools to discuss the issues at hand."
– Pan & Kosicki
This brief investigation marks an attempt to borrow what we believe is a
clever and useful research instrument/method and apply it to a practical
discussion in journalism and media studies. The instrument is a database
of thousands of words scale-rated in emotional dimensions, and the
discussion at hand is how to define and measure sensationalism.
The idea centers on the connotative meanings – or affective tonal
values – of words, and on the premise that this tone can be measured and
handled statistically to compare and content-analyze bodies of text.
While many words share roughly the same denotative meaning, e.g. wallet
and billfold, no two words convey the exact same affective, or connotative,
meaning; all words evoke emotional responses that are different from all
other words. Though there is some variance between respondents – not all
people react to a word in the exact same manner – scales have been
developed that offer measures of emotional tone, and they have received at
least some evidence of external validity. What if there were a corpus of
words, thousands of them, that were each rated in multiple dimensions of
emotion, along with computer programming that can render emotional measures
from any text, large or small, instantly? There is.
This instrument, called the Dictionary of Affect in Language, was developed
by Cynthia Whissell and has been used in conjunction with computerized
content analysis software to measure the affective tone of copy from a host
of sources. Although Whissell's is not the first attempt to catalogue the
affective element of large numbers of words, the DAL is the most
comprehensive and extensively used to date.
The dictionary was composed using Osgood's semantic differential
techniques to rate thousands of words in terms of three important
dimensions: the words' pleasantness (pleasant – unpleasant), activation
(active – passive), and the words' imagery (hard – easy to imagine). The
goal was to compile a reference list of the affective or emotional
"meanings" of frequently used words that could later be used to analyze
text by computer. Words were chosen for inclusion based on their frequency
of use in common spoken and written English. In the end, nearly 10,000
words were checked for spelling and included in the list. 
The usefulness of such an instrument should be quite apparent: researchers
could use Whissell's dictionary to measure the tone of large quantities of
copy instantly, comparing publications individually, against one another, and across
time. Studies could use these methods to examine the emotional tone with
which a particular issue is portrayed by different media and whether that
tone changes over time. Political speeches could be analyzed for changes
in emotion from one to the next or from speaker to speaker. One could
compare the tone of coverage from local media versus national media, for
example, or analyze coverage from a single source over the life of an
issue. Studies in public relations could look at the tonal values of an
in-house newsletter compared with mainstream media (Are newsletters more
pleasant than "real" news? Less active? Higher in imagery?). Advertising
scholars and executives alike could examine trends in the field and study
the efficacy of ads using different tonal values. Provocative questions
could be probed – How has coverage of AIDS changed in tone from the early
80s to today? Are the news media becoming more arousing in their
coverage? Less arousing? Does coverage of the War in Iraq differ in tone
from coverage of the Gulf War, the Vietnam War or other military actions?
Using the DAL
Each word in the dictionary has a decimal rating between 1 and 3 for
each of the three scales of PLEASANTNESS, ACTIVITY, and IMAGERY. A body of
text can be computer analyzed and a mean rating for each dimension can be
found. For example, the word "yesterday" was rated 2.57 on the
pleasantness scale, 1.83 on the activity scale, and 1.60 on the imagery
scale. This would indicate that subjects found this word to be relatively
pleasant, not particularly active (or passive), and somewhat difficult to
imagine. In another example, this time a much more neutral word, the word
"central" scored as follows: 1.67 pleasantness, 1.67 activity, and 1.40
imagery. It is easy to see that subjects found this word to be neither
pleasant nor unpleasant, neither active nor passive, and perhaps a bit
difficult to imagine. With three separate scores for thousands of commonly
used words, one can begin to appreciate the utility of the dictionary.
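To make the mechanics concrete, the scoring step can be sketched in a few lines of Python. The two lexicon entries below use exactly the ratings quoted above for "yesterday" and "central"; the function name and the two-word mini-lexicon are our own illustration, not the actual DAL software.

```python
# Hypothetical mini-lexicon: a real analysis would load Whissell's
# full list of nearly 10,000 rated words.
DAL = {
    "yesterday": {"pleasantness": 2.57, "activation": 1.83, "imagery": 1.60},
    "central":   {"pleasantness": 1.67, "activation": 1.67, "imagery": 1.40},
}

def tone(text):
    """Mean rating on each dimension over the words the lexicon recognizes."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    hits = [DAL[w] for w in words if w in DAL]
    if not hits:
        return None
    return {dim: sum(h[dim] for h in hits) / len(hits)
            for dim in ("pleasantness", "activation", "imagery")}

print(tone("Yesterday the central office closed."))
```

For the sentence above the function recognizes two words and returns roughly 2.12 pleasantness, 1.75 activation, and 1.50 imagery – the hand-computed averages of the two entries.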
In addition, Whissell has devised a method of scrutiny whereby extreme
words can be located and tabulated. Words in the extremes of these three
dimensions have been isolated and given appropriate labels. For example,
decidedly PLEASANT words are those rated in the top 10th percentile of
pleasantness among all rated words. Similarly, UNPLEASANT words are those
rated in the bottom 10th percentile of this dimension. ACTIVE words are
those rated by subjects in the top 10th percentile of the activity
dimension, and PASSIVE words are those in the bottom 10th percentile.
Finally, HIGH IMAGERY and LOW IMAGERY words are those that scored in the
top and bottom 10th percentiles of this dimension. Thus far, six different
categories of extreme words have been "tagged" in the dictionary. Notice
that in each grouping, the line of demarcation is drawn at 10 percent.
Whissell has also combined two of these dimensions – pleasantness and
activity – to form four more categories of extreme words. By taking the
top and bottom quartiles of each, Whissell devised these new categories:
NICE words (top quartile for pleasantness, bottom quartile for activity),
SAD words (bottom quartile for both pleasantness and activity), CHEERFUL
words (top quartile for both pleasantness and activity), and NASTY words
(bottom quartile for pleasantness, top quartile for activity). Notice that
when two dimensions are combined, the range of inclusion is broadened to
25 percent for each scale.
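The tagging scheme just described reduces to percentile cutoffs over the rated word list. The sketch below is our own illustration (imagery is handled analogously and omitted for brevity): single-dimension extremes are cut at the 10th percentiles, and the four hybrid categories at the quartiles.

```python
def percentile_cutoffs(ratings, pct):
    """Rating values bounding the bottom and top pct% of a {word: rating} map."""
    ordered = sorted(ratings.values())
    k = max(1, int(len(ordered) * pct / 100))
    return ordered[k - 1], ordered[-k]

def tag_extremes(pleasantness, activation):
    """Tag each word with its extreme-word categories, if any."""
    lo_p10, hi_p10 = percentile_cutoffs(pleasantness, 10)
    lo_a10, hi_a10 = percentile_cutoffs(activation, 10)
    lo_p25, hi_p25 = percentile_cutoffs(pleasantness, 25)
    lo_a25, hi_a25 = percentile_cutoffs(activation, 25)
    tags = {}
    for w in pleasantness:
        t = set()
        # Single-dimension extremes: top/bottom 10th percentile.
        if pleasantness[w] >= hi_p10: t.add("PLEASANT")
        if pleasantness[w] <= lo_p10: t.add("UNPLEASANT")
        if activation[w] >= hi_a10:   t.add("ACTIVE")
        if activation[w] <= lo_a10:   t.add("PASSIVE")
        # Hybrid categories: combined pleasantness and activation quartiles.
        if pleasantness[w] >= hi_p25 and activation[w] <= lo_a25: t.add("NICE")
        if pleasantness[w] <= lo_p25 and activation[w] <= lo_a25: t.add("SAD")
        if pleasantness[w] >= hi_p25 and activation[w] >= hi_a25: t.add("CHEERFUL")
        if pleasantness[w] <= lo_p25 and activation[w] >= hi_a25: t.add("NASTY")
        tags[w] = t
    return tags
```

Run over a rated lexicon, this yields the same ten categories the dictionary uses; the exact tie-breaking at the cutoff values is an assumption on our part.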
Whissell's dictionary has been used in a number of unique studies, most
merely designed to test the fitness of the instrument itself. In a
stylometric study examining the song lyrics of Paul McCartney and John
Lennon, for example, Whissell was able to replicate earlier critical
studies whose findings suggest which writer was the happier and more cheerful
and which the sadder and more depressed. She was also able to show
quantitatively how the "mood" of the authors' lyrics changed over time,
again, in agreement with other literary and music scholars' previous
qualitative or hand-coded works. In addition, the dictionary was able to
take a sample of song lyrics and correctly identify which writer composed
it based primarily on the tone of the passage. This is an important
finding because it lends credibility to Whissell's methods and
instrumentation, and offers the DAL as a valid tool for
stylometrists. Others have used the instrument to explore a number of
issues across a host of disciplines, from measuring the emotional tone of
open-ended responses in management questionnaires to comparing the
written sexual fantasies of men and women.
Whissell herself has ventured into the realm of media studies. In one
example, Whissell and McCall found differences in the tonal values of
advertisements aimed at men and women. The authors compared the copy from
print ads in leading men's magazines such as Gentlemen's Quarterly and
Popular Mechanics to those found in women's magazines, such as Ladies' Home
Journal and Homemakers. The study found that ads directed at men were
more arousing and less pleasant than the ads aimed at women. Within
this study, a follow-up experiment revealed that women tend to rate ads as
more successful in their appeals when words higher in pleasantness are
used, while ads using words higher in arousal were rated more effective by
both men and women. This study was later extended to incorporate the third
dimension – imagery – and to include children as subjects as well.
What seems noteworthy for this discussion, and for communication scholars
generally, is that these studies have not found their way into our
journals. Read on:
In an experiment designed to investigate the emotional tone of newspaper
headlines, Fournier et al. sought to provide a testable operational
definition of "sensationalism" using the Whissell DAL. The researchers
obtained newspaper headline copy from three newspapers: two considered
non-sensational, the Toronto Globe and Mail and the Wall Street Journal,
and one considered sensational, the Toronto Star. As an external check,
the authors also included a
similar sampling of article titles from the academic journal Psychological
Reports. Results indicate that, by using the DAL to analyze copy, the
researchers were able to identify copy deemed sensational:
Sensationalism could be defined, in terms of the Dictionary of Affect, in
one of two ways: it could involve a high level of activity [arousal levels]
in language regardless of evaluation [pleasantness] in which case Toronto
Star headlines and Psychological Reports titles would both be classified as
"sensational". Although readers of the Toronto Star might readily agree
with this classification, authors of papers in Psychological Reports would
probably be surprised to find their material so described. An alternative
definition of sensationalism would require the relatively high usage of
active, unpleasant words. By this definition, titles in the Toronto Star
would still be classified as sensational, but those in Psychological
Reports would not.
Some irony (at least for media scholars) in this case might be found in
the fact that this was published in a psychology journal.
These examples are among a limited number of studies attempting to shed
other than anecdotal light on sensationalism. Few attempts have been made
to operationally define sensationalism, let alone quantify or measure
it. Indeed, an informal database search of 20 years of refereed journals,
using "sensationalism" as the sole subject search term with no other
limitations, yielded only 82 articles, with most of these (53) in unrelated
fields. Of those appearing within the fields of communication/media
(29 total), 18 appeared in the American Journalism Review. Only one
article attempted to clarify our conception of sensationalism beyond the
popular usage of the term: Grabe, et al., did study the packaging of TV
news and how video maneuvers (zooms in/out, shot duration, etc.) and
decorative effects (sound effects, wipes and fades, etc.) associated
strongly with what has come to be called sensational TV journalism.
How does copy taken from an arguably non-sensational newspaper differ from
copy taken from a newspaper that is arguably sensational? Can we arrive at
a better understanding of what constitutes sensationalism by examining the
features of copy from a source that has been called sensational? Can some
validation of the instrument itself be extended, should results intuitively
make sense?
To address these research questions we analyzed the headlines and the lead
sentences from three weeks of front-page lead stories from both the New
York Times and the New York Post. These publications were selected because
of their historically competing approaches to news reporting. The New York
Times is often referred to as the nation's "newspaper of record" and enjoys
a long-standing reputation for high quality reporting. The New York Post,
by comparison, tends to be perceived as the "yin" to the Times' "yang,"
with less concern over journalistic integrity. In an effort to avoid
the potential for repetition of weekday stories often found in weekend
editions, only front-page stories from the Monday through Friday editions
of each publication were included in this study. Page-one lead stories
(one from each day) from February 24 through March 21, 2003 were obtained
through a LexisNexis database search, yielding four sample data sets: 15
headlines from the Times and 15 from the Post, and as many leads from each
of the two papers.
The New York Post routinely runs only one major story on its front page,
attributable largely to its tabloid format, which made identifying the
lead story effortless. The New York Times, however, obviously prints
several stories on the front page of each edition, so some judgment had to
be made to determine which would constitute the lead story. The lead was
determined to be the story located in the first column, above the fold, in
the upper left portion of each edition. Once the lead stories were
identified by a visual inspection of the paper version, the digital text
was downloaded using LexisNexis. Using this database retrieval method
eliminates the risk of using "early" or regional editions (only final
national versions appear in the database), as well as the risks of
contamination associated with OCR software.
Once the text for each story's headline and lead was obtained, separate
sample files were created and each file was checked for any spelling errors
or extraneous wording that may have been copied over from the Nexis
download. These files were loaded directly into software designed
exclusively for use with Whissell's dictionary for analysis.
All told, 15 headlines and 15 leads were examined from each of the two
publications. With respect to headlines, the New York Times tallied 164
words or 10.93 words per headline while the New York Post tallied 188 words
or 12.53 words per headline (see Table 1). Because the DAL does not
recognize every word in the English language, it is worth noting that
98.0% of all words in the Times' headlines and 87.5% of all words in the
Post's headlines were recognized by the DAL.
A total of 566 words or 37.73 words per sentence were found in the New York
Times leads. A total of 623 words or 41.53 words per sentence were found in
the New York Post leads. A total of 96.6% (Times) and 94.4% (Post) of all
words in the leads were analyzed by the DAL (See Table 1).

TABLE 1 – Frequencies
                             NY Times    NY Post
Headlines
  Total words                164         188
  Average length (words)     10.93       12.53
  Hit rate                   98.0%       87.5%
Leads
  Total words                566         623
  Average length (words)     37.73       41.53
  Hit rate                   96.6%       94.4%
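The figures in Table 1 amount to simple counts against the lexicon. A sketch of how such frequencies and hit rates might be computed follows; the function name and word-cleanup rules are our own assumptions, not those of the Duhamel software.

```python
def frequency_stats(texts, lexicon):
    """Total words, average words per text, and the share of words the
    lexicon recognizes (the 'hit rate') for a list of text samples."""
    tokenized = [[w.strip(".,;:!?\"'").lower() for w in t.split()]
                 for t in texts]
    total = sum(len(ws) for ws in tokenized)
    hits = sum(1 for ws in tokenized for w in ws if w in lexicon)
    return {
        "total_words": total,
        "avg_length": total / len(texts),
        "hit_rate": hits / total,
    }
```

Feeding in the 15 headlines from each paper, with the DAL word list as the lexicon, would reproduce the style of figures reported in Table 1.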
With respect to headlines, significant differences were found in
pleasantness between the two newspapers (f = 6.30, df = 195, p =
.013). While the means of the two samples suggest little differences in
the arithmetic average (1.74 versus 1.81), the higher standard deviation
seen in Post headlines implies a greater use of extreme words (regardless
of whether they fall within the 10th percentile necessary for classification
as a decidedly pleasant word). No statistical significance was found when
comparing the mean level of activation or imagery among
headlines. However, chi square analysis did reveal some significant
differences among the percentage of certain classes of extreme words used
in the headlines. For example, the Post used significantly more pleasant
words than did the Times (chi square = 9.67, df = 1, p = .002) (See Table 2).
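The interval-level statistics reported here came from the DAL software; as an illustration of the kind of two-sample comparison involved, the sketch below computes Welch's t statistic over two lists of per-word ratings. This is an assumption on our part – the original analysis may have used a pooled-variance test.

```python
import math

def welch_t(xs, ys):
    """Welch's two-sample t statistic and approximate degrees of freedom
    for comparing the mean ratings of two word samples."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    # Sample variances (n - 1 denominator).
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    se2 = vx / nx + vy / ny
    t = (mx - my) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for df.
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df
```

Applied to the pleasantness ratings of every recognized word in the Times and Post headlines, such a test yields comparisons of the kind shown in Table 2.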
Chi square analysis also revealed that the Post used significantly more
high imagery words in its headlines than did the Times. These words are
ones that fall into the top 10% of the imagery category (chi square = 3.81,
df = 1, p = .051). Of additional significance, it was found that the Times
used significantly more sad words than did the Post (chi square = 4.75, df
= 1, p = .029). According to the DAL, sad words are ones that score at the
bottom 10% of both activation and pleasantness (See Table 2).
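Each of these chi-square comparisons is a 2×2 test on counts: words in the extreme category versus words not in it, for each paper. A minimal sketch of the Pearson statistic (df = 1, no continuity correction; whether the original analysis applied a correction is unknown to us):

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], e.g.
    a = Times words in the category, b = Times words not in it,
    c and d likewise for the Post."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den
```

With the category counts behind the percentages in Table 2, this statistic can then be referred to the chi-square distribution with one degree of freedom to obtain the reported significance levels.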
With respect to the crafting of lead sentences, another t-test reveals
significant differences in their use of pleasant words (f = 13.01, df =
766, p = .000). The Post also had a significantly higher mean imagery
rating than did the Times (f = 8.22, df = 766, p = .004) (See Table 3).
Table 3 further shows that when we break down the words used in the
leads into each of the ten extreme word classes, there are several
significant differences between the two publications.

TABLE 2 – Headlines
                              NY Times (s)   NY Post (s)   f/X2    df    Sig.
Interval data (t-test)
  Mean Pleasantness           1.74 (.39)     1.81 (.51)    6.30    195   .013
  Mean Activation (Arousal)   1.83 (.41)     1.86 (.45)    .803    195   .371
  Mean Imagery                1.99 (.64)     2.10 (.71)    2.28    195   .133
Nominal data (Chi-Square)
  % Pleasant                  2.04           14.14         9.67    1     .002
  % Unpleasant                15.31          14.14         .053    1     .818
  % Active                    7.14           11.11         .934    1     .334
  % Passive                   15.31          13.13         .191    1     .662
  % High Imagery              10.20          20.20         3.81    1     .051
  % Low Imagery               13.27          13.13         .001    1     .978
  % Nasty (hybrid)            10.20          12.12         .182    1     .669
  % Sad (hybrid)              7.14           1.01          4.75    1     .029
  % Nice (hybrid)             1.02           4.04          1.82    1     .178
  % Cheerful (hybrid)         7.14           9.09          .250    1     .617

TABLE 3 – Leads
                              NY Times (s)   NY Post (s)   f/X2    df    Sig.
Interval data (t-test)
  Mean Pleasantness           1.81 (.34)     1.82 (.41)    13.01   766   .000
  Mean Activation             1.71 (.42)     1.72 (.40)    1.33    766   .249
  Mean Imagery                1.70 (.67)     1.84 (.74)    8.22    766   .004
Nominal data (Chi-Square)
  % Pleasant                  4.63           8.18          4.06    1     .044
  % Unpleasant                4.88           8.71          4.44    1     .035
  % Active                    6.43           6.33          .003    1     .957
  % Passive                   24.16          20.32         1.64    1     .200
  % High Imagery              7.46           12.66         5.78    1     .016
  % Low Imagery               30.85          26.65         1.65    1     .199
  % Nasty                     4.88           8.97          4.99    1     .025
  % Sad                       4.37           4.49          .006    1     .938
  % Nice                      1.03           4.22          7.72    1     .005
  % Cheerful                  5.66           5.01          .157    1     .692

The Post
used significantly more pleasant words in its leads than did the Times (chi
square = 4.06, df = 1, p = .044). The Post used significantly more
unpleasant words, as well (chi square = 4.44, df = 1, p = .035). The same
can be said for high imagery words, where 12.66% of words used in the Post
leads were considered high imagery words, compared to only 7.46% of words
in the Times leads (chi square = 5.78, df = 1, p = .016). Post leads also
contained a higher percentage of nasty words and nice words than did the
Times (8.97% vs. 4.88% for nasty and 4.22% vs. 1.03% for nice). Both
categories proved to be significant. Nasty words were significantly higher
in the Post leads (chi square = 4.99, df = 1, p = .025) as were nice words
(chi square = 7.72, df = 1, p = .005) (see Table 3).
Apart from exploring the concept of sensationalism, the chief purpose of
this report is to introduce an innovative method of computerized content
analysis to the study of media and other communication processes. One of
our research questions was whether some validation of the instrument itself
could be established, should these results intuitively make sense. As we
discuss the findings further, we believe a strong case can be made to
answer this question in the affirmative.
Before delving too deeply into this discussion we must first preface our
remarks with a quick definition of "sensationalism." The concept of
sensationalism as it pertains to journalistic practices carries a decidedly
negative connotation. What should be considered when thinking of
sensationalism as a descriptive term is that the essence of the word
connotes an altering of the senses, regardless of evaluation. In this case
the "sense" is not touch, taste, sight, sound, or smell by itself, but is a
combination of any or all of these senses. The sense that is jogged by
sensationalism is that of the imagination.
Perhaps one of the most provocative findings presented here lies in the
counterintuitive significant differences found in the overall pleasantness
of the "sensational" copy. Indeed, the headlines from the source assumed
to be sensational were found to be significantly more pleasant in overall
tone, and were also found to incorporate significantly more decidedly
pleasant words. There were also significantly more nice words included in
the Post leads than in the Times. At first blush, this may seem to
contradict the conventional characterization of sensationalism: that
sensational journalism feeds only on the negative, that sensational
journalists represent the bottom-dwellers, reporting only the seedy
underbelly of bad news. On the other hand, could not what we might call
sensational in other contexts be significantly more pleasant? Would we not
say something is "sensational" if it appeals to senses other than the baser
ones, such as a "sensational" Broadway musical? As an exploratory study,
there was the more or less tacit agreement that if there were differences
to be found in pleasantness, they would in all likelihood be in the other
direction. That this isn't the case tells us perhaps as much about our
preconceived ideas of sensationalism as it does about what sensationalism
might really involve.
This is not to imply a full vindication of the Post, however. The Post
used significantly more unpleasant and nasty words in its leads, while the
Times, for its part, used significantly more sad words in its headlines.
It seems that indeed sensationalism can operate in both directions. It
appears the
Times may have adopted more of a keep-it-in-the-middle approach to their
reporting, while the Post seems to feel free to steer to both sides of the
road. These findings seem to support a re-tooling of some of the accepted
wisdoms surrounding sensational journalism. It seems that sensationalism
may merely be the presence of more extremes, be they nasty and unpleasant,
or nice and pleasant.
Another finding that deserves attention is that of imagery. There were
nearly twice as many high imagery words used in the Post headlines, and 70
percent more in the leads. Indeed, the overall tone of the leads under
this dimension was significantly higher for the Post, accompanied with
greater variance (the tendency to lean toward the extremes) as well. This
falls comfortably under the popular ideas of what constitutes sensational
reporting. It appears from these data that the increased use of words that
are more concrete, or easier to imagine, helps produce what might
"feel" like sensationalism. It should be reiterated, however, that this
spike in imagery still does not suggest a direction of pleasantness, nor
does it carry any necessarily negative connotations. This simply suggests
that using higher imagery words – positive or negative – contributes to what
is popularly known as sensationalism. Does this imply that editors should
advise cub reporters to stay away from high imagery words, lest they be
accused of sensationalizing? Perhaps that question is best left as
rhetorical. But consider this: How might the stodgy stalwart journalists
of just a generation or two ago respond to the brassy new journalism of
today? Would they consider the leads we see in contemporary news coverage
to be sensational? Again, we'll choose to leave this question as rhetorical.
The extreme words that are outlined in the DAL are categorized as pleasant,
unpleasant, active, passive, high imagery, low imagery, nasty, sad, nice,
and cheerful. This research has found that the New York Post uses these
words more often than does the New York Times. A quick review
of the results suggests that while the Times used significantly more sad
words in their headlines than did the Post, the Post used significantly
more pleasant and high imagery words in their headlines. The Post also
used significantly more pleasant, unpleasant, high imagery, nasty, and nice
words in their leads while the Times did not use any of these types of
words significantly more than the Post.
With the principal goal of this paper being to apply Whissell's dictionary
as a methodology to media studies, particularly as it might explicate
research on sensationalism, we wondered whether two arguably disparate
publications would manifest differences in affective tonal value. If a
study could demonstrate this trend, it would offer some support for the
methodological value of Whissell's Dictionary of Affect in Language. We
believe this did, and does.
Being able to measure and compare affective tonal values marks a keen
methodological advancement, and this research gives some credence to the
idea that an affective tone can be identified and measured within a text
and validly compared with that of other bodies of
text. Part of the reason for undertaking this project was to test the
efficacy of using Whissell's DAL to measure the affective elements of a
news story. Whissell's dictionary represents years of research, numerous
hours of test administration and coding, and a host of studies in search of
cross-validation and internal methodological rigor. These efforts have
produced a list nearly 10,000 words long, each with a score for three
dimensions – pleasantness, activation, and imagery – and represent the
continuation of the work started by Osgood and others a half century
ago. There is corroborating evidence that seems to validate the
instrument; it was repeatedly able to replicate by computer results that
were previously found critically or by hand. Stylistic matters relating to
tone and to word and sentence length can distinguish one author from
another. Differences in advertising copy were found using the DAL that
made good sense intuitively (that advertising aimed at men tends to be more
arousing and less pleasant).
However, one drawback to Whissell's work is perhaps the lack of attention
it has received outside its own niche; few if any scholars outside
Whissell's group have tested the efficacy of the DAL. Certainly it is
beneficial to have scholars from other areas and backgrounds such as
psychology and sociology investigate these new measures and techniques
independently. This study is one attempt to begin the process of
assimilation of this work from one discipline into another.
 Z. Pan and G. M. Kosicki, "Framing analysis: An approach to news
discourse," Political Communication, 10 (1993): 70.
 K. Sweeney and C. Whissell, "A dictionary of affect in language: I.
Establishment and preliminary validation," Perceptual and Motor Skills, 59
(1984): 695-698. C. Whissell, "Pleasure and activation revisited:
Dimensions underlying semantic responses to fifty randomly selected
'emotional' words," Perceptual and Motor Skills, 53 (1981): 871-874. C.
Whissell and K. Charuk, "A dictionary of affect in language: II. Word
inclusion and additional validation," Perceptual and Motor Skills, 61
(1985): 65-66. David Heise also constructed earlier profiles of 1,000
common English words in D.R. Heise, "Semantic differential profiles for
1000 most frequent English words," Psychological Monographs: General and
Applied, 79:8 (1965): 1-31.
 C. E. Osgood, G. J. Suci, and P. H. Tannenbaum, Measurement of Meaning
(Urbana: University of Illinois Press, 1957).
 K. Sweeney and C. M. Whissell, (1984): 696.
 C. Whissell, (1996).
 K. W. Mossholder, R. P. Settoon, S. G. Harris, and A. Armenakis,
"Measuring emotion in open-ended survey responses: an application of
textual data analysis," Journal of Management, 21:2 (Summer, 1995) 335-355.
 Stephanie L. Dubois, "Gender differences in the emotional tone of
written sexual fantasies." The Canadian Journal of Sexuality, 6:4 (Winter,
 C. Whissell and L. McCall (1997): 365
 L. Rovinelli and C. Whissell, "Emotion and style in 30-second
television advertisements targeted at men, women, boys, and girls,"
Perceptual and Motor Skills, 86:3 (1998) 1048-50.
 M. Fournier, M. Dewson, and C. Whissell, "The Dictionary of Affect in
Language: VI. 'Sensationalism' defined in terms of affective tone,"
Perceptual and Motor Skills, 63 (1986): 1073-1074.
 Ibid., 1074. This study was conducted without the benefit of the
third dimension – IMAGERY – which was later incorporated into the Dictionary.
 InfoTrac searched April 1, 2003. These groupings (followed by number
of "hits") emerged: Science/nature (7), criminal/law (7), medicine (4),
sociology/political science (10), anthropology/cultural studies (6),
theatre/film (5), literature (5), business (3), history (2), and
 M. E. Grabe, S. Zhou, and B. Barnett, "Explicating sensationalism in
television news: content and the bells and whistles of form," Journal of
Broadcasting & Electronic Media, 45:4 (Fall, 2001) 635-55.
 Prominent journalism historian Mitchell Stephens wrote an essay on
sensationalism, specifically featuring Rupert Murdoch and the New York Post
("Viewpoints: About Rupert Murdoch," Newsday, April 2, 1993, p. 52); Harry
Stein specifically contrasted the Times and the Post, calling the latter
"subtle as a Vegas floor show" when it comes to tabloid journalism ("New
York's Tabloid Treasure," Urbanities, 11:1 (2001).
 The software, simply called Whissell's Dictionary of Affect in
Language, was developed by Dr. Paul Duhamel, a former graduate student
under Dr. Whissell. Copyright 1998-2002 Human Development Consulting.