Using Lexis/Nexis and Other Databases for Content Analysis:
Opportunities and Risks
James W. Tankard, Jr., Laura Hendrickson, and Dong-Guen Lee
Department of Journalism
The University of Texas At Austin
Austin, Texas 78712
Recently available computer information services such as
Lexis/Nexis provide new opportunities for the content analysis
researcher, opportunities that can be accompanied by risks.
This paper explores the potential of computer databases for
research, discusses techniques for carrying out such studies, and
describes problems and unresolved questions that accompany this
research approach. The authors recommend empirical investigation
of the reliability and validity of Lexis/Nexis samples and
predict discovery of additional innovative techniques in
database-based content analysis.
Recently available computer information services such as
Lexis/Nexis provide new and significant opportunities for the
content analysis researcher. But those opportunities can be
accompanied by risks. The growing use of these databases for
mass communication research suggests the importance of examining
both their weaknesses and their strengths.
Because of these services' capacity to search full-text
stories for key words, numerous articles on a specific topic,
name, or slogan that previously could have been found only through
months of library searching are now available within minutes. In
addition, Lexis/Nexis can provide counts of articles gathered
using various search terms. In certain circumstances these
counts can themselves be used as data, bypassing the need for
tedious and error-prone coding by human coders.
This paper will explore the potential of Lexis/Nexis and
related information services for content analysis research,
discuss some of the techniques and methods for carrying out such
content analysis studies, and describe some of the problems and
unresolved questions that accompany this research approach.
Past Studies Using Databases
Several completed studies have already used the research
approach of content analysis with a sample of material from an
information service (computerized database).
Miller and Stebenne used the Nexis database to examine the
amount of press coverage given to the Bush campaign and the
Dukakis campaign in the 1988 presidential election.(1) They took
advantage of the capability of the database for determining the
number of times key words, phrases, and proper nouns have
appeared in news stories. They found that the Bush campaign
received more press attention than the Dukakis campaign. They
also looked at the number of mentions various campaign slogans
had received, and found that "card-carrying member of the ACLU,"
a label that Bush applied to Dukakis, occurred the most. They
also were able to show through the database that Ann Richards'
description of Bush as "born with a silver foot in his mouth" did
not originate with Richards. In similar studies of the 1992
presidential election, Miller and Stebenne(2) and Miller and
Pavlik(3) used the Nexis database to look at numbers of mentions
of campaign managers for the various candidates, media opinion
leaders, and names and phrases from the campaign trail.
Bishop used a database called DataTimes to look at a sample
of newspaper articles for mentions of the terms public relations,
press relations, public information, government information, or
press officer.(4) He looked at three daily newspapers for a
period of one month, examining more than 16,000 stories.
Swisher and Reese used the VU/TEXT newspaper database to
study daily press coverage of smoking and health.(5) Part of
their research involved determining the numbers of mentions in
various newspapers of such phrases as "Great American Smokeout,"
"Great American Welcome," and "Tobacco Institute."
Content Analysis as a Method
This paper deals primarily with the application of
computerized databases to quantitative content analysis. There
are obviously many opportunities for qualitative research being
opened up by databases, but they will not be dealt with in this
paper.
Content analysis is used here in essentially the sense in
which Bernard Berelson used it some years ago: "Content analysis
is a research technique for the objective, systematic, and
quantitative description of the manifest content of
communication."(6)
As Wimmer and Dominick have pointed out, content analysis
can be used for the following purposes:
1. Describing communication content.
2. Testing a hypothesis about message characteristics.
3. Comparing media content to the real world.
4. Assessing images in the mass media of particular groups.
5. Establishing a starting point for media effects.(7)
It also should be acknowledged that Lexis/Nexis was not set
up for content analysis purposes, so not all the information that
the content analysis researcher might want is readily available.
For instance, the typical Lexis/Nexis user, unlike the content
analysis researcher, probably isn't worried about exactly what
material is being searched. He or she just wants an answer to a
specific, often factual, question. But for a content analysis,
the researcher may need a better definition of what material is
being searched.
Using the Database to Count Articles
One approach to using Lexis/Nexis or some other database for
content analysis allows the researcher to take advantage of the
ability of these services to count stories in various categories.
With this approach, the researcher need not ever look at
individual stories themselves or download the text. As a
standard procedure at the end of each search, Lexis/Nexis reports
that it has found a specific number of stories.
One way to apply this technique is to use counts of numbers
of stories to do trend graphs. For instance, using Lexis/Nexis,
Tankard and Sumpter demonstrated with such a trend graph the
increase in the use of the term "spin doctor" by the media.
Their sample included 13 stories using the term in 1986, 131
stories using the term in 1988, then finally 1,553 stories using
the term in 1992.(8)
Figure 1,(9) in what may reflect a similar kind of
self-examination by the media, shows the growth in the use of the
term "sound bite" from 1979 to 1992.
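The trend-graph idea above can be sketched in a few lines of code. The sketch below is our own illustration, not part of the original study; only the "spin doctor" story counts (13, 131, and 1,553) come from the text, and the chart-drawing function is a hypothetical helper.

```python
# Sketch: turn per-year Lexis/Nexis story counts into a simple text trend chart.
# The counts for "spin doctor" are those reported by Tankard and Sumpter.
counts = {1986: 13, 1988: 131, 1992: 1553}

def trend_chart(counts, width=40):
    """Render year-by-year story counts as a scaled horizontal bar chart."""
    peak = max(counts.values())
    lines = []
    for year in sorted(counts):
        n = counts[year]
        # Scale each bar to the peak year; show at least one mark per year.
        bar = "#" * max(1, round(width * n / peak))
        lines.append(f"{year}  {n:>5}  {bar}")
    return "\n".join(lines)

print(trend_chart(counts))
```

The same counts could of course be handed to any plotting package to produce a figure like Figure 1.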
Another innovative application is to use counts of numbers
of stories to do crosstabulations. Table 1 presents a
crosstabulation showing use of the word "community" in two
newspapers, The Chicago Tribune and The Washington Post, during
the 1992 presidential campaign. This use of the term "community"
may reflect greater attention to the concept, as was called for
by Bellah, Madsen, Sullivan, Swidler and Tipton.(10) These
authors express concern that in America "individualism may have
grown cancerous" to the point that it actually could be a threat
to freedom. Table 1 may be showing that the Post was more in
tune than the Tribune with this "communitarian" approach,
sometimes identified with Bill Clinton, during the 1992 campaign.
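A crosstab like Table 1 needs only two counts per paper: the number of campaign stories and the number also mentioning "community." The sketch below is our own; the 50,569-story total for the Post comes from this paper, but the other counts are hypothetical stand-ins chosen to reproduce the reported percentages.

```python
# Sketch: build "mentions vs. does not mention" percentages from two pairs of
# Lexis/Nexis story counts, as in Table 1. In the study the counts came from
# searches on "(Clinton or Bush)" and "(Clinton or Bush) and community".
def crosstab(total, mentioning):
    """Return (percent mentioning, percent not mentioning) for one paper."""
    pct = 100.0 * mentioning / total
    return round(pct), round(100 - pct)

tribune = crosstab(total=30000, mentioning=2700)   # hypothetical counts
post = crosstab(total=50569, mentioning=7585)      # total from text; mentions hypothetical

print("                 Tribune  Post")
print(f"Mentioned          {tribune[0]}%     {post[0]}%")
print(f"Did not mention    {tribune[1]}     {post[1]}")
```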
In some studies, story counts from Lexis/Nexis can also be
used as a dependent variable to measure the amount of media
coverage of an issue, event, organization, or person. For
instance, in a study of interest groups, one of the authors
wanted to measure the amount of news coverage that various
interest groups received, and used the number of stories that
mentioned them as a measure of amount of coverage.(11)
When you are doing a search that identifies more than 1,000
articles, Lexis/Nexis stops and gives you the opportunity to
terminate the search. For the purpose of getting these counts, it
is necessary to go ahead and request that it continue the search.
In these kinds of analyses we are essentially using the
word-recognition capabilities of Lexis/Nexis and its full-text
search to sort stories by content, and then using those counts as
our data.
Using the Database for Conventional Content Analysis
Another approach to using Lexis/Nexis (or similar databases)
is for obtaining a sample for standard content analysis.
Lexis/Nexis can be used to identify and obtain a sample of
articles, and a researcher can turn to conventional content
analysis procedures, using human coders to do the coding. This
is a relatively straightforward procedure in which we take
advantage of the power of the computer and its ability to store
full texts of articles to draw a sample.
If we are using the Nexis information service, this kind of
"standard" content analysis might involve the following steps:
1. Choose an appropriate file.
2. Choose key words and a search strategy.
3. Draw a random sample.
4. Code with human coders.
Let's look at these steps in more detail:
1. Choose an appropriate file, one with mass media content.
Some possibilities from the Nexis library are the following:
CURRNT: full-text articles from 650 publications (recent material).
ARCHIV: full-text articles from 650 publications (earlier material).
PAPERS: full-text articles from some 300 newspapers.
MAJPAP: full-text articles from 17 major newspapers.
PR NEWSWIRE: news releases.
The contents of files are described in a searchable file called
GUIDE.
2. Choose key words to identify the content being sought.
Then choose a search strategy. For instance, in a study dealing
with coverage of AIDS, some of the basic search terms might be
"AIDS" and "HIV." The search strategy will determine how
frequently and where the key words will appear in the story.
In this example, the researcher may want stories dealing
centrally with AIDS, not stories with casual or one-time
mentions. Thus, he or she might want to use a variety of search
commands to find stories dealing more centrally with AIDS.
For instance, the following command would find stories that
mentioned AIDS within 25 words of another mention of AIDS:
AIDS w/25 AIDS
The following command would find stories that mentioned AIDS
in the first paragraph:
lead (AIDS)
The following command would find stories that mentioned AIDS
in the headline:
headline (AIDS)
The following command would find stories that mentioned AIDS
in their abstracts (although not all stories have abstracts):
abstract (AIDS)
Also, you can use OR as a connector to extend the search to
AIDS or the related term HIV:
AIDS or HIV
And you can combine some of these approaches:
(AIDS or HIV) w/25 (AIDS or HIV)
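A researcher composing many such strings might wrap the patterns above in small helper functions. The sketch below is our own Python illustration; the helper names are hypothetical, and only the query syntax itself (OR, w/N, headline, lead, abstract) comes from Nexis as described in this paper.

```python
# Sketch: tiny helpers for composing Nexis-style search strings.
def proximity(term, window, times=2):
    """Require `term` to occur `times` times within `window` words: term w/N term."""
    return f" w/{window} ".join([term] * times)

def any_of(*terms):
    """Join alternative terms with OR, parenthesized: (a or b)."""
    return "(" + " or ".join(terms) + ")"

def in_segment(segment, term):
    """Restrict a term to a document segment, e.g. headline (AIDS)."""
    return f"{segment} ({term})"

q = proximity(any_of("AIDS", "HIV"), 25)
print(q)  # (AIDS or HIV) w/25 (AIDS or HIV)
```

Writing the strings this way also makes the operational definition easy to report verbatim, since the same string that ran the search can be printed into the paper.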
In the study of coverage of interest groups mentioned above,
an author was attempting to identify stories dealing with
specific interest groups and didn't want to miss stories. The
researcher developed the following search command, so that
Lexis/Nexis would identify the key word wherever it appeared (in
the story, in the headline, in the lead, or in the abstract):
(organization name w/25 organization name) OR headline
(organization name) OR lead (organization name) OR abstract
(organization name)
This search strategy was entered for 32 different
organization names and identified a number of stories for each
one, ranging from 2,064 for Exxon Corp. to 18 for the National
Committee to Preserve Social Security.
In a study of child abuse, one researcher wanted to limit
the identified stories to a manageable number, and still wanted
the stories to have child abuse as a central theme.(12) Thus, a
variety of key words were chosen to make sure no stories were
missed, and the search commands were designed to catch stories
that contained a key word at least three times: twice within 10
words of each other (roughly within the same sentence) and a
third time within 30 words (in a succeeding sentence or
paragraph). This requirement eliminated stories that made only
casual references to the subject. The strategy was used with
seven different key words, and the lists of identified article
citations were combined to draw the random sample:
child abuse w/10 child abuse w/30 child abuse
child maltreatment w/10 child maltreatment w/30 child maltreatment
emotional abuse w/10 emotional abuse w/30 emotional abuse
sexual abuse w/10 sexual abuse w/30 sexual abuse and child
molestation w/10 molestation w/30 molestation and child
incest w/10 incest w/30 incest and child
child neglect w/10 child neglect w/30 child neglect
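Combining the citation lists from several key-word searches raises a practical detail: a story can match more than one key word and must not enter the sample twice. The sketch below is our own illustration of the merge-and-sample step; the citation IDs are hypothetical stand-ins.

```python
import random

# Sketch: merge the citation lists returned by several key-word searches,
# drop duplicates, and draw a simple random sample without replacement.
def combine_and_sample(citation_lists, sample_size, seed=None):
    """Merge lists of citation IDs, de-duplicate, and sample at random."""
    pool = sorted(set().union(*citation_lists))  # sorted for reproducibility
    rng = random.Random(seed)
    return rng.sample(pool, min(sample_size, len(pool)))

lists = [
    ["doc014", "doc203", "doc377"],   # hits for "child abuse ..."
    ["doc203", "doc514"],             # hits for "child maltreatment ..."
    ["doc377", "doc622", "doc815"],   # hits for "sexual abuse ... and child"
]
sample = combine_and_sample(lists, sample_size=3, seed=42)
print(sample)
```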
A number of search strategies dealing with the term "sound
bite" are presented in Table 2. From this table it is apparent
that different search strategies could lead to quite different
groups of stories available for further study through content
analysis. This point needs further empirical investigation: do
these samples of varying sizes represent universes of varying
content, or not?
These search commands become, in effect, the operational
definitions of the kind of content being examined. A researcher
needs to present these search commands in the research report and
defend them; he or she should have a good answer to the question
of why this particular search string was chosen.
3. The searches may find more articles than it is possible
to code. In that case, the researcher may wish to examine a
random sample of the articles found. One way to do this is to
download the list of citations, print it out, and draw a random
sample from that list. The sample could be a systematic sample
or a simple random sample. Another way is to get the number of
stories that appear with a given search command, and then
randomly select a sample of articles to be analyzed from that
number. Each story is numbered, and Nexis has a command to go
directly to a story number after a search is conducted, so it
becomes easy to pull out the stories in a particular sample.
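Both sampling styles mentioned above can be sketched directly on story numbers. This is our own illustration; the total of 2,064 stories is borrowed from the Exxon Corp. search reported earlier, and the sample size is arbitrary.

```python
import random

# Sketch: when a search reports N numbered stories, draw story numbers to
# retrieve with the Nexis go-to-story-number command.
def simple_random_story_numbers(n_stories, sample_size, seed=None):
    """Simple random sample of story numbers, without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, n_stories + 1), sample_size))

def systematic_story_numbers(n_stories, sample_size, seed=None):
    """Systematic sample: a random start, then every k-th story."""
    rng = random.Random(seed)
    interval = n_stories // sample_size
    start = rng.randrange(1, interval + 1)  # random start within first interval
    return [start + i * interval for i in range(sample_size)]

print(simple_random_story_numbers(2064, 5, seed=1))
print(systematic_story_numbers(2064, 5, seed=1))
```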
4. Then, once the researcher has a sample, he or she can
either download the articles and do the coding with hard copies
or do the coding from the computer screen if it is not desirable
to print out a large number of files. The researcher will at
this point need to be concerned about the basic matters of a
standard content analysis: definitions of categories, coder
reliability, and so forth.
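The coder-reliability check mentioned above is itself easy to compute once two coders have judged the same stories. The sketch below is our own illustration using two standard indices, simple percent agreement and Scott's pi; the category labels and judgments are hypothetical.

```python
from collections import Counter

# Sketch: intercoder reliability for two coders' judgments on the same stories.
def percent_agreement(coder_a, coder_b):
    """Holsti-style proportion of stories on which the two coders agree."""
    agree = sum(a == b for a, b in zip(coder_a, coder_b))
    return agree / len(coder_a)

def scotts_pi(coder_a, coder_b):
    """Scott's pi: agreement corrected for chance, using joint category proportions."""
    po = percent_agreement(coder_a, coder_b)
    joint = Counter(coder_a) + Counter(coder_b)
    n = len(coder_a) + len(coder_b)
    pe = sum((c / n) ** 2 for c in joint.values())  # expected chance agreement
    return (po - pe) / (1 - pe)

a = ["central", "central", "casual", "central", "casual"]
b = ["central", "casual",  "casual", "central", "casual"]
print(round(percent_agreement(a, b), 2))  # 0.8
print(round(scotts_pi(a, b), 2))          # 0.6
```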
Advantages of Database-Based Content Analysis
Content analysis using computerized databases has the
following advantages:
1. It makes it easy to get larger, and we hope more
representative, samples of mass media content than traditional
content analysis sampling methods allowed.
2. It processes large amounts of data very quickly. In
creating the crosstabulation on "community," we conducted a
search of The Washington Post portion of the Campaign library for
uses of the word "community" that looked at 50,569 stories, and
it did so in about two minutes.
3. It is particularly useful for searching for rare kinds
of media content. For instance, suppose a researcher were trying
to find stories on eating disorders. Looking through newspapers
with conventional content analysis techniques, he or she might
find that only one in 1,000 stories dealt with eating disorders.
With a computerized database such as Lexis/Nexis, the right key
words lead quickly to a sample of those stories.
Disadvantages of Database-Based Content Analysis
Content analysis using computerized databases also has
disadvantages:
1. It is not clear what universe we are studying. We may
start with the premise that anything is better than one more
master's thesis or dissertation doing a content analysis based
solely on The New York Times. But is sampling from a database
really much better if we don't know what content the material in
the database is representing? Even if you look into GUIDE, it is
difficult to see just what is in these files. The time periods
covered are typically not listed, for instance, sources are added
to or deleted from files from time to time, and the system isn't
set up to make it easy for you to know.
At this point, we offer several points to help deal with the
issue of the representativeness of Lexis/Nexis samples.
a. With some files, such as MAJPAP, it is somewhat clear
what the sample represents. A sample taken from 17 major daily
newspapers would be a useful sample for many studies.
b. An examination of the samples used in many content
analysis studies suggests that they are often quite limited. An
examination of past issues of Journalism Quarterly would probably
turn up only a few national probability samples. Lowry has
faulted many communication studies, including content analysis
studies, for not clearly stating what kind of sample was used or
specifying the population or intended universe.(13)
c. The representativeness of Lexis/Nexis samples could be
empirically examined. For instance, one approach would be to
compare results of an examination of a Nexis sample on several
simple descriptive variables with some existing study using a
national probability sample.
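Such a comparison could be as simple as a two-proportion z test on each descriptive variable. The sketch below is our own illustration of that check; all of the counts in it are hypothetical.

```python
import math

# Sketch: compare a descriptive proportion from a Nexis sample against the
# same proportion from an existing national probability sample.
def two_proportion_z(hits1, n1, hits2, n2):
    """Two-proportion z statistic with a pooled standard error."""
    p1, p2 = hits1 / n1, hits2 / n2
    p = (hits1 + hits2) / (n1 + n2)          # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical: 120 of 400 Nexis-sampled stories vs. 90 of 300 benchmark
# stories show the attribute; a z near 0 suggests the samples agree.
z = two_proportion_z(hits1=120, n1=400, hits2=90, n2=300)
print(round(z, 3))
```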
2. The search procedures will sometimes identify articles
that are not really dealing with the content of interest. For
instance, a researcher interested in the communication concept of
"spin control" will also be likely to find some articles dealing
with airplane propellers. Even the use of the "w/25" technique
or other search strategies may not eliminate this problem.
3. The material stored in the typical database omits many
of the nonverbal signals that are involved in mass communication:
the photographs, the pull quotes, the headline type style and
size, the position on the page. Nexis, however, does include
information about statistics, tables, charts, and photographs in
a GRAPHIC segment at the end of many articles.(14) It probably
will be necessary to go to the original publication to see the
illustrative material, however.
4. Database searches can be expensive. Nexis charges $30 a
search for commercial uses. Database services are sometimes
available through universities at no charge or lower fees.
Conclusions
Lexis/Nexis offers some tremendous opportunities for the
content analysis researcher. Among its strong advantages are
fast access to a large universe of information; availability of a
wide selection of mass media material, including newspapers,
magazines, and some television programs; and some powerful tools
for probing that universe, including full-text search by key
words. This paper has presented some innovative techniques for
conducting database-based content analysis studies, including the
use of story counts for bibliometric analyses such as trend
charts and cross tabulations. Researchers should begin to
investigate empirically the reliability and validity of
Lexis/Nexis samples. As researchers use this tool more, they
will undoubtedly discover additional innovative techniques that
will make database-based content analysis even more useful.
Notes
1. Tim Miller and David Stebenne, "The Bibliometrics of
Politics," Gannett Center Journal 2, No. 4 (Fall 1988): 24-30.
2. Tim Miller and David Stebenne, "Campaign Coverage by the
Numbers," in An Uncertain Season: Reporting in the Postprimary
Period. (New York: The Freedom Forum Media Studies Center,
1992), pp. 34-39.
3. Tim Miller and John Pavlik, "Campaign Coverage by the
Numbers," in The Finish Line: Covering the Campaign's Final Days.
(New York: The Freedom Forum Media Studies Center, 1993), pp.
4. Robert L. Bishop, "What Newspapers Say About Public
Relations," Public Relations Review 14 (Summer 1988): 50-51.
5. C. Kevin Swisher and Stephen D. Reese, "The Smoking and Health
Issue in Newspapers: Influence of Regional Economics, the Tobacco
Institute, and News Objectivity," Journalism Quarterly 69, No. 4
(Winter 1992): 987-1000.
6. Bernard Berelson, Content Analysis in Communication Research
(New York: The Free Press, 1952), p. 18.
7. Roger D. Wimmer and Joseph R. Dominick, Mass Media Research:
An Introduction, 3rd ed. (Belmont, Calif.: Wadsworth, 1991), pp.
8. James W. Tankard, Jr., and Randy Sumpter, "Media Awareness of
Media Manipulation: The Use of the Term 'Spin Doctor,'" paper
presented to the Mass Communication and Society Division,
Association for Education in Journalism and Mass Communication,
Kansas City, Mo., August 1993.
9. Figures and tables may be obtained by writing the authors.
10. Robert N. Bellah, Richard Madsen, William M. Sullivan, Ann
Swidler, and Steven M. Tipton, Habits of the Heart (Berkeley:
University of California Press, 1985), p. vii.
11. Dong-Geun Lee, "Press Coverage of Interest Groups: News
Values as Determinants," unpublished dissertation, The
University of Texas at Austin, 1993.
12. Laura Jean Hendrickson, "Media Framing of Child Maltreatment:
Conceptualizing Framing as a Continuous Variable," unpublished
dissertation, The University of Texas at Austin, 1994.
13. Dennis T. Lowry, "Population Validity of Communication
Research: Sampling the Samples," Journalism Quarterly 56, No. 1
(Spring 1979): 62-68, 76.
14. Nancy F. Hardy, "Finding Statistics, Tables, Charts and
Pictures Using Nexis," in National Online Meeting: Proceedings of
the Ninth National Online Meeting, New York, May 10-12, 1988, pp.
Table 1
Mention of the Word "Community" in Articles in the Chicago
Tribune and the Washington Post during the 1992 Presidential
Campaign

                            Tribune    Post
Mentioned community            9%       15%
Did not mention community      91       85
Story counts taken from searches in the CMPGN library. The
CHTRIB file was searched first with "(Clinton or Bush)" to get
the number of campaign stories and then the search was modified
to "(Clinton or Bush) and community" to get the number of
campaign stories mentioning community. Then the WPOST file was
searched in the same way.
Table 2
Number of Stories Mentioning the Phrase "Sound Bite"
Found in the MAJPAP File Using Various Search Strategies

Search Strategy                      Number of Stories
sound bite 6,039
lead (sound bite) 903
headline (sound bite) 202
abstract (sound bite) 3
sound bite w/25 sound bite 347
sound bite w/24 sound bite 344
sound bite w/23 sound bite 342
sound bite w/22 sound bite 337
sound bite w/21 sound bite 329
sound bite w/20 sound bite 326
sound bite w/19 sound bite 315
sound bite w/18 sound bite 303
sound bite w/17 sound bite 293
sound bite w/16 sound bite 276
sound bite w/15 sound bite 250
sound bite w/14 sound bite 208
sound bite w/13 sound bite 177
sound bite w/12 sound bite 152
sound bite w/11 sound bite 135
sound bite w/10 sound bite 125
sound bite w/9 sound bite 116
sound bite w/8 sound bite 104
sound bite w/7 sound bite 91
sound bite w/6 sound bite 74
sound bite w/5 sound bite 69
sound bite w/4 sound bite 56
sound bite w/3 sound bite 38
sound bite w/2 sound bite 30
sound bite w/1 sound bite 5