AEJMC Archives

Subject: AEJ 03 RoyalC CTM Comparison of Computerized and Traditional Content Analysis Techniques
From: Elliott Parker <[log in to unmask]>
Reply-To:AEJMC Conference Papers <[log in to unmask]>
Date:Sun, 21 Sep 2003 19:37:52 -0400
Content-Type:text/plain

Comparison of Computerized and Traditional
Content Analysis Techniques:
A Case Study of the Texas Democratic Gubernatorial Primary

Cindy Royal
Doctoral Student
The University of Texas at Austin
School of Journalism

Submitted to the Association for Education in Journalism and Mass Communication
Communication Theory and Methodology Division
[log in to unmask]
March 20, 2003

4502 Avenue C
Austin, TX 78751
512-407-8930





 Abstract:

In Spring 2002, a graduate seminar at a large southwestern university
embarked on a project to analyze the images of political candidates in the
Texas Democratic Gubernatorial Primary as found in the Austin
American-Statesman.  While part of the class used traditional coding
techniques, other students utilized the computerized content analysis tool
VBPro to analyze the same series of data.  This provided a unique
opportunity to compare and contrast strategies and results.






 Comparison of Computerized and Traditional Content Analysis Techniques:
A Case Study of the Texas Democratic Gubernatorial Primary

Introduction

Over time, computers have become an important tool in performing
academic research.  From analyzing statistical data to searching the Web
for sources, and from utilizing online databases and subscription services
to providing the site at which research occurs, technology can both
assist and challenge researchers.  In the area of content analysis, in
which texts are analyzed for themes or frames, computer programs are
available that can aid a researcher in identifying the frequency and
co-occurrence of terms and in coding large texts.
In Spring 2002, a graduate seminar at a large southwestern university
embarked on a project to analyze the images of political candidates in the
Texas Democratic Gubernatorial Primary as found in the Austin
American-Statesman.  Another class was tasked to survey voters' impressions
of the candidates.  Data from these two sources was then made available to
students to perform analyses of content and/or agenda setting effects on
selected attributes.
The majority of the class utilized traditional content analysis methods to
code the material across six major categories: party affiliation, specific
issue positions, personal qualifications and character, biographical
information, campaign conduct, and support and endorsements.  The detailed
process is described in the Methods section.
Two students volunteered to analyze the same data set by using a
computerized content analysis tool, VBPro, developed by Dr. Mark Miller of
the University of Tennessee.[1]  This program has features that can process
large amounts of text for frequency and co-occurrence of terms and provide
coded output that can be exported into a spreadsheet or statistical
analysis package like SPSS.
This paper will outline the methods used by the two groups and compare and
contrast the results.  The research questions under study are:
1.      Did the traditional and computerized approaches yield similar results in
regard to personal qualifications and character attributes?
2.      For selected issue and biographical attributes (specifically Debate,
Race, and Wealth), did the two approaches yield similar trends?
3.      In what ways did the results differ, and what impact did the defined
approaches have on these differences?
4.      What are the benefits and limitations of both methods?

 Description of Primary
In March 2002, Texans voted in the Democratic Gubernatorial Primary, which
pitted Laredo businessman Tony Sanchez against former Texas Attorney
General Dan Morales.  The presence of two Hispanic candidates made for
interesting campaign dynamics, which included lengthy negotiations
regarding holding a debate in Spanish. Hispanics are now the largest single
ethnic group in four of Texas' five biggest cities: Houston, Dallas,
San Antonio and El Paso. One in three Texans identifies as
Hispanic, thus highlighting the importance of this landmark race.[2]  News
coverage of the candidates in the Austin American-Statesman was identified
as the source of data covering the period January 1st through the day
before the primary, March 11th. Tony Sanchez was the ultimate winner of the
primary and will face the Republican incumbent Rick Perry in the
Gubernatorial election in November.  While Perry will be mentioned for
specific points of comparison, the majority of the analysis of this paper
focuses on the Democratic candidates, Sanchez and Morales.

 Literature Review
The area of literature that most applies to this study is that of the
analysis of the contents of texts. Usage of content analysis in
communications research has a long history. In 1952, Berelson offered this
definition: "(C)ontent analysis is a research technique for the objective,
systematic, and quantitative description of the manifest content of
communication."[3]  The focus on manifest content indicates that the
definition is most concerned with actual meaning rather than implied or
connoted.  Holsti, in 1969, defined content analysis as "any technique for
making inferences by objectively and systematically identifying specified
characteristics of the messages."[4] The emphasis here is on the systematic
nature of the process.  Krippendorff (1980) further highlights this
characteristic: "(c)ontent analysis is a research technique for making
replicative and valid inferences from data to their context."[5] This view
places the emphasis further on the text as "data" and the importance of
validity and reliability in the research. Finally, in 1998, Riffe, Lacy,
and Fico elaborated on the process of content analysis by describing it as
"the systematic assignment of communication content to categories according
to rules, and the analysis of the relationships involving those categories
using statistical methods."[6]
Looking at these definitions, a laundry list of desirable characteristics
of content analysis is developed:
•       Focus on manifest content rather than implied meaning
•       Objective
•       Quantitative
•       Systematic
•       Inferential
•       Replicable and Valid
•       Analysis of text as data
•       Assignment of content to categories
•       Usage of statistical methods to analyze relationships

Content analysis has been applied in a variety of areas of communication
research.  Framing is one such area.  Framing as defined by Entman is
selecting "some aspects of a perceived reality" to enhance their salience
"in such a way as to promote a particular problem definition, causal
interpretation, moral evaluation and/or treatment
recommendation."[7]  Frames can be used as a strategy by humans to help
with processing vast amounts of information, a process of selection and
prioritization, or as Goffman relates, frames help audiences "locate,
perceive, identify, and label" the flow of information around them.[8]  But
when used by media workers in setting the context of a story, framing can
serve to promote certain values and discourage others.
Beyond framing, the concept of agenda setting looks to identify the
relationship between public opinion and media frames.  Agenda setting is a
hypothesis that the degree of emphasis placed on an issue in the news
influences the public's prioritization of such issues.[9]  In the first
level of agenda setting, the objects under study can be issues, political
candidates, institutions, or ideas that are being presented in a certain
light or priority.  Beyond looking at objects, the second level of agenda
setting deals with "the transmission of attribute salience," or the
characteristics of the objects.[10]  This reinforces Cohen's idea that the
media not only tell us what to think about, but how to think about it.[11]
Content analysis researchers often rely on theories of linguistics to
verify their results.  According to Wittgenstein, "the meaning of a word
lies in how it is used in the language, in how it is applied."[12]  The
selection of terms, frequency of their usage, and the proximity of
surrounding context have inherent meaning in texts.[13]
Content analysis and agenda setting techniques have often been applied to
media surrounding political campaigns and candidate images.  In 1981,
Weaver, Graber, McCombs, and Eyal studied the relationship between frames
present in the Chicago Tribune and Illinois voters' descriptions of the
candidates in the 1976 Presidential election.[14] In 1978, Becker and
McCombs found correlation between attributes found in Newsweek and
attributions given to candidates by New York Democrats.[15]  And in 1996,
McCombs et al. studied the agenda setting effects of attributes in the 1996
general election in Spain.[16]
Miller, Andsager and Reichart looked at the images of GOP Presidential
candidates in the 1996 election.[17]  This analysis differed from those
mentioned above due to the use of computerized content analysis
techniques.  The authors employed the software program VBPro, developed
by Miller, to compare candidate images in press releases and news
coverage.  The program was used not only to analyze frequencies of terms,
but also to create clusters of relevant terms in the manifest content to be
used as categories.
The use of computers for content analysis has grown more commonplace over
the past 20 years.  This includes the usage of computers to identify and
access content, to create content categories, as well as to analyze the
frequency and occurrence of terms in context.
As early as 1969, Holsti provided suggestions as to when computerized
content analysis was considered useful:
1.      When the unit of analysis is the symbol or word, and analysis concerns
number of times a word is used.
2.      When analysis is extremely complex, such as using a large amount of
text, a large number of categories, or the analysis relies on finding
co-occurrence of terms in context.
3.      When the analysis involves analyzing the data in multiple ways.
4.      When the data is of basic importance to a variety of disciplines and
might be used in multiple studies.

Holsti further warned of situations in which the usage of computers might
not be appropriate:
1.      When the research involves a single, expensive, specialized study.
2.      When the number of documents is large but the information is limited.
3.      When the research calls for measures of space or time.
4.      When thematic analysis is being used.[18]

While Holsti's general warnings are still important to consider before
using computerized techniques for any analysis, the propagation of computer
equipment and software has greatly reduced the price associated with
performing such studies (as noted in #1 and #2 above).
The question remains whether computerized content analysis can yield
results comparable to those of traditional coding methods.  In 1991, Nacos et al.
compared the same content analyzed by humans and computers on two data
sets, the Grenada invasion and the Three Mile Island incident.[19]  They
found correlations in one data set but not the other, thus offering
warnings as to the usage of computerized content analysis for issues of
topic complexity and the ability to categorize beyond programmed
rules.  This is similar to Holsti's warning above regarding using computers
for thematic analysis.
The following study draws on the above literature by applying an
understanding of the study of content analysis as it relates to framing and
agenda setting, but also the relevance of the meaning of language via
linguistics.  In order for computer-assisted methods to produce significant
results, the assumption must be made that frequency, selection, and
placement of words have meaning, and that meaning can be gathered via
computerized techniques.  This study has many of the attributes that Holsti
mentioned for successful application of computerized techniques,
specifically that the analysis involves usage and frequency of words,
consists of large amounts of data easily accessible in electronic format,
and that the data may be analyzed in multiple ways.  It does, however, hold
one characteristic that might challenge the usage of computerized
techniques in that the study involves the identification of what could be
considered "themes" or image attributes of candidates. The rest of this
paper will deal with comparing and contrasting the two methods to determine
if similar results were found.




 Methods
Traditional Method
As mentioned in the Description of Primary section, the data under study
included articles in the Austin American-Statesman from January 1, 2002 to
March 11, 2002, the day before the primary.  One research assistant was
assigned the task of the daily archiving of papers and selection of stories
about the primary.  These stories included mentions of both the Democratic
candidates in the primary as well as the Republican incumbent Rick
Perry.  Once these articles were selected, they were copied, bound and
distributed to the class.   The class was then instructed to read assigned
articles and identify descriptive paragraphs.  Descriptive paragraphs were
defined as paragraphs containing content that might influence a reader in
answering the question "If someone who had been away from Texas for a long
time asked you your opinion of any of the candidates, what would you say?"
The class divided the content so that at least two people were analyzing
the same paragraphs. After the descriptive paragraphs were identified and
marked by individual coders, coding pairs worked together to reach
consensus and make final descriptive paragraph identification.  These were
the paragraphs from which the content was to be analyzed.
A series of categories and attributes was derived from those identified in
the voter survey conducted by the other class.  These attributes were
tested on a sample of the data set.  The attributes and categories were
then adjusted for the appropriateness to the content.  (For example, at the
time of the survey, the "debate about the debate" was not an issue, but a
significant amount of content was dedicated to it in the text, thus
requiring the creation of new attributes to deal with this issue).  The
categories and attributes were finalized based on this analysis.
The text with descriptive paragraphs identified was redistributed to the
class so that coding of the final attributes could be performed.  Coders
were instructed to identify both the candidate and the relevant attribute
in the descriptive paragraph.  Paragraphs could have more than one
descriptive assertion. Pairs of coders worked individually first, then
together to reconcile any discrepancies.  The final frequencies per
attribute category (by candidate and paragraph) were submitted to SPSS for
further analysis.

Computerized Method
After the research assistant had manually selected the articles for the
study and the class had identified the descriptive paragraphs, two students
used Lexis-Nexis to obtain electronic versions of the same material. The
researchers could have chosen to use the search features of Lexis-Nexis to
identify the articles and other searching means to identify descriptive
paragraphs, but this strategy was chosen so that both groups were working
from roughly the same data set. Of the 108 articles in the class data set,
102 were found in the Lexis-Nexis database (reason for omission is
unknown).  Of the 1151 paragraphs under study, this omission of 24
paragraphs accounts for about 2% of the content, and it was decided that it
would not make a material difference in the resulting trends.   Given such a
small amount of omitted content, a researcher could have decided to
manually input the information that was not initially available in
electronic format.
The computerized content analysis tool selected was VBPro, developed by
Dr. Mark Miller of the University of Tennessee.  Miller hosts a Website at
http://excellent.com.utk.edu/~mmmiller/vbpro.html, in which one can find a
free download of the software, documentation, and articles written that
used VBPro for text analysis.  VBPro requires that texts be consolidated
into a single file with omissions highlighted in square brackets ([, ]).
Also, cases are to be identified with the "#" sign, and all other uses of
that character must be eliminated.  The Lexis-Nexis search yielded
individual files for each story that had to be consolidated into a single
file with the proper formatting.  Visual inspection of the file showed that
some names were similar to the candidates but were not the candidate under
study (e.g., mentions of Senatorial candidate Victor Morales were found in
the Morales file) and needed to be omitted. Once this file was prepared, it
was submitted to VBPro's Format feature to create a file from which the
bracketed omissions (dates, headers, footers, graphics, and non-descriptive
paragraphs) had been removed.  The formatted file could now be used by other
VBPro features.
The first analysis was to alphabetize and rank the overall file.  This
procedure yielded a term list showing frequency and percentage of
occurrence of all terms in the file.  From this list, a preliminary
codebook of terms was created by eliminating common words such as "the",
"and", etc. and words that were used fewer than eight times in the entire
file.  Additionally, the master file was submitted to the search function
to create individual files by candidate based on the presence of their name
in a paragraph.  VBPro allows the user to identify the context as case
(article or other defined case), paragraph, or sentence.  The paragraph
context was initially questioned due to the high frequency of both
candidates being named in the same paragraph (47% for Morales and 54% for
Sanchez), but was ultimately used to maintain broader contexts in which
attributes could be tested.  A sentence context still yielded high
percentages of candidates being named in the same context (<30% for each
candidate) and lost much of the contextual nature of the text. Other
suggested approaches were to identify articles manually by candidate and
eliminate any that were ambiguous or dealt with both candidates
equally.  This will be addressed further in the discussion.
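
The frequency ranking and the paragraph-context search described above can
be sketched in a few lines of Python.  This is only an illustration of the
general technique, not VBPro itself: the stop list, the function names, and
the three sample paragraphs are assumptions made for the example, and the
minimum count is lowered from eight to two so that the tiny sample yields
output.

```python
import re
from collections import Counter

# Hypothetical stop list; the common words actually dropped are not listed
# in the paper beyond "the" and "and".
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ranked_terms(paragraphs, min_count=8):
    """Return (term, count, percent) rows, most frequent first, dropping
    stop words and terms used fewer than min_count times."""
    counts = Counter(tok for p in paragraphs for tok in tokenize(p))
    total = sum(counts.values())
    rows = [(term, n, 100.0 * n / total) for term, n in counts.items()
            if term not in STOP_WORDS and n >= min_count]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


def split_by_candidate(paragraphs, names):
    """Paragraph context: one list of paragraphs per candidate name."""
    return {name: [p for p in paragraphs if name.lower() in tokenize(p)]
            for name in names}


def shared_rate(files, name, other):
    """Share of one candidate's paragraphs that also name the other."""
    own = files[name]
    both = [p for p in own if other.lower() in tokenize(p)]
    return len(both) / len(own) if own else 0.0


# Invented sample paragraphs, for illustration only.
sample = [
    "Sanchez defended his wealth and his Laredo bank.",
    "Morales cited his record as attorney general.",
    "Sanchez and Morales argued about the debate.",
]
terms = ranked_terms(sample, min_count=2)
files = split_by_candidate(sample, ["Sanchez", "Morales"])
overlap = shared_rate(files, "Sanchez", "Morales")
```

The shared_rate figure corresponds to the kind of percentage reported above
for paragraphs naming both candidates; switching the unit from paragraphs to
sentences only requires passing a list of sentences instead.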
The candidate files for Sanchez and Morales were submitted to the VB Select
feature which outputs the presence of terms used disproportionately by one
file over the other.  For example, the Sanchez file showed high proportions
of the words "million", "stock", "Laredo", and "wealth".  The Morales file
found the terms "attorney general", "consumers", and "tobacco" in
disproportional numbers.  These results helped to further define the
potential codebook terms.
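
The idea of terms used disproportionately in one file can be illustrated
with a simple ratio of relative frequencies.  The statistic that VBPro
actually computes is not described in this paper, so the ratio, the add-one
smoothing, the threshold, and the sample sentences below are all assumptions
chosen for the sketch.

```python
import re
from collections import Counter


def term_counts(paragraphs):
    """Count lowercase word tokens across a list of paragraphs."""
    return Counter(re.findall(r"[a-z']+", " ".join(paragraphs).lower()))


def disproportionate_terms(file_a, file_b, min_ratio=2.0):
    """Terms whose smoothed relative frequency in file_a is at least
    min_ratio times their smoothed relative frequency in file_b."""
    a, b = term_counts(file_a), term_counts(file_b)
    total_a, total_b = sum(a.values()), sum(b.values())
    vocab = len(set(a) | set(b))
    rows = []
    for term, count in a.items():
        # Add-one smoothing keeps terms absent from file_b from dividing
        # by zero (an assumption of this sketch).
        rate_a = (count + 1) / (total_a + vocab)
        rate_b = (b[term] + 1) / (total_b + vocab)
        ratio = rate_a / rate_b
        if ratio >= min_ratio:
            rows.append((term, round(ratio, 2)))
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Invented sample files, for illustration only.
sanchez_file = ["Sanchez earned one million in stock.",
                "The Laredo banker spent another million."]
morales_file = ["Morales sued tobacco firms.",
                "The tobacco settlement helped consumers."]

sanchez_terms = disproportionate_terms(sanchez_file, morales_file)
morales_terms = disproportionate_terms(morales_file, sanchez_file)
```

On real data the threshold would need tuning, since very short files make
almost every term look disproportionate.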
The next step was to group terms into categories.  Since the survey
categories had already been created and these were the same categories that
were being used by the class, it was decided that these should be the basis
for the computerized categories.  The original master file was coded for
the terms that were defined in the frequency report and Select functions
and this output was submitted to VBMap, a feature that maps clusters of
terms for their co-occurrence in a text.  The output of VBMap is a
three-dimensional map of eigenvectors, like that of a factor analysis, that
can be plotted to identify clusters of terms.  This output served to
finalize our categories and the associated terms.  The final codebook
included 24 image and issue categories and three additional categories for
candidate identifier (see Appendix A).
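
As a rough illustration of the mapping step, the sketch below counts how
often codebook terms appear in the same unit of text and then approximates
the leading eigenvector of that co-occurrence matrix by power iteration.
How VBMap normalizes its matrix is not described here, so raw counts are
assumed; only the first of the three dimensions is computed, and the terms
and sample units are invented.

```python
import re


def cooccurrence_matrix(units, terms):
    """For each pair of terms, count the units (e.g. paragraphs) that
    contain both.  The diagonal holds each term's own unit count."""
    n = len(terms)
    matrix = [[0] * n for _ in range(n)]
    for unit in units:
        present = set(re.findall(r"[a-z']+", unit.lower()))
        hits = [i for i, term in enumerate(terms) if term in present]
        for i in hits:
            for j in hits:
                matrix[i][j] += 1
    return matrix


def leading_eigenvector(matrix, iterations=200):
    """Approximate the dominant eigenvector by power iteration."""
    n = len(matrix)
    vec = [1.0] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n))
               for i in range(n)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0:
            return vec
        vec = [x / norm for x in nxt]
    return vec


# Invented sample units and terms, for illustration only.
terms = ["million", "stock", "tobacco", "consumers"]
units = [
    "million in stock",
    "stock worth a million",
    "a million in bank stock",
    "tobacco suit for consumers",
    "consumers and tobacco",
]
matrix = cooccurrence_matrix(units, terms)
vec = leading_eigenvector(matrix)
```

In this sample the first dimension loads on "million" and "stock" and not on
"tobacco" or "consumers", which is the kind of separation a researcher would
look for when grouping terms into categories.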
The new codebook was then used against each of the three candidate
files.  The categories were coded in sentence context (like the candidate
file) in order to find sentences in which both the candidate name and
category terms were found.
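
Coding in sentence context amounts to counting the sentences in which a
candidate's name and at least one category term both appear.  A minimal
sketch follows, assuming a two-category codebook fragment (the actual
categories are in Appendix A), a crude sentence splitter, and an invented
sample text.

```python
import re

# Hypothetical codebook fragment, for illustration only.
CODEBOOK = {
    "Wealth": {"million", "wealth", "stock"},
    "Debate": {"debate", "debates"},
}


def sentences(text):
    """Split on whitespace that follows ., ! or ? (a rough heuristic that
    will mis-split abbreviations such as "Dr.")."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def code_sentences(text, candidate, codebook=CODEBOOK):
    """Per category, count sentences containing both the candidate's name
    and at least one of the category's terms."""
    tallies = {category: 0 for category in codebook}
    for sentence in sentences(text):
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        if candidate.lower() not in tokens:
            continue
        for category, category_terms in codebook.items():
            if tokens & category_terms:
                tallies[category] += 1
    return tallies


# Invented sample text, for illustration only.
text = ("Sanchez has spent a million on ads. "
        "Morales wants a debate in Spanish. "
        "Sanchez agreed to one debate. "
        "His wealth drew criticism.")
sanchez_codes = code_sentences(text, "Sanchez")
morales_codes = code_sentences(text, "Morales")
```

Note that the last sample sentence refers to Sanchez only by pronoun and is
therefore not counted, which shows how a narrow context can lose content
that a human coder would attribute to the candidate.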
The class analysis not only looked at the presence of images, but also
whether those images were negative or positive.  One of the predominant
categories, Experience, was selected for an analysis of negative and
positive attributes (This attribute was selected due to the low frequencies
recorded on other categories in both the class and computer
procedures).  An additional codebook using words that connote both states
was developed using the VBMap and Select outputs (see Appendix
B).  Candidate files were searched in sentence context for each of the two
attributes, creating two files per candidate to which the negative/positive
codebook was then applied.
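
A generic negative/positive codebook of this kind can be applied with a
simple term-presence test.  In the sketch below the negative terms are the
generic ones named in the Discussion; the positive terms, the four-way
classification, and the sample sentences are assumptions made for the
example (the study's actual lists are in Appendix B).

```python
import re

# Positive terms are hypothetical; negative terms follow the generic
# examples given in the Discussion.
POSITIVE = {"strong", "proven", "respected", "successful"}
NEGATIVE = {"no", "not", "never", "opposed"}


def affect_tally(sentence_list):
    """Classify each sentence by which affect terms it contains."""
    tally = {"positive": 0, "negative": 0, "mixed": 0, "neutral": 0}
    for sentence in sentence_list:
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        has_pos = bool(tokens & POSITIVE)
        has_neg = bool(tokens & NEGATIVE)
        if has_pos and has_neg:
            tally["mixed"] += 1
        elif has_pos:
            tally["positive"] += 1
        elif has_neg:
            tally["negative"] += 1
        else:
            tally["neutral"] += 1
    return tally


# Invented Experience sentences, for illustration only.
experience_sentences = [
    "Morales has a proven record as attorney general.",
    "Sanchez has never held elected office.",
    "He served two terms.",
]
experience_tally = affect_tally(experience_sentences)
```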
It is important to note that at many times during this process, decisions
could have been made that could change the resulting outputs.  For example,
assumptions were made about unavailable content, context of searches, and
terms used in categories.  The nature of computerized content analysis
allows the researcher to quickly and easily try different options to assess
their impact.  The result of these modifications is the process described
above.  For example, an issue might be deemed to have significance in
paragraph context, if one assumes that if an issue is mentioned in a
paragraph, it will be the subject of that paragraph for the majority of
occurrences.  One would then search the files by paragraph context for the
issue category first, then in sentence context by candidate.  This same
strategy might not apply as well for personal qualifications, in which the
mention of experience or wealth might be more reflective of the subject of
a sentence or assertion rather than the entire paragraph.

Comparison of methods
In order to compare the two methods, first I compared and correlated
results of the two methods by candidate on the category of Personal
Qualifications and Characteristics by collapsing the positive and negative
attributes in the class analysis to identify if both procedures yielded
similar emphasis on the general characteristics. After assessing the
initial correlations, I identified the major image attributes and compared
the results of the Negative/Positive computer coding procedure.
Secondarily, I analyzed three issue or biographical categories, Debate,
Race, and Wealth to see if similar trends were found in both
analyses.  These categories were selected based on the frequencies reported
in both analyses and the importance to the race.
Next, I offer a qualitative discussion of any differences in results and
the nature of the process that might impact those differences.  I end with
an analysis of the strengths and limitations of each approach and suggest
recommendations for future research.

 Results
A general look at the results from both the class and computer shows that
many of the anticipated Personal Qualification and Characteristic
attributes were deemed irrelevant to the analysis.  This would indicate
that there was not much difference in the media coverage of these two
candidates on personal issues.  Trends that one would expect from reading
the material emerged.  Emphasis was placed on Experience, which includes
mentions of former positions and offices held.  Table 1 compares the
weighted frequencies of categories that comprise Personal Qualifications
and Characteristics.  The figures were weighted in their respective
categories by total number of terms found by candidate. Total correlation
between the class and computer procedures was high (.76).  Individual
correlations within categories were also high, with the exception being the
presence of Perry skewing the Experience correlation (Perry is frequently
mentioned in his current role as Governor).  Removing Perry from the
Experience category showed a high correlation for Sanchez and Morales
between the two methods (>.9).  No significant difference was found in the
Experience attribute, however, between candidates in either the class or
computer data sets.  A significant difference (p<.001) was found for the
Morality category for Sanchez in the class data.  Low frequencies present
in the computer output indicate that the codebook may not have included all
terms associated with the Morality attribute.
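
The weighting and correlation steps can be sketched as follows.  The counts
below are invented and do not reproduce the figures in Table 1, and the
Intelligence and Leadership category names are placeholders; the sketch
assumes that weighting means dividing each category count by the total and
that the correlation reported is Pearson's r.

```python
from math import sqrt


def weight(counts):
    """Convert raw category counts to proportions of their total."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented counts for one candidate, for illustration only.
class_counts = {"Experience": 40, "Morality": 10,
                "Intelligence": 5, "Leadership": 5}
computer_counts = {"Experience": 60, "Morality": 5,
                   "Intelligence": 10, "Leadership": 5}

categories = sorted(class_counts)
class_w = weight(class_counts)
computer_w = weight(computer_counts)
r = pearson([class_w[c] for c in categories],
            [computer_w[c] for c in categories])
```

Weighting before correlating matters because the two procedures produce
very different raw totals; proportions put them on a common scale.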
 Table 1:  Correlation of Personal Qualifications and Character Attributes

*Experience and Competence attributes were collapsed in class data for
comparison to computer data.
*Presence of Perry in Experience category reduced correlation.  A high
correlation was found when Perry was excluded.
 The Experience category was further analyzed to determine whether mentions
of experience were in the positive or negative contexts for both Democratic
candidates.  Results are provided in Table 2.

Table 2: Positive/Negative uses of Experience Attribute


The results from the class clearly show that Morales had more positive and
fewer negative Experience mentions than Sanchez, but the computer analysis
showed an equal distribution between candidates of 3/4 positive and 1/4
negative comments.  This is one area in which the computerized content
analysis failed to identify the same trend as the class
analysis.  Assessing negative and positive feelings leans heavily
toward the problematic "thematic" issues that were addressed in the
literature review.  It is possible, however, to create for an individual
attribute a codebook of terms specific to its negative and positive
connotation (rather than the generic negative and positive terms used here)
to improve the assessment of this category.  Additionally, the analysis
could be improved by using computerized techniques to sort the data down to
the relevant attribute, and then using human coders to apply the affective
element.  Paring down the data in this way would reduce the time spent in
manual coding and the potential for human error.
Three issue or biographical attributes were selected for further analysis
based on the frequency of their usage in the text and their importance to
the race.  Two of the three issues have to do with the fact that both
Democratic candidates were Hispanic. The debate issue is relevant due to
the controversy in the press about holding one of the debates in
Spanish.  The race issue includes mentions of personal race, affirmative
action, immigration, and racial profiling.  The wealth attribute was added
due to the highly publicized personal wealth of Tony Sanchez.  The areas of
Debate, Race, and Wealth are analyzed in Table 3.
Table 3

Directional trends and proportions for each issue are consistent between the
class and computer data.  Both analyses identified these as the major
campaign issues.  The Morales data had more mentions regarding the
debate.  This was expected due to the problems he had both during the
negotiations of the debates and with his performance in the debates.  For
the Race issue, Morales had a higher percentage increase than Sanchez,
again probably due to mentions of Spanish language issues in his
data.  And, as expected, the Wealth issue was much more of an issue for
Sanchez than for Morales, although mentions of wealth and money for Morales
were found in higher proportions by the class.

 Discussion
This study sought to compare and contrast two methods of content analysis,
the traditional coding method in which human coders read texts and identify
attributes and the computerized method in which VBPro was used to analyze
texts for the occurrence and frequency of terms.  The overall objective of
the project was to identify images related to individual candidates that
are presented in the media.  For most of the attributes studied, a high
correlation existed between the human coding and computerized methods.  In
identifying overall image attributes and focus on issues, the computer
program had a high correlation with the manual method.  In regard to
defining specific directions of affect toward a particular attribute,
however, the computer program was not found to produce effective results in
the manner undertaken in this study.  This is due to several factors.  The
first factor is the generic nature of the codebook used to identify
Negative and Positive terms.   As mentioned earlier, a more specific
codebook per image attribute could be created to focus on the specific
terms associated as negative or positive aspects of that attribute.  For
example, instead of using a codebook for Negative terms like no, not,
never, opposed, etc., a codebook that is specific to the category of
Lacking Experience could be created to include terms like naïve,
inexperienced, and lack.  These terms might be irrelevant or misleading
when applied to other attributes.
An additional problem in assessing affective characteristics with a
computer program is the inherent meaning of language.  Some phrases are
negative in meaning without the use of negative terms.  For example, the
phrase "he needs to get on the ball" does not include any negative words
but clearly has a negative connotation.  It would be very difficult to get
the computer program to code this phrase correctly.
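
The contrast between a generic and an attribute-specific codebook, and the
limit that both share, can be shown with the terms named above.  The helper
function and the first and third sample sentences are assumptions made for
this sketch; the second sentence is the phrase discussed in the text.

```python
import re

# Both term lists come from the examples given in this Discussion.
GENERIC_NEGATIVE = {"no", "not", "never", "opposed"}
LACKING_EXPERIENCE = {"naïve", "naive", "inexperienced", "lack"}


def matched_terms(sentence, codebook):
    """Return the codebook terms found in the sentence, sorted."""
    tokens = set(re.findall(r"[\w']+", sentence.lower()))
    return sorted(tokens & codebook)


examples = [
    "Critics call him inexperienced.",
    "He needs to get on the ball.",
    "He has never run a state agency.",
]
generic_hits = [matched_terms(s, GENERIC_NEGATIVE) for s in examples]
specific_hits = [matched_terms(s, LACKING_EXPERIENCE) for s in examples]
```

The specific codebook catches the first sentence, which the generic one
misses, while neither flags the idiom in the second sentence.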
This leads to the general assumption behind computer coding.  What one
must accept is that the computer will not code every instance of the text
the same way as the human coder but, as this analysis has shown, overall
trends can in many ways be captured by analyzing occurrences of terms.  As
pointed out earlier, some analyses are better suited for computer-assisted
coding, and while there were many consistent trends to report, this
particular project had many of the elements that make computer coding
inappropriate.  It is also important to note the limitations in the human
process, rather than to assume that the human coded data has properly
uncovered all trends. The success or failure of the computer-assisted
process is based on the strategy employed during all steps of the process:
content selection, codebook creation, coding, and analysis.  This strategy
can have many variations.
Content Selection – In this phase, the computer has several advantages over
human methods.  If the materials are archived in an electronic database, it
is much easier to use a computer to compile your data.  This prevents
having to save clippings or visit manual archives to make copies of the
texts.  Electronic archives are designed to provide advanced searching
capabilities to help identify relevant content.  But, as indicated in this
study, not all source content gets archived in electronic
databases (as witnessed by the omission of six of the articles the class
used).  It is important to note that a human coder could also miss some
articles of relevant content, potentially affecting the validity of the sample.
Beyond the mere selection of data, a computer can assist in parsing the
data down to a manageable size.  Searches in context can eliminate
unnecessary content, but it is important to understand what might be
included and excluded in doing a context search.  For example, in this
project, we searched paragraphs for the candidate names to make files per
candidate.  But, upon further inspection, we found a high frequency of both
candidate names in paragraphs.  Other researchers have used manual methods
to place articles in categories first, then perform the computer
analysis.[20]  For this project, such a strategy was impractical because most
articles dealt with multiple candidates, perhaps a function of the media
grouping both candidates based on their racial similarity.
Codebook Creation – Human coders may be able to identify certain themes
more effectively than the computer.  But some computer programs, like
VBPro, provide powerful features that can help to identify frequently and
disproportionately used terms and co-occurrences of terms.  The ability to
compress series of terms into clusters through mapping techniques can
improve the effectiveness and efficiency of the study.  It is important to
understand the limitations of solely utilizing the computer in codebook
creation.  But even in a manual coding environment, an analysis of
frequency and cluster reports might bring out additional themes that went
undetected by the human.
As mentioned previously, some concepts may be very difficult to code, such
as the negative/positive attributes attempted by this study.  But if the
computer is used to parse down data on a specific attribute, and then to
rank term frequencies and map clusters, trends might emerge that would
aid in assessing affect.  For many projects, this may be a stronger
strategy than developing generic affect categories.
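As a rough illustration of the frequency and co-occurrence reports discussed above, the following Python sketch ranks terms by frequency and counts how often pairs of codebook terms appear in the same paragraph.  It is a simplified stand-in and not a reproduction of VBPro's procedures; the stopword list, function names, and sample paragraphs are assumptions made for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "his", "her"}


def tokenize(text):
    """Lowercase the text and return its words, dropping stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def term_frequencies(paragraphs):
    """Count how often each term occurs across all paragraphs."""
    counts = Counter()
    for para in paragraphs:
        counts.update(tokenize(para))
    return counts


def cooccurrences(paragraphs, terms):
    """Count paragraphs in which each pair of the given terms appears together."""
    pairs = Counter()
    for para in paragraphs:
        present = sorted(set(tokenize(para)) & set(terms))
        pairs.update(combinations(present, 2))
    return pairs


# Hypothetical paragraphs for illustration only.
paragraphs = [
    "The candidate defended his record on education and taxes.",
    "Education and insurance dominated the debate.",
    "Taxes and education were the focus of the ad.",
]
freq = term_frequencies(paragraphs)
pairs = cooccurrences(paragraphs, ["education", "taxes", "insurance"])
print(freq.most_common(2))
print(pairs[("education", "taxes")], pairs[("education", "insurance")])
```

Terms that rank high or that frequently co-occur are candidates for codebook categories, but a human reader still has to decide whether they represent a meaningful theme.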
Coding – For certain problems, this is the most difficult phase for the
computer to emulate.  By using Holsti's criteria for appropriateness and
the suggestions of Nacos, et al., one can determine whether their project
is suited for computerized analysis.  While this project yielded high
correlations and similar results, that might not hold for studies in which
there is more diversity and complexity of thematic content.  On the other
hand, in our class coding situation, several anomalies might have also
affected the quality of the coding.  One issue entails the high proportion
of international students in this seminar, for whom reading meaning in
American texts might be difficult or inconsistent, particularly when it
comes to jargon or affect.  While the class did reliability coding in
pairs, the results were varied, with some pairs scoring high (>90%) and
some low (<50%).  The discrepancies were ultimately reconciled.  But the
number and complexity of categories made the coding of the texts difficult
for most coders.
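The pairwise reliability figures above read like simple percent agreement, although the reliability statistic is not named here, so the following Python sketch rests on that assumption.  The function name and the two coders' category labels are hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Return the percentage of items that two coders coded identically."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must code the same number of items.")
    if not codes_a:
        raise ValueError("At least one coded item is required.")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Hypothetical codes assigned by a pair of coders to the same five paragraphs.
coder_1 = ["issue", "image", "issue", "affect", "image"]
coder_2 = ["issue", "image", "affect", "affect", "issue"]
agreement = percent_agreement(coder_1, coder_2)
print(agreement)
```

Percent agreement does not correct for agreement that would occur by chance, so a pair coding with only a few categories can score well without coding reliably.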
Analysis – The area of analysis provides certain challenges as well.  How
does one interpret both data sets?  It is important to clearly comprehend
the methods of both processes in order to make comparisons.  It is not
likely that a computerized process will yield exactly the same measure as a
manual process, but it is possible to develop statistics that can
demonstrate similar trends. By using computerized content analysis, one can
perform multiple actions on the data to attempt to retrieve the desired
information.  But having the ability to slice the data in many ways is not
necessarily a strength.  One must define a clear set of research questions
and design the process to achieve those goals.  Challenges in analysis
also exist with traditional coding techniques: determining when to collapse
categories, which statistics are relevant for comparison, and how to present
the information.  The computer-assisted analysis can provide consistency to
the methods that might not be present in the manual
situation.  Additionally, the computer analysis may be able to provide
complementary or supporting data to aid in a qualitative study.
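One way to check whether two methods show similar trends, even when their raw measures differ, is to compare how each method ranks the same categories.  The Python sketch below computes a Spearman rank correlation using only the standard library.  It is offered as an illustration under assumptions: the choice of Spearman's statistic is ours rather than a statement of what this study computed, and the category counts are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("Correlation is undefined when a sequence is constant.")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank values."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counts for the same four categories under each method.
computer_counts = [40, 25, 10, 5]
manual_counts = [30, 12, 15, 2]
rho = spearman(computer_counts, manual_counts)
print(round(rho, 3))
```

A value near 1 indicates that the two methods order the categories similarly, even though the counts themselves are on different scales.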
In summary, this project was able to yield similar results in many ways
between the computer and class methods.  Strengths of the computer method
included the ability to analyze large amounts of text and to use the
program to aid in defining the codebook through terms present in the
manifest content.  The strengths of the traditional method include the
unique ability for humans to read meaning by assessing terms with multiple
definitions or understanding in context.  What is necessary is a mixed
approach to content analysis, one in which the strengths of both methods
can be incorporated.  In the case of this study, it would have been
possible to use computer methods to select texts, use frequency rankings
and mapping to assist in developing the codebook, search for key terms, and
code to identify important issues.  The human coders could then focus on
sets of texts that deal with specific attributes, thus applying their
unique human judgment to the areas of theme and affect.  Finally, the
analysis of the results could be performed on a quantitative level by the
computer and on a qualitative level by humans.  Application of computerized
techniques in an agenda setting study might provide insight into potential
categories to survey, just as survey data might indicate which
categories to analyze in texts.  This indicates that the process of text
analysis is not necessarily linear, but rather mixed and cyclical,
requiring different emphases, directions, and methods at different
points.  The communication researcher must always apply astute judgment to
the selection of all tools, both human and computer, employed in analysis.
Further studies should seek to compare computerized and manual content
analysis methods and experiment with the mixed, cyclical process mentioned
above.  Additionally, a large market exists for the creation of improved
content analysis tools that would have more sophisticated libraries and
linguistic capabilities.  Research into the development of such tools could
provide further insight on the usage of computerized techniques in
communication research.




 Endnotes
[1]  M. Mark Miller, User's Guide for VBPro: A Program for Qualitative and
Quantitative Analysis of Verbatim Text (Knoxville, TN: M. Mark Miller, 1993).
[2]  Guillermo X. Garcia, "Texas Surpasses New York as Second Most Populous
State," Elpasotimes.com,
http://www.borderlandnews.com/Census/texasstory.html, statistics quoted
from U.S.  Census, accessed on 5/2/02.
[3]  B. R. Berelson, Content Analysis in Communication Research, The Free
Press, New York, 1952.
[4]  O. R. Holsti, Content Analysis for the Social Sciences and Humanities,
Addison-Wesley, Reading, MA, 1969.
[5]  K. Krippendorff, Content Analysis: An Introduction to its Methodology,
Sage, Beverly Hills, CA, 1980.
[6]  D.  Riffe, S. Lacy, F.G. Fico, Analyzing Media Messages:  Using
Quantitative Content Analysis in Research, Lawrence Erlbaum Associates, New
York, 1998.
[7]  R. Entman, Framing: Toward Clarification of a Fractured Paradigm,
Journal of Communication, 43(4), 1993, p.53.
[8]  E. Goffman, Frame Analysis: An Essay on the Organization of
Experience, Harper & Row, New York, 1974.
[9]  M. McCombs and T. Bell, The Agenda Setting Role of Mass Communication,
In M. Salwen & D. Stacks (Eds.), An Integrated Approach to Communication
Theory and Research, Erlbaum, Mahwah, NJ, pp. 93-110.
[10]  M.  McCombs, E.  Lopez-Escobar, & J.P. Llamas, "Setting the Agenda of
Attributes in the 1996 Spanish General Election," Journal of Communication,
Spring 2000, p. 78.
[11]  B. Cohen, The Press and Foreign Policy, Princeton University Press,
Princeton, NJ, 1963.
[12]  G. Brand, The Essential Wittgenstein, Basic Books, New York, 1979.
[13]  R.A. Lind & C. Salo, "The Framing of Feminists and Feminism in News
and Public Affairs Programs in U.S. Electronic Media," Journal of
Communication, March 2002.
[14]  D. Weaver, D. Graber, M.  McCombs, & C.  Eyal, Media Agenda Setting
in a Presidential Election:  Issues, Images, and Interest, Praeger, New
York, 1981.
[15]  L. Becker & M. McCombs, The Role of the Press in Determining Voter
Reaction to Presidential Primaries, Human Communication Research, 4,
301-307, 1978.
[16]  M.  McCombs, E.  Lopez-Escobar, & J.P. Llamas, 2000.
[17]  M.  Miller, J. Andsager, B. Reichart, "Framing the Candidates in
Presidential Primaries:  Issues and Images in Press Releases and News
Coverage," Journalism and Mass Communication Quarterly, Summer 1998.
[18]  O. R. Holsti, Content Analysis for the Social Sciences and
Humanities, Addison-Wesley, Reading, MA, 1969.
[19]  B.L. Nacos, R.Y. Shapiro, J.T. Young, D.P. Fan, T. Kjellstrand, & C.
McCaa, "Content Analysis of News Reports:  Comparing Human Coding and a
Computer-assisted Method," Communication (12), 1991.
[20]  M.  Miller, J. Andsager, B. Reichart, Summer 1998.
