
Video Lecture Notes:

Guest Lecture: Corpus Approaches to Risk, by Prof. Dr. Jens Zinn (University of Melbourne)

This lecture introduces corpus linguistics as a method for analyzing discourse, emphasizing its role in uncovering patterns through statistical exploration and contextual analysis. The speaker uses the analogy of exploring a medieval city (Passau) to explain the dual need for macro-level structural analysis (satellite view) and micro-level contextual examination (on-the-ground observation) in discourse studies. Below is a detailed breakdown of the key concepts and methodologies:


1. The Analogy of Passau: Macro vs. Micro Analysis

  • Macro-Level (Satellite View):
    • Involves statistical exploration to identify broad patterns (e.g., word frequencies, collocations).
    • Example: Analyzing the “street network” of discourse through tools like keyword analysis.
  • Micro-Level (Ground View):
    • Requires close reading of texts to understand context, nuance, and function.
    • Example: Examining specific phrases or metaphors in a political speech.
  • Integration: Effective discourse analysis combines both approaches, akin to understanding Passau’s layout by observing its streets and landmarks while also viewing its topography from afar.

2. Defining a Corpus

A corpus is a machine-readable collection of authentic texts produced in natural language contexts (not elicited for research). Corpora are categorized into:

  • Specialized Corpora: Focused on a specific topic, genre, or discourse (e.g., climate change debates).
  • Reference Corpora: Broadly representative of a language/variety (e.g., the British National Corpus).
  • Balanced Corpora: Include diverse genres, media, and topics to approximate linguistic diversity.
  • Large Versatile Corpora: Massive collections (e.g., web-crawled texts) that allow user-defined sub-corpus creation.

Key Challenge: Representativeness

  • Corpora aim to reflect a “population” of texts (e.g., all English tweets about climate change), but this population is often undefined.
  • Three Solutions:
    1. Large Versatile Corpora: Maximize volume but lack structure.
    2. Balanced Corpora: Strive for genre/media diversity.
    3. Specialized Corpora: Target specific discourses (optimal for focused studies).

3. Limitations of Corpora

  • No Inherent Meaning: Corpora contain textual data, not pre-packaged meanings. Analysts must reconstruct meaning through:
    • Collocation Analysis: Identifying words that frequently co-occur (e.g., “climate” + “crisis”).
    • Keyword Analysis: Comparing word frequencies against a reference corpus to pinpoint discourse-specific terms (a minimal sketch of this computation follows below).
  • Example: In climate change discourse, terms like “mitigation” or “carbon footprint” gain significance through their statistical prominence and contextual usage.
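
The following is a minimal, self-contained sketch of keyword analysis in plain Python rather than the output of any particular corpus tool; the word counts are hypothetical, and the scoring uses the log-likelihood (G2) statistic commonly reported by corpus software for keyword and collocation strength.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for a word occurring a times in a target corpus
    of c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target_counts, reference_counts, top_n=20):
    """Rank words by how strongly they are overused in the target corpus."""
    c = sum(target_counts.values())
    d = sum(reference_counts.values())
    scores = {}
    for word, a in target_counts.items():
        b = reference_counts.get(word, 0)
        # keep only "positive" keywords: relatively more frequent in the target
        if a / c > (b / d if d else 0.0):
            scores[word] = log_likelihood(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Toy counts, purely for illustration
target = Counter({"mitigation": 120, "carbon": 300, "the": 5000})
reference = Counter({"mitigation": 5, "carbon": 40, "the": 6000})
print(keywords(target, reference, top_n=3))
```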

4. Methodologies in Corpus Linguistics

a. John Rupert Firth’s Principle
  • “You shall know a word by the company it keeps”: A word’s meaning is derived from its linguistic environment (collocations) and situational context.
  • Application: Analyzing the word “mitigation” in climate discourse reveals associations like “adoption of mitigation strategies” or “cost-effective mitigation.”
b. Layers of Social Discourse in Corpora
  1. Linguistic Expressions: Surface-level text (e.g., “climate emergency”).
  2. Concepts: Reconstructed meanings (e.g., “global warming” as a threat).
  3. Linguistic Practices: Actions performed through language (e.g., warnings, denials).
  4. Social Practices: Real-world actions linked to discourse (e.g., policy changes).
  Example:
  • Expression: “Net-zero emissions.”
  • Concept: A target for balancing greenhouse gases.
  • Linguistic Practice: Advocating for legislative action.
  • Social Practice: Governments enacting carbon taxes.
c. Practical Tools: Corpus Workbench (CWB)
  • Corpus Query Language (CQL): A syntax for searching corpora (e.g., [lemma="mitigation"%c] to find all forms of “mitigation”).
  • Concordance Analysis: Displays search results in context (e.g., “mitigation” paired with “strategies” or “challenges”).
  • Collocation Analysis: Measures statistically significant word pairs (e.g., “mitigation” + “adoption” with a log-likelihood score of 138.773).
  Step-by-Step Example (a plain-Python sketch of steps 2–3 follows this list):
  1. Query: Search for the lemma “mitigation” in a climate corpus.
  2. Concordance: Review all instances of “mitigation” in context.
  3. Collocation: Identify words like “adaptation” or “funding” that frequently co-occur with “mitigation.”
  4. Keyword Analysis: Compare term frequencies against a reference corpus (e.g., general English) to isolate climate-specific vocabulary.
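
The CQL query in step 1 runs inside CWB/CQP itself. As a language-agnostic illustration of steps 2 and 3, here is a plain-Python sketch (not the CWB API) that produces a simple keyword-in-context concordance and counts window collocates for a node word; the toy corpus and window size are assumptions for illustration only.

```python
from collections import Counter

def concordance(tokens, node, width=4):
    """Keyword-in-context (KWIC) lines for every occurrence of `node`."""
    node = node.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>40} [{tok}] {right}")
    return lines

def window_collocates(tokens, node, width=4):
    """Count words co-occurring with `node` within +/- `width` tokens."""
    node = node.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            window = tokens[max(0, i - width):i] + tokens[i + 1:i + 1 + width]
            counts.update(w.lower() for w in window)
    return counts

# Toy tokenized corpus, for illustration only
tokens = ("the adoption of mitigation strategies and cost-effective "
          "mitigation funding supports adaptation").split()

for line in concordance(tokens, "mitigation"):
    print(line)
print(window_collocates(tokens, "mitigation").most_common(5))
```

In a real study these raw window counts would then be compared against their overall corpus frequencies, for instance with the log-likelihood function sketched in section 3, to obtain significance scores like the 138.773 reported above; step 4 can likewise reuse that keyword function with a reference corpus.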

5. Interdisciplinarity and Validity

  • Interdisciplinary Collaboration: Discourse analysis bridges linguistics with sociology, political science, and environmental studies.
    • Example: Linking “climate denial” rhetoric to lobbying practices.
  • Validity Radius: Researchers must define the scope of their corpus (e.g., “This study covers U.S. media from 2010–2020”) to avoid overgeneralization.

6. Case Study: Climate Change Discourse

  • Expressions: “Climate crisis,” “global warming,” “carbon neutrality.”
  • Concepts: Environmental urgency, anthropogenic impact, sustainability.
  • Linguistic Practices: Scientific reporting, activist mobilization, political rhetoric.
  • Social Practices: International agreements (Paris Accord), corporate greenwashing.
  Visualization:
  • Climate Crisis: Images of protests, wildfires.
  • Global Warming: Graphs of rising temperatures.
  • Carbon Neutrality: Infographics on renewable energy.

7. Challenges and Future Directions

  • Data Preprocessing: Tokenization, lemmatization, and metadata tagging are labor-intensive (see the sketch after this list).
  • Ethical Considerations: Bias in corpus compilation (e.g., overrepresenting mainstream media).
  • Emerging Trends:
    • Multimodal Corpora: Integrating text, images, and video.
    • Real-Time Analysis: Using APIs to study live social media discourse.
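
A minimal preprocessing sketch, assuming spaCy and its small English model (en_core_web_sm) are installed; spaCy is not mentioned in the lecture and stands in here for whatever pipeline a project actually uses. The sketch tokenizes and lemmatizes one document and attaches simple metadata, roughly the preparation a text needs before it can be loaded into a queryable corpus.

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def preprocess(text, **metadata):
    """Tokenize and lemmatize one document and attach user-supplied metadata."""
    doc = nlp(text)
    tokens = [
        {"form": tok.text, "lemma": tok.lemma_, "pos": tok.pos_}
        for tok in doc
        if not tok.is_space
    ]
    return {"metadata": metadata, "tokens": tokens}

# Hypothetical document and metadata, for illustration only
sample = preprocess(
    "Governments adopted cost-effective mitigation strategies.",
    source="toy example", year=2020,
)
print(sample["metadata"])
print(sample["tokens"][:3])
```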

Conclusion

Corpus approaches to discourse analysis offer powerful tools for uncovering patterns and reconstructing meanings in large text collections. By combining quantitative methods (collocation, keyword analysis) with qualitative interpretation, researchers can map the intricate relationships between language, concepts, and social practices. However, the field demands interdisciplinary collaboration and careful attention to the limitations of representativeness and contextual nuance.


Reading Notes:

CORPUS APPROACHES TO RISK