Julian Häußler (Darmstadt): Teaching Distributional Semantics Through Concordance Analysis

Date: 21 March 2025
Time: 11:30 – 14:00
Venue: Kollegienhaus, Universitätsstraße 15, 91054 Erlangen

Join us for the RC21 Project Symposium, where invited speakers, project team members, and poster presenters will present their work on the methodology and applications of concordance analysis!

 



Abstract:

Distributional semantics, the idea that a word’s meaning can be captured by its surroundings in a text, is a fundamental concept in corpus linguistics and, through implementations such as Word Embedding Models, also a cornerstone of Natural Language Processing (NLP) and Computational Literary Studies (CLS). Concordances offer another route to understanding distributional semantics: they are easy to experiment with and have a low barrier to entry. This poster introduces a teaching unit on this topic aimed at Master’s students with limited background knowledge in corpus linguistics. Its objective is to train students in using concordance analysis to understand distributional semantics and to enable them to apply it and its implementations in NLP/CLS research workflows.

The idea that the meaning of a word can be identified by looking at its surroundings in a text is simple yet powerful. In its most basic form, the distributional hypothesis (“You shall know a word by the company it keeps!”), it is ascribed to Firth (1957), who cites Wittgenstein (2003, §43). Today, this assumption is the foundation of distributional semantics and of NLP methods such as Word Embedding Models (Mikolov et al. 2013) and Topic Modeling, specifically LDA (Blei, Ng and Jordan 2003). However, these implementations reduce the information on words’ surroundings to numerical values and only allow analysis of those values. ‘Manually’ analyzing the surroundings by looking at concordances is therefore particularly helpful for properly understanding how distributional semantics works.
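To make the notion of a word's "surroundings" concrete, the following is a minimal keyword-in-context (KWIC) sketch in Python. It is not the tool used in the teaching unit and makes several simplifying assumptions: the function name `kwic`, the window size, the regex-based tokenization, and the toy text are all invented for illustration.

```python
import re

def kwic(text, target, window=4):
    """Return (left context, keyword, right context) for each hit of `target`.

    Assumption: tokens are lowercase runs of word characters; a real
    concordancer offers far more control over tokenization and sorting.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = target.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

# Invented toy text, not taken from any corpus used in the course.
sample = ("She sat on the bank of the river and watched the water. "
          "He went to the bank to open a savings account.")

# Print the hits aligned on the keyword, as a concordance view would.
for left, keyword, right in kwic(sample, "bank"):
    print(f"{left:>30} | {keyword} | {right}")
```

Reading the two lines side by side shows what a numerical summary hides: the same word form appears in contexts that point to different meanings.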

This poster will introduce a teaching unit on distributional semantics through concordance analysis, which was part of a weekly practice session (German: Übung) on distributional semantics in the summer term of 2024 at the Technical University of Darmstadt, Germany. The students were mostly enrolled in Master’s degrees with a strong focus on computational research, but had little to no prior experience in corpus linguistics. The teaching objectives were to teach students a) how to carry out concordance analysis and b) how to reasonably identify and compare word meaning through concordance analysis.

The teaching unit consists of 1) a discussion of the principles of distributional semantics (based on Harris 1954, a more extensive discussion of the topic than Firth 1957), 2) a short introduction to the concordancing tool AntConc, and 3) practical, guided exercises in which students experiment with concordance analysis across corpora. The unit closed with a submission in which students were asked to perform a step-by-step concordance analysis in order to examine what a specific target word may mean across different corpora.
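The cross-corpus comparison described above can be sketched roughly as follows. This is an illustrative Python sketch, not the exercise as taught: the helper `context_counts`, the window size, and both miniature "corpora" are hypothetical and invented for this example.

```python
import re
from collections import Counter

def context_counts(text, target, window=3):
    """Count words occurring within `window` tokens of each hit of `target`."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            counts.update(tokens[max(0, i - window):i])   # left context
            counts.update(tokens[i + 1:i + 1 + window])   # right context
    return counts

# Two invented miniature corpora, standing in for real ones.
corpus_a = ("The virus spread through the village. "
            "Doctors isolated the virus in the lab.")
corpus_b = ("The virus infected the laptop. "
            "Antivirus software removed the virus from the disk.")

a = context_counts(corpus_a, "virus")
b = context_counts(corpus_b, "virus")

# Context words seen in only one corpus hint at corpus-specific usage.
only_a = sorted(set(a) - set(b))
only_b = sorted(set(b) - set(a))
shared = sorted(set(a) & set(b))

print("only in A:", only_a)
print("only in B:", only_b)
print("shared:   ", shared)
```

In this toy case the only shared context word is the function word "the", which illustrates why such counts are a starting point for forming hypotheses rather than a replacement for reading the concordance lines themselves.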

The rationale behind including this teaching unit in the practice session was to get students to examine word surroundings thoroughly before graduating to automated methods such as Word Embedding Models. These applications make it easy to overinterpret results, e.g., if they are approached with a common-knowledge definition of synonyms instead of an understanding of the results as a summary of corpus-specific observations. Furthermore, this unit was seen as an exercise in formulating and testing hypotheses in corpus-based research.

References:
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3: 993–1022.

Firth, John. 1957. “A Synopsis of Linguistic Theory, 1930–1955.” In Selected Papers of J.R. Firth 1952–1959, edited by Frank Palmer, 168–205. London: Longman.

Harris, Zellig S. 1954. “Distributional Structure.” WORD 10 (2–3): 146–62. https://doi.org/10.1080/00437956.1954.11659520.

Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Efficient Estimation of Word Representations in Vector Space.” arXiv. https://arxiv.org/abs/1301.3781.

Wittgenstein, Ludwig. 2003. Philosophische Untersuchungen. Edited by Joachim Schulte. Bibliothek Suhrkamp, Band 1372. Frankfurt am Main: Suhrkamp.
