Dirk Geeraerts (°1955) is professor of linguistics at the University of Leuven. His main research interests involve the overlapping fields of lexical semantics, lexicology, and lexicography, with a specific focus on social variation and diachronic change. His theoretical orientation is mainly that of Cognitive Linguistics, with a special emphasis on empirical methods for linguistic analysis.
Big data and the dictionary
Possibly the most exciting (and arguably also the most challenging) recent perspective for lexicography is the advent of massive amounts of digital corpus data. Big data raises a twofold question for the dictionary: how can the data tsunami be channeled for lexicographical purposes, and, more fundamentally, does the availability of big data change our idea of what a lexicographical description should look like?
For the traditional descriptive purposes of the standard-language dictionary, the mass of currently available text data constitutes both an opportunity and a challenge: more raw information holds the promise of better descriptions, but at the same time, the quantitative tools for tracing meaning changes in the vastness of the data, for selecting the most relevant items for description, or for identifying the most illustrative quotations need further development.

But why should the data explosion be envisaged only from the point of view of traditional descriptive practices? The sheer size of the available data suggests a shift away from an exclusive focus on single words. Treating all the new data with a classical amount of attention is a practical impossibility, but that difficulty seems to carry its own solution: the magnitude of the data makes it easier to analyze tendencies in the vocabulary that transcend the level of the individual word. Why would (historical) dictionaries not include such an analysis of underlying trends in the lexicon? The sociolectometrical techniques that are currently under development allow mining the data for answers to questions like the following:
- have the various varieties of a given language converged or diverged?
- has the internal variability of these varieties increased or diminished?
- has the rate of innovation in the lexicon changed?
- which fields have seen most innovation, and which have contributed most to the general language?
- has language use in general become more informal?
Descriptions of such shifts, in a graphical or text format, would enrich the dictionary, and if periodically repeated, would cumulate into an entirely new kind of history of the lexicon.
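One family of sociolectometrical measures compares, concept by concept, which words different varieties of a language prefer as names for the same thing: the closer the naming profiles, the more the varieties have converged. The sketch below illustrates the idea with invented toy data; the variety names, concepts, and frequencies are all hypothetical, and real studies would draw the counts from large annotated corpora.

```python
# Minimal sketch of a profile-based lexical distance measure (toy data).
# For each concept, a variety has a "naming profile": the relative frequency
# with which each near-synonym is chosen to name that concept. The distance
# between two varieties for one concept is half the city-block distance
# between their profiles; averaging over concepts gives an overall distance.
# Repeating the calculation for successive periods would show convergence
# (distance shrinking) or divergence (distance growing).

def profile(counts):
    """Turn absolute counts into relative frequencies."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def concept_distance(counts_a, counts_b):
    """Half the city-block distance between two profiles (0 = identical, 1 = disjoint)."""
    p, q = profile(counts_a), profile(counts_b)
    words = set(p) | set(q)
    return sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in words) / 2

def variety_distance(data_a, data_b):
    """Average concept-level distance over all shared concepts."""
    concepts = set(data_a) & set(data_b)
    return sum(concept_distance(data_a[c], data_b[c]) for c in concepts) / len(concepts)

# Hypothetical naming counts for two concepts in two varieties of Dutch.
variety1 = {"TROUSERS": {"broek": 90, "pantalon": 10},
            "JEANS":    {"jeans": 70, "spijkerbroek": 30}}
variety2 = {"TROUSERS": {"broek": 60, "pantalon": 40},
            "JEANS":    {"jeans": 95, "spijkerbroek": 5}}

print(round(variety_distance(variety1, variety2), 3))
```

Tracking this single number across decades of corpus data would directly answer the first two questions above, without requiring entry-by-entry attention to individual words.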
Patrick Hanks is professor at the Research Institute of Information and Language Processing at the University of Wolverhampton, and at the Bristol Centre for Linguistics at the University of the West of England (UWE, Bristol). His research interests include lexical analysis, the analysis of intentional lexical irregularities (similes, comparisons, and metaphors), and personal names.
Prototypes and Probabilities: A Corpus-Driven Approach to Lexis and Meaning
Analysis of corpus data encourages the view that natural languages are a puzzling mixture of logical structures and analogical processes. The logic of natural language has been well studied since the Latin grammarians and the European Enlightenment, but the analogical aspects have been comparatively neglected. In this talk, I shall focus on the analogical nature of meaning. I suggest that most meanings are events (interactions between speaker and hearer, or—with displacement in time—between writer and reader), which "require the presence of more than one word for their normal realization" (Sinclair 1998).
I suggest that corpus-driven lexicography of the future should devote much of its attention to the analysis of collocational preferences, and that this will lead to a new type of on-line dictionary. It is only possible to understand what a word means when it is used in context, so it is necessary to define what is meant by 'context'. The aim of such a dictionary, then, will be to map prototypical meanings ('entailments' or 'implicatures') onto prototypical patterns of phraseology.
Such a dictionary will provide an inventory of patterns of normal word use, such that actual uses of words in texts and conversations can be compared with prototypical patterns of usage in order to derive a meaning. However, it seems that such mappings will at best only be probabilistic. So what is the source of our sense of certainty that we know what a speaker means when he or she uses a word?
Associated with this approach to lexical analysis is a new theory of language in use—the theory of norms and exploitations (TNE). At its most basic, this says that when we use a word, we either conform to one of its normal patterns of use or we exploit one or more of those patterns creatively. If time allows, I shall discuss the nature of exploitations.
Sinclair, J. M. 1998. 'The Lexical Item'. In E. Weigand (ed.), Contrastive Lexical Semantics. Benjamins.