Probability in linguistic typology

Elena Maslova

University of Bielefeld (17-19 July, 2006)

C01-273, 9.00-18.00


Two major objectives of this course are

This is a crash course: there will be twelve classes in three days, four classes per day (see the course plan below), alternating between lectures, discussion sections, and some exercises. The lectures will be given in English; both English and German can be used for questions, as well as in the discussion and exercise sections. The course does not presuppose any prior knowledge of probability or statistics.

If you want to take this course for credit (3LP), a term paper (ca. 15 pages) is required. It can be either a critical analysis of linguistic inferences from statistical evidence in two or more thematically related typological studies, or a statistical typological investigation of your own (e.g. testing a specific hypothesis), which can be based on published typological databases. The paper can be written in English or in German. A preliminary version for comments must be submitted by the end of November (2006). The final version must be submitted by the end of January (2007).

  1. Introductory

     The two introductory lectures give an overview of the history of quantitative typology and its major methodological issues. We will see how statistical tests and (mostly implicitly) probabilistic concepts have been invoked in typological research, and why we need a better understanding of the basics of probability theory in order to build a more solid methodological foundation for this branch of linguistics.

    1. Overview of the course. Practical matters. A preliminary overview of research in quantitative typology
    2. Statistical data and statistical inference. Inferences and explanations in typology

  1. Probabilities and random variables

     Judging from the titles, the next three lectures may look as though they come from an introduction to probability theory, rather than from a course in linguistic typology. The point is, however, to introduce the absolutely necessary probability-theoretic concepts with direct reference to their typological applications. So, for instance, we will discuss not just the abstract concept of independence, but also its relation to the problem of “independence of languages”, probably the most widely known methodological problem of linguistic typology; not just correlations, but also the probabilistic sense of implicational universals, etc. More generally, all examples and exercises will come directly from typological studies.

    1. Probability. Independence. Conditional probabilities
    2. Random variables and their distributions. Properties of expected values, means, variance etc.
    3. Correlations and dependencies.
    4. An interim summary: statistical analysis of cross-linguistic distributions
       As a way to consolidate the probability-theoretic basics discussed so far, we will critically examine some influential typological studies which rely heavily on statistical inferences (and, sometimes implicitly, on probabilistic concepts). Among other things, this analysis will demonstrate that another probability-theoretic domain has to be explored, namely, random processes.

  1. Random processes

     Having introduced the mathematical concept of a random process in the first lecture, we will move on to a discussion of the random processes at work in the language population and of various approaches to modelling these processes. In the final lecture of this part of the course, we will see how the effects of different random processes might be reflected in the modern cross-linguistic distribution.

    1. Introduction to random processes
    2. Non-linguistic random processes in the language population
    3. Language change as a random process. Ergodic hypothesis.
    4. Random processes and cross-linguistic distributions

  1. From theory to applications

     The last two lectures establish the missing links between the theoretical issues discussed in the course and the tests and recommendations one finds in statistical textbooks. The major goal is to give the students the basic tools to approach a typological study of their own in an informed manner, as well as to understand and analyze other typological studies relying on statistical evidence. In particular, we will return to the methodological issues outlined in the introductory part of the course and see whether (and, if so, how) they can be resolved.

    1. Limiting distributions, statistical convergence, large numbers
    2. Sampling, statistics, tests of hypotheses


Bell, Alan. "Language Sampling." In Universals of Human Language, vol. 1, edited by Joseph H. Greenberg, Charles A. Ferguson, and Edith A. Moravcsik, 125-156. Stanford University Press, 1978.

Cysouw, Michael. "Quantitative Method in Typology." In Quantitative Linguistics: An International Handbook, edited by Gabriel Altmann, Reinhard Köhler, and R. Piotrowski. Berlin: Mouton de Gruyter, 2005.

Dryer, Matthew S. "Why Statistical Universals Are Better Than Absolute Universals." In Papers from the 33rd Annual Meeting of the Chicago Linguistic Society, 123-45. 1998.

Dryer, Matthew S. "Large Linguistic Areas and Language Sampling." Studies in Language 13 (1989): 257-92.

Greenberg, Joseph H. "Diachrony, Synchrony and Language Universals." In Universals of Human Language, edited by Joseph Harold Greenberg, Charles Albert Ferguson, and Edith A Moravcsik, 61-91. Stanford, Calif: Stanford University Press, 1978.

Greenberg, Joseph H. "The Diachronic Typological Approach to Language." Edited by Masayoshi Shibatani and Theodora Bynon, 143-66. Oxford; New York: Clarendon Press; Oxford University Press, 1995.

Hawkins, John A. A Performance Theory of Order and Constituency. Cambridge Studies in Linguistics 73. Cambridge; New York: Cambridge University Press, 1994.

Maddieson, Ian. "Investigating Linguistic Universals." In Proceedings of the XIIth International Congress of Phonetic Sciences, 346-54. 1991.

Maslova, Elena. "Meta-Typological Distributions." Sprachtypologie und Universalienforschung

Maslova, Elena. "A Dynamic Approach to the Verification of Distributional Universals." Linguistic Typology 4-3 (2000):

Maslova, Elena. "Динамика типологических распределений и стабильность языковых типов" [Dynamics of typological distributions and stability of language types]. Вопросы языкознания 5 (2004).

Nichols, Johanna. Linguistic Diversity in Space and Time. Chicago: University of Chicago Press, 1992.

Perkins, Revere D. "Statistical Techniques for Determining Language Sample Size." Studies in Language 13 (1989): 293-315.

Perkins, Revere D. "Sampling Procedures and Statistical Methods." In Language Typology and Language Universals : An International Handbook, edited by Martin Haspelmath, Ekkehard König, Wulf Oesterreicher, and Wolfgang Raible, 419-34. 2001.

Rijkhoff, Jan, Bakker, Dik, Hengeveld, Kees, and Kahrel, Peter. "A Method of Language Sampling." Studies in Language 17-1 (1993): 169-203.

Rijkhoff, Jan, and Bakker, Dik. "Language Sampling." Linguistic Typology 2-3 (1998): 263-314.

Tomlin, Russell S. Basic Word Order: Functional Principles. Croom Helm Linguistics Series. London; Wolfeboro, N.H.: Croom Helm, 1986.

Course plan

17.07.2006   9.15-10.45    1. Introductory
17.07.2006  11.00-12.30    2. Statistical inferences
17.07.2006  14.30-16.00    3. Probabilities
17.07.2006  16.15-17.45    4. Random variables
18.07.2006   9.15-10.45    5. Correlations and dependencies
18.07.2006  11.00-12.30    6. Analysis of cross-linguistic distributions
18.07.2006  14.30-16.00    7. Random processes
18.07.2006  16.15-17.45    8. Non-linguistic random processes
19.07.2006   9.15-10.45    9. Language change as a random process
19.07.2006  11.00-12.30   10. Random processes and cross-linguistic distributions
19.07.2006  14.30-16.00   11. Statistical convergence
19.07.2006  16.15-17.45   12. Sampling, tests of hypotheses
17-19.07.2006  13.45-14.15 and 18.00-18.30  Questions (Sprechstunde), as needed
Exact times are given (no academic delays)!