Cross-situational Algorithms for Vocabulary Learning

A Cognition Briefing

Contributed by: José Fernando Fontanari, University of São Paulo

Language, according to Bickerton (1990), is primarily a representational system, developed well before our remote ancestors uttered the first recognizable word. Individuals endowed with such a representational system, it was hypothesized, could have invented purely mental labels for the categories they created, which constituted symbolic thought (Deacon, 1997). In such a dormant form, however, language would be essentially unusable; only through communication could language have evolved to become unarguably the most powerful representational system ever seen in nature. How those mental labels are mapped onto arbitrary signals, which are then made available to other individuals (e.g., through speech or gestures), and how these individuals infer their meanings constitute the issue of the origin of communication. In consonance with this perspective, a common computational framework used to study the origin of communication is to let the agents first develop the meaning structure (i.e., the object-meaning mapping) and only then begin the communication phase (see, e.g., Steels, 2002; Smith, 2003a; 2003b).

In the communication phase, which will be our main concern in this contribution, it is assumed that each individual (or agent), when communicating, can produce a signal for any of its N meanings (mental labels). More specifically, the agents can choose any signal from a fixed vocabulary of H different signals. For the sake of concreteness, we can imagine that the signals are denoted by the letters a, b, c, … and the meanings by the integers 1, 2, 3, … . We note that in doing so we are actually modeling the emergence of a holistic communication code, in which a signal stands for a meaning as a whole, so this formulation is more appropriate to studying the emergence of a protolanguage (Arbib, 2008) than of a language. In this context, the communication competence of an agent is determined by an N × H verbalization matrix (i.e., a table with N rows and H columns) that describes the meaning-signal mapping of that agent (Hurford, 1989). An algorithm or procedure to evolve a communication code in a population of agents is thus essentially a prescription for modifying the agents' verbalization matrices depending on the outcome of their interactions, which we refer to as communicative events.
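The verbalization matrix and the production and interpretation it supports can be sketched as follows. This is a minimal illustration, not code from the papers cited; the convention that an agent produces (or decodes) the signal with the largest association strength is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

N, H = 5, 8          # number of meanings and number of signals (illustrative sizes)
# Verbalization matrix: entry (i, j) holds the association strength
# between meaning i and signal j.
V = rng.random((N, H))

def speak(V, meaning):
    """Produce the signal most strongly associated with a meaning."""
    return int(np.argmax(V[meaning]))

def hear(V, signal):
    """Interpret a signal as the meaning most strongly associated with it."""
    return int(np.argmax(V[:, signal]))

signal = speak(V, 2)
meaning = hear(V, signal)   # may differ from 2 if the matrix is ambiguous
```

With two agents, communication succeeds when the hearer's matrix decodes the speaker's signal back to the intended meaning.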

There are a few radically distinct approaches to studying the emergence of communication codes. Some are based on a direct analogy with biological evolution (see, e.g., Hurford, 1989; Nowak & Krakauer, 1999; Fontanari & Perlovsky, 2007) and make the explicit assumption that the communication codes (the verbalization matrices) are transmitted from parents to children (vertical transmission, in the population genetics jargon) and that possessing an efficient communication code confers a fitness advantage on the individual. Others are based on cultural evolution (see, e.g., Cavalli-Sforza & Feldman, 1981) and assume that language spreads through a population like a viral epidemic. Another relevant approach is the so-called Iterated Learning framework (Smith et al., 2003; Brighton et al., 2005), in which (typically) only two agents are involved: the teacher and an initially tabula rasa pupil. After learning, the pupil replaces the teacher and a fresh pupil is introduced into the scenario. This procedure is repeated until the communication code reaches a stationary regime in which it becomes fixed.

A cross-situational algorithm to learn or evolve a communication code (a vocabulary) differs from the previous approaches in that it involves the repeated interaction between the same two individuals. In that sense, these algorithms are also known as guessing or naming games. Cross-situational learning is based on the very intuitive idea (which may actually bear no relevance at all to the way children learn a vocabulary!) that one way a learner can determine the meaning of a signal is to find something in common across all observed uses of that signal (Siskind, 1996). Although the general notion of cross-situational learning has been proposed by many authors (see, e.g., Pinker (1989) and Fisher et al. (1994)), the translation of those rather abstract theories into mathematical or computational models is still the subject of intense research in the literature (Siskind, 1996; Smith, 2003a; Smith, 2003b; Beule et al., 2006; Fontanari & Perlovsky, 2006).
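The intersection idea behind cross-situational learning can be made concrete with a toy example. The signal name and the contexts below are hypothetical, invented purely for illustration; the intersection-of-contexts logic is the technique described by Siskind (1996).

```python
# Each time the signal "blick" is heard, the learner records the set of
# objects present (the context). A candidate meaning is anything common
# to all contexts in which the signal occurred.
exposures = {
    "blick": [{"dog", "ball", "tree"}, {"dog", "cup"}, {"dog", "shoe"}],
}

def candidate_meanings(contexts):
    """Intersect all contexts in which a signal was observed."""
    result = set(contexts[0])
    for c in contexts[1:]:
        result &= c
    return result

print(candidate_meanings(exposures["blick"]))  # → {'dog'}
```

After three exposures, only one candidate survives, so the learner can attribute the meaning "dog" to "blick" without any explicit feedback.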

In this contribution we report results of extensive simulations of a minimal cross-situational learning model proposed by Smith (2003a, 2003b) (see also Fontanari & Perlovsky, 2006). For the sake of simplicity, henceforth we will refer to meanings as objects, so the verbalization matrices or tables represent an object-signal mapping. In Smith's minimal model, one of the agents, who plays the role of the "hearer", has access to a set of objects (the context, a random subset of the entire set of N objects) and to a single signal emitted by the other agent, who plays the role of the "speaker" in this round of the game. The learning algorithm increases by a certain amount those entries of the hearer's verbalization matrix that associate the observed signal with all objects in the context. All other entries are decreased by an appropriate amount. The verbalization matrix of the speaker is not changed. Then the players swap roles: the hearer becomes the speaker and vice versa. It is important to emphasize that this algorithm does not use any feedback about the success of the communicative event.

Repetition of this game ultimately leads to a stationary situation where the verbalization matrices of the two players become fixed (in fact, they also become identical). At this stage we can calculate the communication error of the outcome of the game, defined as the probability that a signal emitted by one agent is misinterpreted by the other agent. We find that the communication error tends to zero (perfect communication) only in the limit of infinite vocabulary size, whereas an optimal algorithm should yield zero error provided that H (the vocabulary size) is greater than or equal to N (the number of objects). More pointedly, we find that the communication error vanishes like the inverse of the ratio H/N when H, N and H/N become very large. Interestingly, the same asymptotic result is obtained when we use a supervised learning algorithm (Steels & Kaplan, 1998; Steels, 2002; 2003), which indicates that, as far as the asymptotic learning behavior is concerned (the standard measure used to compare the performance of learning algorithms; Vapnik & Chervonenkis, 1971), supervised and unsupervised algorithms for vocabulary learning seem to belong to the same category.
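A simple way to measure the communication error from two verbalization matrices is sketched below. It assumes objects are equiprobable and that production and interpretation both use the strongest association; these are simplifying assumptions of this sketch.

```python
import numpy as np

def communication_error(V_speaker, V_hearer):
    """Fraction of objects whose emitted signal (under the speaker's
    matrix) is decoded to a different object by the hearer."""
    N = V_speaker.shape[0]
    errors = 0
    for obj in range(N):
        signal = int(np.argmax(V_speaker[obj]))       # speaker's signal for obj
        decoded = int(np.argmax(V_hearer[:, signal])) # hearer's interpretation
        errors += decoded != obj
    return errors / N

# With identical, unambiguous matrices the error is zero:
V = np.eye(4)
print(communication_error(V, V))   # → 0.0
```

When H < N, some signal must label two or more objects under any mapping, so the error cannot vanish; the result reported above is that the cross-situational algorithm still leaves a residual error of order N/H even when H ≥ N.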

References
Arbib, M.A. (2008). Holophrasis and the protolanguage spectrum. Interaction Studies, 9: 154-168.

Beule, J., Vylder, B. & Belpaeme, T. (2006). A cross-situational learning algorithm for damping homonymy in the guessing game. In: L.M. Rocha, M. Bedau, D. Floreano, R. Goldstone, A. Vespignani, L. Yaeger (Eds.), Proceedings of the Xth Conference on Artificial Life, The MIT Press, Cambridge.

Bickerton, D. (1990). Language & Species. University of Chicago Press, Chicago.

Brighton, H., Smith, K. & Kirby, S. (2005). Language as an evolutionary system. Phys. Life Rev., 2: 177-226.

Cavalli-Sforza, L.L. & Feldman, M.W. (1981). Cultural Transmission and Evolution: A Quantitative Approach. Princeton University Press, Princeton, NJ.

Deacon, T.W. (1997). The Symbolic Species. W. W. Norton & Company, New York.

Fisher, C., Hall, G., Rakowitz, S., & Gleitman, L. (1994). When it is better to receive than to give: Syntactic and conceptual constraints on vocabulary growth. Lingua, 92: 333-375.

Fontanari, J.F. & Perlovsky, L.I. (2006). Meaning creation and communication in a community of agents. Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'06. Vancouver, Canada, pp. 2892 –2897.

Fontanari, J.F. & Perlovsky, L.I. (2007). Evolving compositionality in evolutionary language games. IEEE Trans. Evol. Comput., 11: 758-769.

Hurford, J.R. (1989). Biological evolution of the Saussurean sign as a component of the language acquisition device. Lingua, 77: 187-222.

Nowak, M.A. & Krakauer, D.C. (1999). The evolution of language. Proc. Natl. Acad. Sci. USA, 96: 8028-8033.

Pinker, S. (1989). Learnability and cognition. Cambridge, MA: MIT Press.

Siskind, J.M. (1996). A computational study of cross-situational techniques for learning word-to-meaning mappings. Cognition, 61: 39-91.

Smith, A.D.M. (2003a). Semantic generalization and the inference of meaning. In: W. Banzhaf, T. Christaller, P. Dittrich, J. T. Kim, J. Ziegler (Eds.), Proceedings 7th European Conference on Artificial Life, Lecture Notes in Artificial Intelligence, 2801: 499-506.

Smith, A.D.M. (2003b). Intelligent meaning creation in a clumpy world helps communication. Artificial Life, 9: 557-574.

Smith, K., Kirby, S. & Brighton, H. (2003). Iterated Learning: a framework for the emergence of language. Artificial Life, 9: 371-386.

Steels, L. (2002). Grounding symbols through evolutionary language games. In: A. Cangelosi, D. Parisi (Eds.), Simulating the Evolution of Language, London: Springer-Verlag, pp. 211-226.

Steels, L. (2003). Evolving grounded communication for robots. Trends in Cognitive Science, 7: 308-312.

Steels, L. & Kaplan, F. (1998). Spontaneous lexicon change. Proceedings of COLING-ACL. Morgan Kaufmann, San Francisco, pp. 1243-1250.

Vapnik, V.N. & Chervonenkis, A.Y. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 16: 264–280.