Proc. of Machine Intelligence 17, pp.32-35, 2000

Mechanism of Lexical Development

Mutsumi Imai
Faculty of Environmental Information, Keio University

Ikuo Kobayashi, Tomonobu Ozaki and Koichi Furukawa
Graduate School of Media and Governance, Keio University

5322 Endo, Fujisawa, Kanagawa, 252-0816, JAPAN
{imai,ikuokoba,tozaki,furukawa}@sfc.keio.ac.jp

1 Introduction

Mapping words onto their meanings involves the notorious problem of induction, as pointed out by the philosopher W. V. Quine (Quine, 1960). Suppose one hears the word `Gavagai' in a totally foreign language upon seeing a rabbit running across a field. There is an almost infinite number of logically possible meanings for the word `Gavagai'. For example, it could mean `a rabbit', `a white running animal', `a soft, fluffy creature', `a rabbit running in the field', etc. That is, logically, it is impossible to specify the meaning (intension) of an unknown word by inductive inference based only on a single instance of the word's extension. Yet this problem of induction is what children face every day. A child hears a new word embedded in a sentence in a certain situation, and s/he must infer what the word means. First, s/he must identify the referent of the word in the particular scene in which the word is uttered. This is not easy, since a scene (e.g., a living room, a park, a street) usually includes multiple objects. But even when the referent is successfully identified, s/he needs to determine to what other entities the word can (or cannot) be generalized. Quite paradoxically, however, children learn words at an amazing pace, sometimes adding as many as 8-10 new words a day to their vocabulary. In fact, researchers have found that children assign a meaning to a newly introduced word at the very first exposure, a phenomenon called ``fast-mapping'' (Carey & Bartlett, 1978).

In this talk, I lay out how young children fast-map novel words to their meanings, getting around the problem of induction, based on the findings of empirical research in developmental psychology. I then briefly introduce our recent attempt to model children's word learning using Inductive Logic Programming (Furukawa et al., 1999; Kobayashi, 1999).

2 Word learning biases

Researchers agree that children have implicit assumptions about how the lexicon is organized and what a word can and cannot mean. These assumptions (biases) enable children to narrow down the originally vast number of possible meanings and to assign a meaning to a given word even at the first encounter with it. These assumptions/biases include the whole object bias, the object category bias, the shape bias, the mutual exclusivity bias, and the principle of contrast (Clark, 1987; Markman, 1989; Imai, Gentner & Uchida, 1994). By the mutual exclusivity bias, children assume that a newly introduced word maps to an object which does not yet have a label, and further, by the whole object assumption, they assume that the word refers to the entirety of the referred object, not to its part, material, texture, or color. By the object category bias, they assume that the word is not restricted to the originally referred object but can be generalized to other objects of `like kind'. The shape bias provides children with a basis for determining which objects are similar to the originally labeled object. Guided by this bias, children can generalize the new word to other objects similar in shape. This bias is particularly important since it provides a tangible perceptual basis for approximating taxonomic categories (e.g., dogs, birds, cars) without prior knowledge of the labeled object and the category it belongs to.

3 Constraints on the application of word learning biases

The word learning biases described above provide children with a powerful tool for constraining the possible meanings of unknown words, especially at the very beginning stages of word learning, when children's knowledge about the world is sparse and limited. At the same time, however, the application of the biases must be appropriately controlled for efficient word learning, because these biases are most useful for acquiring labels for basic-level object categories (Rosch, 1978) but can block the learning of other types of words, including material names, part names, property names, non-basic category names (e.g., subordinate and superordinate category names), and proper names. For example, the whole object assumption should not be applied when a child learns words for substances such as water, sand, and sugar (Imai & Gentner, 1997). Learning names for specific individuals, i.e., proper nouns, requires suspension of the taxonomic assumption (Hall, 1991). The mutual exclusivity assumption must be overcome for a child to learn category names at different levels of the taxonomic hierarchy, as well as names for particular individuals (Hall, 1991; Gelman & Taylor, 1984). Thus, for successful word learning, children must not only possess the word learning biases but also heuristics that enable them to determine when to apply, suspend, or overcome them.

The results of a series of studies (Imai & Gentner, 1997; Imai, Gentner & Uchida, 1994; Imai & Haryu, in press) present a fairly comprehensive picture of how children assign meanings to novel nouns. When a novel label is given to an object that does not yet have a label, children assume that the label is a name for an object category, whether the referent is an animal or an inanimate object. If the named object already has an established name, and the object is an animal, children tend to interpret the label as a name for the particular individual rather than as a name for a narrower or broader category. When a novel label is given to an inanimate object that already has a name, however, they no longer interpret it as a name for a particular individual; instead, they map the noun to a narrower, i.e., subordinate, category. When a label is associated with a non-solid substance whose shape does not hold over time, they no longer apply the whole object bias or the shape bias. They know that substance kinds are fundamentally different from object kinds, and extend a new label for a substance on the basis of material identity rather than shape similarity.

Furthermore, as conceptual knowledge increases, children gradually come to realize that shape similarity is not the most essential factor for determining membership in an object category. That is, children's notion of `likeness' changes over the course of development, shifting the weight from perception-based similarity to concept-based similarity in determining the extension of a novel word. Subsequently, the original bias toward generalizing a novel word to other shape-similar objects is weakened and replaced by a more sophisticated assumption: ``When information about non-perceptual, internal attributes is available, use that information as the basis for determining the extension of a novel word. Otherwise, use shape as the basis of generalization.''

I propose that what lies behind the amazingly efficient word learning of young children is their ability to generate biases (rules) from a small amount of word learning experience and subsequently to fine-tune the application of those biases as word learning and knowledge acquisition proceed.

4 A model of lexical acquisition using ILP

4.1 ILP and lexical acquisition

Inductive logic programming (ILP) (Muggleton, 1995) is a framework for learning relational concepts from a set of positive and negative examples given together with background knowledge. It provides a framework that is particularly suitable for acquiring concept descriptions.

Our ILP model of children's word learning differs most notably from standard ILP in that the system, like a human learner, learns continuously throughout its lifetime. We adopt the ILP system Progol (Muggleton, 1995) for this purpose and attempt to model the `fast-mapping' phenomenon in young children. More specifically, the word learning biases and a new evaluation criterion are implemented in Progol so that the system can select an appropriate hypothesis even if only one or a few examples are provided.

Before describing how the biases are implemented and how the evaluation function is employed, we briefly explain how actual word learning corresponds to ILP in our model.

We assume that children can correctly identify the object with which a word is associated. We also assume that they can extract properties of the object in order to infer the word's meaning. The extracted properties are divided into two kinds, which we call Categorical Classifiers and General Attributes. Categorical classifiers correspond to children's innate ontology, indicating ontological categories such as `animate' and `countable'. General attributes, such as `having-four-legs' or `having-fur', include all properties other than the categorical classifiers, and are further divided into subtypes such as shape, color, and texture. Based on these assumptions, we feed object-label pairs to the system in the form of logical formulae that ILP can directly handle.
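
As a concrete illustration, the following sketch shows one way such a pair and its properties could be encoded (in Python, rather than the Prolog clauses Progol actually consumes); the attribute names and data layout are our illustrative assumptions, not the system's actual input syntax.

    # A hypothetical encoding of one object-label pair. The pair itself
    # plays the role of the logical fact dog(obj1).
    example = ("dog", "obj1")

    # Background knowledge: the object's extracted properties, split into
    # categorical classifiers ("cc", drawn from the innate ontology) and
    # general attributes ("ga", grouped into subtypes such as shape,
    # color, and texture).
    background = {
        "obj1": {
            "cc": ["animate", "countable"],
            "ga": {
                "shape": ["having-four-legs"],
                "texture": ["having-fur"],
            },
        },
    }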

Since each label denotes the category to which an object belongs, the concept to be learned is indicated by the label. Objects are given as examples, and their properties are given as background knowledge. When the system receives an object and its label as a pair (which we call the current object and the current label, respectively), it learns or revises the concept indicated by the current label, generalizing from the current object if necessary. At this point, the current object and all objects named by the current label in previous learning steps are used as positive examples. Negative examples are selected from the remaining objects based on the word learning biases.
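
The bookkeeping of one learning step might look as follows, under the same assumed encoding as above; select_negatives stands in for the bias-based filtering sketched in Section 4.3.

    def examples_for(current_label, current_object, memory, background):
        # Positive examples: the current object plus every object that
        # received the current label in earlier learning steps. `memory'
        # is a list of previously seen (label, object) pairs.
        positives = [obj for (lab, obj) in memory if lab == current_label]
        positives.append(current_object)
        # Candidate negatives: all remaining objects, filtered by the
        # word learning biases (see the Section 4.3 sketch).
        candidates = [obj for (lab, obj) in memory if lab != current_label]
        negatives = select_negatives(candidates, current_object, background)
        return positives, negatives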

4.2 Implementation of Word learning biases

The whole object bias:
Let us consider again a part of the `reference problem' of lexical acquisition. When a label is associated with an object, onto what aspect of the object should the label be mapped? This is an essential problem for our model because it determines the kinds of concepts the model can learn. In the current model, the system assumes that every label refers to the entirety of an object, following the whole object bias: ``a novel label is interpreted as referring to the whole of an object''. Therefore the system can learn only labels of objects as wholes; it cannot learn other kinds of concepts such as a property or part of an object, an action, or an event. For example, the words `dog' and `desk' can be learned, but `ear' (a part of an object) and `running' (an action) cannot.

The object category bias:
The object category bias is implemented automatically in our system because ILP, by its nature, performs generalization. That is, the system automatically assumes that a label refers to a category.

The mutual exclusivity bias:
According to the mutual exclusivity bias, each object is allowed to have only one label. Under this bias, when some object is explained by more than one learned concept, the concepts involved must be learned again to resolve the contradiction.
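
A minimal sketch of how such a contradiction could be detected, assuming a hypothetical helper explains that tests whether a learned concept covers an object:

    def exclusivity_conflicts(obj, learned_concepts, background):
        # Concepts that all explain `obj'; more than one covering concept
        # violates the mutual exclusivity bias and triggers relearning
        # (or the hierarchy test of Section 4.5). `explains' is an
        # assumed coverage test, not an actual system predicate.
        covering = [c for c in learned_concepts
                    if explains(c, obj, background)]
        return covering if len(covering) > 1 else []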

4.3 Learning concepts by using the ontology

In our model, concepts are learned in two steps. First, the categorical classifiers of the current object are selected, and the super category to which the object belongs is determined from them. Then an appropriate description for the concept is constructed from both the categorical classifiers and the general attributes.
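
Put together, the two steps might be organized as in the following sketch, where super_category and construct_description are hypothetical placeholders for the ontology lookup and the (modified) Progol hypothesis search, and examples_for is taken from the Section 4.1 sketch:

    def learn_concept(current_object, current_label, memory, background):
        # Step 1: determine the super category Tax from the current
        # object's categorical classifiers via the innate ontology.
        tax = super_category(background[current_object]["cc"])
        # Step 2: construct the concept description from both kinds of
        # properties, restricting the hypothesis search to candidates
        # under Tax.
        positives, negatives = examples_for(current_label, current_object,
                                            memory, background)
        return construct_description(tax, positives, negatives, background)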

Dividing the inference process into these two steps is effective because the hypothesis space is reduced and only meaningful negative examples are selected. To illustrate, consider learning the concept `cat'. Assume that the given object named `cat' has `animate' as its categorical classifier and `having-four-legs' as a general attribute. Assume also that an object labeled `desk' was given in a previous inference step, with `inanimate' as its categorical classifier and `having-four-legs' as a general attribute. Although the object named `desk' is obviously a negative example for the concept `cat' because its label differs, the system excludes it from the negative examples used for learning, because the categorical classifiers `animate' and `inanimate' are mutually exclusive. Without this dynamic selection of negative examples, the property `having-four-legs', which may be crucial for the concept `cat', could not be included in the inferred description. Furthermore, the hypothesis space can be reduced by excluding candidates that contain the categorical classifiers used in the selection of negative examples.
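
A sketch of this dynamic selection, with an illustrative (not actual) exclusivity table:

    # Pairs of categorical classifiers treated as mutually exclusive;
    # in the real system this information comes from the innate ontology.
    EXCLUSIVE = {frozenset({"animate", "inanimate"})}

    def select_negatives(candidates, current_object, background):
        # Drop candidates whose categorical classifiers are mutually
        # exclusive with those of the current object, e.g. `desk'
        # (inanimate) when learning `cat' (animate).
        current_ccs = set(background[current_object]["cc"])
        kept = []
        for obj in candidates:
            ccs = set(background[obj]["cc"])
            if any(frozenset({a, b}) in EXCLUSIVE
                   for a in current_ccs for b in ccs):
                continue
            kept.append(obj)
        return kept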

4.4 The hypothesis evaluation criterion

In our model, an evaluation criterion is introduced so that it reflects the shape bias. This allows the system to select an appropriate hypothesis even when only a few positive examples are given. As in Progol, the criterion is basically based on the description length of the concept, but weights are attached to the atoms composing the concept. In the current implementation, we set the weight of general attributes other than shape heavier than that of shape attributes, and the weight of general attributes heavier than that of categorical classifiers.

When learning a concept belonging to the super concept Tax in the children's innate ontology, the general form of our evaluation function is defined as follows:

C = BodyLength - PE + NE - W_CC * CC - Σ_i ( W_{GA,Tax,i} * GA_i )

where BodyLength is the number of atoms in the body of the candidate hypothesis, and PE and NE are the numbers of positive and negative examples explained by the candidate hypothesis, respectively. CC is the number of categorical classifiers in the candidate hypothesis, and W_CC is their weight. GA_i is the number of general attributes of subtype i in the candidate hypothesis, and W_{GA,Tax,i} is the corresponding weight.
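
The following sketch computes C for a candidate hypothesis. The weight values are illustrative assumptions, since the text fixes only their ordering (non-shape general attributes > shape attributes > categorical classifiers); a lower C is taken to indicate a better hypothesis, as in description-length minimization.

    W_CC = 0.2
    W_GA = {"shape": 0.5, "color": 0.8, "texture": 0.8}  # assumed values

    def evaluate(body, pos_explained, neg_explained):
        # body: atoms of the candidate hypothesis, each either
        # ("cc", name) or ("ga", subtype, value).
        c = len(body) - pos_explained + neg_explained
        for atom in body:
            if atom[0] == "cc":
                c -= W_CC                       # W_CC * CC term
            else:
                c -= W_GA.get(atom[1], 0.0)     # Σ_i W_{GA,Tax,i} * GA_i term
        return c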

4.5 Learning hierarchical concepts

As mentioned in Section 3, the word learning biases are useful for learning basic-level object categories but can block the learning of other types of words. As one possible solution to this problem, we introduce a mechanism for learning hierarchical concepts, which may override the mutual exclusivity bias.

If an object is judged to be explained by two concepts, the system calculates the similarity between those concepts to determine whether they should be regarded as hierarchical. If the relation between the concepts is regarded as hierarchical, the system allows the object to be explained by more than one concept and does nothing. Otherwise, the system determines that the learned concepts are wrong and revises them to avoid the contradiction.

In general, the relation between such concepts is not limited to hierarchical or mutually exclusive; a synonymous relation, for example, is also possible. Here, however, we adopt the principle of contrast, which assumes that different labels refer to different concepts.

The current implementation of this similarity measure is very simple and is based on the shape bias: if the two concepts share a general attribute of subtype `shape', the system judges them to be similar, and the concepts are regarded as hierarchical.
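
A sketch of this test, using the same assumed concept representation as the earlier sketches:

    def shape_atoms(body):
        # General attributes of subtype `shape' in a concept body.
        return {atom for atom in body
                if atom[0] == "ga" and atom[1] == "shape"}

    def regarded_as_hierarchical(body_a, body_b):
        # Concepts sharing a shape attribute, e.g. both containing
        # ("ga", "shape", "having-four-legs"), are judged similar and
        # treated as hierarchical; otherwise they are relearned to
        # restore mutual exclusivity.
        return bool(shape_atoms(body_a) & shape_atoms(body_b))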

Of course, as research in psychology has shown, children change their similarity measure as they grow. To reflect this, we need to modify the similarity measure during the learning process; this is one of the central topics of our future work.

5 Conclusion and future work

We have briefly described the results of a series of studies of children's word learning and introduced our ILP model in which those results are reflected.

We are now in the process of designing a more precise similarity measure that can construct and use the ontology more effectively in continuous learning. We also plan to implement a mechanism for controlling the word learning biases so that the system can learn properties of objects, proper nouns, etc.

We believe that this model will be useful for understanding and discovering the relationship among the word learning biases.

References