Chapter One, Part 4

1.6 Directions for new research

1.6.1 Unanswered questions

The work on antonymy described so far has provided several clues to help answer my initial questions, but it has also raised new ones. This is a good point at which to take another look at those questions and the answers that have been proposed. The first questions were "What makes two words antonyms?" and "What exactly are the semantic dimensions which antonyms are said to share?" It seems that people generally agree that semantic dimensions somehow reflect attributes of things in the world, e.g., the attribute of SIZE or the attribute of SPEED, and that these attributes naturally have two main contrasting values. Antonyms are the words that label these two contrasting values.

People do not agree, however, on how particular adjectives come to be chosen as the labels for particular dimensions. In other words, they do not agree on the answer to the questions of why some words have antonyms while others have none and why some words have more than one antonym. As was discussed above, Cruse (1986), Lehrer and Lehrer (1982) and Egan (1968) focus on the meanings of the individual words involved, suggesting that subtle semantic differences, such as differences in connotative meaning or in range of application, may explain why two words which seem to contrast in meaning are not accepted as antonyms by most speakers. In contrast, the psychologists and computational linguists, Gross, Fischer and Miller (1988), Charles and Miller (1989) and Justeson and Katz (1992), view antonymy as a kind of lexical association which is to some extent independent of meaning.1 They say that contrasting concepts can be expressed by many different words (e.g., the concept of wetness expressed by wet, damp, soaked, soggy, etc.), but that antonyms have a special direct association developed through hearing the antonyms frequently used together. On this view, some words have no clear antonym because they simply do not co-occur often enough with any semantically contrasting word for this lexical association to develop, and presumably the explanation for why some words have more than one antonym must be that these words frequently co-occur with more than one semantically contrasting word. However, I have not found any research that directly tests this by looking at the co-occurrence patterns of words that do not have antonyms or words that have two antonyms.

The co-occurrence account of antonymy also seems to be somewhat useful in answering the questions about the clang phenomenon. The set of 39 strongly associated opposite pairs described by Deese all "sound" like very good examples of antonyms, and Justeson and Katz have shown that these pairs do in fact co-occur frequently. As was discussed above, they believe that co-occurrence can also explain why people can easily identify some contrasting pairs as antonyms but have a harder time making judgments about others.

In any case, whether or not one believes that co-occurrence can explain everything about antonymy, the co-occurrence of antonyms is a real phenomenon, and the question still remains of why speakers choose to use two particular words together so frequently in the first place. Of all the still unanswered questions about antonymy, this is the one I find most interesting and pursue in the rest of the dissertation. Charles and Miller (1989) seem to suggest that the reason is that people have a kind of conditioned response--they often hear two words used together, and they come to associate them in their minds. But as Murphy and Andrew (1992) point out, this view of antonymy leaves out its most salient characteristic, that it is a relationship between words with contrasting meanings. It seems to me that speakers choose to use antonyms together not because it has become a habit, but because it is effective in conveying meaning or making a strong rhetorical impact of the kind described by Fellbaum (1995) and Mettinger (1994). The question then becomes what makes two good antonyms, such as hot/cold and young/old, more effective for these uses than pairs of near-opposites, such as hot/freezing and youthful/elderly.

As I briefly described in section 1.3 above, I think Egan's suggestions about range of scope and implications are extremely promising. Although I do not think her characterization is exactly right--I do not think there are any two words, either synonyms or antonyms, which are exactly the same in terms of range of application and connotations--I think it is likely that antonyms are very similar in terms of range of application, as well as in connotations and register. Egan picks out some of the same aspects of meaning as the linguists in section 1.2.3, such as the differences in non-propositional meaning mentioned by Cruse (1986) and the differences in distribution noted by Lehrer and Lehrer (1982). Taken together, the linguistic and lexical accounts suggest that antonyms are words which have a similarity in meaning, and I think that it is this similarity which provides opportunities for speakers to use the antonyms together in a wide range of different sentences. This in turn may explain why antonyms co-occur so frequently. The rest of this thesis is an exploration of this idea.

1.6.2 A hypothesis: Antonymy results from shared semantic range

The hypothesis that I examine here is that a word's semantic range, which reflects its range of application (in Egan's terms) as well as connotations and stylistic factors, determines whether the word will have an antonym and what that antonym is. My idea is that if two words contrast in meaning across a wide semantic range, there will be many different situations in which it will be appropriate for a speaker to use the two words together. On the other hand, if two words only contrast in a small part of their overall range of meaning, that is, if they contrast only when they are used in a particular context, they will not be considered antonyms outside of that context.

Consider the pairs dry/sweet and wet/dry, for example. Dry and sweet contrast only when they are being used to describe wine, and this is only a small part of the overall ranges of these two adjectives. Wet and dry, in contrast, share a great deal of semantic range: they can both be used to describe weather conditions, soil conditions, the state of people and their clothing, and so on, in phrases such as wet/dry summer, wet/dry sand and wet/dry laundry. In fact, it is probably the case that wet and dry share so much semantic range that for most uses of wet, the opposite state would be best described by dry, and vice-versa. My claim is that dry and wet are considered antonyms even in the absence of any particular context, as in the word association tests done by Deese (1965) and the antonym tests done by Murphy and Andrew (1992), because speakers have a knowledge of the overall semantic range shared by these two words, and perhaps also because this shared semantic range allows speakers to use the antonyms together frequently, thus leading to the kind of association described by Justeson and Katz (1992). This hypothesis, that antonyms are words which share a great deal of semantic range, is pursued in the three case studies that make up the rest of this thesis.

The first case study, which makes up Chapter Two, looks at big, little, large and small. The question that arises with these four adjectives, of course, is why large can be paired with small but not little, while big can be paired with both small and little (most people seem to prefer little, but small is listed as an antonym for big in many dictionaries and lists of antonyms).2 Charles and Miller believe that semantics cannot explain this and so they regard this case as crucial evidence that antonymy is a relationship between word forms rather than concepts. My goal is to show, through a detailed look at the four adjectives' meanings, that semantics can explain this. My study shows that large and little have almost no semantic range in common and have pronounced differences in register, while big shares a significant amount of semantic range with both little and small.

The second case study, Chapter Three, focuses on the distinction between antonyms and near-opposites (direct and indirect antonyms, in the terms of Gross, Fischer and Miller 1988), looking at the antonyms wet and dry and several adjectives similar in meaning, namely damp, moist, dank, humid, arid, and parched.3 I show that in this case too, semantic range can explain, for example, why humid and damp are not considered antonyms of dry.

The final case study, Chapter Four, looks at a word with two antonyms, happy and its opposites sad and unhappy. Many cases of a word with two different antonyms involve a word which has clearly distinct senses (e.g., right/wrong, right/left and old/young, old/new), but in this case, sad and unhappy are similar in meaning, so it is not clear that distinct senses of happy are involved. A careful look at the meanings of the three adjectives shows that subtle differences between sad and unhappy cause them to have somewhat different semantic ranges. The wider range of happy overlaps the ranges of both sad and unhappy, so that happy can contrast with both of them.

1.7 Sources of data

In each of the case studies, my goal is to compare the meanings of the words in order to see how similar or different they really are. Since all of the case studies focus on adjectives, and since the function of adjectives is to modify nouns, a very good way to characterize adjectives' meanings is by looking at the kinds of nouns they typically modify.

One source of information I make use of in this study is definitions and usage notes in learner's dictionaries. Dictionaries designed for people learning English as a second or foreign language are more useful than regular dictionaries in that they contain more information about how to distinguish near synonyms, especially synonyms which are very commonly used, such as big and large. This information often includes specific comments about common collocational patterns. In this study, I used four recent learner's dictionaries: Collins Cobuild English Dictionary (abbreviated hereafter as CCED), Longman Dictionary of Contemporary English, Third Edition (abbreviated as LDOCE), Oxford Advanced Learner's Dictionary, Fifth Edition (abbreviated as OALD) and Longman Language Activator (abbreviated as LLA).4 The entries in these dictionaries are useful as a starting point in investigating how particular adjectives compare in terms of semantic range, but they do not provide enough examples--due to space limitations, a dictionary simply cannot list all the nouns an adjective typically occurs with. It is therefore necessary to also look at data from a large corpus.

For finding a wide range of examples of the nouns which an adjective typically modifies, my main source of data was a corpus of over 50 million words from 6 months of the New York Times. It is not a balanced corpus, but it is large enough to be useful for finding statistical patterns in adjective-noun usage. A fellow graduate student at Northwestern, John Wickberg, first ran the corpus through a simple parser to tag the corpus for part-of-speech. Then he wrote a program which could take an adjective, find the nouns which it was modifying, and calculate the strength of the association between the adjective and each noun using a measure called the mutual information statistic. I then used this data in making my analysis of the semantic ranges of the adjectives.

The mutual information statistic is introduced in Church and Hanks (1990). They give the following formula, in which P(x) is the probability of one word occurring (calculated by counting the number of times a word actually occurs in a particular corpus), P(y) is the probability of a second word occurring, P(x,y) is the probability of the two words occurring together, and I(x,y) is the mutual information measure (the measure of association) for the two words: I(x,y) = log2 [P(x,y) / (P(x)P(y))]. Or as Church and Hanks describe this formula in prose:

Informally, mutual information compares the probability of observing x and y together (the joint probability) with the probabilities of observing x and y independently (chance). If there is a genuine association between x and y, then the joint probability P(x,y) will be much larger than chance P(x)P(y), and consequently I(x,y) >> 0. [I=the mutual information measure] If there is no interesting relationship between x and y, then P(x,y) ≈ P(x)P(y), and thus I(x,y) ≈ 0. If x and y are in complementary distribution, then P(x,y) will be much less than P(x)P(y), forcing I(x,y) << 0. (Church and Hanks 1990, 23)

In other words, the mutual information measures the association between two words by comparing the number of times they actually occur together in a particular corpus to the number of times they would occur by chance. The higher the mutual information value, the stronger the association.

With a sufficiently large database, such as the New York Times database, it is possible to find some useful patterns using this statistic. For example, big is a very common word, occurring 13,914 times in the corpus, and there are many nouns which occur with big with a mutual information value much higher than zero. The highest values of all, not surprisingly, are for collocations with an idiomatic meaning such as big bang and big leagues, which both have a mutual information value of more than 9. Of the 112 occurrences of bang in the corpus, 31 of them occur with big in the same noun phrase, probably right before it. Some other nouns with which big has a high mutual information value (higher than 6) are chunk, grin, jump and loser.
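The figures for big bang can be checked against the Church and Hanks formula with a short calculation. The following Python sketch is only an illustration: the function name is hypothetical, the corpus size is taken to be exactly 50 million words (the text says only "over 50 million"), and the probabilities are estimated as simple relative frequencies, which may not match the way Wickberg's program estimated them.

```python
import math

def mutual_information(joint_count, x_count, y_count, corpus_size):
    """I(x,y) = log2 [P(x,y) / (P(x)P(y))], with each probability
    estimated as a raw count divided by the corpus size."""
    p_xy = joint_count / corpus_size
    p_x = x_count / corpus_size
    p_y = y_count / corpus_size
    return math.log2(p_xy / (p_x * p_y))

# Assumption: the corpus is treated as exactly 50 million words.
CORPUS_SIZE = 50_000_000

# Counts reported in the text: big occurs 13,914 times, bang 112 times,
# and the two occur together 31 times.
big_bang = mutual_information(31, 13_914, 112, CORPUS_SIZE)
print(f"I(big, bang) = {big_bang:.2f}")
```

Under these assumptions the value comes out just under 10, which is consistent with the reported value of "more than 9". A somewhat larger corpus size would raise the figure slightly.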

The mutual information statistic is thus a good way of identifying strong relationships between particular adjectives and nouns, and therefore a good source of data for someone who wants to characterize the typical semantic range of an adjective. But this method does have some limitations. The biggest drawback is that the measure is most meaningful with adjectives that occur frequently in the corpus, e.g., big, large (which occurred 18,143 times), wet (685), and dry (2,147). It is much less useful for studying less frequent adjectives such as damp, which occurred 43 times in this corpus, or parched, which occurred only 47 times. Because these adjectives occur so rarely, the chances of their occurring with any noun except the most common ones are very small. For example, the combination parched fairways occurs only once in the corpus, but it has a high mutual information value because fairways is an infrequent noun, occurring only 104 times. Thus, for the less frequent adjectives discussed in the second case study, the mutual information value does not provide a strong measure of typicality, so in order to find patterns, I supplemented the corpus data with examples from dictionaries and from electronic versions of texts of English literature available through Project Gutenberg.5
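The effect of low frequency on the statistic can be made concrete by comparing parched fairways with big bang. The sketch below again assumes a corpus of exactly 50 million words and simple relative-frequency estimates; the function and variable names are hypothetical, and the resulting figures are illustrations rather than the values produced by the program used in this study.

```python
import math

def mutual_information(joint_count, x_count, y_count, corpus_size):
    """log2 of the observed joint probability over the chance expectation.
    Algebraically the same as log2 [P(x,y) / (P(x)P(y))] with
    probabilities estimated as count / corpus_size."""
    return math.log2(joint_count * corpus_size / (x_count * y_count))

CORPUS_SIZE = 50_000_000  # assumption: the text says "over 50 million"

# A single co-occurrence of two infrequent words (counts from the text).
rare_pair = mutual_information(1, 47, 104, CORPUS_SIZE)          # parched fairways

# Thirty-one co-occurrences involving a very frequent adjective.
frequent_pair = mutual_information(31, 13_914, 112, CORPUS_SIZE)  # big bang

print(f"parched fairways: {rare_pair:.2f}")
print(f"big bang:         {frequent_pair:.2f}")
```

On these assumptions the single occurrence of parched fairways receives a higher value than the 31 occurrences of big bang, which is why a high value for an infrequent adjective cannot be read as evidence of a typical collocation.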

Another limitation to this method is that although the large database had been tagged for part of speech (not always accurately, but on the whole, fairly well), the phrase structure of sentences was not marked. This makes it hard to identify when nouns are being modified by predicative uses of an adjective. The program written by Wickberg simply looked for an adjective followed by a head noun within a three-word window (to allow for compound nouns and multiple adjectives); this could accurately pick out the noun being modified by the adjective in phrases such as large department stores, but it could not pick out the noun modified by the adjective in a sentence such as Her house was quite large. In fact, the program might identify a completely unrelated noun if there was one in the first three words of the following clause. With frequently occurring adjectives, this would not be a problem--since such a noun would probably be picked out only once, randomly, it would have a mutual information value close to 0. In this study, I considered only the nouns with which the adjective had a mutual information value of 3 or more, a value which indicated the relationship was not accidental. With the infrequent adjectives, some of the nouns picked out by the program occurred only once with the adjective, so I checked on any combinations that seemed unexpected to see if they were due to parsing mistakes.6 In the case of the adjectives big, little, large, and small, the fact that the program only picked out prenominal (attributive) uses of the adjectives was not a problem, since I was able to determine from other sources (dictionary definitions and examples picked out of a smaller corpus) that the predicative uses and the prenominal uses of these adjectives were basically the same.
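The behavior of the three-word window, including the kind of mistake it can make, can be sketched as follows. This is not Wickberg's program, whose details are not given here; it is a minimal reconstruction under stated assumptions: the input is a list of (word, tag) pairs, the tag names "ADJ" and "NOUN" are hypothetical, and the head noun is taken to be the last noun in the first run of nouns found within the window. The third example sentence is invented for illustration.

```python
def find_modified_noun(tagged, adj_index, window=3):
    """Guess the noun modified by the adjective at adj_index by scanning
    the next `window` tokens. Returns the last noun of the first run of
    nouns found, or None if the window contains no noun."""
    head = None
    for word, tag in tagged[adj_index + 1 : adj_index + 1 + window]:
        if tag == "NOUN":
            head = word          # keep extending through a compound noun
        elif head is not None:
            break                # the run of nouns has ended
    return head

# Attributive use: the head of the compound is found correctly.
attributive = [("large", "ADJ"), ("department", "NOUN"), ("stores", "NOUN")]
print(find_modified_noun(attributive, 0))   # stores

# Predicative use: the modified noun (house) precedes the adjective,
# so nothing is found.
predicative = [("Her", "DET"), ("house", "NOUN"), ("was", "VERB"),
               ("quite", "ADV"), ("large", "ADJ"), (".", "PUNCT")]
print(find_modified_noun(predicative, 4))   # None

# Invented sentence: a noun from the following clause falls inside the
# window and is wrongly paired with the predicative adjective.
spurious = [("The", "DET"), ("air", "NOUN"), ("was", "VERB"),
            ("humid", "ADJ"), (",", "PUNCT"), ("athletes", "NOUN"),
            ("said", "VERB")]
print(find_modified_noun(spurious, 3))      # athletes
```

Because the scan looks only to the right of the adjective and ignores clause boundaries, it captures attributive uses well but either misses predicative uses or pairs them with an unrelated noun.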

In the case of the adjectives related to wet and dry, there were a few differences between predicative and attributive uses. For example, wet and dry are often used as predicative adjectives to describe the state of people, in sentences such as I was all wet, but there is no corresponding attributive use because wet and dry do not usually occur before nouns which name people. Since data from the New York Times corpus did not provide information about predicative uses, I used additional examples from the electronic texts from Project Gutenberg which showed that the differences between the attributive and predicative uses were limited in scope and easy to characterize. In the case of happy, unhappy and sad, the differences between the predicative and the prenominal uses were greater, and I had to compensate for the limitations of the mutual information data by finding many examples from other sources. One of these sources was the example sentences in the four learner's dictionaries, LDOCE, OALD, CCED, and LLA. The examples from all four are taken from large corpora of British and American English. Another main source of data was examples from the quotations in the CD-ROM version of the Oxford English Dictionary.7 Throughout the discussion, I have identified which data came from the large corpus, which examples came from the Oxford English Dictionary, and which came from sources available through Project Gutenberg. In the case of the latter, I only identify individual novels when I quote sentences from them.


1
Gross, Fischer and Miller and Charles and Miller say the antonymic association is completely independent of meaning, while Justeson and Katz say that a semantic component is needed in addition to the lexical component.


2
Deese even lists small as the antonym of big in the Associational Dictionary in the appendix to his book (Deese 1965), even though in the chapter on adjectives and antonymy, he lists big/little and large/small as the antonym pairs.


3
As I will explain in Chapter 3, I chose these adjectives from a larger set of synonyms of wet and dry listed in WordNet.


4
The Longman Language Activator is different from the other dictionaries in that it groups words relating to the same concept together. In the entry for big, for example, there are groupings for "words for describing an object, building, animal, or organization etc. that is big" and words that mean "to make things bigger." There is also an alphabetical cross-listing.


5
Project Gutenberg makes available many different kinds of texts in electronic form; the texts have been selected and scanned in by volunteers and reflect a wide variety of interests, from novels to historical documents to works of mathematics. For reasons having to do with copyright laws, however, they tend to be older works that are in the public domain. For this research, I downloaded and searched about 35 different works, including novels by Lewis Carroll, Thomas Hardy, and others. More information about Project Gutenberg and the kinds of texts that are available can be found on the Internet at <http://www.promo.net/pg/> or by email to <hart@pobox.com>.


6
For example, the list of co-occurring nouns and adjectives contained the combination humid athletes, but this turned out to be a parsing mistake, with humid occurring in the predicate of a clause, modifying air, while athletes started the following clause.


7
I used the Oxford English Dictionary rather than the electronic texts in the third case study because it contained a greater variety of sentences using happy, sad, and unhappy, and because the examples were more recent than those in the Project Gutenberg texts.