
Expecting the Unexpected: Exceptions in Grammar

Trends in Linguistics
Studies and Monographs 216

Editor

Volker Gast
Founding Editor

Werner Winter
Editorial Board

Walter Bisang, Hans Henrich Hock, Matthias Schlesewsky, Niina Ning Zhang
Editor responsible for this volume

Walter Bisang

De Gruyter Mouton

Expecting the Unexpected: Exceptions in Grammar

Edited by

Horst J. Simon and Heike Wiese


ISBN 978-3-11-021908-1 e-ISBN 978-3-11-021909-8 ISSN 1861-4302


Library of Congress Cataloging-in-Publication Data
Expecting the unexpected : exceptions in grammar / edited by Horst J. Simon, Heike Wiese.
p. cm. -- (Trends in linguistics. Studies and monographs ; 216)
Includes bibliographical references and index.
ISBN 978-3-11-021908-1 (alk. paper)
1. Grammar, Comparative and general -- Grammatical categories. 2. Generative grammar. 3. Functionalism (Linguistics) I. Simon, Horst J. II. Wiese, Heike.
P283.E97 2011
415--dc22
2010039874

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Internet at http://dnb.d-nb.de.

© 2011 Walter de Gruyter GmbH & Co. KG, Berlin/New York
Typesetting: PTP-Berlin Protago TEX-Production GmbH, Berlin
Printing: Hubert & Co. GmbH & Co. KG, Göttingen
Printed on acid-free paper
Printed in Germany
www.degruyter.com

Contents
Preface

Introductory overview

What are exceptions? And what can be done about them?
    Horst J. Simon and Heike Wiese
Coming to grips with exceptions
    Edith Moravcsik

Classical loci for exceptions: morphology and the lexicon

Exceptions to stress and harmony in Turkish: co-phonologies or prespecification?
    Barış Kabak and Irene Vogel
Lexical exceptions as prespecification: some critical remarks
    T.A. Hall
Feature spreading, lexical specification and truncation
    Barış Kabak and Irene Vogel
Higher order exceptionality in inflectional morphology
    Greville G. Corbett
An I-language view of morphological exceptionality: Comments on Corbett's paper
    Stephen R. Anderson
Exceptions and what they tell us: reflections on Anderson's comments
    Greville G. Corbett
How do exceptions arise? On different paths to morphological irregularity
    Damaris Nübling
On the role of subregularities in the rise of exceptions
    Wolfgang U. Dressler
Statement on the commentary by Wolfgang U. Dressler
    Damaris Nübling

Taking into account interactions of grammatical sub-systems

Lexical variation in relativizer frequency
    Thomas Wasow, T. Florian Jaeger, and David M. Orr
Corpus evidence and the role of probability estimates in processing decisions
    Ruth Kempson
Response to Kempson's comments
    Thomas Wasow, T. Florian Jaeger and David Orr
Structured exceptions and case selection in Insular Scandinavian
    Jóhannes Gísli Jónsson and Thórhallur Eythórsson
Remarks on two kinds of exceptions: arbitrary vs. structured exceptions
    Susann Fischer
Response to Susann Fischer
    Jóhannes Gísli Jónsson and Thórhallur Eythórsson

Loosening the strictness of grammar

Three approaches to exceptionality in syntactic typology
    Frederick J. Newmeyer
Remarks on three approaches to exceptionality in syntactic typology
    Artemis Alexiadou
A reply to the commentary by Artemis Alexiadou
    Frederick J. Newmeyer
Three types of exceptions and all of them rule-based
    Sam Featherston
Anomalies and exceptions
    Hubert Haider
Distinguishing lexical and syntactic exceptions
    Sam Featherston
Disagreement, variation, markedness, and other apparent exceptions
    Ralf Vogel
What is an exception to what? Some comments on Ralf Vogel's contribution
    Henk van Riemsdijk
Response to van Riemsdijk
    Ralf Vogel
Describing exceptions in a formal grammar framework
    Frederik Fouvry
Explanation and constraint relaxation
    Pius ten Hacken

Unexpected loci for exceptions: languages and language families

Quantitative explorations of the worldwide distribution of rare characteristics, or: the exceptionality of northwestern European languages
    Michael Cysouw
Remarks on rarity
    Östen Dahl
Some more details about the definition of rarity
    Michael Cysouw

Subject index
Language index

Preface
The present volume contains a variety of contributions: some have evolved from a selection of contributions to a workshop at the 27th Annual Meeting of the German Society for Linguistics (DGfS) in Cologne in 2005; others were invited by the editors. We have decided to introduce a somewhat exceptional, or at least rare, structural feature to this volume: each main article is complemented by an invited critical commentary and by a response from the original author(s) (with the exception of the two introductory chapters, which thus constitute a small exceptional subset within the broader exceptional pattern of this book). We believe that enhancing the discursivity of the book in this way makes for a livelier and more fruitful discussion, in particular in the case of a topic that is as central to theory and practice in our field, and accordingly as controversial, as that of exceptions.

The beginnings of this book reach back to a time when we were both Research Fellows of the Alexander-von-Humboldt Foundation, at the University of Vienna and at Yale University respectively, on leave from our shared home affiliation at the Department of German Language and Linguistics at Humboldt University; we gratefully acknowledge the support of these institutions.

Horst J. Simon & Heike Wiese
London & Potsdam, 2010

Introductory overview

What are exceptions? And what can be done about them?
Horst J. Simon and Heike Wiese

'la question de l'exception est un point névralgique de la linguistique' (Danjou-Flaux and Fichez-Vallez 1985: 99)
[the question of the exception is a sensitive point of linguistics]

1. Exceptions and rules

When modelling data, we want the world to be nice and simple. We would like the phenomena we encounter to be easily categorised and neatly related to each other, maybe even placed in causal or at least implicational relationships. However, the world is more complicated. More often than not, when we propose rules in order to capture the observed facts, we find problems. Certain pieces of data refuse to submit to the generalisations we propose; they stand out as exceptions. Or, to put it the other way round, an exception necessarily implies a rule, which it violates.

In what follows we illustrate four central aspects of the complex relationship between exceptions and rules: (i) the underdetermination of rules, and hence the impossibility of avoiding exceptions, (ii) the formation of exceptional rules in subsystems, (iii) the interaction of different grammatical levels influencing rules and exceptions, and (iv) the possibility of having more exceptions than rule-governed instances.

1.1. The underdetermination of rules

In a general sense, a rule is a generalisation over empirical observations that allows predictions with regard to data yet to be collected.¹ The basic problem with generalisations is, of course, that we never know the future for certain: one can never know whether the next bit of data one examines will be like the data considered before. The reason for this is the fact that a rule is underdetermined by its extension, i.e. by the instantiations of its application. An example of what it means to follow a rule has been discussed by Wittgenstein (1953: §§143ff., in particular §185f.): consider a case where you try to teach someone the rule 'add 2' for natural numbers by showing her the series 0, 2, 4, 6, 8. The pupil then correctly writes 0, 2, 4, 6, 8, 10, 12, ..., that is, she can apply the rule to new instances. But when reaching 1000, she might go on 1004, 1008, 1012, .... In such a case, the pupil might have extrapolated a rule 'Add 2 up to 1000, 4 up to 2000, 6 up to 3000, and so on' (§185). Both the pupil's rule and our rule were compatible with the initial data, i.e. with the series from 0 to 8; hence, an extrapolation of a rule from these data (its instantiations) is underdetermined. Since the available data underlying any generalisation are of necessity finite, this is a fundamental problem for the empirical sciences.²

Now imagine a slightly different case (no longer Wittgenstein's example): the pupil sees the same series 0, 2, ..., 8 and this time extrapolates from these data the rule 'add 2'. However, she then discovers that the series goes on 10, 12, ..., 1000, 1004. In order to account for these new data, one option she now has is to keep the rule 'add 2' and mark 1004 as an exception. Another option is to assume a more complex rule, e.g. one along the lines of 'Add 2 up to 1000, 4 up to 2000, 6 up to 3000, ...'. In this simple case, the two different rules would make two different predictions that could be tested by further data: in the first case, the series should then go on 1006, 1008, 1010, ...; in the second case, it should go on 1004, 1008, 1012, ..., 2000, 2006, 2012, ....

1. Thus, the concept of 'rule' in an empirical science like linguistics must be distinguished from the concept of a social rule, which people are expected to adhere to.
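The arithmetic behind this underdetermination can be made concrete with a minimal sketch. The function names (`add_two`, `pupil_rule`, `series`, `first_divergence`) are our own illustrative choices, and the closed form used for the pupil's rule is just one way of spelling out 'add 2 up to 1000, 4 up to 2000, 6 up to 3000'.

```python
def add_two(n):
    """The teacher's rule: 'add 2'."""
    return n + 2


def pupil_rule(n):
    """The pupil's rule: add 2 below 1000, 4 from 1000, 6 from 2000, and so on."""
    return n + 2 * (n // 1000 + 1)


def series(rule, start, length):
    """Return the first `length` members of the series generated by `rule`."""
    values = [start]
    while len(values) < length:
        values.append(rule(values[-1]))
    return values


def first_divergence(rule_a, rule_b, start=0, limit=10_000):
    """Return the first member after which the two rules predict different successors."""
    n = start
    while n <= limit:
        if rule_a(n) != rule_b(n):
            return n
        n = rule_a(n)  # both rules agree here, so either successor will do
    return None


if __name__ == "__main__":
    # Both rules are compatible with the initial data 0, 2, 4, 6, 8 ...
    print(series(add_two, 0, 5), series(pupil_rule, 0, 5))
    # ... and only come apart once the series reaches 1000.
    print(first_divergence(add_two, pupil_rule))
```

Any finite initial segment below 1000 is generated identically by both functions, which is the sense in which the observed data alone cannot decide between the two rules.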
Or it might be the case that something in-between is correct: it might turn out that the series from 1000 to 2000 forms an irregular, exceptional subsystem with a special rule 'add 4' that only holds in this domain; then the series would go on 1008, 1012, ..., 2000, 2002, 2004, 2006, ....

2. There are, of course, general methodological considerations to guide one's generalisation process, for instance Occam's Razor, which basically advises one not to add complications to an analysis unless absolutely necessary.

1.2. Exceptional rules

Such in-between phenomena, which illustrate the dialectical nature of the relationship between rules and exceptions, can be found, for instance, in a linguistic counterpart of numbers, the formation of number words in natural languages. In most languages of the world, the following generalisation holds: in complex number words of an additive make-up, the constituent referring to the larger number comes first (cf. Hurford's 1975 'Packing Strategy'). For instance, a decade word (words for the decades 10, 20, 30, ..., 90) should come before a word for ones (1, ..., 9), as in English forty-two, not *two-forty, so that we have an order H-L of constituents, where H is the higher number word, and L is the lower one.

However, the English teens represent an exception to this rule: number words from thirteen to nineteen follow the pattern L-H, where the lower constituent, namely the expression for the ones, precedes the higher constituent, i.e. the decade word (hence, we have thir-teen, four-teen, ..., nine-teen). This is in contrast to, say, French, where the order is H-L (dix-sept, dix-huit, dix-neuf) in keeping with the general rule for the order of additive constituents. The English teens hence form a small, exceptional class of their own: given their unified pattern, we can formulate a sub-rule for them, stating that the order of constituents is L-H for teens. What we have here is then an exceptional rule. This rule is restricted to only a few words and deviates from the general pattern of number words in English, which follows the usual H-L pattern found in the world's languages.

However, there are also languages where the kind of irregular pattern we find in English teens is more generalised and is used in all number word constructions consisting of a decade word and a word for ones. Examples are other Germanic languages like German or Dutch, but also genetically and typologically unrelated languages like Arabic. In these languages, the L-H pattern holds not only for the teens, but extends to 1+20, 2+20, ..., 9+90. Thus, despite the obvious exceptionality from a typological point of view, we can still find internal regularity in these languages: for a large exceptional class, we can formulate a rule L_O-H_D, where O is a number word for ones, and D is one for decades, as a well-defined deviation from the general H-L rule.
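The relation between a general rule and an exceptional sub-rule of varying size can be sketched as a small toy model. Everything in it is an assumption made for illustration: the table `SUB_RULES`, the function names, and the simplification that every combination of ones (1 to 9) and decades (10 to 90) is a transparent additive compound, which ignores fused forms such as English eleven and twelve or French onze to seize.

```python
# Toy model of constituent order in additive number words (ones + decade).
# "H-L": the higher (decade) constituent comes first; "L-H": the lower (ones) first.

GENERAL_RULE = "H-L"  # the cross-linguistically usual order

# Hypothetical sub-rule tables: (label, domain test, order). A sub-rule is
# checked first; the general rule applies wherever no sub-rule matches.
SUB_RULES = {
    "English": [("teens", lambda ones, decade: decade == 10 and ones >= 3, "L-H")],
    "German": [("all ones + decade words", lambda ones, decade: True, "L-H")],
    "French": [],  # no exceptional sub-rule for the forms cited (dix-sept etc.)
}


def constituent_order(language, ones, decade):
    """Return the order of constituents for a given ones/decade combination."""
    for _label, applies, order in SUB_RULES[language]:
        if applies(ones, decade):
            return order
    return GENERAL_RULE


def exceptional_share(language):
    """Fraction of ones/decade combinations that deviate from the general rule."""
    combos = [(ones, decade) for decade in range(10, 100, 10) for ones in range(1, 10)]
    deviant = [c for c in combos if constituent_order(language, *c) != GENERAL_RULE]
    return len(deviant) / len(combos)


if __name__ == "__main__":
    print(constituent_order("English", 2, 40))  # forty-two
    print(constituent_order("English", 7, 10))  # seven-teen
    for language in SUB_RULES:
        print(language, exceptional_share(language))
```

In this sketch the exceptional class is empty for French, small for English and covers the whole domain for German, although the same general rule serves as the point of comparison in all three cases.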
This L-H rule for ones and decades then supports a special, exceptional subsystem, a subsystem that covers a larger domain than the one in English, and that is absent in French altogether. In this sense, exceptionality is a gradeable and context-dependent concept: elements can be more or less exceptional, and they can be exceptional with respect to a general rule that governs the system as a whole, but non-exceptional with respect to a rule that governs a subsystem.

1.3. The interaction of different grammatical levels

The interplay of rule and exception is of methodological and theoretical significance for any linguistic analysis. It therefore comes as no surprise that the first major methodological debate in modern linguistics, in the 1870s, centred exactly around this problem. In keeping with 19th-century linguists' preoccupation with diachronic issues, this so-called 'Neogrammarian Controversy' focused on the hypothesis that 'Sound Laws are without exceptions'.³ Following up on previous achievements of Comparative Indo-European Linguistics, and inspired by possible parallels with the 'Laws of Physics', the Neogrammarians maintained that:
Aller Lautwandel, so weit er mechanisch vor sich geht, vollzieht sich nach ausnahmslosen gesetzen, d.h. die richtung der lautbewegung ist bei allen angehörigen einer sprachgenossenschaft, ausser dem fall, dass dialektspaltung eintritt, stets dieselbe, und alle wörter, in denen der der lautbewegung unterworfene laut unter gleichen verhältnissen erscheint, werden ohne ausnahme von der änderung ergriffen. (Osthoff and Brugmann 1878: XIII)
[All sound change, insofar as it is mechanical, takes place under exceptionless laws, i.e. the direction of the sound movement is always the same with all members of a speech community (unless dialect split occurs), and all words in which the sound undergoing the sound movement occurs in the same circumstances are without exception affected by the change.]

The main initial idea here was that at a certain place in a certain period all words containing the relevant sound (in the relevant phonological environment) would have undergone a particular sound change captured by a certain 'law'; the motivation for such a general change was primarily seen in physiological factors. Later on, the hypothesis was somewhat relaxed by reducing it to a working hypothesis, and one which was motivated by considerations from psychology.

The greatest triumph of the rigorous Neogrammarian methodology, and a confirmation of their basic idea, was accomplished by the discovery of Verner's Law. Initially, there had remained an embarrassing exception to the outcomes of the First (Germanic) Consonant Shift, or Grimm's Law: in this sound shift the Indo-European voiceless plosive consonants /p, t, k/ were fricativised to /f, θ, h/ (as exemplified by the correspondence of Ancient Greek phrā́tēr and Gothic brōþar 'brother'). However, unexpectedly, the equivalent of Greek patḗr was Gothic faðar 'father' with a voiced fricative.⁴ Working within the exceptionlessness paradigm,⁵ Verner (1877) could reconcile the deviant facts with Grimm's Law decades after its initial formulation. He showed how these exceptions could be explained by taking into account the position of the word accent in the proto-language: Grimm's Law proper applies only if the accent was on the immediately preceding syllable in Proto-Indo-European; otherwise the fricatives are voiced in Germanic: /b, d, g/.⁶ This case nicely illustrates that exceptions on one linguistic level (in this case, the segmental-phonological system) can be accounted for by competing rules from other linguistic or non-linguistic levels (in this case, prosodic phonology).

Meanwhile, a great many diachronic sound laws have been advanced which apply the idea of blind, exceptionless sound changes.⁷ This is not to say, by the way, that all sound change is exceptionless. In fact, sometimes even the exact opposite occurs: so-called 'sporadic change', one that exceptionally occurs in a single example, both unexplained and inexplicable, as for instance the loss of /r/ in Modern English speech from Old English sprǣc.⁸ What is more, there are also many examples where non-phonological factors interfere with the regularity of a sound change. The most notable of these are analogy,⁹ lexical diffusion and general sociolinguistic factors.¹⁰ McMahon (1994: 21) captures the dialectic relationship of phonology and paradigmatic morphology in what she calls Sturtevant's Paradox: 'sound change is regular but creates irregularity, whereas analogy is irregular but creates regularity'.

3. Neatly documented in Wilbur (1977) and discussed at length in Jankowsky (1972).
4. Modern German still evinces differing consonants in this case: Bruder and Vater, albeit with different voicedness values due to subsequent developments.
5. His main tenet was: 'Bei der annahme eines zufalls darf man jedoch nicht beharren. [...] Es muss in solchem falle so zu sagen eine regel für die unregelmässigkeit da sein; es gilt nur diese ausfindig zu machen' (Verner 1877: 101). [However, one must not be content with the assumption of chance. In such a case, there must be, so to speak, a rule for the irregularity; it is just necessary to find it.]
6. Apparently, this correlation between voicedness and accent is still applicable in Modern German (cf. Hannó[f]er vs. Hanno[v]eráner; Udolph 1989).
7. Many of the sound changes inside Indo-European are discussed in Collinge (1985).
8. In other words, the initial consonant cluster has been retained in Modern English (as can be deduced from examples such as spring, spray, sprawl etc.), so there is no sound law in the history of English pertaining to the loss of r in speech.
9. Already alluded to in the above quote from Osthoff and Brugmann (1878).
10. Those factors have been discussed amply, and non-conclusively, in the literature, e.g. in Labov (1981) and de Oliveira (1991).

1.4. Exceptions in the majority

One important factor in the interplay of rules and exceptions is that it is not at all trivial to decide which is which, given a mass of initially unstructured facts. Often it turns out that what appears to be an exception in one scientific account is an instantiation of the rules in a competing analysis. A case in point is the system of plural formation in German nouns. Nominal plural in German is expressed by a variety of suffixes (-e, -en, -er etc.) as well as by umlaut and zero-suffixation, leading to eight different forms of plural formation. In order to account for the distribution of plural markers over nouns, a number of rules have been proposed in traditional German grammar, making use of features from different grammatical levels, like nominal gender, number of syllables, or the ending of the singular form. However, these rules can only account for part of the nominal inventory and do not work very well for predictions.

There is one plural ending, though, whose distribution can be accounted for more straightforwardly, namely the suffix -s. This plural suffix is used as a default; it turns up whenever there is no existing form already or none that can be formed by analogy, as in a lot of loan words, in abbreviations, and also in proper names. This has led to accounts that characterise the -s suffix as the regular form, while the seven other classes of nominal plural are considered irregular ones that are driven by analogy (Janda 1991, R. Wiese 1996: 136–143, Pinker 1999: 211–239). Additional support for the regular status of the -s plural comes from overgeneralisations in first language acquisition (Clahsen et al. 1992, Marcus et al. 1995). However, the -s suffix is the least common plural form statistically; hence, under this view, only a small part of plural formation is regularly rule-governed, while most of it is exceptional: exceptions are more commonly realised than rules, and the statistical relationship of specific rule and Elsewhere-rule is turned upside down. Curiously, such an analysis echoes a remark in Mark Twain's essay 'The Awful German Language':
Surely there is not another language that is so slipshod and systemless, and so slippery and elusive to the grasp. One is washed about in it, hither and thither, in the most helpless way; and when at last [the language learner] thinks he has captured a rule which offers firm ground to take a rest on amid the general rage and turmoil of the ten parts of speech, he turns over the page and reads, 'Let the pupil make careful note of the following exceptions.' He runs his eye down and finds that there are more exceptions to the rule than instances of it. So overboard he goes again, to hunt for another Ararat and find another quicksand. (Twain [1880] 1907: 267)
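The default analysis of the German -s plural discussed before the quotation can be sketched as a lookup with an Elsewhere case. The sketch rests on strong simplifying assumptions: the 'irregular' classes are modelled as a plain list of stored forms (analogy is not modelled at all), the handful of nouns is chosen purely for illustration and carries no statistical weight, and the names `STORED_PLURALS`, `pluralise` and `rule_counts` are our own.

```python
from collections import Counter

# Stored plural forms: stand-ins for the classes treated as irregular
# in the default analysis. Illustrative entries only.
STORED_PLURALS = {
    "Tag": "Tage",
    "Frau": "Frauen",
    "Kind": "Kinder",
    "Mutter": "Mütter",
    "Hand": "Hände",
    "Mann": "Männer",
    "Lehrer": "Lehrer",
}


def pluralise(noun):
    """Return (plural, rule): a stored form if there is one, default -s elsewhere."""
    if noun in STORED_PLURALS:
        return STORED_PLURALS[noun], "stored"
    return noun + "s", "default"


def rule_counts(nouns):
    """Count how many of the given nouns are handled by which kind of rule."""
    return Counter(pluralise(noun)[1] for noun in nouns)


if __name__ == "__main__":
    sample = list(STORED_PLURALS) + ["Auto", "LKW"]  # a loan word and an abbreviation
    for noun in sample:
        print(noun, *pluralise(noun))
    print(rule_counts(sample))
```

In this toy sample the default rule is the only productive one, yet it accounts for the smaller share of the forms, which mirrors the inverted relationship between specific rules and the Elsewhere-rule described above.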

Thus, maybe unfortunately for the language learner (and the language teacher) and fortunately for the linguist who is interested in complex structures, language is not parsimonious. As has already become clear from the examples discussed so far, there are different ways that linguists typically handle the exceptions they encounter in their analyses. In the following sections, we will discuss these in turn.


2. Approaches to exceptions

2.1. Ignoring exceptions

A common approach to the problems posed by exceptions is to simply ignore them. This can be achieved through more or less sophisticated argumentation. For example, when confronted with an exception to the rule that one has proposed, often the easiest way out is to say that this apparently disturbing fact does not belong to the linguistic system one analyses, using the infamous answering technique: 'Well, in my dialect ...'.¹¹ As Labov (1972: 292) has noted, '"[m]y dialect" turns out to be characterized by all sentence types that have been objected to by others.'

In the statistical analysis of data, doing away with exceptions is part of a reasonable methodology: in any empirical study, one has to take into account that the collected data can be spoiled for a variety of reasons. In order to minimise unwanted statistical effects due to 'bad data' (which appear as a kind of exception to the general picture), one usually abstracts away from what are called outliers, i.e. the most deviant pieces of data on any given test item; they are held likely to be mistakes or other irrelevant phenomena.

11. Obviously, the background assumption in such a strategy is that data from a different micro-variety need not be taken into account since '[l]inguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech-community, who knows its language perfectly' (Chomsky 1965: 3).

2.2. Re-analysing exceptions

Another type of exceptional data not mentioned so far can be entire languages. In cross-linguistically informed typological linguistics, where correlations between logically independent facts are investigated, one rarely finds downright universal phenomena; the formulations of the statistical universals are usually hedged by phrases like 'with overwhelmingly greater than chance frequency', thereby allowing for a small number of languages behaving not as expected (cf. Dryer 1998 for discussion).

One type of exception one frequently encounters in linguistics is the odd language that does not follow the generalisations made in large-scale cross-linguistic investigations. Thus, in linguistic typology, Greenberg-type universals are extrapolated from large databases: basically they are predictions of occurrences of a certain structure, or rather predictions of the fact that it does not occur or can occur only under specific conditions. However, there almost always appear to be a few, some, or just single examples of languages where the structure in question does in fact exist in a forbidden context. While singleton languages not displaying the usual phenomena are interesting laboratories for the typologist, who can then seek to find an explanation as to what functional or other factors have played a role in creating such a peculiar system, such languages can be a great challenge for the formal linguist.

Especially in the Chomskyan tradition with its strong emphasis on explanations on the grounds of a genetically endowed Universal Grammar (UG), languages with exceptional grammatical peculiarities pose problems.¹² Since grammatical distinctions, be they universal or only the speciality of a single language, should be captured by UG mechanisms in order to be acquirable by the child, the UG component becomes more unwieldy if it has to cater for all those exceptional characteristics. Because of this extra burden that exceptions put on the language faculty, within this framework it is very much desirable to show that any account that assumes exceptions is flawed and can be replaced by one that re-analyses the phenomena in such a way that the exceptions disappear.

To give an example of an allegedly exceptional trait that turned out to be a chimera on closer inspection:¹³ all recently proposed morphological feature inventories designed to capture the various systems of person-number combinations in the pronouns of the world (cf. Harley and Ritter 2002 and subsequent work) have difficulties when it comes to distinguishing the putative clusivity contrast in second person plural pronouns, i.e. the difference between a set of only addressees on the one hand and a group comprising addressee(s) and other(s) (non-speech-act-participants) on the other. While some authors have categorically denied the existence of such a distinction, others have claimed to have found exceptions to the statement 'no language distinguishes clusivity in the second person'.
A closer look at these purported exceptional systems revealed, however, that in each case there had been some kind of mistake in the transmission of the data: in the case of South-East Ambrym, the original author had inadvertently conflated two geographically distinct dialects in his paradigms; with regard to Abkhaz (and a number of other, unrelated languages), an essentially emphatic suffix had been misinterpreted as involving clusivity; in the descriptions of Ojibwe (and a few other languages), the term 'second person inclusive' had been used in a terminologically confused (and confusing) way. In short, a careful study of the details of the particular pronominal systems (aided by philological and dialect-geographic information) could show that the exception in question actually vanished when studied more closely.¹⁴ In this case, then, exceptions could be analysed away by a more careful look at the data; there is no need to complicate the model of morphological features (Simon 2005).

Nonetheless, exceptions frequently do refuse to go away. In a typological vein, several phenomena have been recorded among the languages of the world that are rara, rarissima or even none-suchs.¹⁵

12. In this context, a reviewer has mentioned Newmeyer's (2005) discussion of the relationship between typology and UG. In our understanding, however, this is somewhat beside the point, since Newmeyer seems to be concerned with broad-sweeping typological generalisations and how they can be accounted for, not necessarily with the explanation of potential individual counter-examples.
13. This example is more thoroughly discussed and documented in Simon (2005).
14. Ironically, new possible evidence for the category just discussed (and thus possibly an exceptional linguistic trait), which was brought forward by Simon (2005) from Bavarian, has meanwhile also been disputed by Gehling (2006). Here, in fact, the debate revolves around the question of where to draw the boundary between grammar and pragmatics: the question of how much of politeness and other pragmatic factors needs to be incorporated into a referentially-oriented grammatical system.
15. They have been collected in the internet archive Das grammatische Raritätenkabinett (The grammatical rarity cabinet) at the University of Constance, searchable at: http://typo.uni-konstanz.de/rara/intro/. There one can also find the Universals Archive, which lists not only the typological universals proposed, but also pertinent counter-examples that have been noted in the literature. Cf. also the recent collections of studies on this matter: Wohlgemuth and Cysouw (2010a,b).

2.3. Integrating exceptions

In the study of a single language, one also frequently encounters a set of data that cannot be handled straightforwardly by the usual grammatical rules of that language. In that case, at least two principal options arise for the researcher: s/he can restrict the domain of the main rule and allocate the exceptions to a special sub-component of the grammar (a sub-rule, e.g. the one discussed above for decade number words, or, more extremely, the Lexicon); or s/he can design the grammatical apparatus in a 'softer' fashion so that the exceptions can be accommodated in the main component itself. We will demonstrate these options in turn.

In the first kind of approach, one reduces the scope of the main grammatical rule when one finds data contradicting it. In this case, one defines a smaller domain for the rule and makes no prediction for the rest, i.e. for the space of the exceptions. Thus one gets a core grammar and some kind of periphery where the normal rules simply do not apply. A case in point are interjections, response particles and the like, which allow for phonological, morphological, syntactic and semantic structures that are otherwise ruled out in the language.


For instance, this is the domain where German allows for non-vowel syllable nuclei of whole words ([pst], [ P P] / [ P m P m]).¹⁶ A version of that domain-restricting strategy is the strong reliance on a notion of the Lexicon as a storage space for all the peculiar characteristics of lexical items. In consequence, the syntactic (or phonological, or semantic) component proper is freed from all complications. Any idiosyncrasies are relegated to the individual lexical entries, where they are, by definition, not exceptions but only specific lexicalised properties.

Rather than relegating exceptions to the lexicon altogether, one can define a specific rule for them that creates an exceptional subset, and an Elsewhere-rule for the rest. This can account for exceptions that constitute sub-systems of their own, which is quite common, since exceptions tend to cluster. However, as the example of German plural formation illustrated, it may not always be an easy task to decide which sub-system represents the main rule and which one the exceptional rule.¹⁷

An altogether different approach to the problem of exceptions is the second kind of approach mentioned above, which is based on the notion of a 'softer' grammar, a grammar without hard contrasts, where exceptions pose much less of a problem. One way of achieving this is to build the model of grammar on prototypes. In doing so, one defines focal elements, which combine a number of key characteristics. Grammatical items will be more or less similar to these prototypes; those that are least similar are what used to be called exceptions; they have now turned their status into non-prototypical members of their category. The obvious advantage of such an approach is the great flexibility and cross-categorial cohesion it creates.
A potential disadvantage is that its lack of clear-cut distinctions makes it hard to formalise, and brings with it the risk that useful distinctions might be blurred (for discussion cf., e.g., Tsohatzidis 1990 and Aarts et al. 2004). Another flexible approach to the problem of exceptions in grammar is to allow different rules to compete with each other. This means that one will not have a single, definite prediction but that several, possibly graded, alternatives arise.
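The Elsewhere strategy mentioned above (a specific rule for an exceptional subset, plus a general rule for everything else) can be made concrete with a minimal sketch. The example is an illustration only and is not taken from the chapter: the toy lexicon, the function name `past_tense`, and the choice of English past-tense formation are assumptions, and spelling adjustments (e.g. for stems ending in -e) are deliberately ignored.

```python
# Hypothetical toy lexicon (an assumption for illustration): stems listed
# here form the exceptional subset covered by the specific rule.
IRREGULAR_PAST = {"sing": "sang", "go": "went", "take": "took"}


def past_tense(stem: str) -> str:
    """Apply the specific rule if the stem is listed, else the Elsewhere rule."""
    if stem in IRREGULAR_PAST:
        # Specific rule: look up the stored exceptional form.
        return IRREGULAR_PAST[stem]
    # Elsewhere rule: the general default for all remaining stems.
    return stem + "ed"


for verb in ("sing", "walk"):
    print(verb, "->", past_tense(verb))
```

The ordering is the point of the design: the specific rule is checked first, so the general rule never sees the listed items. Which of the two sub-systems deserves to be called the 'main' rule is, as noted above for German plurals, not decided by the mechanism itself.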
16. The second example is the phonological representation of a colloquial variant of the negative response particle (i.e., the counterpart of 'no') (cf. H. Wiese 2003 for a detailed discussion of the exceptional status of interjections).
17. A comparable situation holds for diachronic facts: system-internally, it is far from clear whether English has a rule that results in the loss of /r/ in certain syllable positions (cf. bass, equivalent to Modern German Barsch), and the r is occasionally retained as in horse, or whether horse is what one expects and r was lost exceptionally in bass; only facts of extra-linguistic history (and consequently probable language contact scenarios) help to clarify the situation (cf. Hoenigswald 1978: 26).

What are exceptions? And what can be done about them?


In a subcase of this scenario, rules from different grammatical sub-systems access the same domain so that, for instance, regularities from semantics and from syntax are in competition. Consequently, the phenomenon at hand is an exception in one system, but is predicted in the other system. Take, for instance, case assignment of some psych-verbs in German, like frieren 'to feel cold'. A sentence with a psych-verb like (1) poses a problem for a syntactic account of German.

(1) Mich     friert.
    1sg.acc  freeze.3sg
    'I am freezing.' (lit.: 'Me freezes.')

In (1) the only argument slot is occupied by an accusative pronoun. So, do we have an accusative (or ergative) subject here, in contrast to what we find in German sentences as a rule? Against this analysis, we find no person-number agreement between the pronoun and the finite verb. So, have we instead got an entirely subject-less clause? This would constitute an exception in the syntax of German as well.18 But despite this syntactic anomaly, the structure makes perfect sense from a semantic point of view: The Experiencer-role is typically coded by dative or accusative case,19 whereas the nominative subject of a clause is typically an Agent. In this example, there is a mismatch between the syntactic and semantic requirements of a normal clause; the two components compete with each other. The syntactic requirements are fulfilled when the sentence is coded as in (2), a construction that is more common in modern German, replacing the subject-less alternative illustrated in (1). In this case, we also get the syntactically expected subject-verb agreement. The morphosyntactic system's gain is, however, semantics' loss, because of the unusual correlation of case and semantic role.

18. The variant Es friert mich gives evidence of a rescue strategy available in this case: the use of an expletive subject whose only function seems to be to rectify the exceptionality of (1).
19. As in Ich streichle ihm[dat-exp] den Bart. (lit.: 'I stroke him the beard.', 'I am stroking his beard') and Ich lehre ihn[acc-exp] singen. (lit.: 'I teach him sing.', 'I teach him to sing').

(2) Ich      friere.
    1sg.nom  freeze.1sg
    'I am freezing.'
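The competition between syntactic and semantic requirements in (1) and (2) can be illustrated with a small sketch of ranked, violable constraints. Everything in it is an assumption made for illustration: the constraint names `SUBJ_AGREE` (a clause needs an agreeing nominative subject) and `EXP_NOT_NOM` (an Experiencer should not be coded as nominative), the violation counts, and the function name `winner` are hypothetical and are not proposed in this form in the chapter.

```python
# Hypothetical violation profiles for the two variants of 'I am freezing'.
# 1 = the candidate violates the constraint, 0 = it satisfies it.
CANDIDATES = {
    "Mich friert.": {"SUBJ_AGREE": 1, "EXP_NOT_NOM": 0},
    "Ich friere.": {"SUBJ_AGREE": 0, "EXP_NOT_NOM": 1},
}


def winner(ranking):
    """Return the candidate that fares best under the given constraint ranking.

    Violations are compared constraint by constraint, highest-ranked first,
    so a single violation of a higher constraint outweighs lower ones.
    """
    return min(
        CANDIDATES,
        key=lambda cand: tuple(CANDIDATES[cand][c] for c in ranking),
    )


# Syntax ranked above semantics, then the reverse ranking.
print(winner(["SUBJ_AGREE", "EXP_NOT_NOM"]))
print(winner(["EXP_NOT_NOM", "SUBJ_AGREE"]))
```

Under the first ranking the sketch selects the variant in (2), under the second the variant in (1): the same form counts as an exception or as the predicted output depending on which sub-system is given priority.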

The observed pattern extends to other examples of relatively recent change as well, showing that we are dealing with a real (if exceptional) sub-system in the case system of German. (3a) vs. (3b) shows a similar phenomenon for the case of denken 'to think', where the development towards morpho-syntactic regulation is even more advanced (that is, (3a) already sounds archaic and is rarely used in contemporary German anymore), presumably driven by the more agentive status of the (Experiencer-) role that denken assigns compared to frieren:20

(3) a. Mich     dünkt.
       1sg.acc  think.3sg
    b. Ich      denke.
       1sg.nom  think.1sg
    'I think.'

What this development illustrates, then, is the interplay not only of exceptions and rules, but of exceptions, rules, and grammatical (sub)systems: what appears as an exception in one system can be perfectly in accordance with a rule from another system. Such an interlocking network of rules (or rather: constraints), each of them violable, is the focus of approaches within Optimality Theory: in this framework, the rules themselves need not be modified; they are just seen to be operating on different levels of grammar, and taking different dominance over each other.

3. Why are there exceptions? How do they arise, and how do they disappear?

3.1. The emergence of exceptions

At first glance, one might think that a language system without exceptions would be best. And indeed, that is roughly what one gets, at least at the beginning, when people invent an artificial language, such as Esperanto.21

20. Moreover, the obsolete form of the verb is replaced by a newer, more regular one.
21. Cf. Hagège (2005) for discussion.


However, since natural languages are biological systems, they are susceptible to evolutionary change (cf. e.g. Ritt 2004); it is only natural that they evolve gradually. In such a view of language change, which crucially involves the idea of bricolage, i.e. tinkering with what happens to be at hand (cf. Lass 1997: 313–316), small-scale incremental changes necessarily produce structures that are exceptions to the system before the change.22

So how exactly do exceptions come into existence? We will discuss two major scenarios: first, the interplay of different levels of grammar can create complexity and irregularity on one level when changes occur on another level; second, changes due to extra-grammatical factors can unbalance the distribution of forms in a grammar.

An illustration for the first kind of scenario comes from a part of the grammar of English and German that appears quite confusing and exception-laden today, but started out as a fairly regular component in Proto-Indo-European: the group of so-called irregular verbs. This group comprises for the most part what historical linguists call 'strong verbs', i.e. those verbs which form their past tense and their past participle forms with ablaut of the stem vowel. In Contemporary German, this area seems to be hardly rule-governed at all. The 5th edition of the Duden-grammar of Modern German (Duden 1995), for instance, lists as many as 39 ablaut-classes for the ca. 170 ablauting verbs (p. 125),23 several with only one verb that follows the particular pattern, each of them being an exception to all others, so to speak. By contrast, the system of ablaut was entirely regular in an early variety of Indo-European24 and still fairly predictable
22. Taken seriously, this fact contradicts the research methodology of strict structuralism (purporting to analyse "un système où tout se tient") as it is most succinctly stated by Beedham (2005: 153): "Yet exceptions do exist, so how do they arise? It seems to me that they arise to the extent that we, the grammarians, have got it wrong. We introduce them from outside with rules that are not quite right. If a rule is 100% correct it will have no (unexplained) exceptions whatsoever, if it is almost right it will have a smaller number of exceptions, and if it is badly wrong it will have lots of exceptions." Reasonable as such a view may seem as a methodological premise, in the light of the inevitability of exceptions in diachrony, it will have to be discarded.
23. To be fair, the most recent edition brings some systematisation into this list (Duden 2005: 458–461).
24. That is at least the picture one gets if one subscribes to the not uncontroversial laryngeal-hypothesis for Pre- or Proto-Indo-European (cf. Lehmann 1993); otherwise, more traditionally, some form of accentual difference will have to be taken as the decisive factor. For a description of the fate of ablaut in the history of German cf. Nübling et al. (2006: 199–209); Mailhammer (2007) provides a new systematisation of Germanic ablauting verbs.


in Old High German, when the phonological make-up of the stem determined to which of the seven ablaut-classes a verb would belong. The break-up of this old morphologically regular system seems to be due to a phonological change: the loss of laryngeals or a prosodic change. Hence in this case, an independent development in phonology creates exceptions on the morphological level.

Similarly, the loss of phonological distinctions in final syllables between Old and Middle High German obscured the phonological trigger for umlauting in German morphology. Therefore umlauting became free to be a purely lexically based morphological process, for example in the formation of nominal plurals.25 In this way an irregularity effect was created: only some nouns take umlaut in their plural, cf. Faden[sg] – Fäden[pl] 'thread' vs. Fladen[sg] – Fladen[pl] 'flat bread'.26

An example for the second kind of scenario is provided by the virtual disappearance of the second person singular pronoun thou in Standard Modern English, where pragmatic and sociolinguistic factors were responsible for the spread of one form at the expense of another, thus creating a typological exception in the English pronominal system. In Middle English, and well into Shakespeare's time, there was a politeness distinction in English pronouns of address comparable to that of Modern French or Russian. There were two second person pronouns: the (informal) singular form thou and ye/you, which was employed in second person plural reference and also when a single person was to be addressed politely. In a kind of inflationary process, the usage of you-forms then became more and more generalised, so that thou was relegated to the fringe, used only in very restricted circumstances, such as certain religious contexts.
Therefore, a kind of markedness reversal due to a sociolinguistic overgeneralisation took place: the relationship between marked and unmarked, between exception and rule was turned upside-down, so that what used to be the unmarked form, the informal thou, became an exception, while the more marked form, the formal ye/you, became the rule. A side-effect of this generalisation of you is that Standard Modern English stands out among the languages of the world as having number distinctions in nouns and pronouns in general, but not in second person pronouns.27

25. Cf. Sonderegger (1979: 297–319) for a detailed description of this development.
26. With the additional complication that for some nouns there exists regional variation as to whether they take their plural with or without umlaut, e.g. Wagen 'car'.
27. According to Cysouw (2003: 118), this situation is not common at all.


3.2. The disappearance of exceptions

Given the situation in Modern English with a double-fold exception in the pronominal system (the lack of a number opposition in the second person is unusual both from a cross-linguistic point of view and language-internally, since English does otherwise encode number quite firmly), it is not surprising that many non-standard varieties of English 'repair' their systems: they create new plural forms by morphological reinforcement: y'all, youse etc. (cf. Hickey 2003). Thus, the exceptional gap in the paradigm is filled again.

In general, two potential diachronic scenarios for the gradual disappearance of an exceptionality in a language system are conceivable: either the sub-class forming the exceptional trait loses some or all of its members, or the sub-class is strengthened, thereby creating a stronger, less exceptional sub-system of the language. The basic mechanism here is that a set of exceptions exhibits a certain internal regularity, which is significant enough to attract new members gravitating to that group. Again, we illustrate these two possibilities with developments in the verbal system of German.

An example for the first case is the abovementioned exceptional class of strong verbs that is overall on the decline in German: Sound change has obscured its phonological basis; new verbs entering the language as loan words are automatically assigned to the weak class; some strong verbs, mostly the less frequent ones, undergo inflection class changes and lose their ablaut-formation over time, as for example in a relatively recent case with backen 'to bake' and melken 'to milk'.28 Thus, the relative frequency of the two sub-classes of verbs (strong vs. weak) has reversed since the creation of the latter in Proto-Germanic.29

A phenomenon illustrating the second case is the integration of the German verb brauchen 'need' into the group of modal verbs.
Unlike in the case of psych-verbs discussed above, where syntactic regulation overruled semantics, in this case, the morphosyntactic development is driven by semantic pressure. Since this development might be less well-known than some of our other examples and because it is currently happening under our very eyes, we discuss it in somewhat more detail.

28. Here, the old past tense forms buk and molk have practically died out, in favour of regular backte and melkte.
29. On a comparative note it is worth mentioning that the other Germanic languages follow a similar diachronic drift; in the extreme case of Afrikaans the strong-weak (i.e. irregular-regular) distinction has been lost almost completely, viz. outside the auxiliary system.


Three core grammatical properties of modal verbs in German are important for our understanding here. First, they are exceptional with respect to inflection: given their origin as old preterite presents, modals lack the usual final morpheme -t in the third person singular of the present tense indicative (5a), in contrast to regular verbs (5b):

(5) a. Sie  muss / kann / darf
       she  must   can    may
    b. Sie  sagt / macht / singt
       she  says   makes   sings

Second, modals display a syntactic peculiarity in that they subcategorise infinitive phrases without the complementiser zu 'to' (6a), unlike many (but not all) non-modal verbs with infinitival complement (6b):

(6) a. Sie  muss  singen.          [modal verb]
       she  must  sing
    b. Sie  hofft  zu  singen.     [non-modal verb]
       she  hopes  to  sing

Third, when used in the (periphrastic) perfect tense, modals exhibit the so-called IPP-effect:30 basically this means that instead of an expected past participle, the modal occurs in the infinitive:

(7) a.  Er   hat  singen  müssen[inf].
        3sg  has  sing    must[inf]
    b. *Er   hat  singen  gemusst[p ii].
        3sg  has  sing    must[p ii]
    'He has had to sing.'

brauchen 'need' shares a central meaning aspect (modality) with modal verbs when used with an infinitive: apart from its use with a nominal complement (as in Sie braucht einen Regenschirm. 'She needs an umbrella.'), this verb can also be used with an infinitival complement, in particular under negation (or in the context of a restrictive particle like nur).31 Unlike the core set of modal
30. I.e., infinitivus pro participio, also known as Ersatzinfinitiv.
31. Unlike its English counterpart, the negation of müssen usually takes wide scope over the whole sentence, not just over its complement, hence Er muss nicht singen. (lit.: 'He must not sing.') does not mean 'He must: not sing.', but rather 'Not: he must sing.'. Negation of brauchen takes wide scope as well, while having a weaker meaning, along the lines of English He need not sing. The domain of English must not


verbs, however, brauchen does not go back to an old preterite present, and accordingly, in compliance with its more distant origins as a normal transitive verb, should behave as a regular verb morpho-syntactically; that is, suffix final -t in the third person singular and select an infinitive with zu. And this regular behaviour is exactly what one finds in most instances of written language usage, as in (8):

(8) Er   braucht  nicht  zu  singen.
    3sg  needs    not    to  sing
    'He need not sing.'

This construction, however, is presently developing into one that agrees with the irregular modal verb pattern, as illustrated in (9): no -t and no zu:

(9) Er   brauch  nicht  singen.
    3sg  need    not    sing
    'He need not sing.'

Moreover, in perfect tense constructions, the IPP-effect comes into force:

(10) Er   hat  nicht  singen  brauchen[inf].
     3sg  has  not    sing    need
     'He hasn't needed to sing.'

At present, this is found predominantly in Spoken German, but it appears more and more in written varieties as well (cf. Askedal 1997). Note that there is no phonological motivation for the loss of final -t in German, which is shown by the fact that -t never fails to occur with, e.g., rauchen 'to smoke' despite the phonological near-identity of the verbs:

(11) Sie  raucht / *rauch.     [regular verb]
     3sg  smokes    smoke
     'She smokes.'

Thus, what we are witnessing at the moment with the spread of the type Er brauch nicht singen is the integration of a regular verb into a morphosyntactically irregular, exceptional subsystem, based on shared semantic features: From the general point of view of the morpho-syntax of German verbs, brauchen becomes exceptional: it develops from a regular verb into one with
with narrow scope is covered by German dürfen 'may / to be allowed to', e.g. Er darf nicht singen. (i.e.: 'He must not sing.' / 'He is not allowed to sing.')


irregular features; but from the point of view of modal verbs, brauchen becomes regularised, being integrated into their specific, exceptional, subsystem. This development demonstrates the power of the system not only in the case of the overall, more general system (here: verbal morpho-syntax) but also in the case of subsystems constituted by irregular forms that present an exception from the point of view of this general system. In sum, brauchen exemplifies the interaction of different grammatical levels in the development of exceptions, in this case semantically-driven morphosyntactic integration.32

3.3. Morphology as a locus of exceptions

Exceptions typically spread unevenly over the grammatical system as a whole, i.e. not all grammatical sub-systems are equally prone to exceptionality. In particular, the status and make-up of morphology as a central organisational device in the interaction of grammar and lexicon makes it open to the development of exceptions. Morphology is often considered as an evolutionarily earlier domain for the construction of complex linguistic elements than, say, syntax (cf. Fanselow 1985; Jackendoff 2002). In comparison to syntax, the interpretation of complex forms in word formation is underdetermined by their constituent structure and less driven by strict rules of syntactic-semantic co-composition; instead, it makes more use of contextual information. This is evident, for instance, in the case of determining the semantic relation between constituents of a compound. Take again a German example, the nominal compound Fischfrau 'fish-woman'.
This word can mean 'woman who sells fish', 'wife of a fish', 'woman whose zodiac sign is pisces', 'mermaid', and a number of other things (Heringer (1984: 2) lists ten possible meanings); all we know from the make-up of the compound is that there has to be some relation between a woman and a fish or fishes, but not which one.33

In comparison to syntax, morphology is also less characterised by clear-cut classes with particular defining features and more often based on proto-patterns that form the basis for classes that are driven by associations. This can often lead to deviations from general patterns and the formation of exceptional
32. The import of semantics on the development of this particular domain of German morphology is further shown by the following fact: Old High German had a few more verbs which behaved morphologically as preterite presents (e.g. turran 'to dare'); among those verbs only the ones that belonged to the semantic sub-class of modals have survived into the present form of the language.
33. Similarly, note in English the difference between a pork butcher and a family butcher.


subsystems. An example coming from inflectional morphology is the formation of tense forms of irregular verbs in English (cf. Jackendoff 2002: ch. 6) and German (cf. Beedham 2005).

Since complex morphological constructions are often semantically underdetermined, the formation of such patterns can be based on aspects of meaning of the elements involved. This holds for semantic as well as pragmatic aspects. The example of brauchen 'need' above illustrated a case where the development into an inflectionally irregular verb is driven by the semantic affiliation with elements of a morpho-syntactically exceptional subsystem. An interesting example from morphopragmatics comes from diminutives in English and German (cf. H. Wiese 2006). The diminutive affixes -chen and -i in Contemporary German, and similarly -ish in English, exhibit some exceptional, erratic behaviour from the morphosyntactic point of view, although they present a unified picture on the morphopragmatic side.

On the morpho-syntactic level, no clear classification of diminutive suffixes as heads or modifiers is possible. They act as prototypical heads (not just 'relativised heads' in the sense of Di Sciullo and Williams 1987) with some stems, while with other stems, they behave like prototypical modifiers. In (12), English -ish and German diminutives -chen and -i behave as adjectival or nominal heads, respectively, with adjectives, nouns, quantifiers, and verbs as a basis:

(12a) [[yellow]A ish]A, [[child]N ish]A, [[fifty]Q ish]A
(12b) [[Hünd]N chen]N / [[Hund]N i]N 'dog-dim', i.e. 'doggy', [[Lieb]A chen]N 'dear-dim', i.e. 'dearie', [[Schnäpp]V chen]N 'grab-dim', i.e. 'bargain'

However, in (13), German diminutive suffixes behave as prototypical modifiers with particles as a basis, in particular with greeting particles (GP) and answer particles (INT) in informal speech:

(13) [[Tschüss]GP chen]GP / [[Tschüss]GP i]GP 'bye-dim', [[OK]GP chen]GP 'ok-dim', [[OK]INT chen]INT 'OK-dim', [[Jau]INT i]INT 'yes-dim'

And, likewise in informal contexts, English -ish can be used as a modifier, albeit as one that is even more of an outlier from a morphosyntactic point of view: it can be used not only with a morphological stem, but also with a syntactically complex phrase, thus neglecting a crucial syntactic distinction:

(14) a. Nikki and I woke up at quarter-to-eight-ish. [data from internet forum: http://www.exposedbrain.com/archives/000301.html; 4/5/2005]

     b. Breakfast: 8am – 2pm ish [menu of Little Deb's Café, Provincetown, MA, 2000]

This makes these diminutives highly exceptional suffixes from the morphosyntactic point of view. However, their erratic behaviour turns out to be more systematic when viewed from a morphopragmatic perspective: on the pragmatic level, diminutives contribute the notions of informality or intimacy (cf. Dressler and Merlini Barbaresi 1994), and it is this expressive component that the morphosyntactically exceptional distribution of -chen and -i in (13) and of -ish in (14) draws on. Hence, the possibility of directly involving pragmatic aspects in morphology can lead to the establishment of morpho-syntactically exceptional subsystems: in this case, the hybrid syntactic status of diminutive suffixes in between head and modifier.

Strangely, there are also cases where morphology itself seems to be the source of exceptional behaviour at the higher syntactic level, a phenomenon which runs counter to the otherwise well-established, though not universally accepted,34 principle that syntax cannot 'see' the internal word-formational make-up of the lexical items it deals with, known as the Lexical Integrity Hypothesis (Di Sciullo and Williams 1987: 48). Perhaps the best example for this is the erratic behaviour of certain complex verbs in German that fail to appear in V2-position (that is, in the standard position for verbs in assertive main clauses) (16a, 16b), but are perfectly fine at the end of a clause (the position of verbs in subordinate clauses) (16c):

(16) a. *Das Flugzeug  not-landet       in  Paris.
         the plane     emergency-lands  in  Paris
         'The plane makes an emergency landing in Paris.'
     b. *Das Flugzeug  landet  in  Paris  not.
         the plane     lands   in  Paris  emergency
         'The plane makes an emergency landing in Paris.'
     c.  ..., weil     das Flugzeug  in  Paris  not-landet.
              because  the plane     in  Paris  emergency-lands
         '... because the plane makes an emergency landing in Paris.'

The verbs concerned are word formation products in some way or other (from back formation, conversion, incorporation or double-prefixation). So, what we see here is a syntactic exception that is governed by the morphological make-up of its constituent parts. But this alone cannot be sufficient; other factors such
34. Cf. Spencer and Zwicky (1998: 46).


as potential analogy to particle verbs seem to play a role as well, cf. the much better acceptability of (17):35

(17) Das Flugzeug  landet  in  Paris  zwischen.
     the plane     lands   in  Paris  between
     'The plane makes a stop-over landing in Paris.'

In sum, morphology appears to be a prime locus for exceptions. It is the central part of the grammatical system, determined by and partially determining exceptionality in grammatical structure.

4. The significance of exceptions: what this book has to offer

As we have seen, the study of exceptions is relevant for linguistic theory on a substantial level. In linguistics, as in all areas of science, the pursuit of scientific knowledge implies the creation of abstractions, which are then formalised in rules, or constraints etc. This will, as a matter of principle, lead to a potential for exceptions at all levels involved, i.e. on all grammatical levels (phonology, morphology, syntax, semantics) as well as in their interaction with each other and with pragmatics and other extragrammatical areas. Even if the researcher takes it as a methodological principle that exceptions must not be postulated unless absolutely necessary, there are many cases when deviant facts cannot be accommodated in a simple and elegant model.36 Such a challenge generates a range of approaches and can lead to new insights into the nature of the linguistic system and its (internal and external) interfaces.

The analysis of exceptions can be instructive in at least two respects. Firstly, from a methodological point of view, the treatment of exceptions will highlight different ways of dealing with empirical data, each leading to a different status of the concept of 'rule' in the respective theory. Secondly, from the point of view of the linguistic system, exceptions show us what kind of system language is: an arrangement of interlocking structures, each of them more or less flexible, always in flux, such that variation and change are possible.

In the present book, we have collected studies that tackle the problem of exceptions from a number of different angles. Most papers (and the commentaries we have invited on them) focus on syntactic phenomena, but there are also discussions of morphology and of phonology as well as of languages as macro-structures.

The introduction of this book consists of two parts. The present introductory chapter is complemented by a paper by Edith Moravcsik, who surveys possible approaches to the problems exceptions pose, with a focus on syntactic theory; her taxonomy of exceptionality in language and how linguists cope with it can serve as a basis for all further discussion of the subject.

The main body of the book is then divided into four parts. The papers in the first part take a closer look at the area where exceptions are traditionally taken to be stored: the lexicon. This is the designated location of word-based exceptionality in a language,37 comprising morphological as well as phonological idiosyncrasies. The papers in the second part discuss the interrelation of grammatical subsystems, in particular syntax and semantics, but also syntax and extra-grammatical aspects such as processing. The third part is dedicated to a common method to accommodate exceptions: relaxing the system-constituting elements of grammatical structure, be they conceptualised as e.g. rules or as constraints. The fourth and final part provides a statistically informed consideration of wholesale exceptionality (or unusualness) of languages as such.

On the whole, the papers in this book (and the respective comments and responses) offer a multi-faceted body of work on the significance of exceptions for linguistic theory. They show the potential of different approaches to capture grammatical exceptions; and they demonstrate how the study of exceptions can be productive for the development of new grammatical models and new perspectives on grammatical systems. This is true for a number of controversial claims in current linguistic research: First, there is more to the systematic study of language than just grammar: aspects of linguistic structure interact with external systems, and thus exceptions can be explained. For instance, morphologically exceptional (i.e. irregular) structures emerge because processing pressures, such as the production need to be phonologically brief, act on the demand to produce informationally distinct forms, as discussed by Damaris Nübling in her contribution on Germanic verbal morphology. Similarly, Frederick J. Newmeyer invokes parsing strategies in his explanation of cross-linguistic variation and exceptional patterns therein. Second, within the system, grammatical subcomponents can interact in such a way as to enhance the stability of exceptions, which then resist regularization:

35. Contrary to what some studies suggest, it is still not clear what exactly it is that determines the status of a given lexical item as a non-V2-verb; cf. Freywald and Simon (2007) for a brief overview and some empirical investigations.
36. Maybe this holds even more for linguistics than for other areas of science, given the curious duality of language as both biologically and culturally determined.
37. And some linguists would maintain that all exceptionality is word-based.


e.g. a certain class of oblique subjects in Icelandic and Faroese was reinforced by its semantic coherence and has thus survived into the present systems, as is argued by Jóhannes Gísli Jónsson & Thórhallur Eythórsson in their contribution.

A grammar-internal view on exceptionality can also lead to the adoption of softer models of grammar, which can incorporate seemingly exceptional cases as instances of less central structures. As Sam Featherston demonstrates, such a way of thinking is well-qualified to tackle the problem of grammatical gradience, i.e. the fact that there are grey zones of more or less severe awkwardness between fully acceptable and unacceptable structures.

Several contributions in this book are apt to challenge our traditional views of the notion of 'exception' as they discover new kinds of exceptions. Thus, Thomas Wasow, T. Florian Jaeger and David M. Orr identify exceptions in language use that one notices when taking into account quantitative data; again, these exceptions can be accounted for in terms of processing and other extragrammatical factors. Ralf Vogel, by contrast, invokes a population-based notion of exception: he uncovers different tolerance levels on the part of native speakers of German with regard to contradictory case-information in free relative clauses; from such a perspective, exceptions exist in the minds of certain speakers, but not others. Greville G. Corbett discerns a kind of (higher-order) hyper-exception that occurs when different types of exception come together and interact with each other; he thus underlines the great importance of exceptions (especially of those linguistic structures that are extraordinarily rare) for the understanding of what is possible in human language.
Frederik Fouvry, in turn, takes a broad view of exceptions: while traditional approaches in computational linguistics tend to treat both production errors and linguistic exceptions as extragrammatical structures that are neither covered, nor possible to cover, by the grammar formalism, the alternative apparatus he proposes captures all types of deviance from the expected data: exceptional but acceptable idiosyncrasies of a data set and mere errors are both encompassed within one arrangement of constraints that can be relaxed as needed to accommodate them.

A more conservative line of reasoning is followed by Barış Kabak and Irene Vogel. Concentrating on patterns of vowel harmony and stress assignment in Turkish, they maintain that it is not possible to determine an externally motivated sub-class of exceptional words, such as loan words or names; therefore employing a component of lexical prespecification in the grammar is unavoidable, which basically reverts to the traditional idea that every word must be treated on its own terms.

Finally, Michael Cysouw has a different focus in his contribution: instead of looking at instances of exceptional grammar in individual languages, he zooms out and takes on a macro-perspective by looking at patterns of co-occurrence of


rare, exceptional traits in a multitude of languages: according to his findings, there are some clusters of linguistic exceptionality, or unusualness, in certain areas of the world, among them North-Western Europe, whose languages form the basis of most of the theorising in contemporary linguistics.

Taken together, the contributions to this volume explore a range of new avenues to an understanding of exceptions: they probe deeper into the analysis of already established grammatical exceptions, they re-define and develop further the notion of exceptionality, and they invoke a variety of concepts to describe the formation of exceptions and to explain their existence in grammatical systems. While they are understood to be rare and thus in need of special efforts to be grasped, exceptions are expected in the various models utilized, either because of some grammar-internal competition or because of extra-grammatical factors bearing on grammar proper. Needless to say, because of the exceptionally complex phenomenon of exceptions, it can be expected that not every linguist will agree with the analyses and models offered. But in any case, we expect exceptions to keep fascinating linguists who are keen to understand the workings of language.

"Il serait absurde de dire que l'exception est mieux traitée dans une perspective que dans l'autre." (Danjou-Flaux and Fichez-Vallez 1985: 116)

Abbreviations

a      adjective
acc    accusative
dim    diminutive
gp     greeting particle
inf    infinitive
int    interjection
n      noun
nom    nominative
p ii   2nd participle
sg     singular

References
Aarts, Bas, David Denison, Evelien Keizer, and Gergana Popova 2004 Fuzzy Grammar. A Reader. Oxford: Oxford University Press.
Askedal, John Ole 1997 brauchen mit Infinitiv. Aspekte der Auxiliarisierung. Jahrbuch der ungarischen Germanistik 1997: 53-68.
Beedham, Christopher 2005 Language and Meaning. The Structural Creation of Reality. Amsterdam/Philadelphia: Benjamins (Studies in Functional and Structural Linguistics 55).
Chomsky, Noam 1965 Aspects of the Theory of Syntax. Cambridge (MA): MIT Press.


Clahsen, Harald, Monika Rothweiler, Andreas Woest, and Gary F. Marcus 1992 Regular and irregular inflection in the acquisition of German noun plurals. Cognition 45: 225-255.
Collinge, N.E. 1985 The Laws of Indo-European. Amsterdam/Philadelphia: Benjamins.
Cysouw, Michael 2003 The Paradigmatic Structure of Person Marking. Oxford: Oxford University Press.
Danjou-Flaux, Nelly, and Élisabeth Fichez-Vallez 1985 Linguistique taxonomique et grammaire générative. Le traitement de l'exception. Langue Française 66: 99-116.
Di Sciullo, Anna Maria, and Edwin Williams 1987 On the Definition of Word. Cambridge (MA)/London: MIT Press (Linguistic Inquiry Monographs 14).
Dressler, Wolfgang U., and Lavinia Merlini Barbaresi 1994 Morphopragmatics. Diminutives and Intensifiers in Italian, German, and Other Languages. Berlin/New York (Trends in Linguistics. Studies and Monographs 76).
Dryer, Matthew 1998 Why statistical universals are better than absolute universals. Chicago Linguistic Society 33, 123-145.
Duden 1995 Grammatik der deutschen Gegenwartssprache, Günther Drosdowski (ed.). 5th edition. Mannheim: Dudenverlag (Duden 4).
Duden 2005 Die Grammatik, Dudenredaktion (ed.). 7th edition. Mannheim: Dudenverlag (Duden 4).
Fanselow, Gisbert 1985 Die Stellung der Wortbildung im System kognitiver Module. Linguistische Berichte 96: 91-126.
Freywald, Ulrike, and Horst J. Simon 2007 Wenn die Wortbildung die Syntax stört: Über Verben, die nicht in V2 stehen können. Verbale Wortbildung im Spannungsfeld zwischen Wortsemantik, Syntax und Rechtschreibung, Maurice Kauffer and René Métrich (eds.), 181-194. Tübingen: Stauffenburg (Eurogermanistik 26).
Gehling, Thomas 2006 Die Suche nach dem anderen ihr. Zur Inklusiv-Exklusiv-Distinktion in der Zweiten Person. Einblicke in Sprache, Festschrift für Clemens-Peter Herbermann zum 65. Geburtstag, Thomas Gehling, Viola Voss and Jan Wohlgemuth (eds.), 153-180. Berlin: Logos.


Hagège, Claude 2005 Le défi de la langue ou la souillure de l'Exception. Faits de langues 25, 53-60 (special issue: L'exception entre les théories linguistiques et l'expérience. Irina Vilkou-Poustovaïa (ed.)).
Harley, Heidi, and Elizabeth Ritter 2002 Person and number in pronouns. A feature-geometric analysis. Language 78: 482-526.
Heringer, Hans-Jürgen 1984 Wortbildung: Sinn aus dem Chaos. Deutsche Sprache 12: 1-13.
Hickey, Raymond 2003 Rectifying a standard deficiency: Second-person pronominal distinction in varieties of English. Diachronic Perspectives on Address Term Systems, Irma Taavitsainen and Andreas H. Jucker (eds.), 345-374. Amsterdam/Philadelphia: Benjamins (Pragmatics and Beyond N.S. 107).
Hoenigswald, Henry M. 1978 The Annus Mirabilis 1876 and posterity. Transactions of the Philological Society 76: 17-35.
Hurford, James R. 1975 The Linguistic Theory of Numerals. Cambridge: Cambridge University Press (Cambridge Studies in Linguistics 16).
Jackendoff, Ray S. 2002 Foundations of Language. Oxford: Oxford University Press.
Janda, Richard D. 1991 Frequency, markedness, and morphological change. On predicting the spread of noun-plural -s in Modern High German and West Germanic. Proceedings of the 7th Eastern States Conference on Linguistics, Yongkyoon No and Mark Libucha (eds.), 136-153. Columbus (OH): Ohio State University.
Jankowsky, Kurt R. 1972 The Neogrammarians. A Re-Evaluation of Their Place in the Development of Linguistic Science. The Hague/Paris: Mouton.
Labov, William 1972 Language in the Inner City. Studies in the Black English Vernacular. Philadelphia: University of Pennsylvania Press.
Labov, William 1981 Resolving the Neogrammarian controversy. Language 57: 267-308.
Lass, Roger 1997 Historical Linguistics and Language Change. Cambridge: Cambridge University Press (Cambridge Studies in Linguistics 81).


Lehmann, Winfred P. 1993 Theoretical Bases of Indo-European Linguistics. London: Routledge.
Marcus, Gary F., Ursula Brinkmann, Harald Clahsen, and Richard Wiese 1995 German inflection: The exception that proves the rule. Cognitive Psychology 29: 189-256.
Mailhammer, Robert 2007 Islands of resilience. The history of German strong verbs from a systematic point of view. Morphology 17: 77-108.
McMahon, April M.S. 1994 Understanding Language Change. Cambridge: Cambridge University Press.
Newmeyer, Frederick J. 2005 Possible and Probable Languages. A Generative Perspective on Linguistic Typology. Oxford: Oxford University Press.
Nübling, Damaris, Antje Dammel, Janet Duke, and Renata Szczepaniak 2006 Historische Sprachwissenschaft des Deutschen. Eine Einführung in die Prinzipien des Sprachwandels. Tübingen: Narr.
de Oliveira, Marco Antonio 1991 The neogrammarian controversy revisited. International Journal of the Sociology of Language 89: 93-105.
Osthoff, Hermann, and Karl Brugmann 1878 Morphologische Untersuchungen auf dem Gebiete der indogermanischen Sprachen. Erster Theil. Leipzig: Hirzel.
Pinker, Steven 1999 Words and Rules. The Ingredients of Language. New York: Basic Books.
Ritt, Nikolaus 2004 Selfish Sounds and Linguistic Evolution. A Darwinian Approach to Language Change. Cambridge: Cambridge University Press.
Simon, Horst J. 2005 Only you? Philological investigations into the alleged inclusive-exclusive distinction in the second person plural. Clusivity. Typology and Case Studies of the Inclusive-Exclusive Distinction, Elena Filimonova (ed.), 113-150. Amsterdam/Philadelphia: Benjamins (Typological Studies in Language 63).
Sonderegger, Stefan 1979 Grundzüge deutscher Sprachgeschichte. Diachronie des Sprachsystems. Berlin/New York: de Gruyter.
Spencer, Andrew, and Arnold M. Zwicky 1998 Introduction. The Handbook of Morphology, Andrew Spencer and Arnold M. Zwicky (eds.), 1-10. Oxford/Malden (MA): Blackwell.


Tsohatzidis, Savas L. (ed.) 1990 Meanings and Prototypes. Studies in Linguistic Categorization. London: Routledge.
Twain, Mark 1907 The awful German language. In A Tramp Abroad, Mark Twain, 267-284. New York: P.F. Collier & Son [originally: 1880].
Udolph, Jürgen 1989 Verners Gesetz im heutigen Deutsch. Zeitschrift für Dialektologie und Linguistik 56: 156-170.
Verner, Karl 1877 Eine ausnahme der ersten lautverschiebung. Zeitschrift für vergleichende Sprachforschung 23: 97-130.
Wiese, Heike 2003 Sprachliche Arbitrarität als Schnittstellenphänomen. Habilitation thesis. Humboldt-University Berlin.
Wiese, Heike 2006 Partikeldiminuierung im Deutschen. Sprachwissenschaft 31: 457-489.
Wiese, Richard 1996 The Phonology of German. Oxford: Oxford University Press.
Wilbur, Terence H. (ed.) 1977 The Lautgesetz-Controversy. A Documentation. Amsterdam/Philadelphia: Benjamins (Amsterdam Studies in the Theory and History of Linguistic Science, Series 1, 9).
Wittgenstein, Ludwig 1953 Philosophical Investigations. Transl. by G.E.M. Anscombe. Oxford: Blackwell.
Wohlgemuth, Jan, and Michael Cysouw (eds.) 2010a Rethinking Universals. How Rarities Affect Linguistic Theory. Berlin/New York: Mouton de Gruyter (Empirical Approaches to Language Typology 45).
Wohlgemuth, Jan, and Michael Cysouw (eds.) 2010b Rara & Rarissima. Documenting the Fringes of Linguistic Diversity. Berlin/New York: Mouton de Gruyter (Empirical Approaches to Language Typology 46).

Coming to grips with exceptions

Edith Moravcsik

Abstract. Based on a general definition of the concept of exception, the problematic nature of exceptions is made explicit by showing how they weaken the generality of descriptions: they disrupt a superclass without forming a principled subclass. Focusing on examples from syntax, three approaches to dealing with exceptions are identified.

1. Why are exceptions a problem?

1.1. Defining exceptions

Typical exceptions are a small subclass of a class where this subclass is not otherwise definable. What this means is that apart from their deviant characteristic that renders them exceptional, there is no additional property that distinguishes them from the regular cases. Given also that the exceptional subclass generally has far fewer members than the regular one, exceptions can be characterized as a subclass of a class that is weak both quantitatively (fewer members) and qualitatively (only a single distinguishing characteristic).

The description of an exception must include five components: the pertinent domain; the class within which the items in question are exceptional, which we will call the superordinate class (or superclass for short); the regular subclass and the irregular subclass; the characteristic in which the two subclasses differ; and the relative size of the two subclasses. This is shown in (1) on the example of English nominal plurals, where RSC labels the regular subclass and ESC is the label for the exceptional one.1

1. A large inventory of lexical exceptions in English is cited and their exceptionality relative to transformational rules discussed in Lakoff (1970: 14-21, 30-43 et passim).

(1) domain: English
    superordinate class: plural nouns
    subclasses: RSC: apples, cats, pencils, etc.
                ESC: oxen, children, brethren
    distinguishing property: plural suffix is {s} versus /ən/
    relative size of membership: RSC > ESC
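For readers who find a data-structure view helpful, the five components of (1) can be held in one small record. The following Python sketch is only an illustration under stated assumptions: the class name ExceptionDescription and all field names are hypothetical labels introduced here, the two subclasses (one component in the schema) are stored as two fields, and the member lists are samples rather than full inventories.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExceptionDescription:
    """Hypothetical record for the components of schema (1)."""
    domain: str
    superclass: str
    regular_subclass: Tuple[str, ...]      # RSC, sample members only
    exceptional_subclass: Tuple[str, ...]  # ESC, sample members only
    distinguishing_property: str
    relative_size: str                     # stated, not computed from the samples


# The English plural example from (1), entered as given in the text.
english_plurals = ExceptionDescription(
    domain="English",
    superclass="plural nouns",
    regular_subclass=("apples", "cats", "pencils"),
    exceptional_subclass=("oxen", "children", "brethren"),
    distinguishing_property="plural suffix is {s} versus /ən/",
    relative_size="RSC > ESC",
)
```

Storing the relative size as a stated value rather than deriving it from the sample tuples reflects the fact that the regular subclass is open-ended ("etc."), so counting listed members would be misleading.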

Three components of the schema call for comments. Starting with domain: a structure may be exceptional within a language, a dialect of a language, a language family, a language area, or across languages. M. Cysouw's paper in this volume is a study of crosslinguistic exceptionality and so is part of S. Featherston's article.2 It is important to indicate the domain within which an exception holds because exceptionality is relative to it. First, what is an exceptional structure in one language may not be exceptional in another. An example is the morphosyntactic alignment of subjects of one-place predicates with patient-like arguments of two-place predicates: this is regular in ergative languages but exceptional in accusative languages. Second, language-internal and crosslinguistic exceptionality do not necessarily coincide. For example, click sounds are very numerous in Zulu but very rare across languages; and passive constructions are infrequent in Kirghiz, but frequent across languages.

2. For a rich collection of crosslinguistically rare grammatical constructions, see the Grammatisches Raritätenkabinett at http://lang.uni-konstanz.de/pages/proj/sprachbau.htm. On the inherent difficulties of establishing a grammatical structure as crosslinguistically rare, see Cysouw (2005).

A second set of comments has to do with the distinguishing property of the exceptional class. Several papers in this volume emphasize the unique nature of exceptions. B. Kabak and I. Vogel are very explicit about this point as they analyze Turkish vowel harmony and stress assignment and argue that lexical pre-specification of the irregular items is both necessary and sufficient for an adequate account. J.G. Jónsson and Th. Eythórsson also emphasize that truly exceptional structures have no correlating properties. They show genitive objects in Icelandic to be clearly exceptional by this criterion, as opposed to accusative subjects, which show subregularities.

As two of the papers in the volume show, items may differ from the regular class in more than one characteristic. G. Corbett discusses lexemes that show higher-order exceptionality by multiply violating normal morphological patterns. Utilizing the WALS database, M. Cysouw computes rarity indices for languages and language areas and shows that they may be multiply exceptional to varying degrees. Paradoxically, exceptions that differ from the regular subclass in more than one way are less exceptional by our definition since each exceptional property finds its correlates in the other deviant characteristics.

Lexical items may be exceptional not by structurally deviating from others but by exhibiting skewed, rather than balanced, frequency patterns of their alternative forms. For example, the passive form of the English verb convict occurs with unusual frequency relative to the passive of other verbs. Such soft exceptions are in the focus of Th. Wasow, F. Jaeger, and D. Orr's paper (this volume) as they explore correlates for the omission of the conjunction that in English relative clauses.

The third comment pertains to relative size. Note that having fewer members is a necessary but not sufficient characteristic of an exceptional subclass. That it is necessary can be shown by the example in (1): without reserving the label 'exception' for the smaller subclass, English nouns whose plural is formed with {s} would qualify for being the exceptions even though intuitively we do not consider them exceptional. But being a small subclass is not sufficient for exceptionality. For example, of the English verbs whose past tense form ends in {d}, relatively few employ the allomorph /əd/. But this subclass of verbs is not exceptional because the members have a phonological property in common (a stem-final /t/ or /d/) that defines them as a principled, rather than random, class.

An apparent counterexample to the regular class having more members than the exceptional class(es) is nominal plural marking in German. There are five plural markers: -Ø, -e, -er, -(e)n, and -s; which if any should be considered the regular one? Although most nouns of the German lexicon take -(e)n, Clahsen, Rothweiler, and Woest (1992) argue convincingly that -s is actually the default form: it is the only productive one, used with names (e.g. die Bäckers) and with newly-minted words such as clippings (e.g. Loks for Lokomotiven) or loan words (e.g. Kiosks).
Given that relatively few existing nouns are pluralized with -s, declaring this form to be the regular ending would seem to conflict with the general pattern of the regular class having a larger membership than the exceptional ones. However, there is in fact no conflict: the very fact that -s is productive expands indefinitely the class of nouns that take it as their plural suffix.

1.2. Two problems with exceptions

Why are exceptions a problem? The short answer is that they fly in the face of generalizations. This is so due to two aspects of their definition. First, by the very fact that they form a subclass of a class, they conflict with a generalization that would otherwise hold for the entire superordinate class. This problem so far is not specific to exceptions: it is posed by all instances of subclassification. Subclasses, by definition, compromise the homogeneity of


a superclass. But as long as the subclasses have at least one characteristic other than the one that the split is based on, the loss of the supergeneralization is compensated for by a sub-generalization that describes the subclasses. For an example of regular subclasses, let us consider those English nouns that form their plural with the suffix {s}. This is not an undivided class in that the particular shape of the suffix is variable: -/s/, -/z/, and -/əz/. However, each subclass is definable by phonological context: /əz/ after alveolar and palatal fricatives and affricates, /s/ after other voiceless sounds and /z/ after other voiced sounds. Thus, none are exceptions.

Exceptional subclasses are different from normal subclasses of this sort because they have no additional characteristics to independently identify them. This is the second reason why exceptions pose a problem: they not only scuttle a generalization that would otherwise hold for the entire superordinate class but also disallow a generalization about their own subclass. The fact that exceptions have far fewer members than their sister-classes compounds the problem: their sporadicity suggests that correlating properties may not exist at all: they may be random chance phenomena.3 All in all: exceptions disrupt supergeneralizations without supporting subgeneralizations. In the case of English noun plurals, the two generalizations that the exceptions disallow are given in (2).

(2) a. supergeneralization lost:
       **All English nouns form their plural with {s}.
    b. subgeneralization not possible:
       **All those English nouns that form their plural with /ən/ have property P.
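The contrast between a principled subclass and an exceptional one can be made concrete with a short Python sketch. It assumes that a noun is supplied together with its stem-final segment in IPA; the segment sets, the function names, and the two-item list of stored plurals are illustrative simplifications introduced here, not a full analysis of English.

```python
# Alveolar and palatal fricatives and affricates (the /əz/ context).
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
# Other voiceless stem-final segments (the /s/ context).
OTHER_VOICELESS = {"p", "t", "k", "f", "θ"}

# Exceptional plurals cannot be predicted and must simply be listed.
LISTED_PLURALS = {"ox": "oxen", "child": "children"}


def plural_allomorph(final_segment: str) -> str:
    """Pick the shape of {s} from the stem-final segment alone."""
    if final_segment in SIBILANTS:
        return "əz"
    if final_segment in OTHER_VOICELESS:
        return "s"
    return "z"  # all remaining (voiced) segments, vowels included


def plural_source(noun: str, final_segment: str) -> str:
    """Report whether a plural is stored in the list or derived by rule."""
    if noun in LISTED_PLURALS:
        return "listed: " + LISTED_PLURALS[noun]
    return "rule: /" + plural_allomorph(final_segment) + "/"
```

The three regular subclasses fall out of a property of the stem, so one function covers them all; the /ən/ plurals share no such property, which is why the sketch has to store them item by item.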

The two problems posed by exceptions can be similarly illustrated with a crosslinguistic example: phoneme inventories that lack nasal consonant phonemes.

(3) domain: a sample of languages
    superordinate class: consonant phoneme inventories
    subclasses: RSC: consonant phoneme inventories of English, Irish, Amharic, etc.
                ESC: consonant phoneme inventories of Quileute, Puget Sound, Duwamish, Snoqualmie, Mura, Rotokas
    distinguishing property: presence versus absence of nasal consonant phonemes
    relative membership: RSC > ESC

3. Regarding crosslinguistic exceptionality, compare Haiman (1974: 341): "If a word exhibits polysemy in one language, one may be inclined, or forced, to dismiss its various meanings as coincidental; if a corresponding word in another language exhibits the same, or closely parallel polysemy, it becomes an extremely interesting coincidence; if it displays the same polysemy in four, five, or seven genetically unrelated languages, by statistical law it ceases to be a coincidence at all."

The two generalizations disabled by the exceptional consonant phoneme inventories are as follows:

(4) a. supergeneralization lost:
       **All consonant phoneme inventories of languages include nasal consonant phonemes.
    b. subgeneralization not possible:
       **All those languages that lack nasal consonant phonemes have property P.4

The smaller number of nasal-less languages suggests once again that their occurrence may have no principled reason: it may be an accident.

How are the twin problems posed by exceptions responded to in linguistic analysis? The purpose of this paper is to address this question by surveying the various ways in which exceptions have been dealt with in syntax. The alternatives fall into three basic types. First, many descriptive frameworks represent exceptional structures as both exceptional and non-exceptional. What this means is that the representation of the exceptional structure is split into two parts: one shows it to be exceptional but the other part draws it into the regular class. Second, there are proposals for regularizing exceptions: re-analyzing them so that they turn out to be fully unexceptional. And third, some accounts acknowledge exceptions as such and try to explain why they are exceptional. The three options of accommodating, regularizing, and explaining exceptions will be discussed in the next three sections in turn.

2. Accommodating exceptions in syntax

Let us consider ways of representing syntactic exceptions as hybrid structures, part exceptional and part regular. The idea is similar to psychiatrists ascribing deviant behavioral traits of people to a separate persona coexisting with the normal personality. Four such approaches may be identified in the literature:

- two faces of a single representation
- two strata in a single representation
- separate representations in a single component
- separate representations in separate components

4. Note that the class of languages that have no nasal consonant phonemes is not defined either by genetic or by areal relationship: while Quileute (Chimakuan) and the Salish languages Puget Sound, Duwamish, and Snoqualmie are geographically close, Mura is spoken in Brazil and Rotokas in New Guinea. For some Niger-Congo languages without nasal consonant phonemes, see Bole-Richard (1985).

We will take a closer look at each.

2.1. Two faces of a single representation

In this type of account, exceptional and non-exceptional characteristics of a construction are represented on opposite sides of the same structural diagram. An example is Katalin É. Kiss's transformational generative account of long-distance agreement in Hungarian (É. Kiss 1987: 226-243). In Hungarian, the verb agrees with both its subject and its direct object. Person agreement with the object is illustrated in (5).

(5) a. Én szeretném                őt.
       I  would.like-S1SUBJ.3OBJ   him
       'I would like him.'
    b. Én szeretné-lek             téged.
       I  would.like-S1SUBJ.2OBJ   youS
       'I would like youS.'

However, verb agreement in sentences such as (6) is unexpected.

(6) a. Én szeretném                látni   őt.
       I  would.like-S1SUBJ.3OBJ   to:see  him
       'I would like to see him.'
    b. Én szeretné-lek             lát-ni  téged.
       I  would:like-S1SUBJ.2OBJ   to:see  youS.ACC
       'I would like to see youS.'

The problem is that the verb in the main clause 'would like' has a suffix selected by the direct object of the subordinate clause 'you' rather than by its own object, which would be the entire subordinate clause, as in (7).5

5. The verb-agreement pattern in Hungarian is actually more complex than shown by these examples but the details are not relevant here.


(7) Én szeretném,                ha  láthatnálak.
    I  would-like-S1SUBJ.3OBJ    if  I.could.see.you

(6) is an exception since agreement in general is local: controller and target are clause-mates. Because of the type of agreement shown in (6), the supergeneralization according to which all agreement is local is lost, and no subgeneralization is apparent that holds for cases such as (6) where agreement is not local.

One might try to define the subclass of structures that exhibit this kind of long-distance agreement by the schema 'main verb + infinitive complement'. If this definition were successful, the structures would form a regular, rather than exceptional, subclass since they would have a common denominator other than showing long-distance agreement. However, not all verb + infinitive constructions show this kind of agreement: transitive verbs (such as 'want' or 'try') and some intransitive ones (such as 'strive') do but other intransitive ones (e.g. 'be ready') do not (cf. É. Kiss 1987: 227-229, 2002: 203-205).

Exceptionality would be eliminated if we could analyze the entire sentences in (6) as a single clause because then agreement controller and agreement target would be clause-mates. There is indeed some evidence indicating the monoclausality of the sentence even apart from agreement. For example, if the subordinate object 'youS' is to be focused, it may occur in immediately pre-verbal position relative to the main verb. However, there is also evidence that this sentence is bi-clausal: the subordinate object 'youS' may also be focused by being placed in front of the subordinate verb. Thus, sentences like (6) are exceptional when considered as biclausal structures but regular when considered as monoclausal. Since there are arguments for both analyses, É. Kiss concludes as follows (1987: 237, 239; emphasis added):

"It appears that the monoclausal and biclausal properties are equally weighty; neither can be ignored or explained away. What is more, they are simultaneously present; consequently, the biclausal structure and monoclausal structure that can be associated with [this construction] cannot represent two subsequent stages of the derivation, but must hold simultaneously."

Here is a simplified version of the representation suggested by É. Kiss (1987: 238).

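(8) [Tree diagram: a single terminal string (szeretnélek, én, látni, téged) with NP and Inf nodes projected both above it (top face) and below it (bottom face).]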

The sentence is shown as both exceptional and not exceptional depending on which face of the tree we consider. The top face represents the biclausal, exceptional structure: agreement controller and agreement target are in separate clauses. The bottom face in turn is monoclausal, rendering the agreement configuration regular, with controller and target situated in the same clause. Thus, the supergeneralization according to which agreement is local is denied by the top half of the tree but it is saved with respect to the bottom half.

2.2. Two strata in a single representation

In É. Kiss's account, the exceptional and non-exceptional 'personalities' of the construction are co-present at a single stage of the grammatical derivation. In other types of accounts, the regular and irregular facets of the construction are separated by derivational distance. For an example, let us first consider the analysis of passives in Relational Grammar. In this framework, passive sentences are viewed as exceptional relative to actives. The example sentence skeletally represented in (9) is The woman was eaten by the crocodile (Blake 1990: 12). (P stands for predicate, 1 stands for subject, 2 stands for direct object, Cho (chômeur) stands for the demoted subject: the by-phrase.)

(9)                   eat   crocodile   woman
    initial stratum:  P     1           2
    final stratum:    P     Cho         1
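The stratal diagram in (9) can also be read as a small table from which the status of a generalization on each stratum is checked mechanically. The generalization at issue is that the more active participant of a two-place predicate is the subject (1) and the less active one the direct object (2). The Python sketch below is an illustration only; the dictionary layout and the function name alignment_is_regular are assumptions made here and are not part of Relational Grammar notation.

```python
# Relations borne on the (initial, final) strata, following (9).
PASSIVE_CLAUSE = {
    "eat": ("P", "P"),
    "crocodile": ("1", "Cho"),
    "woman": ("2", "1"),
}

INITIAL, FINAL = 0, 1  # positions within each (initial, final) pair


def alignment_is_regular(clause, more_active, less_active, stratum):
    """True if the more active participant is subject (1) and the less
    active one is direct object (2) on the given stratum."""
    return (clause[more_active][stratum] == "1"
            and clause[less_active][stratum] == "2")


# The generalization holds on the initial stratum but not on the final one.
initial_ok = alignment_is_regular(PASSIVE_CLAUSE, "crocodile", "woman", INITIAL)
final_ok = alignment_is_regular(PASSIVE_CLAUSE, "crocodile", "woman", FINAL)
```

Keeping both strata in one structure mirrors the point of the analysis: a single representation is regular at one level and exceptional at the other.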


The structural representation shows the sentence as having the passive structure on the final (lower) stratum but the active (i.e., regular) structure on the initial (upper) stratum. The passive structure is derived by a grammatical-relations-changing rule: advancement of the initial object and demotion of the initial subject. The supergeneralizations that are lost due to the existence of passives are the alignment of the more active participant of a two-place predicate with the grammatical subject and the alignment of the less active participant with the object. There is also no subgeneralization that would render the alternative, passive-type alignment predictable. The label 'passive' would not provide an independent characterization of this subclass since it is simply a label for the exceptional structure. The Relational Grammar account restores the supergeneralization in that it holds for the initial stratum of passive sentence representations, although not for the final one. The derivational distance between the irregular and regular facets of the sentence is more pronounced when they are represented as two separate tree structures. This will be illustrated next.

2.3. Separate representations in a single component

This approach to exceptions is familiar from various versions of Transformational Generative Grammar: exceptional structures are represented by two or more trees within the syntactic component of the grammar connected by transformational rules. We will look at two examples, one involving a movement rule, the other, raising. The first example has to do with verb-object-particle order in English. Given the generalization that components of a lexical entry must be adjacent, and given that the verb and the particle (e.g. wipe and off) form a single lexical item, the prediction is that the verb will be immediately followed by the particle, as is the case in Megan wiped off the table.
However, this prediction is not always valid since Megan wiped the table off is also grammatical. The verb & object & particle order thus contradicts the supergeneralization about components of lexical items being adjacent and, in the absence of some condition under which the exceptional order occurs, there is no subgeneralization possible, either. The descriptor 'particle verb' would not define the class independently of the deviant order since this label is based on the separability of the two elements. In some versions of Transformational Grammar, sentences where verb and particle are not adjacent are shown as having the regular order on the underlying level with the particle directly following the verb, while the exceptional order


is shown on the surface level (see for example Jacobs and Rosenbaum 1968: 100-106).

(10) underlying structure: Megan wiped off the table.
     surface structure:    Megan wiped the table off.

Thus, the supergeneralization is restored with respect to the underlying structure, with exceptionality relegated to surface structure.

A second example of this approach to exceptions is the analysis of long-distance agreement in languages such as Imbabura Quechua proposed by Maria Polinsky (2003). This case is similar to the one seen in Hungarian: the verb of the main clause shows agreement with the object of the subordinate clause rather than with its own object, which would be the entire subordinate clause. (11) is an example (NMLS stands for nominalizer) (Polinsky 2003: 292).

(11) Jose  yacha-wa-n           ñuca-ta  maria-ta   juya-j-ta.
     Jose  know-S1OBJ-S3SUBJ    me-ACC   Maria-ACC  love-PRES.NMLS-ACC
     'Jose knows that I love Maria.'

Polinsky acknowledges that the controller is in the subordinate clause on an underlying level but argues that it is subsequently raised into the main clause. This means that controller and target end up in the same clause and thus the supergeneralization about the locality of agreement is preserved intact on the surface level although it is violated in underlying structure.

The anomaly that the grammatical operation of raising solves here is anomalous agreement. In addition, as is well-known, raising has also been adopted for resolving anomalous case marking. In English sentences like Mary believes him to have won the race, the accusative case of him is problematic. First, it thwarts the supergeneralization according to which verbs assign case to their own arguments because this noun phrase is the semantic subject of to have won and not an argument of any kind of believes. Second, no general conditions are apparent under which this anomaly crops up and thus no subgeneralization is possible. In Government and Binding theory, the exceptionality of such instances is explicitly acknowledged by the label 'Exceptional Case Marking', attributed to the exceptional nature of the main verbs that allow for this case assignment pattern (Chomsky 1984: 64-74, 98-101; Webelhuth 1995: 35-38). An alternative tack is taken in Paul Postal's classic account of 1974, as well as in Howard Lasnik's more recent proposal (Lasnik 1999): both opt for the raising analysis,


whereby the main verb and the subordinate subject are in separate clauses on the underlying level but in the same clause in surface structure. Since surface structure legitimizes the assignment of the accusative by the main verb to the underlying lower subject that has been raised into the main clause, the supergeneralization regarding case marking is upheld in surface structure.6

The distribution of 'regular' and 'exceptional' over underlying and surface structure is not the same in these accounts. In the analysis of passives in Relational Grammar and in the analysis of particle constructions, the exceptional structures are shown to be regular underlyingly and irregular on the surface, while in the case of long-distance agreement in Imbabura Quechua and of exceptional case marking in English, it is the opposite: the irregular structure is shown underlyingly and the derived structure (the result of raising) is regular. What is nonetheless common to all of these analyses is that there is a derivational split between two facets of the exceptional construction, only one of which is exceptional.

2.4. Separate representations in separate components

In the long-distance agreement pattern of Hungarian discussed above, the exceptional and regular patterns are simultaneously present: neither is derivationally prior to the other (see (8) above). In the account of English passives in Relational Grammar ((9) above) and in the accounts of English verb-particle order ((10)), of long-distance agreement in Imbabura Quechua ((11)), and of exceptional case marking in English, out of the two representations (one regular, one exceptional), one derivationally precedes the other within the same component. In yet another type of representation of exceptional structures, the distance between the exceptional and regular facets of the construction is widened. This is illustrated by Jerrold Sadock's Autolexical Grammar analysis of particle order in English (Sadock 1987: 296-297, 2003).
Here, the regular and irregular representations of an irregular structure are in different components with the two linked by non-directional lines of association. The verb & object & particle order is shown as exceptional in syntax but regular in semantics. Thus, the supergeneralization that lexical items are contiguous holds true in semantics and violated only in syntax. This is shown in (12).

6. For discussion, see Newmeyer (2003: 157–160).

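Edith Moravcsik

(12) SYNTAX:    Megan wiped the counter off.
     [Tree diagram: S dominates NP and VP; the terminal categories are N, V, Article, N, Particle. Lines of association link the words to the semantic representation.]
     SEMANTICS: Megan wiped-off the counter.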

Let us summarize the four basic ways of accommodating exceptional structures discussed in section 2. In the diagrams below, R stands for regular, E stands for exceptional.

(13) a. two faces of a single representation
     b. two strata in a single representation
     c. separate representations in a single component
     d. separate representations in separate components
     [Diagrams: each of (a) to (d) shows how the R and E representations are arranged.]

The four ways of splitting exceptional structures differ in the amount of independent support available for the two contradictory representations of the construction. If there is independent evidence for the existence of the two faces, strata, levels, or components that the structures are split into, the analysis is more convincing. Thus, Sadock's account, where discontinuous particle structures are regular in their meanings but irregular in their forms, rests on the firmest ground: the basis of the split is meaning versus syntactic form, a dichotomy that is widely supported, and the mismatches between the two are multiply evidenced. Different levels in the same component and different strata in a single tree may or may not be justified depending on the amount of independent evidence for the existence of the levels and strata. The most conflicted representation is the bifacial tree although, given the facts and the framework assumed, it seems indeed fully unavoidable.

3. Regularizing exceptions

The analyses we have surveyed so far go half-way towards eliminating exceptionality: they represent exceptional structures as exceptional in part of the account but regular in another part. An alternative approach taken in the literature is re-analyzing exceptions as fully regular.

As noted in section 1.2, there are two problems with exceptions. First, they split the superclass and thus disable a general rule that would hold for that class. Second, since the regular and exceptional subclasses are not otherwise identifiable, no sub-generalization is possible either. It follows then that exceptions may be regularized in two ways: either by restoring the homogeneity of the superclass by abolishing the subclasses (since in that case, the supergeneralization can be maintained); or, somewhat paradoxically, by strengthening the subclasses through identifying a correlating property which renders subgeneralizations possible. In other words, one tack is to show that the regular and irregular distinction does not exist: there are no subclasses within the class; the other is to acknowledge that there are indeed subclasses and to show that they are all robust. We will now see examples of both kinds of solution.


3.1. Restoring the superclass

There are two ways of eliminating subclasses within a superclass. One is by reanalyzing the subclasses so that there is no difference between them, after all. The general schema is shown in (14). RSC stands for the regular subclass, ESC stands for the exceptional one.

(14) [Diagram: a superclass divided into RSC and ESC is reanalyzed so that both parts are RSC.]

The other way of eliminating subclasses amounts to deepening the difference between the regular and exceptional cases so that the exceptional cases fall outside the superclass. This is diagrammed in (15).

(15) [Diagram: a superclass divided into RSC and ESC is reanalyzed so that only RSC remains inside it, with ESC outside.]

Let us see examples of each approach.

A. Restoring the superclass by unifying the subclasses

English verb-particle constructions once again offer an example. Their transformational analysis was described above; here is an alternative account. Pauline Jacobson proposes that the single rule "the direct object immediately follows the verb" holds both for the regular and the seemingly exceptional cases (Jacobson 1987: 32–39). This is made possible by assuming that the lexicon lists both "call" and "call up" as verbs. The seemingly exceptional order "call Sue up" is therefore not exceptional since it obeys the same rule as the regular order "call up Sue": in both cases, the direct object immediately follows the verb.

Other examples of resolving exceptions by showing them to be regular come from long-distance agreement. In her paper of 2003 cited above, Maria Polinsky surveys several languages where the same exceptional pattern crops up: the main verb agreeing with an argument of the subordinate clause. Her proposed solutions fall into three types. For Imbabura Quechua, as was discussed above (see (11)), she proposes a raising analysis which puts the agreement controller from the subordinate clause into the main clause and thus halfway regularizes the construction. For the other two kinds of long-distance agreement (in Algonquian languages and in Tsez, respectively) she proposes two avenues of full regularization. (16) and (17) present examples of the two patterns (Polinsky 2003: 285, 303).


(16) Blackfoot (glossing is simplified)
     nit-wikixtatwaa-wa n-oxko-wa m-xk-apotakixsi
     1SUBJ-want-3OBJ my-son-3 3SUBJ-might-work
     'I want my son to work.'

(17) Tsez (A stands for a long /a/; II indicates Class II)
     uir y-iyx kidbA ka at tAtruli
     boy II-knows girl letter.II.ABS read
     'The boy knows that the girl has read the letter.'

For Blackfoot and other Algonquian languages, Polinsky proposes that the controller in the subordinate clause has a proxy in the main clause and cites independent evidence. The main verb thus agrees with this proxy, a clause-mate. This analysis merges the exceptional cases into the regular class so that the supergeneralization about the clause-mate-hood of controller and target stands unimpaired. For Tsez, she suggests that the very domain of agreement be re-defined: rather than controller and target having to be clause-mates, both have to occur in the domain of head-government. This amounts to formulating a new, broader generalization into which both regular and irregular cases fit, with their difference wiped out. Both solutions amount to eliminating the boundary between the regular and exceptional subclasses.

B. Restoring the superclass by exempting the exceptions

The examples above showed two ways in which the boundary between regular and exceptional subclasses can be eliminated: either by reanalyzing the exceptions, as Jacobson does for verb-particle constructions and Polinsky for the Algonquian-type long-distance agreement, or by reformulating the supergeneralization, as Polinsky does for Tsez. As noted in the beginning of this section, the unity of the superclass can also be restored by more dramatically re-analyzing the exceptions so that they do not even belong to the superclass. An example is Ivan Sag's analysis of English verb-particle constructions (Sag 1987: 329–333). In Sag's analysis, when the particle is separated from the verb, it is not a particle but a prepositional phrase. For instance, in the sentence "Megan wiped off the table", "off" is a particle, but in "Megan wiped the table off", "off" is a prepositional phrase. Thus, this "off" is simply not beholden to the generalization according to which lexical items such as "wipe off" have to be continuous, since it does not form a single lexeme with "wipe".
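As a rough illustration of the two verb-particle analyses just described, the toy sketch below restates them as simple checks over word lists. It is not a formalization taken from Jacobson (1987) or Sag (1987): the lexicon entries, the function names `object_follows_listed_verb` and `category_of`, and the treatment of the direct object as a single token are all assumptions made for illustration.

```python
# Toy sketch only: lexicon entries and function names are illustrative
# assumptions, not formalizations from Jacobson (1987) or Sag (1987).

# Jacobson-style lexicon: the simple verb and the verb-particle unit
# are both listed as verbs.
LISTED_VERBS = [("call", "up"), ("call",), ("wipe", "off"), ("wipe",)]


def object_follows_listed_verb(tokens, direct_object):
    """Return True if some listed verb is immediately followed by the object."""
    for verb in LISTED_VERBS:
        n = len(verb)
        for i in range(len(tokens) - n):
            if tuple(tokens[i:i + n]) == verb and tokens[i + n] == direct_object:
                return True
    return False


def category_of(tokens, verb, element):
    """Sag-style recategorization: adjacent to the verb means particle;
    the same form separated from the verb counts as a prepositional phrase."""
    adjacent = tokens.index(element) == tokens.index(verb) + 1
    return "particle" if adjacent else "prepositional phrase"


# Both orders satisfy the single rule once "call" and "call up" are listed.
print(object_follows_listed_verb("call Sue up".split(), "Sue"))   # True
print(object_follows_listed_verb("call up Sue".split(), "Sue"))   # True

# The separated "off" falls outside the class governed by the contiguity rule.
print(category_of("Megan wiped off the table".split(), "wiped", "off"))  # particle
print(category_of("Megan wiped the table off".split(), "wiped", "off"))  # prepositional phrase
```

The first function keeps one rule and enlarges the lexicon; the second keeps the rule and the lexicon but moves the offending item into another category, which is the contrast between unifying the subclasses and exempting the exceptions.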


Another proposal that resolves exceptionality by removing the apparent exception from the superclass within which it might be seen as exceptional is by Peter Cole and Gabriella Hermon (Cole and Hermon 1998). The problematic structure is long-distance reflexives in Singapore Malay: as shown in (18), the pronoun diri-nya can take either a local or a long-distance antecedent (Cole and Hermon 1998: 61; Ahmad is male, Salmah is female).

(18) Ahmad tahu Salmah akan membeli baju untuk diri-nya.
     Ahmad know Salmah will buy clothes for self-S3
     'Ahmad knows Salmah will buy clothes for him.' OR 'Ahmad knows Salmah will buy clothes for herself.'

The word diri-nya is a crosslinguistic anomaly both in its internal structure and in its distribution: it does not exhibit the usual characteristics of long-distance reflexives in other languages (such as Mandarin). Two of the generalizations that it is an exception to are that long-distance reflexives are monomorphemic and that they require a subject antecedent. Cole and Hermon propose that diri-nya's properties deviate from those of long-distance reflexives in other languages not because it is an exceptional long-distance reflexive but because it is not a long-distance reflexive at all: instead, it is a structure indeterminate between a reflexive and a pronoun. They offer various bits of evidence in support of the proposal that will not be reproduced here; what is important is the type of argument employed to deal with the exception. As in Sag's analysis of discontinuous verb-particle constructions, the offending exception is lifted out of the superclass and thus freed of the obligation to conform.

As noted in the beginning of section 3, there are two basic ways of regularizing exceptions. One is by eliminating the regular-exceptional distinction within the superclass and thus restoring the supergeneralization. The other is by strengthening the subclasses and thus making subgeneralizations possible. So far we have seen examples of the first approach; we will now turn to examples of the second.

3.2. Strengthening the subclasses

As discussed in section 1.2, exceptions form a subclass that is both small and undefined. Thus, strengthening the subclass of exceptions may be achieved in two ways. First, exceptions may be strengthened quantitatively. If the number of exceptions can be shown to be larger than first thought, it is more likely that the exceptions are principled rather than chance phenomena. Second, the exceptional subclass may be strengthened qualitatively if additional characteristics


can be identified other than the one on which the regular-irregular distinction rides: correlating properties that render the exceptions predictable. (19) diagrams the two approaches; r1 and r2 stand for properties of the regular subclass and e1 and e2 are properties of the exceptional subclass.

(19) Strengthening the exceptional subclass
     a. quantitatively
        [Diagram: the share of ESC within the superclass is enlarged relative to RSC.]
     b. qualitatively
        [Diagram: RSC, characterized by r1, and ESC, characterized by e1, are additionally characterized by r2 and e2, respectively.]

A recent study that quantitatively strengthens a crosslinguistically exceptional subclass is Rachel Nordlinger and Louisa Sadler's article on nominal tense (2004). It has been generally assumed that nouns are time-stable entities and therefore tensed nouns are exceptional across languages. Nordlinger and Sadler, however, marshal evidence for tensed nouns from ten languages, some of them areally and genetically distant (e.g. Hixkaryana, Potawatomi, and Somali). The fact that tensed nouns are more frequent than generally believed makes it likely that their occurrence is not just a freak accident: there may be a structural condition to predict their existence.

Let us now turn to proposals that shore up exceptional subclasses qualitatively. For the first example, we will return once more to English verb-particle constructions. In her book on English verb-particle constructions, Nicole Dehé (2002) assumes a transformational account whereby the contiguous verb-particle construction is underlying and the disjoint structure is derived. The additional step that she takes is probing into the conditions under which the discontinuous construction is used. She finds that this exceptional structure does have an information-theoretical correlate (103–207, 279–283). In particular, a noun-headed object follows the particle if the object is part of the focus of the sentence. If, however, the object is known to the speaker and hearer and the focus is on the complex verb, the object intervenes between the verb and the particle. Thus, "Andrew handed out the papers to the students" is used if "the papers" is new information and "Andrew handed the papers out to the students" is used if "the papers" is topical.


According to this account, the two order patterns of the English verb-particle construction do not form arbitrary subclasses. Their dichotomy is maintained but since an information-structural correlate has been identified for each class, the order patterns are predictable rather than random.7

Proposing correlations for exceptions and thus showing that they are regular is a focus of several papers in this volume. As already mentioned above, J. Jónsson and Th. Eythórsson propose that the apparently exceptional Icelandic verbs that take accusative subjects form a syntactically and semantically coherent class; Th. Wasow, F. Jaeger, and D. Orr's study reveals that the exceptional omission of the conjunction "that" in English relative clauses is correlated with the predictability of the conjunction in those structures; and M. Cysouw and G. Corbett describe clusterings of exceptional properties in and across languages.

Finding correlates to structures that are crosslinguistically exceptional is the central goal of language typology. A recent study that exemplifies this endeavor with respect to crosslinguistic exceptions is Masayuki Onishi's (2001). Onishi's concern is with non-canonical case marking, i.e., patterns that depart from the normal case marking of intransitive subjects, transitive subjects, and direct objects in a language. He finds that non-canonical case marking is not random across languages: it correlates with certain semantico-syntactic predicate types; for example, stative verbs expressing physiological states and events or psychological experiences such as "enjoy", "be happy", and "be pleased".

4. Explaining exceptions

In the preceding two sections, we have seen two basic ways in which exceptions can be dealt with: representing them as both exceptional and regular; and reanalyzing them as fully regular. A third way of dealing with exceptions is accepting them as fully or partially exceptional and finding reasons why they are so; i.e., explaining them. This is an extension of identifying correlating properties since such properties are required for explaining exceptions. They are, however, not quite sufficient: for a maximally convincing explanation, there has to be a causal relation between a correlating property and the exceptional feature. The basic idea is diagrammed in (20), where r1 and e1 are the properties in terms of which the exceptional subclass is exceptional and, as before, r2 and e2 are the correlating properties. The arrow stands for explanatory deduction.
7. For several alternative accounts of verb-particle construction in English and other languages, see Dehé et al. (2002).

(20) Explaining exceptions
     [Diagram: as in (19b), RSC is characterized by r1 and r2 and ESC by e1 and e2; arrows link the correlating properties r2 and e2 to r1 and e1.]

The studies by J. Jónsson and Th. Eythórsson on Icelandic accusative subjects and by Th. Wasow, T. Jaeger, and D. Orr on English relative clauses mentioned above are explanatory if we take meaning and processing ease to be explanations of form.

An example from outside this volume is Langacker's account of the raising structures that were discussed in section 2.3. Rather than an arbitrary exception within the class of subordinate constructions, he recognizes this type of construction as an instance of a widespread structural pattern in language. The analysis is based on an observation made by Relational Grammarians under the label Relational Succession Law. What it says is that a noun phrase raised into the main clause inherits the grammatical role of the clause that it is raised from (cf. Blake 1990: 94). Thus, noun phrases that come from subject clauses are raised to subject (as "Fred" in "Fred seems to be happy", derived from [[Fred is happy]S seems]S) and if they come from an object clause, they are raised to object, as "the computer" in "Fred believes the computer to have been delivered", derived from [Fred believes [that the computer has been delivered]S]S. Langacker goes a step further by showing that raising structures are an instance of pars-pro-toto constructions: a part stands for the whole, as in "Give me a hand", where "hand" stands for manual assistance by a person, or "So you got new wheels", where "wheels" stands for a car. Given the generalization that the whole may be represented either by the whole itself or by a part, both raised and unraised constructions are brought under a single generalization and are both explained as regular instances of a very general, independently attested pattern.

However, synchronic observations cannot provide direct causes for language structure; they act only indirectly on language processing, language acquisition, and ultimately on historical change. D. Nübling's paper in this volume proposes to explain the morphological irregularity of four German verbs by tracing their historical origins and relating them to well-known pathways of change. But even diachronic explanations cannot do more than render exceptionality possible or perhaps probable rather than necessary. To see this, let us return to the two examples given in the beginning of this paper. The historical background of English nouns with /ən/ plural (see (1)) is that they were weak nouns in Old English and for weak nouns, /ən/ was the regular plural. But this fact does not predict that this suffix should have been retained by any noun at all and even less that it should have been retained by the three nouns where it occurs today.


Similarly, languages that have no nasal consonant phonemes (see (3)) are said to have had them at some point in their history before the nasals turned into voiced oral consonants (Hockett 1955: 119). But the availability of such a process does not predict that it should actually have happened in any language at all and even less that it should have happened in those particular languages where it has.8

5. Conclusions

In this paper, exceptions were characterized as posing a conflict in categorization. All instances of subclassification disrupt the homogeneity of a class; but if the subclasses are characterized by clusters of properties, they can be described in terms of subgeneralizations. Exceptions, however, form a rogue subclass that is both quantitatively and qualitatively lean and thus not subsumable under a subgeneralization. Various ways of coming to grips with exceptions were surveyed; here is a summary of the approaches discussed above.

(A) Representing exceptions as both exceptional and regular by means of
    (a) two faces of a single representation, or
    (b) two strata in a single representation, or
    (c) separate representations in a single component, or
    (d) separate representations in separate components
(B) Regularizing exceptions by
    (a) restoring the homogeneity of the superclass
        by unifying the regular and exceptional subclasses, through re-analyzing the exceptions as regular, or through positing a new, more comprehensive superclass within which both the erstwhile regular and erstwhile exceptional cases turn out to be regular; or
        by assigning the exceptions to a different superclass; or by

8. Exceptions, also known as irregularities, anomalies, or simply counterexamples to generalizations, loom large in all sciences both social and natural. For relevant discussions in the philosophical literature about ceteris paribus generalizations, see Cartwright (1988a, 1988b); Hempel (1988); Darden (1992); and Carroll (no date). Whether this paper might contribute to a general account of how exceptions are dealt with across sciences remains to be seen.


    (b) strengthening the exceptional subclass quantitatively, and/or qualitatively
(C) Explaining why the exceptions are exceptional

While we have seen that solutions to exceptions vary with the theoretical framework, it is important to recognize that the very status of a grammatical pattern, whether it is or is not exceptional to begin with, is also highly theory-dependent (Plank 1981: 4–7). The most fundamental variable across different approaches is whether the empirical domain in question is assumed to be well-regulated so that generalizations are to be expected to hold without exception; or whether the domain is seen as a less tidy sort without tight rules. If structural patterns are assumed to be mere probabilistic tendencies, what would otherwise count as exceptions will be automatically anticipated (Hempel 1988: 152). If there is no strict regularity, there cannot be irregularity, either.

The four last papers in this volume propose to change the theoretical assumptions in the light of which certain phenomena are exceptional. R. Vogel's paper about alternative case assignment to relative pronouns in German free relatives argues that none of the alternatives is the norm; instead, variation itself is the norm in the grammar of German, resulting from the conflicting desiderata that case assignment needs to satisfy. Somewhat in the same vein, F. Fouvry suggests that grammatical rules be relaxed to operate probabilistically so that exceptions are still rule-governed. F. Newmeyer similarly calls into question the very concept of regularity within the superclass that exceptional phenomena are generally assumed to belong to. He suggests that typological correlations in syntax are performance-based rather than stemming from principles of linguistic competence. Given that the domain of performance is less constrained overall, there is no reason to expect typological generalizations to be free of exceptions. The competence-performance distinction is also central to S. Featherston's paper. He proposes that if well-formedness is allowed to be gradient, rather than binary, grammars have no exceptions. Exceptions are in turn the result of output selection by the speaker, a function of language processing.

These proposals are akin to the way of dealing with exceptions that we saw above: separating them out of the superclass within which they would appear to be exceptions (e.g. Sag's analyzing particles that are separated from the verb not as irregularly placed particles but as regularly ordered prepositional phrases). The difference is that in these accounts, not individual exceptions but entire classes of exceptions are lifted out of the broader domain of strictly regulated phenomena.


In sum: just as no grammatical construction is exceptional all by itself but only if considered in comparison with other similar constructions, it is exceptional only if the theoretical framework assumed would expect it to be regular.

Acknowledgement

For their thoughtful comments, I am grateful to a reviewer, to the editors of this volume, to Professor Márta Fehér, to the participants of the 27th Annual Meeting of the Deutsche Gesellschaft für Sprachwissenschaft in Cologne, February '05, and to the audience at the University of Debrecen, Hungary, where I presented a version of this paper in November '05. Many thanks also to Michael Liston for directing me to relevant literature in the philosophy of science.

References
Blake, Barry J. 1990 Relational Grammar. London/New York: Routledge.

Bole-Richard, Rémy 1985 Hypothèse sur la genèse de la nasalité en Niger-Congo. Journal of West African Languages 15 (2): 3–28.

Carroll, John no date Laws of nature. [Available at http://plato.stanford.edu/entries/laws-of-nature]

Cartwright, Nancy 1988a Truth does not explain much. In How the Laws of Physics Lie, Nancy Cartwright, 44–53. Oxford: Clarendon Press.

Cartwright, Nancy 1988b Do the laws of physics state the facts? In How the Laws of Physics Lie, Nancy Cartwright, 54–73. Oxford: Clarendon Press.

Cartwright, Nancy 1988c How the Laws of Physics Lie. Oxford: Clarendon Press.

Chomsky, Noam 1984 Lectures on Government and Binding. The Pisa Lectures. Dordrecht: Foris.

Clahsen, Harald, Monika Rothweiler, and Andreas Woest 1992 Regular and irregular inflection in the acquisition of German noun plurals. Cognition 45: 225–255.

Cole, Peter, and Gabriella Hermon 1998 Long distance reflexives in Singapore Malay: An apparent typological anomaly. Linguistic Typology 2: 57–77.

Cysouw, Michael 2005 What it means to be rare: The variability of person marking. In Linguistic Diversity and Language Theories, Zygmunt Frajzyngier, Adam Hodges and David S. Rood (eds.), 235–258. Amsterdam/Philadelphia: Benjamins.

Darden, Lindley 1992 Strategies for anomaly resolution. In Cognitive Models of Science, Ronald N. Giere (ed.), 251–273. Minneapolis: University of Minnesota Press.

Dehé, Nicole 2002 Particle Verbs in English. Syntax, Information Structure and Intonation. Amsterdam/Atlanta: Benjamins.

Dehé, Nicole, Ray Jackendoff, Andrew McIntyre, and Silke Urban (eds.) 2002 Verb-particle Explorations. Berlin/New York: Mouton de Gruyter.

É. Kiss, Katalin 1987 Configurationality in Hungarian. Budapest: Akadémiai Kiadó.

É. Kiss, Katalin 2002 The Syntax of Hungarian. Cambridge: Cambridge University Press.

Francis, Elaine J., and Laura A. Michaelis (eds.) 2003 Mismatch. Form-function Incongruity and the Architecture of Grammar. Stanford, CA: Center for the Study of Language and Information.

Haiman, John 1974 Concessives, conditionals, and verbs of volition. Foundations of Language 11: 341–359.

Hempel, Carl 1988 Provisoes: A problem concerning the inferential function of scientific theories. Erkenntnis 28: 147–164.

Hockett, Charles 1955 Manual of Phonology. Baltimore, MD: Indiana University Publications in Anthropological Linguistics, Memoir 11.

Huck, Geoffrey, and Almerindo E. Ojeda (eds.) 1987 Discontinuous Constituency. (Syntax and Semantics 20) Orlando, FL: Academic Press.

Jacobs, Roderick A., and Peter S. Rosenbaum 1968 English Transformational Grammar. Waltham, MA: Blaisdell.

Jacobson, Pauline 1987 Phrase structure, grammatical relations, and discontinuous constituents. In Discontinuous Constituency, Geoffrey Huck and Almerindo E. Ojeda (eds.), 27–69. (Syntax and Semantics 20) Orlando, FL: Academic Press.

Lakoff, George 1970 Irregularity in Syntax. New York/Chicago: Holt, Rinehart and Winston.

Langacker, Ronald W. 1995 Raising and transparency. Language 71 (1): 1–62.

Lasnik, Howard 1999 Minimalist Analysis. Malden, MA/Oxford: Blackwell.

Newmeyer, Frederick J. 2003 Theoretical implications of grammatical category-grammatical relation mismatches. In Mismatch. Form-function Incongruity and the Architecture of Grammar, Elaine J. Francis and Laura A. Michaelis (eds.), 149–178. Stanford, CA: Center for the Study of Language and Information.

Nordlinger, Rachel, and Louisa Sadler 2004 Nominal tense in crosslinguistic perspective. Language 80 (4): 776–806.

Onishi, Masayuki 2001 Non-canonically marked subjects and objects: Parameters and properties. In Non-canonical Marking of Subjects and Objects, Alexandra Y. Aikhenvald, R. M. W. Dixon and Masayuki Onishi (eds.), 1–51. Amsterdam/Philadelphia: Benjamins.

Plank, Frans 1981 Morphologische (Ir-)Regularitäten. Aspekte der Wortstrukturtheorie. Tübingen: Narr.

Polinsky, Maria 2003 Non-canonical agreement is canonical. Transactions of the Philological Society 101: 279–312.

Postal, Paul M. 1974 On Raising. One rule of English and its theoretical implications. Cambridge/London: MIT Press.

Sadock, Jerrold M. 1987 Discontinuity in autolexical and autosemantic syntax. In Discontinuous Constituency, Geoffrey Huck and Almerindo E. Ojeda (eds.), 283–301. (Syntax and Semantics 20) Orlando, FL: Academic Press.

Sadock, Jerrold M. 2003 Mismatches in autonomous modular versus derivational grammar. In Mismatch. Form-function Incongruity and the Architecture of Grammar, Elaine J. Francis and Laura A. Michaelis (eds.), 333–353. Stanford, CA: Center for the Study of Language and Information.

Sag, Ivan A. 1987 Grammatical hierarchy and linear precedence. In Discontinuous Constituency, Geoffrey Huck and Almerindo E. Ojeda (eds.), 283–301. (Syntax and Semantics 20) Orlando, FL: Academic Press.

Webelhuth, Gert 1995 Government and Binding Theory and the Minimalist Program. Oxford/Cambridge: Blackwell.

Classical loci for exceptions: morphology and the lexicon

Exceptions to stress and harmony in Turkish: co-phonologies or prespecification?

Barış Kabak and Irene Vogel

Abstract. We examine the nature of regularities and exceptions in Turkish vowel harmony and stress assignment and show how the use of lexical strata or co-phonologies fails to determine classes of exceptions or sub-regularities in a principled manner. Instead, we propose a single mechanism, lexical pre-specification, to handle exceptions in both processes and advance an analysis based on truncation to mark disharmonic root vowels while maintaining the phonological integrity of the vowels in question. We also show that pre-specification provides the most viable and unified treatment for all types of irregularly stressed roots in Turkish, rather than singling out specific categories such as place names.

1. Introduction

Exceptions within phonology have plagued all modern phonological theories. Indeed, they have been the impetus behind such major theoretical developments as Kiparsky's well-known Elsewhere Principle and the entire model of Lexical Phonology. Examination of the phonological phenomena that are considered exceptional reveals a variety of properties, ranging from isolated cases to fairly general sub-regularities, from forms that are considered atypical by native speakers to forms that are not found to be noteworthy in any way.

In this paper, we first briefly examine the various types of phonological exceptions as well as several of the mechanisms that have been proposed to handle them. We then focus on two well-known phenomena of Turkish, Vowel Harmony and Stress Assignment, and show how these quite regular phenomena are subject to specific types of exceptions. It is shown that previous types of analyses of these regular phenomena and their exceptions have serious drawbacks. An alternative proposal is advanced and shown to be simpler and more comprehensive than previous analyses.

2. Types of phonological exceptions and treatments

2.1. Overview of types of exceptions

It is possible to distinguish two broad groups of phonological exceptions in terms of whether they involve a) phonotactic constraints or b) (morpho-)phonological rules. Within each type, we can find exceptions that are fairly isolated cases and others that represent fairly widespread patterns.

With regard to phonotactic constraints, let us first consider several well-known cases in English. There is a fairly limited set of Yiddish borrowings that include initial clusters such as [ʃl-] and [ʃm-] (e.g. schlep, schmuck) which violate the phonotactic constraints of English. Those speakers who use the pronunciations in question are aware of their foreign nature and such forms do not appear to have any effect on the nature of English phonology per se. For example, speakers would still judge a nonce word such as *schlick unacceptable, as opposed to a word such as blick. Furthermore, many speakers ignore the violation and regularize the clusters to fit the phonotactics of English (i.e. [sl-] and [sm-]).

More interesting are cases involving the initial cluster [sf-] (e.g. sphinx, sphere). While such words are historically loan words of Greek origin, the typical native speaker of English is unaware of this. Aside from the somewhat uncommon spelling of [f] as "ph", such words are not felt to be exceptional. Unlike in the Yiddish case, however, we do not find English speakers regularizing such words, for example with [sp-] in place of [sf-] (i.e. *[spir] for sphere). It might thus seem that, while limited in distribution, the [sf-] onset does not actually constitute a phonotactic exception in English. Nevertheless, native English speakers typically reject nonce words with the same onset (e.g. *sfick as opposed to blick), and may have problems pronouncing the same cluster in a foreign language (e.g. Italian sfortuna 'misfortune').
To the extent that Vowel Harmony represents a phonotactic constraint on the possible vowel combinations in a word, so-called disharmonic words constitute phonotactic exceptions. Interestingly, in Turkish, such exceptions to Vowel Harmony range from a) words felt to be foreign and thus marginal to the system (e.g. randevu 'appointment', rötar 'delay', monitör 'monitor'), analogous to the Yiddish clusters in English, to b) words that are felt to be perfectly fine words of the language (e.g. radyo 'radio', gazete 'newspaper', siyah 'black'), even though similar nonce items might be rejected as potential words, analogous to the case of [sf-] in English (see Yavaş 1980 for a psycholinguistic investigation).

Exceptions with respect to Phonological Rule (P-Rule) application typically involve morphophonological alternations. As in the case of phonotactic exceptions, these exceptions may constitute isolated cases and be recognized as marginal, or they may represent more general patterns and not be recognized as being atypical. Certain irregular plurals are recognized as exceptions to the general rules that determine the surface form of the plural as [-z], [-s] or [-əz] (e.g. tooth/teeth vs. booth/booths). Like the case of the Yiddish clusters, such forms are felt to be external to the system of English. Thus, when making the plural of a nonce item such as looth, speakers will use the regular pattern (i.e. looths) as opposed to an irregular form (i.e. *leeth).

Other morphophonological exceptions, however, are not noted as such by native speakers. In a substantial number of words, a final /d/ is pronounced as [ʒ] when followed by the suffix [-ən] -ion (e.g. divi[ʒ]ion, corro[ʒ]ion, elu[ʒ]ion, inva[ʒ]ion). The phonetically quite similar suffix [-əv] -ive, however, causes the appearance of [s] in the same words (e.g. divi[s]ive, corro[s]ive, elu[s]ive, inva[s]ive). Both of these patterns are in a sense exceptional, since other suffixes that begin with schwa typically do not cause the /d/ to undergo any changes, for example [-ər] -er (e.g. divi[d]er, corro[d]er, etc.) or [-əbəl] -ible/-able (e.g. divi[d]able, corro[d]ible, etc.). There are, however, also several exceptions involving additional changes with [-əbəl] (e.g. divi[z]ible). Despite the various treatments of word-final /d/ when followed by different suffixes, speakers do not identify the forms in question as exceptions but rather consider them part of the basic system of English. Likewise, the loan suffix -istan in Turkish, which derives country names, contains vowels that are disharmonic, violating the vowel harmony rules of Turkish (e.g., Türkmen-istan 'Turkmenistan', Hind-istan 'India', Macar-istan 'Hungary').
The suffix, however, is felt to be perfectly regular by native speakers, as evinced especially by the fact that it can be used productively to create made-up country names (e.g., hayvan-istan 'the country of animals'; çocuk-istan 'the country of children').

2.2. Overview of treatment of exceptions

In modern phonological theories, a number of proposals have been advanced to account for the fact that in addition to the general patterns of a language, there tend to be not only sporadic exceptional items but classes of items that exhibit their own sets of patterns that are systematically different from those of the core phonology. It is possible to distinguish two general approaches to treating exceptions: a) those that simply mark individual items in some way as foreign (e.g. SPE) and thus exempt them from the regularities of the language and b) those that attempt to enrich the structure of the phonology of the language to recognize and formalize the role of exceptions in the make-up of the language

as a whole (e.g. Lexical Phonology). The problem with the former is that, while some exceptions are felt to be marginal and excluding them from the phonological system of the language seems reasonable (e.g. the Yiddish borrowings), simply excluding the items in question is not a viable option where more widespread patterns exist and where these are not interpreted as foreign by native speakers.

In the SPE model of phonology, in addition to exempting individual items from undergoing phonological rules, more general structural mechanisms were available for handling exceptions. By using abstract underlying representations (URs), it was possible to distinguish items based on differences in UR such that a particular rule (or rules) would apply or fail to apply in order to arrive at the appropriate surface form. In addition, to account for the fact that certain P-Rules might apply only in certain morphophonological contexts, different types of boundary symbols were used, most notably + and #. Such mechanisms, however, were criticized as being merely diacritic notations. Attempts were thus made to a) constrain the nature of URs and b) eliminate the arbitrariness of boundary symbols. The latter in particular gave rise to Lexical Phonology, in which different P-Rules were associated with different levels of representation; roughly, the + and # boundary phenomena were located on Levels 1 and 2, respectively. It was still necessary to stipulate at which level a given rule was operative; however, the levels themselves imposed a principled ordering relationship among the sets of rules applying at the different levels. In addition to the claim that irregularities were relegated to Level 1, it was noted that a strong correlation existed between the Latinate and Germanic components of English and Levels 1 and 2, respectively.
While the rules of Level 2 (and possibly beyond, depending on the number of levels in the model) were the ones considered synchronically productive, it could also be seen that many of the rules of Level 1 were quite productive, but only among the Latinate structures. Thus, the P-Rules of Level 2 could be interpreted as characterizing the basics of English phonology, while those of Level 1 could be seen as constituting a more exceptional component of the phonology, albeit a substantial one. Analogous situations were also identified in other languages in which the different levels in Lexical Phonology appear to correspond, at least roughly, to phenomena with different linguistic origins (e.g. Sanskrit and Dravidian in Malayalam; cf. Mohanan 1986).

While the ordering relationships among phonological forms and phenomena inherent in Lexical Phonology provided a number of important insights, they also introduced their own problems. One problem was that it was not clear how many levels were needed and on what grounds they were established. Furthermore, any attempt at ordering phenomena led to the well-known ordering

paradoxes.

Co-phonologies (cf. Inkelas 1998; Anttila 2002; Inkelas and Orgun 2003) were introduced to account for the fact that different phonological phenomena may apply to different items in a language without, however, requiring that the items be relegated to different levels in the lexicon. In fact, while certain words might behave in a similar fashion with respect to one rule, they might not share the same property with respect to another rule. For example, in Turkish, certain place names might exhibit an atypical stress pattern and thus be placed in a given co-phonology for this purpose (cf. Inkelas 1999; Inkelas and Orgun 2003). At the same time, however, these same words might behave regularly with respect to other phenomena such as Vowel Harmony. By associating the appropriate words with a particular co-phonology, no assumptions are made with respect to other phenomena, whereas in Lexical Phonology the levels correspond to clusters of phenomena. The main problem, however, is that it is unclear how many co-phonologies (like lexical levels) a language may have, and on what grounds they are established.

Finally, within Optimality Theory, exceptions are typically addressed either in terms of the nature of the representation or the ordering of constraints in regard to the exceptional items. In the former case, a particular property of an item is specified in such a way that it is protected by a Faithfulness Constraint, with the result that the more typical form appears to be a less highly valued option. In the latter case, a particular lexical item may be specified for a special ordering of constraints, either in terms of a re-ranking of certain constraints or in terms of observation of a lexically specific constraint. Both of these possibilities treat exceptions as isolated items.
While this might be appropriate for such cases as the Yiddish clusters in English, it fails to capture the fact that certain exceptions are found in groups of items, an insight inherent in both the Lexical Phonology and co-phonology approaches. Only to the extent that phonological patterns are associated with specific types of morphological junctures can OT account for at least some general patterns of exceptions, via the use of Alignment Constraints (cf. McCarthy and Prince 1993).

2.3. How different are the approaches to exceptions?

Despite the apparent differences among the previous approaches to exceptions, they all share several fundamental properties. That is, they all crucially rely on prespecification of a) particular features in the underlying representation of a lexical item or b) a property of the item that causes it to exhibit atypical behavior with regard to the general rules (or constraints) of the language. While the SPE approach used diacritics, this is not all that different from the use of item-specific rewrite rules (e.g. tooth + pl → teeth) and the application of the

Elsewhere Condition in Lexical Phonology. Similarly, in OT the specification of particular features in a UR and the related Faithfulness constraints protect a given atypical surface form. In each case, individual lexical items are specified with whatever properties are needed so that they do not participate in what are considered the general patterns/phenomena of the language.

Both Lexical Phonology and co-phonologies attempt, furthermore, to address the fact that certain exceptions are not isolated cases but constitute patterns themselves. The use of diacritics is not avoided, however, since they are required to indicate the location of items in their appropriate portion of the phonological system. These diacritics do not refer to the nature of the UR and thus do not provide information about the nature of the exceptionality; they merely serve as a type of tag indicating which set of rules is applicable. Furthermore, it has been pointed out (e.g., Inkelas, Orgun, and Zoll 1996, 1997, 2004) that establishing co-phonologies for capturing static regularities in grammar leads to the proliferation of co-phonologies, and similar objections to the proliferation of levels were raised in regard to Lexical Phonology.

In OT, too, the mechanism for capturing exceptions and static patterns is prespecification in the context of the richness of the base (McCarthy 1995; Prince and Smolensky 1993; Smolensky 1996). Lexicon Optimization (Prince and Smolensky 1993) ensures that underlying representations are posited such that the fewest highly ranked constraints are violated in the determination of surface forms. That is, it is essentially the surface forms of morphemes that determine their underlying specification.
Thus, for structures which exceptionally fail to show the expected alternations and instead maintain a constant surface form, Lexicon Optimization will naturally choose them to be stored underlyingly in their surface forms, that is, to be prespecified (Inkelas, Orgun, and Zoll 2004: 549).

3. Two types of phonological exceptions and treatments

We now consider the treatment of exceptions to two general phonological phenomena in Turkish: Vowel Harmony (VH) and (final) Stress Assignment (SA). We will show how previous analyses have required multiple mechanisms, including the use of diacritic specifications, and we will evaluate the consequences of using multiple mechanisms for handling phonological exceptions. This evaluation leads to a proposal that a single mechanism of lexical prespecification of the exceptional behavior provides the simplest and the most efficient way to handle exceptions.

3.1. Vowel harmony

The Turkish vowel system comprises eight vowels with symmetrical high-low, front-back, and round vs. non-round oppositions, as seen in Table 1 (see footnote 1).
Table 1. Turkish vowel phonemes.

          Front                    Back
          unrounded   rounded      unrounded   rounded
High      i           ü            ɨ           u
Low       e           ö            a           o

These vowels may only combine in certain ways, exhibiting Vowel Harmony (VH), within a specific domain, roughly a non-compound word. Abstracting away from theoretical differences with respect to the nature of the phonological features assumed for the Turkish vowel system, it is generally assumed that there are two patterns of VH: (i) palatal harmony, which requires that all vowels agree in terms of the frontness-backness dimension, and (ii) labial harmony, which requires that high vowels agree with the preceding vowel in terms of roundness (e.g., Kardeştuncer 1982; Clements and Sezer 1982; van der Hulst and van de Weijer 1991; Polgárdi 1999; Kaye, Lowenstamm and Vergnaud 1985). It can be seen that VH applies within roots, as in (1), as well as in morphologically complex words, as in (2).

(1) Vowel Harmony in Roots
    a. barɨş 'peace'
    b. yedi 'seven'
    c. soğuk 'cold'
    d. kötü 'bad'

1. In this paper, we mostly use the general conventions of Turkish orthography in Turkish examples. Accordingly, ü and ö represent high and low front rounded vowels, respectively. Instead of the orthographic ı, a barred-i (ɨ) is used for the high back unrounded vowel (IPA: [ɯ]) to avoid possible confusion with the symbol i, which represents the high front unrounded vowel. The symbol ş represents the voiceless palato-alveolar fricative (IPA: [ʃ]); y the palatal glide (IPA: [j]); ç and c indicate voiceless and voiced palatal affricates (IPA: [tʃ] and [dʒ]), respectively. The letter known as soft-g (ğ), which corresponds to a voiced velar fricative in Anatolian dialects, is argued not to be produced in standard Istanbul Turkish, but to hold an empty consonantal position in the underlying representation.

(2) Vowel Harmony in Morphologically Complex Words

    Nom Sing         Acc Sing    Nom Plural    Acc Plural
    kɨl 'hair'       kɨl-ɨ       kɨl-lar       kɨl-lar-ɨ
    il 'city'        il-i        il-ler        il-ler-i
    kuş 'bird'       kuş-u       kuş-lar       kuş-lar-ɨ
    süt 'milk'       süt-ü       süt-ler       süt-ler-i
    kat 'floor'      kat-ɨ       kat-lar       kat-lar-ɨ
    tel 'wire'       tel-i       tel-ler       tel-ler-i
    yol 'road'       yol-u       yol-lar       yol-lar-ɨ
    çöl 'desert'     çöl-ü       çöl-ler       çöl-ler-i
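The suffix alternations in (2) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the feature table, the function names, and the restriction to consonant-final stems are ours, and the high back unrounded vowel is written ɨ as in the examples.

```python
# Illustrative feature table: vowel -> (front, round, high).
VOWELS = {
    "i": (True, False, True),   "ü": (True, True, True),
    "e": (True, False, False),  "ö": (True, True, False),
    "ɨ": (False, False, True),  "u": (False, True, True),
    "a": (False, False, False), "o": (False, True, False),
}

def last_vowel(form):
    """Return the last vowel of a form, i.e. the harmony trigger."""
    for ch in reversed(form):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel in {form!r}")

def high_suffix_vowel(stem):
    """A high suffix vowel agrees in frontness and roundness (palatal + labial harmony)."""
    front, rnd, _ = VOWELS[last_vowel(stem)]
    if front:
        return "ü" if rnd else "i"
    return "u" if rnd else "ɨ"

def low_suffix_vowel(stem):
    """A low suffix vowel agrees in frontness only (palatal harmony)."""
    front, _, _ = VOWELS[last_vowel(stem)]
    return "e" if front else "a"

def plural(stem):
    """Plural -lar/-ler."""
    return f"{stem}-l{low_suffix_vowel(stem)}r"

def accusative(stem):
    """Accusative with a high vowel (consonant-final stems only in this sketch)."""
    return f"{stem}-{high_suffix_vowel(stem)}"

for root in ["kɨl", "il", "kuş", "süt", "kat", "tel", "yol", "çöl"]:
    print(root, accusative(root), plural(root), accusative(plural(root)))
```

Because each suffix looks only at the nearest preceding vowel, the accusative after the plural depends on the plural vowel alone, which is why kuş-lar-ɨ and süt-ler-i lose the rounding seen in kuş-u and süt-ü.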

Despite such regularities, certain exceptions also exist, as detailed below.

3.1.1. Disharmonic roots: a subregular pattern?

Clements and Sezer (1982) observe that vowels from the set /i, e, a, o, u/ may combine freely in roots, leading to the violation of palatal and labial harmony, as in (3).

(3) Root disharmony                     Violations
    a. sa:hip   'owner'                 Palatal Harmony
    b. polis    'police'                Palatal and Labial Harmony
    c. büro     'office'                Palatal and Labial Harmony
    d. pilot    'pilot'                 Palatal and Labial Harmony

In all of these items, palatal harmony is violated. Labial harmony is additionally violated in (3b), where we would expect the non-initial vowel to be round since it is high, and in (3c, d), where the non-initial vowels are rounded although they are low. By contrast, vowels from the set /ü, ö, ɨ/ have been observed not to occur in disharmonic roots (e.g., Clements and Sezer 1982), although several words, all of which are borrowings, contain a combination of /i/ and /ü/ (e.g., virüs 'virus', ümit 'hope'). Goldsmith (1990: 304-309) points out that the first group, /i, e, a, o, u/, coincides with the cross-linguistically favoured five-vowel system, where the specification for the frontness/backness dimension (i.e., [back] according to Goldsmith) is fully predictable from rounding. According to Goldsmith, [back] is underspecified at this level, leading these five vowels to combine freely within a root without violating frontness-backness harmony. The specification of [back] is required, however, when the root contains /ü/, /ö/, or /ɨ/, since these vowels involve marked combinations of frontness and rounding.
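The violations listed in (3) can be checked mechanically. The following is a minimal sketch, not part of any of the analyses discussed here; the feature table and the function name `harmony_violations` are illustrative assumptions. It compares each vowel with the preceding one and reports a labial violation both for a high vowel that disagrees in rounding and for a non-initial low vowel that is round, as described above.

```python
# Illustrative feature table: vowel -> (front, round, high).
VOWELS = {
    "i": (True, False, True),   "ü": (True, True, True),
    "e": (True, False, False),  "ö": (True, True, False),
    "ɨ": (False, False, True),  "u": (False, True, True),
    "a": (False, False, False), "o": (False, True, False),
}

def harmony_violations(root):
    """Return the set of harmony types violated somewhere in the root."""
    vowels = [ch for ch in root if ch in VOWELS]  # consonants and ':' are skipped
    violations = set()
    for prev, cur in zip(vowels, vowels[1:]):
        p_front, p_round, _ = VOWELS[prev]
        c_front, c_round, c_high = VOWELS[cur]
        if p_front != c_front:
            violations.add("palatal")
        if c_high and c_round != p_round:
            violations.add("labial")   # high vowel fails to agree in rounding
        if not c_high and c_round:
            violations.add("labial")   # non-initial low vowel is round
    return violations

for root in ["sa:hip", "polis", "büro", "pilot", "barɨş", "kötü"]:
    print(root, sorted(harmony_violations(root)) or "harmonic")
```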

As Clements and Sezer (1982) observe, furthermore, these vowels fail to surface with other vowels in disharmonic roots since they tend to be regularized in such cases, as illustrated in (4).

(4) a. mersörize    merserize    'mercerized'
    b. kupür        küpür        'clipping'
    c. bisküvit     büsküvüt     'biscuit'
    d. püro         puro         'cigar'
    e. komünist     kominist     'communist'

Although Clements and Sezer (1982) note such regularizations, they nevertheless suggest that Vowel Harmony is not an active process in roots. The opposite conclusion is drawn by others such as van der Hulst and van de Weijer (1991), who argue that since VH is independently required for suffixes, harmonic roots get a free ride with regard to the harmony rules. In addition, they require a set of restrictions on possible vowel combinations permitted in disharmonic roots, which would presumably be part of the (phonological) grammar of Turkish speakers:

(5) Restrictions on Disharmonic Roots:
    a. /ɨ/ does not occur disharmonically.
    b. /ü/ and /ö/ do not occur with back vowels.
    c. Non-initial syllables do not contain round vowels.

A similar set of restrictions on disharmonic roots is incorporated into Polgárdi's (1999) Optimality Theoretic analysis. As will be shown below, however, both of these approaches have shortcomings in that they a) require underlying specification of exceptionality on already exceptional roots, and b) result in a substantial number of incorrect predictions.

3.1.2. Previous accounts of vowel harmony exceptions

Van der Hulst and van de Weijer (1991) assume unary vowel components (primitives) that either regularly extend over the word domain, or are linked to specific vowel (V) positions. Accordingly, Turkish vowels are assumed to occupy a V position, and are classified in terms of L(ow), F(ront), and R(ound). V positions with no further information are interpreted as high and back, presumably yielding the unmarked vowel in Turkish. The combinations of R and F with V or the specification of L provide the remaining 7 vowel phonemes in Turkish. For example, the vowel /ö/ exhibits the maximum number of components (i.e., L, F, and R), a reflection of the marked status of the vowel. Vowel Harmony is

essentially viewed as the process of associating a vowel primitive to every V or L following it in the relevant domain, as illustrated in (6). Since low rounded vowels are unattested in non-initial position, an additional condition is posited to ban the association of R to a non-initial L. If no F or R element is available to be associated to subsequent positions, the regular patterns surface as illustrated in (7).

(6) [Autosegmental diagrams of F and R association; not recoverable from the source.]

(7) V L    /ɨ a/    gɨrtlak 'throat'
    L V    /a ɨ/    altɨ 'six'
    L L    /a a/    kara 'black'
    V V    /ɨ ɨ/    kɨsɨm 'part'

Disharmonic roots, by contrast, may not contain a bare V position, which would yield /ɨ/ in violation of the generalization in (5a) above, or a V position with two specified elements, which would result in the violation of (5b), as shown in (8). Thus, the disharmonic roots containing the combinations /i-o/, /e-u/, /e-o/, and /i-u/, in either order, contain V positions with only one specified element.

(8) Properties of disharmonic roots (after van der Hulst and van de Weijer 1991)
    a. Disharmonic roots do not contain bare V.
       Examples: */e ɨ/, */ɨ o/ [association diagrams not reproduced]
    b. Disharmonic roots do not contain Vs with two prosodies.
       Examples: */ü a/, */u ü/ [association diagrams not reproduced]

The same claims are made within an OT framework by Polgárdi (1999). The constraint HARMONY essentially associates every vowel element (A = Low, U = Round, I = Front) to a vowel position. An additional constraint, *LICENSE (A, U), ensures that combinations of the elements A (Low) and U (Round) are only licensed in initial position. A third constraint, *MULTIPLE (α), conflicts with HARMONY, banning the multiple association of elements. Since A (Low) cannot spread in Turkish, *MULTIPLE (A) must outrank HARMONY and the more general *MULTIPLE (α) constraint. Polgárdi (1999) blocks VH in roots by assuming the DEC (Derived Environment Constraint), which restricts harmony to derived environments. Specifically, by ranking the DEC above constraints that require the association of elements (i.e., HARMONY), VH will only be observed in derived environments.

Beyond using the DEC to block harmony in roots, Polgárdi employs another set of constraints to account for the additional generalizations regarding disharmonic roots given above in (5), to ensure that impossible disharmonic roots do not surface. The constraint *ELEMENTS prohibits the presence of elements (prosodies such as I, U, A), with the effect of banning /ü/ and /ö/ in disharmonic roots. Similarly, /ɨ/ is prohibited in disharmonic roots by FILL, which avoids empty positions. Finally, *LICENSE (I, U) ensures that the combination of I and U (i.e., /ü/) is only licensed by multiple association of I, which serves to ban the combination of /ü/ and /ö/ with back vowels. The tableau in (9) illustrates how these constraints regularize an ungrammatical disharmonic form containing /ü-ɨ/ to a form with [ü-i]. In the tableau, A, U, and I stand for Low, Round, and Front, respectively, and v indicates the absence of a particular element. Each vowel is represented by the elements placed on top of one another (e.g., [v I U] = /ü/; [v v v] = /ɨ/; [A v v] = /a/).
Spreading is indicated by a dedicated symbol (not reproduced here), and the large dot indicates the target that is affected by the spreading element. Three dots between the vowels show that there can be an intervening segment (e.g., a consonant).

(9) Regularization of ungrammatical disharmonic root /ü … ɨ/ (adapted from Polgárdi 1999: 196)

[Tableau not recoverable from the source. Constraint columns, left to right: *ELEMENTS, LIC (I, U), FILL, DEC, HARMONY, *MULT (α); candidates (a)-(e).]
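The element notation used in the tableau can be made concrete with a small sketch. The decomposition below follows the definitions A = Low, U = Round, I = Front given in the text; the counting functions are our own simplification and are only meant to show how a vowel sequence accumulates elements (relevant to *ELEMENTS) and empty positions (relevant to FILL). They ignore spreading, so an element shared by two positions through multiple association would be counted twice here.

```python
# Element make-up of the eight vowels (A = Low, U = Round, I = Front).
ELEMENTS = {
    "ɨ": frozenset(),        # [v v v]: a bare (empty) position
    "a": frozenset("A"),     # [A v v]
    "i": frozenset("I"),
    "u": frozenset("U"),
    "e": frozenset("AI"),
    "o": frozenset("AU"),
    "ü": frozenset("IU"),    # [v I U]
    "ö": frozenset("AIU"),
}

def vowel_elements(form):
    """Element sets of the vowels in a form; other symbols are ignored."""
    return [ELEMENTS[ch] for ch in form if ch in ELEMENTS]

def count_elements(form):
    """Total number of elements in the form (simplified: no spreading)."""
    return sum(len(elems) for elems in vowel_elements(form))

def count_bare_positions(form):
    """Number of vowel positions without any element, i.e. /ɨ/."""
    return sum(1 for elems in vowel_elements(form) if not elems)

print(count_elements("ü-a"))          # I, U and A
print(count_bare_positions("ü-ɨ"))    # the second position is empty
```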

Finally, a recent approach to exceptions in phonology, the introduction of co-phonologies that are intended to represent sub-regularities in the lexicon, can also be applied to the case of VH exceptions in Turkish. In particular, to capture Goldsmith's (1990) observation that /i, e, a, o, u/ may freely combine in disharmonic roots, different sets of roots would need to be relegated to at least the two co-phonologies outlined in (10). In addition, it would still be necessary to resort to lexical specification to account for exceptions to these co-phonologies, as presented in (11).

(10) Co-phonology A: This co-phonology is subject to the feature values imposed by the 5-vowel system, where vowels are underspecified for the frontness-backness dimension. The members of this co-phonology are roots that contain any of the vowels from the set /i, e, a, o, u/, which are seemingly harmonic with respect to palatal harmony, but otherwise disharmonic in the context of the 8-vowel system.
     Examples: kitap 'book', kalem 'pencil', etc.

     Co-phonology B: This co-phonology is subject to the feature values imposed by the 8-vowel system, where vowels must be specified for the frontness-backness dimension, rounding, and height. The members of this co-phonology are roots that obey palatal and rounding harmony.
     Examples: balɨk 'fish', ekşi 'sour', konu 'topic', etc.

(11) Lexical specification: Vowels in roots that belong to neither of the above co-phonologies must be associated with lexically specified feature values.
     Examples: kɨrlent 'pillow'; büro 'office', etc.
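To make the division of labour in (10) and (11) explicit, the sketch below assigns a root to Co-phonology B if it is fully harmonic, to Co-phonology A if it is disharmonic but built only from /i, e, a, o, u/, and to lexical specification otherwise. The function names and the feature table are illustrative assumptions, and the order of the checks is our reading of (10), not a claim about any published implementation.

```python
# Illustrative feature table: vowel -> (front, round, high).
VOWELS = {
    "i": (True, False, True),   "ü": (True, True, True),
    "e": (True, False, False),  "ö": (True, True, False),
    "ɨ": (False, False, True),  "u": (False, True, True),
    "a": (False, False, False), "o": (False, True, False),
}
FIVE_VOWEL_SET = set("ieaou")

def root_vowels(root):
    return [ch for ch in root if ch in VOWELS]

def is_harmonic(root):
    """True if the root obeys palatal and labial harmony throughout."""
    vowels = root_vowels(root)
    for prev, cur in zip(vowels, vowels[1:]):
        p_front, p_round, _ = VOWELS[prev]
        c_front, c_round, c_high = VOWELS[cur]
        if p_front != c_front:
            return False              # palatal harmony violated
        if c_high and c_round != p_round:
            return False              # high vowel disagrees in rounding
        if not c_high and c_round:
            return False              # non-initial low round vowel
    return True

def classify(root):
    """Assign a root to Co-phonology B, Co-phonology A, or lexical specification."""
    if is_harmonic(root):
        return "co-phonology B"
    if all(v in FIVE_VOWEL_SET for v in root_vowels(root)):
        return "co-phonology A"
    return "lexical specification"

for root in ["kitap", "kalem", "balɨk", "ekşi", "konu", "kɨrlent", "büro"]:
    print(root, "->", classify(root))
```

Even in this toy form, three separate outcomes are needed for a single phenomenon, which is the kind of multiplicity of mechanisms at issue here.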

3.1.3. Drawbacks of previous treatments of vowel harmony exceptions

One problem with the above analyses is that they do not crucially distinguish between harmonic roots and those disharmonic roots that fail to obey the properties given above in (8). That is, both types of roots may contain a) V positions with two elements (e.g. /ü-ü/ and /ö-ü/), as in the examples in (6), and b) bare V positions, as in (7). Nothing in the representation of the disharmonic cases indicates the nature of their exceptionality, so some type of additional marking is required to distinguish the two categories of roots.

A second type of problem is the existence of disharmonic items with sequences that violate the generalization in (8b), some of which are quite frequent (e.g., virüs 'virus', külah 'cone', mahküm 'convict', rötar 'delay', büro 'office', etc.). All such items (i.e. those with combinations of /ü-i/, /i-ü/, /ü-a/, /a-ü/, /ü-o/, /o-ü/, /ö-a/, and /a-ö/) contain Vs with two prosodies. In addition, there are several disharmonic roots with bare V positions, in violation of the generalization given in (8a). These include many commonly used borrowings in which an epenthetic vowel is introduced to break up onset clusters that are otherwise illicit in Turkish (e.g., [kɨrem] from cream; [kɨredi] from credit), as well as those borrowings where certain vowels were replaced with [ɨ] for various reasons during the process of nativization (e.g., [kɨrlent] 'pillow' from ghirlanda (Italian); [kɨdem] 'rank' from kidem (Arabic)). Such cases would need some idiosyncratic marking to allow them to surface as words of Turkish.

The problem is not resolved by the OT treatment, as Polgárdi's (1999) constraint ranking in (9) runs into the same difficulty. As can be seen in the tableau,

the sequence /ü-a/ in (9d) incurs multiple violations of the unranked constraint *ELEMENTS because the sequence contains three prosodies. Likewise, all the other sequences noted above also incur multiple violations of this constraint, incorrectly predicting that such sequences should fail to appear in the surface forms of disharmonic roots in Turkish. Furthermore, the constraint FILL always militates against disharmonic roots with bare V positions, thus incorrectly disallowing the combination of [ɨ] with vowels other than [a] from surfacing in the Turkish lexicon. In fact, due to recent borrowings, Turkish is replete with such words where the vowel [ɨ] is the result of epenthesis to break up onset consonant clusters, as mentioned above. In spoken Turkish, these vowels are typically produced, although they are not always specified in the spellings in current dictionaries.

The co-phonology approach entertained above also encounters some substantial problems. As argued by Inkelas, Orgun and Zoll (1996, 1997, 2004), as soon as a grammar divides morphemes into distinct classes based on some detectable pattern, it permits the proliferation of (potentially uninteresting) co-phonologies. One might, however, invoke the concept of statistical significance here, particularly to restrict morpheme-specific co-phonologies. A quantitative approach might support Co-phonology A in (10), since there are many items that follow this pattern, while less frequent patterns would simply need to be handled by lexical prespecification of some sort. Inkelas, Orgun and Zoll point out, furthermore, that even statistically significant patterns might nevertheless not be of phonological importance in a language. Another way to limit co-phonologies suggested by Inkelas et al. is to restrict their nature, for example, by allowing them to involve only the non-native portion of the lexicon of a language (cf. Ito and Mester 1993, 1995).
This would not solve the problem in Turkish, however, since there is no obvious way to separate the vocabulary into native vs. non-native categories, except possibly in the case of certain very recent borrowings (see footnote 2). Furthermore, such groupings would not result in a clear-cut distinction between disharmonic and harmonic roots, and in fact, there exist certain non-native words that are harmonic (e.g., lise 'lycée') as well as native Turkish words that are disharmonic (e.g., anne 'mother').

Despite differences in their approach, all of the previous models have in common that their treatment of VH requires multiple types of idiosyncratic marking: (a) the specification of a particular class of disharmonic words as being
2. See Lightner (1972) for an attempt to distinguish non-native words from native ones in the Turkish lexicon on a number of phonological properties. The results do not, however, provide the distinctions needed here.

subject to the observed sub-regularity, and (b) a lexical specification of words that violate the sub-regularity. As we will see below, a similar degree of complexity arises in previous treatments of irregular stress patterns in Turkish.

3.2. Stress

It is well known that Turkish regularly places primary stress on the final syllable of a word, whether this is a root or a combination of root and suffixes (e.g., Lees 1961; Lewis 1967; Underhill 1967; Inkelas 1996; Kabak and Vogel 2001), as illustrated in (12).

(12) Regular Word-Final Stress
     a. kedí 'cat'
     b. kedi-lér 'cats'
     c. kedi-ler-ím 'my cats'
     d. kedi-ler-im-íz 'our cats'
     e. kedi-ler-im-iz-ín 'of our cats'

Despite this regularity, however, exceptions are also observed. While irregular stress exists in both roots and morphologically complex items, the focus here is on irregular stress in roots, illustrated in (13). The reader is referred to Kabak and Vogel (2001, 2005) for a discussion of other types of stress irregularities.

(13) Irregular Stress: exceptional lexical items
     a. Edírne
     b. Kastámonu
     c. Üskǘdar
     d. tiyátro 'theatre'
     e. fabríka 'factory'
     f. négatif 'negative'

It has been widely noted that a number of words exhibit irregular stress, following a quantity-sensitive pattern similar to the Latin Stress Rule, often referred to as the Sezer Stress Rule (cf. Sezer 1981; Kaisse 1985; Barker 1989; Çakır 2000, among others). These words are typically, though not exclusively, native or foreign place names, personal names and other loan words. According to the Sezer Stress Rule (SSR), in these irregularly stressed words stress falls on the antepenultimate syllable if it is heavy and the penultimate syllable is light; otherwise it falls on the penultimate syllable. It is easily seen, however, that such a generalization does not do justice to the facts of irregular root stress in Turkish, since a variety of irregularly stressed words fail to follow the SSR: (13b) and (13f) are stressed on the antepenultimate rather than the predicted penultimate syllable; (13c) and (13e) are stressed on the penultimate syllable rather than the predicted antepenultimate syllable.

Recently, the argument has been advanced that the SSR is productive exclusively in place names (Inkelas and Orgun 1998; Inkelas 1999; Inkelas and Orgun 2003). It is argued, furthermore, that derived words which do not themselves follow the SSR exhibit this pattern when used as place names (e.g., ulús 'nation' vs. Úlus (place name); sirke-cí 'vinegar seller' vs. Sírkeci (place name); kulak-sɨ́z 'without ear' vs. Kuláksɨz (place name)). It should be noted, however, that several exceptions to this generalization exist. As noted above, place names such as in (13b) and (13c) fail to conform to this pattern. In addition, Demircan (1976) lists several morphologically complex place names with final stress that thus also fail to exhibit the SSR (e.g. Adalár, Yɨldɨrán, Savar, Arabacɨlár, Yumurtalɨ́k, Değirmencí, etc.). There are also other morphologically complex place names with non-final stress where the SSR is violated in other ways (e.g., Ármağanlɨ, Óklavalɨ, Semreci, etc.). Furthermore, numerous place names follow the regular compound stress rule, where the stress is retained on the first member of the compound (e.g., Fenérbahçe; Kadɨ́-köy; Kɨrɨ́k-kale; see footnote 3). Finally, it should also be noted that there is considerable variation with respect to stress placement in certain place names (e.g., Sö́ylemez ~ Söyleméz; Emírgan ~ Emirgán; Bálaban ~ Balában ~ Balabán; Éğridir ~ Eğrídir; see Demircan 1976: 410 for further examples), although it

3. For completeness, it should be added that there are also compounds that are stressed on the nal word of the whole compound construction in Turkish (e.g., akar+yak-t (ow-Aor+fuel) fuel oil, imam+bay1l-d- (imam+faint-Past) a pot roast of lamb with eggplant puree; uyu-r+gez-r (sleep-Aor+wander-Aor) sleepwalker; gk+del-n (sky+pierce-Rel) sky-scraper; bilgi+say-r (information+count-Aor computer ). The same pattern is also manifested in several proper names (Son+g l (last+rose); Gl+y (rose+moon); Bin+nz (thousand+caprice)), as well as a few place names (e.g., Fener+bah (lantern+garden) that are formed through compounding. In the dialect of the rst author, a native speaker of Istanbul Turkish, however, some of these compounds, except for proper names, follow the regular compound stress pattern (e.g., imm+bay1ld1; uyr-gezer, Fenr+bahe). The fact that stress appears on the last syllable of the rightmost word in certain compound formations, especially in the case of proper name, suggests that these constructions have been grammaticalized as simplex nouns, and are no longer analyzed as compounds in the synchronic grammar. The distinction between compounds that are stressed on the leftmost word vs. the rightmost element also seems to be grounded in the morphosyntactic and semantic properties of compounds (e.g., endocentricity vs. exocentricity; see Demircan 1996: 147148 for details). The discussion of these properties, however, is outside the scope of this paper.

Exceptions to stress and harmony in Turkish: co-phonologies or prespecication?

75

is not clear whether this is due to dialectal, or even idiolectal, variation. While some of these variants follow the SSR, clearly others do not. 3.2.1. Stress irregularities and co-phonologies In several recent analyses of Turkish stress, it has been proposed that the different irregular stress patterns, in particular those found in place names, be treated as separate co-phonologies, beyond the general co-phonology that accounts for the regular stress pattern (cf. among others Inkelas, Orgun, and Zoll 1997; Inkelas 1999; Inkelas and Orgun 2003). In such proposals, roots as well morphologically complex place names are grouped together with regard to the SSR. Several problems are immediately apparent, however, with such an approach. First, as was noted, irregular stress is not limited to place names, so treating place names as distinct from other items requires additional mechanisms for the latter, and creates the expectation that the different types of words are crucially distinct in some way. Second, the problem of the proliferation of cophonologies arises with regard to place names. As was also noted, while some irregularly stressed place names exhibit the SSR, some show other irregular stress patterns, while a considerable amount of place names actually follow the regular Turkish stress patterns (e.g., nal stress and compound stress). This suggests that lexical items that represent place names require a multiplicity of cophonologies, minimally, one for the regular stress pattern, one for SSR and another one for place names that do not t into either of these two co-phonologies. In fact, examination of the various patterns discussed in Inkelas and Orgun (2003) reveals seven distinct categories of place names with regard to their stress assignment (cf. Kabak and Vogel 2001). 
If each of these categories is permitted to constitute a separate co-phonology, we would expect that the introduction of a new item that does not fit any of these patterns would then also be allowed to form yet another co-phonology. It is not clear on what grounds one may establish a co-phonology, and if certain restrictions are imposed, what would happen to forms that fall outside the co-phonologies that are established in accordance with the restrictions. The obvious solution would be to permit one co-phonology that is a type of catch-all for any items that are not otherwise accounted for. In this case, it would be necessary to simply specify the location of stress individually for all of the forms in this overflow co-phonology. The question such an approach raises is whether it makes sense for a grammar to invoke a variety of different mechanisms, including co-phonologies and whatever means are required to a) identify the items that are associated with each co-phonology and b) ensure that the necessary phonological rules operate in the appropriate order in each co-phonology.


A third, and more general, concern regarding the co-phonology approach is what it implies about the nature of the grammar itself. It has been suggested that the SSR is psychologically real and operates productively in Turkish (cf. Inkelas and Orgun 2003). Thus, we might interpret the presence of a given co-phonology as an indication that the relevant portion of the grammar is synchronically active in the language. It is also claimed that speakers follow this pattern in stressing new place names, although this claim has not been substantiated with experimental data. Furthermore, based on an examination of the 948 irregularly stressed place names listed in the TELL database (= Turkish Electronic Living Lexicon: http://ist-socrates.berkeley.edu:7037/TELLhome.html), Kabak and Vogel (2005) determine that only 19% of the items unquestionably conform to the Sezer Stress pattern. The other irregular stress patterns are represented by even smaller percentages. While numerical data are not necessarily indicative of productivity (or its absence), it is also the case that the informal observations of the first author do not support the suggestion that native speakers of Turkish regularly apply the Sezer stress pattern to new place names. Thus, if the existence of a Sezer co-phonology implies that this stress rule actively determines the pronunciation of place names in Turkish, such a claim is at best questionable without experimental data to support it. Note that speakers might generalize the fact that place names simply have non-final stress, which may not necessarily correspond to the SSR, and apply this to novel place names via analogy to the existing ones. In addition, the introduction of a Sezer co-phonology into the grammar of Turkish, through which all place names must pass, would obscure several facts of Turkish that a grammar would normally be expected to capture.
For example, we would lose the information that a considerable number of place names in Turkish actually exhibit regular stress, or follow other regular patterns of the language. Indeed, not only is Turkish morphology extremely productive, the same mechanisms (i.e., suffixation, compounding) used in regular word formation processes are also employed in the formation of complex place names. If the place names are isolated from the rest of the lexicon in some way, this generalization is missed. Moreover, if it is assumed that the SSR is what accounts for stress placement in place names, the grammar must somehow be even further complicated to handle those that do not follow the SSR. As mentioned above, such items actually constitute the majority of place names, so we must ask at this point to what extent the grammar is really capturing and representing the phonological generalizations of Turkish.

In an effort to maintain the concept of co-phonologies, it might be possible to simplify the system by allowing the Sezer co-phonology to apply only to place names that in fact exhibit the SSR. Thus we could limit the place names to only three categories: a) those that are predicted via the regular stress patterns of Turkish, b) those that follow the SSR, and c) those that are irregular but follow neither the SSR nor the regular stress patterns of Turkish. The third category would inevitably require some type of stress prespecification, as it does, in fact, in Inkelas and Orgun's co-phonology treatment. The question that arises at this point is whether the introduction of co-phonologies offers any substantive advantages to the grammar of Turkish. As we have seen, in the case of place names, the result is a fairly complex system which requires the segmentation of the Turkish lexicon into a number of components, with the consequence that certain generalizations about stress and word formation processes are missed. Furthermore, the introduction of co-phonologies does not free the grammar of the need for overt prespecification of irregularities in the stress system, as even in a model with a Sezer co-phonology, the majority of place names with irregular stress will need some direct specification of where this stress falls. Given that such a direct method of prespecification is required in any case, the question that arises is whether the additional machinery involved in co-phonologies is necessary. That is, it appears to add cost to the system without reducing the need for the mechanism of lexical prespecification of idiosyncratic properties.
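The three-way division of place names just described can be made concrete with a minimal sketch. It assumes that each word is supplied as a list of syllable weights ('H' for heavy, 'L' for light) together with the index of the syllable that actually bears stress; this encoding, the function names and the label "prespecified" are illustrative assumptions and not part of any of the analyses discussed. The sketch implements the SSR as stated above and treats only final stress as regular, so the regular compound stress pattern is deliberately left out.

```python
from typing import List


def sezer_stress(weights: List[str]) -> int:
    """Return the 0-based index of the syllable that the SSR predicts to be stressed.

    weights holds one 'H' (heavy) or 'L' (light) per syllable, left to right.
    """
    n = len(weights)
    if n == 0:
        raise ValueError("a word needs at least one syllable")
    if n == 1:
        return 0
    # Antepenultimate stress only if the antepenult is heavy and the penult is light.
    if n >= 3 and weights[-3] == "H" and weights[-2] == "L":
        return n - 3
    # Otherwise the penultimate syllable is stressed.
    return n - 2


def classify(weights: List[str], observed: int) -> str:
    """Assign a word to one of the three categories (a)-(c) discussed in the text."""
    n = len(weights)
    if observed == n - 1:
        return "regular"       # (a) regular final stress (compound stress not modeled)
    if observed == sezer_stress(weights):
        return "sezer"         # (b) stress falls where the SSR predicts
    return "prespecified"      # (c) neither pattern: stress must be listed lexically


if __name__ == "__main__":
    # Hypothetical weight encodings of forms cited in the chapter.
    print(classify(["H", "L", "L"], 0))  # An.ka.ra, stress on the first syllable
    print(classify(["H", "L", "L"], 1))  # Bel.çi.ka, stress on the penultimate syllable
```

On this encoding, a form such as Ankara (H L L, stressed on the first syllable) comes out as conforming to the SSR, whereas Belçika (H L L, stressed on the penultimate syllable) ends up in the residual class. The residual class is exactly the one for which the location of stress has to be stored item by item, which is the point made in the text.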

4. Motivating prespecification

It seems uncontroversial that the inclusion of item-specific information in a lexical representation is an indication of some form of irregularity. As we have shown, in Turkish such information is required in both disharmonic roots and atypical stress patterns. Since the introduction of additional constructs such as lexical levels or co-phonologies does not eliminate the necessity of lexical specification, these constructs can be seen as merely adding cost to the system. We propose, instead, that lexical specification be used as the sole means of representing phonological exceptions. As indicated above, this is essentially the approach taken within OT, where exceptions are handled via the relatively high ranking of faithfulness constraints that protect underlying structure (e.g., Inkelas 1999). Indeed, given two competing input forms, one fully specified (i.e., prespecified as containing a particular property), and one partially specified (i.e., prespecified as belonging to a special co-phonology), Lexicon Optimization ensures that the fully specified alternative will be preferred. Thus, this use of prespecification in fact makes co-phonologies superfluous within OT.


In the following sections, we show in detail how a prespecification model accounts for the facts of disharmonic roots and exceptional stress. In fact, we propose that this type of model is potentially extendable in a simple and straightforward way to other types of phonological exceptions as well.

4.1. Prespecification in disharmonic roots

In a model that uses prespecification as the sole mechanism for treating disharmonic roots in Turkish, there are at least three options for analyzing the vowel system. In the context of the overall system of Turkish phonology, we will exclude two of these options. We will then present our proposal, Option 3, in which we argue that all that is required is that atypical roots be lexically specified as being excluded from the progressive Vowel Harmony rules via the marking of the precise exceptionality or disharmony. It is, furthermore, the model advanced as Option 3 that will be used below in accounting for the exceptional stress patterns.

4.1.1. Prespecification in disharmonic roots: two potential problems

As indicated above, while we propose to treat phonological exceptions via the mechanism of prespecification, the choices of what to prespecify, and in what way, are not trivial. In this regard, we first consider two options that turn out to be untenable, despite what appear to be reasonable assumptions. The first case, Option 1, accounts for disharmonic roots by underlyingly specifying those features that do not undergo Vowel Harmony, making use of opaque segments and their interaction with the general autosegmental association conventions (e.g., Clements 1981). The standard view of opaque segments entails that such segments not only fail to undergo VH themselves, but block the spreading of a given feature(s) and at the same time initiate the spreading of a different feature(s). According to Clements and Sezer (1982), all root vowels, whether harmonic or not, are opaque (i.e., fully specified). Similarly, in Inkelas (1995) (a restricted version of) Lexicon Optimization determines that predictable feature values are underspecified only when they enter into surface alternations. Since root vowels never alternate, this approach entails that all root vowels, regardless of whether they are harmonic on the surface, must be specified for the relevant features (e.g., backness, rounding). Thus, only suffix vowels may be the targets of VH, since these are the only ones that exhibit predictable alternations for backness and rounding in Turkish. As shown in (14), the root vowels are all specified since they themselves do not alternate; only the last feature specifications spread to subsequent morphemes (cf. Clements and Sezer 1982).

(14) Option 1: Lexical specification of all root vowel features [diagram not reproduced]

Although root vowels do not alternate, there is experimental research that indicates that such vowels may nevertheless be underspecified (cf. Harrison and Kaun 2000, 2001). In an initial study involving a language game, Turkish speakers were taught a reduplication rule which involved replacing the initial vowel of a set of real (Turkish) words, both harmonic and disharmonic, with [a] or [u]. Harrison and Kaun (2001) report that while the subjects tended to re-harmonize the harmonic roots according to the pattern of backness harmony (e.g., kibrit 'match' → [kabrıt], not *[kabrit]), they failed to show the same pattern with the disharmonic roots (e.g., butik 'boutique' → [batik], not [batık]). More systematic experimentation is needed; however, the fact that similar results were also obtained from other languages, including Finnish and Tuvan, suggests that pervasive surface-true patterns, such as the harmonic vowel sequences in Turkish roots, should in fact be analyzed as underspecified; only idiosyncratic patterns, such as vowel sequences in disharmonic roots, require full specification (cf. Harrison and Kaun 2001).

The second drawback of Option 1 is its lack of representational economy. While VH is successfully blocked by the full specification of non-harmonizing root vowels (and suffixes), specifying the features of all root vowels misses a crucial generalization. That is, it fails to show that, in fact, most roots share vowel features. This, in turn, leads to excessive feature specification, where the harmonizing root vowels could otherwise benefit from a free ride (cf. van der Hulst and van de Weijer 1991). Since Option 1 is not feasible, let us now consider Option 2, where lexical prespecification of vowel features is allowed only in disharmonic roots (15a).


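Harmonic roots are underspecified except for the initial vowel, which bears the features for frontness-backness and rounding that spread throughout the rest of the item (15b).

(15) Option 2: Lexical specification of vowel features only in disharmonic roots [diagrams not reproduced; feature annotations only]
a. [kortizon-lu] 'with cortisone': each root vowel is specified ([+B, +R], [-B, -R], [+B, +R]); the suffix vowel of -lI is unspecified.
b. [pırlanta-lı] 'with brilliants': only the initial root vowel is specified ([+B, -R]); the remaining root vowels and the suffix vowel of -lI are unspecified.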
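The division of labour in Option 2 can be sketched as follows. The encoding of each vowel as a triple (high, back, round), with None standing for an underspecified value, and the function name harmonize are assumptions made purely for illustration. The restriction of rounding harmony to high vowels follows the standard description of Turkish and is not something stated in (15) itself.

```python
from typing import List, Optional, Tuple

# A vowel is (high, back, round); None marks an underspecified feature value.
Vowel = Tuple[bool, Optional[bool], Optional[bool]]

# Surface vowel for each fully specified feature combination.
SURFACE = {
    (True, False, False): "i", (True, False, True): "ü",
    (True, True, False): "ı", (True, True, True): "u",
    (False, False, False): "e", (False, False, True): "ö",
    (False, True, False): "a", (False, True, True): "o",
}


def harmonize(vowels: List[Vowel]) -> List[str]:
    """Fill underspecified values by rightward spreading from the preceding vowel."""
    surface = []
    back: Optional[bool] = None
    rnd: Optional[bool] = None
    for high, v_back, v_round in vowels:
        if v_back is None:
            v_back = back                      # backness spreads to every vowel
        if v_round is None:
            v_round = rnd if high else False   # rounding spreads to high vowels only
        if v_back is None or v_round is None:
            raise ValueError("the first vowel must be fully specified")
        surface.append(SURFACE[(high, v_back, v_round)])
        back, rnd = v_back, v_round            # a prespecified vowel starts a new span
    return surface


SUFFIX_I: Vowel = (True, None, None)           # the high suffix vowel of -lI

# Disharmonic root (15a): every vowel is prespecified.
KORTIZON: List[Vowel] = [(False, True, True), (True, False, False), (False, True, True)]
# Harmonic root (15b): only the initial vowel is specified.
PIRLANTA: List[Vowel] = [(True, True, False), (False, None, None), (False, None, None)]

if __name__ == "__main__":
    print(harmonize(KORTIZON + [SUFFIX_I]))
    print(harmonize(PIRLANTA + [SUFFIX_I]))
```

In this sketch a prespecified vowel simply overrides whatever would have spread onto it and becomes the source for the following vowels, which is one way of modeling the opaque behaviour described for disharmonic roots.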

While this approach avoids the drawbacks associated with specifying all the root vowel features seen in relation to Option 1, it nevertheless gives rise to several other problems. These center on the feature specification (i.e., [-Back], or [Coronal] in other frameworks)4 of front vowels in disharmonic roots. First, it should be noted that the prespecification of [Coronal] (i.e., [-Back] in Clements and Sezer 1982) in disharmonic roots is inconsistent with a redundancy-free lexicon, where predictable and redundant features are excluded from underlying representations. The Articulator feature [Dorsal] (corresponding to [+Back] in Clements and Sezer 1982) and the Tongue Height features (e.g., [Low]) are sufficient to distinguish all the vowel phonemes of Turkish. Thus, [Coronal] is essentially redundant and should be excluded from the underlying representation. Second, underlying specification of [Coronal] conflicts with the cross-linguistically unmarked status of this feature. That is, if the vowels are fully specified, this would require that the unmarked [Coronal] feature also be specified, contrary to an approach in which only marked feature values are specified (for discussion of the status of [Coronal], see among others Lahiri 2000 for a summary of the issues; Paradis and Prunet 1991 for several papers on coronals; and Kabak 2007 for further arguments pertaining specifically to the phonological inertness of [Coronal] in Turkish).

4. We opt to use [Coronal], corresponding to [-Back] in Clements and Sezer (1982), to characterize front vowels. This choice is based on the original feature organization proposed by Jakobson et al. (1952), where vowels and consonants share the same place features such as [Labial], [Coronal], and [Dorsal] (see also Lahiri and Evers (1991) for a similar view). The theoretical consequences of the choice between [-Back] and [Coronal] are, however, tangential to the purposes of this paper.

Furthermore, the prespecification of front vowels for [Coronal] in disharmonic roots makes the prediction that these front vowels behave differently from the same vowels in harmonic roots. There is at least one phonological process, Vowel Assimilation (VA), where such a distinction in fact makes an incorrect prediction. VA optionally eliminates vowel sequences ([V1.V2]) by the assimilation of V2 to V1, yielding a long vowel (cf. Sezer 1986, Kabak 2007). Typically, V2 must be high and both V1 and V2 must share backness and rounding features in order for VA to apply (e.g., [yourt] → [yoort] 'yoghurt'; [aır] → [aar] 'heavy'; [göüs] → [göös] 'breast'). Although the sequence [e.i] satisfies these conditions, VA fails to apply in this case (e.g., [beit] → *[beet] 'couplet'; [deil] → *[deel] 'not'). This difference is not limited to roots, but is also observed between a root and a suffix (e.g., [aya-ı] (foot-Poss.3S) → [ayaa], but [bebe-i] (baby-Poss.3S) → *[bebee]). If we assume that VA applies only when the vowels in question share the same Place feature (cf. Kabak 2007)5, and we assume as in Option 2 (and Option 1) that [Coronal] is underlyingly specified in disharmonic roots, the prediction is made that VA should apply in sequences of [e.i] as in other sequences in disharmonic roots. Consider the representation of the disharmonic root kafein 'caffeine' in (16).

(16) The representation of kafein 'caffeine' [diagram not reproduced]
Skeleton k V f V V n, surfacing as [kafein]: the first V bears [Low] and [Dorsal]; the last V bears [High]; the last two Vs bear [Coronal].

5. Kabak (2007) argues that the two vowels in a sequence must share the same specified Place feature in order to undergo VA. The seemingly exceptional nature of the sequence [e.i] is then straightforwardly accounted for in terms of the assumption that vowels are not specified as [Coronal] in Turkish. Since [Coronal] is absent in underlying representations, and neither [e] nor [i] is [Labial], there is indeed no Place feature that is shared by the members of the [e.i] sequence, and hence no motivation for VA.
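The reasoning in footnote 5 can be checked mechanically with a small sketch. The two feature tables below are an illustrative encoding of the Place specifications discussed in the text, not a transcription of any published feature chart, and the requirement that the two vowels have identical, non-empty Place specifications is one possible reading of "share backness and rounding features". The function name va_applies is likewise hypothetical.

```python
from typing import Dict, FrozenSet

HIGH = {"i", "ü", "ı", "u"}

# [Coronal] left unspecified: front unrounded vowels carry no Place feature.
PLACE_UNDERSPECIFIED: Dict[str, FrozenSet[str]] = {
    "i": frozenset(), "e": frozenset(),
    "ü": frozenset({"Labial"}), "ö": frozenset({"Labial"}),
    "ı": frozenset({"Dorsal"}), "a": frozenset({"Dorsal"}),
    "u": frozenset({"Dorsal", "Labial"}), "o": frozenset({"Dorsal", "Labial"}),
}

# Alternative in which every front vowel is specified as [Coronal].
PLACE_WITH_CORONAL: Dict[str, FrozenSet[str]] = {
    vowel: (place if "Dorsal" in place else place | {"Coronal"})
    for vowel, place in PLACE_UNDERSPECIFIED.items()
}


def va_applies(v1: str, v2: str, place: Dict[str, FrozenSet[str]]) -> bool:
    """VA applies if V2 is high and V1 and V2 share identical, specified Place features."""
    shares_place = place[v1] == place[v2] and len(place[v1]) > 0
    return v2 in HIGH and shares_place


if __name__ == "__main__":
    for v1, v2 in [("o", "u"), ("a", "ı"), ("ö", "ü"), ("e", "i")]:
        print(v1, v2,
              va_applies(v1, v2, PLACE_UNDERSPECIFIED),
              va_applies(v1, v2, PLACE_WITH_CORONAL))
```

With [Coronal] unspecified, the sequences [o.u], [a.ı] and [ö.ü] qualify for VA while [e.i] does not, as in the data cited above. Once [Coronal] is specified, [e.i] also qualifies, which is the incorrect prediction attributed to Options 1 and 2.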


Given that the final two vowels share the feature [Coronal], it is predicted that VA will apply, yielding *[kafeen]. In fact, this is not the correct result, since VA does not apply here, or in other disharmonic roots with the sequence [e.i] (e.g., ateist 'atheist', not *[ateest]). Thus, while the prespecification of all vowel features in disharmonic roots does provide a mechanism for blocking VH, it results in an incorrect analysis of another phenomenon, VA.

4.1.2. Prespecification in disharmonic roots: maximum underspecification

The third alternative, Option 3, not only provides coverage for the full range of facts of Turkish phonology, it also avoids the problems seen in the previous analyses. In this option, disharmonic root vowels are represented with the minimal number of features necessary to capture their lexical status.6 The only lexical marking required for a disharmonic root is whether or not it obeys the general principles of VH. As mentioned above, we are assuming that [Coronal] is not specified underlyingly since it is a default feature, and will be filled in at a subsequent stage. As shown in (17), roots that disobey VH are, however, marked as carrying a [Dorsal] (or [Labial] in the case of disharmonic roots violating labial harmony) specification. This feature, however, is prohibited from spreading rightward, as indicated here by the association line truncated with x.

(17) Disharmonic root (spreading truncated): maximum underspecification [diagram not reproduced]
[Dorsal] cannot spread. [Coronal] is inserted later as default on the last V.

6. This is not quite the same as Radical Underspecification (cf. Archangeli 1988; Archangeli and Pulleyblank 1994); however, the details of this difference are not relevant here.

It should be noted that when the [Dorsal] (or [Labial]) feature is associated with the final vowel of a root, spreading is not blocked by a truncated association line. This different behavior of the final root vowel is, in fact, precisely what we would expect, since suffixes achieve their full feature specifications by VH, which spreads the requisite features rightward from the root, as seen in (18a). In this respect, while a root-final [Dorsal] (or [Labial]) feature is free to spread, a [Coronal] feature in the case of a previously truncated feature is inserted as a default feature, as in (18b). In both cases, the spreading applies from relevant starting points as it does in regular harmonic roots, illustrated in (18c) and (18d).7

(18) [autosegmental diagrams not reproduced; examples and annotations only]
a. [eşya-da] 'furniture-Loc': Root-final specified [Dorsal] is free to spread onto the suffix vowel. [Coronal] is inserted as default on the first V later at the phonetic level.
b. (example form not recoverable from the source): VH is blocked within the root. [Coronal] is inserted later as default on the root-final V as well as the suffix V at the phonetic level.
c. [arka-da] 'behind-Loc': VH applies: the [Dorsal] feature spreads to all following Vs.
d. [keser-de] 'adze-Loc': VH applies: [Coronal] is inserted on all underspecified Vs via redundancy rule later.

7. It should be noted that suffix vowels unexpectedly surface as coronal, instead of dorsal, in a certain set of words (e.g., alkol-den (alcohol-Abl); saat-im (watch-Poss.1S), etc.). In such cases, too, the feature [Dorsal] in the final syllable of the root is prohibited from spreading onto the suffix vowels via truncation (see Kabak 2007 for details).

It should be noted that marking disharmonic roots as exceptions to VH does not exclude them from the application of the so-called epenthesis-driven Vowel Harmony. In this type of VH, vowels epenthesized to break up consonant clusters receive their specification from neighboring vowels (and consonants); however, the pattern is different from that observed in the more usual case of (progressive) Vowel Harmony. That is, unlike progressive VH, epenthesis-driven VH operates from right to left, and is sensitive to the types of consonants within the clusters. For example, in /k/-clusters, the epenthesized vowel is always back (e.g., [kırem] vs. *[kirem] 'cream'). In addition, it is observed that even though spreading generally proceeds from right to left, low round vowels tend not to trigger rounding on a preceding vowel (e.g., [kırom] vs. *[kurom] 'chrome').

As mentioned, we are assuming that [Coronal] is not specified in underlying representations, but rather arises as a default feature. At first glance, it might seem that [Coronal] must be specified to account for a phenomenon of palatalization, whereby the velars and /l/ are fronted when followed by a front vowel within the same syllable (e.g., [sa.kız] 'bubble-gum', [kat] 'floor' vs. [se.kʲiz] 'eight', [kʲir] 'dirt'; [par.lak] 'bright', [mal] 'property' vs. [ka.lʲe] 'castle', [kelʲ] 'bald'). We assume such fronting is a matter of surface phonetic form and thus does not depend on the presence of [Coronal] in the underlying representation. Even if this approach is not taken, it is not necessary to provide the feature [Coronal] in the underlying representation. That is, palatalization can be analyzed as the delinking of [Dorsal] from velar consonants in the context of a front vowel. This by no means requires an extra mechanism: feature delinking is one of the well-established notions within autosegmental phonology.
After delinking, the tongue height specification (i.e., [High]) of the velar consonants in question (both the stops and the lateral) suffices to yield the requisite palatalized variants on the surface. It should be noted that certain loan words in Turkish have palatalized segments where this cannot be due to the vocalic context (e.g., [be.kʲar] 'single', [se.lʲam] 'greeting'). In these cases, the palatalized sounds are in contrast with velars, as can be observed in several minimal pairs (e.g., [kar] 'snow' vs. [kʲar] 'profit'; [sol] 'left' vs. [solʲ] 'the musical note G'). In these special cases, the palatal consonants must be afforded special status (cf. Kabak 2007, where the underlyingly velar consonants are specified for [Dorsal] and [High] while those that are underlyingly palatal are only specified for [High], the Articulator node being underspecified), like the case of the Yiddish clusters in English, and as such, do not require a revision of the core phonology of the language.

In sum, the prespecification model proposed here has several advantages over the other two prespecification options. Specifically, it ensures that underlying representations are redundancy-free, giving vowel features a free ride to the largest extent possible via VH. Furthermore, since the same features that are assumed to be underspecified in the general system of Turkish are also not specified in disharmonic roots, generality is achieved in addition to economy. Thus, the proposed model succeeds in respecting and expressing the overall phonological structure of Turkish in a simple and insightful manner.

4.2. Prespecification of exceptional root stress

As was seen above, lexical prespecification of irregularly stressed syllables is required in the different approaches considered. That is, both the co-phonology and OT approaches invoke underlying specification of idiosyncratically stressed syllables in addition to the other mechanisms they include. The proposal advanced here requires only the mechanism of prespecification, and thus shares this property with previous proposals. At the same time, it also avoids the complexities associated with the other proposals by not requiring a combination of other mechanisms as well. Furthermore, the approach advanced here provides a unified treatment for irregularly stressed roots in Turkish, instead of singling out specific categories such as place names.
As indicated above, in both place names and other types of roots (e.g., words of foreign origin) with irregular stress, non-final stress sometimes falls on the syllable identified by the quantity-sensitive Sezer Stress Rule, as in (19), but sometimes it falls on other syllables, as in (20).

(19) Irregularly stressed roots that follow the SSR
a. Place names: Ánkara, Kanáda 'Canada', Edírne, etc.
b. Other roots of foreign origin (proper names, loan words, etc.): Doróti 'Dorothy', Katarína, Toşíba 'Toshiba', sandálye 'chair', kafetérya 'cafeteria', fakǘlte 'faculty', şampánya 'champagne', gazéte 'newspaper', etc.

(20) Irregularly stressed roots that do not follow the SSR
a. Place names: Üskǘdar, Belçíka 'Belgium', Afríka 'Africa', Avrúpa 'Europe', Bermúda, etc.
b. Other roots of foreign origin (proper names, loan words, etc.): Gorbáçov 'Gorbachov', Mandéla, négatif 'negative', pózitif 'positive', fabríka8 'factory', etc.

If the place names are treated separately from other categories of words, the generalization that similar patterns are found across the lexicon is missed. Furthermore, the establishment of a representation that focuses on the fact that a number of place names happen to follow a quantity-sensitive stress pattern obscures the facts that a) not all irregularly stressed place names follow this pattern, and b) not all words that happen to follow the stress pattern in question are place names.

The mechanism we propose for representing irregular root stress of any sort is the prespecification or marking of the relevant syllable as being stress-bearing. Possibilities for such a representation include the use of a grid mark, special foot structure, or some type of diacritic stress feature. We opt for the use of grid structure since it permits the unification of the representation of stress within words as well as within larger phonological constituents. Examples are shown in (21), where * above a syllable indicates that it bears exceptional stress.

(21) Exceptional Stress [grid marks not reproduced]
a. Ankara
b. Kanada 'Canada'
c. Belçika 'Belgium'
d. fabrika 'factory'
e. negatif 'negative'

8. It should be noted that the [b] in fabrika functions as the coda of the first syllable rather than the onset of the second syllable, since complex onsets are impermissible in Turkish.


While the focus here is on roots, it should be noted that the mechanism of prespecification of the locus of irregular stress in terms of a grid mark automatically also accounts for irregular stress in more complex word structures. For example, a word such as Avrupa-lı-laş-arak 'while/by becoming European' contains both an irregularly stressed root (Avrúpa 'Europe') and an irregularly stressed suffix (-(y)ÉrEk 'while/by'). This is shown with the relevant grid markings in (22).

(22) Avrupa-lı-laş-arak (Europe-Der-Der-while/by) 'while/by becoming European' [grid marks on the root and on the suffix not reproduced]

While this is a well-formed word in terms of its morpho-syntactic structure, it is not well-formed as a Phonological Word (PW), since PWs may only contain a single primary stress. In Kabak and Vogel (2005), it is argued that such items, in fact, constitute a Clitic Group (CG), the constituent in the phonological hierarchy between the PW and the Phonological Phrase. In Turkish, the stress assignment rule for the CG assigns prominence to the leftmost lexical stress, thus yielding a representation such as (23).9

(23) Avrupa-lı-laş-arak 'while/by becoming European' [grid not reproduced; prominence falls on the leftmost lexical stress]

Since there is currently no systematic work on secondary stress in Turkish, it is not clear whether the rightmost stress is lost in such structures, or whether it remains as a type of secondary stress. In either case, there is independent motivation for the CG stress rule (cf. Kabak and Vogel 2001, 2005). Furthermore, the specification of idiosyncratic stress as part of the underlying representation permits a simple and straightforward account of irregular stress in both roots and morphologically complex items, and avoids the introduction of additional representations or operations. Thus, stress is generally assigned via the regular stress rules (i.e., PW stress rule, CG stress rule), or is realized as specified in the underlying representation in the case of exceptional stress. In both cases, maximum use is made of the general principles of Turkish stress assignment.
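The interaction of prespecified grid marks with the regular rules can be summarized in a minimal sketch. It assumes that a word is given as a list of syllables and that lexically prespecified stresses are given as a set of syllable indices; this encoding, the function name assign_stress and the syllabifications used in the examples are illustrative assumptions. Secondary stress, whose status the text leaves open, is not modeled.

```python
from typing import Iterable, List


def assign_stress(syllables: List[str], lexical_marks: Iterable[int] = ()) -> int:
    """Return the 0-based index of the syllable that receives primary stress.

    lexical_marks lists the syllables carrying a prespecified grid mark.
    If there is at least one, the leftmost mark wins (Clitic Group stress);
    otherwise stress is final (regular Phonological Word stress).
    """
    if not syllables:
        raise ValueError("a word needs at least one syllable")
    marks = sorted(set(lexical_marks))
    for index in marks:
        if not 0 <= index < len(syllables):
            raise IndexError(f"grid mark {index} is outside the word")
    if marks:
        return marks[0]
    return len(syllables) - 1


if __name__ == "__main__":
    # Avrupa-lı-laş-arak: grid marks on 'ru' (root) and 'şa' (suffix-initial syllable).
    word = ["Av", "ru", "pa", "lı", "la", "şa", "rak"]
    print(word[assign_stress(word, {1, 5})])
    # A regular word with no prespecified mark receives final stress.
    regular = ["ar", "ka", "da"]
    print(regular[assign_stress(regular)])
```

The only item-specific information in this sketch is the set of grid marks supplied with each word; everything else follows from the two general rules, which mirrors the claim that no additional representations or operations are needed.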

9. This fact is handled by a principle of Leftmost Stress Wins in Inkelas and Orgun (2003); however, there are fundamental differences between their approach and an analysis involving Clitic Group stress, as discussed in detail in Kabak and Vogel (2005).

5. Conclusion

In this paper, we have argued that prespecication is the only descriptively adequate and theoretically viable means of handling various kinds of phonological exceptions. On the basis of evidence from disharmonic and exceptionally stressed roots in Turkish, we have shown that previous approaches do not allow us to determine classes of exceptions or subregularities in a principled manner either in terms of lexical strata or co-phonologies. Furthermore, lexical specication is inevitable in any model of exceptions. The lexical (pre-)specication model proposed here maximizes representational economy, by requiring a minimum of underlying feature specication. With respect to Vowel Harmony, we have shown that maximum underspecication of vowel features can be applied in the same way to both disharmonic and harmonic roots. We have employed feature truncation as a means of lexical specication, preventing the spreading of the lexically specied (exceptional) features (e.g., [Dorsal]) to unspecied segments. This permits any redundant features to remain unspecied in the underlying representation. As such, a single mechanism, lexical specication, is used to capture two separate objectives: (i) to mark the exceptional property in question, that is disharmony, as well as (ii) to ensure that the a given redundant feature has the same property throughout the lexicon. While the truncation convention may seem add an extra mechanism to grammar, it should be viewed as a variant of autosegmental tools that are already in use for lexical specication. This type of lexical marking is crucially needed to separate the marking of the exceptional pattern in question from the representation of phonological features. Within the phonological grammar of Turkish, there is no reason to believe that /e/ in keman violin (a disharmonic root) has a different mental status than in kel guarantor (a harmonic loan from Arabic), or in kemik bone (a harmonic native root), or even in kel bald for that matter. 
Thus, any model that aims to capture this generalization must inevitably resort to different conventions to mark disharmony while maintaining the phonological integrity of the vowels within the root in question. In our case, we accomplish both objectives by using a single principle, lexical specification, within a single phonological grammar. Likewise, in the case of exceptionally stressed roots, we have shown that the lexical specification of atypical stress is required in addition to other mechanisms proposed in previous models, such as co-phonologies. In the present proposal, the only mechanism required is prespecification in the form of a grid mark associated with the exceptionally stressed syllable. Not only does this approach avoid the complexities associated with previous models that include prespecification alongside other mechanisms, it also provides a unified treatment for all types of irregular items in Turkish, rather than singling out specific categories such as place names.

In conclusion, we would like to point out that by adopting a model of lexical specification, we are by no means suggesting that exceptions constitute uninteresting phenomena. In fact, such a proposal suggests a fundamental distinction in the grammar between the core phonology and any other phenomena or subsystems. It furthermore permits a principled means of assessing the extent to which an item is exceptional in terms of the number of prespecified features it requires. Thus, an item might be exceptional with regard to either VH or stress, or it may be exceptional with regard to both. The isolation of exceptional items in separate parts of the phonology makes it clear that they do not adhere to the general phonological principles of the language; however, it does not provide information as to how and to what extent those items constitute exceptions.

The issues raised here ultimately need to be examined on the basis of experimental research aimed at investigating the acquisition of the exceptional properties in question as well as their productivity and extension to novel items. The dichotomy between arbitrary exceptions and those exceptional patterns that display restricted productivity has been noted in other linguistic areas such as morphology and syntax (e.g., Jónsson and Eythórsson, this volume), including those that exhibit extreme instances of exceptionality (e.g., Corbett, this volume). It is generally noted, for example, that semi-productive morphological patterns can be extended to already existing or new words based on linguistic similarity. For example, Jónsson and Eythórsson (this volume) attribute the diachronic stability and partial productivity of verbs with accusative subjects in the history of Icelandic to the fact that such verbs form coherent and homogeneous subclasses due to the syntactic and semantic similarities that exist between them.
It seems that such similarities have been transparent to the learner, leading to the maintenance of this subclass of verbs for generations. How exceptional stress and harmony patterns arise and why they are maintained by learners remains to be explored in Turkish. In the absence of positive psycholinguistic evidence for partitioning the phonology of Turkish into some indeterminate number of components, the use of lexical specification of exceptional features remains the simplest and most direct means of accounting for all types of exceptional phonological behavior. It should be noted that recent psycholinguistically oriented approaches towards patterned exceptions also assert that information about exceptionality must be listed in lexical entries. For example, Zuraw (2000) marks exceptions to nasal substitution in Tagalog in the lexicon, but allows the exceptional pattern in question to perpetuate into new words through stochastically (low)-ranked markedness constraints within a single grammar. This approach is in line with our proposal that lexical specification is at the foundation of the account of phonological exceptionality across languages.

Acknowledgements

The research was supported in part by SFB 471 "Variation and Evolution in the Lexicon" at the University of Konstanz, funded by the German Research Foundation (Deutsche Forschungsgemeinschaft).

Abbreviations

Acc Accusative
Dat Dative
Der Derivational morpheme
Loc Locative
Nom Nominative
Pl Plural

References
Anttila, Arto 2002 Morphologically conditioned phonological alternations. Natural Language and Linguistic Theory 20: 1–42.
Archangeli, Diana 1988 Aspects of underspecification theory. Phonology 5: 183–207.
Archangeli, Diana, and D. Pulleyblank 1994 Grounded Phonology. Cambridge, MA: MIT Press.
Barker, Christopher 1989 Extrametricality, the cycle, and Turkish word stress. In Phonology at Santa Cruz, J. Ito, and J. Runner (eds.), 1: 1–33. University of California, CA: Syntax Research Center.
Clements, George N. 1981 Akan vowel harmony: a nonlinear analysis. In Harvard Studies in Phonology. Vol. 2, George N. Clements (ed.), 108–177. Bloomington: Indiana University Linguistics Club.
Clements, George N., and Engin Sezer 1982 Vowel and consonant disharmony in Turkish. In The Structure of Phonological Representations, Part II, Harry van der Hulst, and Norval Smith (eds.), 213–255. Dordrecht: Foris.
Çakır, Cem 2000 On non-final stress in Turkish simplex words. In Studies on Turkish and Turkic Languages, Aslı Göksel, and Celia Kerslake (eds.), 3–10. Wiesbaden: Harrassowitz.
Demircan, Ömer 1976 Türkiye yer adlarında vurgu. Türk Dili 300: 402–411.
Demircan, Ömer 1996 Türkçenin Sesdizimi. Istanbul: Der Yayınevi.
Goldsmith, John A. 1990 Autosegmental and Metrical Phonology. Oxford: Blackwell.
Harrison, David, and Abigail Kaun 2000 Pattern-responsive lexicon optimization. Proceedings of NELS 30.
Harrison, David, and Abigail Kaun 2001 Patterns, pervasive patterns, and feature specification. In Distinctive Feature Theory, Tracy A. Hall (ed.), 211–236. Berlin: Mouton de Gruyter.
Hulst, Harry van der, and Jeroen van de Weijer 1991 Topics in Turkish phonology. In Turkish Linguistics Today, H. E. Boeschoten, and L. T. Verhoeven (eds.), 11–59. Leiden: Brill.
Inkelas, Sharon 1995 The consequences of optimization for underspecification. In Proceedings of the Northeastern Linguistics Society 25, Jill Beckman (ed.), 287–302. Amherst: GLSA.
Inkelas, Sharon 1996 The interaction of phrase and word rules in Turkish: An apparent paradox in the prosodic hierarchy. Linguistic Review 13: 193–217.
Inkelas, Sharon 1998 The theoretical status of morphologically conditioned phonology: A case study from dominance. Yearbook of Morphology 1997, 121–55.
Inkelas, Sharon 1999 Exceptional stress-attracting suffixes in Turkish: representations vs. the grammar. In The Prosody-Morphology Interface, René Kager, Harry van der Hulst, and Wim Zonneveld (eds.), 134–87. Cambridge: Cambridge University Press.
Inkelas, Sharon, and Cemil Orhan Orgun 1999 Level (non)ordering in recursive morphology: evidence from Turkish. In Morphology and its relation to phonology and syntax, Steven Lapointe, Diane Brentari, and Patrick Farrell (eds.), 360–392. Stanford: CSLI.

Inkelas, Sharon, and Cemil Orhan Orgun 2003 Turkish stress: A review. Phonology 20 (1): 139–161.
Inkelas, Sharon, Cemil Orhan Orgun, and Cheryl Zoll 1996 Exceptions and static phonological patterns: cophonologies vs. prespecification. ROA-124-0496.
Inkelas, Sharon, Cemil Orhan Orgun, and Cheryl Zoll 1996 Implications of lexical exceptions for the nature of grammar. In Derivations and Constraints in Phonology, Iggy Roca (ed.), 393–418. Oxford: Clarendon Press.
Inkelas, Sharon, Cemil Orhan Orgun, and Cheryl Zoll 2004 Implications of lexical exceptions for the nature of grammar. In Optimality Theory in Phonology: A Reader, John J. McCarthy (ed.), 542–551. Malden: Blackwell.
Ito, Junko, and Armin Mester 1993 Japanese phonology: Constraint domains and structure preservation. University of California, Santa Cruz: Linguistics Research Center Publication (LRC-93-06).
Ito, Junko, and Armin Mester 1995 Japanese phonology. In The Handbook of Phonological Theory, John Goldsmith (ed.), 817–838. Cambridge: Blackwell.
Jakobson, Roman, Gunnar Fant, and Morris Halle 1952 Preliminaries to Speech Analysis: The Distinctive Features and their Correlates. Cambridge, MA: MIT Press.
Kabak, Barış 2007 Hiatus resolution in Turkish: an underspecification account. Lingua 117: 1378–1411.
Kabak, Barış, and Irene Vogel 2001 Phonological word and stress assignment in Turkish. Phonology 18: 315–360.
Kabak, Barış, and Irene Vogel 2005 Irregular stress in Turkish. Unpublished ms. University of Konstanz / University of Delaware.
Kaisse, Ellen 1985 Some theoretical consequences of stress rules in Turkish. In Papers from the General Session of the 21st Regional Meeting, W. Eilfort, P. Kroeber, and K. Peterson (eds.), 199–209. Chicago, IL: Chicago Linguistic Society.
Kardeştuncer, Aino 1982 A Three-Boundary System for Turkish. Linguistic Analysis 10 (2): 95–117.

Kaye, Jonathan, Jean Lowenstamm, and Jean-Roger Vergnaud 1985 The internal structure of phonological elements: A theory of charm and government. Phonology Yearbook 2: 305–328.
Lahiri, Aditi 2000 Phonology: Structure, representation, and process. In Aspects of Language Production, Linda Wheeldon (ed.), 165–225. Hove/Philadelphia: Psychology Press.
Lahiri, Aditi, and Vincent Evers 1991 Palatalization and coronality. In The Special Status of Coronals, C. Paradis, and F. Prunet (eds.), 79–100. London: Academic Press.
Lees, Robert B. 1961 The Phonology of Modern Standard Turkish. (Uralic and Altaic Series 6) Bloomington: Indiana University Publications.
Lees, Robert B. 1966 On the interpretation of a Turkish vowel alternation. Anthropological Linguistics 8: 32–39.
Lewis, Geoffrey L. 1967 Turkish Grammar. Oxford: Oxford University Press.
Lightner, Theodor 1972 Problems in the Theory of Phonology. Vol. 1: Russian Phonology and Turkish Phonology. Edmonton, Champaign: Linguistic Research Inc.
McCarthy, John 1995 Extensions of Faithfulness: Rotuman Revisited. ROA-110.
McCarthy, John, and Alan S. Prince 1993 Prosodic Morphology I: Constraint Interaction and Satisfaction. Unpublished ms.
Mohanan, K. P. 1986 The Theory of Lexical Phonology. Dordrecht: Reidel.
Paradis, Carole, and Jean-François Prunet (eds.) 1991 The Special Status of Coronals. Vol. 2: Phonetics and Phonology. New York: Academic Press.
Polgárdi, Krisztina 1999 Vowel harmony and disharmony in Turkish. The Linguistic Review 16 (2): 187–204.
Prince, Alan S., and Paul Smolensky 1993 Optimality Theory: Constraint Interaction in Generative Grammar. Ms. Rutgers University and the University of Colorado: Boulder.
Sezer, Engin 1981 On non-final stress in Turkish. Journal of Turkish Studies 5: 61–69.

Sezer, Engin 1985 An autosegmental analysis of compensatory lengthening in Turkish. In Studies in Compensatory Lengthening, W. L. Wetzels, and E. Sezer (eds.), 227–250. Dordrecht: Foris.
Smolensky, Paul 1996 The initial state and Richness of the Base in Optimality Theory. Technical Report, JHU-Cogsci-96-4. Baltimore: Department of Cognitive Science, Johns Hopkins University.
Underhill, Robert 1976 Turkish Grammar. Cambridge, MA: The MIT Press.
Yavaş, Mehmet S. 1980 Borrowing and its implications for Turkish phonology. Ph. D. diss., University of Kansas.
Zuraw, Kie 2000 Patterned exceptions in phonology. Ph. D. diss., University of California Los Angeles.

Lexical exceptions as prespecification: some critical remarks

T.A. Hall

1. Introduction: Lexical exceptions as prespecification

In their article on exceptions to Vowel Harmony and Stress Assignment in Turkish, Kabak and Vogel (henceforth K&V) discuss and reject two competing theories of lexical prespecification and then propose an alternative model which they ultimately adopt. Let us review briefly one of the models they reject and compare it with the one they endorse. According to the former, the exceptionality of certain segments to Vowel Harmony (VH) in disharmonic roots is captured by lexically prespecifying these exceptional vowels with the features which propagate in VH. The structures in (1) are my interpretations of representations at the stage before VH; cf. (15) in K&V for the equivalent structures at the point when VH applies. On the approach in (1), VH spreads [+B(ack)] (via palatal harmony) and [−R(ound)] (via labial harmony) in (1a) for the word [pɨrlanta-lɨ] 'with brilliants'. The spreading of [+Back] and [−Round] affects all of the vowels to the right of the first /I/ in this example because they are not specified for the features that spread. The disharmonic root [kortizon-lu] 'with cortisone' is presented in (1b) at the stage before VH.

(1) Lexical Prespecification with a Distinctive Feature (LPDF):
a.    −R
    p I r l E n t E - l I
      +B

b.    +R    −R  +R
    k E r t I z E n - l I
      +B    −B  +B

The exceptionality of the second vowel in (1b), referred to henceforth as the "opaque vowel", is captured by prespecifying it for the two features which spread in VH. In a rule-based approach the structure-building rule of VH cannot spread [+Round] and [+Back] to the opaque vowel because it is underlyingly [−Round] and [−Back]. In an OT-style treatment there are high-ranking faithfulness constraints (e.g. Max-[−Round], Max-[−Back]) which protect the underlying features and therefore ensure that they surface as such. I refer to the approach in (1b) henceforth as Lexical Prespecification with a Distinctive Feature (LPDF).

As an alternative to the LPDF treatment, K&V endorse an analysis in which opaque vowels are represented with the minimal number of features necessary to capture their lexical status. For the examples in (1) I interpret the input to VH in K&V's approach to be the representations in (2). On this approach VH spreads the privative feature [D(orsal)] to the right in (2a).¹ The exceptional example is given in (2b). Here we see that a crucial difference between this structure and the one in (1b) is that the opaque vowel in (2b) is underspecified for all features that spread in VH.

(2) Maximum Underspecification (MU):
a.  p I r l E n t E - l I
     [D]

b.   [R]       [R]
    k E r t I z E n - l I
     [D]       [D]
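The structure-building logic attributed to the LPDF model in (1) can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that each vowel is a dictionary of binary feature values in which `None` stands for "unspecified"; the names `V` and `spread_feature_filling` are hypothetical and are not taken from K&V or from the LPDF literature. Harmony only fills in unspecified values, so a prespecified vowel keeps its own values and becomes the next source of spreading.

```python
# Illustrative sketch only: binary features, None = unspecified (assumption).

def V(back=None, round_=None):
    """Build a vowel as a dict of binary feature values."""
    return {"back": back, "round": round_}

def spread_feature_filling(vowels, features=("back", "round")):
    """Spread each feature rightward, filling only unspecified slots."""
    result = [dict(v) for v in vowels]      # copy, leave the input untouched
    for feat in features:
        current = None                       # value currently being spread
        for v in result:
            if v[feat] is None:
                if current is not None:
                    v[feat] = current        # unspecified vowel: filled in by harmony
            else:
                current = v[feat]            # prespecified vowel: blocks and re-starts spreading
    return result

# (1a): only the first vowel is specified, as [+Back, -Round].
pirlanta_li = [V(True, False), V(), V(), V()]
# (1b): all three root vowels are prespecified; the suffix vowel is not.
kortizon_lu = [V(True, True), V(False, False), V(True, True), V()]

harmonic = spread_feature_filling(pirlanta_li)
disharmonic = spread_feature_filling(kortizon_lu)
```

In this sketch the opaque vowel of (1b) surfaces with its own [−Back, −Round] values, while the suffix vowel takes [+Back, +Round] from the final root vowel.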

The approach to lexical exceptionality in (2b) can be referred to as the Maximum Underspecification (MU) model. Note that the approach to VH and its exceptions in the LPDF model in (1) makes crucial use of binary features, whereas the MU treatment in (2) employs privative features. A consequence of the privative approach is that the exceptionality of opaque vowels cannot in principle be captured by specifying them for the opposite values of the features which spread in VH because these values simply do not exist. What this suggests is that the approach to prespecification in (1b) seems to crucially depend on binary features, whereas the treatment in (2b) cannot make use of this type of prespecification on a priori grounds. My conclusion is that a truly thorough comparison of LPDF and MU clearly exceeds the goals of the present commentary because it would necessarily involve a discussion of the merits and drawbacks of privative vs. binary features.

In the remainder of this commentary I focus instead on two more specific questions. First, how does the MU model account for the fact that VH in (2b) is blocked from spreading [Dorsal] and [Round] from /E/ onto the opaque /I/ (section 2)? And second, how do K&V's arguments against the LPDF model fare when confronted with examples of lexical exceptions from other languages (section 3)? I conclude with some brief comments about a possible rule type which might require some of the mechanics necessary in the MU treatment for exceptionality (section 4).

¹ Since the model in (2a) has privative features, labial harmony cannot involve the spreading of [−Round], as in (1a). This point is not important in the following discussion.

2. Truncation

An obvious question with respect to the MU treatment in (2b) is how it can block VH from spreading [Dorsal] and [Round] from the first /E/ onto the opaque /I/. In the examples provided by K&V (see (17) and (18) in their article) the authors write that the VH features do not spread because spreading is "truncated", but as I show below this point needs explication within the context of (2b). According to Kabak (2007: note 28) truncation is supposed to indicate that the feature in question is only linked to the segment that it is associated with in the underlying structure but is banned from being further realized on other segments. But a moment's reflection reveals that this proposal can only work if the word-initial vowel in (2b) is equipped with a diacritic feature which says that [Dorsal] and [Round] cannot spread. Although K&V do not use the term "diacritic feature", they seem to be aware of its necessity when they write (section 4.1) that "[t]he only lexical marking required for a disharmonic root is whether or not it obeys the general principles of VH."² Truncation as described above is assumed to have a precedent in the literature on intonation (Kabak 2007: note 28). However, once we consider how this term is employed in this area of phonology we will see that those authors use the word "truncation" in a very different sense than K&V. According to Ladd (1996: 132–136) truncation is one of the strategies employed in the literature on intonational phonology (in addition to compression) to adapt intonational contours to short utterances, such as monosyllabic words. Thus, if an intonational contour consists of two or more pitch accents and if the utterance consists of only one syllable, languages can either associate all of the pitch accents with this syllable (compression) or delete one of them (truncation). Ladd mentions English as an example of a compression language and Hungarian as a truncation language.
In the latter language question intonation involves the tonal sequence L*..H..L%, where the first (H) edge tone is preferentially associated with the penultimate syllable (Ladd 1996: 132). In monosyllables this tonal sequence is reduced to a simple rise. Hence, in the underlying tonal sequence for the monosyllable sör 'beer' in (3) only the first two tones of the three-tone question tune are realized.

(3) L* H L%
    sör

In the phonetic representation the final tone (L%) in (3) is unrealized, i.e. it is truncated. Ladd (1996: 135) suggests that truncation could be formalized as a phonological rule which deletes (i.e. truncates) an unassociated tone. See, for example, Grice (1995: 171ff.), who does something similar in her analysis of Palermo Italian intonation contours. This being said, Ladd concedes that in their original usage (Grønnum 1991, which I have not seen) compression and truncation are intended as phonetic and not phonological descriptions.

Since K&V are assuming that truncation is phonological and not phonetic, let us consider in greater detail what the phonological analysis of a language like Hungarian would entail. With respect to (3) truncation in intonational phonology refers to the deletion of an unassociated autosegment, but in (2b) truncation refers to the failure of an autosegment to spread. A proponent of the MU approach might argue that the two operations are related because examples like the one in (3) also need to capture the fact that the final tone fails to associate with the syllable. Pursuing this line of thought reveals that truncation in (3) really involves two steps: (a) the failure of an underlying autosegment to associate to a syllable, and (b) the subsequent deletion of the same autosegment. In any phonological analysis one needs to say what the motivation is for (a) and (b). In the Hungarian example in (3) the lack of spreading, i.e. the (a) clause, makes sense because the association of all three of these autosegments to the same syllable would violate a constraint motivated in many other languages which bans three tones on one syllable. The actual deletion of the unassociated autosegment, i.e. the (b) clause, occurs by rule, which is similar to the kinds of rules one encounters in African tone languages.

The comparison between the Turkish example in (2b) and the Hungarian one in (3) is important because it reveals a crucial difference between the two: in Hungarian there is a reason why association fails under clause (a), but in Turkish there is no reason why [Dorsal] and [Round] on the first /E/ in (2b) cannot spread by VH. Put differently, for Turkish truncation must be captured with a diacritic feature, but for Hungarian it need not (and should not) be. My conclusion is that the blockage of VH from applying to the opaque vowel in (2b) is due to the presence of the diacritic feature and not to some general principle/convention of truncation.

One might point out that spreading and deleting tones in intonational phonology is not always as straightforward as in Hungarian; hence, a more apt comparison would be between the Turkish example in (2b) and some language in which the spreading of tones is unpredictable. An apparent example of such a language is provided by the dialects of Catalan described by Prieto (2002). In these dialects there are two deletion strategies when two pitch accents and an edge tone sequence adapt to short utterances: either the first or the second pitch accent deletes. However, Prieto (2002) argues that the choice of which pitch accent to delete is a consequence of a more enriched phonological representation of tone, suggesting that Catalan does not require a diacritic feature saying which tone needs to be deleted.

² The diacritic feature necessary in (2b) is very different from an SPE-style diacritic which would be associated with the entire morpheme. The reason is that the diacritic feature for (2b) must be attached only to the first root vowel because the features [Dorsal] and [Round] on the final root vowel must be allowed to spread to the suffix.
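The point that the MU representation in (2b) needs a per-vowel marker can be illustrated with a small Python sketch. It rests on assumptions made only for this illustration: vowels carry sets of privative features, and a hypothetical `truncated` flag plays the role of the diacritic discussed above. The names `vowel` and `spread_privative` are invented here and do not reproduce K&V's formalism.

```python
# Illustrative sketch only: privative features as sets, plus a per-vowel
# "truncated" flag standing in for the diacritic (both are assumptions).

def vowel(*features, truncated=False):
    """Build a vowel with its privative features and an optional no-spread flag."""
    return {"features": set(features), "truncated": truncated}

def spread_privative(vowels, features=("Dorsal", "Round")):
    """Spread each privative feature rightward unless its source is flagged."""
    result = [set(v["features"]) for v in vowels]
    for feat in features:
        spreading = False
        for i, v in enumerate(vowels):
            if feat in v["features"]:
                # a vowel bearing the feature is a source, unless it is flagged
                spreading = not v["truncated"]
            elif spreading:
                result[i].add(feat)          # featureless vowel receives the feature
    return result

# (2a): [Dorsal] on the first vowel spreads through the whole word.
pirlanta_li = [vowel("Dorsal"), vowel(), vowel(), vowel()]
# (2b): the first vowel is flagged; the final root vowel spreads to the suffix.
kortizon_lu = [vowel("Dorsal", "Round", truncated=True), vowel(),
               vowel("Dorsal", "Round"), vowel()]

harmonic = spread_privative(pirlanta_li)
disharmonic = spread_privative(kortizon_lu)
```

Without the flag on the first vowel of `kortizon_lu`, nothing in the representation itself would stop [Dorsal] and [Round] from reaching the featureless second vowel, which is the observation made in this section.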

3. The LPDF model reconsidered

K&V present three arguments against the LPDF model in (1b). Let us consider the third one, which I consider to be the most convincing. According to this argument the LPDF approach is not a desirable theory because it cannot account for the phonological process of Vowel Assimilation in Turkish. The reason is that the rule requires that front vowels in harmonic roots be underspecified for [Coronal], but the representation in (1b) has this feature present in disharmonic roots. The prediction the LPDF model makes is that front vowels in harmonic and disharmonic roots behave differently with respect to Vowel Assimilation, but K&V show that this is not the case. As a non-specialist in Turkish, I find this language-specific argument against LPDF sound, and I see it as being the kind of argument necessary to refute the LPDF approach. However, I see both LPDF and MU as being very general theories for capturing lexical exceptions. This means that both models need to be tested with rules other than the language-specific processes of Vowel Harmony, Stress Assignment and Vowel Assimilation. Seen in this broader context, one needs to take care not to reject a particular model on the basis of a single example from one language. Imagine some language with a structure-building rule like VH which happens to have a handful of idiosyncratic exceptions. The difference between this hypothetical language and Turkish is that there is no equivalent rule of Vowel Assimilation which can be used to argue against the LPDF representation in (1b). Assuming there were such a language, the argument against representing the lexical exceptions as in (1b) would vanish.


Consider the case of German. In that language there is a very regular process referred to in the literature as s-Dissimilation, which converts a word-initial /s/ to [ʃ] before [−High] consonants like /p t m n l/, i.e. all consonants except for ([+High]) velars. This is a neutralization rule which suspends the lexical contrast between /s/ and /ʃ/ in favor of [ʃ]; words such as Specht [ʃpɛçt] 'woodpecker' and schmal [ʃma:l] 'narrow' with [ʃ] (from /s/) represent the normal case. Significantly, the rule has a small number of lexical exceptions, e.g. the initial sibilant in Smaragd [smaʁakt] 'emerald' surfaces as [s] and not as [ʃ]. In the LPDF approach one might analyze s-Dissimilation as a rule adding the feature [+High] to a voiceless sibilant unspecified for that feature (see Wiese 1991, Hall 1992, Alber 2001 for various treatments along these lines), while deviant words like Smaragd would require the relevant segment to exceptionally be prespecified for [−High]. On this approach s-Dissimilation applies to the /s/ in (4a) (for schmal) and is blocked in (4b) (for Smaragd):

(4)  a.  /s m a: l/          b.  /s m a ʁ a k d/
          :                       |
       [+High]                 [−High]

Thus, s-Dissimilation fails to apply to the /s/ in Smaragd because the rule is structure-building and is therefore blocked from applying to a segment that is already marked for the feature that is added. The s-Dissimilation example is important because in German there is no equivalent rule like Vowel Assimilation in Turkish which would cause one to reject the LPDF representation in (4b) for an exceptional item.³

³ It was noted at the beginning of this section that K&V present three arguments against (1b), but I have only shown that the third one does not hold for (4b). An inspection of their article reveals that K&V's first two arguments against (1b) do not apply to (4b) either. The first argument is that structures like the one in (4b) (or (1b)) are inconsistent with a redundancy-free lexicon, but the representation in (4b) is not more redundant than the MU alternative, which would require, instead of [−High], an /s/ underspecified for that feature and a diacritic feature (presumably attached to the /s/) saying that s-Dissimilation does not apply to that segment. K&V's second argument against (1b) is that it requires a segment to be underlyingly specified for [Coronal], which they see as being a feature which should be underspecified (if possible). However, this is a feature-specific argument, which clearly does not hold against (4b) because [High] is not [Coronal].
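The blocking effect in (4) can be sketched in a few lines of Python. The sketch assumes a three-valued specification for [High] on the initial sibilant (`None` for unspecified, `False` for prespecified [−High], `True` for [+High]); the function name, the small consonant set, and the default outcome outside the rule's context are all assumptions made for illustration, not part of the analyses cited above.

```python
# Illustrative sketch only. The consonant set lists just the examples given
# in the text and is not meant to be exhaustive.
NON_HIGH_CONSONANTS = {"p", "t", "m", "n", "l"}

def surface_initial_sibilant(high, following):
    """Return the surface form of a word-initial voiceless sibilant.

    high: None (unspecified), False (prespecified [-High]) or True ([+High]).
    following: the consonant that follows the sibilant.
    """
    if high is None and following in NON_HIGH_CONSONANTS:
        return "ʃ"   # structure-building rule fills in [+High]
    if high is True:
        return "ʃ"   # already [+High]
    return "s"       # prespecified [-High] blocks the rule (default is an assumption)

schmal = surface_initial_sibilant(None, "m")     # regular case, as in (4a)
smaragd = surface_initial_sibilant(False, "m")   # exceptional case, as in (4b)
```

Because the rule only adds a feature value where none is present, the prespecified [−High] in the second call is sufficient to make the word an exception; no separate marker is consulted.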


Clearly the analysis of non-Turkish examples goes beyond the goals of K&V's analysis, but in future work one might want to investigate a wider spectrum of rules with exceptions in order to evaluate LPDF and MU.

4. A problem for LPDF?

The LPDF approach to exceptionality works well in the case of structure-building operations like Turkish VH or German s-Dissimilation; however, a potential problem for that model involves lexical exceptions to structure-changing rules. For example, if a rule deletes a segment [A] and if there are some words in which [A] is exceptionally not deleted, one cannot capture the deviant forms in the LPDF model by prespecifying these words because [A] has no opposite. An example of a structure-changing rule with exceptions is (intervocalic) Velar Deletion in Turkish. Inkelas and Orgun (1995: 767–768) and Inkelas, Orgun and Zoll (1997: 405) cite examples like [bebek] 'baby' vs. [bebe-i] 'baby-accusative', where the latter word illustrates the rule, but there are lexical exceptions (e.g. [tahakkuk-u] 'verification-accusative'; Kabak 2007: note 4). Velar Deletion poses an apparent problem for the LPDF approach because there does not appear to be anything with which one could prespecify the exceptional sounds. Inkelas, Orgun and Zoll (1997: 409, note 15) recognize the problem and suggest that the exceptional velars are prespecified for syllable structure and that Velar Deletion only affects velars which are not syllabified in the input. What this implies is that a true problem for LPDF would involve a deletion rule which is not sensitive to syllable structure, e.g. a rule which deletes a segment [A] only in word-initial or word-final position. If there were such a rule and if it had exceptions, then it is not clear how the LPDF model could prespecify the exceptional forms, but in the MU treatment one would simply posit a diacritic feature attached to the exceptional [A]s, ensuring that they are not deleted. Whether or not examples like the one just described exist is a question I leave open, but they would provide crucial evidence for capturing exceptions with a diacritic feature.

References
Alber, Birgit 2001 Regional variation and edges: glottal stop epenthesis and dissimilation in standard and Southern varieties of German. Zeitschrift für Sprachwissenschaft 20: 3–41.
Grice, Martine 1995 The Intonation of Interrogation in Palermo Italian: Implications for Intonation Theory. Tübingen: Niemeyer.
Grønnum, Nina 1991 Prosodic parameters in a variety of regional Danish standard languages with a view towards Swedish and German. Phonetica 47: 188–214.
Hall, T.A. 1992 Syllable Structure and Syllable Related Processes in German. Tübingen: Niemeyer.
Inkelas, Sharon, and Orhan Orgun 1995 Level ordering and economy in the lexical phonology of Turkish. Language 71: 763–793.
Inkelas, Sharon, Orhan Orgun, and Cheryl Zoll 1997 The implications of lexical exceptions for the nature of grammar. In Derivations and Constraints in Phonology, Iggy Roca (ed.), 393–418. Oxford: Oxford University Press.
Kabak, Barış 2007 Hiatus resolution in Turkish: an underspecification account. Lingua 117: 1378–1411.
Ladd, D. Robert 1996 Intonational Phonology. Cambridge: Cambridge University Press.
Prieto i Vives, Pilar 2002 Tune-text association patterns in Catalan: An argument for a hierarchical structure of tunes. Probus 14: 173–204.
Wiese, Richard 1991 Was ist extrasilbisch im Deutschen und warum? Zeitschrift für Sprachwissenschaft 10: 112–133.

Feature spreading, lexical specification and truncation

Barış Kabak and Irene Vogel

1. Introduction

Tracy Hall in his commentary presents a clear and succinct summary of two of the ways of handling Vowel Harmony exceptions discussed in Kabak & Vogel (this volume). We will use Hall's terms to refer to these two approaches, namely (i) Lexical Prespecification with a Distinctive Feature (LPDF) and (ii) Lexical Specification with Maximum Underspecification (MU). Hall also introduces independent considerations from German, in particular the rule of s-Dissimilation, in order to examine the broader applicability of these approaches. He also briefly mentions another phonological phenomenon of Turkish in this regard: Velar Deletion. In addition, Hall addresses the truncation mechanism introduced in Kabak & Vogel as a means for blocking Vowel Harmony. While not all of these considerations lead to conclusive findings, they do provide the grounds for both broader and deeper examination of the issues at hand.

2. Two approaches to exceptional features

LPDF and MU primarily differ with respect to the conditions under which they lexically mark a given distinctive feature. The LPDF model ensures the surface realization of an exceptional feature by lexically marking the feature in question regardless of its status. The MU model, by contrast, cannot resort to lexical marking in similar situations if the feature in question is inactive in the phonological system. In this regard, MU respects the phonological status of a given feature throughout the entire lexicon. To enrich the discussion of these two models, Hall introduces some observations regarding German s-Dissimilation, which neutralizes the distinction between initial /s/ and /S/ to [S], a [+High] segment, before [High] consonants (e.g., [S]muck jewelry; [S]lange snake). This is considered a structure build-

104

Bar Kabak and Irene Vogel s

ing rule, and Hall shows that the LPDF model provides sufcient theoretical coverage to account for noted exceptions (e.g., [s]mart a car brand; [s]lip briefs/panties). In particular, pre-specifying /s/ as [High] blocks the application of the rule that adds [+High] to an initial /s/. Thus, Hall suggests we should not exclude the LPDF model and retain the MU model since there is no emprical evidence against the LPDF in the German data. We agree with Hall that s-Dissimilation does not demonstrate a need for the MU model, however, we would not like to suggest that it favors the LPDF model. Instead, it seems merely to be an instance of a phenomeonon that can be handled by either model, and thus does not provide a test ground to chose between the two. Returning to Turkish, Hall considers an additional phenomenon involving a structure-changing rule, Velar Deletion (VD), whereby intervocalic velars are productively deleted (e.g., /inek-I/ [ine-i] cow-Acc; /gk-I/ [g-] skyAcc), although there are several exceptions (e.g., /kk-I/ [kk-] (*[k-]) root-Acc). Hall indicates that this case poses a serious challenge for the LPDF model. The problem arises because this model involves lexical marking of features, but there is no way to pre-specify the exceptional velars by using distinctive features. The MU approach, however, does not encounter this problem since it crucially adopts mechanisms other than feature marking in order to account for analogous types of exceptions in other instances. Hall claims that the MU model would need to use a diacritic feature to ensure that the velar is not deleted in such cases. While it is beyond the scope of this response to discuss the mechanisms needed to handle the additional phenomenon of VD, we would like to suggest that use of diacritic features is not in itself unacceptable, as long as it can be reasonably restricted cross-linguistically, and validated on independent grounds.1 3. Truncation in Turkish Vowel Harmony

An additional point raised by Hall involves Kabak & Vogel's use of a truncation rule to block VH in the exceptional cases in which its spread is restricted. According to Hall, this truncation convention is a form of diacritic specification, which merely reflects the necessity to identify the exceptional behavior at hand. We agree that the sole purpose of marking exceptions is to show exceptional
1. One such mechanism is proposed in models where lexical representations comprise two dimensions: one to determine the anchoring of phonological elements (tones, accents, features) in underlying structure; another to dictate how and where such elements are pronounced (e.g., Revithiadou 2007). Thus, VD could be blocked if the two dimensions are pre-specified to match.

Feature spreading, lexical specification and truncation


behavior; however, we would like to emphasize again that this is not necessarily problematic, as long as the means for determining the diacritic features rests on general principles. In fact, the truncation analysis is consistent with our treatment of exceptions to regular stress assignment, involving lexical marking of accent, a mechanism of metrical phonology incorporated in most of the approaches we examined. With regard to VH, we simply extended the fundamental concept, expressing exceptionality by means of a truncation mechanism, as opposed to the specification of accent position. We find this an advantageous means for capturing VH exceptionality since truncation has been introduced in other autosegmental analyses involving spreading and its interruption, in particular in intonation phenomena. As Hall points out, the application of truncation in intonational phonology may be different from our use in relation to VH. In the case of intonation, truncation often corresponds to deletion of an unassociated autosegment (tone), while in the case of VH, it corresponds to the failure of an autosegmental feature (e.g., a distinctive feature) to spread. Rather than being a problem, we believe this difference is, in fact, quite interesting. Specifically, we find a crucial difference in the fact that the features of VH, as opposed to intonation, involve a dual function. First, they characterize the way vowels are realized on the surface in terms of their specific articulatory properties. Second, they determine which features can spread to other vowels within the same domain. The two functions, however, need not co-occur. For example, in the case of disharmony, pre-specified features are only realized on the segments with which they are associated; they do not exhibit spreading behavior. The tones in intonation patterns, however, may at times not be realized at all; a feature is thus lost, not just prevented from spreading.
Furthermore, there is an interesting difference regarding the representation of truncation in VH and in intonation phenomena. We assume for VH that truncation is manifested in the underlying representation of segments that do not participate in spreading. By contrast, in intonation phenomena, there is nothing in the representation of tones that causes their deletion. Instead, truncation simply operates to eliminate a tone that fails to associate within a particular domain. Beyond the differences, however, we observe that in both cases truncation essentially results in the same generalization: an underlying autosegmental feature fails to spread. Since this convention has been independently motivated as a component of autosegmental phonology, we find it interesting that it lends itself easily to resolving challenges in different realms of autosegmental spreading.

4. Conclusions

In sum, we find that Tracy Hall has clearly summarized two possible models for handling Vowel Harmony exceptions discussed in Kabak & Vogel (this volume): (i) Lexical Prespecification with a Distinctive Feature and (ii) Lexical Specification with Maximum Underspecification. Two additional phenomena were also considered in this regard: German s-Dissimilation and Turkish Velar Deletion. While the former appears to be inconclusive with regard to the two models, the latter seems to lend support to the second one. Hall also addresses the truncation mechanism proposed for treating exceptions to Vowel Harmony, and its somewhat different use in intonation phenomena. We suggest that this difference, rather than being problematic, reveals interesting differences between spreading in the two domains: harmony and intonation.

Reference
Revithiadou, Anthi 2007 Colored turbid accents and containment: a case study from lexical stress. In Freedom of Analysis?, Sylvia Blaho, Patrik Bye and Martin Krämer (eds.), 149–174. Berlin/New York: Mouton de Gruyter.

Higher order exceptionality in inflectional morphology

Greville G. Corbett

Abstract. We start from the notion of canonical inflection, and we adopt an inferential-realizational approach. We assume that we have already established the features and their values for a given system (while acknowledging that this may be a substantial analytic task). In a canonical system, feature values should "multiply out" so that all possible cells exist. Paradigms should be consistent, both internally (within the lexeme) and externally (across lexemes). Such a scheme would make perfect sense in functional terms: it provides maximal differentiation for minimal phonological material. However, real systems show great divergences from this idealization. A typology of divergences from the canonical scheme situates the types of morphological exceptionality, including: periphrasis, anti-periphrasis, defectiveness, overdifferentiation, suppletion, syncretism, heteroclisis and deponency. These types of exceptionality provide the basis for an investigation of higher order exceptionality, which results from interactions of these phenomena, where the exceptional phenomena target the same cells of the paradigm. While some examples are vanishingly rare, they are of great importance for establishing what is a possible word in human language, since they push the limits considerably beyond normal exceptionality.*

1. Introduction

We propose a part of a typology of inflectional morphology, and within it we concentrate on extreme instances of exceptionality.

* A version of this paper was presented at the Arbeitsgruppe "Auf alles gefasst sein: Ausnahmen in der Grammatik" at the 27th Annual Meeting of the Deutsche Gesellschaft für Sprachwissenschaft, Cologne, 23–25 February 2005. I wish to thank those present and the two anonymous referees for their suggestions. The support of the ESRC under grants RES-000-23-0375 and RES-051-27-0122 and of the ERC (grant ERC-2008-AdG-230268 MORPHOLOGY) is gratefully acknowledged.


1.1. Canonicity in typology

If we are to tackle some of the most difficult areas of language from a typological perspective, we shall need new methods. The one suggested here is the canonical approach (Corbett 2005). The basic idea is that we define carefully a theoretical space, and only then situate the real language phenomena within it. The canonical point, specified by converging definitions, is where we find the best, clearest, most indisputable examples (for applications of the approach see Seifart 2005: 156–74; Suthar 2006: 178–98; Corbett 2006, 2007a). However, canonical examples may be rare or even non-existent, hence it is vital to maintain a distinction between what is canonical, and what is usual or frequent. What is canonical gives us the measure against which real examples can be situated, and from which different degrees of irregularity can be calibrated. It also gives us a way of analyzing and celebrating the diversity of inflectional morphology by confronting it with an elegant order.

1.2. Canonical inflection

Linguists are interested in what is a possible human language. A part of that account is coming to understand what is a possible word. In this paper we narrow that question down to looking at 'possible word' from the point of view of inflection. We set up a framework of canonical inflection, within which we can situate different morphological phenomena. The system of terms for inflectional morphology is still inconsistent in places, despite interesting work by Mel'čuk (1993) and others. Greater consistency in terminology gives us a surer way to identify exceptions. All the predicted individual deviations from canonicity are found, and we shall illustrate only some of these types of 'possible word' (for illustration of some other types see Corbett 2007b). This is because we are concerned in this paper with even less canonical items.

1.3. Higher order exceptionality

Our specific focus is on higher order exceptionality. By this we mean the interaction of exceptional phenomena. These examples are of interest because they show us extreme cases of 'possible word'. Here too we must look at a subset of the possible interactions. Examples are very scarce, partly because they are genuinely rare, but also because they have been little discussed, and so linguists have not been on the lookout for them. It is hoped that this discussion will lead specialists working on various languages to be aware of them, so that the general inventory of these examples is increased.

2. Assumptions

We start from the point where the features and their values are established for the language in question; in other words, analysis of the syntactic part of morphosyntax is well advanced. This is not to minimize the problems; this task can involve complex analytical decisions (see Zaliznjak 1973 [2002]; Comrie 1986; Corbett 1991: 145–188 for examples). Our general stance will be that of inferential-realizational morphology, as defined and discussed in Stump (2001: 1–30). The specific variant in mind is Network Morphology (for which see Corbett and Fraser 1993; Evans, Brown and Corbett 2002, and references there). It is important for the reader to be aware of this orientation, but the main points of this general typology could be restated in other frameworks. We assume further that geometry is not relevant to inflectional morphology, but that nevertheless presenting paradigms in tabular form is a helpful method of representation. The final assumption is that when discussing particular phenomena we always imply 'all other things being equal'. For instance, when discussing whether inflections are the same or different in particular cells of the paradigm we assume, unless specifically mentioned, that the stem remains the same.

3. Canonical inflection

We will now outline the notion of canonical inflection, which will serve as the basis for approaching various interesting deviations from canonicity in §4. As noted earlier, we assume that we have the features and their values established. Given that, in a canonical system these should "multiply out", so that all possible cells in a paradigm exist. For example, if a given language has four cases and three numbers in its nominal system, the paradigm of a noun should have twelve cells. (This is equivalent to Spencer's notion of 'exhaustivity', 2003: 252.) Furthermore, to be fully canonical, a paradigm should be consistent, according to the following criteria:

(1) Canonical inflection

                                                   comparison across cells    comparison across
                                                   of a lexeme                lexemes
                                                   (level one comparison)     (level two comparison)
1. composition/structure                           same                       same (cf. 4.2.1)
2. lexical material (≈ shape of stem)              same (cf. 4.1.1)           different
3. inflectional material (≈ shape of inflection)   different (cf. 4.1.2)      same (cf. 4.2.2)
outcome (≈ shape of inflected word)                different                  different

This schema implies two levels of comparison:

level one: we start from the abstract paradigm gained by multiplying out the features and their values. We then examine any one lexeme fitted within this paradigm. The centre column of (1) compares cell with cell, within a single paradigm. We take in turn the criteria in the left column:

1. we look at the composition and structure of the cells; suppose the first consists of a stem and a prefix: for this lexeme to have a canonical paradigm, every other cell must be the same in this regard. Finding a suffix, or a clitic, or any different means of exponence would reveal non-canonicity.
2. in terms of the lexical material in the cell, we require absolute identity (the stem should remain the same).
3. on the other hand, the inflectional material should be different in every cell.

The outcome for such a lexeme (last row) is that every cell in its paradigm will realize the morphosyntactic specification in a way distinct from that of every other cell.

level two: this involves comparing lexemes with lexemes within the given language (right column). We use the same criteria as before:

1. a canonical system requires that the composition and structure of each cell remains the same, comparing across lexemes.
2. we require that the lexical information be different (we are, after all, comparing different lexemes).
3. in the canonical situation, the inflectional material is identical. That is, if our first lexeme marks dative plural in -du, so does every other.

The outcome is that every cell of every lexeme is distinct. We illustrate this with a hypothetical example:

(2) Illustration (hypothetical)

DOG-a    CAT-a
DOG-e    CAT-e
DOG-i    CAT-i
DOG-o    CAT-o
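The two levels of comparison can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the feature labels (`case1`, `case2`, `sg`, `pl`) and the function name `is_canonical` are invented here, and the four endings per lexeme are taken from the hypothetical illustration in (2). It multiplies out the feature values to obtain the cells, then checks exhaustivity and criteria 2 and 3 of schema (1); criterion 1 (composition/structure) is not modelled.

```python
from itertools import product

# Hypothetical feature system (an assumption for this sketch): two cases
# and two numbers give four cells, matching the four forms per lexeme in (2).
CASES = ["case1", "case2"]
NUMBERS = ["sg", "pl"]
CELLS = list(product(CASES, NUMBERS))  # "multiply out" features and values

# Each form is stored as a (stem, inflection) pair; data follow (2).
ENDINGS = ["a", "e", "i", "o"]
PARADIGMS = {
    lexeme: {cell: (lexeme, ending) for cell, ending in zip(CELLS, ENDINGS)}
    for lexeme in ("DOG", "CAT")
}


def is_canonical(paradigms, cells):
    """Check exhaustivity plus criteria 2 and 3 of schema (1)."""
    for paradigm in paradigms.values():
        if set(paradigm) != set(cells):  # every possible cell must exist
            return False
        stems = {stem for stem, _ in paradigm.values()}
        endings = [ending for _, ending in paradigm.values()]
        if len(stems) != 1:  # level one: same lexical material
            return False
        if len(set(endings)) != len(endings):  # level one: distinct inflections
            return False
    # Level two: stems differ across lexemes, inflections are identical.
    all_stems = [next(iter(p.values()))[0] for p in paradigms.values()]
    if len(set(all_stems)) != len(all_stems):
        return False
    first = next(iter(paradigms.values()))
    return all(
        p[cell][1] == first[cell][1] for p in paradigms.values() for cell in cells
    )
```

Calling `is_canonical(PARADIGMS, CELLS)` on the data above succeeds; replacing any one ending so that two cells of a lexeme coincide makes the check fail, which is the kind of deviation discussed in the following sections.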

This system of canonical inflection would make perfect sense in functional terms. There is perfect differentiation within the morphology, while using the minimal material.

4. Deviations from canonical inflection

Real systems, however, show great divergences from this idealization. Its value is that we can use the notion of canonicity as a way of calibrating the phenomena we find. We look at the deviations from canonicity first internally, comparing the cells of a single lexeme, then externally, comparing across lexemes. It is the typology of these divergences which allows us to move towards a consistent set of terms. A general pattern is that where we actually find 'same' in place of canonical 'different' this will give a non-functional outcome. If we find 'different' in place of canonical 'same' this will lead to increased complexity and/or redundancy. Working through the different deviations gives us an overall classification of the phenomena of inflectional morphology. That is a long undertaking, and space does not allow us to complete it here. Instead we will take some illustrative instances, selecting as examples those that we shall need for the discussion of higher order exceptionality.

4.1. Internal non-canonicity

We start with phenomena that can be defined within the lexeme, and we take two key types.

4.1.1. Lexical material

In the canonical situation, lexical meaning (and only that) is conveyed by lexical material, the stem; grammatical meaning, and only that, is conveyed by the inflection. Thus the stem is inert, and all the differentiation in the paradigm is due to the inflectional material. Contrary to this canonical situation, we find all sorts of alternations of stem, from the predictable, through the less regular, right up to full suppletion as, for example, in Russian rebenok 'child' ~ deti 'children'. Suppletion has rightly attracted a good deal of interest, as in Carstairs-McCarthy (1994), Mel'čuk (1994), Corbett (2007a); see Chumakina (2004) for an annotated bibliography, and Brown, Chumakina, Corbett and Hippisley (2004) for an on-line typological database. In terms of possible words, suppletion is of particular interest because it means that there are lexemes which have forms with no phonological shape in common.

4.1.2. Inflectional material

Since inflectional material conveys grammatical meaning, in the canonical situation we find a different inflection in each cell. Contrast this with the following paradigm from Slovene:

(3) Paradigm of Slovene kot 'corner' (Priestly 1993: 400–402)

                singular    dual      plural
nominative      kot         kota      koti
accusative      kot         kota      kote
genitive        kota        kotov     kotov
dative          kotu        kotoma    kotom
instrumental    kotom       kotoma    koti
locative        kotu        kotih     kotih
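The extent of syncretism in (3) can be tallied mechanically. The sketch below is an illustration only: the forms are copied from paradigm (3), while the function name `syncretic_sets` and the representation of cells as (case, number) pairs are assumptions made for this example. It counts the cells obtained by multiplying out case and number, counts the phonologically distinct forms, and groups the cells that share a form.

```python
from collections import defaultdict

CASES = ["nominative", "accusative", "genitive", "dative", "instrumental", "locative"]
NUMBERS = ["singular", "dual", "plural"]

# Forms of Slovene kot 'corner' as given in paradigm (3), one row per case,
# ordered singular, dual, plural.
KOT = {
    "nominative": ["kot", "kota", "koti"],
    "accusative": ["kot", "kota", "kote"],
    "genitive": ["kota", "kotov", "kotov"],
    "dative": ["kotu", "kotoma", "kotom"],
    "instrumental": ["kotom", "kotoma", "koti"],
    "locative": ["kotu", "kotih", "kotih"],
}


def syncretic_sets(paradigm):
    """Map each form that fills more than one cell to the cells it fills."""
    cells_by_form = defaultdict(list)
    for case in CASES:
        for number, form in zip(NUMBERS, paradigm[case]):
            cells_by_form[form].append((case, number))
    return {form: cells for form, cells in cells_by_form.items() if len(cells) > 1}


cell_count = len(CASES) * len(NUMBERS)  # cells predicted by the feature system
distinct_forms = {form for row in KOT.values() for form in row}
```

Comparing `cell_count` with `len(distinct_forms)` gives a rough measure of how far the lexeme departs from the canonical requirement of a different inflection in each cell; `syncretic_sets(KOT)` lists which cells are involved.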

A morphosyntactic analysis of Slovene produces good evidence for six cases and three numbers. We therefore expect a paradigm with eighteen cells. This particular lexeme has only nine phonologically distinct forms filling these cells. It shows numerous examples of syncretism, that is, instances where we have a single form which realizes more than one morphosyntactic specification. We use syncretism as a cover term; different examples may be analysed in different ways (see Baerman, Brown and Corbett 2005 for extensive discussion).

4.2. External non-canonicity

We now move on to deviations which are to be defined in terms of comparisons across lexemes.


4.2.1. Composition/structure

In the canonical situation, the composition and structure of a lexeme's paradigm will be constant when we compare across the class. For instance, if we find that nouns in a given language distinguish singular and plural, in the canonical situation this will hold generally true. One of the deviations from this canonical situation is overdifferentiation (Bloomfield 1933: 223–224; Nübling, this volume). Lexemes which are overdifferentiated stand out from the rest of the group in that they have an additional cell in their paradigm. For example, in Maltese most nouns distinguish singular from plural. Now consider uqija 'ounce':

(4) Example of the Maltese dual

singular    dual       plural
uqija       uqitejn    uqijiet

Around 30 nouns distinguish singular from dual from plural; this is a minor number (Corbett 2000: 96). With only eight of them, according to Fenech (1996), is the use of the dual obligatory. Uqija 'ounce' is overdifferentiated in having a dual, but its use is not obligatory; for 'two ounces' one can use either the dual uqitejn or the form with the numeral: żewġ uqijiet.

4.2.2. Inflectional material

In the canonical situation, inflectional material is the same across lexemes. We can specify that the first singular present tense active has a particular form just once in the grammar. Of course there are many deviations from this. One of the most interesting, and least studied, is deponency, for which see Embick (1998, 2000), Corbett (1999), Sadler and Spencer (2001), Stump (2002), Kiparsky (2005), Baerman, Corbett, Brown and Hippisley (2007), and for on-line typological material see Baerman (2005). Deponency goes against the notion of regularity of inflection: in particular the expectation that certain forms have certain functions. Consider the partial paradigm of two Latin verbs (Kennedy 1955: 72, 82):

(5) Partial paradigm of a regular Latin verb amāre 'to love'

        active     passive
1 sg    amō        amor
2 sg    amās       amāris
3 sg    amat       amātur
1 pl    amāmus     amāmur
2 pl    amātis     amāminī
3 pl    amant      amantur

Here we see a regular differentiation of active and passive. There are many verbs like this one. In principle, given a particular inflection, one can tell immediately whether the form is active or passive. Now contrast this with a deponent verb:

(6) Partial paradigm of a deponent Latin verb vēnārī 'to hunt'

        active
1 sg    vēnor
2 sg    vēnāris
3 sg    vēnātur
1 pl    vēnāmur
2 pl    vēnāminī
3 pl    vēnantur

With this verb we have the forms which 'ought to be' passive taking the role of active inflections. We can say this only by comparison across lexemes: there are many verbs with the pattern of amāre 'to love' and relatively few like vēnārī 'to hunt'. Deponency is generally discussed with reference to Latin. Indeed it is sometimes even defined as being a phenomenon found in Latin: 'Class of verbs in Latin, intransitive or active in syntax but with inflections that usually mark passives' (Matthews 1997: 93). However, the basic phenomenon, which we shall call 'extended deponency', need not be restricted to Latin, to voice, nor even to verbs. The phenomenon consists of inflections which have an established function in the morphological system being used in a minority of instances for the opposite function. This covers the Latin deponent verbs, and extends to a range of interesting phenomena which, because they have had no name, have been little studied. For a range of examples see Baerman (2005); an example of deponency in this wider sense will also be analysed in §5.4.
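The comparison across lexemes in (5) and (6) can be sketched as a simple check. This is a rough illustration, not an analysis: the forms are those of (5) and (6), but the function names and the idea of recognizing the passive by a plain string suffix are assumptions made for this example, and such a suffix test would misclassify any active form that happened to end in one of the listed strings.

```python
# Endings that mark the passive in the regular pattern of (5).
# "tur" also covers the 3 pl ending, since "amantur" ends in "tur".
PASSIVE_ENDINGS = ("or", "ris", "tur", "mur", "minī")

AMARE = {
    "active": ["amō", "amās", "amat", "amāmus", "amātis", "amant"],
    "passive": ["amor", "amāris", "amātur", "amāmur", "amāminī", "amantur"],
}
VENARI = {
    "active": ["vēnor", "vēnāris", "vēnātur", "vēnāmur", "vēnāminī", "vēnantur"],
}


def looks_passive(form):
    """True if the form ends in one of the passive endings listed above."""
    return form.endswith(PASSIVE_ENDINGS)


def is_deponent(paradigm):
    """Treat a lexeme as deponent if all of its active cells look passive."""
    return all(looks_passive(form) for form in paradigm["active"])
```

On these data `is_deponent(AMARE)` is false and `is_deponent(VENARI)` is true. The point of the sketch is that the judgement depends on endings established from the majority pattern, which is exactly why deponency can only be stated by comparing lexemes.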

5. Interactions

Some of the examples examined so far are well-known and present fairly minor instances of exceptionality. However, they provide the basis for an investigation of higher order exceptionality, which results from interactions of these phenomena. By interactions, we mean not simply that a given lexical item shows more than one type of exceptionality, but that the exceptional phenomena target the same cells of the paradigm. That is, we are dealing not just with a small subclass (Moravcsik, this volume) but with the intersection of small subclasses.

5.1. Suppletion and syncretism

One interaction that has been discussed is from the South Slavonic language Slovene, found in the noun človek 'man, person'; see Priestly (1993: 401), Plank (1994), Corbett and Fraser (1997), Evans, Brown and Corbett (2001: 215), Baerman, Brown and Corbett (2005: §5.1.1). This is a particularly interesting case, which deserves further mention here. It shows an interaction of suppletion and syncretism. The suppletion involves a plural stem as opposed to that for singular and dual. This interacts with a more general syncretism: Slovene nouns always have the genitive dual syncretic with the genitive plural (similarly the locative dual is syncretic with the locative plural). This is one of the syncretisms in (3) above. Clearly, then, the genitive and locative dual will involve an interaction of suppletion and syncretism. The effect can be seen in (7):

(7) Slovene človek 'man, person' (Priestly 1993: 401)

                singular    dual         plural
nominative      človek      človeka      ljudje
accusative      človeka     človeka      ljudi
genitive        človeka     ljudi        ljudi
dative          človeku     človekoma    ljudem
instrumental    človekom    človekoma    ljudmi
locative        človeku     ljudeh       ljudeh

In this interesting paradigm certain cells are targeted both by suppletion and by syncretism. The interaction creates an unusual pattern of stems; the general rule of syncretism seems to win out over the suppletion.
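The cells targeted by both phenomena can be picked out with a short sketch. It is an illustration under stated assumptions: the forms follow paradigm (7), while the helper names and the crude prefix test used to tell the two stems apart are invented for this example.

```python
CASES = ["nominative", "accusative", "genitive", "dative", "instrumental", "locative"]

# Forms of Slovene človek 'man, person' as in paradigm (7), one row per case,
# ordered singular, dual, plural.
CLOVEK = {
    "nominative": ["človek", "človeka", "ljudje"],
    "accusative": ["človeka", "človeka", "ljudi"],
    "genitive": ["človeka", "ljudi", "ljudi"],
    "dative": ["človeku", "človekoma", "ljudem"],
    "instrumental": ["človekom", "človekoma", "ljudmi"],
    "locative": ["človeku", "ljudeh", "ljudeh"],
}


def stem_of(form):
    """Crude stem classifier (an assumption of this sketch): two stem shapes."""
    return "ljud-" if form.startswith("ljud") else "človek-"


def dual_cells_with_plural_stem(paradigm):
    """Return the dual cells that are syncretic with the plural and so
    carry the suppletive plural stem."""
    hits = []
    for case in CASES:
        _, dual, plural = paradigm[case]
        if dual == plural and stem_of(dual) == "ljud-":
            hits.append((case, "dual"))
    return hits
```

For the data in (7) the function returns the genitive dual and the locative dual, the two cells where the general dual-plural syncretism brings the suppletive stem into the dual.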

5.2. Suppletion and overdifferentiation

Our second example also concerns suppletion, this time interacting with overdifferentiation. Consider these East Norwegian dialect forms for the adjective 'small':

Norwegian (East Norwegian dialect, Hans-Olav Enger, personal communication)

(8)  en              lit-en            gutt.1
     art.m.sg.indf   small-m.sg.indf   boy(m)[sg.indf]
     'a small boy'

(9)  den              vesle          gutt-en
     art.m/f.sg.def   small.sg.def   boy(m)-sg.def
     'the small boy'

(10) ei              lit-a             jent-e
     art.f.sg.indf   small-f.sg.indf   girl(f)-sg.indf
     'a small girl'

(11) den              vesle          jent-a
     art.m/f.sg.def   small.sg.def   girl(f)-sg.def
     'the small girl'

(12) et              lit-e             barn
     art.n.sg.indf   small-n.sg.indf   child(n)[indf]
     'a small child'

(13) det            vesle          barn-et
     art.n.sg.def   small.sg.def   child(n)-sg.def
     'the small child'

This adjective has three suppletive stems, lit- in the singular indefinite, vesle in the singular definite,2 and in the plural there is små. This latter also deserves illustration:

(14) små        gutt-er
     small.pl   boy(m)-pl.indf
     'small boys'

1. The Leipzig Glossing Rules are adopted (for details see http://www.eva.mpg.de/lingua/index.html).
2. In the dialect cited these forms are obligatory. Various other Norwegian speakers I have asked accept these forms, but for them vesle is optional.

(15) små        jent-er
     small.pl   girl(f)-pl.indf
     'small girls'

(16) små        barn
     small.pl   child(n)[indf]
     'small children'

(17) de           små        gutt-ene
     art.pl.def   small.pl   boy(m)-pl.def
     'the small boys'

(18) de           små        jent-ene
     art.pl.def   small.pl   girl(f)-pl.def
     'the small girls'

(19) de           små        barn-a
     art.pl.def   small.pl   child(n)-pl.def
     'the small children'

We can see the evidence for suppletion just looking within this one lexeme. To demonstrate that this adjective is also overdifferentiated, we need to compare it with an ordinary adjective:

(20) Regular tjukk 'thick, fat' and liten 'small' in East Norwegian (Hans-Olav Enger, personal communication)

tjukk 'thick, fat'
       singular indf    singular def    plural
m      tjukk            tjukke          tjukke
f      tjukk            tjukke          tjukke
n      tjukt            tjukke          tjukke

liten 'small'
       singular indf    singular def    plural
m      liten            vesle           små
f      lita             vesle           små
n      lite             vesle           små

This dialect has three genders, as shown by the articles. Yet a normal adjective like tjukk 'thick, fat' does not distinguish all three; rather, it makes only one distinction, masculine and feminine together versus neuter (Enger and Kristoffersen 2000: 104). The instance of overdifferentiation involving liten 'small' is within one of the suppletive stems. Besides this, tjukk 'thick, fat' and other normal adjectives do not distinguish definite plural from definite singular; tjukk-e functions for both. However, vesle is the definite singular, but in the plural små is used. This distinction, not made by normal adjectives, is between the suppletive stems which bring about the overdifferentiation. Putting all this together we see that in the positive, a normal adjective has three forms, while liten has five forms, resulting from the interaction of suppletion and overdifferentiation.
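The form counts and the extra distinctions can be verified with a small sketch. The forms are taken from (20) and examples (8) to (19); the cell labels, the helper `build` and the function names are assumptions introduced only for this illustration.

```python
GENDERS = ["m", "f", "n"]


def build(indf_sg, def_sg, pl):
    """Expand a compact description into one form per cell.
    Cells are (number, definiteness, gender) triples."""
    cells = {}
    for gender, form in zip(GENDERS, indf_sg):
        cells[("sg", "indf", gender)] = form
    for gender in GENDERS:
        cells[("sg", "def", gender)] = def_sg
        cells[("pl", "indf", gender)] = pl
        cells[("pl", "def", gender)] = pl
    return cells


# Positive forms as described for table (20).
TJUKK = build(["tjukk", "tjukk", "tjukt"], "tjukke", "tjukke")
LITEN = build(["liten", "lita", "lite"], "vesle", "små")


def distinct_forms(paradigm):
    """Sorted list of the distinct forms in a paradigm."""
    return sorted(set(paradigm.values()))


def extra_distinctions(regular, special):
    """Cell pairs that the regular adjective merges but the special one separates."""
    cells = sorted(regular)
    return [
        (a, b)
        for i, a in enumerate(cells)
        for b in cells[i + 1:]
        if regular[a] == regular[b] and special[a] != special[b]
    ]
```

Here `distinct_forms` yields three forms for tjukk and five for liten, and `extra_distinctions(TJUKK, LITEN)` includes both the masculine versus feminine contrast in the indefinite singular and the definite singular versus plural contrast, the two distinctions identified in the text.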

5.3. Overdifferentiation and syncretism

Given the number of relevant morphological phenomena, the number of potential interactions, in addition to those we have seen, is potentially rather large. It is therefore an attractive idea to ask whether there are logical restrictions on which two-way interactions are possible. To date, none has been established. Quite the contrary: one of the most likely restrictions is disproved by data already available. At first sight it would seem impossible to have an interaction of overdifferentiation and syncretism. After all, one creates too many forms, and the other too few. They would therefore, apparently, cancel each other out. The data are more complex than that. They involve the Russian second genitive. Russian has unarguably six primary cases. But there are additional forms which are harder to analyse (see Zaliznjak 1973; Worth 1984; Comrie 1986 for discussion). Contrast these forms of the nouns kisel' 'kissel' (a Russian fruit drink, a bit like thin blancmange) and čaj 'tea'. First both have a regular genitive:

(21) vkus    kiselj-a
     taste   kissel-sg.gen
     'the taste of kissel'

(22) vkus    čaj-a
     taste   tea-sg.gen
     'the taste of tea'

However, in certain partitive expressions we find a contrast:

(23) stakan   kiselj-a
     glass    kissel-sg.gen
     'a glass of kissel'

(24) stakan   čaj-u
     glass    tea-sg.gen2
     'a glass of tea'

Here kisel' 'kissel' is now an example of a normal regular noun, while čaj 'tea' is one of a subclass which has a separate form, the so-called 'second genitive'. The number of nouns with this second genitive is restricted, but they number dozens rather than a handful.3 Within those nouns which have a second genitive, for some of them the second genitive is normally used in partitive expressions, for the others the second genitive is a possibility in competition with the ordinary genitive; for data on this see Panov (1968: 180), Graudina, Ickovič and Katlinskaja (1976: 121–125) and Comrie, Stone and Polinsky (1996: 124–125). What concerns us particularly is the form of the second genitive, čaju. Consider the following partial paradigms:

(25) Russian partial singular paradigms

                kisel' 'kissel'    čaj 'tea'
nominative      kisel'             čaj
genitive        kiselja            čaja
genitive 2      (as genitive)      čaju
dative          kiselju            čaju

Here we see that the extra form of čaj 'tea', the second genitive, is syncretic with the dative. Note that we cannot push the problem into syntax and claim that the form used is the dative, since any agreements are indeed genitive. This is not obvious, since in the modern language the inclusion of an agreeing modifier strongly disfavours the use of the second genitive; instead the ordinary genitive is more likely:

(26) stakan   zelen-ogo        čaj-a
     glass    green-m.sg.gen   tea(m)-sg.gen
     'a glass of green tea'

Here the presence of the modifier zelenogo 'green-m.sg.gen' makes much more likely the use of the ordinary genitive čaja. However, in those instances where the noun stands in the less likely second genitive in an expression similar to (26) genitive agreement is still required. Thus zelenogo čaju 'green tea' is possible if rare as a second genitive. We should therefore test what happens if we put the attributive modifier in the dative:

(27) zelen-omu        čaj-u
     green-m.sg.dat   tea(m)-sg.dat
     'green tea'

(27) can be used only in syntactic positions where a dative is required. It is not a second genitive, and could not be used in (26). The problem is therefore a morphological one and not a syntactic issue: second genitives are not syntactic datives. We can conclude that the nouns with a second genitive are overdifferentiated, and that the additional form is expressed by syncretism (with the dative). We do indeed have an interaction of overdifferentiation and syncretism. This in turn means that the most promising suggestion for a logical restriction on two-way interactions (that we could not find an interaction of overdifferentiation and syncretism) does not in fact hold.

3. Ilola and Mustajoki (1989: 4141) identify 396. However, some of these are rather rare nouns. Moreover, Ilola and Mustajoki's source is Zaliznjak (1977), and the form has been in decline since then. Thus kisel' 'kissel' is given as having the second genitive; however, my consultants do not accept this form, and Google gives over 200 examples of stakan kiselja 'glass of kissel' and none of stakan kiselju. Thus the 396 figure is rather high.

5.4. (Extended) deponency, suppletion and overdifferentiation

A second natural way in which we might hope to constrain the possibilities for interactions is simply in terms of quantity. The examples we have seen have been of two-way interaction. Can we state that as the limit? Clearly, if three-way interactions are found, then the space of possibilities expands dramatically. The laws of chance are likely to make three-way interactions rare, but an example has been found:

(28) Serbian dete 'child' and žena 'woman, wife'

                singular                  singular
nominative      dete        deca          žena
vocative        dete        deco          ženo
accusative      dete        decu          ženu
genitive        deteta      dece          žene
dative          detetu      deci          ženi
instrumental    detetom     decom         ženom
locative        detetu      deci          ženi

Consider the forms in the unlabelled column (deca and so on). These function as the plural of dete 'child'. Viewed against the rest of the inflectional system they look odd. First there is a problem with the stem (dec- instead of det-). This is not a possible alternation in modern Serbian, and so we must recognize the stems as being suppletive. Not fully suppletive of course, but partially suppletive (or as showing a completely irregular alternation, if preferred). Second, and rather worse, are the inflections. They are apparently completely out of place as plural; the plural inflections look rather different from these.4 A comparison with the singular forms of žena 'woman, wife', a regular noun of a different inflectional class, shows what is going on. We have a set of inflections which have an established function in the morphological system being used in a minority of instances for the opposite function. That is, an instance of extended deponency. And third, a noun in the plural in Serbian normally distinguishes three case forms (nominative-vocative-accusative versus genitive versus dative-instrumental-locative) though one large group has four forms (this group has also a unique form for the accusative). Deca 'children' has six forms and so is overdifferentiated. Thus it is possible to find an instance of a three-way interaction. This means that the space of possible items which we characterize as showing higher order exceptionality is potentially very large.

4. Agreements are complex and interesting. In brief, there are some instances in which an unambiguously feminine singular form is used. There are others where a clear plural is used, and still others where a gender/number form is used and where it can be argued that this is best analysed as neuter plural. Personal pronouns with a noun phrase headed by deca 'children' as antecedent can stand in the neuter plural or the masculine plural, dependent on the type of reading, which means that overall it can control three different types of agreement (feminine singular, neuter plural and masculine plural), if personal pronouns are counted as agreement targets. See Corbett (1983: 76–86), Wechsler and Zlatić (2000: 816–821) and Corbett (2006) for details. In part the patterns fall under the typological regularities governing the distribution of syntactic and semantic agreement. However, there are remaining issues, notably the interaction of these choices with case, which make deca 'children' problematic for agreement. While particular items may be highly irregular in morphological terms, this does not normally lead to any impact on syntax. Deca 'children' is particularly challenging in that its aberrant behaviour appears not to be restricted to morphology.

6. Conclusion

The paper represents part of a new attempt to bring the phenomena of inflection into a coherent scheme. This is done within a canonical approach to typology. Such an approach has the advantage of conceptual clarity.5 It allows us to systematize the various minor irregularities of inflectional morphology.

5. It also has the practical advantage of providing a good basis for typological databases, see: http://www.smg.surrey.ac.uk/ for examples.

However, our focus was rather on those lexemes that are more than merely exceptional. We concentrated on those which show interactions of non-canonical phenomena and so represent a higher order of exceptionality. Such examples are of great importance for establishing what is a possible word in human language, since they push the limits considerably beyond normal exceptionality. In terms of the theoretical possibilities, we were not able to eliminate any of the possible two-way interactions of non-canonicity, which shows that there are a good many potential types. Furthermore, we identified a three-way interaction, which demonstrates that the potential space is large. The initial picture that emerges is that individual lexemes can indeed be exceptionally exceptional: they can show higher order exceptionality in various ways. The range of possible words is re

122

Greville G. Corbett

markable broad. As yet only some of the potential types have been found, but it seems likely that several others exist. From the perspective of a languages lexicon as a whole, however, lexemes showing higher order exceptionality are not surprisingly rare. Abbreviations art dat def f gen indf m n pl sg article dative denite feminine genitive indenite masculine neuter plural singular

References
Baerman, Matthew 2005 A survey of deponency in a sample of 100 languages. [Available online at: http://www.surrey.ac.uk/LIS/MB/WALS/WALS.htm]
Baerman, Matthew, Dunstan Brown, and Greville G. Corbett 2005 The Syntax-Morphology Interface: A Study of Syncretism. Cambridge: Cambridge University Press.
Baerman, Matthew, Greville G. Corbett, Dunstan Brown, and Andrew Hippisley (eds.) 2007 Deponency and Morphological Mismatches. Oxford: Oxford University Press (Proceedings of the British Academy 145).
Bloomfield, Leonard 1933 Language. New York: Holt, Rinehart and Winston.
Brown, Dunstan, Marina Chumakina, Greville G. Corbett, and Andrew Hippisley 2004 The Surrey Suppletion Database. [Available online at: http://www.smg.surrey.ac.uk/]
Carstairs-McCarthy, Andrew 1994 Suppletion. In Encyclopedia of Language and Linguistics. Vol. 8, R. E. Asher (ed.), 4410–4411. Oxford: Pergamon.


Chumakina, Marina 2004 An annotated bibliography of suppletion. [Available online at: http://www.surrey.ac.uk/LIS/SMG/Suppletion_BIB/WebBibliography.htm]
Comrie, Bernard 1986 On delimiting cases. In Case in Slavic, Richard D. Brecht, and James Levine (eds.), 86–106. Columbus, OH: Slavica.
Comrie, Bernard, Gerald Stone, and Maria Polinsky 1996 The Russian Language in the Twentieth Century. Oxford: Clarendon Press.
Corbett, Greville G. 1983 Hierarchies, Targets and Controllers: Agreement patterns in Slavic. London: Croom Helm.
Corbett, Greville G. 1991 Gender. Cambridge: Cambridge University Press.
Corbett, Greville G. 1999 Defectiveness, syncretism, suppletion, deponency: four dimensions for a typology of inflectional systems. Guest lecture at The Second Mediterranean Meeting on Morphology, 10–12 September 1999, University Residence, Lija, Malta.
Corbett, Greville G. 2000 Number. Cambridge: Cambridge University Press.
Corbett, Greville G. 2003 Agreement: Canonical instances and the extent of the phenomenon. In Topics in Morphology: Selected papers from the Third Mediterranean Morphology Meeting (Barcelona, September 20–22, 2001), Geert Booij, Janet DeCesaris, Angela Ralli, and Sergio Scalise (eds.), 109–128. Barcelona: Universitat Pompeu Fabra.
Corbett, Greville G. 2005 The canonical approach in typology. In Linguistic Diversity and Language Theories, Zygmunt Frajzyngier, Adam Hodges, and David S. Rood (eds.), 25–49. (Studies in Language Companion Series 72) Amsterdam: Benjamins.
Corbett, Greville G. 2006 Agreement. Cambridge: Cambridge University Press.
Corbett, Greville G. 2007a Canonical typology, suppletion and possible words. Language 83, 8–42.
Corbett, Greville G. 2007b Deponency, syncretism, and what lies between. In Baerman et al. (eds.), 21–43.


Corbett, Greville G., and Norman M. Fraser 1993 Network Morphology: A DATR account of Russian inflectional morphology. Journal of Linguistics 29: 113–42. [Reprinted 2003 in Morphology: Critical Concepts in Linguistics, VI: Morphology: Its Place in the Wider Context, Francis X. Katamba (ed.), 364–396. London: Routledge.]
Corbett, Greville G., and Norman M. Fraser 1997 Vyčislitel'naja lingvistika i tipologija [Computational linguistics and typology]. Vestnik MGU: Serija 9: Filologija 2: 122–140.
Embick, David 1998 Voice systems and the Syntax/Morphology Interface. In MIT Working Papers in Linguistics 32: Papers from the Penn/MIT Roundtable on Argument Structure and Aspect, Heidi Harley (ed.), 41–72. Cambridge, MA: MIT.
Embick, David 2000 Features, syntax and categories in the Latin perfect. Linguistic Inquiry 31: 185–230.
Enger, Hans-Olav, and Kristian E. Kristoffersen 2000 Innføring i norsk grammatikk: Morfologi og syntaks [Introduction to Norwegian Grammar: Morphology and syntax]. Oslo: Landlaget for Norskundervisning / Cappelen Akademisk Forlag.
Evans, Nicholas, Dunstan Brown, and Greville G. Corbett 2001 Dalabon pronominal prefixes and the typology of syncretism: a Network Morphology analysis. In Yearbook of Morphology 2000, Geert Booij and Jaap van Marle (eds.), 187–231. Dordrecht: Kluwer.
Evans, Nicholas, Dunstan Brown, and Greville G. Corbett 2002 The semantics of gender in Mayali: Partially parallel systems and formal implementation. Language 78: 111–155.
Fenech, Edward 1996 Functions of the dual suffix in Maltese. Rivista di Linguistica 8: 89–99.
Graudina, L. K., V. A. Ickovič, and L. P. Katlinskaja 1976 Grammatičeskaja pravil'nost' russkoj reči: opyt častotno-stilističeskogo slovarja variantov [Norms in Russian: a frequency dictionary of stylistic variants]. Moscow: Nauka.
Ilola, Eeva, and Arto Mustajoki 1989 Report on Russian Morphology as it appears in Zaliznyak's Grammatical Dictionary. (Slavica Helsingiensia 7) Helsinki: Department of Slavonic Languages, University of Helsinki.


Kennedy, Benjamin H. 1955 The Revised Latin Primer. [Edited and further revised by James Mountford.] London: Longmans, Green and Co.
Kiparsky, Paul 2005 Blocking and periphrasis in inflectional paradigms. In Yearbook of Morphology 2004, Geert Booij, and Jaap van Marle (eds.), 113–135. Dordrecht: Kluwer.
Matthews, P. H. 1997 The Concise Oxford Dictionary of Linguistics. Oxford: Oxford University Press.
Mel'čuk, Igor 1993 Cours de morphologie générale (théorique et descriptive). I: Introduction et Première partie: Le mot [A Course in General Morphology (theoretical and descriptive). Vol. I: Introduction and first part: The word]. Montréal: Les Presses de l'Université de Montréal.
Mel'čuk, Igor 1994 Suppletion: toward a logical analysis of the concept. Studies in Language 18: 339–410.
Panov, M. V. (ed.) 1968 Morfologija i sintaksis sovremennogo russkogo literaturnogo jazyka (Russkij jazyk i sovetskoe obščestvo: Sociologo-lingvističeskoe issledovanie: III) [The morphology and syntax of the modern Russian standard language (Russian language and Soviet society: A sociolinguistic investigation: III)]. Moscow: Nauka.
Plank, Frans 1994 Homonymy vs. suppletion: A riddle (and how it happens to be solved in ). Agreement Gender Number Genitive &, 81–86. (EUROTYP Working Papers VII/23) Konstanz: University of Konstanz.
Priestly, T. M. S. 1993 Slovene. In The Slavonic Languages, Bernard Comrie, and Greville G. Corbett (eds.), 388–451. London: Routledge.
Sadler, Louisa, and Andrew Spencer 2001 Syntax as an exponent of morphological features. In Yearbook of Morphology 2000, Geert Booij, and Jaap van Marle (eds.), 71–96. Dordrecht: Kluwer.
Seifart, Frank 2005 The structure and use of shape-based noun classes in Miraña (North West Amazon). Ph.D. diss., Radboud University, Nijmegen.


Spencer, Andrew 2003 Periphrastic paradigms in Bulgarian. In Syntactic Structures and Morphological Information, Uwe Junghanns, and Luka Szucsich (eds.), 249–282. (Interface Explorations 7) Berlin: Mouton de Gruyter.
Stump, Gregory T. 2001 Inflectional Morphology: A theory of paradigm structure. Cambridge: Cambridge University Press.
Stump, Gregory T. 2002 Morphological and syntactic paradigms: arguments for a theory of paradigm linkage. In Yearbook of Morphology 2001, Geert Booij, and Jaap van Marle (eds.), 147–180. Dordrecht: Kluwer.
Suthar, Babubhai Kohyabhai 2006 Agreement in Gujarati. Ph.D. diss., University of Pennsylvania.
Wechsler, Stephen, and Larisa Zlatić 2000 A theory of agreement and its application to Serbo-Croatian. Language 76: 799–832.
Worth, Dean S. 1984 Russian gen2, loc2 revisited. In Signs of Friendship: To Honour A.G.F. van Holk, Slavist, Linguist, Semiotician, J.J. van Baak (ed.), 295–306. Amsterdam: Rodopi.
Zaliznjak, Andrej A. 1973[2002] O ponimanii termina 'padež' v lingvističeskix opisanijax [Understanding the term 'case' in linguistic descriptions]. In Russkoe imennoe slovoizmenenie, Andrej A. Zaliznjak (ed.), 613–647. Moscow: Jazyki slavjanskoj kul'tury. [originally in Problemy grammatičeskogo modelirovanija, Andrej A. Zaliznjak (ed.), 53–87. Moscow: Nauka.]
Zaliznjak, Andrej A. 1977 Grammatičeskij slovar' russkogo jazyka: slovoizmenenie [A Grammatical Dictionary of Russian: Inflection]. Moscow: Russkij jazyk.

An I-language view of morphological exceptionality: comments on Corbett's paper

Stephen R. Anderson

1. Introduction

Corbett starts from what he calls the notion of canonical inflection, corresponding closely to an ideal "one form, one meaning" pattern with respect to the roots and formal markers found in inflected words. He discusses some of the myriad ways in which actual inflectional paradigms are found to deviate from this simple schema, and provides a number of important and thought-provoking examples. Many linguists have felt that something like this canonical inflection must in some way characterize morphological structure in general. Some might see this as an ideal form underlying the apparent complexity of surface shapes, as in Lounsbury's (1953) notion of an idealized agglutinating analog, while others (such as the practitioners of Natural Morphology) regard it as a fundamental constraint on linguistic structure. Examples such as Corbett's make it clear that any claim to the effect that inflectional paradigms are basically regular must at a minimum be carefully hedged to allow for all sorts of deviation from regularity in practice.

I address here two consequences that follow from observations such as those in Corbett's paper. I first note, in section 2, that exceptionality in inflectional morphology finds its importance not directly in terms of comparisons between surface forms, but rather in the grammar that underlies them: in I-language rather than E-language, to put the matter in Chomsky's terms (cf. Anderson & Lightfoot 2002). In section 3 I then suggest that the range of exceptionality referred to in Corbett's discussion argues that morphological theory, per se, has no place for the notion of such an ideal structural type. To the extent much (indeed, most) actual inflectional structure matches it, the explanation is to be found outside of the theory of word structure itself, in areas such as the patterns of diachronic change that lead to observed synchronic systems.

2. The locus of exceptionality

Corbett discusses ways in which observed inflectional paradigms can deviate from the pattern of canonical inflection, including the traditional notions of suppletion, syncretism, overdifferentiation, and deponency. He discusses these notions, as is quite standard in traditional grammar, in terms of surface word forms: thus, syncretism is described as instances in which we have a single form which realizes more than one morphosyntactic specification. This approach, however, leads to a certain amount of indeterminacy. For instance, how do we know in a given case whether we are dealing with overdifferentiation in one subset of the lexicon of a language as opposed to syncretism in a complementary subset? Is it the case that some Maltese nouns are overdifferentiated for number, or rather that the others (the great majority) show syncretism of the dual and plural? How could we differentiate these accounts, and does it actually matter that we do so? Simply observing that nouns with a distinct dual constitute a small minority makes the difference a matter of mere numbers and seems to trivialize the issue, but it is hard to see how we can improve on that so long as our attention is confined to patterns in surface forms. If the difference between syncretism and overdifferentiation is genuinely significant, this must be because they correspond to distinct mechanisms in the grammar of a language. In every case, the observation of a surface pattern deviating from the canonical one only raises the question of what lies behind it, rather than serving as a (self-confirming) diagnosis of the nature of the exceptionality.

2.1. Identifying true suppletion

For example, suppletion cannot be identified in any significant sense as mere non-identity of the lexical bases of two (or more) morphosyntactically distinct forms within a paradigm.
In some instances, quite considerable differences in the shape of the base among paradigmatically related forms may follow directly from the phonological regularities of the language. For instance, yhden, the genitive form of the Finnish numeral 'one', differs substantially from the nominative yksi. These differences follow from the phonology of Finnish, however, given a basic stem such as /ykte/. In the nominative, /e/ is raised to [i] in final position, which results in spirantization of the /t/ to [s]. In the genitive, the addition of the regular ending /n/ prevents the stem-final vowel from raising. It also closes the second syllable, resulting in consonant gradation of the stem /t/ to /d/, and concomitant dissimilation of continuance which causes the /k/ of the stem to be realized as [h]. Surely we should not speak of suppletion here, despite the disparity of stem shape: true suppletion corresponds to the case in which an alternation in form is lexically idiosyncratic, and thus must be represented by distinct memorized forms, rather than mere difference in the surface form of the base.

We often proceed as if we could identify genuine cases of suppletion in terms of the distance between variants of the base, and the phonological naturalness of their connection. Sometimes, though, quite minor alternations in shape can have the lexical character that leads us to call them suppletive. An example of this is furnished by the Surmiran form of Rumantsch. Here, as in many other Romance languages, verbal endings differ as to whether or not they bear stress, and the vowels of the stem may change in limited ways depending on whether a form takes stress on the stem or on the ending. For instance, in the paradigm of cantar 'to sing' we find cantas [ˈkantəs] 'sing (2sg)' alternating with cantagn [kənˈtaɲ] 'sing (1pl)'. Given that Surmiran only has the vowels [i, ə, u] in unstressed syllables, as opposed to a full set of seven vowels (short and long) plus several diphthongs in stressed syllables, it seems that this alternation must be a purely phonological matter of vowel reduction in unstressed syllables. In some instances, such as the verb eir 'to go', we certainly find suppletion: the alternation of vast [vast] 'go (2sg)' with giagn [dʒaɲ] 'go (1pl)' is unconnected with any general phonological rule(s). Surely the alternation in forms of cantar is phonological, though.

Closer examination shows that this cannot be correct. As a consequence of wholesale restructuring of vocalic patterns in individual verbs, there is no longer a predictable correspondence between stressed and unstressed vowels. For any one of the three unstressed vowels, there are seven or eight possible corresponding stressed vowels (or diphthongs). This would hardly be unusual, if the correspondence were unique in the other direction, but that is not the case.
In fact, there is no stressed vowel whose unstressed correspondent is unique; and some of the stressed vowels (e.g., [a] and [o]) can correspond to any one of the unstressed vowels, depending on the lexical identity of the verb in question. Some of these patterns are commoner than others, but not at all to the extent that the simple and phonologically natural alternations can be derived by rule. Although there is no doubt that these alternations originated historically in a straightforward phonological rule reducing vowels in unstressed syllables, the pattern in the language today is arguably such that every verb must have its two alternants (stem-stressed vs. desinential-stressed) indicated in its lexical entry: suppletion by definition, if we think of morphology in terms of grammars (see Anderson 2008 for further discussion of these facts).


2.2. Sources of syncretism in grammar

If we see syncretism pre-systematically as the coincidence in a single surface form of multiple morphosyntactic possibilities, as Corbett's definition (quoted above) suggests, then we must realize that this can have several different sources in the grammar of a language. In some instances, the overlap of multiple morphosyntactically distinct specifications in a single surface form is surely a matter of simple homophony, not to be confused with syncretism. In Icelandic, for example, the genitive singular of strong nouns is marked either with -s or with -ar. The nominative (and in some cases, accusative) plural of strong masculine and feminine nouns is marked with -ar, -ir, -ur or -r (of which this last may disappear phonetically after stem-final r or assimilate to a preceding sonorant). There are some principles governing the distribution of each of these sets of alternants, but they are quite independent of one another. In particular, while some nouns do show -ar both in the genitive singular and the nominative plural (e.g., kerling 'old woman'), there is no reason to treat this as a systematic syncretism, because many others show -ar only in one or the other (e.g., hlutur 'thing', gsg. hlutar, npl. hlutir; hestur 'horse', gsg. hests, npl. hestar). The grammar, that is, establishes no systematic connection between nouns with a genitive in -ar and those with an identical nominative(/accusative) plural. The two categories simply have markers that may accidentally coincide, leading to word forms that are homophonous and not syncretic.

In other cases, surface identity of morphosyntactically distinct forms is a result of the phonology.
For example, a substantial subset of the irregular verbs of English (i.e., those that do not form their past and past participle by adding /d/) involve the addition of a very similar ending, /t/, at an earlier level of the lexical phonology (see Anderson 1973 for some discussion), as in the (now somewhat archaic) burn/burnt, learn/learnt. As opposed to the regular ending /d/, this /t/ has the effect of devoicing a preceding voiced fricative and shortening the vowel of the stem, as in leave/left, lose/lost, and other verbs. The vowel-shortening effect, regular before syllable-final clusters at this level, can occur by itself (creep/crept, mean/meant, deal/dealt). When the stem ends in /t/, the cluster is simplified, but not before triggering vowel shortening: bite/bit, meet/met. For stems ending in /d/, the same cluster reduction occurs (without regressive assimilation of [Voice]), again with vowel shortening: lead/led, slide/slid. This cluster of phonological effects is quite regular and characteristic of the appropriate level of the phonology. What is interesting is the consequence of adding this /t/ to stems ending in a dental stop and containing a basic short vowel. In such forms (e.g. set/set, fit/fit; rid/rid, wed/wed) the vowel shortening has no visible effect (since the vowel is already short), and the reduction of the final stop cluster results in the complete loss of any surface reflex of the ending. The consequence is a phonologically derived homophony of the present and past forms, but surely not a morphologically governed syncretism.

Somewhat similar consequences can follow from the operation of morphological regularities that are sensitive to phonological form. It is of course well known that the regular plural and possessive forms of English nouns are generally identical phonologically: both boys and boy's are pronounced [bojz]. This is probably a matter of simple homophony, though the fact that essentially all productive inflection in English involves the two phonological forms /z/ and /d/, each adjoined (not simply concatenated) at the right edge of the word, suggests that some more general principle may be at work. What is somewhat more interesting is the fact that the possessive form of a word ending in the regular plural is also homophonous with the simple plural and possessive: boys' is also phonetically [bojz], not the expected *[bojzz]. This cannot be an instance of syncretism, because exactly where the plural of a noun is formed in some way other than with /z/, the possessive plural is distinct: children's, women's. Furthermore, the same homophony is found (for at least some speakers, and perhaps more manuals of English usage) with a class of proper names ending in /z/: Jones' theory. Consider also that for some speakers, the presence of the 3rd person singular /z/ also blocks the overt expression of the possessive: The girl Harry adores' long hair [is actually a wig] (cf. Zwicky 1987). The regularity is thus not the morphologically conditioned one ('singular and plural possessive are identical') that the syncretism analysis predicts. There are various ways to describe these facts (see Anderson 2005: 89–94 for some discussion).
We might say, for instance, that the plural, possessive and 3sg present /z/s are adjoined to the word with which they appear, and the phonology then reduces a pair of identical adjoined elements to a single instance. Or perhaps the possessive rule adjoins /z/ unless its host already ends in an adjoined /z/. Or perhaps, within an Optimality-theoretic framework, the possessive rule simply says: the last word of a DP with the property [Poss] must end in adjoined /z/ (a condition satisfied without change if the host word already contains a /z/). On any of these analyses, we also have to say that for (style books and) speakers who prefer Jones' theory to Jones's theory, the proper names in question contain a semantically vacuous and purely formal adjoined /z/. The differences among these accounts, while undoubtedly significant, do not matter to the present point. On any one of them, the homophony of the regular plural and its possessive form follows not from a rule of syncretism, but rather from the morphophonology of the rule of phrasal inflection that marks the possessive form.
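The second of these descriptions (the possessive rule adjoins /z/ unless its host already ends in an adjoined /z/) can be made concrete in a small sketch. The Python below is purely illustrative and is not drawn from Anderson (2005): the `Word` class, the function names and the rough ASCII transcriptions are assumptions introduced here, and voicing assimilation and epenthesis are deliberately ignored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Word:
    """Hypothetical representation: a base plus affixes adjoined at its right edge."""
    base: str                       # rough ASCII transcription of the base
    adjoined: Tuple[str, ...] = ()  # adjoined affixes, in order

    def surface(self) -> str:
        # The surface form is the base followed by its adjoined material.
        return self.base + "".join(self.adjoined)


def adjoin_z(word: Word) -> Word:
    """Adjoin /z/ unconditionally (used here for the regular plural)."""
    return Word(word.base, word.adjoined + ("z",))


def possessive(word: Word) -> Word:
    """Adjoin /z/ unless the host already ends in an adjoined /z/."""
    if word.adjoined and word.adjoined[-1] == "z":
        return word  # the condition is already met, so nothing is added
    return adjoin_z(word)


boy = Word("boj")
boys = adjoin_z(boy)              # regular plural
children = Word("children")       # irregular plural: no adjoined /z/
jones = Word("dZon", ("z",))      # assumption: a purely formal adjoined /z/

for w in (possessive(boy), boys, possessive(boys),
          possessive(children), possessive(jones)):
    print(w.surface())
```

On this sketch boy's, boys and boys' all come out identical, children's remains distinct from children, and the possessive of Jones adds nothing for speakers who treat its final /z/ as adjoined. The homophony falls out of the condition on the possessive rule, with no rule of syncretism stated anywhere.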


It is the existence of these various circumstances that can give rise to surface identity among morphosyntactically distinct forms that heightens the significance of examples of genuine syncretisms, such as the paradigm of človek 'man, person' in Slovene discussed by Corbett. Here the fact that the genitive and locative dual forms are built in the same way as the corresponding plurals, while the remaining forms of the dual are built on the singular stem, shows that the morphological rule of syncretism cited by Corbett (Slovene nouns always have the genitive [and locative] dual syncretic with the genitive [and locative] plural) accurately captures the generalization at work. Accidental homophony or phonologically derived neutralization are not serious candidates in such a case, which shows that morphological structure must countenance such rules of referral (as some have called them).

I do not by any means intend to suggest that Corbett is unaware of the differences I have pointed to here. Indeed, one finds much in work of his and his colleagues referred to in the paper under discussion that is of great importance for making exactly these distinctions. I mean only to emphasize that in any discussion of non-canonical patterns in inflectional morphology, it is essential to keep one's focus on just where in the grammar a given (apparently) exceptional pattern has its source, and not only on the surface forms that realize it. Exceptionality in morphology does not wear its diagnosis on its sleeve.

3. The significance of exceptionality for morphological theory

Returning to the conceptual framework of Corbett's paper, we can ask what status should be accorded to the notion of canonical inflection in terms of which exceptionality is defined. In line with the remarks of the previous section, I suggest that if this notion has systematic status, it should be in terms of the architecture of a grammar, and not directly in terms of patterns among surface forms. In linguistic theory it has been common to posit an innate structural preference for paradigmatic relations of the sort Corbett designates as canonical, under a variety of names: Uniform Exponence, Paradigm Coherence or Uniformity, Output-Output Correspondence, Natural Morphology, etc. For such a principle to have the status of a constraint on grammars represented in Universal Grammar, it ought to have the properties of what Kiparsky (2008) calls true universals, and not simply those of typological generalizations.

Corbett, however, demonstrates here for us that there is actually no limit in principle to the range of exceptions that languages may display to such a pattern; and thus, that it cannot constitute the kind of characterization of the human language capacity that is the business of Universal Grammar. Along similar lines, Garrett (2008) demonstrates that an independent synchronic constraint on nonuniform paradigms does not serve as a mechanism in historical change (driving instances of paradigmatic leveling); rather, the effect is in the opposite direction, with observed instances of leveling following from independent mechanisms of change (paradigm extension).

These observations are in line with evidence accumulating in a variety of areas that many observed typological regularities are to be attributed not to the structure of Universal Grammar, but to logically quite separate external sources such as linguistic change or processing preferences. Such arguments have been made for phonology, for example, by Blevins (2004); in syntax, by Newmeyer (2006); and for morphology in Anderson (2004). Corbett's discussion reinforces this conclusion by demonstrating the extent to which any such regularities of word structure are subject to pervasive exceptions of fundamental sorts. Whatever the architecture of morphological theory, it is quite unlikely that a notion such as Corbett's canonical inflection has systematic status within it. The study of surface patterns manifested in E-languages, of course, provides vital evidence about the nature of language, but they are not themselves the explananda of grammatical theory. It is, rather, the structure of I-language objects (grammars) that we seek to account for.

Of course, we are left here (as in other areas of grammar) with the problem of how to identify genuine constraints on the human cognitive capacity for language, constraints that are true universals and thus appropriate for incorporation into the theory of synchronic grammars. There is little doubt that such constraints exist, but the primary source of evidence for proposals in this area has been the identification of widespread patterns (typological generalizations, in Kiparsky's formulation).
Once we recognize that these are actually due, at least in a great many cases, to factors other than the structure of UG, our search for an appropriate theory of grammar becomes much harder but no less fundamental.

References

Anderson, Stephen R. 1973 Remarks on the phonology of English inflection. Language & Literature I (4): 33–52.
Anderson, Stephen R. 2004 Morphological universals and diachrony. Yearbook of Morphology 2004: 1–17.
Anderson, Stephen R. 2005 Aspects of the Theory of Clitics. Oxford: Oxford University Press.
Anderson, Stephen R. 2008 Phonologically conditioned allomorphy in Surmiran (Rumantsch). Word Structure 1: 109–134.
Anderson, Stephen R. and David W. Lightfoot 2002 The Language Organ. Linguistics as cognitive physiology. Cambridge: Cambridge University Press.
Blevins, Juliette 2004 Evolutionary Phonology. The emergence of sound patterns. Cambridge: Cambridge University Press.
Garrett, Andrew 2008 Paradigmatic uniformity and markedness. In Language Universals and Language Change, Jeff Good (ed.), 125–143. Oxford: Oxford University Press.
Kiparsky, Paul 2008 Universals constrain change; change results in typological generalizations. In Language Universals and Language Change, Jeff Good (ed.), 23–53. Oxford: Oxford University Press.
Lounsbury, Floyd Glen 1953 Oneida Verb Morphology (Yale University Publications in Anthropology 48). New Haven: Yale University Press.
Newmeyer, Frederick J. 2006 Possible and Probable Languages. Oxford: Oxford University Press.
Zwicky, Arnold M. 1987 Suppressing the Zs. Journal of Linguistics 23: 133–148.

Exceptions and what they tell us: reflections on Anderson's comments1

Greville G. Corbett

1. Introduction

When you have a paper that you have researched, drafted, redrafted, polished, received helpful comments on, rewritten and now wish to send off, there are (at least) three questions you may ask. Where (to send it)? Why (have I got these results)? So what? When the paper is for an edited collection, the first question is already answered. The second question, one may hope at least, has an answer along the lines that the paper represents a small step beyond what was there before, and in turn poses new issues. It fits in the circle of problems and partial solutions, through which we spiral upwards. It is the last question, "So what?", which can be daunting. It is therefore a wonderful luxury to have someone else ask the question and attempt to answer it, with the interest and persistence shown in this case by Anderson. It is tempting to say "Thanks, Steve, you've done the job", and let everyone read your suggestions. But that would be missing the opportunity for debate which the editors have offered.

Anderson asks "So what?", and offers two answers to the question: first that exceptionality in inflectional morphology finds its importance not directly in terms of comparison between surface forms, but rather in the grammar that underlies them (section 2), and second that morphological theory, per se, has no place for the notion of such an ideal structural type, where the ideal is the fully canonical system, set up as a standard of comparison (section 3). At the risk of ruining the suspense, let me say that I agree with these two points, and will develop them, following Anderson's sections.

1. I thank Stephen Anderson for his comments, and Matthew Baerman for some notes on my response.

2. The locus of exceptionality

Anderson points out that: "the observation of a surface pattern deviating from the canonical one only raises the question of what lies behind it, rather than serving as a (self-confirming) diagnosis of the nature of the exceptionality." That is, we find numerous deviations from the canonical, but so what? Anderson concentrates on two phenomena, suppletion and syncretism. For suppletion he points out that (a) there are forms which are so far apart in phonological terms as to lead us to believe they are suppletive, and yet may follow from phonological regularities; (b) conversely, apparently close forms may not be subject to any phonological rule and may be surprisingly suppletive (see Corbett 2007 for fuller discussion of suppletion). Equally, concerning syncretism, he notes that when a single surface form realizes distinct morphosyntactic specifications this may be a matter of simple homophony or it may be an instance of systematic syncretism (see Baerman, Brown and Corbett 2005 for examples and discussion). His general conclusion is that: "Exceptionality in morphology does not wear its diagnosis on its sleeve." This is right, and how boring morphology would be if it were otherwise. As a consequence, we need to be able to talk in a scientific way about symptoms. Usage in this area is confused and confusing; introducing canonicity is intended to make our system of concepts sharper and more consistent, and so to contribute to our understanding.

However, the issue goes deeper. It is not just linguists who are confronted by symptoms. Hearers are too. When a hearer is confronted by a form which has more than one morphosyntactic description, there is theoretically a possibility of miscommunication. The strategies hearers use in this circumstance are a concern of linguistics, and it is an interesting research question whether and when hearers pay attention to the source of the coincidence of form (whether it is an individual homophony or a systematic syncretism). Thus even when the surface coincidence is a result of totally regular rules, it is still of interest, in that we have to ask what such instances tell us about comprehension. The speaker is a problem too, since speakers (at least most speakers) don't do theoretical linguistics. Let us go back to the example of forms that have the symptoms of suppletion (they are distant from each other in phonological terms) and yet can be related by regular phonological rules. Though we can write the rules, we may not be able to demonstrate that speakers use them. It may be that for (some) speakers the forms are stored, just as indisputably suppletive forms are. This is an area where eventually we may hope for help from neurolinguists.

3. The significance of exceptionality

Anderson argues that: "Whatever the architecture of morphological theory, it is quite unlikely that a notion such as Corbett's 'canonical inflection' has systematic status within it." Indeed, the examples of exceptions I provided, showing that there is no principled limit to the range of exceptionality, help confirm Anderson's view that the regularities we do observe are to be attributed to external sources rather than to Universal Grammar. Anderson's conclusion that canonicity does not have a systematic status within morphological theory is also reasonable, in my view. To change the analogy, in many fields of investigation we may observe that the units of measurement and the means to facilitate measurement have no systematic status within the theory. Theories do not incorporate millimetres or microscopes. Yet having generally agreed standardized units of measurement and tools which aid in measurement is of immense value in moving research forward. The canonical standard offers a point of reference, from which we can calibrate the real language examples we discover, and in particular those which are most relevant for morphological theory. Deviations from canonicity all demand an explanation, and instances of higher order exceptionality may well prove particularly significant. We investigate them in order both to map out the extent of the exceptionality we find, and to advance our thinking about what this tells us about human linguistic ability.

References
Baerman, Matthew, Dunstan Brown, and Greville G. Corbett 2005 The Syntax-Morphology Interface: A Study of Syncretism. Cambridge: Cambridge University Press.
Corbett, Greville G. 2007 Canonical typology, suppletion and possible words. Language 83: 8–42.

How do exceptions arise? On different paths to morphological irregularity
Damaris Nübling

Abstract. In order to understand the function of a certain phenomenon, it is instructive to analyze how this phenomenon emerged. Inflectional irregularities are often understood as residuals of phonological change which were not subject to analogical leveling. This article describes four different paths through which irregularity may arise and reveals several connections with other linguistic phenomena. The results are based on a study of the diachronic development of six highly frequently used verbs in ten Germanic languages. In addition, the precise position of the irregularity in the word form as well as in the paradigm is examined. Furthermore, the impact of the inflectional category on the degree of irregularity is discussed. Finally it is shown that irregularity often leads to overdifferentiated paradigms. Therefore, it is argued that irregularity is more adequately described as differentiation leading to a higher degree of morphological distinctiveness.

1. Introduction

Many, if not all, inflecting languages have a certain amount of irregularity in their grammatical forms. It has long been recognized that the most central part of the language, i.e. the most frequently used units, accounts for most of the exceptions to a grammatical system or, in the terms of Corbett (this volume), to canonical inflection. This article focuses on verbs in Germanic languages. Almost all articles about irregularity give the impression that irregular forms are very old, if not present from the first written records of a language. Many historical grammars end with the so-called 'anomalia', a group of exceptions in the sense of extremely irregular, sometimes even suppletive, verbs which do not share many common characteristics within their paradigms. In most of the Germanic languages, the so-called athematic verbs form this group, but sometimes other verbs such as contracted or otherwise deformed ones are included in this 'wastepaper basket'. Most important is the fact that they occur very frequently; thus their grammatically marginal status does not correspond at all to their importance on the performance level.


Yet the question as to how and why these verbs became irregular never arises. In the case of the aforementioned athematic verbs (be, do, go, stand), the answer can only be given by Indo-European linguistics. But in many other cases, irregularization developed only after the first written records of the respective language and can thus be documented step by step. For example, every Germanic language has the verbs have and say, two originally weak (i.e. regular) verbs which developed highly irregular patterns in every one of the Germanic languages, especially have, which is involved in the grammaticalization of the present perfect; for the irregularization process of these two verbs in particular, I refer, in order to avoid repetition, to Nübling (2001a). Looking for the chronology and the most important steps of this development involves diving into the footnotes of historical grammars, if they mention these processes at all. No historical grammar is really interested in deviations from the rules. This makes it very difficult to document irregularizations. However, studying the different instances of irregularization reveals a sort of regularity of irregularity, meaning that, on the one hand, it is often the same type of verb which becomes exceptional and, on the other hand, a number of special paths to irregularization which lead to the same goal can be observed.

In the following, the most important paths will be described briefly (Section 2). Then, the precise place of irregularization in the word as well as in the paradigm will be defined (Section 3). Section 4 looks at the categories which are affected by irregularity. It will be shown that tense, for instance, tends to show more exceptions than person. Finally, it is proposed to replace the negatively connoted term irregularity with the more appropriate term differentiation or distinctiveness (Section 5). None of the cases studied developed syncretisms; quite the contrary: paradigms which are affected by irregularities are more differentiated than completely regular paradigms. Section 6 summarizes the most important results.

All the following data are based on the systematic investigation of the diachronic development of the six irregular verbs have, become, give, take, come, and say in ten Germanic languages, which was conducted by Nübling (2000). These verbs underwent a visible process of irregularization. Roughly 1000 years ago they belonged to the regular strong or even weak verbs. Four traditionally irregular verbs (be, do, go, stand) were added there in an appendix because they sometimes influenced these six verbs in departing from their original inflection. In the following, only some instructive examples can be presented. Since significant cross-linguistic investigations of the emergence of exceptional morphology have yet to be conducted, the main focus of this article is empirical rather than theoretical.

Nevertheless, it should be stressed that the development of irregularity violates what Naturalness Theory (Mayerthaler 1981) regards as universally valid naturalness principles. This theory postulates clear one-to-one relations in morphology, comparable to what Corbett describes as canonical inflection: uniform stems (lexemes) in combination with uniform and transparent affixes. Corbett, however, calls it an idealization (this volume) from which real inflectional systems usually diverge. For Naturalness Theory, this scheme constitutes the goal of language change. The response of this theory to irregularity is the elimination of deviant forms through analogical leveling. Even system-dependent morphological naturalness, which introduced so-called system-defining structural properties, cannot solve the problem of real inflectional exceptions that are not integrated into any inflectional class (Wurzel 1984/2001, 1994a). Later, as a reaction to an intensive debate with representatives of Economy Theory (Werner 1987a, 1987b, 1989; Ronneberger-Sibold 1980), a further Natural Principle, Distinctiveness in the Me-First-domain, was introduced (Wurzel 1990, 1994b), allowing exceptions in the basic vocabulary. However, this domain was never thoroughly defined, and there was no interest in asking where these irregularities come from. Therefore more appropriate theories will be considered, such as the Dual Processing Model, the Associative Network Model and Economy Theory. Incidentally, most of these linguists¹ are not interested in the emergence of irregular forms. Even Bybee is mainly concerned with the mental representation of irregular morphology and almost always speaks of maintenance of irregularity and suppletion (Bybee 1988: 132). Therefore, this article concentrates on the diachronic emergence of irregularity.

2. Different ways of becoming an exception

There are at least four different paths through which exceptions arise. The most traditional one consists in the preservation of the effects of sound changes, which tend to lead automatically to morphological diversity and heterogeneity (2.1). A second path is what we call accelerated sound change, i.e. deviant phonological changes which occur only in conjunction with high token frequency (2.2). Here, the effect of irregularity is reached faster than in the first case. Still more striking are morphological changes which lead to disorder. Usually, it is assumed that morphological change, mostly in the form of analogy, leads to paradigmatic order and homogeneity, but here the reverse case can also be observed (2.3). The most drastic method is the mixing of different lexemes to form one paradigm (strong suppletion). Here, the effect of maximal irregularity is achieved the fastest. Only extremely token-frequent items are affected by this extraordinary strategy (2.4). It is obvious that the strategies from 2.1 to 2.4 require progressively less time to create short and differentiated forms, which is the more positively connoted term for irregularity.

1. The rise of irregularity through sound change is documented in Werner (1977). Ronneberger-Sibold (1987, 1990) deals with the diachronic emergence of suppletion.

2.1. Accumulated sound shift (without analogical leveling)

This path to irregularization is the most frequently described one and can be subsumed under the passive type, meaning that the effects of sound change are accepted or accumulated by morphology. Every word undergoes phonological change, which, in most cases, is reductive. One might therefore expect all the forms of a paradigm to change in the same way, leading to shorter, but not more differentiated, forms. However, since many sound changes consist of assimilations, in the sense that they are conditioned by the phonological surroundings, the outputs are not only shorter but also more heterogeneous. Of the many possible examples, we will discuss only a few. One of the most prominent of these is NHG sein 'be' with the two finite forms ist (3rd sg.pres.) 'is' and sind (3rd pl.pres.) 'are', which synchronically show the maximum possible irregularity, i.e. total suppletion, although they diachronically developed in a completely regular fashion (the IE forms are still regular):
Table 1. Suppletion through regular sound change.

Nr.   Prs.   IE           GMC          OHG       NHG
Sg.   3      *és-ti    >  *ist(i)   >  ist    >  is(t)
Pl.   3      *s-énti   >  *sinti    >  sint   >  sind

In IE the accent position could change depending on the ablaut stage of the respective verb (here: full grade *és- in the singular and zero grade *s- in the plural). The subsequent developments consist of completely regular sound changes. The decisive point is that analogy never took place to level the highly divergent forms. 'Suppressed analogy' would therefore be the more appropriate term for this path. This also holds for Engl. was vs. were, an alternation which goes back to Verner's Law, which was effective more than two thousand years ago, and which was never levelled. Another example of accumulated sound change is hebben 'have' in Dutch (Table 2). The irregular finite form heeft (3rd sg.pres.) 'has', containing the fricative -f-, is the regular result of the preservation of Middle Dutch hēvet < *habid. Middle Dutch hēvet [heːvət] was later syncopated to heeft, whereby the [v] before [t] became voiceless.

Table 2. The paradigm of hebben in Dutch.

tense        number   person   form       pronunciation
inf.                           hebb-en    [hɛbə(n)]
pres.        Sg.      1        heb        [hɛp]
                      2        hebt       [hɛpt]
                      3        heeft      [heːft]
             Pl.      1–3      hebben     [hɛbə(n)]
past         Sg.      1–3      had        [hat]
             Pl.      1–3      hadden     [hadə(n)]
past part.                     gehad      [χəˈhat]

The other forms were affected by the West Germanic consonant gemination and therefore show [b] today. Today, heeft is preserved, whereas all other verbs (except for zijn 'be') show syncretism of the 2nd and 3rd sg. (so that *hebt would be the expected 3rd sg.): the analogical change which affected the other verbs did not take place here. This is what we call the passive way to irregularity: the effects of sound changes are preserved while analogy is blocked. The same holds for the past, which shows vowel change from e to a. Hebben is the one and only remainder of the so-called Rückumlaut verbs. This special group of weak verbs underwent i-umlaut in the present and no umlaut in the past. German still has brennen – brannte 'burn – burned' and five further examples. In Dutch, all but one of these Rückumlaut verbs were leveled through analogy. The only exception is hebben, which synchronically resembles the strong verbs: not only because of its tense-related vowel change, but also because of its monosyllabic past sg. form had instead of the obligatory bisyllabic weak forms (such as werkte 'worked'). Due to the difference between had and the corresponding present form heeft, apocope could take place as a further regular sound change: Middle Dutch hadde > Dutch had [hat]. The only irregular reduction is the elimination of the root-final consonant in the past and in the past participle, which was already absent in Middle Dutch; this is an example of accelerated sound change, the subject of Section 2.2. The selective effect of analogy, depending on frequency rather than on markedness principles, is described by Bybee:
It is important to note that analogical leveling does not take place in all forms with the relevant alternation at once. Rather, it tends to affect the less frequent lexical items first: thus weep/weeped is susceptible to change, but keep/kept is not likely to change to keep/keeped immediately. This is because the model for the past tense of keep is highly available, while the model for the past of weep does not occur as often. […] While some have argued that it is the system-internal markedness structure that determines the allomorph that survives in leveling, there is evidence that the relative frequency of the forms in which the allomorphs occur is the determining factor. (Bybee 1994: 2560)

With regard to Dutch hebben, it can be shown how a rather regular and well integrated verb became an exception because it resisted analogical change.

2.2. Accelerated sound change (without analogical leveling)

In the above-mentioned sample of ten verbs in ten Germanic languages, accelerated sound change plays a very important role on the road to irregularity. One rather frequently occurring example is the loss of the stem-final consonant -b(b)- in the past of Dutch {hebb}{en}, which is {ha_}{d} in the singular and {ha_}{dden} in the plural; here, it lacks the -b-, indicated by _. The fact that a part of the lexeme is affected by reduction leads to non-canonicity in the sense of Corbett (this volume): "2. in terms of lexical material in the cell, we require absolute identity (the stem should remain the same)." Most of these sound changes occur only in selected forms of the paradigm. This leads to paradigmatic disorder. In many Germanic languages, have provides evidence for this, cf. Engl. has instead of the regular form *haves (cf. behaves), and had; NHG hast, hat (2nd/3rd sg.pres.), hatte (past). In present-day spoken German, a further widespread irregular development is at work, namely the contraction of haben > ham. This contraction does not occur with verbs of the same pattern such as graben 'dig', which never becomes *gram (except in some dialects). The same holds for Swedish ha (inf.), har (pres.) and hade [ˈhadːe]. In the supine haft, however, the old -f- is preserved. In Danish and Faroese, this loss is hidden by orthography but nonetheless exists: Dan. have [hæ] 'have' and havde [hæ(ː)ðə] 'had' (vs. haft [hafd] in the supine), Far. hevði [hɛijə] (past sg.) and høvdu [hœdːʊ] (past pl.), where <v> is no longer pronounced. In Faroese even a vocalic number distinction emerged, which usually only occurs with strong verbs. Other languages such as Luxembourgish and Norwegian jettisoned the root-final consonant in every position, so that this loss does not help in differentiating the paradigm, only in shortening the forms. In these languages, however, other developments brought about irregularity. For similar developments with say, cf. Nübling (2001a).

A very instructive example is come in such different languages or dialects as Swiss German, Luxembourgish, North Frisian, and Icelandic, where the stem-final consonant [m] was assimilated to [n] before the dental of the inflectional endings -s and -t, respectively: Lux. *këmm-s > kënn-s (2nd sg.) and *këmm-t > kënn-t (3rd sg.). This, however, only happened in the present singular, not in the 2nd plural (komm-t) nor in the past (koum-s: 2nd sg. past). Obviously, these assimilations are driven by high token frequency. There are no frequency lists for Swiss German, Luxembourgish, and North Frisian, but in spoken German, kommen constitutes the third most frequently used verb (Ruoff ²1991), and West Frisian komme is the fifth most frequent one (Van der Veen 1984). Table 3 demonstrates this cross-linguistic parallel:

Table 3. Selective assimilations (bold print) in the present of come in Swiss German, Luxembourgish, and North Frisian (Wiedinghard).

number   person   Swiss German   Luxembourgish   North Frisian
                  cho            kommen          kme
sg.      1        chum-e         komm-en         km
         2        chun-sch       kënn-s          kn-st
         3        chun-t         kënn-t          kn-t
pl.      1        chöm-e         komm-en         km-e
         2        chöm-et        komm-t          km-e
         3        chöm-e         komm-en         km-e
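The pattern in Table 3 can be restated as a toy rule, purely for illustration. The sketch below is not the author's formalism: the function name `realize`, the cell label `"pres.sg"`, and the decision to model only the consonant (not the vowel alternations seen in Luxembourgish or in the Swiss German plural) are assumptions made for the example. It shows that stem-final m becomes n before an ending beginning with s or t, but only when the cell is the present singular.

```python
# Illustrative sketch only; names and cell labels are hypothetical.
# Endings that begin with s or t (this covers -s, -sch, -st and -t).
DENTAL_INITIAL = ("s", "t")


def realize(stem: str, ending: str, cell: str) -> str:
    """Join stem and ending, assimilating stem-final m to n where licensed.

    Assumption: the assimilation is restricted to the present singular,
    as described for Swiss German and Luxembourgish in the text.
    """
    assimilates = (
        cell == "pres.sg"
        and stem.endswith("m")
        and ending.startswith(DENTAL_INITIAL)
    )
    if assimilates:
        stem = stem[:-1] + "n"  # m > n before the dental-initial ending
    return f"{stem}-{ending}"


if __name__ == "__main__":
    # Swiss German present singular, stem chum-
    for ending in ("e", "sch", "t"):
        print(realize("chum", ending, "pres.sg"))
    # Luxembourgish 2nd plural: same ending -t, but no assimilation
    print(realize("komm", "t", "pres.pl"))
```

The point of the category check is that a purely phonological rule would wrongly change the 2nd plural as well; the restriction to one part of the paradigm is what makes the change look frequency-driven rather than regular.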

In Icelandic, it is the 2nd sg. imperative which contains this exceptionality, although it is hidden by the spelling: komdu [kʰɔndʏ]. In Swiss German, the very frequent verb for 'take' shows the same exception (nin-sch, nin-t), but no other verb with stem-final [m] does. In all these cases, the partial assimilation worked regressively. Some Low German dialects, however, show the reverse direction, in this case again restricted to kuemmen 'come' and nimen 'take', where it is the inflectional ending -t (3rd sg.) which is bilabialized to -p after -m: kümmp < kümmt 'comes', and nimmp < nimmt 'takes' (Lindow et al. 1998: 122). As these examples prove, it is not only the lexical but also the categorial token frequency which promotes assimilations and brevity; the present is always more strongly affected than the past tense. The same holds for the singular vs. the plural and the indicative vs. the subjunctive. This refutes the causes often mentioned in historical grammars, such as unstressed position and the like (Paul 1989: 23, 109, 287). It is already hard to believe that verbs like come and take should be stressed less than other verbs, but it is still harder to believe that only some parts of a paradigm should be affected by this.


In the case of come, a further reduction can be observed: this verb goes back to the Germanic stem *kwem- (cf. OHG queman 'come').² In most Germanic languages the stem was first assimilated to *kwom- (progressive rounding of e > o after w) and then reduced to kom-. Other verbs with the same onset, such as OHG quedan 'speak' (which later died out) and quelan 'well, pour', did not undergo this process (cf. NHG quellen and not *kollen). In most of the languages, this onset simplification spread to the whole paradigm, except in Dutch komen, where the past still preserves the old cluster: kwam – kwamen (past sg. – pl.). It does not appear to be a coincidence that the semantically marked and, above all, less token-frequent past preserved the longer (more complex) form.

A third example is presented by what are termed short or contracted verbs, which lost their stem-final consonant and later contracted to a monosyllabic word, such as MHG hān < OHG habēn 'have', and lān < OHG lāzzan 'let'. Normally, neither [b] nor (geminated) [s:] is subject to loss. Here, two explanations are usually offered by historical grammars: an extraordinary loss of consonant or an analogical process directed by the traditionally short athematic verbs, such as gān 'go' and stān 'stand' in OHG (Paul 1989: 283–288; Michels 1979: 284; Mettke 2000: 145). Some further similarities to the athematic verbs, such as (temporary) strong past forms with ablaut in hie 'had' (after gie 'went'), point in the second direction, and these cases should therefore be treated in 2.3, since they are morphologically conditioned. The same holds for the Swedish short verbs, such as ha < hava 'have', ge < giva 'give', bli < bliva 'become', ta < taga 'take', dra < draga 'draw', etc. Here, the new short forms clearly occur more frequently than the still coexisting long forms (Allén 1972; Östman 1992; Teleman et al. 1999, vol. 2). Other forms of the paradigms, especially semantically marked forms (past, supine), preserved the old forms with root-final consonant: haft, gav – givit, blev – blivit, tog – tagit, drog – dragit. As already illustrated, the stem-final consonant plays an important role in the irregularization process.

2. There still is one remnant of OHG queman, the NHG adjectival derivation bequem 'comfortable', which preserved the old complex onset.

2.3. Morphological change

As already mentioned, there are contracted verbs such as MHG hān 'have' and lān 'let' which borrowed irregularities from other verbal paradigms. This sort of irregularization can be called the active type because the following processes are located on the morphological level. However, this term should not be misunderstood in the sense that verbs are acting subjects. Frisian provides a striking example of a morphologically conditioned irregularization: in the past tense, all four athematic verbs cluster together by forming a schema of a monosyllabic past tense ending in -ie (without any stem-final consonant) in the singular and with an epenthetic -n- (whose origin remains rather unclear) in the plural. The origin of the diphthong ie has yet to be located definitively, but it is certain that it arose due to various analogical processes. This is confirmed by the weak verb ha 'have', which also shows a short, strong past tense: hie 'had' (Table 4).
Table 4. Analogous processes in the past tense of Frisian verbs.

nr.   infinitive       pret. sg.   pret. pl.
1.    gean 'go'        gie         giene(n)
2.    stean 'stand'    stie        stiene(n)
3.    dwaan 'do'       die         diene(n)
4.    wêze 'be'        wie         wiene(n)
5.    ha 'have'        hie         hiene(n)
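As a small illustration of the schema in Table 4, the following sketch derives the plural column from the singular column. It is a hypothetical restatement for the reader, not an analysis from the article: the names `PRET_SG` and `pret_pl` are invented here, and the optional final -n of the plural is simply left out.

```python
# Illustrative sketch only; the dictionary and function names are hypothetical.
# Past singular forms as listed in Table 4 (infinitive -> pret. sg.).
PRET_SG = {
    "gean": "gie",
    "stean": "stie",
    "dwaan": "die",
    "wêze": "wie",
    "ha": "hie",
}


def pret_pl(pret_sg: str) -> str:
    """Build the plural by adding -ne to a singular in -ie.

    The optional final -n shown in the table as (n) is omitted here.
    """
    if not pret_sg.endswith("ie"):
        raise ValueError(f"{pret_sg!r} does not fit the -ie schema")
    return pret_sg + "ne"


if __name__ == "__main__":
    for infinitive, singular in PRET_SG.items():
        print(infinitive, singular, pret_pl(singular))
```

All five verbs, including the originally weak ha, pass through the same two-step pattern, which is what the text means by a shared schema.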

In the Frisian case, irregularity is concentrated within a small group of similar verbs whose only common trait is high token frequency. Even originally weak verbs entered this small class and achieved brevity and distinctiveness through the most direct path. Many such irregularizations via analogy to more irregular patterns can be found, especially in Frisian, e.g. the past of jaan 'give', which is joech [ju(ː)χ] and which is the result of a partial analogy to sloech [slu(ː)χ], the past of slaan 'hit'. The same is true for droech [dru(ː)χ], the past of drage 'carry'. There are no more verbs which form their past like this. The same holds for the infinitive jaan in relation to the infinitive slaan 'hit' and/or dwaan 'do': the Old Frisian form for 'give' was jeva (inf.), which developed to jewa, later contracted to jā, and eventually adopted the n from the short verbs slān 'hit' or dwān 'do'. The Old Frisian stem forms of the strong verb 'give' were jeva (inf./pres.) – jef (pret.sg.) – jeven (pret.pl.) – (e)jeven (past part.), i.e. the ablaut distinction had been completely eliminated as the vocalic contrast was leveled by regular sound change. Today this verb, with the stem forms jaan – joech – jûn, is once again highly differentiated thanks to this astonishing, interparadigmatically motivated irregularization strategy. This is termed differentiation analogy (cf. the dashed bold arrows in Figure 1). Most decisive is the fact that not only structure was transferred (as, e.g., in bring – brang after sing – sang) but real substance (jef > joech from droech, sloech; underlined in Figure 1).


Figure 1. Irregularization strategies of Frisian jaan 'give'.

Another example is the present of Frisian dwaan 'do', doch- in the singular and dogge in the plural. This is the result of analogy to the respective present forms of Frisian sjen 'see' and tsjen 'draw', sjoch-/sjogge and tsjoch-/tsjogge. It is characteristic that only the beginning of the word forms, the most salient part, remains unchanged (cf. Section 3). These examples of interparadigmatic analogy could be regarded as evidence for the usage-based Associative Network Model in the sense proposed by Bybee (1985, 1988, 1991, 1995, 1996). Due to their high token frequency, the lexical strength of Old Frisian sjen 'see' and tsjen 'draw' presumably was so high that they constituted a type of schema for dwaan 'do' (there is no frequency list for Old Frisian; for Modern West Frisian cf. the frekwinsjelist in Van der Veen 1984). In addition, these verbs shared a similar phonological pattern (monosyllabics ending in -n), which generally strengthens network connections. Still more remarkable is the case of West Fris. jaan 'give', which was associated with slaan 'hit' and drage 'carry'. In this exceptional case it must be taken into consideration that jā(n) lacked intraparadigmatic differentiation. The Network Model does not make a strict distinction between regular and irregular morphology, in contrast to other models such as the Dual-Processing Model, which postulates that regularity and irregularity are processed in different modules of grammar. Transitions from regular to irregular and even suppletive verbs can be more easily integrated in the first model, although the latter clearly predicts that high-frequency items behave differently from low-frequency ones. Yet none of these models was concerned with the emergence of irregularities.


Another way to irregularity by morphological processes is represented by the mixed paradigm of German haben 'have' (the short forms are underlined):

present: habe vs. hast, hat vs. haben, habt, haben
past: hatte etc., subjunctive: hätte etc.
past participle: gehabt

Only language history tells us the story of this mixed paradigm, which emerged through the combination of two originally complete paradigms, one with the old long forms (MHG haben) and the other with the contracted forms (MHG hān). Thus, OHG habēn was split into two MHG paradigms, the first one serving as a lexical verb, the second one as an auxiliary. In ENHG these paradigms were reunified into one with the distribution sketched above. Another morphologically based process in this paradigm is umlaut in the subjunctive form hätte, which developed in analogy to the strong verbs, e.g. gäbe 'would give'. Weak verbs never developed umlauted subjunctives.

In a similar way, Swedish ge 'give' mixed short and long variants: there were two variants, geva and giva, and their respective short forms, ge and gi, of which the first (short) one is used to form the infinitive and present and the other (long) one the past and the supine, i.e. ge and giva were combined. The result is a highly differentiated paradigm: Swedish ge [jeː]/ger [jeːr] – gav [gɑːv] – givit [jiːvɪt] (instead of *gevit). Thus, the paradigm received three different vowels (e – a – i) instead of the original two (e – a – e), and there is in addition an alternation between short and long forms. Here, brevity correlates with the more frequently used categories (present).

2.4. Lexical fusion

As shown for the cases of morphological change, some verbs become interparadigmatically active based on the example of other verbs. This is also the case with lexical mixing, i.e. the combination of two different verbs into one paradigm (inflectional split). Here, English offers a famous example with go – went – gone, which combines go and ME wend, the latter providing the past went and replacing the former OE form eode. The already irregular form eode was replaced by a shorter form. Janzing (1999) describes two cases of mixing of East Frisian paradigms, loope 'go' and sjoo 'see':

loope (inf.)/loopt (3rd sg.pres.) versus ron (3rd sg.pret.) – ronen (past part.)


The ron-forms derive from a verb meaning 'run' (cf. NHG rennen). In the case of sjoo, the border separates the past participle from the rest: sjoo (inf.)/sjucht (3rd sg.pres.) – saach (3rd sg.pret.) versus bloouked (past part.). For the second verb there is a corresponding verb in German, blicken 'look'. The most frequent verb, be, is even based on three different IE verbal roots: IE *es- 'be', *bhū- 'grow', and *wes- 'stay'. *bhū- is only preserved in the West Germanic languages, e.g. in the onset of Fris./NHG bin, bist, Dutch ben, bent, Lux. bas, and in Engl. be, been. Thus, forms such as bin, bist even consist of the syntagmatic combination of two verbs, *bhū- and *es-. This is another (and only rarely described) path to irregularization: different verbs fuse not only on the paradigmatic but even on the syntagmatic level. The most extreme case of interparadigmatic fusion is the mixing of loaned and inherited verbs, such as Swed. bli < bliva, a loan from Middle Low German blīven which, in spoken Swedish (and Norwegian), is combined with its native predecessor varda to bli/blir – vart – blivit (for the emergence of suppletion and some reasons for it, cf. Werner 1977; Ronneberger-Sibold 1987, 1990; Nübling 1999; Mel'čuk 2000).

3. Position of irregularity in the word and in the paradigm

The position of irregularity can be understood both syntagmatically and paradigmatically, i.e. in the word form as well as in the paradigm. The comparison of six irregular verbs in ten Germanic languages led to the clear result that the place of the deviation is in most cases the stem-final consonant, which is either modified (as in cases of Verner's Law: Engl. was vs. were) or deleted (marked by an underscore): Engl. I have vs. she ha_s, ha_d; NHG habe vs. ha_st/ha_t, ha_tte, gehabt. Due to ablaut, the stem vowel is often modified, too, but this only counts as irregularity if the alternation is unique and isolated, as in NHG kommen – kam – gekommen and werden – wurde – geworden; there is no further [ɔ]–[aː]–[ɔ] or [eː]–[uː]–[oː] alternation in German. Another example is worden – werd – geworden in Dutch. Some languages use quantitative instead of qualitative vocalic alternations, e.g. Engl. say [seɪ] vs. says [sez], said [sed]: here, we can observe an alternation between [eɪ] and [e]. In NHG h[aː]be vs. h[a]st, h[a]t, there is an opposition between long and short vowels. Sometimes both positions are used. Thus the onset remains the only stable part of the whole paradigm. This leads to what is called weak or partial suppletion. In the extreme case of strong suppletion, even the onset


is modified. It is characteristic that the effects of phonological change at the beginning of the word are often extended to the whole paradigm, as could be seen in the case of Engl. come, NHG kommen < GMC *kwem-, whereas the results of changes in stem-final position are often maintained if the word form occurs frequently. Even in the above-mentioned cases of differentiation analogy (Figure 1), it is always the first segment which remains stable. This fact corresponds to word recognition tests which indicate that the word-initial position is most salient (Cutler et al. 1985; Hawkins and Cutler 1988; Fenk-Oczlon 1989).

There were also clear results with regard to the paradigm: the irregular forms nearly always reflect high token frequency, i.e. significantly more exceptions can be expected in the singular than in the plural (there are many uniform plurals, but uniform singulars are not as frequent). The same holds for the present vs. the past, for the indicative vs. the subjunctive, and even for the 3rd and 1st person vs. the 2nd person. This is supported by the fact that the less frequent categories often show the longer forms, as already seen in Dutch kom-t – kwam ('come': 3rd sg.pres. vs. 3rd sg.past) or NHG steht [ʃteːt] – stand [ʃtant] ('stand': 3rd sg.pres. vs. 3rd sg.past) and geh-t [geːt] – ging [gɪŋ] ('go': 3rd sg.pres. vs. 3rd sg.past). In NHG haben, the short forms of the present are found only in the singular, not in the plural. The same holds for Dutch, where komen [koːmən] (the only verb with quantitative distinctions in the present) shows short [ɔ] in the singular but long [oː] in the plural: ik kom, jij/zij/hij komt [ɔ] vs. wij/jullie/zij komen [oː]. Thus, we observe a type of diagrammatic iconism which corresponds primarily not to markedness but rather to token frequency, although the two often cannot be neatly separated. However, there is evidence to support a higher ranking of the frequency factor (Fenk-Oczlon 1991).
Table 5 presents the token frequency of grammatical categories in spoken German. As can be seen, the 3rd sg.pres.ind. is the most frequently used verb form. Indeed, it is precisely this form which carries most of the irregularities, which are usually combined with brevity of expression, cf. Engl. say vs. says [sez], do [duː] vs. does [dʌz], am, are vs. is. German has even preserved the so-called Wechselflexion, a systematic vowel change still found in the present singular of roughly 55 strong verbs, which separates the first from the third (and the second) person: werfe – wirfst/wirft 'throw' (Nübling 2001b). This Wechselflexion pattern, although unproductive for quite some time, is reflected in irregularizations: habe vs. ha_st, ha_t, Swiss German hā vs. hesch/het, gib vs. gi_sch/gi_t 'give', sag vs. saisch/sait 'say', Lux. soen vs. sees/seet 'say', huelen vs. hëls/hëlt 'take, fetch' (an originally weak verb which became completely strong). Dutch did not develop Wechselflexion, but the present singular


Table 5. Token frequencies of the grammatical categories of the verb in spoken German (based on Tomczyk-Popińska 1987).

person/number   %      person   %      number   %     tense        %      mood      %
3rd sg.         47.8   3rd      66.3   sg.      76    pres.        76.9   ind.      90
1st sg.         23.4   1st      28.7   pl.      24    past         10.3   subj.II   7.3
3rd pl.³        18.5   2nd      5.0                   pres.perf.   9.35   imp.      2.7
1st pl.         5.3                                   pluperf.     1.25   subj.I    0.06
2nd sg.         4.8                                   future       1.13
2nd pl.         0.2

of the second most frequently used verb hebben 'have' contains the special form heeft (cf. Section 2.1 and Table 1). In many cases these processes lead to over-differentiated paradigms, i.e. syncretisms can be broken (cf. Section 5). This even holds for English and Swiss German, which usually do not have Wechselflexion distinctions.

With regard to lexical token frequency, there is also a strong, well-known relationship with irregularity. Lexical frequency can change because certain verbs are replaced by new expressions (this happened, e.g., with OHG quedan 'speak') or because they simply become rarer as the corresponding activity is no longer carried out as often as in earlier times (many manual activities such as bake, milk, clip, etc.). Such verbs with decreasing frequency often develop from strong or even irregular verbs into uniform weak verbs. Grammaticalizing verbs, on the other hand, increase quickly in frequency. These verbs take the opposite path by becoming strong or even irregular. This is clearly confirmed by 'have', 'say' and 'make' in many languages.

4. The categorial split caused by irregularization

Not every exception is accepted by the speakers. A strong correlation between grammatical category and irregular expression can be observed. In most cases, tense (and/or aspect) is the category which is expressed by exclusive means. All the above-mentioned instances of suppletion involve this category (Engl. go – went, Swed. bli – vart). In the case of the most frequently used verb, Engl. am – are – is, NHG bin – ist – sind, it is, however, person and number in addition to tense (Engl. is – was, NHG ist – war). In those cases,
3. The polite form Sie is included.


Figure 2. Relevance degree of some inflectional categories in Germanic languages (based on Bybee 1985). [Scale, from left to right: TENSE, MOOD, NUMBER, PERSON; left pole: + relevant, fused; right pole: – relevant, concatenative.]

Figure 3. Inflectional categories and their (suppletive, irregular, regular) expression depending on token frequency and relevance. [Vertical axis: token frequency (very high, high, medium, low); horizontal axis: relevance.]

it was found that a suppletive expression of less relevant categories (in Bybee's sense) always implies suppletion in more relevant categories (cf. Figures 2 and 3). The same also holds for lower degrees of irregularity: irregular expression of a less relevant category usually implies irregular expression of more relevant categories; usually the degree of irregularity (and thus fusion) increases from right to left on the scale in Figures 2 and 3. In the case of mood, languages with synthetic mood expression (such as German) should be looked at. Here, the most frequent verb sein 'be' provides evidence for the implicative scale above (here for the 3rd person sing.): ist – sei – wäre (ind. – subj.I – subj.II). Suppletion is nearly omnipresent in this paradigm. The Faroese past of hava, hevði [hɛijɪ] (past sg.) and høvdu [hœdːʊ]


(past pl.) serves as an example of the strengthening of the number distinction. Other languages also tend to separate the singular from the plural. Looking at both the relevance and the frequency aspect, it can be concluded that the degree of irregularity depends first and foremost on the relevance degree of the respective category (Figure 3). Within each category, there are at least two (if not more) subsets, e.g. singular vs. plural within number and different time stages within tense. The borders between these different options are sharpened if the verb increases in its usage frequency. Here, the more frequently used subset of the category is expressed by more irregular and shorter means (3rd person vs. the others, singular vs. plural, present vs. other tenses). Figure 3 shows the dependency of irregularity and suppletion on token frequency and relevance.
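The implicational claim of this section can be made concrete in a small sketch. The following Python snippet is purely illustrative and not part of the study: the function name respects_relevance_scale, the encoding of a verb as a set of category labels, and the hypothetical counterexample are assumptions introduced here. The first two cases restate examples from the text (Engl. go with suppletion in tense only; NHG sein with suppletion in all four categories).

```python
# Relevance scale of Figure 2, from most to least relevant (after Bybee 1985).
RELEVANCE_SCALE = ("tense", "mood", "number", "person")


def respects_relevance_scale(irregular, marked=RELEVANCE_SCALE):
    """Check the implicational claim for one verb.

    irregular: set of categories the verb expresses irregularly/suppletively.
    marked:    categories the language expresses synthetically at all.
    Returns True if no less relevant category is irregular while a more
    relevant, synthetically marked category is regular.
    """
    regular_seen = False
    for category in RELEVANCE_SCALE:
        if category not in marked:
            continue  # category not expressed on the verb, so it cannot count
        if category in irregular:
            if regular_seen:
                return False  # gap on the scale: the implication is violated
        else:
            regular_seen = True
    return True


# Illustrative, simplified encodings (assumptions made for this sketch).
english_go = {"tense"}                               # go vs. went
german_sein = {"tense", "mood", "number", "person"}  # ist, war, sei, sind, bin
counterexample = {"person"}                          # hypothetical verb

print(respects_relevance_scale(english_go))      # True
print(respects_relevance_scale(german_sein))     # True
print(respects_relevance_scale(counterexample))  # False
```

Since the text states that the implication holds "always" for suppletion but only "usually" for lower degrees of irregularity, such a check would at best flag candidate counterexamples for closer inspection rather than decide the matter.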

5. Irregularization as (over)differentiation

When the forms of a paradigm become shorter, syncretism should be expected. However, in the whole sample of the six verbs investigated here, there was not a single instance of the emergence of homonymy. On the contrary, these frequently used verbs are, although often strongly reduced, sometimes more differentiated than their regular counterparts or predecessors. This can easily be illustrated by some of the examples already mentioned, such as Engl. be, which is more distinctive than all other verbs not only in the present (sg.: am, are, is; pl.: are) but also in the past (sg.: was, were, was; pl.: were). In Dutch, the 2nd and 3rd sg.pres. always share the same ending -t while the 1st sg. has no ending (ik werk-Ø – jij werk-t/hij werk-t). Only in the case of hebben 'have' and zijn 'be' is this syncretism broken: ik heb – jij hebt – hij heeft (stem alternation) and ik ben – jij bent – hij is (different stems, no ending). In the latter case this is due to lexical suppletion, in the former to the preservation of an earlier assimilation. In Frisian the infinitive is always identical with the uniform present plural (meitsje – meitsje 'make'), except for wêze 'be' vs. binne (pl.), jaan 'give' vs. jouwe (pl.), dwaan 'do' vs. dogge (pl.), and a few more. In German, the short form hat (3rd sg.) breaks the regular syncretism with the 2nd pl. habt. The same is true for Lux. huelen – hëls, hëlt 'take' (1st–3rd sg.pres.), an originally weak verb (cf. NHG holen, Dutch halen) which adopted Wechselflexion; here, hël-t (instead of regular *huelt) breaks the homophony with the 2nd pl. huel-t. The past houl- shows the typical (analogical) uniform ablaut vowel -ou- (which can be traced back to an overgeneralization of the second ablaut class, cf. Werner 1990); the past participle geholl contains yet another vowel. Here, an additional form of over-differentiation can be described: in Luxembourgish, a strong tendency to jettison the past and to replace it by the present perfect


(preterite loss) can be observed. Today only a few verbs (between 10 and 20) show the old synthetic forms, above all auxiliaries, including the modal verbs, and some highly frequent strong verbs such as goen 'go' and huelen. Thus, huelen has a quantitatively more differentiated paradigm than the average verb. This is the type of overdifferentiation described by Corbett: "Lexemes which are overdifferentiated stand out from the rest of the group in that they have an additional form" (this volume).

English offers a further means of over-differentiation through the morphologization of a category which is typically expressed syntactically: negation on (or even in) auxiliaries. In cases such as don't < do not, shan't < shall not, won't < will not, ain't < am not, the expression of negation even affects the lexical part of the verb. In other cases, the negation particle only cliticizes to the verb (hasn't, isn't, wasn't). Here we can observe a categorial overdifferentiation: highly frequent verbs express more grammatical information than average verbs.

Another aspect of over-differentiation concerns the extent to which irregularity affects the word. As already shown, the degree is often much higher than the usual differences between the categories, with suppletion as the extreme. Therefore, the term irregularity, which only reflects the perspective of the language describer, should be replaced by a term which denotes the real function of this phenomenon. Irregularity means distinctiveness and allows for two conflicting but highly functional advantages from the speaker's perspective: brevity without loss of information. This is the main topic of Economy Theory as proposed by Ronneberger-Sibold (1980, 1987, 1990), Werner (1987a, 1987b, 1989, 1991), Harnisch (1988), and Fenk-Oczlon (1990, 1991). Irregular forms deserve our special interest and should not be seen as accidents of language history which have been forgotten by analogy. Irregularity does not happen by chance.
It creates specific intraparadigmatic distinctions which follow strict principles such as the relevance of the respective category (e.g. tense before number) and its token frequency (present before past, singular before plural). Above all, lexical token frequency determines whether irregularity, or rather distinctiveness, occurs at all.

Finally, another interesting case of over-differentiation, which cannot be explained by the above-mentioned advantages, should be presented: in our sample, several instances of interparadigmatic irregularization could be found, i.e. some verbs leave their original class and continue as disintegrated and isolated loners without the advantages of increased brevity and distinctiveness. For example, NHG werden – wurde – geworden (as well as Dutch worden – werd – geworden and Frisian wurde – waard – wurden) left its ablaut class (3b) without any apparent reason; all the other members, such as werfen – warf –


geworfen ('throw'), continue to use the vowel -a- of the old past singular stem (the second ablaut stage), whereas werden chose the third ablaut stage with -u-. This confirms the results of Bybee (1995), who shows that members of a class with high token frequency are not associated with this class because they are stored separately in the speaker's mental lexicon. The so-called Network Model does not separate morphology into regular and irregular modules, but rather regards inflected forms as lexical units. Depending on their type and token frequency, they have different lexical connections to other forms (high token frequency leads to lexical strength, high type frequency to schemas). The Dual-Processing Model proposed by Pinker (1991), Prasada and Pinker (1993), and Pinker and Prince (1988) considers regular morphology as rule-governed and irregular morphology as memory-driven and thus lexically represented. Between the two domains there is a broad scale: the left pole is connected with high productivity and high type frequency, the right pole with restricted productivity and high token frequency.

German werden was polygrammaticalized in ENHG (as an auxiliary for the passive and the future, as a copula, etc.) and thus became very frequent. Although originally formally well integrated in the 3rd ablaut class, it left this class when it became autonomous. When the reduction from four to three ablaut stages took place in ENHG, this became obvious: initially werden did not take part in this far-reaching development, i.e. it resisted analogical leveling.⁴ Eventually, it chose a different model, the third instead of the second ablaut stage.

The fact that even completely regular but highly frequent verb forms can be stored unanalyzed is important. This must have happened to the originally weak verb 'have' in all Germanic languages, resulting in extremely different paradigms.
Every language used its own irregularization strategies, but not a single one waited for accumulated sound change.

6. Summary

The most important results of the comparative study of the irregularization strategies of six verbs in ten Germanic languages can be summarized as follows:

Exceptions are to be expected under high token frequency; these exceptions are highly functional.

Exceptions emerge not only through accumulated sound changes which were forgotten by analogical leveling. They are also sometimes created by extraordinary (accelerated) sound change, by morphological change, and even by lexical fusion (suppletion).

The position of irregularity is highly predictable and depends on two factors: semantic relevance and token frequency. It could be demonstrated that exceptionality follows the relevance degree of the respective category. Within the same category the more frequently used subset is expressed exceptionally.

The term irregularity should be replaced by brevity and distinctiveness. Suppletion, as the most extreme form of irregularity, allows for brevity of expression without the risk of information loss (syncretism). Thus, irregularity must be understood as a protection against syncretism. In contrast to Corbett (this volume), there was no interaction of suppletion and syncretism, but there were many examples of the interaction of suppletion and overdifferentiation.

In many cases, irregularity even creates more distinctions than are usually made in a paradigm, i.e. many instances of over-differentiation were observed, even including the morphological expression of categories which are usually realized syntactically (negation on English auxiliaries).

The formal distinctions are often deeper and sharper than in common paradigms. The expression of the category at the word level tends to modify the lexical stem, starting with the stem-final consonant and the vowel and ending with the word onset (suppletion). On the whole, allomorphy and the dissolution of morphological structures (amorphous morphology) emerge.

4. Even today speakers know the old four ablaut stages of werden (werden – ward – wurden – geworden), at least passively. This is not the case with any other verb.

Abbreviations

Dan.       Danish
ENHG       Early New High German
Far.       Faroese
Fris.      (West) Frisian
GMC        Germanic
IE         Indo-European
Lux.       Luxembourgish
MHG        Middle High German
ModFris.   Modern (West) Frisian
NHG        New High German
OFris.     Old Frisian
OHG        Old High German
Swed.      Swedish


References
Bybee, Joan 1985 Morphology. A Study of the Relation between Meaning and Form. Amsterdam: Benjamins.
Bybee, Joan 1988 Morphology as lexical organization. In Theoretical Morphology. Approaches in Modern Linguistics, Michael Hammond and Mickey Noonan (eds.), 119–141. San Diego: Academic Press.
Bybee, Joan 1994 Morphological universals and change. In The Encyclopedia of Language and Linguistics, Vol. 5, R.E. Asher (ed.), 2557–2562. Oxford: Pergamon.
Bybee, Joan 1995 Regular morphology and the lexicon. Language and Cognitive Processes 10 (5): 425–455.
Bybee, Joan 1996 Productivity, regularity and fusion: How language use affects the lexicon. In Trubetzkoy's Orphan, R. Singh (ed.), 247–269. Berlin: Mouton de Gruyter.

Bybee, Joan, and Dan I. Slobin 1982 Rules and schemas in the development and use of English past tense. Language 58: 265–289.
Bybee, Joan, and Jean E. Newman 1995 Are stem changes as natural as affixes? Linguistics 33: 633–654.
Clahsen, Harald, and Monika Rothweiler 1992 Inflectional rules in children's grammars: Evidence from the development of participles in German. Yearbook of Morphology 1992: 1–34.
Corbett, Greville G. this vol. Higher order exceptionality in inflectional morphology.
Cutler, Anne, John Hawkins, and Gary Gilligan 1985 The suffixing preference: A processing explanation. Linguistics 23: 723–758.
Fenk-Oczlon, Gertraud 1989 Geläufigkeit als Determinante von phonologischen Backgrounding-Prozessen. Papiere zur Linguistik 40 (1): 91–103.
Fenk-Oczlon, Gertraud 1990 Ökonomieprinzipien in Kognition und Kommunikation. In Spielarten der Natürlichkeit – Spielarten der Ökonomie. Beiträge zum 5. Essener Kolloquium über Grammatikalisierung: Natürlichkeit und Systemökonomie, Norbert Boretzky, Werner Enninger and Thomas Stolz (eds.), 37–51. Bochum: Brockmeyer.
Fenk-Oczlon, Gertraud 1991 Frequenz und Kognition – Frequenz und Markiertheit. Folia Linguistica 25: 361–394.
Fertig, David 1998 Suppletion, natural morphology, and diagrammaticity. Linguistics 36: 1065–1091.

Harnisch, Rüdiger 1988 Natürliche Morphologie und morphologische Ökonomie. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 41: 426–437.
Hawkins, John, and Anne Cutler 1988 Psycholinguistic factors in morphological asymmetry. In Explaining Language Universals, John Hawkins (ed.), 280–317. Oxford: Blackwell.
Janzing, Gereon 1999 Das Friesische unter den germanischen Sprachen. Freiburg: Gaggstatter.
Lindow, Wolfgang, Dieter Möhn, Hermann Niebaum, Dieter Stellmacher, Hans Taubken and Jan Wirrer 1998 Niederdeutsche Grammatik. Leer: Schuster.
Maiden, Martin 1992 Irregularity as a determinant of morphological change. Journal of Linguistics 28: 285–312.

Mayerthaler, Willi 1981 Morphologische Natürlichkeit. Wiesbaden: Athenaion.
Mel'čuk, Igor 2000 Suppletion. In Morphology. Vol. 1, Geert E. Booij, Christian Lehmann and Joachim Mugdan (eds.), 510–521. Berlin/New York: de Gruyter.
Mettke, Heinz 2000 Mittelhochdeutsche Grammatik. Tübingen: Niemeyer.
Michels, Victor 1979 Mittelhochdeutsche Grammatik. 5th ed. Heidelberg: Winter.

Nübling, Damaris 1999 Zur Funktionalität von Suppletion. In Variation und Stabilität in der Wortstruktur, Matthias Butt and Nanna Fuhrhop (eds.), 77–101. Hildesheim/Zürich/New York: Olms.


Nübling, Damaris 2000 Prinzipien der Irregularisierung. Eine kontrastive Untersuchung von zehn Verben in zehn germanischen Sprachen. Tübingen: Niemeyer.
Nübling, Damaris 2001a The development of "junk". Irregularization strategies of have and say in the Germanic languages. In Yearbook of Morphology 1999, Geert Booij and Jaap van Marle (eds.), 53–74. Dordrecht: Springer.
Nübling, Damaris 2001b Wechselflexion Luxemburgisch – Deutsch kontrastiv: ech soen – du sees/si seet vs. ich sage, du sagst, sie sagt. Zum sekundären Ausbau eines präsentischen Wurzelvokalwechsels im Luxemburgischen. Sprachwissenschaft 26 (4): 433–472.
Östman, Carin 1992 Den korta svenskan. Om reducerade ordformers inbrytning i skriftspråket under nysvensk tid. Diss. Uppsala.
Paul, Hermann 1989 Mittelhochdeutsche Grammatik. 23rd ed. Tübingen: Niemeyer.
Pinker, Steven 1991 Rules of language. Science 253: 530–535.

Pinker, Steven, and Alan Prince 1988 Regular and irregular morphology and the psychological status of rules of grammar. In Proceedings of the 17th Annual Meeting of the Berkeley Linguistics Society, L.A. Sutton, C. Johnson, and R. Shields (eds.), 321–251. Berkeley: BLS.
Prasada, Sandeep, and Steven Pinker 1993 Generalisation of regular and irregular morphological patterns. Language and Cognitive Processes 8: 1–56.
Ronneberger-Sibold, Elke 1980 Sprachverwendung – Sprachsystem. Ökonomie und Wandel. Tübingen: Niemeyer.
Ronneberger-Sibold, Elke 1987 Verschiedene Wege zur Entstehung von suppletiven Flexionsparadigmen. Deutsch gern – lieber – am liebsten. In Beiträge zum 3. Essener Kolloquium über Sprachwandel und seine bestimmenden Faktoren, Norbert Boretzky, Werner Enninger and Thomas Stolz (eds.), 243–264. Bochum: Brockmeyer.
Ronneberger-Sibold, Elke 1990 The genesis of suppletion through morphological change. In Proceedings of the 14th International Congress of Linguists. Berlin/GDR, August 10–15, 1987. Vol. 1, Werner Bahner, Joachim Schildt and Dieter Viehweger (eds.), 628–631. Berlin: Akademie-Verlag.

Ruoff, Arno 1990 Häufigkeitswörterbuch gesprochener Sprache. 2nd ed. Tübingen: Niemeyer.

Teleman, Ulf, Staffan Hellberg, and Erik Andersson 1999 Svenska Akademiens grammatik. Vol. 2: Ord. Stockholm: Svenska Akademien.
Tomczyk-Popińska, Ewa 1987 Linguistische Merkmale der deutschen gesprochenen Standardsprache. Deutsche Sprache 15: 336–357.
van der Veen, K.F. 1984 Frekwinsjeûndersyk foar it Frysk. In Miscellanea Frisica, in nije bondel Fryske stúdzjes, N.R. Århammer et al. (eds.), 205–218. Assen: Van Gorcum.
Werner, Otmar 1977 Suppletivwesen durch Lautwandel. In Akten der 2. Salzburger Frühlingstagung für Linguistik, Gaberell Drachmann (ed.), 269–283. Tübingen: Narr.
Werner, Otmar 1987a The aim of morphological change is a good mixture – not a uniform language type. In Papers from the 17th International Conference on Historical Linguistics, Anna Giacalone Ramat, Onofrio Carruba and Giuliano Bernini (eds.), 591–616. Amsterdam: Benjamins.
Werner, Otmar 1987b Natürlichkeit und Nutzen morphologischer Irregularität. In Beiträge zum 3. Essener Kolloquium über Sprachwandel und seine bestimmenden Faktoren, Norbert Boretzky, Werner Enninger and Thomas Stolz (eds.), 289–316. Bochum: Brockmeyer.
Werner, Otmar 1989 Sprachökonomie und Natürlichkeit im Bereich der Morphologie. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 42: 34–47.
Werner, Otmar 1990 Die starken Präterita im Luxemburgischen: Ideale Analogie oder vergeblicher Rettungsversuch? German Life and Letters 43: 182–190.
Werner, Otmar 1991 Sprachliches Weltbild und/oder Sprachökonomie. In Akten des VIII. Internationalen Germanisten-Kongresses, Tokyo 1990. Vol. 4, Eijiro Iwasaki and Yoshinori Shichiji (eds.), 305–315. München: iudicium.


Weyerts, Helga, and Harald Clahsen 1994 Netzwerke und symbolische Regeln im Spracherwerb: Experimentelle Ergebnisse zur Entwicklung der Flexionsmorphologie. Linguistische Berichte 154: 430–460.
Wurzel, Wolfgang Ullrich 1984/2001 Flexionsmorphologie und Natürlichkeit. Berlin: Akademie-Verlag.
Wurzel, Wolfgang Ullrich 1990 Gedanken zu Suppletion und Natürlichkeit. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 43: 86–91.
Wurzel, Wolfgang Ullrich 1994a Skizze der natürlichen Morphologie. Papiere zur Linguistik 50: 23–50.
Wurzel, Wolfgang Ullrich 1994b Grammatisch initiierter Wandel. (Sprachdynamik. Auf dem Wege zu einer Typologie sprachlichen Wandels. Vol. 1), Benedikt Jeßing (ed.). Bochum: Brockmeyer.

On the role of subregularities in the rise of exceptions

Wolfgang U. Dressler

It is Damaris Nübling's great merit to have focused, in a series of articles, in her book (Nübling 2000) and now in this paper, on certain diachronic developments and their synchronic results which have been marginalised or even neglected in much of the morphological literature. It is an even greater merit of hers to have elaborated well-thought-out and well-documented solutions.

Several of these solutions find parallels in other languages. For example, her case of the irregularizing expansion of German Wechselflexion, which sets the second and third person singular present apart from the rest of the present paradigm, as in German helfen 'to help', 1.sg. helf-e, 2.sg. hilf-st, 3.sg. hilf-t, finds many parallels in Romance languages (cf. Maiden 2004, first thematized by Matthews 1981). Thus in the paradigm of Italian sap-ere 'to know' only the singular and the third plural present indicative have short forms: so, sai, sa, sanno, as a result of pattern extension. Her instances (§5) of formal over-differentiation in irregular short verbs likewise have parallels in many languages, most spectacularly in the present indicative of the French verb 'to be': je suis, tu es = il/elle est, nous sommes, vous êtes, ils sont, where two forms are differentiated in the singular, which happens only in two other short verbs, in contrast to the syncretism of the whole singular paradigm in other verbs, plus the third plural as the default. Moreover, all non-homophonous forms (with the exception of the second plural) are in relations of strong or weak suppletion.

Lexical fusion of different roots in the verb 'to be', i.e. of the Proto-Indo-European roots *es-, *bhu-, *wes- (§2.4) in the West Germanic languages, finds a parallel in the fusion of the first two roots in Latin as well as in the Italic, Celtic, Slavic and Baltic languages. Thus I can largely agree with many of her solutions.
In comparison with these agreements, my disagreements with her paper (in line with Gaeta 2006: 18–20) are less numerous but still worth putting forward: First of all, in regard to most of her examples and even types of examples, her term irregularity is a misnomer. It should rather be called subregularity,


because she mostly describes patterns (and their diachronic origins) which hold for several items, at least for a small group of them. Examples which Nübling (1995, 2000) has studied in a pioneering way are short verbs in several Germanic languages, such as OHG hān 'have' and lān 'let', a notion that we have applied to Italian verbs: Inf. ess-ere, 3.sg.pres. è, avere – ha 'have', fare – fa 'make', and-are – va 'go', pot-ere – può 'can', sap-ere – sa 'know', vol-ere – 2.sg. vuoi 'want' (Pöchtrager et al. 1998). Nübling's (§1 and §5) proposal to replace the notions of irregularity and irregularization with those of differentiation and distinctiveness adds relevant properties or functions but does not remedy the failure to distinguish between irregularity, as the impossibility of accounting for a pattern by rules alone, and subregularity, as accountability by minor rules.

Second, when she deals with the traditionally best-studied problem of irregularity, her passive type of irregularity (Nübling §2.1), i.e. the maintenance of irregularity due to high token frequency of the irregular forms, I cannot see why this should involve a path to irregularization/exceptionality. This is an instance of the preservation but not of the origin of irregularity. What is more interesting is her subtype of a gradual loss of members of a small subregular group, until only one member remains, which is then truly irregular. This is exemplified (§2.1) with Dutch hebben 'to have', but what is missing is a demonstration that such a longest-persisting member of a small subregular group of items is always, or at least usually, the member with the highest token frequency of all the items of this group. Next, it is unclear whether what counts is the high token frequency of the lexeme, i.e. of its whole inflectional paradigm, the high token frequency of the subregular/irregular parts of the paradigm, or the ratio between the irregular/subregular and the more regular/general parts of the paradigm.
Third, there is the increasingly discussed problem of whether the explanatory factor is high token frequency or the degree of markedness. Nübling (§2.1, §2.2, §3) seems to favour frequency, but does not go as far as Haspelmath (2006) in his attack on markedness. What the advocates of frequency tend to overlook is, first, that ease of access in processing is more influenced by the markedness effect of early age of acquisition than by token frequency (cf. Bonin et al. 2004). Second, they nearly always have to rely on frequency measures that are not at all representative of the token frequencies of the items in the real input of native listeners of the respective language. And third, the impact of token frequencies on our mental lexicon derives not only from what we hear or read, but also from what we produce and think about. And this depends on the pragmatic importance of concepts, which is the basis of markedness (cf. Mayerthaler 1981).


Fourth, her reliance on Bybee's (1985) explanatory principle of relevance for explaining the distribution of irregularity between regularity and suppletion cannot be transferred to all similar instances in other languages. For example, as in the case of overdifferentiation cited for the French verb 'to be', the most striking cases of Italian suppletion and irregularity occur in the categories of person and number and not in aspect and tense, as predicted by Bybee and Nübling. This holds particularly for the opposition between root- and stem-based inflection, regular in, e.g., Italian 2.sg.pres. am-i, pl. am-a-te 'you love', but suppletive in 2.sg.pres. esc-i 'you go out', pl. usc-i-te (cf. Dressler and Thornton 1991, Maiden 2004).

Fifth, Nübling (§1) criticizes adherents of Natural Morphology for not having tried to account for irregularity. This is, however, true neither for the second point above, the preservation of irregularity, nor for many subtypes of subregularity (the first point above), which covers the greatest part of Nübling's areas of irregularity. Particularly in the subtheory of static morphology (Dressler 2003, Kilani-Schoch and Dressler 2002, 2005, Aguirre and Dressler 2006), i.e. of lexically stored, prototypically unproductive morphology, we have focused on parameters of phonological similarity (cf. Nübling §2.3), for example on rime words such as the two Latin verb pairs f/caveo, f/cavi, f/cautum and m/voveo, m/vovi, m/votum. Likewise, Nübling's (§3) remarks on the word-initial consonantal onset as the preferred position for the lexical identification of a lexeme have been anticipated by the same authors (cf. also Dressler 1987: 116f.). And Nübling's (§5) notion of isolated loners among verbs is paralleled by the notion of isolated paradigms proposed by Dressler (since 1997). Categorical subgroups of verbs are also a feature of static morphology.
Thus the secondary umlaut in the German second conjunctive hätte (preterite indicative hatte 'had') should be due to analogy with the fellow auxiliary wäre 'were' rather than with strong verbs such as gäbe 'gave', as Nübling (§2.3) proposes. An important difference in our view regards storage: whereas Nübling (§5) appears to equate "stored" with "unanalyzed", the model of static morphology allows partial analysis of fully stored forms, e.g. by establishing relations based on rimes or other similarities. When Nübling (§2.2) argues for accelerated sound shift as one way of irregularization, this finds a parallel in the notion of lexicalisation out of fast/sloppy speech of very frequent words, especially of function words such as auxiliaries (cf. Nübling's table 4), as discussed in Dressler (1973) and Dressler and Moosmüller (1992). These remarks may show that Nübling's object of study has not been neglected within Natural Morphology and Phonology, although it has clearly not been focused upon there in as systematic a way as in her own investigations.


Wolfgang U. Dressler

References

Aguirre, Carmen, and Wolfgang U. Dressler 2006 On Spanish verb inflection. Folia Linguistica 40: 75–96.

Bonin, Patrick, Christopher Barry, Alain Méot and Marylène Chalard 2004 The influence of age of acquisition in word reading and other tasks: A never ending story? Journal of Memory and Language 50: 456–476.

Bybee, Joan 1985 Morphology. Amsterdam: Benjamins.

Dressler, Wolfgang U. 1973 Pour une stylistique phonologique du latin. Bulletin de la Société de Linguistique de Paris 68: 129–145.

Dressler, Wolfgang U. 1987 Word formation (WF) as part of natural morphology. In Leitmotifs in Natural Morphology, Wolfgang U. Dressler, Willi Mayerthaler, Oswald Panagl and Wolfgang Ullrich Wurzel (eds.), 99–126. Amsterdam: Benjamins.

Dressler, Wolfgang U. 2003 Latin static morphology and paradigm families. In Language in Time and Space, Brigitte L.M. Bauer and Georges-Jean Pinault (eds.), 87–99. Berlin/New York: Mouton de Gruyter.

Dressler, Wolfgang U., and Sylvia Moosmüller 1992 Sociolinguistic parameters of spoken Austrian German. In Studies in Spoken Languages: English, German, Finno-Ugric, Miklós Kontra and Tamás Váradi (eds.), 61–81. Budapest: TA Nyelvtudományi Intézet.

Dressler, Wolfgang U., and Anna M. Thornton 1991 Doppie basi e binarismo nella morfologia italiana. Rivista di Linguistica 3: 3–22.

Dressler, Wolfgang U., and Anna M. Thornton 1997 On productivity and potentiality in inflectional morphology. CLASNET Working Paper 7. Montréal.

Gaeta, Livio 2006 How to live naturally and not be bothered by economy. Folia Linguistica 40: 7–28.

Haspelmath, Martin 2006 Against markedness (and what to replace it with). Journal of Linguistics 42: 25–70.

Kilani-Schoch, Marianne, and Wolfgang U. Dressler 2002 Affinités phonologiques dans l'organisation de la morphologie statique: l'exemple de la flexion verbale française. Folia Linguistica 36: 297–312.

Kilani-Schoch, Marianne, and Wolfgang U. Dressler 2005 Morphologie naturelle et flexion du verbe français. Tübingen: Narr.

Maiden, Martin 2004 When lexemes become allomorphs. On the genesis of suppletion. Folia Linguistica 38: 227–256.

Matthews, Peter 1981 Present stem alternations in Italian. In Logos Semantikos. Vol. IV, Horst Geckeler, Brigitte Schlieben-Lange, Jürgen Trabant and Harald Weydt (eds.), 57–65. Berlin and Madrid: de Gruyter and Gredos.

Mayerthaler, Willi 1981 Morphologische Natürlichkeit. Wiesbaden.

Nübling, Damaris 1995 Die Kurzverben im Schweizerdeutschen. In Alemannische Dialektforschung, Heinrich Löffler (ed.), 165–179. Tübingen: Niemeyer.

Nübling, Damaris 2000 Prinzipien der Irregularisierung. Eine kontrastive Untersuchung von zehn Verben in zehn germanischen Sprachen. Tübingen: Niemeyer.

Pöchtrager, Markus A., Csanád Bodó, Wolfgang U. Dressler and Teresa Schweiger 1998 On some inflectional properties of the agglutinating type illustrated from Finnish, Hungarian and Turkish inflection. Wiener linguistische Gazette 62–63: 57–92.

Statement on the commentary by Wolfgang U. Dressler

Damaris Nübling

I thank Wolfgang U. Dressler for his useful commentary on my paper and fully agree with him that many more languages still have to be investigated with regard to irregularities and irregularization strategies. The articles of Martin Maiden on the Romance languages are highly instructive in showing the different paradigmatic patterns and their emergence. Dressler's comment that the Romance languages show special classes of highly irregular short verbs also confirms my impression that the irregularization paths described are not restricted to the family of the Germanic languages. Nevertheless, one should be careful about drawing further, possibly universal, conclusions. Therefore, a comparison of these ten verbs in as many (genetically and typologically) different languages as possible is a desideratum. In the following, I respond to his disagreements with my paper:

1. "The term irregularity as a misnomer": There are many concepts of irregularity, ranging from unproductive paradigms which are still integrated in big classes to strong suppletion, i.e. completely isolated paradigms with idiosyncratic behaviour. Dressler favours the last notion: only isolated paradigms with unique inflectional behaviour are labelled irregular. I think that the notion of irregularity should be kept flexible. If it were synonymous with suppletion, we would not need the term irregularity. In my opinion, irregularity includes different stages within so-called static (non-productive) morphology, and the term irregularization in particular should relate to different stages of this process. This can be compared to the notion of grammaticalization, which also refers to a large number of rather disparate phenomena (which, at first sight, sometimes do not even seem to be related) and which does not only describe the final stages of grammaticalization. Irregularization comprises phenomena such as: first of all, the absence of morphological rules; furthermore, morphological unproductivity; decreasing type frequency; decreasing intraparadigmatic as well as interparadigmatic similarities; increasing allomorphy, including stem variants; increasing affection of the whole word form by the fusion of grammatical and lexical material; and reduced morphological segmentability. In sum: the item behaves more and more like a lexical unit and is processed as such. The development of (in Dressler's terms) subregularity is the first, obligatory step to irregularity. Thus, subregularity only constitutes an early stage on the long irregularization scale.

2. "The maintenance of irregularity due to high token frequency is not a path to exceptionality": Analogy is the most powerful means of producing regularity in the sense of intra- and/or interparadigmatic similarity. Sometimes analogy becomes so frequent that it nearly looks like a rule, cf. the levelling of the number ablaut grades in Early New High German. Another example is superstable markers spreading to nearly all paradigms except for a few items (basically the most frequently occurring ones) which resist this levelling process. In Middle High German, there were still four so-called athematic verbs with the suffix -n (< OHG -m) in the 1.sg. of the present (gâ-n, stâ-n, tuo-n, bi-n 'I go, stand, do, am'). Afterwards, three of these four verbs changed to the common strong verb suffix, with the result that NHG bin 'I am' is the only remnant of the old regular form. It is no coincidence that bin, which belongs to the most frequent verb sein 'to be', was "forgotten" by analogy. There are many irregular and even suppletive verb forms which developed completely regularly but never underwent analogy. This passive path to irregularity should be taken seriously because it often leads to complete isolation of the respective items. Dutch hebben is another example; hebben is the second most frequently used verb (after zijn 'to be'). High token frequency is the best protection against analogy. The question of what counts as high token frequency, whether the whole paradigm or only some parts of it, can be clearly answered. It is both: the so-called lexical and the categorial or grammatical token frequency.
Thus, the 3.sg.present form of a frequently used verb is most prone to be irregular and, at the same time, to exhibit short expression. This connection is explained in Chapter 2.2 and demonstrated by extraordinary assimilations in the 3.sg.present of Come and Take in Swiss German, Luxembourgish, North Frisian, Icelandic, and Low German (cf. Table 3 and the part below).

3. Concerning the explanatory factor of high token frequency, I must admit that I fully agree with Haspelmath's and, by the way, Fenk-Oczlon's (1991) concept of "frequency first" instead of different markedness degrees and conceptions (compare, e.g., the discussion of whether the 1st or the 3rd or the 2nd person is the least marked one). Fenk-Oczlon provides many examples of the "frequency first" principle that make cases of so-called local markedness (Tiersma 1982) and markedness reversals (Mayerthaler 1981, Wurzel 1984/2001) unnecessary. I am convinced that token frequency counts are indeed highly reliable and much more exact and measurable than pragmatic considerations and markedness determinations, which are rarely defined precisely (and hence vary depending on the linguist's aim). Moreover, token frequency even allows for predictions. In order to explain, e.g., accelerated sound change such as the above-mentioned extraordinary assimilations in the most frequently used paradigmatic slots of frequent verbs such as Come and Take, the respective forms have to be really applied, i.e. materialized (pronounced), to undergo these actual (articulatory) changes, which often consist in ease of articulation. Token frequencies of heard and read items may be important for mental storage, but they do not directly lead to what is described in my paper. Surely, it may be interesting to know the many sources of high token frequency, which may partly be found in pragmatics. However, this does not alter the fact that frequency of usage is one of the most important driving forces of language change.

4. "Bybee's principle of relevance cannot be transferred to other languages": My claim that irregularity in less relevant categories includes irregularity in more relevant categories was only deduced from the Germanic languages. The existence of some exceptions in other languages does not disprove this principle. It would be very worthwhile to test these correlations systematically on the basis of the Romance (and other inflecting) languages. In addition to these functional regularities, there are often paradigmatic patterns which vary from language to language. In most cases, their emergence can only be explained by language history. Maiden describes such patterns for the Romance languages, e.g. the special behaviour of the 1st and 2nd pl. in French and Spanish verbs due to different accent positions in Latin. In German and Luxembourgish, it is the so-called wechselflexion pattern which often exposes and differentiates the 3rd and 2nd sg.present by umlauted vowels from the remaining paradigm (the original reason for this is OHG endings containing -i-). This is why the 2nd sg. always follows the 3rd sg. in its inflectional behaviour although it is only the 3rd sg. which is extremely frequent (for further details, see Nübling 2000, 2001).

5. "It is not true that Natural Morphology is not interested in irregularity": There are different camps in Natural Morphology. I referred mostly to Wurzel's (1984/2001) concept because it is most concerned with German and other Germanic languages. At first, Wurzel claimed that irregularity including suppletion is unnatural, i.e. highly marked. In a later phase, he established a so-called suppletion domain which was excluded from natural principles. This domain was never defined exactly, and gradual transitions to less irregular conditions were not considered. These clear-cut distinctions instead of flexible scales and schemes can be found again in the inflectional macro- and microclass systems of Kilani-Schoch & Dressler (2005). I think it would be very promising to compare the average token frequency rates of the members of the different classes, ranging from huge productive classes (such as French verbs in -er) to so-called "ssssscl[asses]" (s = sub) (such as French courir, venir, tenir). My interest is to show how and possibly why so many small subsub- and so forth classes develop, how they interrelate, how they cluster with other miniclasses, and which categories are most affected by irregular expression. Thus, the dynamics of static morphology is the main topic of my paper.

References (in addition to the references in my paper)
Kilani-Schoch, Marianne, and Wolfgang U. Dressler 2005 Morphologie naturelle et flexion du verbe français. Tübingen: Narr.

Tiersma, Pieter Meijes 1982 Local and general markedness. Language 58: 832–849.

Taking into account interactions of grammatical sub-systems

Lexical variation in relativizer frequency

Thomas Wasow, T. Florian Jaeger, and David M. Orr

Abstract. An exception to a non-categorical generalization consists of a lexical item that exhibits the general pattern at a rate radically different (either far higher or far lower) from the norm. Lexical differences in noun phrases containing non-subject relative clauses (NSRCs) correlate with large differences in the likelihood that the NSRC will begin with that. In particular, the choices of determiner, head noun, and prenominal adjective in an NP containing an NSRC may dramatically raise or lower rates of that in the NSRC. These lexical variations can be partially explained in terms of predictability: more predictable NSRCs are less likely to begin with that. This generalization can be plausibly explained in terms of processing, assuming that that facilitates processing and/or signals difficulty. The correlations between lexical choices in the NP and the predictability of an NSRC can, in turn, be explained in terms of the semantics of the lexical items and the pragmatics of reference.*

* This paper is dedicated to Professor Günter Rohdenburg of Paderborn University, whose sixty-fifth birthday coincided with the completion of the first draft of the paper. Professor Rohdenburg's seminal studies on English usage and structure have been an inspiration to many data-oriented students of language, ourselves included. We received help and advice on this work from many people. Paul Fontes did essential work on the maximum entropy predictability model described at the end of section 3. Sandy Thompson was generous in sharing an early version of Fox and Thompson (2007) with us, and in giving us very useful feedback on earlier versions of this work. Additional help and advice was provided by at least the following people: David Beaver, Joan Bresnan, Brady Clark, Liz Coppock, Vic Ferreira, Edward Flemming, Ted Gibson, Jack Hawkins, Irene Heim, Dan Jurafsky, Rafe Kinsey, Roger Levy, Chris Manning, Tanya Nikitina, Doug Rohde, Doug Roland, Neal Snider, Laura Staum, Michael Wagner, and Annie Zaenen. Special thanks also to Heike Wiese and Horst Simon, first for organizing the workshop at which this material was originally presented, and second for comments on the written version.

1. Introduction

The notion of exception presupposes that of rule; as Webster (http://www.m-w.com/dictionary) puts it, an exception is "a case to which a rule does not apply". Linguistic rules (and, more recently, constraints, principles, parameters, etc.) are usually taken to be categorical, at least in the generative tradition. Quantitative data like frequency of usage are widely considered irrelevant to grammar, and gradient theoretical notions like degrees of exceptionality have remained outside of the theoretical mainstream. This antipathy towards things quantitative probably has its origins in Chomsky's early writings, which dismissed the significance of frequency data and statistical models (see, e.g., Chomsky 1955/75: 145–146; 1957: 16–17; 1962: 128; 1966: 35–36). But recently, the availability of large on-line corpora and computational tools for working with them has led some linguists to question the exclusion of frequency data and non-categorical formal mechanisms from theoretical discussions (for example, Wasow 2002 and Bresnan et al. 2007). Moreover, corpus work has revealed that natural-sounding counterexamples to many purportedly categorical generalizations can be found in usage data (Bresnan and Nikitina 2003).

If categorical rules are replaced by gradient models, what becomes of the notion of exceptionality? The paradigmatic instance of an exception is a lexical item that satisfies the applicability conditions of a (categorical) rule, but cannot undergo it. (When rules are categorical, so are exceptions.) The obvious analogue for a non-categorical generalization would be a lexical item whose frequency of occurrence in a given environment is dramatically different from that of other lexical items that are similar in relevant respects. For example, whereas about 8 % (11,405/146,531) of the occurrences of transitive verbs in the Penn Treebank III corpora (Marcus et al. 1999) are in the passive voice, certain verbs occur in the passive far more frequently, and others far less frequently.
Among the former is convict, which occurs in the passive in 33 % (25/76) of its occurrences as a verb; the latter is represented by read, fewer than 1 % (6/788) of whose occurrences as a transitive verb are passive.1 Such skewed distributions, which we will call "soft exceptions", are by no means uncommon.

1. These numbers are based on searches of the parsed portions of the Wall Street Journal, Brown, and Switchboard corpora, looking at the ratio of passive verb phrases to the total number of VPs directly dominating the verb in question and an NP (possibly a trace).

For grammarians who make use of non-categorical data and mechanisms, soft exceptions constitute a challenge. Simply recording statistical biases in individual lexical entries may be feasible and useful in applications to language technologies. But it is theoretically unsatisfying: we would like to explain why words show radically different proclivities towards particular constructions. The remainder of this paper examines one set of soft exceptions and offers an explanation for them in terms of a combination of semantic/pragmatic and psycholinguistic considerations.

2. Background

The particular phenomenon we examine is the optionality of relativizers (that or wh-words) in the initial position of certain relative clauses (RCs). This is illustrated in the following examples:

(1) a. That is certainly one reason (why/that) crime has increased.
    b. I think that the last movie (which/that) I saw was Misery.
    c. They have all the water (that) they want.

We have been exploring what factors correlate with relativizer occurrence in RCs, using syntactically annotated corpora from the Penn Treebank III. The results presented below are based on the Switchboard corpus, which consists of about 650 transcribed telephone conversations between pairs of strangers (on a list of selected topics), totalling approximately 800,000 words. Certain factors make relativizers obligatory, or so strongly preferred as to mask the effects of other factors. As is well known (see Huddleston and Pullum 2002: 1055), if the RC's gap is the subject of the RC, then the relativizer cannot be omitted:2

(2) I saw a movie *(that) offended me.3

We have excluded these from our investigations, concentrating instead on what we will call non-subject extracted relative clauses, or NSRCs. We have also excluded examples involving what Ross (1967) dubbed "pied piping", as in (3):

(3) a. a movie to *(which) we went
    b. a movie *(whose) title I forget

2. There are dialects that permit relativizer omission in some RCs with subject gaps, as in the children's song "There was a farmer had a dog".
3. An asterisk outside parentheses is used to indicate that the material inside the parentheses is obligatory.


Non-restrictive relative clauses are conventionally claimed (Huddleston and Pullum 2002: 1056) to require a wh-relativizer, and this seems to be correct in clear cases:

(4) a. Goodbye Lenin, which I enjoyed, is set in Berlin.
    b. *Goodbye Lenin, (that) I enjoyed, is set in Berlin.

The converse, that wh-relativizers may not appear in restrictive RCs, is a well-known prescription (e.g., Fowler 1944: 635), though it does not appear to be descriptively accurate. Evaluating these claims is complicated by the fact that the boundary between restrictive and non-restrictive modifiers seems to be quite fuzzy. Instead of trying to identify all and only non-restrictive RCs, we excluded all examples with wh-relativizers. This decision was also motivated in part by our observation that disproportionately many of the examples with wh-relativizers were questionable for other reasons (e.g. some embedded questions were misanalyzed as RCs). Thus, our results are based on the comparison between NSRCs with that relativizers and those with no overt relativizer.4 In addition, we excluded reduced subject-extracted and infinitival RCs, since they never allow relativizers (except for infinitival RCs with pied-piping, where the relativizer is obligatory):

(5) a. a movie (*that) seen by millions
    b. a movie (*that) to see
    c. a movie in *(which) to fall asleep

4. The studies were replicated including the NSRCs with wh-relativizers. The results are qualitatively the same, though the numbers are of course different.

After these exclusions, our corpus contained 3,701 NSRCs, of which 1,601 (43 %) begin with that and the remaining 2,100 (57 %) have no relativizer. A variety of factors seem to influence the choice between that and no relativizer in these cases. These include the length of the NSRC, properties of the NSRC subject (such as pronominality, person, and number), and the presence of disfluencies nearby. We discuss these elsewhere (Jaeger and Wasow 2006; Jaeger, Orr, and Wasow 2005; Jaeger 2005), exploring interactions among the factors and seeking to explain the patterns on the basis of processing considerations. The focus of the present paper is on how lexical choices in an NP containing an NSRC can influence whether a relativizer is used. We show that particular choices of determiner, noun, or prenominal adjective may correlate with exceptionally high or exceptionally low rates of relativizers. We then propose that this correlation can be explained in terms of the predictability of the NSRC, which in turn has a semantic/pragmatic explanation.

3. Lexical choices and relativizer frequency

Early in our investigations of relativizer distribution in NSRCs we noticed that relativizers are far more frequent in NPs introduced by a or an than in those introduced by the. Specifically, that occurs in 74.8 % (226/302) of the NSRCs in a(n)-initial NPs and in only 34.2 % (620/1813) of those in the-initial NPs. Puzzled, we checked the relativizer frequency for NSRCs in NPs introduced by other determiners. The results are summarized in Table 1, where the numbers in parentheses indicate the total number of examples.
Table 1. NSRC that Rate by NP Determiner.

DETERMINER (FREQUENCY)            NSRC WITH THAT
a or an (302)                     74.8 %
Possessive pronoun (37)           64.9 %
some (67)                         64.2 %
No determiner (428)               63.1 %
this, that, these, those (106)    61.3 %
Numeral (177)                     53.1 %
any (55)                          49.1 %
no (34)                           38.2 %
the (1813)                        34.2 %
all (206)                         24.3 %
every (68)                        14.7 %
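The spread of rates in Table 1 can be checked against chance with a chi-square test of homogeneity. The following is a minimal sketch, not the procedure used for the results reported in this paper: the counts of NSRCs with that are an assumption, reconstructed by rounding rate times total for each row (only 226/302 and 620/1813 are given in the text), and the function name is hypothetical.

```python
# Each entry: (NSRCs beginning with "that", total NSRCs) per determiner.
# ASSUMPTION: the "that" counts are reconstructed by rounding
# rate * total from Table 1; only 226/302 and 620/1813 appear in the text.
TABLE_1 = {
    "a or an": (226, 302),
    "possessive pronoun": (24, 37),
    "some": (43, 67),
    "no determiner": (270, 428),
    "this/that/these/those": (65, 106),
    "numeral": (94, 177),
    "any": (27, 55),
    "no": (13, 34),
    "the": (620, 1813),
    "all": (50, 206),
    "every": (10, 68),
}


def chi_square_homogeneity(table):
    """Pearson chi-square for a k x 2 table (that vs. no relativizer)."""
    total_that = sum(that for that, _ in table.values())
    total_n = sum(n for _, n in table.values())
    p_that = total_that / total_n  # pooled rate of "that"

    stat = 0.0
    for that, n in table.values():
        expected_that = n * p_that
        expected_zero = n * (1 - p_that)
        stat += (that - expected_that) ** 2 / expected_that
        stat += ((n - that) - expected_zero) ** 2 / expected_zero

    df = len(table) - 1  # (rows - 1) * (columns - 1), with 2 columns
    return stat, df


stat, df = chi_square_homogeneity(TABLE_1)
print(f"chi-square = {stat:.1f}, df = {df}")
```

With 10 degrees of freedom, the critical value at p = .001 is about 29.6, so a statistic above that indicates that the determiners do not share one common that rate.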

The variation in these numbers is striking, but it is by no means obvious why they are distributed as they are. Curious whether other lexical choices within NPs containing NSRCs might be correlated with relativizer frequency, we compared rates of relativizer occurrence for the nouns most commonly modified by NSRCs (Table 2). Again, we found a great deal of variation, with no obvious pattern. If individual determiners and head nouns are correlated with such highly variable rates of relativizer presence, we reasoned that the words that come between determiners and head nouns, namely prenominal adjectives, might show similar variation. And indeed they do: Table 3 shows the relativizer frequencies for the prenominal adjectives that occur most frequently in NPs with NSRCs.


Table 2. NSRC that Rate by NP Head Noun.

HEAD NOUN (FREQUENCY)    NSRC WITH THAT
stuff (46)               62.8 %
people (64)              57.1 %
one (106)                51.5 %
problem (44)             50.0 %
something (171)          44.7 %
thing (523)              43.7 %
kind (49)                43.2 %
anything (48)            38.0 %
place (99)               34.4 %
everything (60)          24.6 %
reason (91)              24.0 %
time (247)               14.0 %
way (325)                13.0 %

Table 3. NSRC that Rate by Prenominal Adjective.

ADJECTIVE (FREQUENCY)    NSRC WITH THAT
little (41)              73.2 %
certain (19)             68.4 %
few (20)                 65.0 %
different (19)           63.2 %
big (15)                 60.0 %
other (87)               49.4 %
same (47)                46.8 %
best (24)                25.0 %
only (158)               24.7 %
first (99)               18.2 %
last (79)                8.9 %

The differences in relativizer frequency based on properties of the modified NP are immense. For example, NSRCs modifying NPs with the adjective little are on average over eight times more likely to have a relativizer than NSRCs modifying NPs with the adjective last. These differences are not due to chance; chi-square tests on all three of these distributions are highly significant. Why should lexical choices in the portion of an NP preceding an NSRC make such a dramatic difference in whether the NSRC begins with that or has no relativizer? How can we explain soft exceptions to the optionality of that in NSRCs? That is, why does the presence of words like a(n), every, stuff, way, little, and last correlate with exceptionally high or low rates of that in NSRCs that follow them within an NP?

4. Predictability

An example from Fox and Thompson (2007) provided a crucial clue. They observed that the following sentence sounds quite awkward with a relativizer.5

(6) That was the ugliest set of shoes (that) I ever saw in my life.

Moreover, the sentence seems incomplete without the relative clause:

(7) That was the ugliest set of shoes.

(7) would be appropriate only in a context in which some comparison collection of sets of shoes is clear to the addressee. These observations led us to conjecture that the strong preferences in (6) for a relative clause in the NP and for no relativizer in the relative clause might be connected. Looking at the vs. a(n) in our corpus (the contrast that first got us started on this line of inquiry), we found that, of the 30,587 NPs beginning with the, 1813 (5.93 %) contain NSRCs, whereas only 302 (1.18 %) of the 45,698 NPs beginning with a(n) contain NSRCs. This difference (χ² = 812, p < 0.001) lent plausibility to our conjecture. Hence, we propose the following hypothesis:

(8) The Predictability Hypothesis: In environments where an NSRC is more predictable, relativizers are less frequent.

This formulation is somewhat vague, since neither the notion of environment nor that of predictability is made precise. Our initial tests of the hypothesis use simple operationalizations of these notions: the environments are the NPs containing the determiners, nouns, and adjectives described in the previous section, and an NSRC's predictability in the environment of one of these words is measured by the percentage of the NPs containing that word that are also modified by an NSRC.
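The operationalization just described amounts to two ratios per word. The following is a minimal sketch of that computation; the record layout (the keys "word", "has_nsrc", "has_that") and the function name are hypothetical conveniences, not the format of the Penn Treebank data, and the toy records are invented for illustration.

```python
from collections import defaultdict


def summarize(nps):
    """Per word: (NSRC predictability, rate of relativizer absence).

    `nps` is an iterable of dicts with the HYPOTHETICAL keys
    "word" (the determiner, adjective, or head noun of interest),
    "has_nsrc" (bool) and "has_that" (bool, meaningful only if has_nsrc).
    """
    n_nps = defaultdict(int)    # NPs containing the word
    n_nsrc = defaultdict(int)   # ... that also contain an NSRC
    n_zero = defaultdict(int)   # ... whose NSRC has no relativizer

    for np_record in nps:
        word = np_record["word"]
        n_nps[word] += 1
        if np_record["has_nsrc"]:
            n_nsrc[word] += 1
            if not np_record["has_that"]:
                n_zero[word] += 1

    summary = {}
    for word, total in n_nps.items():
        predictability = n_nsrc[word] / total
        # The absence rate is undefined when the word never occurs with an NSRC.
        absence = n_zero[word] / n_nsrc[word] if n_nsrc[word] else None
        summary[word] = (predictability, absence)
    return summary


# Invented toy records, for illustration only.
toy = [
    {"word": "the", "has_nsrc": True, "has_that": False},
    {"word": "the", "has_nsrc": True, "has_that": True},
    {"word": "the", "has_nsrc": False, "has_that": False},
    {"word": "the", "has_nsrc": False, "has_that": False},
    {"word": "a", "has_nsrc": True, "has_that": True},
    {"word": "a", "has_nsrc": False, "has_that": False},
    {"word": "a", "has_nsrc": False, "has_that": False},
    {"word": "a", "has_nsrc": False, "has_that": False},
]
print(summarize(toy))
```

Applied to the counts given above for the, the first ratio is 1813/30,587, i.e. the 5.93 % reported in the text.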

5. Fox and Thompson's account of the preference for no relativizer in (6) is based on the claim that (6) falls at the monoclausal end of a continuum from monoclausality to bi-clausality. We discuss this idea in section 5 below.


Figures 1–3 plot cooccurrence with NSRCs against frequency of relativizer absence in NSRCs. The points in Figure 1 represent the eleven determiner types given in Table 1; the points in Figure 2 represent the thirteen head nouns given in Table 2; and the points in Figure 3 represent the eleven adjectives given in Table 3.6 The lines represent linear regressions; that is, each line is the best linear generalization over the data points in the sense that the total squared distance between the points and the line is minimized (other tests showed that the trend is indeed linear and not of a higher order). The correlation between NSRC cooccurrence and relativizer absence is significant for all three categories. Correlating the predictability of NSRCs for all 35 words (the determiners, adjectives, and head nouns in our sample) against frequency of relativizer absence is also significant (adjusted r² = .36, F(1, 33) = 19.9, p < .001).7
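The regression statistics reported here and in the figure captions are related by standard identities, which the following sketch makes explicit. It is an illustration under stated assumptions rather than the analysis code behind the figures: the function names are hypothetical, and the four data points are invented, since the per-word cooccurrence values are not listed in the text. For a regression with a single predictor, r² = F / (F + n − 2), and adjusted r² = 1 − (1 − r²)(n − 1)/(n − 2).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns r2 too."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1 - ss_resid / ss_total
    return slope, intercept, r2


def adjusted_r2(r2, n, predictors=1):
    """Penalize r2 for the number of predictors relative to sample size."""
    return 1 - (1 - r2) * (n - 1) / (n - predictors - 1)


def adjusted_r2_from_f(f_stat, n):
    """Recover adjusted r2 from a reported F(1, n - 2) statistic."""
    r2 = f_stat / (f_stat + n - 2)
    return adjusted_r2(r2, n)


# Invented points lying exactly on y = 1 + 2x, for illustration only.
slope, intercept, r2 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, r2)

# Consistency check on the pooled statistic reported above: F(1, 33) = 19.9, n = 35.
print(round(adjusted_r2_from_f(19.9, 35), 2))
```

The second helper is convenient for checking that a reported F value and a reported adjusted r² agree with each other for a given number of data points.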

Figure 1. Relativizer Frequency and NSRC Cooccurrence by Determiner; adjusted r² = .91;8 F(1, 9) = 105.1, p < .001.

Figure 2. Relativizer Frequency and NSRC Cooccurrence by Head Noun; adjusted r² = .35; F(1, 11) = 7.4, p = .02.

Figure 3. Relativizer Frequency and NSRC Cooccurrence by Adjectives; adjusted r² = .32; F(1, 9) = 5.8, p = .04.

These results support the Predictability Hypothesis: on average, if a determiner, prenominal adjective, or head noun within an NP increases the likelihood that the NP will contain an NSRC, then it also increases the likelihood that an NSRC in the NP will lack a relativizer.

6. The mean plots in the three figures represent rather different sample sizes. Determiners are a closed class, so Figure 1 includes almost all NSRCs, whereas Figures 2 and 3 are based on just the head nouns and adjectives that cooccur most frequently with NSRCs. And since almost all NPs include a head noun but most do not have prenominal adjectives, the sample size in Figure 3 is far lower than in Figure 2.
7. After removing two extreme outliers, the adjusted r² = .56, F(1, 31) = 36.1, p < 0.001.
8. Adjusted r²s provide a more reliable measure of the goodness of fit of the model than normal, unadjusted r²s, which are usually too optimistic. Generally, r² estimates the amount of variation in the data accounted for by the model, e.g. an r² of .91 means that the model accounts for 91 % of the variation.


The evidence presented above supports the Predictability Hypothesis, but the predictability measures employed are rather simple. We used one word at a time in the modified NP to estimate the predictability of an NSRC, and we only used the most frequent types of determiners, adjectives, and head nouns.9 There are several ways to develop more sophisticated models of an NSRC's predictability that (i) take into account more than one word in the NP at a time, and (ii) are not limited to the most frequent types. We present one such approach, using a machine learning technique. This approach would also enable us to include information relevant to NSRC predictability that is not due to lexical properties of NPs (such as their grammatical function), but the study we report on here is limited to lexical factors.10 We created a maximum entropy classifier (see Ratnaparkhi 1997), which used features of an NP to predict how likely a relative clause was in that NP.11 Features included the type of head noun, any prenominal adjectives, and the determiner, as well as some additional properties, such as whether the head noun was a proper name, and whether the modified NP contained a possessive pronoun. Based on these features, the classifier assigned to each NP in the corpus a probability of having an RC, which we will refer to as its predictability index. We then grouped NPs according to these predictability indices, and examined how the relativizer likelihood in an NSRC varied across the groups.12
9. Furthermore, we used means to predict means (i.e. we used the mean predictablity of an NSRC given a certain word in the modied NP and correlated that against the mean relativizer likelihood for NSRCs modifying those NPs). This method arguably inates our r2 s (i.e. the measure of how much of the variation in relativizer omission is captured by predictability). Elsewhere (Jaeger, Levy, Wasow, and Orr 2005), we address this issue by using binary logistic regressions that predict the presence of a relativizer based on the predictability of the NSRC on a case-by-case basis. 10. Studies involving non-lexical factors are in progress. 11. This study differs from the earlier ones in that it considered the predictability of any relative clause, not just of non-subject relative clauses. This broader criterion provided the classier with more data on which to base its classications; the narrower criterion would have required a larger corpus in order to get reliable classications. So this study is testing for a slightly different correlation than the one stated in the Predictability Hypothesis. However, since the probability that an NP will contain an NSRC and the probability that an NP will contain an RC are highly correlated, a correlation between RC predictability and relativizer absence still supports our claims (cf. also footnote 14). Future research may determine which of the two measures is the better predictor of relativizer frequency. 12. Here we present the result of a classier trained on the Switchboard corpus, similar results were found for the parsed Wall Street Journal (Penn Treebank III release).
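To make the idea of a predictability index concrete, the following is a minimal sketch of a classifier of this general kind. With only two outcomes (RC vs. no RC), a conditional maximum entropy model is equivalent to logistic regression, so the sketch fits a logistic model over binary lexical features. Everything specific here is an assumption made for illustration: the eight toy NPs are invented rather than drawn from Switchboard, the function names are ours, and plain batch gradient ascent with a small L2 penalty stands in for whatever training procedure was actually used.

```python
import math

# Hypothetical toy data: (NP description, 1 if the NP contains an RC, else 0).
# These eight NPs are invented for illustration; they are not corpus counts.
TOY_NPS = [
    ({"det": "every", "adjs": [], "noun": "thing"}, 1),
    ({"det": "every", "adjs": [], "noun": "way"}, 1),
    ({"det": "the", "adjs": ["only"], "noun": "thing"}, 1),
    ({"det": "the", "adjs": [], "noun": "way"}, 1),
    ({"det": "the", "adjs": [], "noun": "dog"}, 0),
    ({"det": "a", "adjs": [], "noun": "dog"}, 0),
    ({"det": "a", "adjs": ["little"], "noun": "thing"}, 0),
    ({"det": "a", "adjs": ["little"], "noun": "house"}, 0),
]


def featurize(np_info):
    """Map an NP to a set of binary (present/absent) lexical features."""
    feats = {"bias", "det=" + np_info["det"], "noun=" + np_info["noun"]}
    feats.update("adj=" + adj for adj in np_info["adjs"])
    return feats


def predictability_index(weights, np_info):
    """Estimated probability that the NP contains an RC."""
    score = sum(weights.get(f, 0.0) for f in featurize(np_info))
    return 1.0 / (1.0 + math.exp(-score))


def train(data, epochs=300, rate=0.5, l2=0.01):
    """Fit feature weights by batch gradient ascent on the penalized log-likelihood."""
    weights = {}
    for _ in range(epochs):
        gradient = {}
        for np_info, has_rc in data:
            error = has_rc - predictability_index(weights, np_info)
            for f in featurize(np_info):
                gradient[f] = gradient.get(f, 0.0) + error
        for f, g in gradient.items():
            w = weights.get(f, 0.0)
            weights[f] = w + rate * (g / len(data) - l2 * w)
    return weights


weights = train(TOY_NPS)
p_high = predictability_index(weights, {"det": "every", "adjs": [], "noun": "way"})
p_low = predictability_index(weights, {"det": "a", "adjs": [], "noun": "dog"})
print(f"every way: {p_high:.2f}   a dog: {p_low:.2f}")
```

Because the model combines the weights of all features in the NP, it addresses both limitations noted above: more than one word contributes to each estimate, and infrequent words simply receive weights closer to zero.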


Before checking on relativizer presence, however, we needed to test the accuracy of the predictability indices our classifier assigned. We did this by comparing the predictability index range of each of the groups with the actual rates of RCs in the NPs in the groups. That is, we compared the fraction of the NPs in each group that contained an RC with the range of predictability indices the group represented. As can be seen in Figure 4, the occurrences of RCs in the NPs in each group were consistently within or close to the range assigned by the classifier. This indicates that the predictability indices that the classifier was assigning to the NPs were generally reasonable estimates.

Figure 4. Accuracy of Classifier.
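The accuracy check described above amounts to a calibration table: NPs are grouped by predictability index, and the observed rate of RCs in each group is set against the index range that the group covers. The sketch below illustrates the procedure under assumptions of our own: equal-width bins (the text does not say how the groups were delimited), a hypothetical function name, and invented index values and outcomes.

```python
def calibration_table(indices, has_rc, n_bins=5):
    """Group NPs by predictability index and report the observed RC rate per group.

    Returns one row (range_low, range_high, n_nps, observed_rc_rate) for each
    non-empty bin. Equal-width bins are an assumption made for illustration.
    """
    counts = [0] * n_bins
    hits = [0] * n_bins
    for p, y in zip(indices, has_rc):
        b = min(int(p * n_bins), n_bins - 1)  # an index of exactly 1.0 goes in the last bin
        counts[b] += 1
        hits[b] += y
    table = []
    for b in range(n_bins):
        if counts[b] == 0:
            continue
        table.append((b / n_bins, (b + 1) / n_bins, counts[b], hits[b] / counts[b]))
    return table


# Invented example values: predictability indices and whether each NP had an RC.
toy_indices = [0.05, 0.10, 0.15, 0.30, 0.35, 0.55, 0.90, 0.95]
toy_has_rc = [0, 0, 1, 0, 1, 1, 1, 1]

for low, high, n_nps, rate in calibration_table(toy_indices, toy_has_rc):
    print(f"index {low:.1f}-{high:.1f}: {n_nps} NPs, observed RC rate {rate:.2f}")
```

A classifier is well calibrated in this sense when each group's observed rate falls within or close to its index range; the same grouping can then be reused to tabulate relativizer absence per group.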

For the NPs containing NSRCs, we then used the classifier's predictability indices to test whether relativizers are less frequent where RCs are more predictable. We did this by examining the rates of relativizer absence for each of our groupings of NPs. As Figure 5 shows, the results are similar to what we found looking at the most frequent determiners, adjectives, and nouns separately: NSRCs in NPs whose features make them more likely to contain RCs are less likely to have relativizers. This result provides more support for the Predictability Hypothesis. Furthermore, the fact that a simple maximum entropy classifier provides reasonable measurements of the predictability of relative clauses suggests that predictability in this sense can be computed by means of a standard machine-learning method. Hence, it is reasonable to assume that speakers have access to estimates of how likely an RC is in a given context.


Figure 5. Predictability Index and Relativizer Absence; adjusted r² = .86; F(1, 5) = 36.9, p = .002.

5. Explaining the Correlation

The Predictability Hypothesis seems to be correct: NSRCs evidently begin with that less frequently in environments where an NSRC (or any RC) is more likely to occur. But we have still not answered our original question: Why do different lexical choices correlate with such large differences in relativizer rates? Our answer involves two steps. First, we suggest a processing explanation for the correlation between NSRC predictability and relativizer absence. Second, we argue that there are semantic/pragmatic reasons why certain determiners, head nouns, and adjectives tend to cooccur with NSRCs relatively frequently. Put together, these will constitute an account of why those lexical choices lead to low relativizer rates.

Explaining the presence vs. absence of relativizers in NSRCs in terms of processing can involve considerations of comprehension, production, or a combination of the two. Relativizers could facilitate comprehension by marking the beginning of a relative clause and thereby helping the parser recognize dependencies between the head NP and elements in the NSRC (see Hawkins 2004, for an account along these lines). Relativizers could facilitate production, e.g. by providing the speaker with extra time to plan the upcoming NSRC (see Race and MacDonald 2003, for an account along these lines). Both types of explanation predict that relativizers should occur more frequently in more complex NSRCs (though the factors contributing to comprehension complexity and production complexity might not be identical). Teasing apart the predictions of these different kinds of processing explanations is by no means straightforward (see Jaeger 2005, for much more detailed discussion of this issue).

Whatever kind of processing explanation one adopts, it can be employed to explain why predictability of the NSRC influences relativizer frequency. In a context in which an NSRC has a relatively high probability, the listener gets less useful information from having the beginning of the NSRC explicitly marked. Hence, relativizers do less to facilitate comprehension where NSRCs are predictable. And in environments where NSRCs are likely, speakers would begin planning the NSRC earlier (on average) than in environments where they are less likely. Consequently, they would be less likely to need to buy time by producing a relativizer at the beginning of the NSRC. In short, the correlation between predictability and relativizer absence follows from the hypothesis that relativizers aid processing.

But why do certain lexical choices early in an NP have such a strong effect on the likelihood of there being an NSRC later in the NP? To answer this, it is useful to consider the semantic function of restrictive relative clauses. As the term restrictive implies, such clauses characteristically serve to limit the possible referents of the NPs in which they occur. For example, in (8), the NSRC that I listen to restricts the denotation of the NP to a proper subset of music, namely, the music the speaker listens to; without the NSRC, the NP could refer to any or all music.

(8) music that I listen to

Certain determiners, nouns, and adjectives have semantic properties that make this sort of further restriction very natural or even preferred. Consider, for example, the determiners all and every, which express universal quantification. Universal assertions are generally true of only restricted sets.13 Thus, (9a) is true for many more VPs than (9b).

(9) a. Every linguist we know VP
    b. Every linguist VP

13. Students in elementary logic classes are taught that sentences beginning with a universal quantifier almost always have a conditional as their main connective. The antecedent of this conditional is needed to restrict the set of entities of which the consequent is claimed to hold. That is, for a sentence of the form ∀xP(x) to be true, P should include some contingencies. In natural language, NSRCs are one way of expressing such contingencies.


More generally, universal assertions are more likely to be true if the quantification is restricted, and NSRCs are one natural way to impose a restriction.14 Hence, in order to avoid making excessively general claims, people frequently use NSRCs with universal quantifiers. Notice that the opposite is true for existentials: (10a) is true for many more VPs than (10b), since (10a) is true if VP holds of any linguist, whereas (10b) is true only if it holds of a linguist we know.

(10) a. A linguist VP
     b. A linguist we know VP
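The asymmetry between (9) and (10) can be verified by brute force on a small model. The sketch below treats each possible VP as the set of linguists it holds of and counts, for each of the four sentence frames, how many VPs make the sentence true. The domain of four linguists, two of whom "we know", is a hypothetical choice made purely for illustration.

```python
from itertools import combinations


def all_subsets(items):
    """Yield every subset of items; each subset stands for one possible VP extension."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


linguists = {"l1", "l2", "l3", "l4"}  # hypothetical domain
known = {"l1", "l2"}                  # the linguists "we know"

counts = {
    "every linguist we know VP": 0,  # (9a)
    "every linguist VP": 0,          # (9b)
    "a linguist VP": 0,              # (10a)
    "a linguist we know VP": 0,      # (10b)
}
for vp in all_subsets(linguists):
    counts["every linguist we know VP"] += known.issubset(vp)
    counts["every linguist VP"] += linguists.issubset(vp)
    counts["a linguist VP"] += bool(linguists & vp)
    counts["a linguist we know VP"] += bool(known & vp)

for sentence, n_true in counts.items():
    print(f"{sentence}: true for {n_true} of 16 VPs")
```

In this toy model the restricted universal (9a) is true for 4 VPs against 1 for the unrestricted (9b), while the restricted existential (10b) is true for 12 against 15 for the unrestricted (10a): restriction helps the universal and hurts the existential.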

So while restricting a universally quantified assertion increases its chances of being true, restricting an existentially quantified assertion reduces its chances of being true. Correspondingly, every and all cooccur with NSRCs relatively frequently (10.40 % and 6.92 %, respectively), whereas a(n) and some rarely cooccur with NSRCs (1.18 % and 2.10 %, respectively).

The definite determiner generally signals that the referent of the NP it is introducing is contextually unique; that is, the listener has sufficient information from the linguistic and non-linguistic context to pick out the intended referent uniquely. But picking out a unique referent often requires specifying more information about it than is expressed by a common noun. NSRCs can remedy this: for example, there are many situations in which (11a) but not (11b) can be used to successfully refer to a particular individual.

(11) a. the linguist I told you about
     b. the linguist

Even when the is used with plural nouns (e.g. the linguists) a contextually unique set of individuals is the intended referent. Hence the denotation of the head noun often needs to be restricted, and NSRCs are consequently relatively common.

The pragmatic uniqueness associated with the definite article is very often a result of the fact that the referent of the NP introduced by the has recently been mentioned or is otherwise contextually very salient. In these cases, no restriction of the noun phrase is needed, so NSRCs would not be expected. And while the cooccurs with NSRCs at about three times the baseline rate for all (nonpronominal) NPs, the vast majority (about 94 %) of NPs beginning with the have no NSRC.

Certain adjectives, however, involve a uniqueness claim for the referent of NPs in which they appear, and these cooccur with NSRCs at far higher rates.15 The most frequent of these is only; others are superlatives like first, last, and ugliest. Our arguments for the relatively high rate of cooccurrence of the with NSRCs apply equally to these adjectives. And since superlatives make sense only with respect to some scale of comparison, the reference set that the scale orders often needs to be explicitly mentioned. Consequently, it is not surprising that these words cooccur with NSRCs at a very high rate. Indeed, we noted in connection with example (6) (following Fox and Thompson 2007) that NPs containing these adjectives sometimes sound incomplete without a modifying relative clause. The dark bars in Figure 6 show that NPs with the uniqueness adjectives only and superlatives have far higher rates of cooccurrence with NSRCs than NPs with other adjectives. And, as the Predictability Hypothesis leads us to expect, the same applies to relativizer absence in those NSRCs (see the lighter bars in Figure 6).

14. Other kinds of restrictive modifiers, such as subject-extracted relative clauses, prenominal restrictive adjectives, and postnominal PPs, are also options. Whenever there is a need to restrict the reference of an NP, each of these options becomes more likely. For the current purpose, it only matters that NSRCs constitute one of these options.

Figure 6. Cooccurrence with NSRC and Relativizer Frequency by Adjective Type.

15. This was pointed out by Fox and Thompson (2007). As noted above, it was their discussion of this observation that led us to the Predictability Hypothesis.


Turning now to the head nouns, one striking fact about the ones that cooccur with NSRCs most frequently is their semantic lightness; that is, nouns like thing, way, time, etc. intuitively seem exceptionally non-specific in their reference.16 Again, there is a semantic/pragmatic explanation for why semantically light nouns would cooccur with NSRCs more than nouns with more specific reference. In order to use these nouns successfully to refer to particular entities, some additional semantic content often needs to be added, and an NSRC is one way of doing this. For example, saying (12a) is less likely to result in successful communication than saying (12b):

(12) a. The thing is broken.
     b. The thing you hung by the door is broken.

Testing this intuition requires some basis for designating a noun as semantically light. As a rough first stab, we singled out the non-wh counterparts of the question words who, what, where, when, how, and why. That is, we looked at how often NSRCs occur in NPs headed by person/people, thing, place, time, way, and reason, and compared the results to the occurrence of NSRCs in NPs headed by anything else. And, of course, we also compared the frequency of relativizers in those NSRCs. The results, shown in Figure 7, are as we expected, with a far higher percentage of NSRCs in the NPs headed by the light nouns and a far lower percentage of NSRCs introduced by that.

Figure 7. Cooccurrence with NSRC and Relativizer Frequency by Head Noun Type.

16. This was noticed independently (and first) by Fox and Thompson (2007).

6. Concluding Remarks

Summing up, the variation in relativizer frequency associated with particular lexical choices of determiners, prenominal adjectives, and head nouns in NPs with NSRCs can be explained in terms of two observations. First, whether a word is likely to cooccur with an NSRC depends in part on the semantics of the word and on what people tend to need to refer to. Second, the more predictable an NSRC is, the less useful a relativizer is in utterance processing. Thus, determiners, adjectives, and nouns that increase the likelihood of a following NSRC decrease the likelihood that the NSRCs following them will begin with relativizers.

Our focus has been on how lexical choices influence relativizer frequency. But many non-lexical factors are also known to be relevant. Ideally, a theory of this phenomenon would bring all of these together and explain variation in relativizer use in terms of a single generalization.

One attempt at a unified account of several diverse factors influencing relativizer frequency is Fox and Thompson (2007). They conducted a detailed analysis of a corpus of 195 NSRCs from informal speech, identifying a variety of factors that correlate with relativizer presence or absence. Adapting a suggestion from Jespersen (1933), they argue that their examples fall at different points along a continuum of monoclausality, with more monoclausal utterances being less likely to have relativizers. Among the factors contributing to monoclausality, in their sense, are semantic emptiness of the clause containing the NP that the NSRC modifies (which subsumes semantic lightness of the head noun), simplicity of the head NP, and shortness of the NSRC.

The idea of a one-dimensional scale combining various factors relevant to relativizer omission has obvious appeal, particularly if it can be characterized precisely. However, we have two reservations about Fox and Thompson's notion of monoclausality. First, their characterization is rather vague, and they give no independent way of assessing degree of monoclausality. Second, the terminology is confusing, since even the most monoclausal of their examples contain (at least) two clauses, in the sense that they have two verbs and two subjects. Nevertheless, we share the intuition that the contents of the two clauses in the more monoclausal examples are more closely connected. We believe that the notion of predictability might provide a precisely definable scale that can do the work of Fox and Thompson's monoclausality.


Predictability has the further advantages that its influence on relativizer absence can be explained in processing terms and that it is often possible to explain why some NSRCs are more predictable than others, as we did above.

Some of the utterances Fox and Thompson consider the most monoclausal are stock phrases or frequently used patterns (e.g. the way it is), which they suggest may be stored as units. Stock phrases are by definition highly predictable, so they fit well with our account. Some higher-level grammatical patterns17 might not be covered by a simple, lexically-based characterization of predictability like the ones we employed. If so, it would suggest that more sophisticated metrics of predictability should be explored.

In short, the Predictability Hypothesis of relativizer variation provides testable questions for future research. Next we briefly mention some of them. First, we believe it is important to investigate what information speakers use to determine the predictability of an NSRC. For example, does the grammatical function of the modified NP matter? Or do speakers only use local information to predict NSRCs (i.e. lexical properties of the NP)?18 More specifically, it will be relevant for our understanding of predictability to see whether the factors investigated in this paper interact. In other words, do speakers use simple heuristics like the association of a particular lexical item with the likelihood of an NSRC, or do speakers compute the overall predictability of an NSRC given the combination of lexical items in the modified NP?

A further question that deserves attention is whether speakers use some sources of information more than others to compute the predictability of a construction (here: NSRCs). As we have seen in Section 3, predictability information related to determiners seems to correlate much more strongly with relativizer absence than information related to adjectives and the head noun of the modified NP. This may simply be due to the larger sample size available for the estimation of the mean for each of the words. But it is also possible that probability distributions for closed class items (like determiners) are easier to acquire or are more efficient to use, since there are fewer items in those classes.

We hope future research will discover generalizations that go beyond the particular phenomenon discussed here. Ongoing research that addresses some of the above issues and investigates a related phenomenon, complementizer omission, is presented in Jaeger et al. (2005) and Jaeger (2006).

Finally, let us return to the theme of this volume: exceptions. We have shown that the notion of exception can be generalized from hard (categorical) to soft (probabilistic) rules. We explored some soft exceptions to the optionality of relativizers in NSRCs, ultimately concluding that they could be explained in terms of the interaction of the semantics of the exceptional words, the pragmatics of referring, and processing considerations. Those who question the use of gradient models in syntax might suggest that this illustrates an important difference between hard and soft generalizations, namely, that the latter reflect facts about linguistic performance, not competence, and will hence always be explainable in terms of extra-grammatical factors, like efficiency of communication. In contrast, they might argue, many categorical generalizations are reflections of linguistic competence, and hard exceptions to them may be as well. We would respond that it is always preferable to find external explanations that tie properties of language structure to the functions of language and to characteristics of language users. There is no basis for bifurcating linguistic phenomena a priori into those that are and those that are not amenable to external explanation. In particular, such explanations should be sought for both hard and soft exceptions. We know of no reason to believe that they will always be possible for the soft cases, but not the hard cases.

17. We know of no clear cases of such patterns that don't have any identifying lexical items associated with them. One possible one is the X-er S1, the Y-er S2, as in The bigger they are, the harder they fall. But it is not clear that the two Ss (they are and they fall) should be analyzed as relative clauses here.

18. In this context, it is interesting that research on the effect of predictability on phonetic reduction (e.g., Bell et al. 2003) finds that the best measures of predictability are also the most local (i.e. bigrams).

References
Bell, Alan, Daniel Jurafsky, Eric Fosler-Lussier, Cynthia Girand, Michelle Gregory, and Daniel Gildea. 2003. Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. Journal of the Acoustical Society of America 113 (2): 1001–1024.

Bresnan, Joan, Anna Cueni, Tatiana Nikitina, and Harald Baayen. 2007. Predicting the dative alternation. In Cognitive Foundations of Interpretation, G. Boume, I. Kraemer and J. Zwarts (eds.), 69–97. Amsterdam: Royal Netherlands Academy of Science Workshop on Cognitive Foundations of Interpretation.

Bresnan, Joan, and Tatiana Nikitina. 2003. On the Gradience of the Dative Alternation. Ms.

Chomsky, Noam. 1955/75. The Logical Structure of Linguistic Theory. Chicago: University of Chicago Press.

Chomsky, Noam. 1957. Syntactic Structures. The Hague: Mouton.

Chomsky, Noam. 1962. A Transformational Approach to Syntax. Third Texas Conference on Problems of Linguistic Analysis in English, A. Hill (ed.), 124–169. Austin: The University of Texas.

Chomsky, Noam. 1966. Topics in the Theory of Generative Grammar. The Hague: Mouton.

Fowler, H. W. 1944. A Dictionary of Modern English Usage. Oxford: Oxford University Press.

Fox, Barbara A., and Sandra A. Thompson. 2007. Relative clauses in English conversation: Relativizers, frequency and the notion of construction. Studies in Language 31: 293–326.

Hawkins, John A. 2004. Efficiency and Complexity in Grammars. Oxford: Oxford University Press.

Huddleston, Rodney, and Geoffrey K. Pullum. 2002. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press.

Jaeger, T. Florian. 2005. Optional that indicates production difficulty: Evidence from disfluencies. Paper presented at Workshop on Disfluencies in Spontaneous Speech. Aix-en-Provence.

Jaeger, T. Florian. 2006. Probabilistic syntactic production: Expectedness and syntactic reduction in spontaneous speech. Ph. D. diss., Stanford University.

Jaeger, T. Florian, Roger Levy, Thomas Wasow, and David Orr. 2005. The absence of that is predictable if a relative clause is predictable. Paper presented at conference Architectures and Mechanisms of Language Processing. Ghent.

Jaeger, T. Florian, David Orr, and Thomas Wasow. 2005. Comparing and combining frequency-based and locality-based accounts of complexity. Poster presented at the 18th CUNY Sentence Processing Conference. Tucson, Arizona.

Jaeger, T. Florian, and Thomas Wasow. 2006. Processing as a source of accessibility effects on variation. Proceedings of the 31st meeting of the Berkeley Linguistic Society, R.T. Cover and Y. Kim (eds.), 169–180. Ann Arbor: Sheridan.

Jespersen, Otto. 1933. Essentials of English Grammar. London: Allen & Unwin.

Marcus, Mitchell P., Beatrice Santorini, Mary Ann Marcinkiewicz, and Ann Taylor. 1999. Treebank III. Linguistic Data Consortium, University of Pennsylvania.

Race, David, and Maryellen MacDonald. 2003. The use of that in the production and comprehension of object relative clauses. Paper presented at 26th Annual Meeting of the Cognitive Science Society.

Ratnaparkhi, Adwait. 1997. A simple introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.

Ross, John R. 1967. Constraints on variables in syntax. Ph. D. diss., MIT.

Wasow, Thomas. 2002. Postverbal Behavior. Stanford: CSLI Publications.

Corpus evidence and the role of probability estimates in processing decisions

Ruth Kempson

Wasow, Jaeger and Orr (WJO) address the phenomenon of exceptions from a background of increasing interest in models of language where generalizations about natural languages are made on a probabilistic basis, rather than on categorical distinctions. What they provide is a case for a concept of gradient exceptionality, expressed in terms of what is unlikely to occur, the other side of the coin from what does occur with high predictability. The example is the correlation between the predictability of a given determiner, or adjective, or noun occurring with a relative clause and the likelihood of that relative occurring without a relativizer: an expression which is likely to occur with a relative is unlikely to occur with a relativizer. In this demonstration and the consequences they draw from it, the window of focus is deliberately narrow, with subject relatives, relatives with a wh-relativizer, non-finite relatives, and pied-piping constructions all left on one side from the corpus cull they make, as displaying different idiosyncrasies which detract from the primary issue of what makes the relativizer preferred or dispreferred when it is essentially optional. It is a little disappointing that the variety of relative-clause types considered is so narrow, since the distinction between restrictive and nonrestrictive relatives, one of the primary features supposedly distinguishing that- and wh-marked relatives in English, is, as they note, not clear-cut. Relatives with that exceptionally allow non-restrictive construals, particularly if they occur second in a sequence of relatives:

(1) There was that man at the party that you had introduced me to, that annoyed me enormously by his pompous posturing.
(2) Last week I bought this game pie for the party, that went bad on me before the end of the week.
(3) I am thinking of buying a piece of land, that I hope you like.

* I am grateful to Jieun Kiaer, Nancy Kula and Lutz Marten for comments on this note, and the issues which the WJO paper raises.

So, of the finite relative clauses, it is only relativizer-less relatives which require restrictive construal. Despite the restrictions on their corpus cull, WJO provide what are nonetheless fascinating tables displaying how individual determiners, adjectives, and nouns vary in their likelihood of co-occurrence with a that relativizer. In the determiner class, the indefinite a is the determiner that occurs most frequently with an accompanying that relativizer; in the noun class, it is the indefinite stuff which is the most common and way that is least common; and in the adjective class, it is the adjective little that strikingly comes highest in the list, with nonsubject relatives modifying NPs containing it being on average over eight times more likely to have a relativizer than nonsubject relatives modifying NPs with the adjective last. Some of these distributions seem more puzzling than others; however, in all cases, as WJO demonstrate, there is a regular correlation between predictability in the corpus of the particular word being associated with a relative, overall phrasal predictability of relative clause modification, and predictability of the relativizer. On these gradience lists, the quantifiers present perhaps the least obvious distribution, with a at the top of the list with the highest proportion of relative clauses with that, and some coming lower in the list, but nevertheless twenty-five percent more likely than the numerals to occur with the relativizer. On the other hand, every and all come out bottom, with any displaying three times the proportion of that-specific relative clauses of every, and double that of all. This makes any simpleminded account of quantification based exclusively on quantificational properties seem unlikely.

Based on these differential probabilities of occurrence with relative clause, WJO provide a measure of the cumulative predictability of relative clause construal of the determiner-adjective-noun sequences so collected; and from this basis, they pose the claim central to their paper, the so-called Predictability Hypothesis: the more predictable a non-subject relative, the less frequent is its co-occurrence with the relativizer that. Whatever the surprises there may be in the probability estimates associated with individual words, this is an intuitive result; and it is extremely good to see this properly quantitatively confirmed, buttressing what is otherwise no more than an intuition.

The question, then, is why there should be such a strong correlation between predictability and lack of relativizer. And this is where the interest of probability-based results arises: should such correlations be explicable solely in terms of the interaction of other pragmatic, semantic, or processing-oriented considerations, with probability assessments themselves playing no part in the explanation; or does the predictability itself have a role to play? The starting point for the analysis which WJO provide is their conjecture that because the maximum entropy classification which they used to provide the accumulated measurements of the predictability of relative clauses can be computed by standard machine-learning methods, it is plausible to assume that speakers have access to estimates of how likely a relative clause is in a given context. They go on from there to explore the composite effect of parsing and/or production considerations in conjunction with the intrinsic content of the various determiners/adjectives/nouns, and from there consider how these might in part explain the probability distributions. One such factor is that both determiners and nouns which allow anaphoric, context-dependent interpretations will not need a relative clause modifier whenever they can be so identified. The other is that those nouns which are semantically light but do not allow anaphoric forms of construal almost have to occur with a relative clause modifier. The processing explanation they offer is that in such cases, the presence of the relativizer has relatively low functional load. In this connection, one factor which might influence processing considerations over and above occasion-specific functional-load considerations is the effect of routinization. That is, where there is common co-occurrence of determiner/adjective/noun and the presence or absence of the relativizer, e.g. in the predictability of way and lack of relativizer and predictability of the indefinite article a and presence of the relativizer, such cooccurrence might become stored as a routinized strategy associated with that particular item, thereby accentuating the frequency distribution results for cases at either end of the continuum (see Cann and Kempson 2008 for arguments that routinization is a force in syntactic change).
WJO note with approval the Fox and Thompson observation of monoclausality of relativizer-less relatives, but without exploring any semantic analogue to this, they suggest that their account in terms of predictability might take the place of this rather vague mono-clausal notion; and they proceed to set out explanations that might conrm such a stance. However, this move is too swift. Rather than simply seeking to replace this observation altogether, the authors might have considered the semantic analogue to the Fox and Thompson observation. This is that relatives can be used either to build up a complex restrictor for a quantifying expression within a single clause, i.e. a restrictive relative clause construal, or, conversely, they can be used to provide an adjunct, independent structure, a nonrestrictive relative clause construal. Indeed, whatever the difculties of formally characterising nonrestrictive relative clause construal (see Potts 2004 for a re-analysis in terms of supplements for which he gives a conventional-implicature analysis), it is not in question that, unlike in restrictive relative clause construals, the two clauses give rise to two independent propositions, and in some analyses, the distinctiveness of the two is made explicit (Potts 2004, Kempson, Meyer-Viol and Gabbay 2001, Cann,


Kempson and Marten 2005).1 This distinction is often reported to be only disambiguated by intonation. In writing, where relativizers may play the role of defining a clausal edge but cannot disambiguate between restrictive and nonrestrictive construals, it is only the lack of any such indicator that can unambiguously indicate a restrictive construal. The monoclausal observation of Fox and Thompson thus has a natural counterpart in a semantic characterisation of relative clause construal: relativizer-less relatives uniquely identify a single overall assertion, a distinctive attribute of restrictive relative-clause construal which has independent syntactic and semantic motivation. If, then, a speaker is planning a relative clause sequence indicating a restrictive construal, they may not even consider the possibility of using a form which would allow the alternative form of construal: certainly the most secure way of ensuring the appropriate construal is to select a form which precludes it, and of the finite forms, only the relativizer-less form definitively does so. Hence the more likely the form is to be associated with a restrictive form of construal, the less likely it is to be introduced with a form which allows for any other form of construal. By comparison, the move from the demonstration of the statistical correlation between probability of a relative clause and inverse probability of the relativizer, to the assumption that calculations of probability might drive production decisions, is a leap which needs substantial independent argumentation. At the very least, there is a well-motivated alternative to be aired. There is in any case linguistic evidence from other languages which tends to favour the explanation of relativizer-less relatives in terms of definitively indicating the singleton status of the overall propositional structure. In some languages boundary marking of structure can be made by tone. One such is Bemba.
1. Some authors have argued that nonrestrictive relatives are presuppositional, but there are many examples to the contrary: (i) John ignored Mary, who burst into tears.

Bemba is a tone language which marks relative clauses in one of two ways, by tone or by pronominal marking. Relative clauses marked by tone alone are exclusively associated with restrictive construal and have to coincide with what is called the conjoint form of the verb, the low tone of the conjoint verb-form determining that the noun head and verb initiating the relative clause will be processed as a single prosodic unit. In consequence the construal of the relative as an integral part of the containing structure is unambiguously indicated. Relative clauses involving pronominal marking, being morphologically marked, can be construed restrictively or nonrestrictively, these construals being distinguished by use of the conjoint form (low tone) and the disjoint verb (high tone).2 The striking aspect of the two strategies, tonal vs pronominal, is that they do not distribute in a complementary fashion. Rather, just one of those strategies provides unambiguous indication that the producer is continuing immediately with construction of a complex restrictor, i.e. a restrictive relative clause. Thus it is the use of the conjoint verb form with its low tone that forces restrictive construal in Bemba, analogous to the relativizer-less relatives of English. Such parallels from analysis of one language to another have to be treated with some caution, of course. In principle, the correlation between Bemba tone and morphologically explicit relative-clause marking might well be characterisable in terms of probability of co-occurrence. Nevertheless, the explanation of such conjoint low tone in terms of phonological indication of the mode of compositionality seems much more consistent with orthodox assumptions about how to explain encoded properties of natural language (see Cheng and Kula 2006 for independent arguments of the feeding relation between phonological marking and Bemba relative-clause structure). And this, by analogy, favours the explanation of the distribution of relativizer-less relatives in English in terms of their unambiguous correspondence with restrictive relative clause construal. WJO are careful to keep the Predictability Hypothesis as a claim restricted only to relative clauses of a particular type, and applied only to English. However, they end by asking questions of a much more general nature that presume the relevance of predictability weightings in the making of speakers' decisions. They ask how English speakers determine the predictability of a non-subject relative clause, and whether speakers compute overall predictabilities or rather manipulate locally available heuristics of particular items.
Further questions might be whether there are speed-up or, conversely, lengthening phenomena associated with the presence or absence of the complementizer. Is there also a correlation between the average number of words following each determiner and the occurrence of the relativizer? There are also more general questions. Predictability correlations are string-based observations and not category-specific, and one might expect that if they can be manipulated constructively by speakers, they should provide a basis for
2. There are differences between object and subject marking, with morphological marking of object relatives taking several forms but with restrictions on the availability of the tonal strategy. However, all that is relevant here is that the low-tone strategy, which is the conjoint form of the verb, is invariably associated with a restrictive construal (see Cheng and Kula 2006 for details). These observations are due to Nancy Kula; and I am grateful to both her and Lutz Marten for discussing these data with me and reminding me of their relevance to this issue.


explaining distributions in other cases where two options are apparently equally available. This raises fascinating new research questions. Is it the case, by analogy, that where two alternative forms are possible, but one is much more probable than the other, the morphologically more marked form is less likely to be chosen, being unnecessary? One such case is structural vs prosodic indication of question-hood. Questions, incidentally like nonrestrictive relatives, are invariably marked by intonation, a para-linguistic marking characteristically recognisable early on in a parse sequence. By analogy, if such prosodic form is so reliably associated with question construal, one might expect that speakers of a language might deem it inessential to provide morphologically explicit forms of interrogative; and indeed in many languages they commonly do use declarative rather than interrogative forms, relying solely on the prosody. However, as the authors are well aware, the relevance of probability results has to be treated with caution: probability of occurrence cannot in general be a guide as to whether or not a simpler form will be used. Take the case of approaching an information desk in an airport. The speaker has two ways of asking a yes-no question, either the declarative form (without auxiliary) or an inverted form with an auxiliary. Does the very fact that you are highly likely to be construed by the person at the desk as asking a question influence your decision to present it in one form rather than another, with a tendency to choose the simpler declarative form? One might seek an empirical test of this prediction, but intuition would surely suggest the answer is "No". The moral to be drawn from this fascinating setting out of data and probability assignments thus seems to be two-fold.
It is clear on the one hand that probability distributions over corpus evidence, if reliably replicable, provide fascinating new data which anyone facing up to the challenge of articulating grammar interfaces will be interested in mulling over. On the other hand, the conclusion that speakers manipulate probability estimates as input to the decisions as to how to say what they do would seem to be as yet premature. While there are clear probabilistic distributions to be culled from language data to great effect, providing new impetus for theoretical explanations of a subtlety most frameworks do not make provision for, it remains far from obvious that probabilistic distributions constitute part of the explanation. The test of such putative explanations will be their generalisability to explain optional distributions on a broad cross-linguistic basis.


References
Cann, Ronnie, and Ruth Kempson 2008 Production pressures, syntactic change and the emergence of clitic pronouns. In Language in Flux, Robin Cooper and Ruth Kempson (eds.), 179–220. London: College Publications.
Cann, Ronnie, Ruth Kempson, and Lutz Marten 2005 The Dynamics of Language. Oxford: Elsevier.
Cheng, Lisa, and Nancy C. Kula 2006 Syntactic and phonological phrasing in Bemba relatives. ZAS Papers in Linguistics 43: 31–54.
Fox, Barbara A., and Sandra A. Thompson 2007 Relative clauses in English conversation: Relativizers, frequency, and the notion of construction. Studies in Language 31: 293–326.
Kempson, Ruth, Wilfried Meyer-Viol, and Dov Gabbay 2001 Dynamic Syntax. The Flow of Language Understanding. Oxford: Blackwell.
Potts, Christopher 2002 The Logic of Conventional Implicatures. Oxford: Oxford University Press.

Response to Kempson's comments
Thomas Wasow, T. Florian Jaeger and David Orr
Kempson's interesting commentary raises two important points. First, while extolling the value of probabilistic corpus data, she is not ready to accept that "speakers manipulate probability estimates as input to the decisions as to how to say what they do". Second, she suggests an alternative to our attempt to explain the correlation between predictability of non-subject relative clauses and the absence of 'that' in such clauses. We discuss these points in reverse order and raise some additional questions for future research. Our proposed explanation of the correlation, which is admittedly somewhat programmatic, is that more predictable NSRCs are easier to produce and/or comprehend than less predictable ones, and hence do not need the extra function word. Kempson's alternative explanation is based on the fact that relative clauses without a relativizer must be interpreted as restrictive, whereas nonrestrictive construals are often possible when a relativizer is present. She suggests that relativizer omission is used as a way of disambiguating the intended construal of the relative clause. She points out that another method of disambiguation can be intonation. Since intonation is not marked in writing, her reasoning predicts that relativizer omission should be more common in writing than in speech. As we first noted in Jaeger and Wasow (2005), this does indeed seem to be the case. NSRCs in the parsed portions of the Wall Street Journal (WSJ) and the Brown corpus (BC) are significantly less likely to have a 'that' relativizer (24% and 11%, respectively) than NSRCs in the parsed Switchboard corpus (SWBD: 43%; χ² = 453.0, p < 0.0001). This difference decreases but prevails even when all relativizer types are counted (WSJ: 47%, BC: 36%, SWBD: 52%; χ² = 79.8, p < 0.0001) and after other factors influencing 'that' are controlled for (see also Fox and Thompson 2007, who report a 60% relativizer rate for NSRCs in informal conversations).
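For readers less familiar with the test statistic reported above, the following minimal sketch shows how a Pearson χ² statistic is computed for a corpus-by-relativizer contingency table. The counts are hypothetical: only percentages are reported here, so each corpus is arbitrarily assumed to contain 1,000 NSRCs, and the resulting value should not be expected to reproduce the reported χ² = 453.0. The function and variable names are likewise illustrative assumptions.

```python
def chi_square_independence(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for a table of counts.

    `table` is a list of rows; every row holds the observed counts for one group.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if row and column were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# HYPOTHETICAL counts: 1,000 NSRCs per corpus is an assumption made only so that
# the reported 'that' rates (24%, 11%, 43%) can be turned into whole numbers.
hypothetical_counts = [
    [240, 760],  # WSJ:  with 'that', without 'that'
    [110, 890],  # BC
    [430, 570],  # SWBD
]

hyp_statistic, hyp_dof = chi_square_independence(hypothetical_counts)
print(f"chi-square = {hyp_statistic:.1f} on {hyp_dof} degrees of freedom")
```

A larger statistic relative to its degrees of freedom indicates that the relativizer rates differ more across corpora than would be expected if corpus and relativizer presence were independent.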
Note in particular that NSRCs in the two written corpora are on average 21–36% longer than NSRCs in the Switchboard. A priori, this would suggest the opposite of the observed pattern since longer NSRCs are more likely to contain a relativizer (Race and MacDonald 2003; Jaeger 2006). The observed distributional differences between speech and written texts hence are in line with Kempson's hypothesis (see Jaeger and Wasow 2005, for an alternative explanation based on the hypothesis that relativizer mentioning is driven by production pressures). As intriguing as it is, Kempson's ambiguity avoidance hypothesis leads to a prediction that is inconsistent with the data discussed in our paper. The problem with the ambiguity account is related to the link between predictability and restrictiveness. Kempson does not discuss this link. The discussion in section 4 of our paper, on the other hand, provides a natural link between restrictiveness and predictability: when the content of an NP minus its relative clause is insufficient to pick out the intended referent, some kind of additional modifier is likely to be included; an NSRC is one of the options, so the probability of an NSRC is relatively high. To be more precise, it is the probability of a restrictive NSRC that is relatively high in such contexts. After all, it is restrictive NSRCs rather than non-restrictive NSRCs that serve to provide additional information necessary to identify a referent. In other words, the need for sufficient identifiability influences the distribution of restrictive NSRCs and hence is a cause for increased predictability of restrictive NSRCs in such contexts. Note that there may be other reasons why RCs are more predictable in some contexts than in others. Here and in our paper, we focus on increases in NSRC predictability due to the pragmatically motivated need for certain referents to be identifiable. Crucially, it is not restrictiveness that causes greater NSRC predictability. If the need for identifiability is one of the major factors determining NSRC predictability, this means that more predictable NSRCs are likely to be restrictive.

* This reply benefited immensely from the feedback by Harry Tily, whose challenging comments led us to entertain additional alternatives to our hypothesis that NSRC predictability drives that-omission.
The predictable NSRCs discussed here occur in contexts where they will naturally be interpreted as restrictive, irrespective of whether a relativizer is present. If disambiguation between restrictive and non-restrictive construals is one of the functions of relativizer omission, then we should expect omission to occur most when the possibility of a non-restrictive interpretation is greatest. By much the same reasoning that led to the prediction of more relativizer omission in writing than in speech, Kempson's disambiguation account would predict that less predictable restrictive NSRCs would have higher rates of relativizer omission. Since restrictive NSRCs in contexts that don't require further identifying information are more likely to be misconstrued as non-restrictive, speakers should be more likely to omit the relativizer to guarantee the intended (restrictive) reading. And this is of course the exact opposite of our central empirical finding.


The point here is that the correlation between predictability of an NSRC and absence of a relativizer seems natural from a processing perspective, but not if relativizer omission is thought of as a disambiguation strategy along the lines Kempson suggests. There is at least preliminary evidence that relativizers facilitate processing. There is some debate as to whether relativizers help production or comprehension (or both). On the one hand, relativizer presence has been shown to facilitate comprehension (e.g. Race and MacDonald 2003). On the other hand, there is evidence that relativizer omission is correlated with production complexity (Jaeger and Wasow 2005; see also Ferreira and Dell 2000 for complementizers), but also that relativizers do not seem to alleviate production difficulty (Jaeger 2005). While future studies are necessary to test whether speakers insert relativizers to facilitate production or comprehension, there is an established link between relativizer presence and processing (for further discussion, see Jaeger 2006; Levy and Jaeger 2007). Similarly, high predictability of a parse can alleviate or avoid comprehension difficulties (see Jurafsky 2003 for references). Thus providing relativizers for less predictable NSRCs seems like a reasonable hypothesis, although, admittedly, future work is necessary to test it. The discussion of Kempson's proposal brings up another interesting point. Our work so far does not show that there is a direct causal link between NSRC predictability and relativizer omission. Could it be that it is the need for identifiability that directly causes relativizer omission? While we are not aware of any theory that would predict this, it is a testable question that should be addressed in future research.
As Harry Tily also points out to us, it would be worth investigating to what extent variance in the predictability of NSRCs is explained by the need for identifiability, and to what extent other factors determine NSRC predictability. If other factors influence NSRC predictability and if increases in NSRC predictability due to these other factors correlate with relativizer omission, this would provide strong evidence for a direct causal link between NSRC predictability and relativizer omission. Turning to the question of whether speakers manipulate probability estimates, we are puzzled why Kempson seems so reluctant to think that they do. In many other areas of cognitive science, including motor control (Trommershäuser et al. 2005), visual inference (Kersten 1999), concept learning (Tenenbaum 1999), and reasoning (Anderson 1990), there is little controversy over the fact that human information processing involves access to probabilistic distributions. Why should language be so different? Indeed, research over the past few years has revealed many cases of probabilistically-conditioned language production. For example, predictable syllables (Aylett and Turk 2004) and more predictable words (Bell et al. 2003) are pronounced shorter and with less articulatory detail. Similarly, vowels that are more predictable given the preceding segments in a word are produced shorter and with less distinct formants (van Son and Pols 2003). And cases of probabilistically-conditioned reduction are not limited to the phonetic level. Jaeger (2006) provides evidence that complementizer omission is correlated with predictability of a complement clause. Even phrasal omission has been linked to probabilistic distributions (see Resnik 1996 on the distribution of implicit objects as in John ate (dinner) before Mary arrived). For a more detailed discussion of these phenomena as well as an information-theoretic account that links probabilistically-conditioned reduction to efficiency and successful information transfer, see Jaeger (2006: Chapter 6). As far as we can tell, the widespread assumption that knowledge of language must consist of categorical mechanisms is a legacy of half a century dominated by grammatical theories built with tools borrowed from logic. That assumption was generally accepted for many years in part because the computations needed to develop serious quantitative models of language were infeasible with the technologies of the time. Over the past twenty years or so that has changed, and there is now a wealth of interesting results on language built with the tools of statistics and probability. This has led to a vigorous debate within linguistics over the role of probabilistic findings in the theory of language; see, for example, Newmeyer (2003, 2006), Gahl and Garnsey (2004, 2006), and Jaeger (2007), among others. Kempson does not actually commit herself to one side or the other in this debate, but she makes it clear which side she thinks bears the burden of proof. There are, however, other passages in her comments in which her prose suggests just the opposite.
One example is her suggestion that a correlation between a lexical item and either presence or absence of relativizers "might become stored as a routinized strategy associated with that particular item". The examples she gives ('way' is associated with relativizer absence and 'a' with relativizer presence) are not categorical constraints, as our corpus studies demonstrate; so the stored strategies she posits would have to be probabilistic.1 Similarly, in arguing for the role of restrictiveness in the correlation between predictability and relativizer omission, she writes the following:
the more likely the form is to be associated with a restrictive form of construal, the less likely it is to be introduced with a form which allows for any other form of construal.

1. Incidentally, the discussion in Jaeger (2006: Chapter 6.2.3) contains control studies suggesting that the effect of predictability on relativizer omission holds beyond a few conventionalized tokens.


This is a manifestly probabilistic claim.2 Since relativizer absence categorically entails restrictiveness, she needs a probabilistic formulation in order to avoid the false prediction that all restrictive NSRCs lack relativizers. But if speakers do not manipulate probability estimates, Kempson needs to explain where the non-categorical nature of this correlation comes from. We hasten to add that we are in broad agreement with much of what Kempson says. She is quite right that the focus of our paper is narrow, and that much is to be learned from broader investigations looking at the effects of predictability in a wider range of constructions and languages (for examples of such work, see Jaeger 2006 on English complementizer omission; Jaeger 2007 on reduced subject relatives, so-called whiz-deletion; and ongoing work on relativizer omission in Danish). We also agree that restrictiveness may be an important factor in relative clause structure. In both connections, her discussion of Bemba is fascinating and illuminating.

References
Anderson, John Robert 1990 The Adaptive Character of Thought. Hillsdale, NJ: Lawrence Erlbaum.
Aylett, Matthew, and Alice Turk 2004 The Smooth Signal Redundancy Hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech 47: 31–56.

2. A clarification may be in order. We use the term probabilistic to refer to events that are conditioned by the probability of another event. This use of probabilistic is different from its use in, for example, the Stochastic OT literature, where an event is called probabilistic when it occurs with a certain probability. In the latter sense, the claim that relativizer omission is probabilistic is almost trivially true. Even if all variation in relativizer omission were determined by absolutely categorical contrasts (which is extremely unlikely given that the same speaker will sometimes say the same sentence with and sometimes without a relativizer), the resulting distribution would still be binomial (with p being either 0 or 1, depending on the context). Our question here is different. Our specific hypothesis is that it is the probability of an RC that influences relativizer omission. But we can ask more generally whether probabilities are part of the predictors of relativizer omission (cf. Jaeger 2006, 2007).


Bell, Alan, Daniel Jurafsky, Eric Fosler-Lussier, Cynthia Girand, Michelle Gregory and Daniel Gildea 2003 Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. Journal of the Acoustical Society of America 113: 1001–1024.
Ferreira, Victor S., and Gary S. Dell 2000 The effect of ambiguity and lexical availability on syntactic and lexical production. Cognitive Psychology 40: 296–340.
Fox, Barbara A., and Sandra A. Thompson 2007 Relative clauses in English conversation: Relativizers, frequency, and the notion of construction. Studies in Language 31: 293–326.
Gahl, Susanne, and Susan Garnsey 2004 Knowledge of grammar, knowledge of usage: Syntactic probabilities affect pronunciation variation. Language 80: 748–775.
Gahl, Susanne, and Susan Garnsey 2006 Knowledge of grammar includes knowledge of syntactic probabilities. Language 82: 405–410.
Jaeger, T. Florian 2006 Redundancy and Syntactic Reduction in Spontaneous Speech. Stanford University dissertation.
Jaeger, T. Florian 2007 Usage or grammar? Comprehension and production share access to same probabilities. Paper presented at the 81st Annual Meeting of the Linguistic Society of America (LSA), Anaheim.
Jaeger, T. Florian, Roger Levy, Thomas Wasow, and David Orr 2005 The absence of that is predictable if a relative clause is predictable. Paper presented at the Architectures and Mechanisms of Language Processing conference, Ghent.
Jaeger, T. Florian, and Thomas Wasow 2005 Production-complexity driven variation: Relativizer omission in non-subject-extracted relative clauses. Paper presented at the 18th CUNY Sentence Processing Conference, Tucson, AZ.
Kersten, Daniel 1999 High-level vision as statistical inference. In The New Cognitive Neurosciences, 2nd ed., Michael S. Gazzaniga (ed.), 353–364. Cambridge, MA: MIT Press.

Levy, Roger, and T. Florian Jaeger 2007 Speakers optimize information density through syntactic reduction. In Advances in Neural Information Processing Systems (NIPS) 19, B. Schölkopf, J. Platt, and T. Hoffman (eds.), 849–856. Cambridge, MA: MIT Press.


Newmeyer, Frederick J. 2003 Grammar is grammar and usage is usage. Language 79: 682–707.
Newmeyer, Frederick J. 2006 On Gahl and Garnsey on usage and grammar. Language 82: 399–404.
Race, David, and Maryellen MacDonald 2003 The use of that in the production and comprehension of object relative clauses. Paper presented at the 26th Annual Meeting of the Cognitive Science Society.
Resnik, Philip 1996 Selectional constraints: An information-theoretic model and its computational realization. Cognition 61: 127–159.

Tenenbaum, Joshua B. 1999 Bayesian modeling of human concept learning. In Advances in Neural Information Processing Systems (NIPS) 11, M.S. Kearns, S.A. Solla and D.A. Cohn (eds.). Cambridge, MA: MIT Press.
Trommershäuser, Julia, Sergei Gepshtein, Laurence T. Maloney, Michael S. Landy, and Martin S. Banks 2005 Optimal compensation for changes in task-relevant movement variability. The Journal of Neuroscience 25(31): 7169–7178.
van Son, Rob J.J.H., and Louis C.W. Pols 2003 How efficient is speech? Proceedings of the Institute of Phonetic Sciences 25: 171–184.

Structured exceptions and case selection in Insular Scandinavian
Jóhannes Gísli Jónsson and Thórhallur Eythórsson

Abstract. The diachronic development of case selection in Insular Scandinavian (Icelandic and Faroese) provides strong support for a dichotomy of structured exceptions, which display partial productivity, and arbitrary exceptions, which are totally unproductive. Focusing on two kinds of exceptional case, we argue that verbs taking accusative experiencer subjects form a similarity cluster on the basis of shared lexical semantic properties, thus enabling new lexical items to be attracted to the cluster. By contrast, verbs taking genitive objects have no common semantic properties that could be the source of partial productivity.

1. Introduction

The syntax of natural languages is characterized by general mechanisms that operate independently of particular lexical items and enable the speaker to produce and understand an infinite number of sentences. Thus, it is fair to say that syntax, more than any other component of grammar, illustrates the regular and creative aspect of language. Still, syntax is not entirely free of irregularities, especially in the domain of argument realization. To take one example, the fact that envy can have two objects in English (e.g. I envy you your good looks) is an exception to the generalization that only verbs denoting transfer of some kind can be ditransitive in English (see Goldberg 1995: 131–132 for relevant discussion).

* The work reported on here was funded in part by a three-year research grant from Rannís, The Icelandic Centre for Research, during 2004–06, which is gratefully acknowledged. We wish to thank Heimir Freyr Viðarsson, our research assistant at the University of Iceland, for his invaluable assistance in preparing this paper. We would also like to thank an anonymous reviewer and the editors for useful comments, and the latter in particular for their patience. The authors bear a joint responsibility for the paper, but divided their labor in such a way that Jóhannes Gísli largely took care of the Icelandic part and Thórhallur of the Faroese part.


In this paper we argue that exceptions to general patterns of argument realization are of two kinds. First, there are exceptions that are stored in the lexicon without any associative links between them, i.e. links which make it easier for speakers to memorize the exceptions. These can be referred to as arbitrary exceptions as they are based on an arbitrary list of lexical items. Second, there are exceptions which involve clustering of lexical items on the basis of shared semantic properties. These can be called structured exceptions and they display partial productivity in contrast to arbitrary exceptions. Thus, arbitrary exceptions are totally unproductive whereas structured exceptions can be extended to new lexical items, provided that these exceptions have sufficient token frequency.1 As we illustrate below, the diachronic development of case selection in Insular Scandinavian (Icelandic and Faroese) provides strong support for the proposed dichotomy between structured and arbitrary exceptions. The discussion will focus on two kinds of exceptional case selection, accusative subjects and genitive objects. It will be shown that accusative subjects, especially experiencer subjects, have been semi-productive in the history of Insular Scandinavian whereas genitive objects have been completely unproductive. To account for this difference, we argue that verbs with accusative experiencer subjects form a similarity cluster on the basis of shared lexical semantic properties. This enables new lexical items to be attracted to the cluster. By contrast, verbs with genitive objects are a disparate group with no common semantic properties that could be the source of partial productivity. Exceptional but semi-productive classes are probably best known in inflectional morphology.
For instance, the class of strong verbs in English exemplified by cling/clung has been shown to be productive in experiments where speakers are asked to produce past tense forms of nonce verbs (Bybee and Slobin 1982, and Bybee and Moder 1983). This class has also attracted some new members, e.g. the originally weak verbs dig, fling and string (Jespersen 1942: 49–53), despite the sharp reduction in the overall number of strong verbs in the history of English. Bybee and Moder (1983) argue that the productivity of the cling/clung class is based on the phonetic similarity between the verbs in this class. They claim that cling/clung verbs are organized by family resemblance around a prototypical member with a velar nasal word-finally; thus, there is no rule at work here since there is no single phonetic feature that all these verbs have in common. Moreover, many verbs that are phonetically similar to
1. Our use of the term new lexical item in this context lumps together verbs that are truly new in the language as well as verbs that are attested in Old Icelandic but with a different case frame.


the cling/clung verbs have a different inflection (e.g. rig, bring and ring in the sense 'encircle, put a ring on'). Clearly, our claims about exceptional and semi-productive case in Insular Scandinavian are similar in spirit to this proposal although we will not make any use of prototype theory.2 The paper is organized as follows. In section 2, we provide some background information on the Icelandic case system. Section 3 discusses the decline of genitive objects in the history of Icelandic. The diachronic development of accusative subjects is discussed in section 4, where it is shown that accusative case has been extended to the subjects of some new verbs. Comparative data from Faroese are presented in section 5 and shown to follow the Icelandic pattern discussed in sections 3 and 4. Finally, some concluding remarks are offered in section 6.

2. The case system of Icelandic

There are two basic types of case in Icelandic: structural case and lexical case. Structural case is determined by syntactic position whereas lexical case is selected by particular lexical items. The main evidence for this dichotomy comes from the fact that lexical case is preserved in passives and ECM-infinitives but structural case is not (see e.g. Zaenen, Maling and Thráinsson 1985). Using these diagnostics, nominative subjects and accusative objects represent structural case whereas oblique subjects and dative and genitive objects exemplify lexical case. Nominative is by far the most common subject case in Icelandic, as illustrated in (1). However, numerous non-agentive verbs take oblique subjects, as in (2). The verb langa 'want', for example, selects an accusative subject and leiðast 'be bored' takes a dative subject and a nominative object.

(1) a. Nemendurnir lásu bókina.
       the.students-NOM read-3.PL the.book-ACC
       'The students read the book.'
    b. Við hjálpuðum nágrönnunum.
       we-NOM helped-1.PL the.neighbours-DAT
       'We helped the neighbours.'
    c. Faðirinn saknar barnanna.
       the.father-NOM misses the.children-GEN
       'The father misses the children.'

2. We have also been influenced by Pinker's (1999) discussion of strong verbs in English which, in turn, draws on ideas from connectionist psychology.

Jóhannes Gísli Jónsson and Thórhallur Eythórsson

(2) a. Mig langar að fara.
       me-ACC wants to go
       'I want to go.'
    b. Sumum leiðist þessi hávaði.
       some-DAT bores this-NOM noise-NOM
       'Some people are tired of this noise.'

Nominative subjects trigger number and person agreement with the finite verb but oblique subjects do not. Apart from this difference, oblique subjects behave syntactically very much like nominative subjects in Icelandic (see Zaenen, Maling and Thráinsson 1985, Sigurðsson 1989: 204–209, and Jónsson 1996: 110–119 among others). Accusative is clearly the most common object case in Icelandic, but many verbs take dative objects, e.g. hjálpa 'help', as in (1b). Only a handful of verbs select genitive objects (see Appendix for a list), including sakna 'miss', as shown in (1c). Nominative objects occur almost exclusively with two-place verbs taking dative subjects, such as leiðast 'be bored', as in (2b).

Lexical case in Icelandic is semantically predictable in some instances and this is most evident with dative indirect objects (see Yip, Maling and Jackendoff 1987, Jónsson 2000, and Maling 2002). It can also be argued that dative case with experiencer subjects is largely predictable from lexical semantics (Jónsson 1997–98, 2003). The focus of this paper is on lexical case that is idiosyncratically associated with particular lexical items and therefore has to be learned on an item-by-item basis. It is impossible, for example, to predict the accusative subject with langa 'want' or the genitive object with sakna 'miss' from the lexical semantics of these verbs. Hence, it must be specified in the lexical entries of these verbs that they select an accusative subject and a genitive object, respectively.3 Still, as we will show in this paper, the semantic similarity between verbs taking accusative experiencer subjects has enabled these verbs to display some productivity in the history of Icelandic. This means that the productivity of a particular case frame does not require semantic (or syntactic) predictability. On the other hand, the status of accusative experiencer subjects in current-day Icelandic is quite weak as there is a strong tendency to replace them by dative

3. We will not concern ourselves here with the issue of how idiosyncratic case arises diachronically. For interesting discussion on irregularization in inflectional morphology, see Nübling (this volume).


subjects, a phenomenon often referred to as Dative Sickness or Dative Substitution.4 Contrast the example in (3) with the one in (2a):

(3) Mér langar að fara.
    me-DAT wants to go
    'I want to go.'

It has been shown that idiosyncratic case is acquired rather late in Icelandic (Sigurðardóttir 2002 and Björgvinsdóttir 2003) and for some speakers it may never be acquired with certain verbs. For example, a child that fails to acquire the standard accusative case with langa 'want' during the critical period of language acquisition is likely to use a dative subject instead, as in (3). Thus, Dative Substitution is an ongoing diachronic change that results from the unsuccessful transmission of a grammar from one generation of speakers to the next. We will assume that the loss of genitive objects in the history of Icelandic also has its roots in language acquisition but the lack of historical data makes it nearly impossible to argue for this on empirical grounds.

3. Genitive objects

3.1. Old Icelandic

The number of verbs taking genitive objects has been significantly reduced in the history of Icelandic, from about 100 verbs in Old Icelandic to about 30 in Modern Icelandic (see Appendix).5 With many of these verbs, the genitive object has been replaced by a PP or an object bearing a different case. In some cases, the verb has simply become obsolete, at least in the use where a genitive object was possible. We have not systematically investigated all the genitive object verbs in Old Icelandic but we suspect that frequency is the most important factor in explaining why some of these verbs have survived but others have not. To judge by the textual sources, many of the genitive object verbs were already quite rare in Old Icelandic. Genitive objects were also losing ground
4. There is also a tendency to substitute nominative for oblique case on theme/patient subjects in Modern Icelandic, so-called Nominative Substitution (see Eythórsson 2002, Jónsson 2003, and Jónsson and Eythórsson 2003, 2005 and references cited there).
5. The loss of genitive objects is well-known from other Germanic languages (see e.g. Delsing 1991 on Swedish, Allen 1995: 217–219 on English, and Donhauser 1998 on German).


in that some verbs could occur with other cases or PPs instead of the genitive (see Nygaard 1906: 142–148 for examples). One example is the verb missa 'miss, lose' which alternated between genitive and accusative object in Old Icelandic. In Modern Icelandic the object must be accusative, except for a few idiomatic phrases which preserve genitive, e.g. missa marks 'be to no avail' (literally 'miss the target'), missa sjónar 'lose sight of' and missa fótanna 'trip' (literally 'lose the feet').

Verbs with genitive objects in Old Icelandic can be divided into five syntactic classes, depending on the number and case marking of other arguments of the verb. The three biggest classes are exemplified in (4):6

(4) a. Nominative subject + Genitive object (NG-verbs)
       Ásgerður var þá eftir og gætti bús þeirra.
       Ásgerður was then left and guarded farm-GEN their
       'Ásgerður then stayed behind and looked after their farm.'
       (Egils saga, p. 455)
    b. Nominative subject + Accusative object + Genitive object (NAG-verbs)
       Þorgeir latti hann utanferðar.
       Þorgeir discouraged him-ACC travel.abroad-GEN
       'Þorgeir discouraged him from going abroad.'
       (Finnboga saga ramma, p. 651)
    c. Nominative subject + Dative object + Genitive object (NDG-verbs)
       Hann kvaðst ekki varna mundu henni máls.
       he said not prevent would her-DAT speech-GEN
       'He said he would not prevent her from speaking.'
       (Brennu-Njáls saga, p. 160)

There was also a small class of verbs taking an accusative experiencer subject + genitive object (fylla 'become full of', fýsa 'want', girna 'desire', minna '(seem to) remember', vara 'expect', vilna 'expect, hope', and vænta/vætta 'expect'), and an even smaller class of verbs with a dative experiencer subject + genitive object (batna 'get better', bætast 'recover from', fá 'suffer', létta 'recover from' and ljá 'get'). This is exemplified in (5):

6. Most of the examples from Old Icelandic in this paper are cited from editions using Modern Icelandic spelling (see bibliography) but they have all been checked for authenticity against critical editions or manuscripts.


(5) a. Accusative subject + Genitive object (AG-verbs)
       Þess minnir mig að þú mundir þá koma.
       it-GEN remembers me-ACC that you would then come
       'I seem to remember that you would come then.'
       (Heiðarvíga saga, p. 1391)
    b. Dative subject + Genitive object (DG-verbs)
       Þuríði batnaði sóttarinnar.
       Þuríður-DAT improved the.illness-GEN
       'Þuríður recovered from her illness.'
       (Eyrbyggja saga, p. 608)

Most of the AG-verbs and DG-verbs have either become obsolete (in the relevant use) or have ceased to select genitive case. For instance, the DG-verb batna 'get better' can only take a nominative object in Modern Icelandic.7 The only verb that still takes a genitive object is vænta 'expect', but the subject case has shifted to nominative. However, four of the AG-verbs still select accusative subjects (i.e. fylla 'become full of (water)', fýsa 'want', minna '(seem to) remember' and vara 'expect'). This is consistent with the major empirical claim of this paper that accusative subjects have been more resistant to diachronic change than genitive objects in the history of Icelandic.

The historical decline of genitive objects can also be seen with NAG-verbs. Members of this class in Old Icelandic include the following verbs:8

(6) beiða 'request', biðja 'request', dylja 'hide', eggja 'incite', firna 'blame', fregna 'ask', frétta 'ask', fylla 'fill with', fyrirkunna 'blame for', fýsa 'incite', krefja 'demand', kveðja 'demand', letja 'discourage', minna 'remind', saka 'accuse of', spyrja 'ask', æsa 'incite'

Of all these verbs, the only ones that are still regularly used with genitive objects are biðja 'request', krefja 'demand' and spyrja 'ask'. This is exemplified in (7):

7. Barðdal (2001: 197–198) claims that DG-verbs disappeared in Icelandic because of their low type frequency. An alternative explanation is that the token frequency of genitive objects with each DG-verb was simply too low for the genitive to be successfully acquired.
8. As already mentioned, fylla 'fill with', fýsa 'incite' and minna 'remind' also occur as AG-verbs in Old Icelandic.

(7) Ég þarf örugglega að biðja þig einhvers á morgun.
    I need surely to ask you-ACC something-GEN tomorrow
    'I will surely need to ask you to do something tomorrow.'

With the other NAG-verbs, the genitive has been replaced by PPs, unless the verb has become obsolete or lost the sense indicated in (6). One example of this is minna 'remind'. In Old Icelandic this verb could be used either with a genitive object, as in (8a), or a PP complement, but in Modern Icelandic it only takes a PP, as in (8b):

(8) a. Hún hefir minnt mik þeirra hluta.
       she has reminded me-ACC those-GEN things-GEN
       (Fornmannasögur.i.3)
    b. Hún hefur minnt mig á þá hluti.
       she has reminded me-ACC of those-ACC things-ACC
       'She has reminded me of those things.'

With some verbs, the genitive has become more or less restricted to formal registers in Modern Icelandic, e.g. bíða 'wait for'. For example, while the PP-variant in (9b) sounds very natural in all kinds of registers, the genitive variant in (9a) is clearly rather formal. This is very different from Old Icelandic where the genitive was the norm and the PP-variant was extremely rare.

(9) a. Margir biðu Jóns.
       many waited John-GEN
    b. Margir biðu eftir Jóni.
       many waited for John-DAT
       'Many waited for John.'

The story of genitive objects in Icelandic is a story of continuous loss and no gain. In fact, we are aware of only one verb with a genitive object that has been added to the vocabulary of Icelandic in the last centuries.9 This is the verb óska 'wish', a variant of æskja 'wish' which takes a genitive object and is already attested in Old Icelandic. Moreover, we only know of one example
9. The verbs iðra 'repent' and krefjast 'demand' are attested with a genitive object in Modern Icelandic but not in Old Icelandic. Still, they can hardly be seen as new additions since the variants iðrast 'repent' (with the suffix -st) and krefja 'demand' (without the suffix -st) are attested with a genitive object in Old Icelandic.


where a genitive object has become a variant alongside an original accusative or dative object. This happened in the Eastern fjords of Iceland with the verb nenna 'bother' where genitive replaced dative.10 We suspect that this development has its roots in phonology: The most common dative object with nenna is þessu 'this', which becomes homophonous with the genitive form þess 'this' if the final vowel is elided. Deletion of the final vowel takes place regularly before a word that begins with a vowel, e.g. before the negation ekki 'not' which quite often follows the verb nenna. Thus, the change from dative to genitive with nenna seems to be a case of reanalysis triggered by phonological neutralization of the contrast between dative and genitive.

3.2. Analysis

The facts discussed above show that genitive objects have been extremely unproductive in the history of Icelandic. This has manifested itself in the following ways:

(10) a. The number of verbs taking genitive objects has sharply declined from the Old Icelandic period.
     b. There are hardly any cases where genitive has become a variant with verbs originally taking accusative or dative objects.
     c. Virtually no new verbs with genitive objects have been added to the Icelandic lexicon since the Old Icelandic period.

We claim that this lack of productivity is because verbs selecting genitive objects in Old (and Modern) Icelandic do not form any semantically coherent subclasses.11 In other words, they are arbitrary exceptions to general case selection rules in Icelandic. Thus, Nygaard (1906: 142–148), in his classic syntax of Old Icelandic, is at pains to classify verbs with genitive objects, presenting no fewer than eight subclasses, most of which are neither well-defined nor coherent. One may wonder about the NAG-verbs listed in (6) which form a reasonably coherent class as most of these verbs denote communication which the referent of the indirect object is expected to respond to in some way. The problem with this class may be that the genitive is not common enough to provide a real basis
10. Using flýta sín 'hurry' with a genitive (literally 'hurry self-GEN') instead of the standard flýta sér with a dative is also a known dialectal feature of the Eastern fjords of Iceland. However, little is known about the details of this change; it awaits further investigation.
11. Verbs with genitive objects are all non-telic, i.e. they denote events that do not have a natural endpoint. Still, that property does not distinguish them from other transitive verbs as many non-telic verbs take accusative or dative objects.


for productivity. To take one example, the verb eggja 'incite' only occurs with genitive objects that denote some kind of action or undertaking (e.g. atganga 'attack', verk 'deed', ferð 'trip' or útganga 'walking out'), and even these kinds of noun phrases are often expressed as objects of the preposition til 'to' when they are used with eggja.12

Being arbitrary exceptions, verbs with genitive objects are stored in the lexicon without any associative links between them. Such links between lexical items make them easier to memorize and therefore more learnable and less likely to undergo diachronic change. However, this may not be sufficient to explain why verbs with genitive objects have failed to attract new members. We seem to need the additional assumption that new verbs entering the language always follow some general pattern with respect to case selection. Thus, a new verb cannot easily be analogized to an established verb on the basis of semantic similarity between the two verbs, a phenomenon referred to as isolate attraction by Barðdal (2001). This can be seen with the verb passa 'guard, take care of', an 18th century borrowing from Danish. Since this verb is more or less synonymous with the NG-verb gæta 'guard, take care of' it looks like a good candidate for isolate attraction. Still, despite its obvious affinity with gæta, the verb passa has occurred with an accusative object from its earliest attestation.

The accusative with passa represents the default object case and this option seems to be chosen whenever there is no semantic class of dative verbs that the new verb could be attracted to. However, a new verb may vary between accusative and dative object if the semantic basis for dative case is unclear. A case in point is the loan verb transportera 'transport' which is possible both with a dative and an accusative object. Arguably, the dative is chosen by those speakers who feel that transportera belongs semantically to the class of motion verbs selecting a dative object (e.g. hrinda 'push', kasta 'throw', lyfta 'lift', sveifla 'swing' and ýta 'push'; see Maling 2002 and Svenonius 2002 for a discussion of these verbs). By contrast, speakers who do not share this intuition will opt for the default accusative. The latter choice may be influenced by the fact that the verb flytja 'move, transport', taking an accusative object, is semantically closer to transportera than any other verb. However, this would not necessarily be a case of isolate attraction as the existence of flytja might simply prevent speakers from placing transportera in the same class as motion verbs with dative objects.

12. An informal count in the electronic corpus of Old Icelandic texts at http://www.lexis.hi.is (Textasafn Orðabókar Háskólans) reveals that genitive occurs with about 12% of the examples of eggja.

4. Accusative subjects

Verbs with accusative subjects divide into two main types semantically: experiencer verbs and verbs taking theme/patient subjects. These classes will be discussed separately in 4.1 and 4.2 below since there are important differences between them in terms of their diachronic development.

4.1. Verbs with experiencer subjects

For convenience, all the verbs with accusative experiencer subjects in Old Icelandic are shown in (11) with some preliminary semantic subclassification. This list, as the list in (18) below, is quite extensive as it includes all verbs that are attested with an accusative subject in a critical edition of Old Icelandic texts. Note also that some of these verbs could occur with a nominative or a dative subject in Old Icelandic (see Viðarsson 2006 for a detailed discussion):

(11) Verbs with an accusative experiencer subject in Old Icelandic:
     a. Verbs of physical discomfort: hungra 'be hungry', kala 'suffer frostbites', saka 'be hurt', skaða 'be hurt', stinga 'feel pain', sundla 'become dizzy', svimra 'become dizzy', syfja 'become sleepy', velgja 'feel nausea', verkja/virkja 'ache', þyrsta 'feel thirsty'
     b. Verbs of lacking: bila 'fail, lack', bresta 'run out of', nauðsynja 'need', skorta 'lack', vanta 'lack, need', þrota/þrjóta 'lack, run out of'
     c. Verbs denoting feelings: angra 'grieve', ánægja 'become happy', forvitna 'be curious', fýsa 'want', girna 'desire', harma 'vex', heimta 'want', hryggja 'grieve', langa 'want', lysta 'desire, want', muna 'want', ógleðja 'become unhappy', ótta 'fear', slægja til 'want (to have)', tíða 'want', trega 'grieve (over)', ugga 'fear', undra 'be surprised', vilna 'hope', öfunda 'envy'
     d. Verbs of cognition: dreyma 'dream', greina á um 'disagree on', gruna 'suspect', minna 'remember vaguely', misminna 'remember wrongly', vara 'expect', væna 'expect', vænta/vætta 'expect'
     e. Verbs with affected experiencers: henda 'happen to, concern', kosta 'cost', skipta 'matter to', tíma 'happen to', varða 'concern'

Some representative examples from the first three classes are provided in (12):

(12) a. og mun þig ekki saka.
        and will you-ACC not be.hurt
        'and you will be all right'
        (Víga-Glúms saga, p. 1926)
     b. mig og fólk mitt skortir aldrei mat.
        me-ACC and people-ACC my-ACC lacks never food
        'Me and my people never run out of food'
        (Bandamanna saga, p. 21)
     c. en ekki slægir mig hér til langvista.
        but not wants me-ACC here for long.stay
        'But I am not tempted to stay here for a long time'
        (Grettis saga, p. 1094)

As the overview in (11) illustrates, these verb classes are fairly coherent semantically. This can be seen e.g. in the number of verbs having roughly the same meaning, e.g. the verbs in (11b) and all the verbs glossed as 'want' in (11c). This raises the question of whether these verbs really are exceptional rather than following a rule linking accusative subjects and verbs meaning 'want'. We doubt that there can be such a narrow lexical rule, because it would hardly be learnable. In any case, this hypothetical rule would not apply to the verb vilja 'want' which takes a nominative subject in Old Icelandic as well as in Modern Icelandic. Thus, we conclude that verbs with accusative experiencer subjects are structured exceptions, i.e. they are not stored as isolated items in the lexicon but linked via shared lexical semantic properties.

Therefore, it is unsurprising that these verbs have displayed some productivity in the history of Icelandic. First, among the many new verbs that have been added to the Icelandic lexicon since the Old Icelandic period there are some that take accusative experiencer subjects, e.g. hrylla við 'be horrified by', óra fyrir 'dream of' and ráma 'have a vague recollection of':

(13) a. Nemendurna hryllir við þessari tilhugsun.
        the.students-ACC horrifies at this thought
        'The students are horrified by the thought of this.'
     b. Engan hefði getað órað fyrir þessu.
        nobody-ACC had could dreamed for this
        'Nobody could have dreamed of this.'
     c. Mig rámar í að hafa hitt hann einu sinni.
        me-ACC recollects in to have met him one time
        'I have a vague recollection of having met him once.'


None of these verbs is attested in Old Icelandic but the oldest examples that we have found of hrylla við and óra fyrir are from the 17th century and the oldest example of ráma is from the 19th century.13 Second, we know of one loan verb taking an accusative subject, the verb ske 'happen'. According to Óskarsson (1997–1998), the oldest example of this verb is from the end of the 14th century. In that particular example, and many later ones, the verb takes an accusative experiencer subject. The following example is from the middle of the 17th century:

(14) eins og mig hafði skeð fyrir átta árum.
     as me-ACC had happened for eight years
     'As had happened to me eight years earlier'
     (Píslarsaga séra Jóns Magnússonar, p. 60)

The youngest example of an accusative subject with ske that we have found is from the middle of the 19th century. In Modern Icelandic, the affected experiencer is always expressed by a PP with the preposition fyrir 'for' (if it is expressed at all). Third, accusative has become a variant with some experiencer verbs. With the verbs hlakka til 'look forward to' and kvíða fyrir 'dread, be anxious about', both accusative and dative as well as the original nominative are possible subject cases in Modern Icelandic. This is shown for kvíða fyrir in (15):

(15) a. Hún kveið fyrir prófunum.
        she-NOM was.anxious for the.exams
     b. Hana kveið fyrir prófunum.
        her-ACC was.anxious for the.exams
     c. Henni kveið fyrir prófunum.
        her-DAT was.anxious for the.exams
        'She was anxious about the exams.'

Fourth, accusative subject used to be possible with the verbs vona 'hope' (Old Icelandic vána) and skynja 'sense', but nominative is the original subject case with these verbs and the only possible subject case in Modern Icelandic.14

13. These examples are in the electronic corpus of Icelandic texts at http://www.lexis.hi.is (Ritmálssafn Orðabókar Háskólans (ROH)).
14. Interestingly, there are many examples of a genitive object with vona in ROH, mostly dating from the 19th century. It seems that the verb is used in the sense 'expect' in these examples, similar to the genitive object verb vænta 'expect'.

(16) a. Þá vonar mig að þær smámsaman fjölgi.
        then hopes me-ACC that they gradually increase
        'Then I hope that they increase in number.'
        (Alþingistíðindi 1859,466; ROH)
     b. mig skiniar ecki sannara en seigi.
        me-ACC senses not truer than say
        'I do not sense more truthfully than I say.'
        (GAndrDeil, 45; ROH)

Our final example here is klæja 'itch' which only takes a dative subject in Old Icelandic (17a) but also occurs with an accusative subject in Modern Icelandic (17b):

(17) a. því að mér klæjar þar mjög.
        since me-DAT itches there much
        (Sturlunga saga, p. 560)
     b. því að mig klæjar þar mjög.
        since me-ACC itches there much
        'because I am itching there so much.'

The examples in (13)–(17) illustrate the partial productivity of verbs with accusative experiencer subjects in the history of Icelandic. However, it seems that this class has become unproductive in present-day Icelandic. This can be seen most clearly in Dative Substitution, as in (3) above, which first became common in the middle of the 19th century and is widespread in present-day Icelandic. The overall number of verbs with accusative experiencer subjects has also declined somewhat since the Old Icelandic period. The only clear sign of productivity of this class in present-day Icelandic is the occasional use of accusative for the regular nominative with the verbs finna til 'feel pain' and kenna til 'feel pain' (literally 'feel to'). Using accusative for nominative with these verbs is not unexpected as many verbs denoting physical discomfort take accusative experiencer subjects (see (11a)).

4.2. Verbs with theme/patient subjects

Verbs with accusative theme/patient subjects in Old Icelandic can be divided into three semantic classes. As shown in (18), the class of verbs denoting change of state is by far the biggest:

(18) Verbs with theme/patient subjects in Old Icelandic
     a. Motion verbs: bera 'carry', draga 'pull', hefja 'raise; begin', kefja 'sink', keyra 'drive', reiða 'move about', reka 'drift', velkja 'be tossed about', víkja 'be moved to one side'
     b. Verbs denoting change of state: belgja 'blow out', birta 'become clear', blása 'swell, blow', brjóta 'break', brydda 'arise', byrja 'begin', daga uppi 'dawn up (turn to stone)', deila 'divide', drepa 'be knocked down', dökkva 'darken', enda 'finish', endurnýja 'renew', eyða 'be destroyed', fenna 'be covered with snow', festa 'fasten', fjara 'ebb', fjölga 'increase', frjósa 'freeze', fylla 'fill', gera 'become', grynna 'become shallow', herða 'become hard', knýta/hnýta 'become crooked', kreppa 'become crippled', kvelda 'become evening', kyrra 'calm', leggja 'become covered with ice', leiða af 'follow from', lengja 'lengthen', leysa 'dissolve', líða 'come to an end', lýsa 'shine', lægja 'lower', minnka 'decrease', nátta 'be overtaken by the night', opna 'open', ómátta 'lose strength', ónýta 'become unusable', rifna 'tear', rjúfa 'split', ryðja 'disperse', rýma 'become wider', ræsa 'come true', setja 'become', skemma 'become short', skera 'cut', skilja 'divide', slíta 'cut', stemma 'be obstructed', stækka 'become bigger', stæra 'swell', sækja 'be affected by', taka 'be taken', vatna 'disappear in water', vekja upp 'awaken', verpa 'be thrown', vægja 'suppurate', þröngva 'diminish', þynna 'become thin', æsa 'be stirred'
     c. Stative verbs: bíða 'exist', fá 'be available', geta 'exist', hafa út 'blow through', heyra 'be heard', sjá 'be seen', skara 'protrude', sýna 'be seen'

Representative examples from all three classes are provided in (19) below. Note that the singular agreement on the verb in (19b) is crucial in showing that the subject is accusative rather than nominative.

(19) a. Þá velkti lengi úti í hafi.
        them-ACC tossed long out in ocean
        'They were in rough seas for a long time.'
        (Eiríks saga rauða, p. 526)
     b. Fraus að honum klæðin öll.
        froze-3.SG at him the.clothes-ACC all-ACC
        'All his clothes froze to his body.'
        (Finnboga saga ramma, p. 635)
     c. svo að ógerla sá veguna
        so that unclearly saw the.roads-ACC
        'So that it was difficult to see the roads.'
        (Egils saga, p. 478)

Verbs with accusative theme/patient subjects have shown very limited productivity in the history of Icelandic. In fact, of the 77 verbs listed in (18), only about 10–15 are still regularly used with an accusative subject in Modern Icelandic and even these verbs are increasingly used with nominative subjects. This number includes none of the stative verbs and only two motion verbs, bera 'carry' and reka 'drift'. Moreover, we know of only one verb where accusative seems to have replaced nominative case with theme/patient subjects. This is the verb drífa að 'come flocking' exemplified below where Old Icelandic (20a) is contrasted with Modern Icelandic (20b):

(20) a. Aðfangadag jóla drífa flokkarnir að bænum.
        Christmas Eve flock the.bands-NOM to the.farm
        (Svarfdæla saga, p. 1788)
     b. Aðfangadag jóla drífur flokkana að bænum.
        Christmas Eve flocks the.bands-ACC to the.farm
        'On Christmas Eve the bands come flocking to the farm.'

It is also worth noting that accusative is sometimes used instead of the original nominative case with the verb taka niðri 'touch bottom' in Modern Icelandic as shown in (21). The use of the accusative here is presumably influenced by all the accusative subject verbs denoting phenomena involving natural forces (see further discussion below).

(21) a. Báturinn tók niðri.
        the.boat-NOM took down
     b. Bátinn tók niðri.
        the.boat-ACC took down
        'The boat touched bottom.'

It could be argued that verbs with accusative theme/patient subjects in Old Icelandic formed semantically coherent classes just like verbs with accusative experiencer subjects. Nevertheless, these verbs have shown very little productivity in the history of Icelandic. We hypothesize that there are two reasons for this lack of productivity. First, the token frequency of these verbs was quite low


since they had a very restrictive usage.15 For instance, many of the verbs listed in (18b) were primarily used to describe phenomena involving natural forces, e.g. the verb lægja 'lower'. Some fairly typical examples of this verb are shown in (22):

(22) a. En þegar um vorið er sjó tók að lægja.
        but already in the.spring as sea-ACC began to lower
        'But already in the spring as the sea got calmer.'
        (Egils saga, p. 408)
     b. þegar er sólina lægði.
        already as the.sun-ACC lowered
        'already when the sun set'
        (Eyrbyggja saga, p. 579)
     c. Þá tók að lægja veðrið.
        then began to lower the.weather-ACC
        'Then the storm subsided'
        (Brennu-Njáls saga, p. 219)

The other reason for the lack of productivity of verbs taking accusative theme/patient subjects is competition from verbs with the middle suffix -st. The regular way of forming causative pairs in Old and Modern Icelandic is by marking the intransitive (inchoative) variant by the suffix -st in which case the subject must be nominative. This is exemplified by the verb opna 'open' in (23):

(23) a. Jón opnaði hurðina.
        John-NOM opened the.door-ACC
     b. Hurðin opnaðist.
        the.door-NOM opened

Many of the verbs listed in (18) could in fact take the suffix -st in Old Icelandic and in some cases this variant would encroach upon the semantic territory of these verbs so that the form with -st prevailed. This is the case, for example, with the verbs endurnýja 'renew', grynna 'become shallow', opna 'open', sjá 'be seen' and velkja 'be tossed about', which have all been ousted as intransitive verbs by endurnýjast, grynnast, opnast, sjást and velkjast.

15. Low type frequency cannot be the explanation here since the number of verbs with accusative theme/patient subjects was quite high; in fact, it was clearly higher than the number of verbs in the class of verbs with accusative experiencer subjects.

5. Comparison with Faroese

The Faroese case system is quite similar to the Icelandic one. However, an important difference is that lexical case has been lost to a much greater extent in Faroese than in Icelandic, both with subjects and objects. In particular, genitive case has more or less fallen out of use in Faroese and has been replaced by other case forms or by prepositional constructions (see Thráinsson et al. 2004: 248–252). Given the close relations of the two Insular Scandinavian languages, an investigation of the changes in lexical case in Faroese is interesting for the purpose of testing the predictions of our hypothesis for Icelandic that a semantically coherent class is more resistant to change than a non-coherent class.

However, there is a problem with a diachronic investigation of Faroese in that this language is poorly documented in its older periods. Therefore, it is not possible to trace the changes Faroese has undergone as thoroughly as in Icelandic, which is well documented from the 12th century onwards. Already in early texts, i.e. the Faroese ballads and other texts from the late 18th century and the early 19th century, Faroese was in the process of losing some of the case patterns that are still preserved in Icelandic (Thráinsson et al. 2004: 426–436). As a result of these changes, in Modern Faroese no verbs take genitive objects, whereas accusative case on subjects is still found, although only to a very limited degree (see Barnes 1986, Petersen 2002, Eythórsson and Jónsson 2003, Thráinsson et al. 2004, and Jónsson and Eythórsson 2005). Thus, the Faroese facts are comparable to the Icelandic ones discussed in section 4 above, although the development in Faroese has in a sense progressed further than in Icelandic. This means that the situation in Faroese is consistent with the hypothesis that a semantically coherent class is more resistant to change than a non-coherent class.

5.1. Genitive objects

Whereas adnominal genitives and genitive objects of prepositions still occur to some extent in Modern Faroese, genitive objects of verbs have completely disappeared. Modern Faroese verbs corresponding to verbs taking genitive objects in Old and Modern Icelandic typically take accusative objects or PP complements. Examples from Faroese involving the monotransitives freista 'tempt' and njóta 'enjoy', both with an accusative object, are given in (24) (Poulsen et al. 1998):

(24) Faroese
a. um høgra eyga títt freistar teg, tá slít tað út.
   if right eye your tempts you-ACC, then tear it out
   'if your right eye tempts you, then tear it out.'
b. Hann neyt gott av hennara strevi.
   he enjoyed good-ACC of her hard.work
   'He benefited from her hard work.'

As shown in (25), these verbs take genitive objects in Modern Icelandic, as was also the case in Old Icelandic:

(25) Icelandic
a. ef hægra auga þitt freistar þín, þá slít það út.
   if right eye your tempts you-GEN, then tear it out
   'if your right eye tempts you, then tear it out.'
b. Hann naut góðs af hennar striti.
   he enjoyed good-GEN of her hard.work
   'He benefited from her hard work.'

However, there are a few examples of genitive objects preserved in older Faroese, as evidenced in the ballads (cf. Thráinsson et al. 2004: 431, ex. (120)). The relevant verbs are all monotransitives, e.g. goyma 'watch (over)',16 hevna 'avenge', vitja 'visit', vænta 'expect' and bíða 'wait for'.

(26) Older Faroese
a. tann ið duranna goymir.
   he who the.doors-GEN watches
   'he who watches the door.'
b. hevna mín.
   avenge me-GEN
c. hennar reið at vitja.
   her-GEN rode to visit
   'rode to visit her.'

16. Note that in Old Icelandic, the verb geyma (corresponding to Faroese goyma) also occurs with the accusative in the meaning 'keep'. In Modern Icelandic geyma only means 'keep' and only takes an accusative object.

d. aftur skalt tú vænta mín.
   back shall you-SG expect me-GEN
   'you shall expect me (to come) back.'
e. kirkjumaður bíðar tín.
   churchman waits you-GEN
   'the church man waits for you.'

Already in the ballads and 19th century texts there are also examples of the innovative accusative with some of these verbs, as in (27) (cf. Thráinsson et al. 2004: 431, ex. (121)).17

(27) Older Faroese
a. hevna tap faðir síns.
   avenge loss-ACC father his
   'avenge the loss of his father.'
b. kom pápin at vitja hana.
   came the.father to visit her-ACC
   'the father came to visit her.'
c. hann væntar ringt veður seinnapartin.
   he expects bad-ACC weather-ACC afternoon
   'He expects bad weather in the afternoon.'

There are no known examples of genitive objects of ditransitives in the ballads. Apparently, these had already been replaced by accusative objects or PP complements by the time of their composition, as in the examples in (28) involving biðja 'ask' and krevja 'demand' (cf. Thráinsson et al. 2004: 433, ex. (125a, 125c–d)).

(28) Older Faroese
a. Eg bað hann eina bøn.
   I asked him-ACC a-ACC favor-ACC
   'I asked him a favor.'
b. Teir kravdu hann eftir lyklinum til húsið.
   they demanded him-ACC after the.key-DAT to the.house
   'They demanded the key to the house from him.'

17. Interestingly, two verbs, bíða 'wait for' and goyma 'watch', which today govern accusative, could earlier also take dative (cf. Thráinsson et al. 2004: 431). This indicates that, with these two verbs, genitive was first replaced by dative case, and only later by accusative.

c. Teir kravdu lykilin til húsið frá honum.
   they demanded the.key-ACC to house from him
   'They demanded the key to the house from him.'

The question arises why genitive objects were lost earlier with ditransitives than with monotransitives. Presumably, genitive objects were preserved longer with monotransitives simply because these had a higher token frequency.18 As a result, there would have been less evidence for the language learner of genitive case with ditransitive verbs, which would have made it more difficult to preserve this type of genitive from one generation to the next. Moreover, it can also be seen in Modern Icelandic that genitive objects are better preserved with monotransitives than with ditransitives. In particular, the replacement of genitive objects by PPs is very common, and is attested already in Old Icelandic as well (see section 3).

5.2. Accusative subjects

Around fifty verbs with oblique subjects are documented in Faroese sources, all of them involving experiencers, whereas no verbs taking oblique theme/patient subjects are attested (cf. Petersen 2002, Thráinsson et al. 2004). However, most of the relevant verbs have fallen into disuse, occurring only in fixed expressions that have a literary or an archaic flavor. Therefore, the token frequency of the oblique subject verbs in current spoken Faroese is much lower than the above figure suggests. There is a strong tendency in Faroese to substitute nominative case for oblique case with subjects (Nominative Substitution). Thus, for example, the original accusative case with droyma 'dream' (29a) has been virtually eliminated in favor of nominative case (29b):

(29) Faroese
a. Meg droymdi ein sáran dreym.
   me-ACC dreamt-3.SG a bad dream-ACC
b. Eg droymdi ein sáran dreym.
   I-NOM dreamt-1.SG a bad dream-ACC
   'I had a bad dream.'

18. See Bybee (1994) for the relevance of lexical and categorial token frequency in inflectional morphology. Thus, analogical leveling has been observed to affect the less frequent lexical items first while the more frequent ones persist longer.

Verbs taking dative subjects in Faroese include verbs that originally took accusative subjects, with the accusative later replaced by dative (Dative Substitution), e.g. lysta 'want' in (30) (cf. Barnes 1986, Petersen 2002, Eythórsson and Jónsson 2003).

(30) Faroese
a. Meg lystir at vita.
   me-ACC wants to know
b. Mær lystir at vita.
   me-DAT wants to know
   'I want to know.'

There are very few speakers of Modern Faroese who use accusative as a possible subject case with experiencer verbs (Eythórsson and Jónsson 2003, Jónsson and Eythórsson 2005). However, there is evidence that it was productive to some extent in earlier Faroese. This evidence involves a few verbs that are likely to be new creations in Faroese: hugbta (eftir) 'long for', ntra 'shudder', skra ( feginsbrgv, feginsbrgv) 'tickle (in the left/right eyebrow)', i.e. 'expect (something good/bad)', and minnast 'remember'. The following examples are from Thráinsson et al. (2004: 253):

(31) Faroese
a. Meg ntrar holdi.
   me-ACC shudders in the.flesh
   'I shudder.'
b. Meg skrur feginsbrgv.
   me-ACC tickles in left.eyebrow
   'I expect something good.'

These verbs either did not exist, or did not take an oblique subject, in Old Icelandic, and the same is true of Modern Icelandic. Particularly telling in this respect is the verb minnast, an -st-verb which has replaced the active minna remember (with accusative subject). Old and Modern Icelandic -st-verbs are incompatible with accusative subjects so the occurrence of this verb with an accusative must be a Faroese innovation. In any case, the existence of verbs taking accusative experiencer subjects in Faroese that do not have a counterpart in Old and Modern Icelandic indicates the partial productivity of such verbs at an earlier stage of the language. This is compatible with the hypothesis (cf. 4.1 above) that verbs taking accusative subjects form a coherent semantic class whereas verbs taking genitive objects do not.

The fact that oblique experiencer subjects were preserved longer than theme/patient subjects in Faroese is likely to be due to the higher token frequency of the former, which made them easier for children to acquire. For example, the verbs that originally took accusative experiencer subjects include some very common ones (e.g. droyma 'dream', minnast 'remember'), whereas the verbs that may be assumed to have taken accusative theme subjects in earlier Faroese (e.g. reka 'drift', taka út 'take out') appear to be infrequent in the spoken language (cf. Thráinsson et al. 2004: 276–277).

6. Conclusion

As we have amply illustrated in this paper, there are good reasons for distinguishing between two kinds of exceptions to general patterns of argument realization: what we have termed structured exceptions, which involve clustering of lexical items on the basis of shared properties, and arbitrary exceptions, which involve an arbitrary list of lexical items. Structured exceptions display partial productivity and can be extended to new items, whereas arbitrary exceptions are totally unproductive. We have argued that the diachronic development of case selection in Insular Scandinavian (Icelandic and Faroese) provides strong support for this dichotomy. The discussion has focused on two cases of exceptional case selection: accusative subjects and genitive objects. We showed that accusative experiencer subjects have been semi-productive in the history of Insular Scandinavian, whereas genitive objects have been completely unproductive.

Appendix

In these lists we have left out all verbs that only occur with the relevant case in idiomatic expressions such as nema staðar 'stop' (literally 'hold place-GEN'). We have also omitted verbs that are listed in dictionaries of Old Icelandic but only attested in Norwegian texts.

(A) Verbs with genitive objects in Old Icelandic

aa acquire, rna wish, batna recover from, beia request, beiast request, bija ask, bindast refrain from, ba wait for, blinda make blind to, bta improve, btast recover from, dirfast dare, dylja hide, deny, efast/ifast change one's mind about, eggja incite, endurminnast remember (again), f suffer, rna blame for, forvitnast enquire, foryast refrain from, fregna ask, freista tempt, try, frtta hear, ask about, frja challenge, question, fylla become full of (water), fill with, fyllast become full of, fyrirkunna blame for, fyrirmuna envy, fsa want; incite, fsast want, g pay attention to, beware of, geta guess, solve, geta mention, geyma pay attention to, take care of, keep, girna desire, girnast desire, gjalda pay for, gleyma forget, gta guard, take care of, gtast mention, hafa (ekki) miss, hefna revenge for, hefnast suffer revenge for, heitast threaten, hrast fear, hrra set in motion, hvetja encourage, incite, irast regret, kenna feel, touch, klifa repeat, kosta try, pay, krefja demand, kunna be angry for, kveja demand, request, leita search for, letja discourage, ltta recover from, lj give, get, meta value, minna remember vaguely, remind of, minnast remember, visit, missa miss, lose, be without, neita refuse, neyta make use of, njta profit from, enjoy, orka effect, cause, orkast get, obtain, minnast/minnast to be unmindful of, neglect, rvilnast/rvilnast despair of, rvnta/rvnta/rvtta despair, lose hope, rvntast despair, lose hope, reka avenge for, saka accuse of, sakna miss, skammast be ashamed of, spyrja ask about, sverja swear, svfast refrain from, synja acquit of, ssla do, get, unna grant, vangeyma neglect, vara expect, varleita search insufficiently for, varna prevent, v blame for, vna expect, hope for, villast get lost, vilna expect, hope, vilnast expect, hope for, vira value, vita signal; know, vitja visit, go to, vna give hope of, vnast hope for, vnta/vtta expect, skja wish, arfa need, arfnast need, egja refrain from saying, rta deny, urfa need, sa incite, skja wish, sta ask for, demand

(B) Verbs with genitive objects in Modern Icelandic

aa acquire, rna wish, bija ask, ba wait for, dirfast dare, freista tempt, try, frja challenge, question, geta mention, gjalda pay for, gta guard, take care of, hefna revenge for, ira regret, irast repent, regret, kenna feel, krefja demand, krefjast claim, leita search for, meta value, minnast remember, visit, neyta make use of, njta profit from, enjoy, ska wish, sakna miss, spyrja ask about, synja acquit of, unna grant, varna prevent, vira value, vitja visit, go to, vnta expect, skja wish, arfnast need, urfa need

(C) Verbs with accusative subjects in Old Icelandic

angra grieve, ngja become happy, belgja blow out, bera carry, bila fail, birta become clear, ba exist, blsa swell, blow, bresta run out of, brjta break, brydda arise, byrja begin; be required to, daga uppi dawn up (turn to stone), deila divide, draga pull, dreyma dream, drepa be knocked down, dkkva darken, enda finish, endurnja renew, f get, fenna be covered with snow, festa fasten, fjara ebb, fjlga increase, forvitna be curious, frjsa freeze, fylla fill, fsa want, gera do, make, geta exist, girna desire, greina um disagree on, gruna suspect, grynna become shallow, hafa t blow through, harma vex, hefja raise; begin, heimta want, henda happen, concern, hera become hard, heyra be heard, hindra be hindered, hryggja grieve, hungra feel hungry, ira regret, kala suffer frostbites, kefja sink, keyra drive, knta/hnta become crooked, kosta cost, kreppa become crippled, kvelda become evening, kyrra calm, langa want, leggja become covered with ice, leia af follow from, lengja lengthen, leysa be dissolved, la come to end, lysta desire, want, lsa shine, lgja lower, minna remember vaguely, minnka decrease, misminna remember wrongly, muna want, nausynja necessitate, ntta be overtaken by the night, opna open, gleja become unhappy, nta become unusable, mtta lose strength, tta fear, reia move about, reka drift, rifna tear, rjfa split, ryja disperse, rma become wider, rsa come to pass, saka hurt, setja become, sj be seen, skaa hurt, skara protrude, skemma become short, skera cut, skilja divide, skipta concern, skorta lack, slta be cut, slgja be tempted, stemma be obstructed, stinga sting, stkka become bigger, stra swell, sundla become dizzy, svimra become dizzy, syfja become sleepy, sna be seen, skja be affected by, taka be taken, ta want, tma happen to, trega regret, ugga fear, undra wonder, vanta lack, need, vara expect, vara concern, vatna disappear in water, vekja upp awaken, velgja feel nausea, velkja toss, verkja/virkja ache, verpa be thrown, vilna hope, vkja be moved to one side, vgja suppurate, vna expect, vnta/vtta expect, rjta run out of, rota lack, rngva force, ynna become thin, yrsta feel thirsty, sa be stirred, funda envy


References

Primary sources

Fornmanna sögur. Copenhagen 1825–1835.
Heimskringla (Ólafs saga helga) 1991 Bergljót Kristjánsdóttir, Bragi Halldórsson, Jón Torfason and Örnólfur Thorsson (eds.). Reykjavík: Mál og menning.
Píslarsaga séra Jóns Magnússonar 2001 Matthías Viðar Sæmundsson sá um útgáfuna. Reykjavík: Mál og menning.
Íslendinga sögur (Bandamanna saga, Brennu-Njáls saga, Egils saga, Eiríks saga rauða, Eyrbyggja saga, Finnboga saga ramma, Grettis saga, Svarfdæla saga, Víga-Glúms saga) 1985/86 Bragi Halldórsson, Jón Torfason, Sverrir Tómasson and Örnólfur Thorsson (eds.). Reykjavík: Svart á hvítu.
Sturlunga saga 1988 Bergljót Kristjánsdóttir, Bragi Halldórsson, Gísli Sigurðsson, Guðrún Ása Grímsdóttir, Guðrún Ingólfsdóttir, Jón Torfason, Sverrir Tómasson and Örnólfur Thorsson (eds.). Reykjavík: Svart á hvítu.
ROH = Ritmálssafn Orðabókar Háskóla Íslands [Corpus of Written Icelandic of the University of Iceland Dictionary Project], see: http://www.lexis.hi.is (Ritmálssafn).

Secondary sources

Allen, Cynthia L. 1995 Case Marking and Reanalysis. Grammatical Relations from Old to Early Modern English. Oxford: Oxford University Press.
Barðdal, Jóhanna 2001 Case in Icelandic: A Synchronic, Diachronic and Comparative Approach [Doctoral dissertation]. Lund: Department of Scandinavian Languages.
Barnes, Michael 1986 Subject, Nominative and Oblique Case in Faroese. Scripta Islandica 37: 13–46.
Björgvinsdóttir, Ragnheiður 2003 Frumlagsfall í máli barna [Subject Case in Child Language]. B.A. thesis, University of Iceland, Reykjavík.
Bybee, Joan 1994 Morphological Universals and Change. In The Encyclopedia of Language and Linguistics 5, R. E. Asher (ed.), 2557–2562. Oxford: Pergamon Press.
Bybee, Joan, and Dan I. Slobin 1982 Rules and Schemas in the Development and Use of the English Past Tense. Language 58: 265–289.
Bybee, Joan, and Carol Lynn Moder 1983 Morphological Classes as Natural Categories. Language 59: 251–270.
Delsing, Lars-Olof 1991 Om genitivens utveckling i fornsvenskan [On the Development of the Genitive in Old Swedish]. In Studier i svensk språkhistoria 2, Sven-Göran Malmgren and Bo Ralph (eds.), 12–30. Göteborg: Acta Universitatis Gothoburgensis.
Donhauser, Karin 1998 Das Genitivproblem und (k)ein Ende? Anmerkungen zur aktuellen Diskussion um die Ursachen des Genitivschwundes im Deutschen. In Historische germanische und deutsche Syntax, John Ole Askedal (ed.), 69–86. Frankfurt am Main: Lang.
Eythórsson, Thórhallur 2002 Changes in Subject Case Marking in Icelandic. In Syntactic Effects of Morphological Change, David Lightfoot (ed.), 196–212. Oxford: Oxford University Press.
Eythórsson, Thórhallur, and Jóhannes Gísli Jónsson 2003 The Case of Subject in Faroese. Working Papers in Scandinavian Syntax 72: 207–232.
Goldberg, Adele E. 1995 Constructions: A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press.
Jespersen, Otto 1942 A Modern English Grammar on Historical Principles, IV: Morphology. Copenhagen: Munksgaard.
Jónsson, Jóhannes Gísli 1996 Clausal Architecture and Case in Icelandic. Doctoral dissertation, University of Massachusetts, Amherst.
Jónsson, Jóhannes Gísli 1997/98 Sagnir með aukafallsfrumlagi [Verbs Taking Oblique Subject]. Íslenskt mál og almenn málfræði 19–20: 11–43.
Jónsson, Jóhannes Gísli 2000 Case and Double Objects in Icelandic. In Leeds Working Papers in Linguistics and Phonetics, Diane Nelson and Paul Foulkes (eds.), 71–94. (Also available at http://www.leeds.ac.uk/linguistics/index1.htm)

Jónsson, Jóhannes Gísli 2003 Not so Quirky: On Subject Case in Icelandic. In New Perspectives on Case Theory, Ellen Brandner and Heike Zinsmeister (eds.), 127–164. Stanford, California: CSLI.
Jónsson, Jóhannes Gísli, and Thórhallur Eythórsson 2003 Breytingar á frumlagsfalli í íslensku [Changes in Subject Case in Icelandic]. Íslenskt mál og almenn málfræði 25: 7–40.
Jónsson, Jóhannes Gísli, and Thórhallur Eythórsson 2005 Variation in Subject Case Marking in Insular Scandinavian. Nordic Journal of Linguistics 28: 223–245.
Maling, Joan 2002 Það rignir þágufalli á Íslandi. Verbs with Dative Objects in Icelandic. Íslenskt mál og almenn málfræði 24: 31–105.
Nübling, Damaris this vol. How do exceptions arise? On different paths to morphological irregularity.
Nygaard, Marius 1906 Norrøn syntax. Oslo: Aschehoug.
Óskarsson, Veturliði 1997/98 Ske. Íslenskt mál og almenn málfræði 19–20: 181–207.
Petersen, Hjalmar P. 2002 Quirky Case in Faroese. Fróðskaparrit 50: 63–76.
Pinker, Steven 1999 Words and Rules. The Ingredients of Language. New York: Basic Books.
Poulsen, Jóhan Hendrik W., Marjun Simonsen, Jógvan í Lon Jacobsen, Anfinnur Johansen, and Zakaris Svabo Hansen (eds.) 1998 Føroysk orðabók [A Dictionary of Faroese]. Tórshavn: Føroya Fróðskaparfelag & Fróðskaparsetur Føroya.
Sigurðardóttir, Herdís Þ. 2002 Fallmörkun í barnamáli: Hvernig læra íslensk börn að nota föll? [Case Marking in Child Language: How do Icelandic Children Learn to Use Cases?] M.A. thesis, University of Iceland, Reykjavík.
Sigurðsson, Halldór Ármann 1989 Verbal Syntax and Case in Icelandic. Doctoral dissertation, Lund University.
Svenonius, Peter 2002 Icelandic Case and the Structure of Events. Journal of Comparative Germanic Linguistics 5: 197–225.

Thráinsson, Höskuldur, Hjalmar P. Petersen, Jógvan í Lon Jacobsen, and Zakaris S. Hansen 2004 Faroese: An Overview and Reference Grammar. Tórshavn: Føroya Fróðskaparfelag.
Viðarsson, Heimir Freyr 2006 Breytilegt frumlagsfall í forníslensku: athugun á breytileika í fallmörkun skynjandafrumlaga [Variable Subject Case in Old Icelandic: a Study of Variation in the Case Marking of Experiencer Subjects]. B.A. thesis, University of Iceland, Reykjavík.
Yip, Moira, Joan Maling, and Ray Jackendoff 1987 Case in Tiers. Language 63: 217–250.
Zaenen, Annie, Joan Maling, and Höskuldur Thráinsson 1985 Case and Grammatical Functions: The Icelandic Passive. Natural Language and Linguistic Theory 3: 441–483.
Zaenen, Annie, and Joan Maling 1990 Unaccusative, Passive and Quirky Case. In Modern Icelandic Syntax, Joan Maling and Annie Zaenen (eds.), 137–152. San Diego: Academic Press.

Remarks on two kinds of exceptions: arbitrary vs. structured exceptions

Susann Fischer

Jóhannes Gísli Jónsson and Thórhallur Eythórsson (this volume) argue for a dichotomy between structured and arbitrary exceptions with respect to case selection in Insular Scandinavian. Their main claim is that structured exceptions share semantic properties and display a partial productivity which enables them to attract other lexical items to this group by analogy. Arbitrary exceptions, on the other hand, do not share semantic properties and are unproductive. Jónsson and Eythórsson draw their arguments from the diachronic development of case selection in Icelandic and Faroese and show that experiencer accusative subjects have been semi-productive in the history of Insular Scandinavian whereas genitive objects have been totally unproductive.

It has been suggested that three kinds of case with arguments have to be recognized in Icelandic (Yip, Maling and Jackendoff 1987, Jónsson 2000 etc.). The theoretical consequence that seems to follow from Jónsson and Eythórsson's argument is to divide idiosyncratic case again into structured and totally arbitrary idiosyncratic case. The dichotomy between structured and arbitrary exceptions seems well motivated when argued on the basis of productivity. It is an interesting and new observation, and the explanation given seems reasonable. Nevertheless, it would be interesting to see in what way the semi-productivity of accusative subjects in Icelandic has to do with the fact that these oblique subjects are syntactic subjects and not only logical subjects as, e.g., in Modern German. In Modern German we find the same groups of accusative experiencer subjects, e.g. verbs of physical discomfort (i), verbs denoting feelings (ii), and verbs of cognition (iii):

(i) mich dürstet 'be thirsty', mich friert 'be cold', mich hungert 'be hungry', mich schmerzt 'feel pain', mich schauert / mich schaudert 'to tremble';
(ii) mich wundert 'be surprised', mich fürchtet 'be afraid', mich erstaunt 'be surprised', mich gelüstet 'have a craving for', mich erheitert / mich belustigt 'be amused', mich verlangt 'to long for', mich erfreut 'be happy', mich langweilt 'be bored', mich erbost 'be furious', mich ärgert 'be angry', mich erzürnt 'be infuriated', mich deprimiert 'be depressed';
(iii) mich dünkt / mich deucht 'me thinks', mich überrascht 'be surprised', mich wundert / mich verwundert 'be astonished', mich verblüfft 'be amazed'.

However, these verbs, even though they obviously share semantic properties, do not attract new members to this class. On the contrary, they lose ground1 and we find more and more verbs that allow not only the original accusative but also dative or nominative case within one and the same speaker:

(1) mich schaudert / mir schaudert / ich schaudere
(2) mich verlangt / mir verlangt / ich verlange

It seems as if, next to the similar semantic properties of accusative subjects, something else is at stake. Most of the old Germanic languages display accusative experiencer subjects. Most of the modern Germanic languages have either lost experiencer subjects altogether, e.g. English, or still display accusative experiencer subjects, e.g. German; in the latter case, however, they are no longer syntactic subjects but only logical subjects. As has been argued by Fischer (2004) and also Hrafnbjargarson (2004), these non-nominative subjects in Old Germanic, Modern Icelandic and Modern Faroese seem to make use of additional functional material in the left periphery. Under this view the loss of accusative experiencer subjects in e.g. English and the change from syntactic subjects to logical subjects in German is explained by the loss of this additional functional category. It seems plausible to assume that the difference between e.g. German and Icelandic/Faroese accusative experiencer verbs, where Icelandic and Faroese verbs are productive and German verbs are unproductive even though they share semantic properties, might depend on the difference with respect to phrase structure. It seems that only as long as accusative experiencers are real syntactic subjects do they attract new members to their class (see also Dative Sickness in Icelandic and Faroese, Eythórsson 2002), and when they lose the capacity to appear as syntactic subjects they also lose ground in the respective languages.

1. More and more speakers of Modern German avoid using oblique experiencer subjects and choose instead a verb with a nominative subject.

Another point I want to mention regards their argument with respect to arbitrary exceptions, i.e., verbs selecting genitive case. Jónsson and Eythórsson show that genitive objects have seen a steady decline in the history of Icelandic, from formerly 100 verbs selecting genitive to now only 35 verbs. According to their argumentation this is due to the fact that these verbs do not form any semantically coherent subclass; instead they select idiosyncratic/exceptional case that has to be learnt on an item-by-item basis and is therefore highly susceptible to change. I will not enter here into the discussion of the fact that generations of Icelanders were obviously able to acquire verbs selecting genitive case correctly before they lost this capacity. Instead I would like to point out that the loss of genitive objects might be connected to other changes that have been going on in the history of Icelandic and Faroese and therefore might not have anything to do with the fact that genitive objects represent arbitrary exceptions to case selection.

Verbs selecting genitive objects represent a crosslinguistic phenomenon. Modern languages that allow genitive objects, i.e. that show a morphological case distinction between accusative and genitive/partitive in the object domain, include Russian, Polish, Serbo-Croatian, Finnish, Estonian and Turkish, to name only a few. It has long been noticed that there seems to be a connection between case alternation on objects and the fact that these languages do not have articles (King 1995, Neidle 1988 among many others), and that additionally in most of these languages there is some interaction between case morphology, reference and aspect (Kiparsky 1998, de Hoop 1992, Leiss 2000). Example (1) below shows the semantic contrast in Russian with respect to reference and the interaction with aspect.

(1) a.  Ja dobavil saxar v čaj
        I added.perf sugar.acc in tea
        'I added the sugar to the tea.'
    a'. Ja dobavil saxara v čaj
        sugar.gen
        'I added some of the sugar to the tea.'
    b.  Ja dobavljal saxar / *saxara2 v čaj
        I added.imp sugar.acc / -gen in tea

For other languages, again, it has been shown that the old stratum possessed genitive objects whereas the modern languages have lost this possibility altogether, or at least in the spoken language: Old English, Old Swedish, Old High German, Middle German etc. The loss of genitive objects in these languages has been of paramount interest within historical linguistics and has resulted in an abundance of speculations about what exactly triggered this change. We find approaches explaining the loss of genitive morphology by the phonological reduction of end syllables (e.g. Behaghel 1923). These approaches do not differentiate between the loss of genitive inflected objects and the general loss of case morphology on NPs. Others connect the loss of genitive objects to a change in the conception of the world, i.e., to the "Verkümmerung partitiver Denkformen" [degeneration of partitive forms/ways of thinking] (Wolff 1954). Additionally, it has been convincingly argued, on the basis of the modern Germanic languages that still use the accusative/genitive distinction on objects, that there is an interaction between the loss of case and aspect morphology and the availability of articles. In other words, case morphology is used to express reference, and to a certain degree interacts with aspect, if no article system is available (Abraham 1997, Leiss 2000, Fischer 2005).

2. The English progressive is a rather new development that only started during Early Modern English.

Let us go back in time: Proto-Germanic had a highly developed verbal aspect system and case morphology but no articles. During their development the Germanic languages lost, to a greater (English, Dutch etc.) or lesser (Icelandic, German etc.) extent, their case morphology and their verbal aspect morphology,3 but developed definite articles (all) and indefinite articles (all but Icelandic). With respect to German it has been argued that the loss of genitive case on objects and the weakening of the aspectual morphology interacted with the emergence of the articles (Donhauser 1998, Abraham 1997, Leiss 2000). Donhauser (1992) even proposed to see genitive on objects as a structural case because it only alternates with accusatives and only ever appears in the direct object position. Abraham (1997) observes that the [+/-def] interpretation of the object NP in Old High German was the result of the interplay between aspectual and case morphology, similar to the modern Russian system (see also Fischer 2005).
A genitive NP didnt combine with an imperfective verb; in the scope of a perfective verb it always received a [-def] reading. An accusative marked NP however, could receive a [+def] reading in the scope of a perfective verb and a [+/-def] reading in the scope of an imperfective verb. Only accusatives combined with both perfective and imperfective verbs and could receive a [+def] interpretation and a [-def] interpretation. After aspectual marking disappeared, genitive lost its status as being opposed to the accusative marked objects, and as a result the verbally governed genitive case disappeared, i.e. the denite/indenite reading of the object NP could no longer be obtained through the interplay between case opposition and aspectual conditions. The interplay
3. Genitive case is usually excluded with imperfective verbs. Some verbs do occur with imperfective morphology and genitive case on the object; these verbs however get an iterative interpretation (Fischer 2003).

Remarks on two kinds of exceptions: arbitrary vs. structured exceptions

247

weakened and nally disappeared completely; in its place, the determiner category was lexically lled, rst with a denite and later with the indenite article (Abraham 1997: 59). With respect to Icelandic maybe a similar development took place. According to Leiss (2000) Proto-Nordic encoded deniteness only in indenite contexts (i.e., in rheme position) by the alternations of SVO to SOV. Additionally, we know that quite a lot of verbs in Proto-Nordic allowed the alternate use of accusative case next to genitive case in the position of the direct object. In Old Icelandic word-order gets xed towards V2 and from the 7th century onwards preverbal aspect marker started to disappear. Since the verb had to appear in second position, deniteness in rheme positions could no longer be encoded by word-order alternations it needs to be encoded now by the use of denite articles (according to Leiss 2000). However, the alternating use of accusative vs. genitive is still available in Old Icelandic (Nygaard 1906). Modern Icelandic does not use preverbal markers in order to denote aspect, and it still doesnt use an indenite article but it still allows for some verbs to appear with a genitive object. So it seems possible that the loss of aspectual morphology and the emergence of the denite article somehow triggered the loss of genitive objects in most verbs. This seems especially plausible since we do know that the use of genitive in those languages that do not have article systems is used in order to denote indeniteness with respect to the object NP (Neidle 1988 among many others) and also interacts with aspectual morphology, or in some languages even denotes verbal aspectual differences (e.g. Finnish, cf. de Hoop 1992, Kiparsky 1998). Of course it is impossible without a thorough investigation of the Old Icelandic data to argue that the loss of genitive objects is denitely connected to the loss of aspectual morphology and to the emergence of articles. 
However, the previous discussion is meant to at least cast some doubt on the claim that genitive objects get lost only because they represent arbitrary exceptions.

References
Abraham, Werner 1997 The interdependence of case, aspect and referentiality in the history of German: the case of the verbal genitive. In Parameters of Morphosyntactic Change, Ans van Kemenade and Nigel Vincent (eds.), 29–61. Cambridge: Cambridge University Press.

Behaghel, Otto 1923 Deutsche Syntax. Eine geschichtliche Darstellung. Bd. I. Heidelberg: Carl Winters Universitätsbuchhandlung.

de Hoop, Helen 1992 Case Configuration and NP Interpretation. Doctoral dissertation, Rijksuniversiteit Groningen. Published New York: Garland 1996.

Donhauser, Karin 1992 Das Genitivproblem in der historischen Kasusforschung. Ein Beitrag zur Diachronie des deutschen Kasussystems. Habilitationsschrift Passau.

Donhauser, Karin 1998 Das Genitivproblem und (k)ein Ende? Anmerkungen zur aktuellen Diskussion um die Ursachen des Genitivschwundes im Deutschen. In Historische germanische und deutsche Syntax, John Ole Askedal (ed.), 69–86. Bern: Lang.

Eythórsson, Thórhallur 2002 Changes in subject case marking in Icelandic. In Syntactic Effects of Morphological Change, David Lightfoot (ed.), 196–212. Oxford: Oxford University Press.

Fischer, Susann 2003 Partitive vs. Genitive in Russian and Polish: an empirical study on case alternation in the object domain. In Experimental Studies I (Linguistics in Potsdam 21), Susann Fischer, Ruben van de Vijver and Ralf Vogel (eds.), 73–89. Potsdam: Universität Potsdam.

Fischer, Susann 2004 The diachronic relationship between quirky subjects and stylistic fronting. In Non-Nominative Subjects, Vol. 1, Karumuri Venkata Subbarao and Peri Bhaskarao (eds.), 193–212. Amsterdam/Philadelphia: Benjamins.

Fischer, Susann 2005 The interplay of reference and aspect. In Specificity and the Evolution/Emergence of Nominal Determination Systems in Romance (Konstanzer Arbeitspapiere zur Sprachwissenschaft 119), Elisabeth Stark, Klaus von Heusinger and Georg Kaiser (eds.), 1–18. Universität Konstanz.

Hrafnbjargarson, Gunnar Hrafn 2004 Oblique subjects and stylistic fronting in the history of Scandinavian and English. PhD. diss. Aarhus Universitet.

King, Tracy Holloway 1995 Configuring Topic and Focus in Russian (Dissertations in Linguistics). Stanford, CA: Center for the Study of Language and Information.

Kiparsky, Paul 1998 Partitive Case and Aspect. In The Projection of Arguments, Miriam Butt and Wolfgang Geuder (eds.), 265–307. Stanford, CA: CSLI Publications.

Leiss, Elisabeth 2000 Artikel und Aspekt. Die grammatischen Muster von Definitheit. Berlin/New York: de Gruyter.

Meyer-Lübke, Wilhelm 1888 Die Lateinische Sprache in den romanischen Ländern. In Grundriß der Romanischen Philologie, Gustav Gröber (ed.), 351–382. Straßburg: Trübner.

Neidle, Carol 1988 The Role of Case in Russian Syntax. Dordrecht: Kluwer.

Nygaard, M. 1906 Norrøn syntax. Oslo: Aschehoug.

Wolff, Ludwig 1954 Über den Rückgang des Genitivs und die Verkümmerung der partitiven Denkformen. (Helsinki Annales Academiae Scientiarum Fennicae. Series B). Helsinki.

Response to Susann Fischer

Jóhannes Gísli Jónsson and Thórhallur Eythórsson

In our paper we focus on two instances of idiosyncratic case in Insular Scandinavian (Icelandic and Faroese): (i) genitive case with objects and (ii) accusative case with subjects. We argue that there is a difference between these two types in terms of their productivity. Thus, while genitive objects have been totally unproductive in the recorded history of Icelandic, accusative subjects have displayed some productivity in the same period. The development has gone even further in Faroese in that genitive case has been completely lost as an object case, but accusative can still be found with subjects to a very limited degree. On the basis of the observed difference between accusative subjects and genitive objects we argue for a dichotomy of structured and arbitrary exceptions. Structured exceptions share semantic properties and display partial productivity which enables them to attract new lexical items into their group. Arbitrary exceptions, on the other hand, do not share any semantic properties and are entirely unproductive. In her remarks on our paper, Fischer grants that the dichotomy between structured and arbitrary exceptions is well motivated. Nevertheless, on the basis of comparative evidence from German, she claims that some other factors in the historical development of case should be considered. Thus, the loss of genitive case with objects in German may have interacted with a weakening of the aspectual morphology of the verb and the emergence of the articles. Fischer suggests that a similar development may have taken place in Icelandic. In fact, however, no such development occurred in Icelandic in the period under investigation in our paper (i.e. from the 13th century to the present day). More generally, we are not aware of any morphosyntactic changes in the relevant period that could have contributed to the decline of genitive objects. 
The other major point made by Fischer is that it would be interesting to see how the semi-productivity of accusative experiencers in Icelandic is connected to the fact that they are syntactic subjects. By contrast, accusative experiencers have a much weaker status in Modern German, where subject-like obliques are usually assumed to be non-subjects. This hypothesis is clearly undermined by Faroese, where accusative experiencers have more or less disappeared despite the fact that their subject status is not in doubt (Barnes 1986, Eythórsson and Jónsson 2003, Thráinsson et al. 2004). Thus, both Faroese and German have undergone more changes in their case systems than Icelandic, but we will not speculate here why this is so. Fischer's hypothesis is further weakened by the fact that German has been argued to have oblique subjects (e.g. Eythórsson and Barðdal 2005), contrary to the standard view in Germanic linguistics. But even if we accept the standard view, it is unclear why the productivity of accusative experiencers in Icelandic (as opposed to German) should be enhanced by their subject status. In fact, acquisition studies show that accusative experiencer subjects are acquired fairly late in Icelandic (Björgvinsdóttir 2003, Sigurðardóttir 2002), much later than e.g. accusative or dative objects. The conclusion, then, is that subject status does not provide any defence for accusative experiencers against diachronic change.

References
Barnes, Michael 1986 Subject, nominative and oblique case in Faroese. Scripta Islandica 37: 13–46.

Björgvinsdóttir, Ragnheiður 2003 Frumlagsfall í máli barna [Subject case in child language]. B.A. thesis, University of Iceland, Reykjavík.

Eythórsson, Thórhallur, and Jóhanna Barðdal 2005 Oblique subjects: A Common Germanic inheritance. Language 81: 824–881.

Eythórsson, Thórhallur, and Jóhannes Gísli Jónsson 2003 The case of subject in Faroese. Working Papers in Scandinavian Syntax 72: 207–232.

Sigurðardóttir, Herdís Þ. 2002 Fallmörkun í barnamáli: Hvernig læra íslensk börn að nota föll? [Case marking in child language: How do Icelandic children learn to use cases?] M.A. thesis, University of Iceland, Reykjavík.

Thráinsson, Höskuldur, Hjalmar P. Petersen, Jógvan í Lon Jacobsen and Zakaris S. Hansen 2004 Faroese: An Overview and Reference Grammar. Tórshavn: Føroya Fróðskaparfelag.

Loosening the strictness of grammar

Three approaches to exceptionality in syntactic typology

Frederick J. Newmeyer

Abstract. In this paper, I contrast three approaches to handling exceptionality in syntactic typology: the macroparametric approach associated with the Government-Binding Theory; the microparametric approach associated with the Minimalist Program; and an extrasyntactic approach, in which parsing and other performance principles account for typological variation and exceptions to typological generalizations. I conclude that the extrasyntactic approach is best motivated.

1. Introductory remarks

Most generative grammarians have taken what one might call a strongly deterministic approach to language. The methodological strategy of generative grammar has always been to push to the side what seems non-deterministic, irregular, and unpredictable. That is, in the search for maximally general principles, generative grammarians have often ignored the messiness typically found in linguistic data. Consider the most famous (some would say notorious) passage in all of Chomsky's writings:
Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech-community, who knows its language perfectly and is unaffected by grammatically irrelevant conditions in applying his knowledge of the language in actual performance. (Chomsky 1965: 3)

That passage seems to imply a mechanical approach, at least in practice, to the question of how one's knowledge of language affects one's use of language. And as far as the internal structure of the grammar is concerned, we again see a methodology that abstracts away from the messier facts. After all, with few and entirely recent exceptions, all formal linguists have taken an algebraic view of grammar, rather than a stochastic one. Chomsky could not have been more explicit on the question of determinism when he wrote: "The principles of universal grammar are exceptionless" (Chomsky 1988: 62). In early transformational grammar, language-particular rules were assumed to admit exceptions (see, for example, Lakoff 1965/1970). However, with the introduction of the general all-purpose movement rule, Move-α, in the mid-1970s, exceptionality was removed entirely from the transformational component. Problematic cases such as (1) through (3) below, which seem to be prima facie counterexamples to the idea of exception-free rules, were either consigned to the lexicon or (more often) ceased to be subject matter for theoretical discussion:

(1) a. He is likely to be late.
    b. *He is probable to be late.
    (likely, but not probable, allows raising)
(2) a. He allowed the rope to go slack.
    b. *He let the rope to go slack.
    (let does not take the infinitive marker)
(3) a. He isn't sufficiently tall.
    b. *He isn't enough tall. / He isn't tall enough.
    (enough is the only degree modifier that occurs post-adjectivally)

* I would like to thank Edith Moravcsik, Thomas Wasow, and two anonymous referees for their comments on an earlier draft of this paper. Portions have appeared in Newmeyer (2004, 2005) and are reprinted with permission.

It is also no secret that there is indecisiveness in judgments about the acceptability of sentences. Chomsky wrote about the problem in his earliest work (Chomsky 1957). He was aware that judgments are not a yes/no matter and hoped that analyses could be arrived at on the basis of totally clear and uncontroversial judgments. The theory would then decide the status of the unclear cases. But in fact that has happened only rarely. Indeed, we have seen the precise reverse numerous times, in which appeal is made to the most unclear and most controversial data. To give one example, Lasnik and Saito (1984) consider it a major selling point of their theory of proper government that it accounts for the ambiguity of sentence (4):

(4) Why do you think that he left?

In their view, why can be understood as questioning the reason for thinking or questioning the reason for leaving. Its supposed ambiguity was a crucial piece of data in their analysis. But for Aoun, Hornstein, Lightfoot and Weinberg (1987), a major advantage of their account of proper government is that it accounts for the lack of ambiguity of the very same sentence. Examples of this state of affairs are all too frequent.


The grammatical indeterminism and exceptionality that will be the focus of the remainder of this paper is the sort that we find in typological generalizations. There is a tradition over a century old of dividing languages into broad types. Until recently, the types were morphologically based: inflecting, analytic, agglutinative, polysynthetic, and so on (see Sapir 1921). For the past forty years, the types have generally been syntactically based. Since the publication of Greenberg (1963), most linguists have taken the order of heads and complements to be the most revealing for purposes of syntactic typology. That is, one says that if a language is head-initial, then a particular constellation of properties follows and that if a language is head-final, then a different constellation of properties follows. When one looks at things closely, however, it is not just the supposed correlates of head-initiality and head-finality that are full of exceptionality, but the very notion of "head-initial language" itself and the notion of "head-final language" itself. Consider the statistics in (5), all of which but (5e) are drawn from Dryer (1991):

(5) For verb-final languages:
    a. 96 % are postpositional
    b. 85 % have predicate-copula order
    c. 73 % have sentence-question particle order
    d. 71 % are wh-in-situ
    e. 64 % have case marking (Siewierska and Bakker 1996)
    f. 43 % have relative-noun order

It is the theoretical status of typological generalizations and, in particular, of exceptions to them that constitutes the subject matter of the remainder of this paper. In other words, I take on the question of how linguistic theory should best handle, say, the 4 % of prepositional verb-final languages, the 15 % of verb-final languages with copula-predicate order, and so on. In the following sections, I contrast three approaches to handling exceptionality in syntactic typology: the macroparametric approach associated with the Government-Binding Theory (GB) (§ 2); the microparametric approach associated with the Minimalist Program (MP) (§ 3); and an extrasyntactic approach, in which parsing and other performance principles account for typological variation and exceptions to typological generalizations (§ 4). I conclude that the extrasyntactic approach is best motivated. Section 5 is a brief conclusion.
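The exception figures quoted above are simply the complements of the percentages in (5). As a minimal sketch (the dictionary and function names are hypothetical, chosen only for illustration), the arithmetic can be spelled out as follows:

```python
# Percentages from (5): share of verb-final languages showing each property.
# The labels are shorthand introduced here, not terms from the sources cited.
VERB_FINAL_CORRELATES = {
    "postpositional": 96,
    "predicate-copula order": 85,
    "sentence-question particle order": 73,
    "wh-in-situ": 71,
    "case marking": 64,
    "relative-noun order": 43,
}


def exception_rates(correlates):
    """Return the percentage of verb-final languages lacking each property."""
    return {name: 100 - share for name, share in correlates.items()}


if __name__ == "__main__":
    for name, rate in exception_rates(VERB_FINAL_CORRELATES).items():
        print(f"{name}: {rate} % exceptions")
```

On these figures the "exceptions" range from a small minority (postpositions) to a majority (relative-noun order), which is one way of seeing why the notion of a typologically consistent verb-final language is itself problematic.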

2. The macroparametric approach of GB

In broad outline, the GB program for typology was very simple (I should perhaps write "is" rather than "was", since Mark Baker's recent book Atoms of Language [Baker 2001] reasserts it). The idea is that the principles of UG are associated with a small number of broad-scope macroparameters, each of which admits of a small number of settings. A language as a whole is specified for each setting. So English might be set positively for the Overt-Wh-Movement parameter, negatively for the Verb Raising parameter, and with the setting S and NP for the Subjacency parameter. In the GB view, the interactions of these settings combine to generate the diversity of human languages. Furthermore, since the parameters themselves are highly abstract, the idea is that unexpected clusterings of typological features should follow automatically. Now, what about typological exceptionality in this model? In fact, there are several methods proposed for its treatment in classical GB. One is by means of markedness relations among parameter settings. For example, Chinese is consistently head final except in the rule expanding X′ to X0 (if the head is verbal it precedes the complement). So, as noted in Huang (1982: 46), Chinese manifests the ordering V-NP, but NP-N:

(6) a. yǒu sāngè rén mǎi-le shū
       EXISTENTIAL three man buy-ASP book
       'Three men bought books'
    b. Zhāngsān de sānběn shū
       Zhangsan NOM three book
       'Zhangsan's three books'

Travis (1989) suggested that Chinese has a marked parameter setting for word order. Normally, if a language is head final, it assigns Case and Theta-Role to the left, as in (7a). However, Chinese has a special setting that violates this default ordering, namely (7b):

(7) a. Unmarked setting: HEAD-RIGHT ⊃ THETA-ASSIGNMENT TO LEFT & CASE-ASSIGNMENT TO LEFT
    b. Marked setting (Chinese): HEAD-RIGHT & THETA-ASSIGNMENT TO RIGHT & CASE-ASSIGNMENT TO RIGHT

In other words, Chinese grammar is more complicated than the grammar of a consistent language and is therefore required to pay for its typological exceptionality.


Another strategy within GB was to assign typologically exceptional processes to the marked periphery, namely a system lying outside the principles and parameters of core grammar. Some candidates proposed for the marked periphery in Chomsky (1981) are the following:

(8) a. Elliptical expressions (He is seeing someone, but I don't know who)
    b. Exceptional Case Marking (I believe her to be clever)
    c. Picture noun reflexives (John thinks that the picture of himself is not very flattering)
    d. Preposition-stranding (van Riemsdijk 1978) (Who did you talk to?)

The GB program for handling typological generalizations and exceptions to these generalizations has not worked out as originally envisioned. To a large extent, this is because some of the most discussed generalizations turned out to be spurious. It does not make much sense to talk about an exception to a nonexistent generalization. Most seriously, the hoped-for clustering of typological properties characterizable by a simple parameter setting seems not to exist. I illustrate this first with the Null-Subject Parameter, which is by far the best studied parameter in GB. The theory of Rizzi (1982) predicts the following possible clustering of features:¹

(9)  NULL TS   NULL NTS   SI    THAT-T
     yes       yes        yes   yes
     no        yes        yes   yes
     no        no         no    no

But still other language types exist, or at least appear to. In particular, we find languages such as Brazilian Portuguese (Chao 1981) and Chinese (Huang 1982, 1984) that have null subjects, but not subject inversion. Taking such language types into account, Safir (1985) broke the Null Subject Parameter into three parts, dissociating null nonthematic subjects, null thematic subjects, and subject inversion, thereby predicting a wider set of languages than did Rizzi, namely the following:

1. In this and in the following examples, the following abbreviations are used: NULL TS = Null thematic subjects; NULL NTS = Null nonthematic subjects; SI = subject inversion; THAT-T = the possibility of that-trace filter violations.

(10) NULL TS   NULL NTS   SI    THAT-T
     yes       yes        yes   yes
     yes       yes        no    no
     no        yes        yes   yes
     no        no         yes   yes
     no        no         no    no
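Since each of the four properties is treated as a yes/no choice, there are sixteen logically possible language types, of which Rizzi's theory admits three and Safir's five. The following is a minimal sketch of that comparison; the encoding of (9) and (10) as sets of Boolean tuples, and all names in the code, are illustrative assumptions rather than anything proposed in the works cited.

```python
from itertools import product

# Order of the four properties, following footnote 1.
FEATURES = ("NULL TS", "NULL NTS", "SI", "THAT-T")

# Rows of (9): language types predicted by Rizzi (1982).
RIZZI = {
    (True, True, True, True),
    (False, True, True, True),
    (False, False, False, False),
}

# Rows of (10): language types predicted by Safir (1985).
SAFIR = {
    (True, True, True, True),
    (True, True, False, False),
    (False, True, True, True),
    (False, False, True, True),
    (False, False, False, False),
}

# Every logically possible combination of the four yes/no properties.
ALL_TYPES = set(product((True, False), repeat=len(FEATURES)))


def unpredicted(theory):
    """Language types that would count as exceptional under the given theory."""
    return ALL_TYPES - theory


if __name__ == "__main__":
    print(len(ALL_TYPES), len(unpredicted(RIZZI)), len(unpredicted(SAFIR)))
```

On this encoding every type Rizzi admits is also admitted by Safir, and any type combining null thematic subjects with the absence of null nonthematic subjects falls outside both theories.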

If Safir's predictions were correct, then an exceptional language would be one that had, say, null thematic subjects, no null nonthematic subjects, subject inversion, and no that-trace effects. Rizzi's and Safir's predictions were put to the test by Gilligan (1987), who worked with a 100-language sample, which he attempted to correct for areal and genetic bias.² Gilligan devotes many pages of discussion to the problems involved in determining whether a language manifests one of the four properties or not. His final determination was often based on the results of then-current generative analyses, rather than on mere surface facts about the language in question. For example, he excluded Chinese, Thai, Indonesian, Burmese and other languages that lack agreement morphology from the ranks of those permitting null thematic subjects on the basis of the analysis of Chinese in Huang (1984), which takes the empty subject in that language to be a null topic, rather than a pro. Gilligan found the following correlations of properties in his sample (languages for which there was not sufficient data are excluded):

(11)                      yes-yes   yes-no   no-yes   no-no
     NULL TS - NULL NTS      24        0       15       2
     NULL TS - SI            22       49       11      15
     NULL TS - THAT-T         5        3        2       1
     NULL NTS - SI           14       25        1       1
     NULL NTS - THAT-T        7        2        0       1
     SI - THAT-T              4        0        3       4

According to Gilligan, the data in (11) reveal that the only robust correlations among the four features are the following:

(12) a. NULL TS → NULL NTS
     b. SI → NULL NTS
     c. SI → THAT-T
     d. THAT-T → NULL NTS
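One way to see how the four implications in (12) stand out from the counts in (11) is to compute, for each ordered pair of properties, the share of languages with the first property that lack the second. The sketch below does this under two assumptions that are mine rather than Gilligan's: that each row of (11) lists yes-yes, yes-no, no-yes and no-no counts for the pair in the order given, and that an implication counts as robust when at most 10 % of the relevant languages violate it. All names in the code are hypothetical.

```python
# Counts from (11), keyed by (first property, second property).
# Each tuple is (yes-yes, yes-no, no-yes, no-no); this reading of the
# flattened table is an assumption.
GILLIGAN = {
    ("NULL TS", "NULL NTS"): (24, 0, 15, 2),
    ("NULL TS", "SI"): (22, 49, 11, 15),
    ("NULL TS", "THAT-T"): (5, 3, 2, 1),
    ("NULL NTS", "SI"): (14, 25, 1, 1),
    ("NULL NTS", "THAT-T"): (7, 2, 0, 1),
    ("SI", "THAT-T"): (4, 0, 3, 4),
}


def implication_exception_rates(table):
    """Map each ordered pair (A, B) to the share of A-languages lacking B."""
    rates = {}
    for (a, b), (yes_yes, yes_no, no_yes, _no_no) in table.items():
        rates[(a, b)] = yes_no / (yes_yes + yes_no)  # exceptions to A -> B
        rates[(b, a)] = no_yes / (yes_yes + no_yes)  # exceptions to B -> A
    return rates


def robust_implications(table, max_rate=0.1):
    """Implications whose exception rate does not exceed max_rate.

    The 10 % default is an illustrative threshold, not Gilligan's criterion.
    """
    rates = implication_exception_rates(table)
    return {pair for pair, rate in rates.items() if rate <= max_rate}


if __name__ == "__main__":
    for antecedent, consequent in sorted(robust_implications(GILLIGAN)):
        print(f"{antecedent} -> {consequent}")
```

With this threshold the sketch singles out exactly the four implications listed in (12); three of them are exceptionless in the sample, while SI → NULL NTS has a single counterexample among fifteen languages with subject inversion.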

2. The most extensive published discussion of Gilligan's work that I am aware of is found in Croft (2003: 80–84).


These results are not very heartening for either Rizzi's theory or Safir's, nor, indeed, for any theory which sees in null subject phenomena a rich clustering of properties. In three of the four correlations, null nonthematic subjects are entailed, but that is obviously a simple consequence of the virtual nonexistence of languages that manifest overt nonthematic subjects. Even worse, five language types are attested whose existence neither theory predicts. Current work on null subjects pretty much ignores the clustering issue and therefore (necessarily) the question of exceptions to the predicted clusterings. To take another example of a failed prediction of clustering within GB, Kayne (1984) links parametrically the following four properties of French, all of which differ from their English counterparts:

(13) a. The assigning of oblique case by prepositions (as opposed to objective case) (avec lui/*le)
     b. The impossibility of Preposition-stranding (*Qui as-tu parlé à?)
     c. The impossibility of Exceptional Case Marking (*Je crois Jean être sage)
     d. The impossibility of Dative Shift (*J'ai donné Marie un livre)

For Kayne, then, it would be exceptional to find a language that allowed Preposition-Stranding, but disallowed Dative Shift. Unfortunately, Kayne's parameter appears to make incorrect predictions crosslinguistically. For example, many English-based creoles lack stranding, as the following examples from Sranan illustrate (Muysken and Law 2001: 53):

(14) a. nanga san u koti a brede?
        with what you cut the bread
     b. *san u koti a brede nanga?
        what you cut the bread with

Yet in Sranan there is no evidence for distinguishing objective from oblique case. Saramaccan is also in conflict with the parameter, in that oblique Case and stranding are missing, yet it does have Exceptional Case Marking and double object constructions (Veenstra 1996). Also, Kayne's parametric account does not elegantly distinguish Icelandic, a case-rich stranding language, from Southern German and some Slavic languages, which are also case-rich, but nonstranding. Chinese and Indonesian have Dative Shift, but no stranding, while Prince Edward Island French has stranding but no Exceptional Case Marking. And finally, there is experimental work by Stromswold (1988, 1989) and Sugisaki and Snyder (2001) that shows that acquisitional data do not bear out the idea that one parameter is implicated in these processes.


Further impeding an adequate GB-based approach to typological exceptionality is the fact that the notion "setting for a particular parameter" is not necessarily constant within a single language. The original vision of parameters was an extremely attractive one, in that the set of their settings was conceived of as a checklist for a language as a whole. But the Lexical Parameterization Hypothesis (LPH) has put an end to this vision:

(15) Lexical Parameterization Hypothesis (Borer 1984; Manzini and Wexler 1987): Values of a parameter are associated not with particular grammars, but with particular lexical items.

Something like the LPH is certainly necessary. For example, different anaphoric elements in the same language can have different binding domains, as is the case with Icelandic hann and sig. But the LPH forces us to give up the idea that the child simply checks off one parameter setting after another in the process of language acquisition. What is even worse is that different structures in the same language seem to have different settings. For example, Rizzi (1978) tried to capture the differences in extraction possibilities between English and Italian by positing that S is a bounding node for Subjacency in English, but S′ in Italian. But observe in (16) that English is as permissive as Italian when the extracted wh-element is a direct object, especially if the lowest clause is nonfinite:

(16) This is the car that I don't know how to fix.

There has also been very little support for a GB-style parameter-setting model from language acquisition. Actually, most work in the generative tradition simply assumes that acquiring a language is a matter of parameter-setting, rather than providing evidence directly bearing on the question. That is, it takes parameters as a given and raises questions such as: "Do parameters have default values?", "Can parameters be reset in the course of acquisition?", and so on. Yet a number of factors suggest that a parameter-setting strategy for first language acquisition is far from the simple task that it is portrayed as in much of the literature. Several of the problems for parameters result from the fact that what the child hears are sentences (or, more correctly, utterances), rather than structures. But any given utterance is likely to massively underdetermine the particular structural property that the child needs to set some particular parameter. The greater the number of parameters to be set, the greater the problem, particularly given that few of the parameter settings appear to have unambiguous triggers. Citing Clark (1994), Janet Fodor points out that there is an "exponential explosion from the parameters to the number of learning steps to set them ... If so, the learner might just as well check out each grammar, one by one, against the input; nothing has been gained by the parameterization. ... [to] set one parameter could cost the learner thousands or millions of input sentences" (Fodor 2001b: 736).

What makes the problem even more serious is the fact that children are obviously not born with the ability to recognize triggers for any one particular language. English-speaking, Chinese-speaking, and Japanese-speaking children all need to arrive at a negative setting for the Ergativity Parameter, given its existence, but it is by no means obvious what feature common to the three languages would lead the very young child to arrive at that particular setting. In other words, the fundamental problem is that parameter-setting presupposes some non-negligible degree of prior structural assignment. To illustrate the problem, Hyams (1986) speculates that a child sets the Null Subject Parameter with a negative value when it hears an expletive. But how can the child know what an expletive is without already having a syntax in place? "Expletive" is not an a priori construct available to the newborn, but is interpreted only with respect to an already existing grammar. But if the grammar is already in place, then why do we need parameters at all?³

Given the hypothesis that parameters are complemented by rules in the marked periphery, the learner's task is not simplified by the positing of parameters. As pointed out by Foley and Van Valin (1984: 20), it is made more complicated. Since learners have to acquire rules anyway, they have a double burden: acquiring both rules and parameter settings and figuring out which phenomena are handled by which. And one would assume (along with Culicover 1999: 16) that any learning mechanism sophisticated enough to acquire the "hard stuff" in the periphery would have no trouble acquiring the "easy stuff" at the core, thereby rendering the notion "parameter" superfluous.
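The "exponential explosion" that Fodor describes can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that all parameters are independent and binary; the function name and the parameter counts are hypothetical and are not taken from Clark (1994) or Fodor (2001b).

```python
def grammar_count(n_parameters: int, settings_per_parameter: int = 2) -> int:
    """Number of distinct grammars defined by independent parameters.

    Assumes every parameter has the same number of settings and that all
    combinations of settings are possible grammars.
    """
    if n_parameters < 0 or settings_per_parameter < 1:
        raise ValueError("counts must be non-negative and settings positive")
    return settings_per_parameter ** n_parameters


if __name__ == "__main__":
    # Illustrative parameter counts only.
    for n in (10, 20, 30):
        print(f"{n} binary parameters: {grammar_count(n):,} candidate grammars")
```

Under these assumptions, thirty binary parameters already define over a billion candidate grammars, which gives a sense of why checking grammars one by one against the input is not a realistic fallback for the learner.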
Along the same lines, there is no evidence that peripheral knowledge is stored and/or used any differently from that provided by the system of principles and parameters per se. When head-directionality or V2-ness are at stake, do German speakers perform more slowly in reaction time experiments than do speakers of head-consistent non-V2 languages? Do they make more mistakes in everyday speech, say by substituting unmarked constructions for marked ones? Do the marked forms pose comprehension difficulties? In fact, is there any evidence whatsoever that such knowledge is dissociable in some way from more core knowledge? As far as I am aware, the answers to all of these questions are "no". As Janet Fodor has stressed: "The idea that there are two sharply different syntax learning mechanisms at work receives no clear support that I know of from theoretical, psychological, or neurological studies of language" (Fodor 2001a: 371).

3. For similar arguments see Nishigauchi and Roeper (1987); Haider (1993); Valian (1990); and Mazuka (1996).

Finally, there is little to lend credence to the idea that there is a robust correlation between the order of acquisition of some feature and its typological status, a fact which casts into doubt the idea that parameters are organized in an implicational hierarchy. Some late-acquired features are indeed typologically relatively rare, as appears to be the case for the verb-raising that derives VSO order from SVO order (see Guilfoyle 1990 for Irish; Radford 1994 for Welsh; Ouhalla 1991b for Arabic). But other grammatical features appear to be acquired relatively late, without being typologically rare (see Eisenbeiss 1994 on scrambling in German). In general, however, children acquire the relevant structures of their language quite early, regardless of how common that structure is crosslinguistically. Hence English-speaking children acquire P-stranding before pied-piping (Sugisaki and Snyder 2001). French-speaking children have verb-raising from the earliest multi-word utterances (Déprez and Pierce 1993; Pierce 1992; Meisel and Müller 1992; Verrips and Weissenborn 1992). English-speaking children never manifest verb-raising (Stromswold 1990; Harris and Wexler 1996). Furthermore, children figure out very early whether their language is null subject or not (Valian 1991) and children acquiring English, German, and French evidence strong knowledge of locality in wh-extraction domains at early ages (Roeper and De Villiers 1994).⁴

Before leaving GB, I want to mention one more proposal for handling typological exceptionality within that framework. Baker (2001) suggests attributing typological consistency to purely grammatical factors and typologically inconsistent behavior to extragrammatical causes, just as physicists attribute to extraneous factors such as air resistance the fact that objects are not observed to fall to earth at the rate of 9.8 meters per second squared.
Baker noted that the Ethiopian language Amharic is SOV, yet is exceptional for an SOV language in having prepositions. Baker writes:

Languages that are close to the ideal types are much more common than languages that are far from them. According to the statistics of Matthew Dryer, only 6 percent of languages that are generally verb final are like Amharic in having prepositions rather than postpositions. The conflict of historical and geographical influences could partially explain why Amharic is a mixed case. (Baker 2001: 82–83)

4. On the other hand, a number of language acquisition researchers continue to provide evidence for the idea, first articulated in Borer and Wexler (1987), that principles of grammar mature with age (see, for example, Babyonyshev, Ganger, Pesetsky and Wexler 2001). For an interesting, albeit brief, overview of the issues involved see Smith and Cormack (2002).

As an initial point to make in response to this quote, Baker's "6 percent" is somewhat misleading, perhaps inviting the reader to conclude that 94 percent of languages are typologically consistent. But when the totality of the typological generalizations is taken into account, very few, if any, languages are exception-free typologically. In fact, I agree with Baker's point that historical and geographical influences are at the root of much typological inconsistency. But the analogy between Amharic's having prepositions and leaves not falling to earth at the predicted rate seems far-fetched. Principles that predict the rate of falling bodies and those that predict the effects of air resistance belong to two different physical systems. Is there any evidence that an Amharic speaker's knowledge that auxiliaries follow verbs in that language (as is typically the case for SOV languages) and their knowledge that it is prepositional (which is rare for SOV languages) belong to two different cognitive systems? No, there is absolutely no evidence whatsoever for such a hypothesis. In summary, the GB program for typology has been abandoned by all but a few scholars. The hypothesized clustering of typological properties based on the positing of abstract parameter settings appears not to exist. Even worse, it appears that parameter settings need to be attributed to individual lexical items (and possibly even to constructions), rather than to entire languages. Since this program has not succeeded in capturing typological regularity, it can be dismissed as a contender for a theory capable of capturing exceptions to typological generalizations.

3. The microparametric approach of the MP

In the MP, parameter settings are not associated with principles of UG that hold for an entire language, but rather with particular functional projections present or not present in a particular language. Languages are posited to differ in terms of which functional projections are manifest and in terms of the featural content of these projections. By way of illustration, McCloskey (2002) argues that whether or not a language makes productive use of resumptive pronouns depends on the inventory of Complementizer-type elements in the language, and in the analysis of Pesetsky and Torrego (2001), whether or not a language has Subject-Aux Inversion depends on the featural properties of the COMP node. Going along with this shift in the interpretation of where parameters reside is a shift from a focus on macroparameters to one on microparameters. The latter are, essentially, slight differences in the properties of functional heads


Frederick J. Newmeyer

that are responsible for minute differences in structure between closely related languages and dialects.

But what about the handling of typological exceptionality in the MP? In fact, it is not at all clear. The basic ontology of the MP is very sparse, consisting essentially of the basic operations of Merge and Move, subject to economy conditions of various sorts. The inventory of basic operations is so pared down that there is no elegant way for the syntactic component per se to distinguish typologically regular processes from typologically exceptional ones. Everything essentially boils down to idiosyncratic properties of the lexicon, in particular, to functional heads in the lexicon and their projections. Let me provide a concrete example based on Cinque (1994). Cinque presents a minimalist analysis of certain differences between French and English, as depicted in (17)–(19):

(17) a. un gros ballon rouge
     b. a big red ball
(18) a. un tissu anglais cher
     b. an expensive English fabric
(19) a. an old friend (= friend who is aged or friend for a long time)
     b. une vieille amie (= friend for a long time)
     c. une amie vieille (= friend who is aged)

Cinque's analysis of (17)–(19) is summarized in (20a–c):

(20) a. French has postnominal adjectives (as in 17a) because of a parametric difference with English that allows N-movement to a higher functional projection in the former language, but not in the latter.
     b. Cher has scope over anglais in (18a) because French has a parametric difference with English that triggers movement of a N-ADJ constituent.
     c. In (19), the two positions for vieille in French, but only one for old in English, result from a parametric difference between the two languages regarding the feature attraction possibilities of functional categories in the two languages.

As Bouchard (2003) points out, the problem with such an account is that the word "parameter" is used as nothing more than a synonym for the word "rule". There is no increase in descriptive elegance, economy, or whatever in Cinque's account over an account which does no more than say that English and French have different rules of adjective placement. And most importantly for our purposes, there is nothing in Cinque's minimalist account that would begin to explain why N-Adj order is quite a bit more common than Adj-N order crosslinguistically. Consider some data from Dryer (1988: 188–189), which show that for all three basic word orders, more languages manifest N-before-Adj than Adj-before-N order:

(21) a. SOV & AdjN   64 languages
     b. SOV & NAdj   94 languages
(22) a. SVO & AdjN   23 languages
     b. SVO & NAdj   67 languages
(23) a. VSO & AdjN   15 languages
     b. VSO & NAdj   24 languages
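The proportions behind (21)–(23) can be recomputed directly from the counts. The following is only a minimal sketch: the counts are those cited above from Dryer (1988), while the names `COUNTS` and `order_shares` are hypothetical and introduced purely for illustration.

```python
# Language counts from (21)-(23), as cited in the text from Dryer (1988: 188-189).
# COUNTS and order_shares are illustrative names, not taken from the source.
COUNTS = {
    "SOV": {"AdjN": 64, "NAdj": 94},
    "SVO": {"AdjN": 23, "NAdj": 67},
    "VSO": {"AdjN": 15, "NAdj": 24},
}


def order_shares(counts):
    """Return, for each basic word order, the share of each adjective-noun order."""
    shares = {}
    for word_order, by_adj_order in counts.items():
        total = sum(by_adj_order.values())
        shares[word_order] = {k: v / total for k, v in by_adj_order.items()}
    return shares


if __name__ == "__main__":
    for word_order, share in order_shares(COUNTS).items():
        print(f"{word_order}: NAdj {share['NAdj']:.1%}, AdjN {share['AdjN']:.1%}")
```

On these counts N-Adj comes out as the majority order within each of the three groups (roughly 59 %, 74 %, and 62 % respectively), which is the pattern the text appeals to.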

Note that an SVO language with AdjN order (like English) is exceptional: only 23 of the 90 SVO languages in (22), roughly a quarter, manifest that correlation. That result does not follow from Cinque's MP-based approach.

In GB, at least in principle, the number of parameters and their settings was small. In the MP, the number seems open-ended. How many parameters are in fact necessary in the MP? It is possible to make a rough count, given the assumption that there is one binary setting for each functional head. And how many functional heads are there? If Cinque (1999) is right, there are at least 32 functional heads in the IP domain alone. On the basis of a look at fifteen languages, fourteen of them Indo-European (from only four subfamilies), Longobardi (2003) proposes 30 binary parameters for DP. Cinque (1994) divides Adjective Phrase into at least five separate maximal projections encoding Quality, Size, Shape, Color, and Nationality. Beghelli and Stowell (1997) break down Quantifier Phrase into projections headed by Wh, Neg, Distributive, Referential, and Share. CP has also been split into a dozen or more projections, including ForceP, FocusP, and an indefinite number of Topic Phrases (Rizzi 1997). Facts pertaining to clitic inversion and related phenomena in some northern dialects of Italian have led to the positing of Left Dislocation Phrase, Number Phrase, Hearer Phrase, and Speaker Phrase (Poletto 2000). Damonte (2004) proposes projections corresponding to the set of thematic roles, including Reciprocal, Benefactive, Instrumental, Causative, Comitative, and Reversive Phrases. We have seen Verb Phrase split into two projections, one headed by V and the other by v (Chomsky 1995). Zanuttini (2001) posits four distinct Negative Phrase projections for Romance alone and McCloskey (1997) argues that at least three subject positions are needed. The positing of a new functional projection (and hence a new parameter) to capture any structural difference between two languages has led to what Ackerman and Webelhuth (1998: 225) have aptly called "the diacriticization of parameters".

Other proposals have led to a potentially exponential increase in the number of functional projections and their interrelationships, and hence in the number of parameters. For example, Giorgi and Pianesi (1997) have mooted the possibility of "syncretic categories", that is, those that conflate two or more otherwise independent ones, as, for example, TP/AgrP. Along similar lines, Bobaljik (1995), Thráinsson (1996), and Bobaljik and Thráinsson (1998) suggest that languages differ not only in terms of the settings of their parameters, but also in terms of the presence, or not, of particular functional categories (see also Fukui 1995). Such a proposal leads to at least a ternary value for each parameter: positive, negative, or not applicable. Complicating things still further, Ouhalla (1991a) argues that an important dimension of parametric variation among languages is the relative ordering of embedding of functional categories. So for example, in his analysis, in Berber and Chamorro, the AgrP projection is below the TnsP projection, while in English and Welsh, TnsP is below AgrP.

One might, of course, argue along with Cinque and contra Ouhalla that the ordering among functional categories is universal. In that view, languages would differ parametrically in their lexicalization possibilities, some functional categories being lexicalized in some languages, but not in others. However, transferring the parametric choice to the lexicon neither decreases the number of potential parameters nor gives them an edge over rules. First, the number of parameters is not reduced, since the burden of specifying whether a functional category is present in a particular language or not has merely been transferred to the lexicon.
Second, the statement that some language makes the parametric choice that lexical item L licenses functional projection P is indistinguishable from the statement that there is a language-particular rule involving L that specifies P.

In order to account for parametric variation and the "some number substantially greater than 5 billion" grammars that might exist in the world (Kayne 2000: 8), Kayne calculates that only 33 binary-valued parameters would be needed. His math may be right, but from that fact it does not follow that only 33 parameters would be needed to capture all of the microvariation that one finds in the world's languages and dialects. In principle, the goal of a parametric approach is to capture the set of possible human languages, not the set (however large) of actually existing ones. One can only speculate that the number of such languages is in the trillions or quadrillions. In any event, Kayne's own work suggests that the number of parameters is vastly higher than 33. Depending on precisely what counts as a parameter (Kayne is not always clear on that point), just to characterize the difference among the Romance dialects discussed in the first part of Kayne (2000) with respect to clitic behavior, null subjects, verb movement, and participle agreement would require several dozen distinct parameters. It is hard to avoid the conclusion that characterizing just a few more differences among the dialects would lead to dozens of new parameters. If the number of parameters needed to handle the different grammars of the world's languages, dialects, and (possibly) idiolects is in the thousands (or, worse, millions), then ascribing them to an innate UG to my mind loses all semblance of plausibility. True, we are not yet at the point of being able to prove that the child is not innately equipped with 7846 (or 7,846,938) parameters, each of whose settings is fixed by some relevant triggering experience. I would put my money, however, on the fact that evolution has not endowed human beings in such an exuberant fashion.

In other words, despite its claims to the contrary, the MP takes us back to the old idea that languages differ from each other simply by having different rules, a solution that does nothing to distinguish typologically common processes from the exceptional ones. Recall that the great promise of parametric theory was its seeming ability to provide a generative approach to language typology, that is, to be able to characterize the difference from one language to the next by means of differences in parameter settings. The LPH, which is central to the MP, dashes all hope that this promise might be fulfilled.

Puzzlingly from my point of view, relocating the site of parametric variation from grammars of entire languages to lexical items and their associated functional categories is often portrayed as a major step forward. For example, Pierre Pica writes that this move "allows a radical simplification of the nature and design of UG" (Pica 2001: vi).
But the price paid for this "radical simplification" is both an explosion in the number of functional categories needed to be posited within UG and, more seriously, the transfer of the burden for accounting for language-particular differences from properties of UG per se to idiosyncratic properties of lexical entries in particular languages. In earlier versions of principles-and-parameters syntax (and in current versions such as Mark Baker's) a given language L was posited to have a particular setting for the Head Directionality Parameter, the Serial Verb Parameter, and so on. But now, in principle it is individual lexical items in L that need to be specified as to how they relate to head directionality, serial verbs, and so on. That brings us back in effect to the earliest versions of transformational grammar, where each lexical item bore a set of tags indicating each rule that it governed or failed to govern. I certainly agree with Pica (2001) that twenty years of intensive descriptive and theoretical research has shown that macroparameters do not exist. But we have to regard that conclusion as a cause for disappointment, not rejoicing.
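The arithmetic behind the parameter counts discussed in this section is easy to check. The sketch below assumes, purely for illustration, that parameters are fully independent, so that n parameters with k values each define k to the power n grammars; the function names `grammars` and `min_parameters` are hypothetical and not drawn from any of the works cited.

```python
def grammars(n_parameters: int, values_per_parameter: int = 2) -> int:
    """Number of distinct settings of n independent parameters (k ** n)."""
    return values_per_parameter ** n_parameters


def min_parameters(target: int, values_per_parameter: int = 2) -> int:
    """Smallest n for which values_per_parameter ** n exceeds target."""
    n = 0
    while grammars(n, values_per_parameter) <= target:
        n += 1
    return n


if __name__ == "__main__":
    five_billion = 5_000_000_000
    # Binary parameters needed to exceed 5 billion grammars.
    print(min_parameters(five_billion))
    # Grammars defined by 33 binary parameters.
    print(grammars(33))
    # The same target with ternary values (positive, negative, not applicable).
    print(min_parameters(five_billion, 3))
```

Under the independence assumption, 33 binary parameters yield 8,589,934,592 grammars, which agrees with the figure attributed to Kayne; with ternary values, 21 parameters already pass 5 billion. The calculation only counts the grammars a given number of parameters can distinguish. It has no bearing on how many parameters the observed microvariation would actually require, which is the point at issue above.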


To summarize, as far as the ability to capture typological exceptionality is concerned, the MP represents a step backward from GB. The latter approach had a program for capturing exceptionality, albeit a flawed one. The MP does not even have a program aimed at such a result.

4. An extrasyntactic approach to typological generalizations and their exceptions

In this section I advocate a very different approach to capturing typological generalizations and exceptions to them. The burden for handling both is shifted from UG to performance principles that are sensitive to grammatical structure. A wide variety of performance principles have been proposed in the literature (some, in my view, convincing, and some less so), and it is not my purpose here to review them all. In fact, I will focus on only one, the parsing principle of Minimize Domains, proposed in Hawkins (2004):

(24) Minimize Domains (Hawkins 2004): The hearer (and therefore the parsing mechanism) prefers orderings of elements that lead to the most rapid recognition possible of the structure of the sentence.

In short, there is performance-based pressure for language users to identify constituents of a phrase as rapidly as possible. To illustrate, consider the tendency of heads consistently to precede complements or to follow complements. As we have seen, formal approaches have posited a head parameter provided by UG. But the performance basis of this generalization seems quite straightforward and follows directly from Minimize Domains. Consider a VO language like English, where heads typically precede complements:

(25) V-NP, P-NP, A-of-NP, N-of-NP

In each case a lighter head precedes a heavier complement; putting the heavier phrasal complement after the lighter lexical head allows for a quicker recognition of all of the constituents of the dominating phrase. In fact, the light-before-heavy tendency in the grammar involves far more than the head-complement relation. For example, the canonical order of VP constituents is relentlessly lighter-to-heavier:

(26) [VP V-NP-PP-CP] (convince my students of the fact that all grammars leak)


Also notice that single adjectives and participles can appear in pre-head position in English:

(27) a. a silly proposal
     b. the ticking clock

But if these adjectives and participles themselves have complements, the complements have to appear in post-head position:

(28) a. *a sillier than any I've ever seen proposal
     b. a proposal sillier than any I've ever seen
(29) a. *the ticking away the hours clock
     b. the clock ticking away the hours

The evidence for a performance, rather than for a UG, basis of the light-before-heavy tendency is based on the fact that when speakers have a choice in a VO-type language, they tend to put shorter before longer constituents. So, except for cases in which there is a strong lexical relation between V and P, PPs can typically occur in any order after the verb:

(30) a. Mary talked to John about Sue.
     b. Mary talked about Sue to John.

But all other things being equal, the greater the length differential between the two PPs, the more likely speakers will be to put the shorter one first (Hawkins 1994).⁵ Interestingly, Hawkins's approach makes precisely the opposite length and ordering predictions for head-final languages. And to be sure, there is a heavy-before-light effect in those languages, both in language use and in the grammar itself.

Now then, where do exceptions fit into the picture? Minimize Domains predicts straightforwardly that a VO language should be prepositional and that an OV language should be postpositional. And indeed, such is generally the case. As is shown in Dryer (1992), 94 % of OV languages are postpositional and 85 % of VO languages are prepositional.⁶ The exceptional nature of a prepositional OV language (like Amharic) and a postpositional VO language (like Finnish) follows directly. To illustrate, consider the four logical possibilities, illustrated in (31a–d): VO and prepositional (31a); OV and postpositional (31b); VO and postpositional (31c); and OV and prepositional (31d):

(31)

5. The discourse status of the elements involved also plays a role in ordering (see Arnold, Wasow, Losongco and Ginstrom 2000; Hawkins 2003).
6. To be accurate, Dryer's count involves "genera" (genetic groups roughly comparable in time depth to subfamilies of Indo-European), not languages per se.

Let us assume with Hawkins that grammars are organized so that users can recognize the major constituents of a phrase as rapidly as possible. In (31a) and (31b), the two common structures, the recognition domain for the VP is just the distance between V and P, crossing over the object NP. But in (31c) and (31d), the uncommon structures, the recognition domain is longer, in that it involves the object of the preposition as well. So both regularity and exceptionality follow naturally in this approach. The exceptional cases are simply those that fail to be in accord with the principle of Minimize Domains.

One might object that exceptions pose as great a challenge for parsing principles as for UG principles; after all, in both cases, some theory-based generalization has been violated. But one expects performance principles to admit exceptions. Rather than being like the either-or (or yes-no) switch settings inherent to UG parameters, they are part and parcel of a theory of language use. And nobody, as far as I know, believes that an algebraic theory suffices to explain facts about language use. Rather, usage-based generalizations are generalizations about populations (whether of speakers or of languages). To give an analogy, the generalization that cigarette smoking causes lung cancer is not threatened by the fact that there exist (exceptional) individuals who smoke five packs of cigarettes per day over their lifetimes and do not develop lung cancer. The rare OV, yet prepositional, languages are parallel, in crucial respects, to these individuals.⁷

Consider another example of a robust, but not exception-free, typological generalization. Hawkins (1983) proposed the following hierarchy:

(32) Prepositional Noun Modifier Hierarchy (PrNMH): If a language is prepositional, then if RelN then GenN, if GenN then AdjN, and if AdjN then DemN.

The PrNMH states that if a language allows long things to intervene between a preposition and its object, then it allows short things. This hierarchy predicts the possibility of prepositional phrases with the structures depicted in (33) (along with an exemplifying language):

(33) a. [PP P [NP ___ N]] (Arabic, Thai)
     b. [PP P [NP ___ N]]; [PP P [NP Dem N]] (Masai, Spanish)
     c. [PP P [NP ___ N]]; [PP P [NP Dem N]]; [PP P [NP Adj N]] (Greek, Maya)
     d. [PP P [NP ___ N]]; [PP P [NP Dem N]]; [PP P [NP Adj N]]; [PP P [NP PossP N]] (Maung)
     e. [PP P [NP ___ N]]; [PP P [NP Dem N]]; [PP P [NP Adj N]]; [PP P [NP PossP N]]; [PP P [NP Rel N]] (Amharic)

The Minimize Domains-based explanation of the hierarchy is straightforward. The longer the distance between the P and the N in a structure like (34), the longer it takes to recognize all the constituents of the PP. Given the idea that grammars try to reduce the recognition time, the hierarchy follows:

(34) [PP P [NP X N]]

Since relative clauses tend to be longer than possessive phrases, which tend to be longer than adjectives, which tend to be longer than demonstratives, which are always longer than silence, the hierarchy is predicted on parsing grounds.
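The implicational logic of the PrNMH in (32) can be stated as a small check on which modifier types a prepositional language places before the noun. This is only an illustrative sketch: the names `HIERARCHY` and `conforms_to_prnmh` are hypothetical, PossP in (33) is assumed to correspond to Gen in (32), and the final pattern in the usage section is invented for contrast rather than taken from any language.

```python
# Modifier types ordered from shortest to longest, following (32):
# RelN implies GenN, GenN implies AdjN, AdjN implies DemN.
# Assumption: "Gen" here stands in for the PossP of (33).
HIERARCHY = ["Dem", "Adj", "Gen", "Rel"]


def conforms_to_prnmh(prenominal):
    """Return True if the prenominal modifier types form an initial segment
    of HIERARCHY, i.e. a longer type precedes N only if all shorter ones do."""
    prenominal = set(prenominal)
    unknown = prenominal - set(HIERARCHY)
    if unknown:
        raise ValueError(f"unknown modifier types: {sorted(unknown)}")
    return prenominal == set(HIERARCHY[:len(prenominal)])


if __name__ == "__main__":
    # Patterns (33a-e), labelled with the exemplifying languages given there.
    patterns = {
        "Arabic, Thai": [],
        "Masai, Spanish": ["Dem"],
        "Greek, Maya": ["Dem", "Adj"],
        "Maung": ["Dem", "Adj", "Gen"],
        "Amharic": ["Dem", "Adj", "Gen", "Rel"],
        "invented pattern (AdjN but NDem)": ["Adj"],
    }
    for label, prenominal in patterns.items():
        print(label, conforms_to_prnmh(prenominal))
```

The five patterns in (33) all pass the check, while the invented AdjN-with-NDem pattern does not. The check is categorical, whereas the parsing-based account treats the hierarchy as a strong statistical tendency.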
7. See the papers by Featherston and by Wasow et al. in this volume for interesting discussions of how stochastic generalizations bear on the handling of seeming typological exceptionality.


It is far from clear how this generalization might be captured by means of parameters, whether macroparameters or microparameters.

There are a few exceptions to the PrNMH. Hawkins (1994) reports that in the prepositional Sino-Tibetan language Karen, genitives are the only daughters of NP to precede N and (citing unpublished work by Matthew Dryer) he points to a small number of prepositional languages (e.g. Sango) in which AdjN cooccurs with NDem. Again, a small number of exceptions to a performance-based principle are entirely to be expected.

Let us finish this section with one more example of a typological generalization that has a performance-based explanation. As observed in (5d) above, verb-finality is accompanied by wh-elements being in situ, though there are a significant number of exceptions (29 %). The parsing explanation of this generalization is straightforward. Heads, in general, are the best identifiers of their subcategorized arguments. If one hears the verb give, for example, one is primed to expect two associated internal arguments, one representing a recipient and the other an object undergoing transfer. On the other hand, a human NP might or might not be a recipient and an inanimate NP might or might not be an object undergoing transfer. Hence, if arguments precede their heads, as they do in SOV languages, extra cues are useful to identify their thematic status. Such can be accomplished by keeping them contiguous to the head (that is, by restricting their movement possibilities) and/or by endowing them with case marking that uniquely identifies their thematic role or helps to narrow down the possibilities. The question naturally arises (both for parametric accounts and for performance-based accounts) of why there are so many exceptions to this generalization.
I have nothing to offer in terms of an answer to this question, except to suggest that there must be a countervailing performance pressure for all languages to front wh-elements. The focusing property of wh-elements immediately comes to mind as a basis for why they so often occur fronted. However, I readily concede that without a precise characterization of the nature of the pressure to front the focus of questioning and why this pressure tends to be weaker than the pressure for arguments to remain in situ in OV languages, my suggestion amounts to little more than hand-waving.

It is worth pointing out by way of summary that there is a built-in advantage to parsing-based explanations of grammatical structure that one does not find with UG-based explanations. In a nutshell, the advantage to parsing rapidly can hardly be controversial. We know that parsing is fast and efficient. Every word has to be picked out from an ensemble of 50,000, identified in one third of a second, and put into the right structure. It simply makes sense that parsing pressure would have left its mark on grammatical structure. Furthermore, performance-based solutions allow the grammar itself to be kept cleaner. As Stefan Frisch has noted:

For the traditional formalist, it is actually desirable for some linguistic patterns, especially those that are gradient, to be explained by functional principles. The remainder, once language processing influences are factored out, might be a simpler, cleaner, and more accurate picture of the nature of the innate language faculty and its role in delimiting the set of possible human languages. (Frisch 1999: 600)

I agree completely.

5. Conclusion

This paper has focused on exceptionality in syntactic typology, that is, the means of handling exceptions to broad typological generalizations. Three approaches were considered: the macroparametric approach of the Government-Binding theory; the microparametric approach of the Minimalist Program; and an approach that attempts to handle typological generalizations (and exceptions to them) by parsing and other extra-syntactic mechanisms. The GB approach, as a priori appealing as it is, has simply not been borne out by the empirical evidence. On the other hand, the MP approach to typology seems to boil down to nothing more than saying that some languages have one set of functional projections and other languages have another set of functional projections, without explaining why more languages would do things one way than another way. I hope to have shown that a processing-based approach shows the greatest degree of promise in handling the exceptionality that one finds in syntactic typology.

References
Ackerman, Farrell, and Gert Webelhuth 1998 A Theory of Predicates. Stanford, CA: CSLI Publications.
Aoun, Joseph, Norbert Hornstein, David Lightfoot, and Amy Weinberg 1987 Two types of locality. Linguistic Inquiry 18: 537–578.
Arnold, Jennifer E., Thomas Wasow, Anthony Losongco, and Ryan Ginstrom 2000 Heaviness vs. newness: The effects of structural complexity and discourse status on constituent ordering. Language 76: 28–55.
Babyonyshev, Maria, Jennifer Ganger, David M. Pesetsky, and Ken Wexler 2001 The maturation of grammatical principles: Evidence from Russian unaccusatives. Linguistic Inquiry 32: 1–43.
Baker, Mark C. 2001 The Atoms of Language: The Mind's Hidden Rules of Grammar. New York: Basic Books.
Beghelli, Filippo, and Timothy A. Stowell 1997 Distributivity and negation: The syntax of each and every. In Ways of Scope Taking, Anna Szabolcsi (ed.), 71–107. Dordrecht: Kluwer.
Bobaljik, Jonathan D. 1995 Morphosyntax: The syntax of verbal inflection. Ph. D. diss., MIT.
Bobaljik, Jonathan D., and Höskuldur Thráinsson 1998 Two heads aren't always better than one. Syntax 1: 37–71.
Borer, Hagit 1984 Parametric Syntax: Case Studies in Semitic and Romance Languages. Dordrecht: Foris.
Borer, Hagit, and Kenneth Wexler 1987 The maturation of syntax. In Parameter Setting, Thomas Roeper, and Edwin Williams (eds.), 123–172. Dordrecht: Reidel.
Bouchard, Denis 2003 The origins of language variation. Linguistic Variation Yearbook 3: 1–41.
Chao, Wynn 1981 PRO-drop languages and nonobligatory control. University of Massachusetts Occasional Papers 6: 46–74.
Chomsky, Noam 1957 Syntactic Structures. The Hague: Mouton.
Chomsky, Noam 1965 Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Chomsky, Noam 1981 Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, Noam 1988 Language and Problems of Knowledge: The Managua Lectures. Cambridge, MA: MIT Press.
Chomsky, Noam 1995 The Minimalist Program. Cambridge, MA: MIT Press.
Cinque, Guglielmo 1994 On the evidence for partial N movement in the Romance DP. In Paths towards Universal Grammar, Guglielmo Cinque, Jan Koster, Jean-Yves Pollock, Luigi Rizzi, and Raffaella Zanuttini (eds.), 85–110. Washington: Georgetown University Press.

Cinque, Guglielmo 1999 Adverbs and Functional Heads: A Cross-linguistic Perspective. Oxford: Oxford University Press.
Clark, Robin 1994 Finitude, boundedness, and complexity. In Syntactic Theory and First Language Acquisition: Cross-linguistic Perspectives. Vol. 2: Binding, Dependencies, and Learnability, Barbara Lust, Gabriella Hermon, and Jaklin Kornfilt (eds.), 473–489. Hillsdale, NJ: Erlbaum.
Croft, William 2003 Typology and Universals. 2nd ed. Cambridge: Cambridge University Press.
Culicover, Peter W. 1999 Syntactic Nuts: Hard Cases, Syntactic Theory, and Language Acquisition. Oxford: Oxford University Press.
Damonte, Federico 2004 The thematic field: The syntax of valency-enriching morphology. Ph. D. diss., University of Padua.
Déprez, Viviane, and Amy Pierce 1993 Negation and functional projections in early grammar. Linguistic Inquiry 24: 25–67.
Dryer, Matthew S. 1988 Object-verb order and adjective-noun order: Dispelling a myth. Lingua 74: 185–217.
Dryer, Matthew S. 1991 SVO languages and the OV:VO typology. Journal of Linguistics 27: 443–482.
Dryer, Matthew S. 1992 The Greenbergian word order correlations. Language 68: 81–138.
Eisenbeiss, Sonia 1994 Kasus und Wortstellungsvariation im deutschen Mittelfeld. In Was determiniert Wortstellungsvariation?, Brigitta Haftka (ed.), 277–298. Opladen: Westdeutscher Verlag. [special issue of Linguistische Berichte]
Fodor, Janet D. 2001a Parameters and the periphery: Reflections on syntactic nuts. Journal of Linguistics 37: 367–392.
Fodor, Janet D. 2001b Setting syntactic parameters. In The Handbook of Contemporary Syntactic Theory, Mark Baltin, and Chris Collins (eds.), 730–767. Oxford: Blackwell.

Foley, William A., and Robert D. Van Valin 1984 Functional Syntax and Universal Grammar. Cambridge: Cambridge University Press.
Frisch, Stefan 1999 [Review of Thomas Berg, Linguistic Structure and Change: An Explanation from Language Processing]. Journal of Linguistics 35: 597–601.
Fukui, Naoki 1995 Theory of Projection in Syntax. Stanford, CA: CSLI Publications.
Gilligan, Gary M. 1987 A cross-linguistic approach to the pro-drop parameter. Ph. D. diss., University of Southern California.
Giorgi, Alessandra, and Fabio Pianesi 1997 Tense and Aspect: From Semantics to Morphosyntax. Oxford: Oxford University Press.
Greenberg, Joseph H. 1963 Some universals of language with special reference to the order of meaningful elements. In Universals of Language, Joseph Greenberg (ed.), 73–113. Cambridge, MA: MIT Press.
Guilfoyle, Eithne 1990 Functional categories and phrase structure parameters. Ph. D. diss., McGill University.
Haider, Hubert 1993 Principled variability: Parameterization without parameter fixing. In The Parametrization of Universal Grammar, Gisbert Fanselow (ed.), 1–16. Amsterdam: John Benjamins.
Harris, Tony, and Ken Wexler 1996 The optional-infinitive stage in Child English: Evidence from negation. In Generative Perspectives on Language Acquisition: Empirical Findings, Theoretical Considerations, and Crosslinguistic Comparisons, Harald Clahsen (ed.), 1–42. Amsterdam: John Benjamins.
Hawkins, John A. 1983 Word Order Universals. New York: Academic Press.
Hawkins, John A. 1994 A Performance Theory of Order and Constituency. Cambridge: Cambridge University Press.
Hawkins, John A. 2004 Efficiency and Complexity in Grammars. Oxford: Oxford University Press.

Huang, C.-T. James 1982 Logical relations in Chinese and the theory of grammar. Ph. D. diss., MIT.
Huang, C.-T. James 1984 On the distribution and reference of empty pronouns. Linguistic Inquiry 15: 531–574.
Hyams, Nina M. 1986 Language Acquisition and the Theory of Parameters. Dordrecht: Reidel.
Kayne, Richard S. 1984 Connectedness and Binary Branching. Dordrecht: Foris.
Kayne, Richard S. 2000 Parameters and Universals. Oxford: Oxford University Press.
Lakoff, George 1965/1970 Irregularity in Syntax. New York: Holt, Rinehart, and Winston.
Lasnik, Howard, and Mamoru Saito 1984 On the nature of proper government. Linguistic Inquiry 15: 235–290.
Longobardi, Giuseppe 2003 Methods in parametric linguistics and cognitive history. Linguistic Variation Yearbook 3: 101–138.
Manzini, M. Rita, and Kenneth Wexler 1987 Parameters, binding, and learning theory. Linguistic Inquiry 18: 413–444.
Mazuka, Reiko 1996 Can a grammatical parameter be set before the first word? Prosodic contributions to early setting of a grammatical parameter. In Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition, Jerry L. Morgan, and Katherine Demuth (eds.), 313–330. Mahwah, NJ: Erlbaum.

McCloskey, James 1997 Subjecthood and subject positions. In A Handbook of Theoretical Syntax, Liliane Haegeman (ed.), 197236. Dordrecht: Kluwer. McCloskey, James 2002 Resumption, successive cyclicity, and the locality of operations. In Derivation and Explanation, Samuel David Epstein, and Daniel Seeley (eds.), 184226. Oxford: Blackwell. Meisel, Jrgen, and N. Mller 1992 Finiteness and verb placement in early child grammars. In The Acquisition of Verb Placement: Functional Categories and V2 Phenomena

280

Frederick J. Newmeyer in Language Acquisition, Jrgen Meisel (ed.), 109138. Dordrecht: Kluwer.

Muysken, Pieter, and Paul Law 2001 Creole studies: A theoretical linguists eld guide. Glot International 5: 4757. Newmeyer, Frederick J. 2004 Against a parameter-setting approach to language variation. Linguistic Variation Yearbook 4: 181234. Newmeyer, Frederick J. 2005 Possible and Probable Languages: A Generative Perspective on Linguistic Typology. Oxford: Oxford University Press. Nishigauchi, Taisuke and Thomas Roeper 1987 Deductive parameters and the growth of empty categories. In Parameter Setting, Thomas Roeper, and Edwin Williams (eds.), 91121. Dordrecht: Reidel. Ouhalla, Jamal 1991a Ouhalla, Jamal 1991b Functional Categories and Parametric Variation. London: Routledge. Functional categories and the head parameter. Paper presented at The 14th GLOW Colloquium.

Pesetsky, David M., and Esther Torrego 2001 T-to-C movement: Causes and consequences. In Ken Hale: A Life in Language, Michael Kenstowicz (ed.), 355–426. Cambridge, MA: MIT Press.
Pica, Pierre 2001 Introduction. Linguistic Variation Yearbook 1: v–xii.
Pierce, Amy 1992 Language Acquisition and Syntactic Theory: A Comparative Analysis of French and English Child Grammars. Dordrecht: Kluwer.
Poletto, Cecilia 2000 The Higher Functional Field: Evidence from Northern Italian Dialects. Oxford: Oxford University Press.

Radford, Andrew 1994 Clausal projections in early child grammars. Essex Research Reports in Linguistics 3: 32–72.
Rizzi, Luigi 1978 Violations of the wh-island constraint in Italian and the subjacency condition. In Montreal Working Papers in Linguistics No 11, C. Dubuisson, David Lightfoot, and Y. C. Morin (eds.), 49–76. (Reprinted in Luigi Rizzi, Issues in Italian Syntax. Dordrecht: Foris, 1982.)
Rizzi, Luigi 1982 Issues in Italian Syntax. Dordrecht: Foris.
Rizzi, Luigi 1997 The fine structure of the left periphery. In Elements of Grammar: Handbook of Generative Syntax, Liliane Haegeman (ed.), 281–337. Dordrecht: Kluwer.

Roeper, Thomas, and Jill De Villiers 1994 Lexical links in the wh-chain. In Syntactic Theory and First Language Acquisition: Cross-linguistic Perspectives. Vol. 2: Binding, Dependencies, and Learnability, Barbara Lust, Gabriella Hermon, and Jaklin Kornfilt (eds.), 357–390. Hillsdale, NJ: Erlbaum.
Safir, Kenneth J. 1985 Syntactic Chains. Cambridge: Cambridge University Press.
Sapir, Edward 1921 Language. New York: Harcourt, Brace, and World.

Siewierska, Anna, and Dik Bakker 1996 The distribution of subject and object agreement and word order type. Studies in Language 20: 115–161.
Smith, Neil, and Annabel Cormack 2002 Parametric poverty. Glot International 6: 285–287.
Stromswold, Karin 1988 The acquisitional implications of Kayne's theory of prepositions. Unpublished ms., MIT.
Stromswold, Karin 1989 Using naturalistic data: Methodological and theoretical issues (or how to lie with naturalistic data). Paper presented at the 14th Annual Boston University Child Language Conference, October 13–15, 1989.
Stromswold, Karin 1990 Learnability and the acquisition of auxiliaries. Ph. D. diss., MIT.
Sugisaki, Koji, and William Snyder 2001 Preposition stranding and double objects in the acquisition of English. Proceedings of the Second Tokyo Conference on Psycholinguistics, 209–225.
Thráinsson, Höskuldur 1996 On the (non)-universality of functional categories. In Minimal Ideas: Syntactic Studies in the Minimalist Framework, Werner Abraham, Samuel David Epstein, Höskuldur Thráinsson, and C. Jan-Wouter Zwart (eds.), 253–281. Amsterdam: John Benjamins.


Travis, Lisa 1989 Parameters of phrase structure. In Alternative Conceptions of Phrase Structure, Mark R. Baltin, and Anthony S. Kroch (eds.), 263–279. Chicago: University of Chicago Press.

Valian, Virginia V. 1990 Logical and psychological constraints on the acquisition of syntax. In Language Processing and Language Acquisition, Lyn Frazier, and Jill De Villiers (eds.), 119–145. Dordrecht: Kluwer.
Valian, Virginia V. 1991 Syntactic subjects in the early speech of Italian and American children. Cognition 40: 21–81.
van Riemsdijk, Henk 1978 A Case Study in Syntactic Markedness: The Binding Nature of Prepositional Phrases. Dordrecht: Foris.
Veenstra, Tonjes 1996 Serial verbs in Saramaccan: Predication and creole genesis. Ph. D. diss., Leiden University.
Verrips, M., and Jürgen Weissenborn 1992 The acquisition of functional categories reconsidered. Ms.
Zanuttini, Raffaella 2001 Sentential negation. In The Handbook of Contemporary Syntactic Theory, Mark Baltin, and Chris Collins (eds.), 511–535. Oxford: Blackwell.

Remarks on three approaches to exceptionality in syntactic typology
Artemis Alexiadou

1. Introduction

Newmeyer (this volume) contrasts three approaches to handling exceptionality in syntactic typology: the macroparametric approach associated with Government-Binding Theory (GB); the microparametric approach associated with the Minimalist Program (MP); and an extrasyntactic approach, in which parsing and other performance principles account for typological variation and exceptions to typological generalizations. He argues in detail that the extrasyntactic approach is best motivated.

Newmeyer's paper proposes to change the general theoretical assumptions according to which certain phenomena are exceptional. The main intuition is that typological generalizations are not free of exceptions. This is unexpected under a parametric approach, while it is expected under a performance-based approach, since the domain of performance is less constrained. Since the status of grammatical patterns as exceptional or not is highly dependent on the theoretical framework one assumes, a change of assumptions leads to a different picture and analysis of the empirical data. But is this a necessary step in order to deal with exceptions?

Roberts and Holmberg (2005) have already raised several points of criticism against the parsing approach, among which the mere fact that it is difficult to evaluate. Since I completely agree with their points, in this brief commentary I would like to concentrate on two phenomena that Newmeyer discusses as providing evidence against both the macro-parameter approach and the micro-parameter approach. These are the null subject parameter and adjective placement crosslinguistically. I show that a careful examination of the data, and of the claims made by the authors working within the parameters model, does not necessarily lead to the conclusions Newmeyer drew.

2. Null subject parameter

Newmeyer's point is that the correlations proposed by Rizzi (1982) in connection with the null subject parameter have been shown not to hold. As Roberts and Holmberg (2005) point out, the version of the parameter adopted by Newmeyer is that the possibility of null thematic subjects in tensed clauses, null non-thematic subjects, free subject inversion and apparent that-trace effect violations were typologically connected. Naturally the strongest hypothesis is that any language must have all or none of these properties. Newmeyer cites Gilligan's (1987) study as showing that these tight correlations do not actually hold. He further mentions languages such as Brazilian Portuguese and Chinese, which lack subject inversion but are still considered to be null subject languages, as examples illustrating the falsity of the proposals made within the parameters framework.

There are two remarks to make here. First, what needs to be stressed as far as Gilligan's study is concerned is that it did not show that no correlation is possible; it only showed that a different arrangement exists than perhaps the one initially assumed. The fact that the correlations go a different way does not falsify Rizzi's claims; instead, it supports them. Second, while it is true that not all pro-drop languages have identical properties, it is a bit puzzling to mention as counterexamples to the generalization two languages for which it can be established that they are non-pro-drop. That Brazilian Portuguese is not a pro-drop language can be seen in (2), where the presence of an overt subject is necessary, although it has already been mentioned in the discourse; see Britto (2000), where the data come from. Data such as (2) are not found in languages like Spanish or Greek, which are characterized as null subject languages.

(1) O João vai trazer a salada?
    João will bring the salad
(2) O João, O VINHO *pro/ele vai trazer
    João the wine will bring

If Brazilian Portuguese is not a pro-drop language, then the fact that it lacks subject inversion is not so surprising. As for Chinese, it is debatable whether such languages are subject drop or rather topic drop; if the latter holds, they should not be analysed on a par with languages such as Spanish or Italian. In fact, Huang (1984) argued that null subjects in Chinese are identified by an NP in a superordinate clause, while others have argued that Chinese pro-drop is actually Topic NP deletion. Hence here again the fact that this language lacks inversion is not surprising.

3. Adjective placement crosslinguistically

One other concrete case that Newmeyer discusses in order to present arguments against microparametric approaches involves adjective placement facts of the type discussed in Cinque (1994). The data below illustrate some differences between French and English:

(3) a. un gros ballon rouge
    b. a big red ball
(4) a. un tissu anglais cher
    b. an expensive English fabric
(5) a. an old friend (= friend who is aged/friend for a long time)
    b. une vieille amie (= friend for a long time)
    c. une amie vieille (= friend who is aged)

Newmeyer briefly summarizes Cinque's original proposal. The pattern in (3) is to be understood as resulting from the parametric availability of N-movement: N-movement takes place in French, but not in English. The facts in (4) can be made sense of if N-movement takes place in French but not in English. In (5) we see that both pre-nominal and post-nominal adjective placement is available in French. Newmeyer says that for Cinque the two positions for vieille in French, but only one for old in English, result from a parametric difference between the two languages regarding the feature attraction possibilities of functional categories.

To begin with, the English facts in (5a) are discussed in Larson (1998), who makes the point that the two readings cannot be the result of the same base structure. Larson argues in detail that in examples like this it is the properties of the N that give rise to ambiguity. In particular, when a noun contains an event argument, the adjective can be conceived of as modifying either the event or the individual to which a certain property is being attributed. Others have made the point that Cinque's particular analysis of this set of data is not efficient. Apart from the fact that Cinque himself (2005) has revised his analysis, other authors have emphasized that the pattern in (5) cannot be made sense of by appealing to N-movement. Rather, what is required is to associate patterns such as the above with two different syntactic patterns for modification (see Alexiadou 2001 and Larson 2004). On this view, a version of which is pursued by Cinque in his most recent work, UG makes two structures available for modification, and different options are available in the different languages. The one structure involves a relative clause (building on Kayne 1994 and Jacobs and Rosenbaum 1968), giving rise to N-Adj orders in languages like Romance, and the other structure involves so called direct modification, i.e. some form of A-N compound formation.

While it is correct that Cinque's original account does not explain why N-A orders are more common, two observations are in order here. First, Cinque's account did not aim at explaining this. Second, if N-A orders are the result of relative clause formation, we come to a different understanding of the typological tendencies. Since most languages of the world lack adjectives and make use of relative clauses for modification, it does not come as a surprise that exactly the pattern that is related to relative clause formation is the most common crosslinguistically. Thus as soon as we have identified the relevant structures that are available for different readings, we can get a better grasp of the phenomena involved and the cause for variation.

To conclude, what the research within the parameters approach has taught us is that there is systematic variation and systematic similarity among unrelated languages. It is precisely this mere fact that provides the strongest argument possible in favor of UG; assuming language specific rules or extrasyntactic approaches would leave this factor unaccounted for or a mere accident of nature. Exceptions, to the extent that they can be identified, can be shown to follow from some other, perhaps yet undetected correlation. Thus a change of framework does not seem a necessary step in order to deal with exceptions.

References
Alexiadou, Artemis 2001 Adjective syntax and noun raising: word order asymmetries in the DP as the result of adjective distribution. Studia Linguistica 55: 217–248.
Britto, Helena 2000 Syntactic codification of categorical and thetic judgements in Brazilian Portuguese. In Brazilian Portuguese and the Null Subject Parameter, Mary Kato and E. Negrão (eds.), Frankfurt/Madrid: Vervuert/IberoAmericana.

Cinque, Guglielmo 1994 On the evidence for partial N movement in the Romance DP. In Paths towards Universal Grammar, Guglielmo Cinque, Jan Koster, Jean-Yves Pollock, Luigi Rizzi and Raffaella Zanuttini (eds.), 85–110. Washington: Georgetown University Press.
Cinque, Guglielmo 2005 The dual source of adjectives and XP vs. N-raising in the Romance DP. LSA 2005 class notes.
Gilligan, Gary M. 1987 A cross-linguistic approach to the pro-drop parameter. Ph. D. dissertation. University of Southern California.
Huang, C.T. James 1984 On the distribution and reference of empty pronouns. Linguistic Inquiry 15: 531–574.
Jacobs, Roderick, and Peter Rosenbaum 1968 English Transformational Grammar. Waltham, MA: Ginn and Company.
Kayne, Richard 1994 The Antisymmetry of Syntax. Cambridge, MA: MIT Press.

Larson, Richard 1998 Events and modification in nominals. In Proceedings from Semantics and Linguistic Theory (SALT) VIII, Devon Strolovitch and Aaron Lawson (eds.). Ithaca, NY: Cornell University.
Larson, Richard 2004 The projection of DP. Talk given in the Guest lecture series at the University of Stuttgart, January 2004.
Rizzi, Luigi 1982 Issues in Italian Syntax. Dordrecht: Foris.

Roberts, Ian, and Anders Holmberg 2005 On the role of parameters in Universal Grammar: a reply to Newmeyer. In Organizing Grammar. Linguistic studies in honor of Henk van Riemsdijk. Hans Broekhuis, Norbert Corver, Riny Huybregts, Ursula Kleinhenz and Jan Koster (eds.), 538–553. Berlin/New York: Mouton de Gruyter.

A reply to the commentary by Artemis Alexiadou
Frederick J. Newmeyer

My principal goal in "Three Approaches to Exceptionality in Syntactic Typology" (henceforth TAEST) was to motivate a parsing account of typological generalizations and exceptions to them. Surprisingly, Artemis Alexiadou (henceforth AA) has virtually nothing to offer by way of criticism of such an account, remarking that "Roberts and Holmberg 2005 have already raised several points of criticism against the parsing approach, among which the mere fact that it is difficult to evaluate" (AA). But AA is mistaken; Roberts and Holmberg offer no criticism at all against such an approach. Their comments, like AA's, are devoted entirely to a defense of the parametric approach. They have nothing to say about parsing.¹

AA replies to two of the arguments that I advanced in TAEST against the parametric approach to typological generalizations. One of my points was to stress, using the Null-Subject Parameter as an example, that the hoped-for clustering of typological properties characterizable by a simple parameter setting seems not to exist. However, AA, in her reply, does not rebut my point by providing a version of this parameter (or any other) where we find a robust example of clustering of abstract properties. The reader is left wondering whether she even knows of such examples.

My other argument addressed by AA was to point out that in much minimalist work the word "parameter" is used as nothing more than a synonym for the word "rule". My concrete example (following Bouchard 2003) was based on the treatment of adjective placement in Cinque (1994). AA challenges Cinque's analysis, but does not address the broader point about parameters and rules.

In the course of her presentation, AA hypothesizes that N-A orders are the result of relative clause formation, remarking that "since most languages of the world lack adjectives and make use of relative clauses for modification, it does not come as a surprise that exactly the pattern that is related to relative clause formation is the most common crosslinguistically". On the other hand, A-N orders involve "so called direct modification, i.e. some form of A-N compound formation". Even if AA's facts were correct, the conclusion would be a non sequitur. One could argue just as easily that N-A orders should be rare, since there is no semantic or discourse need for such orders, given the availability of the relative clause option. And A-N orders should be common, since, being quite different in meaning from A-N compounds, they fill a semantic gap. But in any event, AA's facts are wrong. In his introductory overview to the most extensive work on adjectives to date, Dixon (2004) argues that a formal class of Adjective can be identified in every language in the world.

AA concludes her piece by remarking that "extrasyntactic approaches would leave [systematic variation and systematic similarity among unrelated languages] unaccounted for or a mere accident of nature". But the entirety of section 4 of TAEST (which AA ignores) is devoted to demonstrating that an extrasyntactic approach accounts beautifully for systematic variation and systematic similarity. Parsing principles are universal, so it follows necessarily that languages that are unrelated genetically and/or areally would respond to such principles identically, as far as typological generalizations are concerned.

1. For a reply to Roberts and Holmberg see Newmeyer (2005).

References
Bouchard, Denis 2003 The origins of language variation. Linguistic Variation Yearbook 3: 1–41.
Cinque, Guglielmo 1994 On the evidence for partial N movement in the Romance DP. In Paths towards Universal Grammar, Guglielmo Cinque, Jan Koster, Jean-Yves Pollock, Luigi Rizzi and Raffaella Zanuttini (eds.), 85–110. Washington: Georgetown University Press.
Dixon, R.M.W. 2004 Adjective classes in typological perspective. In Adjective Classes: A Cross-Linguistic Typology, R.M.W. Dixon and Alexandra Y. Aikhenvald (eds.), 1–49. Oxford: Oxford University Press.

Newmeyer, Frederick J. 2005 Newmeyer's rejoinder to Roberts and Holmberg on parameters. [http://ling.auf.net/lingBuzz/000248]
Roberts, Ian and Anders Holmberg 2005 On the role of parameters in Universal Grammar: A reply to Newmeyer. In Organizing Grammar: Linguistic Studies in Honor of Henk van Riemsdijk, Hans Broekhuis, Norbert Corver, Riny Huybregts, Ursula Kleinhenz and Jan Koster (eds.). Berlin/New York: Mouton de Gruyter.

Three types of exceptions and all of them rule-based
Sam Featherston

Abstract. A basic premise of this paper is that a simpler grammar is a more adequate one, and that exceptions are thus undesirable. We present studies concerning three different grammatical structures which contain phenomena standardly regarded as exceptions, and show how, in all three cases, the attribution of exceptional status was unnecessary. In each case, the collection of better data and the explanatory advantages of, firstly, a model of gradient grammaticality and, secondly, the distinction between the effects of the grammar and the effects of production processing reveal the phenomenon to be rule-governed.

1. Introduction

For a model of generative grammar, exceptions are anathema. The overriding aim of the generative project is to attain explanatory adequacy, specifically, to account for the fact that most three-year-olds exhibit more grasp of the language system than linguists have been able to gain in decades of research effort. The standard account of this is to assume that the acquisition task must be much simpler than the task of description facing the linguist. The research programme thus consists of the ambition to design or discover a grammatical system so simple that it can realistically either be acquired by a toddler or else be part of the human genetic inheritance. This simplicity criterion forces the generative linguist to assume that the basis of linguistic patterning is a rule system, which must operate blindly and indiscriminately, the apparent complexity of language springing from the interactions of these wider generalizations.

* This work took place within the project Suboptimal Syntactic Structures of the SFB441, supported by the Deutsche Forschungsgemeinschaft. Thanks are due to project leader Wolfgang Sternefeld, Tanja Kiziak and to Frank Keller for WebExp. Thanks too to Horst Simon and Heike Wiese for comments and arranging the workshop. All remaining weaknesses are my own.


Exceptions are the deadly enemy of simplicity, since they are by definition not rule-governed, and must be memorized or processed individually, which complicates both acquisition and use. For this reason, linguistic phenomena which offer apparent exceptions must be regarded as problem cases: the ideal grammar should be exceptionless. Unfortunately for generative grammar, linguistic data is peppered with exceptions, and most grammars can only deal with generalizations about the observed data, rather than with the raw data itself. In order to address this, linguistic theory has tended to work on idealized data, which does not show so many exceptions. There is nevertheless significant interest in dealing with this manifest problem, as the papers in this volume demonstrate. There is good reason for this interest, since alternative, non-generative analyses are breathing down our necks. For as soon as the weight of learning exceptions simply by exposure reaches a certain point, the case for rules breaks down. If we can learn so much by simple exposure and frequency, then we can do without the additional mechanism of rules, since regularities can be seen as mere epiphenomena of local probabilities, the argument runs (e.g. Bybee and Hopper 2001).

In this paper we shall attempt to show that many apparent exceptions are in fact rule-governed phenomena. We do this by addressing three different sorts of exceptions, and set out how, in our example phenomena, characteristics of the data and the assumptions of linguistic theory are obscuring regularities. Our conclusion will be that at least these parts of the primary linguistic data are more exceptionless and rule-governed than they appear. Our findings can thus be seen as supporting the generative approach, as the problem of exceptions is less severe than is often thought. There are three keys to this identification of wider, unrecognized generalizations.
First we must pay far more attention to the data base on which we construct our theories, taking both judgements of well-formedness and corpus-derived frequencies into account, and looking at both in detail. This approach makes it clear that far more factors play a role in influencing the perceived well-formedness of a structure than is generally assumed, and multiple factors can be causing apparent differences between even a minimal pair of structures. With this insight, certain phenomena thought to be showing exceptional behaviour can be seen to be rule-governed, but the basket of factors affecting the phenomenon is larger than had been thought.

The next step is motivated by findings about the contrast of frequency and judgement data: structures can appear although they are relatively poorly formed, or not occur even though they are relatively well-formed. From this finding, we conclude that an empirically adequate architecture of grammar requires us to distinguish two separate modules of the grammar: Constraint Application and Output Selection. The first is responsible for the determination of well-formedness; the second selects structures for output, operating competitively on the basis of well-formedness weightings. When we have differentiated these, we see that the exceptional occurrence of some structures generally categorized as ungrammatical becomes explicable and compatible with a blind and exceptionless rule system.

These steps additionally permit us to rethink our concept of well-formedness and consider more carefully what happens when a structure breaks a rule. It appears that such violations do not directly exclude the structure from being part of the language, but merely reduce the well-formedness of the structure. This requires us to accept that there is a variable of violation cost, which can vary over constraints. Every violation cost naturally leads to the structure being less likely to occur, but the link between violation and non-occurrence is not direct, but rather mediated by cumulative, gradient well-formedness. It follows also that the rules we find in the grammar are less like (1); rather, they resemble (2).

(1) Structure XYZ does not occur in / is no part of language L.
(2) Structure XYZ in language L incurs the violation cost in well-formedness V.
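The division of labour between the two modules can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the constraint names, the numerical violation costs, the candidate forms, and the choice of a simple "best competitor wins" rule for Output Selection are hypothetical and are not taken from the studies reported here.

```python
from dataclasses import dataclass, field

# Hypothetical violation costs: one cost per constraint, varying over constraints.
VIOLATION_COSTS = {"constraint_A": 1.0, "constraint_B": 0.5, "constraint_C": 2.5}


@dataclass
class Candidate:
    form: str
    violations: list = field(default_factory=list)


def well_formedness(candidate, costs=VIOLATION_COSTS):
    """Constraint Application: violations are cumulative and gradient.

    Each violation subtracts its cost; no violation excludes the structure outright.
    """
    return -sum(costs[name] for name in candidate.violations)


def select_output(candidates, costs=VIOLATION_COSTS):
    """Output Selection: competition among alternatives.

    The candidate with the highest well-formedness is output,
    whatever its absolute score.
    """
    return max(candidates, key=lambda c: well_formedness(c, costs))


# Two hypothetical competitors for the same content.
competitors = [
    Candidate("variant 1", ["constraint_A", "constraint_B"]),  # score -1.5
    Candidate("variant 2", ["constraint_C"]),                  # score -2.5
]
winner = select_output(competitors)
print(winner.form, well_formedness(winner))
```

In this toy setting "variant 1" is selected although it violates two constraints, because its only competitor is worse. This mirrors the point that a relatively poorly formed structure can still occur, without any rule having to list it as an exception.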

These changes in the architecture of the grammar prove their value by allowing us to account for phenomena which on current assumptions about the architecture of the grammar appear to be exceptions. The net effect of these example studies is to show that the use of inadequate data and erroneous assumptions about the architecture of linguistic theory are throwing up ghost exceptions, which in reality are none. We aim to show that the grammar contains fewer exceptions and that grammatical rule systems have greater coverage than is at first glance apparent.

1.1. Experimentally obtained judgements

A crucial factor in our argumentation will be the collection of more detailed information about the perceived well-formedness of key examples and example sets. To allow us to concentrate on the data at hand in the individual studies below, we shall briefly outline our data collection method here. We gather this judgement data using a variant of the magnitude estimation methodology (Bard et al. 1996). This is a procedure for obtaining judgements from naive informants with the greatest possible degree of differentiation and reliability. It varies from the simple elicitation of standard categorical judgements ("Is this grammatical or not?") in several ways. First, subjects are asked to provide purely relative judgements: at no point is an absolute criterion of grammaticality applied. Judgements are relative to a reference example and the informant's own previous judgements. Second, all judgements are proportional; i.e. subjects are asked to state how much better or worse sentence A is than sentence B. Next, the scale along which judgements are made is open-ended: subjects can always add an additional higher or lower score. Last, the scale has no minimum division: participants can always place a score between two previous ratings. The task thus has the form "You gave this example 20 and that example 30, so how much would you give this one?" The result is that subjects are able to produce judgements which distinguish all the differences in well-formedness they perceive with no interference from an imposed scale. This approach produces judgement data of much higher definition and quality than traditional techniques and permits much greater insights into the factors which affect perceived well-formedness (e.g. Cowart 1997, Keller 2000).

We shall argue that many apparent exceptions in the grammar are only epiphenomena of inadequate data, a problem compounded by the inappropriate idealization of data and the insufficiently articulated model of the grammar which has resulted from this data poverty. The location of the first exception type is within the grammar, and it consists of a phenomenon which appears to satisfy the structural description for the application of the Binding Conditions, but where they seem nevertheless not to apply. The second exception type is a language. What shall we make of it when apparent cross-linguistic generalizations seem not to apply to a particular language? Can a language be an exception? Our third and last exception type is that of the structure which appears in language output even though there are well-recognized restrictions which should forbid it. Briefly, why do ungrammatical structures occur? What mechanism permits exceptional occurrence?
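Because the magnitude estimation scale described in this section is open-ended, each informant ends up using a numerical range of their own, so raw scores cannot be pooled directly. The sketch below shows one common way of putting such scores on a shared scale, namely converting each informant's scores to z-scores before averaging per condition. This is an assumption made for illustration, not a description of the procedure actually used in the studies; the subject labels, condition labels, and scores are invented.

```python
from statistics import mean, stdev


def normalize_by_subject(ratings):
    """Convert raw scores to z-scores within each subject.

    ratings maps subject -> {condition: raw score}. After normalization,
    zero corresponds to that subject's own mean judgement.
    """
    normalized = {}
    for subject, scores in ratings.items():
        values = list(scores.values())
        centre = mean(values)
        spread = stdev(values)  # sample standard deviation; needs >= 2 scores
        normalized[subject] = {
            condition: (score - centre) / spread
            for condition, score in scores.items()
        }
    return normalized


def condition_means(normalized):
    """Average the normalized scores of all subjects for each condition."""
    conditions = list(next(iter(normalized.values())).keys())
    return {
        condition: mean(subject[condition] for subject in normalized.values())
        for condition in conditions
    }


# Invented data: two subjects who use very different numerical ranges
# but express the same relative judgements.
raw = {
    "subject_1": {"A": 20.0, "B": 30.0, "C": 10.0},
    "subject_2": {"A": 100.0, "B": 150.0, "C": 50.0},
}
z_scores = normalize_by_subject(raw)
means = condition_means(z_scores)
print(means)
```

With these invented numbers both subjects yield the same normalized pattern, so the difference in their raw ranges no longer affects the condition means.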

2. Exception type one: incomplete generalization

Our first case study concerns anaphoric binding in object coreference constructions in German, a set of phenomena characterized as a problem for generative grammar (Grewendorf 1985, cf. the related dilemma for transformational grammar, Reis 1976). This phenomenon is exceptional because the simple Binding Conditions (Chomsky 1981) seem not to work in these cases. This is therefore an exception which consists of an apparent limit to the generality of application of a rule within the grammar.

The third-person reflexive sich in German can have either dative or accusative case. It regularly occurs when the subject and a subsequent other clause-mate NP are coreferent, in line with Binding Condition A. When two non-subject NPs are coreferent, however, the facts become much less clear. It might be expected that the relationship between direct and indirect objects would be either symmetrical, so that each of the two would systematically c-command, and be able to bind, the other, or else asymmetrical, so that only one would be able to bind the other. This issue is potentially of great interest, because of the insight that it offers into the relative hierarchical positions of these constituents and thus the structure of the clause. Unfortunately the data offers a much less clear picture than might be hoped, and coreference structures in German which do not involve a subject all seem rather marked.

Authors have suggested a number of ways in which the Binding Conditions might be improved in order to account for the data. Grewendorf (1985, cf. Primus 1987) attributes the restrictions on the binding of reflexives by objects to a hierarchy of grammatical functions, arguing that the binder of a pair must always be higher up in the hierarchy than the bindee. Since most binders are subjects and the most oblique grammatical functions are never binders, this works well for the vast majority of the time, but it also predicts that direct objects and indirect objects should clearly either bind or fail to bind each other (depending on where these functions are located in the hierarchy). Grewendorf (1988) argues that this is the case, and offers the following judgements.

(3) Der Arzt zeigte den Patienten_i sich_i/*ihm_i im Spiegel.
    the doctor showed the patient.acc REFL/PRN.dat in.the mirror
(4) Der Arzt zeigte dem Patienten_i *sich_i/ihn_i im Spiegel.
    the doctor showed the patient.dat REFL/PRN.acc in.the mirror

Example (3) indicates that a dative reflexive may be bound by an accusative binder, but a dative pronominal cannot be. Example (4) shows that an accusative reflexive cannot be bound by a dative antecedent, but that an accusative pronominal can. This account is superficially attractive since it links in to the noun phrase accessibility hierarchy, which has been advanced for other purposes (Keenan and Comrie 1977). In addition, it captures the three-way split between subject, objects and obliques quite well.

However, other authors have analysed these structures quite differently: Sternefeld for example contests some of the judgements (Sternefeld and Featherston 2003), while Reinhart and Reuland (Reinhart and Reuland 1993; Reuland and Reinhart 1995) offer a completely different account of reflexivity. They distinguish between simplex SE-type and complex SELF-type anaphors, and argue that the former are in fact pronouns, not reflexives, but that they can occur as co-arguments where pronouns cannot because they are underspecified for phi-features, which makes them non-referential and thus feasible feet of chains. Both of these types appear as sich in German, though they have different forms cross-linguistically. The disjoint distribution of SE- and SELF-type anaphors is achieved by a rewriting of Binding Condition B, which stipulates that a semantically reflexive predicate must be reflexive-marked. Predicates which are inherently reflexive are lexically reflexive-marked (5); other predicates must be reflexive-marked by having a SELF-anaphor as an argument (6).

(5) Max benimmt/schämt sich
    Max behaves/shames SE-REFL
    'Max behaves himself/is ashamed'
(6) Max hasst/liebt sich
    Max hates/loves SELF-REFL

One final puzzling feature of these constructions is noted by Elena Anagnostopoulou (pc). It seems that pronominals are better than full NPs as antecedents in these structures, so that (7) is less acceptable than (8):

(7) ?Die Friseurin zeigte dem Kunden_i sich_i im Spiegel.
    the hairdresser showed the customer himself in.the mirror

(8) Die Friseurin zeigte ihm_i sich_i im Spiegel.
    the hairdresser showed him himself in.the mirror

Now the NP type of the antecedent is not generally thought to play a role in binding structures, which implies that we have not yet attained a full understanding of the issues. In the light of this confusing situation in which the standard Binding Conditions appear not to hold, Featherston and Sternefeld (2003) carried out an experimental study eliciting judgements on the relevant structures to obtain a clearer view of what is happening.

2.1. Investigating object coreference in German

In this study we used a variant of the magnitude estimation methodology as described above. We tested sixteen conditions, of which we shall report just eight here for illustrative purposes (see Featherston and Sternefeld 2003 for full details). These eight structures varied on three binary parameters: antecedent NP type (pronoun, full NP), relative linear order of dative and accusative case

Three types of exceptions and all of them rule-based


Table 1. Eight syntactic conditions tested in our first experiment on object coreference. The sense is always the same. Note that ">" means "linearly precedes".

Code  Antecedent  Case order  Anaphor    Structure form
ndr   NP          dat>acc     reflexive  dem NP sich selbst gezeigt
ndp   NP          dat>acc     pronoun    dem NP ihn selbst gezeigt
nar   NP          acc>dat     reflexive  den NP sich selbst gezeigt
nap   NP          acc>dat     pronoun    den NP ihm selbst gezeigt
pdr   pronoun     dat>acc     reflexive  ihm sich selbst gezeigt
pdp   pronoun     dat>acc     pronoun    ihm ihn selbst gezeigt
par   pronoun     acc>dat     reflexive  ihn sich selbst gezeigt
pap   pronoun     acc>dat     pronoun    ihn ihm selbst gezeigt

(dat>acc, acc>dat) and anaphor type (reflexive, pronoun). We present the forms of these conditions in Table 1.¹ The high-quality data we collected under strictly controlled conditions allows us to distinguish the various factors affecting the judgements of these structures. We present (a subset of) the results in Figure 1, which shows the mean normalized grammaticality judgement score and 95 % confidence interval for each experimental condition. The error bars show the confidence intervals, their midpoints show the mean values. The syntactic conditions are arranged along the horizontal axis. Let us briefly recall the form which this data type takes. In our experiment we collected judgements of naturalness, expressed in numerical form, and anchored by reference to other judgements, but with no reference to a concept of absolute (un)grammaticality. So results graphs such as Figure 1 show us how good or bad subjects judged example structures to be, with higher numerical judgements indicating that a structure is more natural (up on the graph). The zero on the scale shows the mean of all judgements. The syntactic conditions on the horizontal axis are identified by the codes shown in Table 1. Looking at the graph we can clearly see three effects at work. All effects that we mention here are statistically significant (see Featherston and Sternefeld 2003). The first effect relates to the NP type of the antecedent (full NP vs pronoun). The four conditions on the right-hand side of the chart (whose codes begin with a p) have pronouns as antecedents. They are judged better than those
1. The codes for the conditions are made up as follows. The first letter indicates the antecedent type (NP or pronoun), the second letter shows the case of the antecedent (dative, accusative) and the third letter specifies the anaphor type (reflexive, pronoun).


Figure 1. Results of experiment on object coreference, showing mean normalized judgement scores by syntactic variant of the object coreference structures. Higher scores indicate judgements that the structure is more natural, but since these are relative judgements, the precise scores refer only to the group means.

on the left-hand side which have full lexical NPs as antecedents (and thus have codes beginning with n). The reason for this is simple and it has nothing to do with binding. Since an antecedent normally linearly precedes an anaphor (in this experiment they always do), and an anaphor is necessarily a pro-form (i.e. not a full NP), it thus follows that when the antecedent is a full NP, then a word order restriction is violated, namely, that light (short) sister-like constituents linearly precede heavier (longer) ones, for whatever reason (Behaghel 1909; Lenerz 1977). The finding is therefore real but irrelevant to conditions on binding, since it relates to linear ordering preferences.

The second visible effect is a case order effect. Of each minimal pair of structures, that with dative antecedent and accusative anaphor is judged better than that with accusative antecedent and dative anaphor. To see this we compare the first and second conditions from the left (with second letter d in their codes) with the third and fourth (with second letter a), and similarly the fifth and sixth with the seventh and eighth. This preference has a robust effect in the data, but we should be clear that it too has probably nothing to do with binding, since a preference for datives to linearly precede accusatives can be found in structures without object coreference (Behaghel 1932; Lenerz 1977; Uszkoreit 1986), as our own studies have replicated in this data type. The fact that exactly those examples in which the antecedent is dative and the anaphor is accusative are judged better is thus merely an epiphenomenon, the interaction of two factors:


first, that antecedents more naturally linearly precede anaphors, and second, that datives more naturally linearly precede accusatives, in the default case. Here too therefore we find no specific effect of binding or coreference.

The third effect we see in this data is that of the anaphor type (reflexive vs pronoun). In each of the four adjacent minimal pairs of conditions in the chart, the first, with a reflexive as anaphoric element (and with third letter r in its code), is judged clearly better than the second of the pair, which has a pronoun (and third letter p in its code). This is exactly what Binding Condition B would predict, since there is an accessible binder. In fact the pronouns are rather better than one might have predicted, but this is perhaps because these anaphoric elements are all followed by an adverbial selbst ('self') (see forms in Table 1) which improves the reflexivizability of the pronoun (how this works is controversial, e.g. Primus 1992).

Let us sum up. In the experiment which we present a part of here, we showed that there are a number of factors affecting the apparent well-formedness of object coreference structures, but none of them relate specifically to binding except the known Binding Conditions A and B, which are fully operative here. There is thus no reason to see this data set as being in any way an exception to binding theory. Our experimentally obtained judgements demonstrate that the standard binding constraints apply here, but other irrelevant but nevertheless systematic constraints operating cumulatively (as Keller 2000 so clearly demonstrates) are confusing the picture. There is in this data therefore no problem of generative grammar, no dilemma and no exception. The simple picture was being obscured by the large number of additional factors affecting these structures.
Improved data collection techniques, making use of the great strides forward which have been achieved in the gathering and analysis of judgements, can reveal the predictions of generative theory to be validated. This situation, in which irrelevant factors are fogging the wider picture, is much more common than is generally realised. In particular, many syntacticians still have failed to integrate into their perception of judgement data the finding that grammatical (and other) constraints operate cumulatively: the difference between two structures is often not just the effect of one constraint but of several of them additively. Irrelevant factors can tip a structure into apparent ungrammaticality which would without these noise factors be good (cf. Sternefeld 2000). Failure to recognize these facts can result in exceptions being mistakenly identified. Syntacticians' most effective weapon against being misled into identifying exceptions is improved data with much finer differentiation, and the testing of sets of materials. This approach allows us to distinguish between the effects of the phenomenon we are interested in and other irrelevant factors.
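The cumulative reading of the object coreference results can be made concrete with a small sketch. It assumes, purely for illustration, that each of the three factors discussed above contributes a fixed penalty which simply adds to the others. The penalty values and the names `PENALTY` and `predicted_score` are hypothetical and are not the experimental estimates; only the condition codes come from Table 1.

```python
from itertools import product

# Hypothetical penalties on a normalized judgement scale (illustrative only,
# not the values measured in the experiment).
PENALTY = {
    "full_np_antecedent": 0.4,  # heavy antecedent precedes a light pro-form
    "acc_before_dat": 0.3,      # accusative linearly precedes dative
    "pronoun_anaphor": 0.6,     # pronoun with an accessible binder
}


def predicted_score(code: str) -> float:
    """Sum the penalties incurred by a Table 1 condition code such as 'ndr'."""
    antecedent, case, anaphor = code
    score = 0.0
    if antecedent == "n":
        score -= PENALTY["full_np_antecedent"]
    if case == "a":
        score -= PENALTY["acc_before_dat"]
    if anaphor == "p":
        score -= PENALTY["pronoun_anaphor"]
    return score


# Build the eight codes in the same order as Table 1: ndr, ndp, nar, nap, ...
codes = ["".join(letters) for letters in product("np", "da", "rp")]
scores = {code: predicted_score(code) for code in codes}

for code in codes:
    print(code, round(scores[code], 2))
```

Under these assumed values the condition violating all three preferences (nap) comes out worst and the one violating none (pdr) comes out best, although only one of the three penalties has anything to do with binding.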

3. Exception type two: cross-linguistic variation

Our second study relates to a difference between languages. While languages such as English exhibit island constraints, in others, such as German, they are not so apparent. This finding raises the question whether German should be seen as an exception, because it does not conform to the expectations which the analysis of restrictions like the Empty Category Principle (ECP) as universals gives rise to. The ECP is an account of such phenomena as subject-object asymmetries, operating at an abstract level. As such it can and perhaps must be hypothesized to be universal. But this means that German, in which these effects do not seem to appear, would have to be regarded as an exceptional language (cf. Haider 1993, preface). This needs to be explained within the system of universals, and in fact it is precisely this sort of finding which motivated the inclusion of parameters in the Principles and Parameters model (Chomsky 1981). Parameters are the escape hatch which permits inter-language exceptions to be accounted for: not the effects of the ECP, but the option of having the ECP is universal, on this analysis. This approach to cross-linguistic variation, although tenable, must be recognized to be weaker than a position in which restrictions apply across languages without exception. We shall argue here that German is not an exceptional language, and more generally sketch out why we think that the whole treatment of inter-language exceptional behaviour using the mechanism of parameters is unnecessary. To do this we shall draw on data from our studies of the whole group of phenomena gathered under the heading of constraints in the sense of Chomsky (1973) and Ross (1967), that is, those limitations upon structure which are insufficiently general to be regarded as rules. We shall note that such constraints, frequently island constraints, are the archetypal structural exceptions in the grammar: such constraints are postulated for exactly those effects which are not otherwise predicted.
The use of such mechanisms presupposes some form of an overgenerate-and-filter grammar architecture, which is itself uneconomical, since it requires two component parts to the grammar with quite different functions and characteristics.

3.1. Superiority in German and English

In English, while multiple wh-questions are generally possible, it has been noted that certain wh-items cannot be moved to the initial position when certain others are in situ (e.g. Chomsky 1973). Most generally it can be stated that in-situ wh-subjects are not possible when other wh-items are in raised position. So while (9a) is a perfect sentence of English, (9b) would not normally occur at all. The


precise motor of this effect is to this day unclear (see Ginzburg and Sag 2000 for thoughtful discussion), but we can distinguish two groups of grammatical accounts. Chomsky (e.g. 1993) has suggested that it is an economy effect, and that structurally more distant wh-items cannot move to satisfy feature requirements when a closer one could do so as well. The alternative is the ECP (e.g. Lasnik and Saito 1984), which has accounted for this and other asymmetries between subjects, objects, and adjuncts with restrictions on which positions can be lexically and antecedent governed.

(9) a. Who brought what to the party?
    b. *What did who bring to the party?

The situation in German is different. The consensus position has been that the equivalent effect does not occur in German, for instance: "German lacks the set of simple ECP effects like superiority, *[that-t]-effects" (Lutz 1996: 35). This is illustrated in the examples below, where (10a) is the most normal form but (10b), unlike (9b), can be fairly readily found.

(10) a. Wer hat was zur Party gebracht?
        who has what to.the party brought
     b. Was hat wer zur Party gebracht?
        what has who to.the party brought

To test for superiority, we applied our magnitude estimation methodology to twenty-six different multiple wh-question structures, hoping to establish whether German has such an effect, and if so, which combinations of grammatical functions as wh-items would trigger it (for detail: Featherston 2005). The twenty-six multiple wh-questions consisted of every pair of wh-subject wer (who), wh-direct object was (what), d-linked wh-direct object welches X (which X), wh-indirect object wem (to whom), d-linked wh-indirect object welchem X (to which X), and wh-adjunct wann (when). We present the relevant part of the results in Figure 2. This graph shows the same data type and uses the same conventions as Figure 1 above; mean normalized judgements are measured on the vertical scale, syntactic conditions are distinguished on the horizontal scale. In this chart we have grouped the conditions by the in-situ wh-item. It is clear that the group of conditions represented by the left-most error bar, which is all those conditions with in-situ wh-subjects, is judged worse than all the others. There is a degree of variation between the other (groups of) conditions, but this is due to independent effects which need not concern us here (for full details see Featherston 2005). Since the presence in-situ of a wh-subject when another wh-item is in


Figure 2. Results of experiment on superiority in German, showing mean normalized judgement scores by in-situ wh-item. Conditions with in-situ subjects are judged significantly worse than all others. Abbreviations: wh-DO means bare direct object wh-item, wx-IO means which X-type indirect object wh-item etc.

initial position is the structural description of superiority, and the perceived unacceptability of such examples is the characteristic symptom of the superiority effect, it would appear that we observe a superiority effect in German too. In order to confirm that what we found in German was indeed the same effect as the phenomenon familiar from English, we repeated the experiment on English data. Again we tested 26 conditions, amongst which were all possible combinations of wh-subject who, d-linked wh-subject which (person), wh-direct object what, d-linked wh-direct object which (thing), wh-indirect object (to) who(m), and d-linked wh-indirect object (to) which (person). We present the results by in-situ wh-item as before. Figure 3 shows the judgements of multiple wh-questions in English, parallel to the findings on German in Figure 2. Again one group of structures is judged clearly worse than all the others and again it is precisely those structures which have an in-situ bare wh-subject, as the superiority phenomenon describes. The very close correspondence of the results on German and English leaves no doubt that the effect that we observed in German is of the same type as the superiority effect which we see in English. This has clear implications but also raises important questions. The most important implication for our purposes here is that German is not an exceptional language, because German has the same effect that other languages have. Why therefore has there been a consensus among linguists which doubted the existence of this effect? The answer seems to be that the perceived strength of the


Figure 3. Results of experiment on superiority in English, showing mean normalized judgement scores by in-situ wh-item. As in German, conditions with in-situ subjects are judged clearly worse than all others. wh-DO means bare direct object wh-item, wx-IO means which X-type (d-linked) indirect object wh-item etc.

relevant constraint is less in German than in English. Given the general assumption within syntactic theory that only those effects which are strong enough for their violation to cause absolute ungrammaticality are narrowly syntactic, and that all weaker effects are mere markedness or stylistics, linguists have tended to discount weaker effects as irrelevant. We can see this assumption in the argumentation that syntacticians use. Haider reveals this assumption explicitly when he argues in a related question (1993: 159):
If clausal subjects occupy the spec-IP position in German, then the Condition on Extraction Domains forbids extraction, and that without exception. But only one single example is sufficient to refute this. [our translation]²

Arguments of this sort presuppose that well-formedness is dichotomous, but evidence such as our experimental data on superiority in English and German would seem to show that this assumption of a binary model of well-formedness is erroneous. In this case of German superiority, the idealization to binary well-formedness is hiding generalizations. We think that this is much more commonly the case. In our project SFB441 A3 "Suboptimal syntactic structures" (project leader Wolfgang Sternefeld) we have carried out numerous studies gathering introspective judgement data of
2. Wenn im Deutschen Subjektsätze die Spec-I Position einnehmen, verbietet CED Extraktion, und zwar ausnahmslos. Um dies zu widerlegen genügt aber schon ein einziges Beispiel.


experimental quality, controlling for irrelevant factors, standardizing materials. The picture of perceived well-formedness that this data reveals is unambiguous. We consistently find that non-categorical constraints can be syntax-relevant, and indeed that the idea that any constraints are categorical is probably false. There are constraints which are sufficiently strong to appear categorical, in the sense that speakers would not choose to use structures violating them, but if we present informants with structures which are unambiguously ungrammatical, but nevertheless comprehensible, so that the error can be seen to be syntactic and not semantic, then they consistently rate such sentences relative to the perceived severity of the rule violation leading to ungrammaticality. They do not simply reject them absolutely. These sanctions can be stronger or weaker, but are constraint-specific. They are also cumulative: a structure violating two rules is judged worse than one which violates only one of them (Keller 2000). To illustrate this further we will briefly present another study on an island constraint which we have conducted.

3.2. The that-trace effect in German and English

This phenomenon is very clear in English. While both subject and object can be equally well extracted from a complementizerless complement clause (11a, b), extraction from a clause introduced by a complementizer reveals a subject-object asymmetry: standardly the object extraction is judged acceptable (11c), but the subject extraction is much worse (11d).

(11) a. Who does Hillary think Bill loves?
     b. Who does Hillary think loves Bill?
     c. Who does Hillary think that Bill loves?
     d. *Who does Hillary think that loves Bill?

This effect has been tested extensively by Cowart (e.g. 1997), who has found it to be consistent and pervasive. As is the case with many extraction restrictions, the fundamental cause of the effect is obscure. The classic ECP account (Chomsky, 1981; Lasnik and Saito, 1984) motivates the asymmetry in the same way as the ECP-related accounts of other constraints which contain subject-object asymmetry, such as the superiority effect. It is perhaps fair to say that we do not yet have a complete understanding or fully satisfactory account of the that-trace effect. This constraint has been generally held not to exist in German. There are certain differences between embedded clause structures in English and German which make it credible that different constraints on movement apply. Most importantly, German complement clauses come in two types, which we shall refer


to as V-final and V2. The first type has a complementizer in initial position while the second never has one, and the verb in the V-final type is clause-final, while the verb in the V2 type is near the beginning of the clause, generally as second constituent after a phrasal topic. The contrast between a complement clause with and without a complementizer is thus in German part of a larger syntactic contrast, unlike in English where the complementizers sometimes appear to be optional elements.

(12) a. Wen_i meint Doris, liebt Gerhard t_i?
        whom thinks D. loves G.
        'Who does D. think G. loves?'
     b. Wer_i meint Doris, liebt t_i Gerhard?
        who thinks D. loves G.
        'Who does D. think loves G.?'
     c. ?Wen_i meint Doris, dass Gerhard t_i liebt?
        whom thinks D. that G. loves
        'Who does D. think that G. loves?'
     d. ?Wer_i meint Doris, dass t_i Gerhard liebt?
        who thinks D. that G. loves
        'Who does D. think that loves G.?'

If (12a), (12b), and (12c) were all grammatical, but (12d) were not, then we could say that German has a that-trace effect. However, the consensus view seems to be that standard German has no that-trace effect (Haider 1983; Grewendorf 1988; Stechow and Sternefeld 1989; Bayer 1990; Haider 1993; Lutz 1996). The grammaticality status of structures of types (12c) and (12d) may be said to be marginal, as they are not generally felt to be part of the standard language, although they occur in speech in southern varieties. In our experiment we aimed to test whether this effect would be identifiable in German using our more sensitive experimental judgement elicitation methods. Will German here turn out to be an exceptional language? We tested the four structures each in eight different lexical forms. The results are presented in Figure 4 together with the results of a parallel experiment on English by Cowart (1997). This result shows a very clear picture. First, there is indeed a that-trace effect in German, for the resemblance of the German data to the English is remarkable. The existence of the effect in German can thus not be in doubt. In both languages the subject and object extractions from complementizerless clauses are judged about equally good, and are clearly better than the extractions from clauses with complementizers. The extraction of a subject from a clause with a complementizer is judged much worse than the extraction of an object, in both


Figure 4. Results of our experiment on that-trace in German (on the left), with Cowart's (1997) results on English for comparison (on the right). The pattern of results from the two languages is very similar.

languages. It therefore seems safe to assert that the basic factors affecting this set of structures are the same in the two languages.

3.3. Implications for theory I: gradience

Since closer inspection of the data has shown that German has both superiority and that-trace effects, there can be no question of German being an exceptional language in that the ECP (if that is indeed the causal factor) does not apply in it. Precisely the same phenomena can be found cross-linguistically, and we do not need the mechanism of parameter setting to account for the apparent non-existence of presumed universals in a given language, at least in this case. We consider it highly likely that this finding will prove to be the rule, not the exception: effects found in one language will generally be found in others too (cf. Bresnan et al. 2001). This must reinforce the hypothesis that there is such a thing as a universal grammar. But we still have to explain why the superiority effect and the that-trace effect are uncontroversial in English, but usually thought to be absent from German. Why did German look like an exception? A look back at Figure 4 offers some insight. In English, the best three structures are regarded as well-formed, while the worst one, the extraction of a subject over a complementizer, is regarded as ill-formed. That is the that-trace effect. In German, by contrast, the top two are regarded as well-formed, and the lower two both as marginal, hence no that-trace effect is recognized. The most likely explanation of this mismatch between the standard assumption and the empirical data is that this data set, in English and in German, demands no less than three degrees of well-formedness, but the standard idealized model of grammaticality provides only two.


In both languages, there is a clear difference between the V2 extractions and the V-final extractions; the first pair is plainly better. But there is also a clear difference within the lower pair, that is, between the two V-final extractions. We thus have three distinct levels, but the binary model of grammaticality allows the linguist to capture only one of these differences at a time. In German the difference between the extractions with and without complementizers is recognized (no doubt because it is slightly greater), which means that the difference between the two extractions with complementizers becomes effectively invisible. In English, the difference between the subject and object extractions over that is recognized (again, it is the greater of the two), which means that the fact that all extractions over complementizers are worse than the others is invisible. This explanation of how and why linguists have misread the data rests entirely on the assumption that well-formedness is a gradient, not a dichotomy. But this conclusion is forced upon us by the data anyway. The more detailed data revealed by our controlled methods of collecting judgements can only be faithfully represented on a continuum of well-formedness. The data simply has this form: there is not only well-formed and ill-formed, there is also better-formed and more ill-formed. These two example studies show that the effects of constraint violations can be larger or smaller, and can vary cross-linguistically, and can be added together. For this to happen, these effects must have quantifiable values, and this is only possible in a model of gradient well-formedness. The idealization to a binary opposition of well-formed and ill-formed can thus be seen to be obscuring important information. It is hiding what we would have predicted, namely that German too has ECP effects. This effect of the binary model reveals it to be an abstraction from primary data, not a feature of the primary data.
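The way a binary model hides one of the two differences can be sketched in a few lines. The sketch assumes, as a simplification of the argument above, that a linguist working with a dichotomy places a single cut-off in the widest gap between the ranked judgement scores. The scores, the condition labels and the function name `binarize` are hypothetical illustrations and are not the values plotted in Figure 4.

```python
# Hypothetical mean judgement scores (illustrative, not the measured values).
scores = {
    "English": {"obj_no_comp": 0.5, "subj_no_comp": 0.5,
                "obj_comp": 0.1, "subj_comp": -0.9},
    "German": {"obj_no_comp": 0.6, "subj_no_comp": 0.6,
               "obj_comp": -0.4, "subj_comp": -0.8},
}


def binarize(graded: dict[str, float]) -> dict[str, bool]:
    """Map graded scores to well-formed/ill-formed using a single cut-off.

    The cut-off is placed in the widest gap between adjacent ranked scores,
    which is an assumed stand-in for how a dichotomous judgement is reached.
    """
    ranked = sorted(graded.values(), reverse=True)
    gaps = [(ranked[i] - ranked[i + 1], i) for i in range(len(ranked) - 1)]
    _, widest = max(gaps)
    cutoff = (ranked[widest] + ranked[widest + 1]) / 2
    return {name: value > cutoff for name, value in graded.items()}


for language, graded in scores.items():
    print(language, binarize(graded))
```

With these assumed scores the English cut-off falls between the two extractions over a complementizer, so only the subject extraction counts as ill-formed, while the German cut-off falls between the clause types, so both extractions over a complementizer count as ill-formed. Both data sets contain three levels, but each binary classification records only one of the two differences.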
Let us be clear what we are arguing for. Certain sorts of idealization are desirable and necessary. The idealization described by Chomsky in his famous ideal speaker-listener paragraph (1965: 3) we consider useful and indeed essential. But it should be noted that the idealization of well-formedness to a binary opposition does not occur in that paragraph. On the contrary, Chomsky explicitly avows in that text that grammaticality is a matter of degree (1965: 11). Chomsky explicitly limits his chosen low-data approach to clear cases, to the masses of evidence that are hardly open to serious question (1965: 19). The idealization of well-formedness to a binary model, we argue, may be feasible with data which are not open to question, but is inappropriate to data where finer distinctions are to be made, as in these cases. Over-zealous idealization of well-formedness has brought about an assumption that well-formedness really is binary, contrary to fact. German too has island constraints, but their weaker


violation costs make them less visible. We are creating ghost exceptions when we impose an unempirical single possible violation cost on the data.

Let us briefly review our findings so far. First, some apparent exceptions are merely due to inadequate data. More and better data makes for fuller coverage of theory. Next, some idealizations of data can cause phantom exceptions. Ignoring non-categorical constraints on structure, the assumption of a single constraint violation cost, and the idealization of well-formedness to a binary scale can all obscure important evidence and make exceptions appear to occur where in fact there are none. This can be avoided by the adoption of a model of gradient well-formedness. Admittedly a gradient grammar requires several additional features in a theory. It must allow constraint-specific violation costs, which in turn necessitates that judged well-formedness be represented as a continuum. Violation costs must be quantified, so that they can be cumulative. All this adds complexity to the linguist's task, but provides a more explanatory grammar, since the grammar produces fewer exceptions, and these features are anyway robustly present in the primary language data; their inclusion in our grammar thus also increases its empirical adequacy. Our task is not yet finished, however, for our grammar must also allow exceptions in the output to be produced, which we shall argue requires the architecture of the grammar to include probabilistic competition for output. We turn to this last feature now.

4. Exception type three: exceptional occurrence

This section addresses a very different type of exception, namely the occurrence in naturalistic output, such as corpus data, of structures which our grammar would exclude. Every linguist will have had the experience of finding examples of structures that they would have predicted not to occur. We shall give just one example here, superiority violations. A search in the British National Corpus (Oxford, 100 million words) reveals two examples of structures which violate superiority. Both have a direct object wh-item in clause-initial position and an in-situ wh-subject (What.DO who.SUBJ). Searching the internet with Google reveals more examples.³ A search in Google UK (google.co.uk, February 2005, English language only,
3. For the effectiveness of internet search engines and the validity of the results, we find Keller et al. (2002) very convincing, in which it is shown that internet search engine results can match introspective judgements no less accurately than can even carefully compiled corpus data. See Featherston (2005) for further details which show clearly that the corpus search and the web search produce closely matching patterns.


UK based sites only) for "what did who" yielded 371 hits, of which 236 were non-repeats. Detailed inspection of each of these revealed 112 apparent real examples. A similar search for "who did who" yielded 831 hits, of which 486 were non-repeats. The exclusion of linguistics sites and non-anglophone sites and the like revealed five apparent real examples, of which three had the form who.DO who.SUBJ, and two others who.IO who.SUBJ.⁴

Now occurrences of structures which we would predict would not appear are common, and nothing hangs on this particular example. But this phenomenon poses a real problem for linguists, since grammatical models generally cannot account for this. Our own model of a grammar incorporating gradient well-formedness, which we have noted above is anyway required to deal with other phenomena, does however permit the occurrence of exceptions. It does this by introducing a differentiation into the architecture of the grammar. It distinguishes two modules: Constraint Application and Output Selection. We shall lay out roughly how these two operate and see how this arrangement predicts exceptional occurrence.

The first of these carries out the function of structure building, essentially in the form of a constraint satisfaction model. This stage develops the form of the structure, being guided by the requirements of the semantic content but at the same time constrained at every step by the application of the constraints on linguistic form. This process is roughly equivalent to the stage of grammatical encoding in the formulator in Levelt's (1989) blueprint for the speaker. But the process of applying constraints to structures involves trade-offs, and with each violated constraint the nascent structure incurs a violation cost. Let us note that this process no doubt takes place incrementally, on roughly phrase-sized utterance planning chunks, but we shall abstract from this for simplicity of presentation here.
The result of the application of constraints is that each candidate structure receives a well-formedness weighting which can be accessed in introspective judgements: structures breaking more rules/preferences are judged worse. Of the possible output forms for a given semantic content, one will usually be better than the others, but it may happen that two (or more) of the best
4. It is worth noting here that some of them were unambiguously not echo questions. This is important because it establishes that the examples were originally generated in this form. Echo question examples could be argued to have just one element questioned in an otherwise quoted string. For instance, if a child recites the two times table as "2, 4, 6, 9, 12...", their parent can ask "2, 4, 6, what, 12?" without putting into question the syntactic generalization that wh-items are fronted in English. Echo questions, as quoted strings, are thus only weak examples of the generability of superiority violations. It is thus important that we find examples which the context shows are not echo questions.

Sam Featherston

are roughly equally good. We can illustrate this with sets of examples such as in (13).

(13) a. Jack looked the word up in the dictionary.
     b. Jack looked up the word in the dictionary.

Both phrasal verb particles and NP complements are preferred adjacent to their head verb (see the excellent discussion in Wasow 2002), but only one of them at a time can appear there. The violation costs of these two structural preferences must be about equal, however, since both (13a) and (13b) are fairly natural and both occur, more or less in free variation. How does the human language production system choose between the pair (13a) and (13b)? There must necessarily be some module selecting output among options, since we never find both being output when only one would do. Let us next note that this Output Selection procedure must also take note of the well-formedness status of the competing alternatives, since better-formed structures are apparently preferred to less well-formed ones: we do not usually find examples of "Jack looked the word in the dictionary up". It seems economical to assume that this Output Selection module selects a single form for output on the basis of the well-formedness weightings assigned by the first module, Constraint Application. Since both are about equally well-formed, as our perceptions confirm, we therefore regularly find both (13a) and (13b).

How does this account for exceptional occurrence? Well, to err is human and human linguistic behaviour is probabilistic. This can be readily verified by taking a look at the frequent experimental studies in the Journal of Second Language Acquisition where the performance of second language learners is compared with that of a native speaker control group. The native speaker control group never get everything right; most commonly they attain 90-95 % of the target behaviour. It is thus unsurprising that output selection too operates probabilistically. When two forms are equally good, Output Selection chooses one of them more or less randomly. If one were just slightly better than the other, then we would find this reflected in their distribution frequencies, but the slightly less good candidate would still occur.
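The selection behaviour described here can be made concrete with a small simulation. The Python sketch below is an illustration only, not part of the model's specification: it assumes that each candidate has a fixed well-formedness score, that this score is perceived with Gaussian noise on each production event, and that the candidate perceived as best is output. The scores, the noise level and the names `CANDIDATES`, `select_output` and `simulate` are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical well-formedness scores (higher = better); not measured values.
CANDIDATES = {
    "looked the word up in the dictionary": 0.0,
    "looked up the word in the dictionary": 0.0,
    "looked the word in the dictionary up": -3.0,
}

def select_output(scores, noise_sd, rng):
    """Pick the candidate whose noisily perceived score is highest."""
    perceived = {form: s + rng.gauss(0.0, noise_sd) for form, s in scores.items()}
    return max(perceived, key=perceived.get)

def simulate(scores, trials=10_000, noise_sd=1.0, seed=0):
    """Count how often each candidate is output over many production events."""
    rng = random.Random(seed)
    return Counter(select_output(scores, noise_sd, rng) for _ in range(trials))

counts = simulate(CANDIDATES)
for form, n in counts.most_common():
    print(f"{n:6d}  {form}")
```

Under these assumptions the two equally scored forms are output about equally often, while the clearly worse form still appears, but only rarely. Nothing in the scores themselves is probabilistic; all variability comes from the selection step.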
The key point: every now and again, we select for output a candidate structure which is significantly less good than some other. Exceptional occurrence is thus merely a slip in operation, probably in the assessment of the well-formedness of a candidate. Some noise in the appreciation of well-formedness is not a stipulation but a well-attested fact: introspective judgements of well-formedness are well known to have a degree of random variation in the individual judgement event (Schütze 1996). It is therefore not at all surprising that the output selection function, which must use perceived

well-formedness as its criterion for selection, exhibits some degree of variability in its choices.

We may thus summarize our account of exceptional occurrence as follows. Examples such as those in (13) demonstrate that it is at least sometimes the case that speakers must choose between two or more equally legal structures in production processing. It follows that we have such a thing as an Output Selection function: if we did not, any pair of equally good forms would crash the production system. It will also be fairly uncontroversial that this function makes use of well-formedness information about competing candidates: slightly better forms are selected more frequently than less good forms. This demonstrates clearly that our Output Selection module functions probabilistically; if it did not, even very slightly less good forms would never occur at all. These assumptions are sufficient to account for exceptional occurrence: when a system functions probabilistically, improbable outcomes occasionally occur; in our specific case, substandard structures are occasionally selected for output. The good news is that this account of exceptional occurrence in no way complicates our grammar or forces us to include a probabilistic element in the grammar, for the grammar and the selection of output are completely separate functions. In the next section we look in a little more detail at what this might mean for the architecture of the grammar and how it is used.

4.1. Implications for theory II: Well-formedness is not identical to output

We have argued for two features of the grammar which allow us to account for exceptions. Many apparent exceptions in the grammar can be seen to be mere phantom exceptions if we assume a gradient model of well-formedness, while exceptional occurrence is predictable probabilistic behaviour, if we distinguish between Constraint Application and Output Selection.
In this section we shall attempt to show that these two features of language behaviour fit well together into a coherent model of the architecture of human language computation, and are directly motivated by judgement and corpus frequency data. We summarize the features and functions of the two modules that we distinguish in (14) and (15). We refer to this as the Decathlon Model because the selection of structures for output takes place in two separate stages (see footnote for the reason for this name).5
5. We called this the Decathlon Model because of the similarity between the scoring system in the athletic discipline decathlon and the way that well-formedness and output operate. In the decathlon, competitors take part in ten separate events and receive a points score from each. The points reflect not their performance relative to the other competitors, but their absolute performance. It therefore does not matter whether a

(14) Constraint Application
     a. applies rules,
     b. takes note of rule violations, and
     c. applies violation costs (well-formedness weightings)
     blindly and exceptionlessly, all constraints applying to all structures.

(15) Output Selection
     selects structures for output on the basis of well-formedness weightings,
     competitively, and probabilistically.

A key point in this model is the non-identity of well-formedness and occurrence. Syntactic realisations of a given semantic content compete for output on the basis of their well-formedness ratings. This means that a given syntactic realisation of a semantic content can be fairly well-formed (as perceived in judgements), but virtually never appear, simply because better syntactic realisations exist. Similarly, a syntactic realisation can be judged to be fairly poor, but nevertheless appear in linguistic output (e.g. corpus data) because it is the best of the set of structural alternatives.
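The non-identity of well-formedness and occurrence can be illustrated with a minimal Python sketch. It assumes, purely for illustration, three constraints with made-up costs and two competition sets; the names `COSTS`, `well_formedness` and `best_candidate` are hypothetical, and the probabilistic element of Output Selection is left out so that only the competitive core is shown.

```python
# Hypothetical constraint-specific violation costs (arbitrary units).
COSTS = {"C1": 1.0, "C2": 2.5, "C3": 4.0}

def well_formedness(violations):
    """Constraint Application: subtract the cost of every violated constraint."""
    return -sum(COSTS[c] for c in violations)

def best_candidate(competition_set):
    """Competitive core of Output Selection: the highest-scoring realisation."""
    return max(competition_set, key=lambda form: well_formedness(competition_set[form]))

# Each set holds the syntactic realisations of one semantic content,
# mapped to the constraints that each realisation violates.
set_a = {"A1": [], "A2": ["C1"]}                        # A2 is fairly good but has a better rival
set_b = {"B1": ["C2", "C3"], "B2": ["C1", "C2", "C3"]}  # B1 is poor but is the best available

print(well_formedness(set_a["A2"]), best_candidate(set_a))
print(well_formedness(set_b["B1"]), best_candidate(set_b))
```

In this toy setting A2 receives a much better score than B1, yet A2 loses its competition and B1 wins its own. Scores are absolute, while selection is relative to the competition set.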

competitor comes first, third or sixth in the sprint; what matters is the absolute time achieved. These points are summed, and the athlete with the highest total gets the gold. The grammar and production systems work in a similar way. All structural alternatives are subject to all linguistic constraints, and all violations cause reductions in their well-formedness rating (these presumably reflect ease of processing at some level, which would explain why well-formedness is perceptible, but not explicable, to the speaker and can be accessed through intuitive judgements). This well-formedness is the equivalent of the athletes' points scores. The structural alternatives compete to be selected for output on the basis of their well-formedness in the same way as athletes compete to win the gold medal on the basis of the points they have gathered in the individual events. In language this competition is probabilistic, probably at the stage of the perception of well-formedness. To continue the sports analogy, the architecture of generative grammar resembles the slalom, in that the candidates must pass through all the gates to win; missing even one gate (violating even one restriction) causes categorical exclusion. OT is like the high jump: the bar is put at a certain level and all competitors try to jump over it. All who fail are excluded, and the bar is put higher. This continues until only one candidate remains, who becomes the optimal high jumper.

Note that this contrasts both with the traditional maxims of generative theory and with the precepts of Optimality Theory (OT, Prince and Smolensky 1993). In generative grammar the standard supposition has been that any successfully generated structure will be used and will thus appear in the output. More recently the idea of competition in syntax has gained favour, appearing in the Minimalist Program and occupying a central place in OT (for a review see Müller and Sternefeld 2001). But in both of these types competition takes place in the grammar and contributes to the definition of well-formedness. In our own Decathlon Model (Featherston 2006) the situation is different. Well-formedness is not a result of competition but the result of cumulative violation costs. Competition steers only the choice of output from among candidate syntactic realisations. This competition is based upon the well-formedness weightings of the candidates, but has a probabilistic element. Although the best realization of a content will normally win the race to be output, it is predicted that less optimal candidates will sometimes be produced, with a frequency that falls as their ill-formedness relative to the most well-formed candidate increases. Exceptional occurrence is thus the effect of the probabilistic element in Output Selection.

One of the major advantages of this grammar architecture is the fact that it is directly related to the evidence of the primary language data, both judgements and corpus frequencies. We illustrate this in Figure 5. This graph shows the results of two studies on object coreference structures in German, the first using experimentally obtained judgements, the second using corpus frequencies (COSMAS I, Institut für Deutsche Sprache, Mannheim). For full details of this study see Featherston (2002). In this graph the judgements are represented by error bars and refer to the left-hand scale. These judgements show clear differences between the sixteen syntactic variants.
These vary on four binary parameters, and their perceived well-formedness is a product of the number and severity of the constraints that each violates. This is thus the same sort of finding as that in our magnitude estimation studies reported above; in fact this data type always reveals this picture of well-formedness as a gradient phenomenon, reflecting cumulative quantifiable violation costs. The frequency information was gathered from the corpus COSMAS I (W-PUB Archiv, 530 million word forms), and relates to the right-hand scale. The syntactic variant judged best occurred fourteen times, that judged second best occurred just once, and no other form was found at all. This is thus a very different pattern to that of the judgements: the frequency data shows strong evidence of probabilistic competition for output, unlike the perceived well-formedness data, which shows no sign of competition (others find similar patterns, e.g. Kempen and Harbusch 2005).
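How four binary parameters with cumulative costs yield a graded set of sixteen variants can be sketched in a few lines of Python. The cost values below are invented for illustration and are not the experimental results; `PARAM_COSTS` and `violation_cost` are hypothetical names, and a value of 1 simply marks the dispreferred setting of a parameter.

```python
from itertools import product

# Hypothetical costs for the dispreferred value of each of four binary parameters.
PARAM_COSTS = (1.0, 2.0, 4.0, 8.0)

def violation_cost(variant):
    """Sum the costs of all parameters set to their dispreferred value (1)."""
    return sum(cost for flag, cost in zip(variant, PARAM_COSTS) if flag)

# All sixteen combinations of the four binary parameters.
variants = list(product((0, 1), repeat=4))
ranked = sorted(variants, key=violation_cost)

for v in ranked:
    print(v, violation_cost(v))
```

Because the costs differ by constraint and add up, the sixteen variants spread out along a continuum rather than falling into a good class and a bad class.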

Figure 5. Experimental judgements and corpus frequencies of a set of syntactic variants (from our work on object coreference). The judgements reveal a continuum of perceived well-formedness. The frequencies show that just the best and second best structure ever occur (W-PUB Archive 530 million word forms). This pattern reveals that competition for output occurs over well-formedness values.

On the basis of data like this, we suggest that perceived well-formedness and occurrence are not identical. They are of course related, and it seems to be a natural assumption that the competition for output proceeds on the basis of the well-formedness values: we produce the best syntactic variant available to us (probably because it is the easiest or the first available, which may be the same thing). But notice that even in this limited data set, competition for output is probabilistic: not only the best but also the second best structure occurs, occasionally. This illustrates variation. Exceptional occurrence results when one of the weaker candidates is selected; this will be rare, but it will occasionally occur.6
6. Frequency data only ever contains the very best alternatives and fails to distinguish variants with zero occurrences. It is perhaps worth noting here that it is this very limited range of alternatives present (which we dub the Iceberg Effect) which makes important factors such as cumulativity difficult to spot in frequency data (e.g. van der Feen, Hendriks and Hoeks 2006). Experimental judgements, which provide evidence from the full range of possible and impossible structures, make the reality of cumulativity perfectly clear (Keller 2000). In the sporting analogy, frequency data only shows you the medal winners. But to see what it takes to become a medal winner, you gain much more information by comparing medal winners with athletes who precisely didn't win medals.
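The Iceberg Effect described in footnote 6 can be reproduced in a toy simulation. The Python sketch below assumes, hypothetically, six variants with invented violation costs, Gaussian noise in the perception of those costs, and a corpus of only fifteen production events; none of these values come from the studies reported here.

```python
import random
from collections import Counter

# Hypothetical violation costs for six variants of one content (lower = better).
COSTS = {"v1": 0.0, "v2": 2.0, "v3": 5.0, "v4": 7.0, "v5": 10.0, "v6": 12.0}

def produce(costs, noise_sd, rng):
    """Output the variant whose noisily perceived cost is lowest."""
    return min(costs, key=lambda v: costs[v] + rng.gauss(0.0, noise_sd))

rng = random.Random(1)
corpus = Counter(produce(COSTS, 1.0, rng) for _ in range(15))

# Print each variant with its cost (judgement-like data) and its corpus count.
for variant, cost in COSTS.items():
    print(variant, cost, corpus[variant])
```

The costs distinguish all six variants, but the simulated corpus counts are zero for the weaker ones, so the counts alone cannot tell a mildly bad variant from a very bad one.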

4.2. Architectural simplicity vs coverage: Contrasting grammatical models

To clarify the nature and implications of our analysis we shall compare our own model with classic generative grammar and stochastic OT (henceforth often: StOT; Boersma and Hayes 2001) in order to bring out its features and reveal how it combines architectural simplicity with empirical coverage. In Table 2 we see the relevant characteristics of the three grammar models contrasted. The details are of course greatly simplified, but readers will be able to supply the details of their own favourite grammar architecture on the basis of the information given. We shall first make clear what the chart shows by contrasting the two familiar models.

First row: Traditional generative grammar applies all constraints to all structures, blindly and exceptionlessly. OT is more complex. The application of constraints to candidate structures is ranked, that is effectively ordered; in stochastic OT this application order is additionally stochastically variable (Boersma and Hayes 2001).
Table 2. The internal architectures of two familiar models of grammar, showing how they compare with our own Decathlon Model. Traditional generative grammar has a simple architecture, but cannot account for exceptional occurrence. Stochastic OT can account for this, but requires ordered and conditional functioning and a relative, unstable well-formedness model. Our own Decathlon Model requires no complex architecture to account for exceptional occurrence.

                            Generative grammar                      Stochastic OT                             Decathlon Model
Constraint application      blind, exceptionless                    ordered, probabilistic                    blind, exceptionless
Violation cost application  blind, exceptionless                    smart, conditional                        blind, exceptionless
Violation costs             only one value: ungrammatical           only one value: ungrammatical             constraint-specific, cumulative
Well-formedness model       dichotomous, absolute, stable           dichotomous, relative, unstable           gradient, absolute, stable
Output selection            trivial: all non-bad structures output  trivial: single non-bad structure output  competitive, probabilistic
Exceptional occurrence      not accounted for                       accounted for                             accounted for
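The differences summarized in Table 2 can be made concrete with a toy comparison in Python. The sketch rests on several simplifying assumptions: the two candidates, the constraint names and the costs are invented; the OT evaluation shown is classic strict domination without the stochastic perturbation of rankings that StOT adds; and the probabilistic selection step of the Decathlon Model is omitted. All function names are hypothetical.

```python
# Hypothetical violation profiles: number of violations per constraint.
CANDIDATES = {
    "X": {"HIGH": 1, "MID": 0, "LOW": 0},
    "Y": {"HIGH": 0, "MID": 1, "LOW": 1},
}
RANKING = ["HIGH", "MID", "LOW"]               # OT: strict domination order
COSTS = {"HIGH": 3.0, "MID": 2.0, "LOW": 2.0}  # Decathlon-style hypothetical costs

def generative_ok(profile):
    """Categorical grammar: any violation makes a structure ungrammatical."""
    return all(n == 0 for n in profile.values())

def ot_winner(candidates):
    """Classic OT: compare violation counts lexicographically by ranking."""
    return min(candidates, key=lambda c: [candidates[c][k] for k in RANKING])

def decathlon_score(profile):
    """Absolute, cumulative well-formedness (higher = better)."""
    return -sum(COSTS[k] * n for k, n in profile.items())

for name, profile in CANDIDATES.items():
    print(name, generative_ok(profile), decathlon_score(profile))
print("OT winner:", ot_winner(CANDIDATES))
```

With these invented values the categorical grammar rejects both candidates, strict domination selects Y because X violates the highest-ranked constraint, and the cumulative scores rate X above Y because Y's two smaller costs add up to more than X's single cost.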

Second row: Generative grammar also has a simple violation cost application function, again, blind and exceptionless. If a structure breaks a rule then the structure is automatically penalized. OT is again more complex, since the application of violation costs is not automatic but conditional: conditional upon whether it will distinguish the candidates; if not, no penalty is applied, and the violation has no effect upon the outcome.

Third row: On the other hand, OT and generative grammar both have only one violation cost value. There is only one possible outcome of a violation cost being applied: the attribution of the status ungrammatical.

Fourth row: The features so far result in both models having a dichotomous model of well-formedness. But generative grammar's well-formedness is absolute and inherent, while OT's well-formedness is always relative to a comparison set (hence optimality). The status of a given structure in generative grammar is stable, while the same structure in StOT can vary between good and bad even within the same comparison set, depending on the effect of the random factor in the application order.

Fifth row: Both familiar models assume that well-formedness is sufficient to license output. All and only well-formed structures should appear. In simple OT this will always be just a single candidate, the optimal structure. In StOT this single structure may vary between evaluation events.

Sixth row: Stochastic OT can thus capture the empirical reality that in practice more than one form from a competition set appears in the language. Since constraint ordering has a random weighting applied, occasionally a candidate will win which would normally lose: exceptions may therefore win through the competition and thus occur. Traditional generative grammar permits multiple acceptable candidates to occur, but does not easily permit forms to occur which are other than fully acceptable. It thus has no ready account of exceptional occurrence.
It is, however, considerably simpler than any version of OT as a theory. It is perhaps not surprising that the ability to deal with exceptional occurrence incurs a cost in terms of complexity of the architecture. Our own Decathlon Model contrasts with both of these, but also shares aspects of both.7 Our model patterns with generative grammar in that it applies
7. I am often asked how my model relates to Jäger and Rosenbach's (2006) MaxEnt model. It is a step closer than Boersma and Hayes' StOT, but no more. It is still effectively a method of modelling frequency distributions only, and its concept of well-formedness is still categorical in any single evaluation event, as any model will be which confounds well-formedness and occurrence. The Decathlon Model is based upon evidence from both judged well-formedness and frequency data, and accounts for the contrasts in findings in these two data types by distinguishing constraint application and output selection. It is also much more empirically grounded than any

all constraints to all structures, requiring no smart application function. The application of violation costs is simple, since it too is blind and exceptionless. However, instead of having a simple binary contrast of good and bad, our model reflects the data from our judgement experiments in requiring constraint-specific violation costs which vary in severity. A violation does not make a structure bad, it merely makes it worse, by a fixed, constraint-specific amount. What is more, these violation costs are cumulative, so that if a structure violates two restrictions, it is worse than if it merely violates one. These two factors, constraint-specific violation costs and cumulativity, demand a gradient model of well-formedness, which is of course more complex than the familiar dichotomous model. On the positive side, this allows gradient well-formedness to be absolute and stable; structures are inherently good, bad or marginal, on a continuous scale. This is the basic operation of our Constraint Application module, which roughly corresponds to what is generally thought of as the grammar. It is the weightings assigned to structures by this module that we believe can be measured with the elicitation of judgements.

The Output Selection function is, we argue, no part of the grammar but merely a facet of production. It works very simply. In production processing, we have to choose between different structural variants in just the same way as we must choose between non-structural variants. We generally choose the best structural variant, probably because it is the easiest for us to compute (in fact the causal factor of well-formedness is probably ease/speed of computation, and the competition is no doubt a race). But since we are humans and we do not process deterministically, we sometimes perceive well-formedness inconsistently or make mistakes.
It follows that ill-formed structures are occasionally produced, essentially for the same reasons that we sometimes call our mother when we intended to call our sister, or scrape the car driving out of the garage. The grammar need not generate exceptional occurrence, and in our model it does not.

5. Conclusions

Our aim in this paper was to argue that the grammar has far fewer exceptions than it sometimes appears to have. We looked at examples of three different sorts of exceptions, and tried to show that each of them is, on closer inspection, not
variant of OT since it makes no use of empirically unobservable parameters such as constraint rankings. Our own violation costs are directly obtained from well-formedness judgement experiments. It is therefore able to predict frequency distributions, rather than being derived from them.

exceptional. In each case we were able to account for the phenomena in a systematic, rule-governed way, in each case backing up our explanation with hard data from our experimental judgement studies.

Our first example was one of incomplete generalization. The conditions of the binding theory have considerable descriptive adequacy, but they do not seem to apply in German object coreference structures. We therefore carried out an experiment in which we gathered informants' judgements using the magnitude estimation methodology. The results superficially support the exception hypothesis as they show a much more complex pattern than the binding theory alone would predict. However, the far greater differentiation in the data made available by the experimental technique and the testing of many minimally different structural variants allowed us to identify the factors affecting the perceived well-formedness of the structures. Surface factors such as constituent weight and constraints on linear precedence can be discounted from the data set, leaving only the relevant information. This winnowing process reveals that the relevant binding condition is fully active in the data. Better data with finer differentiation thus allows us to discount the suspicion that these structures were an exception to the binding theory. We think that syntacticians will find this to be quite commonly the case. Perceived well-formedness is sensitive to grammatical and non-grammatical effects, acting cumulatively, but these irrelevant factors can be disentangled with high quality data, when it is recognized that even between structural minimal pairs, more than just the one factor of interest may be active.

Our second type of exception was that of ECP phenomena in German, a language in which these effects have by general consensus been thought not to apply. Again improved data allowed us to confirm that the ECP does indeed operate, but that it seems to cause weaker violation costs than in English.
In such a case it is necessary to take a critical look at the idealizations embedded in the standard theoretical assumptions. The main reason why these ECP effects were thought not to apply in German seems to be that their violation costs do not cause categorical ungrammaticality: structures violating superiority or the that-trace effect may be more readily found in German corpora than in English ones. Working on the assumption that constraints on structure which can be violated cannot be narrowly grammatical, linguists had tended to deny the existence of these ECP effects. However, in this case it seems very clear that this assumption is causing exceptions to be identified where none in fact are present. This idealization to a dichotomous model of well-formedness is revealed in such cases as an abstraction applied to the primary language data, not a generalization derived from it. The conclusion must be that ECP effects do indeed apply in German and that German is in no way an exceptional language in this regard. In this case, the

simplifying idealization to a binary model of well-formedness was causing the ghost exception, which must throw its usefulness as a simplifying assumption into some doubt.

Let us hasten to add that we do not consider all idealization of the data base of syntax to be erroneous or unnecessary. Precisely those abstractions from the raw data of language use or language intuition which relate to the difference between competence and performance seem to us to be fully justified, for the discarded information about the speaker-listener's state of mind or dialectal idiosyncrasies is of little relevance in ascertaining the structure of linguistic expressions. This is not true of the idealization to a binary well-formedness model, however, for the information discarded in this simplification can indeed be relevant to structure, as is clearly demonstrated in this paper. We suspect that the abandonment of this admittedly long-standing but ultimately unmotivated idealization will cause other ghost exceptions to be revealed for what they are.

The last exception type we addressed was that of exceptional occurrence. Every linguist has had the experience of seeing or hearing structures that theory would suggest should not occur. Traditional generative grammar has no real account of this, since it assumes that all and only grammatical structures should ever make it through to output. Stochastic OT does offer a mechanism which can model occurrence variation and occasional exceptional occurrence, but does this at a very high price in complexity of grammar architecture. This approach is forced to abandon both the blind and exceptionless application of constraints to candidate structures and the blind and exceptionless assignment of violation costs to violating structures.
Additionally it must assume a model of well-formedness in which the well-formedness status of a structure is not inherent, but only ever relative to a comparison set, and not stable, as it may change between different competition events. Put briefly, stochastic OT discards the assumption that the grammar is maximally general and instead provides it with a mechanism to allow exceptions. Our own architecture preserves the blind and exceptionless grammar and accounts for exceptional occurrence as a product of Output Selection, a function of human language processing independently motivated by our ability to choose between equally well-formed alternative structures. We therefore consider our model to be more explanatorily adequate, since it preserves the simple grammar and accounts for the data using only assumptions which are independently motivated.

We noted at the beginning of this paper that exceptions are the deadly enemy of generative grammar, since they offend against the simplicity criterion which is an essential component of an explanatory grammar. We hope to have shown that many exceptions are merely apparent, the epiphenomena of assumptions about data, the nature of well-formedness, and the architecture of the grammar,

which are not empirically motivated but rather apocryphal. Without this theoretical baggage generative grammar will move further and faster, in my opinion.

References
Bard, Ellen, Dan Robertson, and Antonella Sorace 1996 Magnitude estimation of linguistic acceptability. Language 72: 32-68.
Bayer, Josef 1990 Notes on the ECP in English and German. Groninger Arbeiten zur Germanischen Linguistik 30: 1-55.
Behaghel, Otto 1909 Beziehungen zwischen Umfang und Reihenfolge von Satzgliedern. Indogermanische Forschungen 25: 110-142.
Behaghel, Otto 1932 Deutsche Syntax. Vol. 4: Wortstellung, Periodenbau. Heidelberg: Winter.
Boersma, Paul, and Bruce Hayes 2001 Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry 32: 45-86.
Bresnan, Joan, Shipra Dingare, and Chris Manning 2001 Soft constraints mirror hard constraints: Voice and person in English and Lummi. In Proceedings of the LFG01 Conference, Miriam Butt and Tracy King (eds.), 13-32. Stanford: CSLI.
Bybee, Joan, and Paul Hopper 2001 Frequency and the Emergence of Linguistic Structure. Amsterdam: Benjamins.
Chomsky, Noam 1965 Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Chomsky, Noam 1973 Conditions on transformations. In A Festschrift for Morris Halle, Stephen Anderson, and Paul Kiparsky (eds.), 232-286. New York: Holt, Rinehart & Winston.
Chomsky, Noam 1981 Lectures on Government and Binding: The Pisa Lectures. Berlin: Mouton de Gruyter.
Chomsky, Noam 1993 A minimalist program for linguistic theory. In The View from Building 20, Ken Hale, and Samuel Keyser (eds.), 1-52. Cambridge, MA: MIT Press.

Cowart, Wayne 1997 Experimental Syntax: Applying Objective Methods to Sentence Judgements. Thousand Oaks, CA: Sage.
Featherston, Sam 2002 Coreferential objects in German: Experimental evidence on reflexivity. Linguistische Berichte 192: 457-484.
Featherston, Sam 2005 Universals and grammaticality: Wh-constraints in German and English. Linguistics 43: 667-711.
Featherston, Sam 2006 The Decathlon Model: Design features for an empirical syntax. In Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, Stephan Kepser, and Marga Reis (eds.), 187-208. Berlin: Mouton de Gruyter.
Featherston, Sam, and Wolfgang Sternefeld 2003 Experimental evidence for Binding Condition B: The case of coreferential arguments in German. In Arbeiten zur Reflexivierung, Lutz Gunkel, Gereon Müller, and Gisela Zifonun (eds.), 25-50. (Linguistische Arbeiten 481) Tübingen: Niemeyer.
Ginzburg, Jonathan, and Ivan Sag 2000 Interrogative Investigations. Stanford, CA: CSLI Publications.
Grewendorf, Günther 1985 Anaphern bei Objekt-Koreferenz im Deutschen: Ein Problem für die Rektions-Bindungs-Theorie. In Erklärende Syntax des Deutschen, Werner Abraham (ed.), 137-171. Tübingen: Narr.
Grewendorf, Günther 1988 Aspekte der Deutschen Syntax. Eine Rektions-Bindungs-Analyse. Tübingen: Narr.
Haider, Hubert 1993 Deutsche Syntax Generativ. Tübingen: Narr.
Jäger, Gerhard, and Anette Rosenbach 2006 The winner takes it all - almost: Cumulativity in grammatical variation. Linguistics 44: 937-971.
Keenan, Edward, and Bernard Comrie 1977 Noun phrase accessibility and universal grammar. Linguistic Inquiry 8: 63-99.
Keller, Frank 2000 Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. Ph. D. diss., University of Edinburgh.

322

Sam Featherston

Keller, Frank, Maria Lapata, and Olga Ourioupina 2002 Using the web to overcome data sparseness. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Jan Hajic and Yuji Matsumoto (eds.), 230237. Philadelphia. Kempen, Gerard, and Karin Harbusch 2005 The relationship between grammaticality ratings and corpus frequencies: A case study into word order variability in the mideld of German clauses. In Linguistic Evidence Empirical, Theoretical, and Computational Perspectives, Stephan Kepser, and Marga Reis (eds.), 329350. Berlin: Mouton de Gruyter. Lasnik, Howard, and Mamoru Saito 1984 On the nature of proper government. Linguistic Inquiry 15: 235289. Lenerz, Jrgen 1977 Levelt, Willem 1989 Lutz, Uli 1996 Zur Abfolge nominaler Satzglieder im Deutschen. Tbingen: Narr. Speaking: From Intention to Articulation. Cambridge, MA: MIT Press. Some notes on extraction theory. In On Extraction and Extraposition in German (Linguistik Aktuell 11), Uli Lutz, and Jrgen Pafel (eds.), 144. Amsterdam/Philadelphia: Benjamins.

Müller, Gereon, and Wolfgang Sternefeld (eds.) 2001 Competition in Syntax. Berlin: Mouton de Gruyter.
Primus, Beatrice 1987 Grammatische Hierarchien (Studien zur Theoretischen Linguistik 7). München: Fink.
Primus, Beatrice 1992 Selbst: Variants of a scalar adverb in German. Linguistische Berichte. Sonderheft 4: 54-88.
Prince, Alan, and Paul Smolensky 1993 Optimality Theory: Constraint Interaction in Generative Grammar. Technical Report No. 2. Center for Cognitive Science, Rutgers University.
Reinhart, Tanya, and Eric Reuland 1993 Reflexivity. Linguistic Inquiry 24: 657-720.
Reuland, Eric, and Tanya Reinhart 1995 Pronouns, anaphors and Case. In Studies in Comparative Germanic Syntax, Hubert Haider, Susan Olson and Sten Vikner (eds.), 241-269. Dordrecht: Kluwer.

Reis, Marga 1976 Reflexivierung in deutschen ACI-Konstruktionen: Ein transformationsgrammatisches Dilemma. Papiere zur Linguistik 9: 5-82.
Ross, John 1967 Constraints on variables in syntax. Ph. D. diss., MIT.

Schütze, Carson T. 1996 The Empirical Base of Linguistics: Grammaticality Judgements and Linguistic Methodology. Chicago: University of Chicago Press.
Stechow, Arnim von, and Wolfgang Sternefeld 1989 Bausteine syntaktischen Wissens. Opladen: Westdeutscher Verlag.
Sternefeld, Wolfgang 2000 Grammatikalität und Sprachvermögen. Anmerkungen zum Induktionsproblem in der Syntax. In Von der Philologie zur Grammatiktheorie: Peter Suchsland zum 65. Geburtstag, Josef Bayer, and Christine Römer (eds.), 15-44. Tübingen: Niemeyer.
Sternefeld, Wolfgang, and Sam Featherston 2003 The German Reciprocal einander in Double Object Constructions. In Arbeiten zur Reflexivierung (Linguistische Arbeiten 481), Lutz Gunkel, Gereon Müller, and Gisela Zifonun (eds.), 239-266. Tübingen: Niemeyer.
Uszkoreit, Hans 1987 Word Order and Constituent Structure in German. CSLI Lecture Notes no. 8. Stanford, CA: CSLI Publications.
van der Feen, Marieka, Petra Hendriks, and John Hoeks 2006 Constraints in language processing: Do grammars count? In Proceedings of the COLING-ACL Workshop on Constraints and Language Processing (CSLP-06). Sydney: Association for Computational Linguistics.
Wasow, Thomas 2002 Postverbal Behavior. Stanford, CA: CSLI Publications.

Anomalies and exceptions Hubert Haider

1. Anomalies and exceptions

Anomalies in the observed data patterns are usually construed as exceptions in the grammar of the data patterns. Anomaly is a characterization of data properties in terms of normalization expectations. Exception is the reconstruction of an anomaly in terms of a rule with a restriction. Whether an anomaly is best characterized as the local effect of an exceptional rule or as a global system effect (the result of conflicting but otherwise exceptionless rules) is an empirical issue. Not every anomaly is an exception, though. It may be a mere processing difficulty (see below).

The conceptual difference between anomaly judgements sampled from performance data and exceptionality ascriptions to rules of grammar must not be obscured by equivocation. The source of the performance data is a self-evaluation of the mental processing of a stimulus; the grammar ascription is an attempt to model the grammar-related properties of the performance data. Note that the grammar-related aspects are only a subsystem of the complex cognitive computations whose composite output is the global acceptability judgement for a given stimulus. Anomalies may be the result of processing difficulties in the absence of any exceptional trait in the grammar of the given construction. Strong garden path effects are the best examples:

(1) a. Man glaubt, dass Max Musiker vorgestellt bekamen.1
       one believes that Max musicians introduced got.3PL

1. Scrambling of an object without overt case across the subject in the German get-passive variant produces strong deviance feelings for informants. Thanks to M. Schlesewsky for the datum.

    b. Das sind es.2
       this are it

The perceived anomaly of (1a) is the difficulty of identifying scrambling in the absence of overt case markings. If you replace Max by den(acc) unbekannten Max 'the unknown Max', and (optionally) Musiker by viele Musiker 'many musicians', the anomaly disappears. In (1b), both pronouns are singular, but the finite copula is marked for plural. This mismatch is perceived as an anomaly. Only when the informant is pushed to realize that es 'it', as a predicate, may refer to a plural entity does the judgement change from deviant to perfect.

An exception is a restriction on the range of a rule to a subset of its possible range of application (2b). In (2b), xn is exempted from the range of the universally quantified rule predicate P.

(2) a. ∀x P(x)
    b. ∀x [x ∈ X − {xn}] P(x)    exception: the range is the set X minus the member xn
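The schema in (2) can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function name rule_holds and the three-word toy lexicon are hypothetical and not part of the original discussion; the lexicon merely records that genug follows the adjective it modifies while two ordinary degree modifiers precede it.

```python
# Illustrative sketch of (2): a universally quantified rule P(x) whose
# range is restricted to X minus a set of exempted members.
# The lexicon below is a hypothetical toy sample, not data from the text.

def rule_holds(predicate, domain, exempt=frozenset()):
    """Return True if predicate(x) holds for every x in domain minus exempt."""
    return all(predicate(x) for x in domain if x not in exempt)

# True = the modifier precedes the adjective it modifies.
PRECEDES_ADJECTIVE = {"sehr": True, "ziemlich": True, "genug": False}

def precedes(word):
    """The rule predicate P: degree modifiers precede the modified item."""
    return PRECEDES_ADJECTIVE[word]

domain = set(PRECEDES_ADJECTIVE)

# (2a): unrestricted rule; it fails because of the single member genug.
unrestricted = rule_holds(precedes, domain)

# (2b): the same rule with genug exempted from its range.
restricted = rule_holds(precedes, domain, exempt={"genug"})

print(unrestricted, restricted)
```

The point of the sketch is only that the exception is stated on the range of the rule, while the predicate itself stays unchanged.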

There is no denying that grammars are exceptional to a certain extent. Although exceptions exist at the level of individual items (3), the more frequent case is a restriction on a subclass (4). Here is an example of a restriction at the level of an individual lexeme. In every Germanic language, the cognate of genug 'enough' is anomalous, since it follows rather than precedes the modified item:

(3) G. gross genug, E. big enough, D. groot genoeg

An example of a subclass restriction is the restriction on the German Ersatzinfinitiv construction3 that is itself triggered by the exceptional avoidance of the participle form for verbs that select a bare infinitive (modal, perception, causative verbs). In German, but not in Dutch, the construction is restricted to the finite clause. It is ungrammatical in infinitival clauses in German (4b), but not in Dutch (4c).

(4) a. dass er es hat essen wollen
       that he it has eat want

2. Thanks to M. Bierwisch for reporting this datum to me (source: E. Lang). Informants reject it at first, but they accept it immediately once you ask, for instance, 'Is this a possible answer to: Are this really 47 envelopes?'
3. Descriptively speaking, the (finite) auxiliary that would trigger the participial form on the preceding verb is fronted and the would-be participle appears as the bare infinitive form.

    b. *ohne es zu haben essen wollen
        without it to have eat want
    c. zonder het te hebben willen eten
       without it to have want eat

Binding by pronouns in German provides examples of an anomaly caused by inconsistent grammar requirements: on the one hand, an antecedent must c-command the bindee (5a),4 on the other hand, a dative pronoun must follow an accusative pronoun in German. So, the German restriction on pronoun order rules out binding of an accusative reflexive by a dative pronoun, since the dative would have to precede (in order to fulfil binding) and would thereby violate the order restriction embodied in the grammar of German pronouns (5b). This leaves (5c) as the only option and rules out (5b), in accordance with Featherston's experimental findings, reported in his Figure 1.5

(5) a. *Wir haben sich_i(acc) (selbst) ihnen_i(dat) überlassen.
        we have them.REFL (selves) to.them left
    b. *Wir haben ihnen_i(dat) sich_i(acc) (selbst) überlassen.
    c. Wir haben sie_i(acc) sich_i(dat) (selbst) überlassen.
       we have them (to) them.REFL (selves) left

However, this account does not fully cover the Dative-Accusative anomaly with respect to binding in German. Although a dative may be a binder for an anaphor (6a), it appears to be disqualified if the bindee is a co-argument of the dative binder (6b). In addition, there is another anomaly involved. A dative not only fails to bind in these cases, it also interferes with binding between the subject and the object if it intervenes (6d).

(6) a. Er hat den Leuten_i(dat) über einander/sich_i erzählt / Biografien von einander/sich_i gezeigt.
    b. *Er hat den Leuten_i(dat) einander/sich_i(acc) vorgestellt.
    c. Er hat die Leute_i(acc) einander/sich_i(dat) vorgestellt.

4. The only apparent exception is binding by the nominative subject: Hat sich_i(refl) jemand_i geirrt? But in this case, there is an agreeing item, i.e. the finite verb, that c-commands (see Frey 1993).
5. Note that the subjects in the experiment seem to treat binding by objects as an undefined case, since they allow personal pronouns (Principle B) on a par with reflexives (Principle A) in this case (see Featherston's figure 1, ndp and nar). This would follow if the core case of binding in German is binding between the subject and an object and binding between objects is un-/ill-defined in the application of German grammar by the informants.

    d. ??Wir_i haben diesem Mann einander_i/uns_i vorgestellt.
    e. Wir_i haben einander_i/uns_i diesem Mann vorgestellt.

These data were not part of Featherston's experiment, but they are crucial for the dative anomaly. In my view, the anomaly is not yet fully understood.6 We do not yet see clearly enough whether it involves a genuine exception or is just a result of conflicting interactions of otherwise unexceptional rules of binding.

2. Discreteness or gradience

Data judgements are necessarily gradient; grammars are discrete. Judgements are the composite, unconscious result of the interaction of various components (grammar, information structure, stylistic preferences, implicit comparison with potential paraphrase variants, ease of parsing and interpretation, anticipative judgements of the experimenter's expectations, ...). Even if each of these components produced a discrete evaluation value, the aggregate would be a gradient function. Well-formedness as defined by human grammars is discrete, that is, a matter of yes or no. There is no such thing as 75% grammatical. Crucially, well-formedness as a property determined by grammar must not be equated with the introspective attribution of the quality of well-formedness by informants. Their judgements are reports on introspection experiences and these are surely not discrete. From a theoretical point of view, grammatical well-formedness is not adequately characterized as a matter of cumulated (relative) weights.7 The fact that informants asked for grammaticality judgements are at a loss in certain cases does not prove that grammaticality is gradient. Discreteness shows in the majority of cases if clear cases are tested. To make an issue testable is difficult,
6. Müller (1995, sect. 4.5) suggests that the Dative is base generated lower than the accusative and then raised, and therefore it could not serve as a binder for the accusative. Immediate counterevidence for this claim is the fact that i) Dutch does not allow object scrambling but has Dative < Accusative as the obligatory order, and ii) that a raised argument ought to be a possible binder (see: The men_i seem to each other_i to be incompetent).
7. Optimality theory suggests discrete rules with a relative weighting in terms of their violability ranking. Until now, the proponents have not produced a universal theory of weighting, however. Without a UG of ranking, a weighted system is not learnable: If the perceived input deviates from the child's interim grammar output, the necessary changes in the interim grammar involve intractable computations of comparing alternative rankings.


of course. It is a matter of meticulous test design, including pilot studies, and ultimately it also requires a lot of experience, talent and even luck on the experimenter's side. There is no need to resort to experimentally testing the robustness of the contrast between (7a) and (7b). It is evident. The head of the attribute phrase must be adjacent to the head-initial phrase it modifies (see Haider 2004a, on the edge effect). Second, an adverbial modifier precedes the modified element, with the exception of genug (see (3) above). Because of the adjacency restriction, (7b) is deviant. Some speakers resort to (7d), but they never use (7e). Is the difference between (7b), (7d) and (7e) a gradient one? Yes, it is in terms of acceptability, but it is not in terms of well-formedness. (7b) and (7d) are equally ungrammatical. Why is (7d) considered less deviant? It is felt to be the best solution in a no-win situation: If you stick to the order required by the idiosyncrasy of genug, you would have to inflect it and thereby turn it into a kind of fake head of the attribute, otherwise it violates the adjacency requirement (7d). What remains is the violation of inflecting an uninflectable item that is not the head. In sum, gradience is often the result of dealing with a situation in which two requirements are in conflict.

(7) a. ein genügend deutliches Beispiel
       a sufficiently clear example
    b. *ein deutliches genug Beispiel
        a clear enough example
    c. *ein deutliches genügend Beispiel
    d. ??ein deutlich genuges Beispiel
    e. *ein deutlich genügendes Beispiel (*, in the reading of (7a))

It is an undeniable fact that the results gained in psycholinguistic experiments are heavily influenced by the design of the experiment. If the experimenter is in the lucky situation of being able to control most of the potentially intervening variables, the results will be close to the ideal of being representative for the issue under experimentation, but in most cases this is hard to achieve.

Featherston's experiment on superiority is a good case in point. The experimental data document a preference for the patterns congruent with superiority in German as well as in English, but for German, the preference is characterized as not categorical in the sense that speakers would not choose to use structures violating the superiority constraint. This characterization does not pay enough attention to intervening variables, however. The experiment tests only a subset of superiority cases, namely direct questions. Indirect questions (8c, d) would


have been the better choice. The contrast between English and German would become even clearer.

(8) a. It is unclear what belongs to whom.
    b. *It is unclear to whom what belongs.
    c. Es ist unklar, was wem gehört.
       it is unclear what whom(dat) belongs
    d. Es ist unklar, wem was gehört.
       it is unclear whom(dat) what belongs
    e. Wem gehört was?
       whom(dat) belongs what
    f. *To whom does what belong?

Direct questions presuppose a context since they presuppose that a possible answer exists and that the answer is an appropriate choice of discourse participants for the questioned elements. Second, for this choice, the order of the wh-elements provides the sorting key. For instance, in (8e), the sorting key is wem 'to whom'. So, the elements of a subset of the set of possessors are mapped onto elements of the set over which was 'what' ranges, namely the set of possessed elements. Indirect questions do not require an answer, so they do not need a discrete choice of potential discourse participants. The question of the sorting key is therefore not relevant, and the information structure effect of the order of wh-elements is less salient.

What the experimental data confirm is this. If an informant is presented with a sentence in isolation, (s)he implicitly embeds it in a potential discourse situation and judges the information structure. If you have to choose between (9a) and (9b), you will easily identify the order in (9a) as congruent with the base order. The preference for (9a) is a preference for the contextually unmarked information structure.

(9) a. Wer hat was zur Party mitgebracht?
       who has what to the party brought
    b. Was hat wer zur Party mitgebracht?
       what has who to the party brought
    c. Ich möchte gar nicht wissen, was wer zur Party mitgebracht hat.
    d. *I do not want to know what who has brought along to the party.

What a simple comparison between English and German fails to honour is a cross-linguistic generalization. Superiority does not only hold for English, it holds for any VO language (Haider 2004b), but not for OV languages. The


absence of superiority is not just a property of German, it is a property of OV languages in general, cf., for instance, Japanese (10a, b).

(10) a. Nani o dare ga katta no. [Japanese]
        what-obj who-sub bought Q-PRT
        'What did who buy?'
     b. Dare ga naze kita no
        who-sub why came Q-PRT
        *'Who came why?'
     c. Wer hat wie/weshalb/wann/wo protestiert?
        who has how/why/when/where protested
     d. *Who protested when/where/*why/*how?

The reason is this (see Haider 2004a): In VO, the VP-internal subject is preverbal and therefore not in the directionality domain of the verb. In OV, any argument, and in particular the subject, too, is in the directionality domain of the verbal head. So, the subject in VO, but not in OV, needs a functional head as a directional licenser. This is the grammar-theoretic source of the existence of obligatory functional subject positions in VO and the peculiar behaviour of an in-situ wh-subject (see Haider 2004b, 2005, for details).

The that-trace effect is just a facet of this phenomenon, but an ill-understood one. First, in English, the effect is absent if an adverbial intervenes (Browning 1996).8 Second, neither German nor any other OV language punishes that-trace structures. Third, languages differ with respect to the general transparency of C-introduced clauses. In German, speakers of northern varieties object to any wh-extraction out of dass-clauses whereas Southerners are free extractors. This was already documented by Paul (1919: 321f.). He devoted a subsection of his German Grammar to long distance wh-dependencies and referred to them justly as Satzverschlingung (sentence intertwining). In his collection, he documents plenty of cases of that-t-violations for interrogative and relative clauses. Andersson and Kvam (1984) tested extractions out of that-clauses in various locations in Germany and showed not only a contrast between extractors and non-extractors, but also a kind of adaptation effect for non-extractors that tolerate extractions by others. So, that-t-violations need to be checked as carefully as Torris (1984) did in her dissertation. She showed that systematic extractors for wh-constructions are systematic extractors for the other cases (comparatives, relative clauses, long

8. Here is an example: i) Who do you think that *(under these circumstances) would disagree?


distance topicalization), too. On the other hand, there are extraction admitters and they show an unsystematic behaviour. They tolerate extractions by others but long distance extraction out of dass-clauses is not part of their grammar. This is reminiscent of Andersen's (1973) concept of via-rules for communities in which the common vernacular language is fed by microparametrically contrasting grammars. In sum, subject extraction out of a C-introduced finite clause is on the one hand subject to constraints on extraction out of C-introduced finite clauses in general, and on the other hand, it is restricted by constraints for traces in functional spec position. The former restriction applies to variants of German, the latter one does not.

Just like experimental data, corpora do not speak for themselves. They require evaluation and interpretation. We have learnt that water freezes at 0 degrees centigrade. If you want to test it, you will find out that the results are gradient, and you may be puzzled by exceptions and learn about the influence of intervening third factors, like impurities, pressure fluctuations, etc. Corpora are performance records, and performance is unavoidably prone to influences of third factors, including simple mistakes and imperfections. A strict methodology of corpus evaluation is still wanting. Nevertheless, corpora may serve heuristic purposes. If something is a robust phenomenon, a big enough corpus will reflect this. More subtle relations are hard to assess immediately by mere corpus inspection.

3. Methods and theories

The ideal grammar is exceptionless (Featherston, sect. 1). But grammars of human languages are not ideal. Unlike platonic objects (e.g. logical calculi), they are biologically grounded, cognitive systems for a culturally formed dynamic behaviour, namely human languages. What we perceive as exceptions are compromises in the fine-tuning of a complex modular system. Some of them are externally geared (diachronic relics), some of them internally (inconsistent rule demands). From the methodological point of view, it is a crucial question whether a perceived anomaly is the reflex of an exceptional trait of the system or just apparent. It is apparent if what we perceive as an exception is the result of inadequately modelling a phenomenon that would turn out to be regular in the wider context of an adequate account. This is exception by error or, in other words, a scientist's deficiency. A main business of science is to remove these exceptions by testing and changing its theories.


How can we distinguish between real exceptions and apparent exceptions? Featherston's claims rest on the interpretation of the results of a refined method of gaining introspection data from informants (magnitude estimation). This method in my opinion is not reliable and valid enough to call into question generalizations arrived at by a systematic comparative study of grammar by the scientific community of experts. First, naive introspection is notoriously erratic across individuals and across categories for a single individual except for the most robust contrasts. Second, an informant's judgement is an aggregate of all factors that influence a 'this is how I myself would (not) say it' judgement. So, third, informants' judgements would have to be accompanied by a protocol of what the informants report as the crucial traits of the stimulus that (s)he has based the judgement on. Fourth, the patterns of reaction have to be tested for consistency (across the categories under examination) and for retest stability. Fifth, the minimal battery of statistical analysis tools needs to be employed in order to guarantee that the sample is representative, that the correlations are significant, that the results are solid enough to stand up against the null hypothesis, and so on. If linguists apply psychological or sociological methods, they are bound to comply with the required methodological standards developed for these methods.

In sum, in the face of the results and interpretations of Featherston's investigations, I do not feel compelled to give up my conviction that grammars determine discrete characteristic functions (well-formed vs. ill-formed) for linguistic expressions. What appears to be gradient is not the grammar, but the reactions of the test subjects. Models that employ weighted rules (violation costs) necessarily obscure this important difference: discrete systems produce gradient outputs if the output is mediated by additional interacting systems. This is the case for human languages.
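The claim that discrete systems yield gradient outputs once several of them are combined can be illustrated with a minimal sketch. The component names and the unweighted mean used as the aggregate are assumptions made purely for illustration; the text does not commit to any particular aggregation function.

```python
# Illustrative sketch: each component returns a discrete verdict (0 or 1),
# yet the aggregate judgement takes intermediate values.
# Component names and the unweighted mean are hypothetical choices.
from itertools import product

COMPONENTS = ("grammar", "information_structure", "parsing_ease")

def acceptability(verdicts):
    """Aggregate binary component verdicts into one score (unweighted mean)."""
    return sum(verdicts) / len(verdicts)

def possible_scores(n_components):
    """All distinct aggregate scores over every combination of 0/1 verdicts."""
    combos = product((0, 1), repeat=n_components)
    return sorted({acceptability(v) for v in combos})

scores = possible_scores(len(COMPONENTS))
print(scores)  # more than two distinct values, although every input is binary
```

With three binary components the aggregate already takes four distinct values, so a graded acceptability score by itself does not show that the grammar component is graded.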
Grammar theory models a cognitive capacity for a discrete symbol management algorithm. Grammar theory does not model the cognitive architecture of language production and perception. This is the realm of processing theories. Performance data bear only in a highly indirect way on competence issues.

References
Andersen, Henning 1973 Abductive and deductive change. Language 49: 765-793.
Andersson, Sven-Gunnar, and Sigmund Kvam 1984 Satzverschränkung im heutigen Deutsch. Tübingen: Narr.


Browning, Margaret A. 1996 CP-recursion and that-t-effects. Linguistic Inquiry 27: 237-255.
Frey, Werner 1993 Syntaktische Bedingungen für die semantische Interpretation. (Studia Grammatica 35) Berlin: Akademie Verlag.
Haider, Hubert 2004a Pre- and postverbal adverbials in VO and OV. Lingua 114: 779-807.
Haider, Hubert 2004b The superiority conspiracy. In The Minimal Link Condition, Arthur Stepanov, Gisbert Fanselow and Ralf Vogel (eds.), 147-175. Berlin: Mouton de Gruyter.
Haider, Hubert 2005 How to turn German into Icelandic and derive the VO-OV contrasts. The Journal of Comparative Germanic Linguistics 8: 1-53.
Paul, Hermann 1919 Deutsche Grammatik. Vol. III, Part IV: Syntax. Halle an der Saale: Niemeyer.
Torris, Thérèse 1984 Configurations syntaxiques et dépendances discontinues en allemand contemporain. Ph.D. diss., Université de Paris VIII-Vincennes.

Distinguishing lexical and syntactic exceptions Sam Featherston

A quality that I value in Hubert Haider's work is the consistency he shows where others seek refuge in fuzziness. It was Haider who correctly stated that, in a binary model of grammar, a single counter-example falsifies a rule (see the exceptionlessness quote in my paper in this volume). This fundamental fact is far too often finessed, and it is particularly important in the context of a discussion of the role of exceptions in the grammar. If it is empirically correct that grammatical rules apply exceptionlessly, then this reveals important information about the nature of the grammatical system. If it does not hold, when all the irrelevant factors which Haider correctly mentions are controlled for, then this finding too has important implications.

I agree with most of what Haider says in his commentary: the criteria he applies, the distinctions he makes, and the type of data he considers relevant. He calls for appropriate methodological standards, for careful distinctions between the various factors which can influence introspective judgements, and for the careful selection of the syntactic conditions in syntactic studies like our work on binding and superiority. All of these are concerns which we entirely share. There are however one or two points which would divide us, and it is these which I shall discuss here.

Haider's distinction between an anomaly (an irregularity in the data) and an exception (a restriction on the applicability of a rule) is useful. I should like to see this extended further, however, so as to differentiate between lexical effects and rule-based effects. In my view, the lexicon is the location of all effects which are related to or restricted to specific lexical items; only patterning which is independent of lexis need be included in the rule system. The rule system, I would argue, should in the ideal case be exceptionless: lexical effects on the other hand need not be.
There are naturally patterns in lexically driven behaviour, but the existence of exceptions to these is in no way problematic for our conception of the linguistic system. If the lexicon is learnt, then the patterns we find in it are mere association-based generalizations, not rules. Lexicon-based exceptions are thus to be expected and rather unexciting.


I would therefore hesitate before attributing any great importance to the position of genug ('enough') after a modified adjective (Haider's example 3), since we would assume this to be lexical. There are enough other examples of similar behaviour for this to be fairly clear (e.g. Engl. ago, Germ. entlang 'along'). Potentially more interesting cases of linear ordering are those where structures can optionally appear before or after heads, independent of lexis, such as complements to adjectives in German (stolz auf seine Kinder, auf seine Kinder stolz 'proud of his children'), especially when this behaviour is not even marginally possible in closely related languages (Engl. *of his children proud). A thorough investigation of this phenomenon could reveal insights into head-complement order, I suspect.

I would also take issue with the status of Haider's example (1b), my (1), which is acceptable in German.

(1) A: Sind es wirklich 47 Umschläge?
       are it really 47 envelopes
       'Is that really 47 envelopes?'
    B: Das sind es.
       that are it
       'That it is.'

Here I would argue that the effect is not just a garden path, as Haider suggests, but a lexical exception, or rather perhaps two. The use of expletive es ('it') in presentational structures with apparent plural verb agreement, if not reference (Es sind derer zwei 'There are two of them'), is certainly exceptional, but it is not systematic, for the exception is specific to this lexical item.1 The particular example Haider advances is yet more exceptional however, because these copula structures with es normally contain a plural which can license, in whatever way, the plural verb form (Es kommen jede Woche 47 Studenten 'There come each week 47 students'). Das sind es has no overt plural NP, since the pro-form das is not marked for plural. This structure is additionally limited to sein ('to be'), or perhaps just to copula verbs (2), (3).

(2) Wenn ich zehn hinzufüge? Werden es 47 Umschläge?
    if I ten add become it 47 envelopes
    ??Das werden es.
    that become it

1. The pronominal das seems to allow this more restrictedly too.

(3) Es kommen jede Woche 47 Studenten zu deiner Sprechstunde?
    it come each week 47 students to your office hour
    *Das kommen es.
    that come it

The main aim of my paper was to show linguists shy of gradience that gradience can be good news for syntactic theory. A more empirically adequate model of grammar is at the same time more descriptively adequate but also more explanatorily adequate, because it permits us to reduce the number of exceptions within the rule system, which means that the grammar can be more general, more learnable and/or more universal. Haider and I are largely in agreement about most of the terms of this important debate, but I have argued here that syntactic and lexical restrictions may have very different qualities.

Disagreement, variation, markedness, and other apparent exceptions Ralf Vogel

Abstract. With the example of case conflicts in German free relative constructions, I discuss three problems which might count as grammatical exceptions: constraint violation, systematic variation within a speech community, and contradicting evidence from different empirical domains. The systematic nature of these phenomena calls for a more complex conception of grammar and language in linguistic theory. What is called into question are some of the idealisations that figure frequently in linguistic work.

1. Introduction

Whether a particular linguistic observation is classified as an exception depends on what we assume to be the rule. Background assumptions about the nature of grammars and languages guide this classification. A perhaps naive or folk linguistic characterisation of languages and grammars might include the following statements:

- Grammatical rules and constraints are obeyed by all expressions in a language. Otherwise they would not be rules and constraints of that language.
- If two speakers speak the same language, they speak the same language in every respect. Otherwise they would not speak the same language.
- Linguists have an objective elicitation method at hand which allows them to figure out exactly which expressions are well-formed in a language, and which are not.

It is common wisdom among linguists that none of these three assumptions can seriously be upheld. But I have the impression that the consequences of this insight are farther reaching than linguists are usually willing to admit. If we give up the idea that there is a clear-cut boundary between well-formed and ill-formed, for instance, our notion of language changes: there are better and worse examples of expressions of a language, and an expression might be seen as belonging to that language only to a particular degree.


A grammatical model that accounts for this has to embody the means to deal with gradient well-formedness. Exceptions to empirical generalisations have a different status in such a grammar. If expressions are seen to be good members of a language to varying degrees, then in fact every expression is an exception to a certain degree. Such a line of reasoning has quite a long research tradition within linguistic theory, which mostly revolves around the notion of markedness. An important insight from markedness theory is the idea that a linguistic generalisation, stated in the form of a grammatical constraint or rule, is not falsified by counterexamples. Rather, constraints and rules are seen as violable tendencies, or so-called soft constraints.

A very influential recent development that emerged from this line of grammatical thinking is Optimality Theory (its founding document is Prince and Smolensky 2004, which has been circulated as a manuscript from 1993 on). The violability of constraints is a core assumption of Optimality Theory (OT). In this model, constraints frequently come into conflict, and such conflicts are resolved by prioritisation. Every expression that violates a markedness constraint can be seen as an exception to the linguistic generalisation that motivates the constraint. So, exceptions are expected to occur quite frequently, and in general are easier to accommodate in OT than in a theory that assumes inviolable constraints, where every exception would at the same time be a counterexample. An exception, i.e., the violation of a constraint, occurs in order to fulfil another, more important grammatical constraint.

Not only can our conception of the grammar be liberalised in order to deal with exceptions, but also our views of what a language is. The study of the diachronic development of language together with sociolinguistic and dialectological research have led to a conception of languages as constantly changing continua with unsharp boundaries.
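The resolution of constraint conflicts by prioritisation, as described above for Optimality Theory, can be sketched in a few lines. This is a generic illustration of strict ranking, not the analysis developed in this paper: the candidate names, the constraint labels C1 and C2, and the violation counts are hypothetical.

```python
# Illustrative sketch of evaluation under strictly ranked, violable constraints.
# Candidates, constraint labels and violation counts are hypothetical.

def optimal(candidates, ranking):
    """Return the candidate whose violation profile is best under the ranking.

    candidates: dict mapping a candidate name to {constraint: violations}
    ranking:    list of constraint names, highest-ranked first
    """
    def profile(name):
        # Violations listed in ranking order; tuples compare lexicographically,
        # so a higher-ranked constraint always outweighs lower-ranked ones.
        return tuple(candidates[name].get(c, 0) for c in ranking)
    return min(candidates, key=profile)

candidates = {
    "cand_A": {"C1": 0, "C2": 1},  # satisfies C1, violates C2
    "cand_B": {"C1": 1, "C2": 0},  # violates C1, satisfies C2
}

winner_c1_first = optimal(candidates, ["C1", "C2"])
winner_c2_first = optimal(candidates, ["C2", "C1"])
print(winner_c1_first, winner_c2_first)
```

Under either ranking the winner still violates one constraint, which is the sense in which a violation counts as an exception that occurs in order to satisfy a more important constraint.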
Nevertheless, particular stages in the historical development or particular dialectal varieties are often described as if their speakers' linguistic behaviour were uniform. This picture might be too idealised.

That corpus frequencies and psycholinguistic experiments mirror grammar only in a distorted way was an important argument in early generative grammar for discarding this kind of work as irrelevant for grammatical theory. But an alternative elicitation method has not been developed within that framework. Generative linguistics, especially in syntax, often seems to presuppose that we already know which expressions are well-formed in a language. But very often, this is not the case. The development of solid elicitation methods continues to be a central task in linguistic research.

The studies that I want to present in this paper touch on each of these issues. The morpho-syntactic phenomenon that we will explore is that of case conflicts in German free relative constructions (FR). This construction is one of the rare cases where conflicting constraints can really be observed, as I will briefly sketch in Section 2. The wh-pronoun in (1) is sensitive to the case requirements of both the matrix verb besuchen (here: accusative) and the verb inside the FR, vertrauen (here: dative). The pronoun can only realise one case morphologically.

(1) Ich besuche wem ich vertraue
    I visit [who-DAT I trust]-ACC
    'I visit who I trust.'

This conflict situation leads to ungrammaticality in far fewer cases than one might expect. However, we also observe disagreement among both linguists and native speakers about these structures, which nevertheless follows a systematic pattern, as I will show in Section 3. The generalisation about this systematic pattern can be phrased in terms of tolerance of markedness. It would get lost if German were particularised into several unconnected sub-variants, each with its own exceptionless grammar. We gathered empirical evidence with psycholinguistic experiments and corpus studies (see Sections 4 and 6) which largely confirms the Optimality Theoretic analysis that I will present below (see Section 5). But we also found counterevidence in one interesting case pattern. Perfectly well-formed expressions can be quite rare if they contain redundant material. This sheds some light on how representative corpus data are for the grammar of a language.

2. Case realisation as violable grammatical constraint

The morphological non-realisation of an assigned case on a noun phrase usually leads to ungrammaticality, as in the German examples in (2), where nominative is required on the subject:1

(2) a. Der Schiedsrichter hat gepfiffen.
       the referee-NOM has whistled
    b. *Den Schiedsrichter hat gepfiffen.
       the referee-ACC has whistled
    c. *Dem Schiedsrichter hat gepfiffen.
       the referee-DAT has whistled

1. I will follow here the standard assumption that nominative case is assigned to the subject by the finite verb (here, the auxiliary).


Not all instances of non-realisation of case lead to ungrammaticality. It can happen that a noun is confronted with two conflicting case requirements. An example is the relative pronoun of argument free relative clauses (FR), as in the following examples from Modern Greek:

(3) a. Agapo opjon/*opjos me agapa.
       love-1Sg whoever-ACC/*NOM me loves
       'I love whoever loves me.'
    b. Opjon/opjos piaso tha timorithi.
       whoever-ACC/NOM catch-1Sg FUT be punished-3Pl
       'Whoever I catch will be punished.'
    (Alexiadou and Varlokosta 1995)

In (3a) we see that the FR pronoun realises the case assigned by the matrix verb (m-case, here: accusative), and suppresses the one assigned by the relative clause internal verb (r-case, here: nominative). If the FR pronoun has the chance of retaining r-case, it can nevertheless do so, as shown in (3b). The two options for (3b) result from the fact that Modern Greek is a pro-drop language: (3b) can be given two different syntactic analyses, only one of which actually displays a case conflict:

(4) a. [IP [FR ] [IP pro ] ] = FR is left dislocated, an empty pronoun is the subject, no case conflict, r-case required
    b. [IP [FR ] ] = FR is the subject, case conflict, m-case required

When the FR is in the syntactic position of the subject, it is assigned nominative case by the finite verb. This case surfaces on the FR pronoun, yielding opjos in (3b). When the FR is left dislocated, we have an empty pronoun that serves as subject and is assigned nominative. Now, the FR pronoun is free to realise r-case, and so it does, yielding opjon in (3b). These examples show on the one hand that the morphological realisation of accusative case is a constraint of Greek grammar. But, on the other hand, it is obviously violable. Otherwise, case conflicts would not be tolerated at all. As already explained in the first section of this chapter, constraint violation is a kind of exception that is expected under an Optimality Theoretic perspective on grammar.2 It is the result of a situation where different constraints come into conflict, and only one of them can be fulfilled by an expression. The conflict is resolved by giving one of the constraints higher priority.
2. For representative work in OT syntax, see, for instance, (Legendre et al. 2001). I developed an OT analysis of FR constructions in (Vogel 2001, 2002, 2003).


That the FR pronoun in German, which obligatorily realises r-case, is sensitive to both case requirements has already been reported by Pittner (1991). According to her, (5c) and (5d) are ungrammatical:

(5) a. Ich lade ein wen ich treffe.
       I invite[acc] who-ACC I meet
    b. Ich lade ein wem ich begegne.
       I invite[acc] who-DAT I meet
    c. *Ich lade ein wer mir begegnet.
       I invite[acc] who-NOM me-DAT meets
    d. *Ich helfe wen ich treffe.
       I help[dat] who-ACC I meet

The crucial difference between (5b) and (5c) lies not in the fact that the case assigned by the matrix verb, accusative, is left unrealised. What matters is the case in favour of which it remains unrealised: accusative may be suppressed in favour of dative, but not in favour of nominative. In general, a FR with a case conflict is well-formed, according to Pittner, only if the suppressed case is not higher than the realised case on the following case hierarchy:

(6) The German case hierarchy: NOM < ACC < OBLIQUE (DAT, GEN, PP).
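The generalisation attributed to Pittner can be spelled out as a small predicate. The following Python sketch is only an illustration: the integer ranks, the function name and the dictionary of examples are assumptions introduced here, and the treatment of a conflict between two oblique forms is not something the text discusses.

```python
# Case hierarchy from (6): NOM < ACC < OBLIQUE (DAT, GEN, PP).
# The integer ranks are an illustrative encoding (an assumption of this sketch).
CASE_RANK = {"NOM": 0, "ACC": 1, "DAT": 2, "GEN": 2, "PP": 2}


def pittner_well_formed(m_case: str, r_case: str) -> bool:
    """Predict well-formedness of a German FR under the generalisation above.

    m_case: case assigned by the matrix verb (suppressed under conflict).
    r_case: case assigned inside the FR, realised on the wh-pronoun.
    """
    # The suppressed m-case must not outrank the realised r-case.
    # Matching cases trivially satisfy this. Conflicts between two
    # oblique forms also pass here; the source does not address them.
    return CASE_RANK[m_case] <= CASE_RANK[r_case]


# The four sentences in (5), as (m-case, r-case) pairs.
EXAMPLES_5 = {
    "5a": ("ACC", "ACC"),
    "5b": ("ACC", "DAT"),
    "5c": ("ACC", "NOM"),
    "5d": ("DAT", "ACC"),
}

predictions = {
    label: pittner_well_formed(m, r) for label, (m, r) in EXAMPLES_5.items()
}
for label, ok in predictions.items():
    print(label, "well-formed" if ok else "ill-formed")
```

Under this encoding, (5a) and (5b) come out as well-formed and (5c) and (5d) as ill-formed, which matches the judgements reported for Pittner (1991).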

This hierarchy goes hand in hand with further distinctions among the case forms. Nominative is assumed to be the default case, nominative and accusative are structural cases, while dative, genitive and PPs are usually assumed to be oblique case forms. While the dative can be argued to have a certain semantic contribution to the meaning of the clause it occurs in, and seems to be limited to thematic roles of a certain kind, no such restrictions can be proposed for nominative and accusative.

3. Variation among linguists

The judgements given by Pittner are not shared by all linguists. In the view of Groos and van Riemsdijk (1981), structures like (5b) are already ill-formed, in addition to (5c) and (5d). But in (Vogel 2001), I nevertheless observe that for many, though not all, speakers, only (5d) is bad, while the other three clauses in (5) are acceptable.

There is, thus, an obvious disagreement among linguists about the facts. In Vogel (2001), I deal with this disagreement in terms of variation. German might have three variants, German A, B, and C, respectively, which differ in their tolerance of case conflicts. The four types of FRs in (5) can be distinguished by how serious the case conflict is. They differ in their relative markedness, which can be described as in (7).

(7) a. The two case requirements match, no case conflict.
    b. Case conflict, where the hierarchically lower case (here: accusative) is suppressed.
    c. Case conflict, where the hierarchically higher case (a structural case, here: accusative) is suppressed.
    d. Case conflict, where the hierarchically higher case (an oblique case, here: dative) is suppressed.
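The four-way distinction in (7) can be stated as a classification over pairs of m-case and r-case. The sketch below is illustrative only; the function name, the rank encoding and the restriction to nominative, accusative and dative are assumptions made here.

```python
# Illustrative encoding of the case hierarchy NOM < ACC < DAT (oblique).
CASE_RANK = {"NOM": 0, "ACC": 1, "DAT": 2}
OBLIQUE = {"DAT"}


def conflict_type(m_case: str, r_case: str) -> str:
    """Map an FR case pattern to one of the four types (7a) to (7d).

    m_case is the (potentially suppressed) matrix case, r_case the case
    realised on the wh-pronoun.
    """
    if m_case == r_case:
        return "a"  # the two requirements match
    if CASE_RANK[m_case] < CASE_RANK[r_case]:
        return "b"  # the hierarchically lower case is suppressed
    if m_case in OBLIQUE:
        return "d"  # a higher, oblique case is suppressed
    return "c"      # a higher, structural case is suppressed


# The sentences in (5a) to (5d) instantiate the four types in order.
types_5 = [
    conflict_type("ACC", "ACC"),
    conflict_type("ACC", "DAT"),
    conflict_type("ACC", "NOM"),
    conflict_type("DAT", "ACC"),
]
print(types_5)
```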

No dialectal or sociolectal factor could be identified for the three variants German A, B, and C. The best interpretation of this observation is perhaps statistical: it might be an unrealistic expectation that a speech community is homogeneous in every grammatical respect. But still, the variation follows a characteristic pattern which itself can be described in grammatical terms, as sketched in (7). Furthermore, it might be the case that German speakers agree on the relative acceptability of the structures. The variants only differ from each other in their tolerance of markedness. But what counts as marked seems to be determined in the same way in all variants.

Under a categorical view of grammar, two of the three variants of German must be seen as exceptions to the third variant, which would have to be identified as the norm. But the decision as to which variant represents the norm can only be arbitrary. Alternatively, and this is the point of view that I prefer, the observed variation, which follows a systematic pattern of markedness, could be seen as the norm, the reality of German grammar. If we rely on absolute grammaticality, we deal with three ununifiable dialects. But if we use relative acceptability as our empirical base, we might only have one single language. The three apparent variants, empirical exceptions for one another, might result from the use of the wrong empirical base in grammar modelling, namely absolute rather than relative acceptability. Whether an observation counts as exceptional or regular is sometimes only a matter of theoretical background assumptions.

In the next section, I will briefly present the results of elicitation experiments on case conflicts in German FR constructions that I undertook in collaboration with Stefan Frisch and Jutta Boethke at the University of Potsdam (see Boethke 2005), in order to get a clearer picture of the empirical side of the phenomenon.


4. Empirical exploration

We carried out three speeded acceptability judgement experiments, exploring different case conflicts in FR constructions: nominative versus dative (experiment 1), nominative versus accusative (experiment 2), and accusative versus dative (experiment 3). Each experiment had 24 participants, all students, with a different group in each experiment. Stimulus sentences were presented word by word on a computer screen, in randomised order, mixed with test sentences from 3 other experiments which served as distractor items. After the sentence was finished, subjects had to press one of two buttons for (non-)acceptance. Two of the experiments were carried out by Jutta Boethke, as part of her diploma thesis (Boethke 2005).

Each experiment contained 8 conditions. Among them were four sentences with FRs with the four possible case patterns. In experiment 1, all FRs were clause-initial, and each of the four sentences was paired with a correlative variant which avoids the case conflict with an additional resumptive d-pronoun. In experiments 2 and 3, the FRs were tested in clause-initial and in clause-final position.

4.1. Experiment 1: nominative versus dative

The first experiment dealt with the conflict between nominative and dative. The FRs were clause-initial in each of the eight test conditions. The four logically possible case patterns appeared in two versions, one with (correlative) and one without (FR) a resumptive d-pronoun following the FR. Each participant saw eight items of each of the eight conditions. Lexical variation between the blocks and the item sets that the participants saw ensured that there was no confounding by the lexical material. The test conditions were constructed as in (8).

(8) a. Wer uns hilft, (der) wird uns vertrauen.
       who-NOM us-DAT helps (that-one)-NOM will us-DAT trust
    b. Wem wir helfen, (dem) werden wir vertrauen.
       who-DAT we-NOM help (that-one-DAT) will we-NOM trust
    c. Wem wir helfen, (der) wird uns vertrauen.
       who-DAT we-NOM help (that-one-NOM) will us-DAT trust
    d. Wer uns hilft, (dem) werden wir vertrauen.
       who-NOM us-DAT helps (that-one-DAT) will we-NOM trust

Each sentence was presented to the subjects word by word on a computer screen, and they were finally prompted to press one of two buttons for (non-)acceptability. We recorded both the judgements and the reaction times. I will only present the judgement data here. Table 1 displays the results of the four conditions without resumptive pronoun.3 In the statistical analysis, we found that matching FRs are significantly more likely to be accepted than non-matching FRs. Furthermore, suppression of nominative is significantly more likely to be accepted than suppression of dative, and finally, clause-initial matching nominative FRs are more likely to be accepted than clause-initial matching dative FRs. This latter observation is presumably due to an additional factor which was not controlled for in this experiment. The clause-initial position is the default position for subjects, but not for objects. Matching dative FRs are objects of their matrix clause, and so should occur in object position. For subordinate object clauses, this is the clause-final position. We controlled for this factor in the second and third experiments.
Table 1. Experiment 1, FRs only, average acceptance in %.

m-case   r-case   % accepted
nom      NOM      86.98
dat      DAT      70.83
nom      DAT      61.98
dat      NOM      16.67
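As a plausibility check on Table 1, the percentages can be converted back into numbers of acceptances. The sketch below assumes that all 24 participants judged all eight items of a condition, so that each percentage rests on 192 judgements; the text does not report raw counts or missing trials, so the recovered counts are an inference, not reported data.

```python
PARTICIPANTS = 24
ITEMS_PER_CONDITION = 8
N = PARTICIPANTS * ITEMS_PER_CONDITION  # judgements per condition (assumed complete)

# Table 1: acceptance in %, keyed by (m-case, r-case).
TABLE_1 = {
    ("nom", "NOM"): 86.98,
    ("dat", "DAT"): 70.83,
    ("nom", "DAT"): 61.98,
    ("dat", "NOM"): 16.67,
}

# Nearest whole number of "accept" responses behind each percentage.
counts = {pattern: round(pct / 100 * N) for pattern, pct in TABLE_1.items()}

for pattern, accepted in counts.items():
    back = 100 * accepted / N
    # Each recovered count reproduces the published value to two decimals.
    assert abs(back - TABLE_1[pattern]) < 0.005
    print(pattern, f"{accepted}/{N} accepted = {back:.2f} %")
```

That every percentage corresponds to a whole number of acceptances out of 192 is consistent with the design described in the text.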

3. Here and throughout, I abbreviate case patterns in the form case1-CASE2, where the first case, in lowercase letters, is the suppressed m-case, while the case in uppercase letters is the case that surfaces on the wh-pronoun, r-case.


The results confirm the picture that has been drawn in the literature. FRs without a case conflict have a higher probability of being accepted than FRs with a conflict, and among the conflicting FRs, suppression of nominative is more often accepted than suppression of dative.

4.2. Experiment 2: nominative versus accusative

The second experiment again contained eight test conditions, but this time each of the four case patterns varied with the FR in initial and final position. We had no correlative structures. (9) illustrates this with the two case patterns nom-NOM and acc-NOM. In addition to these conditions, we constructed parallel items with the patterns acc-ACC and nom-ACC.

(9) a. Wer uns vermisst, wird uns suchen.
       [who-NOM us-ACC misses]-nom will us-ACC search
       'Who misses us, will look for us.'
    b. Uns wird suchen, wer uns vermisst.
       us-ACC will search [who-NOM us-ACC misses]-nom
    c. Wer uns vermisst, werden wir suchen.
       [who-NOM us-ACC misses]-acc will we-NOM search
    d. Wir werden suchen, wer uns vermisst.
       we-NOM will search [who-NOM us-ACC misses]-acc

Apart from the different construction of the test items, the experiment had the same design as experiment 1. The acceptability results are displayed in Table 2.
Table 2. Experiment 2, average acceptance in %.

m-case   r-case   initial   final   total
nom      NOM      94.27     81.25   87.76
acc      ACC      83.85     91.15   87.50
nom      ACC      69.27     78.65   73.96
acc      NOM      49.48     70.83   60.16
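The relation between the columns of Table 2 can be checked with a few lines of arithmetic. The sketch below treats the "total" column as the unweighted mean of the initial and final conditions, which is an assumption that the published figures bear out, and computes the difference between final and initial position for each pattern. The variable and function names are illustrative.

```python
# Table 2: (initial, final, total) acceptance in %, keyed by case pattern
# in the m-case/R-CASE notation of footnote 3.
TABLE_2 = {
    "nom-NOM": (94.27, 81.25, 87.76),
    "acc-ACC": (83.85, 91.15, 87.50),
    "nom-ACC": (69.27, 78.65, 73.96),
    "acc-NOM": (49.48, 70.83, 60.16),
}


def summarise(initial: float, final: float) -> tuple[float, float]:
    """Return the mean of both positions and the gain of final over initial."""
    return (initial + final) / 2, final - initial


summary = {
    pattern: summarise(initial, final)
    for pattern, (initial, final, _total) in TABLE_2.items()
}

for pattern, (mean, gain) in summary.items():
    print(f"{pattern}: mean {mean:.2f} %, final minus initial {gain:+.2f}")
```

Only the matching nominative pattern shows a negative difference, that is, an advantage for the initial position; the other three patterns are accepted more often in final position.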

The results again show that matching FRs are more likely to be judged acceptable. Interestingly, our suspicion that the syntactic position is a relevant factor could be confirmed. Matching nominative FRs are significantly better in initial position, while matching accusative FRs are significantly better in final position. The case hierarchy could also be shown to play a role. Suppression of nominative is significantly more likely to be judged acceptable than suppression of accusative. Interestingly, non-matching FRs are significantly more likely to be judged acceptable in final position, no matter which grammatical function they serve in the matrix clause. Even a non-matching FR that serves as subject of the matrix clause has a higher probability of acceptance if it occurs in final position. It seems that the initial position is only advantageous if the case of the wh-pronoun does not provide contradictory information.

In a statistical analysis across the two experiments, we found a significant contrast between initial FRs where accusative is suppressed (49 % in Exp. 2) and initial FRs where dative is suppressed (17 % in Exp. 1). This result, together with the previous findings, confirms the proposal by Pittner (1991) that the case hierarchy NOM < ACC < OBLIQUE (DAT, GEN, PP) is a crucial factor for the acceptability of FRs with case conflicts.

4.3. Experiment 3: accusative versus dative

The third experiment dealt with the two object cases accusative and dative. The experiment design was the same as in experiment 2. We had eight conditions, namely the presence of FRs in the four logically possible case patterns in clause-initial and clause-final position. Examples for the test conditions with the dat-DAT and acc-DAT patterns are given in (10).

(10) a. Ich helfe wem ich vertraue.
        I help [who-DAT I trust]-dat
        'I help whom I trust.'
     b. Wem ich vertraue helfe ich.
        [who-DAT I trust]-dat help I
     c. Ich besuche wem ich vertraue.
        I visit [who-DAT I trust]-acc
     d. Wem ich vertraue besuche ich.
        [who-DAT I trust]-acc visit I
     (plus 4 conditions with the acc-ACC and dat-ACC patterns)

The default position for both accusative and dative object clauses is the clause-final position. We therefore leave out the results for clause-initial FRs here. The results for the clause-final FRs are given in Table 3. This experiment replicates the results of our earlier pilot study (Vogel and Frisch 2003) on the same case pattern. Matching FRs are at the same high level of acceptability, which is significantly higher than for non-matching FRs, where suppression of accusative has higher acceptability than suppression of dative.4
4. The non-significant contrast between matching accusative and matching dative FRs might be an effect of the overall lower frequency of dative in German.

Table 3. Experiment 3, average acceptance in %.

m-case   r-case   accept.
acc      ACC      91.67
dat      DAT      86.46
acc      DAT      72.40
dat      ACC      54.17

This result also very clearly confirms the two factors that have been identified as crucial for the acceptability of FRs. Having no case conflict is better than having one, and suppressing the less important case is better than suppressing the more important one.

The still quite high acceptability rate for the worst structure in this experiment might cause some worries. If suppression of dative counts as ungrammatical in German, why, then, have such structures been accepted at a rate of 54 %? Can we take the percentages that we get in such experiments at face value? Certainly not. An experiment creates quite special laboratory conditions which influence linguistic intuitions in many ways. Because such factors usually influence our experimental conditions to the same degree, this does not affect the relation between the conditions, but we cannot conclude from the experiment's results that such structures are grammatical or ungrammatical, nor even that they have gradient acceptability of a certain degree. All we can rely on is the fact that there are statistically significant contrasts given our experimental conditions.

A second, sociolinguistic factor is the fact that most subjects came from the Brandenburg area and that the local dialect mixes up the two object cases dative and accusative. Hence, we expect a certain amount of uncertainty and confusion when subjects from this area deal with a case conflict between dative and accusative. To clarify this, the experiment will have to be replicated in a different area of Germany.

4.4. General results of the experiments

On the basis of these results we can identify four types of case patterns for FRs. These four types differ significantly in their relative probability of being accepted by native speakers. We thus get an acceptability hierarchy of case patterns. The three observed variants of German can be classified along this hierarchy.
As illustrated in (11), each variant correlates with one of the markedness contrasts that showed up as statistically significant in our experiments.

(11) Markedness hierarchy of German FR constructions:
     (i) Matching FRs. (acceptable in German A, B, C)
     (ii) Non-matching FRs where the suppressed case is lower on the case hierarchy. (acceptable in German A, B)
     (iii) Non-matching FRs where the suppressed case is higher on the case hierarchy, but not an oblique case (i.e., the [acc-NOM] pattern). (acceptable in German A)
     (iv) Non-matching FRs where an oblique case is suppressed. (not acceptable in German, according to the common view)

Assume that our statistical findings correctly characterise the German facts. Then one prediction would be that in an arbitrarily selected group of informants, more of them would accept FRs with the nom-ACC pattern than FRs with the acc-NOM pattern. If only absolute acceptability is elicited in such a population, with only one item per case pattern, then this elicitation would reproduce the three variants German A, B, and C. Thus, the finding that there are three variants is in fact compatible with our statistical findings. If there were three variants, and if they were characterised as given in the literature, then, under the assumption that each variant shows up equally in a given population, the results should come out as they did in our experiment.

There is, however, another important observation. Each test condition was elicited eight times with every participant of our experiments. It was not the case that participants consistently judged all items of one condition alike. Rather, the structures with intermediate acceptability status were also accepted to an intermediate degree by many participants; for instance, we got three rejections and five acceptances from one and the same participant. In statistical terms, the population had a normal distribution, which is, in fact, a prerequisite for the application of the statistical tests we used, analyses of variance (ANOVA). If one wants to make this finding compatible with the idea that there are three variants of German, one would have to postulate that individual speakers constantly shift between the three variants.

Consider, however, that speakers had no means of expressing an intermediate acceptability status of structures. They only had two buttons, for acceptance and rejection. The four-point scale in (11) has to be mapped onto the two-point scale in our experiment. There are three possible options for this mapping, and these correspond exactly to our three variants.
Are there three variants or is there only one variant with differing probabilities of acceptance? Given the above reasoning, both characterisations are equally acceptable. They are both valid descriptions of the same underlying grammar.
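The mapping argument can be illustrated with a short enumeration. The sketch assumes that a binary mapping respects the markedness order of (11), accepts at least level (i) and rejects at least level (iv); these restrictions, like all names in the code, are assumptions made here to reconstruct the count of three. It also computes the acceptance rate per level that would follow if the three mappings were equally frequent in a population.

```python
from fractions import Fraction

# The four markedness levels of (11), from least to most marked.
LEVELS = ["i", "ii", "iii", "iv"]


def monotone_cuts(levels):
    """Enumerate accept/reject mappings that respect the markedness order.

    Assumption of this sketch: a speaker accepts every level up to some
    cut-off and rejects everything beyond it, accepts at least level (i),
    and rejects at least level (iv).
    """
    return [
        {level: index < cut for index, level in enumerate(levels)}
        for cut in range(1, len(levels))
    ]


mappings = monotone_cuts(LEVELS)
labels = ["German C", "German B", "German A"]  # increasingly tolerant cut-offs

# Share of the three mappings that accept each level, i.e. the acceptance
# rate expected if the three variants were equally frequent.
expected_rate = {
    level: Fraction(sum(m[level] for m in mappings), len(mappings))
    for level in LEVELS
}

for label, mapping in zip(labels, mappings):
    print(label, "accepts", [level for level in LEVELS if mapping[level]])
print({level: str(rate) for level, rate in expected_rate.items()})
```

The graded rates that result (1, 2/3, 1/3, 0) show how a population of categorical speakers and a single gradient grammar can yield the same ordering of acceptance percentages.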


In the next section, we will briefly sketch an optimality-theoretic description of this underlying grammar.

5. A brief OT reconstruction

The constraints that we need for the Optimality Theoretic reconstruction of the grammar of German FRs must encode two things: (i) the fact that having a case conflict is worse than not having one, and (ii) the case hierarchy. With a slight modification of the constraint system I previously proposed (Vogel 2001, 2002, 2003), these tasks are fulfilled by the following three constraints:5

Realise Case (RC): An assigned case requires a morphological instantiation.
Realise Case (relativised) (RCr): An assigned case requires a morphological instantiation of itself or a case that is higher on the case hierarchy.
Realise Oblique (RO): Oblique case must be morphologically realised.

These constraints evaluate the morphological realisation of syntactically assigned case features. The constraints are seen as markedness constraints insofar as they evaluate properties of the expression itself, in particular the correspondence between syntactic case relations and their morphological expression.6 The constraint Realise Oblique refers in particular to the realisation of dative case. It is the most respected constraint: none of our observed variants accepts the non-realisation of dative case. The three constraints tolerate a situation where one phrase, the FR pronoun, serves two case assigners at the same time, as in a situation where both case requirements match. The constraint Realise Case is fulfilled under such conditions. Realise Case (relativised) is a liberal version of that constraint, which allows a case to be realised by a case which is higher on the case hierarchy. The ranking of these constraints in German is as in (12).

(12) RO >> RCr >> RC

5. In Vogel (2001, 2002, 2003) I do not use the constraint Realise Oblique. Its function is taken over by more complex mechanisms of the OT grammar which I will not go into here. The constraint Realise Oblique might be seen as a shortcut for those complex mechanisms.
6. However, whether a constraint counts as a markedness constraint or as a faithfulness constraint is also a matter of the architecture of the OT model: faithfulness constraints evaluate the preservation of certain features of the input in output candidates. Markedness constraints do not refer to the input. In our case, the constraints on case realisation are markedness constraints because the candidates are syntactic structures with morphologically inflected words as terminal nodes. Thus, the constraint violations can be checked without reference to the input. A different scenario is possible, however, where the syntactic structure is given in the input and the word forms are given in the output candidates. In that case, the constraints compare the syntactic case configuration given in the input with the morphological case instantiations in the candidates, and the constraints would have to be seen as faithfulness constraints.
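The three constraint definitions can be turned into a violation profile for any FR case pattern. The following sketch is an illustration under stated assumptions: the rank encoding, the restriction to nominative, accusative and dative, and the premise that the r-case is always realised on the pronoun (so that only the m-case can go unrealised) are introduced here. Sorting the profiles by the ranking RO, RCr, RC gives a relative markedness order.

```python
# Illustrative encoding of the case hierarchy NOM < ACC < DAT (oblique).
CASE_RANK = {"NOM": 0, "ACC": 1, "DAT": 2}
OBLIQUE = {"DAT"}
CONSTRAINTS = ("RO", "RCr", "RC")  # highest-ranked constraint first


def fr_profile(m_case: str, r_case: str) -> dict[str, int]:
    """Violations incurred by an FR whose wh-pronoun shows r_case.

    Assumption: the r-case is always realised, so the three constraints
    are only checked against the m-case assigned by the matrix verb.
    """
    m_realised = m_case == r_case
    return {
        # RO: an oblique m-case is left without morphological realisation.
        "RO": int(m_case in OBLIQUE and not m_realised),
        # RCr: the realised case is lower on the hierarchy than the m-case.
        "RCr": int(CASE_RANK[r_case] < CASE_RANK[m_case]),
        # RC: the m-case is not instantiated by itself.
        "RC": int(not m_realised),
    }


def markedness_key(pattern: tuple[str, str]) -> tuple[int, ...]:
    """Violation vector ordered by constraint rank, for lexicographic sorting."""
    profile = fr_profile(*pattern)
    return tuple(profile[name] for name in CONSTRAINTS)


patterns = [("DAT", "ACC"), ("ACC", "NOM"), ("ACC", "DAT"), ("ACC", "ACC")]
ordered = sorted(patterns, key=markedness_key)
for m_case, r_case in ordered:
    print(f"{m_case.lower()}-{r_case}", fr_profile(m_case, r_case))
```

The resulting order, from least to most marked, is acc-ACC, acc-DAT, acc-NOM, dat-ACC, in line with the hierarchy in (11).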

FRs which fulfil RO are more likely to be accepted than those which violate RO. FRs which additionally fulfil RCr fare even better, and those which also fulfil RC fare best. Thus, this ranking models the empirical findings presented in the previous section.

If we also want to reconstruct the three German variants, we have to specify our OT model in the standard way, by predicting optimal structures as grammatical. We need an input specification and a set of competing candidate outputs. Suppose that we have two competing candidate structures, a FR structure and a correlative structure (CORR), i.e., one where the FR is accompanied by an additional resumptive d-pronoun, as in the first experiment presented in the previous section. Let us assume that the input is a syntactic specification, including the case pattern and the structure of a FR. In such an OT competition, the CORR structure violates a constraint on input preservation, a faithfulness constraint, because its syntactic structure differs from the FR structure specified in the input.

(13) Faithfulness (F): The input is preserved in the output.

Different rankings of faithfulness now yield the three variants. The higher the rank of F, the more FR structures are allowed. We only have to integrate F into the constraint hierarchy in (12), as in (14).

(14) German A: RO >> F >> RCr >> RC
     German B: RO >> RCr >> F >> RC
     German C: RO >> RCr >> RC >> F
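How the position of F decides between FR and CORR can be traced with a minimal evaluation routine. This is a sketch, not the analysis itself: it assumes an input specified as FR, only two candidates, a CORR candidate that violates F and nothing else, and FR violation profiles that follow from the constraint definitions given above. All identifiers are hypothetical.

```python
# Rankings with F inserted at three different positions, highest first.
RANKINGS = {
    "German A": ["RO", "F", "RCr", "RC"],
    "German B": ["RO", "RCr", "F", "RC"],
    "German C": ["RO", "RCr", "RC", "F"],
}

# Constraints violated by the FR candidate, per case pattern (m-case/R-CASE).
FR_VIOLATIONS = {
    "acc-ACC": set(),
    "acc-DAT": {"RC"},
    "acc-NOM": {"RCr", "RC"},
    "dat-ACC": {"RO", "RCr", "RC"},
}

# With an FR input, the correlative candidate is unfaithful but realises
# both cases (an assumption of this sketch).
CORR_VIOLATIONS = {"F"}


def optimal(ranking: list[str], fr_violations: set[str]) -> str:
    """Return the winner of the two-candidate competition.

    The highest-ranked constraint on which the candidates differ decides.
    """
    for constraint in ranking:
        fr_bad = constraint in fr_violations
        corr_bad = constraint in CORR_VIOLATIONS
        if fr_bad != corr_bad:
            return "CORR" if fr_bad else "FR"
    return "tie"


winners = {
    variant: {
        pattern: optimal(ranking, violations)
        for pattern, violations in FR_VIOLATIONS.items()
    }
    for variant, ranking in RANKINGS.items()
}
for variant, result in winners.items():
    print(variant, result)
```

Under these assumptions the FR wins for three patterns in German A, for two in German B and only for the matching pattern in German C, which mirrors the classification of the variants in (11).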

The interaction of faithfulness and markedness constraints is the standard approach to optionality and ineffability within Optimality Theory (see Legendre et al. 1998; Baković and Keer 2001, for applications of faithfulness in OT syntax). Each variant, for instance, allows for CORR structures, as these perform perfectly when they do not violate F. This is the case when the input is specified for CORR rather than FR. As there is no restriction on possible inputs in OT, such a competition also has to be considered. In a sense, the task of faithfulness is the reconstruction of an old-style set of grammatical expressions. The substance of the grammar, however, lies in the markedness constraints, here: the three constraints on case realisation.


The tableaux in (15) show how the acceptability patterns of German A are derived. (15) German A:

Only suppression of dative is ruled out in that variant, so F is ranked immediately below RO. In less problematic case conflicts, like the acc-DAT and the acc-NOM pattern, the FR is given an advantage by faithfulness. Only in the case of non-realisation of dative, as in the dat-ACC conflict, is the unmarkedness of CORR crucial.

When we run an elicitation experiment, we do not know in advance which variant an informant belongs to. Likewise, as the same candidate structure might appear in many different OT competitions, and violations of faithfulness vary across competitions (faithfulness evaluates input-output relations!), faithfulness constraints are not very informative. What remains constant across different competitions are the violations of markedness constraints, which are summed up in Table 4. The result is a relative ranking of our test structures.

Table 4. Relative markedness of FRs with different case conflicts.

                   RO    RCr   RC
1. FR: acc-ACC
2. FR: acc-DAT                 *
3. FR: acc-NOM           *     *
4. FR: dat-ACC     *     *     *


Our OT grammar predicts that the relative markedness of these structures, as determined by their markedness profiles, should result in relative acceptability in judgement experiments, relative frequency in corpora, and similar contrasts in other empirical studies. The facts presented thus far confirm this prediction. But there is contradictory evidence, which will be discussed in the next section.

6. Contradictory evidence

6.1. FR vs. CORR in experiment 1

In the first experiment that we reported in Section 4, we contrasted clause-initial FRs in the four case patterns with nominative and dative with the corresponding correlative structures, as in (16).

(16) Wer uns hilft, (der) wird uns vertrauen.
     who-NOM us-DAT helps (that-one)-NOM will us-DAT trust
     'Whoever helps us will trust us.'

The acceptability results are displayed in Table 5. The correlative structures avoid the case conflict by providing one pronoun for each of the two assigned cases. Consequently, they receive similar and high acceptability in all four case patterns. The contrast between FR and CORR is significant for all case patterns except the least problematic one, nom-NOM, where the acceptability of CORR is also higher, but the acceptability of the FR is already too high to yield a significant contrast between the two structures. The lack of a significant contrast might not be problematic here. But an earlier corpus study shows that this case pattern is more problematic for the model proposed here.
Table 5. Mean acceptability percentages for FR and CORR in different case configurations.

        nom-NOM   dat-DAT   nom-DAT   dat-NOM
FR      87        71        62        17
CORR    95        91        92        90


6.2. A corpus study

In (Vogel and Zugck 2003), we studied the relative corpus frequency of FR and CORR. We used the publicly available Cosmas II corpus of the Institut für Deutsche Sprache in Mannheim, Germany (an extremely large corpus, consisting mainly of newspaper texts). Random samples, each 500 sentences long, were selected for the wh-pronouns wer, wem, wen ('who' in nominative, dative, accusative). The FR and CORR usages in each of the samples were counted and sorted into the different case patterns. The relevant results of this count are displayed in Table 6.7 We see that FRs are quite rare when m-case is dative or accusative. With nominative as m-case, CORR is more frequent than FR in case conflict configurations. But very surprisingly, in the nom-NOM pattern, the frequency of FR is about nine times as high as that of CORR. The average distance between the wh-pronoun and the first word of the matrix clause in the nom-NOM context was 6.02 (FR) vs. 12.04 (CORR). There was a highly significant correlation of length and clause type.
Table 6. Frequencies of clause-initial FR and CORR in the context of different case patterns.

m-case   r-case   FR             CORR
nom      NOM      274 (89.8 %)   31 (10.2 %)
nom      DAT      33 (34.4 %)    63 (65.6 %)
nom      ACC      5 (25 %)       15 (75 %)
acc      ACC      1 (20 %)       4 (80 %)
dat      DAT      1 (5.6 %)      17 (94.4 %)
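The percentages in Table 6 follow directly from the raw counts, as does the "about nine times" ratio for nom-NOM. The short sketch below recomputes them; the dictionary layout and the helper name `shares` are introduced here purely for illustration.

```python
# Raw counts from Table 6: (m-case, r-case) -> (FR, CORR).
table6 = {
    ("nom", "NOM"): (274, 31),
    ("nom", "DAT"): (33, 63),
    ("nom", "ACC"): (5, 15),
    ("acc", "ACC"): (1, 4),
    ("dat", "DAT"): (1, 17),
}

def shares(fr: int, corr: int) -> tuple:
    """Return the FR and CORR shares of a case pattern in percent (1 decimal)."""
    total = fr + corr
    return round(100 * fr / total, 1), round(100 * corr / total, 1)

percentages = {pattern: shares(*counts) for pattern, counts in table6.items()}

# FR-to-CORR frequency ratio in the nom-NOM pattern.
fr_nom, corr_nom = table6[("nom", "NOM")]
ratio_nom_nom = fr_nom / corr_nom

for pattern, (fr_pct, corr_pct) in percentages.items():
    print(pattern, fr_pct, corr_pct)
print(f"nom-NOM FR/CORR ratio: {ratio_nom_nom:.2f}")
```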

For the nom-NOM pattern, it seems that FR is less marked than CORR. This contradicts the tendency we found in the elicitation experiment, and it also contradicts the expectations generated from our OT model. The results of two different empirical methods thus contradict each other. Should the results of one method be treated as the exception and those of the other as the rule? Which results should be assumed to reflect the underlying grammar more appropriately, and how can we justify such a decision? Given our grammatical description of FR and CORR, there is no doubt that CORR should count as the less marked structure. This is also clear from the corpus counts for conflicting case patterns. The typological perspective also
7. Pittner (2003) reports another corpus study on this phenomenon, with largely equivalent results.


provides arguments in favour of our view, as the languages that have FRs seem to form a proper subset of those that have CORRs (see Vogel 2002). The explanation for the exceptionally low frequency of nom-NOM correlatives is that they are over-correct: the resumptive pronoun is redundant, because both the position of the FR and the case of the FR pronoun already signal the correct m-case. In judgement experiments, the resumptive pronoun makes grammatical information explicit. Judgements are more accurate and easier to make, and so the resumptive pronoun is rewarded. But in production, there seems to be a tendency to avoid redundant material, if possible.⁸ This effect might be observable with long-winded expressions in general. Consider the two English questions in (17):

(17) a. Who stole my car?
     b. Who was it that stole my car?

These examples are semantically equivalent. Though both questions are well-formed English, the cleft construction (17b) is certainly much less frequent. In the case of long-winded expressions, low frequency is no sign of reduced acceptability, and thus should not be reflected in a grammar model.

It is necessary to have a theory about how grammatical properties enter corpus frequencies. No empirical method mirrors only the properties of the grammar. This is also true of acceptability judgements. It is a well-known fact that acceptability judgements are strongly influenced by properties of the human parser, the limitations of working memory, and other psychological factors (see Fanselow and Frisch 2006 for a recent discussion of these issues).

Consider the question of how many grades of acceptability there are. In our experiments we only used two, but these yielded a four-way distinction among types of case conflicts. Should we therefore use a four-point scale in the future? Perhaps, when dealing with other phenomena, we might find three-way, five-way etc. distinctions. It is impossible to postulate a priori how fine-grained acceptability is, or even whether it is categorical or gradient. Consequently, there cannot be a single "right" elicitation method. Results of experiments have to be interpreted in the light of independently developed linguistic theories.

7. Conclusion

Exceptions to grammatical rules and constraints are expected under an Optimality Theoretic perspective that assumes that these can come into conflict without leading to ill-formedness. Such a perspective is empirically well-motivated. Case conflicts in FR constructions are one obvious example of such a situation. Variation is not simply an exception if it follows the pattern of relative markedness observed here. It only reflects tolerance of markedness defined on the basis of the same underlying grammar (understood as a system of ranked markedness/well-formedness constraints). Hence, the same grammar might lead to different empirical outcomes. Individual members of a speech community might contradict each other, though it can be shown that the community as a whole follows the same grammatical system.

Discrepancies between different empirical methods do not necessarily constitute grammatical exceptions. As in our case, they sometimes only reflect that studying grammar cannot be reduced to analysing one single empirical domain, and that empirical methods have their limits. Violation profiles assigned to structures by an OT grammar can be rather successfully used for empirical predictions. This increases falsifiability. A comparison of the relative markedness assigned by the grammar with the relative acceptabilities, frequencies, and preferences observed with different empirical methods, not to forget the typology of a given phenomenon, leads to deeper insights into the nature of both the grammar and those empirical domains.

8. See Vogel (2006) for further discussion of this issue.

Acknowledgements

I want to thank my collaborators Stefan Frisch, Jutta Boethke, and Marco Zugck, without whom the empirical research presented in this paper would not have been undertaken. I am also grateful to the audience of the DGfS workshop on exceptions in February 2005 and its organisers, Horst Simon and Heike Wiese, for a fruitful discussion and helpful suggestions. This work has been supported by a grant from the Deutsche Forschungsgemeinschaft, grant FOR-375/2-A3, for the interdisciplinary research group "Conflicting Rules in Language and Cognition" at the University of Potsdam.

References
Alexiadou, Artemis, and Spyridoula Varlokosta
    1995    The syntactic and semantic properties of free relatives in Modern Greek. ZAS Working Papers in Linguistics 5: 1–30.
Baković, Eric, and Edward Keer
    2001    Optionality and ineffability. In Optimality Theoretic Syntax, Géraldine Legendre, Jane Grimshaw, and Sten Vikner (eds.), 97–112. Cambridge, MA: MIT Press.
Boethke, Jutta
    2005    Kasus im Deutschen: Eine empirische Studie am Beispiel freier Relativsätze. Diploma thesis, Institute of Linguistics, University of Potsdam.
Fanselow, Gisbert, and Stefan Frisch
    2006    Effects of processing difficulty on judgments of acceptability. In Gradience in Grammar: Generative Perspectives, Gisbert Fanselow, Caroline Féry, Matthias Schlesewsky, and Ralf Vogel (eds.), 291–316. Oxford: Oxford University Press.
Groos, Anneke, and Henk van Riemsdijk
    1981    Matching effects with free relatives: A parameter of core grammar. In Theories of Markedness in Generative Grammar, Adriana Belletti, Luciana Brandi, and Luigi Rizzi (eds.), 171–216. Pisa: Scuola Normale Superiore di Pisa.
Legendre, Géraldine, Jane Grimshaw, and Sten Vikner (eds.)
    2001    Optimality Theoretic Syntax. Cambridge, MA: MIT Press.
Legendre, Géraldine, Paul Smolensky, and Colin Wilson
    1998    When is less more? Faithfulness and minimal links in WH-chains. In Is the Best Good Enough? Optimality and Competition in Syntax, Pilar Barbosa, Danny Fox, Paul Hagstrom, Martha McGinnis, and David Pesetsky (eds.), 249–289. Cambridge, MA: MIT Press.
Pittner, Karin
    1991    Freie Relativsätze und die Kasushierarchie. In Neue Fragen der Linguistik, Elisabeth Feldbusch (ed.), 341–347. Tübingen: Niemeyer.
Pittner, Karin
    2003    Kasuskonflikte bei freien Relativsätzen – eine Korpusstudie. Deutsche Sprache 31: 193–208.
Prince, Alan, and Paul Smolensky
    2004    Optimality Theory. Constraint Interaction in Generative Grammar. Cambridge, MA: MIT Press. [The 1993 manuscript is available at http://roa.rutgers.edu.]
Vogel, Ralf
    2001    Case conflict in German free relative constructions. An Optimality Theoretic treatment. In Competition in Syntax, Gereon Müller and Wolfgang Sternefeld (eds.), 341–375. Berlin: Mouton de Gruyter.
Vogel, Ralf
    2002    Free relative constructions in OT syntax. In Resolving Conflicts in Grammars: Optimality Theory in Syntax, Morphology, and Phonology, Gisbert Fanselow and Caroline Féry (eds.), 119–162. [Linguistische Berichte, Sonderheft 11] Hamburg: Buske.
Vogel, Ralf
    2003    Surface matters. Case conflict in free relative constructions and Case Theory. In New Perspectives on Case Theory, Ellen Brandner and Heike Zinsmeister (eds.), 269–299. Stanford: CSLI Publications.
Vogel, Ralf
    2006    Degraded acceptability, markedness, and the stochastic interpretation of Optimality Theory. In Gradience in Grammar: Generative Perspectives, Gisbert Fanselow, Caroline Féry, Matthias Schlesewsky, and Ralf Vogel (eds.), 246–269. Oxford: Oxford University Press.
Vogel, Ralf, and Stefan Frisch
    2003    The resolution of case conflicts. A pilot study. In Experimental Studies in Linguistics 1, Susann Fischer, Ruben van de Vijver, and Ralf Vogel (eds.), 91–103. [Linguistics in Potsdam 21] Potsdam: Institute of Linguistics, University of Potsdam.
Vogel, Ralf, and Marco Zugck
    2003    Counting markedness. A corpus investigation on German free relative constructions. In Experimental Studies in Linguistics 1, Susann Fischer, Ruben van de Vijver, and Ralf Vogel (eds.), 105–122. [Linguistics in Potsdam 21] Potsdam: Institute of Linguistics, University of Potsdam.

What is an exception to what? Some comments on Ralf Vogel's contribution

Henk van Riemsdijk

1. Do exceptions exist?

The main message in Vogel's article, if I interpret him correctly, is that exceptions as such do not exist. There is a lot of variation, among languages, among dialects or sociolects, among speakers. But in each case we should refrain from defining a norm, and it is therefore inappropriate to say that other variants are exceptions to that norm. What I will try to do in these comments is to argue that, perhaps, the notion of 'exception' is not to be dismissed out of hand. This point of disagreement does not detract from the value of Vogel's contribution. He is quite right in showing that there is much more variation at all levels than many, perhaps most, grammarians would admit, and that there are interesting links between strictly grammatical variation and statistical data. Indeed, things are always more complex than one first suspects, but the question remains how we should deal with this complexity. I intend my comments to be largely independent of the issue of Optimality Theory (OT), but their implication may well be that OT is too flexible and powerful an instrument to raise a number of fundamental questions about the sorts of phenomena that Vogel (henceforth V) discusses. My main focus will be on the question of whether there are reasons to suppose that variation is truly symmetrical, as Vogel argues. I believe that there are good reasons to doubt that that is the case. My doubts have to do, on the one hand, with the concept of markedness, and on the other with the fact that certain theoretical choices force the linguist to commit himself/herself as to what is the expected pattern and what is a deviation from that pattern.

2. Markedness

Variation is intricately linked with the notion of markedness. And indeed, V brings up the concept of markedness. But I confess that I do not fully understand


what he means. Before looking at his formulation, let us clarify the notion to some extent. Among a variety of uses, I detect three major interpretations of markedness:

1. Markedness as an evaluation of the status of (introspective) data, as in 'I am working home' being more marked than 'I am working at home',¹ meaning that the first variant is less acceptable, rare, stylistically special, etc.
2. Markedness as a tool to rank grammars, that is, as a tool to help the child acquiring the language to choose the optimal grammar, cf. Kean (1980), Van Riemsdijk (1978).
3. Markedness as a tool to rank values of parameters, default vs. marked. This is a kind of local or micro-variant of the interpretation in 2.

Most likely, interpretation 1 is the one that V has in mind. It is indeed an interesting question to what extent the type of evaluation that violable constraints impose on syntactic structures has any bearing on or resemblance to the interpretations intended in 2 and 3. Questions that typically arise in the context of markedness considerations of type 3 do indeed come to mind quite directly when we confront the sort of data that constitutes the core empirical object of V's article. Suppose matching is a binary parameter (undoubtedly a simplified assumption); then we should ask if the child acquiring the language in question, say German, starts out with a default hypothesis. Most plausibly, the default would be matching. Any data indicating a deviation from the matching pattern would lead to a resetting of the parameter in that child's grammar. In a more fine-grained system, certain violations of matching in the primary data would lead to a resetting in a limited sector of the relevant part of the parameter system while retaining the general matching pattern elsewhere.
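The fine-grained scenario can be made concrete with a toy learner. The sketch below is purely illustrative and is not meant as a claim about any actual acquisition model: the class name `ToyLearner`, the idea of keying "sectors" by a pair of cases, and the input format are all assumptions introduced here.

```python
from dataclasses import dataclass, field

@dataclass
class ToyLearner:
    """Hypothetical learner whose default setting is 'matching' everywhere."""
    # Sectors (m-case, r-case) that positive evidence has reset to non-matching.
    reset_sectors: set = field(default_factory=set)

    def observe(self, m_case: str, r_case: str) -> None:
        """Process one free relative encountered in the primary data."""
        if m_case != r_case:
            # A mismatch in the input resets only the sector it instantiates.
            self.reset_sectors.add((m_case, r_case))

    def allows(self, m_case: str, r_case: str) -> bool:
        """Is a free relative with this case pattern licensed by the grammar?"""
        return m_case == r_case or (m_case, r_case) in self.reset_sectors

learner = ToyLearner()
for datum in [("acc", "acc"), ("acc", "dat"), ("nom", "nom")]:
    learner.observe(*datum)

print(learner.allows("acc", "dat"))  # sector reset by positive evidence
print(learner.allows("acc", "nom"))  # still governed by the matching default
```

In this sketch a single non-matching datum affects only its own sector, while the general matching pattern is retained elsewhere.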
The opposite approach, assuming that non-matching is the default value, leads immediately to the difficult question of how matching patterns found in the primary data could be properly evaluated by the child and, in the absence of negative data, lead to a resetting of the parameters. Relative rankings of constraints in an OT framework are not, as far as I am aware, linked to markedness considerations of this type. Instead, a subset of the constraints, the markedness constraints, is assumed to be ranked low, but subject to raising in the constraint hierarchy on the basis of relevant primary data. It is not clear to me, however, whether the constraints V uses to derive the matching/non-matching data and their variation are in any way linked to the OT way of dealing with markedness and acquisition.
1. The examples are taken from Collins (2008: 18), in particular example (67a) and footnote 16, where the observation is attributed to Paul Postal (p.c.).


I raise this issue because it is a possible way of comparing the OT approach to other frameworks.

3. Free relatives as grafts

In my own recent work on free relatives (cf. Van Riemsdijk 2006a, Van Riemsdijk 2006b and references cited there) I have suggested that free relatives should be treated as grafts. By this I mean that the matrix tree and the relative clause tree are built up independently from one another and that the wh-word in the relative clause is merged into a position in the matrix. The effect of this is that the wh-word is truly a shared element and that there is no empty head position in the matrix clause that the relative clause is an adjunct to. Clearly, an analysis like this predicts that matching must be complete. The fact that deviations from matching are found has been known (within the generative literature) since the seventies (cf. Groos and Van Riemsdijk 1981, Hirschbühler 1977). Any deviation from the matching pattern therefore constitutes a serious problem for the graft analysis. It is, indeed, the position of Grosu (2003) that a graft analysis should be rejected and that free relatives (including transparent free relatives) should all be analyzed essentially like regular relative clauses. From a perspective like his, however, any matching effect comes as a real surprise, and artifacts must be introduced into the theory to account for them. This is not the place to continue this debate. But what I am trying to say here is that the choice of framework or analysis yields immediate predictions as to whether matching effects should or should not be found. In each case, a default pattern is predicted, and the non-default pattern is the exception. The approach suggested by V seems to me to be diametrically opposed to such proposals, be it Grosu's or mine, in that variation is taken to be the central given: nothing is exceptional with respect to anything else.
While it is perfectly possible that this is indeed the way things are, it does appear to lead to a rather unconstrained view of what is possible and what is not, a view that, without further elaboration, would seem to lead to considerable problems when we think about the acquisition process, as discussed in section 2. Pursuing my own interpretation of the matching facts, i.e. that matching is the norm, a norm that is entirely unexpected on any analysis that makes use of essentially the same structures as headed relatives, the onus of dealing with deviations from the matching pattern is on me. And I am still looking for a good solution, but I regard the fact that I am forced into such a situation as an


advantage rather than as a disadvantage, as it is illustrative of the inherently constrained nature of my overall approach. Note, in fact, that the issue of matching vs. non-matching in shared constituents is by no means unique to the grammar of free relatives. As another notorious example, consider Right Node Raising (RNR). Leaving open the question of whether RNR should be treated as ATB-movement, multiple dominance, backward ellipsis or something else, it is quite clear that the phenomenon is characterizable in terms of shared material at the right edge of two or more conjuncts. That morphological identity sometimes is and sometimes is not required is shown in the following examples, taken from Bošković (2004: ex. (8)):

(1) a. ?John will (sleep in her office), and Peter definitely was, sleeping in her office
    b.  John will (sleep in her house), and Peter already has, slept in her house
    c.  John hasn't (questioned our motives), but Bill may be, questioning our motives
    d.  John has (slept in her house), and Peter definitely will, sleep in her house
    e. *John is (entering the championship), but Jane won't, enter the championship
    f. *John will (be obnoxious), and Jane actually was, being obnoxious

4. Charting the variation

V has done an outstanding job in painting a fine-grained picture of the variation found among speakers with respect to matching phenomena. The outcome of his experimental research is very interesting. Unfortunately, experimental work is inherently somewhat impeded by the fact that you can never take into account all the factors that might possibly affect the results. The one factor that he shows has significant effects on speakers' judgments is whether the free relative is fronted in its clause or in situ. Let me suggest two more factors that in my own judgments play a role. I mention them because both suggest that the domain in which deviation from matching is allowed has to be further limited. But before turning to these two factors, let me make another point. It is true that I am in the strictest dialect, that is, among the four examples in V's (5), repeated here for convenience,

(2) a.  Ich lade ein wen ich treffe
        I invite [acc] who-ACC I meet
    b.  Ich lade ein wem ich begegne
        I invite [acc] who-DAT I meet
    c. *Ich lade ein wer mir begegnet
        I invite [acc] who-NOM meets me
    d. *Ich helfe wen ich treffe
        I help [dat] who-ACC I meet

I also reject (2b). However, I perceive a clear difference in acceptability between (2b) and (2c/d) in the sense that I find (2b) significantly less bad than (2c/d). What this suggests to me is that the differences among the various individual grammars of the subjects tested may well be smaller than suspected, in that the differences might be more about the cutoff point at which individuals switch from 'grammatical' to 'ungrammatical' in assessing examples. You might then call those with a strict cutoff point the normative group and those with a liberal cutoff point the anti-authoritarian group.²

The first factor that I would like to suggest can influence judgments on matching is that of definiteness. Note indeed that, due to the choice of the present tense in the examples in (2), the interpretation that is most prominent is that of the universally quantified (free choice) free relative, that is, (2a) would be taken to mean 'I invite whoever I meet'. When we force the other, definite, interpretation, for example by switching to the past tense, I feel that the non-matching pattern of (2b) is harder to accept. Consider (3):³

(3) a.  #Ich lade ein wem auch immer ich begegne
         I invite [acc] whomever-DAT I meet
    b. ##Ich habe eingeladen wem ich gestern abend begegnet bin
         I have invited [acc] who-DAT I last night met have
         'I have invited the person that I met last night'

2. It may well be the case that my own normative judgment is due to the fact that I grew up in the German-speaking part of Switzerland, where children learn Swiss German at home and outside school, but learn Standard German at school and from the media only.
3. I use the symbol # to indicate that speakers may vary in their judgment, but the distinction between one and two #s signals relative acceptability, quite likely for all speakers.


Needless to say, these are just the introspective judgments of a single individual. Further experimental work would be needed to decide whether this factor does indeed systematically affect the acceptability of non-matching examples.

The second factor that I submit influences relative acceptability is the distinction between the direct object accusative as in (2b) and the prepositional accusative.⁴ Compare (2b) with the following examples.

(4) a. ##Ich habe mich an wem ich begegnet bin gewandt
         I have myself to [acc] whom-DAT I met have turned
         'I turned to whom I met'
    b. ##Der Bürgermeister hat den ganzen Abend auf wem der Preis verliehen werden sollte gewartet
         the mayor has the whole evening for [acc] whom-DAT the prize awarded be should waited
         'The mayor waited the whole evening for who the prize was supposed to be conferred on'

Again, I believe that if we look beyond the simplest of examples the matching effect establishes itself more strongly.⁵ But, assuming that these factors do indeed play a role, the tendency that we observe is that matching is the default case while non-matching is the marked variant or, if you wish, the exception.

4. It is true that in V's Case Hierarchy (6) PPs are listed among the oblique cases in the lowest position, but I believe that the PP as a whole is meant and that, therefore, this statement is relevant for the matching of prepositions (cases like 'I talk to whom you talk' vs. '*I count on/to whom you talk'), and not for the cases that the prepositions govern.
5. Observe, somewhat paradoxically, that there is one other construction in which the prepositional accusative seems to be less rigid than the direct object accusative: nominal appositives. Leirbukt (1978) shows that prepositional accusatives can take non-agreeing appositives in the dative, while direct object accusatives can never do that. Here are two examples of this phenomenon from Van Riemsdijk (1983: exx. (48/50)):
   (i)   Der Verkauf des Grundstücks an den Komponisten, dem späteren Ehrenbürger der Stadt
         the sale of the land to the composer-ACC the later honorary citizen-DAT of the city
   (ii) *Ich besuchte dann Herrn Müller, unserem Vertreter in Pforzheim.
         I visited then Mr. Müller-ACC our representative-DAT in Pforzheim

5. Conclusion

My comments in these few pages amount to the following points. It is important to map out fine-grained patterns of variation, and V's experimental work constitutes a fine illustration of the techniques that one may (among others) resort to in doing so. It would be wrong to underestimate the number of factors that may influence introspective judgments. The more work that is done along such lines, the greater the risk of being overwhelmed by the impression that everything varies endlessly. But not only is there structure in the variation, there are also patterns that suggest that some facts are more representative of the underlying grammatical system than others. It would not be wrong to use the term 'exception' for the latter category of facts. There are, indeed, as I have tried to show, both theoretical and empirical reasons to believe that variational scales are (or at least can be) asymmetrical, with the options at one end being the default and the options at the other end being marked exceptions.

References
Bošković, Željko
    2004    Two notes on right node raising. In Cranberry Linguistics 2; UConn Working Papers in Linguistics Vol. 12, Miguel Rodríguez-Mondoñedo and Emma Ticio (eds.), 13–24. Storrs: University of Connecticut.
Collins, Chris
    2008    Home sweet home. NYU Working Papers in Linguistics 1: 1–34.
Groos, Anneke, and Henk C. van Riemsdijk
    1981    Matching effects in free relatives: a parameter of core grammar. In Theory of Markedness in Generative Grammar. Proceedings of the 1979 GLOW Conference, Adriana Belletti, Luciana Brandi and Luigi Rizzi (eds.), 171–216. Pisa: Scuola Normale Superiore.
Grosu, Alexander
    2003    A unified theory of standard and transparent free relatives. Natural Language and Linguistic Theory 21: 247–331.
Hirschbühler, Paul
    1976    Headed and headless Free Relatives: a study in Modern French and Classical Greek. In Les contraintes sur les règles, Philippe Barbaud (ed.), 176–229. Montréal: Université du Québec à Montréal.
Kean, Mary-Louise
    1980    The Theory of Markedness in Generative Grammar. Bloomington: Indiana University Linguistics Club.
Leirbukt, Oddleif
    1978    Über dativische Appositionen bei akkusativischem Bezugswort im Deutschen. Linguistische Berichte 55: 1–17.
Riemsdijk, Henk C. van
    1978    A Case Study in Syntactic Markedness: the Binding Nature of Prepositional Phrases. Dordrecht: Foris.
Riemsdijk, Henk C. van
    1983    The Case of German adjectives. In Linguistic categories: auxiliaries and related puzzles, Frank Heny and Barry Richards (eds.), 223–252. Dordrecht: Reidel.
Riemsdijk, Henk C. van
    2006a   Grafts follow from Merge. In Phases of Interpretation, Mara Frascarelli (ed.), 17–44. Berlin: Mouton de Gruyter.
Riemsdijk, Henk C. van
    2006b   Free Relatives. In The Blackwell Companion to Syntax, Martin Everaert and Henk C. van Riemsdijk (eds.), 338–382. Oxford: Blackwell.

Response to van Riemsdijk

Ralf Vogel

First of all, I would like to thank Henk van Riemsdijk (henceforth HvR) for his insightful comments on my contribution. He raises more issues than space allows me to address. So I will concentrate on those I find most important in the context of this book.

1. Are there no exceptions?

According to HvR, I make the claim that exceptions as such do not exist. This is more than I actually wanted to say. My major concern is that we have to distinguish exceptions from predictable variation within a speech community. The example that I chose is instructive here in several respects. Two different understandings of the term 'exception' have to be kept apart. There is an empirical notion of 'exception' that is instantiated by deviating observations. Typical examples of such cases come from morphology, like the case of regular vs. irregular inflection. We have learned to deal with such cases in our grammar models, for instance in the form of rule ordering, blocking or constraint interaction. For an Optimality Theoretic approach to grammar, such observational exceptions are welcome, as they are empirical evidence for the violability of grammatical constraints, a core feature of OT grammars. Secondly, there is a notion of exception which might better be termed '(potential) counterevidence', i.e. observations that plainly contradict the predictions of a particular grammar. The linguist defending that grammar has the choice to either explain away these cases and treat them as exceptions arising from a different cause, or accept them as counterexamples and modify her theory. The possibility of non-matching free relative clauses in German is, first of all, a fact. Non-matching FRs are observed to be exceptions: usually, case is morphologically realised in German, and it cannot be substitutionally realised by some other case. A clausal subject usually bears nominative case, and it would be unacceptable if it was realised with dative case instead. But a non-matching FR with a wh-pronoun in the dative case can nevertheless serve as grammatical subject.


Pittner (2003), Vogel and Zugck (2003), Vogel (2006), Vogel, Frisch, and Zugck (2006), as well as the studies reported in my contribution repeatedly make this observation. Hence, Groos and van Riemsdijk (1981) as well as some other linguists working on German were simply wrong in their claim that non-matching free relative clauses are impossible in German. If a grammar crucially relies on this factual error, the observation of non-matching free relative clauses is true counterevidence that either needs to be explained away, or requires a revision of the theory in question. HvR admits that this is true and that he has no answer yet. When he additionally reports feeling comfortable with this situation, as it shows how restrictive his theory is, then it might be important to remind him that descriptive adequacy is the very least level a grammar should reach, even on a Chomskyan approach. Without empirical adequacy, higher explanatory features of a theory are inapplicable.

A more interesting question, to my mind, is whether we are really forced to separate German speakers into three groups of German A, B, and C. This partition can be seen as an artefact of the elicitation method. Suppose we elicited judgements from German speakers for the following three FR clauses:

(1) a. Ich besuche, wen ich nett finde.
       I visit who-acc [ACC] I nice find
       'I visit who I find nice.' (matching)
    b. Ich besuche, wem ich vertraue.
       I visit who-acc [DAT] I trust
       'I visit whom I trust.' (non-matching, obeying case hierarchy)
    c. Ich besuche, wer mir vertraut.
       I visit who-nom [NOM] me-dat [DAT] trusts
       'I visit who trusts in me.' (non-matching, disobeying case hierarchy)

German A speakers will accept all three clauses, German B speakers will reject (1c), and German C speakers will only accept (1a). But, as HvR himself also confirms, German C speakers (like him) nevertheless have the intuition that (1c) is worse than (1b). Likewise, German A and B speakers both will find (1a) to be the best and (1c) to be the relatively worst example. Hence, if we asked for relative rather than absolute acceptability judgements, the difference between the three groups would very likely disappear. Would it then even be legitimate to speak of different variants? This is the question that I am addressing. Empirical claims about the German language are


claims about the speech community as a whole, and they are usually based on observations of representative samples of this speech community. We should expect that grammatical generalisations show up as statistical tendencies only, and not in an all-or-nothing fashion. A generative grammar, as standardly conceived, assigns a grammaticality value to an expression. This value is Boolean: an expression is either grammatical or ungrammatical. Optimality theory deviates from the generative tradition in two respects: one aspect is that the grammaticality of an expression E is not determined by inspecting E in isolation, but in a holistic fashion: E is grammatical, if it performs better than all possible alternatives in an evaluation based on a hierarchy of violable constraints. The constraint hierarchy contains markedness and faithfulness constraints. Markedness constraints formulate various aspects of well-formedness, while faithfulness constraints mainly determine which aspects of markedness are tolerated within a language. Thus, grammaticality is derived from markedness. This is the second aspect where OT leaves the generative tradition. Contrary to grammaticality, markedness is a relative concept, and therefore it is much better suited to construct a descriptively adequate grammar, if description includes empirical studies like those under discussion here. So how shall we understand markedness?
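To make this evaluation procedure concrete, here is a minimal sketch of how an optimal candidate is selected under strict constraint ranking. The constraint names REALISE-CASE and NO-RESUMPTIVE, the two-member candidate set and the violation counts are invented for illustration only; they are not the constraints of the analysis defended in my contribution.

```python
from typing import Dict, List

def violation_profile(violations: Dict[str, int], ranking: List[str]) -> tuple:
    """Order a candidate's violation counts by constraint rank (highest first)."""
    return tuple(violations.get(constraint, 0) for constraint in ranking)

def optimal(candidates: Dict[str, Dict[str, int]], ranking: List[str]) -> List[str]:
    """Return the candidate(s) with the lexicographically smallest profile.

    Lexicographic comparison of tuples models strict domination: one violation
    of a higher-ranked constraint outweighs any number of lower-ranked ones.
    """
    profiles = {name: violation_profile(v, ranking) for name, v in candidates.items()}
    best = min(profiles.values())
    return [name for name, profile in profiles.items() if profile == best]

# Hypothetical candidate set with hypothetical violations.
candidates = {
    "FR": {"REALISE-CASE": 1},     # one assigned case remains unrealised
    "CORR": {"NO-RESUMPTIVE": 1},  # contains an additional pronoun
}

winner_a = optimal(candidates, ["REALISE-CASE", "NO-RESUMPTIVE"])
winner_b = optimal(candidates, ["NO-RESUMPTIVE", "REALISE-CASE"])
print(winner_a, winner_b)
```

Reranking the same two constraints changes the winner, which illustrates how grammaticality is computed by comparison with competitors rather than by inspecting an expression in isolation.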

2. Markedness and Optimality Theory

HvR is quite right in assuming that, under the definitions of markedness that he offers, his option (1) is the one that comes closest to the way I see it. But I am interested in exploring to what extent the OT conception of markedness correlates with this empirical conception of markedness. The OT understanding of markedness is very much inspired by the traditional use of this concept in phonology and morphology, as well as in language typology. It is not based on gradient acceptability data, but on distributional facts, along a typological and a language-internal dimension:
Typological dimension: The number of languages that admit the unmarked form is larger than the number of languages that allow for the marked form. In many cases, the languages that admit the marked form are a proper subset of the languages that admit the unmarked form. That is, if a language admits the marked form then it usually also admits the unmarked form, but not vice versa.

Language-internal dimension: The number of contexts that allow for the marked form is smaller than the number of contexts that allow for the unmarked form. Very often,


Ralf Vogel

contexts that allow for the marked form also allow for the unmarked form. But some contexts only allow for the unmarked form.

Such typological and distributional entailments are among the core empirical phenomena that an OT model aims at reconstructing. While it is true that such typological observations mostly go hand in hand with acceptability status within a single language, this is not a necessity. The issue that I raise in my contribution is: to what extent can the typology-based conception of markedness of standard OT be related to graded acceptability as we are able to elicit it using psycholinguistic methods? Here, it is also important to note which method we chose in our experiments: graded acceptability has only been measured indirectly. The subjects had only two choices for their acceptability judgement: yes and no. Gradient acceptability shows up as statistical tendency both within individual subjects (because of inconsistent judgements for the same clause types in repeated measuring) and within the whole group of participants. In other words, two effects have been attested:

a. Marked patterns are more likely to be rejected by the same subject.
b. Marked patterns are more likely to be rejected by the whole group of participants.

So, we still are making distributional observations, though these are observations of introspective judgements, not of the expressions themselves.

3. Learning constraint rankings and marked grammars

HvR asks for the relation of the proposed model to learning theory. He wonders how a definition of markedness as a measure to rank grammars as a whole, as in earlier generative models of language acquisition, is related to either my or the OT notion of markedness. In particular, if this measure is understood as in his definition (3), how can we establish the difference between default and marked state in OT? HvR takes the example of matching vs. non-matching in FRs as a parameter, stating that a grammar that restricts FRs to those with case matching should be the default. Does the OT grammar that I presented capture this intuition? Yes, it does. We have three constraints, RC, RCr and RO. What does this constraint system predict for cases where there is no case conflict? Consider the German clause (2):

(2) Der Hund bellte.
    the dog-nom [NOM] barked

Response to van Riemsdijk


The subject has nominative case. The OT competition for this structure will have candidates that differ in the case of the subject. Why do such candidates lose? Because of their constraint violations:
Table 1. OT competition for the case morphology of a German subject noun phrase

[NOM]      RC    RCr   RO
☞ nom
   acc     *
   dat     *

The constraint Realise Case directly implements matching as the default. Without a case conflict, there cannot be a situation where an assigned or required case does not surface, independent of how the constraints are ranked. In case of conflict, so HvR's argument goes, the default should be matching, i.e. the non-acceptability of an FR. Does this follow from what I proposed? Here, I would like to refer to Table 4 of my contribution where the constraint violation profiles of a matching and three non-matching FRs are compared. As one can figure out easily, the matching FR violates none of the above three constraints, while the non-matching ones have more violations. One can therefore state that a matching FR will always have a better violation profile than non-matching ones, irrespective of the particular constraint rankings. Even more so, as the violations rise cumulatively between the candidates, their relative markedness can be determined independently of the constraint ranking, and therefore will be the same in every language. Non-matching FRs will always come out as more marked than matching FRs. A grammar that allows for non-matching FRs can be called more marked in the sense that it allows for more marked structures. If we were to assume that the initial state of the language faculty of the language learner were the unmarked grammar, this would imply that, for the OT model, faithfulness constraints are initially ranked lower than markedness constraints. Learning the grammar of German with respect to FRs involves, for instance, learning a constraint hierarchy that excludes case attraction as in the example (3a) from Modern Greek in my contribution. The constraint rankings that refer to German A, B and C only rank faithfulness differently.
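The claim that a candidate without violations wins under every ranking can be checked mechanically. The following is only a minimal sketch in Python, not part of the analysis itself: the violation marks are an assumption (each non-nominative candidate is taken to violate RC once, as for the subject of clause (2)), and the function names are hypothetical. Strict domination is modelled by comparing violation profiles lexicographically in ranking order.

```python
from itertools import permutations

CONSTRAINTS = ("RC", "RCr", "RO")

# Violation marks for the subject of "Der Hund bellte".
# Assumption: every non-nominative candidate violates RC once.
CANDIDATES = {
    "nom": {},
    "acc": {"RC": 1},
    "dat": {"RC": 1},
}


def profile(candidate, ranking):
    """Violation counts of a candidate, listed from highest- to lowest-ranked constraint."""
    marks = CANDIDATES[candidate]
    return tuple(marks.get(constraint, 0) for constraint in ranking)


def optimal(ranking):
    """Candidates whose profile is lexicographically smallest under this ranking."""
    best = min(profile(c, ranking) for c in CANDIDATES)
    return [c for c in CANDIDATES if profile(c, ranking) == best]


# Evaluate the competition under all six possible rankings.
winners = {ranking: optimal(ranking) for ranking in permutations(CONSTRAINTS)}

for ranking, winner in winners.items():
    print(" >> ".join(ranking), "->", winner)
```

Under these assumed marks the nominative candidate is the sole winner for each of the six rankings, which is the sense in which its status does not depend on the ranking.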
Standard OT learning theory, as outlined by Tesar and Smolensky (2000), uses the constraint demotion algorithm to derive such rankings: when you observe an expression E, rerank the constraints violated by E such that each of E's competitors has a worse violation profile than E. In our example, the crucial rerankings target the


position of faithfulness. From this perspective, the constraint system proposed here is as learnable as any other standard OT constraint system. One advantage of the constraint system that I propose can be seen in the fact that there is no particular case matching constraint. The constraints are very general and simple constraints on case realisation. A particular constraint ranking can be seen as the OT correlate of parameter setting in traditional generative models of language acquisition. In Vogel (2001, 2002), I use the constraint set introduced here together with a few additional ones to derive the whole attested typology of case conflict resolutions in FR constructions. I thus use a typologically motivated grammar here to predict the outcomes of empirical studies.

4. Syntactic Analysis

HvR offers his own analysis of FRs as grafts, the basic idea being that the wh-pronoun of the FR is at the same time contained in the FR and in the matrix clause, and this is the structural basis for the case conflict. I would like to briefly introduce results from a further experimental study that we undertook, and which is reported in detail in Vogel, Frisch and Zugck (2006). In this study we tested a case conflict that occurs on a noun phrase that is the complement of two coordinated verbs at the same time, as in (3):

(3) a.  Maria mochte und unterstützte den Arzt.
        Maria liked  and supported    the doctor-ACC
    b. *Maria mochte und half   dem Arzt.
        Maria liked  and helped the doctor-ACC [DAT]
    c. *Maria half   und mochte dem Arzt.
        Maria helped and liked  the doctor-DAT [ACC]

The two verbs in (3a) both assign accusative to their object, and the clause is well-formed. This is an instance of case matching. In (3b) and (3c), the coordinated verbs differ in their case requirements, as helfen assigns dative to its object. No matter which of the two cases is realised on the object noun phrase, the clause is ungrammatical. This could be confirmed in our experiment, again a speeded grammaticality judgement study, that took into account all six logical possibilities for the pattern in (3). We found no effect of case hierarchy here. It made no difference whether dative was realised on the object NP or accusative. With the case conflicts in FRs in mind, this is unexpected. How can this be explained? The route that we took, in accord with my earlier analysis of FRs (e.g., Vogel 2002), is the following: while in (3b-c), the conflict is syntactic, in FRs it is only morphological.


The object NP in (3b-c) is assigned two different cases syntactically. This unavoidably leads to ill-formedness, as an NP can only be assigned one case, no matter which case this is. The wh-item of the FR is only assigned r-case. The m-case is assigned to the FR as a whole. The case conflict comes about because the wh-item is the element that has to realise morphologically the case assigned to the FR. Thus, case conflict in an FR is a morphological case conflict only, while the configuration in (3) leads to a truly syntactic case conflict. The analysis that HvR offers for FRs creates a situation for the wh-item that is like the one of the object NP in (3). Consequently, HvR argues that matching should be the norm, just as it is for (3). But our experiments confirmed this for (3), while they did not do so for FRs! Hence, our results do not support HvR's syntactic analysis of FRs.

References
Pittner, Karin
2003 Kasuskonflikte bei freien Relativsätzen. Eine Korpusstudie. Deutsche Sprache 31: 193–208.

Tesar, Bruce, and Paul Smolensky
2000 Learnability in Optimality Theory. Cambridge, MA: MIT Press.

Vogel, Ralf
2001 Towards an optimal typology of the Free Relative Construction. In IATL 8. Papers from the 16th Annual Conference and from the Research Workshop of the Israel Science Foundation The Syntax and Semantics of Relative Clause Constructions, Alex Grosu (ed.), 107–119. Tel Aviv: Tel Aviv University.

Vogel, Ralf
2002 Free Relative Constructions in OT syntax. In Resolving Conflicts in Grammars: Optimality Theory in Syntax, Morphology, and Phonology, Gisbert Fanselow, and Caroline Féry (eds.), 119–162. (Linguistische Berichte, Sonderheft 11) Hamburg: Helmut Buske Verlag.

Vogel, Ralf
2006 Degraded acceptability and markedness in syntax, and the stochastic interpretation of Optimality Theory. In Gradience in Grammar. Generative Perspectives, Gisbert Fanselow, Caroline Féry, Matthias Schlesewsky, and Ralf Vogel (eds.), 246–269. Oxford: Oxford University Press.

Vogel, Ralf, Stefan Frisch, and Marco Zugck
2006 Case matching. An empirical study on the distinction between abstract case and case morphology. Linguistische Berichte 208: 357–384.

Vogel, Ralf, and Marco Zugck
2003 Counting markedness. A corpus investigation on German Free Relative Constructions. In Experimental Studies in Linguistics 1, Susann Fischer, Ruben van de Vijver, and Ralf Vogel (eds.), 105–122. (Linguistics in Potsdam 21) Potsdam: University of Potsdam.

Describing exceptions in a formal grammar framework

Frederik Fouvry

Abstract. Phenomena that a grammar does not describe cannot be analysed by a system or theory that uses the grammar. It is impossible to offer clues as to what may be going on: is it an error in the input, an omission in the grammar (intended or not), an extra-grammaticality, or an exception? One family of formal frameworks that has been developed and used to write natural language grammars is Typed Feature Logic (TFL). In this paper, we propose an extension to such a formalism to ensure that there always is a minimal analysis of the input. It relaxes the constraints on the information that is associated with the input just enough to make rule applications succeed, and ranks the results based on how much had to be relaxed. Analyses for the input that the unrelaxed grammar does not describe contain the precise location of the error (in the analysis tree and in the tree nodes), as well as the set of values that are involved.

1. Introduction

The goal of linguistic theory is to discover regularities and develop generalisations in the description of facts about natural language. The form and nature of the generalisations help us to understand how language works. Unfortunately, these generalisations are not perfect. Exceptions pose a challenge because they disrupt the structure of theoretical descriptions. The descriptions have to be revised to incorporate the exceptions if that is at all possible. When we consider the problem from the viewpoint of the theory, and not as natural language experts, we can reformulate it in a simplified form: any utterance that is acceptable to language users but not completely described by a linguistic theory is an exception (in the wider sense) to the rules of that theory. In what follows, we will present a technique that enables a grammar writer to obtain a grammatical description for exceptions of this general nature, while at the same time postponing the need for their detailed and precise treatment. In this paper therefore, exceptions, also called extra-grammaticalities, are phenomena that are not covered by the rules of the theory.


The ideas in this paper and the need for this solution originated from work in computational linguistics. While working with linguistic grammars, computational linguists typically face the problem that when the input cannot be described by the grammar, the quality of the output drops dramatically. We first present a computational linguist's view on exceptions. Then, we describe our method, and finally, some implications of this technique are discussed.

2. Exceptions in computational linguistics

Whereas in the development of linguistic theory on paper, dealing with exceptions can be conveniently postponed, there is a strong need for some way of treating them in applications that have to deal with real-life texts, such as systems that use grammar implementations. Although the frequency of each individual exception type may be relatively low, the numbers add up, and on the token level exceptions can be quite frequent.¹ When we take into account that implemented grammars tend to break down when descriptions are incomplete, the need for a treatment is clear.

1. Even though pure lack of lexical coverage is the most serious problem, the incomplete grammar coverage is considerable as well (Baldwin et al. 2004).

Often the distinction is made between real errors and things that the system does not get right. We would like to point out that it is impossible for the kind of system we assume to distinguish between those cases. In order to be able to do so, it would need to contain at least two grammars: one for the natural language (as humans do in their capacity as speakers of a natural language) and one for the (implemented) theory. We assume that a system only has one such grammar. That is the case in all systems we know of. We furthermore assume that the utterances the system will be dealing with are grammatical or at least acceptable. Dropping the latter assumption makes any kind of tolerant processing practically impossible, because too many parameters can be varied.

Exceptions have to get a place in the linguistic description in a system. What are the options for dealing with them? Much of the concrete answer to this question depends on the level in the linguistic description the phenomena belong to. Mostly the formalisms (formal frameworks that are used for theory development and implementation) determine what can be done and how it should be done. The presence of a default mechanism for instance makes a great difference in the treatment of exceptions: defaults and exceptions are each other's complement, and therefore go together very well. Without defaults, other techniques have to be used. An overview of how exceptions have been integrated in linguistic theory is given in Moravcsik (this volume), but without touching on formal issues.

Morphological analysis components form an example of a concrete formalism. They are often realised with pure table lookup (chosen for reasons of computing efficiency). In that case, anything that is not in the table is an exception; there are no other rules. This approach must not be confused with the generation of tables for this lookup. Here rules will be used, and exceptions are likely to occur. An example of this generation is described in Corbett (this volume). We will return to it later on. For more complex levels in natural language such as semantics, pragmatics or stylistics, the rules are not sufficiently clear and explicit to be able to speak of exceptions. In this paper, we limit ourselves mainly to syntax, as there are certain exceptions (although not to the extent of e.g. morphology) and because the framework we work in has been developed for that purpose.

2.1. Grammars

A grammar implementation typically consists of a grammar and a parser. The grammar consists of a lexicon and a set of syntactic rules with which to combine the elements of the lexicon. The parser reads the input, looks up the words in the lexicon and then tries to apply all rules to the constituents (the words and the results of earlier rule applications). In the case of a context-free grammar (CFG) and bottom-up processing, a rule application consists of combining constituents with each of the rule daughters. If the combination is successful, a new constituent has been found. The combination is the point where a constituent is found to be good or bad for this rule application. We assume, as in linguistic theory, that a sentence that is not described by the grammar does not belong to the language.
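The bottom-up procedure just described can be illustrated with a small chart recogniser. This is only a sketch under simplifying assumptions: the toy lexicon and the binary rules are hypothetical, and the recogniser returns nothing but a yes/no answer, which is exactly the binary behaviour at issue here.

```python
# Hypothetical toy grammar: a lexicon and binary context-free rules.
LEXICON = {"the": {"Det"}, "dog": {"N"}, "barked": {"VP"}}
RULES = {("Det", "N"): {"NP"}, ("NP", "VP"): {"S"}}


def recognise(words):
    """Bottom-up chart recognition: True iff the words form an S."""
    n = len(words)
    # chart[i][j] holds the categories of constituents spanning words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]

    # Lexical lookup: unknown words simply receive no category.
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICON.get(word, ()))

    # Combine adjacent constituents with the rule daughters.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        # An equality test on categories decides applicability.
                        chart[i][j] |= RULES.get((left, right), set())

    return n > 0 and "S" in chart[0][n]


print(recognise("the dog barked".split()))
print(recognise("dog the barked".split()))
```

The recogniser gives no indication of why an input was rejected: a word-order problem and an unknown word both lead to the same negative result.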
Many parsers also operate on this principle, and return the failure to find a description as the result: they make a binary distinction. The grammar in the linguistic theory is assumed to be a complete model of the grammar of the speaker or of the language, and the input is assumed to conform to that grammar. The same holds for grammar implementations, with the additional complication that it is not possible to distinguish between phenomena that should have been in the grammar and phenomena that do not belong in the grammar. The assumption that the grammar is a complete and correct model of the grammar of the speaker is almost certainly false. It therefore makes sense to extend the processing formalism to catch cases where the assumptions may be violated. That is what we will be doing in the next section. For the remainder

Figure 1. The relation between the natural language (LN) and the language of an implemented grammar (LI). Left: the traditional relation between the languages. Right: the relation between the natural language and the language of the relaxed implemented grammar.

of this section, we discuss the existing situation and approaches to deal with exceptions in it. In Figure 1 (left), a pictorial view of the current situation is shown (i.e. with binary distinctions): the natural language LN is not a sharply defined concept; the implementation, on the other hand, makes a very sharp distinction (LI). The intersection of LN and LI is what the system can successfully and correctly deal with. Ungrammatical is what lies outside of LN. The difference of LN and the language of the implemented grammar LI contains what is extra-grammatical (for the implementation).² Exceptions are defined as deviations from the rules, and therefore they belong to the set of extra-grammaticalities (insofar as they have not been added as rules to the grammar, see section 2.2).

The situation that we want to attain is shown in Figure 1 (right). The system grammar for LI is more permissive: the grammar is defined for LI, but the formalism can relax it so that it can describe all of LI′ (the coverage of the relaxed grammar). Descriptions for structures outside the language LI are also possible (in LI′), only they will not be as informative as the structures for grammatical sentences (see section 3.2.1). They are however available, which is an improvement over the current situation. It is possible to cover a much larger part of the natural language. Some constructions may of course still be missing or not applicable, but in principle the grammar can be relaxed to the point that everything is covered. The undesired coverage may (will) also increase. It will be the task of the grammar writer on the one hand, and of refinement techniques on the other, to try to keep this as small as possible.
2. The term extra-grammatical is sometimes also used for phenomena that lie beyond what one wants to describe in a grammar, such as layout. This is not so here.
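The set relations of Figure 1 can be made concrete with a toy model in which a language is simply a finite set of sentences. All sentences and variable names below are hypothetical and serve only to illustrate the terminology; real languages are of course not finite lists.

```python
# Hypothetical toy languages, each modelled as a set of sentences.
L_N = {"they walk", "he walks", "long time no see"}       # natural language
L_I = {"they walk", "he walks", "walks he"}               # implemented grammar
L_I_relaxed = L_I | {"long time no see", "they walks"}    # relaxed implemented grammar

# What the system analyses successfully and correctly.
handled = L_N & L_I

# Acceptable to speakers, but not described by the implementation.
extra_grammatical = L_N - L_I

# Extra-grammatical input that the relaxed grammar can still describe.
recovered = extra_grammatical & L_I_relaxed

# Undesired coverage, which should be kept as small as possible.
undesired = L_I_relaxed - L_N

print(sorted(handled), sorted(extra_grammatical), sorted(recovered), sorted(undesired))
```

In this toy model relaxation recovers the one extra-grammatical sentence, at the price of also admitting a sentence that lies outside the natural language.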


In computational linguistics, the implemented grammar always makes binary distinctions. There are however ways to make this distinction less sharp or to deal with exceptions in some other way. Exceptions may be removed by modifying the grammar, by modifying the grammar formalism, or by adding statistical information. We discuss these in the next paragraphs.

2.2. Treating exceptions in the grammar

For grammarians, treating exceptions in the grammar is the ideal solution, especially when incorporation of the exception into the grammar leads to the formulation of a better generalisation. In practice, exceptions are most often integrated into the grammar, sometimes without a linguistic improvement of the rules, i.e. without a better generalisation. This is very likely due to the fact that there are very few formal devices available to linguists. With the integration into the grammar, an exception is treated as a separate phenomenon which should be modelled in the grammar. It is not guaranteed that the relationship to the rule of which it is an exception can be retained in the description, although it is certainly desirable to make it so. A disadvantage of this way of working is that the exceptions take a large part of the grammar, relative to the other phenomena. (The severity of this is inversely proportional to the frequency of the exception.) Together with the size of the grammar, the odds of complicated interactions as well as the maintenance effort increase. Here the notion of grammar is the traditional one: the grammar describes the language competence of the user. Exceptions that are worth treating in the grammar are the very frequent ones, because the returns for the descriptive work are high.

2.3. Treating exceptions in the formalism

Instead of modifying the grammar every time an exception is discovered, it is preferable to have a system that can deal with all cases as well as possible.
It is however only feasible to develop such devices when the mechanisms with which the grammar is processed are very strictly defined. The solutions are extensions to the standard, non-tolerant formalism, and are often formulated in such a way that the non-tolerant functioning of the grammar is a special case of the extension. That is only possible when it is well-defined. This solution requires fewer direct efforts from the linguist to deal with extra-grammatical phenomena. The soft failure does not hide other (perhaps more important) problems from view, which is a great advantage.


Some concrete possible solutions are: default rules, mal-rules, relaxation rules, and deeper modification of the formalism. We discuss these now.

2.3.1. Mal-rules

A solution that conceptually occupies the middle ground between grammar and formalism changes is the introduction of mal-rules. These are rules that describe a specific error, for instance lack of subject-verb agreement, verb-second instead of verb-final positioning, and so on. Mal-rules are often used in computer-aided language learning (CALL), where it is important to detect the errors correctly. In such a setting, each error has its own rule. This approach is not free of problems however. Sometimes a mistake can be explained in several ways. To appreciate this point, consider the following example (James 1998: 200–201):

(1) a. having *explain my motives
    b. (i) having explained my motives
       (ii) having to explain my motives

There are two explanations for the error in example (1a): the language learner who produced the sentence wanted to use either the past participle (1b-i) or an infinitive construction (1b-ii). This is a serious problem that we face when dealing with extra-grammaticalities. In the best case, the error can be described with one rule for all alternative descriptions, but that is unlikely to be useful in many cases. An advantage of this solution is that the explanation of the mistake is easy: mal-rules are associated with a description of the extra-grammaticalities they are intended for. That every extra-grammaticality has to be described first is however a disadvantage. Exceptions can also be treated in this way. In that case there is not so much difference with the solution in section 2.2. The main difference is that mal-rules have a special status among the formal devices. Therefore, they can be applied in a very controlled way, typically when no analysis was found with the normal rules. They function as fallback rules. They take as much space in the grammar as other rules, but remain unused as long as they are not needed.
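The fallback behaviour of mal-rules can be sketched as follows. This is an illustrative Python sketch only: the rule functions, the feature name num and the diagnosis text are hypothetical and do not reproduce any particular CALL system. Mal-rules are consulted only when no normal rule yields an analysis.

```python
def agreement_rule(subject, verb):
    """Normal rule: subject and verb must agree in number."""
    if subject["num"] == verb["num"]:
        return {"rule": "s -> np vp", "diagnosis": None}
    return None


def agreement_mal_rule(subject, verb):
    """Mal-rule: describes one specific error and carries its explanation."""
    if subject["num"] != verb["num"]:
        return {"rule": "mal: s -> np vp",
                "diagnosis": "lack of subject-verb agreement"}
    return None


NORMAL_RULES = [agreement_rule]
MAL_RULES = [agreement_mal_rule]


def analyse(subject, verb):
    """Try the normal rules first; fall back on mal-rules only if they all fail."""
    for rules in (NORMAL_RULES, MAL_RULES):
        analyses = [a for a in (rule(subject, verb) for rule in rules)
                    if a is not None]
        if analyses:
            return analyses
    return []


they = {"num": "plural"}
walk = {"num": "plural"}
walks = {"num": "singular"}
print(analyse(they, walk))
print(analyse(they, walks))
```

Each error type needs its own rule in such a setup, which is the disadvantage noted above: an input whose error was not anticipated still receives no analysis.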


2.4. Relaxation rules

Douglas presents an extension to a unification formalism where the rules are annotated with relaxation principles (Douglas and Dale 1992; Douglas 1995). They are similar to mal-rules, except that there is a step-wise and monotonic retraction of the constraints in a rule, until a solution can be found. Suppose there is the rule in (2a) and the input in (2b).

(2) a. Rule s --> np vp:
         <np per> = <vp per>
         <np num> = <vp num>
    b. *They walks

The rule rewrites S (sentence) to a sequence of a noun phrase NP and a verb phrase VP. The constraints (the part after the colon in the rule) restrict a certain feature path (e.g. <np per>) to have a certain value (e.g. 3), or to be re-entrant with another feature path, e.g. <vp per>. The input in Example (2b) cannot be analysed with the rule in (2a) because <np num> = plural and <vp num> = singular, and these values are not the same. Only when the requirement is relaxed that the number values should be unifiable can the rule application succeed. That is what Douglas does: when no analysis can be found, the constraints are relaxed, but only stepwise. For each rule, the order in which they should be relaxed is indicated by a cost. First only the constraints with a lower cost are relaxed, and this is recorded as the penalty. This still requires a good deal of work from the grammar writer. (For instance, in the described approach, the linguist has to understand the interactions between the rules due to relaxations, in order to assign the right costs.)

2.5. Only changing the underlying formalism

The last technique in this section is one where there is no need to change the grammar to be able to deal with extra-grammaticalities, because the formalism takes care of it. Krieger (1995) describes an approach in a typed formalism where all type combinations can be allowed for user-defined sets of types. His direct aim was to aid the grammar writer in debugging. The relaxation simply lies in the free combination of types from certain sets. (In principle, it can be done for all types, but then all rules would apply in all cases.)
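For comparison with the type-based relaxation just mentioned, the cost-ordered relaxation of section 2.4 can be sketched for rule (2a). The sketch is a simplification and not Douglas's actual algorithm: the costs are hypothetical, features are plain dictionary entries, and the sets of relaxed constraints are simply tried in order of increasing total cost.

```python
from itertools import combinations

# Constraints of rule (2a) as (np feature, vp feature, cost).
# The costs are hypothetical; they would be assigned by the grammar writer.
CONSTRAINTS = [("per", "per", 2), ("num", "num", 1)]


def satisfied(constraint, np, vp):
    """True if the two feature values named by the constraint are equal."""
    np_feature, vp_feature, _cost = constraint
    return np[np_feature] == vp[vp_feature]


def apply_rule(np, vp):
    """Return (penalty, relaxed constraints) for the cheapest successful application."""
    subsets = [relaxed
               for size in range(len(CONSTRAINTS) + 1)
               for relaxed in combinations(CONSTRAINTS, size)]
    # Try the cheapest relaxations first.
    subsets.sort(key=lambda relaxed: sum(c[2] for c in relaxed))
    for relaxed in subsets:
        kept = [c for c in CONSTRAINTS if c not in relaxed]
        if all(satisfied(c, np, vp) for c in kept):
            penalty = sum(c[2] for c in relaxed)
            return penalty, [f"<np {c[0]}> = <vp {c[1]}>" for c in relaxed]


they = {"per": 3, "num": "plural"}
walks = {"per": 3, "num": "singular"}
walk = {"per": 3, "num": "plural"}

print(apply_rule(they, walk))
print(apply_rule(they, walks))
```

For input (2b) only the number constraint has to be relaxed, so the penalty equals its (assumed) cost, and the relaxed constraint itself identifies what went wrong.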


Others (notable is the collection of papers in Schöter and Vogel 1995) have made similar attempts, but did not exploit the use of types. They therefore can only deal with a very limited set of cases. In the following sections, we present a further solution in this category, but one with more general applicability. First however, we briefly discuss statistical processing.

2.6. Statistical processing

An approach that has to be mentioned, as it is very successful in delivering analyses for unrestricted text, is statistical processing. Statistical approaches take as their basis an existing formalism, and extend it with automatically acquired probabilities. The strict comparison of the equality of categories is replaced by a softer one, one of probabilistic preference. Statistical parsers can easily deal with extra-grammaticalities and undescribed events like exceptions, and a grammar does not need to be adapted for parsing other texts and domains. There is however no notion of an exception in the statistical component of a parser. A rule application or a constituent is more or less probable or preferable, but the parser cannot make a distinction between grammaticality and extra-grammaticality. Statistical processing also relaxes the boundaries of Figure 1 by turning them into probabilities. The picture in Figure 1 (right) applies in the same way, except that there is no distinction between LI and LI′.³

3. Generalised unification

Having reviewed the existing solutions, we now present generalised unification. It modifies the definition of unification such that the rules of the grammar can also describe input which does not quite fit with the grammar. This is achieved by relaxing the rules, and assigning a penalty to relaxations. Both actions are based on the type hierarchy.

3.1. Setting

As formal setting, we are using Carpenter's Typed Feature Logic (1992). A linguistic theory that is based on Typed Feature Logic or a similar formalism, such as Head-driven Phrase Structure Grammar (HPSG) (Pollard and Sag 1994), consists of two parts: a type hierarchy (also called signature) and a grammar. The grammar contains the various rules and the lexical entries. The rules and
3. A distinction could be introduced by defining a probability threshold.


the lexical entries are all feature structures. The shape of the feature structures is determined by the type hierarchy.

3.1.1. Types

A type hierarchy is a finite partial order (more specifically a meet semi-lattice), i.e. a set that consists of a finite number of elements (types), which are ordered with respect to each other. The ordering relation is subsumption (⊑). If a type a subsumes another type b, then a is equal to or more general than b. An example of a hierarchy is shown in Figure 2.
[Type hierarchy diagram. Types shown: category (most general); verbal, nominal, gerund, determiner, adjective; verb, gerund-as-verb, noun, gerund-as-noun.]

Figure 2. A sample hierarchy.

The most general type is category: everything in this hierarchy is a category. The other types are more specific. If there is no relation between two types, e.g. between determiner and adjective, then they have nothing in common (except for the supertype). All types (except for the most general one) have to have at least one supertype. Gerund-as-noun for instance has more than one supertype. That means that if we find something of which we know that it is nominal and gerund, we know it is gerund-as-noun. Each type inherits the properties of its supertypes and adds its own. Without features (see section 3.1.2), properties are only conceptual and not visible. Two types are compatible if one subsumes the other, or if they have a common subtype, for instance nominal and gerund-as-noun, or verbal and gerund. The unification (⊔) of two types is the most general subtype that they have in common. For the pairs of types that we just mentioned, the unifications are gerund-as-noun and gerund-as-verb respectively. If there is no such common subtype, then the unification fails.⁴ From a practical viewpoint, typedness helps to ensure the correctness of a grammar.
4. The type hierarchy is constructed such that, if there are common subtypes, there is always a most general one. This can be done automatically, and need not be the concern of the grammar writer.
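Type unification over the sample hierarchy can be sketched in a few lines of Python. The encoding is an assumption made for illustration: the links for the gerund types follow the text, whereas placing verb under verbal, noun under nominal, and determiner and adjective directly under category is a guess about the figure; the function names are hypothetical.

```python
# Immediate supertypes of each type. The links for verb, noun,
# determiner and adjective are assumptions about the figure.
SUPERTYPES = {
    "category": [],
    "verbal": ["category"],
    "nominal": ["category"],
    "gerund": ["category"],
    "determiner": ["category"],
    "adjective": ["category"],
    "verb": ["verbal"],
    "noun": ["nominal"],
    "gerund-as-verb": ["verbal", "gerund"],
    "gerund-as-noun": ["nominal", "gerund"],
}


def ancestors(t):
    """The type t together with all of its (transitive) supertypes."""
    seen = {t}
    for parent in SUPERTYPES[t]:
        seen |= ancestors(parent)
    return seen


def subsumes(a, b):
    """True if a is equal to or more general than b."""
    return a in ancestors(b)


def unify_types(a, b):
    """Most general common subtype of a and b, or None if unification fails."""
    common = [t for t in SUPERTYPES if subsumes(a, t) and subsumes(b, t)]
    most_general = [t for t in common if all(subsumes(t, u) for u in common)]
    return most_general[0] if most_general else None


print(unify_types("nominal", "gerund-as-noun"))
print(unify_types("verbal", "gerund"))
print(unify_types("determiner", "adjective"))
```

The first two calls reproduce the unifications mentioned above; the third fails because the two types have no common subtype.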


3.1.2. Features

It is also possible to define features on types. A type noun could for instance have a feature CASE. All subtypes of noun will then inherit this feature. When a feature is defined, the possible values of the feature also have to be determined. It makes sense to limit it to cases that exist in the language (so that, say, a noun with CASE singular is impossible). To that effect, a case hierarchy is defined, and CASE is required to have a value that is subsumed by case. With this information, we can build a feature structure, for instance for a nominative noun (see Figure 3):

[ noun
  CASE  nominative ]
Figure 3.

The feature structure can alternatively be represented as a graph as in Figure 4:


noun --CASE--> nominative
Figure 4.

There are no limitations on the number of features. The only requirement is that every type carries all features that it was defined with or that it inherited. Feature structures can be nested as well. In Typed Feature Logic, all grammatical categories are feature structures, such as in Figure 5:

[ sign
  ORTHOGRAPHY  him
  HEAD  [ noun
          CASE  accusative ] ]
Figure 5.

3.1.3. Unification
In a unification-based grammar, the rules are similar to the rules in a CFG.⁵ With CFGs, the applicability of rules is determined by an equality test between the category of the constituent and the rule daughter. With feature structures, the comparison is done by compatibility. Two feature structures are compatible when they do not contain any conflicting properties. The absence of a certain property is not considered to be significant. With typed feature structures, the type values have to be compatible as well.

[ f      ]     [ f    ]     [ f       ]
[ F  a   ]  ⊓  [ F  1 ]  =  [ F  1 b  ]
[ G  b   ]     [ G  1 ]     [ G  1    ]
[ H  c   ]                  [ H  c    ]
Figure 6.

5. This is only true of most implementations. In theoretical HPSG, for instance, the word order is assumed to be specified independently from the rules (Pollard and Sag 1987).

In Figure 6, the feature H is not present in the second unificand, but it is in the result. Unification is monotonic: it keeps all information from its input. In the second feature structure, we see a reentrancy. This means that the value of F and G is one and the same object (token-identity). The two features are only different ways to reach that feature structure (here of unspecified value). The graph representation for this feature structure is shown in Figure 7:

f --F--> (1)
f --G--> (1)   [both arcs end in the same node]
Figure 7.

A consequence is that any property that is imposed on the value of F also holds for the value of G, and vice versa. The result of the unification as it is shown in Figure 6 is correct if b is a subtype of a. If that were not the case, and a and b were not compatible, the unification would fail. Rules in unification-based grammars are normally specified such that information is shared (through a reentrancy) between the mother node (the left-hand side of the rule) and the daughters. This way, properties of the daughters can be passed up in the tree, and that can be exploited for linguistic description or for bookkeeping information. The fact that both the mother and one daughter in the rule in Figure 8 (A) are verbal (V stands for a verb) can easily be expressed by a reentrancy between the category features, as in Figure 8 (B).

(A) VP → V NP
(B) [ CATEGORY 1, LEVEL phrase ] → [ CATEGORY 1 verb, LEVEL non-phrase ] [ CATEGORY noun, LEVEL phrase ]
Figure 8.
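
The unification of Figure 6, including the reentrancy, can be sketched as follows. This is a minimal illustration and not the implementation used for the formalism: the class `Node`, the forwarding-pointer technique and the toy hierarchy `PARENT` (in which b is a subtype of a) are all assumptions made for the example.

```python
# Hypothetical toy hierarchy: each type maps to its single direct supertype.
PARENT = {"top": None, "f": "top", "a": "top", "b": "a", "c": "top"}

def supertypes(t):
    """Return t and all of its supertypes."""
    out = set()
    while t is not None:
        out.add(t)
        t = PARENT[t]
    return out

def unify_types(x, y):
    """In a tree-shaped hierarchy two types unify only if one subsumes the other."""
    if x in supertypes(y):
        return y
    if y in supertypes(x):
        return x
    return None

class Node:
    def __init__(self, type_="top", **features):
        self.type = type_
        self.features = dict(features)
        self.forward = None  # set when this node has been merged into another

def deref(node):
    """Follow forwarding pointers to the node that currently represents this one."""
    while node.forward is not None:
        node = node.forward
    return node

def unify(n1, n2):
    """Destructively unify two nodes; return the merged node or None on failure.
    A failed attempt may leave the inputs partially merged."""
    n1, n2 = deref(n1), deref(n2)
    if n1 is n2:
        return n1
    merged_type = unify_types(n1.type, n2.type)
    if merged_type is None:
        return None
    n1.type = merged_type
    n2.forward = n1
    for feature, value in n2.features.items():
        if feature in n1.features:
            if unify(n1.features[feature], value) is None:
                return None
        else:
            n1.features[feature] = value
    return n1

# The two unificands of Figure 6; `shared` is the reentrant value of F and G.
first = Node("f", F=Node("a"), G=Node("b"), H=Node("c"))
shared = Node()
second = Node("f", F=shared, G=shared)
result = unify(first, second)
```

After the call, the values of F and G in `result` are one and the same node of type b, and H is carried over from the first unificand. If b were not a subtype of a in `PARENT`, the same call would return `None`.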


Before continuing, we summarise what has been presented. The rules in our formalism are like context-free rules, except that the categories are not atomic values, but typed feature structures. To test whether a rule can be applied using a certain constituent, it is unified with one of the rule daughters. If the unification succeeds, the application can proceed with other constituents for the remaining rule daughters, until a new constituent is found. Otherwise, other constituents and other rules need to be tried out.

3.2. The approach
Our goal is to be able to describe extra-grammaticalities without the necessity to extend our grammar. Therefore, we need the following:
- Every sentence or utterance can be described.
- An analysis has to contain a measure of how much the grammar had to be relaxed.
- Extra-grammaticalities can be distinguished from each other.
The intuition behind our solution is that we keep track of all information that the grammar provides in the form of lexical entries and rules for a given input. The information in the grammar consists of the feature structures and the types that were used to create them. In grammatical sentences, the information remains intact, while in extra-grammatical sentences, some of it has to be discarded in order to comply with the grammar. One might say that the grammar rules act as a filter.

3.2.1. An analysis for every sentence
In order to be able to describe every sentence, we need to make any unification possible. Then there will always be at least one tree that spans the input. The unification for every set of types needs to be defined (and is thereby allowed). That is achieved by an automatic (order-preserving) extension of the type hierarchy to a lattice. Unifications like determiner ⊓ adjective are now possible. The notation for the result is: determiner ⊓ adjective. There is a second kind of unification necessary.
With the unification of the previous paragraph, rule applications will only collect all values that are encountered during the analysis. This cannot tell us how much a value deviates from the grammar, since we do not have any way of distinguishing between the rule values and the input values. Therefore, the values in rules are set up as filters for the values coming from constituents (see Figure 9).

Figure 9. A grammar rule acts as a filter. The grey square is the rule. The unificands represent fragments of a type hierarchy. The unification of the two only leaves the compatible information visible.

If the value in the rule is more general than the value in the rule daughter, then the latter stays as it was. For instance, a rule that specifies that a category should be of type category does not impose any restrictions. Therefore all values are possible, as in the following coordination rule (Figure 10):

[ CATEGORY 1 ] → [ CATEGORY 1 ] and [ CATEGORY 1 ]
Figure 10.

Determiners and adjectives can be coordinated: the category will be determiner ⊓ adjective. When, however, the rule value is not compatible with the constituent that is used, the incompatible information is removed through a generalisation:

[ CATEGORY 1 ] → [ CATEGORY 1 verb ] [ CATEGORY noun ]
Figure 11.

In Figure 11, determiner fits neither daughter, and therefore the type of the result would be the generalisation (⊔) of determiner and, for instance, verb, which is category. Because the grammar rules need to be maintained, the ultimate category value of the new constituent will be verb. For every sentence there is a description, but it comes at a price: either type information is lost through a generalisation, or incompatible type values are put together.

3.2.2. A measure for extra-grammaticality
The disappearance of type information can be quantified by counting how much of the information that was supplied by the input has to be discarded. The measure we shall use here is the number of steps that need to be taken in the type hierarchy to keep the feature structures compatible with the grammar. (Other measures are equally possible. The only requirement is that the mapping between the type hierarchy and the ordering on the information quantity is order-preserving. That means that a supertype should always have a smaller weight than a subtype.)
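
The filtering step of Figure 11 and the step-count measure can be sketched together. The tree-shaped hierarchy `PARENT` and the function names below are assumptions made for illustration; in a hierarchy with multiple inheritance the most specific common supertype would have to be computed differently.

```python
# Assumed tree-shaped hierarchy: each type maps to its direct supertype.
PARENT = {
    "category": None,
    "determiner": "category",
    "adjective": "category",
    "noun": "category",
    "verb": "category",
}

def chain(t):
    """Return t followed by its supertypes, most specific first."""
    out = []
    while t is not None:
        out.append(t)
        t = PARENT[t]
    return out

def generalise(a, b):
    """Most specific common supertype (unique in a tree-shaped hierarchy)."""
    b_chain = set(chain(b))
    for t in chain(a):
        if t in b_chain:
            return t
    raise ValueError("no common supertype")

def filter_value(rule_type, input_type):
    """Pass an input type through a rule value; return (result, steps lost)."""
    if rule_type in chain(input_type):   # rule is more general: input survives
        return input_type, 0
    if input_type in chain(rule_type):   # input is more general: rule adds information
        return rule_type, 0
    kept = generalise(rule_type, input_type)
    steps_lost = chain(input_type).index(kept)
    return rule_type, steps_lost         # the rule value is maintained
```

With these definitions, a rule value of category lets determiner through unchanged, while a rule value of verb applied to a determiner yields verb and records that one step of input information (from determiner up to category) had to be given up.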


With a generalisation, the amount of extra-grammaticality is the number of types between the result of the generalisation and the original type. For a set of incompatible types, such as determiner ⊓ adjective, several ways of counting are possible. The most conservative one is the sum of the number of steps that need to be taken to make the set compatible with the desired result. Since we do not know that value, we can take the smallest of all possible values. For determiner ⊓ adjective, that is 1: regardless of whether determiner or adjective is taken out, the removed information is 1, which is the distance to the most specific common supertype, category. Additionally, we count the number of times we have seen every type value, and multiply this with the loss for every type. This expresses that values that have been provided more frequently in the input sentence should count for more, also after unification. In the sentence in (3), there is evidence for feminine twice (la and voiture), and only once for masculine (abandonné). Discarding more frequent information should be penalised more heavily.

(3) La voiture était *abandonné.
    the car was abandoned
    'The car was abandoned.'

For each type that is encountered, not only the type itself has to be counted but also all its supertypes, as shown in Figure 12:

plural₁ number₁  ⊓  singular₁ number₁  =  plural₁ ⊓ singular₁  number₂
Figure 12.

The value of number in the result is 2 because that is the sum of the values for number in the unificands. Plural and singular each occurred only once, and therefore their occurrence counts do not change in the result. This is done to make sure that no information is lost unnecessarily: supertypes that are compatible with a rule value are always kept. The occurrence values are in principle independent from each other, with the restriction that the occurrence of a supertype always has to be at least as great as that of any of its subtypes. This reflects the fact that when a type is used, the properties that were inherited from the supertypes are used as well. A generalisation should not throw this away.
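
The occurrence counting of Figure 12 and the weighting of the loss by frequency, as in example (3), can be illustrated with a short sketch. The table `PARENT` and the helper names are hypothetical; the sketch assumes that each type has a single supertype and that each step in the hierarchy counts as 1.

```python
from collections import Counter

# Assumed fragments of the number and gender hierarchies.
PARENT = {
    "number": None, "singular": "number", "plural": "number",
    "gender": None, "feminine": "gender", "masculine": "gender",
}

def observe(t):
    """Occurrence counts for one observed type and all of its supertypes."""
    counts = Counter()
    while t is not None:
        counts[t] += 1
        t = PARENT[t]
    return counts

def combine(*observations):
    """Add up occurrence counts, as unification does in Figure 12."""
    total = Counter()
    for obs in observations:
        total += obs
    return total

def steps_up(t, target):
    """Number of steps from type t up to its supertype target."""
    steps = 0
    while t != target:
        t = PARENT[t]
        if t is None:
            raise ValueError("target is not a supertype")
        steps += 1
    return steps

def loss_if_discarded(counts, t, keep):
    """Occurrences of t multiplied by the distance to the type that is kept."""
    return counts[t] * steps_up(t, keep)

number = combine(observe("plural"), observe("singular"))
gender = combine(observe("feminine"), observe("feminine"), observe("masculine"))
```

Here `number` reproduces the counts of Figure 12, and for the sentence in (3) discarding feminine costs twice as much as discarding masculine, since feminine was observed twice.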


When there is no information loss in an analysis, it means that the input could be described by the grammar, and the result looks precisely as it would if it had been constructed by a non-tolerant grammar.

3.2.3. Distinction between extra-grammaticalities
The requirement that extra-grammaticalities are distinguishable is needed to avoid the trivial solution in which a single failure type, or a few, are defined to which all failures are reduced. In some cases, this may be a valid approach, but for our needs it is necessary to make the distinction between masculine ⊓ feminine ⊓ neuter and feminine ⊓ neuter: the first unification is clearly worse. The distinction is also useful when it comes to diagnosis or corrections. Values that were not used at all are unlikely candidates for a correction.

3.2.4. Reentrancies
Reentrancies have a special status. They are information carriers, but at the same time, they reduce the information weight in a feature structure. This is demonstrated by the following. According to the definition of subsumption in Carpenter (1992), the feature structure in Figure 13 (A) is subsumed by the feature structure in Figure 13 (B). When we add up the total weight for the two feature structures, assuming that a has an occurrence of 1 and a weight of 3, and the top node an occurrence of 1 and a weight of 1, we get for Figure 13 (A): 1 × 3 + 1 × 1 = 4; for Figure 13 (B): 1 × 3 + 1 × 3 + 1 × 1 = 7.

(A) [ F 1 a, G 1 ]
(B) [ F a, G a ]
Figure 13.

This is justified because the feature structure nodes are the basis for counting the weight. When there are fewer nodes, the weight should be smaller. It does not make a difference over how many paths the nodes are accessible. As separate information carriers, reentrancies could be weighted separately, but not in all cases: only information that can be lost should be taken into account. Not all reentrancies may be lost, since that would deeply affect processing with the grammar rules. Some reentrancies, viz. the ones between mother and daughters, are fundamentally responsible for keeping the rule application procedure in good order. The other reentrancies might be counted explicitly.
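
The node-based counting behind the totals for Figure 13 can be checked with a few lines of code. The class `Node` and the table `WEIGHT` are illustrative assumptions; the weights 3 (for a) and 1 (for the top node) are the ones assumed in the text.

```python
# Weights assumed in the discussion of Figure 13.
WEIGHT = {"top": 1, "a": 3}

class Node:
    def __init__(self, type_, occurrence=1, **features):
        self.type = type_
        self.occurrence = occurrence
        self.features = dict(features)

def total_weight(root):
    """Sum occurrence * weight over distinct nodes; a shared node counts once."""
    seen = set()
    stack = [root]
    total = 0
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        total += node.occurrence * WEIGHT[node.type]
        stack.extend(node.features.values())
    return total

shared = Node("a")
figure_13_a = Node("top", F=shared, G=shared)        # reentrant: one node for F and G
figure_13_b = Node("top", F=Node("a"), G=Node("a"))  # two separate nodes
```

Because the weight is collected per node and not per path, the reentrant structure comes out lighter than the one without reentrancy.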


It is unfortunately not clear how the weights should be assigned, counted and processed, especially in combination with the type weights.

3.3. A few examples
Some examples will show how the technique can be used in grammar writing. In Ancient Greek, the form of the verb for third person plural shows some idiosyncrasy: for the genders masculine and feminine, it has the normal plural form (e.g. for the copula eisi(n)), but for neuter, the verb form is the same as the one for third person singular (esti(n)).⁶ Specifying a reentrancy between the agreement features of the verb and the subject does not work correctly (in a non-tolerant formalism): it fails for third person plural verbs with neuter subjects. There are a few alternative analyses. One works with a special form of the third person plural for neuter. Another states that neuter plural is in some sense (e.g. semantically) really felt as a singular, such as a collective noun (this is the traditional explanation). Implementations would probably tend to favour the first solution. There is a third analysis, which is only possible in a framework such as the one we are presenting: there, the phenomenon is left out of the grammar stricto sensu. The phenomenon is only treated by a tolerant formalism (it belongs to the set union of LN and the set difference of LI with LI ). A value clash will be detected when the agreement features are unified: singular ⊓ plural, and in the interpretation of the results, this specific incompatibility may be allowed, by explaining it as an exception, not an error. The extra-grammaticality can be described, whereas it could not with previous formalisms. The explanation of the analysis however is not part of the presented formalism extension, since it requires more knowledge than is present in a grammar. The situation is somewhat more complicated when it is necessary to rule out the use of neuter plural and the third person plural of the copula. In that case, all agreement checks must be treated in the same way, i.e. they are treated inside or outside of the grammar. That restriction is due to the way in which this phenomenon is analysed here: reentrancies cannot be considered as exceptions, and therefore the fact that there is an exceptional non-reentrancy cannot be expressed. In that case, other means have to be chosen to describe the phenomenon, e.g. where the reentrancies are not the direct reflection of the linguistic description. We do not pursue this specific problem further here, as it would take us too far.
6. An agreement example may be misleadingly simple; many exceptions are more complex. However, in a unification-based grammar framework, everything is described as an agreement phenomenon: the values have to unify.
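
The third analysis of the Ancient Greek agreement facts can be pictured as two separate steps: a tolerant unification that records the clash, and an interpretation step outside the grammar that decides what the clash means. The sketch below is purely illustrative. The table `ALLOWED_CLASHES` and both function names are hypothetical, and the interpretation step in particular is an assumption, since the text leaves it outside the formalism.

```python
def tolerant_unify(a, b):
    """Atomic values only: equal values unify, unequal ones are kept side by side."""
    return frozenset({a, b})

# Hypothetical interpretation table, kept outside the grammar proper.
ALLOWED_CLASHES = {
    (frozenset({"singular", "plural"}), "neuter"):
        "exception: neuter plural subject with singular verb form",
}

def interpret(number_value, subject_gender):
    """Classify the outcome of the number agreement check."""
    if len(number_value) == 1:
        return "grammatical"
    return ALLOWED_CLASHES.get((number_value, subject_gender), "unexplained clash")
```

The grammar itself only contributes the clash between singular and plural. Whether that clash counts as an exception or as an error is decided by the lookup, which uses information (here the gender of the subject) that the agreement rule does not encode.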


Another example is the use of transitive verbs as intransitives in English, such as to eat. Normally, the grammar requires that transitive verbs should have a direct object. When there is none, as in Is he eating again?, the verb frame is incomplete. When that has not been treated in the grammar, this sentence would be impossible to describe. The reason for this defect cannot be given by this grammar, since there is no means of doing so, except by having an (external) expert manually perusing the analysis. With tolerant parsing, an analysis can be given, and it states that a direct object was expected by the verb, but not given in the sentence. Concretely, it looks like the following: in the rule that combines subject and verb, the verb is required to have an empty complements list (all objects have been found). The verb from the input on the other hand still has an object slot open. These two values, an empty list and a singleton list, clash. This example also shows that this technique does not make a distinction between errors and exceptions. There also exist verbs that cannot be used intransitively (e.g. to take, to lift). When such a verb is used incorrectly, the same description will be found: a direct object was expected by the verb, but not given in the sentence. It is important to use a description of the problem, because in the system there is not enough information to distinguish between errors or mistakes on the one hand and grammatical but unusual behaviour on the other. That the formalism can return a description at all that relates to the grammar is novel. Further steps, such as correcting the sentence or the grammar, and making a diagnosis, are not within the scope of this paper.

We can also take an example from morphology, e.g. from Corbett (this volume) the inflection of the Slovene človek. The exception in this paradigm consists of two factors: suppletion and syncretism.
In this case, the (somewhat unusual) goal is to generate the table describing the inflection, and not to analyse input containing the word. The exception lies in the relation between the different word forms in the paradigm (knowledge of which is usually not available in a grammar). The linguistic analysis where the phenomenon is integrated into the grammar describes the combination possibilities of the stem človek with the different case endings: it combines with singular and dual nominative, accusative, dative or instrumental endings. The other stem, ljud, only combines with plural and dual genitive and locative endings. When the stem is combined with an ending, their compatibility is checked, and if their requirements unify, the word can be formed. This is the situation where the exception has been incorporated into the grammar. When on the other hand one of the forms has not been entered in the grammar, the default case will occur, i.e. the same stem is used for all forms. (We do not take into account any phonological or orthographic changes.) If for instance only človek is known for the entire paradigm, then there are two different scenarios. In the first, človek creates too many forms (e.g. also *človeki for genitive dual and plural). That is overgeneration, and it cannot be automatically detected. Undergeneration however can. In the second case, ljudi is found in the table and cannot be assigned a proper analysis, because it is not known. This can be detected, and a possible analysis proposed based on the endings. It is not possible to automatically relate the two stems as belonging to one paradigm since that is an inductive step. The rule that suppletion violates was expressed above by requiring that all forms should use the same stem. As for the syncretism, we can describe it as a rule itself (genitive dual and plural have to be the same form), or as an exception, in which case the rule states that the forms have to be different. We have described the exceptions by creating rules for the unexceptional case, here relations of stems within the paradigm, and where exceptions occur, we find them as value clashes. Higher-order exceptions are treated by providing one or more rules for each dimension. The interaction of the dimensions is visible in the different explanations that can be given for the exception (the genitive dual and plural peculiarity is a violation either of the rule that the stems have to be identical, or of the rule that the forms have to be different).

4. Discussion

In the present section, we discuss some interesting properties of the proposed formalism.

4.1. Relation to the original formalism
The proposed changes only extend Typed Feature Logic. The original results of the grammar remain precisely the same, with the difference that many failures are now turned into a linguistic analysis. The most important property of unification is monotonicity. This means that information that has been added to the system does not disappear. Do the proposed changes violate monotonicity, since information is removed and sometimes even changed (see Figure 11)? In the extended formalism, the original coverage of the grammar is not touched, hence monotonicity is maintained. Input that cannot be analysed by the grammar was rejected anyway. (Strictly speaking, that also breaks monotonicity: the sentence is thrown away.) In these cases, and only in these cases, the value accumulation is not monotonic (in the case depicted in Figure 11, determiner becomes verb). The information weight however always monotonically decreases in the case of both grammatical and extra-grammatical input.


We have not investigated the relation to formalisms that use defaults (Bouma 1992; Briscoe, Copestake and de Paiva 1993; Lascarides et al. 1996). Untyped formalisms such as the one used by Lexical-Functional Grammar (LFG) (Bresnan 1982; Dalrymple et al. 1995) can be treated as well, as an untyped system is essentially a typed one with a very flat hierarchy. A nice consequence of the fact that the formalism is only an extension is that there is no need to change existing grammars that have been written for the original formalism.

4.2. Philosophical status of the grammar
With the proposed formalism, extra-grammatical input is within the reach of linguistic description, without being in the grammar. We believe this is novel. Without a (correct) target structure, description of ill-formed input was previously impossible. Now alternatives can be computed on the basis of the input (and the grammar). This has interesting consequences for the status of a grammar. Traditionally a grammar should describe everything that is inside of the language. Now, the definition of grammar can become more flexible. Linguists may decide for themselves what status they assign to the tolerant module in the grammar formalism. The two extremes of the spectrum are:
- the grammar is a minimal well-defined core, and all deviations should be taken care of by tolerant processing;
- the tolerance module is just a fall-back for cases that were missed in the grammar.
Not a great deal needs to be done to realise this choice: the decision on what should be put into the grammar is the only difference. The first extreme position is realised when no more grammar work takes place, the second one when every newly discovered extra-grammaticality is integrated into it (the traditional grammar implementation work). The best practical choice probably lies somewhere in the middle. This approach explicitly takes into account the expectation of an incomplete grammar.
It does not deal with extra-grammaticalities to the end (see section 4.4), but it makes clear what the grammar would do with them in a consistent and systematic way. The proposed model makes no principled distinction between exceptions and errors. It is up to the grammar writer to decide on their status. An error lies outside the grammar, an exception lies inside it. After deciding where a phenomenon lies, its place needs to be explicitly defined in the grammar. When an error is more regular than the exception, then it will not be found if it is in the grammar (detection of overgeneration within the system is impossible), but the exception will be detected as a problematic case, which may point the grammar writer to the overgeneration. If the exception is in the grammar, the overgeneration may still exist, in which case it is harder to detect.

4.3. Implications of the use of the module
Because the extension has taken place in the formalism, not the entire grammar search space needs to be checked. There is no hard need for a preliminary description of extra-grammatical phenomena, since descriptions are generated anyway. The results of the analyses remain to be interpreted, and if necessary, converted back to a normal description. How this is precisely done depends on the context in which the system is used (see section 4.4). The interpretation of descriptions containing a problem is very flexible, but complex at the same time. The strength of the approach is at the same time also its weakness: it exploits the properties of the formalism to provide a treatment for exceptions. Phenomena that cannot be described with this formalism fall outside of the scope of this solution. We believe however that the technique is generally applicable to other formalisms that are based on a partial order.

4.4. Diagnosis and correction
Obtaining an analysis that has dealt with extra-grammatical phenomena is not the final goal. At that point, the system has only made the distinction between grammatical and extra-grammatical input. The analysis still needs to be interpreted, to see what the extra-grammaticality precisely consists of, in order to decide whether the input contained an error or an exception. If it is the former, there is no guarantee that it will be the correct or wanted solution. If it is a real exception, nothing more needs to be done. The construction of the right description is the task of an interpretation module.
On the basis of the obtained solutions, the interpretation module has to find what the problem really is. It is possible to tune the grammar and the type weights in order to obtain a differently ranked set of results. What the best strategy is to do this remains to be investigated. Weights could for instance be automatically learnt on the basis of a manually re-ranked test corpus (similar to the work in the Redwoods project (Oepen et al. 2002)).

4.5. Moravcsik
Where does this paper fit in the classification of Moravcsik (this volume)? She discusses how exceptions are dealt with in linguistic descriptions. We have developed a technique to detect rule deviations, and to describe precisely where the problem lies (from the viewpoint of the implemented grammar). Whether it is an exception and how it should be treated is left to the grammar writer.

5. Conclusion

We have presented an extension of Typed Feature Logic to deal with extra-grammatical phenomena, such as exceptions. Through an enlarged type hierarchy, it is made sure that there is at least one analysis. The set of analyses is ranked according to ungrammaticality, which is defined as deviance from the grammar (in terms of type distance and lost information). Some examples have shown that it can deal with more than only agreement extra-grammaticalities. Contrary to standard formalisms, our proposal makes a qualified distinction between alternatives. This gives the linguist, both computational and theoretical (or also the system that processes the output from the grammar), the option of having a closer look at extra-grammaticalities. The relaxation has been obtained not by changing the grammar, but by modifying the formalism, and exploiting the fact that types provide a natural carrier for graceful degradation.

Acknowledgements
This research was funded by a University of Essex Studentship and by the German Research Fund DFG (through the projects SFB 340 B 4/B 8 and SFB 378 B 4/MI 1 perform). Thanks for helpful comments go to the two anonymous reviewers, the editors, Doug Arnold, Péter Dienes, and the audiences of the talks I gave on this topic.

References
Baldwin, Timothy, Emily M. Bender, Dan Flickinger, Ara Kim, and Stephan Oepen
2004  Road-testing the English Resource Grammar over the British National Corpus. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004), 2047–2050.

Bouma, Gosse
1992  Feature structures and nonmonotonicity. Computational Linguistics 18 (2): 183–204.

Bresnan, Joan (ed.)
1982  The Mental Representation of Grammatical Relations. (Series on Cognitive Theory and Mental Representation) Cambridge, Massachusetts/London, England: MIT Press.

Briscoe, Ted, Ann Copestake, and Valeria de Paiva (eds.)
1993  Inheritance, Defaults and the Lexicon. (Studies in Natural Language Processing) Cambridge: Cambridge University Press.

Carpenter, Bob
1992  The Logic of Typed Feature Structures: With Applications to Unification Grammars, Logic Programs and Constraint Resolution. (Cambridge Tracts in Theoretical Computer Science 32) Cambridge: Cambridge University Press.

Dalrymple, Mary, Ronald M. Kaplan, John T. Maxwell III, and Annie Zaenen (eds.)
1995  Formal Issues in Lexical-Functional Grammar. (CSLI Lecture Notes 47) Stanford: CSLI Publications.

Douglas, Shona
1995  Robust PATR for error detection and correction. In Nonclassical Feature Systems, Andreas Schöter and Carl Vogel (eds.), 139–156. (Edinburgh Working Papers in Cognitive Science 10) Edinburgh: Centre for Cognitive Science, University of Edinburgh.

Douglas, Shona, and Robert Dale
1992  Towards robust PATR. In Proceedings of the Fifteenth International Conference on Computational Linguistics. Nantes, 23–28 August 1992. Vol. 2, Hans Karlgren (ed.), 468–474. International Committee on Computational Linguistics.

James, Carl
1998  Errors in Language Learning and Use: Exploring Error Analysis. (Applied Linguistics and Language Study) London/New York: Longman.

Krieger, Hans-Ulrich
1995  TDL: A type description language for constraint-based grammars. Foundations, implementation, and applications. Ph. D. diss., Department of Computational Linguistics and Phonetics, Saarland University, Saarbrücken, Germany. [Published in 1998 as Vol. 2 of the Saarbrücken Dissertations in Computational Linguistics and Language Technology.]

Lascarides, Alex, Ted Briscoe, Nicholas Asher, and Ann Copestake
1996  Order independent and persistent typed default unification. Linguistics and Philosophy 19 (1): 1–90. [Revised version of ACQUILEX II WP 34 (August 1994/March 1995). Also chapter 3 in Schöter and Vogel (1995: 61–136).]

Oepen, Stephan, Kristina Toutanova, Stuart Shieber, Christopher Manning, Dan Flickinger, and Thorsten Brants
2002  The LinGO Redwoods Treebank: Motivation and preliminary applications. In Proceedings of the 19th International Conference on Computational Linguistics. Taipei, Taiwan: International Committee on Computational Linguistics.

Pollard, Carl, and Ivan A. Sag
1987  Information-Based Syntax and Semantics. Vol. 1: Fundamentals. (CSLI Lecture Notes) Stanford: CSLI Publications.

Pollard, Carl, and Ivan A. Sag
1994  Head-Driven Phrase Structure Grammar. (Studies in Contemporary Linguistics) Chicago/London/Stanford: University of Chicago Press.

Schöter, Andreas, and Carl Vogel (eds.)
1995  Nonclassical Feature Systems. (Edinburgh Working Papers in Cognitive Science 10) Centre for Cognitive Science, University of Edinburgh.

Explanation and constraint relaxation

Pius ten Hacken

1. The special position of computational linguistics

Fouvry discusses the treatment of exceptions in the context of language processing. This raises interesting issues because this perspective combines two rather different conceptions of science. On the one hand, language processing can be studied as an empirical phenomenon taking place in humans. In this approach, linguistics is an empirical science aiming to describe part of the observable world and explain it in terms of its underlying system. On the other hand, language processing can be seen as a task to be formalized in the sense that a computer can perform it. As I argued elsewhere (ten Hacken 2001, 2007a), computational linguistics is an applied science. This means that it is not concerned with describing and explaining the world, but with solving practical problems and explaining why the solution works. In the case of language processing, the empirical science approach asks for the formulation of a hypothesis about the human parser, whereas the applied science approach requires the specification of a working computer program. Both approaches are valid from a scientific point of view, but it is important to distinguish them, because an optimal solution for the one is typically not adequate for the other. An explanation of human language processing does not have to be the basis for a computer program. Conversely, a running computer program does not have to explain anything about how humans process language. In fact, a computer program mimicking human language processing is likely to combine the worst of both worlds. There is no reason to expect that computers as we know them today have the same limitations as human brains. It is important to keep this in mind when we consider the recognition problem. In the shape in which it is studied by Fouvry, this problem concerns the matching of written input with the structures generated by a formal grammar as programmed on a computer. The computational perspective also opens up an ambiguity in the use of the term exception.
Apart from the familiar linguistic sense of irregularity, this term is also used in computer science in the sense of undescribed event. The latter means that the machine enters a state that is not foreseen in the programme. If nothing special has been done about it, an exception in this sense leads to a crash.

2. Computational linguistics, competence, and exceptions

Fouvry's Figure 1 gives a good intuitive representation of the problem of grammar writing. In the light of the preceding discussion of the status of computational linguistics, however, it is useful to distinguish two interpretations of this figure. In the context of linguistics as an empirical science, the grammar is a theory of the speaker's competence. The competence is not directly observable and the techniques for collecting relevant data contribute to the vagueness at the borders of the object. The grammar need not be a fully formalized system. It can leave open more than one option where not enough data are available to choose one.¹ In the context of computational linguistics (CL) as an applied science, the nature of the objects represented in Figure 1 is entirely different. The solid object with vague boundaries represents the potential input to the processing system. The clear object with precise boundaries is the coverage of the formal grammar as implemented in the processing system. Therefore, in the CL perspective we have a partial match between two sets of sentences. The vagueness of the boundaries of the input set should be interpreted as a result of the uncertainty about which sentences may actually appear in the input. It might be tempting to collapse the two interpretations into one, taking competence as the underlying source of the input. In trying to do so, however, a number of problems emerge. One category of problems concerns the relation between the sentences found in the input and the competence. Another category concerns the type of entity the competence belongs to. The first category relates to the distinction between competence and performance. As explained by Chomsky (1965), the input sentences of a processing system belong to performance, which does not reflect competence directly.
1. Of course, the grammar should not depend on undefined intuitive notions as in some traditional approaches to grammar. The approach to grammar intended here, and described in more detail in ten Hacken (2006, 2007b), corresponds to the one adopted in a wide variety of approaches of generative grammar, although it excludes strictly formalized approaches such as Generalized Phrase Structure Grammar, cf. Gazdar et al. (1985).


There are at least three factors that lead to discrepancies. First, other types of knowledge are involved in constraining and selecting what sentences are produced, e.g. pragmatic knowledge. Secondly, competence is used creatively and according to free will. This can also lead to less than fully grammatical sentences used for a particular effect. Finally, on the track from the mental production of a sentence to its realization, distracting factors may lead to errors.

The second category of problems is more fundamental, because it relates to the nature of competence itself. In CL, competence is mapped to a set of grammatical sentences. This step is necessary for a match with the language produced by the formal grammar, which also constitutes a set of sentences. However, as elaborated by Chomsky (1980), it raises a number of problems. First, grammaticality is not a binary property. Although there are clear cases of grammatical and ungrammatical sentences, there are also many shades in between. Arguably, this can be represented by the shades in Figure 1. However, there is another problem. Competence is strictly individual. As Chomsky argues repeatedly, competence does not correspond to the notion of a named language, e.g. English, but to the state of an individual's mind/brain.2 No two individuals can share a competence, because no two individuals share a mind/brain. They can only be similar. The problems this causes are compounded by the interaction of competence with other components of knowledge and distracting factors.

This analysis of competence leads to three types of exception, which seem to be collapsed in Fouvry's discussion:
- idiosyncrasies in the central concept of competence, e.g. the fact that the plural of ox is oxen rather than *oxes;
- cases of more or less marginal grammaticality resulting from the inherent grammaticality cline in competence, the use of pragmatic and creative resources in interaction with competence, and the degree of individual variation involved;
- errors occurring in the realization.

The first category, idiosyncrasies, is the more traditional linguistic notion of exception. The other two categories correspond to the undescribed events in computer science. The difference between them is that in the former case, marginal
2. In HPSG, Pollard and Sag (1994) try to avoid committing themselves to either a mentalist or a non-mentalist position, without however formulating a convincing alternative view of the nature of language. They use the notion of shared knowledge, but it remains unclear how this notion should be interpreted in the context of divergences between individuals and/or groups of them.


cases, a human reader (for text) will be able to explain the event in terms of various factors intended to interact with competence, whereas in the latter case, errors, the human reader is unable to do so.

3.

Strategies for exceptions in computational linguistics

Fouvry proposes to treat exceptions by the relaxation of constraints imposed by the grammar. He also considers encoding exceptions in the grammar directly. In his section 4.2 he presents these two strategies as two extremes and claims that "The best practical choice probably lies somewhere in the middle." This constitutes a rather one-dimensional approach to the treatment of the types of exceptions analysed above as idiosyncrasies, marginal cases, and errors. In fact, a more balanced view takes into account the specific properties of these types.

In the case of idiosyncrasies such as oxen, the exceptions are actually part of the well-defined core to be described by the grammar. There is no good reason to treat such exceptions in any other way. If nothing is stated about the plural of ox, the correct form will be rejected and the form oxes will wrongly be accepted. Constraint relaxation is a very inefficient way to counter this. First, oxes will still be accepted and indeed preferred as the plural form. Second, oxen will achieve the same status as a number of other forms that are neither regular nor correct. Which other forms have this status depends on the details of the statements of the pluralization rules. In the simplest case, the final -n will be treated as a typo, so that oxen has the same status as oxec, oxej and oxel. Surely, this is far from ideal.

The most widespread approach to exceptions in the grammar is the adoption of a default mechanism. An example is DATR, as applied, for instance, by Cahill & Gazdar (1999). Although listed in the introduction to section 2.3, defaults are not discussed as an alternative to constraint relaxation. The example of the Ancient Greek number agreement would be much better served in an approach in which the clearly delimited exception is stated as a subregularity than by treating it on a par with other cases in which subject and verb do not agree in number.
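The contrast between stating an exception in the grammar and relaxing a constraint can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the function names, the simplified spelling rule, and the reading of relaxation as "at most one substituted letter" are hypothetical, and the code reproduces neither DATR nor Fouvry's system.

```python
# Toy illustration only; all names and rules here are hypothetical.

# Lexically stated exceptions: they override the default rule.
IRREGULAR_PLURALS = {"ox": "oxen"}


def regular_plural(noun: str) -> str:
    """Simplified default rule: -es after a sibilant spelling, otherwise -s."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"


def plural_with_default(noun: str) -> str:
    """Default mechanism: the listed exception wins, else the default applies."""
    return IRREGULAR_PLURALS.get(noun, regular_plural(noun))


def accepted_by_relaxation(noun: str, form: str, max_substitutions: int = 1) -> bool:
    """Relaxation without stated exceptions: accept any form that differs
    from the regular plural in at most `max_substitutions` letters."""
    expected = regular_plural(noun)
    if len(form) != len(expected):
        return False
    mismatches = sum(a != b for a, b in zip(form, expected))
    return mismatches <= max_substitutions
```

Under these assumptions, `plural_with_default("ox")` yields the stated exception, whereas `accepted_by_relaxation("ox", form)` cannot tell oxen apart from oxec, oxej, or oxel, and still accepts oxes.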
Marginal cases may be a better area for the application of the constraint relaxation technique. A more principled approach would of course try to disentangle the different influences that determine the judgement that an expression is less than fully grammatical. However, this would involve an analysis of these factors which may be beyond what we can expect in a fully formalized computational system. The question remains how efficient constraint relaxation is for mimicking the types of divergence from full grammaticality that occur in actual


text. While constraint relaxation guarantees robustness, much of the intended content is likely to be missed in the analysis.

Fouvry's primary examples used to illustrate constraint relaxation clearly fall into the category of errors. While examples such as his (2b) and (8) illustrate the technique quite clearly, no evidence is given that they represent a significant proportion of actually found errors. The alternative treatment discussed by Fouvry is so-called malrules. In a sense, malrules and constraint relaxation approach the problem from opposite sides. Malrules look for particular, well-specified errors. They are based on systematic error analysis and geared towards feedback about the errors. Constraint relaxation is very unspecific in what it looks for. Its main purpose is not to let errors in the input produce a crash. It might have been fairer to compare constraint relaxation with chunk parsing as proposed by Abney (1996). In chunk parsing, analyses for parts of a sentence are returned when no full analysis can be produced.

4.

The purpose of constraint relaxation

If Fouvry had presented constraint relaxation as nothing more than a general technique for increasing robustness, I would probably not have objected to it. What I find problematic in his article, however, is his suggestion in several places that constraint relaxation is more than an ad hoc mechanism to avoid crashes.

In the first few sentences of his article, Fouvry presents the problem to be dealt with as "to offer clues as to what may be going on if the input is not recognized by the grammar". As far as I can see, if there is anything constraint relaxation does not do, it is offering any such clues. What it can do is to state which constraints were relaxed and in what ways. This is not sufficient to know whether the input contains an idiosyncrasy not covered by the grammar, a marginal case of creative use of language, or an error.

A more specific description of Fouvry's goal is given at the start of section 3.2: "to describe extra-grammaticalities without the necessity to extend the grammar". This can be a legitimate goal in a context in which a grammar is given and robustness has to be increased for practical reasons. It is not part of linguistics as an empirical science, because it imposes an arbitrary restriction and is not geared towards general explanations. Moreover, as Fouvry notes correctly in section 4.2, his model does not make a principled distinction between exceptions and errors. In any competence-based approach to linguistics, this distinction is essential, because exceptions belong to competence and errors do not.


It may be possible to increase the interest of Fouvry's proposal by limiting its scope to computational linguistics as an applied science. However, as I argue in ten Hacken (2001, 2007a), in order to be an applied science as opposed to mere technology, computational linguistics has to provide explanations of why and to what extent the solution to a particular practical problem works. The first step is then to specify the problem. A practical problem is more like "how to give useful feedback to language learners in CALL", as discussed in Fouvry's section 3.2, than like "how to increase robustness without changing the grammar". The latter is far too dependent on theory-internal constraints for an explanatory account to be of general interest.

References
Abney, Steven 1996 Partial parsing via finite-state cascades, Natural Language Engineering 2: 337–344.

Cahill, Lynne, and Gerald Gazdar 1999 German noun inflection, Journal of Linguistics 35: 1–42.

Chomsky, Noam 1965 Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.

Chomsky, Noam 1980 Rules and Representations. New York: Columbia University Press.

Gazdar, Gerald, Ewan Klein, Geoffrey Pullum, and Ivan Sag 1985 Generalized Phrase Structure Grammar. Oxford: Blackwell.

ten Hacken, Pius 2001 Revolution in computational linguistics: Towards a genuinely applied science. In Computational Linguistics in the Netherlands 2000: Selected Papers from the Eleventh CLIN Meeting, Walter Daelemans, Khalil Sima'an, Jorn Veenstra, and Jakub Zavrel (eds.), 60–72. Amsterdam: Rodopi.

ten Hacken, Pius 2006 Formalism/formalist linguistics. In Encyclopedia of Language and Linguistics, 2nd ed., Keith Brown (ed.), Vol. 4, 558–564. Oxford: Elsevier.

ten Hacken, Pius 2007a Computational linguistics as an applied science. In Computation, Information, Cognition: The nexus and the liminal, Gordana Dodig Crnkovic and Susan Stuart (eds.), 260–269. Cambridge: Cambridge Scholars Press.

ten Hacken, Pius 2007b Chomskyan Linguistics and its Competitors. London: Equinox.

Pollard, Carl, and Ivan A. Sag 1994 Head-Driven Phrase Structure Grammar. Chicago and Stanford, CA: University of Chicago Press and Center for the Study of Language and Information.

Unexpected loci for exceptions: languages and language families

Quantitative explorations of the worldwide distribution of rare characteristics, or: the exceptionality of northwestern European languages

Michael Cysouw

Abstract. In this article, the distribution of rare features among the world's languages is investigated based on the data from the World Atlas of Language Structures (Haspelmath et al. 2005). A Rarity Index for a language is defined, resulting in a listing of the world's languages by mean rarity. Further, a Group Rarity Index is defined to be able to measure average rarity of genealogical or areal groups. One of the most exceptional geographical areas turns out to be northwestern Europe. A closer investigation of the characteristics that make this area exceptional concludes this article.

1.

Introduction

From a cross-linguistic perspective, the notion of exceptionality is intricately intertwined with assumptions about (ab)normality. A language showing an exceptional characteristic is much too often just a language that differs from the few normal European national standard languages widely investigated in current linguistics. Unfortunately, from a worldwide perspective it is these European national standard languages that often turn out to be atypical, as will be shown later on in this article. Instead of assuming knowledge about what is normal or exceptional for a human language, I will investigate exceptionality empirically by taking account of the worldwide linguistic diversity.

One way to empirically approach the notion of exceptionality is to replace it with the notion of rarity. Strictly speaking, exceptionality is a more encompassing concept than rarity. However, rarity is much easier to operationalise when dealing with large amounts of data. In this article, a trait will be considered exceptional when it is rare with regard to the known worldwide diversity. Such an approach can only be taken given a large amount of data about the world's linguistic diversity. Such a database has recently become available in the form of the World Atlas of Language Structures (WALS, Haspelmath et al. 2005), and I will gratefully draw on this enormous dataset for the present investigation of rarity among the world's languages.

This paper is organised as follows. First, in Section 2, I will introduce the World Atlas of Language Structures from which the typological data are drawn that form the basis for my calculations of rarity. In the following Section 3, the quantitative approach to compute rarity from typological data is explained. Section 4 then looks at the overall rarity for individual languages, claiming the South American language Wari' to be one of the languages with the highest index level of rare characteristics. In Section 5, the calculation of rarity is extended to encompass groups of languages, and this calculation is applied to genealogical families. The Kartvelian and Northwest Caucasian language families turn out to be the families with the highest index level of rare characteristics. In Section 6, the calculation of group rarity is used to investigate areal centres of high rarity. Various geographical areas with a high level of rarity are identified. Most fascinatingly, northwestern Europe ends up on top as the linguistically rarest geographical area in the world. Section 7 investigates the exceptionality of northwestern Europe more closely, identifying twelve features that make this area so unusual from a worldwide perspective. These characteristics are all linguistically independent from each other, indicating that the exceptionally high level of rarity is probably a historical coincidence, possibly enlarged by a structural bias of European scholarly tradition in linguistics.

* I thank Bernard Comrie, the editors of the present volume, and one anonymous reviewer for their comments and input on the basis of an earlier version of this paper.

2. Using the World Atlas of Language Structures

The World Atlas of Language Structures (WALS, Haspelmath et al. 2005) is a large database of structural (phonological, grammatical, and lexical) properties of languages gathered from descriptive materials (such as reference grammars) by a team of more than 40 authors, many of them the leading authorities on the subject.1 It is published as a printed book in traditional atlas format, but
1. The WALS is an exceptionally large collaborative project, involving many different authors. Because I have been using the complete data as supplied by WALS for the calculations of the rarity indices, I take this opportunity to thank the editors and all the authors for making this kind of research possible (in alphabetical order): Andreas Ammann, Matthew Baerman, Dik Bakker, Balthasar Bickel, Cecil H. Brown, Dunstan Brown, Bernard Comrie, Greville G. Corbett, Sonia Cristofaro, Michael


also accompanied by a fully searchable electronic version of the database. The atlas consists of 142 maps with accompanying texts on diverse features of human language (such as vowel inventory size, noun-genitive order, passive constructions, and hand/arm polysemy), each of which is the responsibility of a single author (or team of authors). Each map shows between 120 and 1,370 languages. Altogether more than 2,600 languages are shown on the maps, and more than 55,000 dots give information on structural characteristics of these languages.2

In informal discussion, some doubts have been voiced as to the reliability of the data in the WALS. The reason for these doubts is that most data points have been coded by typologists on the basis of extant descriptive material, and not by specialists of the languages in question. As a test case, Wälchli (2005) checked the 119 coding points for Latvian and found these WALS data to be reasonably well representative of the language. Latvian is a hard case for reliability, because the editors urged all authors to include this language in their map (Latvian is one of the so-called basic 100-language sample). Further, Latvian is a well-known and well-described language, but the problem for typologists is that there is no central reference work to check for any information on this language. This led to a few errors in WALS, because authors sometimes based their judgements on sources that were not the best for their particular question. Wälchli (2005) notes five errors (= 4.2%) in which it is understandable from

Cysouw, Östen Dahl, Michael Daniel, Ferdinand de Haan, Holger Diessel, Nina Dobrushina, Matthew S. Dryer, Orin D. Gensler, David Gil, Rob Goedemans, Valentin Goussev, Martin Haspelmath, Johannes Helmbrecht, Oliver A. Iggesen, Paul Kay, Ekkehard König, Maria Koptjevskaja-Tamm, Tania Kuteva, Ludo Lejeune, Ian Maddieson, Luisa Maffi, Elena Maslova, Matti Miestamo, Edith Moravcsik, Vladimir P. Nedjalkov, Johanna Nichols, Umarani Pappuswamy, David Peterson, Maria Polinsky, Carl Rubino, Peter Siemund, Anna Siewierska, Jae Jung Song, Leon Stassen, Thomas Stolz, Cornelia Stroh, Stephan Töpper, Aina Urdze, Johan van der Auwera, Harry van der Hulst, Viveka Velupillai, Ljuba N. Veselinova and Ulrike Zeshan. Further, I would like to thank Hans-Jörg Bibiko for supplying the WALS Interactive Reference Tool, with which the maps in this paper are made.
2. Note that with about 142 features and 2,600 languages, there should be as many as 369,000 datapoints. With the actually available 55,000 datapoints only about 15 % of the data matrix is filled. For many statistical approaches this low coverage is a problem, and only carefully selected parts of the data can normally be used. In the approach presented in this paper, I will attempt to use the complete data, notwithstanding the many missing values. However, special statistical corrections, as described in Section 3, are needed to work around the problem of missing values.


the sources used that a linguist might be led to the wrong conclusions. Further, Wälchli found two errors in the WALS that appear to be practical mistakes (= 1.7%). From all information supplied by the authors (e.g. from the examples included), it is clear that the author knew the right coding. However, by some unidentifiable problem in the long chain of preparations, starting with the collection of the data up to the final publication of the atlas, somewhere an error arose. In a large-scale enterprise like WALS, it is impossible to avoid such practical errors completely. The low number of practical errors for Latvian even argues for the high reliability standard of WALS.3

3. Computing a rarity index

The principal idea of the present investigation is to use this enormous WALS database for holistic typology. In WALS, there are features coded from all areas of linguistic structure, so it is possible to look for correlations between widely different aspects of linguistic structure. For the present analysis, I will not look at the content of the features, but only consider their relative ubiquity. Are there languages, families or areas that have more rare characteristics than others? To investigate this question, I devised a rarity index: a calculation to estimate the relative ubiquity of characteristics of a language, as measured by the data in WALS.

The basic idea behind the rarity index is to compute the chance of occurrence for all characteristics of a particular language, and then take the mean over all these chances of occurrence. In essence, this results in an average rarity for a language. However, there are various confounding factors mediating between chance and rarity, which make it necessary to introduce a few extra steps in the evaluation of the chances of occurrence. Before I explain these confounding factors and the resolution used, let me first introduce some WALS terminology. The data in WALS is organised into features and values. A feature is a parameter of linguistic variation, shown as a double-paged map in the printed atlas (e.g. the first map depicts the size of the consonant inventory, Maddieson 2005a). Within each feature, each language has a value. A value is the characterisation of the language for the
3. The data as brought together in WALS is beyond doubt the largest and best organised survey of structural linguistic characteristics of the world's languages. However, there are various problems with the coding structure of the data that make it difficult to use the data for large-scale quantitative investigations without recoding them (cf. Cysouw et al. 2005). In this paper, I disregarded these problems and took the data as supplied in the atlas without doing any recoding.


feature in question (e.g. in the first map on consonant inventories, English with 24 consonants has the value "average", defined as the range between 19 and 25 consonants).

As a first approach to a rarity index, the rarity of a value might be formalised by simply taking the chance occurrence of that value. For example, the value "average" of the feature consonant inventories occurs in 181 languages out of a total of 561 languages coded for this feature. There is thus a chance occurrence of 181/561 = 0.322 for this value. However, this chance cannot simply be interpreted as an indication of the rarity of the value. The first problem is that different maps distinguish different numbers of values, and the chance occurrences thereby have different impact on the evaluation of rarity. For example, in the map on consonant inventories there are five different values distinguished ("small", "moderately small", "average", "moderately large", "large"), but in the next map on vowel quality inventories (Maddieson 2005b) there are only three different values distinguished ("small", "average", "large"). Now, consider the value "large" of the feature vowel quality inventory. This value has a chance occurrence of 183/563 = 0.325, almost exactly the same as for "average" consonant inventory discussed previously. However, with only three values distinguished for vowel quality inventories, such a chance of around one-third should count as just average rarity. In contrast, with the five values as distinguished for consonant inventories, a chance of one-third is actually higher than expected from an equal distribution (in which the chance would be one-fifth), and should thus be counted as relatively low rarity (or common). Conversely, in a hypothetical feature with only two values distinguished, a chance expectation of around one-third would count as relatively high rarity (or unusual).
The simplest solution to this problem is to multiply the chance occurrence of each value with the number of values distinguished, as shown in the definition of the Rarity Index in (1). The feature consonant inventories distinguishes five different values, so the rarity index for the value "average" is 5 × 0.322 = 1.61, which is higher (and thus less rare) than the index for the value "large" of the feature vowel quality inventory, 3 × 0.325 = 0.975. Note that a rarity index of around 1.0 means that the chance occurrence of a particular value approaches the chances for equally distributed features. For a feature with x values, an equal distribution would mean a chance of occurrence for each value of 1/x. If the empirically established chance occurrence of a particular value approaches 1/x, the rarity index for this value approaches x × (1/x) = 1. For practical reasons, I used the inverse of this index, as shown in (2). The higher this index, the higher the rarity of the value in the WALS data. Using this inverse has the nice effect that the mean of all indices over all languages coded for a particular feature is


also exactly one, as shown in (3). The equation in (3) can easily be verified by writing out the terms in the summation.

(1)  $R_{f_i} = n \cdot \frac{f_i}{f_{tot}}$

     $n$ = number of values of a particular feature
     $f_i$ = frequency of value $i$
     $f_{tot}$ = total number of languages coded for this feature

(2)  $R_{f_i} = \frac{f_{tot}}{n \cdot f_i}$

(3)  $\frac{\sum_{i=1}^{n} (R_{f_i} \cdot f_i)}{f_{tot}} = 1$
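The definitions in (2) and (3) can be checked numerically with a minimal sketch. The function names are introduced here for illustration only, and of the five counts in the example only the figure for "average" (181 out of 561 languages) is taken from the text; the other four counts are hypothetical placeholders chosen to add up to 561.

```python
from typing import Dict


def rarity_indices(value_counts: Dict[str, int]) -> Dict[str, float]:
    """Inverse rarity index of (2): f_tot / (n * f_i) for every value.

    Assumes every value occurs at least once (f_i > 0)."""
    n = len(value_counts)
    f_tot = sum(value_counts.values())
    return {value: f_tot / (n * f_i) for value, f_i in value_counts.items()}


def mean_index_over_languages(value_counts: Dict[str, int]) -> float:
    """Left-hand side of (3): mean index over all languages coded for the feature."""
    indices = rarity_indices(value_counts)
    f_tot = sum(value_counts.values())
    return sum(indices[v] * f_i for v, f_i in value_counts.items()) / f_tot


# Only "average": 181 and the total of 561 come from the text;
# the remaining counts are hypothetical placeholders.
consonant_inventories = {
    "small": 90,
    "moderately small": 120,
    "average": 181,
    "moderately large": 95,
    "large": 75,
}
```

With these counts, the index for "average" comes out at 561 / (5 × 181), roughly 0.62, which is the inverse of the value 1.61 computed with (1), and the mean over all 561 languages is 1 whatever the hypothetical counts are.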

The formula in (2) thus defines the rarity index of a value. The next step is now to compute a rarity index for a language on this basis. The basic idea for computing a rarity index of a language is to take the mean of all rarity indices for all the characteristics of this language, throughout all the maps in WALS. However, a second confounding factor is the number of maps in which a particular language occurs. The data of WALS is not complete, meaning that not every language is coded in every map. Many languages are only coded in very few maps. For this reason, simply taking the mean rarity over all values is not a good measure to evaluate which language has the most unusual characteristics. If a particular language is only coded for few features in WALS, there will be strong random effects. Languages with few code-points in WALS will show more extreme values of mean rarity, both to the high and the low side. This effect can be observed in Figure 1, in which the mean rarity for all 2,600 languages in WALS is plotted against the number of features coded (each point in the figure represents one language).4 The fewer features are coded for a language, the more extreme mean rarities occur. To normalize this effect, I evaluated the distribution of mean rarity by a randomization technique.

The randomization proceeded as follows. For each

4. For clarity of depiction, the logarithm of mean rarity is shown in this figure. Using the logarithm has the visual effect of separating out the values some more, thereby showing more clearly the distribution of the points in the figure. Another effect is that the mean rarity now centers around zero, because log (1) = 0.


Figure 1. Plot of mean rarity indices against the number of features coded, with lines indicating 1 % (outer lines) and 5 % (inner lines) extremes as measured by a randomization procedure.

number of features coded (ranging between 1 and 139),5 a thousand fictitious languages were created. For each invented language, a set of features was selected completely at random. Within each feature, a value was selected semi-randomly. The value selection was guided by the actual chance occurrences of each value in WALS. In this way, each set of a thousand fictitious languages has
5. WALS has 142 maps, but for the present investigation the two maps on sign languages and the map on writing systems have been disregarded, leading to a maximum of 139 features available.


the same distribution of values as the real WALS. For example, the number of languages with average consonant inventory will be around 32.2% in each set of a thousand languages. One such set of a thousand languages was made with each language being coded for one feature only. Then one set was made with each language being coded for two features, etcetera, finishing with a set of a thousand languages in which each language was coded for 139 features. The mean rarity for all these invented languages was computed, thus giving a thousand mean rarity values for each number of features.

Using all these fictitious languages, the mean rarity of a real language can be evaluated. For example, Dutch is coded for 67 features and has a mean rarity of 1.66. The question now is how extreme this value is. The mean rarity is higher than 1.00, so there appears to be a relatively high level of rarity in this language. But is this really much higher than 1.00, or is a value of 1.66 still within the expected variation? To evaluate this, the set of a thousand fictitious languages coded for 67 features was used. Among this set of a thousand made-up languages, there turned out to be 96 (= 9.6%) with a mean rarity higher than 1.66. Thus 904 (= 90.4 %) fictitious languages had a smaller mean rarity. From this it can be concluded that the mean rarity of Dutch is really rather high (even among the highest 10 %). Note that this value is not a real significance value as given by statistical analyses, although it is a somewhat similar concept. This value indicates the relative unusualness of a particular language within the WALS dataset. Using such evaluations, lines representing the 1 % and 5 % extremes can be drawn in Figure 1. These lines show the boundary between the extremes in the fictitious languages, indicating which of the real languages (represented by the dots) belong to these extremes.
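The randomization just described can be sketched roughly as follows. This is an illustrative reading of the procedure, not the code used for the study: the data layout (a table mapping each feature to its value counts), the function names, and the handling of ties are assumptions, and features are drawn without replacement so that a fictitious language has one value per feature.

```python
import random
from typing import Dict, List

# Assumed layout: feature name -> {value name: number of languages with that value}
FeatureTable = Dict[str, Dict[str, int]]


def value_rarity(counts: Dict[str, int], value: str) -> float:
    """Rarity index of one value, following (2): f_tot / (n * f_i)."""
    n = len(counts)
    f_tot = sum(counts.values())
    return f_tot / (n * counts[value])


def fictitious_mean_rarities(features: FeatureTable, k: int,
                             runs: int = 1000, seed: int = 0) -> List[float]:
    """Mean rarity of `runs` fictitious languages, each coded for k features."""
    rng = random.Random(seed)
    names = list(features)
    means = []
    for _ in range(runs):
        chosen = rng.sample(names, k)  # features selected completely at random
        total = 0.0
        for name in chosen:
            counts = features[name]
            values = list(counts)
            # value selected semi-randomly, weighted by its actual frequency
            value = rng.choices(values, weights=[counts[v] for v in values])[0]
            total += value_rarity(counts, value)
        means.append(total / k)
    return means


def index_level(observed_mean: float, simulated_means: List[float]) -> float:
    """Percentage of fictitious languages with a smaller mean rarity."""
    smaller = sum(1 for m in simulated_means if m < observed_mean)
    return 100.0 * smaller / len(simulated_means)
```

In the Dutch example, 904 of the thousand fictitious languages coded for 67 features have a smaller mean rarity than 1.66, so `index_level` would return 90.4 for such a set.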

4.

Rarity indices for individual languages

Using this evaluation of mean rarity by randomization, the languages with the most extreme mean rarity are shown in Table 1. In this table, a mean rarity index level is indicated by a percentage in the last column. For example, 100 % means that this particular mean rarity is higher than that of all thousand fictitious languages for the number of features coded. The first six languages all fall in the level of this most extreme mean rarity. As can be seen in the penultimate column, the actual values of mean rarity differ widely. Winnebago has a very high mean rarity (11.37), which is high even considering that this language is only coded for 7 features (judging from the index level of 100 %). In contrast, Wari' is also included among the most extreme index levels with a mean rarity of only 2.36 (remember that the mean over all the data in WALS is 1.00). However, this value is achieved with as many as 115 features being coded, and for so many features, a mean rarity of 2.36 is apparently still highly significant.

Table 1. Top 15 of languages according to mean rarity index level. Within each level, the languages are ordered by the number of features coded, though this is for presentational purposes only.

Language             Genus               Features Coded   Mean Rarity   Index Level
Wari'                Chapacura-Wanhan    115               2.36         100
Dinka                Nilotic              45               3.45         100
Jamul Tiipay         Yuman                44               3.76         100
Nuer                 Nilotic              28               3.42         100
Karó (Arára)         Tupi-Guarani         24               6.16         100
Winnebago            Siouan                7              11.37         100
Chalcatongo Mixtec   Mixtecan            113               2.05          99.9
Kutenai              Kutenai             113               2.02          99.9
Kombai               Awju-Dumut           38               3.27          99.9
Dahalo               Southern Cushitic    17               5.86          99.9
Maxakali             Maxakali             15               6.95          99.9
Warrwa               Nyulnyulan           20               3.74          99.8
Bunuba               Bunuban              16               4.21          99.8
Eyak                 Eyak                 16               4.05          99.8
Yawuru               Nyulnyulan           15               4.51          99.8

Although such a listing of the world's languages as to the level of rarity satisfies a currently widely felt need for rankings, its merits are doubtful. It would be interesting if particular genealogical or areal groups showed up high in this listing. However, on first inspection this is not the case. There are two Nilotic and two Nyulnyulan languages among the top 15, which is indicative, though not convincing. Areally, among the top 15 as presented in Table 1, only the languages from Eurasia are absent. The majority of the top 15 is from the Americas (eight languages), three are from Africa and four from Australia/New Guinea. However, this is partly an effect of the random cut-off point of the top 15, chosen here for reasons of space. In Figure 2, a world map is presented, showing the geographical distribution of the top 5 % languages (i.e. all languages with an index level of 95 % and higher). There appears to be a relatively high density of languages in Africa (around the equator) and northern Australia/New Guinea, but these are also regions with a high number of languages represented in the WALS data (and with many languages in general). I would argue that from this

420

Michael Cysouw

Figure 2. World map showing the top 5 % on the rarity index level of the languages in the WALS.

distribution alone, there does not appear to be a reason to declare any group of languages to stand out as showing a particular high level of unusualness. 5. Rarity indices for groups of languages

To further investigate the distribution of rarity among the world's languages, I computed rarity for groups of languages, based on the index levels for each language (as discussed in the previous section). Such values for Group Rarity (GR) are useful to evaluate the relative rarity of a genealogical or an areal group of languages. As a measure of Group Rarity, I have used a weighted mean of the rarity index levels of the individual languages. To compute this weighted mean, I took the mean of the index levels of the individual languages (not of the mean rarity itself), weighting each language by the logarithm of the number of features coded, as shown in the formula in (4). Because of this logarithm, languages with more features coded have slightly less influence on the resulting value than they would have if weighted by the raw number of features. Also, languages that are coded for only one feature do not have any influence, because log(1) = 0.

(4)  $$\mathrm{GR} = \frac{\sum_{i=1}^{n} \log(L_i)\,(\%R)_i}{\sum_{i=1}^{n} \log(L_i)}$$


n = number of languages in a group
L_i = number of features coded for language i
(%R)_i = rarity index level for language i

Using the measure of group rarity on genealogical groups results in an interesting set of linguistic families showing a high level of rarity. The top 10 linguistic families by group rarity are shown in Table 2. Only families with more than three languages included in WALS are shown, because I want to show effects at the level of the family. In families with only a few members coded in WALS (or few members existing in the world), high rarity of individual languages will raise the level of the whole family disproportionately.
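As a rough illustration, the weighted mean in (4) can be sketched in Python as follows. The function name `group_rarity` and the input format (pairs of features coded and index level) are illustrative choices rather than anything specified in the paper, and the example group is invented. The natural logarithm is assumed, although the base of the logarithm cancels out in the ratio.

```python
import math

def group_rarity(languages):
    """Weighted mean of rarity index levels, following formula (4).

    `languages` is a list of (features_coded, index_level) pairs.
    Each language is weighted by log(features_coded); the base of the
    logarithm does not matter because it cancels in the ratio.
    """
    weights = [math.log(features) for features, _ in languages]
    total = sum(weights)
    if total == 0:
        # Every language is coded for a single feature, so none has weight.
        raise ValueError("group rarity is undefined when all weights are zero")
    weighted = sum(w * level for w, (_, level) in zip(weights, languages))
    return weighted / total

# Invented group: the language coded for one feature has weight log(1) = 0,
# and the language with 100 features weighs twice as much as the one with 10.
example = [(1, 5.0), (10, 50.0), (100, 100.0)]
print(round(group_rarity(example), 1))
```

In this sketch a language with ten times as many features coded receives only twice the weight, which is the dampening effect of the logarithm described above.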
Table 2. Top 10 of weighted rarity for linguistic families (only families with more than 3 languages included in the WALS data are shown).

Family               No. of languages  Group rarity
Northwest Caucasian  7                 87.8
Kartvelian           4                 83.7
Caddoan              5                 82.2
Wakashan             7                 80.2
Iroquoian            8                 76.3
Khoisan              11                74.5
Arauan               6                 71.8
Salishan             24                71.2
Na-Dene              23                70.2
Algic                31                69.9

Two families from the Caucasus (Northwest Caucasian and Kartvelian) take the first two positions in the ranking of families (the third indigenous family from the Caucasus, Nakh-Dagestanian, has only slightly higher than average rarity). Further, families from Northern America are strongly represented: Caddoan, Wakashan, Iroquoian, Salishan, Na-Dene and Algic all made it into the top 10. Hokan, Eskimo-Aleut, Kiowa-Tanoan and Penutian narrowly missed the top 10, though they still show an extremely high level of group rarity. From a genealogical perspective, the Caucasus and Northern America clearly stand out as having families showing a high level of group rarity.

6. Areal distribution of rarity

To evaluate whether there are geographical areas with a high preponderance of rare features, I investigated groups of languages that are geographically contiguous. For each language in the database, I took the thirty nearest languages (using a simple Euclidean distance, not taking account of natural barriers) and computed the rarity for all such areal groups. The rarity index for each group is plotted on a map at the location of the centre of the group. Such an approach will necessarily show some areal consistency, because two neighbouring languages will share many of their neighbours. However, it is interesting to see where the centres of areally consistent groups are. These centres are indicative of the location of geographical areas with a high level of rarity. The higher the rarity index for a group around a particular language, the darker the dot on the map as shown in Figure 3.

In this map, there are fifteen centres of high rarity, as summarised in Table 3. For all these areas, a centre is indicated. These centre languages are the first languages that show up in the ranking of group rarity for the areal groups. The central language is not necessarily of any importance itself. For example, Frisian only turns out to be the centre of the Northwest European cluster because it is roughly in the middle of the area including English, French, German and Norwegian.

The fact that there are fifteen centres (and not more or fewer) depends on the decision to compute group rarity for areal groups of thirty languages around each centre. More centres of rarity appear when, for example, groups of only ten languages are taken. However, these centres mostly split up groups found in the map shown here. When groups larger than thirty languages are used in the computations, the clear distinctions between the various centres start to diminish. For the current purpose of investigating worldwide areal patterns in the WALS data, a group size of about thirty appears to be most suitable.

It is interesting to speculate why these centres appear in this worldwide survey of rarity. Several of these areal groups are considered to be typological areas (or Sprachbünde). However, some areas with high rarity have no accompanying claim for areality, and many traditionally claimed linguistic areas do not show up as areas with high rarity. Although it is tempting to hypothesize that strong influence between languages might lead to the spreading of otherwise rare phenomena, the overlap between rare areas and known areal groupings is at present only approximate. However, the quantitative notion of rarity as used in this paper might be particularly useful for investigating linguistic areas, as the strongest evidence for areality stems from traits that are common in a particular area but rare elsewhere.

Table 3. Areas of high rarity, grouped by Macroareas.

Macroarea  Location of area with high rarity  Centre language
Eurasia    North-western Europe               Frisian
           Caucasus                           Adyghe
Oceania    Philippines                        Bikol
           Sumatra                            Minangkabau
           Pacific                            East Futuna
           Northern Australia                 Walmatjarri
           Southeast Australia                Ngiyambaa
America    Northwest America                  Lummi
           Northeast America                  West Greenlandic
           Western North America              Havasupai
           Central America                    Zapotec
           Amazonia                           Pirahã
Africa     West Africa                        Guro
           Central Africa                     Mende
           Southern Africa                    Zulu

Figure 3. World map showing areal centres of rarity.
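The areal grouping procedure described in this section can be sketched as follows. This is only an approximation under stated assumptions: the field names (`lat`, `lon`, `features`, `level`), the function names, and the toy data are all invented for illustration; the Euclidean distance is taken directly on latitude and longitude; and the centre language is counted as a member of its own group, which the text does not specify.

```python
import math

def group_rarity(group):
    """Log-weighted mean of index levels, following formula (4)."""
    weights = [math.log(lang["features"]) for lang in group]
    total = sum(weights)
    if total == 0:
        return None  # no language in the group carries any weight
    return sum(w * lang["level"] for w, lang in zip(weights, group)) / total

def nearest_group(centre, languages, k):
    """The k languages closest to `centre` by plain Euclidean distance on
    (lat, lon). The centre is at distance 0, so it is itself included
    (an assumption of this sketch)."""
    def distance(lang):
        return math.hypot(lang["lat"] - centre["lat"],
                          lang["lon"] - centre["lon"])
    return sorted(languages, key=distance)[:k]

def rank_areal_groups(languages, k=30):
    """Group rarity of the k-language neighbourhood around every language,
    highest first, as (group_rarity, centre_name) pairs."""
    ranking = []
    for centre in languages:
        rarity = group_rarity(nearest_group(centre, languages, k))
        if rarity is not None:
            ranking.append((rarity, centre["name"]))
    return sorted(ranking, reverse=True)

# Toy data (invented): two tight clusters of two languages each.
toy = [
    {"name": "A", "lat": 0.0, "lon": 0.0, "features": 10, "level": 90.0},
    {"name": "B", "lat": 0.0, "lon": 1.0, "features": 10, "level": 80.0},
    {"name": "C", "lat": 0.0, "lon": 10.0, "features": 10, "level": 20.0},
    {"name": "D", "lat": 0.0, "lon": 11.0, "features": 10, "level": 10.0},
]

for rarity, name in rank_areal_groups(toy, k=2):
    print(f"{name}: {rarity:.1f}")
```

Because neighbouring languages share most of their neighbours, adjacent centres receive nearly identical scores in such a ranking, which is the areal consistency noted above. Changing `k` corresponds to the choice between groups of ten, thirty, or more languages.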

7. Rare characteristics of northwestern Europe

Probably the most surprising area to appear in the list of geographical areas with a high level of rarity is northwestern Europe. This area is centred on Frisian. Many of the thirty languages around Frisian are varieties that are often considered West Germanic dialects. These are coded for only a few features in WALS and do not have much impact on the rarity measure. When these are removed, the remaining languages in this area, all with relatively high coverage in the WALS data, are English, German, Dutch, Frisian, and French. The pressing question now, of course, is what makes these languages so exceptional.

To investigate which features caused the high rarity index for this group, I considered each feature individually. For each language in the area, I took the original rarity index, as shown in (2), of the value that the language has for the feature. Then the mean of these rarity indices was computed for each feature, and the features were ordered according to this mean. This resulted in a list of the most exceptional characteristics of this area. The top ten of this list are shown in Table 4 (the mean rarity of each feature for this area is shown in the first column). This list of exceptional characteristics of northwestern European languages will be quickly reviewed here. For more details on the coding and the decisions to distinguish between various values, please refer to the relevant texts accompanying the maps in WALS. A summary of the presence of these exceptional traits in northwestern European languages is given in Table 5, alongside the basic percentages of these exceptional features among all the world's languages.

Table 4. Top 10 of the rarest characteristics as found in northwestern Europe.

Rarity  Feature                       Exceptional value present in Europe
8.39    Polar Questions               Interrogative word order
7.96    Uvular Consonants             Uvular continuants only
7.93    The Perfect                   Perfect of the have-type
7.56    Coding of Evidentiality       Modal morpheme
4.58    Demonstratives                No distance contrast
4.32    Negative Indefinite Pronouns  No predicate negation present
4.15    Front Rounded Vowels          High and mid
3.46    Relativization on Subjects    Relative pronoun
3.14    Weight-Sensitive Stress       Right-oriented, antepenultimate involved
2.86    Order of Object and Verb      Both orders, neither order dominant

The exceptional features of northwestern Europe are the following. First, the marking of polar questions is unusual. In most of the world's languages, polar questions are constructed by using a question particle. Two other major marking patterns are polar questions marked solely by intonation or by special verb morphology. The typical northwest European change in word order to mark polar questions is extremely uncommon worldwide, with only a few attestations outside of Europe (Dryer 2005e).

Uvular consonants are not very widespread among the world's languages. Maddieson (2005d) finds them in only 17 % of the world's languages. Most of these languages have at least some kind of uvular stop, possibly alongside other kinds of uvular consonants. The situation found in northwestern Europe, namely the existence of uvular continuants (in the form of a voiceless fricative) without the existence of uvular stops as well, is highly uncommon. Outside Europe this is mainly attested in a few languages scattered throughout central Asia.

A perfect (as in English I have read the book), defined as a construction combining resultative and experiential meanings, is reasonably widespread throughout the world's languages. Dahl and Velupillai (2005) find a construction with similar semantics in almost half of the world's languages. However, the typical European perfect construction of the have-type (derived from a possessive construction) is a European quirk, unparalleled elsewhere in the world.

Evidentiality is the marking of the evidence a speaker has for his/her statement. Grammatical devices to code this are reasonably widespread among the world's languages. De Haan (2005) finds some kind of evidentiality in slightly
more than half of the world's languages. However, the usage of a modal verb for this purpose, as found in northwestern Europe (e.g. Dutch het moet een goede film zijn, French il aurait choisi la mort), is extremely uncommon worldwide.

Demonstratives are normally expected to have some distinctions as to distance, like English this vs. that. In a survey of such distance contrasts in adnominal usage, e.g. this book vs. that book, Diessel (2005) finds distance contrasts in almost all of the world's languages. However, there are a few languages that do not have such distance contrasts in adnominal usage. Some examples are found in western Africa and, somewhat surprisingly, in French (ce) and German (dies- or das; note that jen- does not mark a distance contrast in modern German, although it did in older stages of the language).

Negative indefinite pronouns, like nobody, nothing or nowhere, are in most of the world's languages accompanied by a regular predicate negation. Haspelmath (2005) finds predicate negation to be obligatorily present in 83 % of the world's languages. There are only very few languages in which a negative indefinite pronoun can occur (or even has to occur) without the predicate negation. This unusual phenomenon is mainly found in a few languages in Mesoamerica and in northwestern Europe.

Front rounded vowels, like high [y] or mid [ø], are highly unusual as phonemes in a language. Maddieson (2005e) finds them in only 7 % of the world's languages. Both the high and the mid front rounded vowels are mostly found in some languages of northern Eurasia, among them French and German. Related to this unusual characteristic are the exceptionally high number of vowel quality distinctions (Maddieson 2005b) and the low consonant-to-vowel ratio (Maddieson 2005c) of northwestern European languages. These two related characteristics narrowly missed the top ten of rare features of northwestern European languages.

Relative clauses are a much debated and widely investigated aspect of human language. It might come as a surprise to many linguists that the typical European usage of a relative pronoun is found only very sporadically outside of Europe (Comrie and Kuteva 2005).

There is a large variety of stress systems attested among the world's languages. The typical northwestern European system is a weight-sensitive stress system in which the antepenultimate syllable is also involved (Goedemans and Van der Hulst 2005). Such a system is unusual, though it is also found in the Near East and sporadically throughout the world's languages.

The last rare characteristic in the top ten of rarest traits in northwestern Europe is the variable order of verb and object (Dryer 2005c). This variability is paralleled in the likewise rare trait of having variable order of genitive and noun (Dryer 2005d), which, however, did not make it into the top ten of rare characteristics of northwestern Europe.

Finally, two interesting characteristics of northwestern European languages that also did not make it into the top ten of rarity deserve quick mention here. First, the languages of northwestern Europe are exceptional because they do not allow for productive reduplication (Rubino 2005) and, second, because they use a special particle for comparative constructions (Stassen 2005).

Table 5. Occurrence of rare characteristics in northwestern Europe compared to their worldwide frequency.

Unusual characteristic                  World
Word order in polar questions           1.4 %
Uvular continuants only                 2.1 %
Perfect of the have-type                3.2 %
Modal morpheme for evidentiality        1.7 %
No distance contrast in demonstratives  3.0 %
No negation with negative indefinites   5.3 %
High and mid front rounded vowels       4.1 %
Relative pronoun                        7.2 %
Right-oriented stress, antepenultimate  5.4 %
Both orders of object and verb          6.6 %
No productive reduplication             15.3 %
Comparative particle                    13.2 %

[The table also has one column each for French, English, German, Dutch and Frisian, in which presence of a characteristic is marked with +; the assignment of the individual + marks to cells is not recoverable in this copy.]

[Note: Blank cells in this table are not coded in the data from WALS. Informal inspection and personal knowledge of the present author indicate that they are almost all to be marked as present (plus).]


Going through this list of rare characteristics of northwestern European languages, it is important to realize that there are no worldwide correlations between any pair of these features. From a typological perspective, all these features appear to be independent parameters of linguistic variation. At least, I have not been able to find any clearly significant correlations between any two features in this list in the WALS data. Not even the presence of a have-perfect and a have-possessive correlate. This would mean that there are no language-internal reasons for these features to co-occur in northwestern Europe. It is probably an accidental effect of historical contingency that exactly these rare features are found in this area, and not others.

As can be seen from the summary in Table 5, the exceptional characteristics are basically found in Continental West Germanic, with English and French sharing these unusual traits in about half of the cases. This areal centre roughly coincides with the Charlemagne Sprachbund, or Standard Average European (SAE), as summarised in Haspelmath (2001). Some of the typical characteristics of SAE languages, as described by Haspelmath (2001), are also found in the present investigation. In particular, the word order in polar questions, the perfect of the have-type, the absence of negation with negative indefinites, the special structure of the relative clause, and the usage of comparative particles are noted both in Haspelmath's and in the current investigation. However, there are also clear differences between my claim that northwestern Europe has many unusual characteristics and Haspelmath's claim that the European languages share many traits. For example, the existence of definite and indefinite articles is a clear case of a pan-European characteristic (Haspelmath 2001: 1494). This areality is also found in the WALS maps on articles (Dryer 2005a, 2005b).
However, articles are not nearly rare enough on a worldwide basis to show up in the present investigation. In contrast, the presence of the rare uvular continuants cannot be claimed to be a typical European characteristic. In fact, almost no European languages have such consonants (except for Continental West Germanic and French), but their presence is exceptional enough from a worldwide perspective to end up as a rare trait of northwestern Europe. Summarising, the claim for SAE as a linguistic area and the claim that there are many exceptional characteristics in this area are complementary, and both are probably to be explained by long-term mutual influence between the languages in question.

There are a few words of caution to be added to these results. Matthew Dryer, one of the WALS editors, warns (in personal communication) that in some cases the exceptionality of northwestern Europe in the WALS data might have been exaggerated by more or less deliberate decisions. He suggests that the WALS editors and authors might have included typical European oddities as separate values, thereby enhancing the exceptional profile of this area. This might indeed be the case for polar questions, modal evidentials, the have-perfect, relative pronouns and particle comparatives. These characteristics are really European quirks. They are common in Europe, and any linguist with a training based on European languages (which means almost all linguists) will at first consider them to be the norm. While investigating worldwide typological diversity, it will probably come as a surprise that European languages are exceptional in these respects. This might have raised the interest in investigating these characteristics of human language, eventually leading to their inclusion in WALS. Though this process might have had some effect, there are still numerous rare features in Europe that do not seem to have been influenced by this bias.⁶

8. Conclusion

The usage and interpretation of large linguistic typological databases is still in its infancy. In this paper, I have laid out a preliminary attempt to approach a new large-scale typological database, the World Atlas of Language Structures, using quantitative methods. As a showcase, I have taken the notion of rarity and investigated the distribution of rare characteristics among the world's languages. Individual languages and linguistic families were ranked according to their level of rarity. Rarity appears to be rather evenly distributed throughout the world's languages, though there are, of course, some languages and groups of languages that have more of it than others. The remaining question, which has to be answered by future research, is whether these languages or language groups with relatively many rare features are really rare languages. This would only be the case if the same languages also had a high level of rarity in a completely different dataset. Personally, I do not believe that this will be the case. Circumstantial evidence for this can be discerned in Figure 1: with a rising number of characteristics considered, the mean rarity seems to approach normality. This might indicate that throughout all structures of a whole language, rare and common characteristics are kept in balance.

6. In this same vein, it might also be speculated that the strong influence of Russian and North American linguists on research in typology in recent decades has led to the introduction of features that exaggerate the exceptionality of the languages of the Caucasus and North America. However, even if true, the presence of these exceptional features is still highly interesting. And there are still other areas with high rarity that show up in the present investigation. Any such influence from the history of the field is probably only a minor factor in the results as presented in this paper.


Still, it is interesting to interpret the distribution of rare traits in the current data. The most fascinating result is that the northwestern European area, centred on Continental West Germanic, turned out to be one of the most linguistically unusual geographical areas worldwide. Many of the rare characteristics attested in this area might have been considered the norm from a European perspective, but the typological data shows that these characteristics are to be considered special structures of European languages, and not of human language in general.

References
Comrie, Bernard, and Tania Kuteva 2005 Relativization strategies. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 494–501. Oxford: Oxford University Press.

Cysouw, Michael, Jeff Good, Mihai Albu, and Hans-Jörg Bibiko 2005 Can GOLD cope with WALS? Retrofitting an ontology onto the World Atlas of Language Structures. Proceedings of E-MELD workshop Linguistic Ontologies and Data Categories for Language Resources.

Dahl, Östen, and Viveka Velupillai 2005 Tense and Aspect. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 266–281. Oxford: Oxford University Press.

de Haan, Ferdinand 2005 Coding of Evidentiality. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 318–321. Oxford: Oxford University Press.

Diessel, Holger 2005 Distance contrasts in demonstratives. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 170–173. Oxford: Oxford University Press.

Dryer, Matthew S. 2005a Definite articles. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 154–157. Oxford: Oxford University Press.


Dryer, Matthew S. 2005b Indefinite articles. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 158–161. Oxford: Oxford University Press.

Dryer, Matthew S. 2005c Order of object and verb. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 338–341. Oxford: Oxford University Press.

Dryer, Matthew S. 2005d Order of genitive and noun. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 350–353. Oxford: Oxford University Press.

Dryer, Matthew S. 2005e Polar questions. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 470–473. Oxford: Oxford University Press.

Goedemans, Rob, and Harry van der Hulst 2005 Weight-sensitive stress. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 66–69. Oxford: Oxford University Press.

Haspelmath, Martin 2001 The European linguistic area: Standard Average European. In Language Typology and Language Universals, Vol. 2, Martin Haspelmath, Ekkehard König, Wulf Oesterreicher, and Wolfgang Raible (eds.), 1492–1510. Berlin: Walter de Gruyter.

Haspelmath, Martin 2005 Negative indefinite pronouns and predicate negation. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 466–469. Oxford: Oxford University Press.

Haspelmath, Martin, Ekkehard König, Wulf Oesterreicher, and Wolfgang Raible 2001 Language Typology and Language Universals. Vol. 2. (Handbooks of Linguistics and Communication Science 20.2) Berlin: Walter de Gruyter.

Haspelmath, Martin, Matthew S. Dryer, David Gil, and Bernard Comrie 2005 The World Atlas of Language Structures. Oxford: Oxford University Press.

Maddieson, Ian 2005a Consonant inventories. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 10–13. Oxford: Oxford University Press.


Maddieson, Ian 2005b Vowel quality inventories. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 14–17. Oxford: Oxford University Press.

Maddieson, Ian 2005c Consonant-vowel ratio. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 18–21. Oxford: Oxford University Press.

Maddieson, Ian 2005d Uvular consonants. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 30–33. Oxford: Oxford University Press.

Maddieson, Ian 2005e Front rounded vowels. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 50–53. Oxford: Oxford University Press.

Rubino, Carl 2005 Reduplication. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 114–117. Oxford: Oxford University Press.

Stassen, Leon 2005 Comparative constructions. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 490–493. Oxford: Oxford University Press.

Wälchli, Bernhard 2005 Par tipoloģijas atlantu un latviešu valodas materiālu tajā [About the typological atlas and the Latvian material in it]. Paper presented at Letonistu seminārs [Letonists' seminar], August 6–13, 2005, Mazsalaca, Latvia.

Remarks on rarity

Östen Dahl

In his paper, Cysouw proposes to approach the notion of exceptionality by operationalising it with the help of the notion of rarity. He considers a trait to be exceptional when it is rare with regard to the known worldwide diversity. Although he does not define the term "rare", one can deduce from his way of using it that it means "less common than expected, given the number of alternatives". "Rare" is a word belonging to everyday language and its meaning is accordingly somewhat fluid. For instance, it is not always clear whether what is crucial is frequency or absolute numbers. In some domains, however, we find explicit definitions of "rare". Thus, according to standards applied in information about drugs in the EU, a "rare" side-effect of a drug is one which appears in fewer than one case in 1,000. If we applied these standards to languages, no feature could be regarded as rare that is manifested in more than six or seven languages in the world. In a language sample of the size that is average in the WALS maps (about 400 languages), such a feature would normally show up in one language or none at all, and it would be unlikely to be represented by a separate value in WALS. In other words, if we are to study rarity in the WALS data, we must have more modest demands on what counts as rare. Indeed, the rare or exceptional features discussed by Cysouw in the paper tend to have a considerably higher incidence. In the most extreme case, Cysouw says that northwestern European languages are exceptional because they lack productive reduplication and employ comparative particles, properties which are found in 15 and 13 per cent of the respective worldwide samples in WALS. (Note that any side-effect of a drug which has an incidence of more than one per cent is labeled "common".) Admittedly, these properties are not among the "Top 10 of the rarest characteristics as found in northwestern Europe"; among those, the highest global incidence is 7.2 % (relative pronouns).
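The arithmetic behind this comparison can be spelled out in a short sketch. The thresholds are the ones quoted above; the world totals of 6,000 and 7,000 languages are an assumption, inferred from the statement that a rare feature would occur in at most "six or seven" languages, and the function and variable names are illustrative only.

```python
from fractions import Fraction

RARE = Fraction(1, 1000)   # "rare" side-effect: fewer than 1 case in 1,000
COMMON = Fraction(1, 100)  # "common" side-effect: more than 1 case in 100

def languages_at_threshold(n_languages, threshold=RARE):
    """Number of languages that corresponds to a given incidence threshold."""
    return n_languages * threshold

# Assumed world totals, implied by "six or seven languages in the world".
print(languages_at_threshold(6000), languages_at_threshold(7000))

# Expected count in an average WALS sample of about 400 languages.
print(float(languages_at_threshold(400)))

# Worldwide incidences cited in the text for three of the traits.
wals_incidence = {
    "No productive reduplication": Fraction(153, 1000),
    "Comparative particle": Fraction(132, 1000),
    "Relative pronoun": Fraction(72, 1000),
}
for trait, incidence in wals_incidence.items():
    if incidence < RARE:
        label = "rare"
    elif incidence > COMMON:
        label = "common"
    else:
        label = "in between"
    print(f"{trait}: {float(incidence):.1%} ({label} by the drug standard)")
```

By these drug-labelling standards, all three traits would count as common rather than rare, which is the sense in which the demands on rarity have to be more modest for WALS data.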
The question where to draw the borderline for rarity and exceptionality is perhaps of no great theoretical significance, but there are some other problems here, to which I shall now turn.

For something to be called exceptional, it would seem necessary for it to be an exception to some generalization. In the case of reduplication, the lack of which was said to be exceptional, the generalization would be "Languages have productive reduplication". Even if this is true of only 85 per cent of all languages, as the WALS data suggest, it still looks like a legitimate generalization. The case of the use of comparative particles in European languages is more problematic. The generalization would have to be negatively formulated: "Languages do not use comparative particles". To my mind, it is a bit counterintuitive to speak of exceptions to such negative generalizations, in particular when we are dealing with one among several ways of realizing a certain kind of construction. In this case, the WALS map shows four types of comparative constructions, and admittedly, "Particle Comparative" is the least frequent among those (it is found in 22 languages), but it is not drastically less frequent than the two middle ones ("Exceed Comparative" and "Conjoined Comparative"), which occur in 33 and 34 languages. Thus, it may well happen that several of the alternatives have relatively low individual frequencies and will all be seen as exceptional. An example of this would be the feature "Weight-sensitive stress". In the sample of 500 languages, only 219, that is roughly 44 per cent, have weight-sensitive stress, and these are distributed over seven different types, none of which has more than 13 per cent of the total.

Another problem is exemplified by the second item on the top-ten list of rare characteristics in NW Europe, "Uvular continuants only". Cysouw says that the existence of uvular continuants without the existence of uvular stops as well is highly uncommon. Well, it depends on how you count. It is true that out of 566 languages, only twelve show this value of the feature "Uvular consonants", but on the other hand, that makes up 20 per cent of the 60 languages with uvular continuants. And actually, only two of the twelve languages with "Uvular continuants only" are found in NW Europe.
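The two ways of counting contrasted here can be made explicit with the figures just cited (12 languages with uvular continuants only, 60 languages with uvular continuants, 566 languages in the sample). The following is a minimal sketch; the variable and function names are illustrative.

```python
from fractions import Fraction

def share(part, whole):
    """Relative frequency of `part` within `whole`, as an exact fraction."""
    return Fraction(part, whole)

sample_size = 566        # languages coded for the feature "Uvular consonants"
with_continuants = 60    # languages that have uvular continuants
continuants_only = 12    # of these, languages without uvular stops

# Absolute frequency: continuants-only languages among all sampled languages.
absolute = share(continuants_only, sample_size)

# Conditional frequency: continuants-only languages among those that have
# uvular continuants at all.
conditional = share(continuants_only, with_continuants)

print(f"absolute:    {float(absolute):.1%}")     # 2.1%
print(f"conditional: {float(conditional):.1%}")  # 20.0%
```

The same twelve languages thus look roughly ten times less unusual once the count is restricted to languages that have uvular continuants in the first place.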
(We do know that there are more of them outside the sample, but it would be cheating to include them in the count.) The problem is really a general one: if a set S1 of languages is the intersection of two other sets S2 and S3, the assessment of the rarity of S1 should not be based on its absolute frequency but rather on the relative frequency of S2 given S3, or vice versa.

Two notions that have interested me are that of linguistic complexity (Dahl 2004) and that of typological diversity (Dahl 2008). Both of these do in fact have non-trivial relations to rarity. Consider, to begin with, complexity, and as a concrete example, vowel systems. It is reasonable to assume that a vowel system with a larger number of distinctions is more complex than one with a smaller number. In the paper, Cysouw says that the presence of rounded front vowels in the languages of northwestern Europe is related to the exceptionally high number of vowel quality distinctions in those languages. It seems that there are certain generalizations we can make about systems that are made up of elements of varying frequencies. Thus, given two distinctions a and b in a vowel system, if a is rarer (less frequent) than b, then systems that contain a will on average contain a larger number of distinctions, and thus be more complex, than systems that contain b. This is a claim that depends solely on probability-theoretical considerations. However, the connection between rarity and complexity is enhanced by the universality or near-universality of some vowel quality distinctions ("All languages have some variations in vowel quality that indicate contrasts in the vowel height dimension" (Ladefoged & Maddieson 1996: 286)) and by the existence of implicational universals to the effect that the presence of a less frequent element entails the presence of a more frequent one. But in Cysouw's paper, rarity is a property that pertains to values of features in WALS rather than to distinctions or elements in a system, and some of the values concern the absence rather than the presence of elements, such as the feature value "No adpositions", which is found in 28 out of 1074 languages and would thus be fairly rare. If we assume that languages without adpositions are ceteris paribus less complex than the ones that have them, this will be an example of a rare trait that is not connected with higher complexity.

Turning now to typological diversity, I demonstrate in Dahl (2008) that, at least to a certain extent, linguistically diverse parts of the world are also the places where rare features show up most. Thus, I argue in the paper that the indigenous languages of the Americas contribute about 40 per cent of the total typological or structural diversity of the languages of the world, although they make up only about 15 per cent of those languages. Comparing this to Cysouw's result, eight languages on his list of the 15 languages with the highest mean rarity level are from the Americas.
Looking at families with at least three languages included in the WALS data, out of the top ten having the highest weighted rarity, six are from North America and one from South America. Finally, of the 15 areas of high rarity, four are in North America and one in South America.

References
Dahl, Östen 2004 The Growth and Maintenance of Linguistic Complexity. (Studies in Language Companion Series 71). Amsterdam/Philadelphia: Benjamins.
Dahl, Östen 2008 An exercise in a posteriori language sampling. Sprachtypologie und Universalienforschung 31: 208–220.


Ladefoged, Peter, and Ian Maddieson 1996 The Sounds of the World's Languages. Oxford: Blackwell.

Some more details about the definition of rarity

Michael Cysouw

Replying to the many stimulating comments raised by Dahl, I am first rather astounded by his assertion that I did not define the term "rare". In fact, the whole of Section 3 defines the precise mathematical operationalization of my notion of rarity. And indeed, my notion of rarity is a relative one (and I would even go as far as to argue that a notion of absolute rarity is meaningless, cf. Cysouw 2003). Stronger still, the evaluation of the (relatively defined) Rarity Indices is also relative. I explicitly do not presuppose any absolute norm separating low from high Rarity Indices, because I would not know of any data that could help us set such a norm. Thus, the only observations I make in the paper are about the most extreme (relative) rarities as compared to all other (relative) rarities. The list of rare traits of Northwestern European languages in Section 7 is thus a list of relative relative rarity. Whether these traits are really all noteworthy is of course open to interpretation. Looking at the values of the Mean Group Rarity Index for the traits themselves (as reported in the first column of Table 4), I would suggest that the first four are really much more significant rarities in northwestern Europe than the others in the list. Still, I find it highly stimulating to know what other European characteristics should be considered rare when the notion of rarity is interpreted a bit more leniently. Just to take up the least extreme case of relative pronouns (as referred to by Dahl), this is indeed found in 7.2% of the world's languages, which one might (or might not) find rare. However, looking at the worldwide distribution of relative pronouns, shown here in Figure 1 (Comrie & Kuteva 2005 = WALS 122), it is evident that this is a clear example of a regionally bound rarity.

Next, Dahl discusses two possible problems with my notion of rarity.
First, in the context of the theme of the present collection of papers, he warns that the intuitive notions of rarity and exceptionality do not necessarily coincide. In principle, I completely agree with this comment; as I write in the introduction to the paper, exceptionality is a more encompassing term than rarity. However, I think that the difference proposed by Dahl does not differentiate the two. For something to be called an exception, Dahl argues, there has to be some presupposed generalization relative to which it can be an exception. Now, when a trait


Figure 1. Usage of relative pronouns (dots) compared with other relativization strategies (squares) for the relativization of subject (adapted from Comrie & Kuteva 2005).

X is rare, but the opposite trait not-X is not definable (or only negatively definable by saying "it is not X"), then it is difficult to argue relative to what X is an exception. Here I disagree. The only generalization that is necessary is the presence of one trait (or a group of traits) that is common, and then everything else can be declared both exceptional and rare relative to the common case(s). One example discussed by Dahl concerns the typology of comparative constructions (Stassen 2005 = WALS 121). There are four types distinguished, one of which is more common than the others: Locational (47%), Exceed (20%), Conjoined (20%), and Particle (13%). Now, relative to the Locational strategy, all others are (more or less) rare and (more or less) exceptional. The radical situation would be an extremely fine-grained typology of the world's languages in which all types are rare (implying of course that there are very many different types). In this situation (which would probably be an anomaly in itself, cf. Cysouw 2010), I do not think anybody would want to claim that all types are exceptions, because indeed there is nothing to be an exception against. However, in my operationalization of rarity this situation would also not result in the presence of any rare types. In the Rarity Index, as proposed in (3) in the paper, the proportion of occurrence is taken relative to the number of types that are distinguished. The result is that in the hypothetical situation with very many roughly equally frequent small types, the Rarity Index will consider all types to be not rare. So, as far as there are problems with the definability of the non-rare counterpart, I think the interpretations of rarity and exceptionality coincide.
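The effect of taking the proportion of occurrence relative to the number of types can be illustrated with a minimal Python sketch. It assumes the index is the uniform expectation 1/n divided by the observed proportion; the function name `rarity_index` and the 50-type uniform typology are hypothetical illustrations, not taken from the paper. The percentages are the ones quoted above for comparative constructions.

```python
def rarity_index(freq, total, n_types):
    """Expected proportion (1/n_types, an assumed uniform baseline)
    divided by the observed proportion (freq/total)."""
    expected = 1 / n_types
    observed = freq / total
    return expected / observed

# Percentages for comparative constructions as quoted in the text (WALS 121).
comparative = {"Locational": 47, "Exceed": 20, "Conjoined": 20, "Particle": 13}
total = sum(comparative.values())
indices = {
    name: rarity_index(freq, total, len(comparative))
    for name, freq in comparative.items()
}

# Hypothetical fine-grained typology: 50 equally frequent small types.
uniform = {f"type_{k}": 2 for k in range(50)}
uniform_total = sum(uniform.values())
uniform_indices = {
    name: rarity_index(freq, uniform_total, len(uniform))
    for name, freq in uniform.items()
}

for name, value in indices.items():
    print(f"{name}: {value:.2f}")
print("uniform case:", set(round(v, 6) for v in uniform_indices.values()))
```

Under these assumptions only the Locational type falls below 1, while every type in the uniform typology sits at exactly the baseline, so none of them stands out as rare even though each covers only 2% of the languages.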


Secondly, Dahl argues that a trait might be a composition of various independent characteristics, only the combination of which is rare. In such situations rarity should be assessed relative to the expected intersection of the traits in isolation. I completely agree with this, but the problem is caused by the unstructured coding of the values of WALS. Unfortunately, WALS does not include explicit information on the finer-grained structure of the traits distinguished. For the present paper, I decided not to perform any recoding of the WALS data, as this would be a project in its own right (see Footnote 4 of the paper and the reference therein). But suppose one were to perform such a recoding, as suggested by Dahl, then the computation of the rarity index for composed traits would indeed change. As an example, let's consider the WALS map on uvular consonants (Maddieson 2005 = WALS 6) that was brought up by Dahl. There are four different types distinguished in this map that can easily be decomposed as an intersection of two binary parameters, as shown in Table 1. There is a strong correlation between these parameters (Fisher's Exact $p < 10^{-7}$). This implies that the twelve cases of uvular continuants without uvular stops are actually far fewer than would be expected by chance alone (expected frequency is 480 × 60/566 = 50.9).
Table 1. Typological distribution of uvular consonants

                  Uvular continuants
Uvular stops      No      Yes     Total
No                468     12      480
Yes               38      48      86
Total             506     60      566
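The expected frequency and the strength of the association in Table 1 can be checked with a short Python sketch. It uses only the standard library and computes a two-sided Fisher's exact test by summing hypergeometric probabilities; the two-sided convention (summing all tables no more probable than the observed one) and the helper name `cell_probability` are assumptions made for this illustration, since the text does not state how the test was run.

```python
from math import comb

# Table 1 counts: rows = uvular stops (no, yes),
# columns = uvular continuants (no, yes).
table = [[468, 12],
         [38, 48]]

row_totals = [sum(row) for row in table]          # [480, 86]
col_totals = [sum(col) for col in zip(*table)]    # [506, 60]
n = sum(row_totals)                               # 566

# Expected count of "continuants without stops" under independence.
expected = row_totals[0] * col_totals[1] / n

def cell_probability(k):
    """Hypergeometric probability of k languages in the
    (no stops, yes continuants) cell, with all margins fixed."""
    return (comb(row_totals[0], k)
            * comb(row_totals[1], col_totals[1] - k)
            / comb(n, col_totals[1]))

observed_prob = cell_probability(table[0][1])
k_min = max(0, col_totals[1] - row_totals[1])
k_max = min(row_totals[0], col_totals[1])

# Two-sided p-value: all tables at most as probable as the observed one.
p_value = sum(
    p for p in map(cell_probability, range(k_min, k_max + 1))
    if p <= observed_prob * (1 + 1e-9)
)

print(f"expected count: {expected:.1f}")
print(f"p-value below 1e-7: {p_value < 1e-7}")
```

The observed count of 12 lies far below the expectation computed from the margins, which is the sense in which the combination is rarer than the two parameters taken separately would predict.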

The Rarity Index, as shown in (3) in the original paper, is actually of the form "expected proportion divided by observed proportion" (E/O). The observed proportion (O) is the frequency of a trait $f_i$ divided by the total number of languages $f_{tot}$ (i.e. 12/566 in the current example). The expected proportion (E) that I used in the paper was simply the expectation under the assumption of independence, viz. 1/n, where n is the number of values distinguished (i.e. 1/4 in the current example). The Rarity Index for this trait is thus $E/O = f_{tot}/(n \cdot f_i) = 566/(4 \cdot 12) = 11.8$. However, when the feature is decomposed as shown in Table 1, then the expected proportion changes: the expected proportion is the


product of the independent proportions of the decomposed traits. In the example the expected proportion is the proportion of "no" uvular stops times the proportion of "yes" uvular continuants (i.e. 480/566 × 60/566 = 0.09, which is notably smaller than 1/4 as assumed in the paper). In this way, composed traits that have a lower expectation than 1/n get a lower Composed Rarity Index. For the present example this index would be 480/566 × 60/566 × 566/12 = 4.24, which is clearly smaller than the 11.8 from the index as used in the paper. In general, when a feature f is decomposed into a set of co-occurring features $f_1, f_2, f_3, \ldots, f_t$, then the expected proportion for $f_i$ is the product of all independent proportions, see (1), and the Rarity Index (RI) changes accordingly, as shown in (2). However, all of this of course depends heavily on the proposed decomposition of WALS features. In the current example the decomposition is rather unproblematic, but for many other features in WALS this is not as easy.

(1) $E(f_i) = \prod_{s=1}^{t} \frac{f_{si}}{f_{tot}}$

(2) $RI(f_i) = \frac{E}{O} = E \cdot \frac{1}{O} = \prod_{s=1}^{t} \frac{f_{si}}{f_{tot}} \cdot \frac{f_{tot}}{f_i}$
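As a check on the arithmetic, the two versions of the index can be computed side by side. This is a minimal Python sketch that follows the definitions in (1) and (2); the function names `rarity_index` and `composed_rarity_index` are hypothetical labels chosen for this illustration.

```python
from math import prod

def rarity_index(f_i, f_tot, n_values):
    """Index with the uniform expectation E = 1/n: f_tot / (n * f_i)."""
    return f_tot / (n_values * f_i)

def composed_rarity_index(component_freqs, f_i, f_tot):
    """Index with E taken as the product of the component proportions,
    as in (1), divided by the observed proportion f_i / f_tot, as in (2)."""
    expected = prod(f_s / f_tot for f_s in component_freqs)
    observed = f_i / f_tot
    return expected / observed

# Uvular continuants without uvular stops: 12 of 566 languages,
# one of four values in the undecomposed WALS feature.
simple = rarity_index(f_i=12, f_tot=566, n_values=4)

# Decomposed: 480 languages without uvular stops,
# 60 languages with uvular continuants.
composed = composed_rarity_index([480, 60], f_i=12, f_tot=566)

print(f"simple index:   {simple:.1f}")
print(f"composed index: {composed:.2f}")
```

The composed value is lower because the product of the two marginal proportions (about 0.09) is a stricter baseline than the uniform 1/4.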

Finally, building on the discussion in Dahl's reply, I would like to suggest that the relation between complexity and rarity is of an implicational nature, in the sense that complexity probably implies rarity, but clearly not vice versa. As for the relation between areal diversity and rarity, I am not convinced that there should be any relation. Of course, in highly diverse areas more rarities will be found, but so would common traits. The real question should be whether the proportion of rare traits to common traits correlates with diversity. As far as I am concerned, the verdict on this matter is still open.

References
Comrie, Bernard, and Tania Kuteva 2005 Relativization on subjects. In The World Atlas of Language Structures, Martin Haspelmath, Matthew Dryer, David Gil and Bernard Comrie (eds.), 494–497. Oxford: Oxford University Press.
Cysouw, Michael 2003 Against implicational universals. Linguistic Typology 7: 89–101.
Cysouw, Michael 2010 On the probability distribution of typological frequencies. In The Mathematics of Language, Christian Ebert, Gerhard Jäger and Jens Michaelis (eds.), 29–35. Berlin: Springer.
Maddieson, Ian 2005 Uvular consonants. In The World Atlas of Language Structures, Martin Haspelmath, Matthew Dryer, David Gil and Bernard Comrie (eds.), 30–33. Oxford: Oxford University Press.
Stassen, Leon 2005 Comparative constructions. In The World Atlas of Language Structures, Martin Haspelmath, Matthew Dryer, David Gil and Bernard Comrie (eds.), 490–493. Oxford: Oxford University Press.

