On Parametric (and Non-Parametric) Variation

This article raises the issue of the correct characterization of 'Parametric Variation' in syntax and phonology. After specifying their theoretical commitments, the authors outline the relevant parts of the Principles-and-Parameters framework, and draw a three-way distinction among Universal Principles, Parameters, and Accidents. The core of the contribution then consists of an attempt to provide identity criteria for parametric, as opposed to non-parametric, variation. Parametric choices must be antecedently known, and it is suggested that they must also satisfy seven individually necessary and jointly sufficient criteria. These are that they be cognitively represented, systematic, dependent on the input, deterministic, discrete, mutually exclusive, and irreversible.

A persistent preoccupation of generative linguistics has been the tension, bordering on paradox, between two questions: "Why are there so many languages?" and "Why are they all so similar?". The tension is sufficiently great that many writers, dazzled by the obviousness of the first, are tempted to deny the truth of the second: Evans & Levinson's (2009) 'The myth of language universals' is a recent example. A resolution of the tension can be found in the framework of 'Principles-and-Parameters' (Chomsky 1981a, 1981b; for overviews and history, see Roberts 1997, Baker 2001, and especially Biberauer 2008a), but making this claim plausible to the skeptics necessitates elaboration and refinement of the theory, in particular of the nature and scope of 'parametric' variation. It is this issue we try to address in the current contribution, suggesting identity criteria for parametric as opposed to non-parametric differences among languages. The situation is reminiscent of the debate about human types: The apparently obvious diversity of different 'races' disguises profound underlying unity, and specifying the nature of the variation is fraught with difficulty. In what follows we spell out our theoretical presuppositions, we present the elements of the Principles-and-Parameters framework and their motivation, and we suggest and defend our identity criteria.
We take seriously the central claim of the Minimalist Program (Chomsky 1995) that in elucidating the nature of the human faculty of language, linguistic theory should restrict itself to what is conceptually necessary or descriptively inevitable. Accordingly, we adopt Hauser et al.'s (2002) contrast between the Faculty of Language in the broad sense (FLB) and the Faculty of Language in the narrow sense (FLN), seeking to identify defining properties of FLN, even if this latter may perhaps consist simply of possible re-combinations of elements of FLB. We have argued elsewhere against Hauser et al.'s claim that recursion is the unique property of FLN (see Smith & Law 2007: 2) on the grounds that recursion must be characteristic of the Language of Thought in Fodor's (1975, 2008) sense. However, we are happy to go along with Chomsky's (2009a: 29) suggestion that Natural Language is the same as the Language of Thought except for 'externalization' (cf. Smith 1983). That is, language links the Conceptual-Intentional and Sensori-Motor interfaces, where the former equates to the language of thought and the latter, as used for communication (perception and production), characterizes natural languages, which emerged evolutionarily as the result of being externalized. This external form is anyway the domain of parametric variation, the existence of which may moreover be a defining property of the human language faculty.
The Principles-and-Parameters framework provides simultaneously a solution to Plato's problem and the problem of characterizing typological variety. UG (short for 'Universal Grammar', the innate endowment that the child brings to the task of language acquisition) specifies that human languages consist of a Lexicon and a 'computational system' (referred to as C_HL, the computation for human language). The lexicon consists of a set of lexical entries, each of which is a triple of phonological, morpho-syntactic, and semantic features, with a link to associated encyclopedic information. UG also provides a set of exceptionless principles, such as structure dependence (Chomsky 1971), (strict) cyclicity (Freidin 1999, Chomsky 2002), the Extended Projection Principle (Chomsky 1995), etc., which constrain the operations of the computations and act as a constraint on language acquisition: Children learning their first language have their 'hypothesis space' tightly constrained, with the result that they never make mistakes of a particular kind. However, "[…] principles do not determine the answers to all questions about language, but leave some questions as open parameters" (Berwick & Chomsky, forthcoming: 8 [in the 2008 manuscript]).
That is, in addition to a set of universal principles, UG provides a set of parameters which jointly define the limits of language variation. This is typically conceptualized as the setting of a number of 'switches' (on or off) for particular linguistic properties. Examples of such parameters in syntax are the head-direction parameter (whether heads, such as Verb, Noun, and Preposition, precede or follow their complement), the null-subject (or 'pro-drop') parameter (whether finite clauses can have empty pronominal subjects), and the null-determiner parameter (whether noun phrases can have empty determiners). Typical examples in phonology are provided by the stress differences characteristic of English and French, and the possibility of complex consonant clusters found in English but not in Japanese. English stress is 'quantity-sensitive', whereas French stress is 'quantity-insensitive', with the result that words with the same number of syllables may have different stress in English but must have uniform stress in French; in English, words may begin with clusters of consonants in a way which is impossible in Japanese, with the result that English loans into Japanese appear with the clusters separated by epenthetic vowels.
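The 'switch' metaphor can be made concrete with a minimal sketch. It assumes, purely for illustration, that each parameter is binary and independent of the others; neither assumption is a claim of this article, and the identifiers (`PARAMETERS`, `possible_grammars`) are hypothetical labels for the three syntactic parameters named above.

```python
from itertools import product

# Parameter names follow the syntactic examples in the text; the boolean
# encoding and the identifiers themselves are illustrative assumptions.
PARAMETERS = ("head_initial", "null_subject", "null_determiner")


def possible_grammars(parameters=PARAMETERS):
    """Enumerate every combination of on/off settings for the switches."""
    return [
        dict(zip(parameters, values))
        for values in product((False, True), repeat=len(parameters))
    ]


grammars = possible_grammars()
print(len(grammars))  # 2 ** 3 = 8 combinations of settings
```

On this toy picture the learner's task is to pick one of a small, antecedently given set of combinations rather than to construct a grammar from scratch.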
The theory thus unifies two different domains: typology and acquisition. Variation among the world's languages (more accurately the set of internalized I-languages; Chomsky 1986) is defined in terms of parametric differences and, in first language acquisition, the child's task is reduced to setting the values of such parameters on the basis of the stimuli it is exposed to: utterances in the ambient language. Given the strikingly uniform success of first language acquisition, it follows that "the set of possibilities [must] be narrow in range and easily attained by the first language learner" (Smith 2004: 83). By hypothesis, the principles do not vary from child to child or from language to language so, as Chomsky (2006: 183) puts it, "acquisition is a matter of parameter setting, and is therefore divorced entirely from […] the principles of UG".
The theory is at once 'internalist' (i.e. it is a theory of states of the mind/brain), pertaining to knowledge which is largely unconscious, and universalist. An immediate implication of this position is that the range of parametric choices is known in advance and, as a corollary, that acquisition is largely a process of 'selection' rather than instruction (see Piattelli-Palmarini 1989) and is likely to take place in a critical period or periods.
This brief characterization raises a number of problems. The first of these is the issue of deciding which phenomena are to be accounted for by reference to principles and which by reference to parameters, as exemplified in the history of subjacency, which began as a universal principle but was later parameterized. More importantly, does this binary choice exhaust the ontology? We argue that parameters account for some of the surface variability, but only some: Much variation is accidental. Accordingly, we need a three-way distinction: Universal Principles, Parameters, and Accidents. Note that even universal principles may have their status obscured by recalcitrant data. For instance, the universality of Merge is not in question even though some items, namely interjections, do not participate. Similarly, a clear and classic instance of a parameter is 'head direction', even though some examples are problematic, like English notwithstanding, which can occur before or after its complement, or the occurrence in German of synonymous (and etymologically related) pairs of preposition and postposition (e.g., längs des Flusses/den Fluss entlang 'along the river'). Finally, there are 'accidents', exemplified by gaps in morphological paradigms, such as the lack of a past tense form for beware; by the (claimed) absence of recursion in Pirahã (Everett 2005); or by the absence of initial consonants in Arrernte syllable structure (Breen & Pensalfini 1999).
Assignment to each of these categories may of course be problematic, with the uncertainty having potentially significant implications for broader considerations such as innateness. Thus Chomsky (2009b: 385), in discussing the optimization of the language faculty in terms of third-factor considerations, writes: "If you take a parameter and you genetically fix the value, it becomes a principle […]. So adding parameters is reducing genetic information". This stance is similar to Janet Fodor's (2009) characterization of principles and parameters as a Minimax solution: 'minimize genetic information' and 'maximize/optimize the amount of learning'.
Reverting to the remarks above about Natural Language and the Language of Thought and the assumption that the syntax of both is the same (but see Smith 2004: 43f. for problems with this position), it is clear that "parameterization and diversity too would be mostly, maybe entirely, restricted to externalization" (Chomsky, in press: 14 [2008 manuscript]; to "language shorn of the properties of the sound system" as Smith 2004: 43 puts it), hence mainly morphology and phonology. One reason for the multiplicity of languages is then that "the problem of externalization can be solved in many different and independent ways" (Chomsky, in press: 15 [2008 manuscript]), where, moreover, these may all be 'optimal' in different ways. The interesting implication is that there is no parametric variation at the Conceptual-Intentional interface (but see below) and perhaps not even any parametric variation in the syntax narrowly construed (C_HL).
Despite these observations, we propose for illustrative purposes to pursue with the majority of linguists the possibility that parametric variation (hereafter 'PV') characterizes both syntax and phonology. Further, if there is to be any content to the 'parametric' part of PV, there is need to work out necessary and sufficient conditions for something to count as parametric. That is, we are in explicit opposition to those such as Kayne (2005: 6 and elsewhere), Manzini & Savoia (2007), and Rita Manzini (p.c.), for whom all (syntactic) variation is parametric. We reject this stance because of the need to constrain possible parameters. In the absence of such constraints "the term 'parameter' would end up being nothing but jargon for 'language-particular rule'" (Newmeyer 2005: 53) or, as Moro (2008: 107) puts it: "If there were no restrictive generalization on the format of parameters, the theory would be too weak".
Before suggesting such restrictions, it is important to note that the nature of the identity criteria, even the possibility of coming up with any, is dependent on the version of Principles-and-Parameters theory that one adopts. There are several possibilities available in the literature. First, as seen in Rizzi's (2009: 95-96) discussion, there is a conceptual contrast between theories which indulge in overspecification (where UG contains specific statements for certain choices, which must be fixed by experience) and those which indulge in underspecification (where UG has nothing to say: there are gaps, to be filled by experience; cf. "UG limits the space of possible hypothes[e]s, but does nothing more" (Nevins 2004: 121)).
Second, this distinction cross-cuts that between macro-parametric and micro-parametric variation (for discussion see Baker 2008). 'Macro'-PV is typically exemplified by the head-direction (head-first/head-last) parameter (Chomsky 1981a) or Baker's (1996) polysynthesis parameter, which determines the overall morphological structure of the language. Each of these parameters has a wide variety of effects, whereas 'micro'-PV of the sort exemplified by the choice of auxiliary to accompany unaccusative verbs (Perlmutter 1978, Burzio 1986) or case realignment in Albanian causatives (Manzini & Savoia 2007) is characteristically more restricted and has correspondingly fewer repercussions. An emerging consensus seems to be that the 'macro/micro' contrast is not important: "The extent-of-variation question is not well defined or theoretically very interesting" (Baker 2008: 371). We agree, though we wish to argue that the parametric/non-parametric distinction is important both in syntax and in the phonological domain, where there is no comparable macro-micro contrast. There is, third, the related issue of whether parameters pertain to principles, as in Chomsky's original proposal (Chomsky 1981b), or whether one adopts the later, widely accepted, 'Borer-Chomsky Conjecture' (cf. Biberauer 2008a) that all (syntactic) parameters refer to features of functional heads in the lexicon, so that the number of parameters corresponds to the size of the functional lexicon. While we are sympathetic to the restriction implied by the conjecture, its apparent irrelevance to phonology makes it less central to our concerns.
At a lower level of abstraction we come, fourth, to the domain or locus of parametric variation. Biberauer (2008a: 32) suggests that the locus of parameters is "the Lexicon and one or more of the Interfaces". We are anxious that our identity criteria should pertain to phonology as well as syntax and, if despite the remarks about externalization above it proves that there are relevant examples, to semantic choices at the C-I interface (cf. Chierchia 1998), so we are happy to follow this suggestion. At a finer level of detail, Rizzi (2009: 213ff.) observes that syntactic parameters, located within the lexicon, may pertain to any of the three basic computational processes of the syntax: Merge (e.g., head direction), Move (e.g., V to T), and Spell-Out (e.g., Null-subject). Again, there is no obvious phonological counterpart to this taxonomy.
There are many other considerations which are not directly relevant to our concerns or about which we have nothing to contribute. For instance, Nevins (2004: 123) argues on the basis of 'parametric ambiguity'1 that "variation is the result of maintaining multiple parameter settings simultaneously" (cf. Yang 2002). We are suspicious of this position as it looks like a conceptually undesirable version of 'multiple grammars' (for discussion, see Smith, in press).
We turn now to the main concern of the article: Suggesting, illustrating, and defending a number of criteria which variation has to meet to count as parametric rather than accidental.
The theory of PV hypothesizes that the range of choices is 'antecedently known', and this basic property correlates with a number of others which distinguish PV from non-parametric variation, and allow us to provide identity criteria for it. Being antecedently known may not be as straightforward as we have previously (Smith & Law 2007, in press) assumed. There is both a terminological and a substantive issue. Chomsky (2009b: 395) observes that in many languages the expression used for 'knowing a language' does not involve the word 'know', but rather the equivalent of 'come', 'hear', or 'have'. This has probably underlain some of the philosophical dispute about whether knowledge of language, in the sense of competence, constitutes real knowledge or not, but this terminological concern is of minor importance in the present context. The substantive issue is whether 'antecedently known' entails cognitively 'represented' or could refer simply to 'architectural' (third factor) constraints on the hypothesis space. The strongest position is that all options are laid out (so 'represented') prior to experience, and whatever abilities the child brings to the task of first language acquisition are deployed to select among them. The weaker, architectural, position may be preferable if it allows properties of the language faculty to be derived from more general considerations.
1 This refers to the situation where several analyses or structures could underlie the data of interest.
Whichever position is correct, we take our first criterion to be that variants licensed by parametric choice must be cognitively represented. To make clear what motivates this condition, consider by contrast acclimatization, specifically sweating. We have a critical period for setting our sweating switch: Experiencing hot and humid weather in the first three years of life leads to a different setting from exposure to different conditions, and these settings cannot be significantly altered thereafter (Gluckman & Hanson 2005: 7). Despite a certain superficial similarity, this is not PV because the different states are not (mentally) represented and have no cognitive effects. Further, it is relevant to note that where there is evidence that some linguistic fact is not represented there is also evidence that this is not a domain of PV. For instance, Smith (2003, in press) claims that the learning child does not represent its own mispronounced output (e.g., saying [bɔkəl] for bottle), but equally such mispronunciation does not constitute the locus of PV.
This leads to our second criterion: systematicity. This is implicit in Moro's (2008: 106) remark that the relevant domain is one where variation is "minimal and systematic" or, equivalently, in what Biberauer (2008a: 2) describes as 'nonrandom' variation. A simple example is provided by irregular morphology of the type exemplified by the impossibility of *amn't in (most varieties of) English, or the kind of defective paradigm seen in Latin vis-vim-vi. We do not consider this to be PV because it is by definition not systematic, and hence we could not plausibly acquire knowledge of it by any process of triggering in the way which is plausible for systematic contrasts such as the possibility of null determiners or the absence of codas. Although systematicity and 'potentially triggered' may be extensionally the same, the two notions are conceptually distinct and so need to be kept separate, but we link them under a single criterion.
Our third criterion is dependence on the input; that is, the variant chosen must correspond to a possible state of the adult language, and hence can be illustrated most clearly from first language acquisition. The head-direction parameter clearly reflects properties of the ambient language in a way that is not characteristic of all variation. An example of systematic but input-independent and non-parametric variation is provided by the individual differences in consonant harmony in phonological development (cf. Smith 1973: 163), or the variation in the choice of initial or final negation in syntactic development (cf. Smith 2005: 29). For instance, two children in essentially the same environment may produce the adult duck as [gʌk] and [dʌt] respectively. These may both be manifestations of consonant harmony, but they do not count as PV because the particular variants chosen appear to be independent of the input (and consonant harmony is anyway essentially alien to adult phonology). A comparable syntactic example is provided by the development of negation. All children typically go through a stage in which the negator is peripheral, either initial or final. Individual children then differ such that one child learning English may say 'no like cabbage' and another 'like cabbage no'. We take such variation to be non-parametric as no language allows only such peripheral negation. This universal exclusion enables us to differentiate this non-parametric variation from UG-licensed errors of the sort described by Crain and his colleagues (cf. Crain & Pietroski 2002). A child may produce a form which never occurs in the input (e.g., 'What do you think what pigs eat?') because the structure is licensed by UG and so occurs as a parametric choice in other languages.2 Despite this potential complication, the case of consonant harmony in phonology and negation in syntax should make the conceptual contrast between parametric and non-parametric variation clear.
Our fourth criterion is that PV must be deterministic:3 That is, the input to the child must be rich enough and explicit enough to guarantee that a parameter such as pro-drop or the presence of complex onsets in phonology can be set. If the input does not meet this requirement we are dealing with non-parametric variation. A syntactic example is provided by sequence of tense phenomena, where individual variation verges on the random (see Smith & Cormack 2002). A phonological example is provided by Yip (2003: 804), who argues that some speakers treat a post-consonantal glide as a secondary articulation of the consonant, others as a segment in its own right: "[T]he rightful home of /y/ [is] underdetermined by the usual data, leaving room for variation". Her conclusion is that "speakers opt for different structures in the absence of conclusive evidence for either". Again that indicates for us that the variation is non-parametric. Deterministicness suggests that the process of parameter-setting must be 'reflexive' (cf. Chomsky 2009b: 384) but, as with systematicity and triggering, the notions are conceptually distinct so we keep them apart, though again not as separate criteria.
Our fifth criterion is suggested by an observation of Dupoux & Jacob (2007) to the effect that PV in language is 'discrete' (usually binary), whereas in other domains (moral judgment, for instance) one typically finds continuous scales. A linguistic example of the contrast is provided by vowel height. Whether a language displays 2, 3, or 4 degrees of vowel height in its phonological system is a matter of parametric choice(s). The degree to which the particular articulation of some vowel is high, either randomly or as a matter of individual difference (maybe my articulations of [i] are systematically higher than yours), is continuous and could not be parametric.
Our sixth criterion is 'exclusivity'. PV gives rise to mutually exclusive possibilities: Languages are either [+pro-drop] or [-pro-drop]. The choice leaves no room for compromise; no language is both. By contrast, the choice in a [+pro-drop] language of using or not using a subject pronoun is non-parametric. The contrast is again most obvious with morality, where moral diversity involves "different preference orderings among competing members of a finite set of universal moral values" (Dupoux & Jacob 2007: 377). An extension of mutual exclusivity would be that the choices are exhaustive in that they exhaust the relevant hypothesis space interdependently. That is, the parameters are not independent (as claimed explicitly in e.g., Manzini & Wexler 1987) but are hierarchically nested: The choice of a parameter [±X] gives rise to a range of further choices within each of [+X] and [-X], and apparent exceptions to exclusivity are due to choices being either subordinate or parallel to a given parameter. We do not make this (non-)independence criterial as we know of no cogent evidence either for or against. Similarly, although the phrasing used here in terms of [±X] suggests binarity, which is often presupposed in terms of [strong/weak] in the literature (see e.g., Radford et al. 2009: 314), we see no reason to make this essential.
A possible further, seventh criterion is 'irreversibility': That is, the putative impossibility of the re-setting of parameters in second language acquisition (see e.g., Tsimpli & Smith 1991). The implicit contrast is with the reversible variations found in lexical learning. For instance, despite half a century's exposure to examples like "I didn't see him yet", one of us (NS) still judges them ungrammatical (the only licit possibility is "I haven't seen him yet"). This is in contrast to examples like the second sentence of this article, written without malice aforethought, which begins: "The tension is sufficiently great that many writers […]". This construction was originally ungrammatical for NS (the only licit possibility being "The tension is sufficiently great for many writers to have […]") but has now changed its status. The former contrast is arguably a matter of PV, the latter not.
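The logical form of the proposal (seven criteria, each necessary, together sufficient for variation whose range is antecedently known) can be sketched as a simple conjunction. This is only an illustrative reading, not an implementation of any procedure in the article: the identifiers are hypothetical labels, and in the example profiles only the failing criterion is taken from the text, the remaining criteria being assumed satisfied for the sake of the illustration.

```python
# The seven criteria named in the text; the identifiers are illustrative labels.
CRITERIA = (
    "cognitively_represented",
    "systematic",
    "input_dependent",
    "deterministic",
    "discrete",
    "mutually_exclusive",
    "irreversible",
)


def violated(satisfied):
    """Return, in order, the criteria that a case of variation fails."""
    return [c for c in CRITERIA if c not in satisfied]


def is_parametric(satisfied):
    """Individually necessary, jointly sufficient: all seven must hold."""
    return not violated(satisfied)


# Hypothetical profiles: only the failing criterion comes from the text;
# every other criterion is assumed satisfied purely for illustration.
pro_drop = set(CRITERIA)
vowel_height_articulation = set(CRITERIA) - {"discrete"}

print(is_parametric(pro_drop))              # True
print(violated(vowel_height_articulation))  # ['discrete']
```

Because the criteria are individually necessary, a single violation is enough to classify a case as non-parametric, which is how the contrasting examples in the preceding paragraphs are used.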
We summarize and illustrate the foregoing criteria in the following table:
[…]
Non-parametric: choice of a pronoun (or not) in a pro-drop language
7. Choices must be irreversible
Parametric: temporal adverbial modification
Non-parametric: sub-categorization possibilities
A number (a huge number) of issues remain open. We list a few below:
(1) We have in general not committed ourselves to where the parametric choices reside. It is not clear whether there is a single answer, but we assume in the absence of definitive evidence that all such choices are lexical.
(2) It would be helpful to determine which of these criteria might derive from other properties, bearing in mind that nothing so derivable would be part of FLN. In particular, it is desirable to establish which criteria (e.g., deterministicness and mutual exclusivity) might fall out from general properties of complex cognitive systems ('third-factor' considerations, where these include general learning strategies and principles of computational efficiency, as in Chomsky, in press: 15 [2008 manuscript]).4 An example in principle is provided by the head-direction parameter. It seems clear that the choice between head-first and head-last is a function of the need for linearization imposed by the temporal structure of speech. Given that 'merge' combines A and B, it is physically necessary either that A precede B or that B precede A. In such a situation, as Boeckx (2009: 198) observes, appeal to a parameter may be supererogatory. Two points are, however, relevant. First, the physical need for linearization may be the ultimate cause of the parameter, but the skew distribution of the world's languages and the consistency of head direction within a language suggest that the parameter does exist: The physical constraint has led to grammaticalization of the parameter. Second, although this parameter has a 'third factor' motivation, it is only one example and not a criterion for parameterhood whose status is affected. For plausible instances of a criterion being rendered unnecessary we probably need to look elsewhere. We leave the issue for future research.
(3) In earlier work (Smith 2007; Smith & Law 2007, in press) we have investigated whether the criteria for parametric status allow a generalization to other domains, either human or animal, suggesting that our knowledge of music and our moral judgment might be such examples in the former domain, and birdsong in the latter. We are currently less sanguine about the possibility.
The preceding discussion implies that many of the parameters postulated in the literature are, by our criteria, accidents rather than reflecting genuine, but not exceptionless, generalizations. We have already alluded to some of the work of Kayne and Manzini, and Evans & Levinson (2009: 432) explicitly assume that parameters account for all differences: The 'full set of possible combinations'. Our attempt to delineate criteria for PV should not in any way be taken to impugn the value of the work of these authors, but we think it is time for the theory to be put on a more explicit footing. We await corroboration or refutation of our putative criteria with anticipation and apprehension in equal measure.

4 Though we are skeptical of the claim that "[t]o externalize the internally generated expression 'what John is eating what', it would be necessary to pronounce 'what' twice, and that turns out to place a very considerable burden on computation" (Berwick & Chomsky, forthcoming: 11 [in the 2008 manuscript]). The burden seems slight, especially given that in first language acquisition children regularly repeat material 'unnecessarily' (see the examples from Crain & Pietroski 2002 above).