Evolution, Perfection, and Theories of Language

In this article it is argued that evolutionary plausibility must be made an important constraining factor when building theories of language. Recent suggestions that presume that language is necessarily a perfect or optimal system are at odds with this position, evolutionary theory showing us that evolution is a meliorizing agent often producing imperfect solutions. Perfection of the linguistic system is something that must be demonstrated, rather than presumed. Empirically, examples of imperfection are found not only in nature and in human cognition, but also in language — in the form of ambiguity, redundancy, irregularity, movement, locality conditions, and extra-grammatical idioms. Here it is argued that language is neither perfect nor optimal, and shown how theories of language which place these properties at their core run into both conceptual and empirical problems.


Introduction
Linguistic theory is inevitably underdetermined by data. Whether one is trying to characterize the distribution of wh-questions across languages or account for the relation between active sentences and passive sentences, there are often many distinct accounts, and linguistic data alone is rarely absolutely decisive. For this reason, theorists often appeal to external considerations, such as learnability criteria (Gold 1967, Wexler & Culicover 1980), psycholinguistic data (Schönefeld 2001), and facts about the nature and time course of language acquisition (e.g., the accounts presented in Ritchie & Bhatia 1998). There is also a move afoot to constrain linguistic theory by appeal to considerations of neurological plausibility (Hickok & Poeppel 2004, Marcus, in press). And there is a long-standing history of constraining linguistic theory by appealing to considerations of cross-linguistic variation (Greenberg 1963, Chomsky 1981a, Baker 2002). Here, we consider a different sort of potential biological constraint on the nature of linguistic theory: Evolvability.
Constructing a theory which says that language is evolvable involves looking at what we know from evolutionary biology about what typically evolving systems look like, what kinds of properties they have, and then applying this to questions about the plausible nature of language. Here, our focus will be on the plausibility of recent suggestions (e.g., Chomsky 1998, 2002a, 2002b, Roberts 2000, Lasnik 2002, Piattelli-Palmarini & Uriagereka 2004, Boeckx 2006) that language may be an 'optimal' or near-optimal solution to mapping between sound and meaning, a premise that has significant impact on recent developments in linguistic theory.
In what follows, we will argue that the presumption that language¹ is optimal or near-optimal is biologically implausible, and at odds with several streams of empirical data. We begin with some background in evolutionary theory.

Evolution, Optimality, and Imperfection
Our analysis begins with a simple observation: Although evolution sometimes yields spectacular results, it also sometimes produces remarkably inefficient or inelegant systems. The Darwinian phrase "survival of the fittest" (actually due to Spencer rather than Darwin) is sometimes misunderstood as implying that perfection or optimality is the inevitable product of evolution; in reality, evolution is a blind process, with absolutely no guarantee of perfection.
To appreciate why this is the case, it helps to think of natural selection in terms of a common metaphor: as a process of hill-climbing. A fitness landscape symbolizes the space of possible phenotypes that could emerge in the organism. Peaks in the landscape stand for phenotypes with higher fitness; troughs represent phenotypes with lower fitness. Evolution is then understood as the process of traversing the landscape. Our focus in the current article is on a limitation in that hill-climbing process, and on how that limitation reflects back upon a prominent strand of linguistic theorizing. The limitation is this: Because evolution is a blind process (Dawkins 1986), it is vulnerable to what engineers call the problem of local maxima. A local maximum is a peak that is higher than any of its immediate neighbors, but still lower (possibly considerably lower) than the highest point in the landscape.
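The local-maximum problem can be illustrated with a minimal sketch. The one-dimensional landscape, its fitness values, and the function name `hill_climb` are all hypothetical, chosen only for illustration; real fitness landscapes are high-dimensional and change over time, so this is a toy under stated assumptions rather than a model of natural selection.

```python
def hill_climb(fitness, start):
    """Greedy ascent on a 1-D landscape.

    Repeatedly step to the fitter of the two immediate neighbours and
    stop as soon as no neighbour is strictly fitter than the current spot.
    """
    position = start
    while True:
        neighbours = [p for p in (position - 1, position + 1)
                      if 0 <= p < len(fitness)]
        best = max(neighbours, key=lambda p: fitness[p])
        if fitness[best] <= fitness[position]:
            return position  # no uphill step left: a peak, possibly only local
        position = best


# Hypothetical landscape: a low peak at index 2, the highest peak at index 7.
LANDSCAPE = [1, 3, 5, 2, 1, 4, 8, 9, 6]

local_peak = hill_climb(LANDSCAPE, start=0)   # climbs to index 2 and stops
global_peak = hill_climb(LANDSCAPE, start=5)  # climbs to index 7
```

Starting on the left, the climber halts on the lower peak because every route to the higher one first leads downhill; only a different starting point reaches the highest point in the landscape.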
In the popular "fitness landscape" terminology of Sewall Wright (1932), the perfect solution and the optimal solution to a given problem posed by the organism's environment can (and often do) differ in their location. While perfection holds only of the highest peak, lower peaks in the landscape may in some circumstances be optimal. But, in the words of Simon (1984), natural selection does not even necessarily seek optimality. Rather, evolution essentially serves, as it were, as a satisficing agent; rather than inevitably converging on the best solution in some particular circumstance, it may converge on some other reasonable if less than optimal solution to the problem at hand.
¹ The term 'language' itself is of course intrinsically ambiguous; the term can, among other things, refer to the expressions in a particular language, to the underlying cognitive system itself, to its biological and neurological manifestation, or to a formal model of the system. Here, our discussion pertains primarily to the latter (although the former two will be mentioned from time to time); that is, what is often referred to as the human language faculty, which is formally modeled, as a grammar, in different ways by different linguistic theories.
Perhaps the most accurate phraseology is that of Dawkins (1982) who uses the term 'meliorizing', which captures the fact that evolution is constantly testing for improvements in the system, but not explicitly guided to any particular target and by no means guaranteed to converge on perfection or even optimality. Perfection is possible, but not something that can be presumed.

Imperfections in Nature
In the real world, evolution sometimes achieves perfection or near-optimality, as in the efficiency of locomotion (Bejan & Marden 2006), but has in many instances fallen short of any reasonable ideal. The mammalian recurrent laryngeal nerve, for example, is remarkably inelegant and inefficient, following a needlessly circuitous route from brain to larynx posterior to the aorta. While in humans, this may not add up to a significant amount of extra nerve material, in giraffes it is estimated to be almost twenty feet (Smith 2001). The problem here is one of what Marcus (2008) calls evolutionary inertia: the tendency of evolution to build new systems through small modifications of older systems, even when a fresh redesign might have worked better.
The human spine is similarly badly designed (Krogman 1951, Marcus 2008). Its job is to support the load of an upright bipedal animal, yet a much better solution to this problem would be to distribute our weight across a number of columns, rather than let a single column carry it all. As a result of the spine's less than perfect design, back pain is common in our species. Here again, evolutionary inertia is the culprit: the human spine inherits its architecture, with minor modification, from our quadrupedal ancestors, even though a single column works better in bearing horizontal loads than it does in bearing vertical loads. Although a sensible engineer could have anticipated the ensuing problems, the blind process of evolution could not.
Another illustration of the friction that derives from evolutionary inertia is the human appendix, an example of what is known as a vestige. This is a different type of imperfection, an example of a structure that has no current place in the organism at all. Its existence does not seem to increase our fitness in any way, and its poor structure can lead to blockages which cause sometimes fatal infection (Theobald 2003). The appendix was an earlier adaptation for digestion of plants in our ancestors, now not required by non-herbivorous humans. Although we might have been better off without an appendix and the ensuing risk of infection, evolution lacks the capacity to anticipate; because of the architecture of evolutionary inertia we are stuck with the risks despite a lack of corresponding benefits. (Yet another example comes from human wisdom teeth, which are imperfect due to the problem of fit that our larger third molars pose for our modern jaws. Our ancestors had larger jaws that comfortably accommodated the larger wisdom teeth, but cumulative gradual adaptive evolution has decreased our jaw size over time, resulting in pain on eruption, and impacting of the wisdom teeth.)

Imperfections in Human Cognition
In human cognition too, imperfection arising from gradual adaptive evolutionary processes seems common. Human memory, for instance, is far from perfect (Marcus 2008). It can be easily distorted by environmental factors, and we often blur together memories of similar events, remembering the general but not the specific. For example, we may remember some fact we read, but not where we read it. Furthermore, our memories can be tested, and often distorted in stressful circumstances, such as under the questioning in a courtroom. Marcus argues that location-addressable memory, such as computers have, would be much more useful to modern humans, but we are the result of gradual cumulative evolution from ancestors who dealt in the here-and-now, where context-dependent memory was a good enough tool. Once more, evolution did not have the foresight to bestow on us the kind of memory that would be a better solution to problems faced by modern humans.
Human belief, too, shows evidence of imperfect design (Marcus 2008). Our beliefs are also subject to biasing or warping. Although we may believe that we reason objectively, this is often not the case. Context, emotion, and unconscious biases, such as what we are familiar with, or the confirmation bias, can all warp our beliefs. Again, this imperfection is the result of cumulative evolution from an ancestor that needed to act, but not often to think or reason, evolution once again lacking the foresight required to know that reasoning objectively and logically would be more useful to us.

Is Language Different?
If all this is taken for granted in biology, it is not taken for granted in linguistics. To the contrary, in recent years it has become popular to assume that language may well be perfect, or nearly so. Chomsky (2002a: 93) has argued that "language design may really be optimal in some respects, approaching a 'perfect solution' to minimal design specifications"; similarly, Roberts (2000), for example, has argued that language may be a computationally perfect system for creating mappings from signal to meaning. Could language be different, more perfect than other aspects of biology? Since the balance of perfection and imperfection could vary between domains, we see this as a fundamentally empirical question. Since imperfection exists, it seems unreasonable to simply presume linguistic perfection, but near-perfection exists, too, as in the primate retina's exquisite sensitivity to light (Baylor et al. 1979).
That said, a priori it would be surprising if language were better designed than other systems, for the simple reason that language is, in evolutionary terms, an extremely recent innovation. By most recent estimates, language emerged only within the last 100,000 years (Klein & Edgar 2002), and as such there has been relatively little time for debugging.

Imperfections and Inefficiencies in Language: Some Empirical Evidence
At least superficially, instances of imperfection seem plentiful in language, most notably in all manner of speech errors, such as the phonological slip in written a splendid support (instead of written a splendid report), the lexical slip in a fifty pound dog of bag food (instead of a fifty pound bag of dog food) (from Fromkin's Speech Error Database), or the Spoonerism (attributed to Reverend Spooner himself) in You have hissed all the mystery lectures (instead of You have missed all the history lectures). According to the taxonomy of Dell (1995), there are at least 5 distinct types of speech error (exchanges, shifts, anticipations, perseverations and substitutions), which can apply at some 10 different linguistic levels (from sentence through word, morpheme, syllable and phoneme, to feature). Frequencies of occurrence are as high as 1-2 per thousand words.² Similarly, people frequently misparse passives with non-canonical relations (e.g., reading man bites dog as if it were dog bites man, Ferreira 2003) and interpret sentences in ways that are internally inconsistent. For example, subjects often infer from the garden-path sentence While Anna dressed the baby slept both that the baby slept (consistent with a proper parse) and that Anna dressed the baby (inconsistent with what one would expect to be the final parse, Christianson et al. 2001). Likewise, they are vulnerable to "linguistic illusions", such as the belief that More people have been to Russia than I have is a well-formed sentence, when it is in fact not.
Still, such errors do not necessarily bear on more architectural questions about the nature of grammar, per se; they might be seen as purely a matter of performance. What of competence grammar? Here, too, we will suggest, rumors of linguistic perfection are exaggerated.

Redundancy
Turning to competence, and the core syntactic system, a first type of imperfection comes under the heading of redundancy. We will define redundancy as the ability of more than one structure or (sub-)system to carry out the same function. Redundancy therefore entails duplication or inefficiency in a system. A perfectly designed system would surely eschew what is not just clumsy, but may also be more costly, requiring instead a system that is streamlined and efficient.
² This measure holds for English, based on an analysis of the London-Lund corpus (Garnham et al. 1981), but there is no reason to think that it differs greatly cross-linguistically (Dell 1995).
Yet language is replete with redundancy, not just in the occasional genuine synonym (couch and sofa) but also in more subtle areas such as case marking. The language faculty makes available two possible manners of marking case on a noun: by imposing strict word order constraints, or with the use of inflectional morphology. Languages like English mostly make use of the former strategy, and languages like Russian typically use the latter. Either would suffice, but from a sheer elegance perspective, it is somewhat surprising that human languages fail to adopt a consistent solution. Meanwhile, languages like German show that both strategies can be used concurrently, in a highly redundant fashion. In (1a), the inflectional morphology on subject and object differs. This contrasts with (1b), where the definite article for feminine nouns does not differ in form from nominative to accusative case:
(1) [...]
While in (1b), only word order can signal case, in (1a) both inflectional morphology and word order signal case. We know here that word order is playing a part in (1a), and it is not simply the case that the morphology does all the signaling, because SVO is the default order in German main clauses; if the opposite order is used, as in (2), intonational differences show this as somehow marked.
(2) Den Mann beisst der Hund. (German)
    the.ACC man bites the.NOM dog
    'The dog bites the man.'
A second instance of redundancy is seen in person and number morphology. It is very often the case that a language will redundantly mark person and/or number on more than one element in a phrase or sentence. In English, for example, we get cases like (3), where every single word in the sentence is marked in some way for plurality.
(3) Those four people are teachers.
What is remarkable about this is how easily in principle it could be avoided: Mathematical and computer languages lack these sorts of redundancies altogether.
Redundancy can of course be adaptive. It benefits humans to have two kidneys, and it benefits birds to have excess flight feathers (King & McLelland 1984). In a similar way, synonyms might be argued to be adaptive due to the advantage they confer when retrieval of a particular lexical item fails. Or, it might be argued that in a noisy channel, redundantly specifying some parts of the code would lead to increased communicative success. Perhaps, then, examples like this should not be thought of as imperfections. However, the redundancies we in fact observe appear too arbitrary and unsystematic to be explained strictly in terms of their benefits towards communicating relative to noise in the communication channel, especially in comparison to the more systematic techniques one finds in digital communication. The parity system that modems use, for example, which makes the 8th bit a 1 if the number of '1's in the first seven bits is itself odd and otherwise zero ('even parity'), is systematically applied to every byte in a stream; redundancies in language are frequently far less systematic. Plurality is marked in some instances but not others, for example. Patterns of syncretism often keep redundancies themselves from being systematic. Furthermore, the existence in natural languages of redundancies that have no apparent advantage, where artificial languages lack them, undermines the case that language is maximally elegant or economical, and emphasizes the extent to which the details of grammar are often imperfect hotchpotches.
In fact, a case of the very opposite of what is here defined as redundancy gives us a further imperfection in language. If redundancy involves multiple structures carrying out the same function, the doubling or tripling of function that is seen in syncretic forms such as the past and passive participles in English, or nominative and vocative case morphology on certain classes of nouns in Latin (Baerman et al. 2005), leads to imperfection in the form of a lack of clarity. Differing functions being fulfilled by identical structures might be considered optimal or perfect under an interpretation appealing to efficiency or simplicity, yet taken to extremes the system that emerges is far from usable.
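The systematic character of the parity scheme mentioned above can be made concrete with a short sketch. The function names `parity_bit`, `encode`, and `looks_intact` are hypothetical, and the code assumes the rule exactly as stated in the text: the eighth bit is 1 when the first seven bits contain an odd number of 1s, and 0 otherwise, so that every transmitted byte carries an even number of 1s.

```python
def parity_bit(seven_bits):
    """Return the eighth bit: 1 if the seven data bits hold an odd number of 1s."""
    if len(seven_bits) != 7 or any(b not in (0, 1) for b in seven_bits):
        raise ValueError("expected exactly seven bits, each 0 or 1")
    return sum(seven_bits) % 2


def encode(seven_bits):
    """Append the parity bit, giving a byte whose total count of 1s is even."""
    return list(seven_bits) + [parity_bit(seven_bits)]


def looks_intact(byte):
    """A received byte passes the check if its total count of 1s is even."""
    return sum(byte) % 2 == 0


sent = encode([1, 0, 1, 1, 0, 0, 0])  # three 1s in the data, so the parity bit is 1
corrupted = sent.copy()
corrupted[2] ^= 1                     # flip a single bit "in transit"
```

The same rule is applied to every byte without exception, and any single flipped bit is detected (`looks_intact(corrupted)` is false); two flipped bits would cancel out and pass unnoticed, which is a known limit of this kind of check.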

Ambiguity
Ambiguity, both lexical and syntactic, provides another type of imperfection present in natural language, but not in formal languages.³ Lexical ambiguity comes in the form of homonymy, for example, bear as an animal versus bear as a verb of carrying, and polysemy (which differs from homonymy in that the meanings of the multiple lexical items that sound alike are connected in some way), for example, mouth of a river, or of a person, wood as a part of a tree, or as an area where many trees are growing. In both cases, the signal on its own is not enough to pick out a meaning. The use of a lexically ambiguous word requires the listener to take the immediate context and his world knowledge into account in order to correctly assign a meaning to the speaker's utterance, thus making the process inherently less efficient than it would be given a non-ambiguous system.
If the syntactic component of the grammar is understood as responsible for creating a mapping between signal and meaning, the most natural manner in which it would do this is to map a single unique signal to a single unique meaning. Syntactic ambiguities can be looked at as violations of this intuitively elegant system of one-to-one mapping.⁴ In syntactic ambiguities, single signals are mapped to multiple meanings. In (4a), for example, the signal maps equally to two meanings, (i) where I use green binoculars to see the girl, and (ii) where I see the girl who has a pair of green binoculars. The signal in (4b) maps to four meanings, (i) where I stand on the mountain and use green binoculars to see the girl, (ii) where I use green binoculars to see the girl who comes from the mountain, (iii) where I stand on the mountain and see the girl who has a pair of green binoculars, and (iv) where I see the girl who is from the mountain who has a pair of green binoculars. In (5), syntactic ambiguity results from elision, mapping the signal to two possible meanings, (i) where John saw a friend of John's and Bill also saw a friend of John's, and (ii) where John saw a friend of John's and Bill saw a friend of Bill's.
(4) a. I saw the girl with green binoculars.
    b. I saw the girl with green binoculars from the mountain.
(5) John saw a friend of his and Bill did too.
To be sure, ambiguity can be used by the speaker intentionally to create vagueness. For example, when, in the context of a job reference, I say I can't recommend this person enough, I am being deliberately evasive. In addition, there are cases of syntactic ambiguities too that can be resolved by context. But even when both deliberate and immediately resolvable ambiguities are factored out, a considerable amount of unintended (yet in principle unnecessary) ambiguity remains (e.g., Keysar & Henley 2002).

Irregularity
Languages also deviate from elegance and simplicity in the widespread existence of linguistic irregularity, both lexical (morphological) and syntactic. If language were perfect, then we would expect that it should be fully regular and systematic, as all formal languages are. In natural language, mappings between sound and meaning are created in inconsistent, almost messy ways.
Morphological paradigms are the most obvious case of irregularity in language (the verbal paradigm for the verb to be in many languages, or the formation of plural nouns in English), but this imperfection can also be seen in other areas of the grammar. Syntactic irregularity is found in extra-grammatical idioms (Fillmore et al. 1988) like by and large, all of a sudden, and so far so good, where lexical items are combined in a way completely unpredictable by the grammar of the language in question. For example, there is no rule in the grammar of English that permits the conjunction of a preposition like by with an adjective like large. Nor is there any rule in the grammar of English that says two adjective phrases (so far, so good) can be concatenated. Such irregularity has no counterpart in synthetic languages, and forces the parser to do more work than is strictly necessary (e.g., in determining whether input strings are to be interpreted compositionally or idiomatically).
merely re-locates it, and still requires the listener to make mappings from surface strings to underlying meanings that are not one-to-one and not specified by the grammar.

Needless Complexity
A fourth class of imperfection in language concerns intricacies that the linguistic system could function without. The first example of this type of needless complexity concerns the form and interpretation of sentences like (6):
(6) Who did John meet?
Here, the object of the meeting event is questioned by placing the lexical item who at the start of the sentence. However, we interpret who at the end of the sentence, as belonging after the verb meet. Linguistic theories which assume a derivational approach to language posit an operation in the grammar which permits elements to be displaced from one position to another. Chomsky (2002b) argues that movement is motivated by the need to distinguish between the deep semantics of argument structure and the surface semantics of discourse structure. So, who is an argument of meet, but the fact that (6) is a question is signaled by moving the wh-word to the beginning. However, movement is not necessary here as this kind of distinction can be made in other ways. Intonation can mark surface semantics; in fact, English topic/comment and focus semantics are much more frequently marked intonationally than by syntactic movement. Another option is to use morphological markers, like Japanese wa. The cases here are specific, but the point can be generalized: if there exist languages that do not require movement to make the distinction between deep and surface semantics, then why does the language faculty need to make this operation available at all? In some eyes, movement may be a more elegant way of signaling this semantic distinction than, say, stacks or special features, but a system lacking any of these is more elegant still.
Operations such as movement that are part of language competence are constrained by locality conditions. This means that it is not permissible to apply linguistic operations just anywhere, but that they are constrained to apply within limited structural domains. For example, (7a) is more acceptable than (7b) because the wh-phrase in the initial position of the sentence has moved a relatively short step in (7a) (from after persuade), but in (7b) has moved a step longer than is permitted (from after visit).
(7) a. Who did John persuade to visit who?
    b. *Who did John persuade who to visit?
These too are absent in formal languages and seem to add needless complexity. Locality conditions force the learner to execute extra computation in that he must figure out for his language where the boundaries that divide what is local from what is not lie. A linguistic system designed with efficiency and economy as its central concern would minimize the work the learner must undertake. The question then is why movement and constraints on locality exist. One possibility is that if our linguistic representations are subject to the limitations of the type of memory we have inherited from our ancestors (Marcus, in press), locality conditions allow us to process complex linguistic expressions in the fragmented pieces we are capable of dealing with. What is an imperfection by the measure of efficiency and economy can be explained by our evolutionary history. Language is imperfect and messy because evolution is imperfect and messy.

If Language Is Not Perfect, Might It Be Optimal?
The examples presented in the previous section strongly suggest that, empirically, the human language faculty fails to meet the strict criterion of perfection, but they still leave open a weaker possibility. Could language be seen as some sort of optimal tradeoff? Although perfection and optimality are often conflated in discussions of this issue in the literature, the two notions are certainly conceptually distinct. Perfection entails an absolute, the best in all possible circumstances, while optimality entails points on a gradient scale, each of which can only be reached by overcoming some limitations, and thus is the best in some specific circumstances only. As Pinker & Jackendoff (2005: 27) note, "nothing is 'perfect' or 'optimal' across the board but only with respect to some desideratum". The immediate question, then, is: "Is there any criterion by which language could be considered to be optimal?" A number of criteria spring immediately to mind: ease of production, ease of comprehension, ease of acquisition, efficient brain storage, efficient communication, efficient information encoding, and minimization of energetic costs. Let us consider each in turn.
First, one could imagine that language might be optimal from the perspective of speakers, minimizing costs for producing expressions. In reality, however, this criterion is not always met. In cases of morphological redundancy, such as that seen in person and number morphology mentioned above, where the speaker has to produce this type of inflection on multiple lexical items (in some cases every one) in a single sentence, the computational costs for the speaker rise considerably. In question formation, the speaker is forced to calculate locality conditions to ensure a wh-phrase is not uttered in an illegitimate position in the sentence, again a case of increased computational load.
What of optimality from the opposite perspective? If production costs are higher than strictly necessary, is this because comprehension costs are kept low? Could language be optimal from the hearer's perspective, allowing speakers' utterances to be interpreted easily? Here again, the answer seems to be no. Both lexical and syntactic ambiguity lead to increased complexity for the hearer. Additional computation must be undertaken in order to select the correct interpretation of a number of possibilities. Movement also causes difficulties for comprehension, because resolving filler-gap dependencies can be costly, especially when they are not signaled in advance (Gibson 1998, Wagers 2008).
Is it then language acquisition that drives the system to be optimal? Are comprehension and production complicated because the crucial consideration is that the system must be easily learnable? Here again, the answer appears to be no. Ambiguity (both lexical and syntactic), extra-grammatical idioms, and movement, for example, all complicate acquisition, because one-to-one mapping between signal and meaning is upset, because rules of the grammar are not consistently followed, and because filler-gap relations must be mastered.
Could language be optimal because it is stored in the brain in the most efficient manner possible? Again, probably not: Morphological irregularity and idioms belie this criterion too. Storage is inefficient in cases where each entry in a verbal paradigm constitutes a separate entry. With idiomatic expressions, the number of entries in the lexicon grows even further.
A fifth criterion suggests that language might be considered optimal if communication between speaker and hearer were as efficient as possible. Yet again, this criterion can be discounted when we consider ambiguity. Both lexical and syntactic ambiguity can lead to communication breakdown, and the subsequent need for speakers to make corrections or amendments.
Another possible measure of optimality might be in terms of the amount of code that needs to be transmitted between speaker and hearer for a given message that is to be transmitted. It is not obvious how to explicitly measure this, given the complexities of human communication (what counts as the message that it is to be transmitted), but this proposal too seems to run headlong into the sort of imperfections seen above (ambiguity, movement, redundancy, etc.).
It turns out, then, that there is, despite numerous proposals, no obvious desideratum by which language can plausibly be said to be optimal.
A true devotee of the notion of language as optimal solution could of course turn to combinations of criteria: for example, could language be a system that yields an optimal balance between ease of comprehension and ease of acquisition? It is possible, but here too we are skeptical. With no a priori commitment to which combinations might be optimized, and no specific account for why some of these criteria but not others might be optimized, the advocate of linguistic optimality risks getting mired in a considerable thicket of post hoc justification. It is easy to see in broad outline how natural selection might have favored a system that rewards each of these properties, but there is little predictive power; there is no reason from these as first principles, for example, to predict that natural languages would (or would not) have locality conditions. Formal languages lack them, they complicate acquisition, and inasmuch as extra entities such as bounding nodes need to be computed, they presumably also complicate comprehension. Imperfections such as morphological redundancy could be seen as optimizing ease of comprehension, but imperfections like syntactic ambiguity and movement operations do the opposite; imperfections like syncretism and lexical ambiguity arguably reduce demands on long-term memory (inasmuch as they demand a smaller number of lexical entries) but considerably complicate comprehension, and deviate from a kind of elegant one-to-one mapping principle that is found in formal languages. Taken together, the five criteria yield a very weak stew; there is no clear prediction from first principles of what a language should be like, only (see Table 1) a set of inconsistent and largely post hoc attributions, with no genuine explanatory force. In reality some quirks of language may have more to do with history than optimal function (Marcus 2008). Our susceptibility to tongue-twisters, for example, may come from the evolutionary inertia (Goldstein et al. 2007, Marcus 2008) inherent in repurposing an ill-suited timing system to the purposes of speech production, rather than any intrinsic virtues. Similarly, locality conditions may exist as an accommodation to an underlying memory substrate that is poorly suited to language (Marcus, in press) rather than as a solution that could be considered optimal from any design-theoretic criteria.

The Minimalist Program and Perfectionism
Talk of language and its apparent imperfections takes on special significance in light of its role in the formulation of one linguistic theory that has been prominent in recent years: the Minimalist Program, as introduced by Chomsky (1995). Here, a presumption of linguistic perfection (or near-perfection) is central, with Chomsky (2004: 385) suggesting that language may come close "to what some super-engineer would construct, given the conditions that the language faculty must satisfy". Roberts (2000: 851) has gone so far as to suggest the Minimalist Program's assumption that language is a computationally perfect system for creating mappings between signal and meaning "arguably represent[s] a potential paradigm shift" in Generative Grammar.

Optimality versus Perfection
The first issue is that the difference between optimality and perfection is never clarified in the minimalist literature. At the end of the 1990s, Chomsky (1998: 119) claims that "language is surprisingly 'perfect'". Yet only a few years later, he states that "[t]he substantive thesis is that language design may really be optimal in some respects, approaching a 'perfect solution' to minimal design specifications" (Chomsky 2002a), and then, just a page later in the same publication, he says that "[t]he strongest minimalist thesis would be this: […] Language is an optimal solution to legibility conditions". Nowhere are perfection and optimality teased apart in this literature, yet as was hinted at in section 2, these terms should be applied in significantly different cases.

Optimal for What?
Inasmuch as the Minimalist Program is tied to the notion of optimality, it is immediately vulnerable to all the concerns outlined in section 3 above; to wit, unless there is some clear, a priori criterion for optimality, claims of optimality have little force. As Lappin et al. (2000) and Wasow (2002) have noted, Chomsky himself is not particularly clear about his criteria. One could imagine that minimalism might seek optimality in terms of a linguistic architecture that minimized energetic costs, and reduced computational load, but advocates of minimalism have never been particularly clear about the criteria.
As Lappin et al. (2000) note, if language were optimal in terms of computational simplicity, it would require the minimum amount of computational operations and apparatus; it would not exceed the computational requirements of any artificial system that could be created to undertake the same job. Given the presence of redundancy, movement, locality conditions, and other imperfections discussed above, this possibility seems like a non-starter. Computational simplicity is further compromised by the kinds of "economy conditions" (see below) assumed in minimalist analyses, which require that all possible outputs given the lexical items inputted be computed and compared in order to determine the most economic option (Johnson & Lappin 1997).
The minimalist position similarly cannot be rescued by appealing to the more modest criterion of optimal compromise examined in section 3. No compelling reasoning has been presented in the literature to illustrate the pertinent criteria for which language is considered optimal, and how the conflict between these is reconciled by the properties the linguistic system shows.

Optimality and Economy
In the minimalist literature, optimality (or perfection) seems most often to be equated with "economy", and with the related suggestion that all properties of language might derive from virtual conceptual necessity,⁵ a term glossed by Boeckx (2006: 4) as "the most basic assumptions/axioms everyone has to make when they begin to investigate language".⁶ In one respect, this notion is admirable (if unsurprising): Linguistic theorizing, like all scientific theorizing, should be guided by considerations of parsimony. If two theories cover some set of data equally well, but one does it with fewer stipulations or fewer parameters, we should, other things being equal, choose the "simpler theory".
But researchers under the minimalist umbrella often seem to take parsimony a step further, and suggest that independently of the character of the linguistic data, a theory with few principles or representational formats is to be favored over a theory with more principles or representational formats. For example, the Minimalist Program reduces the levels of representation to just two, Phonological Form (PF) and Logical Form (LF), arguing that "virtual conceptual necessity demands that only those levels that are necessary for relating sound/sign and meaning be assumed" (Boeckx 2006: 75), where previous theories also posit Deep Structure (DS) and Surface Structure (SS). In our view, such assumptions are risky. To paraphrase Einstein, a theory ought to have as few representational formats as possible, but not fewer; the correct number of levels of representations could well be one or two, but it could be three or four or even ten or twenty; this is simply a matter for empirical investigation. For example, research in autosegmental phonology suggests that multiple levels (or tiers) of representation are required to account for processes such as tone (Goldsmith 1976); one would not want to revert to a single level account simply because fewer levels are superficially simpler or more economical.
A second type of economy lurks behind the first: An assumption that linguistic competence is in some significant fashion mediated by something akin to energetic costs. Economy of this sort is reflected in the types of economy considerations that have been employed since the earliest times of Generative Grammar (see review in Reuland 2000), as in Chomsky & Halle's (1968) evaluation procedures for grammars. More recent minimalist versions include locality-driven constraints such as Shortest Move, where a lexical item can be moved from one position in a sentence to another only if there is no other position closer to the lexical item that it could move into, and necessity-driven constraints such as Last Resort, where a lexical item will be moved from one position to another only if no other operation will result in grammaticality (Chomsky 1995). Unfortunately, minimalism, as currently practiced, wavers considerably as to what is allegedly being economized.

⁵ For a critique of the coherence of the very notion of virtual conceptual necessity, see Postal (2003).

⁶ Unfortunately, there is no clear consensus about what such assumptions might be. On the restrictive side, virtual conceptual necessity might consist of little more than a requirement that sound be connected to meaning (Chomsky 1995, Boeckx 2006), with other properties, for example, binary branching, derived rather than stipulated as necessities. On the less restrictive side, however, even puzzling properties such as "displacement" (movement), which hardly seem logically necessary, are also included, as in Boeckx's (2006: 73) suggestion: "Chomsky (1993) remarked that one way of making the minimalist program concrete is to start off with the big facts we know about language […]. These are: (i) sentences are the basic linguistic units; (ii) sentences are pairings of sounds and meanings; (iii) sentences are potentially infinite; (iv) sentences are made up of phrases; (v) the diversity of languages are the result of interactions among principles and parameters; (vi) sentences exhibit displacement properties […]. Such big facts are, to the best of our understanding, essential, unavoidable features of human languages […]. They thus define a domain of virtual conceptual necessity". In our view, this broader formulation considerably weakens the explanatory force of virtual conceptual necessity. Although (i)-(iv) seem like plausible minimal requirements, (v) and (vi) seem to be empirical observations about human language, not logical requirements: hence properties that demand explanation, rather than mere stipulation.
Consider, for example, the nature of the Spell-Out operation in later versions of minimalism. Spell-Out is the operation that applies once all lexical items in a lexical array have been combined through Merge and Move, sending the semantic features of these lexical items to LF and the phonological features to PF. In those formulations that follow Chomsky's (2001) Derivation by phase architecture, Spell-Out operates not once at the end of a derivation, but multiple times throughout it. Under this view, the derivation advances in stages or phases, at each phase only a sub-set of the lexical array being visible. Once the items in this sub-set have been combined, Spell-Out of this phase takes place. The advantage that is put forward for such a system is the decrease in memory requirements: the material that must be 'remembered' until the point of Spell-Out is considerably less. Yet, a system that applies Spell-Out only once could be argued to be advantageous in that the machinery for applying the operation is invoked only once in the derivation. The question then becomes: Is it computationally simpler (and hence more optimal) for the Spell-Out operation to apply multiple times to small amounts of material, or only once but dealing with a larger amount of material? Without a clear answer to this question, references to economy become too evanescent to have real force.
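The indeterminacy can be made concrete with a toy cost model. The sketch below is purely illustrative and is not drawn from Chomsky (2001) or any other minimalist proposal: the function names `spell_out_costs` and `total_cost`, the two cost terms (number of Spell-Out invocations and peak number of items held in memory), and the weights are all assumptions introduced here for the sake of the example.

```python
from math import ceil


def spell_out_costs(n_items, phase_size):
    """Return (invocations, peak_memory) under a toy model (hypothetical).

    Assumption: each Spell-Out call ships at most `phase_size` items, and
    memory load equals the number of items held until the next call.
    """
    if n_items < 1 or phase_size < 1:
        raise ValueError("n_items and phase_size must be positive")
    invocations = ceil(n_items / phase_size)
    peak_memory = min(n_items, phase_size)
    return invocations, peak_memory


def total_cost(n_items, phase_size, w_invocation, w_memory):
    """Weighted sum of the two cost terms; the weights are arbitrary."""
    invocations, peak_memory = spell_out_costs(n_items, phase_size)
    return w_invocation * invocations + w_memory * peak_memory


# A 12-item lexical array, spelled out once or in phases of 4 items.
single = spell_out_costs(12, 12)   # one call, everything held in memory
phased = spell_out_costs(12, 4)    # three calls, less held at a time

# Which option counts as "cheaper" flips with the (unmotivated) weights.
memory_dominant = (total_cost(12, 12, 1, 1), total_cost(12, 4, 1, 1))
invocation_dominant = (total_cost(12, 12, 10, 1), total_cost(12, 4, 10, 1))
```

Under equal weights the phased derivation comes out cheaper, whereas weighting invocations more heavily reverses the ranking. Nothing in the sketch says which weighting is right, which is precisely the point at issue: without an independently motivated measure, either architecture can be described as the economical one.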
A second case pertains to the operation of Agree. Agree allows for uninterpretable features on lexical items to be checked and removed before Spell-Out. In earlier versions of the theory (Chomsky 1995), Agree was permitted to apply only to elements in a particular local relation to each other: a Specifier-Head relation. Later, this stipulation was relaxed, allowing Agree to apply more freely. An additional rule was then required in order that illicit Agree relations could be ruled out (Chomsky 2001). While it might appear intuitively as if permitting Agree to apply freely is a simpler, more optimal approach, the question is whether the additional c-command rule that must be imposed negates this. Is it computationally simpler (and hence more optimal) to apply Agree freely and eliminate problem cases with an additional rule, or to restrict Agree from the start to applying only in local domains? Once more, the Minimalist Program offers nothing in the way of a discriminating measure.
Whether the type of economy measures that the Minimalist Program has in mind are better defined as perfection or as optimality, we have shown that neither is plausible for language. Taking this path leads the Minimalist Program into two different kinds of problematic positions, which we will examine in the following sections.

Capturing the Facts of Language Leads to Abandoning Perfection
Even if the notion of optimality could be tightened in order to give it more force, a more serious problem would remain: So far as we can tell, Minimalist theory cannot actually work unless it abandons the core presumption of perfection or optimality. Minimalism equates perfection with a type of bareness that derives from admitting only what is strictly necessary. But, as Newmeyer (2003: 588) puts it, practice rarely if ever meets that target; in his words, "no paper has ever been published within the general rubric of the minimalist program that does not propose some new UG principle or make some new stipulation about grammatical operations that does not follow from the bare structure of the MP". In actual practice, many of the mechanisms and operations that have been introduced into the system appear to be motivated not from virtual conceptual necessity, but rather from empirical realities that could not have been anticipated from conceptual necessity alone. For example, phases, movement, and constructions all seem to require additional machinery, and none have counterparts in formal languages. Capturing them seems inevitably to take the theory away from the perfection that is its ostensible target.
Consider (8a), and its Japanese counterpart in (8b). What would be the simplest and most elegant way to capture the cross-linguistic facts illustrated in (8a) and (8b) within a minimalist framework? One option might be to say that English question words appear sentence-initially, whereas Japanese question words appear in situ in a position further to the right. This is a simple, economical, minimalist account. However, it misses the fact that although 'what' appears in initial position syntactically, semantically, it belongs in final position, and therefore there is more in common between English and Japanese than initially appears the case. To account for this fact, however, the theory has to add machinery, and so the account we get is no longer simple, economical, or minimalist.
Indeed, Kinsella (2009) has gone so far as to argue that EPP features have been added to the minimalist architecture specifically to drive movement, and for no other reason; there is (once again) no analog in formal languages, and no obvious reason that they should exist, for example, following from virtual conceptual necessity. As Chomsky (2000: 12) notes, "[i]n a perfectly designed language, each feature would be semantic or phonetic, not merely a device to create a position or to facilitate computation". EPP features, however, represent exactly that: features which create a position (the specifier position of the head holding the [EPP] feature), and which facilitate computation (by forcing a movement operation to apply). It is this essential tension which pushes the minimalist architecture away from the evolutionarily implausible ideal of economy and elegance.
One seems to be left, in short, with a choice between (i) a theory which delineates an optimal system of language, but that fails to account for the data, and (ii) a theory which accounts for the data of human language, but delineates a system which is not optimal. Operations such as Move, features such as [EPP], and computations such as the generation of multiple derivations from one lexical array, to then be chosen between (such as is required in Chomsky 2001), do not belong in a bare minimal system, yet seem like concessions the Minimalist Program must introduce in order to account for the facts.

The Redistribution of Labor
More broadly speaking, many minimalist analyses seem to achieve elegance only in Pyrrhic fashion, through a redistribution of labor that keeps syntax lean but at the expense of other systems, the burden of explanation shifted to phonology, semantics, and the lexicon, but the overall level of complexity much the same as before.
The phonological component of the grammar, for example, now looks after optional movements, such as Heavy NP Shift, topicalization, extraposition, and the movements required to deal with free word order languages. Also removed to this component of the grammar are the more obligatory movements of object shift and head movement, as in, for example, verb second languages. As a strongly lexicalist theory of language, the minimalist lexicon takes over the work required to deal with wh-movement, and case assignment, in the form of uninterpretable features. The binding of pronouns and anaphora is in at least some minimalist approaches (partly) the responsibility of the semantic component (Chomsky 1993, Lebeaux 1998). These redistributions may well be well-motivated, but simply shifting computations that were once assumed to be syntactic to these other components does not make the grammar as a whole any more optimal, simple, or perfect. In the limit, if one simply deems syntax to be the elegant, non-redundant part of language, the notion of elegance becomes tautological, and the notion of syntax itself loses any connection to the very linguistic phenomena that a theory of syntax was once intended to explain.
As Table 2 makes clear, this general trend is common. Many of the canonical issues that were given a strictly syntactic analysis in Government and Binding theory are removed to other components of the grammar (semantics, discourse, and in particular, phonology, and the lexicon), leaving a more minimal syntax, but considerably greater complications elsewhere, and suggesting that some degree of complexity that departs from virtual conceptual necessity may be inevitable, even if it is redistributed.

Phenomenon: Head movement (e.g., Verb Second)
GB solution: Syntax: movement of a category head to another category head position, e.g., V to I or C (Haider & Prinzhorn 1986, den Besten 1989)
MP solution: Phonology: covert movement after Spell-Out (Chomsky 2001, Boeckx & Stjepanović 2001)

Phenomenon: Object Shift
GB solution: Syntax: DP movement to specifier position above VP in an extended IP (e.g., AgrOP), licensed by verb movement (Holmberg 1986)
MP solution: Phonology

The Reality of Imperfection and its Implications for Linguistic Theory
If the analyses given above are correct, it is unrealistic to expect language to be a perfect or near-perfect solution to the problem of mapping sound and meaning, and equally unrealistic to expect that all of language's properties can be derived straightforwardly from virtual conceptual necessity. The sorts of optimality-, economy-, and parsimony-driven constraints that advocates of minimalism have emphasized may well play an important role in constraining the nature of language, but if our position is correct, there is likely to be a residue that cannot be derived purely from such a priori constraints.

Beyond Virtual Conceptual Necessity
Two of the most salient forms of this residue (characteristic properties of human languages that do not seem to follow from virtual conceptual necessity) are idioms and the existence of parametric variation between languages that cannot be boiled down to simple differences in word order (Broekhuis & Dekkers 2000). Consider first idiomatic expressions, such as kick the bucket, keep tabs on, extra-grammatical examples of the sort discussed in section 3.1.3, and the many constructional idioms and partially-filled constructions discussed by Culicover & Jackendoff (2005) (e.g., to VERB one's BODY PART off/out, giving us He sang his heart out, He yelled his head off, He worked his butt off, etc.). In the first instance, the very existence of such phenomena does not accord well with minimalist principles: Formal languages, which generally lack idioms, are more economical, more parsimonious, and more elegant. One might ultimately craft a minimalist account of idioms, but it is hard to see how to do so without stretching one's notion of conceptual necessity.
Many seemingly straightforward patches to the Minimalist Program either fail or undermine the overall goals of minimalism. For example, one might suggest that the compositional operation of Merge could apply to units larger than individual words, but as Jackendoff (to appear) notes, on this proposal, partially-filled cases such as 'take X to task' are problematic. If Merge were to target the whole unit directly from the lexicon, it would need to be categorized as a verb rather than a verb phrase (phrases must be created by merging smaller units together), but it is not clear how or why a verb would be allowed to have an open argument position within it, and how this argument position would be filled given that Merge cannot target parts of an undecomposable unit. Alternatively, along the lines of Rögnvaldsson (1993), one might allow syntactic composition rules to operate in the lexicon, but although this might account for cases with an idiosyncratic semantics only, it leaves those cases which also have an idiosyncratic syntax, such as be that as it may, unexplained. Yet another possibility, along the lines of Svenonius (2005), might be to account for idioms in terms of more complex tree structures (Banyan trees) and movement to a position that is part of some unconnected structure (sideward movement, Nunes 1995), but this seems to be a clear case of adding machinery beyond what is conceptually necessary in order to account for the data.⁷ ⁸

⁷ Banishing idioms to the 'periphery' rather than the 'core' does not really help. It may well be that idioms somehow sit outside the regular form-meaning mapping rules of the language, but the fact remains that idioms are pervasive in human languages (Jackendoff, to appear), and that they are absent in formal languages; as such, their existence in human language must be explained.

⁸ Even in approaches that treat idioms in much the same way as non-idiomatic constructions (e.g., Distributed Morphology, Halle & Marantz 1993), complexity lingers, for example, in the form of a post-syntactic idiosyncratic meaning component.

Certain cross-linguistic variation, too, poses difficulties for theories that vest heavily in economy. Consider, for instance, the question of whether a language requires a phonologically overt subject (e.g., English) or not (e.g., Spanish), or of whether in a given language the verb comes before its object (e.g., English) or after (e.g., Japanese). In earlier theories, these questions were answered by appealing to the notion of parameters set during acquisition. Although that explanation still seems reasonable to the present authors, parameters of this sort actually pose difficulties for any orthodox version of minimalism. Take for example the original definition of the pro-drop parameter (Rizzi 1986), according to which the person and number features of the phonologically null subject are determined by the verb it occurs with. While this conjecture is quite reasonable, it poses difficulty for minimalist approaches, in which the person and number features of a verb are determined by the subject of that verb, in an Agree relation. In particular, on minimalist accounts, the null subject is licensed by the agreement features of the verb, and so cannot itself be inherently specified with agreement features; yet the verb's agreement features must be given their value by the null subject. To fix this, additional machinery of some form must be added to the minimalist architecture. One possibility (Alexiadou & Anagnostopoulou 1998) is to stipulate that agreement features are already valued on the verb in languages which allow phonologically empty subjects. This, however, requires stipulating that the distribution of such features differs crosslinguistically, and undermines the idea that a verb is not intrinsically singular or plural, 1st, 2nd, or 3rd person (Kinsella 2009). A second option is to say that null subjects possess the agreement features required to give value to the verb's features (Holmberg 2005). This, on the other hand, requires stipulating that the null subject has its identity already, suggesting that the lexicon must contain multiple null subject entries, and taking the null pronoun very far from its original characterization (Kinsella 2009).
The word order effects that the head directionality parameter gives rise to can be accounted for in the Minimalist Program in one of three ways, but each adds complexity to the system. The first says that the Merge operation which combines lexical items into larger structures is subject to a condition deciding which element of the pair being combined will determine the category of the combined unit (as a simplified example, if a verb and a noun combine, will the unit they form be a verb phrase or a noun phrase?); cf. Saito & Fukui (1998). The second posits a rule in the phonological component of the grammar which looks after the linear order of words, rearranging any orderings which are not permitted in the language in question. This, of course, is simply the type of redistribution of labor (from syntax to phonology) discussed in section 5.3. The third possibility (Kayne 1994) assumes a universal underlying order and invokes movement in the syntactic component, thus requiring additional features to be added in order to drive movement in languages whose surface order differs from the underlying order.
If the restrictions that the Minimalist Program places on language were to be relaxed, better analyses for idioms, or for parametric variation, might be possible. Instead of beginning with the assumption that the system should be optimal, economic and simple, and having to then add to the syntactic machinery in unconvincing and arbitrary ways in order to account for particular facts, it would surely be preferable to admit complexity from the outset and account for the data using rules, operations, and generalizations that apply across the system as a whole. Indeed, alternative frameworks for theorizing about language, which do not place perfection and economy at their core, offer more convincing accounts for these cases.
For example, idioms might be more naturally captured by construction-based approaches to language (e.g., Goldberg 1995, Kay & Fillmore 1999, Culicover & Jackendoff 2005) that posit a continuum of form-meaning mappings (constructions), where individual lexical items sit at the idiosyncratic end of the continuum, and general phrase structure rules, such as VP → V NP, sit at the general end, idioms sitting somewhere in the middle. Hardly elegant (and such theories have their own problems, Crain et al. 2009), but perhaps demanded by the empirical data. The redundancy of lexical storage that emerges from such a position would only be possible in a framework that accepts the existence of imperfection.
Optimality Theory, meanwhile, might lend insights into parametric variation. An optimality-theoretic take on the pro-drop parameter invokes the constraint of SUBJECT (which stipulates that a sentence must have an overt subject), which will be ranked high in languages like English, but will be outranked by many other conflicting constraints in languages like Spanish. This competition between constraints is seen clearly in the explanation for the existence of semantically empty subjects in languages which require an overt subject. The constraint of FULL-INT (which stipulates that all elements in a sentence must have meaning, i.e., expletive elements like 'it' and 'there' are ruled out) is in direct competition with the constraint of SUBJECT (Grimshaw & Samek-Lodovici 1998). In null-subject languages, FULL-INT is ranked higher than SUBJECT, that is, SUBJECT can be violated in order to satisfy FULL-INT. These languages, unlike English, disallow overt expletive elements; the reverse ranking of these two constraints would result in an overt expletive as we get in English. This alternative approach neatly captures the facts as a result of relaxing the demands of perfection and economy. It posits multiple constraints where a more parsimonious system might prefer to posit just one, and it allows (even demands) that these constraints compete, without demanding that a single one-size solution should optimally fit all.
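The competition between SUBJECT and FULL-INT can be sketched as a small evaluation procedure. The following is a minimal illustration under stated assumptions, not an implementation of Grimshaw & Samek-Lodovici's (1998) analysis: the candidate labels, the violation counts assigned to them, and the function name `optimal_candidate` are hypothetical, and only the two constraints named above are modeled. Candidates are compared on the highest-ranked constraint first, with lower-ranked constraints consulted only to break ties.

```python
# Hypothetical violation counts for two ways of realizing a clause whose
# subject carries no meaning (e.g., a weather verb).
CANDIDATES = {
    "overt expletive subject": {"SUBJECT": 0, "FULL-INT": 1},
    "null subject": {"SUBJECT": 1, "FULL-INT": 0},
}


def optimal_candidate(candidates, ranking):
    """Return the candidate with the best violation profile.

    `ranking` lists constraint names from highest- to lowest-ranked.
    Profiles are tuples ordered by that ranking, so Python's tuple
    comparison enforces strict domination: a lower-ranked constraint
    matters only when all higher-ranked ones tie.
    """
    def profile(name):
        return tuple(candidates[name].get(c, 0) for c in ranking)

    return min(candidates, key=profile)


# English-type ranking: SUBJECT outranks FULL-INT.
english_type = optimal_candidate(CANDIDATES, ["SUBJECT", "FULL-INT"])

# Null-subject ranking: FULL-INT outranks SUBJECT.
null_subject_type = optimal_candidate(CANDIDATES, ["FULL-INT", "SUBJECT"])
```

With SUBJECT ranked above FULL-INT the overt expletive wins, and with the reverse ranking the null subject wins. The same two constraints thus yield both language types, and the difference lies entirely in the ranking.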
More broadly, the fact that languages vary is not per se predicted by virtual conceptual necessity: one could easily imagine some species having sound-meaning mappings but having only a single grammar. Likewise, it seems unlikely that one would a priori expect that there would be significant arbitrary variation within a given language; constructed languages do not typically contain irregularities, idioms, and the like. Such variation, within languages and between languages, is characteristic of human language, and indeed among the properties that most markedly differentiate human languages from other formal languages. To put this somewhat differently, if linguistics is to capture what is characteristic of human language, it cannot simply provide a kind of Platonistic conception of what ideal languages would be; it has to describe, and ultimately explain, the character that human languages actually have.

A Recipe for (Bio)linguistics
The recognition that there are possible sources of imperfection in language must be reflected in how the language theorist goes about his day-to-day work. Moving forward, we suggest that the following principles should be followed:
(A) Economy cannot be presumed. Although economy may contribute to the nature of language, one should not add features or operations to the system merely in order to achieve economy at a higher level of explanation.
(B) One should not assume a priori that every property of language is rule-based. Individually stored examples may oppose the clean simplicity of a system that is entirely rule-based, but experimental evidence shows that the most parsimonious account may sometimes be a more complicated one (Pinker 1991, Prasada & Pinker 1993, Marcus et al. 1995).
(C) One should not presume a priori that there is an absence of redundancy. A framework which is compatible with the existence of this imperfection may actually be more correct than one that is not compatible with it.
Biolinguistics is characterized by Boeckx & Grohmann (2007) in the editorial of the inaugural issue of this journal as an interdisciplinary enterprise concerned with the biological foundations of language. In order to fulfill this mission, biolinguists must take seriously insights from other disciplines. If our argument here is correct, at least one strand of recent linguistics, its tendency towards a presumption of perfection, is at odds with two core facts: The fact that language evolved quite recently (relative to most other aspects of biology) and the fact that even with long periods of time, biological solutions are not always maximally elegant or efficient. To our minds, anyway, the presumption of perfection in language seems unwarranted and implausible; a more realistic theory of language may reverse this trend, and look towards possible imperfections as a source of insight into the evolution and structure of natural language.

Table 1: Quirks of language and the lack of optimization in language

Table 2: Shifting burdens of explanation and the Minimalist Program