Edinburgh Explorer Languages and genes: Reflections on biolinguistics and the nature-nurture question



Introduction
With the launch of this journal, the term 'biolinguistics' gains new visibility and credibility, but a clear definition has yet to emerge. In their Editorial in the journal's inaugural issue, Boeckx & Grohmann (2007: 2) draw a distinction between "weak" and "strong" senses of the term. The weak sense is understood to mean "business as usual" for linguists, so to speak, to the extent they are seriously engaged in discovering the properties of grammar, in effect carrying out the research program Chomsky initiated in Syntactic Structures, while the strong sense refers to highly interdisciplinary and broad attempts to provide explicit answers to questions that necessarily require the combination of linguistic insights and insights from related disciplines (evolutionary biology, genetics, neurology, psychology, etc.).
We are concerned with the impact of rapid progress in genetics and cognitive neuroscience on linguists' conceptions of the biological bases of language and on the overarching issue of nature and nurture in linguistics. The particular focus of our discussion is the recent claim (Dediu & Ladd 2007) that there is a causal relationship between genetic and linguistic diversities at the population level, involving two brain growth-related genes and linguistic tone. Our broader aim, however, is to consider the implications of such relationships (assuming that they actually exist) for those who are "seriously engaged in discovering the properties of grammar" and for those who are attempting to "provide explicit answers to [necessarily interdisciplinary] questions" about language as a biological phenomenon. We argue that broad biological findings and insights must eventually inform the work of those whose interests and activities in biolinguistics are covered by Boeckx & Grohmann's weak sense of the term.
We thank the editors for their invitation to comment on Dediu & Ladd's work. DRL's work on this paper was supported by an Individual Research Fellowship from the Leverhulme Trust, DD's by an ESRC (UK) postdoctoral fellowship, and ARK's by a British Academy postdoctoral fellowship.

Languages and Genes
It is now well established that genes affect speech and language in individuals. By this we mean that there are demonstrable associations between inter-individual differences in genetic makeup and inter-individual differences in speech and language abilities. The best known case to date is undoubtedly that of the FOXP2 gene (Hurst et al. 1990, Gopnik & Crago 1991, Lai et al. 2001, Fisher et al. 2003), but it is well established that there are many other links between genetic variation and variation in abilities relevant to speech and language. The study of this type of correlation uses the tools of Behavior Genetics (Plomin et al. 2001, Stromswold 2001), which allow researchers to tackle three kinds of questions: first, to provide estimates of the heritability 1 of various speech and language abilities and disabilities (Stromswold 2001, Felsenfeld 2002, Bishop 2003, Plomin & Kovas 2005); second, to identify specific genetic loci and alleles involved (Fisher et al. 2003, Halliburton 2004, Plomin et al. 2001); and third, to dissect the complex relationships between and within aspects of speech and language (Plomin et al. 2001, Stromswold 2001, Plomin & Kovas 2005). The main conclusions from this fast-developing field seem to be (Dediu 2007: 125) that: (i) speech and language are quite strongly influenced by our genes at the individual level, but the nature and strength of this influence vary greatly across the particular aspects considered; (ii) the best model, both for disorders and the normal range of variation, is one involving many genes with small effects; (iii) some of these genes are generalists while others are specialists; (iv) most speech and language disorders simply represent the low end of the normal distribution of linguistic variability, rather than qualitatively distinct pathologies.
It also seems clear from this work that, in general, the causal links between genes and variability in speech and language are very complex and crucially involve the environment. We shall return to this point shortly.
In addition to connections between individual genetic and linguistic variability, it is also well established that genetic and linguistic diversity are correlated at the level of populations (Cavalli-Sforza et al. 1994, Dediu 2007). That is, geographical inter-population differences in allele frequencies tend to match the distribution of language varieties (e.g., dialects, languages or linguistic families). 2 This match, unlike the ones discussed in the preceding paragraph, is spurious, in the sense that it does not suggest any causal link between genetic differences and linguistic differences. Rather, it is due to past demographic processes which shaped both types of diversity in parallel ways (Cavalli-Sforza et al. 1994, Poloni et al. 1997, Jobling et al. 2004, McMahon 2004): An ancient population split is reflected both in the present-day similarity between the genetic structure of the descending populations and in the close relationship between the language varieties they speak. One example of this approach is represented by the language/farming co-dispersal class of theories (e.g., Diamond 1997, Bellwood & Renfrew 2002, Diamond & Bellwood 2003), which try to explain the present-day world-wide distribution of genetic and linguistic diversities through the expansion of agriculturalists, carrying both their genes and languages in the process.

1 Defined as the proportion of phenotypic variation accounted for by genetic variation (Plomin et al. 2001: 85, Stromswold 2001: 652, Halliburton 2004).

2 It is true that some methods and datasets in the field have been heavily criticized, such as the tendency in earlier studies to make uncritical use of unjustified and/or controversial "historical linguistic" classifications and concepts, i.e. linguistic macrofamilies (Sims-Williams 1998, Bolnick et al. 2004), but the general approach is valid and fruitful.

There is a third possible type of relationship between genetic and linguistic diversity that is not well established, namely between population genetics and language typology. This possibility was explored in a recent paper by two of the authors (Dediu & Ladd 2007), which proposed a connection between the inter-population differences in two human genes and the inter-language distribution of lexical and/or grammatical tone. The two genes are ASPM and Microcephalin, which are known to be involved in brain growth and development. In September 2005, two papers published by the same research group appeared simultaneously in Science (Mekel-Bobrov et al. 2005, Evans et al. 2005), announcing the discovery of two new alleles (haplogroups) of ASPM and Microcephalin, named "the derived haplogroup of ASPM" and "the derived haplogroup of Microcephalin", and denoted here as ASPM-D and MCPH-D respectively. Both these haplogroups are fairly recent (approximately 5.8 thousand years ago for ASPM-D, and 37 thousand years ago for MCPH-D) and, strikingly, show a skewed geographic distribution and signs of recent or even ongoing positive natural selection (Mekel-Bobrov et al. 2005, Evans et al. 2005). Given that these haplogroups are potentially involved in brain size and development, the source of this geographical distribution and natural selection quickly became the focus of intense research. However, to date this research has failed to find the phenotype under selection, meaning that ASPM-D and MCPH-D probably do not determine obvious phenotypic effects; it has now been established that they do not appear to influence normal variation in intelligence (Mekel-Bobrov et al. 2007), brain size (Woods et al. 2006), head circumference, general mental ability, social intelligence (Rushton, Vernon & Bons 2007), or schizophrenia (Rivero et al. 2006).
The proposal of Dediu & Ladd (2007) is that the populations which have a low frequency of these derived haplogroups tend to speak tone languages. Impressionistically, this idea is supported by the apparent visual match between the map of tone languages (as given, for example, by Haspelmath et al. 2005) and the distribution of ASPM-D and MCPH-D (as given by the maps in Mekel-Bobrov et al. 2005 and Evans et al. 2005, respectively). Dediu & Ladd tested this hypothesis statistically using a database of 983 genetic variants (alleles) that sampled the human nuclear genome and 26 linguistic typological features that covered various aspects of phonetics, phonology and morphosyntax in 49 Old World populations. (Complete details on the populations, genetic variants, linguistic features and methodology are given in Dediu & Ladd 2007 or Dediu 2007.) 3 The statistical analysis showed that the distribution of the correlations between genetic and linguistic features strongly supports the hypothesized connection between ASPM-D/MCPH-D and tone. To rule out the possibility that this correlation is of the spurious type discussed above, i.e. due entirely to underlying demographic and linguistic processes, Dediu & Ladd computed the correlation between tone and the two derived haplogroups while simultaneously controlling for geographic distances between populations (a proxy for population contact and dispersal) and historical linguistic affiliation between languages (a proxy for similarity through common descent); the proportion explained by these factors turned out to be minimal (again, details are to be found in Dediu & Ladd 2007 and Dediu 2007). It seems, therefore, that the relationship between tone and the derived haplogroups is not due to these standard factors; instead, it could reflect a causal relationship between the inter-population genetic and linguistic diversities.
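To make the idea of "controlling for" geography concrete, the following is a minimal sketch of a partial Mantel-style correlation between distance matrices. It is an illustration under stated assumptions, not a reproduction of the procedure in Dediu & Ladd (2007): the function names, the permutation scheme, and the synthetic data (a hypothetical shared factor that is independent of geography) are all invented for the example.

```python
import numpy as np

def upper(m):
    """Flatten the strict upper triangle of a square distance matrix."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]

def residuals(y, covariates):
    """Residuals of y after least-squares regression on covariates plus an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(covariates))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def partial_mantel(a, b, controls, n_perm=999, seed=0):
    """Correlation between distance matrices a and b after removing the linear
    effect of the control matrices; p-value from permuting the labels of a."""
    rng = np.random.default_rng(seed)
    cov = [upper(c) for c in controls]
    rb = residuals(upper(b), cov)

    def stat(mat):
        ra = residuals(upper(mat), cov)
        return float(np.corrcoef(ra, rb)[0, 1])

    observed = stat(a)
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(a.shape[0])
        # Permute rows and columns together so the matrix stays a distance matrix.
        if abs(stat(a[np.ix_(p, p)])) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Synthetic illustration only (no real genetic or linguistic data).
rng = np.random.default_rng(1)
n = 12
coords = rng.uniform(0, 10, size=(n, 2))
geo = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
latent = rng.normal(size=n)                    # hypothetical shared factor
freq = latent + 0.3 * rng.normal(size=n)       # stand-in for a haplogroup frequency
tone = latent + 0.3 * rng.normal(size=n)       # stand-in for a tone score
gen_dist = np.abs(freq[:, None] - freq[None, :])
tone_dist = np.abs(tone[:, None] - tone[None, :])

r, p = partial_mantel(gen_dist, tone_dist, [geo], n_perm=199)
print(f"partial r = {r:.2f}, permutation p = {p:.3f}")
```

A matrix of historical linguistic distances could be passed as a second entry in `controls` in the same way. A correlation that stays large after such controls is what makes a purely demographic explanation less attractive, although it does not by itself establish causation.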

From Individual Genetic Diversity to Population-Level Linguistic Diversity
How could such a relationship work? How could having or not having a certain allele in one's genome cause one's language to be tonal or not? We believe that any plausible mechanism relating individual genomes and typological variation in languages must consist of at least two distinct aspects: individual bias and inter-generational cultural transmission of language. We consider the second of these first. The proposed influence of inter-generational transmission is based on the well-accepted notion (e.g., Lightfoot 1979, Lass 1997, Anderson & Lightfoot 2002, Hale 2003, Campbell 2004) that much language change is brought about when children acquire a subtly different grammar from that of their parents. In invoking cultural transmission as a mechanism for genetically influenced typological change, that is, we are simply proposing that a population whose speakers are linguistically biased (for whatever reason) may, over many generations, transform its language in ways that reflect the preponderance of individual biases among language acquirers. This general idea is supported by a number of computer and mathematical models, which show that even slight biases will affect the direction of language change. For example, Daniel Nettle (1999) studied language change and the threshold problem by including the impact of functional biases, and found that they are effective in influencing the trajectory of language change. Kenny Smith (2004) considered "innate" biases of agents (in favor of, neutral to, or against homonymy) and showed that these influence the evolution of vocabulary. A recent mathematical approach using Bayesian learners (Kirby, Dowman & Griffiths 2007) concludes that small learning biases can be amplified by the process of cultural transmission and expressed as universals.
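The amplification of a weak bias by repeated transmission can be illustrated with a toy transmission chain. The sketch below is a deliberately simplified two-language model in the spirit of Bayesian iterated learning, not the model of any of the studies cited above; the function names, the two learner types, and the parameter values are assumptions made for illustration. Each learner hears `n` utterances from a single teacher, and either adopts the most probable language ("map") or samples a language from its posterior ("sample"). The long-run share of language A is computed exactly from the two-state Markov chain.

```python
from math import comb

def transition_to_a(teacher, prior_a, noise, n, learner):
    """Probability that a learner adopts language 'A', given the teacher's language."""
    # Chance that a single utterance shows variant A under each language.
    p = {"A": 1.0 - noise, "B": noise}
    total = 0.0
    for k in range(n + 1):  # k = number of A-variant utterances heard
        like = {h: comb(n, k) * p[h] ** k * (1.0 - p[h]) ** (n - k) for h in p}
        post_a = prior_a * like["A"] / (
            prior_a * like["A"] + (1.0 - prior_a) * like["B"]
        )
        if learner == "map":      # pick the most probable language
            choose_a = 1.0 if post_a > 0.5 else (0.5 if post_a == 0.5 else 0.0)
        else:                     # "sample": pick a language with posterior probability
            choose_a = post_a
        total += like[teacher] * choose_a
    return total

def stationary_share_a(prior_a, noise, n, learner):
    """Long-run proportion of generations speaking language 'A'."""
    to_a_from_a = transition_to_a("A", prior_a, noise, n, learner)
    to_a_from_b = transition_to_a("B", prior_a, noise, n, learner)
    leave_a = 1.0 - to_a_from_a
    return to_a_from_b / (leave_a + to_a_from_b)

# Illustrative settings: a weak prior bias (0.6) toward A, noisy input, two utterances.
for kind in ("sample", "map"):
    share = stationary_share_a(prior_a=0.6, noise=0.3, n=2, learner=kind)
    print(f"{kind:>6} learners: long-run share of A = {share:.2f}")
```

With these settings the sampling learners settle at a share equal to the prior (0.60), whereas the learners that pick the most probable language settle at about 0.85, so a modest individual preference becomes a much stronger population-level tendency. How far this carries over to real populations, with heterogeneous learners and many teachers, is exactly the kind of question the richer models address.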
There are of course additional complications to be addressed: Human populations are rarely uniform in their genetic composition, and they are normally in contact with other populations who may be both genetically and linguistically quite distinct. Dediu (in preparation) computationally analyzes a complex population of heterogeneous agents and finds that an allele biasing the rate of learning of a binary linguistic feature can be amplified by the cultural transmission of language even for weak biases and low population frequencies. Given a relatively weak bias of the sort we discuss below, many factors might override its influence and impact on the trajectory of language change. Among other things, this makes clear that we are not proposing any sort of deterministic relation between genes and language, only a very indirect and probabilistic one; we certainly are not suggesting that there are "genes for Chinese". But we believe that the broad outlines of an explanation based on the interaction of bias and cultural transmission are very plausible indeed. 4

4 Since the publication of Dediu & Ladd (2007), we have learned that a similar idea was proposed half a century ago by Darlington (Darlington 1947, Darlington 1955) and extensively developed by Brosnahan (Brosnahan 1961), based on the apparent correlation between the distribution of blood groups in Europe and the distribution in the European languages of interdental fricatives, front rounded vowels, and various other phonetic types. The idea was largely dismissed at the time (though Brosnahan's book was reviewed in Science; Swadesh 1961), partly because of the taint of racism in the general intellectual atmosphere of the time, partly because the proposal's empirical underpinnings in genetics were necessarily primitive and its statistical approach elementary, and partly because there was no obvious way of ruling out a co-dispersal account even if the apparent correlation was valid. However, Brosnahan does give a very clear account of how variable individual biases or predispositions might affect the development of languages over many generations, which is identical in its essentials to the proposals discussed here.

Now let us consider what we mean by individual bias. We intend the term very broadly to mean anything in a given individual's genetic makeup that somehow inclines the individual to acquire, perceive and/or produce a given linguistic phenomenon in preference to some alternative. Such biases could include a range of cognitive/perceptual and anatomical/physiological factors. A relatively clear example is provided by the case of Italian and Yoruba vowels, discussed nearly thirty years ago by Peter Ladefoged (Ladefoged 1984; see also Disner 1983). Ladefoged noted the existence of small differences in formant values between Yoruba and Italian, which have otherwise very similar 7-vowel systems (namely, /i e ɛ a ɔ o u/), and noted that these differences are consistent with anatomical differences between Africans and Europeans:

Some of the differences between the two languages are due to the shapes of the lips of Italian as opposed to Yoruba speakers. […] [W]ith the exception of /i/ and to a lesser extent /e/, the second formant is lower for the Italian vowels than for the Yoruba vowel. These differences are precisely those that one would expect if Yoruba speakers, on the whole, used a larger mouth opening than that used by the Italian. […] The possibility of overall differences in mouth opening is certainly compatible with the apparent facial differences between speakers of Yoruba and Italian. (Ladefoged 1984: 85-86)

It is uncontroversial that facial anatomy is influenced by genetic makeup and that vowel quality might be affected by facial anatomy. In our terms, the genetically inherited trait (the shape of various components of the face and vocal tract) induces a linguistic bias (a tendency to produce slightly more open or less open vowels). However, this is only half the story. Indeed, Ladefoged goes on to say:
This does not, of course, imply that a Yoruba could not learn perfect Italian. Any individual speaker could compensate for the overall, statistical, difference in headshape […]. (Ladefoged 1984: 86)

This is a critically important qualification. First, it makes clear that individual bias need not be manifested in the behavior of the linguistically mature speaker: It is perfectly obvious that all normal children acquire the language(s) they are exposed to during their first years. Second, and more important, it means that individual bias by itself will not necessarily have long-term effects on the language system. If any Yoruba child can learn perfect Italian or any Italian child perfect Yoruba, the putative effects of facial anatomy on phonetic realization can become manifest, if at all, only through the operation of some further factor.
That factor, we claim, is inter-generational cultural transmission. Ladefoged did not spell this out, but a hypothetical scenario will make clear the kind of thing he might have said if he had done so. Imagine that a group of Yoruba infants, as a result of some inconceivable but irrelevant cataclysm, are brought up in Italy away from any speakers of Yoruba. We can assume that their Italian will be phonetically indistinguishable from that of the Italian speakers with whom they live. Now let us further imagine that these unfortunate children go on to found an Italian-speaking community isolated from contact with other Italian speakers and remaining largely endogamous, i.e. genetically Yoruba rather than Italian. We suggest that, a number of generations downstream, the language spoken by their descendants will exhibit vowels having slightly higher second formants, in line with the larger mouth opening that Ladefoged associates with Yoruba speakers. Any individual Yoruba child of the founder generation, brought up in Italian surroundings, will have learned to produce vowels that acoustically match those it hears; over several generations, however, under the influence of the anatomically determined bias, the community's phonetic norms will drift. This scenario also serves to make a further important point about gene-language links of the sort we are discussing: The linguistic bias in this case is unrelated to any biologically selective pressures that may have given rise to the differences in facial anatomy. That is, genetic differences can affect language without creating selective pressures, and without being due to selective pressures related to language. There is no reason to think that slight differences in vowel quality confer any selective advantage on speakers, even though they are causally related to anatomical traits that are themselves clearly heritable and that may be due to natural selection for some other reason. The linguistic differences can merely be indirect by-products of characteristics that have independently evolved.
The case made by Dediu & Ladd (2007) for a link between ASPM, Microcephalin, and linguistic tone is more complex and more speculative than the example based on Yoruba and Italian vowels, because the nature of the bias is considerably less obvious, but their basic proposal for the interaction of individual bias and cultural transmission is identical. Dediu & Ladd assume that the bias is some sort of cognitive or perceptual preference arising from structural differences in the areas of the brain involved in language and speech. Detailed mechanisms remain hypothetical, but Dediu & Ladd sketch one proposal for the kind of structural differences that might be involved, and point to a range of studies showing that genes have an important impact on the normal inter-individual variation in brain anatomy and physiology, including the areas involved in language and speech (e.g., Bartley, Jones & Weinberger 1997, Pennington et al. 2000, Thompson et al. 2001, Wright et al. 2002, Scamvougeras et al. 2003, Giedd, Schmitt & Neale 2007). They concede that it is by no means clear what sort of cognitive or perceptual bias might induce a preference for or against linguistic tone, though they suggest that it may relate to a preference for having phoneme-sized units that are strictly linearized (as in a non-tonal language) or for allowing phonemes to occur simultaneously (as in a tone language) (Ladd, in preparation). Importantly, they also note that, as with the case of facial anatomy and vowel quality, the putative linguistic bias could be completely unrelated to the selective pressures that may be driving the spread of the derived haplogroups of ASPM and Microcephalin. There is no reason to think that there is any selective advantage to speaking a tonal or a non-tonal language, since both types of languages serve as supports for a wide range of complex human cultures.
If we wanted to use the proposed bias to explain the strong natural selection of the derived haplogroups argued for by Mekel-Bobrov et al. (2005: 1722), the difference in biological fitness (however defined) between tonal and non-tonal languages would have had to be so obvious that Dediu & Ladd's finding would be old news. Instead, it is most likely that the proposed linguistic effects of ASPM-D and MCPH-D are selectively neutral by-products, and that the naturally selected phenotypes of these genes must be sought elsewhere. The latter is a topic well beyond the scope of our brief remarks here.

Nature, Nurture, and the Language Faculty
If genes can affect language through the mechanisms discussed here, what does this mean for the biological basis of language? We think that, most importantly, it provides a further illustration of the fact that there is a fundamentally complex and irreducible interaction between one's genes and one's language and culture, that is, between nature and nurture. A clear example of this interaction, from a very different domain, is provided in a recent paper (Caspi et al. 2007): Caspi and colleagues found that breastfed children tend to have higher IQs than non-breastfed children, but only if they possess a specific variant of the FADS2 gene, allowing them to actually process the human-specific long-chain polyunsaturated fatty acids present in mothers' milk, which "are thought to be important for cognitive development because they are required for efficient neurotransmission […] and are involved in neurite outgrowth, dendritic arborization, and neuron regeneration after cell injury […]" (Caspi et al. 2007: 18860). Thus, if a baby is breastfed (nurture) but does not have the appropriate genome (nature), or does have the genome but is not breastfed, there is no positive effect on its IQ. For such an effect to appear, it is required that both nature and nurture are present and "cooperate". Genes are expressed through the environment, and not, as suggested by the unfortunate catchphrase "nature versus nurture", in spite of the environment or independently of it. The case of breastfeeding and the FADS2 gene is just one example of the interaction between nature and nurture; many others can be found in the biological literature under the headings of "extended phenotype" (Dawkins 1982), "niche construction" (Odling-Smee, Laland & Feldman 2003), and "phenotypic plasticity" (West-Eberhard 2003). All this literature suggests that we have to move beyond simplistic slogans and embrace the complexity of genotype-environment interactions.
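The breastfeeding and FADS2 pattern described above can be written, as a rough sketch, as a model with a pure interaction term. The notation is ours and purely illustrative, not the model fitted by Caspi et al.: $G \in \{0,1\}$ marks presence of the relevant gene variant, $B \in \{0,1\}$ marks breastfeeding, and the linear form is an assumption.

```latex
\mathrm{IQ} = \beta_0 + \beta_G\,G + \beta_B\,B + \beta_{GB}\,(G \times B) + \varepsilon,
\qquad \beta_G = \beta_B = 0,\quad \beta_{GB} > 0
```

Under these assumptions neither factor has an effect on its own; the expected gain $\beta_{GB}$ appears only when $G = B = 1$, which is the sense in which nature and nurture must "cooperate".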
For the specific case of genes biasing language, the causal chain leading from genes to their phenotype flows not only through the individual's immediate environment and the individual's effects on it, but through a temporally and culturally-mediated environment, including the individual, as well as the individual's linguistic peers and their descendants over many generations. In the case of language, that is, the nature-nurture interaction fundamentally includes time, in the form of repeated transmission of cultural information across generations. This is the most obvious lesson to be drawn from cases like those discussed by Ladefoged (1984) and by Dediu & Ladd (2007).
A more subtle, and probably more important, consequence is that the capacity for language (in its broad sense) is not fixed and uniform across the species, but diverse and dynamic. It can vary from individual to individual, and it can change gradually over time. This would be a commonplace for anyone taking an evolutionary stance and regarding language as a biological phenomenon that has resulted from biological evolutionary processes, but it sits uneasily with the idea of language as a perfect and economical system (Kinsella, forthcoming). There is a wealth of data showing that human evolution did not stop at a conveniently chosen moment in the past (be it around 200,000 years ago, when, presumably, the Homo sapiens species arose, or 10,000 years ago, when agriculture and civilization as we define it began). Rather, it continues to act on various aspects of the human body, brain and mind (see, for example, Mekel-Bobrov et al. 2005, Evans et al. 2005, Voight et al. 2006). The two linguistic examples we have considered both deal with phonetic and phonological aspects of language, but there are no principled reasons for excluding morphosyntax or semantics from the discussion. Linguistic theorizing in general, and biolinguistics in particular, must take into account and integrate the idea that human linguistic capacities are variable and probably still evolving. This does not rule out the existence of genetically determined universals of language. Indeed, the existence of the type of genetic influence on typological linguistic features discussed by Dediu & Ladd would seem to increase the plausibility of claiming that some properties of language have deep cognitive, and ultimately genetic, causes, though, of course, the very lack of variation implicit in the definition of absolute universals makes it difficult to evaluate such claims empirically.
That is, some linguistic universals may result from biases that are due to genes fixed or near fixation in the human species, a possibility that fits very well with the Chomskyan research program that forms the basis of Boeckx & Grohmann's "weak sense" of biolinguistics (see also especially Anderson & Lightfoot 2002). At the same time, however, if we accept the possibility of genetic explanations both for some universal properties of language and for some cases of typological variation, it is difficult to avoid the implication that the capacity for language has evolved through the standard mechanisms of evolutionary biology, in a gradual manner, and that it continues to do so. We therefore think that the most important task for biolinguistics is to inform linguistic theorizing by putting a strong emphasis on the evolutionary adequacy of linguistic ideas (Kinsella, forthcoming). This can only be achieved if we adopt Boeckx & Grohmann's "strong" sense of biolinguistics.
We are not suggesting that "business as usual" for linguists should be abandoned; this endeavor has yielded enormous results over the past decades. Indeed, we believe that a new and better account of the mystery of human language will only come from a truly interdisciplinary approach, one that brings together linguists and others in equal measure, making use of their respective methodologies with a full understanding of their assumptions, and trying to resolve any incompatibilities using shared standards of falsifiability and argumentation. Yet we also believe that we must keep in mind Theodosius Dobzhansky's (1973) famous dictum that "nothing in biology makes sense except in the light of evolution". Everything in biolinguistics must ultimately be confronted with, and eventually reconciled with, known evolutionary theory. Unless evolutionary concerns are taken seriously, the point of proclaiming a new field of biolinguistics remains obscure.