A Parallel Derivation Theory of Adjuncts

Daniel Milway*1

Biolinguistics, 2022, Vol. 16, Article e9313,

Received: 2022-04-14. Accepted: 2022-06-11. Published (VoR): 2022-10-19.

Handling Editor: Kleanthes K. Grohmann, University of Cyprus, Nicosia, Cyprus

*Corresponding author at: 5-6 Boulton Dr., Toronto, ON, Canada, M4V 2V4. E-mail:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


I present and argue for a theory of adjuncts according to which adjuncts and their respective hosts are derived as separate, parallel objects that are not combined until forced to by the process of linearization. I formalize the notion of the workspace, and the workspace-based operation MERGE. Finally, I show that this approach to adjuncts naturally accounts for Adjunct Islands and Parasitic Gaps and is consistent with adjective ordering constraints.

Keywords: syntax, theory, adjuncts, derivations, minimalism

1 Introduction

Adjuncts and the process of adjunction which produces them occupy a somewhat paradoxical place in biolinguistic grammatical theory, being both ubiquitous and peripheral. They are empirically ubiquitous—a language without adjuncts would be remarkable, and it is quite difficult to even use language without adjuncts—but they are theoretically peripheral—formal theories of grammar, as I argue in Section 3, generally do not predict adjuncts and some seem to predict that adjuncts ought not exist. This has made adjuncts into something of a thorn in the side of grammatical theorists, stopping them from developing a complete and uniform theory of grammar. In this paper, I propose that, while one recent theoretical development in biolinguistics/minimalism—the decoupling of phrase-building and labeling—has closed off one possible route to explaining adjuncts, another development—derivation by workspace—has opened up another such route.

The question of adjuncts can be put as follows. How is (1) structured/derived such that (i) it means what it means, and (ii) (2)–(4) are grammatical and mean what they mean?


(1) Rosie sang the song with gusto.

(2) Rosie sang the song.

(3) Rosie sang the song with gusto before dinner.

(4) Rosie sang the song before dinner with gusto.

The answer that I propose in this paper is, in its most basic expression, that adjuncts (i.e., with gusto and before dinner in (1)–(4)) and their hosts (i.e., Rosie sang the song in (1)–(4)) are derived separately from each other and only “joined” post-syntactically. This conjecture, it should be noted, is not completely novel. Indeed, Chomsky (1965) conjectured that . . .

[many Adverbials] are Sentence transforms with deleted Subjects. Thus underlying the sentence “John gave the lecture with great enthusiasm,” with the Adverbial “with great enthusiasm” is the base string “John has great enthusiasm” . . . with the repeated NP “John” deleted as is usual[.] (pp. 218–219)

Similarly, Lebeaux (1988, p. 151) proposes the operation Adjoin-α, which “is simply an operation joining phrase-markers.” It would, of course, be easy to answer theoretical questions if all one had to do was conjecture as I have just done. The task of the theorist is to show that such a conjecture can be made to follow from an independently plausible theory, and that is the task taken up in this paper.

The goal of this paper is to propose a theoretical explanation of grammatical adjuncts and adjunction. I will begin in Section 2 with some remarks on the empirical scope of my proposal. I continue in Section 3 by laying out my relevant theoretical assumptions with special reference to Simplest Merge (Collins, 2017) and workspaces (Chomsky, 2020). Next, I make my proposal explicit in Section 4, starting at a very coarse grain and getting progressively finer. After that, I discuss some facts that are naturally accounted for by the proposal in Section 5 and some facts that seem to contradict my theory in Section 6, and compare my approach with some contemporary attempts to explain adjuncts in Section 7. Finally, I conclude, discussing the implications of my proposal for the broader theory of grammar in Section 8.

2 What Is This Paper About?

As I mention above, this paper proposes a theory of adjuncts, and each reader likely has their own rough-and-ready pretheoretic or quasitheoretic notion of what an adjunct is. More than likely this notion is based on a prototype of adverbs, adjectives, prepositional phrases, or the union of all of these categories—indeed perhaps all of my examples of adjuncts will take the form of adjectives, adverbs or prepositional phrases. This notion, no doubt, has furnished each reader with a battery of tests for any would-be theory of adjuncts—a set of facts that a theory of adjuncts must account for. Expecting or requiring that a theoretical definition of some aspect of nature perfectly match a pretheoretic notion of that aspect of nature is a fool’s errand—the explanatory domain of a theory rarely, if ever, matches any pretheoretic notion, nor should we expect it to.

There are at least two reasons that we ought not to expect the domain of any theory to match our pretheoretic notion. The first is that the very rationale for theoretical investigation of some aspect of nature is our lack of pretheoretic understanding of that aspect of nature. The first step towards theoretical explanation of something, then, is the realization that our intuitive understanding of it is flawed. It is therefore inconsistent to require that implications of a pretheoretic notion be carried over to an explanatory theory.

The second reason is that, historically, the domains of explanatory theories are rarely if ever coextensive with the pretheoretic domain. In one sense, the process of theorizing narrows the domain but, in another sense, explanatory theories tend to have an unexpectedly broad domain. For instance, Generative Grammar doesn’t address all of the phenomena covered by the commonsense term “language,” but the theory has also been used to provide explanations for aspects of the human faculty of music (Mukherji, 2012) and arithmetic (Chomsky, 2020). Similarly, pre-Galilean (i.e., Aristotelian) mechanics covered all variety of earthly motion and change, including plant growth, but excluded the motion of the stars and planets, which belonged to the separate field of cosmology (Feyerabend, 1993). So, requiring a theory to meet our pretheoretic expectations may preclude theories with surprising explanatory depth.

The case of adjuncts and adjunction, though, is complicated by the fact that, broadly speaking, the current understanding of them is not exactly pretheoretic. As I discuss in the following section, the term “adjunct” had a precise theoretical meaning in various versions of X-bar theory, but more broadly, the term refers to a possible class of sub-expressions which do not fit neatly into grammatical theory. In this sense, we could describe the ideas of adjuncts/adjunction as held by syntacticians to be extratheoretic. Yet, so long as it is made somewhat explicit, a pretheoretic notion of any phenomenon is a crucial starting point for any theoretical work, the present one being no exception. So, as a pretheoretic notion, I take adjuncts to be parts of linguistic expressions which are optional, stackable, and freely-orderable, and adjunction to be the process by which adjuncts are introduced to an expression. Two important notes to make regarding this “definition” are that (i) it is a conjunction, not a disjunction, of three properties—every adjunct seems to have all three—and (ii) it is at best a heuristic device—my theory will take it to be the “base case” for adjuncts. So for instance, the data on adjective and adverbial ordering restrictions that motivates cartography/nanosyntax does not seem to meet my definition of adjunction and therefore that data does not seem to involve adjunction, though I complicate this view somewhat in Section 5.3. Indeed, as we shall see in Section 6, much of the data that seems to contradict the theory I propose involves expressions that do not meet the heuristic definition of adjuncts/adjunction and therefore can be set aside.

3 Theoretical Context

The current proposal is situated in the biolinguistic/minimalist theory of grammar. The core conjecture of this theory is that the human language faculty is a mentally-instantiated computational procedure which generates an infinite array of structured expressions by the recursive application of the simplest combinatory operation Merge. The task of theorizing under this approach can be divided into two related subtasks—the formalization of the operation Merge, and the formalization of the derivational architecture. While the former has largely been the centerpiece of the minimalist program, the latter has been brought into sharp relief quite recently. In this section I will discuss current approaches to the two subtasks with reference to adjuncts where relevant, followed by some comments on the other cognitive systems with which the language faculty interacts.

3.1 Merge and Adjuncts

From the earliest work in transformational grammar (Chomsky, 1957, 1965) up until early theories in the minimalist program (Chomsky, 1995, 2000) the generative component of the language faculty was divided into a base subcomponent, and a transformational subcomponent. In all of these theories the base included both the mechanism for generating complex structures from simple items, and the mechanism for labeling those structures. The latter was written directly into the particular phrase-structure rules of the early theories, then derived from general X-bar principles in later theories and finally assigned by early definitions of Merge, given below in (5) where the choice of the label γ was generally assumed to follow X-bar principles. This was assumed despite the fact—recognized later by Chomsky (2013) and Collins (2002)—that the theoretical innovations of minimalism had eliminated any principled basis for X-bar theory—more on this later.


(5) Merge_v1(α, β) → {γ, {α, β}}

Theorists working within the minimalist program, however, have put forth various proposals for decoupling labeling from Merge—Collins (2002, 2017) and Seely (2006) simply eliminate labels; Chomsky (2013) eliminates labels from the narrow syntax, relegating them to an interface process; Hornstein (2009) proposes that Merge and Label are separate operations in the syntax. Most of those theorists1 have settled on the definition of Merge in (6), sometimes called “Simplest Merge”.


(6) Merge_simplest(α, β) → {α, β}

This move, though seemingly a minor one, has major implications for the theory of grammar generally and the possibilities for a theory of adjuncts more particularly.
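The contrast between (5) and (6) can be made concrete with a small sketch. The Python below is purely illustrative and not part of the formal proposal: it assumes that atoms can be modeled as strings and complex syntactic objects as frozensets, and the function names merge_v1 and merge_simplest are hypothetical.

```python
# Illustrative sketch only. Atoms are strings, complex objects are frozensets.
# The function names are hypothetical and chosen for this example.

def merge_v1(alpha, beta, label):
    """Labeled Merge as in (5): returns {label, {alpha, beta}}."""
    return frozenset({label, frozenset({alpha, beta})})

def merge_simplest(alpha, beta):
    """Simplest Merge as in (6): returns the unlabeled set {alpha, beta}."""
    return frozenset({alpha, beta})

# Build {sing, {the, song}} with Simplest Merge.
dp = merge_simplest("the", "song")
vp = merge_simplest("sing", dp)

# The labeled variant has to be told which label to use; "V" is a stand-in.
labeled_vp = merge_v1("sing", dp, "V")
```

Because the output of merge_simplest is a plain set, it encodes neither a label nor a left-to-right order, so merge_simplest("the", "song") and merge_simplest("song", "the") are the same object.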

A move to a label-free definition of Merge has implications for the theory of adjuncts because the theories of adjuncts within X-bar theories and early minimalist theories depended on the nature of labels and their importance for the c-command relation. For instance, Lebeaux (1988) proposed a transformation Adjoin-α which attaches an adjunct phrase to the maximal projection of a host phrase and then labels the resulting structure with the label of the host phrase as shown in Figure 1. In contrast, Chametzky (1996), critiquing Lebeaux’s proposal, argues that the node created by adding an adjunct is unlabeled.

Figure 1

Adjoin-α (Lebeaux, 1988)

Stepanov (2001) adapts Lebeaux’s theory of adjuncts to an early minimalist theory and argues that adjuncts can be added counter-cyclically without violating what he terms the least tampering principle—defined in (7).


(7) Least Tampering (Stepanov, 2001, p. 102)

Given a choice of operations applying to a syntactic object labeled α, select one that does not change @(α).

@(X): a set of c-command relations in a syntactic object labeled X.

Stepanov (2001, p. 101) further defines c-command and domination as in (8).

(8)
  1. α c-commands β if neither α nor β dominates the other and the first branching node that dominates α dominates β.

  2. α is dominated by β only if it is dominated by every segment of β.

His argument runs as follows. Supposing adjunction proceeds more or less as schematized in Figure 1, the adjoined PP is dominated only by a segment of NP. Therefore, neither the NP nor the PP c-commands the other or the other’s contents. Thus, no new c-command relations are formed by merging the PP counter-cyclically and Least Tampering is not violated. So, Stepanov’s argument for late merge depends on the adoption of a particular theory of labels resembling that of G&B theories.

Regardless of the soundness of these proposals within their respective theories, they all crucially assumed a generative procedure in which labeling and structure building were intrinsically linked. Therefore, none of these theories of adjuncts can be neatly translated into a theory in which labeling and structure building are separate from each other.

The move to a “Simplest Merge” theory of syntax, then, demands a novel theory of adjuncts. Chomsky (2004, 2013) has suggested that adjuncts are the result of an operation Pair-Merge which creates ordered pairs rather than sets, as demonstrated in (9), with angle brackets—⟨·⟩—indicating an ordered set.


(9) Pair-Merge(α, β) → ⟨α, β⟩

This conjecture, though, does not constitute a novel theory of adjuncts, as there has been little to no effort to demonstrate that the empirical properties of adjuncts follow from Pair-Merge—it captures the general observation that the Host-Adjunct relationship is asymmetric, but does not predict stackability, conjunctive interpretation, island-hood, etc.2 So, Simplest Merge theories of syntax lack a theory of adjuncts.

3.2 The Derivational Architecture

Early minimalist theorizing focused on simplifying the architecture of the grammar by eliminating levels of representation like D-Structure and S-Structure in favor of a single derivational cycle with interfaces to independent cognitive systems. Discussion of the architecture of that derivational cycle, though, has been quite limited until recently. Generally, it has been assumed that a given sentence is generated from a finite lexical array in a single linear derivation, perhaps punctuated by phases.

Recently, though, there has been increasing interest in the idea that a sentence is derived in possibly multiple subderivations, each corresponding to either the clausal spine of the sentence or its complex constituents (Chomsky, 2020; Collins & Stabler, 2016; Nunes, 2004). So, for instance, a transitive sentence like (10) would be derived in three subderivations—one corresponding to the clausal spine, and one each for the nominal arguments.


(10) The customers purchased their groceries.

Chomsky (2020) gives an explicit argument for the idea of subderivations based on extensions of Merge—Parallel Merge (Citko, 2005), in particular—which exploit the fact that the domain of Merge is rather undefined. Take, for example, the hypothetical stage of a derivation in (11) consisting of an already constructed phrase {α, β} and an atomic object γ. Note that the square brackets used in (11) indicate a set, the type of which is irrelevant and as yet undefined.


(11) [{α, β}, γ]

At this stage, according to Chomsky, there should be two basic options—Internal Merge and External Merge. Internal Merge would involve Merging α or β with the set {α, β} resulting in a stage resembling (12), while External Merge would involve Merging γ with the set {α, β} resulting in the stage (13).


(12) [{β, {α, β}}, γ]

(13) [{γ, {α, β}}]

Parallel Merge, though, involves Merging α or β with γ to give a stage resembling (14).


(14) [{α, β}, {β, γ}]

This, Chomsky argues, is an inevitable but unacceptable result of defining Merge as in (6), as it could be used to violate any conceivable locality constraint.

The solution that Chomsky proposes is to redefine Merge as an operation not on syntactic objects per se but on workspaces which contain syntactic objects. Following Chomsky, I will refer to this new version of Merge as MERGE (pronounced “capital merge”). I will propose formal definitions of workspaces and MERGE in Section 4.1.3, but some properties of these constructs are worth mentioning here. The objects that we called stages of derivations—e.g., (11)—are in fact workspaces. The distinction between the two terms—“stage” and “workspace”—is analogous to the distinction between “theorem”/“lemma” and “formula” in proof theory—in both cases, the former are a subspecies of the latter that are demonstrably derivable by a system of axioms and rules. So, while we can arbitrarily construct any number of well-formed workspaces for our purposes, there is no guarantee that all of them will be derivable from the lexicon by the grammatical operations. The new operation, MERGE, operates on workspaces as sketched in (15) where (i) X and Y are syntactic objects, (ii) WS and WS′ are workspaces, (iii) either X and Y are in WS or X is in WS and contains Y, and (iv) WS′ contains {X, Y} but does not contain X or Y.



(15) MERGE(WS, X, Y) → WS′

Where WS′ = (WS − {X, Y}) ∪ {{X, Y}} in the case of External Merge

or WS′ = (WS − {X}) ∪ {{X, Y}} in the case of Internal Merge.
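The two cases of MERGE can be sketched in a few lines of Python. This is only an illustration under stated assumptions: atoms are strings, complex objects and workspaces are frozensets (workspaces are treated as unordered here, which Section 4 revises), and the names contains and MERGE are hypothetical helpers. The sketch also assumes that, for Internal Merge, either argument may be the containing object, which matches the way MERGE is invoked in the sample derivation.

```python
# Illustrative sketch only. Atoms are strings; complex objects and workspaces
# are frozensets. All names here are hypothetical.

def contains(x, y):
    """True if y occurs inside x at any depth (y is a proper term of x)."""
    if not isinstance(x, frozenset):
        return False
    return any(member == y or contains(member, y) for member in x)

def MERGE(ws, x, y):
    """Map workspace ws to a new workspace that contains {x, y}."""
    new = frozenset({x, y})
    if x in ws and y in ws:                 # External Merge
        return (ws - {x, y}) | {new}
    if x in ws and contains(x, y):          # Internal Merge, x is the container
        return (ws - {x}) | {new}
    if y in ws and contains(y, x):          # Internal Merge, y is the container
        return (ws - {y}) | {new}
    raise ValueError("x and y are not accessible to MERGE in this workspace")

ab = frozenset({"α", "β"})
ws11 = frozenset({ab, "γ"})                 # the workspace in (11)
ws12 = MERGE(ws11, "β", ab)                 # Internal Merge, as in (12)
ws13 = MERGE(ws11, "γ", ab)                 # External Merge, as in (13)

# The Parallel Merge step behind (14) combines β (inside one object) with γ
# (a separate object). Neither condition is met, so the sketch rejects it.
try:
    MERGE(ws11, "β", "γ")
    parallel_merge_allowed = True
except ValueError:
    parallel_merge_allowed = False
```

On this way of setting things up, the exclusion of Parallel Merge is not an extra stipulation: it falls out of requiring that the two merged objects either both be members of the workspace or stand in a containment relation.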

Setting aside issues of formalization for the time being, the workspace-based theory proposed by Chomsky (2020) suggests a picture of syntax wherein the derivation of, say, (10) is as given in (16), abstracting away from certain representational details and ignoring Transfer for the time being.3


(16) The derivation of (10)

  1. WS1 = [the, their, customers, purchase, groceries, Voice, Tpst, C]

    MERGE(WS1, their, groceries) → WS2

  2. WS2 = [{their, groceries}, the, customers, purchase, Voice, Tpst, C]

    MERGE(WS2, the, customers)→ WS3

  3. WS3 = [{their, groceries}, {the, customers}, purchase, Voice, Tpst, C]

    MERGE(WS3, purchase, {their, groceries})→ WS4

  4. WS4 = [{purchase, {their, groceries}}, {the, customers}, Voice, Tpst, C]

    MERGE(WS4, Voice, {purchase, . . . })→ WS5

  5. WS5 = [{Voice, {purchase, {their, groceries}}}, {the, customers}, Tpst, C]

    MERGE(WS5, {the, customers}, {Voice, . . . })→ WS6

  6. WS6 = [{{the, customers}, {Voice, . . . }}, Tpst, C]

    MERGE(WS6, Tpst, {{the, customers}, {Voice, . . . }})→ WS7

  7. WS7 = [{Tpst, {{the, customers}, {Voice, . . . }}}, C]

    MERGE(WS7, {the, customers}, {Tpst, . . . })→ WS8

  8. WS8 = [{{the, customers}, {Tpst, {{the, customers}, {Voice, . . . }}}}, C]

    MERGE(WS8, C, {{the, customers}, {Tpst, . . . }})→ WS9

  9. WS9 = [{C, {{the, customers}, {Tpst . . . }}}]
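The nine workspaces above can be replayed mechanically. The following Python sketch is an illustration rather than the formalization promised for Section 4: lexical items are plain strings, workspaces and syntactic objects are frozensets, Transfer is ignored, and contains and MERGE are hypothetical helper names.

```python
# Illustrative sketch only: replaying the derivation of (10) step by step.

def contains(x, y):
    """True if y occurs inside x at any depth."""
    return isinstance(x, frozenset) and any(m == y or contains(m, y) for m in x)

def MERGE(ws, x, y):
    """External Merge if x and y are both in ws, else Internal Merge."""
    new = frozenset({x, y})
    if x in ws and y in ws:
        return (ws - {x, y}) | {new}
    for host, other in ((x, y), (y, x)):
        if host in ws and contains(host, other):
            return (ws - {host}) | {new}
    raise ValueError("inaccessible arguments")

S = frozenset
ws = S({"the", "their", "customers", "purchase", "groceries", "Voice", "Tpst", "C"})

obj, subj = S({"their", "groceries"}), S({"the", "customers"})
ws = MERGE(ws, "their", "groceries")        # WS2
ws = MERGE(ws, "the", "customers")          # WS3
vp = S({"purchase", obj})
ws = MERGE(ws, "purchase", obj)             # WS4
voice_bar = S({"Voice", vp})
ws = MERGE(ws, "Voice", vp)                 # WS5
voice_p = S({subj, voice_bar})
ws = MERGE(ws, subj, voice_bar)             # WS6: External Merge of the subject
t_bar = S({"Tpst", voice_p})
ws = MERGE(ws, "Tpst", voice_p)             # WS7
tp = S({subj, t_bar})
ws = MERGE(ws, subj, t_bar)                 # WS8: Internal Merge of the subject
cp = S({"C", tp})
ws = MERGE(ws, "C", tp)                     # WS9
```

The final workspace contains exactly one object, and the subject {the, customers} occurs in it twice as a result of the Internal Merge step that yields WS8.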

3.3 The Language Faculty and Other Cognitive Systems

Thus far, I have only been discussing the human capacity for combining meaningful expressions to create larger meaningful expressions, often called the narrow faculty of language (FLN). Many of the empirical properties of language, though, spring from how the FLN interacts with other cognitive systems, namely the sensorimotor (SM) system which produces and processes external expression of language and the conceptual-intentional (CI) system which uses linguistic objects for mind-internal processes such as planning and inference. These are called systems rather than modules to indicate that they seem to be multifaceted, likely consisting of numerous interacting modules. The complexity of these systems is reflected in the difficulty of developing unified theories of morpho-phonology and semantics-pragmatics. While I will not be wading too deep into these waters, any theorizing regarding FLN requires getting one’s feet wet. In this section I will discuss the aspects of the SM and CI systems and their respective interactions with FLN insofar as they will be relevant to my theory of adjuncts. Specifically, I will discuss the SM problem of mapping hierarchical structures to linear ones, the CI problem of compositionality, and the problem of distinguishing copies from repetitions which affects both systems.

In Section 3.1, I discussed the fact that Simplest Merge decoupled phrase structure from labeling. What I neglected to mention was that it also decoupled phrase structure from linear order—the set {α, β} could just as easily be linearized as αβ or βα. In order to express a linguistic object, either in speech, sign, or writing, that object must be at least partially4 put in a linear order. The linear order, then, must be derivable from the structures created by FLN by various principles and parameters in a way which is definite within a language but particular to that language. One of those principles is Richard Kayne’s (1994) Linear Correspondence Axiom (LCA), a version of which is given in (17).


(17) The Linear Correspondence Axiom

For syntactic objects x and y, if x asymmetrically c-commands y, then x ≺ y.

The key insight of the LCA is that asymmetric c-command is equivalent to linear precedence in that both are antisymmetric—if x ≤ y and y ≤ x then x = y—and transitive—if x ≤ y and y ≤ z then x ≤ z. One need not look very far to find the shortcomings of the LCA qua theory of linearization, and likely it is only one of the many axioms at play in the linearization process. But regardless of its shortcomings, the LCA is an important proof of concept, showing that linear ordering can be derived from structure without being encoded directly in it.
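To see how precedence statements can be read off an unordered structure, consider the following Python sketch. It is a deliberate simplification: it applies (17) only to atoms, it assumes that x c-commands y just in case the sister of x is y or contains y, and it assumes that every atom in the structure is a distinct string. The function names are hypothetical.

```python
# Illustrative sketch only: deriving precedence pairs from asymmetric c-command.

def contains(x, y):
    """True if y occurs inside x at any depth."""
    return isinstance(x, frozenset) and any(m == y or contains(m, y) for m in x)

def terms(so):
    """Yield so and everything inside it."""
    yield so
    if isinstance(so, frozenset):
        for member in so:
            yield from terms(member)

def c_commands(root, x, y):
    """x c-commands y if the sister of x is y or contains y (an assumption)."""
    for t in terms(root):
        if isinstance(t, frozenset) and x in t:
            for sister in t - {x}:
                if sister == y or contains(sister, y):
                    return True
    return False

def lca_pairs(root):
    """Pairs (x, y) of atoms such that x asymmetrically c-commands y."""
    atoms = {t for t in terms(root) if isinstance(t, str)}
    return {(x, y) for x in atoms for y in atoms
            if x != y and c_commands(root, x, y) and not c_commands(root, y, x)}

S = frozenset
root = S({"a", S({"b", S({"c", "d"})})})    # {a, {b, {c, d}}}
pairs = lca_pairs(root)
```

For this structure the sketch orders a before b, c, and d, and b before c and d, but it yields no statement about c and d, since the two most deeply embedded atoms c-command each other. This is one simple instance of the kind of shortcoming mentioned above.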

Turning to the CI system, I will now address what I, perhaps misleadingly, called the problem of compositionality, which tends to be taken as the semanticist’s counterpart to the linearization problem. The problem is usually stated as follows: The FLN generates hierarchically structured expressions but the CI system operates on formulas of a likely higher-order predicate calculus. To solve this problem, semanticists propose various compositional principles such as function application, predicate modification (Heim & Kratzer, 1998), event identification (Kratzer, 1996), and existential closure (Heim, 1982), among others. The degree to which the problem as stated exists, though, has been called into question within biolinguistic/minimalist theorizing. Chomsky (2013, and elsewhere) argues that language is primarily an instrument of thought, which contradicts the premise that linguistic objects must be transformed into or mapped onto thought objects. If linguistic objects are thought objects, then such a premise would be akin to requiring that one convert US Federal Reserve notes to US dollar bills before engaging in commerce. I will be adopting this position with two caveats. First, to say that the problem of compositionality as stated is non-existent is not to say that there are no problems of linguistic interpretation. We will encounter several as I propose and refine my theory of adjuncts. Second, I will on occasion choose to represent the interpretation of some expression in formal logic when such a representation is the most perspicuous way to demonstrate some relevant property of the expression. This is not to say that formal logic has any sort of privileged status, only that it may sometimes be a useful way to highlight certain properties of expressions.

Finally, I must discuss the copy-repetition distinction. Simplest Merge, which de-coupled phrase-structure from labeling, also combined phrase structure and transformations as its external and internal modes of operation respectively. While External Merge adds a new item to a syntactic object, Internal Merge merges one object with an object that that object contains as demonstrated in (18).


(18) Merge_simplest(β, {α, β}) → {β, {α, β}}

The two βs on the righthand side of the arrow in (18) are copies of each other which means that the object represented on the righthand side of the arrow here doesn’t contain two βs but rather, that β is in two positions in the newly created object. To make this more concrete, consider the passive in (19) and its approximate syntactic representation in (20).


(19) A man was seen.

(20) {{a, man}, {T, {. . . {vpass, {see, {a, man}}} . . . }}}

By hypothesis, (20) is formed by Internal Merge, combining the theme a man with the TP that contains it, making the two instances of {a, man} copies of each other. Because the two instances are copies of each other, they are really only one object and therefore, they refer to the same individual and are pronounced only once. Compare this to the active in (21) and its approximate syntactic representation in (22).5


(21) A man saw a man.

(22) {{a, man}, {T, {. . . {vact, {see, {a, man}}} . . . }}}

In this case, the two instances of {a, man} are not copies of each other, but merely repetitions. So, the lower instance was Externally Merged with the verb and then later the second instance was Externally Merged higher. Because the two instances are not copies of each other, they are distinct objects and therefore, they do not necessarily co-refer and they are both pronounced.
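One way to picture the difference between copies and repetitions is to give each lexical token an index, so that two tokens with the same form can still be distinct objects. The Python sketch below does this. It is an illustrative assumption rather than part of the proposal: tokens are (form, index) pairs, the elided structure of the passive and active representations is simply left out, and the names merge, tokens, and count_form are hypothetical.

```python
# Illustrative sketch only: copies share one object, repetitions do not.

def merge(x, y):
    """Simplest Merge over indexed tokens and frozensets."""
    return frozenset({x, y})

def tokens(so):
    """Return the set of distinct lexical tokens occurring anywhere in so."""
    if isinstance(so, frozenset):
        found = set()
        for member in so:
            found |= tokens(member)
        return found
    return {so}

def count_form(so, form):
    """How many distinct tokens in so have the given form."""
    return sum(1 for (f, _index) in tokens(so) if f == form)

a_man_1 = merge(("a", 1), ("man", 1))
a_man_2 = merge(("a", 2), ("man", 2))       # same forms, different tokens
T, see = ("T", 1), ("see", 1)

# Passive: the same object a_man_1 is merged twice (Internal Merge).
passive = merge(a_man_1, merge(T, merge(("v_pass", 1), merge(see, a_man_1))))

# Active: two distinct objects with the same form are merged (External Merge).
active = merge(a_man_2, merge(T, merge(("v_act", 1), merge(see, a_man_1))))
```

In the passive structure there is only one token of man, occurring in two positions, while the active structure contains two tokens of man. Under these assumptions the count lines up with the observation that a man is pronounced once in the passive and twice in the active.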

I mentioned above that copies undergo deletion by the SM system while repetitions do not. This much follows from both Simplest Merge and the facts of language, but the question of which copies delete and when turns out to be quite complicated. If we started with the basic facts of English passives and wh-questions, we might propose a principle that states that only the highest copy—the copy that c-commands all other copies—is pronounced. Like the LCA, one need not look far to find exceptions,6 but also like the LCA, the principle of “pronounce the highest copy” can serve as a demonstration that the choice of which copy to pronounce can be derived from a structure without being encoded in it.

3.4 Summary

The forthcoming proposal is made in the theoretical context of biolinguistics/minimalism, a label that, admittedly, covers a wide range of theoretical positions. In this section, I have done my best to make explicit the relevant positions under that label which I will be taking in my theoretical proposal. First, I am assuming that the basic, likely only, innate language-specific combinatory operation is Simplest MERGE, which creates unlabeled binary sets and encompasses both the base component and the transformational component of the narrow syntax. Second, I am assuming that MERGE operates on a workspace by manipulating that workspace’s contents. Third, I assume that, while the narrow faculty of language (FLN) is simple, perhaps consisting only of merge and the derivational architecture, the systems that interpret the objects generated by FLN, either for externalization (SM) or mind-internal computation (CI), are complex, encompassing a number of principles, parameters and operations of which we understand very little.

4 The Proposal

The theory of adjuncts that I propose is best viewed in contrast to the theory of arguments. According to this theory, outlined in Section 3.2, an argument is derived separately from its clausal spine, and the result of that subderivation is merged into the clausal spine. An adjunct is also derived separately, except that the adjunct is never merged into the clausal spine. So the syntactic representation of (1) is given in (23) with the adjunct-free sentence (2) derived as the first element (SO1) of the workspace, and the adjunct PP with gusto derived as the second element (SO2) of that same workspace—again with angle brackets indicating that the workspace is an ordered set.


(23) ⟨{Rosie, {T, . . . {sing, {the, song}}}}_SO1, {with, gusto}_SO2⟩

The expression represented in (23) is grammatical insofar as SO1 is a grammatical clause and SO2 is a grammatical PP. Furthermore, the grammaticality of each of the two objects—the clause and the PP—is independent of the grammaticality of the other. Therefore, the clause would be grammatical without the PP, or if there were additional adjuncts, regardless of the ordering. Note that these are the three characteristic properties of adjuncts: optionality, stackability, and freedom of order.

This independence, of course, carries over to the interpretation of (23). That is, Rosie sang the song and with gusto in (23) should be interpreted the same way as a sequence of independent expressions like (24) is—conjunctively.


(24) Susan entered the room. The lights were off.

If (24) can be given a truth-value it would be the same as the truth-value of the conjunction of the two sentences. In the same way, (23) is interpreted more or less as in (25).


(25) Rosie sang the song. It was with gusto.

There is one major difference, though, between the actual interpretation of (1) and that of (25)—the former entails that the song-singing event and the gusto-having event are the same, while in the latter, that identity is only an implicature. This might suggest that the adjunct with gusto is, in fact, semantically dependent on its host clause, but such a conclusion is unwarranted. It is not so much that the adjunct is about what its host is about but rather that the host and adjunct are about the same thing. This is the case, I propose, because the host and the adjunct are constructed in the same workspace.

Turning to pronunciation, it might be suggested that my proposal introduces new complexity to the already complicated nature of pronunciation. That is, our best theories suggest that c-command is vital for linearization, but there can be no c-command relation across SOs that are not connected to each other by Merge. Such an objection, however, would mistake the nature of the linearization problem, namely that Merge creates unordered objects that must be converted to ordered objects for pronunciation. If, however, we make the reasonable assumption that a derivation stage such as (23) is intrinsically ordered (SO1 ≺ SO2), as indicated by the angle brackets in (23), then no linearization problem should occur beyond the one already solved by c-command.

So, if we take linearization—part of Transfer—to be the process of converting a WS into a totally ordered set of words—abstracting away, for the moment, from the complex process of word-formation—we can come up with some basic principles that likely govern such a process. The LCA, for instance, would be a prime candidate for such a basic principle. A more general one, though, would be a principle of Conservation of Information—akin to what are called “Faithfulness Constraints” in Optimality Theory. One aspect of Conservation of Information would no doubt be that linearization should not change or delete an ordering statement without good cause. So, if two elements are already ordered with respect to each other, say by a previous process of linearization or, more importantly for us here, by the intrinsic order of the WS being processed, we should not change or delete that ordering unless it is required by some other constraint. We will see a case in which established order must be altered a bit later.
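The idea that the intrinsic order of the workspace is carried over into the final word order can be sketched as follows. The Python below is only an illustration: an ordered workspace is modeled as a tuple of syntactic objects, and the SO-internal linearization procedure is not implemented at all but replaced by a stipulated lookup table (SPELLOUT), a stand-in for whatever principles actually order the words inside each object. All names are hypothetical.

```python
# Illustrative sketch only: linearizing an ordered workspace.

def linearize_ws(ws, linearize_so):
    """Concatenate the linearizations of the SOs, preserving workspace order."""
    words = []
    for so in ws:                    # tuple order stands for the WS order
        words.extend(linearize_so(so))
    return words

S = frozenset
host = S({"Rosie", S({"T", S({"sing", S({"the", "song"})})})})
with_gusto = S({"with", "gusto"})
before_dinner = S({"before", "dinner"})

# Stipulated stand-in for SO-internal linearization (an assumption).
SPELLOUT = {
    host: ["Rosie", "sang", "the", "song"],
    with_gusto: ["with", "gusto"],
    before_dinner: ["before", "dinner"],
}
linearize_so = SPELLOUT.__getitem__

bare = linearize_ws((host,), linearize_so)
one_adjunct = linearize_ws((host, with_gusto), linearize_so)
stacked = linearize_ws((host, with_gusto, before_dinner), linearize_so)
reordered = linearize_ws((host, before_dinner, with_gusto), linearize_so)
```

Since each object is linearized on its own and the workspace order is simply preserved, leaving out an adjunct, adding another one, or permuting the adjuncts changes nothing about how the host itself is linearized. The four results correspond to (2), (1), (3), and (4) respectively.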

An anonymous reviewer points out that taking WSs to be ordered sets is contrary to Chomsky (2020) who explicitly posits that workspaces are unordered sets. Chomsky, however, gives no explicit rationale for his assumption and nothing in the broader system appears to hinge on it, so it seems to be an idle assumption. Furthermore, the minimalist reasoning that leads us to hypothesize that SOs are unordered sets does not necessarily apply to workspaces, especially if workspaces are part of the broad faculty of language, like operations such as minimal search. To show that this is at least a reasonable assumption, consider Chomsky’s argument for MERGE as set-formation, summarized below.

Language is both specific and universal to humans, and therefore the genetic basis for language must be identical across humans and absent in non-humans. The simplest explanation for this is that language and its genetic basis are simple, and evolved in a single step. Given the basic property of language—discrete infinity—the genetic basis for language is one that creates a simple computational/generative operation in the mind, call it MERGE. The simplest such operation is one that creates the simplest possible complex object—an unordered binary set. Therefore, absent any compelling evidence to the contrary, we should assume that MERGE creates unordered binary sets.

This sets up the advent of MERGE as something of a dividing line. Any cognitive operation or structure, workspaces included, must have evolved before, after, or as part of MERGE, with each possibility having different implications. Since MERGE is defined in terms of workspaces, we can set aside the possibility that workspaces evolved after MERGE and focus on the other two: either MERGE and workspaces are inseparable, or MERGE was a novel operation that made use of already existing workspaces. If the former is the case, then the narrow minimalist reasoning suggests that workspaces are unordered sets. If the latter is true, though, then workspaces did not necessarily evolve in a single step, but possibly over several millennia of steps, and simplicity is not a given. The latter, that workspaces predated MERGE and are ordered, seems more plausible, so I assume it pace Chomsky.

In what follows, I will refine this proposal somewhat, but the core claim (that adjuncts are derived separately and remain separate from their hosts) will remain the same. I pause here to note that this solution broadly accounts for adjuncts without recourse to novel operations or major modifications to the architecture of the grammar, and is therefore preferable, on minimalist grounds, to theories which do introduce novel theoretical machinery such as Pair-Merge.

4.1 The Problem of Adjunct Scope

The sentence in (26) is ambiguous.


(26) Sharon made the error deliberately.

It can be interpreted as saying either that Sharon intended to make the error in question, or that she made the error in a deliberate manner. The conclusion drawn from this sort of ambiguity is that the adverb deliberately has two possible scopes: a high scope resulting in the first interpretation, and a low scope resulting in the second interpretation. Under an X-bar theory of adjuncts, this can be easily accounted for by aligning scope with attachment site as in Figure 2 and Figure 3.

Figure 2

The High-Scope Interpretation of (26) in X-Bar Theory

Figure 3

The Low-Scope Interpretation of (26) in X-Bar Theory

As it stands, however, the parallel derivation theory of adjuncts cannot account for adjunct scope. Or, to be more precise, it cannot account for the fact that adjuncts can have multiple scope possibilities. This can be seen when we consider how we would represent (26) in a workspace-based analysis—as the juxtaposition of Sharon made the error and deliberately as shown in (27), which would be the result of deriving the clause and adverb independently of each other.


(27) 〈{Sharon, {T, . . . {Voice, {make, {the, error}}}}}, deliberately〉

If we take a full declarative clause to describe a situation or state of affairs, then, according to (27), (26) would describe a situation s, such that in s Sharon made the relevant error, and that s was brought about by a deliberate choice of the agent of s. In other words, the proposed workspace-based theory of adjuncts seems to predict only the high-scope interpretation of (26).

In order to modify our proposal to allow for adjunct scope, we must first realize that adjunct scope-taking is different from other kinds of scope-taking, such as quantifier scope. Usually, when we talk about scope, we have in mind an asymmetric relation. So the two readings of (28) can be described by saying which of the two quantifier phrases scopes over the other.


(28) Every student read a book.

  1. ∀s(∃b(read(b, s)))

  2. ∃b(∀s(read(b, s)))

The relationship between a modifier and a modified expression, however, is generally considered to be symmetric, at least in terms of their interpretation.7 So, in the low-scope interpretation of (26), the logical predicate expressed by deliberately is conjoined with the one expressed by make an error, as shown in open formula (29).


(29) (make(the-error, e) & deliberate(e))

It does not, then, make sense to say that deliberately “scopes over” the VP. We can still ask, though, why deliberately conjoins with the VP and not, say, with AspP or TP. The answer, at least in X-bar terms, is obvious: the adverb and the VP conjoin because they are in the same position, that is, [Comp, Voice]. In other words, deliberately conjoins with the VP because both scope directly under Voice and, therefore, indirectly under everything that scopes over Voice.

This rethinking of adjunct scope, then, suggests a workspace-based analysis of the low-scope interpretation of (26), shown in (30).

(30) 〈{Sharon, {T, {Sharon, {Voice, {make, {the, error}}}}}}, {Sharon, {T, {Sharon, {Voice, {deliberately}}}}}〉

Here we can say that deliberately and the VP are in the same position, as they are both the complement of Voice in their respective SOs. Such a representation, however, raises three obvious questions, especially noting that Sharon appears in both host and adjunct:

  1. How is (30) interpreted?

  2. How is (30) pronounced?

  3. How is (30) derived?

I address these three questions in turn directly.

4.1.1 How Is (30) Interpreted?

The workspace in (30) contains two syntactic objects, each of which is a finite clause. I will assume that the interpretation of each clause contains an event description and a specification of how the event described relates to the context of utterance. For the sake of clarity, I will consider only the event-description portion of the meaning.

So the event description contained in the first object (the one associated with the clausal host) is given in (31), and the event description contained in the second object (the one associated with the adverbial adjunct) is given in (32).


(31) (make(e) & Agent(e)(Sharon) & Theme(e)(the-error))

(32) (Agent(e)(Sharon) & deliberately(e))

If, as I conjectured in the first part of this section, the juxtaposition of (31) and (32) yields the conjunction of the two, and if we take the further simplifying step of eliminating redundant conjuncts, we get the correct interpretation in (33).


(33) (make(e) & Agent(e)(Sharon) & Theme(e)(the-error) & deliberately(e))

Whether there is some process for eliminating redundant conjuncts instantiated in our cognitive faculties is not clear. What’s more, it is not obvious how we could test for such a process. Assuming that redundant conjuncts are eliminated in the final interpretations of expressions like (26), however, will save space in this paper and reduce the amount of typing on my part, so I will do so going forward.
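The step from (31) and (32) to (33) can be sketched as follows. This is only an illustration under simplifying assumptions: conjuncts are treated as opaque strings, redundancy is taken to be plain string identity, and the function name conjoin is hypothetical rather than part of the proposal.

```python
def conjoin(*descriptions):
    """Conjoin event descriptions, keeping the first occurrence of each conjunct."""
    result = []
    for description in descriptions:
        for conjunct in description:
            if conjunct not in result:   # assumption: redundancy = string identity
                result.append(conjunct)
    return tuple(result)


host = ("make(e)", "Agent(e)(Sharon)", "Theme(e)(the-error)")   # conjuncts of (31)
adjunct = ("Agent(e)(Sharon)", "deliberately(e)")               # conjuncts of (32)
print(" & ".join(conjoin(host, adjunct)))                       # conjuncts of (33)
```

Nothing here bears on whether such a simplification is cognitively real; the sketch only makes the bookkeeping explicit.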

More could be said, of course, about the interpretation of (30), but I will leave this as a task for further research and move on to the question of pronunciation.

4.1.2 How Is (30) Pronounced?

The problem posed for pronunciation by (30) is that the adjunct contains most of a clause which is not pronounced. That is, Sharon, T, Voice, etc. must be deleted somehow. Recall from Section 3.3 that the basic rule of deletion is that if a syntactic object contains two constituents, α and β, such that α = β and α asymmetrically c-commands β, then β is deleted.

The notion of identity here must capture copies, but not repetitions, so in order for the various phrases and heads to be deleted from the adjunct we must show that they can be treated as copies of the corresponding phrases and heads in the host. Since the distinction between copies and repetitions is to follow from the derivational history of an expression, I will postpone the question of identity until the following section and stipulate, for the moment, that Sharon, T, Voice, etc. in the adjunct are considered copies of their counterparts in the host.

As for the c-command requirement for deletion, it is quite plain that it cannot apply to the deletion of copies in different SOs as in (30). However, if we broaden the c-command requirement on deletion to one of a more general ordering (α > β), then it can apply to elements of separate syntactic objects, since SOs in a workspace are ordered with respect to each other.

This broadening of the c-command requirement may seem ad hoc on its face, but there is a good reason to think that an operation like deletion is not sensitive specifically to c-command. That reason is that, as decades of research suggest, the syntactic component is the only component of the language faculty that is particular to the language faculty. It follows from this that deletion, an operation of the externalization system, is not particular to language. Since it is not particular to language, it should not be defined in language-particular terms. Therefore, defining deletion in terms of ordering as opposed to c-command is theoretically preferred.

So, turning back to the task at hand, (30) is pronounced by deleting all the redundant structure in the adjunct. This occurs because every element of the deleted structure is identical to an element in the host and ordered with respect to that matching element.
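A minimal sketch of this ordering-based deletion, applied to (30), might look as follows. It rests on several assumptions that are not part of the text: each SO is given as a list of items already in c-command order, equal strings stand in for the stipulated copy relation, and the function name externalize is hypothetical. No morphology is applied, so T and Voice appear as bare items.

```python
def externalize(workspace):
    """Linearize SOs in workspace order, deleting later copies of earlier items."""
    pronounced = []
    for so in workspace:                 # intrinsic order of the workspace
        for item in so:                  # assumption: items already in c-command order
            if item not in pronounced:   # assumption: equal strings count as copies
                pronounced.append(item)
    return " ".join(pronounced)


host = ["Sharon", "T", "Sharon", "Voice", "make", "the", "error"]
adjunct = ["Sharon", "T", "Sharon", "Voice", "deliberately"]
print(externalize([host, adjunct]))
```

The single rule "delete what has already been ordered earlier" covers both the lower copy of Sharon inside the host and the redundant material in the adjunct, which is the point of replacing c-command with a more general ordering.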

Again, this is a rather simplified picture of how externalization works. To get a sense of the additional complexities, consider the case of sentence-medial adverbs, as in (34), which we can assume has the underlying structure in (35) with the adjunct SO preceding the host SO.

(34) Mary quickly finished her homework.

(35) 〈{Mary, {T, {Mary, {Voice, quickly}}}}SO1, {Mary, {T, {Mary, {Voice, {finish, {her, homework}}}}}}SO2〉

Applying our simple externalization reasoning outlined above, we will linearize according to the intrinsic ordering of the WS and the c-command-determined ordering of the individual SOs, and delete redundant structure based on that ordering, yielding the deviant string in (36).


(36) Mary T Voice quickly finish her homework.

This string is deviant for two related reasons. First, T and Voice in English cannot be pronounced independently, since they are bound morphemes.8 Second, affix-hopping in English requires adjacency of the T-Voice-V sequence.

If we assume that these specific constraints, however they may be formulated, are able to override the general deletion rules and Conservation of Information—a reasonable assumption since overriding general considerations is precisely what specific constraints do—then T and Voice in SO1 would be deleted and their instances in SO2 would remain, as in the string in (37), which eventually surfaces as (34) after affix-hopping.


(37) Mary quickly T Voice finish her homework.

Thus, the results of parallel derivation are pronounced as part of the complex and language-specific process of externalization.

4.1.3 How Is (30) Derived?

The derivation of host-adjunct structures such as (30) can be divided into two parts. In the first part, the two objects, host and adjunct, are derived independently of each other, and in the second part, the objects are derived in lockstep. So, for instance, merging Aspperf to the host object is accompanied by merging Aspperf to the adjunct object, and so on. The first part represents the standardly assumed operation of workspaces, and is, therefore, already understood, at least insofar as workspaces are understood. The second part, the part involving lockstep derivation, is novel and its explanation will occupy this section.

The result of the first part of the derivation is given in (38) below.


(38) 〈{make, {the, error}}SO1, {deliberately}SO2, SharonSO3〉

Let’s suppose that nothing forces the objects to derive in lockstep, but rather they derive freely and only result in a host-adjunct structure if their respective derivations mirror each other. This, however, would lead to two problems.

The first problem this poses has to do with the copy/repetition distinction. The externalization system, by hypothesis, deletes copies, not repetitions. Recall that T, Voice, the subject, etc. of the adjunct delete in this case. This deletion would only occur if those objects and their counterparts in the host object were copies of each other and, while the necessary and sufficient conditions on copy-hood are not well understood, there is good reason to believe that content-identity is not sufficient. That is, two instances of, say, VoiceAct are not copies just because they have identical content—it seems they must have an identical derivational history. This could not possibly hold of Voice, T, etc. if the second stage of the derivation under discussion proceeds freely.

The second problem has to do with the subject Sharon. In (30), Sharon is in both derived objects, yet this does not seem possible if each object’s derivation is fully independent of the other’s. Suppose we reach a stage of the derivation as shown in (39) where the next step must be to merge Sharon into SO1 and SO2 as the Agent.


(39) 〈{Voice, {make, {the, error}}}SO1, {Voice, {deliberately}}SO2, SharonSO3〉

If we were to MERGE Sharon with SO1, as shown in (40), it would be rendered inaccessible to SO2, and vice-versa.


(40) 〈{Sharon, {Voice, {make, {the, error}}}}SO1, {Voice, {deliberately}}SO2〉

Thus, there would no longer be any way to derive the two objects in lockstep. While this problem seems to be distinct from the copy/repetition problem above, it has the same solution: defining MERGE such that lockstep derivation can be forced. I turn to such a definition presently.

Formal Definitions of MERGE

As discussed in Section 3.2, Chomsky (2020) argues that the standard conception of Merge—Merge(α, β) → {α, β}—needs to be replaced with a new one, called MERGE, which meets a number of desiderata. One such desideratum is that MERGE should be defined in terms of workspaces, rather than syntactic objects. In order to do this we must first provide some definitions for workspaces and other derivational notions. These definitions are given in (41)–(42).


(41) A derivation D is a finite sequence of workspaces (WS1, WS2, . . ., WSn), where D(i) = WSi.


(42) A workspace WS is a finite sequence of syntactic objects (SO1, SO2, . . ., SOn), where WS(i) = SOi.

In addition to the workspace desideratum, MERGE should also “restrict computational resources” (Chomsky, 2020), by ensuring that when a new object is created by MERGE, its constituent parts do not remain accessible in the workspace. That is, MERGE substitutes the new object for the old objects. The definition of MERGE in (43), where “+” represents an “append” operation and “−” represents a “delete” operation, meets the two desiderata that I have mentioned thus far.9

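(43) Where ω is a workspace, and α and β are syntactic objects,

$$
\mathrm{MERGE}_3(\omega, \alpha, \beta) \rightarrow
\begin{cases}
\{\alpha, \beta\} + ((\omega - \alpha) - \beta) & \text{if } \alpha \text{ and } \beta \text{ are in } \omega \\
\{\alpha, \beta\} + (\omega - \alpha) & \text{if } \alpha \text{ is in } \omega \text{ and } \beta \text{ is in } \alpha \\
\text{undefined} & \text{otherwise}
\end{cases}
$$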

So, MERGE takes three arguments: a workspace ω and two SOs, α and β. Provided α is a member of ω and β is either a member of ω or contained in α, it results in a new workspace. This new workspace, call it ω', differs from ω only in that the new set {α, β} is a member of ω', and neither α nor β is a member of ω'.
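The three-place definition can be sketched in Python as below. Several choices here are assumptions made for illustration only: workspaces are modelled as tuples, set-formed SOs as frozensets, lexical items as strings, and "undefined" as None. The text calls "+" an append operation without fixing where the new object lands; this sketch places it at the position of the earlier of the two arguments, a choice made only so that the orderings shown in the later derivations come out. The names merge and contains are hypothetical.

```python
from typing import Optional, Tuple, Union

SO = Union[str, frozenset]          # lexical item or set-formed object
Workspace = Tuple[SO, ...]          # ordered sequence of SOs


def contains(x: SO, y: SO) -> bool:
    """True if y is a member of x or of something contained in x."""
    return isinstance(x, frozenset) and any(z == y or contains(z, y) for z in x)


def merge(ws: Workspace, a: SO, b: SO) -> Optional[Workspace]:
    """Three-place MERGE; returns None where the definition is undefined."""
    new = frozenset({a, b})
    if a in ws and b in ws:                        # first case: both are in the workspace
        slot = min(ws.index(a), ws.index(b))       # assumption: reuse the earlier slot
        rest = [so for so in ws if so not in (a, b)]
        return tuple(rest[:slot] + [new] + rest[slot:])
    if a in ws and contains(a, b):                 # second case: b is inside a
        return tuple(new if so == a else so for so in ws)
    return None                                    # undefined otherwise
```

Because equality of frozensets is equality of content, this model cannot distinguish copies from repetitions; it is a sketch of the workspace bookkeeping only.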

Note that the definition of merge in (43) stipulates the distinction between internal and external merge. By hypothesis, though, the two cases of merge should fall out from a single definition of merge. Without the stipulation, it’s likely that unrestricted parallel merge (Citko, 2005) or sideward merge (Nunes, 2004) would be derivable in this system. As discussed in Section 3.2, though, once such varieties of merge are allowed, there is virtually no restriction on what can be derived.

Being a computational procedure, MERGE ought to proceed in steps. Therefore, it should be a curried (or schönfinkeled) function.10 So, MERGE would be defined as in (44), with M standing in for the intension of MERGE (i.e., the right side of the arrow in (43)).


(44) MERGE = (λω.(λα.(λβ.M)))

Curried functions are a variety of higher-order functions because they have functions as outputs, in contrast to first-order functions, whose inputs and outputs are strictly non-functional. Under this version of MERGE a step of external merge is divided into three steps as follows. First, as in (45), MERGE is applied to the workspace argument, resulting in another curried function, this one with two lambda terms.


(45) (λω.(λα.(λβ.({α, β} + ((ω − α) − β)))))(W) → (λα.(λβ.({α, β} + ((W − α) − β))))

Next, the new function is applied to an SO argument, resulting in another function with one lambda term.


(46) (λα.(λβ.({α, β} + ((W − α) − β))))(X) → (λβ.({X, β} + ((W − X) − β)))

Finally, this function is applied to another SO argument, resulting in a new workspace.


(47) (λβ.({X, β} + ((W − X) − β)))(Y) → {X, Y} + ((W − X) − Y)

For relative ease of reading, I will use a shorthand for these lambda expressions, borrowed from Formal Semantics, which represents applied arguments as superscripts. The notation fx, for instance, indicates the result of f being applied to x. In Formal Semantics, this is used to represent relative interpretation functions, such as ⟦⋅⟧g,w, which indicates the interpretation function relative to assignment g and world w. The three-step external MERGE, then, can be represented as in (48).

  1. MERGE(W) → MERGEW (= (45))

  2. MERGEW(X) → MERGEW,X (= (46))

  3. MERGEW,X(Y) → {X, Y} + ((W − X) − Y) (= (47))

This definition of MERGE as a curried function also allows us to explain the accessibility restrictions on MERGE in a somewhat less stipulative way. We can do so by first hypothesizing that each input to MERGE partially defines the domain of the resulting function. So, the domain of MERGE is the set of workspaces and, when MERGE is applied to a workspace W, it yields MERGEW as in (48a). The domain of MERGEW, then, is the workspace W and, assuming X is a member of W, applying MERGEW to X yields MERGEW,X as in (48b). The domain of MERGEW,X, then, is something like the union of W and X. That is, MERGEW,X can apply to any member of W, yielding External MERGE, or to any object contained in X, yielding Internal MERGE. Note that this requires a second hypothesis, namely that workspaces and syntactic objects have distinct notions of “membership”, with workspace membership being something like set membership and syntactic object membership being recursive: Y is contained in X if Y ∈ X, or if, for some Z ∈ X, Y is contained in Z. Making these two hypotheses, though, gives us some understanding of why MERGE would have Internal and External cases, but not Parallel or Sideward cases.
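The three-step application in (48), together with the hypothesis that each argument narrows the domain of the next function, can be sketched with nested closures. As before, the modelling choices are assumptions of the sketch: tuples for workspaces, frozensets for SOs, a ValueError for an argument outside the domain, and the new object placed at the earlier of the two argument positions. The inner function names are hypothetical.

```python
def contains(x, y):
    """Recursive containment: y is a member of x or of something contained in x."""
    return isinstance(x, frozenset) and any(z == y or contains(z, y) for z in x)


def MERGE(ws):
    """Curried MERGE: each application narrows the domain of the next one."""
    def merge_w(x):
        if x not in ws:                          # the domain of MERGE^W is W itself
            raise ValueError("x must be a member of the workspace")

        def merge_w_x(y):
            new = frozenset({x, y})
            if y in ws:                          # External MERGE: y is a member of W
                slot = min(ws.index(x), ws.index(y))
                rest = [so for so in ws if so not in (x, y)]
                return tuple(rest[:slot] + [new] + rest[slot:])
            if contains(x, y):                   # Internal MERGE: y is contained in X
                return tuple(new if so == x else so for so in ws)
            raise ValueError("y is outside the domain of MERGE^(W,X)")

        return merge_w_x
    return merge_w
```

On this modelling, an object buried inside some other member of the workspace is simply not in the domain at either step, which is how the sketch excludes sideward-style applications.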

The Map Function

In the previous section I noted that curried functions are a class of higher-order functions because they have functions as outputs. In this section I will introduce a higher-order function that takes functions as inputs—the map function— which will be key to achieving lockstep parallel derivations. Informally speaking, map takes a function and applies it to a list of arguments. Formally, map is defined in (49).


(49) map(f, (x0, x1, …, xn)) → (f(x0), f(x1), …, f(xn))
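For concreteness, (49) corresponds to the familiar map of functional programming. The sketch below is a direct transcription; the names map_ and add_voice are hypothetical, and add_voice is just a stand-in for any one-place function over SOs.

```python
def map_(f, xs):
    """Apply f to every element of xs and collect the results in order."""
    return tuple(f(x) for x in xs)


def add_voice(so):
    """Hypothetical one-place function: pair an SO with a Voice head."""
    return frozenset({"Voice", so})


print(map_(add_voice, ("SO1", "SO2")))
```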

Now, let’s consider how lockstep parallel derivations would proceed. The stage at which the lockstep derivation begins was given in (38) and repeated here as (50).


(50) 〈{make, {the, error}}SO1, {deliberately}SO2, SharonSO3〉 (= WS1)

The next step is to select VoiceAct from the lexicon and merge it with SO1 and SO2. Selection is achieved by a simple operation Select, which is defined in (51) as extending a workspace to include a token of a lexical item.11


(51) For workspace W and lexical item LI, Select(W)(LI) → W + LI

Selecting VoiceAct for the workspace in (50) proceeds as follows.

(52) Select(WS1)(VoiceAct) → 〈{make, {the, error}}SO1, {deliberately}SO2, SharonSO3, VoiceAct〉 (= WS2)

Next, we can merge VoiceAct with SO1 and SO2 in three steps. First we apply MERGE to WS2, as shown in (53), and then apply the resulting function to VoiceAct as in (54).

(53) MERGE(WS2) → MERGEWS2

(54) MERGEWS2(VoiceAct) → MERGEWS2,VoiceAct

The result of these two steps is a function which we can map to our host and adjunct objects as in (55).


(55) map(MERGEWS2,VoiceAct)(〈SO1, SO2〉)

The final result requires some discussion. By our definition of map in (49), the result should be a list of the individual applications of the function, as represented in (56).


(56) 〈MERGEWS2,VoiceAct(SO1), MERGEWS2,VoiceAct(SO2)〉

This, though, will be a list of workspaces, given in (57), which is not a legitimate object according to our formalism. What’s more, this list of workspaces will be riddled with redundancy: SO1, SO2, and SO3 are contained in both workspaces, albeit within larger SOs in some cases, and in the same order.


(57) 〈〈{VoiceAct, SO1}, SO2, SO3〉, 〈SO1, {VoiceAct, SO2}, SO3〉〉

This situation is a violation of the broad principle of Resource Restriction proposed by Chomsky (2020), which we can ameliorate with the help of two more specific constraints defined in (58) and (59).



(58) NoRedundancy: Delete all X in U such that there is some Y in U and Y contains X.



(59) ConserveInformation: For all operations f applied to U, yielding U′, if relation R holds in U and is not explicitly altered by f, then R holds in U′.

These constraints, though expressed in a way useful for a theory of language, seem to be ideal candidates for general cognitive principles. The former, for instance, seems to be active in our perceptual systems which only transmit a fraction of their input to the mind, discarding redundant data. The latter, on the other hand, ensures that cognitive processes can be Markovian (i.e., memoryless) without loss of information.

By applying these constraints to (57)—deleting the copies of SO1 and SO2 that are directly contained in their respective workspaces because copies of them are also contained in the newly created SOs, while maintaining the relative ordering of the remaining objects—we get the single workspace (60).

(60) 〈{VoiceAct, {make, . . . }SO1}SO1′, {VoiceAct, {deliberately}SO2}SO2′, SharonSO3〉 (= WS3)

Note that, although the above discussion assumed that map created the object (57), which was then flattened to (60) by applying the Resource Restriction constraints, a more reasonable assumption is that the constraints apply not to representations, but to operations. So, just as we defined MERGE in (43) such that it removes redundant material, so too should we assume that map respects NoRedundancy and ConserveInformation. Under this assumption, then, the operation in (55) directly generates (60) without generating the intermediate object (57).
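One way to picture a map that respects both constraints by construction is sketched below, running from (50) through Select to the lockstep merges of VoiceAct and Sharon. It is an illustration under assumptions, not the formal proposal: workspaces are tuples, SOs are frozensets or strings, and lockstep_merge is a hypothetical name that folds the curried MERGE and the map step into a single function for brevity.

```python
def select(ws, li):
    """Select: extend the workspace with a token of a lexical item."""
    return ws + (li,)


def lockstep_merge(ws, a, targets):
    """Map 'MERGE a with t' over targets, producing one workspace directly."""
    out = []
    for so in ws:
        if so == a:
            continue                      # a is consumed, so no duplicate remains
        # each target is replaced in place, so the existing relative order survives
        out.append(frozenset({a, so}) if so in targets else so)
    return tuple(out)


so1 = frozenset({"make", frozenset({"the", "error"})})
so2 = frozenset({"deliberately"})
ws1 = (so1, so2, "Sharon")                          # (50)
ws2 = select(ws1, "VoiceAct")                       # (52)
ws3 = lockstep_merge(ws2, "VoiceAct", (so1, so2))   # (60)
ws4 = lockstep_merge(ws3, "Sharon", ws3[:2])        # two SOs, each containing Sharon
```

Since no list of workspaces is ever built, there is nothing to flatten afterwards, which mirrors the suggestion that the constraints hold of operations rather than of representations.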

The next step in our lockstep derivation is to merge the external argument Sharon with both host and adjunct. This step, shown in (61), will proceed much in the same way as the above-described step, but without the need to select anything from the lexicon.

(61) map(MERGE(WS3)(Sharon)) → 〈{Sharon, {VoiceAct, {make, . . . }}}, {Sharon, {VoiceAct, {deliberately}}}〉 (= WS4)

The derivation will continue in this manner, selecting lexical items as needed and merging them with the two syntactic objects until we reach the point, represented in (62), at which the external argument Sharon must internally merge as the subject.


(62) 〈{T, …{Sharon, …{make, …}}}SO1, {T, …{Sharon, …{deliberately}}}SO2〉 (= WSN)

Here we face a complication. Our first step is to apply MERGE to the workspace WSN yielding MERGEWSN. Based on the pattern set up above, we might try to apply MERGEWSN to the external argument Sharon. This is not a legitimate move, however, as Sharon is not in WSN, which is the domain of MERGEWSN. Instead, we map MERGEWSN to the two objects, giving us a list of functions as shown in (63).

(63) map(MERGEWSN)(〈SO1, SO2〉) → 〈MERGEWSN,SO1, MERGEWSN,SO2〉

To complete this merge step, we need to apply each of these new functions to Sharon. That is, we need an inverted map function, which applies a list of functions to a single input. We can construct such a function with another higher-order function (apply, defined in (64)) and lambda abstraction.


(64) apply(f)(x) → f(x)

Our final step, then, is shown in (65).


(65) map(λf.apply(f)(Sharon))(〈MERGEWSN,SO1, MERGEWSN,SO2〉)

→ 〈MERGEWSN,SO1(Sharon), MERGEWSN,SO2(Sharon)〉

→ 〈{Sharon, SO1}, {Sharon, SO2}〉 = (30)
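The two map steps, first over SOs and then over the resulting functions, can be sketched as follows. The sketch simplifies in ways that should be kept in mind: merge_ws_so is a hypothetical stand-in for MERGE already applied to WSN and to one SO, it handles only the internal case, and it returns the new SO rather than a whole workspace. The SOs are abbreviated versions of those in (62).

```python
def contains(x, y):
    """Recursive containment: y is a member of x or of something contained in x."""
    return isinstance(x, frozenset) and any(z == y or contains(z, y) for z in x)


def merge_ws_so(so):
    """Stand-in for MERGE^(WSN,SO): a function awaiting an object contained in so."""
    def awaiting(y):
        if not contains(so, y):
            raise ValueError("y is not contained in so")
        return frozenset({y, so})        # simplification: the new SO only
    return awaiting


def apply(f):
    """Curried apply, as in (64)."""
    return lambda x: f(x)


# abbreviated host and adjunct objects (assumed shapes, for illustration)
so1 = frozenset({"T", frozenset({"Sharon", frozenset({"Voice", "make"})})})
so2 = frozenset({"T", frozenset({"Sharon", frozenset({"Voice", "deliberately"})})})

functions = tuple(map(merge_ws_so, (so1, so2)))                  # list of functions
result = tuple(map(lambda f: apply(f)("Sharon"), functions))     # one input, many functions
```

The lambda in the last line is what inverts the usual direction of map: the list supplies the functions and the single argument is held fixed.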

4.2 Curried MERGE and Non-Adjunction Structures

Before continuing, it is worth noting that, despite what might seem like drastic changes to the Grammar, the process of deriving a simple sentence—one without adjuncts— remains largely the same. Consider the derivation of (66) in (67).


(66) Leviathan smiles.

  1. WS1 = 〈smile, Leviathan, T, Voice, C〉

    MERGE(WS1)(smile)(Voice) → WS2

    An abbreviation of:

    1. MERGE(WS1) → MERGEWS1

    2. MERGEWS1(smile) → MERGEWS1,smile

    3. MERGEWS1,smile(Voice) → WS2

  2. WS2 = 〈{Voice, smile}, Leviathan, T, C〉

    MERGE(WS2)({Voice, smile})(Leviathan) → WS3

  3. WS3 = 〈{Leviathan, {Voice, smile}}, T, C〉

    MERGE(WS3)({Leviathan, . . . })(T) → WS4

  4. WS4 = 〈{T, {Leviathan, {Voice, smile}}}, C〉

    MERGE(WS4)({T, . . . })(Leviathan) → WS5

  5. WS5 = 〈{Leviathan, {T, {Leviathan, {Voice, smile}}}}, C〉

    MERGE(WS5)({Leviathan, {T, . . . }})(C) → WS6

  6. WS6 = 〈{C, {Leviathan, {T, {Leviathan, {Voice, smile}}}}}〉

As can be seen here, a derivation of a sentence without adjuncts using Curried MERGE appears to be a notational variant of the same derivation using the three-place MERGE.
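The derivation of (66) can be replayed mechanically to make the notational-variant point concrete: the curried function below is obtained from a three-place one by generic currying and nothing more. All names (merge3, curry3, contains) and the tuple/frozenset modelling are assumptions of this sketch, as is the placement of each new object at the earlier argument position.

```python
def contains(x, y):
    """Recursive containment: y is a member of x or of something contained in x."""
    return isinstance(x, frozenset) and any(z == y or contains(z, y) for z in x)


def merge3(ws, a, b):
    """Three-place MERGE over a tuple workspace; None where undefined."""
    new = frozenset({a, b})
    if a in ws and b in ws:
        slot = min(ws.index(a), ws.index(b))
        rest = [so for so in ws if so not in (a, b)]
        return tuple(rest[:slot] + [new] + rest[slot:])
    if a in ws and contains(a, b):
        return tuple(new if so == a else so for so in ws)
    return None


def curry3(f):
    """Generic currying of a three-place function."""
    return lambda ws: lambda a: lambda b: f(ws, a, b)


MERGE = curry3(merge3)

ws = ("smile", "Leviathan", "T", "Voice", "C")            # WS1
for b in ("Voice", "Leviathan", "T", "Leviathan", "C"):   # the five steps of the derivation
    ws = MERGE(ws)(ws[0])(b)                              # the growing SO is always first
```

The fourth step is the only internal one: Leviathan is no longer a member of the workspace at that point, but it is contained in the first SO.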

5 Corroborating Evidence

In this section, I will outline a few problems related to adjunction that the proposed theory provides natural solutions to. First, I will address the island-hood of adjuncts. Then, I will discuss parasitic gaps, whereby adjunct island-effects are ameliorated. Finally, I will discuss a class of facts commonly associated with Cartographic/Nanosyntactic approaches to syntax—adjunct ordering constraints.

5.1 The Islandhood of Adjuncts

A well-known property of adjuncts is that they are islands to movement. Indeed, Bošković (2020) points out that, while the island-hood of many other constructions varies across languages, adjunct island-hood, along with its apparent exceptions, some of which I address in Section 6.3, seems to be constant.12 So, for instance (68) is an ungrammatical question, and (69) contains an ungrammatical relative clause because they both require an instance of wh-movement out of an adjunct.


(68) *Whati did she eat an apple [after washing _i]?


(69) *The student whoi he invited Barbara [without meeting _i]

To see how the theory of adjuncts I propose here predicts adjunct island-hood consider the stage of the derivation of (68) immediately before wh-movement occurs. As shown in (70)—which abstracts away from certain irrelevant structural details—the wh-expression what is in the adjunct object (SO2), which “scopes over” the TP. Note that both syntactic objects contain a Cwh head as a result of lockstep derivation.


(70) 〈{Cwh, {she, {T, . . . }}}SO1, {Cwh, {after, {washing, what}}}SO2〉 (= WS1)

In order to derive (68), we would need a wh-movement operation such as (71).

(71) MERGE(WS1)(SO1)(what)

The result of this operation, however, is undefined because what is neither a member of WS1, nor contained in SO1.

The operation in (72), on the other hand, is defined and would yield the stage in (73).

(72) MERGE(WS1)(SO2)(what)

(73) 〈{Cwh, {she, {T, . . . }}}SO1, {what, {Cwh, {after, {washing, what}}}}SO2〉 (= WS2)

This stage is problematic for two reasons. First, the Cwh head in SO1 would bear an unsatisfied wh-feature which would lead to a crash at the CI interface. Second, (73) would not yield (68) when linearized because what, being in SO2, would be ordered after all of the words in SO1. That is, we would expect (73) to be linearized as (74).


(74) *She ate an apple what after washing.

Thus, the island-hood of adjuncts follows naturally from my proposed theory of adjuncts.
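The contrast between the undefined and the defined wh-movement steps reduces to the definedness condition on MERGE, which can be checked directly. In this sketch the SOs are abbreviated, frozensets and tuples are modelling assumptions, and merge_is_defined is a hypothetical helper that tests only whether MERGE would be defined, without building the output.

```python
def contains(x, y):
    """Recursive containment: y is a member of x or of something contained in x."""
    return isinstance(x, frozenset) and any(z == y or contains(z, y) for z in x)


def merge_is_defined(ws, a, b):
    """MERGE(ws)(a)(b) is defined iff a is in ws and b is in ws or contained in a."""
    return a in ws and (b in ws or contains(a, b))


# abbreviated host and adjunct objects (assumed shapes, for illustration)
so1 = frozenset({"Cwh", frozenset({"she", "T"})})
so2 = frozenset({"Cwh", frozenset({"after", frozenset({"washing", "what"})})})
ws1 = (so1, so2)

out_of_adjunct = merge_is_defined(ws1, so1, "what")   # movement into the host
within_adjunct = merge_is_defined(ws1, so2, "what")   # movement inside the adjunct
```

Only the second call satisfies the condition, since what is neither a member of the workspace nor contained in the host.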

5.2 Parasitic Gaps

The island-hood of adjuncts, though constant across languages, is circumvented in so-called parasitic gap constructions (Engdahl, 1983) as in (75) and (76).13


(75) Whati did she eat _i [after washing eci]?


(76) The student whoi he invited _i [without meeting eci]

Here the parasitic gaps in the adjuncts, represented here as ecs, are licensed if there is a parallel trace in the host. This required parallelism is both syntactic—the trace and the parasitic gap have the same grammatical role (i.e. direct object in (75) and (76))—and semantic—the trace and parasitic gap co-refer.

Here, the mechanism for ensuring lockstep derivation—higher-order functions— allows us to derive parasitic gaps. To demonstrate this, consider the penultimate stage in the derivation of (75) shown in (77).


(77) 〈{Cwh, {she, {T, {. . ., whati}}}}SO1, {Cwh, {after, {washing, whati}}}SO2〉 (= WS1)

Note that the two instances of what here are copies of each other, meaning they share a derivational origin. The final stage of (75), given in (79) is derived, as shown in (78), using the same pattern we used for parallel internal MERGE in (62)–(65).


  1. MERGE(WS1) → MERGEWS1

  2. map(MERGEWS1)(〈SO1, SO2〉) → 〈MERGEWS1,SO1, MERGEWS1,SO2〉

  3. map(λf.apply(f)(what))(〈MERGEWS1,SO1, MERGEWS1,SO2〉) → (79)


(79) 〈{whati, {Cwh, {she, {T, {. . ., whati}}}}}SO1′, {whati, {Cwh, {after, {washing, whati}}}}SO2′〉

As discussed in Section 4.1.2, all instances of whati except for the highest instance in the first SO are deleted, yielding the string (75).

Thus, parasitic gaps are naturally accounted for in the theory I propose here.

5.3 Cartography’s Facts

There are well-known restrictions on the ordering of adjectives and adverbials—for instance an ordering of size adjectives before shape adjectives, as in (80), is preferred to the reverse order, as in (81).14


(80) a small square table


(81) ?*a square small table

Facts such as these are putatively explained within the cartographic/nanosyntactic framework (see Cinque & Rizzi, 2015) with two related hypotheses. The first hypothesis is that there is a universal fixed hierarchy of functional heads such as Size and Shape. The second hypothesis is that adjectives, adverbials, etc. are not adjuncts at all, but are instead merged as specifiers of their appropriate functional heads.15 So, if Size and Shape select small and square as their respective specifiers, and Size selects ShapeP as its complement, then (80) can be derived, but (81) cannot.

Before outlining how the proposed theory of adjuncts might account for these facts, it is worth noting that there is no inherent contradiction between the cartographic/nanosyntactic theory of adjectives, adverbs, etc. and the theory of adjuncts being proposed here. As I stated above, the former theory explains (80) and (81) in part by saying that attributive adjectives are not adjuncts but specifiers, and this explanation can be extended to other similar ordering restrictions. The theory proposed here, though, is not a theory of adjectives, adverbs, or prepositional phrases; it is a theory of adjuncts. Therefore, the proposition that attributive adjectives are specifiers, rather than adjuncts, merely implies that attributive adjectives are beyond the scope of my theory.

One might object to this by asserting that cartography/nanosyntax in fact makes a stronger claim—that all adjuncts are specifiers. Such a claim, though, is self-contradicting in the same way as a claim that all odd numbers are even would be. A coherent version of this claim is that there are no adjuncts, really—everything we thought was an adjunct is actually a specifier. Such a claim does not so much contradict the theory proposed here as render it empirically inert. The examples in (1)–(3), with which I began this paper, along with the classic example in (82), however, suggest that this strong version of cartography/nanosyntax cannot be maintained.


(82) the tall, tall, tall, . . . tall building

A cartographic/nanosyntactic analysis of (82) would crucially need to uniquely associate each adjective, that is, each instance of tall, with a functional projection. So which projection contains each tall? Perhaps one of the repetitions of tall is the specifier of SizeP, but that would leave all of the other instances of tall without a functional projection to merge with. Thus, the strong version of cartography/nanosyntax fails to provide an analysis of (82).

The theory of parallel derivation being proposed here, then, is compatible with a weak version of cartography/nanosyntax or, at least, a version of moderate strength. My goal in this section, however, is to extend the parallel derivation theory to explain some of the central facts of cartography/nanosyntax, thus putting the two theories in conflict with each other.

The extension of the theory involves two auxiliary hypotheses—the Universal Functional Sequence hypothesis, and the hypothesis that operations on non-contiguous segments of the workspace are more costly than those on contiguous segments.

The first hypothesis, which is lifted from cartography/nanosyntax, is that there is some universal ordered set of functional heads, and that the ordering of that set is reflected in the c-selection relation. So, the data in (80) and (81) follows, at least partially, from the conjecture that Size can c-select Shape, but not vice versa. I diverge from the cartography/nanosyntax explanation, though, in that I don’t argue that (81) involves Shape incorrectly c-selecting Size. Rather, the deviance of (81) comes from the fact that it requires an operation on a non-contiguous segment of the workspace, as I demonstrate below.

To begin, Table 1 gives the derivation of (80), a nominal phrase with an acceptable adjective sequence, and Table 2 gives the derivation of (81), a nominal phrase with a deviant adjective sequence.16 Recall that the linear ordering of SOs in a workspace is hypothesized to be reflected in the linear ordering of their respective externalizations. So, in Table 1 the fact that the SO based on the adjective small precedes the one based on the adjective square is reflected in the fact that, when this workspace is pronounced, “small” precedes “square,” and so on. The key point of comparison here is between the respective second steps, in which Shape is merged. In Table 1, this step maps MERGE(WS2)(Shape) over a contiguous segment of the workspace. In Table 2, on the other hand, this step maps the same curried function over a non-contiguous segment. If we make the auxiliary hypothesis that mapping over a contiguous sequence is more computationally efficient than mapping over a non-contiguous sequence, then we have a possible explanation of the deviance of (81) and, by extension, a possible explanation of adjunct ordering restrictions. That is, violations of adjunct ordering restrictions, rather than being violations of c-selection restrictions, are the result of suboptimal derivations.

Table 1

The Partial Derivation of (80)

Operation | Resulting workspace
(Start) | {small}SO1, {square}SO2, TABLE SO3, n, Size, Shape (WS1)
MERGE(WS1)(SO3)(n) | {small}SO1, {square}SO2, {n, TABLE}SO3, Size, Shape (WS2)
map(MERGE(WS2)(Shape))(〈SO2,SO3〉) | {small}SO1, {Shape, {square}}SO2, {Shape, {n, TABLE}}SO3, Size (WS3)
map(MERGE(WS3)(Size))(〈SO1,SO2,SO3〉) | {Size, {small}}SO1, {Size, {Shape, {square}}}SO2, {Size, {Shape, {n, TABLE}}}SO3 (WS4)

Table 2

The Partial Derivation of (81)

Operation | Resulting workspace
(Start) | {square}SO1, {small}SO2, TABLE SO3, n, Size, Shape (WS1)
MERGE(WS1)(SO3)(n) | {square}SO1, {small}SO2, {n, TABLE}SO3, Size, Shape (WS2)
map(MERGE(WS2)(Shape))(〈SO1,SO3〉) | {Shape, {square}}SO1, {small}SO2, {Shape, {n, TABLE}}SO3, Size (WS3)
map(MERGE(WS3)(Size))(〈SO1,SO2,SO3〉) | {Size, {Shape, {square}}}SO1, {Size, {small}}SO2, {Size, {Shape, {n, TABLE}}}SO3 (WS4)
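
The contrast between the two derivations can be made concrete with a small sketch. The Python below is only an illustration under simplifying assumptions: a workspace is modeled as a list, an SO as a nested tuple, and heads as strings. The helper names map_merge and is_contiguous are hypothetical and are not part of the formal definition of MERGE; the sketch also says nothing about how computational cost should be measured, only whether the targeted segment is contiguous.

```python
# Sketch only: a workspace is a list, an SO is a nested tuple, heads are strings.
# map_merge and is_contiguous are hypothetical helper names.

def map_merge(workspace, head, positions):
    """Merge `head` with the SO at each index in `positions`; the bare head is used up."""
    new_ws = []
    for i, so in enumerate(workspace):
        if so == head:
            continue  # the head no longer sits in the workspace on its own
        new_ws.append((head, so) if i in positions else so)
    return new_ws

def is_contiguous(positions):
    """True if the targeted indices form one unbroken run of the workspace."""
    ordered = sorted(positions)
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))

# WS2 of Table 1 ("small square table") and WS2 of Table 2 ("square small table").
ws2_table1 = [("small",), ("square",), ("n", "TABLE"), "Size", "Shape"]
ws2_table2 = [("square",), ("small",), ("n", "TABLE"), "Size", "Shape"]

shape_targets_1 = {1, 2}  # SO2 and SO3: adjacent
shape_targets_2 = {0, 2}  # SO1 and SO3: separated by SO2

ws3_table1 = map_merge(ws2_table1, "Shape", shape_targets_1)
ws3_table2 = map_merge(ws2_table2, "Shape", shape_targets_2)

print(is_contiguous(shape_targets_1))  # True
print(is_contiguous(shape_targets_2))  # False
```

Both calls to map_merge succeed, which mirrors the claim that (81) is derivable but suboptimal: the only difference the sketch registers is that the Shape step of Table 2 has to skip over SO2.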

Under the present approach, adjectives still merge with their respective functional heads, but as complements. That is, the structural relation between functional heads, like Size, and modifiers, like small, is the same as the relation between roots and their categorizing heads. It follows from this that the interpretive relation between functional head and modifier should be the same as the one between categorizing heads and roots. This prediction is borne out in the intuitive understanding of polysemy.

Consider, for instance, how one would define the word work. Since it is polysemous we would have to give a list of definitions—we would say “work as a noun means …” followed by “work as a verb means …”, or vice versa. We could formalize these as in (83).

  1. SEM({n, WORK }) = …

  2. SEM({v, WORK }) = …

Now compare this to the adjective light, which is polysemous in many ways. Our list of definitions would be as follows: “light as a color adjective (as in light skin) means …”, “light as a weight adjective (as in light jacket) means …”, “light as an evaluative adjective (as in light opera) means …”, and so on. Again, we can formalize these as in (84).

  1. SEM({color, light}) = …

  2. SEM({weight, light}) = …

  3. SEM({value, light}) = …

In both cases, we replace the as-a relation with the head-complement relation. If such a move were made in isolation, it would be quite innocuous, even trivial. In the current context, though, the move was a logical result of a substantive hypothesis and should, therefore, be seen as corroborating evidence in favor of that hypothesis.

5.4 Concord vs Agreement

Among languages whose adjectives show ϕ-feature morphology, there is further division based on the contexts in which that morphology shows up. In French, for instance, ϕ-morphology shows up on attributive adjectives—matching the ϕ-features of their host noun—and predicative adjectives—matching the ϕ-features of the subject—as shown in (85) and (86), respectively.

  1. la  femme  grand–e
    the.FSg  woman  tall–FSg
    "the tall woman"

  2. le  garçon  grand–∅
    the.MSg  boy  tall–MSg
    "the tall boy"

  3. les  filles  grand–es
    the.Pl  girls  tall–FPl
    "the tall girls"

  1. La  femme  est  grand–e
    the.FSg  woman  is  tall–FSg
    "The woman is tall."

  2. Le  garçon  est  grand–∅
    the.MSg  boy  is  tall–MSg
    "The boy is tall."

  3. Les  filles  sont  grand–es
    the.Pl  girls  are  tall–FPl
    "The girls are tall."

In contrast, German adjectives show ϕ-features in attributive positions but not in predicative positions as shown in (87) and (88), respectively.

  1. keine  groß–e  Frau
    no.FSgNom  tall–FSgNom  woman
    "no tall woman"

  2. kein  groß–er  Junge
    no.MSgNom  tall–MSgNom  boy
    "no tall boy"

  3. keine  groß–en  Mädchen
    no.NPlNom  tall–NPlNom  girls
    "no tall girls"

  1. Keine  Frau  ist  groß.
    no.FSgNom  woman  is  tall
    "No woman is tall."

  2. Kein  Junge  ist  groß.
    no.MSgNom  boy  is  tall
    "No boy is tall."

  3. Keine  Mädchen  sind  groß.
    no.NPlNom  girls  are  tall
    "No girls are tall."

Put in commonly-used descriptive language, both French and German adjectives undergo (nominal) concord—shown in (85) and (87)—while only French adjectives undergo (subject) agreement—shown in (86) and not in (88).

If we assume, following Milway (2019), that (i) adjective agreement comes from the same process as finite verb agreement, and (ii) French and German, for example, differ from each other in that the French adj⁰ head bears unvalued ϕ-features, while the German one does not, then we can explain the facts demonstrated in (86) and (88). This, however, leaves the question of how concord happens, for which my proposed theory of adjuncts offers an answer.17

First, consider the simple German nominal phrase in (89) which is specified for Case, gender, and number.

eine  Brücke
a.FNom  bridge
"a bridge"

Setting aside gender for now, we can assume that Case and number features are housed not on the noun itself, but on functional heads in the noun’s extended projection: Case is on D, number on Num. Therefore, we can analyze (89) roughly as in (90).


〈{einF.Nom, {NumSg, {Brücke}}}〉

Now, consider the nominal phrase (91) which has an adjunct that shows concord.

eine  klein-e  Brücke
a.FNom  small-FSgNom  bridge
"a small bridge"

On the same assumptions as above, we can analyze (91) as in (92).


〈{einF.Nom, {NumSg, {klein}}}, {einF.Nom, {NumSg, {Brücke}}}〉

There is no need to get features from the noun to the adjective here, since the relevant features—F, Sg, Nom—are in both syntactic objects by virtue of lockstep parallel derivation.
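
As a rough illustration of why no feature copying is needed, the following Python sketch builds the host and the adjunct of (92) in lockstep. It rests on several assumptions of mine: roots are plain strings, functional heads are modeled with a hypothetical Head type carrying a feature set, gender is placed on the determiner following the notation of (92), and merge_in_lockstep and features are illustrative helper names rather than operations defined in this paper.

```python
from typing import NamedTuple

class Head(NamedTuple):
    """Hypothetical model of a functional head and the features it houses."""
    name: str
    feats: frozenset

def merge_in_lockstep(workspace, head):
    """Merge a copy of `head` with every SO in the workspace."""
    return [(head, so) for so in workspace]

def features(so):
    """Collect the features housed on the functional heads of an SO."""
    if isinstance(so, str):  # a bare root houses no features in this sketch
        return frozenset()
    head, complement = so
    return head.feats | features(complement)

num = Head("Num", frozenset({"Sg"}))
det = Head("ein", frozenset({"F", "Nom"}))

workspace = ["klein", "Brücke"]  # adjunct root and host root, derived in parallel
workspace = merge_in_lockstep(workspace, num)
workspace = merge_in_lockstep(workspace, det)

adjunct, host = workspace
print(features(adjunct) == features(host))  # True
```

The adjunct ends up with the same F, Sg, and Nom features as the host simply because the same heads were merged into both objects, which is the point made in the text.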

This is, of course, far from a full analysis of all concord phenomena,18 but rather, a proof-of-concept—a demonstration that concord may be explained in this theory of adjuncts without recourse to complicated operations like Agree.

6 Apparent Counterexamples

Any worthwhile scientific theory should make empirical predictions. The preceding section discusses some of the correct empirical predictions of the theory that I have proposed. An honest assessment of the history of science, however, would show that most new theories make several wrong empirical predictions.19 In this section I will discuss four apparently faulty predictions of my theoretical proposal.

The first such prediction is that host elements cannot c-command any adjunct elements unless they are also adjunct elements. There are many instances, though, in which a pronoun in the host clause seems to be able to bind, and therefore c-command, an R-expression in an adjunct. The second is that, according to my proposal, a host and adjunct do not form a constituent. Many standard constituency tests, though, suggest otherwise. Third, my proposal predicts that all adjuncts are islands, though there are certain classes of apparent adjuncts which allow wh-extraction from them. Finally, my proposal that adjuncts are separate objects from their hosts seems to clash with cases where adjuncts seem to undergo movement, such as wh-questions and topicalization. In the remainder of this section I will discuss each of these in turn.

6.1 Adjuncts and Principle C

An anonymous reviewer notes that despite my proposal’s predictions to the contrary, there is evidence that elements in the host of a sentence can c-command into an adjunct. The evidence for this claim was in the form of the principle C violation in (93).


Hei/*j asked which picture that Johnj liked Mary bought.

Other than the island constraints, there is perhaps no more common source of data that informs theorizing about adjuncts than binding principle C. Unlike the data from island constraints—which is rather uniform—the data from principle C is varied and rather muddy.

Lebeaux (1988), for instance, showed that fronted phrases containing adjuncts show antireconstruction effects with respect to principle C. Compare the sentences in (94) and (95).

  1. *Hei destroyed those pictures of Johni.

  2. *Hei destroyed those pictures near Johni.

  1. *Which pictures of Johni did hei destroy?

  2. Which pictures near Johni did hei destroy?

The ungrammatical sentences in (94) show that he is able to bind into both an argument (as in (94a)) and an adjunct (as in (94b)). Their counterparts in (95), however, show that binding survives wh-movement for the argument case (95a), but not the adjunct case (95b). Lebeaux uses this as evidence for his claim that adjuncts are added late. In modern terms, Lebeaux would propose that in (95a), there is a copy of John in the c-command domain of he, whereas in (95b) John only exists in the fronted wh-phrase.

Based on this data, we could propose the generalization in (96).


Lebeaux’s Generalization

If A is adjoined to X, and Y c-commands X, then Y c-commands A and its contents, unless A has been fronted.

Speas (1990, pp. 51–52), however, presents data that confounds such a generalization, showing that some types of adjuncts trigger principle C violations even when fronted.


Temporal location vs. locative

  1. In Beni’s office, hei is an absolute dictator.

  2. *In Beni’s office, hei lay on his desk.


Rationale vs. benefactive

  1. For Maryi’s valor, shei was awarded a Purple Heart.

  2. *For Maryi’s brother, shei was given some old clothes.


Temporal vs. locative

  1. On Rosai’s birthday, shei took it easy.

  2. *On Rosai’s lawn, shei took it easy.


Temporal vs. instrumental

  1. With Johni’s novel finished, hei began to write a book of poetry.

  2. *With Johni’s computer, hei began to write a book of poetry.

So, there are cases in which host-elements seem to c-command into adjuncts and there are cases where they do not.

Faced with such a situation and assuming the analysis of the data is correct,20 a theorist of adjuncts has two options, neither of which is good. Either they construct a theory in which the c-command into adjuncts is predicted to be the norm, or they construct a theory in which c-command into adjuncts is barred as the norm. In either case the theorist will have exceptions when it comes to the principle C data presented here.

Beyond the muddiness of the principle C data, I would be remiss if I didn’t note two of its shortcomings as a source of theoretically useful data. First is the fact that we currently lack a proper theory of binding within the biolinguistic/minimalist theory. Hornstein (2009, pp. 20–25) proposes a theory of principles A and B, but stops short of discussing principle C in detail. Second, there is some evidence that principle C binding is not entirely based on c-command. Compare the sentences in (101).

  1. *Hisi mother loves himselfi.

  2. Hisi/j mother loves himi.

  3. Hisi/%j mother loves Johnj.

The principle A violation in (101a) and the lack of principle B violations in (101b), taken together, suggest that the possessive pronoun his does not c-command the direct object (himself/him). The principle C violation in (101c), however, suggests that his does indeed c-command the direct object John. Thus, the principle C data contradicts the principle A/B data.

It is possible, then, that further development of the proposed theory of adjuncts in tandem with a theory of binding could eventually yield a theory in which all the data adduced in this section is accounted for. It is also possible that these facts are naturally accounted for by another theory of adjuncts. Since there is no current candidate for this other theory of adjuncts, I will leave the data points in this section as fodder for future research.

6.2 Adjuncts and Constituency Tests

If adjuncts are completely separate objects from their hosts, as this paper proposes, then host and adjunct together should not form a constituent. An anonymous reviewer, however, points out that if a sentence like (1) undergoes VP-fronting, the adverbial adjunct is fronted along with the VP host as in (102).


Sing the song with gusto, Rosie did.

This seems to indicate, contra my proposal, that sing the song with gusto is a constituent. There is, however, an alternative explanation once one considers the fuller theory of grammar in which my proposal is embedded.

The first hint at this explanation is that the thing that moves in VP-fronting is likely a phase which, according to Chomsky (2013), means it has undergone labeling. Consider, then, the structure of the fronted “VP” which undergoes labeling in (103).


〈{Voice, {sing, {the, song}}}SO1, {Voice, {with, gusto}}SO2〉

The labeling algorithm of Chomsky (2013) does a minimal search and returns the most prominent element of an object as its label. In the case of both the host SO1 and the adjunct SO2, the label will be Voice. What’s more, by hypothesis, the Voice head in the host and the one in the adjunct are copies of each other, which means the respective labels of the object will be copies of each other.21

Now, turning to the actual process of VP-fronting, let’s hypothesize that, when possible, syntactic operations refer to labels, rather than whole objects. This, I believe, is a reasonable hypothesis, because searching for a single atomic element is likely more efficient than searching for a complex object. This gain in efficiency, though, comes at a cost of precision. Consider the penultimate stage of the derivation of (102), shown in (104).


〈{C, {T, {. . . }}}SO1〉(= WS1)

The VP-fronting step will be one of internally MERGE-ing Voice, as in (105)



Since the host and the adjunct are both labeled by the same Voice head22, they will both be targeted by this MERGE operation, and therefore they will be fronted together. Note that this explanation predicts that VP-fronting always fronts any VP adjuncts along with their hosts. This prediction does seem to be borne out as shown by the fact that the VP host cannot be fronted on its own as in (106)


*Sing the song Rosie did with gusto.
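
The reasoning above, on which an operation that refers to a label picks out every object bearing that label, can be sketched as follows. This is an illustration under simplifying assumptions and not an implementation of the labeling algorithm of Chomsky (2013): SOs are nested tuples whose first member is their head, minimal search is reduced to returning that first member, and label and targets_by_label are hypothetical helper names.

```python
# Sketch only: an SO is a nested tuple (head, complement); heads are strings.

def label(so):
    """Simplified minimal search: return the most prominent (first) element."""
    return so[0] if isinstance(so, tuple) else so

def targets_by_label(workspace, lab):
    """Every SO in the workspace that an operation referring to `lab` would pick out."""
    return [so for so in workspace if label(so) == lab]

host = ("Voice", ("sing", ("the", "song")))
adjunct = ("Voice", ("with", "gusto"))
fronted_vp = [host, adjunct]  # the two parallel objects in (103)

targeted = targets_by_label(fronted_vp, "Voice")
print(len(targeted))  # 2
```

Because the search is for the label alone, it has no way of selecting the host without also selecting the adjunct, which is the loss of precision that the text ties to the deviance of (106).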

Note that other constituency tests, which likely do not involve an actual movement operation, are able to target the host, the adjunct, and both together.

  1. It was sing the song with gusto that Rosie did.

  2. It was sing the song that Rosie did with gusto.

  3. It was with gusto that Rosie sang the song.


We expected Rosie to sing the song with gusto, and . . .

  1. she did so.

  2. she did so with gusto.

  3. she sang the song so.

There is, no doubt, much more to be said about this data and its implications for the interpretation of constituency tests. I will leave that discussion for future research, noting only that the data in question does not seem to rule out a parallel derivation theory of adjuncts.

6.3 Non-Island Adjuncts

I argued in Section 5.1 that my theory of adjuncts predicts their islandhood. Several commentators, though, note that this prediction is contradicted by cases in which adjuncts seem not to be islands to movement. In particular, they point to the cases investigated by Truswell (2011), such as those in (109).

  1. What did you come round [to work on _]?

  2. Who did John get upset [after talking to _]?

  3. What did John come back [thinking about _]? (Truswell, 2011, p. 129)

Truswell (2011) argues that extraction out of adjuncts is governed by what he dubs the Single Event Grouping Condition, given in (110), with auxiliary definitions in (111) and (112).


The Single Event Grouping Condition (Truswell, 2011, p. 157)

An instance of wh-movement is legitimate only if the minimal constituent containing the head and the foot of the chain can be construed as describing a single event grouping.


An event grouping ℰ is a set of core events and/or extended events {e1, . . . en} such that:

  1. Every two events e1, e2 ∈ ℰ overlap spatiotemporally;

  2. A maximum of one (maximal) event e ∈ ℰ is agentive. (Truswell, 2011, p. 157)


An event e is agentive iff:

  1. e is an atomic event, and one of the participants in e is an agent;

  2. e consists of subevents e1, . . . en, and one of the participants in the initial subevent e1 is an agent. (Truswell, 2011, p. 158)

If the possibility of wh-extraction is governed by purely semantic considerations, as Truswell suggests, then theories such as the one proposed in this paper, which derive island-hood on purely syntactic grounds, are wrong-headed. Truswell’s proposal, however, is flawed both theoretically and empirically, as I discuss below.

The major theoretical flaw is that the very notion of an event is not well enough defined to form the basis of a theory of wh-extraction.23 In broad terms, any proposal that the structure of some semantic object constrains the syntax requires at least a theory of those semantic objects and their structure in which they are independent of syntax, because if the structure of the constraining semantic object depends on syntax, then the constraint is ultimately syntactic. So, the condition in (110) requires that event groupings be discrete—i.e., countable—independent of their description—i.e., that discrete events have objective existence regardless of how we choose to describe them. That discreteness cannot come from the extra-mental world, where phenomena are continuous, a conclusion with which Truswell seems to concur. Therefore, the discreteness of events must have some cognitive source.

While Truswell presents data and arguments for such a non-linguistic cognitive source of event individuation, he does not present a theory of it.24 Therefore, his larger analysis is ultimately promissory.

Empirically speaking, we can construct examples to show that (110) is both too restrictive, predicting islands where none exist, and not restrictive enough, failing to predict islands that do exist. Consider the case of (113).


Whati did John lie around [reading _i] all day? (Truswell, 2011, p. 156)

This case, Truswell argues, is predicted by (110) because, while lying around and reading might be construed as distinct events, lying around is arguably not agentive, as shown by the fact that it is incompatible with agent-oriented adverbials as in (114).


*John (deliberately/intentionally) lay around (on purpose). (adapted from Truswell, 2011, p. 151)

If we embed the event description in (113) under an agentive verb, though, wh-movement is predicted by (110) to be blocked. This prediction, however, is not borne out when the description is embedded under try as shown in (115).


(Context: John was hoping to lie around reading a book all day Saturday, but he got called into work where he did his reading while pretending to work.)

Whati did John try to lie around [reading _i] all day?

Here, the minimal constituent that contains the head and tail of the wh-movement chain describes two agentive events—a trying event, and a reading event, which is understood to have occurred even if the lying-around event did not. Despite this violation of (110) and contrary to Truswell’s prediction, though, wh-movement is allowed.
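
To make the prediction at issue explicit, the definitions in (111) and (112) can be sketched in Python. The sketch makes assumptions that go beyond Truswell's text: events are given hypothetical start and end times, spatiotemporal overlap is reduced to temporal overlap, the qualification "(maximal)" in (111) is ignored, and the Event class and the function names are mine.

```python
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class Event:
    name: str
    start: float
    end: float
    agent: bool = False  # does one of the participants count as an agent?
    subevents: list = field(default_factory=list)

def is_agentive(e):
    """(112): atomic with an agent, or complex with an agent in the initial subevent."""
    if not e.subevents:
        return e.agent
    return e.subevents[0].agent

def overlaps(a, b):
    """Temporal overlap only; spatial overlap is not modeled here."""
    return a.start < b.end and b.start < a.end

def is_event_grouping(events):
    """(111): every two events overlap, and at most one event is agentive."""
    pairwise = all(overlaps(a, b) for a, b in combinations(events, 2))
    return pairwise and sum(is_agentive(e) for e in events) <= 1

# Hypothetical times; agentivity follows the discussion of (113) and (115).
lying = Event("lying around", 9, 17)
reading = Event("reading", 9, 17, agent=True)
trying = Event("trying", 9, 17, agent=True)

print(is_event_grouping([lying, reading]))          # True
print(is_event_grouping([trying, lying, reading]))  # False
```

On these assumptions, (113) describes a single event grouping, while adding the trying event of (115) introduces a second agentive event, so (110) predicts that extraction should be blocked there.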

We can also construct at least one case which should be allowed by (110) but is not, in reality, allowed. This construction begins with the sentences in (116), each of which can reasonably be said to describe an atomic event.

  1. Marya (deliberately) invited Benjamin.

  2. Marya (deliberately) defied Susana.

All else being equal, both (116a) and (116b) describe agentive events—they entail an act of invitation and an act of defiance respectively. Despite this, if we combine the two event-descriptions as host and adjunct, as shown in (117), the event count becomes slightly murky.


Mary invited Benjamin defying Susana.

This sentence looks like it should describe two agentive events (an act of invitation and an act of defiance), but closer reflection yields the conclusion that it describes only one: the act of invitation is the act of defiance. This conclusion is bolstered by the fact that the inviting and the defying cannot occur at separate times, as demonstrated in (118).25


*Marya invited Benjamin on Saturday, defying Susana on Sunday.

(cf. Marya invited Benjamin on Saturday informing Susana on Sunday.)

Despite the fact that sentences like (117) seem to describe a single agentive event, and contrary to (110), they do seem to show adjunct island effects, as in (119).


*Whoi did Marya invite Benjamin [defying _i]?

Given these theoretical and empirical arguments, Truswell’s semantic explanation of adjunct island-hood does not seem plausible.

It is more plausible that event individuation is governed by syntactic principles. If this is the case, then even if Truswell’s analysis is correct, wh-movement is governed by syntactic principles. It follows from this that, if the non-island adjuncts represented in (109) form a class, then that class must be defined syntactically. In fact, if we compare the examples in (120)–(123) to those in (1)–(4), we see that so-called rationale adjuncts, which are not islands (see (109)), are decidedly less free than, say, manner and temporal adverbials.


Zoe came around the café to work on her novel.


Zoe came around the café.


Zoe came around the café to work on her novel to impress the cute barista.


Zoe came around the café to impress the cute barista to work on her novel.

While all of these are grammatical, the hosts and adjuncts are not independent of each other as they are in (1)–(4) and as my theory predicts they would be. In (122), for instance, impressing the barista depends on working on the novel, while in (123), the reverse is the case.

So, my proposed theory of adjuncts can be maintained against Truswell’s data, by making one of two theoretical moves. We could divide adjuncts into free adjuncts and restricted adjuncts and limit the scope of my theory to the former, or we could make the stronger claim that the so-called adjuncts that Truswell (2011) is concerned with are not truly adjuncts and therefore not within the scope of my theory. I see no reason not to make the latter move.

It is worth noting here that Truswell does not seem to provide a working definition of adjunct, relying instead on his readers’ pretheoretic intuition about what counts as an adjunct. I suspect that if he had provided such a definition, he might have come to the same conclusion as I do above: that those cases of non-island “adjunct” are not truly adjuncts at all. This, of course, leaves the question of what they actually are if not adjuncts, a question which I will not take up here.

6.4 Apparent Adjunct Movement

Consider the sentences (124) and (125), in which the apparent adjuncts, indicated by emphasis, appear to have undergone A’-movement.


With a bat, she struck the ball.


When did he say she sang the anthem?

If, as I propose, adjuncts are not part of the clauses that they are “adjoined to”, how can they possibly undergo movement within them? This puzzling question, though, conceals a paradox which, in a sense, provides its answer—if topics and wh-expressions are in [SPEC, CP], how can they be adjuncts, which, according to the theory proposed here, are not part of their host clauses? The obvious way to answer these questions is to propose that the expressions in question are externally merged in [SPEC, CP], meaning that they are not adjuncts, and they do not undergo movement.

This may seem like a rather bold claim, but I can think of no good theoretical argument against it. The claim likely seems bold because topicalization and wh-questions are commonly subsumed under the banner of “A’-movement”, reflecting the GB/Early Minimalist separation of movement from Merge/base generation. With the discovery of Internal Merge as a sub-case of Merge, the notion of A’-movement, or any movement, as a unified phenomenon makes no sense.26 The emphasized expressions in (124) and (125) could be considered cases of A’-External Merge.

Note that the ambiguity of (125), in which when targets the time of the saying event on one reading and the time of the singing event on the other, is captured in exactly the same way as the ambiguity of (26). So, assuming when is a TP adverbial, the final derivational steps of the two readings are given in (126) and (127), respectively.

  1. WSN-1 = 〈{CWh, {he, {T1, {he, {Voice, {say, …}}}}}}SO1, {T1, when}SO2〉

    MERGE(WSN-1)(SO1)(SO2) →WSN

  2. WSN = 〈{{T1, when}, {CWh, {he, {T1, {he, {Voice, {say, …}}}}}}}〉

  1. WSN-1 = 〈{CWh, {he, {T1, {he, {Voice, {say, {she, {T2, …}}}}}}}}SO1, {T2, when}SO2〉

    MERGE(WSN-1)(SO1)(SO2) →WSN

  2. WSN = 〈{{T2, when}, {CWh, {he, {T1, {he, {Voice, {say, {she, {T2, …}}}}}}}}}〉

The “scope” of when is determined by which T it MERGES with. If it MERGES with T1, which defines the time of the saying event, then it will target the time of the saying event. If it MERGES with T2, which defines the time of the singing event, then it will target the time of the singing event.

This hypothesis, it should be noted, is preliminary—it raises a number of theoretical and empirical questions, which are beyond the scope of this paper. As a preliminary hypothesis, though, it demonstrates that apparent adjunct movement is not, in principle, ruled out by parallel derivation.

7 Other Contemporary Theories of Adjuncts

In Section 3 I discussed various historical theories of adjuncts, those based on frameworks and assumptions which are broadly considered to have been superseded. In this section, I will discuss a few theories of adjuncts whose assumptions are more contemporary to the theory I propose here. Specifically, I address the theories put forth by Hornstein (2009), Oseki (2015), Bode (2019), and Nakashima (2021).

Hornstein (2009) embeds his theory of adjunction in a full theory of syntax which, though very much in the spirit of the minimalist theorizing that this paper adopts, differs from Chomsky’s minimalist theory in its technical details. For Hornstein, the central structure-building operation is not set formation but concatenation, which he augments with a labeling operation.

  1. Concatenate A, B → AB (Hornstein, 2009, p. 58)

  2. Label AB → [A AB]

Concatenate, for Hornstein, is responsible for the unbounded nature of language, while Label is responsible for its hierarchical structure. Label, then, is the operation that gives us language as we know it.

In Hornstein’s theory, both operations are free as opposed to triggered, so nothing requires that a given structure be labeled at all. Indeed, Hornstein proposes that adjunction structures are unlabeled structures. So, (129a) would be a VP-Argument structure, while (129b) would be a VP-Adjunct structure.

  1. [V [V V XP] PP]

  2. [V V XP] PP
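
The difference between the two structures in (129) can be sketched with the two operations in (128). The Python below is an illustration only: the Labeled type and the helper names are mine, concatenation is modeled as a plain pair, and the label is always taken from the first concatenated element, following the statement of Label in (128b).

```python
from typing import NamedTuple

class Labeled(NamedTuple):
    """Hypothetical model of a labeled concatenate, i.e. [A AB]."""
    label: str
    parts: tuple

def concatenate(a, b):
    """Concatenate A, B -> AB: a flat pair with no label."""
    return (a, b)

def label(ab):
    """Label AB -> [A AB]: the first element supplies the label."""
    first = ab[0]
    lab = first.label if isinstance(first, Labeled) else first
    return Labeled(lab, ab)

def is_labeled(x):
    return isinstance(x, Labeled)

vp = label(concatenate("V", "XP"))                  # [V V XP]
argument_structure = label(concatenate(vp, "PP"))   # (129a): [V [V V XP] PP]
adjunct_structure = concatenate(vp, "PP")           # (129b): [V V XP] PP

print(is_labeled(argument_structure))  # True
print(is_labeled(adjunct_structure))   # False
```

Since both operations are free, nothing forces the second call to label, and the unlabeled result is what Hornstein treats as adjunction.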

Aspects of this theory are similar to the one proposed here.27 We need not delve any further into Hornstein’s theory of adjuncts, and can even stipulate that it has exactly the same empirical content as the one I have proposed, because Hornstein’s theory of adjuncts and mine are embedded in different theories of syntax. Any direct comparison of the two would therefore not be feasible except as part of a broader comparison of the two theories that embed them.

Similar comments apply to Oseki’s (2015) proposed theory of adjuncts. For Oseki, adjunction structures are XP-YP structures that are unlabeled under Chomsky’s (2013) Labeling Algorithm. Oseki derives the properties of adjuncts based on two additional assumptions. The first assumption is Hornstein’s (2009) Label Accessibility Condition, which states that only labels are accessible to Merge. Therefore, Merge cannot target the unlabeled {XP, YP} structure, but instead targets one of its constituent parts as in (130). The result, according to Oseki, is a two-peaked structure as in Figure 4.

  1. Merge(XP, YP) → {XP, YP}

  2. Merge(Z, XP) → {Z, XP}

Figure 4

A Two-Peaked Structure Produced by (130)

The second assumption, drawn from Epstein et al. (2012), is that the formation of two-peaked structures is immediately followed by transfer of one object (here the adjunct YP) such that one of the peaks is eliminated.

Such a derivation, of course, is impossible under the MERGE-based grammar I assume here, as MERGE(WS,XP,Z) would be defined only if either both Z and XP are members of WS, or XP is a member of WS and Z is contained in XP. Neither of these conditions holds, as XP is part of {XP, YP}, which would be a member of WS. Indeed, the elimination of two-peaked structures as a possibility is one of Chomsky’s motivations for the development of MERGE. So, Oseki’s theory of adjuncts can be set aside for now.
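
The two conditions just mentioned can be checked mechanically. The following Python sketch is an illustration under assumptions of mine: a workspace is a list, complex SOs are frozensets, XP, YP, and Z are treated as unanalyzed atoms, and merge_defined and contains are hypothetical names rather than part of the formal definition of MERGE.

```python
# Sketch only: complex SOs are frozensets; XP, YP, and Z are treated as atoms.

def contains(so, x):
    """True if x is a part of so at any depth."""
    if not isinstance(so, frozenset):
        return False
    return any(part == x or contains(part, x) for part in so)

def merge_defined(ws, xp, z):
    """MERGE(WS, XP, Z) is defined only in the external or the internal case."""
    external = xp in ws and z in ws
    internal = xp in ws and contains(xp, z)
    return external or internal

XP, YP, Z = "XP", "YP", "Z"
ws = [frozenset({XP, YP}), Z]  # the workspace after step 1 of (130)

print(merge_defined(ws, XP, Z))                    # False
print(merge_defined(ws, frozenset({XP, YP}), Z))   # True
```

The second step of (130) fails the check because XP is only a part of a member of the workspace, not a member itself, whereas merging Z with the whole {XP, YP} object would be defined.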

Turning to Bode’s (2019) proposal, we will see that it is theoretically consonant with the assumptions made here but suffers from an empirical flaw. Bode’s theory rests on two assumptions, which are best explained in the context of an adjunction structure {XP, YP}. Under Chomsky’s Labeling Algorithm, such a structure can be labeled either if XP and YP share a feature or if one of the objects is a lower copy. Neither of these applies to adjunction structures, so Bode assumes a third possibility: the adjunct YP is invisible if it has already been Transferred. In this case XP provides the label, and YP is invisible to further computation.

The second assumption is about the function of labels: Bode assumes that labels provide instructions to the interpretive systems SM and C-I. So, an object with label X will be interpreted differently from an object with label Y (X ≠ Y). Specifically, Bode assumes that labels have the function of “… pointing to asymmetric relations such as predicate/argument or operator/scope.” (p. 97) It’s this assumption, which Milway (2019) shares, that yields a problematic prediction. Compare, for instance, the External Merge position of an External Argument (DP-VoiceP) and a VoiceP adjunct structure (VoiceP-YP). Both structures require further action to be labelable (Internal Merge in the case of the External Argument and Transfer in the case of the adjunct), and both would eventually be labeled with Voice. Therefore, they should both be interpreted asymmetrically, a result that runs afoul of the basic observation that, while predicate/argument structures are semantically asymmetric, host-adjunct structures are semantically symmetric. This false prediction is, no doubt, correctable by some auxiliary hypothesis, but it would still remain a point of contrast between Bode’s theory and the one proposed here. Absent any such explicit hypothesis, we can set Bode’s (2019) proposal aside for the time being.

Finally, we have Nakashima (2021), who proposes an extension of MERGE called Asymmetric MERGE. According to Nakashima, an instance of MERGE has four possible outputs, given in (131).

(131) Where WS = [α, β], MERGE(WS, α, β) has the following possible outputs:

      (a) [{α, β}]

      (b) [{α, β}, α]

      (c) [{α, β}, β]

      (d) [{α, β}, α, β]

Nakashima calls cases (b) and (c) “Asymmetric MERGE” (AM), and proposes that the objects that remain in the output WSs—α in (b) and β in (c)—are construed as adjuncts. They then go on to show how this proposal captures a number of facts about adjuncts. We can stipulate for our purposes that if we accept AM, then it can capture these facts, but unfortunately, the arguments for AM do not hold water.

Nakashima argues that AM is predicted by Determinacy (Chomsky, 2019 as interpreted by Goto & Ishii, 2020), and that AM is merely one case of MERGE, like Internal and External MERGE. The assertion that MERGE as defined in (131) is determinate, though, is false on its face: (131) is four-ways ambiguous unless there is some explicit definition of which input yields which output. Furthermore, cases (b), (c), and (d) of (131) are precisely the results that Chomsky (2020) defines MERGE to rule out. If we don’t take (131) as given, then AM cannot be said to simply be another case of MERGE. Thus, we can set Asymmetric MERGE aside for the same reason as we set aside the theories of Hornstein (2009) and Oseki (2015): they are based on fundamentally different theoretical assumptions than those of this paper.
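The ambiguity at issue can be made concrete with a minimal sketch. It assumes that each output in (131) is a workspace and models workspaces as frozensets; the function name `asymmetric_merge_outputs` is hypothetical and is not taken from Nakashima (2021).

```python
from typing import FrozenSet, List


def asymmetric_merge_outputs(alpha: str, beta: str) -> List[FrozenSet]:
    """Enumerate the four candidate output workspaces listed in (131)."""
    merged = frozenset({alpha, beta})
    return [
        frozenset({merged}),                # (a) neither input remains
        frozenset({merged, alpha}),         # (b) alpha remains
        frozenset({merged, beta}),          # (c) beta remains
        frozenset({merged, alpha, beta}),   # (d) both inputs remain
    ]


outputs = asymmetric_merge_outputs("α", "β")
distinct_outputs = len(set(outputs))

# A determinate operation would pair each input with exactly one output.
is_determinate = distinct_outputs == 1
```

On this reading, one and the same input is paired with four distinct workspaces, so the mapping is a relation rather than a function unless something further selects among the outputs.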

8 Conclusion

I have argued in this paper that the basic facts about adjuncts only make sense if we assume that adjuncts are not truly attached to their hosts. While previous theories of grammar have not offered any way of formalizing this assertion, I proposed that the relatively new notion of workspaces offers such a possibility. That is, I proposed that adjuncts are derived separately from their “hosts”, just as arguments are derived separately from their predicates, but that, unlike arguments, which are incorporated into their predicates, adjuncts are not incorporated into the “host” object. I formalized this proposal and, in the process, proposed a workspace-based formalization of MERGE. I then applied this formalized proposal to some generalizations related to adjuncts (Adjunct Islands, Parasitic Gaps, and adjective ordering constraints), showing that those generalizations are either predicted by my proposal or consistent with it.

Before concluding, though, I would like to discuss some possible implications of some of my proposals: the possibility of extending this theory to coordinate structures, the implications of the theory of adjuncts for non-adjunct/non-coordinate structures, and the broader implications of the introduction of higher-order functions. As Bošković (2020) argues, adjuncts and coordinate structures bear many similarities (e.g., flexible ordering, unbounded stacking, conjunctive interpretation, and islandhood), suggesting that they should be unified. In the current context that would mean a sentence like (132a) would be represented as in (132b), and derived as one would expect.

(132) a. Jackie went home and ate a sandwich.

      b. 〈{Jackie, {T {. . . {go, home}}}}, {Jackie, {T {. . . {eat, {a, sandwich}}}}}〉

This extension raises a host of questions which I will leave for future research, except to point out that the simplest such theory of coordinate structures would predict that coordinating conjunctions are not present in the syntax—a result that it would share with Chametzky’s (1996) theory of coordination.

Assuming that the parallel derivation theory can account for adjuncts and coordinate structures, we can ask what it says about the derivation of canonical structures, i.e., those without adjuncts or coordination. On a theoretical level, the answer to that question would be that the theory has virtually no effect on such structures: they would be derived in a single non-parallelized derivation. On an analytical level, though, it may be the case that there are structures which appear to be canonical structures, but are best understood as adjunction structures. Milway (2019), for instance, argues that structures such as depictives, resultatives, and ACC-ing structures, as in (133a)–(133c), respectively, involve adjunction.

(133) a. She eats her toast dry.

      b. They hammered the metal flat.

      c. I saw him running down the street.

Again, further research is required to investigate such claims, but if other apparently canonical but problematic structures can be explained by parallel derivations, it could represent quite an advance for linguistic theory.

Finally, my proposal makes crucial use of the higher-order function map, and this suggests an obvious minimalist criticism—namely that I have introduced unnecessary complexity to the grammar. Put concisely: If adding Pair-Merge to the grammar is illegitimate, then why isn’t the addition of map? I will propose and discuss two possible answers to this challenge. First, I will discuss the possibility that higher-order functions like map are derivable from MERGE—that they “come for free”. Second, I will discuss the possibility that it is these higher-order functions, rather than MERGE, which are the fundamental basis of language.

The idea that one could derive higher-order functions from MERGE begins with the suggestion, made frequently by Chomsky (see note 28), that internal MERGE is sufficient to explain the human faculty of arithmetic. The reasoning is as follows: The simplest case of Merge is vacuous internal Merge (Merge(x) → {x}), which is identical to the set-theoretic definition of the successor function (S(n) = n + 1). Since arithmetic is reducible to a notion of 0 or 1, the successor function, and a few other axioms, Merge suffices to generate arithmetic. The process of learning arithmetic, then, is merely the process of setting the axioms of the system.
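As a rough illustration of this reasoning, vacuous Merge can be sketched as a successor function over nested sets. The choice of the empty set as the starting point and the helper names `to_int` and `add` are assumptions made only for this sketch.

```python
ZERO = frozenset()  # assumption: the empty set stands in for 0


def merge(x: frozenset) -> frozenset:
    """Vacuous internal Merge as described in the text: Merge(x) -> {x}."""
    return frozenset({x})


def to_int(n: frozenset) -> int:
    """Count how many times merge has been applied to ZERO."""
    count = 0
    while n != ZERO:
        (n,) = n      # unwrap the single member
        count += 1
    return count


def add(m: frozenset, n: frozenset) -> frozenset:
    """Addition by iterating the successor: apply merge to m once per layer of n."""
    while n != ZERO:
        (n,) = n
        m = merge(m)
    return m


two = merge(merge(ZERO))
three = merge(two)
five = add(two, three)
```

Here `to_int(five)` evaluates to 5, with addition obtained from nothing beyond the starting point, the successor step, and iteration.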

This result should not be surprising, though, since theoretical models of computation are closely linked to arithmetic. In fact, early models of computation were largely models of arithmetic, where the set of determinable functions that could be represented in model X is the set of X-computable functions on the natural numbers. An assumption generally made, called the Church–Turing thesis, is that the class of effectively computable functions is identical to the class of functions computable by a Turing machine. So, if we assume that a Merge-based computation system is capable of general computation, then it should be capable of performing every computable function. Since higher-order functions are computable functions, a Merge-based system should allow for them.

This reasoning hinges on a few hypotheses, but even if it could be done completely deductively, it would still face the serious problem that models of computation and related systems assume a strict distinction between operations and atoms. Take, for instance, the process of deductive reasoning, which derives statements from statements following rules of inference. In this case our operations are the rules of inference and the atoms are the statements. As Carroll (1895) famously illustrated, it is very easy to blur the lines between a rule of inference—such as modus ponens, given in (134)—and the logical statement in (135), but doing so renders the system useless.


(134) P → Q, P ⊢ Q

(135) ((P → Q) & P) → Q

The former is a rule of inference that may or may not be active in a logical system, while the latter is a statement which may or may not be true in a logical system. If a system doesn’t explicitly include (134) but can effectively perform it, we can say that the system in question can simulate (134). If a system can prove (135) without it being an axiom, then we can say that the system generates (135).
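The contrast between a rule and a statement can be sketched as follows. The encoding of conditionals as tuples and the function names are assumptions for illustration only; the sketch treats (135) as data to be evaluated and modus ponens as an operation that acts on such data.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication."""
    return (not p) or q


def statement_135(p: bool, q: bool) -> bool:
    """The statement ((P -> Q) & P) -> Q, evaluated for one valuation."""
    return implies(implies(p, q) and p, q)


# A statement is checked like any other formula: evaluate it on every valuation.
is_tautology = all(
    statement_135(p, q) for p, q in product([True, False], repeat=2)
)


def modus_ponens(known: set) -> set:
    """The rule: from ('->', A, B) and A in `known`, add B."""
    derived = set(known)
    for s in known:
        if isinstance(s, tuple) and s[0] == "->" and s[1] in known:
            derived.add(s[2])
    return derived


premises = {("->", "P", "Q"), "P"}
conclusions = modus_ponens(premises)
```

The statement is something the system evaluates, whereas the rule is something the system does: `modus_ponens` takes statements as input and returns statements, and it cannot itself appear among the premises.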

In the grammatical system that I have been assuming, MERGE corresponds to the rules of inference, and the syntactic objects and workspaces correspond to the atoms. In my reasoning above, I concluded that a MERGE-based system could simulate higher-order functions like map, but it cannot be concluded from this that map could be an integral part of adjunction. The human mind is capable of simulating a wide variety of systems. For instance, a skilled Python programmer is effectively able to simulate a Python interpreter, but such a simulation requires learning, practice, and considerable mental effort. Adjunction, on the other hand, seems to be fully innate and mostly effortless.

The second possibility is to propose that higher-order functions, or some principle that allows for them, are the basis for language. That is, we accept the minimalist evolutionary proposal that a single mutation separates us from our non-linguistic ancestors, but we propose that instead of MERGE/Merge, the result of that mutation was higher-order functions. There are a number of issues of varying levels of surmountability with this proposal, which I discuss below.

The first issue is that, while Merge/MERGE is a single operation and, therefore, easily mappable to a single genetic change, higher-order functions are a class of functions, making the task of linking them to a single mutation non-trivial. However, if they do form a (natural) class of functions, then they must share some singular feature, which can be mapped to a single mutation. The definition of a higher-order function as one that takes or gives a function as an input or output, respectively, suggests such a feature: abstraction.
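The two halves of that definition can be shown in a short sketch. The names `my_map`, `compose`, and `wrap` are hypothetical and chosen only for illustration; `wrap` is a stand-in first-order operation, not a model of MERGE.

```python
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def my_map(f: Callable[[A], B], xs: Iterable[A]) -> List[B]:
    """Takes a function as input: applies f to each item of xs."""
    return [f(x) for x in xs]


def compose(g: Callable[[B], C], f: Callable[[A], B]) -> Callable[[A], C]:
    """Gives a function as output: the composition of g after f."""
    return lambda x: g(f(x))


def wrap(x):
    """A stand-in first-order operation (hypothetical, for illustration)."""
    return (x,)


wrapped = my_map(wrap, ["a", "b"])
wrap_twice = compose(wrap, wrap)
doubly_wrapped = my_map(wrap_twice, ["a", "b"])
```

In both cases a function is handled as an ordinary value, passed in or handed back, which is the sense of abstraction at issue here.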

If abstraction is to be the defining feature of the faculty of language, then it behooves us to give a concrete definition of it. In the mathematico-computational sense, abstraction can be seen as the ability of a system to treat functions as data. Applied to our cognitive system, this seems to allow meta-thinking (thinking about thinking, reasoning about reasoning, reflecting upon reflections, and so on), what Hofstadter (1979) calls “jumping out of the system.” This kind of meta-thinking, though, is commonly associated with consciousness, which leads to two problems with this approach. The first problem is the hard problem of consciousness: if abstraction and consciousness are the same, then we may never fully understand either. The second problem is more mundane: we are no more conscious of adjunction than we are of MERGE, yet my reasoning here suggests that perhaps we should be conscious of the former.

There is, however, a third possibility: a synthesis of the two previous possibilities. The early results of computability theory (Gödel, 1931; Turing, 1936) made crucial use of abstraction, using, say, number theory to reason about the axioms and operations of number theory. In fact, every simple model of computation allows for abstraction of the sort I am considering here. This seems to suggest that the choice between the two possibilities above is a false one, that is, that MERGE and abstraction cannot truly be disentangled. This does not allow us to avoid the problems that I have raised, but it does suggest that they can be combined and perhaps solved together.


1) Hornstein (2009) differs, defining Merge not as set-formation but as concatenation.

2) Similar remarks apply to attempts to use Pair-Merge to explain other phenomena (cf. Epstein et al., 2016; Richards, 2009)—to the extent that the phenomena in question are problematic, the assertion that Pair-Merge explains them is, at best, a conjecture.

3) Note we can still identify the three relevant subderivations—(16a)–(16b) and (16b)–(16c) for the nominals, and (16c)–(16i) for the clausal spine—though strictly speaking every contiguous sequence of workspaces in a derivation would be considered a subderivation.

4) All modes of expression allow for some sort of simultaneous pronunciation, be it facial expressions in sign language, intonation in spoken language, or typography in written language.

5) I abstract away from the predicate-internal subject hypothesis for simplicity.

6) All varieties of covert movement, such as quantifier raising (May, 1978) and wh-in-situ (Lu et al., 2020) would contradict this proposal. Trinh (2009) discusses more nuanced copy deletion data and arrives at a constraint on the delete-low-copies principle. See also Bošković (2002).

7) Setting aside cases of non-intersective modification.

8) Unless do-support is used.

9) The astute reader will likely note that my definition of MERGE sacrifices the simplicity of Merge to meet Chomsky’s desiderata. This, I believe, reflects the fact that we lack a sufficient model of neural computation in which to ground our grammatical theory. Such a model would likely meet the “restrict resources” desideratum automatically.

10) Dmitrii Zelenskii (P.C.) points out that the function defined in (43) can be considered monadic and, therefore, computational if we take it to be a function from a triple WS, X, Y to a workspace. While this is true, it would merely kick the can down the road a bit, as we would then need to define a function that creates the appropriate triple for MERGE3.

11) Strictly speaking, the token of LI on the righthand side of the arrow should be distinguishable from LI itself on the lefthand side. Collins and Stabler (2016), for instance, distinguish lexical items, which are members of the lexicon, from lexical item tokens which are pairs of lexical items and integers. For simplicity of exposition I will not formally distinguish tokens from proper lexical items.

12) Bošković notes that, since the Coordinate Structure Constraint and its apparent exceptions are also constant across languages, it should be unified with adjunct islandhood.

13) I represent the gaps within the adjuncts here as ecs because, depending on the analysis, they are alternately identified as traces of movement or null pro-forms.

14) See Sproat and Shih (1991) for further discussion of the adjective ordering restriction.

15) See Ernst (2014) for a discussion of this hypothesis, which he refers to as the “F-Spec” hypothesis. See also Larson (2021) for a recent critique of Cartography.

16) I leave out Select operations for the sake of brevity.

17) See Norris (2017a, 2017b) for a full survey of the attempts to explain concord phenomena.

18) Even just within German, there are three sets of concord phenomena—strong, weak, and mixed—that need full analysis.

19) Feyerabend (1993) goes farther, arguing that every successful theory began its life unable to account for all of the phenomena that its predecessors accounted for. See also Piattelli-Palmarini et al. (2009, pp. 35–36) for discussion of early empirical falsification of special relativity.

20) One could, of course, reject the analysis and suggest one in which what looks like a single phenomenon is actually multiple phenomena. In the current context, this could take the form of arguing, perhaps, that half of Speas’ data does not actually involve adjunction. Indeed, any analysis that proposes that this divergent behavior is explained by positing two types of adjuncts is, in a very real sense, rejecting the analysis.

21) An anonymous reviewer notes that this, and in fact much of the proposed theory, depends on how the operation Transfer is formalized/defined, a task which I do not take up here. This is undoubtedly true; in fact, I would go further and say that virtually any theory of any aspect of syntax depends on a theory of Transfer, and that the development of such a theory is a project on its own. Collins and Stabler (2016) provide a formal definition of Transfer, for instance, but they are quick to point out the empirical flaws in their own definition.

22) These Voice heads are both visible to labeling, per Chomsky’s (2013) Labeling Algorithm, since neither is a lower copy—neither has undergone Internal MERGE.

23) There is also perhaps a minor flaw in the definition of an agentive event in (112). The first condition in that definition requires that agentive events be atomic events, while the second allows for that atomic event to consist of multiple subevents. By definition, however, atoms are not divisible, so this is a contradiction in terms.

24) Indeed, many of the arguments he does present can very easily be interpreted to show that the structure of events, qua semantic objects, is constrained by the syntax, and not the other way around. Take, for instance, Fodor’s Generalization, given in (i).


(i) Fodor’s Generalization (Truswell, 2011, p. 49 following Fodor, 1970)

A single verb phrase describes a single event.

This can very easily and naturally be interpreted to mean that a happening is construed as a single event by virtue of being described with a single verb phrase rather than the other way around.

25) This example is, of course, acceptable if the invitation is unrelated to the defiance. That interpretation, however, is irrelevant here.

26) The fact that A’-movement seems to be an identifiable phenomenon without being theoretically identifiable is what makes it worth studying.

27) This is no accident, since the earliest version of the theory I am proposing was partially inspired by Hornstein’s theory (Milway, 2019).

28) See Chomsky (2019, p. 274) for an instance in writing.


The research presented here was largely funded by the Canada Emergency Response Benefit.


I am grateful to all of those who offered comments and critiques.

Competing Interests

The author has declared that no competing interests exist.

Previously Presented

The ideas that became this paper were presented at the University of Toronto Syntax Project, the 2020 meeting of the LSA, and as an MIT Colloquium talk on October 1, 2021.

Related Versions

Previous versions of this paper were posted to LingBuzz:


  • Bode, S. (2019). Casting a minimalist eye on adjuncts. Routledge.

  • Bošković, Ž. (2002). On multiple wh-fronting. Linguistic Inquiry, 33(3), 351-383.

  • Bošković, Ž. (2020). On unifying the coordinate structure constraint and the adjunct condition. In A. Bárány, T. Biberauer, J. Douglas, & S. Vikner (Eds.), Syntactic architecture and its consequences II: Between syntax and morphology (Open Generative Syntax 10, pp. 227–258). Language Science Press.

  • Carroll, L. (1895). What the tortoise said to Achilles. Mind, 4(14), 278-280.

  • Chametzky, R. (1996). A theory of phrase markers and the extended base. SUNY Press.

  • Chomsky, N. (1957). Syntactic structures. Mouton.

  • Chomsky, N. (1965). Aspects of the theory of syntax. MIT Press.

  • Chomsky, N. (1995). The minimalist program. MIT Press.

  • Chomsky, N. (2000). Minimalist inquiries: The framework. In R. Martin, D. Michaels, J. Uriagereka, & S. J. Keyser (Eds.), Step by step: Essays on minimalist syntax in honor of Howard Lasnik (pp. 89–155). MIT Press.

  • Chomsky, N. (2004). Beyond explanatory adequacy. In A. Belletti (Ed.), Structures and beyond: The cartography of syntactic structures (pp. 104–131). Oxford University Press.

  • Chomsky, N. (2013). Problems of projection. Lingua, 130, 33-49.

  • Chomsky, N. (2019). Some puzzling foundational issues: The Reading program. Catalan Journal of Linguistics, 19(Special Issue), 263-285.

  • Chomsky, N. (2020). The UCLA lectures.

  • Cinque, G., & Rizzi, L. (2015). The cartography of syntactic structures. In B. Heine & H. Narrog (Eds.), Oxford handbook of linguistic analysis (2nd ed., pp. 65–78). Oxford University Press.

  • Citko, B. (2005). On the nature of merge: External merge, internal merge, and parallel merge. Linguistic Inquiry, 36(4), 475-496.

  • Collins, C. (2002). Eliminating labels. In S. D. Epstein & T. D. Seely (Eds.), Derivation and explanation in the minimalist program (pp. 42–64). John Wiley & Sons.

  • Collins, C. (2017). Merge (X, Y)={X, Y}. In L. Bauke, A. Blümel, & E. Groat (Eds.), Labels and roots (pp. 47–68). De Gruyter Mouton.

  • Collins, C., & Stabler, E. (2016). A formalization of minimalist syntax. Syntax, 19(1), 43-78.

  • Engdahl, E. (1983). Parasitic gaps. Linguistics and Philosophy, 6, 5-34.

  • Epstein, S. D., Kitahara, H., & Seely, T. D. (2012). Structure building that can’t be. In M. Uribe-Etxebarria & V. Valmala (Eds.), Ways of structure building (pp. 253–270). Oxford University Press.

  • Epstein, S. D., Kitahara, H., & Seely, T. D. (2016). Phase cancellation by external pair-merge of heads. Linguistic Review, 33(1), 87-102.

  • Ernst, T. (2014). Adverbial adjuncts in Mandarin Chinese. In C.-T. J. Huang, Y.-H. A. Li, & A. Simpson (Eds.), The handbook of Chinese linguistics (pp. 49–72). Wiley Online Library.

  • Feyerabend, P. (1993). Against method. Verso.

  • Fodor, J. A. (1970). Three reasons for not deriving "kill" from "cause to die". Linguistic Inquiry, 1(4), 429-438.

  • Gödel, K. (1931). Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik, 38(1), 173-198.

  • Goto, N., & Ishii, T. (2020). Some consequences of merge and determinacy.

  • Heim, I. (1982). The semantics of definite and indefinite noun phrases [Doctoral dissertation]. University of Massachusetts Amherst.

  • Heim, I., & Kratzer, A. (1998). Semantics in generative grammar. Blackwell.

  • Hofstadter, D. R. (1979). Gödel, Escher, Bach: An eternal golden braid. Basic Books.

  • Hornstein, N. (2009). A theory of syntax: Minimal operations and universal grammar. Cambridge University Press.

  • Kayne, R. (1994). The antisymmetry of syntax. MIT Press.

  • Kratzer, A. (1996). Severing the external argument from its verb. In J. Rooryck & L. Zaring (Eds.), Phrase structure and the lexicon (pp. 109–137). Springer.

  • Larson, R. K. (2021). Rethinking cartography. Language, 97(2), 245-268.

  • Lebeaux, D. (1988). Language acquisition and the form of the grammar [Doctoral dissertation]. University of Massachusetts.

  • Lu, J., Thompson, C. K., & Yoshida, M. (2020). Chinese wh-in-situ and islands: A formal judgment study. Linguistic Inquiry, 51(3), 611-623.

  • May, R. C. (1978). The grammar of quantification [Doctoral dissertation]. Massachusetts Institute of Technology.

  • Milway, D. (2019). Explaining the resultative parameter [Doctoral dissertation]. University of Toronto.

  • Mukherji, N. (2012). The primacy of grammar. MIT Press.

  • Nakashima, T. (2021, October 30). How to generate adjuncts by merge [Paper presentation]. NELS 52, Online.

  • Norris, M. (2017a). Description and analyses of nominal concord (pt I). Language and Linguistics Compass, 11(11), Article e12266.

  • Norris, M. (2017b). Description and analyses of nominal concord (pt II). Language and Linguistics Compass, 11(11), Article e12267.

  • Nunes, J. (2004). Linearization of chains and sideward movement. MIT Press.

  • Oseki, Y. (2015). Eliminating pair-merge. In U. Steindl (Ed.), Proceedings of the 32nd West Coast Conference on Formal Linguistics (pp. 303–312). Cascadilla Proceedings Project.

  • Piattelli-Palmarini, M., Uriagereka, J., & Salaburu, P. (Eds.). (2009). Of minds and language: A dialogue with Noam Chomsky in the Basque country. Oxford University Press.

  • Richards, M. (2009). Internal pair-merge: The missing mode of movement. Catalan Journal of Linguistics, 8, 55-73.

  • Seely, T. D. (2006). Merge, derivational c-command, and subcategorization in a label-free syntax. In C. Boeckx (Ed.), Minimalist essays (pp. 182–217). John Benjamins Publishing.

  • Speas, M. (1990). Phrase structure in natural language. Springer.

  • Sproat, R., & Shih, C. (1991). The cross-linguistic distribution of adjective ordering restrictions. In C. Georgopoulos & R. Ishihara (Eds.), Interdisciplinary approaches to language (pp. 565–593). Springer.

  • Stepanov, A. (2001). Late adjunction and minimalist phrase structure. Syntax, 4(2), 94-125.

  • Trinh, T. (2009). A constraint on copy deletion. Theoretical Linguistics, 35(2-3), 183-227.

  • Truswell, R. (2011). Events, phrases, and questions. Oxford University Press.

  • Turing, A. M. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 2(42), 230-265.