(or “How I’ve been spending my unemployment*”)
Yesterday I finished and posted a paper to LingBuzz. It’s titled “Agree as derivational operation: Its definition and discontents” and its abstract is given below. If it sounds interesting, have a look and let me know what you think.
Using the framework laid out by Collins and Stabler (2016), I formalize Agree as a syntactic operation. I begin by constructing a formal definition of a version of long-distance Agree in which a higher object values a feature on a lower object, and modify that definition to reflect several versions of Agree that have been proposed in the “minimalist” literature. I then discuss the theoretical implications of these formal definitions, arguing that Agree (i) muddies our understanding of the evolution of language, (ii) requires a new conception of the lexicon, (iii) objectively and significantly increases the complexity of syntactic derivations, and (iv) unjustifiably violates NTC in all its non-vacuous forms. I conclude that Agree, as it is commonly understood, should not be considered a narrowly syntactic operation.
*Thanks to the Canada Recovery Benefit, I was able to feed myself and make rent while I wrote this.
Three quick points:
1. Your statement about ECM subjects not being canonical subjects is correct, inasmuch as they are not in the subject position of a finite clause. But your reasoning (that they are not candidates for movement because they are not nominative) is wrong even on the premises of the theory you’re arguing against. That theory takes nominative to be the absence of otherwise valued case features (just like 3sg is the absence of otherwise valued phi-features), and so ECM subjects *are* nominative at the stage of the derivation where only material from the embedded clause has been merged. See https://go.lingsite.org/kp2015 for a discussion of this with a focus on ECM constructions in particular.
2. I actually agree wholeheartedly with the contents of note 13 (on p. 29). But I think there is a complementary piece missing from that note, namely: any conceptual criterion can be satisfied if one is not responsible to the relevant facts. Now, we probably can (and do) disagree on what constitutes “the relevant facts” in this particular discussion; and to be clear, I think it is entirely valid scientifically to say, “I am going to sacrifice this bit of empirical coverage in the name of theoretical considerations, and hope that one day we come back to it with a better understanding than we do today.” As you say, nobody has a theory of everything, nor should any theory be held to that standard. That said, I think the empirical coverage of what you consider to be minimalist syntactic theory (e.g. without Agree) is laughable, essentially in the same class as a theory that would state “there are no natural languages.” You (I presume) disagree. That strikes me as quite a reasonable disagreement. But it should be acknowledged: conceptual elegance sans empirical coverage is cheap. Not quite as cheap as empirical coverage sans conceptual elegance; but close.
3. You analyze ex. (83d) as involving a trace. Given that this example is an instance of control (and not raising), it might be worth flagging that you are assuming the Movement Theory of Control. (Which is not to endorse or disavow that theory, just to point out an implicit assumption that should be made explicit.)
Re: formalization, there has been – for whatever reason – a little flurry of interest in the last couple of years in the formalization of at least the search procedure part of minimal search. (Cf. discussions of Breadth- and Depth-First search, etc.) This is separate from the No-Tampering stuff, which is more central to your interests in this paper, but anyway: there’s a recent manuscript by Branan & Erlewine, and a 2019 UMich thesis by Hezao Ke (which I learned about from Erlewine). And, though not in the same class with these two in terms of the depth of discussion, I myself had grown frustrated at some point that everyone just waved their hands about “minimal search” and no one ever so much as wrote down even some f***ing pseudocode, so in 2019 I finally dumped the relevant pseudocode into one of my papers (see pp. 23‑24). My version relies on a distinction between “specifier” and “complement”, which as you say in this paper is derivable but not a primitive. That said, non-descent into specifiers seems like a pretty stable property of minimal search, so whether or not that distinction ultimately dissolves into something more basic, it appears to be declaratively correct.
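For concreteness, here is a toy sketch of that kind of search (my own illustration, not the pseudocode from any of the papers mentioned; the dict-based tree encoding and the 'spec'/'comp' role tags are assumptions made purely for exposition): a breadth-first traversal that never descends into specifiers.

```python
from collections import deque

# Toy encoding: a node is a dict; phrases list their daughters as
# (daughter, role) pairs, where role is 'spec' or 'comp'.

def minimal_search(root, goal):
    """Breadth-first search for the closest node satisfying `goal`,
    without descending into specifiers."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if goal(node):
            return node
        for daughter, role in node.get("daughters", []):
            if role == "spec":
                continue  # non-descent into specifiers
            queue.append(daughter)
    return None

# Demo: a phi-bearing DP inside the complement is found; an otherwise
# identical DP sitting in a specifier is invisible to the search.
spec_dp = {"label": "DP-spec", "phi": True}
comp_dp = {"label": "DP-comp", "phi": True}
vp = {"label": "VP", "daughters": [(comp_dp, "comp")]}
tree = {"label": "vP", "daughters": [(spec_dp, "spec"), (vp, "comp")]}
assert minimal_search(tree, lambda n: n.get("phi"))["label"] == "DP-comp"
```

Note that this sketch skips the specifier wholesale; a more faithful version might probe the specifier’s label while still refusing to search its contents.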
I’ll have to have a look at those papers.
You are right that a distinction between specifier and complement is derivable in a label-free bare phrase structure system, but not with the same definiteness as in an X-bar system. Case in point: in the object {{X, ZP}, {Y, WP}}, we could derive the fact that {X, ZP} is in [Spec, Y], but, by that same token, we could also derive the fact that {Y, WP} is in [Spec, X]. Similarly, there is no principled way of identifying the head of the object in question—this is the central intuition of the “Problems of Projection” papers.
Indeed, the structure doesn’t tell you what projects in {{X, ZP}, {Y, WP}}, and therefore, which phrase is the specifier of which. What tells you that is c-selection.
As you know (since we’ve discussed it elsewhere before), I think it’s beyond clear that both labeling and c‑selection are narrow-syntactic affairs. That’s because they both deal in semantically- and phonologically-illegible primitives. Chomsky can argue till he’s blue in the face that labeling (and therefore c-selection) are Conceptual-Intentional matters – as he attempts to do in the “Problems of Projection” papers. But absent a worked-out account of the kind of data discussed by Merchant (2019), for example, I’m not buying. And so far all I’ve heard in the way of a response to that is: maybe if we fundamentally change what we mean by “semantic primitive”, we can capture this data after all. I must say, that seems indistinguishable to me from “modularity prevents a working account of this in Problems-of-Projection terms, so we propose to do violence to the notion of modularity.”
Given that c-selection is narrow-syntactic, the answer to whether X or Y projects in the structure {{X, ZP}, {Y, WP}} can be given by which one c‑selects the other. And in cases where the answer is “neither”, both options will be available (as argued a while ago by Donati (2006), for example, in her analysis of free relatives).
My point was only that, very likely, your minimal search algorithm, since it seems to depend on notions like specifier and head, is at worst not expressible and, at best, not trivially expressible in the Collins & Stabler formalism, which I use in my paper.
As for what is and isn’t in the narrow syntax or what counts as a semantic or phonological primitive, I don’t even think those questions are sensible except within the framework of a theory—which is related to what I discuss here. The theory I tend to work in says that labelling is not part of the “narrow syntax” and seems to say the same about c-selection. And I concur with your conclusion that this is incompatible with the prevailing theory of semantics, at least—I don’t know enough about current morphological or phonological theory to say anything about them—but I also think the prevailing theory of semantics is deficient on its own terms and should be revised.
I do not know whether you received my comments on the previous article (on adjuncts), so I’ll just mention that they did exist at some point; maybe check your spam folder.
Thomas Graf seems to see no problem in a computational realization of Agree (Graf 2012, “Movement-generalized Minimalist grammars”, I think?) or, more generally, in jumping between derivation and representation from a computational point of view.
“The representational expressions, on the other hand, are much more concise and accessible, so they have been overwhelmingly used as shorthands for the representational expressions” – one of these “representational” (probably the latter) should be “computational”?
I fail to see how your definition in (21-23) is different from Wurmbrand’s (2014, “The Merge Condition: A syntactic approach to selection”) (4), other than that (4) does not spell out Relativized Minimality inside it (for, presumably, independent reasons). Presumably, some other definitions are likewise close. However, Wurmbrand is also interesting in being quite close to what you call “Agree is a reflex of Merge”: Merge creates contexts for Agree but is disallowed if no Agree happens. This is either underivable by NTC-respecting Agree or refutes the claim “we can continue to postpone [NTC-respecting] Agree at least until the next instance of Transfer”: the prerequisites of Agree will not be lost but the prerequisites of Merge will.
More importantly, the work on Agree suggests that a projection of a head retains its features – so, if A and B are Merged and it was A that selected B (rather than vice versa), then Merge(A,B) has the same features as A. You try to work around this, and that’s what derives the unnecessary complexity of Agree, of head-hood/specifier-hood, and so on. Maybe Chomsky, Collins and Stabler were just, erm, wrong about not transferring the features to the result of Merge? Then no full search of one of the objects is needed (compare the “no-descent-into-specifiers” argument below).
I plainly don’t get the argument that, for (82b) to be true, a canonical subject should be definable. If φ-agreement feeds any kind of movement (and if all φ-agreement is uniform), it feeds some kind of Merge, and if it feeds any kind of Merge, it is in narrow syntax, because nothing precedes narrow syntax (and also because it is itself structure-sensitive, which rules out φ-agreement in the lexicon). (I believe (82b) to be false, but that is because I think not all φ-agreement is uniform; this part is, however, where you and Dr. Preminger seem to agree.)
Also, a minor nitpick: you assume, following Collins, that uninterpretable features are syntactic, rather than needed for the PF branch.
I don’t recall getting them, and I can’t seem to find them in my email. Did you post them as a comment or email them, and if the latter, do you remember which address you would have sent them to?
Here I think we need to be careful. The “Minimalist Grammars” (MGs) that Graf and other computational linguists study are not the same as the grammar that Collins & Stabler (C&S) formalized and I adopted—the waters are further muddied by the fact that Stabler also works with MGs. MGs formalize what is, in my opinion, an outdated theory that came out of the minimalist program, whereas C&S formalized a more current theory—MGs, for instance, assume labels as part of the syntax and assume Move as an operation. Furthermore, it seems to me that MGs and C&S are interested in fundamentally different things: the development and study of MGs seem to belong to Computer Science, while C&S’s work belongs to Cognitive Psychology.
Thanks for catching that; it should be “derivational”.
You’re probably right. This was just the definition of Agree that I had in my head, but I should probably do a better job of in-text citation there, and in section 4.2, where I’m arguing against what I take to be Wurmbrand’s thesis in that paper.
This is something that’s come up a few times for me, both in the process of writing this paper and my adjuncts paper and in the commentary on them. The following is becoming clear to me if we strip away all notions of data and view these things—Merge, Agree, c-selection, labels, X-bar relations, etc.—purely formally, as if they were just mathematical objects of only academic curiosity:
I have yet to see a theoretical treatment of c-selection, labels, or X-bar relations that has any sort of explanatory power.
I guess there are two steps to get there. First, we need to add the premise that Movement to Canonical Subject Position is part of the Narrow Syntax; then we need to recognize that, in order for this to be true (or false), it needs to be true (or false) of Movement to Canonical Subject Position—i.e., it presupposes the existence of Movement to Canonical Subject Position.
Correct. This is more or less required given how C&S define PHON features—for them, PHON features seem to be things like “voiced” or “labial” or the like. I’m not at all committed to this decision, though—my sense is that we really have no clue what sorts of things SYN or SEM features are.
I used the form at https://milway.ca/contact-me/ (back when it seemed to work instead of showing a lot of bugged code). And, unfortunately, I do not think I have my comments saved anywhere. I could re-read the paper.
Yeah, but here’s the funny thing: they claim that each and every change linguistics has made to the theory since MGs’ inception, except for one (identity-of-meaning checks), is easily implementable in MGs without changing generative capacities and such. And insomuch as you call for a computational perspective, you seem to be walking onto their territory: they are not so much Computer Science as they are math.
Of course there is a trivial sense in which it is true: Agree is defined over (already-built) syntactic structures, syntactic structures are built by Merge, thus Agree is dependent on Merge. But that does not mean that we cannot build in something like a condition on derive-by-Merge saying “only Merge X and Y if X has a valued feature of type F and Y contains a sub-node which has an unvalued feature F” (to limit the search, you can add “and either X is a lexical item or Y is the head” – on the latter, see under the next quote), which is equivalent to “Agree is a condition on Merge”, because these are exactly the conditions of Wurmbrand’s Reverse Agree. “Agree is a condition on Merge” does not mean that Agree actually derivationally precedes Merge; it means that Merge only happens if it creates conditions for (immediate) Agree, which is technically a one-step look-ahead if Agree and Merge are fully separate. Claiming that anyone who says “Agree is a condition on Merge” argues for Agree a step before Merge, rather than for something along the lines I described above, seems to be strawmanning.
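To make the intended condition concrete, here is a minimal sketch (my own toy encoding, not Wurmbrand’s formalism; the dict representation and the feature name “phi” are assumptions for illustration) of Merge gated by the existence of an Agree context:

```python
def contains_unvalued(obj, ftype):
    """True if obj or any sub-node carries an unvalued feature of type ftype."""
    if ftype in obj.get("unvalued", ()):
        return True
    return any(contains_unvalued(d, ftype) for d in obj.get("daughters", ()))

def merge_if_agree(x, y, ftype):
    """Merge(X, Y) goes through only if X bears a valued feature of type
    ftype and Y contains a sub-node with a matching unvalued feature:
    the 'Agree is a condition on Merge' reading, with no separate
    derivational step preceding Merge."""
    if ftype in x.get("valued", ()) and contains_unvalued(y, ftype):
        return {"daughters": [x, y]}
    return None  # Merge disallowed: it would create no Agree context

# Demo: a phi-valued probe may Merge with an object containing an
# unvalued phi feature, but not with one containing none.
probe = {"valued": ["phi"]}
goal = {"daughters": [{"unvalued": ["phi"]}]}
assert merge_if_agree(probe, goal, "phi") is not None
assert merge_if_agree(probe, {"daughters": []}, "phi") is None
```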
If this means that we can have Merge as a set-maker instead of a pair-maker and still know (inside narrow syntax) which of the Merged objects is the head and which is its argument (after which the rest follows), then we may use this derivable information in all sorts of definitions just as well as if it were primitive. But you give off a distinct vibe of claiming that the premise is false, that narrow syntax cannot know that, of {A,B}, A is the head and B is its argument (or vice versa). And if it cannot, then you can’t derive all those relations in narrow syntax. Whether you technically lose generative capacity because of it is a question for the computational department (for Graf, Stabler, Kobele and such) – I would not be too surprised by a negative answer. But broadly, this is (a) non-lexical information that is (b) available to both phonology and semantics. And anything that satisfies (a) and (b) should be in narrow syntax, because of the Y-model.
Yes, but that only works because Collins adopts early insertion. It is, plainly speaking, unsustainable (again, not in the computational sense but in the sense of minimally elegant generalizations which are thus likely close to what happens in our mind).
Of course, you could actually argue against (b) above: if H&K-style formal semantics is on the right track, you don’t need to know which of the merged objects is the syntactic head and which is the syntactic argument, which makes this possibly PF-only information that can then be derived at the syntax-phonology interface instead of in narrow syntax. But this is something one should state, not hand-wave.
I think this is absolutely correct, which is one of the reasons I think H&K-style semantics is on the wrong track. I think the idea that projection and, by extension, grammatical category are essentially phonological notions is wrong on its face. Of course, I could be wrong, and if someone were to construct such an explanatory theory, I’d gladly consider it.
Of course, this is pretty far beyond the scope of my paper.
As for the adjuncts paper: off the top of my head, I do remember 1) uncertainty about why we consider complement DPs to have their own workspace, as opposed to being built in the same workspace as the verb that governs them; 2) “Being a computational procedure, MERGE ought to proceed in steps. Therefore, it should be a curried (or schönfinkeled) function” being a non-sequitur: we can first create an <ω,α> pair (or even an {ω,α} set, since they are of different nature) and feed it to a function that takes such pairs/sets as an argument; semanticists’ arguments for currying their functions crucially rely on the structure they are fed. Also, line 81 contains broken LaTeX code (crefdef:PairMerge), and line 146 contains “the set {α,β} could just as easily be linearized as α _ β or α _ β” (one of those should be β _ α).
3 – yes, there is. One but not the other is re-statable as no-look-ahead, either by a simple finite redefinition of stages or, as I did above, by re-including the conditions.
4(b) – I think this is not a good way to frame the question. For quite some time, the answer has been “the complex object behaves like X because it literally bears X’s syntactic features”. You (and Chomsky) say, “no, the object does not bear the features, but it still behaves like X”. When asked “why does it behave so, then”, you can only answer that “this should drive a research program”. If your only reasoning has been something along the lines of “set-formation is simpler than concatenation” (is (42) in the adjuncts paper really even that much simpler than (head, argument) ordered-tuple formation, though?), I think this amounts to losing the argument, because the descriptive power of unordered Merge gets way too weak if nothing else is said, and additional build-ups are bound to destroy the alleged explanatory advantages.
Because the verb-theme object is actually a phrase-phrase object ({{v, ROOT}, {D, …}}) just like the VP-Agent object.
Sure, there’s no reason a function can’t take a pair or set as input, but the whole point of MERGE and the wider derivational machinery is to create sets out of independent objects. In other words, what good is MERGE if it takes already-constructed objects and outputs those same objects?
(Thanks for finding those typos; the paper is out for review right now, so I’ll fix them after I get the rejection or R&R.)
As I said, if this is true, it should be provably true.
To my reading “the complex object behaves like X because it literally bears X’s syntactic features” is basically the same as saying “the complex object behaves like X, because it does.” I recognize that I’m in the minority, but I remain unconvinced by the label-ful syntax literature.
It is… not immediately obvious that the order of derivation goes like this. I think there was a paper by Heidi Harley which argued extensively for roots taking arguments directly.
That’s a claim you’re trying to instantiate. Technically, if you delete subelements and add the results of Merge to the Lexical Array, then after a new Select(α) your LA is exactly the set you need: it contains α and the last-built object.
(That’s a case built for External Merge, and it probably will crumble when incorporating Internal Merge, but, again, this is a line of thought that needs to be followed through rather than a given about any computational model)
What do you mean? An algorithmic proof of the re-statability of one-step look-ahead as non-look-ahead is trivial for the general case (this is often used in Markovian models, for instance: a sequence of states <1,2,3,…> is rewritten as <<1,2>,<2,3>,<3,4>,…>, and then you just evaluate the relevant pair-state), and it is literally done above for the special case of Agree-dependent Merge. A proof of the non-re-statability of unbounded look-ahead as non-look-ahead is perhaps less trivial (it is overall much harder to prove that you can’t than that you can), but irrelevant: if unbounded look-ahead can be restated as non-look-ahead too, then look-ahead as a whole is, contrary to what linguists usually think, computationally a non-problem, and if it cannot, then we get the distinction.
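The pair-state trick can be shown in a few lines (a schematic illustration of the general re-statement, not tied to any particular syntactic model):

```python
def pair_states(seq):
    """Rewrite a sequence of states as the sequence of adjacent pairs,
    turning a one-step look-ahead condition on the original sequence
    into a purely local condition on the pair-state sequence."""
    return list(zip(seq, seq[1:]))

# A condition that needs to peek one step ahead in the original sequence...
def ok_with_lookahead(seq, i):
    return seq[i + 1] > seq[i]

# ...becomes a condition evaluable on the current pair-state alone:
def ok_local(pair):
    prev, nxt = pair
    return nxt > prev

states = [1, 2, 3, 2, 4]
pairs = pair_states(states)
assert all(ok_with_lookahead(states, i) == ok_local(pairs[i])
           for i in range(len(pairs)))
```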
This is wrong, because feature percolation makes definite predictions – for instance, that the object will always inherit exactly one label, and that it will (at least preferentially) inherit features from the same element it inherits its label from (as opposed to, say, having something like “the actual me” and inheriting me’s 1st person rather than the’s 3rd person, getting *the actual me am instead of the actual me is).
The theory I was investigating assumed that roots don’t take arguments. The things Harley called roots would be something else in my theory.
Okay
I’d also add that simple statability is the bare minimum. You should also keep in mind that the point is to develop a theory of a biologically instantiated computational procedure.
Is this a prediction or an assertion? In virtually every attempt I’ve seen to save projection, these things are just asserted.
Depends on the details of the specific proposal, but at a certain point the distinction becomes murky in practice, to say the least. Insofar as labels are features, label inheritance is a corollary of preferential feature inheritance rather than an independently stated fact (though see Zelenskii 2020, in Working Papers of the Linguistic Society (University of Victoria), for an (admittedly modifiable) alternative). More importantly, object behavior is a prediction of feature/label percolation, not an assertion of its own.
Moreover, it is not immediately clear why projection would need “saving”. The burden of proof is on you and Chomsky here, I’m afraid: feature percolation, given independently natural restrictions (see Heck 2004, for instance), makes the right empirical predictions, and you suggest literally just throwing it out while still claiming to derive the same results. I admit this looks like Minimalism on the surface, but Minimalism suggests throwing out things we can empirically do without, not just throwing things out wholesale.
Quoting Chomsky, “we still know very little about what can happen when you throw a billion neurons into a size of a football” (the quote may not be exact). The re-statement I set up above is computationally instantiable (with search through the previously-built object, but that is not insurmountable, not to mention that we may keep track of unvalued features separately, simplifying the search) and respects the tendency for overstating found in most mental processes; I fail to see what is biologically or computationally problematic about it. Evolutionarily, once we throw out the nice but unbelievable story about mental arithmetic allegedly following Peano’s axioms, there isn’t much to prefer the “simple” Merge of workspaces from your paper over the Merge+Agree tandem operation, where you add something only when it values a feature, just as you only insert a physical object into a hole if there is a hole (or a possibility to make one). In this case, Agree would not even be a fully separate operation so much as a direct consequence of Merge.
Oh, and for 5.3: (91) (Truswell’s definition of an agentive event) clearly presupposes OR, not AND, between a. and b., but you discuss it (lines 760–764) as if there were an AND; no theory of semantic roles would suggest that (93) has two Agents rather than an Agent and a Beneficiary (that is, sell is linguistically like give); and Fodor’s generalization is independently false, because we have functional causatives, and should be restated as “no more than one Agent per Voice” (trivially true if Kratzer is correct and Agents are always introduced by Voice to begin with).
That’s not clear to me, but I’m sure my reviewers will take me to task for that too.
Depends on what you take Agents, Beneficiaries, and events to be. Are they objectively real extra-mental or extra-linguistic entities/properties/facts/etc., or are they mental/linguistic constructs? Truswell spends a good portion of his book arguing that events (and therefore their attendant relations) are at least independent of language.
Fodor’s generalization comes from a paper (“Three Reasons for Not Deriving ‘Kill’ from ‘Cause to Die’”) in which he was explicitly arguing against splitting up V. It’s an interesting paper, and I don’t think its arguments have been fully answered. Regardless, “verb phrase” in Truswell’s rendering of the generalization means VoiceP, or vP, or v*P, depending on your theory.
Semantic roles such as Agent might be non-linguistic, but they certainly cannot be non-mental. It is not a property of the real world to be a willing person, because there is no such thing as free will in the real world, for instance. All these are mental representations, and, as with any mental representation, some properties are highlighted and some are neglected. Thus “Alex sold Cate a car” and “Cate bought a car from Alex” do not have the same mental representation, though a world where one is true and the other is false is unimaginable; and one of the important differences, to my mind and, I think, to most linguists’ minds, is that in the former the (sole) Agent is Alex and in the latter the (sole) Agent is Cate. Whether Agent is a linguistic construct or an extra-linguistic mental construct is irrelevant for that judgment.
I don’t think I agree with any of these judgments. Except to say that I take meanings/LFs/Semantic representations/etc. to be mental/linguistic representations.
I feel like you are a) trying to use “mental” and “linguistic” as interchangeable instead of respecting their actual hyponymic relation; b) expecting an unbelievably good correlation of mental representations and reality.
a) The actual relation between the mental and the linguistic is an open question.
b) I’m not, but standard theories of semantics seem to.
a) How so? Languages are instantiated in our brains. And there are things instantiated in our brains that are not language (visual processing, for instance). Thus, linguistic objects form a proper subset of mental objects.
b) I think you misinterpret them. When an Agent is claimed to be “a willing, consensual participant” and so on, what’s claimed is that the person constructing a sentence’s meaning represents the participant as such. “A kills B” does not suddenly start to have two Agents if the situation described was an assisted suicide of B – the Agent of the proposition reflected by this sentence is still A; this is how this sentence chooses to represent the reality.
This has gotten rather far afield, and I think we’re starting to talk past each other. So, I’ll just say this: I think much of what you’ve said fits well within “the majority opinion” in the field of generative syntax, and much of what I’ve expressed is a minority position. I’m not convinced, but I am open to reading arguments for some of the positions you’ve taken; the comment section of the blog of an unemployed crank (me), however, is not set up for a proper airing of such arguments. I will have a look at your working paper (I assume you’re the Zelenskii of Zelenskii (2020)).
Yeah, I am the author 😉
I think having a disagreement can be productive if both sides more or less understand what the other side says (can you use “each other” in this sort of sentence? I think not…). Which positions would you be interested in seeing arguments for?