The DP Hypothesis—a case study of a sticky idea

Recently, in service of a course I’m teaching, I had a chance to revisit and fully engage with what might be the stickiest idea in generative syntax—The DP hypothesis. For those of you who aren’t linguists, the DP hypothesis, though highly technical, is fairly simple to get the gist of based on a couple of observations:

Observation 1: Words in sentences naturally cluster together into phrases like “the toys”, “to the store”, or “eat an apple.”

Observation 2: In every phrase, there is a single main word called the head of the phrase. So, for instance, the head of the phrase “eat an apple” is the verb “eat.”

These observations are formalized in syntactic theory, so that “eat an apple” is labeled a VP (Verb Phrase), while “to the store” is a PP (Preposition Phrase). Which leads us to the DP hypothesis: Phrases like “the toys,” “a red phone,” or “my dog” should be labelled as DPs (Determiner Phrases) because their heads are “the,” “a,” and “my,” which are called determiners in modern generative syntax.

This is fairly counterintuitive, to say the least. The intuitive hypothesis—the one that pretty much every linguist accepted until the 1980s—is that those phrases are NPs (Noun Phrases), but if we only accepted intuitive proposals, there’d be no science to speak of. Indeed, all the good scientific theories start off counterintuitive and become intuitive only by force of argument. One of the joys of theory is experiencing that shift of mind-set—it can feel like magic when done right.

So it was quite unnerving when I started reading the actual arguments for the DP hypothesis, which I had, at one point, fully bought into, and began to find each one less convincing. It didn’t feel like magic; it felt like a con.

My source for this is a handbook chapter by Judy Bernstein that summarizes the basic argument for the DP Hypothesis—a twofold argument consisting of a Parallelism argument and purported direct evidence of the DP Hypothesis—as previously advanced and developed by Szabolcsi, Abney, Longobardi, Kayne, Bernstein herself, and others.

The parallelism argument is based on another counterintuitive theory developed in the mid-20th century which states that clauses, previously considered either headless or VPs, are actually headed by abstract (i.e., silent) words. That is, they are variously considered TPs (Tense Phrases), IPs (Inflection Phrases), or CPs (Complementizer Phrases). The parallelism argument states that “if clauses are like that, then ‘noun phrases’ must be like that too” and then finds data where “noun phrases” look like clauses in some way. This might seem reasonable on its face, but it’s a complete non sequitur. Maybe the structure of a “noun phrase” parallels that of a clause, but maybe it doesn’t. In fact, there’s probably good reason to think that the structure of “noun phrases” is the inverse of the structure of the clause—the clause “projects” from the verb, and verbs and nouns are complementary, so shouldn’t the noun have complementary properties to the verb?

Following through on parallelism, if extended VPs are actually CPs, then extended NPs are DPs. Once you have that hypothesis, you can start making “predictions” and checking if the data supports them. And of course there is data that becomes easy to explain once we have the DP Hypothesis. Again, this is good as far as it goes, but there’s a key word missing—“only.” We need data that only becomes easy to explain once we have the DP Hypothesis. And while I don’t have competing analyses for the data adduced for the DP Hypothesis at the ready—though Ben Bruening has one for at least one such phenomenon—I’m not really convinced that none exist.

And that’s the foundation of the DP Hypothesis, a weak argument resting on another weak argument. Yet, it’s a sticky one—I can count on one hand the contemporary generative syntacticians who have expressed skepticism about it. Why is it so sticky? My hypothesis is that it’s useful as a shibboleth and as a “project pump”.

Its usefulness as a shibboleth is fairly straightforward—there’s no quicker way to mark yourself as a generative syntactician than to put DPs in your tree diagrams. Even I find it jarring to see NPs in trees.

To see the utility of the DP Hypothesis as a “project pump”, one need only look at the Cartography/Nanosyntax literature. Once you open up a space for invisible functional heads between N and D, you seem to find them everywhere. This, I think, is what Chomsky meant when he described the DP Hypothesis as “…very fruitful, leading to a lot of interesting work” before saying “I’ve never really been convinced by it.” Who cares if it’s correct, it contains infinite dissertations!

Now maybe I’m being too hard on the DP and its fans. After all, as far as theoretical avenues go, the DP Hypothesis is something of a cul de sac, albeit a large one—the core theory doesn’t really care whether “the bee” is a DP or an NP, so what’s the harm? I could point out that by making such a feeble hypothesis our standard, we’ve opened ourselves to being dunked on by anti-generativists. Or I could bore you with such Romantic notions as “calling all things by their right names.” Instead, I’ll be practical and point out that, contrary to contemporary digital wisdom, the world is not infinite, and every bit of real estate given to the DP cul-de-sac in the form of journal articles, conference presentations, tenure-track hires, etc. is space that could be used otherwise. And, to torture the metaphor further, shouldn’t we try to use our real estate for work with a stronger foundation?

Why are there no Cartesian products in grammar?

This post, I think, doesn’t rise above the level of “musings.” I think there’s something here, but I’m not sure if I can articulate it properly.

An adequate scientific theory is one in which facts about nature are reflected in facts about the theory. Every entity in the theory should have an analogue in nature, relations in the theory should be found in nature, and simple things in the theory should be ubiquitous in nature. This last concern is at the core of minimalist worries about movement—early theories saw movement as complex and had to explain its ubiquity, while later theories see it as simple and have to explain the constraints on it. But my concern here is not minimalist theories of syntax, but model-theoretic semantics.

Model theories of semantics often use set theory as their formal systems,[1] so if they are adequate, then ubiquitous semantic phenomena should be simply expressible in set theory, and simple set-theoretic notions should be ubiquitous in semantics. For the most part this seems to be the case—you can do a lot of semantics with membership, subset, intersection, etc.—but obviously it’s not perfect. One point of mismatch is the notion of the Cartesian product (X × Y = {⟨x, y⟩ | x ∈ X, y ∈ Y}), a very straightforward notion in set theory, but one that does not have a neat analogue in language.

What do I mean by this? Well, consider the set-theoretic statement in (1) and its natural language translation in (2).

(1) P × P ⊆ R

(2) Photographers respect themselves and each other.

What set theory expresses in a simple statement, language does in a compound one. Or consider (3) and (4), which invert the situation.

(3) (P × P) − {⟨p, p⟩ | p ∈ P} ⊆ R

(4) Photographers respect each other.

The natural language expression has gotten simpler at the expense of its set-theoretic translation. This strikes me as a problem.
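The correspondence between (1)/(3) and (2)/(4) can be simulated directly; here is a minimal sketch with a toy model (the individuals and the relations are made up purely for illustration):

```python
from itertools import product

def cartesian(X, Y):
    """X × Y = {⟨x, y⟩ | x ∈ X, y ∈ Y}"""
    return set(product(X, Y))

# Toy domain of photographers.
P = {"ana", "bo", "cy"}

def respect_selves_and_each_other(P, R):
    """(1) P × P ⊆ R, i.e. (2): every pair, self-pairs included, is in R."""
    return cartesian(P, P) <= R

def respect_each_other(P, R):
    """(3) (P × P) − {⟨p, p⟩ | p ∈ P} ⊆ R, i.e. (4): every non-self pair is in R."""
    return cartesian(P, P) - {(p, p) for p in P} <= R

# A total "respects" relation satisfies both statements;
# drop the self-pairs and only the reciprocal statement survives.
total = cartesian(P, P)
no_self = total - {(p, p) for p in P}

assert respect_selves_and_each_other(P, total) and respect_each_other(P, total)
assert not respect_selves_and_each_other(P, no_self)
assert respect_each_other(P, no_self)
```

The asymmetry the post describes shows up in the code too: (1) is a single subset check, while (4)'s truth conditions require first subtracting the diagonal.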

If natural language semantics is best expressed as set theory (or something similar), why isn’t there a simple bound expression like each-selves with the denotation in (5)?

(5) λX.λY (Y × Y ⊆ X)
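The hypothetical denotation in (5) is easy to state as a function; a sketch (each_selves is, of course, an invented word, which is exactly the point):

```python
def each_selves(X):
    """A sketch of (5), λX.λY.(Y × Y ⊆ X): each_selves(X)(Y) is true
    iff every pair drawn from Y (diagonal pairs included) is in the relation X."""
    return lambda Y: all((y1, y2) in X for y1 in Y for y2 in Y)

P = {"ana", "bo"}                       # made-up individuals
total = {(a, b) for a in P for b in P}  # everyone respects everyone
no_self = total - {(p, p) for p in P}   # reciprocal respect only

assert each_selves(total)(P)        # "respect themselves and each other" holds
assert not each_selves(no_self)(P)  # mere reciprocity does not satisfy (5)
```

Nothing stops us from defining this denotation; the puzzle is that no language seems to lexicalize it.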

What’s more, this doesn’t seem to be a quirk of English. When I first noticed this gap, I asked some native non-English speakers—I got data from Spanish, French (Canadian and Metropolitan), Dutch, Italian, Cantonese, Mandarin, Persian, Korean, Japanese, Hungarian, Kurdish, Tagalog, Western Armenian, and Russian[2]—and got fairly consistent results. Occasionally there was ambiguity between plural reflexives and reciprocals—French se, for instance, seemed to be ambiguous—but none of the languages had an each-selves.

My suspicion—i.e. my half-formed hypothesis—is that the “meanings” of reflexives and reciprocals are entirely syntactic. We don’t interpret themselves or each other as expressions of set theory or whatever. Rather, sentences with reflexives and reciprocals are inherently incomplete, and the particular reflexive or reciprocal tells the hearer how to complete it—themselves says “derive a sentence for each member of the subject where that member is also the object”, while each other says “for each member of the subject, derive a set of sentences where each object is one of the other members of the subject.” Setting aside the fact that this proposal is, even to me, mostly nonsense, it still predicts that there should be an each-selves. Perhaps making it sensible would fix this issue, or vice versa. Or maybe it is just nonsense, but plenty of theories started as nonsense.

Notes

1 Yes, I know that there are many other types of model theories put forth.
2 I’d be happy to get more data if you have it. You can email me, put it in the comments, or fill out this brief questionnaire.

Some idle thoughts on the arguments for semantic externalism/internalism

This semester I’m teaching an intro semantics course for the first time and I decided to use Saeed’s Semantics as a textbook. It seems like a good textbook; it gives a good survey of all the modern approaches to semantics—internalist, externalist, even so-called cognitive semantics—though the externalist bias is clear if you know what to look for. For instance, the text is quick to bring up the famous externalist thought experiments—Putnam’s robotic cats, Quine’s gavagai, etc.—to undercut the internalist approaches, but doesn’t really seem to present the internalist critiques and counterarguments. So, I’ve been striving to correct that in my lectures.

While I was preparing my most recent lecture, something struck me. More precisely, I was suddenly able to put words to something that’s bothered me for a while about the whole debate: the externalist case is strongest for natural kinds, but the internalist case is strongest for human concepts. Putnam talks about cats and water, Kripke talks about tigers and gold, while Katz talks about bachelors and sometimes artifacts. This is not to say that the arguments on either side are unanswerable—Chomsky, I think, has provided pretty good arguments that, even for natural kinds, our internal concepts are quite complicated, and there are many thorny issues for internalist approaches too—but they do have slightly different empirical bases, which no doubt inform their approach—if your theory can handle artifact concepts really well, you might be tempted to treat everything that way.

I don’t quite know what to make of this observation yet, but I wanted to write it down before I forgot about it.


There’s also a potential, but maybe half-baked, political implication to this observation. Natural kinds are more or less constant in that, while they can be tamed and used by humans, we can’t really change them that much, and thinking that you can, say, turn lead into gold would mark you as a bit of a crackpot. Artifacts and social relations, on the other hand, are literally created by free human action. If you view the world with natural kinds at the center, you may be led to the view that the world has its own immutable laws that we can maybe harness, maybe adapt to, but never change.

If, on the other hand, your theory centers artifacts and social relations, then you might be led to the conclusion, as expressed by the late David Graeber, that “the ultimate hidden truth of the world is that it is something we make and could just as easily make differently.”

But, of course, I’m just speculating here.

A Response to some comments by Omer Preminger on my comments on Chomsky’s UCLA Lectures

On his blog, Omer Preminger posted some comments on my comments on Chomsky’s UCLA Lectures, in which he argues that “committing oneself to the brand of minimalism that Chomsky has been preaching lately means committing oneself to a relatively strong version of the Sapir-Whorf Hypothesis.” His argument goes as follows.

Language variation exists. To take Preminger’s example, “in Kaqchikel, the subject of a transitive clause cannot be targeted for wh-interrogation, relativization, or focalization. In English, it can.” 21st century Chomskyan minimalism, and specifically the SMT, says that this variation comes from (a) variation in the lexicon and (b) the interaction of the lexical items with either the Sensory-Motor system or the Conceptual-Intentional system. Since speakers of a language can process and pronounce some ungrammatical expressions—some Kaqchikel speakers can pronounce an equivalent of (1) but judge it as unacceptable—some instances of variation are due to the interaction of the Conceptual-Intentional system with the lexicon.

(1) It was the dog who saw the child.

It follows from this that either (a) the Conceptual-Intentional systems of English-speakers and Kaqchikel-speakers differ from each other or (b) English-speakers can construct Conceptual-Intentional objects that Kaqchikel-speakers cannot (and vice-versa, I assume). Option a, Preminger asserts, is the Sapir-Whorf hypothesis, while option b is tantamount to (a non-trivial version of) it. So, the SMT leads unavoidably to the Sapir-Whorf hypothesis.

I don’t think Preminger’s argument is sound, and even if it were, its conclusion isn’t as dire as he makes it out to be. Let’s take these one at a time in reverse order.

The version of the Sapir-Whorf hypothesis that Preminger has deduced from the SMT is something like the following—the Conceptual-Intentional (CI) content of a language is the set of all (distinct) CI objects constructed by that language, and different languages have different CI content. This hypothesis, it seems, turns on how we distinguish between CI objects—far from a trivial question. Obviously contradictory, contrary, and logically independent sentences are CI-distinct from each other, as are non-mutually entailing sentences and co-extensive but non-co-intensive expressions, but what about true paraphrases? Assuming there is some way in Kaqchikel of expressing the proposition expressed by (1), then we can avoid Sapir-Whorf by saying that paraphrases express identical CI-objects. This avoidance, however, is only temporary. Take (2) and (3), for instance.

(2) Bill sold secrets to Karla.
(3) Karla bought secrets from Bill.

If (2) and (3) map to the same CI object, what does that object “look” like? Is (2) the “base form” and (3) is converted to it or vice versa? Do some varieties of English choose (2) and others (3), and wouldn’t that make these varieties distinct languages?

If (2) and (3) are distinct, however, it frees us—and more importantly, the language learner—from having to choose a base form, but it leads us immediately to the question of what it means to be a paraphrase, or a synonym. I find this a more interesting theoretical question than any of those raised above, but I’m willing to listen if someone thinks otherwise.

So, we end up with some version of the Sapir-Whorf hypothesis no matter which way we go. I realize this is a troubling result for many generative linguists as linguistic relativity, along with behaviourism and connectionism, is one of the deadly sins of linguistics. For me, though, Sapir-Whorf suffers from the same flaw that virtually all broad hypotheses of the social sciences suffer from—it’s so vague that it can be twisted and contorted to meet any data. In the famous words of Wolfgang Pauli, it’s not even wrong. If we were dealing with atoms and quarks, we could just ignore such a theory, but since Sapir-Whorf deals with people, we need to be a bit more careful. One need not think very hard to see how Sapir-Whorf or any other vague social hypothesis can be used to excuse, or even encourage, all varieties of discrimination and violence.

The version of Sapir-Whorf that Preminger identifies—the one that I discuss above—seems rather trivial to me, though.

There are also a few problems with Preminger’s argument that jumped out at me, of which I’ll highlight two. First, in his discussion of the Sensory-Motor (SM) system, he seems to assume that any expression that is pronounceable by a speaker is a-ok with that speaker’s SM system—he seems to assume this because he asserts that any argument to the contrary is specious. Since the offending Kaqchikel string is a-ok with the SM system, it must run afoul of either the narrow syntax (unlikely according to SMT) or the CI system. This line of reasoning, though, is flawed, as we can see by applying its logic to a non-deviant sentence, like the English version of (1). Following Preminger’s reasoning, the SM system tells us how to pronounce (1) and the CI system uses the structure of (1) generated by Merge for internal thought. This, however, leaves out the step of mapping the linear pronunciation of (1) to its hierarchical structure. Either (a) the Narrow Syntax does this mapping, (b) the SM system does this mapping, or (c) some third system does this mapping. Option a, of course, violates SMT, while option b contradicts Preminger’s premise; this leaves option c. Proposing a system in between pronunciation and syntax would allow us to save both SMT and Preminger’s notion of the SM system, but it would also invalidate Preminger’s overall argument.

The second issue is the assumption that non-SM ungrammaticality means non-generation. This is a common way of thinking of formal grammars, but very early on in the generative enterprise, researchers (including Chomsky) recognized that it was far too rigid—that there was a spectrum from perfect grammaticality to word salad that couldn’t be captured by the generated/not-generated dichotomy. Even without considering degrees of grammaticality, though, we can find examples of ungrammatical sentences that can be generated. Consider (4) as compared to (5).

(4) *What did who see?
(5) Who saw what?

Now, (4) is ungrammatical because wh-movement prefers to target the highest wh-expression, which suggests that in order to judge (4) as ungrammatical, a speaker needs to generate it. So, the Kaqchikel version of (1) might be generated by the grammar, but such generation would be deviant somehow.

Throughout his argument, though, Preminger says that he is only “tak[ing] Chomsky at his word”—I’ll leave that to the reader to judge. Regardless, though, if Chomsky had made such assumptions in an argument, it would be a flawed argument, but it wouldn’t refute the SMT.

A note on an equivocation in the UCLA Lectures

In his recent UCLA Lectures, Chomsky makes the following two suggestive remarks which seem to be contradictory:

. . . [I]magine the simplest case where you have a lexicon of one element and we have the operation internal Merge. [. . . ] You have one element: let’s just give it the name zero (0). We internally merge zero with itself. That gives us the set {0, 0}, which is just the set zero. Okay, we’ve now constructed a new element, the set zero, which we call one.

p24

We want to say that [X], the workspace which is a set containing X is distinct from X.
[X] ≠ X
We don’t want to identify a singleton set with its member. If we did, the workspace itself would be accessible to MERGE. However, in the case of the elements produced by MERGE, we want to say the opposite.
{X} = X
We want to identify singleton sets with their members.

p37

So in the case of arithmetic, a singleton set ({0}, one) is distinct from its member (0), but the two are identical in the case of language. This is either a contradiction—in which case we need to eliminate one of the statements—or it’s an equivocation—in which case we need to find and understand the source of the error. The former option would be expedient, but the latter is more interesting. So, I’ll go with the latter.
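Chomsky's arithmetic construction from the first passage is easy to simulate; here is a minimal sketch (modeling the arbitrary single element "zero" as the empty set is my assumption, since the lecture leaves it abstract):

```python
def merge(x, y):
    """Merge forms the set {x, y}; sets collapse duplicates,
    so merge(x, x) = {x, x} = {x}."""
    return frozenset({x, y})

zero = frozenset()        # the single lexical element, named "zero"
one = merge(zero, zero)   # {0, 0} = {0}, "which we call one"
two = merge(one, one)     # {1, 1} = {1} = {{0}}

assert one == frozenset({zero})
assert two == frozenset({one})
assert one != zero        # [X] ≠ X: a singleton is formally distinct from its member
```

The last assertion is exactly the point at issue: as sets, a singleton and its member are never identical, so the {X} = X of the second passage must involve a different notion of identity.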

The source of the equivocation, in my estimation, is the notion of identity—Chomsky’s remarks become consistent when we take him to be using different measures of identity and, in order to understand these distinctions, we need to dust off a rarely used dichotomy—form vs substance.

This dichotomy is perhaps best known to syntacticians due to Chomsky’s distinction between “formal universals” and “substantive universals” in Aspects, where formal universals were constraints on the types of grammatical rules in the grammar and substantive universals were constraints on the types of grammatical objects in the grammar. Now, depending on what aspect of grammar or cognition we are concerned with, the terms “form” and “substance” will pick out different notions and relations, but since we’re dealing with syntax here we can say that “form” picks out purely structural notions and relations, such as are derived by merge, while substance picks out everything else.

By extension, then, two expressions are formally identical if they are derived by the same sequences of applications of merge. This is a rather expansive notion. Suppose we derived a structure from an arbitrary array A of symbols; any structure whose derivation can be expressed by swapping the symbols in A for distinct symbols will be formally identical to the original structure. So, “The sincerity frightened the boy.” and “*The boy frightened the sincerity.” would be formally identical, but, obviously, substantively distinct.

Substantive identity, though, is more complex. If substance picks out everything except form, then it would pick out everything to do with the pronunciation and meaning of an expression. So, from the pronunciation side, a structurally ambiguous expression is a set of (partially) substantively identical but formally distinct sentences, as are paraphrases on the meaning side.

Turning back to the topic at hand, the distinction between a singleton set and its member is purely formal, and therein lies the resolution of the apparent contradiction. Arithmetic is purely formal, so it traffics in formal identity/distinctness. Note that Chomsky doesn’t suggest that zero is a particular object—it could be any object. Linguistic expressions, on the other hand, have form and substance. So a singleton set {LI} and its member LI are formally distinct but, since they would mean and be pronounced the same, are substantively identical.

It follows from this, I believe, that the narrow faculty of language, if it is also responsible for our faculty of arithmetic, must be purely formal—constructing expressions with no regard for their content. So, the application of merge cannot be contingent on the contents of its input, nor could an operation like Agree, which is sensitive to the substance of an expression, be part of that same faculty. These conclusions, incidentally, can also be drawn from the Strong Minimalist Thesis.

Colin Phillips on the Theory/Experiment divide.

Over on his blog, Colin Phillips has taken up the age-old theory vs experiment debate. The position he seems to take is that the contrast between theory and experiment is illusory and, therefore, the debate itself is wrong-headed. Here he is making what seems to be his main point:

There’s a terminological point here that is straightforward. Nobody own [sic] the term “theory”. All flavors of linguist are using evidence and reasoning to build generalizable accounts of how the human language system works. We all use empirical evidence, and we all develop theories. The distinction between theoreticians and experimentalists is largely a myth. Sometimes our experiments are so easy that we’re embarrassed to label them as experiments (e.g., “Does that sentence sound better to me if I take out the complementizer?”). Sometimes the experiments take a long, long time, so we get to spend less time thinking about the theoretical questions. But it’s all basically the same thing.

“Theories all the way down” by Colin Phillips

This quote includes a few mistakes which tend to muddle the debate. The first is the focus on whether a person can be strictly a theoretician or an experimentalist. Phillips says “no” and I would tend to agree, because as humans we all contain multitudes, to paraphrase Walt Whitman. It doesn’t follow from this, though, that theory and experiment are the same thing. Creators can be critics, and producers can be consumers, but this does not negate the contrasts between art and criticism, between production and consumption.

The second mistake, and this is a widespread mistake in linguistics, is that he seems to miscategorize the pen-and-paper empirical method of old-school linguistics as theoretical. Norbert Hornstein has posted about this error on his blog a number of times, adopting from Robert Chametzky a three-way distinction between analytical, theoretical, and metatheoretical work. As Hornstein argues, most of what we call theoretical syntax is better described as analytical—it applies theoretical constructs to data with the dual effect of testing the constructs and making sense of the data. To be sure, this trichotomy takes for granted the data-gathering method, and it would be interesting to think about how that could be related to analysis. Are they independent of each other, or is the gathering a proper subpart of the analysis? Either way, I would agree with Phillips that “experimental” and “pen-and-paper” work ought to be grouped together, but I disagree that either is theoretical work.

Theoretical work is a different beast that presents its own endemic challenges—difficulties that more analytical work does not have to address. Blurring the line between the two types of work, however, introduces additional hurdles. These hurdles usually take the form of conferences, journals, and job postings which declare themselves to be “theoretical” but are in actuality mainly analytical. This ends up crowding out truly theoretical work, which any science needs at least as much as experimental work in order to progress and flourish.

To close, why bother arguing about language use? Isn’t it fluid—always changing? I suppose it is, but I don’t particularly care what we call theory or analysis or experiment, but I do care that we recognize the distinctions between them. Please forgive the piety, but I’m a sucker for an aphorism: As Plato said, the goal of inquiry is to carve Nature at its joints, and as Confucius said “The beginning of wisdom is to call things by their proper name.”