New LingBuzz Paper

(or “How I’ve been spending my unemployment*”)

Yesterday I finished and posted a paper to LingBuzz. It’s titled “Agree as derivational operation: Its definition and discontents” and its abstract is given below. If it sounds interesting, have a look and let me know what you think.

Using the framework laid out by Collins and Stabler (2016), I formalize Agree as a syntactic operation. I begin by constructing a formal definition of a version of long-distance Agree in which a higher object values a feature on a lower object, and modify that definition to reflect several versions of Agree that have been proposed in the “minimalist” literature. I then discuss the theoretical implications of these formal definitions, arguing that Agree (i) muddies our understanding of the evolution of language, (ii) requires a new conception of the lexicon, (iii) objectively and significantly increases the complexity of syntactic derivations, and (iv) unjustifiably violates NTC in all its non-vacuous forms. I conclude that Agree, as it is commonly understood, should not be considered a narrowly syntactic operation.

*Thanks to the Canada Recovery Benefit, I was able to feed myself and make rent while I wrote this.

A Response to some comments by Omer Preminger on my comments on Chomsky’s UCLA Lectures

On his blog, Omer Preminger posted some comments on my comments on Chomsky’s UCLA Lectures, in which he argues that “committing oneself to the brand of minimalism that Chomsky has been preaching lately means committing oneself to a relatively strong version of the Sapir-Whorf Hypothesis.” His argument goes as follows.

Language variation exists. To take Preminger’s example, “in Kaqchikel, the subject of a transitive clause cannot be targeted for wh-interrogation, relativization, or focalization. In English, it can.” 21st century Chomskyan minimalism, and specifically the Strong Minimalist Thesis (SMT), says that this variation comes from (a) variation in the lexicon and (b) the interaction of lexical items with either the Sensory-Motor system or the Conceptual-Intentional system. Since speakers of a language can process and pronounce some ungrammatical expressions—some Kaqchikel speakers can pronounce an equivalent of (1) but judge it as unacceptable—some instances of variation are due to the interaction of the Conceptual-Intentional system with the lexicon.

(1) It was the dog who saw the child.

It follows from this that either (a) the Conceptual-Intentional systems of English-speakers and Kaqchikel-speakers differ from each other or (b) English-speakers can construct Conceptual-Intentional objects that Kaqchikel-speakers cannot (and vice-versa, I assume). Option a, Preminger asserts, is the Sapir-Whorf hypothesis, while option b is tantamount to (a non-trivial version of) it. So, the SMT leads unavoidably to the Sapir-Whorf hypothesis.

I don’t think Preminger’s argument is sound, and even if it were, its conclusion isn’t as dire as he makes it out to be. Let’s take these one at a time in reverse order.

The version of the Sapir-Whorf hypothesis that Preminger has deduced from the SMT is something like the following—the Conceptual-Intentional (CI) content of a language is the set of all (distinct) CI objects constructed by that language and different languages have different CI content. This hypothesis, it seems, turns on how we distinguish between CI objects—far from a trivial question. Obviously contradictory, contrary, and logically independent sentences are CI-distinct from each other, as are non-mutually entailing sentences and co-extensive but non-co-intensional expressions, but what about true paraphrases? Assuming there is some way in Kaqchikel of expressing the proposition expressed by (1), then we can avoid Sapir-Whorf by saying that paraphrases express identical CI-objects. This avoidance, however, is only temporary. Take (2) and (3), for instance.

(2) Bill sold secrets to Karla.
(3) Karla bought secrets from Bill.

If (2) and (3) map to the same CI object, what does that object “look” like? Is (2) the “base form” and (3) converted to it, or vice versa? Do some varieties of English choose (2) and others (3), and wouldn’t that make these varieties distinct languages?

If (2) and (3) are distinct, however, it frees us—and more importantly, the language learner—from having to choose a base form, but it leads us immediately to the question of what it means to be a paraphrase, or a synonym. I find this a more interesting theoretical question than any of those raised above, but I’m willing to listen if someone thinks otherwise.

So, we end up with some version of the Sapir-Whorf hypothesis no matter which way we go. I realize this is a troubling result for many generative linguists as linguistic relativity, along with behaviourism and connectionism, is one of the deadly sins of linguistics. For me, though, Sapir-Whorf suffers from the same flaw that virtually all broad hypotheses of the social sciences suffer from—it’s so vague that it can be twisted and contorted to meet any data. In the famous words of Wolfgang Pauli, it’s not even wrong. If we were dealing with atoms and quarks, we could just ignore such a theory, but since Sapir-Whorf deals with people, we need to be a bit more careful. One need not think very hard to see how Sapir-Whorf or any other vague social hypothesis can be used to excuse, or even encourage, all varieties of discrimination and violence.

The version of Sapir-Whorf that Preminger identifies—the one that I discuss above—seems rather trivial to me, though.

There are also a few problems with Preminger’s argument that jumped out at me, of which I’ll highlight two. First, in his discussion of the Sensory-Motor (SM) system, he seems to assume that any expression that is pronounceable by a speaker is a-ok with that speaker’s SM system—he seems to assume this because he asserts that any argument to the contrary is specious. Since the offending Kaqchikel string is a-ok with the SM system, it must run afoul of either the narrow syntax (unlikely according to the SMT) or the CI system. This line of reasoning, though, is flawed, as we can see by applying its logic to a non-deviant sentence, like the English version of (1). Following Preminger’s reasoning, the SM system tells us how to pronounce (1) and the CI system uses the structure of (1) generated by Merge for internal thought. This, however, leaves out the step of mapping the linear pronunciation of (1) to its hierarchical structure. Either (a) the Narrow Syntax does this mapping, (b) the SM system does this mapping, or (c) some third system does this mapping. Option a, of course, violates the SMT, while option b contradicts Preminger’s premise, which leaves option c. Proposing a system in between pronunciation and syntax would allow us to save both the SMT and Preminger’s notion of the SM system, but it would also invalidate Preminger’s overall argument.

The second issue is the assumption that non-SM ungrammaticality means non-generation. This is a common way of thinking of formal grammars, but very early on in the generative enterprise, researchers (including Chomsky) recognized that it was far too rigid—that there was a spectrum from perfect grammaticality to word salad that couldn’t be captured by the generated/not-generated dichotomy. Even without considering degrees of grammaticality, though, we can find examples of ungrammatical sentences that can be generated. Consider (4) as compared to (5).

(4) *What did who see?
(5) Who saw what?

Now, (4) is ungrammatical because wh-movement prefers to target the highest wh-expression, which suggests that in order to judge (4) as ungrammatical, a speaker needs to generate it. So, the Kaqchikel version of (1) might be generated by the grammar, but such generation would be deviant somehow.

Throughout his argument, though, Preminger says that he is only “tak[ing] Chomsky at his word”—I’ll leave that to the reader to judge. Regardless, though, if Chomsky had made such an assumption in an argument, it would be a flawed argument, but it wouldn’t refute the SMT.

A note on an equivocation in the UCLA Lectures

In his recent UCLA Lectures, Chomsky makes the following two suggestive remarks which seem to be contradictory:

. . . [I]magine the simplest case where you have a lexicon of one element and we have the operation internal Merge. [. . . ] You have one element: let’s just give it the name zero (0). We internally merge zero with itself. That gives us the set {0, 0}, which is just the set zero. Okay, we’ve now constructed a new element, the set zero, which we call one.

p24

We want to say that [X], the workspace which is a set containing X is distinct from X.
[X] ≠ X
We don’t want to identify a singleton set with its member. If we did, the workspace itself would be accessible to MERGE. However, in the case of the elements produced by MERGE, we want to say the opposite.
{X} = X
We want to identify singleton sets with their members.

p37

So in the case of arithmetic, a singleton set ({0}, one) is distinct from its member (0), but the two are identical in the case of language. This is either a contradiction—in which case we need to eliminate one of the statements—or it’s an equivocation—in which case we need to find and understand the source of the error. The former option would be expedient, but the latter is more interesting. So, I’ll go with the latter.
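Before trying to resolve this, it may help to spell out the arithmetic construction from the first quote. The following is a minimal sketch of my own (representing the initial object and the singletons as frozensets is purely an illustrative assumption): iterating internal Merge of an object with itself builds successive numerals, each of which is distinct from its sole member.

```python
# A toy illustration (mine, not Chomsky's) of the arithmetic construction:
# repeatedly internally merging an object with itself yields successive numerals.
zero = frozenset()                    # an arbitrary starting object; here, the empty set

def internal_merge_self(x):
    # Merge(X, X) = {X, X}, which is just the singleton {X}
    return frozenset({x})

one = internal_merge_self(zero)       # {0}, which we call one
two = internal_merge_self(one)        # {{0}}, which we call two

print(one == frozenset({zero}))       # True: one is the singleton containing zero
print(one == zero)                    # False: for arithmetic, {X} is distinct from X
```

The last line is the arithmetic half of the contrast ({X} ≠ X); the linguistic half is the p37 claim that {X} = X.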

The source of the equivocation, in my estimation, is the notion of identity—Chomsky’s remarks become consistent when we take him to be using different measures of identity and, in order to understand these distinctions, we need to dust off a rarely used dichotomy—form vs substance.

This dichotomy is perhaps best known to syntacticians due to Chomsky’s distinction between “formal universals” and “substantive universals” in Aspects, where formal universals were constraints on the types of grammatical rules in the grammar and substantive universals were constraints on the types of grammatical objects in the grammar. Now, depending on what aspect of grammar or cognition we are concerned with, the terms “form” and “substance” will pick out different notions and relations, but since we’re dealing with syntax here we can say that “form” picks out purely structural notions and relations, such as are derived by merge, while substance picks out everything else.

By extension, then, two expressions are formally identical if they are derived by the same sequences of applications of merge. This is a rather expansive notion. Suppose we derive a structure from an arbitrary array A of symbols; any structure whose derivation can be expressed by swapping the symbols in A for distinct symbols will be formally identical to the original structure. So, “The sincerity frightened the boy.” and “*The boy frightened the sincerity.” would be formally identical, but, obviously, substantively distinct.

Substantive identity, though, is more complex. If substance picks out everything except form, then it would pick out everything to do with the pronunciation and meaning of an expression. So, from the pronunciation side, a structurally ambiguous expression is a set of (partially) substantively identical but formally distinct sentences, as are paraphrases on the meaning side.

Turning back to the topic at hand, the distinction between a singleton set and its member is purely formal, and therein lies the resolution of the apparent contradiction. Arithmetic is purely formal, so it traffics in formal identity/distinctness. Note that Chomsky doesn’t suggest that zero is a particular object—it could be any object. Linguistic expressions, on the other hand, have form and substance. So a singleton set {LI} and its member LI are formally distinct but, since they would mean and be pronounced the same, are substantively identical.

It follows from this, I believe, that the narrow faculty of language, if it is also responsible for our faculty of arithmetic, must be purely formal—constructing expressions with no regard for their content. So, the application of merge cannot be contingent on the contents of its input, nor could an operation like Agree, which is sensitive to the substance of an expression, be part of that same faculty. These conclusions, incidentally, can also be drawn from the Strong Minimalist Thesis.

Internal unity in science again

Or, how to criticize a scientific theory

Recently, I discovered a book called The Primacy of Grammar by philosopher Nirmalangshu Mukherji. The book is basically an extended, and in my opinion quite good, apologia for biolinguistics as a science. The book is very readable and covers a decent amount of ground, including an entire chapter discussing the viability of incorporating a faculty of music into biolinguistic theory. I highly recommend it.

At one point, while defending biolinguistics from the charge of incompleteness levied by semanticists and philosophers, Mukherji makes the following point.

[D]uring the development of a science, a point comes when our pretheoretical expectations that led to the science in the first place have changed enough, and have been accommodated enough in the science for the science to define its objects in a theory-internal fashion. At this point, the science—viewed as a body of doctrines—becomes complete in carving out some specific aspect of nature. From that point on, only radical changes in the body of theory itself—not pressures from common sense—force further shifting of domains (Mukherji 2001). In the case of grammatical theory, either that point has not been reached or … the point has been reached but not yet recognized.

Mukherji (2010, 122-3)

There are two interesting claims that Mukherji is making about linguistic theory and scientific theory in general. One is that theoretical objects are solely governed by theory-internal considerations. The other is that the theory itself determines what in the external world it applies to.

The first claim reminded me of a meeting I had with my doctoral supervisor while I was writing my thesis. My theoretical explanation rested on the hypothesis that even the simplest of non-function words, like coffee, were decomposable into root objects (√COFFEE) and categorizing heads (n0). I had a dilemma though. It was crucial to my argument that, while categorizing heads had discrete features, roots were treated as featureless blobs by the grammar, but I couldn’t figure out how to justify such a claim. When I expressed my concern to my supervisor, she immediately put my worries to rest. I didn’t need to justify that claim, she pointed out, because roots by their definition have no features.

I had fallen into a very common trap in syntax—I had treated a theory-internal object as an empirical object. Empirical objects can be observed and sensibly argued about. Take, for instance, English specificational clauses (e.g., The winner is Mary). Linguists can and do argue about the nature of these—i.e., whether or not they are truly the inverse of predicational clauses (e.g., Mary is the winner)—and cite facts as they do so. This is because empirical objects and phenomena are out there in the real world, regardless of whether we study them. Theory-internal objects, on the other hand, are not subject to fact-based argument, because, unless the Platonists are right, they have no objective reality. As long as my theory is internally consistent, I can define its objects however I damn please. The true test of any theory is how well it can be mapped onto some aspect of reality.

This brings me to Mukherji’s second assertion, that the empirical domain of a theory is determined by the theory itself. In the context of his book, this assertion is about linguistic meaning. The pretheoretic notion of meaning is what he calls a “thick” notion—a multifaceted concept that is very difficult to pin down. The development of a biolinguistic theory of grammar, though, has led to a thinner notion of meaning, namely, the LF of a given expression. Now obviously, this notion of meaning doesn’t include notions of reference, truth, or felicity, but why should we expect it to? Yes, those notions belong to our common-sense ideas of meaning, but surely at this stage of human history, we should expect that scientific inquiry will reveal our common-sense notions to be flawed.

As an analogy, Aristotle and his contemporaries didn’t distinguish between physics, biology, chemistry, geology, and so on—they were all part of physics. One of the innovations of the scientific revolutions, then, was to narrow the scope of investigation—to develop theories of a sliver of nature. If Aristotle saw our modern physics departments, he might look past all of their fantastic theoretical advances and wonder instead why no one in the department was studying plants and animals. Most critiques of internalist/biolinguistic notions of semantics by modern philosophers and formal semanticists echo this hypothetical time-travelling Aristotle—they brush off any advances and wonder where the theory of truth is.

Taken together, these assertions imply a general principle: Scientific theories should be assessed on their own terms. Criticizing grammatical theory for its lack of a theory of reference makes as much sense as criticizing Special Relativity for its lack of a theory of genetic inheritance. While this may seem to render any theory beyond criticism, the history of science demonstrates that this isn’t the case. Consider, for instance, quantum mechanics, which has been subject to a number of criticisms in its own terms—see: Einstein’s criticisms of QM, Schrödinger’s cat, and the measurement problem. In some cases these criticisms are insurmountable, but in others addressing them head-on and modifying or clarifying the theory is what leads to advances in the theory. Chomsky’s Label Theory, I think, is one of the latter sorts of cases—a theory-internal problem was identified and addressed and as a result two unexplained phenomena (the EPP and the ECP) were given a theoretical explanation. We can debate how well that explanation generalizes and whether it leans too heavily on some auxiliary hypotheses, but what’s important is that a theory-internal addressing of a theory-internal problem opened up the possibility of such an explanation. This may seem wildly counter-intuitive, but as I argued in a previous post, this is the only practical way to do science.

The principle that a theory should be criticized in its own terms is, I think, what irks the majority of linguists about biolinguistic grammatical theory the most. It bothers them because it means that very few of their objections to the theory ever really stick. Ergativity, for instance, is often touted as a serious problem for Abstract Case Theory, but since grammatical theory has nothing to say about particular case alignments, theorists can just say “Yeah, that’s interesting” and move on. Or to take a more extreme case, recent years have seen all-out assaults on grammatical theory from people who bizarrely call themselves “cognitive linguists”, such as Vyvyan Evans and Daniel Everett, who claim to have evidence that roundly refutes the very notion of a language faculty. The response of biolinguists to this assault: mostly a resounding shrug as we turn back to our work.

So, critics of biolinguistic grammatical theory dismiss it in a number of ways. They say it’s too vague or slippery to be any good as a theory, which usually means they refuse to seriously engage with it; they complain that the theory keeps changing, a peculiar complaint to lodge against a scientific theory; or they accuse theorists of arrogance, a charge that, despite being occasionally true, is not a criticism of the theory. This kind of hostility can be bewildering, especially because a corollary of the idea that a theory defines its own domain is that everything outside that domain is a free-for-all. It’s hard to imagine a geneticist being upset that their data is irrelevant to Special Relativity. I have some ideas about where the hostility comes from, but they’ll take me pretty far afield, so I’ll save them for a later post and leave it here.

Self-Promotion: I posted a manuscript to Lingbuzz.

Hi all,

I’ve been working on a paper for a few months and it’s finally reached the point where I need to show it to some people who can tell me whether or not I’m crazy. To that end, I posted it on LingBuzz.

It’s called “A workspace-based theory of adjuncts,” and be forewarned it’s pretty technical. So if you’re just here for my hot takes on why for-profit rent is bad, or what kind of science generative syntax is, or the like, it might not be for you.

If it is for you, and you have any comments on it, please let me know.

Happy reading!

What kind of a science is Generative Syntax?

Recently, I found myself reading Edmund Husserl’s Logical Investigations. I didn’t make it that far into it—the language is rather abstruse—but included in the fragments of what I did read was a section in which Husserl clarified something that I’ve been thinking about recently, which is the place of theory in a science. In the section in question, Husserl defines a science as a set of truths that belong together. So, the truths of physics belong together, and the truths of economics belong together, but the former and the latter don’t belong together. But what does it mean, Husserl asks, for truths to belong together?

Husserl’s answer is that it can mean one of two things. Either truths belong together because they share an internal unity or because they share an external unity. Truths—that is, true propositions—are linked by an internal unity if they are logically related. So, a theorem and the axioms that it is derived from share an internal unity, as would two theorems derived from a set of internally consistent axioms, and so on. The type of science characterized by internal unity, Husserl calls abstract, explanatory, or theoretical science. This class would include arithmetic, geometry, most modern physics, and perhaps other fields.

A set of truths has external unity if the members of the set are all about the same sort of thing. So, geography, political science, history, pre-modern physics, and so on would be the class of sciences characterized by external unity. Husserl calls these descriptive sciences.

When I read the description of this dichotomy, I was struck both by how simple and intuitive it was, and by how meaningful it was, especially compared to the common ways we tend to attempt to divide up the sciences (hard sciences vs soft sciences, science vs social science, etc.). The distinction also happens to neatly divide fields of inquiry into those that generate predictions (theoretical sciences) and those that do not (descriptive sciences). Why does a theoretical science generate predictions while a descriptive one does not? Well, consider the starting point of either of the two. A theoretical science, requiring internal unity, would start with axioms, which can be any kind of propositions, including universal propositions (e.g., “Every number has a successor”, “No mass can be created or destroyed”). On the other hand, a descriptive science, which requires external unity, would start with observable facts, which must be particular propositions (e.g., “The GDP of the Marshall Islands rose by 3% last year”, “That ball fell for 5 seconds”). This matters because deductive reasoning is only possible if a system has at least some universal premises. So, a theoretical science generates theorems, which constitute the predictions of that science. A descriptive science, on the other hand, is limited to inductive reasoning, which at best generates expectations. The difference is that if a theorem/prediction is false, then at least one of the axioms that it is derived from must be false, while if an expectation is false, it doesn’t mean that the facts that “generated” that expectation are false.

Turning to the question I asked in my title, what kind of science is Generative Syntax (GS)? My answer is that there are actually two sciences—one theoretical, one descriptive—that answer to the name Generative Syntax, and that most of the current work is of the latter type. Note, I don’t mean to distinguish between experimental/corpus/field syntax and what’s commonly called “theoretical syntax”. Rather, I mean to say that, even if we restrict ourselves to “theoretical syntax,” most of the work being done today is part of a descriptive science in Husserl’s terminology. To be more concrete, let me consider two currently open fields of inquiry within GS: one quite active (ergativity) and one less popular (adjuncts).

Ergativity, for the uninitiated, is a phenomenon having to do with grammatical case. In English, a non-ergative language, pronouns come in two cases: nominative (I, he, she, they, etc.), which is associated with subjects, and accusative (me, him, her, them, etc.), which is associated with objects. An ergative language also has two cases: ergative, which is associated with subjects of transitive verbs, and absolutive, which is associated with objects of transitives and subjects of intransitives. To be sure, this is an oversimplification, and ergativity has been found to be associated with many other phenomena that don’t occur in non-ergative languages. Details aside, suppose we wanted to define a science of ergativity or, more broadly, a science of case alignment in Husserl’s terminology. What sort of unity would it have? I contend that it has only external unity. That is, it is a descriptive science. It begins with the fact that the case systems of some languages are different from the case systems that most linguistics students are used to. Put another way, if English were an ergative language, linguists would be puzzling over all these strange languages where the subjects always had the same case.

Adjuncts, a fancy term for modifiers, are the “extra” parts of sentences: adjectives and adverbs, the things newspaper editors hate. Adjuncts contrast with arguments (subjects, objects, etc.) and predicates, which each sentence needs and needs in a particular arrangement. So, the sentences “She sang the song with gusto after dinner” and “She sang the song after dinner with gusto” are essentially identical, but “She sang the song” and “The song sang her” are wildly different. On its face, this is not particularly interesting—adjuncts are commonplace—but every unified theory of GS predicts that adjuncts should not exist. Take the current one, commonly called minimalism. According to this theory, sentences are constructed by iterated application of an operation called Merge, which simply takes two words or phrases and creates a new phrase (Merge(X, Y) = {X, Y}, where {X, Y} is distinct from both X and Y). It follows from this that “She sang the song” and “The song sang her” are meaningfully distinct, but it also follows (falsely) that “She sang the song with gusto after dinner” and “She sang the song after dinner with gusto” are also meaningfully different. From this perspective, the study of adjuncts doesn’t constitute a science in itself, but rather it is part of a science with internal unity, a theoretical science.
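To make that prediction concrete, here is a minimal sketch of my own (the frozenset encoding and the toy lexical items are illustrative assumptions, not anything drawn from the theory itself): if structures are built by Merge alone, the two adjunct orders come out as distinct objects.

```python
# A toy illustration of Merge as bare set formation (my own sketch, not a claim
# about any particular implementation of minimalist syntax).
def merge(x, y):
    # Merge(X, Y) = {X, Y}: a new object, distinct from both X and Y
    return frozenset({x, y})

# Hypothetical lexical items, represented as strings for simplicity.
sang, the_song = "sang", "the song"
with_gusto, after_dinner = "with gusto", "after dinner"

vp = merge(sang, the_song)

# Adding the adjuncts in either order by ordinary Merge yields different structures...
order1 = merge(merge(vp, with_gusto), after_dinner)
order2 = merge(merge(vp, after_dinner), with_gusto)
print(order1 == order2)  # False

# ...so a Merge-only theory treats the two orders as distinct objects, even though
# "She sang the song with gusto after dinner" and "She sang the song after dinner
# with gusto" are essentially identical in meaning.
```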

So, despite the fact that research on ergativity and research on adjuncts both tend to be described as theoretical syntax in GS, the two are completely different sorts of sciences. Inquiry into the nature of adjuncts forms part of the theoretical science of syntax, while work on ergativity and, I would conjecture, the majority of current work that is called “theoretical syntax”, its use of formalisms and hypotheses notwithstanding, forms a descriptive science, which would be a part of a larger descriptive science.

Both sorts of science are valuable and, in fact, often complement each other. Accurate descriptions of the heavens were vital for early modern physicists to develop their theoretical models of mechanics, and novel theories often furnish descriptivists with better technology to aid their work. Where we get into trouble is when we confuse the two sorts of sciences. There’s an argument to be made, and it has been made by John Ralston Saul in his book Voltaire’s Bastards, that many of the problems in our society stem from treating descriptive social sciences, such as international relations, economics, and law, and even much of the humanities, as if they were theoretical sciences.

Turning back to syntax and taking a micro view, why am I grinding this axe? Well, I have two main reasons: one selfish, the other more altruistic. The selfish reason is that I am a theoretician in a descriptivist’s world. This manifests itself in a number of ways, but I’ll just highlight the immediate one for me: the job market. The academic job market is insanely competitive, and PhD students are expected at least to present at conferences in order to make a name for themselves. This is a problem because (a) there are no theoretical syntax conferences and (b) a standard 20 minute talk, while often sufficient to present a new descriptive analysis of a phenomenon, is not ideal for presenting theoretical work.

Beyond that, I think the confusion of the two sorts of sciences can exacerbate imposter syndrome, especially in graduate students. It took me a while to figure out why I had such a hard time understanding some of my colleagues’ work, and why some papers on “theoretical syntax” had such wildly different characters, arguments, and styles from others. I eventually figured it out, but every so often I see a grad student struggling to make sense of the field and I just want to tell them that they’re not wrong, the field doesn’t really make sense, because it’s actually two fields.

Colin Phillips on the Theory/Experiment divide.

Over on his blog, Colin Phillips has taken up the age-old theory vs experiment debate. The position he seems to take is that the contrast between theory and experiment is illusory and, therefore, the debate itself is wrong-headed. Here he is making what seems to be his main point:

There’s a terminological point here that is straightforward. Nobody own [sic] the term “theory”. All flavors of linguist are using evidence and reasoning to build generalizable accounts of how the human language system works. We all use empirical evidence, and we all develop theories. The distinction between theoreticians and experimentalists is largely a myth. Sometimes our experiments are so easy that we’re embarrassed to label them as experiments (e.g., “Does that sentence sound better to me if I take out the complementizer?”). Sometimes the experiments take a long, long time, so we get to spend less time thinking about the theoretical questions. But it’s all basically the same thing.

“Theories all the way down” by Colin Phillips

This quote includes a few mistakes which tend to muddle the debate. The first is the focus on whether a person can be strictly a theoretician or an experimentalist. Phillips says “no” and I would tend to agree, because as humans we all contain multitudes, to paraphrase Walt Whitman. It doesn’t follow from this, though, that theory and experiment are the same thing. Creators can be critics, and producers can be consumers, but this does not negate the contrasts between art and criticism, between production and consumption.

The second mistake, and this is a widespread mistake in linguistics, is that he seems to miscategorize the pen-and-paper empirical method of old-school linguistics as theoretical. Norbert Hornstein has posted about this error on his blog a number of times, adopting from Robert Chametzky a three-way distinction between analytical, theoretical, and metatheoretical work. As Hornstein argues, most of what we call theoretical syntax is better described as analytical—it applies theoretical constructs to data with the dual effect of testing the constructs and making sense of the data. To be sure, this trichotomy takes for granted the data-gathering method, and it would be interesting to think about how that could be related to analysis. Are they independent of each other, or is the gathering a proper subpart of the analysis? Either way, I would agree with Phillips that “experimental” and “pen-and-paper” work ought to be grouped together, but I disagree that either is theoretical work.

Theoretical work is a different beast that presents its own endemic challenges—difficulties that more analytical work does not have to address. Blurring the line between the two types of work, however, introduces additional hurdles. These hurdles usually take the form of conferences, journals, and job postings, which declare themselves to be “theoretical” but are in actuality mainly analytical. This ends up crowding out truly theoretical work, which any science needs at least as much as experimental work in order to progress and flourish.

To close, why bother arguing about language use? Isn’t it fluid—always changing? I suppose it is, and while I don’t particularly care what we call theory or analysis or experiment, I do care that we recognize the distinctions between them. Please forgive the piety, but I’m a sucker for an aphorism: as Plato said, the goal of inquiry is to carve Nature at its joints, and as Confucius said, “The beginning of wisdom is to call things by their proper name.”

On the general character of semantic theory (Part b)

(AKA Katz’s Semantic Theory (Part IIIb). This post discusses the second half of chapter 2 of Jerrold Katz’s 1972 opus. For my discussion of the first half of the chapter, go here.)

(Note: This post was written in fits and starts, which is likely reflected in its style (or lack thereof). My apologies in advance)

The first half of chapter 2 was concerned with the broader theory of language, rather than a semantic theory. In the second half of the chapter, Katz begins his sketch of the theory of semantics. It’s at this point that I pick up my review.

4. The structure of the theory of language

In this section, Katz discusses universals, which he frames, following Chomsky, as constraints on grammars. Katz differs from Chomsky, though, in how he divvies up the universals—whereas Chomsky, in Aspects, distinguishes between formal and substantive universals, Katz adds a third type: organizational universals. These classifications are defined as follows:

Formal universals constrain the form of the rules in a grammar; substantive universals provide a theoretical vocabulary from which the constructs used to formulate the rules of particular grammars are drawn; organizational universals, of which there are two subtypes, componential organizational universals and systematic organizational universals, specify the interrelations among the rules and among the systems of rules within a grammar.

p30-31

Furthermore, formal, substantive, and componential universals cross-classify with phonological, syntactic, and semantic universals. This means that we can talk about substantive phonological universals, or componential semantic universals, and so on. So, for example, a phonological theory consists in a specification of the formal, substantive, and componential universals at the phonological level, and such a specification amounts to a definition of the phonological component of the language faculty. Systematic universals, then, specify how the components of the grammar are related to each other. With this discussion, Katz sets up his goals: to specify the formal, substantive, and componential universals at the semantic level. More precisely, he aims to develop the following:

(2.7) A scheme for semantic representation consisting of a theoretical vocabulary from which semantic constructs required in the formulation of particular semantic interpretations can be drawn

p33

(2.8) A specification for the form of the dictionary and a specification of the form of the rules that project semantic representations for complex syntactic constituents from the dictionary’s representations of the senses of their minimal syntactic parts.

p33

(2.9) A specification of the form of the semantic component, of the relation between the dictionary and the projection rules, and of the manner in which these rules apply in assigning semantic representations

p3

These three aspects of semantic theory, according to Katz, represent the substantive, formal, and componential universals, respectively. A theory that contains (2.7)-(2.9) and answers questions 1-15 (as listed here) would count as an adequate semantic theory.

5. Semantic theory’s model of a semantic component

So, Katz asks rhetorically, how could semantic relations, such as analyticity, synonymy, or semantic similarity, be captured in the purely formal terms required by (2.7)-(2.9)? The answer is simple: semantic relations and properties are merely formal aspects of compositional meanings of expressions. This is a bold and controversial claim: semantic properties/relations are formal properties/relations or, to put it more strongly, semantic properties/relations are, in fact, syntactic properties/relations (where “syntactic” is used in a very broad sense). Of course, this claim is theoretical and rather coarse. Katz aims to make it empirical and fine.

So, what does Katz’s semantic theory consist of? At the broadest level, it consists of a dictionary and a set of projection rules. No surprise yet; it’s a computational theory, and any computational system consists of symbols and rules. The dictionary contains entries for every morpheme in a given language, where each entry is a collection of the senses of that morpheme. Finally, he defines two “technical terms.” The first is a reading, which refers to “a semantic representation of a sense of a morpheme, word, phrase, clause, or sentence” and which is further divided into lexical readings and derived readings. The second term is semantic marker, which refers to “the semantic representation of one or another of the concepts that appear as parts of senses.” Katz then continues, identifying the limiting case of semantic markers: primitive semantic markers.

Here it’s worth making a careful analogy to syntactic theory. Semantic markers, as their name suggests, are analogous to phrase markers. Each is a representation of constituency: a phrase marker represents the syntactic constituents of an expression while a semantic marker represents the conceptual constituents of a concept. In each theory there are base cases of the markers: morphemes in syntactic theory and the aptly named primitive semantic markers in semantic theory. I must stress, of course, that this is only an analogy, not an isomorphism. Morphemes are not mapped to primitive semantic markers, and vice versa. Just as a simple morpheme can be phonologically complex, it can also be semantically complex. Furthermore, as we’ll see shortly, while complex semantic markers are structured, there is no reason to expect them to be structured according to the principles of syntactic theory.

Before Katz gets to the actual nitty-gritty of formalizing these notions, he pauses to discuss ontology. He’s a philosopher, after all. Semantic markers are representations of concepts and propositions, but what are concepts and propositions? Well, we can be sure of some things that they are not: images, mental ideas, and particular thoughts, which Katz groups together as what he calls cognitions. Cognitions, for Katz, are concrete, meaning they can be individuated by who has them, when and where they occur, and so on. If you and I have the same thought (e.g., “Toronto is the capital of Ontario”) then we had different cognitions. Concepts and propositions, for Katz, are abstract objects and, therefore, independent of space and time, meaning they can’t be individuated by their nonexistent spatiotemporal properties. They can, however, be individuated by natural languages, which Katz also takes to be abstract objects, and, in fact, are individuated easily by speakers of natural languages. Since, in a formulation echoed recently by Paul Pietroski (at around 5:45), “senses are concepts and propositions connected with phonetic (or orthographic) objects in natural languages” and the goal of linguistic theory is to construct grammars that model that connection, the question of concept- and proposition-individuation is best answered by linguistic theory.1

But, Katz’s critics might argue, individuation of concepts and propositions is not a definition of “concept” or “proposition”. True, Katz responds, but so what? If we needed to explicitly define the object of our study before we started studying it, we wouldn’t have any science. He uses the example of Maxwell’s theory of electromagnetism, which accurately models the behaviour and structural properties of electromagnetic waves but does not furnish any definition of electromagnetism. So if we can come up with a theory that accurately models the behaviour and structural properties of concepts and propositions, why should we demand a definition?

We also can’t expect a definition of “semantic marker” or “reading” right out of the gate. In fact, Katz argues, one of the goals of semantic theory (2.7) is to come up with those definitions, and we can’t expect to have a complete theory before we’ve developed it. Nevertheless, we can use some basic intuitions to come up with a preliminary sketch of what a reading and a semantic marker might look like. For instance, the everyday word/concept “chair” has a common sense, which is composed of subconcepts and can be represented as the set of semantic markers in (2.15).

(2.15) (Object), (Physical), (Non-living), (Artifact),
       (Furniture), (Portable), (Something with legs),
       (Something with a back), (Something with a seat),
       (Seat for one)

Of course, this is just preliminary. Katz identifies a number of places for improvement. Each of the semantic markers is likely decomposable into simpler markers. Even the concept represented by “(Object)” is likely decomposable.

Or, Katz continues, we can propose that semantic markers are ways of making semantic generalizations. Katz notes how “chair” relates to words such as “hat,” “planet,” “car,” and “molecule” as compared to words such as “truth,” “thought,” “togetherness,” and “feeling.” Obviously, these words all denote distinct concepts, but just as obviously, the two groupings contrast with each other. We can think of the semantic marker “(Object)” as the distinguishing factor in these groupings: the former is a group of objects, the latter a group of non-objects. So, semantic markers, like phonological features and grammatical categories, are expressions of natural classes.

Finally, Katz proposes a third way of thinking of semantic markers: “as symbols that mark the components of senses of expressions on which inferences from sentences containing the expressions depend” (p41). For instance, we can infer (2.19) from (2.18), but we can’t infer (2.27).

(2.18) There is a chair in the room.

(2.19) There is a physical object in the room.

(2.27) There is a woman in the room.

We can express this inference pattern by saying that every semantic marker that comprises the sense of “physical object” in (2.19) is contained in the sense of “chair” in (2.18), but that is not the case for “woman” in (2.27). The sense of “woman” in (2.27) contains semantic markers like “(Female)” which are not contained in the sense of “chair” in (2.18). Here Katz notes that his proposal that concepts like “chair” consist of markers is merely an extension of an observation by Frege that (2.28a,b,c) are together equivalent to (2.29).

(2.28)
(a) 2 is a positive number
(b) 2 is a whole number
(c) 2 is less than 10

(2.29) 2 is a positive whole number less than 10

For Frege, “positive number”, “whole number”, and “less than 10” are all properties of “2” and marks of “positive whole number less than 10”. Katz’s extension is to say that the concepts associated with simple expressions can have their own marks.

Next, Katz discusses the notions of lexical and derived readings, which are, in a sense, the inputs and outputs, respectively, of the process of semantic composition. As the name suggests, lexical readings are what is stored in the dictionary. When a syntactic object hits the semantic component of the grammar, the first step is to replace the terminal nodes with their lexical readings. Derived readings are generated by applying projection rules to the first level of non-terminal nodes, and then the next level, and so on until the syntactic object is exhausted.

The process of deriving readings, Katz asserts, must be restrictive in the sense that the interpretation of a sentence never includes every possible combination of the lexical readings of its component parts. For instance, suppose the adjective “light” and the noun “book” have N and M senses in their respective lexical readings. If our process for deriving readings were unrestrictive, we would expect “light book” to have N×M senses while, in fact, fewer are available. We can see this even when we restrict ourselves to 2 senses for “light”—“low in physical weight” and “inconsequential”—and 2 senses for “book”—“a bound collection of paper” and “a work of literature”. Restricting ourselves this much, we can see that “light book” is 2-ways ambiguous, describing a bound collection of paper with a low weight, or a work of literature whose content is inconsequential, and not a work of literature with a low weight or an inconsequential bound collection of paper. Our semantic theory, then, must be such that the compositional process it proposes can appropriately restrict the class of derived readings for a given syntactic object.

To ensure this restrictiveness, Katz proposes that the senses that make up a dictionary entry are each paired with a selectional restriction. To illustrate this, he considers the adjective “handsome”, which has three senses: when applied to a person or artifact, it has the sense “beautiful with dignity”; when applied to an amount, it has the sense “moderately large”; when applied to conduct, it has the sense “gracious or generous”. So, for Katz, the dictionary entry for “handsome” is as in (2.30).

(2.30) "handsome";[+Adj,…];(Physical),(Object),(Beautiful),
                           (Dignified in appearance),
                           <(Human),(Artifact)>
                           (Gracious),(Generous),<(Conduct)>
                           (Moderately large),<(Amount)>

Here the semantic markers in angle brackets represent the markers that must be present in the senses that “handsome” is applied to.
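To make the mechanism concrete, here is a rough sketch of my own, not Katz’s formalism: senses are represented as records of markers plus a selectional restriction, and a toy projection rule keeps only the compatible combinations. The sense glosses and markers below are invented for illustration.

```python
# A toy rendering (mine, not Katz's) of selectional restrictions pruning derived readings.
LIGHT = [
    {"gloss": "low in physical weight", "markers": {"(Low weight)"},
     "selects": {"(Physical)"}},                   # only combines with physical things
    {"gloss": "inconsequential", "markers": {"(Inconsequential)"},
     "selects": {"(Information)"}},                # only combines with informational things
]

BOOK = [
    {"gloss": "bound collection of paper", "markers": {"(Physical)", "(Artifact)"}},
    {"gloss": "work of literature", "markers": {"(Information)", "(Artifact)"}},
]

def project(modifier_senses, head_senses):
    """Derive readings for [modifier head], keeping only those combinations whose
    head sense contains every marker the modifier sense selects for."""
    readings = []
    for m in modifier_senses:
        for h in head_senses:
            if m["selects"] <= h["markers"]:
                readings.append((m["gloss"], h["gloss"]))
    return readings

print(project(LIGHT, BOOK))
# [('low in physical weight', 'bound collection of paper'),
#  ('inconsequential', 'work of literature')]
# Only two of the four possible combinations survive, as in the "light book" example.
```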

This solution to the problem of selection may seem stipulative and ad hoc—I know it seems that way to me—but recall that this is an early chapter in a book published in 1972. If we compared it to the theories of syntax and phonology of the time, they might appear similarly unsatisfying. The difference is that syntactic and phonological theories have since developed into more formalized and, hopefully, explanatory theories through the collaborative effort of many researchers, while Katz’s theory never gained the traction required to spur that level of collaboration.

Katz closes out this section with a discussion of “semantic redundancy rules” and projection rules. Rather than discuss these, I move on to the final section of the chapter.

6. Preliminary definitions of some semantic properties and relations

Here Katz shows the utility of the theory that he has thus far sketched. That is, he looks at how the semantic properties and relations identified in chapter 1 can be defined in the terms introduced in this chapter. These theoretical definitions are guided by our common sense definitions, but Katz is careful to stress that they are not determined by them. So, for instance, two things are similar when they share some feature(s). Translating this into his theory, Katz gives the definition in (2.33) for semantic similarity.

(2.33) A constituent Ci is semantically similar to a constituent Cj on a sense just in case there is a reading of Ci and a reading of Cj which have a semantic marker in common. (They can be said to be semantically similar with respect to the concept φ in case the shared semantic marker represents φ)

Note that we can convert this definition into a scalar notion, so we can talk about degrees of similarity in terms of the number of shared markers. Katz does this implicitly by defining semantic distinctness as sharing no markers and synonymy as sharing all markers.
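As a toy rendering of these definitions (my own sketch, not Katz’s notation; the readings reuse some markers from (2.15) and invent others, like “(Clothing)” and “(Abstract)”), readings can be treated as sets of markers and the relations as set comparisons.

```python
# Readings as sets of semantic markers; the relations of this section as set comparisons.
def similar(r1, r2):
    return bool(r1 & r2)          # share at least one marker, as in (2.33)

def similarity_degree(r1, r2):
    return len(r1 & r2)           # the scalar version: number of shared markers

def distinct(r1, r2):
    return not (r1 & r2)          # share no markers

def synonymous(r1, r2):
    return r1 == r2               # share all markers

chair = {"(Object)", "(Physical)", "(Artifact)", "(Furniture)"}
hat   = {"(Object)", "(Physical)", "(Artifact)", "(Clothing)"}
truth = {"(Abstract)"}

print(similar(chair, hat), similarity_degree(chair, hat))  # True 3
print(distinct(chair, truth))                              # True
```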

Similarity is a rather simple notion, and therefore has a simple definition; others require some complexity. For instance, analytic statements like “Liars lie” are vacuous assertions due to the fact that the meaning of the subject is contained in the meaning of the predicate. Here, Katz gives the definition one might expect, but it is clear that more needs to be said, as the notions of subject and predicate are more difficult to define. More on this in later chapters.

A more puzzling and less often remarked upon semantic relation is antonymy—the relation that holds of the word pairs in (2.46) and of the set of words in (2.47).

(2.46) bride/groom, aunt/uncle, cow/bull, girl/boy, doe/buck

(2.47) child/cub/puppy/kitten/cygnet

Katz notes that although antonymy is generally taken to be merely lexical, it actually projects to larger expressions (e.g., “our beloved old cow”/”our beloved old bull”), and is targeted by words like “either”, as demonstrated by the fact that (2.49a) is meaningful while (2.49c) is anomalous.

(2.49)
a. John is well and Mary’s not sick either.
c. John is well and Mary’s not {well/foolish/poor/dead} either.

In order for antonymy to be given an adequate theoretical definition, then, it must be expressed formally. Katz does this by marking semantic markers that represent antonymy sets with a superscript. For instance, “brother” and “sister” would be represented as (Sibling)(M) and (Sibling)(F), respectively. Again, this is clearly stipulative and ad hoc, but that is to be expected at this stage of a theory. In fact, Katz seems to have been revising his theory up to his death, with the colour incompatibility problem—the question of why the sentence “The dot is green and red” is contradictory—occupying the focus of a 1998 paper of his and a section of his posthumous book. Even Katz’s ad hoc solution to the problem, though, is miles ahead of any solution that could possibly be given in current formal semantics—which bases its definition of meaning on reference—because, to my knowledge, there is no way to account for antonymy in formal semantics. Indeed, the mere fact that Katz is able to give any theoretical definition of antonymy puts his theory well ahead of formal semantics.

Conclusion

Katz’s rough sketch of a semantic theory is already fairly successful in that it’s able to provide concrete definitions of many of the semantic notions that he identifies in the first chapter.2 I don’t believe this success is due to Katz’s ingenuity, but rather to the fact that he approached theory-building as the central activity in semantic inquiry, rather than an arcane peripheral curiosity. Since theory building is central, it can occur in tandem with the analysis of linguistic intuitions.

In the next chapter, Katz responds to criticisms from his contemporaries. I’m not sure how enlightening this is for modern audiences, so I might skip it. We’ll see…


  1. ^ This argument, of course, leads pretty quickly to a classic problem inherent in the notion of abstract objects: the problem of how abstract objects can interact with the physical world. We could, of course, get around this by denying that concepts and propositions are abstract, but then we need to explain how two different people could have the same thought at different times, in different places. I’m not sure which is the best choice and I’m not sure that linguistics (or any science) is up to the task of deciding between the two, so I’ll just proceed by going along with Katz’s realist attitude about abstract objects, with the caveat that it might be wrong—a kind of methodological Platonism.
  2. ^ Katz does not give definitions for presupposition or question-answer pairs here; more on that in later chapters.

On the general character of semantic theory (Part a)

(AKA Katz’s Semantic Theory (Part IIIa). This post discusses chapter 2 of Jerrold Katz’s 1972 opus. For my discussion of chapter 1, go here.)

Having delineated in chapter 1 which questions a semantic theory ought to answer, Katz goes on in chapter 2 to sketch the sort of answer that such a theory would give. He starts at a very high level, discussing the very notion of natural language, and ends up with some of the formal details of the theory that he aims to develop.

Katz begins by reminding the reader that the questions of meaning—questions 1–15 below—are absolute questions. That is, they aren’t meant to be relativized to any particular language.

  1. What are synonymy and paraphrase?
  2. What are semantic similarity and semantic difference?
  3. What is antonymy?
  4. What is superordination?
  5. What are meaningfulness and semantic anomaly?
  6. What is semantic ambiguity?
  7. What is semantic redundancy?
  8. What is semantic truth (analyticity, metalinguistic truth, etc.)?
  9. What is semantic falsehood (contradiction, metalinguistic falsehood, etc.)?
  10. What is semantically undetermined truth or falsehood (e.g., syntheticity)?
  11. What is inconsistency?
  12. What is entailment?
  13. What is presupposition?
  14. What is a possible answer to a question?
  15. What is a self-answered question?

So, asking What is semantic truth in English? is kind of like asking What is a hiccup to a Canadian? This, Katz acknowledges, makes a strong empirical claim, namely, that every natural language should exhibit the properties whose definitions are requested by questions 1–15.

As a syntactician, I found myself thinking about what notions I would include in the field of syntax as universal in this sense. Notions like sentence or phrase would certainly be there, and category would likely be there. Would subject, predicate, object, and the like be there? Would modification, or transformation? How about interrogative, declarative, imperative, etc.? Notions like word/morpheme, or linear precedence, certainly were included in early versions of syntax, but more recently they tend to either be banished from the theory or dissolved into other notions.

I know of very few syntacticians who ask these questions. Perhaps this is because syntax has decidedly moved beyond the early stage in which Katz found semantics in 1972, but it still behooves us to keep those questions in mind, if only for the purposes of introducing syntax to students. Furthermore, perhaps if we keep these questions in mind, they can serve as a guide for research. Before embarking to answer a research question, the researcher would try to trace that question back to one of the basic questions to judge its likely fruitfulness. I would be curious to see how the papers in, say, LI would fare under such an analysis. But I digress.

Katz continues, asserting that a theory of linguistic meaning must be embedded in a larger theory of natural language, and in order to develop such a theory we must have some sense of what sort of thing a natural language might be. It is this question that occupies the first part of this chapter.

1. Theories about the objective reality of language


The first thing Katz does here is distinguish between the two main competing conceptions of language (at least the main conceptions of his day): the traditional rationalist conception of language as “the internalized rules of grammar that constitute the fluency of its native speakers”, and the empiricist conception of language as “a vast stock of sound chunks classifiable into various phonological and syntactic categories” (p12). He opts for rationalism, citing the now familiar arguments against the empiricist stance. First off, we can’t identify a language L with the set S of all actual utterances of L because any competent speaker of L can easily construct an expression that lies outside of S. This is because although practical factors force every expression of a language to be of finite length, there is no theoretical limit to the length of an expression; no matter the length of an expression, there is always a grammatical way of lengthening it.

One could, Katz continues, expand S to be the set of all expressions that a speaker of L could utter without eliciting an odd response from a hearer. However, this amounts to defining L in terms of dispositions of a speech community, namely the dispositions to accept or reject strings of L. In practical reality, though, these dispositions can be wildly inconsistent depending on a variety of psychological and external factors, so if we want a consistent definition we need to clean up our notion of dispositions. Katz does so by “incorporating recursive mechanisms of sentence generation” (p15), or, as they’re more commonly referred to, generative grammars. And once we incorporate generative grammars, we have a rationalist conception of natural language.

Thus far, there’s nothing too surprising. Katz gives us a fairly standard argument in favour of the rationalist conception of language. But this is where Katz’s discussion gets a little strange; this is where he reveals his realist (in the philosophical sense) view of language. It is a mistake, he argues, to identify, say, English with the actual internalized rules in English-speakers’ brains. This would be like “identifying arithmetic with the concrete realizations of the mathematical rules in the heads of those who can compute using positive real numbers” (p16). As evidence for this claim, Katz cites “dead languages” like Sanskrit, which seems to exist (we can make true or false assertions of it) even though its rules are not actualized in any human’s brain the way that Hindi-Urdu’s rules are. Although he doesn’t say it explicitly here, Katz is arguing that languages are abstract entities, like platonic forms. In his own words: “A language is not itself subject to the fate of the mortals who speak it. It is some sort of abstract entity, whatever it is that this means.” (p16)

Katz further defends this view by identifying it with the standard scientific practice of idealization. So a natural language like, say, Punjabi or a biological species like Homo sapiens is an idealization in that it can't be defined in terms of concrete examples. Similarly, the notions of ideal gases, perfect vacuums, and massless strings are the idealizations of physics. He also cites Chomsky's discussion in Aspects of the "ideal speaker-listener" and Rudolf Carnap, who makes a similar observation: that one cannot directly investigate language but must do so by comparison to a constructed language.

Katz’s proposal and argument that languages are abstract entities strikes me as interesting but a bit confused. Katz’s argument from dead languages is compelling, and could perhaps be made even stronger. Consider for instance, reconstructed languages such as Proto Indo-European or Proto Algonquian. At best we know a scant few details about these languages, but we can say with some certainty that they were each spoken by some speech community. Do they exist in the same sense as Sanskrit does? I think the answer has to be yes, as the only difference between a reconstructed language and a dead language seems to be a written record of that language, and that is clearly not the difference between a language and a non-language.

The argument based on idealization, though, seems to be slightly confused. The comparison of a language with a species does seem to be apt, and might point towards his conclusion, but the comparison to ideal gases and the like suggests, I think, a different notion of idealization, the one that I've always taken Chomsky to be using. In this sense, the idealized objects that scientists employ are not hypothesized to be real, but rather to be useful. I don't believe even the most realist of scientists believes in the existence of frictionless planes. Scientists use these idealizations to reveal real, but non-apparent, aspects of the world. In discussing the ideal speaker-listener, Chomsky was not suggesting that such a person exists, just that we ought to use this idealized person to help reveal a real aspect of the world, namely, the human language faculty.

2. Effability

In the next section Katz espouses what he calls the principle of effability, which he attributes to a number of earlier philosophers (Frege, Searle, and Tarski). The essence of the principle is roughly that if a proposition or thought is expressible in any language, it is expressible in every language. He spends a good chunk of text defending and sharpening his principle, but I'll set that discussion aside here and focus on why he proposes this principle. According to Katz, "effability alone offers a satisfactory basis for drawing the distinction between natural languages, on the one hand, and systems of animal communication and artificial languages, on the other" (p22). Despite this bold-seeming claim, Katz is rather hesitant regarding his principle. He admits that it is rather inchoate and probably not yet up to any empirical task. But only part of his claim concerns the viability of effability; the other part is that no other property of natural language can distinguish it from other similar systems.

In particular, Katz takes aim at the properties that Chomsky tends to highlight as distinguishing factors for natural language: creativity, stimulus freedom, and appropriateness. Taking these one by one, he argues that none of them is unique to natural language. First, he considers creativity, which he takes to be the ability of a speaker-listener to produce and understand indefinitely many sentences. This, Katz argues, is a property of (a) any artificial language with recursive rules, and (b) certain animal communication systems, specifically bee communication. Next, Katz takes on stimulus freedom, which he argues means freedom from external stimuli, asserting that "[i]t cannot mean freedom from the control of internal stimuli as well."1 This being the case, says Katz, stimulus freedom doesn't make sense as a distinction. He also asserts that some animal behaviour displays such stimulus freedom. Finally, Katz argues that appropriateness is not part of linguistic competence—that it is extragrammatical, and also that some animal behaviour displays this property.

I take some issue with Katz's critiques of each of the distinguishing properties individually, but I'll set that aside for now to highlight a broader issue. Even if we take Katz's critiques at face value, they still don't refute Chomsky's claim, because Chomsky's claim isn't that each of the three properties distinguishes natural language, but that the conjunction of the three is what distinguishes natural language. That is, natural language is distinct from animal communication and artificial language in that it is creative, stimulus-free, and appropriate. So, for instance, even if a bee can produce novel dances, it does so in response to a stimulus. Artificial languages might be creative, but it makes little sense to talk about stimulus freedom or appropriateness with respect to them. So Katz's critiques don't really have that much force.

At any rate, the principle of effability, while an interesting notion, doesn’t seem to be too crucial for Katz’s theory. The index of the book lists only one reference to effability outside this section. So, on to the next.

3. Competence and Performance

In the final table-setting section of this chapter, Katz takes up and defends Chomsky’s competence/performance distinction. His discussion, though, differs from most that I’ve encountered in that he uses a debate between Chomsky and Gilbert Harman, one of Chomsky and Katz’s empiricist contemporaries. Katz first clears a significant portion of underbrush in this debate in order to get to what he takes to be the crux of the issue: the proposal that linguistic competence consists in the unconscious knowledge of general principles. He summarizes Harman’s issue, which seems to revolve around the notion of grammatical transformations, as follows.

[G]iven that we can say that speakers of a language know that certain sentences are ungrammatical, certain ones ambiguous, certain ones related in certain ways to others, and so on, what licenses us to go further and say that speakers know (tacitly) the linguistic principles whose formalization in the grammar explain the noted ungrammaticality, ambiguity, sentential relations and the like?

(p28)

This challenge, Katz seems to argue, is not based on the empiricist/rationalist debate in epistemology, but rather on the realist/fictionalist debate in the philosophy of science.2 Harman is saying that a transformational grammar is maybe a good model of a speaker-listener of a given language, but it's just that: a model. Katz responds, with the help of a quote from his erstwhile co-author, Jerry Fodor, that the only sensible conclusion to be drawn from the empirical accuracy of a scientific theory is that the theory is a true description of reality, at least insofar as it is empirically accurate. There is, of course, much more to say about this, but I'll leave it there.

Thus, Katz sets up his conception of language in order to be able to sketch a theory of semantics within a theory of language. In my next post I will take up the details of that sketch.


  1. ^ Katz cites Cartesian Linguistics for Chomsky's distinguishing factors, and it's likely that CL doesn't discuss stimulus freedom too extensively. In more recent discussion, though, Chomsky does include internal stimuli in the property of stimulus freedom, so it's not clear that Katz's critique here still holds.
  2. ^ I suspect that there is no strong demarcation between epistemology and philosophy of science, but I can’t say with any confidence one way or the other.

Instrumentalism in Linguistics

(Note: Unlike my previous posts, this one is not aimed at a general audience. This one's for linguists.)

As a generative linguist, I like to think of myself as a scientist. Certainly, my field is not as mature and developed as physics, chemistry, and biology, but my fellow linguists and I approach language and its relation to human psychology scientifically. This is crucial to our identity. Sure, our universities consider linguistics a member of the humanities, and we often share departments with literary theorists, but we're scientists!

Because it’s so central to our identity, we’re horribly insecure about our status as scientists. As a result of our desire to be seen as a scientific field, we’ve adopted a particular philosophy of science without even realizing it: Instrumentalism.

But, what is instrumentalism? It’s the belief that the sole, or at least primary, purpose of a scientific theory is its ability to generate and predict the outcome of empirical tests. So, one theory is preferable to another if and only if the former better predicts the data than the latter. A theory’s simplicity, intelligibility, or consistency is at best a secondary consideration. Two theories that have the same empirical value can then be compared according to these standards. Generative linguistics seems to have adopted this philosophy, to its detriment.

What’s wrong with instrumentalism? Nothing per se. It definitely has its place in science. It’s perfectly reasonable for a chemist in a lab to view quantum mechanics as an experiment-generating machine. In fact, it might be an impediment to their work to worry about how intelligible QM is. They would be happy to leave that kind of thinking to the theorists and philosophers while they, the experimenter, used the sanitized mathematical expressions of QM to design and carry out their work.

"Linguistics is a science," the linguist thinks to themself. "So, linguists ought to behave like scientists." Then, with a glance at the experimental chemist, the linguist adopts instrumentalism. But there's a fallacy in that line of thinking: instrumentalism being an appropriate attitude for some people in a mature science, like chemistry, does not mean it should be the default attitude for people in a nascent science, like linguistics. In fact, there are good reasons for instrumentalism to be only a marginally acceptable attitude in linguistics. Rather, we should judge our theories on the more humanistic measures of intelligibility, simplicity, and self-consistency in addition to consistency with experience.

What’s wrong with instrumentalism in linguistics?

So why can’t linguists be like the chemist in the lab? Why can’t we read the theory, develop the tests of the theory, and run them? There are a number of reasons. First, as some philosophers of science have argued, It is never the case that a theoretical statement is put to the test by an empirical statement, but rather the former is tested by the latter in light of a suite of background assumptions. So, chemists can count the number of molecules in a sample of gas if they know its pressure, volume, and temperature. How do they know, say, the temperature of the gas sample? They use a thermometer, of course, an instrument they trust by virtue of their background assumptions regarding the how matter, in general, and mercury, in particular, are affected by temperature changes. Lucky for chemists, those assumptions have centuries worth of testing and thinking behind them. No such luck for generative linguists, we’ve only got a few decades of testing and thinking behind our assumptions, which is reflected by how few empirical tools we have and how unreliable they are. Our tests for syntactic constituency are pretty good in a few cases — good enough to provide evidence that syntax traffics in constituency — but they give way too many false positives and negatives. Their unreliability means real syntactic work must develop diagnostics which are more intricate and which carry much more theoretical baggage. If a theory is merely a hypothesis-machine, and the tools for testing those hypotheses depend on the theory, how can we avoid rigging the game in our favour?

Suppose we have two theories, T1 and T2, which are sets of statements regarding an empirical domain D. T1 has been rigorously vetted and found to be internally consistent, simple, and intelligible, and predicts 80% of the facts in D. T2 is rife with inconsistencies, hidden complexities, and opaque concepts, but covers 90% of the facts in D. Which is the better theory? Instrumentalism would suggest T2 is the superior theory due to its empirical coverage. Non-dogmatic people might disagree about the answer, but I suspect they would all be uncomfortable with instrumentalism as the sole arbiter in this case.

The second problem, which exacerbates the first, is that there's too much data, and it's too easy to get even more. This has resulted in subdisciplines being further divided into several niches, each devoted to a particular phenomenon or group of languages. Such a narrowing of the empirical domain, coupled with an instrumentalist view of theorizing, has frequently led to the development of competing theories of that domain, theories which are largely impenetrable to those conversant with the general theory but uninitiated in the niche in question. This is a different situation from the one described above. In this situation, T1 and T2 might each cover 60% of a subdomain D', but those two portions overlap only partially. Each has a core set of facts that the other cannot, as yet, touch, so the two sides take turns claiming parts of the overlap as their sole territory, and no progress is made.

Often it’s the case that one of the competing specific theories is inconsistent with the general theory, but proponents of the other theory don’t use that fact in their arguments. In their estimation the data always trumps theory, regardless of how inherently theory-laden the description of the data is. It’s as if two factions were fighting each other with swords despite the fact that one side had a cache of rifles and ammunition that they decided not to use.

The third problem, one that has been noted by other theory-minded linguists here and here, is that the line between theoretical and empirical linguistics is blurry. To put it a bit more strongly, what is called "theoretical linguistics" is often empirical linguistics masquerading as theoretical. This assertion becomes clear when we look at the usual structure of a "theoretical syntax" paper in the abstract. First, a grammatical phenomenon is identified and demonstrated. After some discussion of previous work, the author presents the results of some diagnostics and, from those results, gives a formal analysis of the phenomenon. If we translated this into the language of a mature science, it would be indistinguishable from an experimental report. A phenomenon is identified and discussed, the results of some empirical techniques are reported, and an analysis is given.

You might ask “So what? Who cares what empirical syntacticians call themselves?” Well, if you’re a “theoretical syntactician,” then you might propose a modification of syntactic theory to make your empirical analysis work, and other “theoretical syntacticians” will accept those modifications and propose some modifications of their own. It doesn’t take too long in this cycle before the standard theory is rife with inconsistencies, hidden complexities, and opaque concepts. None of that matters, however, if your goal is just to cover the data.

Or, to take another common "theoretical" move, suppose we find an empirical generalization G (e.g., all languages that allow X also allow Y). The difficult task of the theoretician is to show that G follows from independently motivated theoretical principles. The "theoretician," on the other hand, has another path available, which is to restate G in "theoretical" terms (e.g., a functional head H is responsible for both X and Y), and then (maybe) go looking for some corroboration. Never mind that restating G in different terms does nothing to expand our understanding of why G holds; understanding is always secondary for instrumentalism.

So, what’s to be done?

Reading this, you might think I don't value empirical work in linguistics, which is simply not the case. Quite frankly, I am constantly in awe of linguists who can take a horrible mess of data and make even a modicum of sense out of it. Empirical work has value, but linguistics has somehow managed to both over- and under-value it. We over-value it by tacitly embracing instrumentalism as our guiding philosophy. We under-value it by giving the title "theoretical linguist" a certain level of prestige; we think empirical work is easier and less-than. This has led us to under-value theoretical work, and to view theoretical arguments as just gravy when they're in our favour, and irrelevancies when they're against us.

What we should strive for is an appropriate balance between empirical and theoretical work. To get to that balance, we must do the unthinkable and look to the humanities. To develop as a science, we ought to look at mature sciences, not as they are now, but as they developed. Put another way, we need to think historically. If we truly want our theory to explain the human language faculty, we need to accept that we will be explaining it to humans, and designing a theory that another human can understand requires us to embrace our non-rational qualities, like intuition and imagination.

In sum, we could all use a little humility. Maybe we’ll reach a point when instrumentalism will work for empirical linguistics, but we’re not there yet, and pretending we are won’t make it so.