What does falsification look like anyway?

Vulcan vs Neptune

There’s an argument that plays out every so often in linguistics that goes as follows:

Critic: This data falsifies theory T.
Proponent: Not necessarily, if you consider arguments X, Y, and Z.
Critic: Well, then theory T seems to be unfalsifiable!

This is obviously a specious argument on the part of the critic, since unfalsified does not entail unfalsifiable, but I think it stems from a very understandable frustration: theorists often have an uncanny ability to wriggle free of data that appears to falsify their theories, even though falsificationism is assumed by a large majority of linguists. The problem is that the logic of falsificationism, while quite sound, maybe even unimpeachable, turns out to be fiendishly difficult to apply.

At its simplest, the logic of falsificationism says that a theory is scientific insofar as one can construct a basic statement (i.e., a statement of fact) that would contradict the theory. This, of course, is an oversimplification of Karl Popper’s idea of Critical Rationalism in a number of ways. For one, falsifiability is not an absolute notion. Rather, we can compare the relative falsifiability of two theories by looking at what Popper calls their empirical content, that is, the number of basic statements that would contradict them. So if a simple theoretical statement P has a particular empirical content, then the conjunction P & Q will have a greater empirical content, and the disjunction P v Q will have a lesser empirical content. This is a useful heuristic when constructing or criticizing a theory internally, and it seems like a straightforward guide to testing theories empirically. Historically, though, it has not worked out that way, largely because it is often difficult to recognize when we’ve arrived at and accurately formulated a falsifying fact. In fact, it is often, maybe always, the case that we don’t recognize a falsifying fact as such until after one theory has been superseded by another.
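The comparison between P, P & Q, and P v Q can be checked mechanically. What follows is a minimal sketch, not Popper’s own formalism: it assumes, purely for illustration, that each complete truth assignment to two atomic statements P and Q counts as one basic statement, and it measures empirical content as the number of assignments that would contradict a theory. The names `WORLDS`, `falsifiers`, and `content` are hypothetical.

```python
from itertools import product

# Illustrative assumption: a "basic statement" is one complete truth
# assignment to the atomic statements P and Q (four in total).
WORLDS = list(product([True, False], repeat=2))


def falsifiers(theory):
    """Return the assignments (p, q) that contradict the theory."""
    return [(p, q) for p, q in WORLDS if not theory(p, q)]


theories = {
    "P": lambda p, q: p,
    "P & Q": lambda p, q: p and q,
    "P v Q": lambda p, q: p or q,
}

# Empirical content, on this toy measure, is the number of falsifiers.
content = {name: len(falsifiers(t)) for name, t in theories.items()}

for name, count in content.items():
    print(f"{name}: {count} potential falsifiers")
```

On this toy measure the conjunction forbids more states of affairs than P alone and the disjunction forbids fewer, which is the ordering of empirical content described above.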

Take, for instance, the case of the respective orbits of Mercury and Uranus. By the 19th century, Newtonian mechanics had allowed astronomers to make very precise predictions about the motions of the planets, and based on those predictions, there was a problem: two of the planets were misbehaving. First, it was discovered that Uranus, then the farthest known planet from the sun, wasn’t showing up where it should have been. Basically, Newton’s mechanics predicted that on such-and-such a day and time Uranus would be in a particular spot in the sky, but the facts were otherwise. Rather than cry “falsification!”, though, the astronomers of the day hypothesized an object beyond Uranus that was affecting its orbit. One such astronomer, Urbain Le Verrier, was even able to work backwards and predict where that object could be found. So in September of 1846, armed with Le Verrier’s calculations, Johann Gottfried Galle was able to observe an eighth planet: Neptune. Thus, an apparent falsification became a corroboration.

Urbain Le Verrier (1811-1877)
Johann Galle (1812-1910)

I’ve previously written about this story as a vindication of the theory-first approach to science. What I didn’t write about, and what is almost never discussed in this context, is Le Verrier’s work on the misbehaving orbit of Mercury. Again, armed with Newton’s precise mechanics, Le Verrier calculated the Newtonian prediction for Mercury’s orbit, and again[1] Mercury didn’t behave as expected. Again, rather than throw out Newtonian mechanics, Le Verrier hypothesized the planet Vulcan between Mercury and the sun, and set about trying to observe it. While many people claimed to observe Vulcan, none of these observations were reliably replicated. Le Verrier was undeterred, though, perhaps because observing a planet that close to the sun was quite tricky. Of course, it would be easy to paint Le Verrier as an eccentric (indeed, his Vulcan hypothesis is somewhat downplayed in his legacy), but he doesn’t seem to have been treated so by his contemporaries. The Vulcan hypothesis wasn’t universally believed, but neither does it seem to have been the Flat-Earth theory of its day.

It was only when Einstein used his General Theory of Relativity to accurately calculate Mercury’s orbit that the scientific community seems to have abandoned the search for Vulcan. Mercury’s orbit is now considered a classic successful test of General Relativity, but why don’t we consider it a refutation of Newtonian Mechanics? Strict falsificationism would seem to dictate that, but then a strict falsificationist would have thrown out Newtonian Mechanics as soon as we noticed Uranus misbehaving. So, falsificationism of this sort leads us to something of a paradox: when a single basic statement contradicts a theory, there’s no way of knowing whether there is some second basic statement that, in conjunction with the first, could save the theory.

Still, it’s difficult to toss out falsification entirely, because a theory that doesn’t reflect reality may be interesting but isn’t scientific.[2] Also, any reasonable person who has ever tried to give an explanation of any phenomenon probably rejects most of their own ideas rather quickly on empirical grounds. We should instead adopt falsificationism as a relative notion and use it when comparing multiple theories. So, Le Verrier was ultimately wrong, but acted reasonably: he had a pretty good theory of mechanics, so he worked to reconcile it with some problematic data. Had someone developed General Relativity in Le Verrier’s time, then it would have been unreasonable to insist that a hypothesized planet was a better explanation than an improved theory.

Returning to the hypothetical debate between the Critic and the Proponent, then, I think a reasonable, albeit slightly rude, response for the Proponent would be “Well, do you have a better theory?”

References
1 Technically though, Le Verrier’s work on Mercury predated his work on Uranus
2 Though sometimes, theories which seem to be empirically idle end up being scientifically important (cf. non-Euclidean geometry)

Internal unity in science again

Or, how to criticize a scientific theory

Recently, I discovered a book called The Primacy of Grammar by philosopher Nirmalangshu Mukherji. The book is basically an extended, and in my opinion quite good, apologia for biolinguistics as a science. The book is very readable and covers a decent amount of ground, including an entire chapter discussing the viability of incorporating a faculty of music into biolinguistic theory. I highly recommend it.

At one point, while defending biolinguistics from the charge of incompleteness leveled by semanticists and philosophers, Mukherji makes the following point.

[D]uring the development of a science, a point comes when our pretheoretical expectations that led to the science in the first place have changed enough, and have been accommodated enough in the science for the science to define its objects in a theory-internal fashion. At this point, the science—viewed as a body of doctrines—becomes complete in carving out some specific aspect of nature. From that point on, only radical changes in the body of theory itself—not pressures from common sense—force further shifting of domains (Mukherji 2001). In the case of grammatical theory, either that point has not been reached or … the point has been reached but not yet recognized.

Mukherji (2010, 122-3)

There are two interesting claims that Mukherji is making about linguistic theory and scientific theory in general. One is that theoretical objects are solely governed by theory-internal considerations. The other is that the theory itself determines what in the external world it applies to.

The first claim reminded me of a meeting I had with my doctoral supervisor while I was writing my thesis. My theoretical explanation rested on the hypothesis that even the simplest of non-function words, like coffee, were decomposable into root objects (√COFFEE) and categorizing heads (n0). I had a dilemma though. It was crucial to my argument that, while categorizing heads had discrete features, roots were treated as featureless blobs by the grammar, but I couldn’t figure out how to justify such a claim. When I expressed my concern to my supervisor, she immediately put my worries to rest. I didn’t need to justify that claim, she pointed out, because roots by their definition have no features.

I had fallen into a very common trap in syntax: I had treated a theory-internal object as an empirical object. Empirical objects can be observed and sensibly argued about. Take, for instance, English specificational clauses (e.g., The winner is Mary). Linguists can and do argue about the nature of these (i.e., whether or not they are truly the inverse of predicational clauses such as Mary is the winner) and cite facts to do so. This is because empirical objects and phenomena are out there in the real world, regardless of whether we study them. Theory-internal objects, on the other hand, are not subject to fact-based argument, because, unless the Platonists are right, they have no objective reality. As long as my theory is internally consistent, I can define its objects however I damn please. The true test of any theory is how well it can be mapped onto some aspect of reality.

This brings me to Mukherji’s second assertion, that the empirical domain of a theory is determined by the theory itself. In the context of his book, this assertion is about linguistic meaning. The pretheoretic notion of meaning is what he calls a “thick” notion, a multifaceted concept that is very difficult to pin down. The development of a biolinguistic theory of grammar, though, has led to a thinner notion of meaning, namely, the LF of a given expression. Now obviously, this notion of meaning doesn’t include notions of reference, truth, or felicity, but why should we expect it to? Yes, those notions belong to our common-sense ideas of meaning, but surely at this stage of human history, we should expect that scientific inquiry will reveal our common-sense notions to be flawed.

As an analogy, Aristotle and his contemporaries didn’t distinguish between physics, biology, chemistry, geology, and so on; they were all part of physics. One of the innovations of the scientific revolution, then, was to narrow the scope of investigation, to develop theories of a sliver of nature. If Aristotle saw our modern physics departments, he might look past all of their fantastic theoretical advances and wonder instead why no one in the department was studying plants and animals. Most critiques of internalist/biolinguistic notions of semantics by modern philosophers and formal semanticists echo this hypothetical time-travelling Aristotle: they brush off any advances and wonder where the theory of truth is.

Taken together, these assertions imply a general principle: scientific theories should be assessed on their own terms. Criticizing grammatical theory for its lack of a theory of reference makes as much sense as criticizing Special Relativity for its lack of a theory of genetic inheritance. While this may seem to render any theory beyond criticism, the history of science demonstrates that this isn’t the case. Consider, for instance, quantum mechanics, which has been subject to a number of criticisms on its own terms (see Einstein’s criticisms of QM, Schrödinger’s cat, and the measurement problem). In some cases such criticisms are insurmountable, but in others, addressing them head-on and modifying or clarifying the theory is what leads to advances in the theory. Chomsky’s Label Theory, I think, is a case of the latter sort: a theory-internal problem was identified and addressed, and as a result two unexplained phenomena (the EPP and the ECP) were given a theoretical explanation. We can debate how well that explanation generalizes and whether it leans too heavily on some auxiliary hypotheses, but what’s important is that a theory-internal solution to a theory-internal problem opened up the possibility of such an explanation. This may seem wildly counter-intuitive, but as I argued in a previous post, this is the only practical way to do science.

The principle that a theory should be criticized on its own terms is, I think, what irks the majority of linguists about biolinguistic grammatical theory the most. It bothers them because it means that very few of their objections to the theory ever really stick. Ergativity, for instance, is often touted as a serious problem for Abstract Case Theory, but since grammatical theory has nothing to say about particular case alignments, theorists can just say “Yeah, that’s interesting” and move on. Or, to take a more extreme case, recent years have seen all-out assaults on grammatical theory from people who bizarrely call themselves “cognitive linguists”, people like Vyvyan Evans and Daniel Everett, who claim to have evidence that roundly refutes the very notion of a language faculty. The response of biolinguists to this assault: mostly a resounding shrug as we turn back to our work.

So, critics of biolinguistic grammatical theory dismiss it in a number of ways. They say it’s too vague or slippery to be any good as a theory, which usually means they refuse to seriously engage with it. They complain that the theory keeps changing, a peculiar complaint to lodge against a scientific theory. Or they accuse theorists of arrogance, a charge that, despite being occasionally true, is not a criticism of the theory. This kind of hostility can be bewildering, especially because a corollary of the idea that a theory defines its own domain is that everything outside that domain is a free-for-all. It’s hard to imagine a geneticist being upset that their data is irrelevant to Special Relativity. I have some ideas about where the hostility comes from, but they’ll take me pretty far afield, so I’ll save them for a later post and leave it here.

Play, games, science and bureaucracy

In the titular essay of his 2015 book The Utopia of Rules, David Graeber argues for a distinction between play and games. Play, according to Graeber, is free, creative, and open-ended, while games are rigid, repetitive, and closed-off. Play underlies art, science, conversation, and community, while games are the preferred method of bureaucracy. This idea really resonated with me, partly because I’m someone who doesn’t really like games, but also because I think it’s perfectly consonant with something I’ve written about previously: the distinction between theoretical and descriptive science. In this post, I’ll explore that intuition, and argue that theoretical scientific research tends to center play, while descriptive research tends to center games.

The key distinction between games and play, according to Graeber, is rules. While both are leisure activities done for sheer enjoyment, games are defined by their rules. These rules can be rather simple (e.g., checkers), fiendishly complex (e.g., Settlers of Catan), or something in between, but whatever they are, the rules make the game. What’s more, Graeber argues, it’s the rules that make games an enjoyable respite from the ambiguities of real life. At any given point in a game, there are a finite number of possible moves and a fixed objective. If only we had that same certainty when navigating interactions with neighbours, co-workers, and romantic interests!

If games are defined by their rules, then play is defined by its lack of rules. The best example, one used by Graeber, is that of children playing. There are no rules to how children play. In fact, as Graeber observes, a good portion of play between children involves negotiating the rules. What’s more, there’s no winning at play, only enjoyment. Play is open-ended: set a group of children to play and there’s no knowing what they’ll come up with.

Yet, play is not random. It follows principles, such as the innate social instincts of children, which are a different sort of thing from the rules that govern games. Rules must be explicit, determinate, and preferably compiled in some centralized place so that, in a well-designed game, disputes can always be settled by consulting some authority, usually a rule-book. Principles are often implicit (no one teaches kids how to play), can be quite vague (a main principle of improv is “Listen”), and are arguably somehow internal (if there are principles to playing a musical instrument, they come from the laws of physics, the form and material of the instrument, and our own innate sense of melody, harmony, and rhythm).

As this description might suggest, rules and principles, games and play, are often in conflict with each other. Taking a playful activity and turning it into a game can eliminate some of the enjoyment. Take, for instance, flirtation, a playful activity for which an anthropologist might be able to discover some principles. People who treat flirtation as a game tend to be judged as creepy. Understandably so, because gaming assumes determinate rules (if I do X in situation Y, then Z will happen), and no one likes to be treated like a robot. Or, consider figures like Chelsea Manning or Edward Snowden. Each was faced with a conflict between the external rules of an institution and their internal principles, and chose the latter.

This conflict, however, need not be an overall negative. Any art form at any given time follows a number of rules and conventions that, at their best, define the space in which an artist can play. Eventually, though, the rules and conventions of a given art form or genre become too fixed and end up stifling the playfulness of the artists. I remember my cousin, who was a cinema studies major in undergrad, talking about watching Citizen Kane for a class. The students were confused: this is widely lauded as one of the greatest films ever made, but they couldn’t see what was so special. The instructor explained that Citizen Kane was groundbreaking when it came out. It broke all the rules, but it ended up replacing them with new ones. Now those new rules are so commonplace that they are completely unremarkable. While I don’t think we could develop an entire theory of aesthetics based solely on the balance between games and play, the opposition seems to be active in how we judge art.

But what does this have to do with science? Well, thus far I’ve suggested that games are defined by external rules, while play is guided by internal principles. This contrast lines up quite nicely with Husserl’s definitions of descriptive and theoretical sciences, respectively. Descriptive sciences are sets of truths grouped together by some externally imposed categorization, while theoretical sciences are sets of truths which have an internal cohesion. If I’m on the right track, then descriptive sciences should share some characteristics with games, while theoretical sciences should share some with play.

Much as games impose rules on their participants, descriptive sciences impose methods on their researchers. Oftentimes they are quite explicit about this. Noam Chomsky, for instance, often says of linguistics education in the mid-20th century that it was almost exclusively devoted to learning and practicing elicitation procedures (read: methods). The cognitive revolution that Chomsky was at the center of changed this, allowing theory to take center stage, but we are currently in the midst of a shift back towards method. Graduate students are now expected or even required to take courses in “experimental” or quantitative methods. Job ads for tenure-track positions are rarely simply for a phonologist or a semanticist; rather, they almost invariably ask for experience with quantitative analysis, experimental methods, and the like.

The problem with this is that methods in science, like rules in games, serve to fence in possibilities. When you boil it down to its essence, a well-run experiment or corpus study is nothing but an attempt to frame and answer a yes-or-no question. What’s more, each method is quite restricted as to what sort of questions it can even answer. Even the presentation of method-driven research tends to be rather rigidly formatted: experimental reports follow the IMRaD format, as do many observational studies, and grammars, the output of field methods, tend to start with the phonetics and end with the syntax/semantics. So when someone says they’re going to perform an eye-tracking study, or some linguistic fieldwork, you can be fairly certain as to what their results will look like, just like you can be certain of what a game of chess will look like.

Contrast this with theoretical work, which tends to start with sometimes horribly broad questions and often ends up somewhere no one would have expected. So, asking what language is yielded results in computer science, inquiring about the motion of the planets led to a new understanding of tides, and asking about the nature of debt revealed profound truths about human society. No game could have these kinds of results. If you sat down to play Pandemic and ended up robbing a bank, it probably means you read the rules wrong, at the very least. But theory is not like a game; it’s inherently playful.

Now anyone who has read any scientific theory might object to this, as the writing produced by theorists tends to be rather abstract and inaccessible, but writing theory is like retelling a fun conversation: the fun is found in the moment and can never be fully recreated. The playful nature of theory, I think, can be seen in two of the main criticisms leveled at theoretical thinkers by non-theorists: that theoreticians can’t make up their minds and that they just make it up as they go along. These criticisms, however, tend to crop up whenever there is serious theoretical progress being made. In fact, many advances in scientific theories are met with outright hostility by the scientific community (see atomic theory, relativity, the theory of grammar, etc.), likely, I think, because a new theory tends to invalidate a good portion of what the contemporary community has spent years, decades, or centuries getting accustomed to. Worse yet, a theoretical advance might appear to render certain empirical results irrelevant or even meaningless.

Compare this to children playing. If children make up some rules while playing, only a fool would take those to be set in stone. Almost certainly, the children would come up against those rules and decide to toss them by the wayside.

As I mentioned, Graeber discusses games and play as a way of analyzing bureaucracy and our relationship to it. Bureaucracy, be it in government, corporations, or academia, is about creating games that aren’t fun. Bureaucracies are also impersonal power structures, what Hannah Arendt calls “rule by no-one”. And just as games are, bureaucracies are designed to hem in playfulness, because allowing people to be playful might lead them to realize that a better life is possible without those bureaucracies.

Within science, too, we can see bureaucracies being aligned with strictly methodical empirical work and somewhat hostile to theoretical work. We can see this in how the respective researchers organize themselves. Empirical work is done in a lab, which does not just refer to a physical space, but to a hierarchical organization with a PI (principal investigator) supervising and directing post-docs and grad students, who often in turn supervise and direct undergraduate research assistants: a chief executive, middle management, and workers. Theoretical work, on the other hand, is done in a wide array of spontaneously organized affinity groups. So, for instance, neither the Vienna Circle, in philosophy, nor the Bourbaki group, in mathematics, had any particular hierarchical structure, and both were quite broad in their interests.

The distinction can even be seen in how theoretical and descriptive sciences interact with time and space. Experimental work must be done in specially designed rooms, sometimes made just for that one experiment, and observational work must be done in the natural habitat of the phenomena to be observed, just as a chess game must be limited to an 8×8 grid. Theoretical work can be done almost anywhere: in a cafe, a bar, on a train, in a dark basement, or in a spacious detached house. The less specialized the better. In fact, the only limiting factor is the theorist themself. As for time, nowhere is the contrast clearer than in the timelines given by PhD candidates in their thesis proposals. While not all games are on a clock, all games must account for all of their time, since each moment of a game has a purpose. This is what a timeline for a descriptive project looks like: “Next month I’ll travel to place X where I’ll conduct Y hours of interviews. The following month I will organize and code the data…” and so on. It’s impossible to provide such detail in the plan for a theoretical work for several related reasons: the time spent working tends to be unstructured; you never know when inspiration or some kind of moment of clarity will strike; you can’t possibly know what the next step is until you complete the current step; and so on. Certainly, the playful work of theory can sometimes benefit from some structure, but descriptive work, like a game, absolutely depends on structured time and space.

This alignment can also be seen in how theory and method interact with the superstructures of scientific research, that is, the funding apparatuses: granting agencies and corporations. Both sorts of structures are bureaucratic and tend to be structurally opposed to theoretical (read: playful) work. In both cases, funders must evaluate a bunch of proposals and choose to fund those that are most likely to yield a significant result. Suppose you’re a grant evaluator and you have two proposals in front of you: Proposal A is to do linguistic fieldwork on some understudied and endangered language, focusing on some possibly interesting aspect of that language, and Proposal B is to attempt to reconcile two seemingly contradictory theoretical frameworks. Assuming each researcher is eminently qualified to carry out their respective plans, which would you fund? Proposal A is all but guaranteed to have some results. They may be underwhelming, or (though this is very unlikely) they could be breakthroughs, but the guarantee is implicit in the method: it’s always worked before. If Proposal B is successful, it is all but guaranteed to be a major breakthrough; however, there is absolutely no guarantee that it will be successful, and if the researcher cannot reconcile the two frameworks, then we cannot draw any particular conclusion from the failure. So which one do you choose? The guarantee, or the conditional guarantee? The conditional guarantee is a gamble, and bureaucrats aren’t supposed to gamble, so we go with the guarantee.

So, bureaucratic funding structures are more inclined to fund methods-based research. That’s fine as far as it goes (theoretical research is dirt cheap, requiring only a nourished mind and some writing materials), but grants aren’t just about the money. Today, grants are used as a metric for research capability. If you can get a lot of grants, then you must be a good researcher. Set aside the fact that virtually any academic will tell you that grant-writing is a particular skill that isn’t directly related to research ability, or that many researchers delegate grant-writing to their post-docs. The logic here is particularly twisted: granting agencies use past grants as an indication of a good researcher, and so do hiring committees. This makes sense, since previous success in a process is a good indicator of future success, provided everything stays more or less the same. Thus the grant system and other bureaucratic systems are likely to defend the status quo by funding descriptive rather than theoretical work.

If my analysis is correct, then the sciences are being held back by the bureaucracies that are supposed to enable them, such as university administration and funding agencies. They’re also held back by their own mythology, the “scientific method”, which promises breakthroughs if only they keep playing the game. This should not be too surprising to anyone who considers how bureaucracies hold them back in their day-to-day lives. What’s frustrating about this, though, is that academia, more than any sector of modern society, is supposed to be self-organized. University administrators (Deans, Presidents, Provosts, etc.) are supposed to be drawn from the faculty of that university, and funding organizations are supposed to be run by researchers. So, unlike the bureaucracies that demean the poor and outsource jobs, the bureaucracies that stifle academics are self-imposed. The positive side of this is that, if academics wanted to, they could dismantle many of their bureaucracies tomorrow.

Instrumentalism in Linguistics

(Note: Unlike my previous posts, this one is not aimed at a general audience. This one’s for linguists.)

As a generative linguist, I like to think of myself as a scientist. Certainly, my field is not as mature and developed as physics, chemistry, and biology, but my fellow linguists and I approach language and its relation to human psychology scientifically. This is crucial to our identity. Sure, our universities consider linguistics a member of the humanities, and we often share departments with literary theorists, but we’re scientists!

Because it’s so central to our identity, we’re horribly insecure about our status as scientists. As a result of our desire to be seen as a scientific field, we’ve adopted a particular philosophy of science without even realizing it: Instrumentalism.

But what is instrumentalism? It’s the belief that the sole, or at least primary, purpose of a scientific theory is to generate empirical tests and predict their outcomes. So, one theory is preferable to another if and only if the former predicts the data better than the latter. A theory’s simplicity, intelligibility, or consistency is at best a secondary consideration: only two theories that have the same empirical value can be compared according to these standards. Generative linguistics seems to have adopted this philosophy, to its detriment.

What’s wrong with instrumentalism? Nothing per se. It definitely has its place in science. It’s perfectly reasonable for a chemist in a lab to view quantum mechanics as an experiment-generating machine. In fact, it might be an impediment to their work to worry about how intelligible QM is. They would be happy to leave that kind of thinking to the theorists and philosophers while they, the experimenter, used the sanitized mathematical expressions of QM to design and carry out their work.

“Linguistics is a science,” the linguist thinks to themself. “So, linguists ought to behave like scientists.” Then, with a glance at the experimental chemist, the linguist adopts instrumentalism. But there’s a fallacy in that line of thinking: instrumentalism being an appropriate attitude for some people in a mature science, like chemistry, does not mean it should be the default attitude for people in a nascent science, like linguistics. In fact, there are good reasons for instrumentalism to be only a marginally acceptable attitude in linguistics. Rather, we should judge our theories on the more humanistic measures of intelligibility, simplicity, and self-consistency in addition to consistency with experience.

What’s wrong with instrumentalism in linguistics?

So why can’t linguists be like the chemist in the lab? Why can’t we read the theory, develop the tests of the theory, and run them? There are a number of reasons. First, as some philosophers of science have argued, it is never the case that a theoretical statement is put to the test by an empirical statement alone; rather, the former is tested by the latter in light of a suite of background assumptions. So, chemists can count the number of molecules in a sample of gas if they know its pressure, volume, and temperature. How do they know, say, the temperature of the gas sample? They use a thermometer, of course, an instrument they trust by virtue of their background assumptions regarding how matter in general, and mercury in particular, is affected by temperature changes. Lucky for chemists, those assumptions have centuries’ worth of testing and thinking behind them. No such luck for generative linguists: we’ve only got a few decades of testing and thinking behind our assumptions, which is reflected in how few empirical tools we have and how unreliable they are. Our tests for syntactic constituency are pretty good in a few cases (good enough to provide evidence that syntax traffics in constituency), but they give way too many false positives and negatives. Their unreliability means real syntactic work must develop diagnostics which are more intricate and which carry much more theoretical baggage. If a theory is merely a hypothesis-machine, and the tools for testing those hypotheses depend on the theory, how can we avoid rigging the game in our favour?

Suppose we have two theories, T1 and T2, which are sets of statements regarding an empirical domain D. T1 has been rigorously vetted and found to be internally consistent, simple, and intelligible, and predicts 80% of the facts in D. T2 is rife with inconsistencies, hidden complexities, and opaque concepts, but covers 90% of the facts in D. Which is the better theory? Instrumentalism would suggest T2 is the superior theory due to its empirical coverage. Non-dogmatic people might disagree, but I suspect they would all be uncomfortable with instrumentalism as the sole arbiter in this case.

The second problem, which exacerbates the first, is that there’s too much data, and it’s too easy to get even more. This has resulted in subdisciplines being further divided into several niches, each devoted to a particular phenomenon or group of languages. Such a narrowing of the empirical domain, coupled with an instrumentalist view of theorizing, has frequently led to the development of competing theories of that domain, theories which are largely impenetrable to those conversant with the general theory but uninitiated in the niche in question. This is a different situation from the one described above. In this situation T1 and T2 might each cover 60% of a subdomain D’, but the two 60% portions overlap only in part. Each has a core set of facts that the other cannot, as yet, touch, so the two sides take turns claiming parts of the overlap as their sole territory, and no progress is made.

Often it’s the case that one of the competing specific theories is inconsistent with the general theory, but proponents of the other theory don’t use that fact in their arguments. In their estimation the data always trumps theory, regardless of how inherently theory-laden the description of the data is. It’s as if two factions were fighting each other with swords despite the fact that one side had a cache of rifles and ammunition that they decided not to use.

The third problem, one that has been noted by other theory-minded linguists, is that the line between theoretical and empirical linguistics is blurry. To put it a bit more strongly, what is called “theoretical linguistics” is often empirical linguistics masquerading as theoretical. This assertion becomes clear when we look at the usual structure of a “theoretical syntax” paper in the abstract. First, a grammatical phenomenon is identified and demonstrated. After some discussion of previous work, the author presents the results of some diagnostics and from those results gives a formal analysis of the phenomenon. If we translated this into the language of a mature science it would be indistinguishable from an experimental report. A phenomenon is identified and discussed, the results of some empirical techniques are reported, and an analysis is given.

You might ask “So what? Who cares what empirical syntacticians call themselves?” Well, if you’re a “theoretical syntactician,” then you might propose a modification of syntactic theory to make your empirical analysis work, and other “theoretical syntacticians” will accept those modifications and propose some modifications of their own. It doesn’t take too long in this cycle before the standard theory is rife with inconsistencies, hidden complexities, and opaque concepts. None of that matters, however, if your goal is just to cover the data.

Or, to take another common “theoretical” move, suppose we find an empirical generalization, G (e.g., All languages that allow X also allow Y). The difficult task of the theoretician is to show that G follows from independently motivated theoretical principles. The “theoretician,” on the other hand, has another path available, which is to restate G in “theoretical” terms (e.g., Functional head, H, is responsible for both X and Y), and then (maybe) go looking for some corroboration. Never mind that restating G in different terms does nothing to expand our understanding of why G holds; understanding is always secondary for instrumentalism.

So, what’s to be done?

Reading this, you might think I don’t value empirical work in linguistics, which is simply not the case. Quite frankly, I am constantly in awe of linguists who can take a horrible mess of data and make even a modicum of sense out of it. Empirical work has value, but linguistics has somehow managed to both over- and under-value it. We over-value it by tacitly embracing instrumentalism as our guiding philosophy. We under-value it by giving the title “theoretical linguist” a certain level of prestige. We think empirical work is easier and less-than. This has led us to under-value theoretical work too, and to view theoretical arguments as just gravy when they’re in our favour, and irrelevancies when they’re against us.

What we should strive for is an appropriate balance between empirical and theoretical work. To get to that balance we must do the unthinkable and look to the humanities. To develop as a science, we ought to look at mature sciences, not as they are now, but as they developed. Put another way, we need to think historically. If we truly want our theory to explain the human language faculty, we need to accept that we will be explaining it to humans, and that designing a theory that another human can understand requires us to embrace our non-rational qualities like intuition and imagination.

In sum, we could all use a little humility. Maybe we’ll reach a point when instrumentalism will work for empirical linguistics, but we’re not there yet, and pretending we are won’t make it so.