Piantadosi and MLMs again (I)

Last spring, Steven Piantadosi, professor of psychology and neuroscience, posted a paean to Modern Language Models (MLMs) entitled "Modern language models refute Chomsky's approach to language" on LingBuzz. This triggered a wave of responses from linguists, including one from me, pointing out the many ways in which he was wrong. Recently, Prof. Piantadosi attached a postscript to his paper in which he responds to his critics. The responses are so shockingly bad that I felt I had to respond—at least to those that stem from my critiques—which I will do, spaced out across a few short posts.

In my critique, I brought up the problem of impossible languages, as did Moro et al. in their response. In addressing this critique, Prof. Piantadosi surprisingly begins with a brief diatribe against "poverty of the stimulus." I say surprisingly not because it's surprising for an empiricist to mockingly invoke "poverty of stimulus," much in the same way that creationists mockingly ask why there are still apes if we evolved from them, but because poverty of stimulus is completely irrelevant to the problem of impossible languages, and neither I nor Moro et al. even use the phrase "poverty of stimulus."[1]

This irrelevancy expressed, Prof. Piantadosi moves on to a more on-point discussion. He argues that it would be wrong-headed to encode the constraints that make some languages impossible into our model from the start. Rather, if we start with an unconstrained model, we can discover the constraints naturally:

If you try to take constraints into account too early, you might have a harder time discovering the key pieces and dynamics, and could create a worse overall solution. For language specifically, what needs to be built in innately to explain the typology will interact in rich and complex ways with what can be learned, and what other pressures (e.g. communicative, social) shape the form of language. If we see a pattern and assume it is innate from the start, we may never discover these other forces because we will, mistakenly, think innateness explained everything

p36 (v6)

This makes a certain intuitive sense. The problem is that it’s refuted both by the history of generative syntax and the history of science more broadly.

In early theories, a constraint like "No mirroring transformations!" would have to be stated explicitly. Current theories, though, are much simpler, with most constraints being derivable from the theory rather than tacked onto it.
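To make "mirroring" concrete, here is a toy sketch (my own illustration, not an example from any cited study) of a transformation that forms a "question" by reversing the linear order of words. Such a rule is computationally trivial, which is exactly why its absence from all human languages demands explanation:

```python
def mirror_question(sentence: str) -> str:
    """Form a 'question' by reversing the linear order of words.
    Trivial to compute, yet no attested human language uses such a rule."""
    return " ".join(reversed(sentence.split()))

print(mirror_question("the dog chased the cat"))
# cat the chased dog the
```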

A digression on scholarly responsibility: Your average engineer working on MLMs could be forgiven for not being up on the latest theories in generative syntax, but Piantadosi is an Associate Professor who has chosen to write a critique of generative syntax, so he really ought to know these things. In fact, he could only fail to know them through laziness or a conscious choice not to know.

Furthermore, the natural sciences have progressed thus far in precisely the opposite direction from the one Piantadosi prescribes—they have started with highly constrained theories, and progress has generally occurred when some constraint is questioned. Copernicus questioned the constraint that Earth stood still, Newton questioned the constraint that all action was local, and Friedrich Wöhler questioned the constraint that organic and inorganic substances were inherently distinct.

None of this, of course, means that we couldn't do science in the way that Piantadosi suggests—I think Feyerabend was correct that there is no singular Scientific Method—but the proof of the pudding is in the eating. Piantadosi is effectively making a promise that if we let MLM research run its course we will find new insights[2] that we could not find had we stuck with the old direction of scientific progress, and he may be right—just as AGI may actually be 5 years away this time—but I'll believe it when I see it.

After expressing his methodological objections to considering impossible languages, Piantadosi expresses skepticism as to the existence of impossible languages, stating "More troubling, the idea of 'impossible languages' has never actually been empirically justified." (p37, v6) This is a truly astounding assertion on his part, considering that both Moro et al. and I explicitly cite experimental studies that arguably provide exactly the empirical justification that Piantadosi claims does not exist. The studies cited present participants with two types of made-up languages—one which follows and one which violates the rules of language as theorized by generative syntax—and observe their responses as they try to learn the rules of the particular languages. The study I cite (Smith and Tsimpli 1995) compares the behavioural responses of a linguistic savant to those of neurotypical participants, while the studies cited by Moro et al. (Tettamanti et al., 2002; Musso et al., 2003) use neuro-imaging techniques. Instead, Prof. Piantadosi refers to every empiricist's favourite straw-man argument—the alleged lack of embedding structures in Pirahã.

This bears repeating. Both Moro et al. and I expressly point to experimental evidence of impossible languages, and Piantadosi’s response is that no one has ever provided evidence of impossible languages.

So, either Prof. Piantadosi commented on my and Moro et al.'s critiques without reading them, or he read them and deliberately misrepresented them. It is difficult to see how this could be the result of laziness or even willful ignorance rather than dishonesty.

I’ll leave off here, and return to some of Prof. Piantadosi’s responses to my critiques at a later time.


1 For my part, I didn’t mention it because empiricists are generally quite assiduous in their refusal to understand poverty of stimulus arguments.
2 He seems to contradict himself later on when he asserts that the “science” of MLMs may never be intelligible to humans. More on this in a later post.

The Descriptivist Fallacy

A recent hobby-horse of mine—borrowed from Norbert Hornstein—is the idea that the vast majority of what is called “theoretical generative syntax” is not theoretical, but descriptive. The usual response when I assert this seems to be bafflement, but I recently got a different response—one that I wasn’t able to respond to in the moment, so I’m using this post to sort out my thoughts.

The context of this response was that I had hyperbolically expressed anger at the title of one of the special sessions at the upcoming NELS conference—”Experimental Methods In Theoretical Linguistics.” My anger—more accurately described as irritation—was that, since experiment and theory are complementary terms in science, the title of the session was contradictory unless the NELS organizers were misusing the terms. My point, of course, was that the organizers of NELS—one of the most prestigious conferences in the field of generative linguistics—were misusing the terms because the field as a whole has taken to misusing the terms. A colleague, however, objected, saying that generative linguists were a speech community and that it was impossible for a speech community to systematically misuse words of its own language. My colleague was, in effect, accusing me of the worst offense in linguistics—prescriptivism.

This was a jarring rebuttal because, on the one hand, they aren't wrong—I was being prescriptive. But, on the other hand, and contrary to the first thing students are taught about linguistics, a prescriptive approach to language is not always bad. To see this, let's consider the two basic rationales for descriptivism as an ethos.

The first rationale is purely practical—if we linguists want to understand the facts of language, we must approach them as they are, not as we think they should be. This is nothing more than standard scientific practice.

The second rationale is a moral one, stemming from the observation that language prescription tends to be directed at groups that lack power in society—Black English has historically been treated as "broken", features of young women's speech ("up-talk" in the 90s and "vocal fry" in the 2010s) are always policed, rural dialects are mocked. Thus, prescriptivism is seen as a type of oppressive action. Many linguists make it no further in thinking about prescriptivism, unfortunately, but there are many cases in which prescriptivism is not oppressive. Some good instances of prescriptivism—assuming they are done in good faith—are as follows:

  1. criticizing the use of obfuscatory phrases like "officer-involved shooting" by mainstream media
  2. calling out racist and antisemitic dog-whistling by political actors
  3. discouraging the use of slurs
  4. encouraging inclusive language
  5. recommending that a writer avoid ambiguity
  6. asking an actor to speak up

Examples 1 and 2 are obviously non-oppressive uses of prescriptivism, as they are directed at powerful actors; 3 and 4 can be acceptable even if not directed at a powerful person, because they attempt to address another oppressive act; and 5 and 6 are useful prescriptions, as they help the addressee to perform their task at hand more effectively.

Now, I’m not going to try to convince you that the field of generative syntax is some powerful institution, nor that the definition of “theory” is an issue of social justice. Here my colleague was correct—members of the field are free to use their terminology as they see fit. My prescription is of the third variety—a helpful suggestion from a member of the field who wants it to advance. So, while my prescription may be wrong, I’m not wrong to offer it.

Using anti-prescriptivism as a defense against critique is not surprising—I’m sure I’ve had that reaction to editorial suggestions on my own work. In fact, I’d say it’s a species of a phenomenon common among people who care about social justice: mistaking a formal transgression for a violation of an underlying principle. In this case, the formal act of prescription occurred, but without any violation of the principle of anti-oppression.

A response to Piantadosi (2023)

(Cross-posted on LingBuzz.)

It is perhaps an axiom of criticism that one should treat the object of criticism on its own terms. Thus, for instance, a photograph should not be criticized for its lack of melody. This axiom makes it difficult to critique a recent paper by Steven Piantadosi—hereafter SP—as it is difficult to determine what its terms are. It is ostensibly the latest installment of the seemingly perennial class of papers arguing, on the basis of either a new purported breakthrough in so-called AI or an exotic natural language dataset, that rationalist theories of grammar are dead wrong, but it is actually a curious mix of criticism of Generative Grammar, promissory notes, and promotion for OpenAI’s proprietary ChatGPT chatbot.

The confusion begins with the title of the paper, given in (1), which doubles as its thesis statement and contains a category error.

(1) Modern language models refute Chomsky’s approach to language.

To refute something is to show that it is false, but approaches do not have truth values. One can refute a claim, a theory, or a hypothesis, and one can show an approach to be ineffective, inefficient, or counterproductive, but one cannot refute an approach. The thesis of the paper under discussion, then, is neither true nor false, and we could be excused for ignoring the paper altogether.

Another axiom of criticism, though, is the principle of charity, which dictates that we present the best possible version of the object of our criticism. To that end we can split up (1) into two theses (2) and (3).

(2) Modern language models refute Chomsky’s theories of language.
(3) Modern language models show Chomsky’s approach to language to be obsolete.

It is these theses that I address below.

The general shape of SP’s argument is as follows: (A) Chomsky claims that adult linguistic competence cannot be attained or simulated on the basis of data and statistical analysis alone. (B) The model powering ChatGPT simulates adult linguistic competence on the basis of data and statistical analysis alone. Therefore, (C) The model powering ChatGPT shows Chomsky’s claims to be false. To support his argument, SP presents queries and outputs from ChatGPT and argues that each refutes or approaches a refutation of a specific claim of Chomsky’s—each argument is of the form “Chomsky claims a purely statistical model could never do X, but ChatGPT can do (or can nearly do) X.”

As the hedging in this summary indicates, SP admits there are some phenomena for which ChatGPT does not exhibit human-like behaviour. For instance, when SP prompts the chatbot to generate ten sentences like (4), the program returns ten sentences, all of which share the syntactic structure of (4) but none of which are wholly meaningless like (4).

(4) Colorless green ideas sleep furiously.

SP explains this away, writing “[w]e can note a weakness in that it does not as readily generate wholly meaningless sentences …, likely because meaningless language is rare in the training data.” But humans can generate meaningless language, despite the fact that it is “rare in the training data” for us too. The autonomy of syntax, then, is an instance where OpenAI’s language model does not exhibit human-like behaviour. Furthermore, SP notes that current models require massive amounts of data to achieve their results—amounts far outstripping the amount of data available to a child. He also notes that the data is qualitatively different from that available to a child.[1] In doing so, he admits that modern language models (MLMs) are not good models of the human language faculty, contradicting one of the premises of his argument.

Though these empirical shortcomings of models like the one powering ChatGPT quite plainly refute (2), we do not even need such evidence to do so, as (2) is self-refuting. It is self-refuting because it does not address theoretical claims that Chomsky or, to my knowledge, any
Generative theoretician has made. Far from claiming that MLMs could never do the things that ChatGPT can do, Chomsky has repeatedly claimed the opposite—that with enough data and computing power, a statistical model would almost certainly outperform any scientific theory in terms of empirical predictions. Indeed, this is the point of one the quotes that SP includes:

You can’t go to a physics conference and say: I’ve got a great theory. It accounts for everything and is so simple it can be captured in two words: “Anything goes.”

All known and unknown laws of nature are accommodated, no failures. Of
course, everything impossible is accommodated also.

Furthermore, Generative theories are about a component of human cognition,[2] and nowhere does SP claim that “modern language models” are good models of human cognition. Indeed, this is an extension of the above discussion of the data requirements of MLMs, and logically amounts to a claim that the supposed empirical successes of MLMs are illusory without biological realism.

So, SP does not show that MLMs refute Chomsky’s theory, but what of his approach to language? Here we can look at the purported successes of MLMs. For instance, SP presents ChatGPT data showing grammatical aux-inversion in English, but provides no explanation as to how the model achieves this. Such an explanation, though, is at the core of Chomsky’s approach to language. If MLMs do not provide an explanation, then how can they supplant Chomsky’s approach?
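The aux-inversion case is instructive because a purely linear rule and a structure-dependent rule agree on simple sentences and diverge only on more complex ones. A minimal sketch (my own toy examples, not SP’s data) of the linear rule “front the first auxiliary” shows where it goes wrong:

```python
def front_first_aux_linear(words, is_aux):
    """Linear-order rule: move the linearly first auxiliary to the front.
    Structure-blind, so it picks the wrong auxiliary in complex sentences."""
    i = next(k for k, w in enumerate(words) if is_aux(w))
    return [words[i]] + words[:i] + words[i + 1:]

is_aux = {"is", "can", "will"}.__contains__

# Simple sentence: the linear rule happens to give the right result.
print(front_first_aux_linear("the man is happy".split(), is_aux))
# ['is', 'the', 'man', 'happy']

# Relative clause: the rule fronts the embedded auxiliary,
# yielding the ungrammatical *"Is the man who tall is happy?"
print(front_first_aux_linear("the man who is tall is happy".split(), is_aux))
# ['is', 'the', 'man', 'who', 'tall', 'is', 'happy']
```

The structure-dependent rule (front the auxiliary of the main clause) gets both cases right; the point is that ChatGPT’s correct outputs alone tell us nothing about which rule, if either, it has induced.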

The failure of MLMs to supplant Chomsky’s approach can be demonstrated by extending one of SP’s metaphors. According to SP, the approach to science used by MLMs is the same as that used to model and predict hurricanes and pandemics. Even assuming this is true, it is also true that meteorological and epidemiological models have at their cores equations arrived at through the theoretical/explanatory work of physicists and biologists, respectively. If MLMs supplant theoretical/explanatory linguistics, then hurricane and pandemic models should supplant physics and biology. No serious person would make this argument about physics or biology, yet it is fairly standard in linguistics.

Thus far we have been taking SP’s data at face value, and while there is absolutely no reason to believe that SP has falsified it in any way, there is still a serious problem with it—it is, practically speaking, unreplicable, since we have no access to the model that generated it. The data in the paper was generated by ChatGPT in early 2023. When it was initially released, ChatGPT worked with the GPT-3.5 model, and it has since been migrated to GPT-4—both of which are closed-source. So, while SP adduces ChatGPT data as evidence in favour of the sort of models that he has developed as his research program, there is no way to know whether ChatGPT uses the same sort of model. Indeed, for all we know, ChatGPT could be built atop a model based on Generative theories of language.

Returning to the axiom I started with—that one should criticize something on its own terms—the ultimate weakness of SP’s paper is its failure to follow it. Chomsky’s main critique of MLMs—alluded to in the quote above—is not that they are unable to produce grammatical expressions. It’s that if they were to be trained on data from an impossible language—a language that no human could acquire—they would “learn” that language just as easily as, say, English. One does not need to look very far to find Chomsky saying exactly this. Take, for instance, the following quote, in which Chomsky responds to a request for his critique of current so-called AI systems.[3]

There’s two ways in which a system can be deficient. One way is it’s not strong enough—[it] fails to do certain things. The other way is it’s too strong—it does what it shouldn’t do. Well, my own interests happen to be language and cognition—language specifically. So take GPT. Gary Marcus [and] others have found lots of ways in which the system’s deficient—this system and others—[it] doesn’t do certain things. That can in principle at least be fixed—you add another trillion parameters, double the number of terabytes, and maybe do better. When a system is too strong, it’s unfixable typically, and that’s the problem with GPT and the other systems.

So if you give a database to the GPT system which happens to be from an impossible language—one that violates the rules of language—they’ll do just as well—often better, because the rules can be simpler. For example, one of the fundamental properties of the way language works—there’s good reasons for it—is that the rules, the core rules, ignore linear order of words—they ignore everything that you hear. They attend only to abstract structures that the mind creates. So it’s very easy to construct impossible languages which use very simple procedures involving linear order of words. [The] trouble is that’s not language, but GPT will do just fine with them. So it’s kind of as if somebody were to propose, say, a revised version of the periodic table which included all the elements—all the possible elements and all the impossible elements—and didn’t make any distinction between them. That wouldn’t tell us anything about elements. And if a system works just as well for impossible languages as for possible ones, [it is] by definition not telling us anything about language. And that’s the way these systems work—it generalizes [to] the other systems too. So the deep problem that concerns me is too much strength. I don’t see any conceivable way to remedy that.
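Chomsky’s point that a statistical model “will do just fine” with linear-order impossible languages can be illustrated with a toy model (my own sketch; nothing here is a claim about GPT-scale systems): a maximum-likelihood bigram model fits a corpus of mirror-reversed English sentences exactly as well as it fits the original corpus, because reversing every sentence simply mirrors the bigram statistics:

```python
import math
from collections import Counter

def bigram_loglik(sentences):
    """Log-likelihood of a corpus under its own MLE bigram model,
    with '#' marking both sentence boundaries."""
    toks = [["#"] + s.split() + ["#"] for s in sentences]
    pairs = Counter(p for s in toks for p in zip(s, s[1:]))
    ctx = Counter(a for (a, _b) in pairs.elements())
    return sum(n * math.log(pairs[p] / ctx[p[0]]) for p, n in pairs.items())

english = ["the dog chased the cat", "the cat slept"]
# An "impossible" language: every sentence mirror-reversed.
mirrored = [" ".join(reversed(s.split())) for s in english]

# The model fits the impossible language exactly as well as English.
print(abs(bigram_loglik(english) - bigram_loglik(mirrored)) < 1e-9)
# True
```

A model that assigns the same goodness-of-fit to possible and impossible languages makes no distinction between them—precisely the “too much strength” Chomsky describes.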

The key notion here is that of an “impossible language” which, though it seems to have an a priori flavour to it, is actually an empirical notion. Generative theory, like every scientific theory, predicts not only what is possible, but also what is impossible. For instance, generative theory predicts that linear order is not available to syntax, and therefore that no language has grammatical rules based on linear order. SP indirectly addresses this concern:

It’s worth thinking about the standard lines of questioning generative syntax has pursued—things like, why don’t kids ever say “The dog is believed’s owners to be hungry” or “The dog is believed is hungry” […]. The answer provided by large language models is that these are not permitted under the best theory the model finds to explain what it does see. Innate constraints are not needed.

Following this standard empiricist reasoning, there are no impossible languages, only languages which have yet to be seen.[4] If all we had to go on were descriptions of actually existing languages, then the empiricist and rationalist accounts would be equally plausible. Luckily for us, we are not limited in this way: we have experimental results that directly support the rationalist accounts. Smith and Tsimpli (1995), for instance, provide evidence that, while we can learn “impossible languages”, we do so in a fundamentally different way than we learn possible languages, with the former treated like puzzles rather than languages.

To summarize, SP purports to show that MLMs refute Chomsky’s approach to language—a logical impossibility. What he does show is that there are multiple aspects of adult English competence that ChatGPT is unable to simulate, and that in the cases where ChatGPT was able to mimic an adult English speaker, there is no explanation as to how. Neither of these results is germane to either Chomsky’s approach to language or his theories of language, as Chomsky studies the human capacity for language, about which MLMs tell us nothing. More importantly, SP does not even address Chomsky’s actual critique of MLMs qua models of language competence.


1 SP also wrongly implies that the data that informs actual language acquisition consists of child-directed speech.
2 This is the crux of the I-/E-language distinction that Chomsky often discusses.
3 Taken from extemporaneous speech. Edited to remove false starts and other disfluencies. Source: https://www.youtube.com/watch?v=PBdZi_JtV4c
4 Setting aside languages which are logical impossibilities, like a language which both has and lacks determiners.

The problem with reporting on Bill C-18

About a year ago, Bill C-18—The Online News Act—was introduced into the Canadian House of Commons. On its face, C-18 will require online platforms like Google, Facebook, and Twitter to negotiate with Canadian news organizations. The coverage of C-18, at least what I’ve been seeing, has been … weird. Since most Canadian news orgs have a vested interest in the outcome, they haven’t been reliable. Instead, the coverage comes from media critics like Jesse Brown and law professor Michael Geist, who are, perhaps, less conflicted about the bill and who have been fairly consistently and sharply critical of it.[1] Between the two of them, they paint a picture of Postmedia, Torstar, and other media conglomerates using the Liberal government to shake down tech platforms for subsidies—a shakedown that, while it might help the big guys, will almost certainly harm independent news outlets and ordinary Canadians. Indeed, recent developments seem to have confirmed this story, as Google has made moves to block news links from Canadians, with Facebook/Instagram following suit.

But there’s always been something that’s bothered me about these narratives—for all their correct Herman-Chomsky-esque analysis of news media as consisting of huge profit-seeking corporations, they seem to assiduously avoid turning that lens on the tech platforms. Take, for instance, Prof. Geist’s framing of the news that Facebook planned to block news sharing for Canadians:

Rather than calling it what it is—a giant multinational corporation run by a billionaire attempting to extort the duly elected government of Canada with the threat of a capital strike—Geist calls it “the Consequence” of the government doing its job and attempting to regulate a market. This implies that what is happening is simply the laws of nature at work: just as a ball thrown at X m/s and angle Y will trace parabola Z in the air, and a healthy person’s knee struck just so will kick, a market you try to regulate will cease to function. The only agents in the story are the government and the media companies, and they’re playing with forces they are either too stupid to understand or too corrupt to acknowledge. Facebook and Google, or more accurately their managers, are not agents here—or, to the extent that they are agents, they’re good-faith agents trying to provide a service: the shop-owner to the government and Big Media’s racketeer.

This framing couldn’t be farther from the truth. Not only are Google, Facebook, etc. actors in this dispute, they are often bad actors. Take, for instance, the infamous pivot to video, when Facebook told news and entertainment publishers that, according to The Data, the best way for publishers to drive users to their sites—i.e., to their advertisers—was to make videos instead of written content. Of course, it turned out that The Data was bullshit. As Cory Doctorow put it: “Big Tech Isn’t Stealing News Publishers’ Content. It’s Stealing Their Money.” Google and Facebook are no innocent grocers being shaken down.

Including the Big Tech firms as actors also puts in a new light another concern over C-18 that’s been brought up, usually by Jesse Brown—that Bill C-18 would create a government registry of news media, with only those in the registry benefiting from the ability to bargain with Big Tech. Any sort of state press registry, of course, is at least in tension with the notion of a free press, as the original notion of a free press was in opposition to restrictive press licensing regimes in monarchical societies.

Adding Big Tech into the mix, though, complicates the matter. Google and Facebook are able to credibly extort governments because they have made themselves seem virtually indispensable to news media—the Big Tech “platforms” are how news gets disseminated. The threat to drop news was credible for a more sinister reason, too: Google and Facebook could actually do it—they know which sites publish news, and they are able to shut them out of their platforms. Viewed this way, it’s hard to see the Big Tech “platforms” as anything but a potentially restrictive press registry—but a privately held registry, shielded from even the modicum of transparency and responsibility that an elected government has. Even if C-18 doesn’t require transparency or responsibility, it could serve as a precedent for further regulation of Big Tech.

But to be clear, I’m not here to defend Postmedia, Torstar, or the Liberal Party government of Canada. Big Media, as Herman and Chomsky have argued, consists of a handful of giant profit-seeking corporations that have no interest in competition, preferring an oligopoly, while Justin Trudeau’s government mostly lurches from corruption scandal to corruption scandal, and in between is a bog-standard centrist administration, meaning it does virtually nothing for Canadians while saying the right things. I’m fairly sure the only reason they’ve remained in power this long is that the main opposition is obviously much worse.[2]

There don’t seem to be any good actors in this story, and that’s what makes it tough to talk about—that, and the universality among Serious Commentators of a particular assumption called capitalist realism, expressed by one of its greatest proponents as “There is no alternative”, and by its critics as the fact that for most elites “it’s easier to imagine the end of the world than the end of capitalism.” We’re at a crisis point in the media industries. Big Tech and Big Media depend on each other—Big Tech needs media to serve to its users, while Big Media needs platforms to serve its products to consumers—but they also compete with each other, since both industries are funded by a finite pool of advertising money. This is an untenable situation, as capitalist competition means one firm trying to put the other out of business—an outcome that, in this case, would mean self-destruction. Serious Commentators will always struggle to properly explain the nature of this crisis, because it’s not the fault of any of the individual actors but something inherent in capitalism, and under capitalist realism capitalism is like air or water—maybe it’s polluted or corrupted a bit, but the idea that there’s anything per se wrong with it is unimaginable.

There’s another problem with the Big Tech–Big Media relationship that conflicts with capitalist realism—Big Tech is clearly the dominant side, despite the fact that it depends on Big Media.[3] Such a situation is almost unthinkable under capitalist realism, as it’s almost axiomatic there that relations of dominance are, in fact, derived from dependence—capitalists “create jobs”, landlords “provide housing”, slave owners “feed, clothe, and shelter” enslaved people. This is why truisms like “you don’t need your job, your job needs you” are so subversive. So the idea that Facebook needs media firms and can also effectively dictate their business practices is nonsense, no matter how much the facts suggest it.

And again, I’m not saying that the coverage should flip and take the side of Big Media and the government—there are no good actors here. Rather, coverage should take the side of the people who are likely to be harmed by any outcome—the actual journalists and the consumers of journalism. Indeed, it’s difficult to have a clear-eyed view of this and similar dust-ups and not adopt the slogan ¡Que se vayan todos! (“They can all go to hell!”). What would such an approach mean? It would mean coverage that includes the context that Big Tech and Big Media are both collections of monopolistic profit-seekers, that reminds us that Big Tech keeps committing fraud, and that the Liberals promised us good things, including electoral reform, and reneged. This is all too much to hope for, but for a start, it would be nice for Serious Commentators to treat Big Tech as what it is—a cabal of monopolists threatening to punish Canadians for the crime of trying to regulate them.


1 I’m critical of Brown and Geist here, but not always. Brown has insightful takes on Canadian media more often than most other journalists, while Geist was somewhat heroic when it came to copyright and digital privacy in the early 21st century. Both seem incapable of seeing Big Tech clearly, though—something, I suspect, having to do with their relations to the cycles of enshittification at Google and Facebook. Maybe I’ll write a separate post about that.
2 There’s a narrative that probably stretches back to a time when the Whigs squared off against the Tories in which the left/liberal party pushes a country’s legislation forward and the right/conservative party resists such moves. The reverse is now true: the right/conservative parties actively enact barbaric and anti-social policies, and the left/liberal parties, despite promises of rolling back said policies, mostly just do nothing when in power.
3 Big Tech arguably needs Big Media more than the other way around. Big Tech, as a player in the news industry, is a creature of the 21st century—Google News came out in 2002, Facebook in 2004, Twitter in 2006—while Big Media goes back much farther, and Big Tech has repeatedly gone out of their way to entice media firms to become more integrated in the tech platforms they control.

How do we get good at using language?

Or: What the hell is a figure of speech anyway?

At a certain level I have the same level of English competence as Katie Crutchfield, Josh Gondelman, and Alexandria Ocasio-Cortez. This may seem boastful to a delusional degree of me, but we’re all native speakers of a North American variety of English of a similar age, and this is the level of competence that linguists tend to care about. Indeed, according to our best theories of language, the four of us are practically indistinguishable.

Of course, outside of providing grammaticality judgements, I wouldn’t place myself anywhere near those three, each of whom could easily be counted among the most skilled users of English living. But what does it mean for people to have varied levels of skill in their language use? And is this even something that linguistic theory should be concerned about?

Linguists, of course, have settled on five broad levels of description of a given language:

  1. Phonetics
  2. Phonology
  3. Morphology
  4. Syntax
  5. Semantics

It seems quite reasonable to say we can break down language skill along these lines. So, skilled speakers can achieve a desired effect by manipulating their phonetics, say by raising their voices, hitting certain sounds in a particular way, or the like. Likewise, phonological theory can provide decent analyses of rhyme, alliteration, rhythm, etc. Skilled users of a language also know when to use (morphologically) simple vs complex words, and which word best conveys the meaning they intend. Maybe a phonetician, phonologist, morphologist, or semanticist will disagree, but these seem fairly straightforward to formalize, because they all involve choosing from among a finite set of possibilities—a language only has so many lexical entries to choose from. What does skill mean in the infinite realm of syntax? What does it mean to choose the correct figure of speech? Or, even more basically, how does one express any figure of speech in the terms of syntactic theory?

It’s not immediately obvious that there is any way to answer these questions in a generative theory for the simple reason that figures of speech are global properties of expressions, while grammatical theory deals in local interactions between parts of expressions. Take an example from Abraham Lincoln’s second inaugural address:

(1) Fondly do we hope—fervently do we pray—that this mighty scourge of war may speedily pass away.

There are three syntactic processes employed by Lincoln here that I can point out:

(2) Right Node Raising
Fondly do we hope that this mighty scourge of war may speedily pass away, and fervently do we pray that this mighty scourge of war may speedily pass away. -> (1)

(3) Subject-Aux Inversion
Fondly we hope … -> (1)

(4) Adverb fronting
We hope fondly… -> (1)

Each of these represents a choice—conscious or otherwise—that Lincoln made in writing his speech and, while most generative theories allow for choices to be made, they are not at the same levels.

Minimalist theories, for instance, allow for choices at each stage of sentence construction—you can either move a constituent, add a constituent, or stop the derivation. Each of (3) and (4) could conceivably be represented as a single choice, but it seems highly unlikely that (2) could. In fact, there is nothing approaching a consensus as to how right node raising is achieved, but it is almost certainly a complex phenomenon. It’s not as if we have a singular operation RNR(X) which changes a mundane sentence into something like (1), yet Lincoln and other writers and orators seem to have it as a tool in their rhetorical toolboxes.

Rhetorical skill of this kind suggests the possibility of a meta-grammatical knowledge, which all speakers of a language have to some extent, and which highly skilled users have in abundance. But what could this meta-grammatical knowledge consist of? Well, if the theoretical representation of a sentence is a derivation, then the theoretical representation of a figure of speech would be a class of derivations. This suggests an ability to abstract over derivations in some way, and therefore it suggests that we are able to acquire not just lexical items, but also abstractions over derivations.

This may seem to contradict the basic idea of Minimalism by suggesting two grammatical systems, and indeed it might be a good career move on my part to declare that the fact of figures of speech disproves the SMT, but I don’t see any inherent contradiction here. In fact, what I’m suggesting here, and have argued for elsewhere, is a fairly basic observation from computer science and mathematical logic: the distinction between operations and operands is not as sharp as it seems. I am merely suggesting that part of a mature linguistic knowledge is higher-order grammatical functions—functions that operate on other functions and/or yield other functions—and that, since any recursive system is probably able to represent higher-order functions, we should absolutely expect our grammars to allow for them.
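
The flavour of this idea can be illustrated with a toy sketch. To be clear, this is nothing like a real grammatical formalism: the functions below manipulate strings rather than derivations, and every name in it is my own invention. The point is only to show how composing simple operations yields a reusable, first-class object, the programming analogue of a "figure of speech" abstracted over derivations.

```python
# Toy illustration only: "figures of speech" as higher-order functions.
# These operate on strings, not syntactic derivations; all names are
# hypothetical and invented for this sketch.

def adverb_fronting(sentence):
    """Map 'we hope fondly' to 'fondly we hope' (naive word shuffling)."""
    words = sentence.split()
    return " ".join([words[-1]] + words[:-1])

def subj_aux_inversion(sentence):
    """Map 'fondly we hope' to 'fondly do we hope' (naive 'do'-insertion)."""
    words = sentence.split()
    return " ".join([words[0], "do"] + words[1:])

def compose(*fs):
    """A higher-order function: build a complex rhetorical operation
    out of simpler ones (applied right to left)."""
    def composed(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return composed

# The composed operation is itself a first-class value that can be stored,
# reused, and passed around -- an "abstraction over derivations" in miniature.
fronted_inversion = compose(subj_aux_inversion, adverb_fronting)
print(fronted_inversion("we hope fondly"))  # fondly do we hope
```

Nothing here hinges on the string hacking; the design point is that `compose` treats operations as operands, which is exactly the operation/operand blurring mentioned above.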

Assuming this sort of abstraction is available and responsible for figures of speech, our task as theorists then is to figure out what form the abstraction takes, and how it is acquired, so I can stop comparing myself to Katie Crutchfield, Josh Gondelman, and AOC.

My Top Culture Things of 2022

It’s the end of 2022 and I’ve got nothing else to do, so I thought I’d share some of the works of culture that really made my year (even including things that weren’t made in 2022).

I did one of these before in 2019, but something happened (and continues to happen) and I missed the following two years, so a couple of these might be things I discovered in 2020 or 2021 but continued to really enjoy this year.

The Revolutions Podcast by Mike Duncan

The first episode of Revolutions came out in 2013, and the series finale was just released on Christmas Day of 2022. I started listening to it this year and managed to get through the entire catalogue. It’s a sprawling look at the revolutionary period that was kicked off by the English civil wars and ended with the Russian Revolution, including the revolutions in the US, Central/South America, Haiti, and Mexico, the several revolutions in France, and the revolutionary uprisings of 1848. I was initially skeptical of the idea of an American podcaster recounting revolutions, fearing it might end up being nothing but simplistic narratives, but I was pleasantly surprised by the nuance and detail that Duncan draws out of these histories. He covers the historical, political, and even ecological factors that shaped revolutions, and draws interesting connections and parallels between seemingly unrelated revolutions. If I had one critique it would be that, while Duncan certainly doesn’t endorse a Great Man theory of history, he does, in my opinion, give fairly short shrift to popular movements that lack a charismatic leader—anarchists in the Russian Revolution, Anti-Federalists in the US, the Diggers in the English civil wars, to name a few. This is certainly an unfair critique stemming from my own biases, and it in no way detracts from my enjoyment of the podcast.

Orwell’s Roses by Rebecca Solnit

“In the spring of 1936, a writer planted roses.” This is the opening line of Orwell’s Roses, and in some ways the puzzle that its author Rebecca Solnit is trying to solve—why would an apparently grim pessimist like Orwell bother with planting something as apparently frivolous as roses? The book is an exploration of both Orwell and roses and a refutation of their reputations as being grim and frivolous respectively. Solnit’s almost stream-of-consciousness style of writing belies the fact that she’s making an argument and backing it up with research and reason. The argument seems to be a perennial one on the left as to what place the non-material welfare of people should matter—should leftists be concerned with beautiful things like roses, or are such concerns ultimately bourgeois? Solnit is decidedly on the side of roses, and argues that Orwell was too.

The book somehow manages to be extremely readable but dense, poetic but journalistic. Definitely worth it.


Andor

I’ve been pretty much done with Star Wars for a few years now. I didn’t see The Rise of Skywalker and, other than The Mandalorian—which I watched because I was out of things to watch in lockdown and would describe as “fine”—I’ve steered clear of the streaming shows. So when I heard they were making a series about the origin story of the second lead in Rogue One—a film I enjoyed—I thought “wow, they’re really scraping the bottom of the barrel here”, and boy was I wrong! Everything about the show feels fresh, its links to the Star Wars canon are so tenuous that it could almost not even be a Star Wars series, and it definitely has something to say. Even the choice of protagonist—Cassian Andor, the petty thief transformed into a revolutionary—is interesting precisely because, as Alan Sepinwall notes, Cassian might be the least compelling character in the show. But while Sepinwall sees this as a flaw, I can’t help but see it as a secret weapon. Because Cassian doesn’t hog the screen, the secondary and tertiary characters get to have their say and make their perspectives known. Andor, much like The Wire—a comparison already made by David Klion—is ultimately a social drama. It’s much more interested in exploring the links between capitalism, imperialism, colonialism and fascism, and the nuances of resistance and rebellion—the showrunner, Tony Gilroy, apparently listens to the Revolutions podcast—than any individual relationships, though it doesn’t shy away from exploring the personal impacts of the social.

Actual critics have done the show more justice than I can, but one last thing I want to highlight is the score by Nicholas Britell, which has the epic orchestral sweeps that you’d expect but also jarringly centers a wobbly detuned synth for much of the score, highlighting the fact that the world of the show is rather shaky—teetering on the brink of collapse. Again, really not something I expected from a Star Wars franchise.

The Sloppy Boys Podcast/The Blowout

The Sloppy Boys are a comedy party rock band consisting of Jeff Dutton, Mike Hanford, and Tim Kalpakis, all former members of The Birthday Boys sketch group. In 2020, just when COVID hit, they released their third album Paradiso, and without the possibility of touring to promote the album, they decided to start a cocktail podcast. Tale as old as time, really.

The premise of the show is simple: every week, the Boys make a new cocktail—the Trinidad Sour was an early classic—and talk about it. Add to that the fact that these are three good friends and some of the funniest guys on the planet and they legitimately make each other laugh and you’ve got an excellent podcast. They also have a second show The Blowout available to Patreon subscribers—patróns in the parlance of the show—where they talk about whatever they want—best guitar solo, taking a bath, going to the mall, the 80s movie Gremlins, the best Christmas aspect, to name a few. It’s sometimes truly the thinnest of premises but Jeff, Mike, and Tim always manage to make it great!

LIFE ON EARTH by Hurray For The Riff Raff

Hurray For The Riff Raff is the musical project of Alynda Segarra, a singer-songwriter originally from The Bronx, who formed the band when they moved to New Orleans. I first encountered Hurray For The Riff Raff through their 2017 album The Navigator—an album which you should absolutely seek out—and they released LIFE ON EARTH this year. While The Navigator was big and overtly political, LIFE ON EARTH kind of snuck up on me. It’s a smaller sort of album and much earthier than its predecessor—with titles like “WOLVES”, “RHODODENDRON“, “JUPITER’S DANCE” and “ROSEMARY TEARS“—but not devoid of politics—”PRECIOUS CARGO” tells the story of a migrant coming across the US/Mexico border only to be abused by US authorities. I don’t think it got much press, but when I was reviewing the music I’d listened to this year, I realized LIFE ON EARTH had really wormed its way into my rotation as one of my familiar records, even though it’s less than a year old.

Honourable mentions

The Time of Monsters Podcast with Jeet Heer.

Jeet Heer is a Canadian journalist and critic. On his podcast he talks with other commentators about some current topic in the news, politics, or culture. The podcast is also completely unpolished, which very much adds to its charm.

Everything Everywhere All at Once

A wonderfully unique movie amid what’s become the standard fare of Disney-owned IP and other studios trying to emulate/compete with Disney. Any attempt to describe the plot would do it a great disservice, so all I can say is you should watch it if you can.

Roses by Jadea Kelly

I met Jadea in undergrad where she would often perform at our college open mic. She was clearly talented so when she sent out a Kickstarter request to help fund her next album I was happy to throw in a few bucks. Flash forward several years to 2022, when I get notified that her album is complete and a CD is on its way to me. I didn’t know what to expect, but I was completely floored by what I heard—well-written songs with mature poignant lyrics beautifully performed and produced. An early standout and still one of my favourite tracks: “When I Fly”

Dan Padley

A jazz guitarist out of Iowa City IA, I met Dan through the Sloppy Boys discord server. He released an excellent solo EP this year as well as an LP with Jarrett Purdy, both are well worth a listen. He also regularly posts cool guitar covers of whatever songs he feels like on his Instagram and Youtube channel.

De re/De dicto ambiguities and the class struggle

If you follow the news in Ontario, you likely heard that our education workers are demanding an 11.7% wage raise in the current round of bargaining with the provincial government. If, however, you are more actively engaged with this particular story—i.e., you read past the headline, or you read the union’s summary of bargaining proposals—you may have discovered that, actually, the education workers are demanding a flat annual $3.25/hr increase across the board. On the surface, these seem to be two wildly different assertions that can’t both be true. One side must be lying! Strictly speaking, though, neither side is lying, but one side is definitely misinforming.

Consider a version of the headline (1) that supports the government’s line.

(1) Union wants 11.7% raise for Ontario education workers in bargaining proposal.

This sentence is ambiguous. More specifically, it shows a de re/de dicto ambiguity. The classic example of such an ambiguity is in (2).

(2) Alex wants to marry a millionaire.

There is one way of interpreting this in which Alex wants to get married and one of his criteria for a spouse is that they be a millionaire. This is the de dicto (lit. “of what is said”) interpretation of (2). The other way of interpreting it is that Alex is deeply in love with a particular person and wants to marry them. It just so happens that Alex’s prospective spouse is a millionaire—a fact which Alex may or may not know. This is the de re (lit. “of the thing”) interpretation of (2). Notice how (2) can describe wildly different realities—for instance, Alex can despise millionaires as a class, but unknowingly want to marry a millionaire.

Turning back to our headline in (1), what are the different readings? The de dicto interpretation is one in which the union representatives sit down at the bargaining table and say something like “We demand an 11.7% raise”. The de re interpretation is one in which the union representatives demanded, say, a flat raise that happens to come out to an 11.7% raise for those workers with the lowest wages when you do the math. The de re interpretation is compatible with the assertions made by the union, so it’s probably the accurate interpretation.

So, (1) is, strictly speaking, not false under one interpretation. It is misinformation, though, because it deliberately introduces a substantive ambiguity in a way that the alternative headline in (3) does not.

(3) Union wants $3.25/hr raise for Ontario education workers in bargaining proposal

Of course (3) has the de re/de dicto ambiguity—all expressions of desire do—but both interpretations would accurately describe the actual situation. Someone reading the headline (3) would be properly informed regardless of how they interpreted it, while (1) leads some readers to believe a falsehood.

What’s more, I think it’s reasonable to call the headline in (1) deliberate misinformation.

The simplest way to report the union’s bargaining positions would be to simply report them—copy and paste from their official summary. To report the percentage increase as they did, someone had to do the arithmetic to convert absolute terms to relative terms—a simple step, but an extra step nonetheless. Furthermore, to report a single percentage increase, they had to look only at one segment of education workers—the lowest-paid segment. Had they done the calculation on all education workers, they would have come up with a range of percentages, because $3.25 is 11.7% of $27.78, but only about 8.6% of $37.78, and so on. So, misinforming the public by publishing (1) instead of (3) involved at least two deliberate choices.
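
The arithmetic is easy to check. A quick sketch (the hourly wages are the illustrative figures from above, not actual pay scales):

```python
# The same flat $3.25/hr raise comes out to different percentages at
# different wage levels. Wages here are illustrative, not actual pay scales.
flat_raise = 3.25
for wage in (27.78, 37.78):
    pct = 100 * flat_raise / wage
    print(f"${flat_raise}/hr on ${wage}/hr = {pct:.1f}%")
# $3.25/hr on $27.78/hr = 11.7%
# $3.25/hr on $37.78/hr = 8.6%
```

A single "11.7%" figure only falls out if you do the calculation on the lowest-paid workers alone.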

It’s worth asking why the government would misinform in this way. A $3.25/hr raise is still substantial, and the government could still argue that it’s too high, so why misinform? One reason is that it puts workers in the position of explaining that, while the figure isn’t a bald-faced lie, it is misleading, which makes us seem like pedants. But I think there’s another reason for the government to push the 11.7% figure: it plays into and furthers an anti-union trope that we’re all familiar with.

Bosses always paint organized labour as lazy, greedy, and corrupt—“Union leaders only care about themselves; only we bosses care about workers and children.” They especially like to claim that unionized workers, since they enjoy higher wages and better working conditions, don’t care about poor working folks.[1] The $3.25/hr raise demand, however, reveals these tropes as lies.

For various reasons, different jobs, even within a single union, have unequal wages. These inequalities can be used as a wedge to keep workers fighting amongst themselves rather than together against their bosses. Proportional wage increases maintain and entrench those inequalities—if everyone gets a 5% bump, the ratio between the top and bottom stays exactly the same. Absolute wage increases, however, shrink those inequalities. Taking the example from above, a $37.78/hr worker makes about 1.36x what the $27.78/hr worker makes, but after a $3.25/hr raise for both, the gap narrows to about 1.32x, and it continues to narrow with each successive flat raise. So, contrary to the common trope, union actions show solidarity rather than greed.[2]
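
Again, this is easy to verify. A sketch using the same illustrative wage figures, showing that a flat raise narrows the ratio while a proportional raise preserves it:

```python
# Flat raises shrink wage ratios; proportional raises preserve them.
# The wage figures are the illustrative ones from the post.
low, high = 27.78, 37.78
print(f"before: {high / low:.2f}x")                  # before: 1.36x

# Flat $3.25/hr raise for both: the ratio narrows.
low_f, high_f = low + 3.25, high + 3.25
print(f"after flat raise: {high_f / low_f:.2f}x")    # after flat raise: 1.32x

# Proportional 5% raise for both: the ratio is unchanged.
print(f"after 5% raise: {(high * 1.05) / (low * 1.05):.2f}x")  # after 5% raise: 1.36x
```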

So what’s the takeaway here? It’s frankly unreasonable to expect ordinary readers to do a formal semantic analysis of their news, though journalists could stand to be a bit less credulous of claims like (1). My takeaway is that this is just more evidence of my personal maxim that people in positions of power lie and mislead whenever it suits them, as long as no one questions them. Also, maybe J-schools should require linguistics training.


1 Indeed there are cases in which some union bosses have pursued gains for themselves at the expense of other workers—e.g., construction unions endorsing the intensely anti-worker Ontario PC Party because they love building pointless highways and sprawling suburbs
2 Similar remarks can be made about job actions, which are often taken as proof that workers are inherently lazy. On the contrary, strikes are physically and emotionally grueling and rarely taken on lightly

Why are there no Cartesian products in grammar?

This post, I think, doesn’t rise above the level of “musings.” I think there’s something here, but I’m not sure if I can articulate it properly.

An adequate scientific theory is one in which facts about nature are reflected in facts about the theory. Every entity in the theory should have an analogue in nature, relations in the theory should be found in nature, and simple things in the theory should be ubiquitous in nature. This last concern is at the core of minimalist worries about movement—early theories saw movement as complex and had to explain its ubiquity, while later theories see it as simple and have to explain the constraints on it. But my concern here is not minimalist theories of syntax, but model-theoretic semantics.

Model theories of semantics often use set theory as their formal systems,[1] so if they are adequate, then ubiquitous semantic phenomena should be simply expressible in set theory, and simple set-theoretic notions should be ubiquitous in semantics. For the most part this seems to be the case—you can do a lot of semantics with membership, subset, intersection, etc.—but obviously it’s not perfect. One point of mismatch is the notion of the Cartesian product (X × Y = {⟨x, y⟩ | x ∈ X, y ∈ Y}), a very straightforward notion in set theory, but one that does not have a neat analogue in language.

What do I mean by this? Well, consider the set-theoretic statement in (1) and its natural language translation in (2).

(1) P × P ⊆ R

(2) Photographers respect themselves and each other.

What set theory expresses in a simple statement, language does in a compound one. Or consider (3) and (4), which invert the situation.

(3) (P × P) − {⟨p, p⟩ | p ∈ P} ⊆ R

(4) Photographers respect each other.

The natural language expression has gotten simpler at the expense of its set-theoretic translation. This strikes me as a problem.

If natural language semantics is best expressed as set theory (or something similar), why isn’t there a simple bound expression like each-selves with the denotation in (5)?

(5) λX.λY.(Y × Y ⊆ X)
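
To make the asymmetry concrete, here is a small Python sketch of the denotations above, modelling relations as sets of pairs. The `each_selves` function implements the hypothetical denotation in (5); everything here is a toy model for illustration, not a claim about any actual semantic framework.

```python
# Toy model: P is a set of photographers, R a "respects" relation
# (a set of ordered pairs). For illustration only.
from itertools import product

P = {"ana", "bo", "cy"}

def cartesian(X, Y):
    """The Cartesian product X x Y as a set of pairs."""
    return set(product(X, Y))

diagonal = {(p, p) for p in P}

R_total = cartesian(P, P)        # everyone respects everyone (incl. themselves)
R_no_self = R_total - diagonal   # everyone respects everyone else only

# "Photographers respect themselves and each other": P x P ⊆ R
print(cartesian(P, P) <= R_total)      # True
# "Photographers respect each other": (P x P) - diagonal ⊆ R
print(cartesian(P, P) - diagonal <= R_no_self)  # True

# The hypothetical "each-selves" denotation from (5): λX.λY.(Y × Y ⊆ X)
def each_selves(X):
    return lambda Y: cartesian(Y, Y) <= X

print(each_selves(R_total)(P))    # True: holds when self-pairs are included
print(each_selves(R_no_self)(P))  # False: fails without the self-pairs
```

The point of the sketch is just that (5) is trivial to state in the metalanguage, yet no natural language seems to lexicalize it.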

What’s more, this doesn’t seem to be a quirk of English. When I first noticed this gap, I asked some native non-English speakers—I got data from Spanish, French (Canadian and Metropolitan), Dutch, Italian, Cantonese, Mandarin, Persian, Korean, Japanese, Hungarian, Kurdish, Tagalog, Western Armenian, and Russian[2]—and got fairly consistent results. Occasionally there was ambiguity between plural reflexives and reciprocals—French se, for instance, seemed to be ambiguous—but none of the languages had an each-selves.

My suspicion—i.e. my half-formed hypothesis—is that the “meanings” of reflexives and reciprocals are entirely syntactic. We don’t interpret themselves or each other as expressions of set theory or whatever. Rather, sentences with reflexives and reciprocals are inherently incomplete, and the particular reflexive or reciprocal tells the hearer how to complete them—themselves says “derive a sentence for each member of the subject where that member is also the object”, while each other says “for each member of the subject, derive a set of sentences where each object is one of the other members of the subject.” Setting aside the fact that this proposal is, even to me, mostly nonsense, it still predicts that there should be an each-selves. Perhaps making the proposal sensible would fix this issue, or vice versa. Or maybe it is just nonsense, but plenty of theories started as nonsense.


1 Yes, I know that there are many other types of model theories put forth
2 I’d be happy to get more data if you have it. You can email me, put it in the comments, or fill out this brief questionnaire.

Some good news on the publication front

Today I woke up to an email from the editor of Biolinguistics informing me that my manuscript “A parallel derivation theory of adjuncts” had been accepted for publication. I was quite relieved, especially since I had been expecting some news about my submission for a couple of days—the ability to monitor the progress of submissions on a journal’s website is a decidedly mixed blessing—and there was a definite possibility in my mind that it could have been rejected.

It was also a relief because it’s been a long road with this paper. I first wrote about the kernel of its central idea—that syntactic adjuncts are entirely separate objects from their “hosts”—in my thesis, and I presented it a few times within the University of Toronto Linguistics Department. I first realized that it had some legs when it was accepted as a talk at the 2020 LSA Meeting in New Orleans, and I started working on it in earnest in the spring and summer of 2020, submitting the first manuscript version to a different journal in August 2020.

If you follow me on Twitter, you saw my reactions to the peer-review process in real time, but it’s worth summarizing. Versions of this manuscript underwent peer review at multiple journals, and in every case there were one or two constructive reviews—some positive, and some negative but nevertheless pointing out serious, fixable issues—but invariably there was also one reviewer who was clearly hostile to the manuscript, often resorting to sarcasm and vague comments.

I’m sure the manuscript improved over the various submissions, but I believe that the main reason that the paper will finally be published is because the editor of Biolinguistics, Kleanthes Grohmann, recognized and agreed with me that one of the reviewers was being unreasonable, so I definitely owe him my gratitude.

There are more edits to go, but you can look forward to seeing my paper in Biolinguistics in the near future.

Why are some ideas so sticky? A hypothesis

Anyone who has tried to articulate a new idea or criticize old ones may have noticed that some ideas are washed away relatively easily, while others seem to actively resist even the strongest challenges—some ideas are stickier than others. In some cases, there’s an obvious reason for this stickiness—in some cases there’s even a good reason for it. Some ideas are sticky because they’ve never really been interrogated. Some are sticky because there are powerful parts of society that depend on them. Some are sticky because they’re true, or close to true. But I’ve started to think there’s another reason an idea can be sticky—the amount of mental effort people put into understanding the idea as students.

Take, for instance, X-bar theory. I don’t think there’s some powerful cabal propping it up, it’s not old enough to just be taken for granted, and Chomsky’s Problems of Projection papers showed that it was not really tenable. Yet X-bar persists, and not just in how syntacticians draw trees or how they informally talk about them. I remember that commentary on my definition of minimal search here involved puzzlement about why I didn’t simply formalize the idea that specifiers were invisible to search, followed by more puzzlement when I explained that the notion of specifier was unformulable.

In my experience, the stickiness of X-bar theory—and syntactic projection/labels more broadly—doesn’t manifest itself in attempts to rebut arguments against it, but in attempts to save it—to reconstitute it in a theory that doesn’t include it.[1] This is very strange behaviour—X-bar is a theoretical construct; it’s valid insofar as it is coherent and empirically useful. Why are syntacticians fighting for it? I wondered about this for a while, and then I remembered my experience learning and teaching X-bar—it’s a real challenge. It’s probably the first challenging theoretical construct that syntax students are exposed to. It tends to be presented as a fait accompli, so students just have to learn how it functions. As a result, those students who do manage to figure it out are proud of it and defend it like someone protecting a cherished possession.[2]

Of course, it’s a bit dangerous to speculate about the psychological motivations of others, but I’m certain I’ve had this reaction in the past when someone’s challenged an idea that I at one point struggled to learn. And I’ve heard students complain about the fact that every successive level of learning syntax starts with “everything you learned last year is wrong”—or at least that’s the sense they get. So, I have a feeling there’s at least a kernel of truth to my hypothesis. Now, how do I go about testing it?


As I was writing this, I remembered something I frequently think when preparing tests and exams, which I’ve thus far only formulated as a somewhat snarky question:

How much of our current linguistic theory depends on how well it lends itself to constructing problem sets and exam questions?


1 My reading of Zeijstra’s chapter in this volume is as one such attempt
2 I think I may be describing “effort justification,” but I’m basing this just on the Wikipedia article