NotebookLM's Podcast Feature Is Not Ready for Prime Time But Already Pretty Entertaining

Google recently released a free tool called NotebookLM that enables users to upload numerous documents and then chat with an AI about the content. It's not clear to me that this is very different from other AI tools that enable users to interrogate materials they upload, but it has one huge advantage: Google does not use the files users upload to train the model, so one can take advantage of the tool without worrying about giving away one's content.

On the latest episode of the podcast Hard Fork, hosts Casey Newton and Kevin Roose (yes, the NY Times reporter whom Microsoft's Bing chatbot famously tried to convince to leave his wife) discuss NotebookLM with Steven Johnson, who works at Google Labs and helped create NotebookLM. From their discussion, I inferred that NotebookLM could be a useful tool for scholars, journalists, and others trying to organize information from many sources.

One feature of NotebookLM especially intrigued Newton and Roose: its ability to create podcasts with human-sounding hosts discussing whatever one uploads to a Notebook. Roose demoed the feature by giving it a variety of content, including, in one instance, just his most recent credit card statement, whereupon the AI hosts bantered about Roose's seemingly excessive use of Uber. Fun, right?

Yes, fun indeed. Google has done great work to make the AI hosts sound realistic and engaging. What it hasn't done (yet) is make them into experts. How do I know? I tried an experiment. I asked the podcast feature to make a podcast based on my last eight Verdict columns. You can listen to the result here. It's just over nine minutes in length.

The podcast based on my Verdict columns is engaging. It's also flattering--indeed, a bit over the top. I'd advise Google in its next release to include a sycophancy dial and set the default to 5 rather than where it is now, at 11. But the bigger problem is the content. The AI podcasters get some stuff right. For example, they (yes, I know I'm anthropomorphizing) accurately describe the Supremacy Clause.

But they get other stuff pretty badly wrong. In two places they talk about how a future President could restrict abortion "without overruling Roe v. Wade," which is absolutely not what I said and makes no sense, given that the columns were written and published in 2024, well after Roe had already been overruled.

In addition, the AI discovers connections that aren't really there. It describes the theme running through the eight columns as a concern that what the law or some set of rules calls something can have important practical consequences. That's certainly a fair characterization of one of the eight columns--in which I advised university administrators not to formulate "expressive activities policies" with respect to the time, place, and manner of speech but instead to conceptualize such policies as the application of rules governing all activities to those activities that happen to be expressive.

The AI podcasters also squeezed my column on the SCOTUS bump stock case into the naming theme, which seems superficially plausible, given that the case turned on what counts as a machine gun, but upon examination, the account makes little sense. For one thing, the case itself, not my analysis, turned on whether to call a bump-stock-equipped semiautomatic rifle a machine gun. For another, I made clear in that column that I thought the dissenters had the better argument--and the dissenters were taking a functional view of what should count as a machine gun as against the majority's more formalistic view. Under a functional view, precise wording is less important than policy goals, which is more or less the opposite of the lesson that the AI podcasters drew from my campus speech column. (To be clear, I wasn't inconsistent. I didn't claim in the campus speech column that how one describes some policy is always important.)

There are other minor flaws in the AI podcasters' efforts. Overall, it sounded like a conversation among people who read my columns, lacked legal training, were reasonably smart, and got about half of what I was saying but didn't really follow a number of key points. Based on this experience, under no circumstances would I entrust any serious research task to NotebookLM, although I could imagine using it as a starting point, so long as I or some other human were to thoroughly check its work.

At least so far as voices go, we have passed out of the uncanny valley. The AI podcasters sound like real people. I find it a little odd that Google has conquered that challenge--which I would have thought the more difficult one--but not the challenge of avoiding obvious errors that should be avoidable with enough data, like recognizing that Roe v. Wade has already been overruled.

To be clear, this isn't just a glitch in the podcast. As a second test, I used NotebookLM in chat mode, again based on the same Notebook consisting of those eight most recent Verdict columns. I asked the following question: "Does Dorf worry that a future Trump administration might restrict abortion even without overturning Roe v. Wade?"

A human interlocutor who had read my eight articles and was even modestly aware of current events would have said something like this: "Dorf does worry that a future Trump administration might restrict abortion, but your question incorrectly assumes that Roe v. Wade has not already been overturned. It was overturned in 2022." Here's how NotebookLM actually responded:

Yes, Dorf expresses concern that a future Trump administration could significantly restrict abortion access even without overturning Roe v. Wade. While he doesn't explicitly state this worry, he highlights potential avenues for Trump to achieve this:

● Exploiting Executive Power: Dorf points out that despite Trump's current stance of leaving abortion to states, there's no guarantee he wouldn't change his position if it served his interests, especially as a second-term president less concerned about reelection. Dorf argues that Trump could exploit executive power to undermine abortion rights, even without new legislation:

● Revoking Mifepristone Approval: He suggests a Trump-appointed FDA head could revoke the approval of mifepristone, the abortion pill, despite the Supreme Court dismissing a challenge to its approval on standing grounds.

● Enforcing the Comstock Act: Dorf also highlights the possibility of a second Trump administration enforcing the Comstock Act to restrict the distribution of abortion pills by mail, even in states where abortion is legal. This strategy was suggested by some on the right after a Trump-appointed judge invoked the Act in a mifepristone ruling.

● Congressional Action is Unlikely to Help: Dorf believes that a federal abortion ban is unlikely unless Republicans control both Congress and the presidency. He sees little hope for Congressional action to protect abortion rights, even with a Democratic majority, due to potential roadblocks like the filibuster.

While acknowledging the limitations of state power against federal overreach, Dorf emphasizes that protecting rights requires not just legal frameworks but also elected officials who genuinely value those rights. He suggests that the future of abortion rights hinges significantly on the outcome of presidential and congressional elections.

That's an excellent summary of one of my columns, but the preface--repeating the error I intentionally embedded in my question--reflects a certain, dare I say, "robotic" quality.

I thus continue to have doubts about the ability of any LLM (large language model) to achieve AGI (artificial general intelligence). I would not be surprised if the LLM approach reaches a limit in the next few years. I have no reason to doubt that AGI is possible; I just think there are grounds to doubt that the LLM architecture is the way to achieve it.