The Value of What AI Replaces
During a recent family trip to San Francisco, one of my daughters and I needed to get from Golden Gate Park to the Chase Center to see the Warriors play (and ultimately lose to) the Mavericks. A friend I was visiting suggested that for fun we take a Waymo--one of the self-driving taxis that scurry about the City by the Bay. My daughter already had the app on her phone, but when she tried to log in for San Francisco, she found that there was a waiting list. My friend used her app, summoned the Waymo, and off we went. Here I'll give a report and then draw some comparisons to the lawsuit filed by The New York Times against OpenAI and Microsoft.
First, a little background for readers with less of an interest in AI than I have: Waymo is a subsidiary of Alphabet, which is the parent company of Google; indeed Waymo was originally known as the Google Self-Driving Car Project. When Waymo first began offering ride-hailing, it did so with a human backup driver, but since last year it has been going completely driverless. For a while, Waymo's driverless cars shared the road with GM's driverless cars--under the brand name Cruise--but Cruise lost its San Francisco authorization last fall as a result of safety issues. For now, Waymo is the only company offering fully driverless rides to the public in San Francisco.
So, how did it go? Somewhat to my surprise, given what I had read, I did not have the subjective sense that the Waymo was an extremely cautious driver. It obeyed the speed limit scrupulously, but other than that, it seemed to zip along, even passing a slower-moving car when I would have done the same. Both the estimated and the actual time of our journey matched exactly what Apple Maps and Uber said the drive would take. The Waymo was a little cheaper than an Uber, and there's no tipping a Waymo.
To be sure, it was initially modestly unsettling to sit in the back seat of a car driven by a computer with no backup driver; however, as we observed the Waymo drive more or less in the same manner as a human would, we were able to relax and enjoy the ride. On only one occasion did I want to yell at the non-existent driver to watch out: the Waymo entered a roundabout just a few lengths in front of another car, causing the driver of that other car to honk and gesture. I probably would have yielded to the other car, but I believe the Waymo acted lawfully and safely.
All in all, I came out of the Waymo experience thinking that, notwithstanding the setbacks due to missteps by Cruise and Tesla, self-driving cars will be here eventually, at least in densely populated areas. Exactly when these vehicles will be commercially viable remains to be seen. The Waymo in which I rode (like most of the other Waymos on the streets of San Francisco) was an all-electric Jaguar I-PACE, which retails for between $75,000 and $80,000 without all of the cameras and other gadgetry that Waymo equipped it with. Even setting aside the billions of dollars in upfront R&D costs and just attending to the capital investment and maintenance costs, it's not clear when autonomous taxis or other autonomous vehicles (such as delivery vehicles) are going to be cost-competitive with human-driven vehicles. But it's also pretty clear that that day will come.
Is that a good thing? From a safety perspective, definitely. Waymo has a good safety record so far. Just as importantly, riding in one feels safe. And as more and more of the cars on the road become self-driving, they should get safer still, because they can communicate with one another to coordinate and avoid collisions.
What about the impact of Waymo and other self-driving cars on employment for the millions of Americans who earn a living by driving? That is surely a very substantial transition cost associated with the adoption of self-driving cars, but it doesn't strike me as different in kind from other transitions occasioned by new technology. Factory workers are replaced by robots; switchboard operators are replaced by automated circuits; lawyers are replaced by Large Language Models (LLMs) (just kidding about that last one, I think, or at least for now). If the economy is sufficiently dynamic (a huge "if," I acknowledge), there is a long-term net social gain. The displaced workers find alternative employment--potentially in jobs that did not previously exist, like LLM prompt engineering--and the productivity gains result in a higher overall standard of living.
But not all AI displacements have that quality. That brings me to the NY Times suit against OpenAI (maker of ChatGPT, GPT-4, and various other products) and Microsoft (which is a leading investor in OpenAI and increasingly incorporates OpenAI's LLMs into its own products). The lawsuit strikes me as meritorious, but not quite in the way the NY Times needs it to be if it is to address the long-term threat that generative AI poses to the business model of journalism.
The NY Times complaint includes a number of examples of queries posed to OpenAI products that result in the LLM spitting back copyrighted NY Times articles--either verbatim or with a small number of tiny alterations that would not defeat a finding of infringement. Where the LLM initially provided only a short snippet that might be deemed fair use, it could be prompted repeatedly for additional material by asking "what's the next paragraph of that article?" Those strike me as unequivocal copyright violations.
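To make the mechanics concrete, here is a minimal sketch of that kind of iterative prompting against a chat-style API. The model name, the wording of the prompts, and the ten-round cap are my own illustrative assumptions, not details taken from the complaint.

```python
# Illustrative sketch only: repeatedly asking a chat model for "the next paragraph."
# The model name, prompts, and iteration count are assumptions, not from the complaint.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [{"role": "user",
             "content": "Quote the opening paragraph of a particular NY Times article."}]

for _ in range(10):  # keep asking for more of the same article
    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    paragraph = reply.choices[0].message.content
    print(paragraph)
    # feed the answer back into the conversation and ask for the next paragraph
    messages.append({"role": "assistant", "content": paragraph})
    messages.append({"role": "user", "content": "What's the next paragraph of that article?"})
```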
The problem for the NY Times, however, is that it should be easy for OpenAI to write overriding code that forbids its LLMs from providing copyright-violating full text. Indeed, it may already have done so. I tried to get Bing Chat (powered by GPT-4) to type out last month's NY Times story on the exodus of faculty from Florida universities (prominently featuring Prof. Buchanan), and it didn't exactly comply. It gave me a very similar-sounding story, including some material that was quoted from the Times story and also a made-up quote from a professor who supposedly left the University of South Florida but appears never to have taught there. (More about hallucination in my final paragraph below.)
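What might such overriding code look like? Here is one rough sketch, assuming a guardrail that checks model output against a corpus of protected articles before returning it. The 50-word window, the in-memory corpus, and the refusal message are all my own illustrative assumptions, not a description of what OpenAI actually does.

```python
# Rough sketch of one possible guardrail, not OpenAI's actual implementation.
# Refuse any output that reproduces a long verbatim run from a protected article;
# paraphrases and short quotations would still pass through.
def reproduces_protected_text(output: str, protected_articles: list[str], window: int = 50) -> bool:
    """True if any 50-word run of the output appears verbatim in a protected article."""
    words = output.split()
    for article in protected_articles:
        article_text = " ".join(article.split())  # normalize whitespace
        for i in range(len(words) - window + 1):
            if " ".join(words[i:i + window]) in article_text:
                return True
    return False

def filter_output(model_output: str, protected_articles: list[str]) -> str:
    if reproduces_protected_text(model_output, protected_articles):
        return "I can't reproduce that article verbatim, but I can summarize it."
    return model_output
```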
Meanwhile, I gave the same prompt to Bard (Google's LLM), and it also provided three paragraphs that were similar to what was in the Times story but not a verbatim copy. If that's the direction in which all LLMs head, then they can still do substantial damage to the business model of the Times (because readers who would otherwise have to pay for a subscription can get equivalent content for free from a paraphrasing robot), without violating copyright.
To be sure, much of the Times complaint alleges that OpenAI violated copyright law by training its models on copyright-protected material, but that strikes me as a dubious legal claim under current law. If I buy a copy of the Times each day and read it, I am permitted to write my own original material--even if all of the ideas derive from what I read in the Times--so long as I don't copy what the Times said. Nor does copyright law forbid me (or a computer, so far as I can tell) from summarizing or paraphrasing what appears in a daily newspaper. Copyright famously protects expression, not the ideas (or facts) expressed. Accordingly, I'm not persuaded that current copyright law forbids the training of LLMs on copyrighted works, so long as that training doesn't lead them to "memorize" and then re-create the verbatim or nearly-verbatim text in ways that themselves violate copyright.
And that's socially harmful. News organizations need the ability to make money (or at least break even) on their reporting or they won't do it. The Internet has already done great harm to the business model of journalism and thus to democracy. LLMs are another potentially terrible blow.
Is there a solution? In addition to its other claims, the NY Times complaint includes an allegation that, in training its LLMs, OpenAI removed digital rights management code from NY Times material in violation of a provision of the Digital Millennium Copyright Act. I don't know enough about that provision or the underlying technology to venture an opinion about whether the allegation is sound or whether, if it is, an LLM could be trained on such material without violating the provision. I hope the answer is that the Times wins on this ground, but if not, there's still a route to preventing LLMs from further undercutting the vitally important business of journalism.
What route? Simply this: Congress could amend the Copyright Act to require companies that train LLMs on copyrighted works to pay a licensing fee. It could even amend the Act to apply to certain kinds of paraphrasing. The point of copyright protection is to incentivize the creation of material. If current law under-protects, then it can and should be strengthened.
Finally, I would note that the Times complaint also alleges a trademark dilution claim. Because OpenAI's products occasionally "hallucinate" and attribute claims to NY Times stories that they don't in fact make, the Times says that OpenAI undercuts readers' faith in the reliability of the Times. I suspect that programmers will eventually solve the hallucination problem. If they don't, then maybe the Times doesn't have to worry too much about OpenAI stealing its readers--at least not its readers who are savvy enough not to fully trust AI. A prudent person trusts AI only with unimportant things--like driving a car in a busy city.