When generative AI products started rolling out to the general public last year, they kicked off a frenzy of excitement and fear.
People were amazed at the images and words these tools could create from just a single text prompt. Silicon Valley salivated over the prospect of a transformative new technology, one that it could make a lot of money off of after years of stagnation and the flops of crypto and the metaverse. And then there were the concerns about what the world would be after generative AI transformed it. Millions of jobs could be lost. It might become impossible to tell what was real or what was made by a computer. And if you want to get really dramatic about it, the end of humanity may be near. We glorified and dreaded the incredible potential this technology had.
Several months later, the bloom is coming off the AI-generated rose. Governments are ramping up efforts to regulate the technology, creators are suing over alleged intellectual property and copyright violations, people are balking at the privacy invasions (both real and perceived) that these products enable, and there are plenty of reasons to question how accurate AI-powered chatbots really are and how much people should depend on them.
Assuming, that is, they’re still using them. Recent reports suggest that consumers are starting to lose interest: The new AI-powered Bing search hasn’t made a dent in Google’s market share, ChatGPT is losing users for the first time, and the bots are still prone to basic errors that make them impossible to trust. In some cases, they may be even less accurate now than they were before. Is the party over for this party trick?
Generative AI is a powerful technology that isn’t going anywhere anytime soon, and the chatbots built with this new technology are one of the most accessible tools for consumers, who can directly access and try them out for themselves. But recent reports suggest that, as the initial burst of excitement and curiosity fades, people may not be as into chatbots as many expected.
OpenAI and its ChatGPT chatbot quickly took the lead as the buzziest generative AI company and tool out there, no doubt helped along by being one of the first companies to release tools to the general public, as well as a partnership with Microsoft worth billions of dollars. That partnership led to Microsoft’s big February announcement about how it was incorporating a custom chatbot built with OpenAI’s large language model (LLM) — this is also what powers ChatGPT — into Bing, its web search engine. Microsoft hailed generative AI-infused search as the future of web search. Instead of getting a bunch of links or knowledge windows back, this new AI chatbot would combine information from multiple websites into one response.
There was plenty of hype, and Bing suddenly went from being a punchline to a potential rival in a market so completely dominated by Google that it’s literally synonymous with it. Google rushed to release a chatbot of its own, called Bard. Meta, not to be outdone and possibly still smarting from its disastrous metaverse pivot, released not one but two open source(ish) versions of its large language model. OpenAI licensed ChatGPT out to other companies, and dozens lined up to put it in their own products.
That reinvention of web search may be a longer way off than the excitement from a few months ago suggested, assuming it happens at all. A recent Wall Street Journal article said that the new Bing isn’t catching on with consumers, citing two different analytics firms that had Bing’s market share at roughly the same level now as it was in the pre-AI days of January. (Microsoft told WSJ that those firms were underestimating the numbers but wouldn’t share its internal data.) According to Statcounter, Microsoft’s web browser, Edge, which consumers had to use in order to access Bing Chat, did get a user bump, but it barely moved the needle and has already started to recede, while Chrome’s market share increased during that time. There is still hope for Microsoft, however. Once Bing Chat becomes accessible, or easier to access, on other, more popular browsers, it may well get more use. Microsoft told WSJ it plans to do this soon.
Meanwhile, OpenAI’s ChatGPT seems to be flagging, too. For the first time since its release last year, traffic to the ChatGPT website fell by almost 10 percent in June, according to the Washington Post. Downloads of its iPhone app have fallen off, too, the report said, although OpenAI wouldn’t comment on the numbers.
And Google has yet to integrate its chatbot into its search services as extensively as Microsoft did, keeping it off the main search page and continuing to frame it as an experimental technology that “may display inaccurate or offensive information.” Google didn’t respond to a request for comment on Bard usage numbers.
Google’s approach may be the right one, given how problematic some of these chatbots can be. We now have myriad examples of chatbots going off the rails, from getting really personal with a user to spouting off complete inaccuracies as truth to containing the inherent biases that seem to permeate all of tech. And while some of those issues have been mitigated by some companies to some degree along the way, things seem to be getting worse, not better. The Federal Trade Commission is looking into ChatGPT’s inaccurate responses. A recent study showed that OpenAI’s GPT-4, the newest version of its LLM, showed marked declines in accuracy in some areas in just a few months, indicating that, if nothing else, the model is changing or being changed over time, which can cause drastic differences in its output. And attempts by journalistic outlets to fill pages with AI-generated content have resulted in multiple and egregious errors. As chatbot-fueled cheating proliferated, OpenAI had to pull its own tool to detect ChatGPT-generated text because it sucked.
Last week, eight companies behind LLMs, including OpenAI, Google, and Meta, took their models to DEF CON, a massive hacker convention, to have as many people as possible test their models for accuracy and safety in a first-of-its-kind stress test, a process called “red teaming.” The Biden administration, which has been making a lot of noise about the importance of AI technology being developed and deployed safely, supported and promoted the event. President Biden’s science adviser and the director of the White House Office of Science and Technology Policy, Arati Prabhakar, told Vox it was a chance to “really figure out how well these chatbots are working; how hard or easy is it to get them to come off the rails?”
The goal of the challenge was to give the companies some much-needed data on if and how their models break, supplied by a diverse group of people who would presumably test them in ways the companies’ internal teams hadn’t. We’ll see what they do with that data, and it’s a good sign that they participated in the event at all, though the fact that the White House urged them to do so surely was a motivating factor.
In the meantime, these models and the chatbots created from them are already out there being used by hundreds of millions of people, many of whom will take what these chatbots say at face value, especially when they may not know that the information is coming from a chatbot in the first place (CNET, for example, barely disclosed which articles were written by bots). As various reports show waning public interest in some AI-powered tools, however, those tools need to get better if they are to survive. We also don’t even know if the technology actually can be fixed, given that even its own developers claim not to know all of its inner workings.
Generative AI can do some amazing things. There’s a reason why Silicon Valley is excited about it and so many people have tried it out. What remains to be seen is whether it can be more than a party trick, which, given its still-prevalent flaws, is probably all it should be for now.
A version of this story was also published in the Vox technology newsletter.