As fall approaches, two of the biggest players in tech, Google and OpenAI, are locked in a software race to be the first to launch the next generation of language models, known as multimodal models. These models can work with both images and text, which means they can take a sketch of a website and generate the code to build it, or analyze a visual chart and describe it in plain language without the help of an engineer.
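For illustration only, here is a minimal sketch of what a sketch-to-code request to a multimodal model could look like, assuming an OpenAI-style chat API that accepts image inputs. The model name, file name, and prompt are placeholder assumptions, not details from the report.

```python
# Illustrative sketch, not a confirmed product interface.
# Assumes the OpenAI Python SDK's chat API with image inputs;
# the model name below is a placeholder.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode a hand-drawn website sketch as a base64 data URL.
with open("website_sketch.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # placeholder model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Turn this sketch into HTML and CSS."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)

# The model's reply would contain the generated website code.
print(response.choices[0].message.content)
```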
Google has moved first: it is already sharing its upcoming multimodal LLM, Gemini, with a small group of outside companies. But OpenAI isn't ceding the lead. The Microsoft-backed startup is racing to integrate its most advanced LLM, GPT-4, with multimodal features comparable to what Gemini will offer, according to a person with knowledge of the matter.
OpenAI previewed those multimodal features when it unveiled GPT-4 in March, but made them available to only one company, Be My Eyes, which builds technology for people who are blind or have low vision. Six months later, OpenAI is preparing to roll out the features, known as GPT-Vision, to a wider audience.