OpenAI has been dominating the world of artificial intelligence (AI) and chatbots lately, with its GPT-4 large language model (LLM) powering ChatGPT to runaway success. The company got an early lead, and everyone else has been playing catch-up ever since.
Yet OpenAI has a fresh challenger in the form of Google Gemini. This new arrival burst onto the scene in December 2023 and stunned onlookers with its impressive capabilities (even if the demos were somewhat exaggerated). We’ve been waiting for months to see what Google has up its sleeve, and the results look pretty spectacular.
But is it enough to defeat GPT-4? What can it do right now, and what about in the future? And if you want to use Gemini, how exactly do you do that? We’ve taken a deep dive into the world of Gemini to find the answers to all these questions and more. If you’re curious about Google’s latest AI efforts, this is the place to be.
Gemini is Google’s latest LLM. What’s an LLM? It’s the system that underpins the kinds of AI tools you’ve probably seen and interacted with online. For example, GPT-4 powers ChatGPT Plus, OpenAI’s advanced paid-for chatbot.
In Google’s case, Gemini will be woven into a wide array of tools, including the Bard chatbot, Google Search, and YouTube. In other words, Gemini isn’t a chatbot itself, but the “brain” that makes Bard (and other tools) tick.
Google also specified that it has created three variants, or “sizes,” of Gemini: Nano, Pro and Ultra. Nano is now inside the Pixel 8 Pro and destined for other mobile devices, while Gemini Pro has already found its way into Google Bard. Ultra, meanwhile, is designed for “highly complex tasks,” although it will also come to Bard once Google has completed extensive testing and safeguarding.
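Beyond Bard, Google also exposes Gemini Pro to developers through its API, and the variant you get is chosen by model name. Here’s a minimal sketch using Google’s google-generativeai Python package (the API key and prompt are placeholders):

```python
import google.generativeai as genai

# Placeholder: substitute a real key from Google AI Studio.
genai.configure(api_key="YOUR_API_KEY")

# The variant is picked by name; "gemini-pro" is the text-only
# Gemini Pro model Google exposed at launch.
model = genai.GenerativeModel("gemini-pro")

response = model.generate_content("Explain what an LLM is in one sentence.")
print(response.text)
```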
WHAT CAN GEMINI DO?
In a press release, Google explained that Gemini is a multimodal AI tool. In other words, it can deal with various forms of input and output, including text, code, audio, images and videos. That gives it a lot of flexibility to perform a wide range of tasks.
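To give a flavour of that multimodality, here’s a rough sketch of sending an image alongside a text prompt through the same Python package; “gemini-pro-vision” is the image-capable model name, and the photo filename is a stand-in:

```python
import PIL.Image
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Hypothetical local image; any photo will do.
photo = PIL.Image.open("whiteboard_sketch.jpg")

# "gemini-pro-vision" accepts mixed text-and-image input.
model = genai.GenerativeModel("gemini-pro-vision")

# generate_content takes a list of parts, mixing strings and images.
response = model.generate_content([photo, "Describe what is drawn here."])
print(response.text)
```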
Google’s Gemini launch event saw it showcase the tool’s abilities in a “hands-on” video, and it’s safe to say it was pretty mind-blowing (even if it wasn’t quite representative of today’s reality).
Gemini could be seen tracking a paper ball hidden under one of several shuffled cups and seeing through a user’s sleight-of-hand coin trick. It could predict what a dot-to-dot puzzle showed before a single line was drawn, and explain which path on a map might lead to danger and which to safety.
Better yet, all of this seemingly happened in real time, with a human asking Gemini a question and rapidly getting an accurate response. It suggested that natural, flowing conversations would be possible with Google’s chatbot. However, the reality might not quite live up to the video demo’s hype.
A separate Google blog post showed how the demo had actually been created – by feeding Gemini still image frames from the captured footage and prompting the model with text rather than voice. So while the video does show real outputs from Gemini, we’re still quite far from the real-time conversations it depicts.
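Google hasn’t published the exact prompts it used, but the approach the blog post describes – still frames plus a text prompt – is easy to approximate. A hypothetical sketch (the frame filenames and prompt are invented for illustration):

```python
import PIL.Image
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Hypothetical stills pulled from a video of the cup-shuffling trick.
frames = [PIL.Image.open(f"trick_frame_{i}.jpg") for i in (1, 2, 3)]

model = genai.GenerativeModel("gemini-pro-vision")

# One text prompt plus several still images, roughly mirroring how
# Google says the "hands-on" demo was actually driven.
prompt = "The ball starts under the left cup. Based on these frames, where is it now?"
response = model.generate_content([prompt, *frames])
print(response.text)
```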
Gemini Pro has recently been incorporated into Google Bard but, as in the early days of other tools like ChatGPT (and earlier versions of Bard), it seems prone to mistakes.
For instance, it has struggled to name recent Oscar winners and to produce accurate code. It has also shown itself to be unreliable in non-English languages – one user on X (formerly Twitter) asked Gemini for a six-letter French word, and it responded with a five-letter one. (Then again, ChatGPT also sometimes struggles with this task.)