Google is trying to make waves with Gemini, its flagship suite of generative AI models, apps, and services.
So what is Gemini? How can you use it? And how does it stack up against the competition?
To make it easier to keep up with the latest Gemini developments, we've put together this handy guide, which we'll keep updated as new Gemini models, features, and news about Google's plans for Gemini are released.
What is Gemini?
Gemini is Google's long-promised next-generation GenAI model family, developed by Google Research and DeepMind, Google's AI lab. It comes in three flavors:
- Gemini Ultra, the highest-performing Gemini model.
- Gemini Pro, a “lite” Gemini model.
- Gemini Nano, a smaller “distilled” model that runs on mobile devices like the Pixel 8 Pro.
All Gemini models were trained to be “natively multimodal,” that is, able to work with and use more than just words. They were pretrained and fine-tuned on a variety of audio, images, and videos, a large set of codebases, and text in different languages.
This sets Gemini apart from models such as Google's own LaMDA, which was trained exclusively on text data. LaMDA can't understand or generate anything other than text (e.g., essays, email drafts), but that isn't the case with Gemini models.
What is the difference between the Gemini app and the Gemini model?
![the bard of google](https://techcrunch.com/wp-content/uploads/2023/09/Extensions-title.png)
Image Credits: Google
Google, proving once again that it lacks a knack for branding, didn't make it clear from the outset that the Gemini models are separate and distinct from the Gemini apps on the web and mobile (formerly Bard). The Gemini apps are simply an interface through which certain Gemini models can be accessed. Think of them as a client for Google's GenAI.
The Gemini apps and models are also entirely independent of Imagen 2, Google's text-to-image model, which is available in some of the company's dev tools and environments.
What can Gemini do?
Because the Gemini models are multimodal, they can in theory perform a range of multimodal tasks, from transcribing speech to captioning images and videos to generating artwork. Some of these capabilities have yet to reach the product stage (more on that later), but Google is promising all of them, and more, at some point in the not-too-distant future.
Of course, it's a bit hard to take the company at its word.
Google seriously underdelivered with the original Bard launch. And more recently, a video purporting to demonstrate Gemini's capabilities was revealed to have been doctored and more or less aspirational.
Still, assuming Google is being more or less truthful with its claims, here's what the different tiers of Gemini will be able to do once they reach their full potential:
Gemini Ultra
Google says that, thanks to its multimodality, Gemini Ultra can be used to help with tasks like physics homework, solving problems step by step on a worksheet and pointing out possible mistakes in answers that have already been filled in.
Gemini Ultra can also be applied to tasks such as identifying scientific papers relevant to a particular problem, Google says. The model can extract information from those papers and “update” a chart from one of them by generating the formulas needed to re-create the chart with more recent data.
Gemini Ultra technically supports image generation, as mentioned earlier. However, that capability hasn't yet made its way into the productized version of the model, perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Rather than feeding prompts to an image generator (like DALL-E 3, in ChatGPT's case), Gemini outputs images “natively,” without an intermediary step.
Gemini Ultra is available as an API through Vertex AI, Google's fully managed AI developer platform, and AI Studio, Google's web-based tool for app and platform developers. It also powers the Gemini apps, but not for free. Access to Gemini Ultra through what Google calls Gemini Advanced requires subscribing to the Google One AI Premium plan, which costs $20 per month.
The AI Premium plan also connects Gemini to your wider Google Workspace account, including emails in Gmail, documents in Docs, spreadsheets in Sheets, and recordings in Google Meet. That's useful for, say, summarizing emails or having Gemini capture notes during a video call.
Gemini Pro
Google says Gemini Pro has improved reasoning, planning, and understanding capabilities over LaMDA.
An independent study by Carnegie Mellon and BerriAI researchers found that an early version of Gemini Pro was indeed better than OpenAI's GPT-3.5 at handling longer and more complex reasoning chains. However, the study also found that, like all large language models, this version of Gemini Pro particularly struggled with math problems involving several digits, and users found examples of bad reasoning and obvious mistakes.
Google promised remedies, though, and the first arrived in the form of Gemini 1.5 Pro.
Designed as a drop-in replacement, Gemini 1.5 Pro improves on its predecessor in several areas, most significantly in the amount of data it can process. Gemini 1.5 Pro can take in ~700,000 words or ~30,000 lines of code, 35 times what Gemini 1.0 Pro can handle. And because the model is multimodal, it isn't limited to text. Gemini 1.5 Pro can analyze up to 11 hours of audio or an hour of video in a variety of languages, albeit slowly (e.g., searching for a scene in a one-hour video takes 30 seconds to a minute of processing).
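To put those capacity figures in perspective, here is a rough back-of-the-envelope sketch in Python. It derives the word capacity the 35x figure implies for Gemini 1.0 Pro and checks whether a document would fit. The helper `fits_in_context` is hypothetical, and counting whitespace-separated words is only an approximation: models measure input in tokens, not words, so treat the result as an estimate.

```python
# Approximate figures quoted in this guide.
GEMINI_1_5_PRO_WORDS = 700_000
IMPROVEMENT_FACTOR = 35

# Capacity implied for Gemini 1.0 Pro by the two figures above.
gemini_1_0_pro_words = GEMINI_1_5_PRO_WORDS // IMPROVEMENT_FACTOR


def fits_in_context(text: str, word_limit: int) -> bool:
    """Hypothetical helper: rough check using a whitespace word count."""
    return len(text.split()) <= word_limit


# A stand-in document of 50,000 words.
document = "word " * 50_000

print(gemini_1_0_pro_words)                             # 20000
print(fits_in_context(document, gemini_1_0_pro_words))  # False
print(fits_in_context(document, GEMINI_1_5_PRO_WORDS))  # True
```

In other words, a 50,000-word manuscript would be too large for the older model's implied ~20,000-word capacity but would fit comfortably in Gemini 1.5 Pro.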
Gemini 1.5 Pro entered public preview on Vertex AI in April.
An additional endpoint, Gemini Pro Vision, can process text and imagery, including photos and video, and output text along the lines of OpenAI's GPT-4 with Vision model.
![gemini](https://techcrunch.com/wp-content/uploads/2024/01/structured_prompt.png)
Using Gemini Pro in Vertex AI. Image Credits: Gemini
Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases using a fine-tuning or “grounding” process. Gemini Pro can also be connected to external, third-party APIs to perform particular actions.
AI Studio has workflows for creating structured chat prompts using Gemini Pro. Developers have access to both the Gemini Pro and Gemini Pro Vision endpoints, and they can adjust the model temperature to control the output's creative range, supply examples to give tone and style guidance, and tune the safety settings.
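For readers wondering what “temperature” actually changes, here is a minimal Python sketch of the standard technique used by language models in general: dividing the model's raw scores by the temperature before converting them to probabilities. Google hasn't detailed how Gemini applies the setting internally, so this is an illustration of the general idea, not Gemini's implementation. The function name and the scores are made up for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up scores for three candidate next words.
logits = [2.0, 1.0, 0.1]

cool = softmax_with_temperature(logits, 0.2)  # low temperature
warm = softmax_with_temperature(logits, 1.0)  # higher temperature

print([round(p, 3) for p in cool])
print([round(p, 3) for p in warm])
```

At the lower temperature, nearly all of the probability lands on the top-scoring word, so output is more predictable. At the higher temperature, the alternatives get a larger share, which is what gives the output more “creative range.”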
Gemini Nano
Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it's efficient enough to run (some) tasks directly on a phone instead of sending them to a server. So far, it powers a couple of features on the Pixel 8 Pro, Pixel 8, and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.
The Recorder app, which lets users record and transcribe audio with the press of a button, includes a Gemini-powered summary of recorded conversations, interviews, presentations, and other content. Users get these summaries even when no signal or Wi-Fi connection is available, and in a nod to privacy, no data leaves the phone in the process.
Gemini Nano is also in Gboard, Google's keyboard app. There, it powers a feature called Smart Reply, which helps suggest the next thing you'll want to say when chatting in a messaging app. The feature initially works only with WhatsApp but will come to more apps over time, Google says.
And in the Google Messages app on supported devices, Nano enables Magic Compose, which can craft messages in styles such as “excited,” “formal,” and “lyrical.”
Is Gemini better than OpenAI's GPT-4?
Google has several times touted Gemini's superiority on benchmarks, claiming that Gemini Ultra exceeds current state-of-the-art results on “30 of the 32 widely used academic benchmarks used in large language model research and development.” Meanwhile, the company says that Gemini 1.5 Pro is better than Gemini Ultra at tasks like outlining content, brainstorming, and writing in some scenarios. Presumably this will change when the next Ultra model is released.
But leaving aside the question of whether benchmarks really indicate a better model, the scores Google points to appear to be only marginally better than those of OpenAI's corresponding models. And, as mentioned earlier, some early impressions haven't been great. Users and academics have pointed out that the older version of Gemini Pro tends to get basic facts wrong, struggles with translations, and gives poor coding suggestions.
How much does Gemini cost?
Gemini 1.5 Pro is free to use in the Gemini apps and, for now, in AI Studio and Vertex AI.
Once Gemini 1.5 Pro exits preview in Vertex, however, input to the model will cost $0.0025 per character, while output will cost $0.00005 per character. Vertex customers pay per 1,000 characters (about 140 to 250 words) and, for models like Gemini Pro Vision, per image ($0.0025).
Let's say a 500-word article contains 2,000 characters. Summarizing that article with Gemini 1.5 Pro would cost $5. Meanwhile, generating an article of similar length would cost $0.10.
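The arithmetic behind that example can be sketched in a few lines of Python. This takes the two per-character rates quoted above at face value and treats the $0.00005 figure as the rate for generated text, which is how the worked example reads. The constant and function names are made up for illustration, and actual billing may differ.

```python
from decimal import Decimal

# Per-character rates as quoted in this guide (assumed, not verified here).
INPUT_PRICE_PER_CHAR = Decimal("0.0025")    # text sent to the model
OUTPUT_PRICE_PER_CHAR = Decimal("0.00005")  # text the model generates


def cost(characters: int, price_per_char: Decimal) -> Decimal:
    """Cost of processing a number of characters at a per-character rate."""
    return characters * price_per_char


article_chars = 2_000  # the 500-word article from the example

summarize_cost = cost(article_chars, INPUT_PRICE_PER_CHAR)
generate_cost = cost(article_chars, OUTPUT_PRICE_PER_CHAR)

print(f"Summarize: ${summarize_cost:.2f}")  # Summarize: $5.00
print(f"Generate:  ${generate_cost:.2f}")   # Generate:  $0.10
```

`Decimal` is used instead of floats so that small currency amounts multiply exactly.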
Ultra pricing has not yet been announced.
Where can I try Gemini?
Gemini Pro
The easiest place to experience Gemini Pro is in the Gemini apps, where Pro and Ultra answer queries in a range of languages.
Gemini Pro and Ultra are also accessible in preview in Vertex AI via an API. The API is free to use “within limits” for the time being and supports certain regions, including Europe, as well as features like chat functionality and filtering.
Elsewhere, Gemini Pro and Ultra can be found in AI Studio. Using the service, developers can iterate on prompts and Gemini-based chatbots and then get API keys to use them in their apps, or export the code to a more fully featured IDE.
Code Assist (formerly Duet AI for Developers), Google's suite of AI-powered assistance tools for code completion and generation, uses Gemini models. Developers can make “large-scale” changes across codebases, such as updating cross-file dependencies and reviewing large chunks of code.
Google has brought Gemini models to its dev tools for Chrome and the Firebase mobile dev platform, as well as to its database creation and management tools. And it has launched a new security product underpinned by Gemini: Gemini in Threat Intelligence, a component of Google's Mandiant cybersecurity platform that can analyze large portions of potentially malicious code and let users perform natural language searches for ongoing threats or indicators of compromise.