What Is Google Gemini? - Built in


What Is Google Gemini?


Here’s everything you need to know about Google’s latest generative AI model.

Written by Ellen Glover


Image: Shutterstock

Updated by Matthew Urwin | Jun 12, 2024

Gemini is a family of AI models and the name of Google’s generative AI product. These models come in four versions of varying size and are being incorporated into several Google products, including Gmail, Docs and its search engine.
What is Google Gemini?

Gemini is a family of AI models created by Google to power many of its products, including its chatbot, also named Gemini, as well as Gmail, Docs and its search engine.

Gemini is multimodal, meaning its capabilities span text, image and audio
applications. It can generate natural written language, transcribe speeches,
create artwork, analyze videos and more, although not all of these capabilities
are yet available to the general public. Like other AI models, Gemini is
expected to get better over time as the industry continues to advance.

What Is Google Gemini?

Gemini is Google’s family of multimodal foundation models and the name of the company’s generative AI chatbot. Google is integrating Gemini across several of its products and sees it as the answer to OpenAI’s GPT-4, the multimodal large language model (LLM) that powers the paid version of ChatGPT. ChatGPT kicked off a generative AI arms race that has sent several tech companies scrambling to bring the latest and greatest products to market.

Launched in December 2023, Gemini is Google’s largest and most capable model to date, according to the company. It was developed by Google’s AI research labs DeepMind and Google Research and is the culmination of nearly a decade of work.

Gemini Models
The Gemini family comes in four versions, which vary in size and complexity:

Gemini 1.0 Ultra

Gemini 1.0 Ultra is the largest model for performing highly complex tasks,
according to Google. The company says it is the first model to outperform
human experts on a benchmark assessment that covers topics like physics, law
and ethics. The model is being incorporated into several of Google’s most
popular products, including Gmail, Docs, Slides and Meet. For $19.99 a
month, users can access Gemini 1.0 Ultra through the Gemini Advanced
service.

Gemini 1.5 Pro

Gemini 1.5 Pro is the middle-tier model, designed to understand complex queries and respond to them quickly. Google says it is suited for “a wide range of tasks” thanks to an expanded context window that improves its memory and recall. A specially trained version of Pro powers the AI chatbot Gemini, and the model is available via the Gemini API in Google AI Studio and Google Cloud Vertex AI.

Gemini 1.0 Nano

A much smaller version of the Pro and Ultra models, Gemini 1.0 Nano is
designed to be efficient enough to perform tasks directly on smart devices,
instead of having to connect to external servers. 1.0 Nano currently powers
features on the Pixel 8 Pro like Summarize in the Recorder app and Smart
Reply in the Gboard virtual keyboard app.

Gemini 1.5 Flash

Gemini 1.5 Flash is the newest member of the Gemini family. It is a smaller version of 1.5 Pro, built to respond much more quickly than its Gemini counterparts. 1.5 Pro trained 1.5 Flash through a process Google calls distillation. In distillation, the most essential knowledge and skills of a larger model are transferred to a smaller, more efficient one. 1.5 Flash keeps a long context window for handling hefty tasks while serving as a more cost-efficient alternative to larger models.
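Distillation is usually done by training the smaller “student” model to match the probability distributions produced by the larger “teacher” model. The minimal sketch below computes a textbook distillation loss for a single prediction over a four-word vocabulary. That loss is the KL divergence between the temperature-softened teacher and student outputs. Google has not published the exact recipe used for 1.5 Flash. Treat this as a generic illustration with hypothetical scores, not its actual training code.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Convert raw scores into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL divergence from the teacher's softened distribution to the student's.

    Zero when the student reproduces the teacher's probabilities; it grows
    as the two distributions diverge.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    # Scaling by T^2 keeps gradient sizes comparable across temperatures.
    return kl * temperature ** 2


# Hypothetical teacher scores over a four-word vocabulary.
teacher = [4.0, 1.0, 0.5, -2.0]
print(distillation_loss(teacher, teacher))               # student matches teacher
print(distillation_loss(teacher, [0.0, 0.0, 0.0, 0.0]))  # uninformed student
```

During training, the student’s parameters would be adjusted to push this loss down, often alongside a standard loss on the original training data.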



What Can Google Gemini Do?

Gemini is a multimodal model, so it is capable of responding to a range of content types, whether that be text, image, video or audio.

Generate Text

Gemini can generate text, whether to hold written conversations with users, proofread essays, write cover letters or translate content into different languages. Gemini can also understand, explain and generate code in some of the most popular programming languages, including Python, Java, C++ and Go.

Like any other LLM, though, Gemini has a tendency to hallucinate. “The
results should be used with a lot of care,” Subodha Kumar, a professor of
statistics, operations and data science at Temple University’s Fox School of
Business, told Built In. “They can come with a lot of errors.”

Produce Images

Gemini is able to generate images from text prompts, similar to other AI art generators like DALL-E, Midjourney and Stable Diffusion.

This capability was temporarily halted for retooling after Google was criticized on social media for producing images that depicted specific white figures as people of color. Image generators have developed a reputation for amplifying and perpetuating biases about certain races and genders. Google’s attempts to avoid this pitfall may have gone too far in the other direction, though.

Analyze Images and Videos

Gemini can accept image inputs and then analyze what is going on in those
images and explain that information via text. For example, a user can take a
photo of a flat tire and ask Gemini how to fix it, or ask Gemini for help on
their physics homework by drawing out the problem. Gemini can also process
and analyze videos, generate descriptions of what is going on in a given clip
and answer questions about it.
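Developers can send the same kind of image question through the Gemini API. The sketch below only builds the JSON request body for the REST `generateContent` method and does not send it. It relies on several assumptions, all of which may change. It assumes the `v1beta` endpoint, the `gemini-1.5-flash` model name, and the `inline_data`/`mime_type` field names as given in Google’s API documentation at the time of writing. It uses placeholder bytes instead of a real photo and a placeholder API key instead of a real credential.

```python
import base64
import json


def build_image_request(prompt: str, image_bytes: bytes,
                        mime_type: str = "image/jpeg") -> dict:
    """Build a generateContent request body pairing a text prompt with an image.

    The image travels inline as base64 text, next to the prompt, inside a
    single "contents" entry (field names assumed from the v1beta REST docs).
    """
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }]
    }


# Placeholder bytes stand in for a real photo, such as a flat tire.
body = build_image_request("How do I fix this tire?", b"\xff\xd8\xff\xe0fake-jpeg")
print(json.dumps(body, indent=2))

# The body would then be POSTed (not done here) to an endpoint such as:
#   https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=YOUR_API_KEY
```

Keeping the text and the image as separate “parts” of one message lets the model consider them together, which is what multimodal input means in practice.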

Understand Audio

When fed audio inputs, Gemini can support speech recognition across more
than 100 languages, and assist in various language translation tasks — as
shown in this Google demonstration.

Streamline Workflows

Gemini can be integrated into several Google Workspace products, including Gmail, Docs and Drive. Users can query Gemini (through its chatbot interface) to find a document in their Drive and summarize it, or automatically generate specific emails. “It becomes a little bit of an assistant in that sense,” Gen Furukawa, an AI expert and entrepreneur, told Built In.

Within more specific business contexts, professionals can use Gemini to produce drafts for blog posts, emails and advertisements in Docs; generate images for Slides presentations by inputting a text prompt and selecting a visual style; and even tailor their virtual background in Google Meet with a detailed text prompt.



How Does Google Gemini Work?

At a high level, Gemini learns patterns in its training data and generates new, original content based on those patterns.

To accomplish this, Gemini was trained on a large corpus of data. Like several
other LLMs, Gemini is a “closed-source model,” generative AI expert Ritesh
Vajariya told Built In, meaning Google has not disclosed what specific training
data was used. But the model’s dataset is believed to include annotated
YouTube videos, queries in Google Search, text content from Google Books
and scholarly research from Google Scholar. (Google has said that it did not
use any personal data from Gmail or other private apps to train Gemini.)

Under the hood, Gemini is built on the Transformer, a neural network architecture Google invented in 2017 that is now used by virtually all LLMs, including the ones that power ChatGPT.
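At the heart of the Transformer is an operation called attention, which lets each position in a sequence weigh information from other positions. Below is a minimal pure-Python sketch of scaled dot-product attention, the formulation from the 2017 Transformer paper. It is illustrative only. It skips the learned projections, multiple attention heads and many other components a production model has, and Gemini’s exact internals are not public. The `causal` flag mimics text-generating (decoder-style) models, where each position may only attend to itself and earlier positions.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention(queries: Matrix, keys: Matrix, values: Matrix,
              causal: bool = False) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Each output row is a weighted average of value rows; the weights reflect
    how strongly that position's query matches each visible key.
    """
    d = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # With causal masking, position i sees only positions 0..i.
        visible = range(i + 1) if causal else range(len(keys))
        weights = softmax([dot(q, keys[j]) / math.sqrt(d) for j in visible])
        outputs.append([
            sum(w * values[j][c] for w, j in zip(weights, visible))
            for c in range(len(values[0]))
        ])
    return outputs


# Three hypothetical 2-dimensional token vectors, reused as Q, K and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(tokens, tokens, tokens, causal=True):
    print([round(x, 3) for x in row])
```

Because every output row blends information from the whole visible context, stacking many such layers lets a model relate words that sit far apart in the input.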

When a user types a prompt or query into Gemini, the Transformer produces a probability distribution over the words or phrases that could follow that input text. It then selects one of them, typically a highly probable one. “It starts by looking at the first word, and uses probability to generate the next word, and so on,” AI expert Mark Hinkle told Built In.
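Hinkle’s description matches a method called greedy decoding. The toy sketch below applies it to a hypothetical, hand-written table of next-word probabilities. A real model computes a fresh distribution over its entire vocabulary at every step, conditioned on all of the preceding text, not just the previous word. In practice, chatbots often sample from the distribution (controlled by a temperature setting) instead of always taking the top choice. That is one reason the same prompt can produce different answers.

```python
from typing import Dict, List

# Hypothetical next-word probabilities, keyed only by the previous word.
NEXT_WORD: Dict[str, Dict[str, float]] = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "<end>": 0.2},
    "a": {"dog": 0.7, "cat": 0.3},
    "cat": {"sat": 0.8, "<end>": 0.2},
    "dog": {"ran": 0.6, "<end>": 0.4},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}


def greedy_decode(max_words: int = 10) -> List[str]:
    """Repeatedly append the single most probable next word."""
    words: List[str] = []
    current = "<start>"
    for _ in range(max_words):
        candidates = NEXT_WORD[current]
        current = max(candidates, key=candidates.get)  # pick the top choice
        if current == "<end>":
            break
        words.append(current)
    return words


print(" ".join(greedy_decode()))  # the cat sat
```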

Gemini can also process images, videos and audio. It was trained on trillions of pieces of text, images (along with their accompanying text descriptions), videos and audio clips. It was then fine-tuned using reinforcement learning from human feedback (RLHF). This method incorporates human feedback into the training process so the model can better align its outputs with user intent.
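In a common formulation of RLHF, people compare pairs of model responses, and a separate reward model learns to score the preferred response higher. The LLM is then tuned to produce responses that the reward model rates well. The sketch below shows only the reward model’s pairwise training loss, a Bradley-Terry-style objective. Google has not published Gemini’s exact procedure, so treat this as a generic illustration with hypothetical scores.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise reward-model loss: -log(sigmoid(r_chosen - r_rejected)).

    Small when the reward model already scores the human-preferred response
    higher; large when it favors the response people rejected.
    """
    margin = reward_chosen - reward_rejected
    # Two algebraically equal forms, chosen by sign so math.exp never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores for two responses to the same prompt,
# where a human labeler preferred the first one.
print(preference_loss(2.0, -1.0))  # model agrees with the labeler: low loss
print(preference_loss(-1.0, 2.0))  # model disagrees: high loss
```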

Because Gemini was trained on all these media at once, Google claims it can “seamlessly understand and reason about” a variety of inputs. Examples include reading the text on a photo of a sign or generating a story based on an illustration.


Gemini vs. GPT-4o

Both the Gemini and GPT-4o language models share several similarities in
their underlying architecture and capabilities. But they also have some
significant differences that impact the user experience and functionalities of
their associated chatbots, Gemini and ChatGPT, respectively.

Gemini Has a Broader Context Window Than GPT-4o

Both Gemini 1.5 Pro and 1.5 Flash offer long context windows. A context window is the amount of input, measured in tokens (small chunks of text), that a model can consider at once. 1.5 Pro’s context window holds up to 2 million tokens, and 1.5 Flash’s holds up to 1 million. GPT-4o’s context window pales in comparison at 128,000 tokens. Alphabet CEO Sundar Pichai has called Gemini’s context window “the longest context window of any foundational model yet,” and that claim appears to hold for the time being.

As a result, 1.5 Pro and 1.5 Flash should have a greater ability than GPT-4o to handle dense information and challenging tasks.
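To put those numbers in perspective, 2,000,000 / 128,000 is about 15.6. That means 1.5 Pro can take in roughly 15 times as much input as GPT-4o. The sketch below checks which models could hold a document of a given length. It assumes the rough rule of thumb that one token is about four characters of English text. Actual counts depend on each model’s tokenizer and the language of the text, and the sketch ignores the room needed for the model’s reply.

```python
import math
from typing import List

# Context window sizes (in tokens) as stated in this article.
CONTEXT_WINDOWS = {
    "Gemini 1.5 Pro": 2_000_000,
    "Gemini 1.5 Flash": 1_000_000,
    "GPT-4o": 128_000,
}

# Assumption: roughly 4 characters of English text per token.
CHARS_PER_TOKEN = 4


def estimated_tokens(num_chars: int) -> int:
    """Approximate token count for a text of num_chars characters."""
    return math.ceil(num_chars / CHARS_PER_TOKEN)


def models_that_fit(num_chars: int) -> List[str]:
    """Models whose context window could hold the whole text (ignoring the reply)."""
    needed = estimated_tokens(num_chars)
    return [name for name, window in CONTEXT_WINDOWS.items() if needed <= window]


ratio = CONTEXT_WINDOWS["Gemini 1.5 Pro"] / CONTEXT_WINDOWS["GPT-4o"]
print(f"1.5 Pro holds {ratio:.1f}x as many tokens as GPT-4o")
# A hypothetical 2-million-character manuscript, about 500,000 tokens here.
print(models_that_fit(2_000_000))
```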

Gemini Has Real-Time Access to the Internet, But GPT-4o Is Catching Up

Gemini has always had real-time access to Google’s search index, which can
“keep feeding” the model information, Hinkle said. So the Gemini chatbot can
draw on data pulled from the internet to answer queries, and is fine-tuned to
select data chosen from sources that fit specific topics, such as scientific
research or coding.
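Google hasn’t detailed exactly how search results reach the model. A common pattern, often called retrieval-augmented generation, fetches relevant documents and places them in the prompt alongside the user’s question. The toy sketch below follows that pattern, ranking snippets by word overlap within a hypothetical three-snippet index. Real search systems rank results far more sophisticatedly, and this is not Gemini’s implementation.

```python
import re
from typing import List, Set

# A hypothetical three-snippet "index"; the snippets restate facts from this
# article. A real search index holds billions of pages.
INDEX = [
    "Gemini launched in December 2023.",
    "Gemini 1.5 Pro has a context window of up to 2 million tokens.",
    "TPUs are Google's in-house AI chips.",
]


def words(text: str) -> Set[str]:
    """Lowercase word set, used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, k: int = 2) -> List[str]:
    """Return up to k snippets sharing the most words with the query."""
    q = words(query)
    ranked = sorted(INDEX, key=lambda doc: len(q & words(doc)), reverse=True)
    return [doc for doc in ranked[:k] if q & words(doc)]


def grounded_prompt(query: str) -> str:
    """Place retrieved snippets ahead of the question so the model can use them."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return f"Answer using these search results:\n{context}\n\nQuestion: {query}"


print(grounded_prompt("When was Gemini launched?"))
```

The model itself is unchanged. It simply receives fresher, topic-specific text in its prompt, which is how a search index can “keep feeding” it information.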

Users previously had to subscribe to ChatGPT Plus to get access to a plug-in that allows them to browse Bing, a search engine owned and operated by OpenAI’s biggest partner, Microsoft. However, GPT-4o promises real-time internet access, closing the information gap between it and Gemini.

Gemini Was Trained on TPUs, GPT-4o Was Trained on GPUs

Google trained Gemini on its in-house AI chips, called tensor processing units
(TPUs). Specifically, it was trained on the TPU v4 and v5e, which were
explicitly engineered to accelerate the training of large-scale generative AI
models. In the future, Gemini will be trained on the v5p, Google’s fastest and
most efficient chip yet. Meanwhile, GPT-4o was trained on Nvidia’s H100
GPUs, one of the most sought-after AI chips today.

TPUs are designed to handle the computational demands of machine learning with more speed and efficiency than GPUs, making them an essential component of the AI industry’s future.

How Does Gemini Compare to Other LLMs?

Google’s commitment to speed has paid off in some ways, with Gemini 1.5
Flash ranking as the fastest model on the market and one of the cheapest
options, second only to Meta’s Llama 3 model. However, the focus on going
fast has come with a price, with 1.5 Flash falling to the middle of the pack in
terms of overall quality. GPT-4o, GPT-4 Turbo, Claude 3 Opus and Llama 3 all
rank ahead of 1.5 Flash in the quality index.

Ultimately, determining the best LLM depends on a user’s preferences and what they’re looking to get out of a generative AI tool. Gemini 1.5 Flash is a promising option in many respects, but users who don’t view cost-efficiency as a priority may consider other models.

How to Access Google Gemini

Gemini can be accessed in several ways:

For free: You can head to gemini.google.com and use it for free through the
Gemini chatbot. Or you can download the Gemini app on your smartphone.
Android users can also replace Google Assistant with Gemini.

Paid version: You can also subscribe to the Gemini Advanced service for $19.99 a month. It gives you access to updated versions of popular products like Gmail, Docs, Slides and Meet, all with Gemini Ultra built in.

Gemini is a work in progress, so it might generate answers that are inaccurate, unhelpful or even offensive. And it retains users’ conversations, location, feedback and usage information, according to Google’s privacy policy. So users may want to avoid consulting Gemini for professional advice on sensitive or high-stakes subjects (like health or finance), and refrain from discussing private or personal information with the AI tool.


Google was not available for an interview at the time of reporting.

© Built In 2024