Google enters the AI agent race

Google has released Gemini 2.0, which powers its first AI agent

Martin Crowley
December 12, 2024

Google has launched Gemini 2.0 and announced that it powers the company’s first-ever AI agent, Project Mariner, which can autonomously move the cursor, click buttons, browse the web, and perform certain web-based tasks within the Chrome browser.

It works by taking screenshots of the browser window (users must first agree to this) and sending them to the cloud for processing. Gemini then sends instructions back to the computer, telling it how to navigate the web page or perform the desired action.
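For readers who want to picture that loop, here is a minimal sketch in Python. It is not Google's code: the `Action` schema, the `run_agent` function, and the stubbed `fake_capture` and `scripted_plan` helpers are all hypothetical names invented for illustration, and the "cloud" step is replaced by a scripted stand-in so the example runs without a browser or network connection.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One instruction sent back to the browser (hypothetical schema)."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    capture: Callable[[], bytes],
    plan: Callable[[bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat screenshot -> remote planning -> local action until done."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()               # 1. grab the browser window
        action = plan(screenshot, history)   # 2. stand-in for the cloud round trip
        if action.kind == "done":
            break
        execute(action)                      # 3. perform the action locally
        history.append(action)
    return history


# Stubs (assumptions) so the sketch runs on its own.
def fake_capture() -> bytes:
    return b"fake-png-bytes"


def scripted_plan(screenshot: bytes, history: List[Action]) -> Action:
    # A real planner would inspect the screenshot; this one replays a script.
    script = [
        Action("click", x=120, y=48),
        Action("type", text="flights to Lisbon"),
        Action("done"),
    ]
    return script[len(history)]


performed: List[str] = []
steps = run_agent(fake_capture, scripted_plan, lambda a: performed.append(a.kind))
print(performed)  # ['click', 'type']
```

Because every action needs a fresh screenshot and a round trip to the planner, a loop shaped like this would naturally run one step at a time, and the `max_steps` cap keeps it from running indefinitely.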

Project Mariner has been released to a small group of testers, who have confirmed it can do things like find items and add them to a shopping cart, fill out forms, find recipes, and search for flights and hotels. Although it can perform human-like tasks independently, it is slow, often taking around 5 seconds for each cursor movement. While it’s working, users cannot use other tabs, browsers, or applications; they have to sit and watch it work, which can be frustrating. It also won’t accept cookies, complete payments, or agree to service agreements. Google insists these were intentional decisions, as it wants to give users full visibility into what the agent is doing and full control over it.

Alongside the general-purpose AI agent, Google has also revealed three more ‘experimental’ agents for specific tasks (deep research, coding, and video gaming), meaning the company has well and truly entered the AI agent race.