
GPT-4o stuns with human-like interaction and visual understanding

Plus, Google announces huge number of AI advances

This Week in AI

It’s been a BIG week in AI news with both OpenAI and Google making significant announcements.

First, OpenAI announced a new, highly responsive, multimodal model named “GPT-4o”. The new flagship model seems impressive and will be coming soon to all user tiers, including free ChatGPT users (hinting that it’s far more efficient to run than earlier models such as GPT-4). You can read all about “o”, see lots of cool demo videos here, and then read Sam Altman’s blog comments. The new AI is designed to be far more conversational than previous efforts. It is interruptible, responds in roughly a third of a second, and sounds much more natural, with human-like expressions and emotion.

Then on Tuesday, Google held their I/O event and shared a firehose of new AI-related announcements including:

  • Gemini Flash - a new lightweight model focused on speed and efficiency. A little GPT-4o-like.

  • Gemma 2 - Google’s latest set of open source models.

  • Project Astra - their impressive vision for a new universal AI agent.

  • Veo - a new high-definition video generation model that can pump out 1080p video, built to rival OpenAI’s Sora.

  • 2 million token context window - ability to ingest up to 2 hours of video, 22 hours of audio, 60,000 lines of code, or 1.4 million words into Gemini 1.5 Pro; currently available only to Google developers.

  • Trillium - the 6th generation of Google’s TPU accelerator chip, which is claimed to deliver 4.7x the performance of the previous TPU v5e chips and 67% better energy efficiency.

  • Imagen 3 - Google’s latest and highest quality image generator.

While Google had plenty to show, and now claims that every one of their major online properties (from Google Maps to Gmail, and Google Drive to Google Docs) makes use of Gemini, it did still feel like Google is playing catch-up. But with the incredible data resources they have at their disposal (YouTube, Maps, Mail, Docs), they have a real advantage. Never bet against Alphabet.

Ilya Sutskever, chief scientist and co-founder of OpenAI, announced this week that he is leaving the company. It’s widely reported that Sutskever was behind the five-day ousting of CEO Sam Altman over differences related to the direction of the company, likely around AI safety. Sutskever is a towering figure in the AI industry, having previously worked at Google Brain and alongside AI godfather Geoffrey Hinton. Synthetic is curious to see what Ilya will work on next. There will probably be plenty of VC money chasing his every move.

After years of regulators performing little-to-no oversight on autonomous vehicle operators, the National Highway Traffic Safety Administration has opened several investigations into companies including Tesla, Ford, Waymo, Cruise, and Zoox related to alleged safety problems. The agency is looking at hundreds of crashes, some of them fatal.

Videos: Introducing GPT-4o and Project Astra

We’ve got two videos this week, one from OpenAI and the other from Google DeepMind.

In case you missed it, here is OpenAI’s announcement of their latest and most powerful model, GPT-4o, where ‘o’ stands for ‘omni’. The second half of the video includes live demonstrations of ‘o’, including real-time language translation, solving math problems, and singing.

Our second video shows the sneak preview of Project Astra shared at Google I/O this week. Astra is Google DeepMind’s first foray into AI assistants. Synthetic had the pleasure of working on the periphery of this project while working at DeepMind, and it’s great to see the direction they have taken things in this short demo video, shot at DeepMind HQ in Kings Cross, London. You can view the entire Google I/O keynote here, or a 17-minute supercut of it here.

AI Tech and Innovation

It turns out the world’s fastest AI supercomputer doesn’t run on Nvidia chips, as you might expect, but on Intel. Aurora, built in collaboration with Argonne National Laboratory and Hewlett Packard Enterprise, has crossed the exascale performance barrier (more than one quintillion operations per second). It is based on the Intel Xe GPU architecture and features 10,624 compute blades in 166 racks, with 21,248 Intel Xeon CPU Max Series processors and 63,744 Data Center GPU Max Series processors, making it the largest GPU cluster in the world.

Not content to be an also-ran, the UAE continues to plough oil money into AI development as they seek to modernize their economy. Their latest series of models, named Falcon 2, aren’t performance leaders, but they certainly place the UAE in the AI race alongside the US, China, Canada, and the UK as the world’s leading AI nations.

Related: check out this graphic that shows the distribution of AI startups around the globe.

AI Insights

“The current generation of models that we are seeing out there are still in the $100 million range. I think all of this can scale to the $100 billion range”

Dario Amodei, co-founder and CEO of Anthropic, commenting at Bloomberg Tech on how he expects a 1000x scaling of powerful AI models in the future

In 2020, Microsoft made a bold climate pledge to be carbon negative by 2030, yet by 2023 its greenhouse gas emissions had grown 30%. With the company expected to spend $50 billion expanding its AI infrastructure this year, and even more next year, that pledge looks harder and harder to reach.

(Long live prompt engineering) This interesting piece explores the weird prompting you can use to actually deliver better results when using LLMs like ChatGPT or Gemini. Apparently, adding phrases like “You are highly intelligent. This will be fun!” can boost the quality of responses. The piece also explores how users should ask large language models for help on the best ways to prompt them for optimal results.

Toolkit for the Future

Get smarter on AI in 5 minutes a day.

  • The world’s largest AI newsletter, read by over 600,000 AI professionals.

  • One free email every morning on what’s new in AI, giving you “the rundown” of the most important developments.

  • Helps readers keep up with the insane pace of AI and understand why it actually matters.

Need fresh talent that can harness the latest AI tools to increase the speed and volume of marketing content creation? Unlock value with fully-vetted, top marketing people with experience in your industry. The marketing-as-a-service model allows you to hire talent on demand, part-time or full-time, in 30-day blocks of time.

Invite the HireLogic AI to your next phone, video, or in-person interview. It analyzes the conversation, provides specific insights, and highlights the interviewee’s strengths, skills, and any potential concerns. It creates a full transcript of the interview so you can focus on the conversation. Free to try.

Melio is a free and easy-to-use payment solution that helps U.S. small businesses pay any expense by bank transfer or credit card - even where cards are not accepted. No setup or monthly fees.

Recording, editing, and refining audio often demands more time than entrepreneurs can spare. Imagine being able to produce human-like audio effortlessly with just a single click. ElevenLabs lets you generate high-quality audio efficiently and cost-effectively in 29 languages. Try it free.

Find phone numbers, emails, and company org charts to reach the people you need to. Fill information gaps in your contacts, auto-generate follow up emails, and identify new target markets. Get 50 free credits and discover why over 500,000 companies use Seamless to grow their business.

Recommended Readings