OpenAI o1 Reasoning Wows The World 🍓

Plus, New 'Debunkbot' Successfully Changes Minds of 25% of Conspiracy Theorists. Could Your Crazy Uncle Be Next?

Steve Brown
September 19, 2024

The week’s most interesting and relevant AI news and analysis

This Week in AI

Just minutes after last week’s edition of Synthetic arrived in your inbox, OpenAI dropped some monumental news about their new reasoning model, o1. Much has been written about this new technology, what it can do, and what it means for the future of AI. This week, we are sharing some of the best articles so you can dive in and get up to date on the latest big breakthrough in AI research. 🧠

OpenAI’s new o1-preview and o1-mini models, formerly codenamed ‘Strawberry’ 🍓 and formerly-formerly known as Q-Star, are a big deal. Until the release of the o1 family, large language models like ChatGPT, Gemini, and Claude relied exclusively on so-called System 1 thinking to deliver their incredible results. System 1 thinking uses fast recall and quick thinking to answer questions. System 2, by contrast, uses a slower, more deliberate reasoning process to deduce answers to more complex problems. It’s the difference between asking you to solve the equation ‘2+2=?’ (System 1) and asking you to estimate ‘How many golf balls could you fit inside a school bus?’ 🚌 (System 2). The o1 models now have a limited form of System 2 thinking that enables them to solve complex problems and provide more accurate, well-considered answers to questions.

Why OpenAI’s New Model Is Such a Big Deal

OpenAI’s new o1 models use a ‘chain of thought’ technique for multistep reasoning. They break tricky problems into more manageable chunks, recognize their mistakes, and try new approaches whenever the current one isn’t working. Early tests show o1 ranks in the 89th percentile on coding questions from Codeforces, a competitive programming platform. It would place among the top 500 high school students in the US in a qualifier for the USA Math Olympiad 🧮, and it can answer PhD-level 🎓 questions in topics ranging from organic chemistry 🧪 to astrophysics 🔭 with 78% accuracy (beating human experts, who score only 69.7%). Some have begun to question whether o1 is an early example of AGI (it isn’t, but it’s undoubtedly a giant leap in that direction).

Something New: On OpenAI’s “Strawberry” and Reasoning 🤔

The new o1-preview model and the forthcoming o1 are quite impressive at tasks that require planning. The writer challenges o1-preview to solve a difficult crossword puzzle, exposes the model’s chain of thought as it works through the puzzle (which is fascinating), and explores its capabilities and limitations.

o1, OpenAI’s as-yet-unreleased model, looks even more impressive

“o1-preview is pulling back the curtain on AI capabilities we might not have seen coming, even with its current limitations”

Ethan Mollick, Professor of Entrepreneurship, Innovation and AI, Wharton School of the University of Pennsylvania

OpenAI’s New Model Is Better at Reasoning, and Occasionally, Deceiving

First, the good news: OpenAI’s new model, o1-preview (they sure know how to name products at that company! 🤪), is significantly better at reasoning through complex problems. It can break a problem into pieces, plan a problem-solving approach, try different avenues, and judge the best approach. It delivers impressive results using a more ‘thoughtful’ and measured approach. Now the bad news: o1 has been caught lying and providing information that it has itself judged likely to be false. This is the conclusion of independent testing by the AI safety research firm Apollo Research.

A Review of OpenAI o1 and How We Evaluate Coding Agents

Cognition is best known for Devin, their AI software engineer. Cognition engineers have evaluated the new capabilities of o1-preview against the foundation model they were previously using, GPT-4o. Their findings were revealing. This article is well worth the read.

OpenAI Threatens to Ban Users Who Probe o1 Model

The internet is abuzz with questions about how OpenAI achieved the impressive reasoning capabilities released in the new o1-preview model. Hackers and red-teamers have used jailbreaking and prompt injection techniques to try to uncover details of o1’s chain of thought so they can get a better insight into how it ‘thinks.’ As a result, some users have received warning emails from OpenAI threatening to ban them from the service.

Quick Hits

Microsoft Launches Copilot Pages - Microsoft launches new BizChat and Copilot Pages along with upgraded versions of Copilot for Excel, PowerPoint, Word, and Teams.
AI-Powered Death Clock Predicts Your Demise - Part coach and part grim reaper; this new actuarial app wants to help you live longer. 💀⏱️
Lionsgate Shares Film/TV Library with Runway - Lionsgate, a major Hollywood studio responsible for titles including Knives Out, La La Land, and the John Wick franchise, licensed its large film and TV catalog to AI video generation firm Runway to help future film-makers ‘augment their work.’ 🎬
Microsoft and BlackRock Raise $100 Billion AI Infrastructure Fund - Building frontier AI is not for the faint of heart. A single leading-edge AI data center in the 2027/2028 timeframe could cost $100 billion. 💰

Video: What Does the AI Boom Really Mean for Humanity?

Mathematician Professor Hannah Fry explores the future of AI, how it might develop, and what it will mean for us all. She speaks with leading AI researchers and considers the views of AI doomers to explore the path to superintelligence. 🧠

AI Tech and Innovation

Researchers Build AI to Predict Criminal Behavior

Researchers claim to have built an AI-powered security system that can predict felonies with 82.8% accuracy from CCTV monitoring. The system, named Dejaview, integrates CCTV footage, crime statistics, positioning data, and other signals to predict the chance of a crime occurring. The system’s output is a heat map that law enforcement can use to decide where to deploy police officers. Fans of Minority Report will recognize the theme. 🚓

What’s the Real Reason AI Hasn’t Yet Delivered on its Hype?

New AI tools are fun to use and have delivered some incremental productivity gains, but not the transformation that was promised by AI bulls. Limited ambition and capability have reduced generative AI’s impact as it’s used to semi-automate small, simple tasks. On many platforms, AI features have been reduced to a single button. “We're so focused on making AI fit into our existing workflows that we've forgotten to ask whether those workflows even make sense anymore.” AI startups must dream bigger and go beyond the button.

AI Insights

“Psychological needs and motivations do not inherently blind conspiracists to evidence. It simply takes the right evidence to reach them.”

Creators of the anti-conspiracy theory chatbot, Debunkbot (see article below)

How an AI ‘Debunkbot’ Can Change a Conspiracy Theorist’s Mind

Americans love a good conspiracy theory. Only 66% of Millennials firmly believe the Earth is round (Source: YouGov), and 29% of American voters believe voting machines were hacked to change the result of the 2020 election. Researchers from MIT, Cornell, and American University built a custom chatbot that engages self-described conspiracy theorists in dialogue and produces detailed counterarguments to refute their positions and change their minds. After interacting with the bot, which the researchers have named Debunkbot, about a quarter of study participants disavowed their conspiracy theory. As Americans everywhere start to ponder spending another Thanksgiving with a crazy uncle or aunt long lost to conspiracy theories, Debunkbot starts to sound like a great idea.

Bank Warns ‘Millions’ Could Be Targeted by AI Voice Scams

Starling Bank, an online lender in the UK, says that fraudsters can clone voices with as little as three seconds of audio taken from a video posted on social media. Criminals use the AI clone to ask friends and family for money. In a recent survey, a quarter of respondents said AI voice scams had already targeted them within the last 12 months, while 46% weren’t even aware such scams existed.

Synthetic tip: AI-powered ransomware and voice scams are on the rise. Take a few minutes this evening with your family and friends to agree on secret challenge words so you can quickly foil criminal efforts to separate you from your money. 💵

Data Center Emissions Probably 662% Higher Than Big Tech Claims

Analysis by The Guardian, known for its investigative journalism, indicates that emissions from in-house data centers run by Google, Microsoft, Meta, and Apple between 2020 and 2022 were perhaps 7.62 times the reported figures, which is to say 662% higher. 🏭

Electricity Infrastructure Is Next Play For AI Investors

The Financial Times reports that investors see electricity providers as the ‘next derivative on AI.’ Since tech darlings like Nvidia will be capacity-constrained for the foreseeable future, limiting growth, investors are looking for the next place to find high returns during the forthcoming decade of AI infrastructure build-out. 🔧

Toolkit for the Future

Here are some excellent new AI tools to try. Regain control over your calendar, boost customer satisfaction (and reduce costs) with your help desks, find AI talent, and increase sales by integrating a chatbot on your website.

Reclaim.AI: The AI assistant for your calendar

Maintain healthy calendar habits, improve your productivity, optimize cross-team meetings, boost collaboration, and improve work-life balance. Reclaim schedules 1:1s and defends your calendar so you can focus. Syncs multiple calendars and integrates with Google Calendar, Zoom, Slack, and HubSpot.

SupportBench: Elevate customer satisfaction with #1 rated help desk

Take customer support to the next level with powerful AI features including chatbots, workflow automation, emotion insights, knowledge bases, data-driven insights, real-time reporting, and empathy coaching to help your support teams increase productivity and customer satisfaction.

Firstbase: The easy way to start a new company

Incorporate your company, access one-click growth tools, stay compliant, and manage everything your business needs — all online, from anywhere. Launch your U.S. business in minutes with no paperwork or legal headaches.

Oyster: Find and hire international talent with a simple cross-border HR solution

Need to expand internationally, but not sure where to start? Oyster makes it easy to find, hire, and retain local talent. They handle the details (country-specific labor laws, international tax laws, compliance, and global payroll) so you can focus on finding the right talent and ramping their impact.

Manifest: Boost website sales with this chatbot

Build intelligent personal shopping experiences by adding the Manifest AI chatbot to your site. Help shoppers find what they need, faster. Double add-to-cart and conversion rates and get 25% higher AOV. Easy Shopify and help desk integration. Free 14-day trials.

ThirstySprout: Hire top-tier tech talent, fast

Hire top talent with industry-specific expertise to build world-class engineering teams and solve complex business problems. AI-assisted sourcing, vetting, and hiring delivers the right talent, at the right time, globally!