OpenAI o1 Reasoning Wows The World 🍓

Plus, New 'Debunkbot' Successfully Changes Minds of 25% of Conspiracy Theorists. Could Your Crazy Uncle Be Next?

The week’s most interesting and relevant AI news and analysis

This Week in AI

Just minutes after last week’s edition of Synthetic arrived in your inbox, OpenAI dropped some monumental news about their new reasoning model, o1. Much has been written about this new technology, what it can do, and what it means for the future of AI. This week, we are sharing some of the best articles so you can dive in and get up to date on the latest big breakthrough in AI research. 🧠

OpenAI’s new o1-preview and o1-mini models, formerly codenamed ‘Strawberry’ 🍓 and formerly-formerly known as Q-Star, are a big deal. Until the release of the o1 family, large language models like ChatGPT, Gemini, and Claude had exclusively used so-called System 1 thinking to deliver their incredible results. System 1 thinking uses fast recall and quick thinking to answer questions. System 2, by contrast, uses a slower, more deliberate reasoning process to deduce answers to more complex problems. It’s the difference between asking you to solve the equation ‘2+2=?’ (System 1) and asking you to estimate ‘How many golf balls could you fit inside a school bus?’ 🚌 (System 2). The o1 models now have a limited form of System 2 thinking that enables them to solve complex problems and provide more accurate, well-considered answers to questions.

OpenAI’s new o1 models use a ‘chain of thought’ technique for multistep reasoning. They break tricky problems into more manageable chunks, recognize their mistakes, and try new approaches whenever the current one isn’t working. Early tests show the model ranks in the 89th percentile on coding questions from Codeforces, a competitive programming platform. It would place among the top 500 students in the US in a qualifier for the USA Math Olympiad 🧮, and it can answer PhD-level 🎓 questions in topics ranging from organic chemistry 🧪 to astrophysics 🔭 with 78% accuracy (beating human experts, who score only 69.7%). Some have begun to question whether o1 is an early example of AGI (it isn’t, but it’s undoubtedly a giant leap in that direction).

The new o1-preview model and the forthcoming o1 are quite impressive at tasks that require planning. The writer challenged o1-preview to solve a difficult crossword puzzle. He shows o1’s chain of thought as it works through the puzzle (which is fascinating) and explores the model’s capabilities and limitations.

o1, OpenAI’s as-yet-unreleased model, looks even more impressive.

“o1-preview is pulling back the curtain on AI capabilities we might not have seen coming, even with its current limitations”

Ethan Mollick, Professor of Entrepreneurship, Innovation and AI, Wharton School of the University of Pennsylvania

First, the good news: OpenAI’s new model, o1-preview (they sure know how to name products at that company! 🤪), is significantly better at reasoning through complex problems. It can break a problem into pieces, plan a problem-solving approach, try different avenues, and judge the best approach. It delivers impressive results using a more ‘thoughtful’ and measured approach. Now the bad news: o1 has been caught lying and providing information that it has itself judged to be likely false. This is the conclusion of independent testing by the AI safety research firm Apollo Research.

Cognition is best known for Devin, their AI software engineer. In the last few months, Cognition engineers have evaluated the new capabilities of o1-preview versus the previous foundational model they were using, GPT-4o. Their findings were revealing. This article is well worth the read.

The internet is abuzz with questions about how OpenAI achieved the impressive reasoning capabilities released in the new o1-preview model. Hackers and red-teamers have used jailbreaking and prompt injection techniques to try to uncover details of o1’s chain of thought so they can get a better insight into how it ‘thinks.’ These attempts have resulted in some users receiving warning emails from OpenAI that threaten a ban from the system.

Quick Hits

Video: What Does the AI Boom Really Mean for Humanity?

Mathematician Professor Hannah Fry explores the future of AI, how it might develop, and what it will mean for us all. She speaks with leading AI researchers and considers the views of AI doomers to explore the path to superintelligence. 🧠

AI Tech and Innovation

Researchers claim to have built an AI-powered security system that can predict felonies with 82.8% accuracy from CCTV monitoring. The system, named Dejaview, integrates CCTV footage, crime statistics, positioning data, and other signals to predict the chance of a crime occurring. The system’s output is a heat map that law enforcement uses to decide where to station police officers. Fans of Minority Report will recognize the theme. 🚓

New AI tools are fun to use and have delivered some incremental productivity gains, but not the transformation that was promised by AI bulls. Limited ambition and capability have reduced generative AI’s impact as it’s used to semi-automate small, simple tasks. On many platforms, AI features have been reduced to a single button. “We're so focused on making AI fit into our existing workflows that we've forgotten to ask whether those workflows even make sense anymore.” AI startups must dream bigger and go beyond the button.

AI Insights

“Psychological needs and motivations do not inherently blind conspiracists to evidence. It simply takes the right evidence to reach them.”

Creators of the anti-conspiracy-theory chatbot Debunkbot (see article below)

Americans love a good conspiracy theory. Only 66% of Millennials firmly believe the earth is round (Source: YouGov), and 29% of American voters believe voting machines were hacked to change the result of the 2020 election. MIT, Cornell, and American University researchers built a custom chatbot to engage self-described conspiracy theorists in dialogue and produce detailed counterarguments to refute their positions and change their minds. After interacting with the bot, which the researchers have named Debunkbot, about a quarter of study participants disavowed their conspiracy theory. As Americans everywhere start to ponder spending another Thanksgiving with a crazy uncle or aunt long lost to conspiracy theories, Debunkbot starts to sound like a great idea.

Starling Bank, an online lender in the UK, says that fraudsters can clone voices with as little as three seconds of audio taken from a video posted on social media. Criminals use the AI clone to ask friends and family for money. In a recent survey, a quarter of respondents said AI voice scams had already targeted them within the last 12 months, while 46% weren’t even aware such scams existed.

Synthetic tip: AI-powered ransomware and voice scams are on the rise. Take a few minutes this evening with your family and friends to agree on secret challenge words so you can quickly foil criminal efforts to separate you from your money. 💵

Analysis by The Guardian, known for its investigative journalism, indicates that emissions from in-house data centers run by Google, Microsoft, Meta, and Apple between 2020 and 2022 were perhaps 7.62 times higher than reported. 🏭

The Financial Times reports that investors see electricity providers as the ‘next derivative on AI.’ Since tech darlings like Nvidia will be capacity-constrained for the foreseeable future, limiting growth, investors are looking for the next place to find high returns during the forthcoming decade of AI infrastructure build-out. 🔧

Toolkit for the Future

Here are some excellent new AI tools to try. Regain control over your calendar, boost customer satisfaction (and reduce costs) with your help desks, find AI talent, and increase sales by integrating a chatbot on your website.

Maintain healthy calendar habits, improve your productivity, optimize cross-team meetings, boost collaboration, and improve work-life balance. Reclaim schedules 1:1s and defends your calendar so you can focus. Syncs multiple calendars and integrates with Google Calendar, Zoom, Slack, and HubSpot.

Take customer support to the next level with powerful AI features including chatbots, workflow automation, emotion insights, knowledge bases, data-driven insights, real-time reporting, and empathy coaching to help your support teams increase productivity and customer satisfaction.

Incorporate your company, access one-click growth tools, stay compliant, and manage everything your business needs, all online, from anywhere. Launch your U.S. business in minutes with no paperwork or legal headaches.

Need to expand internationally, but not sure where to start? Oyster makes it easy to find, hire, and retain local talent. They handle the details (country-specific labor laws, international tax laws, compliance, and global payroll) so you can focus on finding the right talent and ramping their impact.

Build intelligent personal shopping experiences by adding the Manifest AI chatbot to your site. Help shoppers find what they need, faster. Double add-to-cart and conversion rates and get 25% higher AOV. Easy Shopify and help desk integration. Free 14-day trials.

Hire top talent with industry-specific expertise to build world-class engineering teams and solve complex business problems. AI-assisted sourcing, vetting, and hiring delivers the right talent, at the right time, globally!