OpenAI o1 Reasoning Wows The World
Plus, New 'Debunkbot' Successfully Changes Minds of 25% of Conspiracy Theorists. Could Your Crazy Uncle Be Next?
The week's most interesting and relevant AI news and analysis
This Week in AI
Just minutes after last week's edition of Synthetic arrived in your inbox, OpenAI dropped some monumental news about their new reasoning model, o1. Much has been written about this new technology, what it can do, and what it means for the future of AI. This week, we are sharing some of the best articles so you can dive in and get up to date on the latest big breakthrough in AI research.
OpenAI's new o1-preview and o1-mini models, formerly codenamed "Strawberry" and formerly-formerly known as Q-Star, are a big deal. Until the release of the o1 family, large language models like ChatGPT, Gemini, and Claude relied exclusively on so-called System 1 thinking to deliver their incredible results. System 1 thinking uses fast recall and quick thinking to answer questions. System 2, by contrast, uses a slower, more deliberate reasoning process to deduce answers to more complex problems. It's the difference between asking you to solve the equation "2+2=?" (System 1) and asking you to estimate "How many golf balls could you fit inside a school bus?" (System 2). The o1 models now have a limited form of System 2 thinking that enables them to solve complex problems and provide more accurate, well-considered answers to questions.
OpenAI's new o1 models use a "chain of thought" technique for multistep reasoning. They break tricky problems into more manageable chunks, recognize their mistakes, and try new approaches whenever the current one isn't working. Early tests show o1 ranks in the 89th percentile on coding questions from Codeforces, a competitive programming platform. It would place among the top 500 US students in a qualifying exam for the USA Math Olympiad, and it can answer PhD-level questions in topics ranging from organic chemistry to astrophysics with 78% accuracy (beating human experts, who score only 69.7%). Some have begun to question whether o1 is an early example of AGI (it isn't, but it's undoubtedly a giant leap in that direction).
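To make the System 2 example concrete, here is a minimal sketch of the golf-ball question worked as a step-by-step estimate. It is our own illustration, not anything from OpenAI, and it says nothing about how o1 reasons internally. The bus dimensions and the usable fraction of the interior are assumed round numbers; the ball diameter (about 42.7 mm) and the packing fraction for randomly poured spheres (about 0.64) are standard figures.

```python
import math


def estimate_golf_balls(
    length_m: float,
    width_m: float,
    height_m: float,
    usable_fraction: float,
    ball_diameter_m: float,
    packing_fraction: float,
) -> int:
    """Rough estimate of how many balls fit in a box-shaped space."""
    # Step 1: volume of the interior, treated as a simple box.
    interior_volume = length_m * width_m * height_m
    # Step 2: discount space taken up by seats, wheel wells, and so on.
    usable_volume = interior_volume * usable_fraction
    # Step 3: volume of a single ball (sphere).
    ball_volume = (4.0 / 3.0) * math.pi * (ball_diameter_m / 2.0) ** 3
    # Step 4: spheres leave gaps, so only part of the space holds ball material.
    return int(usable_volume * packing_fraction / ball_volume)


# Assumed inputs: a 10 m x 2.5 m x 2 m interior with 80% of it usable.
estimate = estimate_golf_balls(
    length_m=10.0,
    width_m=2.5,
    height_m=2.0,
    usable_fraction=0.8,
    ball_diameter_m=0.0427,
    packing_fraction=0.64,
)
print(f"Roughly {estimate:,} golf balls")
```

With these assumptions the answer lands in the hundreds of thousands. The exact figure matters less than the method: break the question into smaller pieces, estimate each one, and combine them.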
The new o1-preview model and the forthcoming o1 are quite impressive at tasks that require planning. The writer challenged o1-preview to solve a difficult crossword puzzle. He shares the chain of thought o1-preview used to solve the puzzle (which is fascinating) and explores the model's capabilities and limitations.
o1, OpenAI's as-yet-unreleased model, looks even more impressive
"o1-preview is pulling back the curtain on AI capabilities we might not have seen coming, even with its current limitations"
First, the good news: OpenAI's new model, o1-preview (they sure know how to name products at that company!), is significantly better at reasoning through complex problems. It can break a problem into pieces, plan a problem-solving approach, try different avenues, and judge the best approach. It delivers impressive results using a more "thoughtful" and measured approach. Now the bad news: o1 has been caught lying and providing information that it has itself judged is likely false. This is the conclusion of independent testing by the AI safety research firm Apollo Research.
Cognition is best known for Devin, their AI software engineer. In the last few months, Cognition engineers have evaluated the new capabilities of o1-preview against GPT-4o, the foundation model they were previously using. Their findings were revealing. This article is well worth the read.
The internet is abuzz with questions about how OpenAI achieved the impressive reasoning capabilities released in the new o1-preview model. Hackers and red-teamers have used jailbreaking and prompt injection techniques to try to uncover details of o1's chain of thought so they can get better insight into how it "thinks." These attempts have resulted in some users receiving warning emails from OpenAI that threaten a ban from the system.
Quick Hits
Microsoft Launches Copilot Pages - Microsoft launches new BizChat and Copilot Pages along with upgraded versions of Copilot for Excel, PowerPoint, Word, and Teams.
AI-Powered Death Clock Predicts Your Demise - Part coach and part grim reaper, this new actuarial app wants to help you live longer.
Lionsgate Shares Film/TV Library with Runway - Lionsgate, a major Hollywood studio responsible for titles including Knives Out, La La Land, and the John Wick franchise, licensed its large film and TV catalog to AI video generation firm Runway to help future filmmakers "augment their work."
Microsoft and BlackRock Raise $100 Billion AI Infrastructure Fund - Building frontier AI is not for the faint of heart. A single leading-edge AI data center in the 2027/2028 timeframe could cost $100 billion.
Video: What Does the AI Boom Really Mean for Humanity?
Mathematician Professor Hannah Fry explores the future of AI, how it might develop, and what it will mean for us all. She speaks with leading AI researchers and considers the views of AI doomers to explore the path to superintelligence.
AI Tech and Innovation
Researchers claim to have built an AI-powered security system that can predict felonies with 82.8% accuracy from CCTV monitoring. The system, named Dejaview, integrates CCTV footage, crime statistics, positioning data, and other signals to predict the chance of a crime occurring. The system's output is a heat map that law enforcement uses to decide where to position police officers. Fans of Minority Report will recognize the theme.
New AI tools are fun to use and have delivered some incremental productivity gains, but not the transformation that was promised by AI bulls. Limited ambition and capability have reduced generative AI's impact, as it's used to semi-automate small, simple tasks. On many platforms, AI features have been reduced to a single button. "We're so focused on making AI fit into our existing workflows that we've forgotten to ask whether those workflows even make sense anymore." AI startups must dream bigger and go beyond the button.
AI Insights
"Psychological needs and motivations do not inherently blind conspiracists to evidence. It simply takes the right evidence to reach them."
Americans love a good conspiracy theory. Only 66% of Millennials firmly believe the earth is round (Source: YouGov), and 29% of American voters believe voting machines were hacked to change the result of the 2020 election. MIT, Cornell, and American University researchers built a custom chatbot to engage self-described conspiracy theorists in dialogue and produce detailed counterarguments to refute their positions and change their minds. After interacting with the bot, which the researchers have named Debunkbot, about a quarter of study participants disavowed their conspiracy theory. As Americans everywhere start to ponder spending another Thanksgiving with a crazy uncle or aunt long lost to conspiracy theories, Debunkbot starts to sound like a great idea.
Starling Bank, an online lender in the UK, says that fraudsters can clone voices with as little as three seconds of audio taken from a video posted on social media. Criminals use the AI clone to ask friends and family for money. In a recent survey, a quarter of respondents said AI voice scams had already targeted them within the last 12 months, while 46% weren't even aware such scams existed.
Synthetic tip: AI-powered ransomware and voice scams are on the rise. Take a few minutes this evening with your family and friends to agree on secret challenge words so you can quickly foil criminal efforts to separate you from your money.
Analysis by The Guardian, known for its investigative journalism, indicates that emissions from in-house data centers run by Google, Microsoft, Meta, and Apple between 2020 and 2022 were perhaps 7.62 times more than reported.
The Financial Times reports that investors see electricity providers as the "next derivative on AI." Since tech darlings like Nvidia will be capacity-constrained for the foreseeable future, limiting growth, investors are looking for the next place to find high returns during the forthcoming decade of AI infrastructure build-out.
Toolkit for the Future
Here are some excellent new AI tools to try. Regain control over your calendar, boost customer satisfaction (and reduce costs) with your help desks, find AI talent, and increase sales by integrating a chatbot on your website.
Maintain healthy calendar habits, improve your productivity, optimize cross-team meetings, boost collaboration, and improve work-life balance. Reclaim schedules 1:1s and defends your calendar so you can focus. Syncs multiple calendars and integrates with Google Calendar, Zoom, Slack, and HubSpot.
Take customer support to the next level with powerful AI features including chatbots, workflow automation, emotion insights, knowledge bases, data-driven insights, real-time reporting, and empathy coaching to help your support teams increase productivity and customer satisfaction.
Incorporate your company, access one-click growth tools, stay compliant, and manage everything your business needs, all online, from anywhere. Launch your U.S. business in minutes with no paperwork or legal headaches.
Need to expand internationally, but not sure where to start? Oyster makes it easy to find, hire, and retain local talent. They handle the details (country-specific labor laws, international tax laws, compliance, and global payroll) so you can focus on finding the right talent and ramping their impact.
Build intelligent personal shopping experiences by adding the Manifest AI chatbot to your site. Help shoppers find what they need, faster. Double add-to-cart and conversion rates and get 25% higher AOV. Easy Shopify and help desk integration. Free 14-day trials.
Hire top talent with industry-specific expertise to build world-class engineering teams and solve complex business problems. AI-assisted sourcing, vetting, and hiring delivers the right talent, at the right time, globally!