The Rundown AI · Research

ARC-AGI-3 Resets Frontier AI Scoreboard

ARC-AGI-3 Test

🧐ARC's New AGI Test Stumps Every Frontier AI

François Chollet's ARC Prize Foundation just released ARC-AGI-3, the newest version of its interactive reasoning benchmark, where humans can solve 100% of tasks on the first try but AI models struggle, with top systems not even scoring 1%.

  • Labs spent millions training models on earlier versions of the test, pushing ARC-AGI-2 scores from 3% to around 50% in under a year
  • Agents face game-like scenarios with zero instructions, and must discover rules, form goals, and plan strategies entirely from scratch
  • Google's Gemini Pro scored the highest among frontier models at just 0.37%, followed by GPT 5.4 High (0.26%), Opus 4.6 (0.25%), and Grok-4.20 (0%)
  • A $1M prize backs the challenge, and cofounder Mike Knoop says frontier labs are paying far more attention to V3 than they did to earlier versions

Why it matters: It's always jarring to see the top models get reset below 1% on a new ARC-AGI release, but if the older tests are any indicator, even more surprising will be how quickly frontier labs climb the ladder. Whether that reflects genuine reasoning or just more expensive brute-forcing is exactly what Chollet built V3 to find out.

Reddit AI Bots

🤖Reddit's AI Bot Crackdown Skips the ID Check

Reddit CEO Steve Huffman outlined a plan to separate humans from bots across the site, including labeling automated accounts, flagging suspicious users for verification, and letting sub-communities self-police without mass ID checks.

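  • Accounts running automation in approved ways will carry an [App] label, and suspicious behavior will trigger human verification
  • To confirm proof of humanity, Reddit will offer passkeys or Sam Altman's World ID scanner, with government ID as a last resort used only where the law requires it
  • AI-written content will not be banned; Huffman called it "annoying" but said communities can set their own rules for AI-generated posts
  • Rival platform Digg recently shut down after being overrun by bots, and Cloudflare data shows automated traffic is on track to surpass human traffic in 2027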

Why it matters: The Dead Internet Theory was already here before the AI agent acceleration we've seen over the past six months. Now, it's a reality every social media site is dealing with. While this feels a bit like a band-aid, it is a small step toward the serious human-first solution every platform will need if it wants to remain usable for people.

Slack GIF Guide

🤯Create Branded Reaction GIFs for Slack

In this guide, you will learn how to make custom, branded reaction GIFs for your company's Slack using Higgsfield (an image and video generator). The trick is to generate the starting frame before you animate it.

  • Go to Higgsfield image gen, decide the GIF's look, and enter the reaction's visual style and text, like "ESPN themed reaction gif with words 'SLOW DOWN'"
  • If your brand is not recognizable, attach your logo or another brand reference image while generating the still
  • Generate a few stills and pick the best one, then click the camera's Animate button on that still so that it becomes the start frame in Higgsfield video
  • Then, set the clip length to 3 seconds, turn off its audio, and prompt: "Reaction GIF". Finally, download the MP4 and turn it into a GIF with any MP4-to-GIF site

Why it matters: If you make a whole batch of MP4s, ask Claude Code to convert them to GIFs in bulk on your desktop so you do not have to use a converter site one file at a time.
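
For anyone who wants to see roughly what that bulk conversion could look like, here is a minimal Python sketch. It assumes ffmpeg is installed and on your PATH; the function names, the 12 fps and 480 px defaults, and the folder name in the usage note are illustrative choices, not anything Higgsfield or Claude Code prescribes.

```python
import shutil
import subprocess
from pathlib import Path


def gif_command(mp4: Path, fps: int = 12, width: int = 480) -> list[str]:
    """Build the ffmpeg command that writes a GIF next to one MP4."""
    gif = mp4.with_suffix(".gif")
    # fps lowers the frame rate; scale resizes to `width` and keeps the aspect ratio.
    video_filter = f"fps={fps},scale={width}:-1:flags=lanczos"
    # -y overwrites an existing GIF; -loop 0 makes the GIF repeat forever.
    return ["ffmpeg", "-y", "-i", str(mp4), "-vf", video_filter, "-loop", "0", str(gif)]


def convert_folder(folder: Path) -> list[Path]:
    """Convert every .mp4 in `folder` and return the GIF paths that were written."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    written = []
    for mp4 in sorted(folder.glob("*.mp4")):
        # check=True stops the batch if ffmpeg fails on a file.
        subprocess.run(gif_command(mp4), check=True)
        written.append(mp4.with_suffix(".gif"))
    return written
```

Calling convert_folder(Path("reaction_gifs")) on a hypothetical folder of downloads would write one GIF beside each MP4. Lowering fps or width is the easiest way to shrink files if an upload is rejected for size.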

Google TurboQuant

💾Google Shrinks AI Memory with Zero Accuracy Loss

Google Research introduced TurboQuant, an algorithm that compresses AI model memory over 6x without any retraining — while delivering up to 8x speed gains on Nvidia H100 chips and losing almost zero accuracy.

  • AI models keep a running log of each conversation, and as chats get longer, that storage balloons, which slows responses and drives up costs
  • TurboQuant shrinks that storage by over 6x with zero accuracy loss, scoring perfectly on tests that bury a key detail in a large amount of text
  • On Nvidia's top server chips, it also sped up response processing up to 8x compared to standard methods, without adding any extra cost to run
  • The paper, set to be presented at ICLR 2026 in April, also topped rival methods in vector search — the tech search engines use to match similar results quickly
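
To put the "over 6x" figure in perspective, here is a back-of-the-envelope Python sketch of how that conversation storage (commonly called the KV cache) grows with chat length. The model shape below is hypothetical, the 16-bit baseline is an assumption, and the code only does the arithmetic; it is not TurboQuant's algorithm.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   tokens: int, bits_per_value: float) -> float:
    """Rough size of the stored keys and values for one conversation."""
    # Keys and values (the factor of 2) are kept for every layer, head, and token.
    stored_values = 2 * layers * kv_heads * head_dim * tokens
    return stored_values * bits_per_value / 8


def to_gib(num_bytes: float) -> float:
    """Convert bytes to GiB for easier reading."""
    return num_bytes / 2**30


# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
COMPRESSION = 6  # the "over 6x" figure reported above, used as a lower bound

for tokens in (8_000, 128_000):
    full = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, tokens, bits_per_value=16)
    small = full / COMPRESSION
    print(f"{tokens:>7} tokens: {to_gib(full):6.2f} GiB -> {to_gib(small):5.2f} GiB")
```

Because the storage scales linearly with the number of tokens, the savings matter most on long chats, which is where the slowdowns and costs described above pile up.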

Why it matters: Although the paper was first published in April 2025, top AI memory companies felt the heat of the official release, with stocks dropping 3-5%. One compression paper won't crater memory demand overnight, but the selloff shows Wall Street is pricing in a world where smarter software cuts into the premium AI memory commands.