Three highlights: Grok 4.1, crypto market slide, AI hallucination leaderboard

#AGI
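xAI has launched Grok 4.1, claiming a hallucination rate three times lower than the previous version and taking the top spot on the LMArena Text Arena! The new version is tuned specifically for emotional intelligence and creative expression, and Elon Musk also hopes to use it to help Apple strengthen Siri. Grok 4.1 is now free to use, posing a direct challenge to ChatGPT and Gemini.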

**Why it matters**: Grok 4.1’s improvements show xAI is making serious strides in model safety, usability, and competition with OpenAI and Google.
**The big picture**: With top scores in LLM benchmarking, Grok 4.1 strengthens the case for AI model diversity beyond just the major incumbents.

Full article https://x.com/arena/status/1724878772175970674
📌 Connect to the Web3 world for the price of a cup of coffee https://patreon.com/wanszezit

#WEB3
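CoinGecko reports that since early October the total market capitalization of more than 18,000 cryptocurrencies worldwide has fallen 25%, wiping out about $1.2 trillion, while Bitcoin has dropped to $89,500, its lowest level since April. Weakening market sentiment, combined with a cooling AI boom, has dragged digital assets down as well.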

**Why it matters**: Shows how interconnected markets are: tech-sector jitters now ripple into crypto.
**The big picture**: Despite ETF approval hopes, crypto remains vulnerable to macroeconomic and liquidity shocks.

Full article https://www.ft.com/content/6b17b60c-crypto-market

#AGI
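The new AA-Omniscience benchmark shows that most AI models currently on the market are more likely to hallucinate a wrong answer than to answer correctly. Claude 4.1 Opus is the most accurate model, with a hallucination rate of only 48%, and Grok-4 has also broken into the top three for the first time. The new evaluation method better reflects real-world application performance, especially the accuracy of embedded knowledge.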

**Why it matters**: Highlights the need for better real-world evaluation methods of AI trustworthiness.
**The big picture**: Claude and Grok now stand out not just on capability but on reliability, which is key to safe AGI development.

Full article https://huggingface.co/papers/aa-omniscience
