Three headlines: Grok 4.1, crypto market slide, AI hallucination leaderboard

#AGI
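xAI has launched Grok 4.1, claiming a hallucination rate three times lower than its predecessor's, and has taken the top spot on the LMArena Text Arena! The new version specifically strengthens emotional intelligence and creative expression, and Elon Musk also hopes it can help Apple improve Siri. Grok 4.1 is now free to use, posing a direct challenge to ChatGPT and Gemini.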

**Why it matters**: Grok 4.1’s improvements show xAI is making serious strides in model safety, usability, and competition with OpenAI and Google.
**The big picture**: With top scores in LLM benchmarking, Grok 4.1 strengthens the case for AI model diversity beyond just the major incumbents.

Full article https://x.com/arena/status/1724878772175970674
📌 Connect to the Web3 world for the price of a coffee https://patreon.com/wanszezit

—

#WEB3
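CoinGecko reports that since early October, the combined market capitalization of more than 18,000 cryptocurrencies worldwide has fallen 25%, wiping out about $1.2 trillion, while Bitcoin has dropped to $89,500, its lowest level since April. Weakening market sentiment, together with a cooling AI boom, has dragged digital assets down as well.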

**Why it matters**: Shows how interconnected markets are—tech sector jitters now ripple into crypto.
**The big picture**: Despite ETF approval hopes, crypto remains vulnerable to macroeconomic and liquidity shocks.

Full article https://www.ft.com/content/6b17b60c-crypto-market

—

#AGI
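The new AA-Omniscience benchmark shows that most AI models currently on the market are more likely to hallucinate a wrong answer than to answer correctly. Claude 4.1 Opus is the most accurate model, with a hallucination rate of just 48%, and Grok-4 makes the top three for the first time. The new evaluation method better reflects real-world application performance, especially the accuracy of embedded knowledge.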

**Why it matters**: Highlights the need for better real-world evaluation methods of AI trustworthiness.
**The big picture**: Claude and Grok now stand out not just on capability but on reliability—key to safe AGI development.

Full article https://huggingface.co/papers/aa-omniscience