Frontier lab releases, open-source checkpoints, multimodal systems, inference stacks, and model capability shifts.
Irregular: Gemini 3 Pro, GPT-5.1-Codex-Max, and Claude Opus 4.5 mark a new frontier turn
Irregular’s “Frontier Fortnight” outlines a recent wave of frontier releases: Google DeepMind’s **Gemini 3 Pro**, OpenAI’s agentic coding model **GPT-5.1-Codex-Max**, and Anthropic’s **Claude Opus 4.5**.[1] The piece emphasizes major jumps in benchmarks, software engineering workflows, and cybersecurity-relevant capabilities, noting Anthropic’s use of the SOLVE framework to score vulnerability discovery and exploit development.[1]
John C. Derrick: Llama 4 MoE and Small 4 push open-source multimodal and coding performance
John C. Derrick’s frontier ranking notes that **Llama 4** has moved to a mixture-of-experts architecture and is natively multimodal, with Scout fitting on a single H100 and Maverick reported as beating GPT‑4o on many benchmarks.[6] He also highlights **Small 4**, a 119B MoE model that unifies reasoning, multimodal, and coding capabilities under Apache 2.0 and runs about 40% faster than Small 3, positioning it as a practical open-weight option for advanced workloads.[6]
YouTube overview: GLM 5.1 and Meta’s Muse/Spark highlight non-US competition in coding and multimodal reasoning
A recent YouTube deep dive on “All of AI’s New Models and Tools” calls out **GLM 5.1** as an open-source model that overtakes leading Western models on coding benchmarks such as SWE-Bench Pro, where its reported score of 58.4 beats GPT‑5.4 and Anthropic’s Claude Opus 4.6.[3] The same video details Meta’s **Muse/Spark** multimodal models with tool use, visual chain-of-thought, and multi-agent orchestration, reporting Muse as state-of-the-art on visual reasoning benchmarks like CharXiv.[3]