Frontier lab releases, open-source checkpoints, multimodal systems, inference stacks, and model capability shifts.
Frontier Fortnight: Gemini 3 Pro, GPT‑5.1‑Codex‑Max, and Claude Opus 4.5
Irregular reports a recent wave of frontier releases: Google DeepMind’s Gemini 3 Pro, OpenAI’s agentic coding model GPT‑5.1‑Codex‑Max, and Anthropic’s Claude Opus 4.5, all showing major gains on software engineering and vulnerability discovery benchmarks.[1] The Claude Opus 4.5 system card uses Irregular’s SOLVE framework to score exploit‑development capabilities, so these security‑relevant skills are measured systematically.[1]
Perplexity Pro Search Stack Adds GPT‑5.2, Claude Sonnet 4.6, Gemini 3.1 Pro, and Nemotron 3 Super 120B
Perplexity documents its current production inference stack, listing OpenAI’s GPT‑5.2, Anthropic’s Claude Sonnet 4.6 (including a “thinking” variant), Google’s Gemini 3.1 Pro, and NVIDIA’s Nemotron 3 Super 120B among its advanced search models.[2] These models feed a real‑time search and reasoning pipeline, with toggles for advanced logical processing and multimodal understanding.[2]
Llama 4 Mixture‑of‑Experts and Small 4 Apache‑2.0 Open‑Source Upgrade
John C. Derrick’s model ranking notes that Meta’s Llama 4 has moved to a mixture‑of‑experts architecture with native multimodality, and highlights “Small 4,” a 119B MoE model that unifies reasoning, multimodal, and coding capabilities under an Apache‑2.0 license and runs roughly 40% faster than Small 3.[5] The same analysis reports that Llama 4’s Maverick variant beats GPT‑4o on many benchmarks, while the Scout variant fits on a single H100 GPU, with an emphasis on practical deployment profiles.[5]