AI Models
Large language models, architectures, training approaches, and model comparisons
Gemma 4 Model Family Overview
Gemma 4 is Google DeepMind's April 2026 open model family. It has four variants (E2B, E4B, 26B MoE, 31B Dense), is now Apache 2.0 licensed, and covers use cases from edge devices to top-tier reasoning.
Gemma 4 Unique Technical Features
Gemma 4's key innovations include configurable image token budgets, native bounding box detection, Per-Layer Embeddings, shared KV cache, and native function calling.
Gemma 4 Benchmarks and Performance
Gemma 4 31B ranks #3 among open models on Arena AI, and the 26B MoE ranks #6 with only 4B active parameters. On OpenRouter, the 31B is priced at $0.14/$0.40 per million tokens.
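To make the 31B pricing concrete, here is a minimal cost-estimation sketch. It assumes the two quoted figures are the input (prompt) and output (completion) prices respectively, which the summary does not state explicitly. The function name `estimate_cost_usd` is hypothetical.

```python
# Prices quoted for the 31B model, in US dollars per million tokens.
# Assumption: the first figure is the input price and the second is the output price.
INPUT_USD_PER_MILLION = 0.14
OUTPUT_USD_PER_MILLION = 0.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single request in US dollars."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Example: a 10,000-token prompt with a 2,000-token reply.
cost = estimate_cost_usd(10_000, 2_000)
print(f"${cost:.4f}")  # prints "$0.0022"
```

Under these assumptions, output tokens cost roughly three times as much as input tokens, so long generations tend to dominate the bill more than long prompts.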
Gemma 4 Local Setup and Deployment
Gemma 4 runs locally via Ollama, llama.cpp, LM Studio, vLLM, and others. Use Q8 quantization for best quality. The chat template differs from Gemma 3's. The architecture uses dual RoPE and a mix of sliding-window and global attention.
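The difference between sliding-window and global attention can be illustrated with a small mask-building sketch. This is a generic illustration of the two attention patterns, not Gemma 4's actual implementation: the window size of 3 and the function name `causal_mask` are hypothetical, and the summary does not say how the two layer types are interleaved.

```python
def causal_mask(seq_len, window=None):
    """Build a seq_len x seq_len boolean attention mask.

    mask[q][k] is True when query position q may attend to key position k.
    window=None gives global causal attention (every earlier position is visible).
    An integer window restricts each query to its most recent `window` positions,
    counting the query position itself.
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # causal: never look at future positions
            if window is not None:
                visible = visible and (q - k) < window
            row.append(visible)
        mask.append(row)
    return mask


global_mask = causal_mask(6)             # global layer: full causal context
sliding_mask = causal_mask(6, window=3)  # sliding layer: hypothetical window of 3

for row in sliding_mask:
    print("".join("x" if visible else "." for visible in row))
```

In the sliding mask, each row has at most three visible positions, which is why sliding-window layers need far less KV cache than global layers at long context lengths.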
Gemma 4 Multimodal Capabilities and Limitations
Audio and video are supported only on E2B and E4B, with limits of 30 seconds for audio and 60 seconds for video. The larger models handle text and images only. In prompts, place multimodal content before the text.
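The constraints above can be sketched as a small validation helper that also enforces the media-before-text ordering. The part dictionaries, the `build_prompt_parts` name, and the variant labels used as keys are hypothetical and do not correspond to any particular runtime's API. Only the limits and the ordering rule come from the summary.

```python
# Limits taken from the summary: 30 s of audio, 60 s of video, E2B/E4B only.
AUDIO_LIMIT_S = 30
VIDEO_LIMIT_S = 60
AUDIO_VIDEO_VARIANTS = {"E2B", "E4B"}


def build_prompt_parts(variant, text, media):
    """Validate media for a model variant and return parts with media first.

    `media` is a list of dicts such as {"type": "image"} or
    {"type": "audio", "seconds": 12.5}. This structure is illustrative only.
    Raises ValueError when the variant or clip length is unsupported.
    """
    for item in media:
        kind = item["type"]
        if kind in ("audio", "video"):
            if variant not in AUDIO_VIDEO_VARIANTS:
                raise ValueError(f"{variant} does not accept {kind} input")
            limit = AUDIO_LIMIT_S if kind == "audio" else VIDEO_LIMIT_S
            if item["seconds"] > limit:
                raise ValueError(f"{kind} clip exceeds the {limit} s limit")
        elif kind != "image":
            raise ValueError(f"unknown media type: {kind}")
    # Multimodal content goes before the text part.
    return list(media) + [{"type": "text", "text": text}]


parts = build_prompt_parts(
    "E4B",
    "Describe what happens in this clip.",
    [{"type": "video", "seconds": 45}],
)
print([part["type"] for part in parts])  # prints ['video', 'text']
```

Passing the same video to a larger variant such as "31B Dense", or a 90-second clip to E4B, raises a ValueError instead of silently building an unsupported prompt.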