DeepSeek V4 Paper Guide

Readers searching for the DeepSeek V4 paper usually want the official PDF, the benchmark table, and a fast, verified path to the official model pages. This landing page compresses those into one search-friendly summary with direct links back to official DeepSeek sources.

CSA + HCA hybrid attention

mHC residual upgrade

Muon optimizer

Expert-first post-training

Highlights

What the DeepSeek V4 paper emphasizes at a glance.

The official DeepSeek-V4 introduction frames the release as a long-context MoE family with efficiency-focused attention work, updated optimization, and a post-training flow meant to consolidate domain specialists into a stronger general model.

Architecture

Hybrid attention for 1M context

The report describes a combined CSA and HCA attention stack. In the 1M-token setting, the official README says DeepSeek-V4-Pro uses 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2.
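
To make the KV-cache figure concrete, here is a back-of-the-envelope sketch that applies the quoted 10% ratio to a hypothetical 1M-token cache. The layer count, head count, head dimension, and dtype below are illustrative assumptions, not values taken from the DeepSeek-V4 report.

```python
# Hedged back-of-the-envelope: what "10% of the V3.2 KV cache" could mean at
# 1M tokens. All architecture numbers below are illustrative assumptions,
# not figures from the DeepSeek-V4 report.
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem=2):
    # K and V each store tokens * kv_heads * head_dim elements per layer.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_elem

baseline = kv_cache_bytes(tokens=1_000_000, layers=61, kv_heads=8, head_dim=128)
hybrid = 0.10 * baseline  # the README's quoted 10% ratio applied directly

print(f"assumed dense-baseline cache: {baseline / 2**30:.1f} GiB")
print(f"at 10% of that baseline:      {hybrid / 2**30:.1f} GiB")
```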

Stability

mHC strengthens signal propagation

Manifold-Constrained Hyper-Connections are presented as a stability upgrade over conventional residual paths while preserving model expressivity across deep layers.
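
The page does not spell out the mHC formulation, so the sketch below only illustrates the general hyper-connection idea it builds on: several parallel residual streams whose mixing matrix is kept constrained (here via a softmax) rather than left free. The stream count, the softmax constraint, and every name in the code are assumptions for illustration, not the report's actual mHC design.

```python
import torch
import torch.nn as nn

class ConstrainedHyperConnection(nn.Module):
    """Illustrative sketch only: multiple parallel residual streams mixed by a
    row-normalized (softmax) matrix, standing in for a 'constrained' mixing
    rule. This is NOT the mHC formulation from the DeepSeek-V4 report."""

    def __init__(self, num_streams=4):
        super().__init__()
        self.mix_logits = nn.Parameter(torch.zeros(num_streams, num_streams))
        self.in_weights = nn.Parameter(torch.ones(num_streams) / num_streams)
        self.out_weights = nn.Parameter(torch.zeros(num_streams))
        with torch.no_grad():
            # Identity-leaning init so the block starts close to a plain residual.
            self.mix_logits.fill_diagonal_(4.0)
            self.out_weights[0] = 1.0

    def forward(self, streams, sublayer):
        # streams: (num_streams, batch, seq, dim)
        x = torch.einsum("s,sbld->bld", self.in_weights, streams)   # read one input
        y = sublayer(x)                                             # attention / MLP
        mix = torch.softmax(self.mix_logits, dim=-1)                # constrained mixing
        mixed = torch.einsum("st,tbld->sbld", mix, streams)         # re-mix the streams
        return mixed + self.out_weights.view(-1, 1, 1, 1) * y       # write output back
```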

Optimization

Muon is part of the training story

DeepSeek states that the Muon optimizer is used to improve convergence speed and training stability across the DeepSeek-V4 series.
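
Muon itself is publicly documented outside this release: its core step accumulates SGD-style momentum for a 2-D weight matrix and then approximately orthogonalizes that update with a Newton-Schulz iteration. The sketch below follows that public write-up; the learning rate, momentum coefficient, and iteration constants are the commonly cited defaults, not values from the DeepSeek-V4 report.

```python
import torch

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    # Approximately map G to a semi-orthogonal matrix with a quintic
    # Newton-Schulz iteration (coefficients from the public Muon write-up).
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)            # normalize so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(param, grad, momentum_buf, lr=0.02, beta=0.95):
    # One simplified Muon-style update for a 2-D weight matrix:
    # momentum accumulation followed by orthogonalization of the update.
    momentum_buf.mul_(beta).add_(grad)
    update = newton_schulz_orthogonalize(momentum_buf)
    param.add_(update, alpha=-lr)
    return param, momentum_buf
```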

Post-training

Experts first, consolidation second

The public overview describes a two-stage pipeline: SFT and RL with GRPO for domain-specific experts, followed by unified consolidation through on-policy distillation.
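
GRPO's defining step, group-relative advantage estimation, is already public from earlier DeepSeek work: rewards for a group of completions sampled from the same prompt are normalized against that group's mean and standard deviation. A minimal sketch of that step, with no V4-specific hyperparameters assumed:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-6):
    # GRPO-style advantages: normalize each sampled completion's reward
    # against the mean and std of its own prompt's sample group.
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four completions sampled for one prompt, scored by a reward signal.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```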

Benchmarks

A compact benchmark snapshot from the official model card.

These figures come from the public DeepSeek model card and are here to help searchers orient quickly before opening the full report PDF. They are official reported results, not third-party reproductions.

Base model

HumanEval

76.8 pass@1

DeepSeek-V4-Pro-Base is listed above DeepSeek-V3.2-Base and DeepSeek-V4-Flash-Base on HumanEval.

Long context

LongBench-V2

51.5 EM

The official base-model table places DeepSeek-V4-Pro-Base at 51.5 on LongBench-V2.

Reasoning mode

LiveCodeBench

93.5 pass@1

DeepSeek-V4-Pro Max is shown at 93.5 pass@1 in the official frontier-model comparison.

Agentic

SWE Verified

80.6 resolved

The official comparison table lists DeepSeek-V4-Pro Max at 80.6 on SWE Verified.

Model           Total Params   Activated Params   Context
V4-Flash-Base   284B           13B                1M
V4-Flash        284B           13B                1M
V4-Pro-Base     1.6T           49B                1M
V4-Pro          1.6T           49B                1M
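
The sparsity implied by the table is easy to sanity-check from its own figures: only a few percent of each model's parameters are active per token.

```python
# Activated-parameter ratios implied by the spec table above.
specs = {
    "V4-Flash": (284e9, 13e9),   # (total params, activated params)
    "V4-Pro":   (1.6e12, 49e9),
}
for name, (total, active) in specs.items():
    print(f"{name}: {active / total:.1%} of parameters active per token")
# Roughly 4.6% for V4-Flash and 3.1% for V4-Pro.
```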

Resources

The fastest official links to open next.

If you searched for the DeepSeek V4 paper PDF, the README, or the model downloads, start here. Every outbound link below points to an official DeepSeek-controlled destination or DeepSeek's official model-hosting pages.

FAQ

Short answers for common search intent around the paper.

What is the official DeepSeek V4 paper title?

The public report is titled DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence.

Where is the DeepSeek V4 PDF hosted?

The official PDF is hosted on the DeepSeek-V4-Pro Hugging Face repository, alongside the public model card and release notes.

Does the official preview mention two model families?

Yes. The official introduction presents DeepSeek-V4-Pro and DeepSeek-V4-Flash as the two main families in the preview release.

What is the fastest way to verify the 1M-token claim?

Open the official README or the technical report. The introduction explicitly states that both model families support one million tokens of context.

Is this page the official DeepSeek documentation?

No. This is an independent editorial landing page meant to reduce search friction and route readers back to the official sources.