DeepSeek V4: 1.6T Model, 1M-Token Context and Huawei Ascend Support

DeepSeek V4 Debuts 1.6T and 284B Models with Million-Token Context and Aggressive Pricing as Huawei Moves to Run It

DeepSeek’s V4 launches with 1.6T and 284B variants, a one‑million token context window and aggressive pricing, while Huawei pledges Ascend chip support.

DeepSeek V4: a larger model and a sharper price point

DeepSeek has introduced a new generation of its large language model, branded V4, positioned as a higher-scale and lower-cost alternative to market-leading closed-source systems. The company released two V4 variants: V4-Pro, described by DeepSeek as a 1.6‑trillion-parameter model, and V4-Flash, a 284‑billion-parameter option. Both variants support an expanded context window of up to one million tokens, a significant step up from earlier DeepSeek releases. Company statements and reporting emphasize cost efficiency as a central part of V4’s positioning, with the aim of remaining competitive with top-tier proprietary models while reducing per-token inference expense.

Model sizes, context window, and stated capabilities

DeepSeek’s V4-Pro and V4-Flash differ sharply in parameter count and intended cost profile. V4-Pro is the larger of the two at roughly 1.6 trillion parameters; V4-Flash is listed at 284 billion parameters. According to multiple outlets, both models are designed to operate with context lengths of up to one million tokens, which the company describes as a major increase over earlier versions of its models. Coverage cited by DeepSeek, along with related reporting, also indicates that V4-Pro matches leading models “in several areas” and that the V4 family brings improved agent-style capabilities for multi-step tasks, features the company highlighted when discussing the model’s performance profile.

Pricing that undercuts major rivals

Price was a headline element of DeepSeek’s announcement. The company’s published pricing places V4-Pro at approximately $3.48 per million output tokens, a level explicitly contrasted with the roughly $25–$30 per million tokens reported for some competitors. The smaller V4-Flash is far cheaper, with reported pricing as low as $0.28 per million output tokens. Observers noted that this pricing strategy could put pressure on rivals already managing demand through higher fees and usage limits. DeepSeek framed cost efficiency as a differentiator intended to make large-context, high-capability inference more affordable.
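To put the quoted prices in proportion, the arithmetic can be sketched in a few lines of Python. This is an illustration only: the prices are the output-token figures reported above, the competitor range is assumed to refer to output tokens as well, and the monthly workload is a hypothetical number chosen for the example. Input-token pricing is not covered because the reporting does not state it.

```python
# Output-token prices in USD per million tokens, as quoted in the reporting.
V4_PRO_PRICE = 3.48
V4_FLASH_PRICE = 0.28
# Reported range for some competitors (assumed here to be output-token prices).
COMPETITOR_PRICE_RANGE = (25.00, 30.00)


def output_cost(tokens: int, price_per_million: float) -> float:
    """Return the USD cost of generating `tokens` output tokens."""
    return tokens / 1_000_000 * price_per_million


def price_ratio(reference_price: float, price: float) -> float:
    """Return how many times more expensive `reference_price` is than `price`."""
    return reference_price / price


if __name__ == "__main__":
    monthly_tokens = 500_000_000  # hypothetical workload, not from the reporting
    low, high = COMPETITOR_PRICE_RANGE

    for name, price in [("V4-Pro", V4_PRO_PRICE), ("V4-Flash", V4_FLASH_PRICE)]:
        print(f"{name}: ${output_cost(monthly_tokens, price):,.2f} per month")

    print(
        "Competitor range: "
        f"${output_cost(monthly_tokens, low):,.2f} to "
        f"${output_cost(monthly_tokens, high):,.2f} per month"
    )
    print(
        "Competitor price relative to V4-Pro: "
        f"{price_ratio(low, V4_PRO_PRICE):.1f}x to "
        f"{price_ratio(high, V4_PRO_PRICE):.1f}x"
    )
    print(
        "V4-Pro price relative to V4-Flash: "
        f"{price_ratio(V4_PRO_PRICE, V4_FLASH_PRICE):.1f}x"
    )
```

On the quoted figures alone, the cited competitor range works out to roughly seven to nine times the V4-Pro output price, and V4-Pro to roughly twelve times the V4-Flash price. Actual bills would also depend on input-token charges and any usage tiers, which this sketch leaves out.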

Huawei’s immediate hardware support and integration

Almost simultaneously with DeepSeek’s V4 announcement, Huawei confirmed support for the new models and emphasized close coordination between its Ascend chip lineup and DeepSeek’s V4. Huawei stated that its Ascend SuperNode family had been “fully adapted” for model inference at launch, and company engineers described the adaptation as the result of close collaboration before V4’s release. Reporting also noted compatibility across several Ascend chip families, including the Ascend A2, the A3, and the 950 series, and highlighted alignment with Huawei’s Compute Architecture for Neural Networks (CANN) software platform. That alignment was presented as enabling day‑one inference capability on Ascend hardware for the V4 models.

Short-term throughput limits and expected hardware deliveries

DeepSeek cautioned that V4 might experience throughput constraints in the near term. The company indicated that these limitations could persist until the second half of the year, when Huawei’s Ascend 950PR SuperNode systems are expected to ship at scale and increase available inference capacity. Capacity relief therefore depends on the delivery schedule for those systems; until wider shipments arrive, DeepSeek acknowledged potential bottlenecks in handling peak inference demand.

Domestic chip compatibility and on‑ramps to local GPU adoption

Several analysts cited in reporting highlighted that DeepSeek’s release explicitly references compatibility with domestically produced chips. Financial and market observers pointed out that the announcement could accelerate adoption of local GPU and accelerator hardware in production AI stacks. The combination of a lower-cost model offering and explicit compatibility with domestic chip families was framed as a move that could encourage broader local hardware deployment in the near term.

Why the V4 launch matters to inference economics

DeepSeek’s V4 release reframes part of the competitive conversation around inference economics rather than purely raw capability and training scale. By combining larger context windows with reported per-token costs that undercut several established providers, DeepSeek has spotlighted how inference efficiency and pricing can shape access to high-capacity models. Industry commentary accompanying the rollout emphasized that, as demand for inference grows, efficiency in running models becomes as consequential as how those models were trained. In that view, the V4 strategy aims to make larger-context, multi-step agent workflows usable at much higher volumes for price-sensitive customers.

Who the announcement affects and how

The V4 announcement touches multiple constituencies in the AI ecosystem. Enterprises and service providers that prioritize long-context applications—legal, research, and other use cases that can exploit million-token windows—stand to be most directly affected by the availability of larger context lengths at lower quoted costs. Cloud and infrastructure providers will be watching reported throughput and hardware dependencies closely, because the short-term limitations DeepSeek acknowledged could affect service-level planning and capacity provisioning. Hardware vendors and systems integrators, especially those in regions with an interest in domestic chip ecosystems, will also be monitoring adoption signals now that Huawei has signaled immediate support for Ascend adaptations.

Technical coordination between model and accelerator

Huawei framed its Ascend SuperNode portfolio as having reached “day zero” adaptation for DeepSeek V4, a description that points to coordinated engineering work between the model developer and the chip vendor before the release. Reported compatibility spans multiple Ascend chip families and extends to Huawei’s neural network compute stack, suggesting that the vendor-side software and system optimizations were prepared to accept V4 inference workloads from the outset. The practical result, as presented, is an immediate ability to run V4 inference workloads on Ascend hardware lines, subject to the capacity caveats DeepSeek noted until larger shipments of Ascend 950PR systems arrive.

Market positioning and competitive pressure

DeepSeek framed V4 as competitive with top closed-source systems while stressing a distinct price-performance posture. Public reporting contrasted DeepSeek’s lower per-token rates with those of several U.S.-based competitors, and analysts observed that the pricing could intensify pressure on providers who have responded to heightened demand by raising fees or throttling usage. DeepSeek’s combination of a larger context window, a sizable top-end parameter count for V4-Pro, and a sharply reduced cost profile positions the company to pursue customers who need extensive context at lower unit pricing.

Operational and adoption hurdles

DeepSeek’s release and subsequent reporting make clear that lower pricing and immediate hardware support do not erase operational challenges. The company itself flagged potential throughput shortfalls until Ascend 950PR SuperNode systems are delivered at scale in the second half of the year. That caveat places a practical limit on how quickly broad customer bases can migrate to V4 for high-throughput inference. Systems architects and operations teams that evaluate V4 will need to balance the model’s expanded context capabilities and per-token economics against short-term capacity constraints while planning rollout cadence and redundancy.

Analyst perspective on domestic supply chains

Market analysts referenced in coverage emphasized that explicit support for domestic chips was a notable feature of the V4 release. Their commentary suggested that compatibility with local hardware could lead to wider use of local GPUs and accelerators over the coming months. That framing aligns with broader industry discussions that tie AI deployment choices to hardware sovereignty, supply-chain considerations, and strategic ecosystem development.

Potential business use cases grounded in stated features

Reporting around V4 linked the model’s characteristics to practical enterprise needs without overstating specifics. Larger context windows and improved agent capabilities for multi-step tasks—attributes the company highlighted—map to real-world scenarios such as long-form document analysis, multi-stage reasoning pipelines, and workflows that must maintain extended state across interactions. DeepSeek’s lower-priced tier could make these scenarios more financially feasible for organizations that require high-volume inference but operate under constrained budgets.
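For the long-form document analysis scenario, the one-million-token limit becomes a simple budgeting question. The sketch below is a hypothetical illustration, not part of any DeepSeek tooling: it assumes token counts for each document are already known (tokenization is not covered in the reporting), and the function names and the output reserve are invented for the example. Only the one-million-token window comes from the reporting.

```python
from __future__ import annotations

# Maximum context length reported for both V4 variants, in tokens.
CONTEXT_WINDOW = 1_000_000


def fits_in_context(document_tokens: list[int], reserved_for_output: int = 0) -> bool:
    """Check whether all documents plus an output reserve fit in one window."""
    return sum(document_tokens) + reserved_for_output <= CONTEXT_WINDOW


def pack_documents(
    document_tokens: list[int], reserved_for_output: int = 0
) -> list[list[int]]:
    """Greedily group documents, in order, into batches that each fit one window."""
    budget = CONTEXT_WINDOW - reserved_for_output
    batches: list[list[int]] = []
    current: list[int] = []
    used = 0
    for tokens in document_tokens:
        if tokens > budget:
            raise ValueError(f"document of {tokens} tokens exceeds the budget of {budget}")
        if used + tokens > budget:
            # The current batch is full; start a new one for this document.
            batches.append(current)
            current, used = [], 0
        current.append(tokens)
        used += tokens
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical token counts for three long documents.
    docs = [400_000, 500_000, 300_000]
    print(fits_in_context(docs))
    print(pack_documents(docs, reserved_for_output=50_000))
```

With these hypothetical sizes, the three documents together exceed a single window, so the greedy pass splits them into two requests. A real pipeline would also need to count prompt and instruction tokens against the same budget.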

What remains explicitly unstated

The company and reporting describe model sizes, context length, pricing, and hardware compatibility; they do not provide detailed benchmarking figures, exhaustive latency characteristics, or the specific configuration and tuning steps used to achieve Ascend adaptation. Where details are not clearly stated in source reporting—such as precise throughput metrics under production conditions or the exact timing and quantities of hardware shipments—this article does not extrapolate beyond the published statements. DeepSeek itself noted near-term throughput limitations and tied broader capacity improvements to anticipated Ascend 950PR SuperNode deliveries in the second half of the year.

Broader industry implications

The V4 rollout underscores a shifting set of priorities in large-model economics. As reported, the balance between raw model scale and cost-per-inference is moving to the forefront: a model that supports extremely long contexts and a lower per-token fee can change procurement calculations for businesses and cloud providers. The tight coupling between DeepSeek and Huawei in this announcement also illustrates an industry trend in which vendors increasingly coordinate software and hardware stacks to accelerate production readiness. That coordination may influence how competitors approach optimizations, partnerships, and pricing strategies going forward.

As the second half of the year approaches and Ascend 950PR SuperNode shipments are expected to scale, stakeholders across cloud, enterprise IT, and chip supply chains will be watching actual throughput improvements and adoption signals. If the stated pricing and context capabilities are realized at production scale, the practical consequence could be a wider adoption of long-context, agent-enabled workflows at price points that were previously uneconomic. Conversely, until hardware availability and real-world throughput are demonstrated at scale, adoption may proceed unevenly, with early deployments limited to tailored pilots and capacity-constrained environments.

DeepSeek’s V4 release, paired with Huawei’s immediate support, thus represents both a tactical product move—introducing two distinct model sizes with a shared million-token context capability—and a strategic nudge toward lower-cost inference anchored to domestic accelerator ecosystems. The near-term narrative will hinge on how quickly larger Ascend supernode capacity comes online and how effectively operators translate per-token pricing into predictable, production-ready inference services.

Looking ahead, observers should track the shipment cadence of the Ascend 950PR SuperNode systems and any further disclosures from DeepSeek about production throughput and real-world deployment case studies; those developments will determine whether the combination of larger context windows, lower per-token costs, and hardware alignment produces meaningful competitive shifts in cloud and enterprise AI markets.