Nvidia Reportedly Working on New AI Chip to Boost Inference Speed

Nvidia is reportedly working on a new chip designed specifically to speed up AI processing, as the company looks to strengthen its grip on the rapidly evolving artificial intelligence market. The development was first detailed by the Wall Street Journal and later echoed by other outlets, pointing to a growing shift in focus from AI training toward real-time inference.

According to the report, Nvidia's upcoming processor is built to handle AI inference, the stage where a trained model generates responses for users, more efficiently. Inference is increasingly seen as the bigger commercial opportunity as chatbots, copilots, and AI assistants scale to millions of users.

A key part of the strategy involves technology from startup Groq. Nvidia is said to be licensing Groq’s language-processing architecture to improve how the chip handles the critical “prefill” and “decode” steps used in generative AI systems. The agreement is reportedly worth around $20 billion, underscoring how serious Nvidia is about optimizing inference performance.
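The prefill and decode steps the report mentions can be sketched with a toy autoregressive loop. This is an illustrative simplification, not Nvidia's or Groq's actual implementation: prefill processes the whole prompt in one parallel pass and builds a reusable attention cache, while decode generates tokens one at a time, which is why per-token latency dominates inference performance.

```python
# Toy sketch of the two generative-AI inference phases (hypothetical,
# not any vendor's real API).

def prefill(prompt_tokens):
    """Process the entire prompt in one pass and build a key/value
    cache that later decode steps reuse (stand-in for attention state)."""
    return [("kv", tok) for tok in prompt_tokens]

def decode(kv_cache, steps):
    """Generate tokens one at a time; each step depends on the previous
    one, so this serial loop is where latency matters most."""
    output = []
    for i in range(steps):
        next_token = f"tok{i}"            # stand-in for model sampling
        kv_cache.append(("kv", next_token))
        output.append(next_token)
    return output

cache = prefill(["Why", "is", "the", "sky", "blue", "?"])
print(decode(cache, 3))
```

Prefill is compute-bound and parallel, while decode is latency-bound and serial, which is why a chip optimized specifically for these phases differs from one optimized for training.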

The new chip is expected to be unveiled at Nvidia’s annual GTC developer conference in San Jose, where the company typically introduces major AI and data-center innovations. While full specifications remain under wraps, the processor is believed to focus heavily on lower latency and better power efficiency, two areas that matter far more for inference than for model training.

Pressure in the AI hardware space is clearly building. Major customers including OpenAI have been exploring alternative silicon options to reduce costs and improve performance at scale. At the same time, tech giants like Google and Amazon continue investing heavily in their own custom AI chips, raising the competitive stakes.

Nvidia still dominates the AI training market with its GPUs, but inference is widely expected to become the larger long-term revenue driver as AI moves into everyday products and services. By building a dedicated inference-focused processor, and by bringing Groq's technology into the fold, Nvidia appears to be positioning itself for that next phase.

More concrete details, including performance claims and availability, should surface once the company officially takes the stage at GTC.

