Two new open-weight AI reasoning models from OpenAI are expected to bring cutting-edge AI development directly into the hands of developers, enthusiasts, enterprises, startups and governments everywhere. Available under the Apache 2.0 licence, these models outperform similarly sized open models on reasoning tasks, demonstrate strong tool-use capabilities, and are optimised for efficient deployment on consumer hardware, OpenAI said.
OpenAI shared that the gpt-oss-120b model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks, while running on a single 80 GB GPU. The gpt-oss-20b model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure.
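Those hardware figures are consistent with a simple back-of-envelope estimate. As a sketch, assuming weights quantized to roughly 4 bits per parameter (the article does not state the quantization; the bit width here is an assumption for illustration), the weight memory works out as follows:

```python
# Back-of-envelope estimate of the memory needed to hold a model's weights.
# Assumption (not stated in the article): weights quantized to ~4 bits per
# parameter; activations and the KV cache need additional headroom.

def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory to hold the model weights, in GB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9

# A 20-billion-parameter model at ~4 bits per weight:
print(weight_memory_gb(20e9, 4))    # 10.0 -> fits a 16 GB edge device
# A 120-billion-parameter model at ~4 bits per weight:
print(weight_memory_gb(120e9, 4))   # 60.0 -> fits a single 80 GB GPU
```

The gap between the weight footprint and the device capacity (16 GB and 80 GB respectively) is what leaves room for activations, the KV cache and runtime overhead.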
NVIDIA’s collaboration with OpenAI on these open models — gpt-oss-120b and gpt-oss-20b — is a testament to the power of community-driven innovation and highlights NVIDIA’s foundational role in making AI accessible worldwide, NVIDIA said.
OpenAI’s new large language models (LLMs) were trained on NVIDIA H100 GPUs and run inference best on the hundreds of millions of GPUs running the NVIDIA CUDA platform across the globe. The models are now available as NVIDIA NIM microservices, offering easy deployment on any GPU-accelerated infrastructure with flexibility, data privacy and enterprise-grade security. The new models enable agentic AI applications such as web search, in-depth research and more, NVIDIA said.
With software optimisations for the NVIDIA Blackwell platform, the models offer optimal inference on NVIDIA GB200 NVL72 systems, achieving 1.5 million tokens per second. AI enthusiasts and developers can use the optimised models on NVIDIA RTX AI PCs and workstations through popular tools and frameworks like Ollama, llama.cpp and Microsoft AI Foundry Local, and expect performance of up to 256 tokens per second on the NVIDIA GeForce RTX 5090 GPU.
“OpenAI showed the world what could be built on NVIDIA AI — and now they’re advancing innovation in open-source software,” said Jensen Huang, founder and CEO of NVIDIA.
As advanced reasoning models like gpt-oss generate exponentially more tokens, the demand on compute infrastructure increases dramatically. Meeting this demand calls for purpose-built AI factories powered by NVIDIA Blackwell, an architecture designed to deliver the scale, efficiency and return on investment required to run inference at the highest level.
NVIDIA Blackwell enables ultra-efficient, high-accuracy inference while significantly reducing power and memory requirements. This makes it possible to deploy trillion-parameter LLMs in real time.
Developers can build with their framework of choice. Demonstrating their commitment to open-source software, OpenAI and NVIDIA have collaborated with top open framework providers to provide model optimisations for FlashInfer, Hugging Face, llama.cpp, Ollama and vLLM, in addition to NVIDIA TensorRT-LLM and other libraries.
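Several of these frameworks (vLLM, Ollama and llama.cpp's server among them) expose an OpenAI-compatible Chat Completions endpoint, so one client works across them. Below is a minimal sketch; the model name, port and base URL are assumptions, so check your server's documentation for the exact values:

```python
# Sketch: querying a locally served gpt-oss model through an
# OpenAI-compatible /v1/chat/completions endpoint. The model name
# ("gpt-oss-20b") and port 8000 are assumptions for illustration.
import json
import urllib.request

def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Assemble a Chat Completions payload for an OpenAI-compatible server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat(base_url: str, payload: dict) -> str:
    """POST the payload and return the first choice's message text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    payload = build_chat_request("gpt-oss-20b", "Summarise this article.")
    # Requires a running local server, e.g. one started with vLLM:
    # print(chat("http://localhost:8000", payload))
    print(payload["model"])
```

Because the payload format is shared, swapping frameworks should only mean changing the base URL and model identifier, not the client code.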
[Image: Overall performance of the gpt-oss-20b model on various RTX AI PCs. Source: NVIDIA blog post.]
An NVIDIA blog post suggests testing these models with the new Ollama app on RTX AI PCs with GPUs that have at least 24 GB of VRAM, and trying the gpt-oss models through various other RTX-powered applications and frameworks on GPUs with at least 16 GB of VRAM.
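Once the Ollama app is running, it serves a local REST API (by default on port 11434) that can be used to confirm which models have been pulled. A minimal sketch follows; the exact gpt-oss tag name is an assumption, so pull the tag shown in the Ollama model library:

```python
# Sketch: listing models the local Ollama app has pulled, via its REST
# API on the default port 11434. The "gpt-oss:20b" tag used in the sample
# below is an assumption; confirm it in the Ollama model library.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def models_from_tags(tags_response: dict) -> list:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]

def list_local_models() -> list:
    """Query the running Ollama app for its locally available models."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags") as resp:
        return models_from_tags(json.load(resp))

# Example /api/tags response shape (abridged):
sample = {"models": [{"name": "gpt-oss:20b"}]}
print(models_from_tags(sample))  # ['gpt-oss:20b']
```

If the model appears in the list, it can then be queried through the same local API or through Ollama's OpenAI-compatible endpoint.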
Windows developers can also access OpenAI’s new models via Microsoft AI Foundry Local, currently in public preview. Foundry Local is an on-device AI inferencing solution that integrates into workflows via the command line, software development kit (SDK) or application programming interfaces (APIs).
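For the command-line route, a workflow script might simply shell out to the Foundry Local CLI. The sketch below assumes a `foundry model run <alias>` subcommand and a `gpt-oss-20b` alias, both based on Foundry Local's public preview and not confirmed by the article; check the Foundry Local documentation before relying on them:

```python
# Sketch: invoking the Foundry Local CLI from a Python workflow script.
# Assumptions (not from the article): the CLI binary is named "foundry",
# it supports "model run <alias>", and "gpt-oss-20b" is a valid alias.
import subprocess

def foundry_run_command(model_alias: str) -> list:
    """Build the argv list for running a model with the Foundry Local CLI."""
    return ["foundry", "model", "run", model_alias]

cmd = foundry_run_command("gpt-oss-20b")
print(" ".join(cmd))  # foundry model run gpt-oss-20b
# To actually launch it (requires Foundry Local to be installed):
# subprocess.run(cmd, check=True)
```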
NVIDIA’s collaboration with OpenAI began in 2016 when Huang hand-delivered the first NVIDIA DGX-1 AI supercomputer to OpenAI’s headquarters in the US. Since then, the companies have been working together to push the boundaries of what’s possible with AI, providing the core technologies and expertise needed for massive-scale training runs.
By optimising OpenAI’s gpt-oss models for NVIDIA Blackwell and RTX GPUs, and pairing them with its extensive software stack, NVIDIA is enabling faster, more cost-effective AI advancements for its 6.5 million developers across 250 countries using 900+ NVIDIA software development kits and AI models — and counting.