OpenAI Debuts GPT-5.3-Codex-Spark on Cerebras Chips, Surpassing Previous Speeds

On Thursday, OpenAI introduced GPT-5.3-Codex-Spark, its first production AI model to run on non-Nvidia hardware. Deployed on chips from Cerebras and designed specifically for coding tasks, the model generates more than 1,000 tokens (chunks of data) per second, which OpenAI says is roughly 15 times faster than its predecessor.

For comparison, Anthropic’s Claude Opus 4.6, a larger and more capable model than Codex-Spark, runs about 2.5 times faster in its newly introduced premium fast mode than its standard 68.2 tokens per second, or roughly 170 tokens per second.

"Cerebras has been a great engineering partner, and we’re excited about adding fast inference as a new platform capability," commented Sachin Katti, OpenAI's head of compute, in a statement.

Codex-Spark is currently a research preview available to ChatGPT Pro subscribers ($200 per month) through the Codex app, command-line interface, and VS Code extension. OpenAI is also rolling out API access to selected design partners. The model launches with a 128,000-token context window and is limited to text-only processing for now.

This release follows OpenAI’s launch of the full GPT-5.3-Codex model earlier this month, which targets complex coding tasks. Spark, by contrast, is tuned for speed rather than depth of knowledge: it is a text-only model focused on coding, while its larger sibling also handles more general-purpose work.

According to OpenAI, Spark surpasses the older GPT-5.1-Codex-mini on the SWE-Bench Pro and Terminal-Bench 2.0 software engineering evaluations while completing tasks in significantly less time. These performance claims have not been independently verified.

Speed has been a weak point for Codex in the past: when Ars tested it in December, it took twice as long as Anthropic’s Claude Code to build a Minesweeper game.

In the competitive landscape of coding agents, GPT-5.3-Codex-Spark's 1,000 tokens per second marks a substantial jump over anything OpenAI has previously deployed on its own infrastructure. According to independent benchmarks by Artificial Analysis, OpenAI's fastest Nvidia-based models came nowhere close: GPT-4o reached about 147 tokens per second, o3-mini about 167, and GPT-4o mini roughly 52.
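For a rough sense of the gap, here is a back-of-the-envelope comparison using only the throughput figures cited above (the numbers are the article's reported benchmarks, not official specifications, and the table below is purely illustrative):

```python
# Throughput figures cited in the article, in tokens per second.
spark_tps = 1000  # GPT-5.3-Codex-Spark on Cerebras hardware

# OpenAI's fastest Nvidia-based models, per Artificial Analysis benchmarks.
nvidia_models = {
    "o3-mini": 167,
    "GPT-4o": 147,
    "GPT-4o mini": 52,
}

# Print Spark's speed advantage over each older model.
for model, tps in nvidia_models.items():
    ratio = spark_tps / tps
    print(f"Spark is {ratio:.1f}x faster than {model} ({tps} tok/s)")
```

By this arithmetic, Spark is about 6 times faster than o3-mini and nearly 20 times faster than GPT-4o mini.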
