
Red Hat announced on the 3rd that it will expand support for enterprise Generative AI by combining Red Hat AI with AWS AI silicon on the Amazon Web Services (AWS) cloud.

The collaboration between Red Hat and AWS combines Red Hat's platform capabilities, AWS cloud infrastructure, and AWS AI chips to help customers implement Generative AI strategies. The Red Hat AI Inference Server, based on vLLM, runs on AWS Inferentia2 and Trainium3 accelerators, providing a common inference layer for a wide range of Generative AI models across generations. The company said this enables customers to achieve up to 30%–40% better price performance than existing GPU (graphics processing unit)-based Amazon EC2 instances.
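Because the server is vLLM-based, a "common inference layer" in practice typically means an OpenAI-compatible HTTP API, which vLLM exposes. As a minimal sketch (the endpoint URL and model name below are placeholders, not details from the announcement), the client code is identical whether the server is backed by GPUs or by Neuron accelerators:

    from openai import OpenAI

    # Point the standard OpenAI client at a vLLM-compatible endpoint.
    # The URL and model are illustrative placeholders.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "Summarize this quarter's results."}],
    )
    # The accelerator behind the server is invisible to the caller.
    print(response.choices[0].message.content)

This is the sense in which applications can stay unchanged while the underlying hardware varies.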

The two companies are also developing the AWS Neuron Operator, which will work with Red Hat OpenShift Service on AWS (ROSA), Red Hat OpenShift, and Red Hat OpenShift AI. The AWS Neuron Community Operator is currently available to Red Hat OpenShift and ROSA users. The Red Hat AI Inference Server with support for AWS AI chips is scheduled to be offered as a developer preview in Jan. 2026.
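Like other Kubernetes device operators, the operator's role is to make Neuron devices schedulable as node resources on the cluster. As an illustrative sketch only (the resource name aws.amazon.com/neuron is an assumption based on the AWS Neuron device plugin, not something stated in the article), a cluster user could list which nodes advertise Neuron capacity:

    from kubernetes import client, config

    def neuron_nodes(resource="aws.amazon.com/neuron"):  # assumed resource name
        # Load credentials from the local kubeconfig (e.g. an oc login session).
        config.load_kube_config()
        v1 = client.CoreV1Api()
        for node in v1.list_node().items:
            allocatable = node.status.allocatable or {}
            count = allocatable.get(resource)
            if count:
                # Nodes running the Neuron device plugin advertise devices here.
                print(f"{node.metadata.name}: {count} x {resource}")

    if __name__ == "__main__":
        neuron_nodes()

Pods then request these devices through the same resource name in their resource limits, which is what lets OpenShift schedule AI workloads onto Neuron-equipped nodes.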

Joe Fernandez, vice president of Red Hat's AI business unit, said, "By bringing the Red Hat AI Inference Server to AWS AI chips, we are helping organizations scale their AI workloads with efficiency and flexibility."

Colin Brace, vice president of AWS Annapurna Labs, said, "Trainium and Inferentia are designed to deliver high-performance AI inference and training in a cost-efficient way," and added, "This collaboration lays the foundation for customers to rapidly scale Generative AI into production."
