Penguin Solutions Releases ICE ClusterWare Management Software 13.0 for Optimizing AI Infrastructure
Penguin Solutions, Inc. ("Penguin Solutions") (Nasdaq: PENG), a leading provider of high-performance computing and AI infrastructure solutions, today announced the release of ICE ClusterWare software 13.0. This latest version introduces powerful new capabilities that solve two critical challenges in production-scale AI and HPC: sustaining peak cluster performance and secure provisioning of a single cluster to diverse user groups. These new features enable organizations to maximize return on their AI infrastructure investments by safely sharing resources across more users while ensuring consistent, reliable performance.
This press release features multimedia. View the full release here: https://www.businesswire.com/news/home/20251117135297/en/
Penguin Solutions releases ICE ClusterWare management software 13.0 with powerful new capabilities that solve two critical challenges in production-scale AI and HPC: sustaining peak cluster performance and secure provisioning of a single cluster to diverse user groups.
When an organization’s AI deployments progress from isolated pilot projects to enterprise-wide production environments, operational demands on infrastructure intensify immediately. Penguin’s ICE ClusterWare 13.0 addresses this with built-in anomaly detection and auto-remediation, along with network-isolated multi-tenancy—delivering the operational excellence required to support AI as a core business function.
“With the launch of our ICE ClusterWare software 13.0, we’re delivering pivotal advancements to help organizations manage the growing complexity of modern AI and HPC environments,” said Sharri Parsell, vice president software engineering for Penguin Solutions. “As AI continues to evolve from experimental pilots to enterprise-scale deployments, organizations need robust, intelligent infrastructure that drives operational excellence and enables AI success across the enterprise.”
The patent-pending anomaly detection and auto-remediation technology ensures peak cluster performance and resource availability, continuously monitoring for hidden performance degradation that traditional diagnostic tools miss. Upon detection, the system automatically isolates underperforming nodes and initiates remediation in real time, ensuring that workloads are scheduled on validated, high performing nodes. This proactive approach reduces administrative burdens, prevents unplanned downtime, and maximizes the cluster’s usable capacity. As a result, this new capability significantly shortens model training by reducing restarts and loss of work.

