Design Principles
Through our Vertiv User Experience process, which integrates in-depth customer interviews, extensive technical proofs-of-concept, and technical collaboration with leading technology partners, Vertiv has curated a collection of key AI infrastructure imperatives that current and future designers, developers, and operators of AI factories share.
Be Transformative
Companies are leveraging AI to transform products, services, and customer engagement. This shift demands overhauls in operating models and infrastructure and forces organizations to face critical challenges head-on.
Be Efficient
Efficient capital deployment is crucial for cost competitiveness, while balancing AI's processing power against its environmental impact is essential. Together, these challenges are known as the "AI efficiency paradox."
Be First
To make quick progress in innovating products and customer experiences, it's crucial to address the separate management of power and cooling systems. Overcoming these challenges can lead to a first-mover advantage.
Be Confident
A meticulous plan is necessary to differentiate between calculated risks and reckless decisions. Understanding these challenges lets you approach infrastructure innovation with confidence.
Be Future-ready
The explosive growth of AI and high-performance computing will require data centers to handle rack densities exceeding 100 kW. Are you ready? Prepare your data center for a high-powered future.
- Retrofitting existing data center infrastructure in a transformative way.
- Accommodating rack power densities >100 kW and hardware >5,000 lb.
- Deploying liquid and hybrid air-liquid cooling.
- Understanding that liquid distribution is as critical as power distribution.
- Ensuring power availability and intelligent grid interaction.
AI Imperatives
Design both power and cooling together to optimize AI infrastructure.
Power is at a premium. Eliminate stranded power by aligning AI clusters to data center capacity blocks.
Handle AI workload surges through system-level controls including power and cooling buffers.
Balance cost, redundancy, and risk in AI design.
Design for a mix of liquid and air cooling.
Design for the future.
Power into a data center is segmented into capacity blocks, commonly 1-3 MW, determined by industry-standard sizing of breakers or generators. AI is deployed in clusters, soon to be common at 100+ kW per rack and upward from there. Aligning clusters to capacity blocks ensures that every available kW can be utilized.
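The alignment above is simple arithmetic. As a hedged illustration (the block size and per-rack figures below are examples for the math, not Vertiv specifications or equipment ratings), a quick sketch of how misaligned rack sizes strand power:

```python
# Illustrative only: example figures, not vendor specifications.

def stranded_power_kw(block_kw, rack_kw):
    """Return (racks that fit, stranded kW) for one capacity block."""
    racks = int(block_kw // rack_kw)
    return racks, block_kw - racks * rack_kw

# A 2 MW capacity block populated with 132 kW AI racks:
racks, stranded = stranded_power_kw(2000, 132)
print(racks, stranded)  # 15 racks fit, 20 kW stranded

# Sizing the cluster to the block (here, 125 kW racks) strands nothing:
racks, stranded = stranded_power_kw(2000, 125)
print(racks, stranded)  # 16 racks fit, 0 kW stranded
```

The same check scales across a site: stranded kilowatts per block multiply by the number of capacity blocks deployed.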
Power, cooling, and AI hardware compete for limited space and energy.
A holistic power and cooling design approach is required to maximize the share of space and energy dedicated to AI processing.
Consider the total cost of ownership, redundancy, and blast radius in AI power and cooling designs and the tradeoffs among them.
The value of AI hardware, at $1-4M+ per rack, and the processing it supports are driving increased consideration of redundancy in power and cooling designs, especially for inference applications.
Designs that limit the blast radius, or the impact from the loss of a single capacity segment (server, rack, row), tend to use higher counts of smaller components, potentially at the expense of total cost of ownership.
Designs that favor total cost of ownership tend to use fewer larger components, often with redundancy, to reduce the possibility of a capacity segment's loss.
AI training tends to drive large numbers of processors to act in unison, creating massive power-consumption surges that can repeat and degrade the performance and lifespan of power and cooling infrastructure.
Mitigation designs include system-level controls with rapid response and immediately accessible buffers in power and cooling capacity.
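The buffer idea can be sketched numerically. Everything below (surge shape, feed limit, buffer size) is a toy illustration of the principle, not a description of any specific Vertiv control system:

```python
# Toy model: a synchronized-training power surge is served partly from a
# local energy buffer so the upstream feed never exceeds its limit.
# All numbers are illustrative, not real equipment ratings.

FEED_LIMIT_KW = 1000.0    # assumed upstream capacity-block limit
BUFFER_KWH = 5.0          # assumed immediately accessible energy buffer

def split_load(demand_kw, feed_limit_kw=FEED_LIMIT_KW,
               buffer_kwh=BUFFER_KWH, step_s=1.0):
    """For each demand sample, draw up to feed_limit_kw from the feed and
    the excess from the buffer; return per-step feed draw and buffer left."""
    feed, remaining = [], buffer_kwh
    for d in demand_kw:
        from_feed = min(d, feed_limit_kw)
        # Excess kW for step_s seconds, converted to kWh drawn from the buffer:
        remaining -= max(d - feed_limit_kw, 0.0) * step_s / 3600.0
        if remaining < 0:
            raise RuntimeError("buffer exhausted; surge exceeds design envelope")
        feed.append(from_feed)
    return feed, remaining

# A 30 s surge to 1.4 MW on top of an 800 kW baseline:
demand = [800.0] * 10 + [1400.0] * 30 + [800.0] * 10
feed, left = split_load(demand)
assert max(feed) <= FEED_LIMIT_KW  # the feed stays within the capacity block
```

The design point is the buffer's depth versus the surge's duration: here a 400 kW excess for 30 seconds consumes about 3.3 kWh of the assumed 5 kWh reserve.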
The combination of liquid and air cooling has an interdependent impact on the ability to remove heat.
Power into the data center equals the heat rejected.
Air and liquid cooling temperatures and flows must stay within the operating envelope of the AI servers and the data center heat rejection equipment.
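That energy balance can be checked with first-order arithmetic (Q = ṁ·cp·ΔT). The coolant properties and temperature split below are generic textbook values for water, not equipment specifications:

```python
# First-order check: power into the rack equals heat carried away by coolant.
# Q = m_dot * cp * dT. Values are generic illustrations.

CP_WATER = 4186.0  # J/(kg*K), specific heat of water

def required_flow_kg_s(heat_kw, delta_t_k, cp=CP_WATER):
    """Coolant mass flow needed to remove heat_kw at a rise of delta_t_k."""
    return heat_kw * 1000.0 / (cp * delta_t_k)

# A 100 kW rack cooled by water with a 10 K supply/return split:
flow = required_flow_kg_s(100.0, 10.0)
print(f"{flow:.2f} kg/s (~{flow * 60:.0f} L/min)")  # prints "2.39 kg/s (~143 L/min)"
```

Halving the allowable temperature rise doubles the required flow, which is why coolant temperatures and flows must be designed against both the server and heat-rejection envelopes at once.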
AI power density will rapidly increase to 500 kW per rack. The typical lifespan of a data center is almost two decades, while the AI chip design cycle is less than two years.
Critical Infrastructure challenges:
- Designing power and cooling paths independently and not as a single system.
- Assessing multi-vendor qualifications, dependencies, and technical specifications.
- Adopting new technologies and market entrants that lack scalability and a global supply chain.
- Delivering significantly more field installation work that is costly, time-consuming, and non-repeatable.
Critical Infrastructure challenges:
- Leveraging existing infrastructure investments with robust technical design experience.
- Addressing constraints in delivering operational efficiency as densities accelerate >33x.
- Avoiding over-provisioning and stranded capacity when considering fault tolerance.
- Blending new and existing technologies without common language and controls.
- Deploying and maintaining AI factories anywhere in the world.
Critical Infrastructure challenges:
- Combining new and existing technologies by deeply understanding what is technically possible.
- Getting the most from existing infrastructure investments when retrofitting for AI.
- Planning today's AI factory with future transformations in mind as densities continue to accelerate.
- Maintaining a robust service and maintenance network with experience, footprint, and trusted performance.
- Designing for use cases with different risk strategies in mind, e.g., training vs. inference.
Critical Infrastructure challenges:
- Future-proofing the infrastructure investment, mitigating costly future upgrades.
- Designing for power and cooling scalability that can leap, not just grow.
- Tackling sustainability challenges that matter today and will grow exponentially in scope.
- Partnering with technically and financially viable players that can provide support into the future.
- Collaborating with players that invest in ER&D and are closely aligned with technology leaders.