How ChatGPT Is Transforming the Data Center
This article examines generative AI’s impact on data center design and operations and explores the changes necessary to support its continued development.
Generative artificial intelligence (gen AI) is reshaping the technology landscape, but its impact extends beyond the user interface. Behind the scenes, gen AI is driving a fundamental transformation of data center infrastructure. According to the Equinix 2023 Global Tech Trends Survey, 42 percent of IT leaders doubt their infrastructure can accommodate growing gen AI adoption, while 41 percent question their team’s ability to implement the technology.[1]
Evolving Intelligence: How Generative AI Changes the Rules
Gen AI represents a significant evolution from traditional machine learning approaches, introducing new capabilities and challenges and reshaping the landscape of AI and data center operations.
Machine learning, the dominant AI paradigm for decades, excels at pattern recognition and prediction based on historical data. It is highly effective for tasks like classification, regression, and clustering. However, machine learning models are typically constrained by their training data and predefined rules, limiting their ability to create novel outputs.
Gen AI goes beyond recognizing patterns to creating new, original content. This fundamental shift in capability introduces several key differences:
- Creative output: Gen AI is a subset of machine learning focused on producing new text, images, or other content that mimics human creativity. In contrast, traditional machine learning models focus on making predictions or classifications. While these older models can be used for human-like decision-making, they do not offer creative outputs.
- Contextual understanding: Gen AI typically uses large models trained on huge datasets, which can lead to a nuanced understanding of context and the ability to handle complex, open-ended tasks. This contrasts with machine learning models, which are often more specialized and limited in scope.
- Data requirements: Gen AI typically requires vastly larger datasets for training than traditional machine learning models. This increased data demand significantly affects data center storage and processing capabilities.
- Model complexity: Generative models, such as those used in GPT-4 or DALL-E, are often orders of magnitude more complex than traditional machine learning models. This complexity translates to greater computational demands and more sophisticated hardware requirements.
- Training and inference processes: The training process for gen AI models is typically more resource-intensive and time-consuming than traditional machine learning models. Additionally, inference (i.e., generating outputs) with gen AI can be more computationally demanding, especially for real-time applications.
- Adaptability: While both machine learning and gen AI can be fine-tuned, gen AI models often demonstrate greater flexibility in adapting to new tasks or domains without extensive retraining.
Table 1 summarizes some critical distinctions between machine learning and gen AI:
Table 1. Machine learning vs. generative AI

| Feature | Machine Learning | Generative AI |
| --- | --- | --- |
| Primary Function | Trains on well-defined datasets to make predictions or decisions | Trains on large, often unstructured datasets to create new content |
| Training Techniques | Supervised learning, unsupervised learning, and reinforcement learning | Neural network architectures such as generative adversarial networks (GANs) and transformer-based models |
| Example Applications | Spam detection, fraud detection, and predictive maintenance | Chatbots, image generation, and music composition tools |
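To make the distinction in Table 1 concrete, the sketch below pairs a classic discriminative task (spam-style classification with scikit-learn) with a generative one (open-ended text completion via the Hugging Face transformers pipeline). The toy training data and the gpt2 checkpoint are illustrative assumptions, not recommendations.

```python
# Minimal sketch contrasting discriminative ML with generative AI.
# Assumes scikit-learn and transformers are installed; the toy dataset and
# the gpt2 checkpoint are illustrative stand-ins, not production choices.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from transformers import pipeline

# Traditional machine learning: predict a label for an input.
texts = ["win a free prize now", "meeting moved to 3pm",
         "claim your reward today", "see attached project plan"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

vectorizer = TfidfVectorizer()
classifier = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)
print(classifier.predict(vectorizer.transform(["claim your free prize"])))

# Generative AI: produce new content from an open-ended prompt.
generator = pipeline("text-generation", model="gpt2")
print(generator("The data center of the future will", max_new_tokens=30)[0]["generated_text"])
```

The first model can only assign one of the labels it was trained on; the second produces text that never appeared verbatim in its training data.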
These differences have profound implications for data center design and operation. The massive computational requirements, the need for specialized hardware, and the demands of processing and storing vast amounts of data all present unique challenges when supporting gen AI workloads. As we explore the transformations needed in data center infrastructure, we must keep these fundamental differences in mind, understanding that solutions optimized for traditional machine learning may not be sufficient for the demands of generative AI.
Why Can’t Current Data Centers Support Gen AI?
Gen AI’s resource requirements significantly exceed traditional AI's, necessitating changes in several areas of data center operations. These include increased computational capacity, specialized architecture, and new power and network optimization approaches. The scale of this challenge is reflected in projections that gen AI server infrastructure and operating costs will exceed US$76 billion by 2028.[2]
Additionally, gen AI presents unique data governance challenges. Many popular models have been trained on web-scraped data, raising concerns about privacy and copyright law.[3] There’s also a risk of using sensitive or proprietary information as training data, creating potential legal and regulatory issues.
Rethinking Hardware Architectures
Gen AI requirements are so fundamentally different from those of traditional workloads that they demand a complete reimagining of data processing hardware architectures. The sheer scale of computation, the complexity of operations, and the vast amounts of data involved push current data center designs to their limits and beyond. This isn’t a matter of simply scaling up existing solutions; it requires innovative approaches that challenge long-standing assumptions about how data centers should be built and operated.
To meet the demands of gen AI, data centers must evolve in several key areas.
HPC and GPUs
High-performance computing (HPC) is essential for running generative AI applications. HPC architecture leverages multiple compute nodes, allowing for parallel processing of complex operations. Graphics processing units (GPUs), with their inherent parallel processing capabilities, are thus well-suited for HPC systems and the computational demands of gen AI.[4]
Consider GPT-3, a previous iteration of the large language model (LLM) powering ChatGPT. Running its 175 billion parameters at minimum latency required a distributed parallel computing system with at least 2,048 GPUs.[5] Industry speculation suggests that GPT-3’s successor, GPT-4, contains about 1.8 trillion parameters, further underscoring the massive computational requirements of advanced gen AI models.[6]
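A rough back-of-the-envelope calculation shows why models at this scale cannot fit on a single accelerator. The sketch below assumes 2 bytes per parameter (16-bit weights) and 80GB of memory per GPU; both figures are illustrative assumptions, and real deployments also need headroom for activations, caches, and parallelism overhead.

```python
# Back-of-the-envelope sizing for a 175-billion-parameter model (illustrative).
# Assumptions: 2 bytes per parameter (fp16/bf16) and 80GB of usable GPU memory;
# real systems also need room for activations, KV caches, and optimizer state.
params = 175e9
bytes_per_param = 2
gpu_memory_gb = 80

weights_gb = params * bytes_per_param / 1e9
gpus_for_weights = -(-weights_gb // gpu_memory_gb)  # ceiling division

print(f"Weights alone: ~{weights_gb:.0f} GB")
print(f"GPUs needed just to hold the weights: {int(gpus_for_weights)}")
# ~350GB of weights, so at least five such GPUs before any batching,
# parallelism overhead, or latency targets are considered.
```

The thousands of GPUs cited above follow from latency and throughput targets rather than raw capacity: spreading the model and its traffic across many devices is what keeps response times acceptable.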
Computational speed can be just as crucial as computational capacity. Consider applications that generate visual content in a virtual reality setting, where a frame rate of 90fps is required to avoid causing dizziness in users. Computational resources must therefore be powerful enough to generate each frame within a ninetieth of a second.[7]
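As a quick illustration of that budget (arithmetic only, not a benchmark), dividing one second by the target frame rate shows how little time each generated frame gets:

```python
# Per-frame latency budget for generative content in VR (illustrative arithmetic).
target_fps = 90                      # frame rate cited above for comfortable VR
frame_budget_ms = 1000 / target_fps  # milliseconds available per frame

print(f"Per-frame budget at {target_fps} fps: {frame_budget_ms:.1f} ms")
# About 11.1 ms, which must cover generation, rendering, and display.
```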
However, the increasing demand for GPUs across various sectors, including crypto mining, may lead to supply challenges. Data center designers might struggle to obtain sufficient GPUs to meet their needs.
Leaving Behind Outdated Architecture
The rise of generative AI is pushing traditional data center architecture to its limits. CPU-centric technologies used in server farms have reached a point of diminishing returns, necessitating a shift toward heterogeneous architecture that decouples computing, memory, and storage resources.[8]
Field-programmable gate arrays (FPGAs) offer one alternative to fixed hardware structures. Unconstrained by bus width, FPGAs provide lower latency and hardware-level parallelism, making them up to a hundred times faster in specific data-centric analytics applications like fuzzy search.[9]
Data processing units (DPUs) play a crucial role in heterogeneous architecture. With their specialized low-power cores, coprocessors, and high-speed interfaces, DPUs can handle encryption, data compression, and quality-of-service (QoS) management tasks. This offloading frees up CPUs and GPUs for bandwidth-intensive and billable workloads, potentially lowering a data center’s total cost of ownership by reducing power utilization.[10]
Neural processing units (NPUs) are specialized processors designed to accelerate AI and machine learning workloads. They excel at tasks such as image recognition and natural language processing, further enhancing efficiency in gen AI workloads.[11]
Reducing Energy Usage
Gen AI’s computational demands translate into significant energy requirements. On average, a single ChatGPT query consumes roughly ten times as much electricity as a standard Google search, and data center power demand is projected to grow 160 percent by 2030 as a result of gen AI adoption.[12]
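To put those figures in rough perspective, the sketch below uses a commonly cited estimate of about 0.3Wh per conventional search together with the ten-times multiplier above; the per-query values and the daily volume are illustrative assumptions, not measurements.

```python
# Rough daily energy comparison (illustrative assumptions, not measurements).
# Assumes ~0.3Wh per conventional search and ~10x that per gen AI query,
# matching the ratio cited above; the query volume is a hypothetical example.
wh_per_search = 0.3
wh_per_genai_query = 10 * wh_per_search
queries_per_day = 100_000_000  # hypothetical daily volume

search_mwh = queries_per_day * wh_per_search / 1e6
genai_mwh = queries_per_day * wh_per_genai_query / 1e6

print(f"Search workload: ~{search_mwh:.0f} MWh/day")   # ~30 MWh/day
print(f"Gen AI workload: ~{genai_mwh:.0f} MWh/day")    # ~300 MWh/day
```

Serving the same traffic generatively multiplies the energy bill roughly tenfold.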
To address this challenge, data centers can implement several strategies:
- Leverage specialized chip-to-chip communication protocols to optimize data transfer between integrated circuits.[13] NVIDIA’s direct chip-to-chip protocol, for example, provides high-speed interconnects that streamline data movement between processors.[14]
- Replace traditional hard disk drives (HDDs) with more energy-efficient solid-state drives (SSDs). For instance, Samsung’s enterprise SSDs consume only 1.25W in active mode, compared to 6W for a 15,000rpm SAS HDD[15] (a rough fleet-level savings estimate follows this list).
- Implement advanced cooling technologies such as direct-to-chip cooling and liquid immersion cooling. Direct-to-chip cooling circulates cool liquid through a plate in direct contact with heat sources. In contrast, liquid immersion cooling submerges IT hardware in a dielectric fluid with high thermal conductivity for more effective heat dissipation.
- Use AI itself to optimize energy utilization, particularly in cooling systems. Google’s DeepMind AI, for example, reduced the energy used for cooling by 40 percent.[16]
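As a rough illustration of the storage swap above, the sketch below estimates fleet-level savings from the cited 1.25W and 6W active-power figures; the drive count and the always-active duty cycle are simplifying assumptions.

```python
# Fleet-level power savings from replacing HDDs with SSDs (illustrative).
# The per-drive figures come from the list above; the fleet size and a
# 24x7 active duty cycle are simplifying assumptions.
hdd_watts = 6.0       # 15,000rpm SAS HDD, active mode
ssd_watts = 1.25      # enterprise SSD, active mode
drive_count = 10_000  # hypothetical fleet size

savings_kw = drive_count * (hdd_watts - ssd_watts) / 1000
savings_mwh_per_year = savings_kw * 24 * 365 / 1000

print(f"Continuous power saved: {savings_kw:.1f} kW")              # 47.5 kW
print(f"Energy saved per year:  ~{savings_mwh_per_year:.0f} MWh")  # ~416 MWh
```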
Enabling Better Network Optimization
Networking infrastructure must evolve alongside computational resources to support gen AI. Data centers must implement high-capacity networking solutions capable of supporting higher data rates and greater complexity while controlling costs.[17]
Potential solutions include:
- Migrating to optical interconnects for higher bandwidth and better power efficiency.[18]
- Deploying very large GPU clusters with optimized interconnects like Elastic Fabric Adapter from Amazon Web Services.[19]
These advancements in networking are crucial for supporting the massive data transfer requirements of gen AI systems, enabling faster training and inference processes.
Improving Data Privacy
Traditional data anonymization methods like masking, aggregation, and pseudonymization are insufficient for gen AI workloads, as they often reduce data utility. To maintain data usability without compromising sensitive information, data centers need to explore AI-driven anonymization techniques such as differential privacy and synthetic data generation.
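As one concrete example of these techniques, the sketch below applies the Laplace mechanism, the standard building block of differential privacy, to a simple count query; the epsilon and sensitivity values are assumptions chosen for illustration.

```python
# Minimal differential-privacy sketch: the Laplace mechanism on a count query.
# Epsilon and sensitivity are illustrative choices; real deployments tune
# them against a formal privacy budget.
import numpy as np

def noisy_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Return a count perturbed with Laplace noise calibrated to epsilon."""
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

records_matching_query = 1_284  # e.g., users with a given attribute
print(noisy_count(records_matching_query, epsilon=0.5))
# Smaller epsilon adds more noise (stronger privacy, lower utility);
# larger epsilon does the reverse.
```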
New training methodologies are also emerging to address privacy concerns:
- Federated learning shares model parameters among clients rather than raw data, allowing algorithms to be trained across multiple devices or servers without transferring the underlying data. This approach helps preserve data privacy and enables a democratized learning framework well suited to smartphones, IoT networks, and edge devices[20] (a minimal sketch of the aggregation step appears below).
- Split learning has each client partially train a model before passing its updates to a central server, which consolidates them into a final model. This method balances data privacy with model performance.[21]
These approaches address privacy concerns and offer potential solutions for reducing data consumption and enabling more distributed AI training paradigms.
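To illustrate the federated approach, the sketch below implements the core aggregation step: clients train locally and share only parameter vectors, which the server averages. The NumPy vectors and the one-step "training" function are placeholders for a real model, and the unweighted average is a simplification of FedAvg, which weights clients by dataset size.

```python
# Minimal federated-averaging sketch: raw data stays with each client;
# only model parameters travel to the server for aggregation.
import numpy as np

def local_update(weights: np.ndarray, client_data: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """Placeholder local training: one gradient-like step toward the client's mean."""
    return weights - lr * (weights - client_data.mean(axis=0))

def federated_average(client_weights: list) -> np.ndarray:
    """Server-side aggregation; FedAvg proper would weight by client dataset size."""
    return np.mean(client_weights, axis=0)

global_weights = np.zeros(4)
clients = [np.random.rand(50, 4), np.random.rand(80, 4), np.random.rand(30, 4)]

for _ in range(5):  # a few federated rounds
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = federated_average(updates)  # only parameters are shared

print(global_weights)
```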
The Future of Generative AI Is at the Edge
While redesigning data centers is crucial for supporting gen AI, edge computing represents the next frontier. By processing data at its point of origin, edge computing addresses bandwidth and privacy concerns while reducing data center workloads. This is particularly relevant in industries like healthcare and retail, where substantial data is already created at the edge.[22]
Edge computing could allow data center infrastructure to become more agile and modular. By processing data closer to where it is generated, edge computing can reduce latency and improve real-time processing capabilities, which are crucial for many gen AI applications.[23]
However, this shift requires data centers to first embrace the hardware, architecture, and infrastructure necessary to support gen AI workloads. This includes the computational resources discussed earlier and the networking and storage solutions that can support the distributed nature of edge computing.
As gen AI continues to evolve, so too must our data centers. The transformation will involve technological upgrades and new approaches to data management, privacy, and distributed computing. By adapting to these new demands, data centers will play a crucial role in unlocking the full potential of generative AI technologies, paving the way for innovations we can only begin to imagine.
Sources
[1] https://blog.equinix.com/blog/2023/06/14/accelerating-ai-innovation-requires-ecosystems-and-infrastructure/
[2] https://www.forbes.com/sites/tiriasresearch/2023/05/12/generative-ai-breaks-the-data-center-data-center-infrastructure-and-operating-costs-projected-to-increase-to-over-76-billion-by-2028/
[3] https://www.wired.com/story/how-to-stop-your-data-from-being-used-to-train-ai/
[4] https://www.nvidia.com/en-us/glossary/high-performance-computing/
[5] https://ieeexplore.ieee.org/document/10268594
[6] https://the-decoder.com/gpt-4-architecture-datasets-costs-and-more-leaked/
[7] https://ieeexplore.ieee.org/document/10268594
[8] https://www.edgecortix.com/en/blog/ai-drives-the-software-defined-heterogeneous-computing-era
[9] https://www.dataversity.net/future-data-center-heterogeneous-computing/
[10] https://www.kalrayinc.com/blog/dpus-gpus-and-cpus-in-the-data-center/
[11] https://www.purestorage.com/knowledge/what-is-neural-processing-unit.html
[12] https://www.goldmansachs.com/insights/articles/AI-poised-to-drive-160-increase-in-power-demand
[13] https://research.manchester.ac.uk/en/studentTheses/energy-efficient-encoding-methods-for-chip-to-chip-communication
[14] https://developer.nvidia.com/blog/strategies-for-maximizing-data-center-energy-efficiency
[15] https://www.techtarget.com/searchdatacenter/tip/Four-ways-to-reduce-data-center-power-consumption
[16] https://www.digitalrealty.co.uk/resources/articles/green-data-centre-ai
[17] https://www.laserfocusworld.com/optics/article/14300952/unleashing-ai-data-center-growth-through-optics
[18] https://semiengineering.com/ai-drives-need-for-optical-interconnects-in-data-centers/
[19] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html
[20] https://medium.com/@cloudhacks_/federated-learning-a-paradigm-shift-in-data-privacy-and-model-training-a41519c5fd7e
[21] https://medium.com/@minhanh.dongnguyen/a-gentle-introduction-on-split-learning-959cfe513903
[22] https://www.forbes.com/sites/forbestechcouncil/2023/12/11/why-generative-ai-makes-sense-for-edge-computing/
[23] https://www.datacenterdynamics.com/en/opinions/are-data-centers-obsolete-in-the-age-of-ai/