Challenges and Opportunities in Edge-based Generative AI
The final chapter of the Edge AI Technology Report: Generative AI Edition explores the technical hurdles organizations face as they attempt to leverage edge-based generative AI. It also examines strategic opportunities for innovation in hardware, deployment configurations, and security measures.
We once believed the cloud was the final frontier for artificial intelligence (AI), but it turns out the real magic happens much closer to home—at the edge, where devices can now think, generate, and respond in real-time. The rapid evolution of AI, particularly generative AI, is fundamentally reshaping industries and challenging the existing computing infrastructure.
Traditional models, especially resource-intensive ones like Large Language Models (LLMs), have long relied on centralized cloud systems for the necessary computational power. However, as the need for AI-driven interactions grows across sectors—from autonomous vehicles to personalized content generation—there is a clear shift toward edge computing.
Our new report, drawing from discussions with industry leaders and technologists, explores how generative AI is being harnessed and integrated into edge environments and what this means for the future of technology.
Read an excerpt from the third chapter of the report below or read the full report by downloading it now.
While the convergence of generative AI and edge computing offers unparalleled opportunities for industries, it also introduces significant challenges that need addressing for effective implementation. Although LLMs have proven transformative in cloud environments, deploying these resource-intensive models at the edge brings complexity. This chapter explores the technical hurdles organizations face as they attempt to leverage edge-based generative AI. It also examines strategic opportunities for innovation in hardware, deployment configurations, and security measures.
Key Challenges to Deploying Generative AI at the Edge
Model Size, Resource Constraints, and Hardware Limitations
Generative AI models are resource-intensive, and their deployment requires significant computational power and memory. Deploying these large models on resource-constrained edge devices can be challenging, given the limited processing power, memory, and storage capacity of even state-of-the-art smartphones and IoT devices. For instance, a full-precision LLM like Llama2-7B needs at least 28GB of memory, which is beyond the capacity of most edge devices.
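The 28GB figure is simple arithmetic: roughly 7 billion parameters at 4 bytes each in full (32-bit) precision. A quick back-of-the-envelope sketch of weight-only footprints at different precisions (ignoring activations and KV-cache overhead; the function name is ours):

```python
def model_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight-only memory footprint in decimal gigabytes
    (1 GB = 10^9 bytes), matching the report's 28GB figure for fp32."""
    return num_params * bytes_per_param / 1e9

PARAMS_7B = 7e9  # Llama2-7B: ~7 billion parameters

for label, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{label}: {model_memory_gb(PARAMS_7B, nbytes):.1f} GB")
```

Halving the precision halves the footprint, which is why 8-bit and 4-bit variants of 7B-class models become feasible on high-end phones.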
Nowadays, techniques such as model quantization and pruning can reduce model size and resource demands. However, these techniques can also affect the accuracy and performance of the models. Finding the proper balance between model size and accuracy for the application at hand remains one of the main challenges of generative AI deployments at the edge.
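To make the size/accuracy trade-off concrete, here is a minimal, illustrative sketch of symmetric int8 quantization in plain Python. The weight values are toy numbers; real frameworks typically quantize per-channel and calibrate scales on representative data:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.517, -1.27, 0.033, 0.884, -0.402]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The reconstruction error is bounded by half the scale step --
# small, but nonzero: this is the accuracy cost of a 4x size cut.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q)                  # [52, -127, 3, 88, -40] -- 1 byte each vs 4 for fp32
print(round(max_err, 4))  # 0.004
```

Pruning works on the complementary axis, zeroing out low-magnitude weights so they can be skipped or stored sparsely.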
Deployment Configurations Complexity
Deploying generative AI models at the edge presents intricate challenges due to the need to balance performance, energy consumption, and resource allocation. These systems require highly optimized configurations to ensure efficient operations without exceeding the limited resources of edge devices. Techniques such as batching, load balancing, and intelligent resource management are crucial in maintaining throughput while addressing the requirements of low latency, high accuracy, and power efficiency.
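As an illustration of the batching technique mentioned above, the following toy Python batcher flushes when a batch fills or a deadline passes, trading a bounded amount of added latency for higher throughput (the class name and thresholds are hypothetical, not from any particular framework):

```python
import time
from collections import deque

class MicroBatcher:
    """Collect requests into batches, flushing on size or timeout.
    Larger batches raise throughput; the timeout caps added latency."""

    def __init__(self, max_batch=4, max_wait_s=0.05):
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.pending = deque()
        self.oldest = None  # arrival time of the oldest pending request

    def submit(self, request):
        if not self.pending:
            self.oldest = time.monotonic()
        self.pending.append(request)
        return self.flush_if_ready()

    def flush_if_ready(self):
        full = len(self.pending) >= self.max_batch
        stale = bool(self.pending) and time.monotonic() - self.oldest >= self.max_wait_s
        if full or stale:
            batch = list(self.pending)
            self.pending.clear()
            return batch  # hand the whole batch to the model at once
        return None

batcher = MicroBatcher(max_batch=3)
print(batcher.submit("req-1"))  # None -- waiting for more requests
print(batcher.submit("req-2"))  # None
print(batcher.submit("req-3"))  # ['req-1', 'req-2', 'req-3']
```

Tuning `max_batch` and `max_wait_s` is exactly the throughput-versus-latency knob the paragraph describes.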
A key aspect of managing edge deployments is ensuring they remain energy-efficient, as AI models are notorious for their heavy power consumption. Gartner analysts estimate that AI could account for up to 3.5% of global electricity demand by 2030 if left unchecked. This means companies must implement strategies that maximize model performance and reduce the carbon footprint of AI applications. For edge deployments, this translates to leveraging energy-efficient hardware, implementing AI-driven orchestration, and optimizing model architectures for lower power consumption.
To address these challenges, organizations are increasingly focusing on energy-aware AI deployments that manage power consumption while meeting the growing demand for AI-powered solutions across industries. Techniques such as quantization and knowledge distillation also play an essential role by reducing the computational load of generative models without significantly compromising performance.
Connectivity and Latency
Though edge computing reduces latency by processing data closer to its source, connectivity remains a critical challenge. Not all edge devices are deployed in environments with stable, high-speed internet access. For generative AI models that rely on cloud collaboration for computationally heavy tasks, intermittent connections can limit the effectiveness of edge deployments. This challenge becomes even more pronounced in remote or industrial environments, where network instability could affect the consistency of AI-driven operations.
Furthermore, while on-device inference offers a solution for offline capabilities, it increases the demand for local resources. Edge devices must balance limited processing power with the need to run real-time AI applications independently without relying on continuous cloud connectivity. This creates a delicate balancing act between connectivity, processing capabilities, and the device's ability to provide accurate and timely responses.
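A hedged sketch of this balancing act: route to the cloud when connectivity allows, and degrade to a smaller on-device model otherwise. The two "models" below are stand-in stubs, not real endpoints:

```python
class HybridInference:
    """Route prompts to a cloud model when reachable, falling back to a
    smaller on-device model when connectivity drops."""

    def __init__(self, cloud_model, local_model, is_online):
        self.cloud_model = cloud_model
        self.local_model = local_model
        self.is_online = is_online  # callable probing current connectivity

    def generate(self, prompt):
        if self.is_online():
            try:
                return ("cloud", self.cloud_model(prompt))
            except (TimeoutError, ConnectionError):
                pass  # network dropped mid-call: degrade gracefully
        return ("device", self.local_model(prompt))

# Stubs standing in for real model endpoints (hypothetical).
cloud = lambda p: f"[detailed answer to: {p}]"
local = lambda p: f"[concise answer to: {p}]"

online = HybridInference(cloud, local, is_online=lambda: True)
offline = HybridInference(cloud, local, is_online=lambda: False)
print(online.generate("machine status?")[0])   # cloud
print(offline.generate("machine status?")[0])  # device
```

The hard part in practice is everything this sketch hides: detecting connectivity reliably, reconciling state after reconnection, and keeping the local model's answers acceptable.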
By managing connectivity limitations more effectively, industries can mitigate risks, but maintaining stable, real-time AI responses at the edge remains an ongoing challenge that demands attention.
Model Compatibility
Compressing generative AI models for edge deployment through techniques like quantization and pruning often risks degrading performance, especially when edge devices have limited computational resources. Ensuring these compressed models run efficiently across diverse hardware environments, from IoT devices to smartphones, adds another layer of complexity.
Additionally, maintaining model compatibility across different platforms is challenging. Edge optimization frameworks help tailor AI models to specific hardware, reducing computational demands. However, they often struggle with ensuring consistent performance across various devices due to the diverse architectures and processing capabilities of edge environments, making it challenging to maintain uniform efficiency without specialized adaptations.
Solutions that address these disparities focus on hardware-agnostic methods, aiming to simplify deployment and minimize the need for constant reconfigurations across different platforms.
Privacy and Security Concerns
Deploying AI models at the edge enhances privacy by processing data locally, reducing exposure during transmission. However, safeguarding sensitive information in distributed AI environments brings new security challenges.
Protecting distributed data across numerous edge devices introduces vulnerabilities, such as unauthorized access, hacking risks, and inconsistent security protocols across different hardware. These concerns require robust security frameworks and consistent updates to safeguard against breaches, making data protection a critical aspect of managing edge deployments effectively.
Strategies and Solution Guidelines
Several strategies and best practices can be employed to address the challenges of implementing generative AI at the edge. These include:
Intelligent resource management and orchestration: Implementing intelligent resource management systems can optimize the deployment of generative AI services at the edge. This involves using AI-driven orchestration to adapt to changing demands and ensure smooth service operation. An architectural paradigm that supports multi-domain edge deployments can enhance the efficiency of these systems by decoupling high-level user intent from the AI-driven orchestration and execution plans.
Latency-aware service placement: To support latency-critical applications (e.g., LLMs for autonomous vehicles), generative AI deployments at the edge must adopt and implement latency-aware service placement strategies. This involves the use of optimization techniques (e.g., Swarm Learning and Ant Colony Optimization) to guide the placement of generative AI services based on the capabilities of edge devices and network conditions. This approach can significantly reduce latency and improve resource utilization, which leads to efficient performance of generative AI solutions at the edge.
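The placement idea can be sketched with a simple greedy heuristic, standing in here for the metaheuristics named above (node names, capacities, and latency figures are invented for illustration):

```python
def place_services(services, nodes, latency):
    """Greedy latency-aware placement: assign each service to the
    lowest-latency node that still has spare capacity.
    latency[node][zone] is a measured round-trip in milliseconds."""
    placement = {}
    free = dict(nodes)  # remaining capacity per node
    # Place the most resource-hungry services first.
    for svc, (zone, cost) in sorted(services.items(), key=lambda kv: -kv[1][1]):
        candidates = [n for n in free if free[n] >= cost]
        if not candidates:
            raise RuntimeError(f"no node can host {svc}")
        best = min(candidates, key=lambda n: latency[n][zone])
        placement[svc] = best
        free[best] -= cost
    return placement

nodes = {"edge-a": 8, "edge-b": 4}                      # capacity units
services = {"chatbot": ("plant-1", 6), "vision": ("plant-2", 3)}
latency = {"edge-a": {"plant-1": 5, "plant-2": 20},
           "edge-b": {"plant-1": 25, "plant-2": 4}}
print(place_services(services, nodes, latency))
# {'chatbot': 'edge-a', 'vision': 'edge-b'}
```

Swarm- and colony-based optimizers tackle the same objective but explore the placement space stochastically, which scales better when services, nodes, and constraints number in the hundreds.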
Optimizing task distribution in edge-cloud collaboration: Collaborative edge-cloud infrastructures help overcome the resource limitations of edge devices while ensuring low application latency. This approach allows for the distribution of tasks between the cloud and edge, which optimizes performance and resource utilization. It also enables real-time, personalized AI-generated content and preserves user privacy. As a prominent example, simple LLMs can be deployed at the edge to provide personalized chatbots for natural, real-time interactions with end users. At the same time, more complex LLMs can be leveraged through cloud configurations in support of more sophisticated reasoning tasks.
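A minimal sketch of such edge-cloud routing, assuming a toy complexity heuristic (the token threshold and keyword list are illustrative, not a production policy):

```python
def route_request(prompt, max_edge_tokens=64,
                  reasoning_keywords=("analyze", "plan", "prove")):
    """Toy router: short conversational prompts stay on the local edge
    model; long or reasoning-heavy prompts go to the larger cloud model."""
    tokens = prompt.split()
    needs_reasoning = any(k in prompt.lower() for k in reasoning_keywords)
    if len(tokens) > max_edge_tokens or needs_reasoning:
        return "cloud"
    return "edge"

print(route_request("Hi, what are your opening hours?"))        # edge
print(route_request("Analyze last month's sensor logs for me"))  # cloud
```

Real deployments refine the same idea with learned difficulty classifiers or confidence scores from the edge model, but the division of labor is identical: cheap requests stay local, expensive ones escalate.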
Model optimization techniques: Edge AI vendors (including edge-LLM solution providers) leverage techniques that reduce model size to enable deployment at the edge without substantially compromising accuracy or the ability to produce useful results for the task at hand. These techniques include quantization, pruning, and knowledge distillation, as explained in previous chapters.
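Of these, knowledge distillation can be illustrated by its soft-target loss: the small student model is trained to match the large teacher's temperature-softened output distribution. A self-contained sketch with toy logits:

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions -- the soft-target term of knowledge distillation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [3.0, 1.0, 0.2]
print(distillation_loss(teacher, [2.8, 1.1, 0.3]))  # small: student agrees
print(distillation_loss(teacher, [0.2, 1.0, 3.0]))  # large: student disagrees
```

The temperature exposes the teacher's "dark knowledge" (relative probabilities of wrong classes), which is what lets a much smaller student approach the teacher's quality.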
Efficient hardware utilization: Recent advances in edge device hardware (e.g., AI accelerators on smartphones) can significantly improve the power efficiency of generative AI deployments at the edge. For instance, some edge processors are designed to handle AI tasks with significantly lower power consumption compared to traditional data centers. According to research by Creative Strategies, Snapdragon 8 Gen 3, a smartphone processor, is over 30 times more efficient than a data center in generating images per watt-hour.
Standardized frameworks for interoperability and compatibility: One of the best strategies for deploying generative AI models across diverse and heterogeneous sets of devices is to develop standardized frameworks and tools that foster compatibility. Such frameworks can facilitate the deployment of edge-based generative AI at scale.
On-device inference and efficient data management: Strategies for on-device inference and efficient data management are also being developed to optimize real-time generative AI operations in ways that minimize data transfers across the edge-cloud computing continuum.
Future Opportunities and Growth Areas
The deployment of generative AI at the edge presents several promising opportunities and growth areas for innovation, including multimodal capabilities, lightweight models, and edge-specific deployment tools.
Multimodal capabilities: The integration of multimodal capabilities, where AI can process and understand different types of data (such as text, image, and audio), can be a significant growth area for edge-based generative AI. It can offer a new wave of intelligent applications that will perceive and combine different forms of multi-media information while being able to generate multimodal responses. Such multimodal capabilities will enable sophisticated applications in fields like autonomous vehicles, industrial engineering, and smart home devices. For instance, they can allow industrial workers to prompt LLMs through text while inputting instructions via technical diagrams at the same time.
Lightweight models: There are already business opportunities for developing, deploying, and packaging lightweight, efficient models (such as LaMini-Flan-T5-783M) suitable for edge deployment. These opportunities will grow as the expanding range of edge generative AI use cases drives demand for such models.
Edge-specific deployment tools: Edge-LLM deployments are currently supported by edge-specific deployment tools like MLC LLM, which offer open-source solutions for deploying LLMs on edge devices. These tools may face challenges such as OS-level freezes when synchronizing GPU and CPU on platforms like Android. However, this generates opportunities for improving existing tools and creating edge-specific deployment tools that will provide more stable and efficient deployments.
Integration with distributed learning frameworks: Future edge LLM deployments are likely to be combined with distributed learning approaches such as federated learning. Hence, there will be opportunities for frameworks that distribute the running of LLMs across multiple edge devices (including smartphones and industrial IoT devices). Such frameworks enable devices to collaboratively train resource-efficient LLM models at the edge without sharing raw data. This will further enhance privacy and reduce the latency of non-trivial edge LLM deployments.
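The core of federated averaging (FedAvg), the canonical federated learning algorithm, fits in a few lines. This toy sketch combines weight vectors from two hypothetical edge devices, weighted by local sample counts; only the weights leave each device, never the raw data:

```python
def fed_avg(client_updates):
    """Federated averaging: merge locally trained weight vectors,
    weighted by each client's number of training samples."""
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    merged = [0.0] * dim
    for weights, n in client_updates:
        for i, w in enumerate(weights):
            merged[i] += w * n / total
    return merged

# Two edge devices with different amounts of local data.
updates = [([1.0, 2.0], 100),   # device A: 100 local samples
           ([3.0, 4.0], 300)]   # device B: 300 local samples
print(fed_avg(updates))  # [2.5, 3.5] -- pulled toward the data-rich device
```

For LLM-scale models, practical variants exchange only small adapter weights (e.g., LoRA deltas) rather than full parameter vectors, but the aggregation step is the same.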
Edge-based generative AI in robotics: LLMs at the edge can be used to improve the real-time capabilities of robots in ways that will increase their autonomy. This can be particularly useful in industries such as manufacturing, healthcare, and logistics, notably in human-robot collaboration scenarios where humans must frequently interact with robots in ergonomic, intuitive, safe, and efficient ways.
Security of edge generative AI: As AI systems become more integrated into critical infrastructure, including edge devices, securing these systems becomes very important. Hence, there will be a growing need for AI security solutions at the edge that protect models from adversarial attacks. In the coming years, there will be opportunities for innovators and startups that provide cybersecurity solutions for LLM models and applications at the edge.
Business efficiency gains: Generative AI applications at the edge can deliver tangible benefits to industrial enterprises, including real-time performance for applications that need it and energy savings. There will be plenty of opportunities for developing edge LLM/GenAI applications with tangible return on investment (ROI). Innovators are expected to work towards introducing applications that leverage such efficiency gains.
The above insights highlight the transformative potential of generative AI at the edge, which offers opportunities for innovation, efficiency, and enhanced user experiences across various sectors. This is already evident in the AI startup ecosystem, where companies are introducing products for generative AI at the edge. For example, Etched, a purpose-built ASIC manufacturer, develops chips that implement transformer architectures directly in silicon, which could substantially improve the efficiency of edge-LLM applications.
Conclusion: Inspiring Action and Innovation
Generative AI at the edge is more than a technological breakthrough; it's an enabler of transformation across industries. As this report has demonstrated, real-world applications in sectors like healthcare, manufacturing, automotive, and smart cities are beginning to reveal the true potential of edge-driven generative AI. These use cases highlight how businesses can leverage the convergence of AI and edge computing to streamline operations, enhance decision-making, and create more personalized, real-time experiences. But while these advancements are significant, the journey is far from complete.
To fully capitalize on the promise of generative AI at the edge, several critical areas require further development. First, there is a need for ongoing innovation in model optimization techniques to ensure AI models can run efficiently on resource-constrained devices. Additionally, hybrid cloud-edge architectures must continue to evolve, providing seamless collaboration between cloud servers and edge devices to balance workloads and maximize efficiency.
Equally important are the strides needed in hardware development. Edge devices will need to be equipped with more powerful, AI-specific processors that consume less energy while supporting complex, real-time decision-making. This hardware evolution is vital for industries such as autonomous driving, where latency reduction and immediate data processing are crucial for safety and efficiency.
However, technological advancements alone will not suffice. Collaboration between industry, academia, and government is essential to drive widespread adoption and ensure that AI's benefits are felt across all sectors of society. From funding research initiatives to fostering partnerships that bridge the gap between AI developers and hardware manufacturers, collective action is needed to unlock the full potential of generative AI at the edge.
As we look toward the future, it’s clear that this is just the beginning. Generative AI at the edge will continue to evolve, offering businesses new ways to operate more efficiently and sustainably while enhancing user experiences in real time. The possibilities are endless, but realizing them will require bold action, forward-thinking strategies, and a commitment to innovation. Let this report serve as both a roadmap and a call to action for all stakeholders involved in shaping the future of edge-enabled generative AI. Together, we can drive positive change and inspire the next wave of AI-driven transformation.
Appendix
About the Report
This report is the result of a highly collaborative effort, with significant contributions from a diverse team. Led by Samir Jaber, editor-in-chief, and John Soldatos as the supporting author, both played crucial roles in shaping and refining the report’s content through extensive research and development. We are especially grateful to our sponsors and other contributors, whose insights have greatly enriched the overall narrative. Additionally, we extend our sincere thanks to the Wevolver content team, especially Jessica Miley, content director, and Rebecka Durén, content coordinator, whose leadership and support were instrumental throughout the project. Each individual’s contribution has been invaluable, reflecting the shared dedication that brought this report to life.
References and Additional Resources
C. Chakraborty, M. Bhattacharya, and S.-S. Lee, "Need an AI-enabled, next-generation, advanced chatGPT or large language models (LLMs) for error-free and accurate medical information," Ann. Biomed. Eng., vol. 52, pp. 134-135, 2023. [Online]. Available: https://pubmed.ncbi.nlm.nih.gov/37368124/
R. Benbaki et al., "Fast as CHITA: Neural Network Pruning with Combinatorial Optimization," in Proceedings of the 40th International Conference on Machine Learning, PMLR 202:2031-2049, 2023. [Online]. Available: https://proceedings.mlr.press/v202/benbaki23a.html
"Optimizing generative AI for edge devices," Edge AI and Vision Alliance, 2024. [Online]. Available: https://www.edge-ai-vision.com/2024/02/optimizing-generative-ai-for-edge-devices/
LatentAI, "AI on the edge: Transformative trend predictions for 2024," 2024. [Online]. Available: https://latentai.com/blog/ai-on-the-edge-transformative-trend-predictions-for-2024/
Business Insider, "New AI and 5G advancements will usher in the era of edge computing on smartphones, autonomous cars, and more," 2024. [Online]. Available: https://www.businessinsider.com/ai-edge-computing-5g-cloud-artificial-intelligence-2024-3
McKinsey & Company, "The state of AI in early 2024: Gen AI adoption spikes and starts to generate value," 2024. [Online]. Available: https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
X. Wang et al., "Convergence of Edge Computing and Deep Learning: A Comprehensive Survey," IEEE Xplore, 2024. [Online]. Available: https://ieeexplore.ieee.org/document/8976180
Gartner, "Gartner Says CIOs Must Balance the Environmental Promises and Risks of AI," 2023. [Online]. Available: https://www.gartner.com/en/newsroom/press-releases/2023-11-07-gartner-says-cios-must-balance-the-environmental-promises-and-risks-of-ai
Precedence Research, "Edge AI Market Size, Share and Trends 2024 to 2034," 2024. [Online]. Available: https://www.precedenceresearch.com/edge-ai-market
Statista, "Generative AI - Worldwide," 2024. [Online]. Available: https://www.statista.com/outlook/tmo/artificial-intelligence/generative-ai/worldwide
N. Nelson, S. Huver, and M. Toloui, "Deploy large language models at the edge with NVIDIA IGX Orin developer kit," NVIDIA, 2024. [Online]. Available: https://developer.nvidia.com/blog/deploy-large-language-models-at-the-edge-with-nvidia-igx-orin-developer-kit/
Ambarella, "Ambarella brings generative AI capabilities to edge devices," 2024. [Online]. Available: https://investor.ambarella.com/news-releases/news-release-details/ambarella-brings-generative-ai-capabilities-edge-devices
A. Jadhav, "Qualcomm and Meta join forces to bring generative AI to edge devices," Qualcomm, 2024. [Online]. Available: https://www.edgeir.com/qualcomm-meta-join-forces-to-bring-generative-ai-to-edge-devices-20230803
N. Chen et al., "Towards Integrated Fine-tuning and Inference when Generative AI meets Edge Intelligence," arXiv, 2024. [Online]. Available: https://arxiv.org/abs/2401.02668
T. Malhotra, "The next big trends in large language model (LLM) research," Marktechpost, 2024. [Online]. Available: https://www.marktechpost.com/2024/07/04/the-next-big-trends-in-large-language-model-llm-research/
R. Qin et al., "Empirical guidelines for deploying LLMs onto resource-constrained edge devices," arXiv, 2024. [Online]. Available: https://arxiv.org/abs/2406.03777
J. Banks and T. Warkentin, "Gemma open models for AI," Google Developers Blog, 2024. [Online]. Available: https://blog.google/technology/developers/gemma-open-models/
M. Williams, "Edge Impulse unveils groundbreaking new edge AI solutions," 2024. [Online]. Available: https://www.businesswire.com/news/home/20240924612549/en/Edge-Impulse-Unveils-Groundbreaking-New-Edge-AI-Solutions-for-Industrial-Environments-at-Its-Annual-Imagine-Conference
Mistral AI, "Capabilities of AI agents," 2024. [Online]. Available: https://docs.mistral.ai/capabilities/agents/
T. Varshney, "Introduction to LLM agents," NVIDIA, 2024. [Online]. Available: https://developer.nvidia.com/blog/introduction-to-llm-agents/
S. Jaber, J. Soldatos, and R. Rao, "The 2024 State of Edge AI Report," Wevolver, 2024. [Online]. Available: https://www.wevolver.com/article/2024-state-of-edge-ai-report/
S. Jaber, J. Soldatos, M. Milovanovic, and L. Husser, "2023 Edge AI Technology Report," Wevolver, 2023. [Online]. Available: https://www.wevolver.com/article/2023-edge-ai-technology-report
M. Zhang, J. Cao, X. Shen, and Z. Cui, "EdgeShard: Efficient LLM Inference via Collaborative Edge Computing," arXiv, 2024. [Online]. Available: https://arxiv.org/abs/2405.14371
X. Zhang et al., "Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization," 2024 IEEE Wireless Communications and Networking Conference (WCNC), pp. 1-6, 2024. [Online]. Available: https://arxiv.org/abs/2405.07140
L. Ale et al., "AI for healthcare: transforming medical imaging & diagnostics," Nature Reviews Electrical Engineering, vol. 1, pp. 478-486, 2024. [Online]. Available: https://www.nature.com/articles/s44287-024-00053-6
J. Jongboom, "Bringing Large Language Models to the Edge with GPT-4o and NVIDIA TAO," Edge Impulse, 2024. [Online]. Available: https://www.edgeimpulse.com/blog/llm-knowledge-distillation-gpt-4o/
NVIDIA, "Getting Started With NVIDIA TAO," 2024. [Online]. Available: https://developer.nvidia.com/tao-toolkit-get-started
L. Mohanty et al., "Pruning techniques for artificial intelligence networks: a deeper look at their engineering design and bias: the first review of its kind," Multimedia Tools and Applications, vol. 83, no. 1, pp. 1-19, 2024. [Online]. Available: https://link.springer.com/article/10.1007/s11042-024-19192-x
Airbus, "How Airbus uses generative artificial intelligence to reinvent itself," 2024. [Online]. Available: https://www.airbus.com/en/newsroom/stories/2024-05-how-airbus-uses-generative-artificial-intelligence-to-reinvent-itself
Zapata AI, "BMW // Optimizing automotive manufacturing with Industrial Generative AI," 2024. [Online]. Available: https://zapata.ai/bmw-ai-in-automotive-case-study/
BMW Group, "How AI is revolutionising production," 2024. [Online]. Available: https://www.bmwgroup.com/en/news/general/2023/aiqx.html
S. Dickens, "Tesla's Dojo supercomputer: A paradigm shift in supercomputing," Forbes, 2024. [Online]. Available: https://www.forbes.com/sites/stevendickens/2023/09/11/teslas-dojo-supercomputer-a-paradigm-shift-in-supercomputing/
B. Danon, "How GM and Autodesk are using generative design for vehicles of the future," Autodesk, 2024. [Online]. Available: https://blogs.autodesk.com/inthefold/how-gm-and-autodesk-use-generative-design-for-vehicles-of-the-future/
P. Bigelow, "Here's how Waymo uses AI to enhance its self-driving skills," Automotive News, 2024. [Online]. Available: https://www.autonews.com/mobility-report/waymo-uses-fresh-ai-tech-bolster-self-driving-vehicles
N. Arora et al., "The value of getting personalization right—or wrong—is multiplying," McKinsey & Company, 2024. [Online]. Available: https://www.mckinsey.com/capabilities/growth-marketing-and-sales/our-insights/the-value-of-getting-personalization-right-or-wrong-is-multiplying
Meta AI, "LLaMA 2: Open models for generative AI," 2024. [Online]. Available: https://huggingface.co/meta-llama/Llama-2-7b
M. Weinbach, "The Power of Efficiency: Edge AI's Role in Sustainable Generative AI Adoption," Creative Strategies, 2024. [Online]. Available: https://creativestrategies.com/research/gen-ai-edge-testing/
MBZUAI, "LaMini-Flan-T5-783M: Open-source lightweight model for generative AI," 2024. [Online]. Available: https://huggingface.co/MBZUAI/LaMini-Flan-T5-783M
H. H. Li, "LK01.1 IEEE CEDA Distinguished Lecturer Lunchtime Keynote: AI MODELS FOR EDGE COMPUTING: HARDWARE-AWARE OPTIMIZATIONS FOR EFFICIENCY," 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2024. [Online]. Available: https://date24.date-conference.com/node/1753
MLC AI, "MLC LLM: Universal LLM Deployment Engine With ML Compilation," 2024. [Online]. Available: https://llm.mlc.ai/
Etched AI, "Innovations in edge AI transformer architectures," 2024. [Online]. Available: https://www.etched.com/