Efficient Multi-user Offloading of Personalized Diffusion Models: A DRL-Convex Hybrid Solution

  • Wanting Yang
  • Zehui Xiong
  • Song Guo
  • Shiwen Mao
  • Dong In Kim
  • Merouane Debbah

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Generative diffusion models like Stable Diffusion are at the forefront of the thriving field of generative models today, celebrated for their robust training methodologies and high-quality photorealistic generation capabilities. These models excel in producing rich content, establishing them as essential tools in the industry. Building on this foundation, the field has seen the rise of personalized content synthesis as a particularly exciting application. However, the large model sizes and iterative nature of inference make it difficult to deploy personalized diffusion models broadly on local devices with heterogeneous computational power. To address this, we propose a novel framework for efficient multi-user offloading of personalized diffusion models. This framework accommodates a variable number of users, each with different computational capabilities, and adapts to the fluctuating computational resources available on edge servers. To enhance computational efficiency and alleviate the storage burden on edge servers, we propose a tailored multi-user hybrid inference approach. This method splits the inference process for each user into two phases, with an optimizable split point. Initially, a cluster-wide model processes low-level semantic information for each user's prompt using batching techniques. Subsequently, users employ their personalized models to refine these details during the later phase of inference. Given the constraints on edge server computational resources and users' preferences for low latency and high accuracy, we model the joint optimization of each user's offloading request handling and split point as an extension of the Generalized Quadratic Assignment Problem (GQAP). Our objective is to maximize a comprehensive metric that balances both latency and accuracy across all users. 
To solve this NP-hard problem, we transform the GQAP into an adaptive decision sequence, model it as a Markov decision process, and develop a hybrid solution combining deep reinforcement learning with convex optimization techniques. Simulation results validate the effectiveness of our framework, demonstrating superior optimality and low complexity compared to traditional methods. All related code, datasets, and fine-tuned models are available at https://github.com/wty2011jl/E-MOPDM.
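
The two-phase hybrid inference described above can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the linked repository); it only shows the control flow of a per-user split point under toy assumptions. The names `User`, `shared_step`, `personal_step`, and `hybrid_inference`, the scalar "latents", and the update rules are all hypothetical stand-ins for real diffusion denoising steps.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    split: int            # number of steps run by the shared cluster-wide model (assumed notation)
    personal_gain: float  # hypothetical per-step effect of the user's personalized model


def shared_step(latents):
    """Toy batched step of the cluster-wide model: one call updates every latent in the batch."""
    return [0.9 * x for x in latents]


def personal_step(x, user):
    """Toy step of a user's personalized model (stand-in for a fine-tuned denoiser)."""
    return 0.9 * x + user.personal_gain


def hybrid_inference(users, init_latents, total_steps):
    """Run `split` shared steps per user in batches, then finish with the personalized model."""
    latents = {u.name: x for u, x in zip(users, init_latents)}
    shared_calls = 0

    # Phase 1: at step t, batch together all users whose split point lies beyond t.
    for t in range(total_steps):
        active = [u for u in users if u.split > t]
        if not active:
            break
        batch = shared_step([latents[u.name] for u in active])
        shared_calls += 1
        for u, x in zip(active, batch):
            latents[u.name] = x

    # Phase 2: each user completes the remaining steps with their own model.
    for u in users:
        for _ in range(total_steps - u.split):
            latents[u.name] = personal_step(latents[u.name], u)

    return latents, shared_calls


if __name__ == "__main__":
    demo_users = [User("a", split=2, personal_gain=0.1), User("b", split=3, personal_gain=0.0)]
    result, calls = hybrid_inference(demo_users, [1.0, 1.0], total_steps=4)
    print(result, calls)
```

In this toy setting, a later split point means more steps are shared through batching and fewer are personalized, which is the latency/accuracy tension the split-point optimization is meant to balance.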

Original language: British English
Journal: IEEE Transactions on Mobile Computing
DOIs
State: Accepted/In press - 2025

Keywords

  • AIGC service
  • Diffusion model
  • DRL
  • edge offloading
  • generalized quadratic assignment problem
  • hybrid inference
