TY - JOUR
T1 - Efficient Multi-user Offloading of Personalized Diffusion Models
T2 - A DRL-Convex Hybrid Solution
AU - Yang, Wanting
AU - Xiong, Zehui
AU - Guo, Song
AU - Mao, Shiwen
AU - Kim, Dong In
AU - Debbah, Merouane
N1 - Publisher Copyright:
© 2002-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Generative diffusion models like Stable Diffusion are at the forefront of the thriving field of generative models today, celebrated for their robust training methodologies and high-quality photorealistic generation capabilities. These models excel in producing rich content, establishing them as essential tools in the industry. Building on this foundation, the field has seen the rise of personalized content synthesis as a particularly exciting application. However, the large model sizes and iterative nature of inference make it difficult to deploy personalized diffusion models broadly on local devices with heterogeneous computational power. To address this, we propose a novel framework for efficient multi-user offloading of personalized diffusion models. This framework accommodates a variable number of users, each with different computational capabilities, and adapts to the fluctuating computational resources available on edge servers. To enhance computational efficiency and alleviate the storage burden on edge servers, we propose a tailored multi-user hybrid inference approach. This method splits the inference process for each user into two phases, with an optimizable split point. Initially, a cluster-wide model processes low-level semantic information for each user's prompt using batching techniques. Subsequently, users employ their personalized models to refine these details during the later phase of inference. Given the constraints on edge server computational resources and users' preferences for low latency and high accuracy, we model the joint optimization of each user's offloading request handling and split point as an extension of the Generalized Quadratic Assignment Problem (GQAP). Our objective is to maximize a comprehensive metric that balances both latency and accuracy across all users. To solve this NP-hard problem, we transform the GQAP into an adaptive decision sequence, model it as a Markov decision process, and develop a hybrid solution combining deep reinforcement learning with convex optimization techniques. Simulation results validate the effectiveness of our framework, demonstrating superior optimality and low complexity compared to traditional methods. All related code, datasets, and fine-tuned models are available at https://github.com/wty2011jl/E-MOPDM.
AB - Generative diffusion models like Stable Diffusion are at the forefront of the thriving field of generative models today, celebrated for their robust training methodologies and high-quality photorealistic generation capabilities. These models excel in producing rich content, establishing them as essential tools in the industry. Building on this foundation, the field has seen the rise of personalized content synthesis as a particularly exciting application. However, the large model sizes and iterative nature of inference make it difficult to deploy personalized diffusion models broadly on local devices with heterogeneous computational power. To address this, we propose a novel framework for efficient multi-user offloading of personalized diffusion models. This framework accommodates a variable number of users, each with different computational capabilities, and adapts to the fluctuating computational resources available on edge servers. To enhance computational efficiency and alleviate the storage burden on edge servers, we propose a tailored multi-user hybrid inference approach. This method splits the inference process for each user into two phases, with an optimizable split point. Initially, a cluster-wide model processes low-level semantic information for each user's prompt using batching techniques. Subsequently, users employ their personalized models to refine these details during the later phase of inference. Given the constraints on edge server computational resources and users' preferences for low latency and high accuracy, we model the joint optimization of each user's offloading request handling and split point as an extension of the Generalized Quadratic Assignment Problem (GQAP). Our objective is to maximize a comprehensive metric that balances both latency and accuracy across all users. To solve this NP-hard problem, we transform the GQAP into an adaptive decision sequence, model it as a Markov decision process, and develop a hybrid solution combining deep reinforcement learning with convex optimization techniques. Simulation results validate the effectiveness of our framework, demonstrating superior optimality and low complexity compared to traditional methods. All related code, datasets, and fine-tuned models are available at https://github.com/wty2011jl/E-MOPDM.
KW - AIGC service
KW - Diffusion model
KW - DRL
KW - edge offloading
KW - generalized quadratic assignment problem
KW - hybrid inference
UR - https://www.scopus.com/pages/publications/105002773516
U2 - 10.1109/TMC.2025.3560582
DO - 10.1109/TMC.2025.3560582
M3 - Article
AN - SCOPUS:105002773516
SN - 1536-1233
JO - IEEE Transactions on Mobile Computing
JF - IEEE Transactions on Mobile Computing
ER -