TY - JOUR
T1 - LearnChain: Transparent and cooperative reinforcement learning on Blockchain
AU - Sami, Hani
AU - Mizouni, Rabeb
AU - Otrok, Hadi
AU - Singh, Shakti
AU - Bentahar, Jamal
AU - Mourad, Azzam
N1 - Publisher Copyright: © 2023
PY - 2024/1
Y1 - 2024/1
N2 - We consider multi-agent reinforcement learning (MARL) with the popular paradigm of centralized training and decentralized execution (CTDE). CTDE enables agents in different environments to share knowledge for updating a shared model. A wide range of applications is supported through CTDE in MARL, such as self-driving vehicle coordination, traffic light synchronization, or cooperation in various aspects of the Internet of Things (IoT), including resource management. Beyond the drawbacks of relying on a central authority for handling model updates, incorporating multiple sources of data raises concerns about the trustworthiness of the process. For instance, participating agents could provide data that favors their own experiences to shift the model towards certain behaviors. Similarly, sending falsified data for updates could lead to adversarial attacks. To overcome these challenges, it is essential to integrate the Ethereum Blockchain technology to handle model updates in the CTDE paradigm by achieving decentralized storage and a consensus mechanism for model updates. In the literature, there exist multiple efforts that propose using reinforcement learning (RL) on Blockchain; however, none of them have considered updating a CTDE-based MARL model on-chain, which allows a transparent and auditable record of the training process. Therefore, we propose LearnChain, a framework that offers an integration between the CTDE mechanism and a Consortium Blockchain built between authorized participants, thus avoiding gas costs. At the core of LearnChain, RL is integrated with Quorum, offering separate smart contracts for deployment, data handling with incentive mechanisms, training, target update, and inference. Based on a real use-case entailing management of Vehicular Edge Computing tasks through multi-agent synchronization, we implement LearnChain and evaluate its performance and cost in different settings. Our results show the ability to improve learning from shared experiences and to adapt to environment changes on the Quorum Blockchain.
AB - We consider multi-agent reinforcement learning (MARL) with the popular paradigm of centralized training and decentralized execution (CTDE). CTDE enables agents in different environments to share knowledge for updating a shared model. A wide range of applications is supported through CTDE in MARL, such as self-driving vehicle coordination, traffic light synchronization, or cooperation in various aspects of the Internet of Things (IoT), including resource management. Beyond the drawbacks of relying on a central authority for handling model updates, incorporating multiple sources of data raises concerns about the trustworthiness of the process. For instance, participating agents could provide data that favors their own experiences to shift the model towards certain behaviors. Similarly, sending falsified data for updates could lead to adversarial attacks. To overcome these challenges, it is essential to integrate the Ethereum Blockchain technology to handle model updates in the CTDE paradigm by achieving decentralized storage and a consensus mechanism for model updates. In the literature, there exist multiple efforts that propose using reinforcement learning (RL) on Blockchain; however, none of them have considered updating a CTDE-based MARL model on-chain, which allows a transparent and auditable record of the training process. Therefore, we propose LearnChain, a framework that offers an integration between the CTDE mechanism and a Consortium Blockchain built between authorized participants, thus avoiding gas costs. At the core of LearnChain, RL is integrated with Quorum, offering separate smart contracts for deployment, data handling with incentive mechanisms, training, target update, and inference. Based on a real use-case entailing management of Vehicular Edge Computing tasks through multi-agent synchronization, we implement LearnChain and evaluate its performance and cost in different settings. Our results show the ability to improve learning from shared experiences and to adapt to environment changes on the Quorum Blockchain.
KW - Blockchain
KW - Cooperative artificial intelligence (AI)
KW - Ethereum
KW - Quorum
KW - Reinforcement learning
KW - Transparency
KW - Trust
KW - Vehicular edge computing
UR - https://www.scopus.com/pages/publications/85171891314
U2 - 10.1016/j.future.2023.09.012
DO - 10.1016/j.future.2023.09.012
M3 - Article
AN - SCOPUS:85171891314
SN - 0167-739X
VL - 150
SP - 255
EP - 271
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
ER -