Learning from demonstrations (LfD), wherein an agent learns a control policy by observing demonstrations provided by a human teacher, has the potential to allow end-users without technical expertise to customize the behaviors of their own artificial agents. As in other machine learning applications, many LfD algorithms rely heavily on an adequately defined distance metric, which is used to compute the distance between data points. In general, this distance metric can be pre-defined or learned. However, to gain the full advantage of LfD, it must be learned. The user can also assist distance metric learning algorithms to learn faster. In this research, we consider the use of human interactions to help learn a distance metric and a control policy simultaneously in an online taxi dispatch problem. The nature of these interactions depends on the specific algorithm being used. Unlike current methods for solving the online taxi dispatch problem, which require complex pre-defined mathematical models, our method has no such constraint. Thus, it can potentially be applied to different scenarios with minimal changes. To study the potential effectiveness of using user interaction to enhance distance metric learning, we conducted studies of distance metric learning in conjunction with LfD in an online taxi dispatch problem. In so doing, we analyzed the ability of user input to enhance Global Distance Metric Learning (GDML), Large Margin Nearest Neighbor (LMNN), Relevant Component Analysis (RCA), Discriminative Component Analysis (DCA), Information-Theoretic Metric Learning (ITML) Online, and LogDet Exact Gradient Online (LEGO). The experiments show that if an appropriate distance metric learning algorithm is used, it can be coupled with LfD. However, allowing the user to assist the distance metric learning algorithm may not increase its performance to a large degree.
Further research should investigate methods for allowing the user to assist in feature selection, which limits the amount of information about the environment that is stored, and has a considerable impact on the performance of LfD and distance metric learning algorithms.
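To illustrate why the choice of distance metric matters for LfD, the following is a minimal sketch, not the thesis's implementation. It assumes a nearest-neighbor policy over demonstrated state-action pairs and a Mahalanobis-style metric defined by a matrix `M`. The feature vectors, action names, and the functions `mahalanobis` and `nearest_action` are hypothetical. The abstract does not specify the policy representation or the features used.

```python
from math import sqrt

def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a symmetric PSD matrix M (list of lists)."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sqrt(sum(d[i] * Md[i] for i in range(len(d))))

def nearest_action(state, demonstrations, M):
    """Return the action of the demonstrated state closest to `state` under M."""
    _, action = min(demonstrations, key=lambda pair: mahalanobis(state, pair[0], M))
    return action

# Hypothetical demonstrations: (state features, action chosen by the teacher).
demos = [
    ((0.0, 5.0), "dispatch_taxi_A"),
    ((4.0, 0.0), "dispatch_taxi_B"),
]

identity = [[1.0, 0.0], [0.0, 1.0]]   # pre-defined metric: plain Euclidean distance
weighted = [[1.0, 0.0], [0.0, 0.01]]  # stand-in for a learned metric that nearly ignores feature 2

query = (3.0, 4.0)
print(nearest_action(query, demos, identity))
print(nearest_action(query, demos, weighted))
```

Under the Euclidean metric the query is matched to the first demonstration, while under the reweighted metric it is matched to the second, so the same demonstrations yield a different policy. In this sketch `weighted` is hand-picked. A distance metric learning algorithm would instead estimate `M` from data, possibly with user input.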
| Date of Award | 2011 |
|---|---|
| Original language | American English |
| Supervisor | Jacob Crandall (Supervisor) |
- Education
- Demonstration Centers in Education
- Internet in Education
Combining Learning from Demonstration and Distance Metric Learning for Online Learning Problems
Ahmed, S. (Author). 2011
Student thesis: Master's Thesis