TY - GEN
T1 - Single image-based food volume estimation using monocular depth-prediction networks
AU - Graikos, Alexandros
AU - Charisis, Vasileios
AU - Iakovakis, Dimitrios
AU - Hadjidimitriou, Stelios
AU - Hadjileontiadis, Leontios
N1 - Funding Information:
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 817732.
Publisher Copyright:
© Springer Nature Switzerland AG 2020.
PY - 2020
Y1 - 2020
N2 - In this work, we present a system that estimates food volume from a single input image by utilizing the latest advancements in monocular depth estimation. We employ a state-of-the-art monocular depth-prediction network architecture, trained exclusively on videos obtained from the publicly available EPIC-KITCHENS dataset and our own collected food videos. Alongside it, an instance segmentation network is trained on the UNIMIB2016 food-image dataset to detect and produce segmentation masks for each of the different foods depicted in the given image. Combining the predicted depth map, segmentation masks, and known camera intrinsic parameters, we generate three-dimensional (3D) point-cloud representations of the target food objects and approximate their volumes with our point cloud-to-volume algorithm. We evaluate our system on a test set consisting of images portraying various foods and their respective measured volumes, as well as combinations of foods placed in a single image.
AB - In this work, we present a system that estimates food volume from a single input image by utilizing the latest advancements in monocular depth estimation. We employ a state-of-the-art monocular depth-prediction network architecture, trained exclusively on videos obtained from the publicly available EPIC-KITCHENS dataset and our own collected food videos. Alongside it, an instance segmentation network is trained on the UNIMIB2016 food-image dataset to detect and produce segmentation masks for each of the different foods depicted in the given image. Combining the predicted depth map, segmentation masks, and known camera intrinsic parameters, we generate three-dimensional (3D) point-cloud representations of the target food objects and approximate their volumes with our point cloud-to-volume algorithm. We evaluate our system on a test set consisting of images portraying various foods and their respective measured volumes, as well as combinations of foods placed in a single image.
KW - Deep learning
KW - Food image processing
KW - Food volume estimation
KW - Monocular depth estimation
UR - http://www.scopus.com/inward/record.url?scp=85088518074&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-49108-6_38
DO - 10.1007/978-3-030-49108-6_38
M3 - Conference contribution
AN - SCOPUS:85088518074
SN - 9783030491079
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 532
EP - 543
BT - Universal Access in Human-Computer Interaction. Applications and Practice - 14th International Conference, UAHCI 2020, Held as Part of the 22nd HCI International Conference, HCII 2020, Proceedings
A2 - Antona, Margherita
A2 - Stephanidis, Constantine
PB - Springer
T2 - 14th International Conference on Universal Access in Human-Computer Interaction, UAHCI 2020, held as part of the 22nd International Conference on Human-Computer Interaction, HCII 2020
Y2 - 19 July 2020 through 24 July 2020
ER -