Single image-based food volume estimation using monocular depth-prediction networks

Alexandros Graikos, Vasileios Charisis, Dimitrios Iakovakis, Stelios Hadjidimitriou, Leontios Hadjileontiadis

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

14 Scopus citations

Abstract

In this work, we present a system that estimates food volume from a single input image by utilizing the latest advancements in monocular depth estimation. We employ a state-of-the-art monocular depth prediction network architecture, trained exclusively on videos, which we obtain from the publicly available EPIC-KITCHENS dataset and from our own collected dataset of food videos. Alongside it, an instance segmentation network is trained on the UNIMIB2016 food-image dataset to detect and produce segmentation masks for each of the different foods depicted in the given image. Combining the predicted depth map, the segmentation masks and the known camera intrinsic parameters, we generate three-dimensional (3D) point cloud representations of the target food objects and approximate their volumes with our point cloud-to-volume algorithm. We evaluate our system on a test set consisting of images portraying various foods and their respective measured volumes, as well as combinations of foods placed in a single image.
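The step that combines a depth map, a segmentation mask and camera intrinsics into a point cloud can be illustrated with the standard pinhole back-projection. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `backproject`, the nested-list inputs and the intrinsics `fx`, `fy`, `cx`, `cy` are all hypothetical names chosen for illustration, and the depth values are assumed to be metric. The point cloud-to-volume algorithm is not sketched, because the abstract does not describe it.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift masked depth pixels to 3D camera coordinates (pinhole model).

    depth[v][u] is the depth of pixel (u, v); mask[v][u] marks pixels that
    belong to one food instance. fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point. All names here are illustrative.
    """
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            # Skip pixels outside the mask and pixels with invalid depth.
            if keep and z > 0:
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points
```

Calling `backproject` once per predicted mask would yield one point cloud per food item, which is the kind of input a volume approximation step would then consume.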

Original language: British English
Title of host publication: Universal Access in Human-Computer Interaction. Applications and Practice - 14th International Conference, UAHCI 2020, Held as Part of the 22nd HCI International Conference, HCII 2020, Proceedings
Editors: Margherita Antona, Constantine Stephanidis
Publisher: Springer
Pages: 532-543
Number of pages: 12
ISBN (Print): 9783030491079
State: Published - 2020
Event: 14th International Conference on Universal Access in Human-Computer Interaction, UAHCI 2020, held as part of the 22nd International Conference on Human-Computer Interaction, HCII 2020 - Copenhagen, Denmark
Duration: 19 Jul 2020 → 24 Jul 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12189 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th International Conference on Universal Access in Human-Computer Interaction, UAHCI 2020, held as part of the 22nd International Conference on Human-Computer Interaction, HCII 2020
Country/Territory: Denmark
City: Copenhagen
Period: 19/07/20 → 24/07/20

Keywords

  • Deep learning
  • Food image processing
  • Food volume estimation
  • Monocular depth estimation
