Image captioning algorithms for images taken by people with visual impairments

Zhang, Meng, M.S. in Information Studies

Access full-text files

ZHANG-MASTERSREPORT-2019.pdf (4.27 MB)

Date

2019-07-08

Authors

Zhang, Meng, M.S. in Information Studies

Abstract

People with visual impairments regularly face a task that is time-consuming, or even impossible, without assistance: learning what content is presented in an image. One method to address this problem is image captioning with machine learning. With the help of image captioning algorithms combined with an artificial intelligence speech system, people who are blind can instantly learn what is in an image, since such systems automatically generate text captions. In this work, we analyze the new VizWiz dataset and compare it to the MSCOCO dataset, which is widely used for evaluating the performance of image captioning algorithms. We also implement two state-of-the-art image captioning models and evaluate their accuracy, runtime, and resource usage. We hope our research will help improve image captioning algorithms that focus on fulfilling the everyday needs of people with visual impairments.