Learning to manipulate images with image segmentation, search and synthesis

Zhao, Yinan

Images are important and efficient carriers of information in people’s daily lives. With the prevalence of digital cameras and the Internet, it has become much easier to take photos and share them with others. However, people do not always want to keep their photos as they are. Sometimes they wish to remove undesired regions or insert novel content, without leaving an obvious trace of manipulation. Despite this need, it is not straightforward for users to edit images as they wish. In my thesis, I explore how to facilitate the creative process of manipulating realistic visual content in images photographed from the real world. I propose data-driven solutions that assist users in making such edits. I explore learning-based methods in three areas: image segmentation, image search, and image synthesis. In image segmentation, I target semantic image segmentation that matches humans’ high-level understanding of images, making it easier for users to isolate, select, or replace desired image segments. To reduce the demand for large numbers of annotated training images, I propose a few-shot semantic segmentation method that learns from only a few annotated examples by taking advantage of ‘objectness’. In image search, I introduce the problem of unconstrained foreground object search, whose goal is to retrieve a diverse set of foreground objects that are semantically compatible with a background image, without any constraint on which objects to retrieve. I also propose a solution to this problem that supports efficient search by encoding the background image in the same latent space as the candidate foreground objects. In image synthesis, I propose a guided image inpainting solution that helps users replace undesired content in their photos with plausibly realistic alternatives. The key idea is to use another real image to guide the synthesis procedure within a deep learning framework.