eScholarship
Open Access Publications from the University of California


UC Merced Electronic Theses and Dissertations

Data-Driven Visual Synthesis for Natural Image and Video Editing

Creative Commons 'BY' version 4.0 license
Abstract

Visual data are what make our daily lives fun. Oftentimes, we consume data created by experts in related fields, e.g., appreciating artworks drawn by famous painters or watching movies shot by professional directors. What if we could create data that express our own feelings, ideas, and creativity? This is the task of visual synthesis: the process of synthesizing new data or altering existing data. However, attempts by non-experts often deviate from the manifold of real natural data, leading to unrealistic results with undesired artifacts. The goal of the research in this thesis is to develop effective computational models that preserve visual realism and facilitate more compelling creations. We mainly develop data-driven approaches by learning from large amounts of existing visual data, and we design models that generalize to unseen target data. Essentially, visual synthesis manipulates the different factors that form the final observed data, such as structure, style, content, and motion. Along this direction, we explore four synthesis tasks for various image and video editing scenarios: structure enhancement, style transfer, content filling, and motion prediction.

Chapter 3 describes a joint filtering method for enhancing the sharpness of low-quality structures in images. The basic idea is to leverage a reference image as a prior and transfer its structural information to the target image. Chapter 4 presents how to alter the style of an image given a new style. We propose a universal style transfer algorithm that works for arbitrary style inputs. Chapter 5 focuses on how to fill in missing content in images in order to remove occlusions. We focus on face completion, which is particularly challenging because it often requires generating semantically new pixels for missing key components. In Chapter 6, we present a novel algorithm for generating pixel-level future frames over multiple time steps from a single still image. This represents an important step toward simulating the preplay activities that might constitute an automatic prediction mechanism in the human visual cortex.
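The chapter summaries above describe joint filtering only at a high level. As a rough illustration of the reference-guided idea behind structure transfer, the sketch below implements a classic guided filter (in the spirit of He et al.), which smooths a target image while borrowing edge structure from a guide image. This is a well-known instance of the technique family, not necessarily the exact method developed in the thesis; the function name, window radius, and regularization value are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def guided_filter(guide, target, radius=4, eps=1e-2):
    """Smooth `target` while preserving edges present in `guide`.

    Illustrative guided filter: `radius` and `eps` are example
    hyperparameters, not values taken from the thesis.
    """
    size = 2 * radius + 1
    box = lambda x: uniform_filter(x, size=size, mode="reflect")

    mean_I, mean_p = box(guide), box(target)
    corr_Ip, corr_II = box(guide * target), box(guide * guide)

    var_I = corr_II - mean_I ** 2        # local variance of the guide
    cov_Ip = corr_Ip - mean_I * mean_p   # local guide/target covariance

    # Per-pixel linear model q = a * I + b within each local window.
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I

    # Average the coefficients, then apply them to the guide.
    return box(a) * guide + box(b)
```

Where the guide is flat (zero local variance), the filter reduces to plain local averaging of the target; where the guide has strong edges, those edges steer the output, which is the structure-transfer intuition the chapter builds on.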
