Conditional Generative ConvNets for
Exemplar-based Texture Synthesis

Zi-Ming Wang, Meng-Han Li, Gui-Song Xia
LIESMARS CAPTAIN, Wuhan University, Wuhan, China

School of Computer Science, Wuhan University, Wuhan, China
Paris Descartes University, Paris, France

Brief overview

The goal of exemplar-based texture synthesis is to generate texture images that are visually similar to a given exemplar. Recently, promising results have been reported by methods relying on convolutional neural networks (ConvNets) pretrained on large-scale image datasets. However, these methods have difficulties in synthesizing image textures with non-local structures and in extending to dynamic or sound textures. In this paper, we present a conditional generative ConvNet (cgCNN) model which combines deep statistics and the probabilistic framework of the generative ConvNet (gCNN) model. Given a texture exemplar, the cgCNN model defines a conditional distribution using the deep statistics of a ConvNet, and synthesizes new textures by sampling from this conditional distribution. In contrast to previous deep texture models, the proposed cgCNN does not rely on pretrained ConvNets, but instead learns the weights of the ConvNet for each input exemplar. As a result, the cgCNN model can synthesize high-quality dynamic, sound and image textures in a unified manner. We also explore the theoretical connections between our model and other texture models. Further investigations show that the cgCNN model can be easily generalized to texture expansion and inpainting. Extensive experiments demonstrate that our model achieves results that are better than, or at least comparable to, those of the state-of-the-art methods.

Introduction


Exemplar-based texture synthesis (EBTS), which aims to produce new texture samples that are visually similar to a given exemplar, has been an active yet challenging topic in computer vision and graphics for decades. Recently, deep ConvNets have been used in texture modelling. These models employ deep ConvNets pretrained on large-scale image datasets as feature extractors, and generate new samples by seeking images that maximize a certain similarity between their deep features and those of the exemplar.

We propose a new texture model named conditional generative ConvNet (cgCNN) by integrating deep texture statistics with the probabilistic framework of the generative ConvNet (gCNN). Unlike previous texture models that rely on pretrained ConvNets, cgCNN learns the weights of the ConvNet for each input exemplar. It therefore has two main advantages:
  • It can synthesize image, dynamic and sound textures in a unified manner.
  • It can synthesize textures with non-local structures without using extra penalty terms.
The main contributions of this work are summarized as follows; a minimal sketch of the model and its sampling procedure is given after the list:
  • We propose a new texture model named cgCNN which combines deep statistics and the probabilistic framework of the gCNN model. Instead of relying on pretrained ConvNets as previous deep texture models do, the proposed cgCNN learns the weights of the ConvNet adaptively for each input exemplar. As a result, cgCNN can synthesize high-quality dynamic, sound and image textures in a unified manner.
  • We present two forms of cgCNN and show their effectiveness in texture synthesis and expansion: c-cgCNN can synthesize highly non-stationary textures without extra penalty terms, while f-cgCNN can synthesize arbitrarily large stationary textures. We also show their strong theoretical connections with previous texture models. Note that f-cgCNN is the first deep texture model that enables the expansion of dynamic and sound textures.
  • We present a simple but effective algorithm for texture inpainting based on the proposed cgCNN. To our knowledge, it is the first neural algorithm for inpainting sound textures.
  • We conduct extensive experiments on the synthesis, expansion and inpainting of various types of textures using cgCNN, and demonstrate that our model achieves results better than or at least comparable to those of the state-of-the-art methods.
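
To make the model concrete, the sketch below shows one plausible PyTorch implementation of the conditional energy and its Langevin sampler. Everything here is an illustrative assumption rather than the paper's exact configuration: net is assumed to map a (C, H, W) tensor to a list of per-layer feature maps, Gram matrices stand in for the deep statistics, and the step sizes and iteration counts are arbitrary. The helpers gram, energy and langevin_sample are reused in the sketches later on this page.

    import torch

    def gram(feat):
        # Gram matrix of a (C, H, W) feature map: C x C channel correlations.
        c = feat.shape[0]
        f = feat.reshape(c, -1)
        return f @ f.t() / f.shape[1]

    def energy(net, x, exemplar_stats):
        # Conditional energy E(x; w): distance between the deep statistics
        # of the sample x and those of the exemplar, summed over layers.
        e = x.new_zeros(())
        for feat, ref in zip(net(x), exemplar_stats):
            e = e + ((gram(feat) - ref) ** 2).sum()
        return e

    def langevin_sample(net, exemplar, n_steps=200, step=0.01, noise=0.01):
        # Sample from p(x | exemplar) ~ exp(-E(x; w)) by Langevin dynamics,
        # starting from Gaussian noise.
        with torch.no_grad():
            stats = [gram(f) for f in net(exemplar)]
        x = torch.randn_like(exemplar).requires_grad_(True)
        for _ in range(n_steps):
            g, = torch.autograd.grad(energy(net, x, stats), x)
            with torch.no_grad():
                x += -0.5 * step * g + noise * torch.randn_like(x)
        return x.detach()
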
We present our results in the following sections; click each image to display it at full size.

Bounded constraint


We compare the results obtained using different bounding activation functions, i.e. hard tanh, tanh and sigmoid. Hard tanh gives the best results; a minimal illustration of these functions follows the examples below.

Exemplar
hard tanh
tanh
sigmoid
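
For reference, the bounding functions compared above can be applied element-wise to keep the synthesized signal in a fixed range. A minimal PyTorch illustration; the ranges follow PyTorch's defaults and the input is an arbitrary stand-in for an unconstrained synthesized signal:

    import torch
    import torch.nn.functional as F

    x = 2.0 * torch.randn(3, 256, 256)    # unconstrained synthesized signal

    bounded_hard = F.hardtanh(x)          # piecewise-linear clip to [-1, 1]
    bounded_tanh = torch.tanh(x)          # smooth squashing into (-1, 1)
    bounded_sigm = torch.sigmoid(x)       # smooth squashing into (0, 1)

One possible reading of the comparison is that hard tanh is the identity inside its range and therefore distorts in-range values least; this explanation is ours, not a claim from the paper.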

Diversity of c-cgCNN


It is important for a texture synthesis algorithm to be able to synthesize diverse texture samples from a given exemplar. For the proposed c-cgCNN model, the diversity of the synthesized textures follows directly from the model definition and the randomness of the initial Gaussian noise, so no extra effort is needed to ensure it, as the short sketch below illustrates.
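
Concretely, drawing several distinct samples only requires restarting the sampler from independent noise; a usage sketch, reusing the hypothetical langevin_sample helper from the introduction (net and exemplar as defined there):

    # Three independent Gaussian initializations give three distinct samples
    # from the same model; no extra diversity term is needed.
    samples = [langevin_sample(net, exemplar) for _ in range(3)]
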

Exemplar
Synthesized sample 1
Synthesized sample 2
Synthesized sample 3

More samples of the "bubble" texture:

Ablation study on the learning algorithm


To verify the importance of the D-learning step, we compare our method with a fixed random baseline, which synthesizes textures without the D-learning step, i.e. with fixed random network weights. Our method clearly produces more favorable results than this fixed random baseline; a sketch of the alternating procedure is given below.
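
The alternation being ablated can be sketched as follows, reusing the gram, energy and langevin_sample helpers from the introduction. The weight update shown, plain gradient ascent on the sample energy with the exemplar statistics treated as constants, is our simplified paraphrase of the D-learning step rather than the paper's exact rule; the fixed random baseline corresponds to d_learning=False.

    import torch

    def synthesize(net, exemplar, n_rounds=20, lr=1e-3, d_learning=True):
        # Alternate Langevin sampling (S-step) with a weight update (D-step).
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        x = None
        for _ in range(n_rounds):
            x = langevin_sample(net, exemplar)   # S-step: sample from p(x | exemplar)
            if d_learning:                       # D-step: sharpen the energy
                with torch.no_grad():
                    stats = [gram(f) for f in net(exemplar)]
                opt.zero_grad()
                loss = -energy(net, x, stats)    # ascend the energy of current samples
                loss.backward()
                opt.step()
        return x
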

Exemplar Energy evolutions.

Ours
Random
Iter 0 Iter 100 Iter 200 Iter 300 Iter 600 Iter 900 Iter 2000 Iter 4000

Ablation study on the network architecture


We investigate the roles played by different layers in our network by using sub-networks with increasing receptive fields. The results in the leftmost (rightmost) column are generated by the sub-network with the smallest (largest) receptive field.
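
As a back-of-the-envelope reminder of why deeper sub-networks capture larger structures, the receptive field of stacked convolutions grows with depth. A small sketch using the standard recurrence; the 3x3 / stride-2 configuration is an arbitrary assumption, not our actual architecture:

    def receptive_field(layers):
        # layers: (kernel_size, stride) pairs, ordered from the input.
        # Standard recurrence: rf += (k - 1) * jump; jump *= stride.
        rf, jump = 1, 1
        for k, s in layers:
            rf += (k - 1) * jump
            jump *= s
        return rf

    # Each additional 3x3 stride-2 layer roughly doubles the receptive field:
    for depth in (1, 2, 3, 4):
        print(depth, receptive_field([(3, 2)] * depth))   # 3, 7, 15, 31
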

Exemplar

Image texture synthesis


Image texture synthesis using c-cgCNN. All exemplars and synthesized images are of size (256, 256).

Exemplar Gatys' model gCNN CoopNet self-tuning c-cgCNN-Gram c-cgCNN-mean

Dynamic texture synthesis


Dynamic texture synthesis using c-cgCNN. All exemplars and synthesized dynamic textures have 12 frames and each frame is of size (128, 128).

Exemplar
Two-stream
c-cgCNN-Gram
c-cgCNN-mean

Sound texture synthesis


Sound texture synthesis using c-cgCNN. All exemplars and synthesized textures have 50000 data points (~2 seconds).

Air conditioner Pneumatic drill Applause Frog
Exemplar
McDermott's model
Antognini's model
c-cgCNN-mean
c-cgCNN-Gram

Image texture expansion


Texture expansion using f-cgCNN. The exemplars are of size (256, 256), and the synthesized textures are of size (896, 896).

Exemplar
f-cgCNN
TextureNet
Exemplar
f-cgCNN
TextureNet
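
Expansion exploits the fact that f-cgCNN's sampler is a feed-forward, fully convolutional generator: once it is trained on an exemplar, feeding it a larger noise field yields a larger texture. The toy generator below illustrates the principle only; its architecture and the 8-channel noise input are our assumptions, not the configuration used in the paper:

    import torch
    import torch.nn as nn

    # In a fully convolutional network the output size tracks the input size,
    # so the same weights can synthesize textures of any spatial extent.
    gen = nn.Sequential(
        nn.Conv2d(8, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
    )

    small = gen(torch.randn(1, 8, 256, 256))   # training-time size
    large = gen(torch.randn(1, 8, 896, 896))   # expanded texture, same weights
    print(small.shape, large.shape)            # (1, 3, 256, 256) and (1, 3, 896, 896)
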

Dynamic texture expansion


Expanding dynamic textures using f-cgCNN. Exemplars have 12 frames and each frame is of size (128, 128), while synthesized textures have 47 frames and each frame is of size (475, 475).

Exemplar
f-cgCNN

Sound texture expansion


Expanding sound textures using f-cgCNN. Exemplars have 16384 data points (less than 1 second), and synthesized textures have 122880 data points (~5 seconds).

bees shaking paper wind
Exemplar
f-cgCNN

Image texture inpainting


Image texture inpainting using our method. The exemplars are of size (256, 256), and the masks are of size (60, 60). No post-processing is used.

Corrupted exemplar
deep prior
deep fill
cgCNN
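
A hedged sketch of the inpainting procedure as we read it: run the same Langevin synthesis while re-imposing the observed pixels outside the mask after every step, so that only the hole is free to change. It reuses the gram and energy helpers from the introduction; taking the statistics from the corrupted exemplar itself, and the per-step clamping schedule, are our assumptions:

    import torch

    def inpaint(net, corrupted, mask, n_steps=500, step=0.01, noise=0.01):
        # corrupted: exemplar with a hole; mask: same shape, 1 inside the hole.
        with torch.no_grad():
            stats = [gram(f) for f in net(corrupted)]
        x = (mask * torch.randn_like(corrupted)
             + (1 - mask) * corrupted).requires_grad_(True)
        for _ in range(n_steps):
            g, = torch.autograd.grad(energy(net, x, stats), x)
            with torch.no_grad():
                x += -0.5 * step * g + noise * torch.randn_like(x)
                x.copy_(mask * x + (1 - mask) * corrupted)  # clamp observed pixels
        return x.detach()
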

Dynamic texture inpainting


Dynamic texture inpainting using our method. The exemplars are of size (128, 128), and the masks are of size (30, 30). No post-processing is used.

Corrupted exemplar
Inpainted

Sound texture inpainting


Sound texture inpainting using our method. The corrupted exemplars have 50000 data points, and the masks cover the interval from the 20000th to the 30000th data point. No post-processing is used.

bees helicopter steam insects
Corrupted exemplar
Inpainted

References


  1. Texture synthesis using convolutional neural networks.
    Gatys, L., Ecker, A. S., and Bethge, M. NIPS, 2015.
  2. A theory of generative ConvNet.
    Xie, J., Lu, Y., Zhu, S.-C., and Wu, Y. N. ICML, 2016.
  3. Cooperative training of descriptor and generator networks.
    Xie, J., Lu, Y., Gao, R., Zhu, S.-C., and Wu, Y. N. IEEE TPAMI, 2018.
  4. Self-tuning texture optimization.
    Kaspar, A., Neubert, B., Lischinski, D., Pauly, M., and Kopf, J. Computer Graphics Forum, 2015.
  5. Improved texture networks: maximizing quality and diversity in feed-forward stylization and texture synthesis.
    Ulyanov, D., Vedaldi, A., and Lempitsky, V. CVPR, 2017.
  6. Two-stream convolutional networks for dynamic texture synthesis.
    Tesfaldet, M., Brubaker, M. A., and Derpanis, K. G. CVPR, 2018.
  7. Sound texture synthesis via filter statistics.
    McDermott, J. H., Oxenham, A. J., and Simoncelli, E. P. WASPAA, 2009.
  8. Audio texture synthesis with random neural networks: improving diversity and quality.
    Antognini, J. M., Hoffman, M., and Weiss, R. J. ICASSP, 2019.
  9. Deep image prior.
    Ulyanov, D., Vedaldi, A., and Lempitsky, V. CVPR, 2018.
  10. Generative image inpainting with contextual attention.
    Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X., and Huang, T. S. CVPR, 2018.