Project Team member: Youngwoong Cho, Danny Hong, Rena Kim, Ahzin Nam

Introduction

Our project began with a question, “what if inorganic things ‘decay’, like organic things?”

The importance of sustainable design is increasing. Through deep convolutional neural network, we imagined the world where everything is organic, and thus perishes.

A deep neural network was trained to predict the “decayed” appearance of an arbitrary input image. A dataset that consists of several hundred images of fresh/rotten apples, oranges, and bananas were used to train the neural network. pix2pix, CycleGAN, and DiscoGAN were considered as our neural network model, and CycleGAN was found to perform the best.

this is a placeholder image
Figure 1. Our initial idea sketch. The model is expected to learn how to translate the image in the "fresh" domain to the "rotten" domain and vice versa.

Decay, in Biology

The word decay nearly instinctively gives a negative impression, because it is usually associated with death. After an organic body goes through death, it biodegrades, or less formally, decays. The decomposers, such as fungi or bacteria, break down the organic body, changing it into an inorganic matter and blending it back in with the earth.

The physical and chemical properties of the dead organism change as it decays. As a result, the appearance becomes drastically different after an organic matter decays. The colors might change from vivid to dull; dark moles might appear; cavities with varying sizes might be created as well.

Decay and sustainability

However, humankind nowadays are rather more concerned of materials that do not decay — for example, plastics. According to Plastic Soup Foundation, plastic is non-biodegradable.

Plastic does not decompose. This means that all plastic that has ever been produced and has ended up in the environment is still present there in one form or another.

This is the reason why sustainable design is gaining popularity lately. Man-made products, after some time, must consider the environmental impact. Biodegradable plastics are one of such attempts.

Decay, and Neural network

Through deep convolutional neural network, we imagined the world where everything is organic, and thus perishes. We trained a cycleGAN model, a type of Generative Adversarial Network (GAN), that predicts the “decayed” appearance of any arbitrary input images — a tennis ball, a rubber duck. etc.

this is a placeholder image
Figure 2. A regular tennis ball that would not decay, obviously. (left) Our model predicts the tennis ball after it decays. (right)

Process

Problem definition

The goal of the project is to design and build a neural network model that is capable of translating the image input $I \in \mathbb{R}^{3 \times H \times W}$ to an image output $O \in \mathbb{R}^{3 \times H \times W}$, where $H$ and $W$ are the height and the width of the image, respectively.

Dataset

The dataset taken from “Fruits fresh and rotten for classification” consists of train and test sets of fresh and rotten images of apple, banana, and orange. Take a look at Table 1 (a) and Table 1 (b) for the number of images for each class.

Fruit type Fresh/Rotten Number of images
Apple Fresh 1693
  Rotten 2342
Banana Fresh 1581
  Rotten 2224
Orange Fresh 1466
  Rotten 1595
Table 1 (a). Number of images of each fruit type of the training dataset.
Fruit type Fresh/Rotten Number of images
Apple Fresh 395
  Rotten 601
Banana Fresh 381
  Rotten 530
Orange Fresh 388
  Rotten 403
Table 1 (b). Number of images of each fruit type of the test dataset.
this is a placeholder image
Figure 3. An image of a fresh apple (left) and a rotten apple (right) that are used to train the model.

Model selection

There are various neural network architecture that can carry out the task of image-to-image translation. We have narrowed down the candidates into three: pix2pix, DIicoGAN, and CycleGAN.

Pix2pix

Our first consideration was pix2pix.

this is a placeholder image
Figure 4. An image snippet from pix2pix paper.

However, it was not successful, since pix2pix required a set of paired dataset. It was almost impossible to prepare several hundreds of fresh-rotten pair of images.

DiscoGAN

Our next consideration was DiscoGAN. It was chosen since it not only could take in unpaired dataset but also was capable of performing geometry change. Since an apple, an orange, or a banana goes through a morphological change when rotten, DiscoGAN seemed to be a good choice.

Following figure shows the resulting images after 50 epochs of training with apple2rotten dataset, which consists of 184 images of fresh apple and 271 images of rotten apple.

this is a placeholder image
Figure 5. Images generated during the training of the model.

0.A.jpg is the original image of an apple, 0.AB.jpg is the image that is mapped to the domain of rotten apple, and 0.ABA.jpg is the reconstructed image, which is mapped back to the original domain. As it can be seen, the translated image(0.AB.jpg) not only shows the change in colors but also in its contours.

However, the transformation was not performing well at all for the images that are not apples. Take a look at following transformations.

this is a placeholder image this is a placeholder image
Figure 6. Inference of the discoGAN trained with apple2rotten image dataset. The image of a face is turned into a mess!

When the model attempts to convert a non-apple image into an apple image, it tries to force to translate the image into an apple; thus, drastic failure.

CycleGAN

Our next consideration was CycleGAN because it could preserve the geometric characteristics of the input image while being capable of handling the unpaired dataset.

Following figures show the resulting images after 200 epochs of training with banana2rotten dataset, which consists of 342 images of fresh banana and 484 images of rotten banana.

this is a placeholder image
this is a placeholder image
Figure 7. An inference of the cycleGAN trained with banana2rotten image dataset. It can be seen that the geometry of the input image is preserved well.

It seems working! However, since our dataset mostly consists of banana, the model performed well only for the yellow images. Therefore, we decided to train the model using all images of the fruits - apples, oranges, and bananas altogether, so that our model can be ready for more diverse colors.

Following image shows some of the resulting images generated by the model while being trained.

this is a placeholder image
Figure 8. Inference of the cycleGAN trained with all fruit2rotten image dataset. Not only the yellow but also the red and orange colors are being translated well.

It can be seen that the model is trained to translate the images of red, yellow, orange, and green.

Result

Following images are generated by our model that is trained to predict the decay of the input image.

this is a placeholder image this is a placeholder image
Figure 9. Inference of the cycleGAN trained with all fruit2rotten image dataset.

The model is trained with 505 images of fresh fruits, and 690 images of rotten fruits. It was trained for 200 epochs.

Real-time inference from a webcam input

After the training is done, our Decay model is deployed on a machine with a webcam to perform the image translation in real time.

The code of the project can be found from the following Github link.

Leave a comment