Using Hugging Face, we gain access to the DeepFloyd/IF-I-XL-v1.0 model as our diffusion model. The random seed used throughout the project is 92. As the number of inference steps increases, the generated results become sharper (most obvious for the man wearing a hat) and closer to the prompt (for the oil painting of a snowy village).
We add noise to an image given a timestep t in the range [0, 999].
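The forward process can be sketched as the standard DDPM noising equation, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. This is a minimal numpy sketch (standing in for the project's torch tensors), assuming an `alphas_cumprod` array holding the cumulative products ᾱ_t indexed by t:

```python
import numpy as np

def forward(im, t, alphas_cumprod, rng=np.random.default_rng(92)):
    """Add noise to a clean image im at timestep t (DDPM forward process):
    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps,  eps ~ N(0, I)."""
    abar = alphas_cumprod[t]
    eps = rng.standard_normal(im.shape)
    return np.sqrt(abar) * im + np.sqrt(1.0 - abar) * eps
```

At t = 0 (ᾱ ≈ 1) the image is nearly untouched; by t = 999 it is almost pure noise.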
We apply a simple Gaussian blur to try to remove the noise.
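For reference, a self-contained separable Gaussian blur (the project itself uses a library blur, e.g. torchvision's; this numpy version is just an illustration of the baseline):

```python
import numpy as np

def gaussian_kernel1d(sigma, radius=None):
    # Sampled Gaussian, truncated at ~3 sigma and normalized to sum to 1.
    radius = radius or int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur: convolve each row, then each column."""
    k = gaussian_kernel1d(sigma)
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    return blurred
```

This suppresses high-frequency noise but also destroys high-frequency image detail, which is why the results look soft compared to the learned denoiser.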
We use a pretrained UNet to do a one step denoise.
Given a noisy image (from part 1.2):
1. Estimate the noise in the noisy image by passing it through stage_1.unet.
2. Remove the estimated noise from the noisy image to obtain an estimate of the original image.
3. Visualize the original image, the noisy image, and the estimate of the original image.
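The removal step amounts to inverting the forward equation. A minimal sketch, assuming the same `alphas_cumprod` schedule as in part 1.1 and a noise estimate `eps_hat` already produced by the UNet:

```python
import numpy as np

def one_step_denoise(x_t, eps_hat, t, alphas_cumprod):
    """Invert x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps:
    x0_hat = (x_t - sqrt(1 - abar_t) * eps_hat) / sqrt(abar_t)."""
    abar = alphas_cumprod[t]
    return (x_t - np.sqrt(1.0 - abar) * eps_hat) / np.sqrt(abar)
```

If the noise estimate were exact, this would recover x_0 perfectly; in practice the estimate is imperfect, so larger t gives blurrier reconstructions.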
Starting from some t within [0, 999], we use the following equation to iteratively get closer to the original (clean) image.
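One stride of that update, from t down to a smaller t', can be sketched as follows (the added-variance term is omitted for brevity, and `abar` is assumed to be the ᾱ schedule):

```python
import numpy as np

def iterative_denoise_step(x_t, x0_hat, t, t_prime, abar):
    """One stride of iterative denoising from timestep t to t' < t:
    x_t' = sqrt(abar_t') * beta_t / (1 - abar_t) * x0_hat
         + sqrt(alpha_t) * (1 - abar_t') / (1 - abar_t) * x_t
    with alpha_t = abar_t / abar_t' and beta_t = 1 - alpha_t.
    x0_hat is the clean-image estimate from the current noise prediction."""
    alpha = abar[t] / abar[t_prime]
    beta = 1.0 - alpha
    coef_x0 = np.sqrt(abar[t_prime]) * beta / (1.0 - abar[t])
    coef_xt = np.sqrt(alpha) * (1.0 - abar[t_prime]) / (1.0 - abar[t])
    return coef_x0 * x0_hat + coef_xt * x_t
```

A useful sanity check: if x_t is a noiselessly scaled clean image sqrt(ᾱ_t)·x_0 and x0_hat equals x_0, the update returns sqrt(ᾱ_t')·x_0, i.e. the correctly scaled image at the next timestep.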
Setting i_start = 0, we can generate random images with the prompt "a high quality photo".
We compute both a conditional and an unconditional noise estimate and combine them with the following equation to create a new noise estimate. For the project we set the guidance scale gamma to 7. We then generated 5 more images with this new function; note that they are a lot better than the ones generated in part 1.5.
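The classifier-free guidance combination itself is one line, extrapolating past the conditional estimate in the direction away from the unconditional one:

```python
import numpy as np

def cfg_noise(eps_cond, eps_uncond, gamma=7.0):
    """Classifier-free guidance:
    eps = eps_uncond + gamma * (eps_cond - eps_uncond).
    gamma = 1 recovers the plain conditional estimate; gamma > 1
    pushes the sample harder toward the prompt."""
    return eps_uncond + gamma * (eps_cond - eps_uncond)
```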
Using CFG from 1.6, we can now set different noise levels (t) and see how the images gradually match the original image more and more closely. Note that we set t at indices [1, 3, 5, 7, 10, 20] of a timestep schedule that decrements from 990 in steps of 30 (just like part 1.4).
Using the same idea from 1.7, we can now apply this to web images or hand-drawn images.
We can apply a mask to regenerate only part of our image!
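The masking trick: after every denoising step, force the pixels outside the mask back to the original image, noised to the current timestep so all pixels have a consistent noise level. A sketch (the `forward`-style noising is inlined; `abar` is the ᾱ schedule):

```python
import numpy as np

def inpaint_step(x_t, x_orig, mask, t, abar, rng=np.random.default_rng(92)):
    """Keep generation inside the mask (mask == 1), and reset everything
    outside it to the original image noised to level t:
    x_t <- mask * x_t + (1 - mask) * noise(x_orig, t)."""
    eps = rng.standard_normal(x_orig.shape)
    noised_orig = np.sqrt(abar[t]) * x_orig + np.sqrt(1.0 - abar[t]) * eps
    return mask * x_t + (1.0 - mask) * noised_orig
```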
Instead of "a high quality photo", we would use the prompt "a rocket ship".
Using the following equation, we can now create an image that shows two different concepts when viewed from different directions (rotated 180 degrees).
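The visual-anagram estimate denoises the image under one prompt, denoises the flipped image under the other prompt (flipping the result back), and averages. A sketch where `unet` is any callable `(image, prompt) -> noise estimate` standing in for stage_1.unet:

```python
import numpy as np

def anagram_noise(x_t, unet, prompt1, prompt2):
    """Visual anagrams:
    eps1 = UNet(x_t, p1)
    eps2 = rot180(UNet(rot180(x_t), p2))
    eps  = (eps1 + eps2) / 2
    so the upright image follows p1 and the flipped image follows p2."""
    rot180 = lambda im: np.rot90(im, 2, axes=(-2, -1))
    eps1 = unet(x_t, prompt1)
    eps2 = rot180(unet(rot180(x_t), prompt2))
    return 0.5 * (eps1 + eps2)
```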
Using the following equation, we can now create a hybrid image.
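The hybrid-image estimate takes the low frequencies of one prompt's noise estimate and the high frequencies of the other's (high-pass = identity minus low-pass). A sketch, with `unet` again a `(image, prompt) -> noise` callable and `lowpass` any low-pass filter such as a Gaussian blur:

```python
import numpy as np

def hybrid_noise(x_t, unet, prompt1, prompt2, lowpass):
    """Hybrid images:
    eps = lowpass(eps1) + highpass(eps2),
    so from far away the image reads as prompt1, and up close as prompt2."""
    eps1 = unet(x_t, prompt1)
    eps2 = unet(x_t, prompt2)
    return lowpass(eps1) + (eps2 - lowpass(eps2))
```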
I trained a UNet according to the spec and got the following results. Note that the process resembles the "1 step" denoising in project 5A. The attached images are all corresponding figure deliverables for part 1 (both training and sampling).
I added time conditioning to the UNet. Note that this is similar to the iterative denoising process we saw in project 5A. The deliverable images are as follows (both training and sampling).
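One common way to inject the timestep (roughly what the spec's FCBlocks do) is to map the normalized t through a small MLP and modulate a decoder feature map with the resulting per-channel vector. The shapes below are toy values for illustration, not the spec's actual widths:

```python
import numpy as np

rng = np.random.default_rng(92)

def relu(x):
    return np.maximum(x, 0.0)

def fc_block(t, w1, b1, w2, b2):
    """FC block: Linear -> ReLU -> Linear, mapping normalized t in [0, 1]
    to a per-channel modulation vector."""
    h = relu(t * w1 + b1)      # first linear layer applied to the scalar t
    return h @ w2 + b2

# Toy shapes: hidden width 16, a decoder feature map with 8 channels.
w1, b1 = rng.standard_normal(16), rng.standard_normal(16)
w2, b2 = rng.standard_normal((16, 8)), rng.standard_normal(8)
feat = rng.standard_normal((8, 4, 4))       # C x H x W decoder features
t_vec = fc_block(0.5, w1, b1, w2, b2)       # conditioning vector, one per channel
modulated = feat * t_vec[:, None, None]     # broadcast the scale over H, W
```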
I added class conditioning to the UNet, training it to target specific digits. In other words, we are not only training the model to generate valid digits, we also want it to know which digit is which. The deliverable images are as follows (both training and sampling).
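The class signal is typically a one-hot vector, with a fraction of labels randomly dropped (zeroed) during training so the model also learns the unconditional distribution and CFG can be used at sampling time. A sketch, assuming a 10% drop rate as in the spec:

```python
import numpy as np

def class_condition(labels, num_classes=10, p_uncond=0.1,
                    rng=np.random.default_rng(92)):
    """One-hot encode digit labels, then zero out roughly p_uncond of the
    rows so the UNet also sees the 'null' (unconditional) class."""
    onehot = np.eye(num_classes)[labels]
    drop = rng.random(len(labels)) < p_uncond
    onehot[drop] = 0.0
    return onehot
```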