The Goal:
Create a deep learning neural network that will accept text as input and return ASCII art of the text.
Currently, the most popular, open-source image generation machine learning technique is Stable Diffusion.
In a nutshell, we'll be creating Denoising Diffusion Probabilistic Models (DDPMs).
The previous link points to the seminal 2015 academic paper that founded denoising diffusion probabilistic models (DDPMs) in machine learning. A perfect place to start our journey.
The authors say that in order to create the models, they used a Markov chain. What is a Markov chain? Who knows? Wikipedia says:

A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.

Well, what's a stochastic model? If you find out, let me know. Wikipedia has some words:
In probability theory and related fields, a stochastic (/stəˈkæstɪk/) or random process is a mathematical object usually defined as a sequence of random variables in a probability space, where the index of the sequence often has the interpretation of time.
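To make that definition concrete, here's a minimal sketch (assuming Python with NumPy) of a stochastic process: a random walk, where each step is a random variable and the index of the sequence is time.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A stochastic process: a sequence of random variables X_1, ..., X_T,
# where the index t is interpreted as time.
T = 10
steps = rng.choice([-1, 1], size=T)  # each step is a random variable
walk = np.cumsum(steps)              # X_t = sum of the first t steps

print(walk)  # one realization (a "sample path") of the process
```

Running it again with a different seed gives a different sample path, but each is a draw from the same process.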
Let's ground this in a real-world example:
Imagine you are on a date. The random variable represents what you say. You can say anything. Anything at all. Completely randomly. According to Wikipedia:
Informally, a Markov chain may be thought of as: "What happens next depends only on the state of affairs now."
In this example, let's define a two-state Markov chain:
| State A: The first state | State B: The second state |
|---|---|
| You say your random statement. | The state of your date after your random statement. |
After state B, you are no doubt walking home alone. As misfortune would have it, a single dark cloud follows you overhead. It begins to rain. The rain follows a normal distribution:
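The date above can be sketched in code: a toy two-state Markov chain where the next state depends only on the current one, plus rain sampled from a normal distribution (a sketch assuming Python with NumPy; the transition probabilities are made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Two-state Markov chain: "what happens next depends only on
# the state of affairs now." Probabilities are invented.
states = ["A: you say something random", "B: your date reacts"]
P = np.array([
    [0.0, 1.0],   # from A: a random statement always provokes a reaction
    [0.5, 0.5],   # from B: half the time you dig the hole deeper
])

state = 0  # start in state A
for _ in range(5):
    state = rng.choice(2, p=P[state])  # next state depends only on `state`

# The rain on the walk home, drawn from a normal distribution
# (mean 0, standard deviation 1 here).
rain = rng.normal(loc=0.0, scale=1.0, size=100)
print(f"ended in {states[state]}, mean rainfall {rain.mean():.3f}")
```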
The paper states that the Markov chain converts a simple, known distribution into a target (data) distribution using a diffusion process.
Denoising diffusion probabilistic models require the use of probability. It's in the name! If you are like me, statistics and probability are not your strong suit. So, let's take it nice and slow and explain everything as simply as we can.
This section and everything after it is a work-in-progress. Please ignore. Or don't.
The first resource is a YouTube video that provides a link to stable diffusion code.
The second resource is a YouTube video that explains the math of stable diffusion models at a high level.
There's another annotated code writeup: The Annotated Diffusion Model. So, I'll give it a crack and re-word it simply. Like for real, what is this?
| q(x0) | = | the real data distribution, say of "real images" |
|---|---|---|
| x0 ∼ q(x0) | = | a sample from this distribution, i.e. an image |
| q(xt ∣ xt−1) | = | the forward diffusion process |
q(xt ∣ xt−1) = N(xt; √(1−βt) xt−1, βt I)
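A single step of that forward process can be sketched as follows (assuming Python with NumPy; the constant β value and the image size are made up, since real DDPMs use a schedule of βt values):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def forward_step(x_prev, beta_t, rng):
    """One step of the forward diffusion process:
    q(x_t | x_{t-1}) = N(x_t; sqrt(1 - beta_t) * x_{t-1}, beta_t * I),
    sampled via x_t = mean + std * noise."""
    noise = rng.normal(size=x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

x0 = rng.normal(size=(8, 8))   # stand-in for a "real image" x0 ~ q(x0)
beta = 0.02                    # small, made-up noise value

x = x0
for t in range(1000):          # many steps drive x toward pure noise
    x = forward_step(x, beta, rng)
```

After enough steps, `x` is approximately a sample from a standard normal: the simple, known distribution the paper mentions, with the original image washed out entirely.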