The Goal:
Create a deep learning neural network that will accept text as input and return ASCII art of the text.
Currently, the most popular, open-source image generation machine learning technique is Stable Diffusion.
In a nutshell, we'll be creating Denoising Diffusion Probabilistic Models (DDPMs).
The previous link points to the seminal 2015 academic paper that founded denoising diffusion probabilistic models (DDPMs) in machine learning. A perfect place to start our journey.
The authors say that in order to create the models, they used a Markov chain. What is a Markov chain? Who knows? Wikipedia says:

A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.

Well, what's a stochastic model? If you find out, let me know. Wikipedia has some words:
In probability theory and related fields, a stochastic (/stəˈkæstɪk/) or random process is a mathematical object usually defined as a sequence of random variables in a probability space, where the index of the sequence often has the interpretation of time.
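To make that definition concrete, here's a minimal sketch (assuming Python with NumPy) of a stochastic process: a random walk, where each step is a random variable and the index of the sequence is time.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A stochastic process: a sequence of random variables X_1, ..., X_T,
# where the index t is interpreted as time.
T = 10
steps = rng.choice([-1, 1], size=T)  # each step is a random variable
walk = np.cumsum(steps)              # X_t = sum of the first t steps

print(walk)  # one realization (a "sample path") of the process
```

Running it again with a different seed gives a different sample path, but each is a draw from the same process.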
Let's ground this in a real-world example:
Imagine you are on a date. The random variable represents what you say. You can say anything. Anything at all. Completely randomly. According to Wikipedia:
Informally, a Markov chain may be thought of as: "What happens next depends only on the state of affairs now."
In this example, let's define a two-state Markov chain:
| State A: The first state | State B: The second state |
|---|---|
| You say your random statement. | The state of your date after your random statement. |
After state B, you are no doubt walking home alone. As misfortune would have it, a single dark cloud follows you overhead. It begins to rain. The rain follows a normal distribution:
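The date above can be sketched in code: a toy two-state Markov chain where the next state depends only on the current one, plus rain sampled from a normal distribution (a sketch assuming Python with NumPy; the transition probabilities are made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Two-state Markov chain: "what happens next depends only on
# the state of affairs now." Probabilities are invented.
states = ["A: you say something random", "B: your date reacts"]
P = np.array([
    [0.0, 1.0],   # from A: a random statement always provokes a reaction
    [0.5, 0.5],   # from B: half the time you dig the hole deeper
])

state = 0  # start in state A
for _ in range(5):
    state = rng.choice(2, p=P[state])  # next state depends only on `state`

# The rain on the walk home, drawn from a normal distribution
# (mean 0, standard deviation 1 here).
rain = rng.normal(loc=0.0, scale=1.0, size=100)
print(f"ended in {states[state]}, mean rainfall {rain.mean():.3f}")
```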
The paper states that the Markov chain converts a simple, known distribution into a target (data) distribution using a diffusion process.
Denoising diffusion probabilistic models require the use of probability. It's in the name! If you are like me, statistics and probability are not your strong suit. So, let's take it nice and slow and explain everything as simply as we can.
This section and everything after it is a work-in-progress. Please ignore. Or don't.
The first resource is a YouTube video that provides a link to stable diffusion code.
The second resource is a YouTube video that explains the math of stable diffusion models at a high level.
There's another annotated code writeup: The Annotated Diffusion Model. So, I'll give it a crack and re-word it simply. Like for real, what is this?
| q(x0) | = | the real data distribution, say of "real images" |
|---|---|---|
| x0 ∼ q(x0) | = | a sample from this distribution, i.e. an image |
| q(xt ∣ xt−1) | = | the forward diffusion process |
q(xt ∣ xt−1) = N(xt; √(1−βt) xt−1, βt I)
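A single step of that forward process can be sketched as follows (assuming Python with NumPy; the constant β value and the image size are made up, since real DDPMs use a schedule of βt values):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def forward_step(x_prev, beta_t, rng):
    """One step of the forward diffusion process:
    q(x_t | x_{t-1}) = N(x_t; sqrt(1 - beta_t) * x_{t-1}, beta_t * I),
    sampled via x_t = mean + std * noise."""
    noise = rng.normal(size=x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

x0 = rng.normal(size=(8, 8))   # stand-in for a "real image" x0 ~ q(x0)
beta = 0.02                    # small, made-up noise value

x = x0
for t in range(1000):          # many steps drive x toward pure noise
    x = forward_step(x, beta, rng)
```

After enough steps, `x` is approximately a sample from a standard normal: the simple, known distribution the paper mentions, with the original image washed out entirely.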