So far, this series of articles by cybersecurity expert Ravi Das has covered two advanced topics in Generative AI: the Variational Autoencoder and the Generative Adversarial Network. In this latest installment, the author moves on to the topic of the Diffusion Model.
Another recent advancement that has been made in Generative AI is that of the “Diffusion Model.” It is sophisticated in the sense that it draws upon concepts from non-equilibrium thermodynamics and Computer Science. For example, whatever level of “noise” exists in the Datasets (whether they are “fake” or real) can be converted into newer forms of Datasets, which can then be ingested into the Generative AI Model. On a macro level, this is done by reverse engineering the process by which “noise” is first introduced into the Datasets: the model gradually adds noise until the original data is destroyed, and then learns to run that process backward.
There are several key processes involved with a Diffusion Model, and they are as follows:
The Noise Schedule: This is where the sequence of “noise” levels is first defined by the Diffusion Model. It is represented as an upward slope, where the least amount of “noise” is at the beginning of the gradient and the most noise is at the top. This is illustrated in the diagram below:
Figure 1: The Noise Schedule
There is a distinct tradeoff here as well: the bottom of the gradient represents Datasets that are clear and fully optimized, while the top of the gradient represents Datasets that have almost no clarity to them whatsoever, and are thus nowhere near fully optimized for the Generative AI Model to make use of when computing the Outputs to the queries presented to it.
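To make this tradeoff more concrete, below is a minimal sketch, in Python, of a linear Noise Schedule of the kind used in many Diffusion Model implementations; the step count and the range of the “beta” values are illustrative assumptions, not values taken from this article:

import numpy as np

T = 1000                                  # number of diffusion steps (an assumption)
betas = np.linspace(1e-4, 0.02, T)        # per-step noise variances, smallest first
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)           # cumulative signal retained after t steps

def add_noise(x0, t, rng=np.random.default_rng()):
    # Corrupt a clean sample x0 to its noised version at step t.
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

Calling add_noise(x0, t) with a larger value of t corrupts the sample more heavily, which corresponds to moving further up the gradient in the diagram above.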
The Markov Chain: In the Diffusion Model, the sequence of noising steps forms a “Markov Chain,” which means that each point along the upward slope of the gradient depends only on the point immediately before it. Each such point represents a level of “noise,” and as the illustration up above shows, the noise accumulates as you move further along up the gradient: there is the least amount of noise at the very bottom, and the most at the very top. This is illustrated in the diagram below:
Figure 2: The Markov Chain
The ultimate goal of the Markov Chain is to distort the original Datasets as much as possible, one step at a time, until they become unrecognizable, i.e., until nothing but “noise” remains.
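As a rough illustration of this Markov property, and building on the schedule sketch above, a single forward step can be written so that it depends only on the state immediately before it:

def forward_step(x_prev, t, rng=np.random.default_rng()):
    # One Markov transition: the new state depends only on the previous one,
    # using the beta value for step t from the Noise Schedule sketched above.
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - betas[t]) * x_prev + np.sqrt(betas[t]) * noise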
The Conditional Modeling: This process makes use of a specialized Statistical Technique that tries to make an “estimation” as to what the pieces of information and data could possibly look like at every step along the gradient, conditioned on the information and data that were present at the previous step. In the end, when this technique reaches the top of the gradient, the result is the total summation of all of the pieces of information and data that have “noise” embedded into them, along with their corresponding representations. The primary objective here is to determine how much “noise” will ultimately have to be removed.
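To give one concrete and standard example of such a conditional estimate: in the common Gaussian formulation, each step along the gradient is modeled as q(x_t | x_{t−1}) = N(x_t; √(1 − β_t) · x_{t−1}, β_t · I), where β_t is the “noise” level taken from the Noise Schedule at step t.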
The Reverse Process: This process begins once the forward noising has run to completion. The goal here is to remove all of the “noise” that has been detected and ascertained at each of the steps along the gradient. Another objective is to make these pieces of information and data resemble a realistic Dataset as much as possible, so that the Dataset can subsequently be fed into the Generative AI Model for further usage.
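Below is a minimal sketch of this Reverse Process as a sampling loop, again building on the schedule variables defined earlier; the function predict_noise is a hypothetical placeholder for a trained network, not a real API:

def reverse_process(shape, rng=np.random.default_rng()):
    x = rng.standard_normal(shape)              # start from pure "noise"
    for t in reversed(range(T)):
        eps = predict_noise(x, t)               # hypothetical trained network
        # Remove the noise the network predicts was added at step t.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                               # re-inject sampling noise, except at the final step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x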
One of the primary reasons why the Diffusion Model works so well in Generative AI is that it is trained so that any differences between the “estimated” (or “hypothesized”) data and the real Datasets that have actually been observed are minimized as much as possible. It should also be noted that while traditional Generative AI Models typically try to fit the “noise” directly to the Statistical Distribution of the Datasets, the Diffusion Model avoids this by progressively removing the “noise” instead. The end result of all of this is that any Outputs yielded by the Generative AI Model will be as authentic and realistic as possible.
Diffusion Models have also been used where Datasets need to be created that have a wide range of characteristics. For example, since a Generative AI Model can ingest both quantitative and qualitative Datasets, this diversity is greatly needed, and the Diffusion Model is particularly well suited to delivering it.
The Major Categories of Diffusion Models
Now that we have reviewed what a Diffusion Model is, we can advance one step further and provide an overview of the major categories of Diffusion Models. They are as follows:
The Denoising Diffusion Probabilistic Model: This is also referred to technically as the “DDPM.” It is mostly used where the Datasets are primarily those of images. In this case, the excess “noise” in an over-pixelated image will eventually be eradicated through a defined process that resides within the DDPM. This process is technically known as “Maximum Likelihood Estimation,” which tries to either get rid of the pixels in the image that are not necessary, or simply close the distance between them, so that the image will eventually become clearer, thus making it usable for the Generative AI Model.
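As a hedged illustration, this “Maximum Likelihood Estimation” objective is, in practice, often implemented as a simple noise-prediction error (in the original DDPM formulation, this arises from a bound on the likelihood); in the sketch below, model is a placeholder for any network that predicts the added noise, and the schedule variables come from the earlier sketch:

def ddpm_loss(model, x0, rng=np.random.default_rng()):
    t = int(rng.integers(0, T))                          # pick a random diffusion step
    noise = rng.standard_normal(x0.shape)
    # Noise the clean sample x0 to step t, as in the forward process.
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return np.mean((model(x_t, t) - noise) ** 2)         # noise-prediction MSE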
The Score Based Diffusion Model: This is also known as the “SBM” for short. This kind of model makes use of what is known as a “Score Function.” This is where the SBM estimates the gradient of the logarithm of the Statistical Probability of the Datasets, in other words, the direction in which an image becomes more likely to be a real one, and not a fake. In a way, the SBM is like an Artificial Intelligence Model in itself, as it is trained by estimating this Score Function directly from the Datasets, a procedure known as “Score Matching.” From this, the SBM can also be used to generate “fake” images that look like the real thing, although this is not its primary objective. The mathematical representation of the SBM is as follows:
s(x) = ∇x log p_data(x)
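To give a concrete example of what this Score Function means: for a simple one-dimensional Gaussian Dataset with mean μ and variance σ², the Score Function works out to s(x) = −(x − μ) / σ², a quantity that always points back toward the region of highest Statistical Probability, which is exactly the direction in which a real image is expected to lie.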
The Stochastic Differential Equation Based Diffusion Model: This is technically referred to as the “SDE” for short. The reason this particular model got the name that it did is that it uses an actual “Stochastic Differential Equation” to generate images for the Generative AI Model, in a manner that is very similar to all of the other processes. The SDE also introduces a sense of “randomness” into the generated image, so that it incorporates a sense of uniqueness. Further, this process is trained by estimating the same Score Function described above, which appears directly in the reverse equation below. It should also be noted that the SDE runs in both directions: a forward equation that breaks an image down into various pieces of “noise,” and a reverse equation that reconstructs an image from that “noise.” The mathematical representations of these two directions are as follows:
Creating noise from an image (the forward process):

dx = f(x, t) · dt + g(t) · dw

Creating an image from noise (the reverse process):

dx = [f(x, t) − g²(t) · ∇x log p_t(x)] · dt + g(t) · dw̄

Here, f(x, t) is the drift term, g(t) is the diffusion coefficient, w is a Wiener Process (with w̄ being its reverse-time counterpart), and ∇x log p_t(x) is the same Score Function described above.
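As a minimal sketch of how the forward SDE can be simulated numerically, the Euler-Maruyama method below assumes the simple choice f(x, t) = 0 and a constant g(t) = σ, which is an illustrative assumption rather than the only option:

def euler_maruyama_forward(x0, sigma=1.0, n_steps=1000, dt=1e-3,
                           rng=np.random.default_rng()):
    x = x0.copy()
    for _ in range(n_steps):
        dw = np.sqrt(dt) * rng.standard_normal(x.shape)  # Wiener increment
        x = x + sigma * dw                               # dx = f*dt + g*dw, with f = 0
    return x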
The Latent Representation Model: This is also technically referred to as the “LRM” for short. This is a specialized type of Diffusion Model that uses the architecture of a Neural Network, in which the concept of “Neurons” is represented by multiple layers in the actual Model. This approach is rather limited, in that the Neural Network can only create what is known as a “Latent Image” strictly from the Datasets that have been ingested into it and trained upon. These generated images are actually collections of mathematically based vectors, and they are stored in the Generative AI Model (assuming that the Neural Network component is also incorporated into it) so that future images can be produced from this baseline. In this case, another type of Neural Network, called the “Convolutional Neural Network” (also known as the “CNN”), can be used as well.
A CNN can be technically defined as follows:
“Convolutional neural networks use three-dimensional data for image classification and object recognition tasks.”1
A unique advantage of a CNN is that it can extract various features from an image at different scale lengths. Thus, this kind of model could also prove very useful for Facial Recognition, which is a Biometric Modality that tries to confirm the identity of an individual by extracting the unique features of their face.
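As a brief, hedged illustration of how a CNN extracts features at more than one scale, the PyTorch sketch below stacks two convolutional layers with pooling in between; all of the layer sizes are illustrative assumptions:

import torch
import torch.nn as nn

feature_extractor = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # fine-scale edges and textures
    nn.ReLU(),
    nn.MaxPool2d(2),                              # halve the resolution: coarser scale
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # larger-scale shapes
    nn.ReLU(),
    nn.MaxPool2d(2),
)

image = torch.randn(1, 3, 64, 64)                 # one dummy 64x64 RGB image
features = feature_extractor(image)               # output shape: (1, 32, 16, 16)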
Along with the CNN, the LRM also makes use of a specialized Statistical Technique technically known as “Maximum Likelihood Estimation,” also known as the “MLE” for short. One of the primary objectives of the MLE is to ascertain and implement the parameters for creating an image so that it can subsequently be used by the Generative AI Model. The LRM is mathematically represented as follows:
p_θ(x_{t−1} | x_t)
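Here, θ denotes the learned parameters of the Model, and x_t and x_{t−1} are the latent representations at two consecutive steps along the gradient; the expression reads as the Statistical Probability of recovering the less noisy state x_{t−1}, given the noisier state x_t.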
The Diffusion Process: This is a process that also makes use of the “Markov Chain,” and it is deemed to be a “Probabilistic Process” from the standpoint of Statistics. What is unique about it is that an image is created from the raw Datasets in a series of phases that are directly visible to the end user who is trying to create them; that is, the process can only move from one “Statistical State” to the next one step at a time, and not any faster than that. To govern this, a concept called the “Diffusion Rate” is applied, which controls how quickly, across the “Statistical States,” the generated image stops resembling the Datasets from which it has been derived. Another technique, called the “Gaussian Diffusion Process,” can be used here as well.
The Decoding Process: This part of the Diffusion Model once again makes use of a Neural Network Architecture; in this case, an actual, real-world image can be reverse engineered into the Datasets from which it was originally derived. A specialized Statistical Technique is also used here, known as the “Mean Squared Error,” or the “MSE” for short. Its primary objective in the Diffusion Model is to reduce the differences that have been found between the reverse engineered image and the actual, real-world image.
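The “Mean Squared Error” itself is straightforward to express; a minimal sketch, assuming both images are NumPy arrays of the same shape, is as follows:

def mse(reconstructed, original):
    # Average of the squared pixel-wise differences between the two images.
    return np.mean((reconstructed - original) ** 2)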
Up Next: The DALL-E 2 Algorithm
One of the cutting-edge solutions used in Generative AI is an algorithm that converts human language into an image to be used as the Output. You’ll learn about that algorithm, known as DALL-E 2, in the next article in this series.
Sources/References:
Ravi Das is a Cybersecurity Consultant and Business Development Specialist. He does Cybersecurity Consulting through his private practice, RaviDas Tech, Inc., and holds the Certified in Cybersecurity (CC) certification from ISC2.
Visit his website at mltechnologies.io