Multimodal AI - Part 1: CLIP
Multimodal AI has always been an area I am personally very fascinated by, because I believe our understanding of the world comes from more than just language - it’s also vision and audio (and touch and smell, of course, but that might be outside the scope of what I’m capable of exploring in my own time). Since working at Adobe, I’ve grown an even deeper appreciation for how hard it is to teach machines to reason about the visual world, and beyond that, the physical world. ...
Diffusion Models - Part 2: Improved DDPM
In Part 1 of the Diffusion Models series, I covered the theory behind DDPM, the most basic diffusion model, which consists of: a forward process that gradually corrupts an image with Gaussian noise, and a reverse process in which a neural network (a U-Net) learns to denoise it step by step. If you got through all the math needed to understand DDPM, weren't scared off by it, and are in fact even more fascinated by diffusion models, the next (and hopefully easier-to-digest) paper is: ...
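The forward (noising) process summarized above can be sketched in a few lines. This is a minimal NumPy illustration, assuming the linear beta schedule from Ho et al. 2020; the function name and variables are my own for illustration, not taken from the post:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    alpha_bar = np.cumprod(1.0 - betas)[t]  # cumulative product of (1 - beta_s) up to step t
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

# Linear schedule over T=1000 steps, as in the original DDPM paper (assumption).
betas = np.linspace(1e-4, 0.02, 1000)
rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))            # toy stand-in for an image
xT = forward_diffuse(x0, 999, betas, rng)   # at t = T, x_t is nearly pure Gaussian noise
```

Because alpha_bar shrinks toward zero as t grows, the signal term vanishes and x_t approaches an isotropic Gaussian, which is what makes the reverse (denoising) process a sensible generative model.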
Diffusion Models - Part 1: DDPM
One of the places my curiosity took me recently is Diffusion Models. Let’s start with Denoising Diffusion Probabilistic Models (DDPM; Ho et al. 2020). The GitHub repo for my PyTorch implementation of DDPM, with instructions on how to train and generate images, can be found here: halannhile/ddpm.

Table of Contents

Section 1: Theory
- Overview of DDPM
- The forward process
- The reverse process
- The training objective
- Training & sampling algorithms

Section 2: Code ...