OpenAI’s Sora is powered by Diffusion Transformer (DiT): What is it?


In February, OpenAI stunned the world with its latest AI model, Sora. The new offering from the Sam Altman-led company can take prompts in natural language and generate minute-long videos in high definition. The model, which arrived after Runway’s Gen-2 and Google’s Lumiere, showcased video-generation capabilities striking enough that some believe it could transform filmmaking in the future.

At present, two kinds of models are fuelling AI innovation: transformers and diffusion models. These are architectures that have redefined the landscape of machine learning (a subset of AI) applications. Transformer-based models have radically changed how machine learning systems classify and generate text. Diffusion models, on the other hand, have become the architecture of choice for image-generating AI.

Diffusion models take their name from the physical process of diffusion, the spreading of particles from a denser region to a less dense one. In machine learning, they work by gradually adding noise to data and then learning to reverse that corruption. Sora is not a large language model (LLM); it is a diffusion transformer model. In this article, we will look at what a diffusion transformer is and how it differs from other AI models.
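To make the idea concrete, here is a minimal sketch in Python of the “forward” half of diffusion: repeatedly blending data with Gaussian noise according to a noise schedule. The schedule and shapes below are illustrative placeholders, not anything taken from Sora; the point is only that a model trained to undo this corruption can later generate data starting from pure noise.

```python
# A minimal sketch of forward diffusion: mixing data with Gaussian noise.
# Schedule values and shapes are illustrative, not Sora's actual settings.
import numpy as np

def add_noise(x0, t, alphas_cumprod):
    """Return a noisy version of x0 at diffusion step t (DDPM-style noising)."""
    rng = np.random.default_rng()
    noise = rng.standard_normal(x0.shape)
    a_bar = alphas_cumprod[t]                                 # how much signal survives at step t
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise
    return x_t, noise

# Example: a linear beta schedule over 1000 steps (a common, illustrative choice).
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)

x0 = np.random.default_rng(0).standard_normal((4, 4))        # stand-in "image"
x_noisy, eps = add_noise(x0, t=500, alphas_cumprod=alphas_cumprod)
```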

What is a Diffusion Transformer?

Diffusion Transformer, also written as DiT, is a class of diffusion models based on the transformer architecture. DiT was developed in 2023 by William Peebles, then at UC Berkeley and now a research scientist at OpenAI, and Saining Xie at New York University. It aims to improve the performance of diffusion models by replacing the commonly used U-Net backbone (an architecture employed in diffusion models for iterative image denoising) with a transformer.
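As a rough illustration, the sketch below shows what a single transformer block operating on latent patch tokens might look like, with a conditioning vector modulating the normalisation layers in the spirit of the adaLN design described in the DiT paper. The dimensions and the block itself are simplified assumptions for illustration, not Sora’s actual architecture.

```python
# A simplified, hypothetical DiT-style block: a transformer block over patch
# tokens whose normalisation is scaled/shifted by a conditioning vector.
import torch
import torch.nn as nn

class DiTBlock(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        # The conditioning vector (timestep + prompt) produces per-block scales, shifts and gates.
        self.to_modulation = nn.Linear(dim, 6 * dim)

    def forward(self, tokens, cond):
        # tokens: (batch, num_patches, dim) latent patch tokens; cond: (batch, dim)
        shift1, scale1, gate1, shift2, scale2, gate2 = self.to_modulation(cond).chunk(6, dim=-1)
        h = self.norm1(tokens) * (1 + scale1.unsqueeze(1)) + shift1.unsqueeze(1)
        attn_out, _ = self.attn(h, h, h)
        tokens = tokens + gate1.unsqueeze(1) * attn_out
        h = self.norm2(tokens) * (1 + scale2.unsqueeze(1)) + shift2.unsqueeze(1)
        tokens = tokens + gate2.unsqueeze(1) * self.mlp(h)
        return tokens
```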


Let’s simplify this: imagine you have a big jigsaw puzzle to solve, but you don’t know what the whole picture looks like. So you figure it out one piece at a time. DiT is a special way of solving this puzzle. Usually, a U-Net is used to solve it, but DiT uses a transformer instead. You can think of U-Net as one way of organising and understanding the puzzle pieces; it just may not be the best tool every time. In simple words, DiT is a new and improved tool for solving big puzzles, such as understanding complicated pictures or data.

When it comes to Sora, DiT combines the concept of diffusion for predicting videos with the strength of transformers for next-level scaling. This can be broken down into two questions: what happens inside Sora after you give it a prompt, and how does it employ the diffusion transformer?

How does it all translate into videos?

Based on a LinkedIn post by Professor Tom Yeh of the University of Colorado Boulder, here we attempt to simplify the journey from prompt to video. Imagine you enter the prompt ‘Sora is sky’. Sora splits a related video (from its dataset) into small parts called patches, similar to breaking it into smaller puzzle pieces. Each patch is then compressed into a simpler, lower-dimensional version, much like a summary, which helps the model understand the video better.
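This “breaking into puzzle pieces” step can be sketched as follows: a compressed (latent) video tensor is cut into fixed-size patches and each patch is flattened into a token the transformer can work on. The tensor shapes and patch size below are invented for the example, not Sora’s real values.

```python
# Illustrative patchifying: cut a latent video into patches and flatten each into a token.
import torch

def patchify(latent, patch=2):
    """latent: (frames, channels, height, width) -> (num_patches, patch_dim) tokens."""
    f, c, h, w = latent.shape
    # Split each frame into non-overlapping patch x patch squares.
    x = latent.reshape(f, c, h // patch, patch, w // patch, patch)
    x = x.permute(0, 2, 4, 1, 3, 5)                      # (f, h/p, w/p, c, p, p)
    tokens = x.reshape(f * (h // patch) * (w // patch), c * patch * patch)
    return tokens

latent = torch.randn(16, 4, 32, 32)       # a made-up 16-frame compressed video
tokens = patchify(latent)                 # -> (4096, 16) patch tokens
```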


In the next step, some random elements (noise) are added to the summarised parts. Then comes the conditioning stage, where the prompt ‘Sora is sky’ is turned into numbers (embeddings) and mixed with the noisy patches. This is what lets the model adjust the video based on the prompt. In the next stage, the model uses an attention function to focus on different parts of the video and figure out what is important.
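A hedged sketch of this conditioning step: the diffusion timestep is turned into a sinusoidal embedding and combined with a numerical embedding of the prompt into a single conditioning vector, which is what steers the transformer blocks. The text embedding here is a random stand-in; a real system would use a learned text encoder.

```python
# Building a conditioning vector from a timestep embedding and a (placeholder) text embedding.
import math
import torch

def timestep_embedding(t, dim=512):
    """Standard sinusoidal embedding of the diffusion step t."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    angles = t * freqs
    return torch.cat([torch.sin(angles), torch.cos(angles)])

# Stand-in for the encoded prompt "Sora is sky"; in practice a learned text encoder produces this.
text_embedding = torch.randn(512)
cond = timestep_embedding(torch.tensor(500.0)) + text_embedding   # combined conditioning vector
```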

Later, in the attention pooling stage, the model focuses on the important parts of the video, guided by the prompt and the added noise. Using all this information, it tries to guess what the noise looks like in different parts of the video. The model then gathers the key details, combines them, and makes predictions about what should come next. When a guess isn’t perfect, Sora learns from its mistakes and tries to do better. Finally, in the last stage, Sora reveals the finished video without all the extra noise, leaving it smooth and clear.
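The “guessing the noise and learning from mistakes” part corresponds, in standard diffusion training, to predicting the added noise and minimising the squared error against it. The snippet below is a generic sketch of that objective, with `model` as a placeholder network, not Sora’s training code; at generation time the same prediction is what lets the noise be stripped away step by step.

```python
# Generic diffusion training step: corrupt the clean tokens, ask the model to
# predict the noise, and penalise the squared error of that guess.
import torch
import torch.nn.functional as F

def training_step(model, clean_tokens, cond, alphas_cumprod, t):
    noise = torch.randn_like(clean_tokens)
    a_bar = alphas_cumprod[t]
    noisy = (a_bar ** 0.5) * clean_tokens + ((1 - a_bar) ** 0.5) * noise
    pred_noise = model(noisy, cond)              # the network's guess at the added noise
    loss = F.mse_loss(pred_noise, noise)         # "learning from its mistakes"
    return loss
```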

In simple words, DiT helps Sora understand text prompts and make cool videos by breaking them down into smaller parts, adding a bit of randomness, and then cleaning things up based on the text.

Advantages of DiT

DiT deploys transformers in a latent diffusion process, in which noise is gradually transformed into the target image by reversing the diffusion process under the guidance of a transformer network. The concept of diffusion timesteps is a key aspect of DiT. To simplify: DiT makes pictures by using transformers to change a noisy picture, bit by bit, into the one you want. Think of it as cleaning up a blurry image step by step. The diffusion timesteps act like checkpoints: at each checkpoint, DiT looks at what the picture currently looks like and decides how to make it better. In simple words, it is like the different stages of cooking, where you add different spices at different times.
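These checkpoints can be pictured as a sampling loop: starting from pure noise, the model is applied once per timestep, and each application strips away a little more noise. Below is a generic DDPM-style loop for illustration; `model` and its signature are placeholders, and Sora’s actual sampler is not public.

```python
# Illustrative reverse-diffusion loop: walk the timesteps backwards from pure noise,
# removing the predicted noise a little at a time.
import torch

@torch.no_grad()
def sample(model, cond, shape, betas):
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                                 # start from pure noise
    for t in reversed(range(len(betas))):                  # visit the checkpoints in reverse
        pred_noise = model(x, cond, t)                     # placeholder network call
        a, a_bar = alphas[t], alphas_cumprod[t]
        # Standard DDPM mean update: subtract the predicted noise, rescale.
        x = (x - (1 - a) / (1 - a_bar).sqrt() * pred_noise) / a.sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn(shape)   # re-inject a little noise
    return x
```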

When it comes to scalability, DiT can handle larger inputs without sacrificing performance, which requires efficient resource usage while maintaining sample quality. In natural language tasks, for example, input size can vary widely, and a scalable DiT should handle that variation without performance loss. As data volumes grow, DiT’s ability to scale will be key.
