Development of Text-to-Video in the Last 3 Years
In recent years we have witnessed the emergence of commercial text-to-video models and products. I would like to share a timeline diagram I created that captures the evolution of commercial text-to-video models and products over the last three years (2022, 2023, and 2024 to date).
I created the diagram while preparing a presentation on Sora for my team. It was exciting to see how such products emerged alongside developments in Computer Vision (CV) research, including but not limited to Generative Adversarial Networks (GANs), the transformer architecture, and diffusion models.
As suggested by the Microsoft Research paper “Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models”, we see Sora as a leap because it is not just a tool but also potentially a “world simulator” that simulates the physical and contextual dynamics of the scenes it depicts.
This evolution, of course, will not stop, and I am sure more exciting news is on the way. As a witness, I am keen to keep this diagram updated.
I would love to hear your thoughts on this evolution and where you see text-to-video technology heading next. Let’s discuss the impacts, the potential applications, and the ethical considerations that come with these advancements.
Diagram Share: The Evolution of Commercial Text-to-Video was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.