Behold my first text-to-video masterpiece from July 2023. I made it locally on my PC with ModelScope and Stable Diffusion. Keep in mind that this was only around nine months ago.

Compare this to the new AI-generated videos (February 2024):

Yes - that’s all AI-generated video, created by OpenAI’s new Sora model. (I may have also added a soundtrack created with Suno AI.)

OpenAI is continuing to tease us all with Sora, “an AI model that can create realistic and imaginative scenes from text instructions”. It is text-to-video, and it is currently the most uncannily high-quality text-to-video capability available. But it isn’t the first, as my Canada Day video from July 2023 shows. Even before then, in 2022, people were using Stable Diffusion to produce stunning animations.

But here lies the crux of the matter: the leap in AI-generated video quality from 2023 to 2024 is staggering, to say the least. Even the crème de la crème of generative AI video, RunwayML’s Gen-2, simply doesn’t compare.

While access to Sora remains limited to a select group of testers, we can anticipate a surge in AI-generated video content by mid-2024 that surpasses current expectations. And by the end of 2024, expect the emergence of open-source equivalents and a burgeoning community dedicated to enhancing AI animation workflows with tools like Automatic1111 and ComfyUI. We’re likely to see even more sophisticated versions of Stable Video Diffusion from Stability AI, along with improvements to Pika, perhaps a new Gen-3 model, and entries from other players, including Meta, Google, and Alibaba.

The trajectory of these advancements promises continuous improvement through new models, algorithms, workflows, and more affordable, superior computing power. To me, envisioning the future of this tech is a little like trying to visualize the 4th dimension.

AI-generated video opens the door to a plethora of innovative ideas and creations. Yet, the potential downsides are substantial, reminiscent of several “Black Mirror” episodes. Deepfakes and other malicious uses could become easily accessible to anyone with nefarious intentions.

This reality is a significant source of concern for me, especially considering the world my children will grow up in, where bullying could escalate to unprecedented levels and disturbing AI-generated content is just a social media scroll away.

Yes, governments are crafting legislation and mechanisms to protect people from this inevitable abuse of AI video, but governments just don’t move fast enough. It will be years before the system finds ways to mitigate the dangers of these technologies, and by then, at the rate generative AI is moving, it will be far, far too late. Even if governments do try to put laws or regulations around AI-generated media, they will be forced to balance them against the fact that they can’t add friction to innovation without being out-innovated by other nations. Moloch in full effect. The situation is further complicated by the dynamism and resilience of the open-source AI community, which continues to advance at an accelerating pace.

So here we are - on the brink of this transformative era. It is crucial to foster a dialogue that balances innovation with ethical considerations and the well-being of society. The collective genius that propels AI technology forward must also be harnessed to anticipate and mitigate its potential harms.

And yet, I am so excited for what is around the corner. From this vantage point, rightly or wrongly, the possibilities seem absolutely limitless.