Hacker News

Powered by HN Search API

Sora: Creating video from text

From https://openai.com/sora
davidbarker | 2024-02-15 | 3647

Comments:

cod1r

2024-02-15
OpenAI is definitely cooking

htrp

2024-02-15
> All videos on this page were generated directly by Sora without modification.

I hope there is at least some cherrypicking here. This also seems like some shots fired at some of the other gen video startups

senthilnayagam

2024-02-15
Samples look amazing. Looking forward to getting access, and I hope they price it competitively

zemo

2024-02-15
> Prompt: Historical footage of California during the gold rush.

this is the opposite of history

hownowbrowncow

2024-02-15
Amazing.

One wonders how you might gain a representation of physics learned in the model. Perhaps multimodal inputs with rendered objects; physics simulations?

zmk5

2024-02-15
These samples look pretty amazing. I'm curious about the compute required to train and even deploy something like this. How would it scale to making something like a CGI Pixar movie?

m3kw9

2024-02-15
Pretty sure Plus-tier users won't be getting this for free; too much processing power needed

anon291

2024-02-15
Wow!

dist-epoch

2024-02-15
Totally a coincidence that it's announced immediately after the new Gemini reveal.

nuz

2024-02-15
AGI at the quality of Sora or DALL-E, but for intelligence, is gonna be quite the thing to witness

treesciencebot

2024-02-15
This is leaps and bounds beyond anything out there, including both public models like SVD 1.1 and Pika Labs' / Runway's models. Incredible.

cuuupid

2024-02-15
Not loving that there are more details on safety than details of the actual model, benchmarks, or capabilities.

> That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.

"We believe safety relies on real-world use and that's why we will not be allowing real-world use until we have figured out safety."

imbusy111

2024-02-15
I had a good laugh looking at the sliding and twisting legs in the "Girl walking in City" video.

minimaxir

2024-02-15
I do wonder why OpenAI chose the name "Sora" for this model. AI is now going to have intersectionality with Kingdom Hearts. (At least you don't need a PhD to understand AI.)

nerdjon

2024-02-15
It is honestly quite concerning just how good these videos look.

Like you can see some weird artifacts, but take one of these videos, compress it down to a much lower quality and with the loss of quality you might not be able to tell the difference based on these examples. Any artifacts would likely be gone.

Given what I had seen on social media I had figured anything remotely real was a few years away, but I guess not...

I guess we have just stopped worrying about the impact of these tools?

agomez314

2024-02-15
Imagine someone combining this with the Apple Vision Pro...many people will simply opt out of reality and live in a digital world. Not that this is new, but it'll entice a lot more people than ever before.

supriyo-biswas

2024-02-15
I wonder what served as the dataset for the model. Videos on YouTube presumably, since messing around with the film industry would be too expensive?

chasing

2024-02-15
Yeah, you just can't let all media, all the cost and hard work of millions of photographers, animators, filmmakers, etc., be completely consumed and devalued by one company just because it's a very cool technical trick. The more powerful these services become, the more obvious that will be.

What OpenAI does is amazing, but they obviously cannot be allowed to capture the value of every piece of media ever created — it'll both tank the economy and basically halt all new creation if everything you create will be immediately financially weaponized against you, if everything you create goes immediately into the Machine that can spit out a billion variations, flood the market, and give you nothing in return.

It's the same complaint people have had with Google Search pushed to its logical conclusion: anything you create will be anonymized and absorbed. You put in the effort and money, OpenAI gets the reward.

Again, I like OpenAI overall. But everyone's got to be brought to the table on this somehow. I wish our government would be capable of giving realistic guidance and regulation on this.

dietmtnview

2024-02-15
oh man, we're going to be in The Running Man really quick.

hansonkd

2024-02-15
Countdown to studios licensing this for "unlimited" episodes of your favorite series.

There was the Seinfeld "Nothing, Forever" AI parody, but once the models improve enough and are cheap enough to deploy, studios will license their content for real and just run endless seasons.

Or even custom episodes. Imagine if every episode of a TV show was unique to the viewer.

Janicc

2024-02-15
I honestly expected video generation to get stuck at barely consistent 5 second clips without much movement for the next few years. This is the type of stuff I expected to maybe be possible towards the end of the decade. Maybe we really are still at the bottom of the S curve which is scary to think about.

EwanG

2024-02-15
I have a book I've written (first three parts available free at https://www.amazon.com/Summer-of-Wonders/dp/B0CV84D7GR). Is there some way to feed this to the tool and get an animated version out? Or this with some other tool(s)?

pknerd

2024-02-15
So no APIs yet?

ilaksh

2024-02-15
Holy %@$%! Abso%@#inglutely amazing! Also, now I see why we need $7 trillion worth of GPUs.

uoaei

2024-02-15
Visual sharpness at the expense of wider-scale coherence (see: sliding/floating walking woman in Tokyo demo or tiny people next to giant people in Lagos demo) seems to be a local optimum consistently achieved by today's SOTA models in all domains.

This is neat and all but mostly just a toy. Everything I've seen has me convinced either we are optimizing the wrong loss functions or the architectures we have today are fundamentally limited. This should be understood for what it is and not for what people want it to be.

epberry

2024-02-15
These look fantastic. Very slight weirdness in some movement, hands, etc. But the main thing that strikes me is the cinematic tracking shots. I guess that's why they use "scenes". It doesn't seem like a movie involving actors talking could be generated with this.

mring33621

2024-02-15
I wanna see the rest of the knit hat spaceman movie!

Imnimo

2024-02-15
https://openai.com/sora?video=big-sur

In this video, there's extremely consistent geometry as the camera moves, but the texture of the trees/shrubs on the top of the cliff on the left seems to remain very flat, reminiscent of low-poly geometry in games.

I wonder if this is an artifact of the way videos are generated. Is the model separating scene geometry from camera? Maybe some sort of video-NeRF or Gaussian Splatting under the hood?

sidcool

2024-02-15
Even the videos with some physics anomalies are quite good and entertaining.

bluechair

2024-02-15
The signs are nonsensical, but this is probably expected.

gzer0

2024-02-15
Truly stunning. Waiting on the research paper, which they say will be published soon. Can't wait to read the technical details.

Delumine

2024-02-15
This is insane. Even though there are open-source models, I think this is too dangerous to release to the public. If someone had uploaded that Tokyo video to YouTube and told me it was drone footage, I would've believed them.

All "proof" we have can be contested or fabricated.

reducesuffering

2024-02-15
Apple Vision Pro VR + unlimited, addicting... I mean, engaging video feed into your eyes. The machines will keep you tube fed and your bowels emptied. Woe to the early 21st century techno-optimism. An alien intelligence rules the galaxy now. Welcome to the simulation.

jonplackett

2024-02-15
No mention of how much they had to cherry pick right?

Interested to know the success rate behind such amazingness.

Pika has really impressive videos on their homepage that are borderline impossible for me to recreate myself.

nuz

2024-02-15
This is the second time OpenAI has released something right at the same time as google did (Gemini 1.5 Pro with 10M token context length just now). Can't just be a coincidence

fardinahsan146

2024-02-15
This is insane.

sabzetro

2024-02-15
Can't wait until we can generate feature length films with a prompt.

rambambram

2024-02-15
I like how the dalmatian puppy moves like a cat.

sebnun

2024-02-15
This is amazing. My first thought was about the potential for abuse. Deepfakes will be more realistic than ever.

Also, nicely timed to overshadow the Google Gemini 1.5 announcement.

aantix

2024-02-15
This is the killer feature.

“Sora can also create multiple shots within a single generated video that accurately persist characters and visual style.”

To create a movie I need character visual consistency across scenes.

Getting that right is the hardest part of all the existing text->video tools out there.

dom96

2024-02-15
This is going to make the latest election really interesting (and scary). Is anyone working to ensure a faked video of Biden that looks plausible but is AI generated doesn't get significant traction at a critical moment of the election?

drcwpl

2024-02-15
Wow - "All videos on this page were generated directly by Sora without modification."

The prompts - incredible and such quality - amazing. "Prompt: An extreme close-up of an gray-haired man with a beard in his 60s, he is deep in thought pondering the history of the universe as he sits at a cafe in Paris, his eyes focus on people offscreen as they walk as he sits mostly motionless, he is dressed in a wool coat suit coat with a button-down shirt , he wears a brown beret and glasses and has a very professorial appearance, and the end he offers a subtle closed-mouth smile as if he found the answer to the mystery of life, the lighting is very cinematic with the golden light and the Parisian streets and city in the background, depth of field, cinematic 35mm film."

bogwog

2024-02-15
My AI idea: Civil war as a service (CWaaS)

Prompt: poll worker sneakily taking ballots labeled <INSERT POLITICAL PARTY HERE>, and throwing them in the trash.

wouldbecouldbe

2024-02-15
It looks beautiful; however, I thought OpenAI's mission was creating AGI, not becoming a generative AI content supplier.

thepasswordis

2024-02-15
OpenAI demonstrating the size of their moat. How many multi-million-dollar funded startups did this just absolutely obsolete? This is so, so, so much better than every other generative video AI we've seen. Most of those were basically a still image with a very slowly moving background. This is not that.

Sam is probably going to get his $7T if he keeps this up, and when he does everybody else will be locked out forever.

I already know people who have basically opted out of life. They're addicted to porn, addicted to podcasts where it's just dudes chatting as if they're all hanging out together, and addicted to instagram influencers.

100% they would pay a lot of money to be able to hang out with Joe Rogan, or some OnlyFans person, and those pornstars or podcast hosts will never disagree with them, never get mad at them, never get bored of them, never think they're a loser, etc.

These videos are crazy. Highly suggest anybody who was playing with Dall-E a couple of years ago, and being mindblown by "an astronaut riding a horse in space" or whatever go back and look at the images they were creating then, and compare that to this.

IceHegel

2024-02-15
Those samples are incredibly impressive. It blows RunwayML out of the water.

As a layman watching the space, I didn't expect this level of quality for two or three more years. Pretty blown away, the puppies in the snow were really impressive.

drcongo

2024-02-15
This is actually mind-blowing.

smusamashah

2024-02-15
Did anyone else feel motion sickness or nausea watching some of these videos? In some of the videos with panning or rotating motion, I felt a nausea-like effect. I guess it's because some details were changing while in motion and I was unable to keep track of or focus on anything in particular.

Effect was stronger in some videos.

DeathArrow

2024-02-15
Goodbye, Hollywood!

idiliv

2024-02-15
People here seem mostly impressed by the high resolution of these examples.

Based on my experience doing research on Stable Diffusion, scaling up the resolution is the conceptually easy part that only requires larger models and more high-resolution training data.

The hard part is semantic alignment with the prompt. Attempts to scale Stable Diffusion, like SDXL, have resulted only in marginally better prompt understanding (likely due to the continued reliance on CLIP prompt embeddings).

So, the key question here is how well Sora does prompt alignment.

swayvil

2024-02-15
It really makes me wonder if something like this is running inside my head.

The prompt tho. Probably not text. Probably a stream of vibes or something.

ummonk

2024-02-15
Looked at the first clip and immediately noticed the woman's feet swap at ~15 seconds in. My eyes were drawn to the feet because of the extreme supination in her steps.

Looks like a dramatic improvement in video generation but still a miss in terms of realism unless one can apply pose control to the generated videos.

tzm

2024-02-15
"so far ahead" "leaps and bounds beyond anything out there" "This is insane"

Let's temper the emotions for a second. Sora is great, but it's not ready for prime time. Many people are working on this problem who haven't shared their results yet. The speed of refinement is what's most interesting to me.

kevingadd

2024-02-15
It's interesting how a lot of the higher-frequency detail is obviously quantized. The motion of humans in the drone shots, for example, is very 'low frequency' or 'low framerate', and things like flowing ocean water also appear to be quantized. I assume this is because the internal precision of these models isn't very high?

ugh123

2024-02-15
Imagine a movie script, but with more detail of the scenes and actors, plugged into this.

The killer app for this is being able to give a prompt of a detailed description of a scene, with actor movements and all detail of environment, structure, furniture, etc. Add to that camera views/angles/movement specified in the prompt along with text for actors.

jk_tech

2024-02-15
This is bananas. This is ahead of anything else I've seen. The entire stock footage industry may be shut down overnight because of something like this.

And it is still not perfect. The example of the plastic chair being dug up in the desert[1] is frankly a bit... funky. But imagine in 5 or even 10 years.

1. https://openai.com/sora?video=chair-archaeology

m3kw9

2024-02-15
Impressive, actually. I can see UI being generated in real time one day now.

You give it data like real time stock data, feed it into Sora, the prompt is "I need a chart based on the data, show me different time ranges"

As you move the cursor, it feeds into sora again, generating the next frame in real time.

m3kw9

2024-02-15
How many of the video startups are shtting their pants right now?

break_the_bank

2024-02-15
In less than a few hours Gemini 1.5 is old news. Sam is doing live demos on Twitter while Google just released a blog.

Didn't think Google would be the first of Facebook, Apple, Google, and Microsoft to get disrupted.

VladimirGolovin

2024-02-15
I did not expect this level of quality at the beginning of 2024. Makes me think that we may see AGI by the end of this decade.

tropdrop

2024-02-15
I see many possibilities for commercials, demos... not to mention kids' animations, of course.

Actually, thinking of this from the perspective of a start-up, it could be cool to instantly demonstrate a use-case of a product (with just a little light editing of a phone screen in post). We spent a lot of money on our product demo videos and now this would basically be free.

AbuAssar

2024-02-15
Sora means picture or image in Arabic.

ilteris

2024-02-15
Where is the tool that we can try?

OscarTheGrinch

2024-02-15
How is this done technically? So many moving parts and the tracking on each is exquisite.

My initial observation is that the camera moves are very similar to a camera in a 3D modeling program: an inhuman dolly flying through space on an impossibly smooth path / bezier curve. Makes me wonder if there is actually something like a 3D simulation at the root here, or maybe a 3D unsupervised training loop, and they are somehow mapping persistent AI textures onto it?

peterisza

2024-02-15
holy ....

doakes

2024-02-15
This is super cool. So many innovations come to mind. But it makes me wonder what will come from having the ability to virtually experience anything we want. It'll take a while, but I'm hoping we'll eventually want to go outside more instead of less.

thomastraum

2024-02-15
I am a CG artist and director, and this made me so sad. I am watching in horror and amazement. I am not anti-AI at all, but being on the wrong side of efficiency is heartbreaking for the individual. It's so much fun to make CG and create shots, and the fact that it's hard (just like anything) is what makes it rewarding.

gigatexal

2024-02-15
I am genuinely impressed.

crazygringo

2024-02-15
This is insane. But I'm impressed most of all by the quality of motion. I've quite simply never seen convincing computer-generated motion before. Just look at the way the woolly mammoths connect with the ground, and how their lumbering mass feels real.

Motion-capture works fine because that's real motion, but every time people try to animate humans and animals, even in big-budget CGI movies, it's always ultimately obviously fake. There are so many subtle things that happen in terms of acceleration and deceleration of all of the different parts of an organism, that no animator ever gets it 100% right. No animation algorithm gets it to a point where it's believable, just where it's "less bad".

But these videos seem to be getting it entirely believable for both people and animals. Which is wild.

And then of course, not to mention that these are entirely believable 3D spaces, with seemingly full object permanence. As opposed to other efforts I've seen which are basically briefly animating a 2D scene to make it seem vaguely 3D.

golol

2024-02-15
This does put a smile on my face

beders

2024-02-15
Finally, a true Star Wars prequel is in reach. Everybody gets their own :)

rafaelero

2024-02-15
holy shit

kweingar

2024-02-15
Obviously incredibly cool, but it seems that people are incredibly overstating the applications of this.

Realistically, how do you fit this into a movie, a TV show, or a game? You write a text prompt, get a scene, and then everything is gone—the characters, props, rooms, buildings, environments, etc. won’t carry over to the next prompt.

void-pointer

2024-02-15
This is the beginning of the end, folks

SushiHippie

2024-02-15
I find the watermark at the bottom right really interesting. At first it looks like random movement, and then at the end it transforms into the OpenAI logo.

M4v3R

2024-02-15
> The model can also take an existing video and extend it or fill in missing frames

I wonder if it could be used as a replacement for optical flow to create slow motion videos out of normal speed ones.
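The flow-based interpolation M4v3R mentions can be sketched roughly as: estimate a dense motion field between two frames, then warp pixels partway along their motion vectors to synthesize an in-between frame. A toy NumPy sketch of the forward-warping step, assuming the flow field is already given (real pipelines estimate it, e.g. with Farnebäck or a learned method, and use backward warping with occlusion handling):

```python
import numpy as np

def warp_midframe(frame: np.ndarray, flow: np.ndarray, t: float = 0.5) -> np.ndarray:
    """Forward-warp each pixel a fraction t along its motion vector.

    frame: (H, W) grayscale image; flow: (H, W, 2) per-pixel (dx, dy).
    Naive sketch: collisions and holes are left unhandled here, which is
    exactly what production interpolators spend most of their effort on.
    """
    h, w = frame.shape
    out = np.zeros_like(frame)
    ys, xs = np.mgrid[0:h, 0:w]
    new_x = np.clip(np.rint(xs + t * flow[..., 0]).astype(int), 0, w - 1)
    new_y = np.clip(np.rint(ys + t * flow[..., 1]).astype(int), 0, h - 1)
    out[new_y, new_x] = frame  # scatter pixels to their mid-motion positions
    return out
```

Inserting one such frame between each pair turns 30 fps into 60 fps; the interesting difference with a generative model is that it could hallucinate genuinely new content in the gap rather than merely rearranging existing pixels.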

mushufasa

2024-02-15
Do you think they announced this today to steal attention from the Google Gemini announcement?

sorokod

2024-02-15
Just in time for the election season. Also "A cat waking up its sleeping owner demanding breakfast" has too many paws - yes I do feel petty saying this.

birriel

2024-02-15
With the third and last videos (space men, and man reading in the clouds), this is the first time I have found the resolution indistinguishable from real life. Even with SOTA stills from Midjourney and Stable Diffusion I was not entirely convinced. This is incredible.

corobo

2024-02-15
Oooh this is gonna usher in a new wave of GPT wrappers!

If anyone's taking requests, could you do one that takes audio clips from podcasts and turns them into animations? Ideally via API rather than some PITA UI

Being able to keep the animation style between generations would be the key feature for that kind of use-case I imagine.

sys32768

2024-02-15
Game of Thrones Season 8 will be great in a few years.

gondo

2024-02-15
This might be amazing progress, but I would never know as the website is consistently crashing Safari on my iPhone 13.

ulnarkressty

2024-02-15
To put it into perspective, the Will Smith eating spaghetti video came out not even a year ago --

https://www.youtube.com/watch?v=XQr4Xklqzw8

khazhoux

2024-02-15
The focus here is on video motion, but I'm very impressed by the photorealistic humans.

kuprel

2024-02-15
Next they have to add audio, then VideoChatGPT is possible

torginus

2024-02-15
I'm not sure about others, but I'm extremely unnerved about how OpenAI just throws these innovations out with zero foreshadowing - it's crazy how the world's potentially most life-changing company operates with the secrecy of a black military program.

I really wonder what's going to come out of the company and on what timeline.

throwitaway222

2024-02-15
What in the flying f just happened.

I guess we've all just been replaced.

partiallypro

2024-02-15
These are insanely good, but there are still some things that just give them away (which is good, imo). The Tokyo video is amazing, the reflections etc. are all great, but the gaits of the people in the background and how fast they are moving are clearly off. It sticks out once you notice it. These things will obviously improve as time marches on.

The fear I have has less to do with these taking jobs and more with the fact that eventually this is just going to be used by a foreign actor, and no one is going to know what is real anymore. This already happens with news stories; now imagine that with actual AI videos that are near indistinguishable from reality. It could get really bad. Have an insane conspiracy theory? Well, now you can have your belief validated by a completely fictional AI-generated video that even the most trained eyes have trouble debunking.

The jobs thing is also a concern, because if you have a bunch of idle hands that suddenly aren't sure what to believe or just believe lies, it can quickly turn into mass political violence. Don't be naive to think this isn't already being thought of by various national security services and militaries. We're already on the precipice of it, this could eventually be a good shove down the hill.

ulnarkressty

2024-02-15
I do hope that they have a documentary team embedded in this company, like DeepMind had. They're making historical advancements on multiple fronts.

tehsauce

2024-02-15
It's fascinating that it can model so much of the subtle dynamics, structure, and appearance of the world in photorealistic detail, and still have a relatively poor model of things like object permanence:

https://cdn.openai.com/sora/videos/puppy-cloning.mp4

Perhaps there are particular aspects of our world that the human mind has evolved to hyperfocus on.

Will we figure out an easy way to make these models match humans in those areas? Let's hope it takes some time.

ionwake

2024-02-15
How long until there is an open source model for.... text to video?

Genuine question I have no idea

jenny91

2024-02-15
Absolutely insane. It's very odd where the glitches happen. Did anyone else notice in the "stylish woman ... Tokyo" clip how her legs skip-hop and then cross at 0:30 in a physically impossible way? Everything else about the clip seems so realistic, yet this is where it trips up.

synapsomorphy

2024-02-15
Holy cow, I've literally only looked at the first two videos so far, and it's clear that this absolutely blows every other generative video model out of the water, barely even worth comparing. We immediately jumped from interesting toy models where it was pretty easy to tell that the output was AI generated to.. this.

nopinsight

2024-02-15
Many might miss the key paragraph at the end:

   "Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI."
This also helps explain why the model is so good: it is trained to simulate the real world, as opposed to imitating pixels.

More importantly, its capabilities suggest AGI and general robotics could be closer than many think (even though some key weaknesses remain and further improvements are necessary before the goal is reached.)

EDIT: I just saw this relevant comment by an expert at Nvidia:

   “If you think OpenAI Sora is a creative toy like DALLE, ... think again. Sora is a data-driven physics engine. It is a simulation of many worlds, real or fantastical. The simulator learns intricate rendering, "intuitive" physics, long-horizon reasoning, and semantic grounding, all by some denoising and gradient maths.

   I won't be surprised if Sora is trained on lots of synthetic data using Unreal Engine 5. It has to be!

   Let's breakdown the following video. Prompt: "Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee." ….”
https://twitter.com/DrJimFan/status/1758210245799920123

torginus

2024-02-15
I wonder why the input is always text - can't it be text, as well as a low quality blender scene with a camera rig flying through space, a moodboard, sketches of the characters etc.?

cboswel1

2024-02-15
Who owns a person’s likeness? Now that we’re approaching text to video of a quality that could fool an average person, won’t this just open a whole new can of worms if the training models are replicating celebrities? The ambiguity around copyright when something on paper is in the style of seems to fall into an entirely separate category than making AI generated videos of actual people without their consent. Will people of note have to get a copyright of their likeness to fight its use in these models?

danjoredd

2024-02-15
Porn is about to get so much weirder

ed_balls

2024-02-15
How to invest in OpenAI?

bilsbie

2024-02-15
Could this same technology be used to make games? It seems like it has a built in physics engine.

lairv

2024-02-15
The 3D consistency of those videos is insane compared to what has previously been done; they must have used some form of 3D regularization with depth or flow, I think.

xyproto

2024-02-15
The big question is if it will be able to create a video of whisky without ice or a car without windows.

lacoolj

2024-02-15
Total coincidence this comes out the day Google announces Gemini 1.5 I'm sure

throwitaway222

2024-02-15
https://openai.com/sora?video=cat-on-bed

Even though many things are super impressive, there is a lot of uncanny valley happening here.

vilius

2024-02-15
The Lagos video (https://openai.com/sora?video=lagos) is very much how my dreams unfold. One moment, I'm with my friends in a bustling marketplace, then suddenly we are no longer at the marketplace, but rather overlooking a sunset and a highway. I wonder if there are conceptual similarities between how dreams and AI video models work.

ericzawo

2024-02-15
Why can't AI take the non-fun jobs?

dsign

2024-02-15
This is impressive and amazing. I can already see a press release not too far down the road: "Our new model HoSapiens can do everything humans can do, but better. It has been specifically designed to deprecate humanity. We are working with red teamers — domain experts in areas like union busting, corporate law, and counterinsurgency, plus our habitual bias, misinformation, and hateful-content-against-AI orange team — who will be adversarially testing the model."

hooande

2024-02-15
This really seems like "DALL-E", but for videos. I can make cool/funny videos for my friends, but after a while the novelty wears off.

All of the AI generated media has this quality where I can immediately tell that it's ai, and that becomes my dominant thought. I see these things on social media and think "oh, another ai pic" and keep scrolling. I've yet to be confused about whether something is ai generated or real for more than several seconds.

Consistency and continuity still seem to be major issues. It would be very difficult to tell a story using Sora because details and the overall style would change from scene to scene. This is also true of the newest image models.

Many people think that Sora is the second coming, and I hope it turns out to have a major impact on all of our lives. But right now it's looking to have about the same impact that DALL-E has had so far.

bilsbie

2024-02-15
I wish this was connected to chatgpt4 such that it could directly generate videos as part of its response.

The bottleneck of creating a separate prompt is very limiting.

Imagine asking for a recipe or car repair and it makes a video of the exact steps. Or if you could upload a video and ask it to make a new ending.

That’s what I imagine multi modal models would be.

max_

2024-02-15
This is amazing!

1. Why would Andrej Karpathy leave when he knows such an impressive breakthrough is in the pipeline?

2. Why hasn't Ilya Sutskever spoken about this?

dwighttk

2024-02-15
What do y’all think caused the weird smoke/cloud in the mammoth video?

throwitaway222

2024-02-15
How many of you think YT is looking through their logs trying to find a high burn rate of videos that might possibly be from Open AI?

d--b

2024-02-15
Jesus Christ.

AGI can’t be far off; that stuff clearly understands a bunch of high-level concepts.

rareitem

2024-02-15
I used to think, a few years ago, that virtual reality/AI projects such as the metaverse wouldn't amount to anything big. I even thought them ridiculous. Even recently, I thought GPTs and AI-generated images would be the pinnacle of what this new AI wave would amount to. I just keep getting baffled.

Jeve11326gr6ed

2024-02-15
How can I get started?

helix278

2024-02-15
They're attaching metadata to the videos, which can be easily removed. Aren't there techniques to hash metadata into the content itself? I.e., such that removing the data would alter the image.
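The technique being gestured at here is steganographic watermarking: encode the provenance bytes in the pixel data itself, so stripping them means altering the image. A minimal, deliberately fragile least-significant-bit sketch; a real deployment would need something that survives re-encoding, e.g. a spread-spectrum watermark or C2PA-style signed provenance, neither of which is shown here:

```python
def embed(pixels: bytes, payload: bytes) -> bytearray:
    """Hide payload in the least significant bit of successive pixel bytes."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(8)]
    assert len(bits) <= len(pixels), "image too small for payload"
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # overwrite only the lowest bit
    return out

def extract(pixels: bytes, n_bytes: int) -> bytes:
    """Recover n_bytes of payload from the pixel LSBs."""
    return bytes(
        sum((pixels[b * 8 + i] & 1) << i for i in range(8))
        for b in range(n_bytes)
    )
```

Each pixel byte changes by at most 1, so the mark is invisible, but any lossy re-encode (which every video platform performs on upload) destroys it; that fragility is exactly why this is a sketch of the idea rather than a robust scheme.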

ericra

2024-02-15
It's been said a thousand times, but the "open" in openai becomes more comical every day. I can't imagine how much money they will generate from such a tool, and I'm sure they will do everything possible to keep a tight lid on all the implementation details.

This product looks incredible...

chrishare

2024-02-15
I am very uncomfortable with this being released commercially without the requisite defence against misuse being also accessible. If we didn't have a problem with deepfakes, spam, misleading media before, we surely are now. All leading AI organisations are lacking here, benefiting from the tech but not sufficiently attacking the external costs that society will pay.

guybedo

2024-02-15
Looks like OpenAI managed to bury the Gemini 1.5 news.

I guess it was anticipated.

taejavu

2024-02-15
Do we know anything yet about the maximum resolution of the output, or how long it takes to generate these kind of examples?

karpour

2024-02-15
Not a single line saying anything about training data.

hubraumhugo

2024-02-15
The amount of VC money in the text-to-video space that just got wiped out is impressive. Have we ever seen such fast market moves?

Pika - $55M

Synthesia - $156M

Stability AI - $173M

alex201

2024-02-15
It's a revolutionary thing, but I'll reserve my judgment until I see if it can handle the real challenge: creating a video where my code works perfectly on the first try.

cush

2024-02-15
Does OpenAI hang out with these kinds of features in their back pocket, just waiting for a Gemini announcement so they can wait an hour and absolutely dunk on Google?

cfr2023

2024-02-15
I want to storyboard/pre-vis/mess around with this ASAP

sebastiennight

2024-02-15
I think the implications go much further than just the image/video considerations.

This model shows a very good (albeit not perfect) understanding of the physics of objects and relationships between them. The announcement mentions this several times.

The OpenAI blog post lists "Archeologists discover a generic plastic chair in the desert, excavating and dusting it with great care." as one of the "failed" cases. But this (and "Reflections in the window of a train traveling through the Tokyo suburbs.") seem to me to be 2 of the most important examples.

- In the Tokyo one, the model is smart enough to figure out that on a train, the reflection would be of a passenger, and that the passenger would have Asian traits since this is Tokyo.

- In the chair one, OpenAI says the model failed to model the physics of the object (which hints that it did try to, which is not how the early diffusion models worked; they just tried to generate "plausible" images). And we can see one of the archeologists basically chasing the chair down to grab it, which does correctly model the interaction with a floating object.

I think we shouldn't underestimate how crucial that is to building a general model with a strong model of the world. Not just a "theory of mind", but a literal understanding of "what will happen next", independent of "what would a human say happens next" (which is what the usual text-based models seem to do).

This is going to be much more important, IMO, than the video aspect.

superconduct123

2024-02-15
So do we think this is the "breakthrough" that was mentioned back when the Sam Altman stuff was going on?

internetter

2024-02-15
The watermark is interesting. Looks like it's unique for every video so they can trace it to the creator?

countmora

2024-02-15
> We’re also building tools to help detect misleading content such as a detection classifier that can tell when a video was generated by Sora.

I am curious how optimised their approach is and what hardware you would need to analyse videos at a reasonable speed.

neutralx

2024-02-15
Has anyone else noticed the leg swap in the Tokyo video at 0:14? I guess we are past uncanny, but I do wonder if these small artifacts will always be present in generated content.

It also raises the question: if more and more children are introduced to media from a young age and are fed more and more generated content, will they still be able to feel "uncanniness", or will they become completely numb to it?

There's definitely an interesting period ahead of us; not yet sure how to feel about it...

mtlmtlmtlmtl

2024-02-15
This is all very impressive. I can't help but wonder, though: how is text-to-video going to benefit humanity? That's what OpenAI is supposedly about, right?

We'll get some groundbreaking film content out of this in the hands of a few talented creatives, and a vast ocean of mediocre content from the hands of talentless people who know how to type. What's the benefit to humanity, concretely?

Sxubas

2024-02-15
I wonder what this tech would do using a descriptive fragment from a book. I don't read many books at all but I would spend some time feeding in fantasy fragments and see how much they differ from what I imagined.

dartos

2024-02-15
God the legs of the woman walking are horrifying.

tsunamifury

2024-02-15
The film "The Congress" will end up being the most on-point prediction of our future ever. I can't believe it. I'm in shock.

bscphil

2024-02-15
Not that this isn't a leaps and bounds improvement over the state of the art, but it's interesting to look at the mistakes it makes - where do we still need improvements?

This video is pretty instructive: https://cdn.openai.com/sora/videos/amalfi-coast.mp4

It "eats" several people with the wall partway through the video, and the camera movements are odd. Strange camera movement in response to most of the prompts seems like the biggest problem. The model arbitrarily decides to change direction on a dime; even a drone wouldn't behave quite like that.

cyrialize

2024-02-15
Does anyone know how to handle the depression/doom one feels with these updates?

Yes, it's a great technical achievement, but I just worry for the future. We don't have good social safety nets, and we aren't close to UBI. It's difficult for me to see that happen unless something drastic changes.

I'm also afraid of one company just having so much power. How does anyone compete?

itissid

2024-02-15
I've to go lie down...

kashnote

2024-02-15
Absolutely unreal. Kinda funny how some people are complaining about minor glitches or motion sickness when this is the most impressive piece of technology I've seen. Way to go, OpenAI.

seydor

2024-02-15
This inside VR goggles would be amazing. It probably wouldn't even need to render 360°; it would generate it on demand. I'd better go get a feeding tube.

timetraveller26

2024-02-15
Is this real life? Or is this just generated fantasy?

s-xyz

2024-02-15
This is seriously insane, in particular as someone mentioned the quality of it. I can't wait to play around with this. SICK!

lagrange77

2024-02-15
Finally new TNG episodes!

lagrange77

2024-02-15
They should generate a video of Steve Jobs introducing this in a keynote.

stephenw310

2024-02-15
The results are mind-blowing, to say the least. But will they allow developers to fine-tune this eventually? OpenAI has yet to give that ability even for its txt2img DALL·E models, so I doubt it.

mlsu

2024-02-15
They must be using techniques from NeRF in here, maybe in tokenization? The artifacts are unmistakable.

0xcb0

2024-02-15
Wow, feels unreal. Can't believe we have come so far, yet we cannot solve the world's most basic problems and people still starve every day.

ij09j901023123

2024-02-15
We thought programmers, fast food workers, and drivers would be automated first. Turns out, it's movie/video actors, editors, and artists...

rglover

2024-02-15
I was super on board until I saw...the paw: https://player.vimeo.com/video/913131059?h=afe5567f31&badge=...

Exciting for the potential this creates, but scary for the social implications (e.g., this will make trial law nearly impossible).

cooper_ganglia

2024-02-15
It's always kinda crazy to me to see an emerging technology like this have its next iteration in the development pipeline, and even after seeing the first-gen AI video models, many of the HN people here still say, "Meh, not that impressive."

Brother, have you seen Runway Gen-2 or SVD 1.1? I'm not excited about Sora because I think it looks like Hollywood animation; I'm excited because an open-source third-gen Sora is going to be so much better, and this much progression in one step is really exciting!

darkhorse13

2024-02-15
Does anyone else feel a sense of doom from these advancements? I'm definitely not a Luddite, I've been working professionally as a programmer for quite some time now, but I just can't shake this feeling. And this is not in the "I might lose my job to this" kind of feeling, that's obviously there, but it's something deeper, more sinister. I don't think I can explain it properly.

Anyway, videos look incredible. I genuinely can't believe my eyes.

kaimac

2024-02-15
meanwhile people are dying

aubanel

2024-02-15
I love how they show the failure cases: compare that with Gemini 1.5 Pro's technical paper, which carefully avoids any test where it doesn't look like 100% performance! I think confronting your failures is a condition for success, and Google seems much too self-indulgent here.

ij09j901023123

2024-02-15
Apple vision pro + OpenAI entertainment on the fly + living in a tight pod next to millions of other people, hooked onto life support. A wonderful matrix fantasy

ein0p

2024-02-15
That actually looks borderline useful in practice. 3 years from now someone will make a decent full length movie with this.

notpachet

2024-02-15
OpenAI: Prompt: The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope...

Sora: plays GTA V

itissid

2024-02-15
How does one cope with this?

- Disruptions like this happen to every industry every now and then, just not at the level of "communicating with people through words and pictures". Anduril and SpaceX disrupted defense contractors and United Launch Alliance; someone here who worked for a defense contractor or ULA and was affected might attest to the feeling?

- There will be plenty of opportunity to innovate. Industries are being created right now. People probably felt the same way when they saw HTTP on their screens for the first time. So don't think your career or life's worth of work is minuscule; it's just a moving target. Adapt and learn.

- The devil is in the details. When a bunch of large SaaS behemoths created enterprise software, an army of contractors and consultants grew to support the glue that was ETL. A lot of work remains to be done. It will just be a more imaginative glue.

d4rkp4ttern

2024-02-15
Mind blown of course.

Two things are interesting:

- No audio -- that must have been hard to add, or else it would have been there.

- Spelling is still probably hard to do (the familiar DALL·E problem)... e.g. a video showing a car driving past a billboard with specified text.

pradn

2024-02-15
It's impressive, but I think it's still in the same category as even the best LLMs: the demos look good and they can be quite useful, but you can never quite trust them. You really can't just have an LLM write a whole report for you - who knows what facts it'll make up, what it'll miss? You really can't use this to generate video for work, who knows where the little artifacts are (it's easier to tell with video).

The future of these high-fidelity (but not perfect) generative AI systems is in realizing we're going to need "humans in the loop". This means designing to output human-manipulable data - perhaps models/skeletons/textures instead of whole output. Pixels are hard to manipulate directly!

As for entertainment, already we see people sick of CGI - will people really want to pay for AI-generated video?

redm

2024-02-15
Why are all the example videos in slow motion?

itissid

2024-02-15
So what happens to the film industry now?

- Local/bespoke high-quality video content creation by ordinary Joes: check.

- Ordinary Joes making fake porn videos for money: check.

- Reducing costs for real movies dramatically by editing in AI scenes: check.

A whole industry will get upturned.

slothtrop

2024-02-15
RE worrying about the future: what concerns me most is post-truth reality. Being thrown into a world where it's impossible to tell fact from fiction is insane and dangerous. Just thinking about it evokes paranoia.

We're nowhere near full automation; these are growing pains, but maybe the canary in the coal mine for the job market. Expect more enthusiasm for UBI or a negative income tax and the like, with policies to follow. Cheap energy is also coming eventually, just more slowly.

foobar_______

2024-02-15
Feels like another pivotal moment in AI. Feel like I’m watching history live. I think I need to go lay down.

telesilla

2024-02-15
Watching these made me think: I'm going to want to go to the theatre a lot more in the future and see fellow humans in plays, lectures, and concerts.

Such achievements in technology must lead to cultural change. Look at how popular vinyl has become; why not theatre again?

2OEH8eoCRo0

2024-02-15
Shall I get into the unemployment line now and beat the rush?

impulser_

2024-02-15
This is good, but far from being useful or production ready.

It's still too easy to notice these are all AI rendered.

qwertox

2024-02-15
The one with the grandma is outright scary. All the lies...

lorenzofalco

2024-02-15
Now everything really is screwed. Turn everything off or disconnect.

elorant

2024-02-15
This could kill the porn industry.

lxe

2024-02-15
Blown away, beyond every expectation...

lagrange77

2024-02-15
Has anyone noticed the label on the surfing otter's lifejacket? :D

accra4rx

2024-02-15
More layoffs

lqcfcjx

2024-02-15
This is very impressive. I know people are generally iffy about research benchmarks. How does evaluation work for text-to-video use cases? I want some intuition on how much better this is than other systems like Pika, quantitatively.

hansoolo

2024-02-15
Is it really just coincidence that Andrej Karpathy just left yesterday?

jgalt212

2024-02-15
These look like well-done PS5 games. Which, of course, is a great achievement.

jmfldn

2024-02-15
Technically breathtaking, but why do these examples of AI-generated content always have a cheap clip-art vibe about them? So naff and uninspired, given the no doubt endless potential this technology has.

I feel a sense of dread too. Imagine the tidal wave of rubbish coming our way: first text, then images, and now video can be spewed out in industrial quantities. Will it lead to a better culture? In theory it could; in practice I just feel like we'll be deluged with exponentially more mediocre "content".

MobinaMaghami

2024-02-15
Hi, my name is Mobina and I am from Iran. I want to make a video from text, so yeah. Thank you for watching.

MobinaMaghami

2024-02-15
Are you guys all hackers? I am not.

quadcore

2024-02-15
The HN server is running smoothly and seems to be having a walk in the park; impressive compared to previous OpenAI announcements. Have there been significant rollouts?

cdme

2024-02-15
I don't understand why anyone would find these videos compelling enough to watch. They're visually polished, but totally uninteresting.

kromem

2024-02-15
So the top two stories are about a model that can generate astonishingly good video from text and a model that has a context window which allows it to process and identify nuanced details in an hour long video.

We've fairly quickly moved from a world where AIs would communicate with each other through text to one in which they can do so through video.

I'm very curious how something like Sora might end up being used to generate synthetic training data for multimodal models...

albertzeyer

2024-02-15
It's such a shame that they no longer release a detailed technical paper on the model and how it was trained.

*Edit* Oh, I just read here (https://www.reddit.com/r/MachineLearning/comments/1armmng/d_...) that a technical paper should be released later today?

david_shi

2024-02-15
If you draw a line from Pong (1972, or 52 years ago) to Sora, what does that imply for the quality and depth of simulations in 2076 (52 years in the future)?

Would we be able to perceive the differences between those and the physical world? I can't help but feel there is a possible argument for simulation theory here.

pxeger1

2024-02-15
Funny that this launched so soon after Gemini 1.5. I guess OpenAI have a strong incentive to dominate the media narrative.

multicast

2024-02-15
Even though this is highly impressive, I think it is still important to stay rational and optimistic to see the other side of the coin.

Every industrial revolution and its resulting automation has brought not only more jobs but also a more diverse set of jobs, and with them new industries. History rhymes; the ruling fears in such times have always been similar. Claims are being made without any reasonable theories, expertise, or provable facts (e.g. Goldman Sachs' unemployment prediction is absolute bs). This is even more true when such AI-related matters are thought through in more detail. Furthermore, even though they probably employ tens of millions of people, only a few industries, like content creation and movies, are affected. The affected workforce of these industries is highly creative; that is what they are being paid for. The set of jobs today is big; they won't become cleaning staff, nor homeless.

This technology also has to prove itself. (Its technical potential is unlimited, but it is financially limited by the size of the funds being invested, and those are finite.)

Transition to the use of such tools in corporations could take years, depending on type, size, and other parameters. People underestimate the inefficiencies that a lot of companies embody, and I am only talking about the US and some parts of Europe here. If a company did its job the same way for two decades, a sudden switch does not happen overnight. Affected people have ways to transition to other industries, educate themselves further, and much more. Especially for someone living in the West, the opportunities are huge. In addition, the wide array of variables that make up the economy and its differing societies comes into play: some corporations want real videos made by real people; some companies want to stay the way they are and compete using their traditional methods; corporations are still going to hire ad agencies, whose workflow is now much more efficient and more open to new creative spheres, which benefits both customer and agency. The list could go on endlessly.

Lots of people seem to fear the alleged sole power OpenAI COULD achieve. But would that be a problem? Would "another Alphabet" be a problem? Hundreds of millions of people have benefited, and are benefiting today, from their products. They have products that are reliable and work (this forum of tech experts is a niche case; nearly all people don't care at all if data on them is used for commercial purposes). Google had a patent-guaranteed monopoly on search. But here we have a market that is barely patented or even patentable, an open-source community, other companies of all sizes competing, innovation happening, and much more. It is true that companies like OpenAI have more funds to spend than others, but such circumstances have always driven competition and innovation. And at the end of the day, customers are still going to use whatever product they decide is best.

I know I may be stating the obvious, but the economy and the world are a chaotic system with an unpredictable future to come.

Animats

2024-02-15
The Hollywood Reporter says many in the industry are very scared.[1]

“I’ve heard a lot of people say they’re leaving film,” he says. “I’ve been thinking of where I can pivot to if I can’t make a living out of this anymore.” - a concept artist responsible for the look of the Hunger Games and some other films.

"A study surveying 300 leaders across Hollywood, issued in January, reported that three-fourths of respondents indicated that AI tools supported the elimination, reduction or consolidation of jobs at their companies. Over the next three years, it estimates that nearly 204,000 positions will be adversely affected."

"Commercial production may be among the main casualties of AI video tools as quality is considered less important than in film and TV production."

[1] https://www.hollywoodreporter.com/business/business-news/ope...

bonaldi

2024-02-15
It’s heartening and gives me hope that the reaction here is so full of scepticism and concern. Sometimes proceeding with caution is warranted.

layer8

2024-02-15
The left hand of the Tokyo woman looks really creepy, especially from second ~20 onward. I guess some things don’t change. ;)

jibalt

2024-02-15
Something odd happens with that Tokyo woman's legs. First she skips a couple of times, then her feet change places.

majani

2024-02-15
In the last few days I've been asking myself what would drive the next big leap in advertising efficiency after big data and conversion pixels. I think I have my answer now. This is going to disrupt the ad agency side of the business big time.

0xE1337DAD

2024-02-15
How far are we from just giving it a novel and effectively asking it to create a TV series from it?

thesmart

2024-02-15
What real life problem does this solve?

HEGalloway

2024-02-15
This is a great technical achievement, but in a couple of years time this will be as interesting as AI image generators.

SandroG

2024-02-15
This is surreal, both literally and figuratively.

geor9e

2024-02-15
Looking forward to someone feeding it the first draft of The Empire Strikes Back https://www.starwarz.com/starkiller/the-empire-strikes-back-...

eggplantemoji69

2024-02-15
Obviously concern yourself with your job and what you need to do to ensure you can obtain buying power going forward, but most problems and concerns about things like these go away if you just turn off your tech, or are really intentional about your usage.

It's extremely hard to do, but you'll become quasi-Amish and realize how little is actually actionable and in our control.

You'll also feel quite isolated, but peaceful. There are always tradeoffs. You can't have something without giving up not-something, if that makes sense.

Edit: So, essentially, ignorance is bliss, but try to look past the pejorative nature of that phrase and take it for what it is without status implications.

pants2

2024-02-15
Another step in the trend of everything becoming digital (in film and otherwise). It used to be that everything was done in camera. Then we got green screens, then advanced compositing, then CGI, then full realistic CGI movies modeled after real things and mocap suits. Now we're at the end game, where there will be no cameras used in the production of a movie, just studios of people sitting at their computers. Because more and more, humans are more efficient at just about anything when aided by a computer.

javednissar

2024-02-15
I question how much anyone who thinks these systems can actually replace people has really used these models. I've consistently failed to get professional results out of these things, and the degree of work required makes me think a new class of job will be created to get professional results out of these systems.

That being said, there is value in these systems for casual use. For example, me and my girlfriend got into the habit of sending little cartoons to each other. These are cartoons we would have never created otherwise. I think that’s pretty awesome.

gatane

2024-02-15
AI was a mistake

XCSme

2024-02-15
If this can generate videos in real-time (60FPS), then you can, in theory, create any game just from text/prompts.

You just write the rules of the game and the player input, and let the AI generate the next frame.
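To make the constraint concrete, here's a minimal sketch of that loop in Python, with a stub standing in for the video model (the function name and frame format are made up for illustration). At 60 FPS the model gets a budget of roughly 16.7 ms per frame, inference included, which is the real blocker today:

```python
def generate_next_frame(prev_frame, rules, player_input):
    """Stub for the video model: a real system would condition
    generation on (previous frame, game rules, player input)."""
    return {"t": prev_frame["t"] + 1, "last_input": player_input}

rules = "Pong: two paddles, one ball; first to 11 points wins."
frame = {"t": 0, "last_input": None}
FRAME_BUDGET = 1 / 60  # ~16.7 ms per frame at 60 FPS

for player_input in ["up", "up", "down"]:
    # In a real engine, inference exceeding FRAME_BUDGET means dropped frames.
    frame = generate_next_frame(frame, rules, player_input)

print(frame["t"])  # three frames generated
```

Current diffusion video models take seconds to minutes per clip, so real-time prompted gameplay would need orders-of-magnitude speedups, plus frame-to-frame consistency that today's models don't guarantee.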

sylware

2024-02-15
Hopefully we will see AIs with tools that are not "paint" or "notepad", but a formal maths proof solver, etc.

But I have a problem: I am unable to believe the videos I saw were dreamt up by an AI. I can't shake a deep feeling that there is some trickery or severe embellishment. If I am wrong, I guess we are at an inflection point.

I can recall, 10+ years ago, talking "in hacking groups" about AI because we thought the human brain alone was not good enough anymore... but in a maths/sciences context.

eggplantemoji69

2024-02-15
Value is going to be higher for professions where the human essence is an essential component of the function. Or professions that are more coupled with physical reality…my hedge is probably becoming an electrician.

I’d imagine IRL no-tech experiences will be the new ‘escapes’ too.

Maybe I’m too idealistic about the importance of the human spirit/essence…whatever that actually is.

srameshc

2024-02-15
We humans will probably come to a point where we won't even bother making videos ourselves. We may just consume content generated on the fly by such services, based on our emotional state.

CommanderData

2024-02-15
All the software engineers and VFX people training to become plumbers: I'm afraid your clients will be jobless or underpaid by that time.

Jokes aside, it's becoming more apparent that power will further concentrate in big tech firms.

LeicaLatte

2024-02-15
Real GPT-4 moment. Your $3,500 MacBook cannot do this.

thelastparadise

2024-02-15
This looks like state of the art?

StarterPro

2024-02-15
Call me whatever you want, but this technology should not exist.

People being able to just create lifelike videos of anything they can put their minds to is bound to lead to the ruining of many people's lives.

For every person who is aware of and interested in this technology, there are 100 who have no idea, don't care, or can't comprehend it. Those are the people I fear for. Grab a few pictures of the grandkids off of Facebook, and now they have a realistic ransom video to send.

Am I being hyperbolic? I don't think so. Anything made by humans can be broken. And once it's broken and out there, good luck.

HermitX

2024-02-15
AI will eventually be capable of performing most of the tasks humans can do. My neighbor's child is only 6 years old now. What advice do you think I should give to his parents to develop their child in a way that avoids him growing up to find that AI can do everything better than he can?

Devasta

2024-02-15
This technology is going to destroy society.

Want to form a trade union in your workplace? Best be ready to have videos of you jacking off all over the internet.

Videotape a police officer brutalising someone? It could easily have been made with AI; not admissible.

These things will ruin the ability to trust anything online.

seabombs

2024-02-15
All the examples feel so familiar, like I have seen them all before buried in the depths of YouTube and long-forgotten BBC documentaries. Which I guess is obvious knowing roughly how the training works.

I guess what I'm wondering is how "new" the videos are, or how closely do they mimic a particular video in the training set? Will we generate compelling and novel works of art with this, or is this just a very round-about way of re-implementing the YouTube search bar?

noisy_boy

2024-02-15
To those who are saying to look at this as a positive, that it lets people unleash their creativity:

- This enables everyone to be creators

- Given that everyone's creativity isn't top notch, the highest quality will be limited to the best

- So rest of us will be consumers

- How will we consume if we don't have work and there is no UBI?

wnc3141

2024-02-15
I wonder if we as a society have overrated value created digitally, and underrated value created physically or with proximity.

We still need nurses, cooks, theater, builders etc.

johnwheeler

2024-02-15
Holy fuck

billiam

2024-02-15
I find creepy things in all the videos, despite their breathtaking quality at first glance. Whether it is the way the dog walks out into space or the claw-like hand of the woman in Tokyo, they are still uncanny valley to me. I'm not going to watch a movie made this way, even if it costs me $0.15 instead of $15.00. But I got tired of Avatar after watching it for 20 minutes. Maybe all the artificial abundance and intellectual laziness of the generative AI world will make us realize how precious and beautiful the real world is. For my kids' sake, I hope so.

justinl33

2024-02-15
This will probably cost me some downvotes, but can we start a thread explaining the architecture behind this, for those interested in how it actually works?

Tempest1981

2024-02-15
The rendering of static on the TVs is interesting/strange. Must be hard for AI to generate random noise:

Video 7 of 8 on the 2nd player on the page.

> Prompt: The camera rotates around a large stack of vintage televisions all showing different programs — 1950s sci-fi movies, horror movies, news, static, a 1970s sitcom, etc, set inside a large New York museum gallery.

justinl33

2024-02-15

packetlost

2024-02-15
I wonder how much of a blocker the lack of things like model rigging or fine-tuned control will be to practical use of this? Clearly it can be used in toy examples with extremely impressive results, but I'm not entirely convinced that, as is, it can replace the VFX industry as a whole.

jononomo

2024-02-15
Maybe this means someone will make a non-superhero movie now.

oxqbldpxo

2024-02-15
US Elections about to peak, terrible timing.

wsintra2022

2024-02-15
Seriously cannot wait to be able to put a week's worth of dream diary into a tool like this and see my dream-inspired movies!

xkgt

2024-02-15
This is pretty impressive; OpenAI seems to consistently deliver exceptional work, even when venturing into new domains. But looking at their technical paper, it is evident that they are benefiting from their own past body of work and from the enormous resources available to them.

For instance, the generational leap in video generation capability of SORA may be possible because:

1. Instead of resizing, cropping, or trimming videos to a standard size, Sora trains on data at its native size. This preserves the original aspect ratios and improves composition and framing in the generated videos. This requires massive infrastructure. It is eerily similar to how GPT-3 benefited from a blunt approach of throwing massive resources at a problem rather than extensively optimizing the architecture, dataset, or pre-training steps.

2. Sora applies the re-captioning technique from DALL·E 3, using GPT to turn short user prompts into longer, detailed captions that are sent to the video model. Although it remains unclear whether they employ GPT-4 or another internal model, it stands to reason that they have access to a superior captioning model compared to others.

This is not to say that inertia and resources are the only factors differentiating OpenAI; they may also have access to a much better talent pool, but that is hard to gauge from the outside.

anupamchugh

2024-02-15
Wow. And just like that, fliki.ai and similar products have been sherlocked. Great time to be a creator; not the best time to be a product developer or production designer.

booleandilemma

2024-02-15
Does Google have a competing product I can join the wait list for?

wingspar

2024-02-15
Watched the MKBHD video on this and couldn’t help but think about copyrights when he spoke of the impact on stock footage companies.

As I understand the current US situation, a straight prompt-to-generate-video cannot be copyrighted. https://www.copyright.gov/ai/ai_policy_guidance.pdf

But the copyright office is apparently considering the situation more thoroughly now.

Is that where it stands?

If it can't be copyrighted, it seems that would hamper many uses.

donsupreme

2024-02-15
All current forms of entertainment will be impacted. All of them.

Except for live sporting events.

This is why I think megacorps are all going to bid for sports league streaming rights. That's the only thing AI can't touch.

justin66

2024-02-15
Now that they’ve gone corporate, the OpenAI corporate motto ought to be “Because We Could.”

justanotherjoe

2024-02-15
What the f. What. I'm no AI pessimist by any means, but I thought there were some significant hurdles before we got realistic video generation without guidance. This is nothing short of amazing.

It's doubly amazing when you think that video data is almost infinitely richer than text, and requires no human-made data.

The next step is to combine an LLM with this, not for multimodality, but so the two can team up to build a 'reality model' with a shared understanding?

I have called LLMs 'language-induced reality models' in the past. This, then, is a 'video-induced reality model', which is far better at modeling reality than language alone, as humans can attest.

dang

2024-02-15
Related ongoing thread: Video generation models as world simulators - https://news.ycombinator.com/item?id=39391458 - Feb 2024 (43 comments)

Also (since it's been a while): there are over 2000 comments in the current thread. To read them all, you need to click More links at the bottom of the page, or like this:

https://news.ycombinator.com/item?id=39386156&p=2

https://news.ycombinator.com/item?id=39386156&p=3

https://news.ycombinator.com/item?id=39386156&p=4 [etc.]

geor9e

2024-02-15
Today we scroll social media feeds where every post we see is chosen by an algorithm based on all the feedback it gets from our interactions. Now imagine years down the road when Sora renders at 60 fps, every frame influenced by our reaction to the prior frame.

whyenot

2024-02-15
The world is changing before our eyes. It's exciting, sure, but I am also deeply afraid. AI may take humans to the next level, but it may also end us.

...and our future lies in the hands of venture capitalists, many of whom have no moral compass, just an insatiable hunger to make ever larger sums of money.

ramathornn

2024-02-15
Wow, some of those shots are so close to being indistinguishable from real footage. That eye close-up is insane.

It's interesting reading all the comments; I think both sides of the "we should be scared" debate are right in some sense.

These models currently give something like a superpower to experts in a lot of digital fields. I'm able to automate the mundane parts of coding and push out fun projects a lot more easily today. Does it replace my work? No. Will it keep getting better? Of course!

People who are willing to build will have a greater ability to output great things. On the flip side, larger companies will also have the ability to automate some parts of their business - leading to job loss.

At some point, my view is that this must keep advancing to some sort of AGI. Maybe it’s us connecting our brains to LLMs through a tool like Neuralink. Maybe it’s a random occurrence when you keep creating things like Sora. Who knows. It seems inevitable though doesn’t it?

_blk

2024-02-15
"We’ll be taking several important safety steps ahead of making Sora available in OpenAI’s products. We are working with red teamers — domain experts in areas like misinformation, hateful content, and bias — who will be adversarially testing the model." - To make sure that the perfectly unbiased algorithms are biased against bias. So in essence, red teamers as in commies I suppose.

selvan

2024-02-15
Ad generation use cases are getting interesting with video generation + ControlNet + fine-tuning.

TriangleEdge

2024-02-15
Welp, goodbye internet, it was fun to know you.

alokjnv10

2024-02-15
I'm simply blown away

alokjnv10

2024-02-15
How will it affect the gaming industry? https://news.ycombinator.com/item?id=39393252

alokjnv10

2024-02-15
I'm just blown away. This can't be real. But let's face the truth: it's even more impressive than ChatGPT. I think it's the most impressive AI tech I've seen so far. I'm speechless.

Now the big question. As OpenAI keeps pushing boundaries, it's fascinating to see the emergence of tools like Sora, capable of creating incredibly lifelike videos. But with this innovation comes a set of concerns we can't ignore.

So I'm worried about these tools being misused. What impact could they have on the trustworthiness of visual media, especially in an era plagued by fake news and misinformation? And what about the ethical considerations surrounding the creation and dissemination of content that looks real but isn't?

And what should we do to tackle these potential issues? Should there be rules or guidelines to govern the use of such tools, and if so, how can we make sure they're effective?

Marwari

2024-02-15
The videos don’t feel real, though this is the best thing I have ever seen on the topic of text-to-video. I am sure this will go far and become more realistic. But does this mean that we will not hire actors and creators, but instead video editors who can stitch it all together and prompt writers who can create the tiny videos for the story?

TaylorGood

2024-02-15
Anyone to invite

hoc

2024-02-15
Every time OpenAI comes up with a fascinating new generative model, it also offers a bluntly eye-opening perspective on the flood of crappy and unnecessary content we have grown accustomed to having thrown at us, from blown-up text descriptions and filler talk to these kinds of vodka-commercial videos.

It's a nice cleansing benefit that comes with these truly extraordinary technical achievements, and it should not be undervalued (after all, it basically produces an endless supply of equally trained producers, as the industry did, in a somewhat malformed way, before).

Poster frames and commercials are thrown at us all the time, consumed by our brains to the degree that we actually see a goal in producing more of them to act like pros. The inflationary availability that comes with these tools seems a great help in leaving some of this behind and drawing a clearer line between it and actual content.

That said, Dall-E still produces enough colorful weirdness to not fall into that category at all.

Zuiii

2024-02-15
What goes around, comes around. I'm glad this is happening. Getty and friends should be driven out of business for the absurd stunt they pulled with image search.

Yes, I'm still bitter about that.

krisboyz781

2024-02-15
OpenAI will be the most valuable company in history at this rate. This is insane

aggrrrh

2024-02-15
Looking at it, in my opinion, just reinforces the theory that we live in a simulation.

jon37

2024-02-15
This is a weapon.

firefoxd

2024-02-15
Now I can finally adapt my short story into a short film. All for however much this thing ends up costing.

ta93754829

2024-02-15
puts on the movie industry

_virtu

2024-02-15
In the future, we're not going to have common tv shows or movies. We'll have a constantly evolving stream of entertainment that's perfectly customized to the viewer's preferences in real time. This is just the first step.

nomad86

2024-02-15
Demo is always better than the real product. We'll soon see how it works...

velo_aprx

2024-02-15
I don't think I like the future.

throwaway4good

2024-02-15
What’s the connection between this and high-end game engines (like Unreal 5)? I would expect 3D game engines to be used at least for training data and fine-tuning. But perhaps also directly in the generation of the resulting videos?

For example this looks very much like something from a modern 3d engine:

https://twitter.com/OpenAI/status/1758192957386342435

apexalpha

2024-02-15
Wow. It's bizarre to see these videos.

Creating these videos in CGI is a profession that can make you serious money.

Until today.

What a leap.

pantulis

2024-02-15
This is the harbinger announcing that, as a technologist, the time has come for me to witness more and more things whose workings I can no longer understand. The cycle has closed and I have now become my father.

timonoko

2024-02-15
What is the first book you want to see a movie of? It should be verbatim and last a week, if needed.

I vote for Hothouse, by Brian W. Aldiss. So many images need to be imagined, like spiders that jump to the moon and back again.

wslh

2024-02-15
Where is the link to try it? ChatGPT doesn't know anything about it:

"Sora" is not a video generation technology offered by OpenAI. As of my last update in April 2023, OpenAI provides access to various AI technologies, including GPT (Generative Pre-trained Transformer) for text generation and DALL·E for image generation. For video generation or enhancement, there might be other technologies or platforms available, but "Sora" as a specific product related to OpenAI or video generation does not exist in the information I have.

If you're interested in AI technologies for video generation or any other AI-related inquiries, I'd be happy to provide information or help with what's currently available!

hnaccountme

2024-02-15
AI = Better CGI

CapitalTntcls

2024-02-15
Goodbye, civilization

quonn

2024-02-15
> Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI.

Why would it?

alkonaut

2024-02-15
It's odd how the model thinks "historical footage" could be done by drone. It understands that there should be no cars in the picture, but not that there should be no flying perspective.

mihaic

2024-02-15
This is both amazing and saddening to me. All our cultural legacy is being fed into a monstrous machine that gives no attribution to the original content with which it was fed, and so the creative industry seems to be in great danger.

Creativity being automated while humans are forced to perform menial tasks for minimum wage doesn't seem like a great future and the geriatric political class has absolutely no clue how to manage the situation.

globular-toast

2024-02-15
This might actually ruin video and films for me. I don't want to be looking out for AI giveaways in everything I watch.

I can see a new market for true end-to-end analogue film productions emerging for people who like film.

tokai

2024-02-15
Even the good ones look kinda bad.

steveBK123

2024-02-15
Genuinely impressive.

I've always been a digital stills guy, and dabbled in video.. as a hobby. As a hobbyist, I always found the hardest thing is making something worth looking at. I don't see AI displacing the pleasure of the art for a hobbyist.

My next guess is the 80/20% or 95/5% problem is gonna be stuff like dialogue matching audio and mouth/face motion.

I do see this kind of stuff killing the stock images / media illustrator / b-roll footage / etc jobs.

Could a content mill pump out plausibly decent Netflix video series given this tool and a couple half decent writers.. maybe? Then again it may be the perpetual "5 years away". There's a wide gap between generating filler content & producing something people choose to watch willingly for entertainment.

Lichtso

2024-02-15
Here is my prediction of how this will play out for the entertainment industry in the coming decades:

Phase 1 (we are here now): While generative AI is not good enough to directly produce parts of the final product, it can already be used to quickly prototype styles, stories, designs, moods, etc. A good chunk of the unnamed behind-the-scenes people will lose their jobs.

Phase 2: While generative AI is still expensive, the output quality is sufficient to directly produce parts of / the entire final product. Big production outlets will use it to produce AAA titles and blockbusters. Even actors, directors and other high publicity positions will be replaced.

Phase 3: The production cost will sink further until it becomes attainable by smaller studios and indie productions. The already fierce markets will be completely flooded with more and more quantity over quality. Advertisement will not be pre-produced and cut into videos anymore but become very subtle product placements, impossible for ad-blockers to strip from the product.

Phase 4: Once the production cost falls below the price of one copy of the product, we will get completely customized entertainment products tailored to our personal taste. Online communities will emerge which craft skeletons / templates which then are filled out by the personal parameter sets of the consumers. That way you can still share the experience with friends even though everybody experiences a different variation.

Phase 5: As consumers do not hit any production limits any more (e.g. binge watch their favorite series ad infinitum) and the product becomes optimized to be maximally addictive by measuring their reaction to it, it will become impossible for most human beings to resist. The entertainment mania will reach its peak and social isolation, health issues and economic factors will bring down the human reproduction rate to basically zero.

Phase 6: Human civilization collapsed within one or two generations and the only survivors will be extremely technology-adverse people by selection. AGI might have happened in the meantime but did not have the time to gracefully take over and remodel the human infrastructure to become self sufficient. Instead a strict religion will rule the lands and the dark ages begin anew.

Note that none of this is new; it is just the continuation and intensification of already existing trends. This is also not AGI doomerism, as it does not involve a malicious AGI gone rogue or anything like that. It is simply what happens when human nature meets powerful technology.

TLDR: While I love the technology I can only see very negative long-term outcomes from this.

generagent

2024-02-15
This is machine simulated art. It is not a convincing simulation to videographers, yet it pleases software architects and other non-visual artists. Aptitude for visual art making provokes envy in some who lack it. The drive to simulate art is almost as common as the desire to be recognized as a capable visual artist. The most interesting generative art I’ve seen does not attempt verisimilitude. Children want their art to look real. Verisimilitude is hard, especially for children and quasi AI.

slowturtle

2024-02-15
I can’t wait for the day I can strap on my Apple® Vision Pro® 9 with OpenAI® integration and spend all my time interfacing (wink) with my virtual girlfriend. Sure my unlit 3 by 3 meter LifePod® is a little cramped and my arm itches from the Soylent® IV drip, but I’ll save so much time by not having to go outside and interact with legacy humans!

totaldude87

2024-02-15
What happens when humanity stops generating new content / recording new findings, knowledge, etc.? Are we at a place where whatever we had is enough knowledge for an AI takeover?

Or are we heading towards a Skynet-y future?

dr__mario

2024-02-15
I'd love to feel excited by all these advancements and somehow I feel numb. I get part of the feeling (worry about inequalities it may generate), but I sense something more. It's like I see it as a toy... I'm unable to dream on how this will impact my life in any meaningful way.

pcdoodle

2024-02-15
Call me a Luddite but I don't want these videos hitting my retinas.

There should be an opt out from being subjected to AI content.

landingunless

2024-02-15
Wonder how the folks at Runway and Pika are thinking about this.

To me, it's becoming increasingly obvious that startups whose defensibility hinges on "hoping OpenAI doesn't do this" are probably not very enduring ones.

anirudhv27

2024-02-15
What makes OpenAI so far ahead of all of these other research firms (or even startups like Pika, Runway, etc.)? I feel like I see so many examples of fields where progress is being made all across and OpenAI suddenly swoops in with an insane breakthrough lightyears ahead of everyone else.

asciii

2024-02-15
Beautifully terrifying

eutropia

2024-02-15
I hope this doesn't get buried...

As several others have pointed out, realism of these models will continue to improve, and will soon be economically useful for producing beautiful or functional artifacts - however prompt adherence (getting what you want or intend) of the models is growing much more slowly.

However, I think we have a long way to go before we'll see a decent "AI Film" that tells a compelling story - and this has nothing to do with some sort of naturalistic fallacy that appeals to some innate nature of humans!

It comes down to the dataset and the limits of human creators in their ability to communicate their process. Image-Text and Video-Text pairs are mostly labeled by semi-skilled humans who describe what they see in detail. They are, for the most part, very good at capturing the obvious salient features of an image or a video. "reflections of the neon lights glisten in the sidewalk". However, what you see in a movie scene is the sum total of dozens if not hundreds of influences, large and subtle. Choices made by the actors, camera operators, lighting designers, sound designers, costuming, makeup, editors, etc... Most people are not trained to recognize these choices at all, or might not even be aware that there are choices to make. We (simply) see "Joaquin Phoenix is making awkward small-talk in the elevator with other office workers".

So much of what we experience processes on subconscious and emotional and purely sensory levels, we don't elevate those lower-level qualia to our higher-brain's awareness and label them with vocabulary without intentional training (such as tasting wine, coffee, beer, etc - developing a palate is an act of sensory-vocabulary alignment).

However, despite not raising these things to our intentional awareness, it has an influence on us -- often the desired impact of the person who made that choice in the first place. The overall effect of all of these intentional choices makes things 'feel right'.

There's no fundamental reason AI can't produce an output that has the same effect as those choices, however finding each little choice is like a needle in a haystack. Accurate labeling of the training data tells the AI where to look -- but the people labeling the data are probably not well-versed in all of the little intentional choices that can be made when creating a piece of video-media.

Beyond the issue of the labeling folks being trained in the art itself, there's the problem too of the artists themselves not being able to fully articulate their (numerous, little, snowflake-into-avalanche) choices - or simply not articulating it even if they could. Ask Jackson Pollock about paint viscosity and you'll learn a great deal, but ask about abstract painting composition and there's this ineffable gap that language seems ill-suited to cross. The painter paints what they feel, and they hope that feeling is conveyed to the viewer - but you'd be hard pressed to recreate "Autumn Rhythm (Number 30)" if you had to transmit the information via language and hope they interpreted it correctly. Art is simultaneously vague and specific!

So, to sum up the problem of conveying your intent to the model:

- The training data labels capture obvious or salient features, but not choices only visible to the trained eye

- The material itself is created by human artists who might not even be able to explain all of their choices in words

- You the prompter might not have the vocabulary that captures succinctly and specifically the intended effect

- The end result will necessarily be not quite what you imagined in your mind's eye as a result of all of this missing information

You can still get good results if you tell it to copy something, because the label "Tarantino" captures a lot of detail, even all the little things you and the training data would never have labeled in words. But it won't be yours and - until we have an army of trained artists providing precise descriptions for training data in their area of expertise, and you know how to speak those artists' language - it can't be yours.

robblbobbl

2024-02-15
Holy Moly

yandrypozo

2024-02-15
Did anyone see the two-legged horses in the video?

hpeter

2024-02-15
On one side, we have people who are upset because the creators of the videos in the dataset used to train this model were not compensated.

On the other, people find the tech very impressive and there are a lot of mind-blowing use cases.

Personally, this opens up the world for me to create video ads for the software projects I build, since I have no financial resources or time to actually make videos; I only know how to code. So I find it pretty exciting. It's great for solo entrepreneurs.

cchance

2024-02-15
The scene of the train could easily be used as a transition shot in a movie. There's so much here; stock video is going to be f*cked in short order, and if they add composition and planning tools, and LoRAs, so will the movie industry.

lencastre

2024-02-15
One day OpenAI itself will replace Altman and take charge.

TriangleEdge

2024-02-15
I predict the word "disrupt" will see an exponential curve [1].

https://trends.google.com/trends/explore?date=all&q=disrupt&...

hcarvalhoalves

2024-02-15
Oh nice, we’ll get a new shitty Marvel movie every week now.

dudeinhawaii

2024-02-15
I say this with all sincerity, if you're not overwhelmingly impressed with Sora then you haven't been involved in the field of AI generated video recently. While we understand that we're on the exponential curve of AI progress, it's always hard to intuit just what that means.

Sora represents a monumental leap forward; it's comically something like a 3000% improvement in seconds of coherent video generation. Coupled with a significantly enhanced understanding of contextual prompts and overall quality, it has achieved what many (most?) thought would take another year or two.

I think we will see studios like ILM pivoting to AI in the near future. There's no need for 200 VFX artists when you can have 15 artists working with AI tooling to generate all the frame-by-frame effects, backgrounds, and compositing for movies. It'll open the door for indie projects that can take place in settings that were previously the domain of big Hollywood. A sci-fi opera could be put together with a few talented actors, AI effects and a small team to handle post-production. This could conceivably include AI scoring.

Sure, Hollywood and the various guilds will strongly resist, but it'll take just a handful of streaming companies pivoting. Suddenly content creation costs for Netflix drop by an order of magnitude. The economics of content creation will fundamentally change.

At the risk of being proven very wrong, I think replacing actors is still fairly distant in the future but again... humans are bad at conceptualizing exponential progress.

nbzso

2024-02-15
The idea that prompting is a creative tool is utterly illogical. This will result in a ton of mediocre synthetic crap for corporate presentations and porn generation.

Contrary to the trends in SV, the dehumanization of creative professions will result not in a productivity boost but in utter chaos, and as a result will add more lost time to the production process.

I never liked Sam Altman in his YC years; now I know why.

Even with the "blessings" of the "masters" in Davos/Bilderberg, a bad idea is a bad idea. Maybe this will push World ID as a result, but is it necessary?

The current trends in tech are not producing solutions for professional problems. With rare exceptions, this looks more and more like the removal of human input and the normalization of a society ruled by AI at any cost. So sad.

uconnectlol

2024-02-15
right, we all knew AI would be closer to realization in 2020. of course the first one to do it is some complete sellout asshole, affirming hateful rhetoric like "we have to make things safe", which is just thinly veiled pro-police-state sentiment. every single reason you can come up with for why this is "unsafe" is just police state mentality.

"porn without consent" - thought crime

"too much porn of whatever you dream of" - yes, conservatives (50% of USA) actually think this is a problem

"spam" - advancing the closed garden model email is heading towards. soon you will simply need government id to make email even though there are plenty of alternative ways to do communication aside from email which was already considered insecure and a bad protocol in 2000. this has nothing to do with AI but they are still acknowledging this absurdity by framing AI as the enabler of that.

"automated social engineering" - just weaponizing the ignorance the bad thought leaders of the industry left us. instead of giving us proper authentication methods, we still have "just send my photo id to these 33 companies, which will ask for it in random ways we dont expect and just have to trust them"

"copyright" - literally not a problem, almost nothing "protected" by copyright matters and the law is just used by aggressive capitalists to shove their products down everyone's throat

"ICBMs being automatically hacked and launched at people" - just stop being bad government and hiring completely uncredible people to implement every mission critical control system while hooking it up to the internet

"racist bias" (or whatever) - this is the dumbest fucking thing i've ever heard of

this website is a perfect snapshot of why tech sucks so hard. its dressed up like cinematic film using a ton of js libs and css hacks or god knows so it can only be viewed smoothly on the latest computer hardware. only on one of the big 3 browsers that each had a trillion man hours of pointless iterations driven by digital graphics marketing companies. and on top of that they have a nice professional tone made by $300K/year PR people. please, sincerely, fuck off.

kaonashi

2024-02-15
quite the technical feat I suppose, but the actual result is nightmare fuel -- legs swapping places, people walking into simulacrum of spaces -- just deeply unsettling uncanny valley stuff

garfieldnate

2024-02-15
The gold rush scene is the most captivating to me. The film style looks like it's from the 70's/80's (reminds me of Little House on the Prairie), but the footage is from a drone standpoint. I find it magically immersive in a time when none of the technology to make the shot would have existed.

throw310822

2024-02-15
Very late so probably invisible to all, but is this just a byproduct of OpenAI's work on understanding of video input? The Google Gemini presentation video suggested that this is the next step-level of AIs. Already with GPT-4V, being able to converse with an AI about the contents of an image feels surreal. The applications that become possible with an AI that can just look at video streams are incredible.

lofaszvanitt

2024-02-15
When is this bubble going to burst? :D

mywacaday

2024-02-15
This is amazing and was to be expected. Are there any good solutions that can be used to prove a video is not generated? I guess in some ways we have come full circle and are back to trusting individual journalists and content creators; I just didn't think it would happen this fast.