Hacker News

GPT-4o

From https://openai.com/index/hello-gpt-4o/
Lealen | 2024-05-13 | 3138

Comments:

EcommerceFlow

2024-05-13
A new "flagship" model with no improvement of intelligence, very disappointed. Maybe this is a strategy for them to mass collect "live" data before they're left behind by Google/Twitter live data...

chzblck

2024-05-13
real time audio is mind blowing

throwup238

2024-05-13
So what's the point of paying for ChatGPT Plus? And who on earth chose to make the app Mac only...

tomschwiha

2024-05-13
I definitely like this demo more than the "reduced latency" Gemini demo [0].

[0] https://www.youtube.com/watch?v=UIZAiXYceBI

Powdering7082

2024-05-13
Wow this versioning scheme really messed up this prediction market: https://kalshi.com/markets/gpt4p5/gpt45-released

smusamashah

2024-05-13
That im-also-a-good-gpt2-chatbot [1] was in fact the new ChatGPT model, as people were assuming a few days ago here on HN [2].

Edit: maybe not; the name of that bot was just "gpt2-chatbot". Maybe that one was some initial iteration?

[1] https://twitter.com/LiamFedus/status/1790064963966370209/pho...

[2] https://news.ycombinator.com/item?id=40199715

theusus

2024-05-13
This 4o is already rolling out?

GalaxyNova

2024-05-13
It is really cool that they are bringing this to free users. It does make me wonder what justifies ChatGPT plus now though...

ppollaki

2024-05-13
I've noticed that the GPT-4 model's capabilities seem limited compared to its initial release. Others have also pointed this out. I suspect that making the model free might have required reducing its capabilities to meet cost efficiency goals. I'll have to try it out to see for myself.

EcommerceFlow

2024-05-13
As I commented in the other thread, really, really disappointed there's no intelligence update and more of a focus on "gimmicks". The desktop app did look really good, especially as the models get smarter. Will be canceling my premium as there's no real purpose to it until that new "flagship" model comes out.

OutOfHere

2024-05-13
I don't see 4o or anything new at https://platform.openai.com/docs/models

Overall I am highly skeptical of newer models as they risk worsening the completion quality to make them cheaper for OpenAI to run.

atgctg

2024-05-13
Tiktoken added support for GPT-4o: https://github.com/openai/tiktoken/commit/9d01e5670ff50eb74c...

It has an increased vocab size of 200k.
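
For anyone who wants to check locally, a minimal sketch (assuming a tiktoken version that includes that commit):

    import tiktoken

    # GPT-4o maps to the new "o200k_base" encoding
    enc = tiktoken.encoding_for_model("gpt-4o")
    print(enc.name)     # "o200k_base"
    print(enc.n_vocab)  # roughly 200k, up from ~100k for cl100k_base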

FergusArgyll

2024-05-13
First Impressions in no particular order:

  Being able to interrupt while GPT is talking
  2x faster/cheaper
  not really a much smarter model
  Desktop app that can see screenshots
  Can display emotions and change the sound of its voice

ralusek

2024-05-13
Can't find info on which of these new features are available via the API.

Jensson

2024-05-13
The most impressive part is that the voice uses the right feelings and tonal language during the presentation. I'm not sure how much of that comes from them having tested this over and over, but it is really hard to get right, so if they didn't fake it in some way, I'd say it's revolutionary.

modeless

2024-05-13
As far as I'm concerned this is the new best demo of all time. This is going to change the world in short order. I doubt they will be ready with enough GPUs for the demand the voice+vision mode is going to get, if it's really released to all free users.

Now imagine this in a $16k humanoid robot, also announced this morning: https://www.youtube.com/watch?v=GzX1qOIO1bE The future is going to be wild.

ilaksh

2024-05-13
This is so amazing.. are there any open source models that are in any way comparable? Fully multimodal audio-to-audio etc.?

skilled

2024-05-13
Parts of the demo were quite choppy (latency?) so this definitely feels rushed in response to Google I/O.

Other than that, looks good. The desktop app is great, but I didn't see any mention of being able to use your own API key, so open-source projects might still be needed.

The biggest thing is bringing GPT-4 to free users, that is an interesting move. Depending on what the limits are, I might cancel my subscription.

syntaxing

2024-05-13
I admit I drank the Kool-Aid and love LLMs and their applications. But damn, the way it responds in the demo gave me goosebumps in a bad way. Like an uncanny valley instinct kicking in.

hubraumhugo

2024-05-13
The movie Her has just become reality

hmmmhmmmhmmm

2024-05-13
With the news that Apple and OpenAI are closing / just closed a deal for iOS 18, it's easy to speculate we might be hearing about that exciting new model at WWDC...

chatcode

2024-05-13
Parsing emotions in vocal inflections (and reliably producing them in vocal output) seems quite under-hyped in this release.

That seems to represent an entirely new depth of understanding of human reality.

rvz

2024-05-13
Given that they are moving all these features to free users, it tells us that GPT-5 is around the corner and is significantly better than their previous models.

PoignardAzur

2024-05-13
Holy crap, the level of corporate cringe of that "two AIs talk to each other" scene is mind-boggling.

It feels like a pretty strong illustration of the awkwardness of getting value from recent AI developments. Like, this is technically super impressive, but I'm also not sure it gives us anything we couldn't have had a year ago with GPT-4 and ElevenLabs.

sourcecodeplz

2024-05-13
It is quite nice how they keep giving premium features for free, after a while. I know openai is not open and all but damn, they do give some cool freebies.

BoumTAC

2024-05-13
Did they provide the rate limit for free users?

Because I have the Plus membership, which is expensive ($25/month).

But if the limit is high enough (or my usage low enough), there is no point in paying that much money for me.

christianqchung

2024-05-13
Does anyone know how they're doing the audio part where Mark breathes too hard? Does his breathing get turned into all-caps text (AA EE OO) that GPT-4o interprets as him breathing too hard, or is there something more going on?

crindy

2024-05-13
Very impressed by the demo where it starts speaking French in error, then laughs with the user about the mistake. Such a natural recovery.

spacebanana7

2024-05-13
> We recognize that GPT-4o’s audio modalities present a variety of novel risks

> For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies.

I wonder if they’ll ever allow truly custom voices from audio samples.

tomComb

2024-05-13
The price of 4o is 50% of GPT-4 Turbo's (and no mention of a price change to gpt-4-turbo itself).

Given the competitive pressures I was expecting a much bigger price drop than that.

For non-multimodal uses, I don't think their API is at all competitive anymore.
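
For concreteness, a rough cost sketch using the announced launch prices (GPT-4 Turbo at $10/$30 and GPT-4o at $5/$15 per 1M input/output tokens; these figures may change):

    # USD per 1M tokens: (input, output), per the launch pricing pages
    GPT4_TURBO = (10.00, 30.00)
    GPT4O = (5.00, 15.00)

    def cost(prices, input_tokens, output_tokens):
        """Cost of a single call in USD."""
        in_price, out_price = prices
        return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

    # e.g. a 2,000-token prompt with a 500-token completion:
    print(cost(GPT4_TURBO, 2000, 500))  # 0.035
    print(cost(GPT4O, 2000, 500))       # 0.0175 -- exactly half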

lagt_t

2024-05-13
Universal real time translation is incredibly dope.

I hate video players without volume control.

pachico

2024-05-13
Jeez, that model really speaks a lot! I hope there's a way to make it more straight-to-the-point rather than radio-like.

causal

2024-05-13
Clicking the "Try it on ChatGPT" link just takes me to GPT-4 chat window. Tried again in an incognito tab (supposing my account is the issue) and it just takes me to 3.5 chat. Anyone able to use it?

TrueDuality

2024-05-13
Weird: visiting the page crashed my graphics driver in Firefox.

msoad

2024-05-13
They are admitting [1] that the new model is the gpt2-chatbot that we have seen before [2]. As many highlighted there, the model is not an improvement on the order of GPT-3 -> GPT-4. I tested a bunch of programming stuff and it was not that much better.

It's interesting that OpenAI is highlighting the Elo score instead of showing results for the many, many benchmarks where all models are stuck at 50-70% success.

[1] https://twitter.com/LiamFedus/status/1790064963966370209

[2] https://news.ycombinator.com/item?id=40199715

Jimmc414

2024-05-13
Big questions: (1) When is this going to be rolled out to paid users? (2) What is the remaining benefit of being a paid user if this is rolled out to free users? (3) The biggest concern: will this degrade the paid experience, since GPT-4 interactions are already rate limited? Does OpenAI have the hardware to handle this?

Edit: according to @gdb this is coming in "weeks"

https://twitter.com/gdb/status/1790074041614717210

lxgr

2024-05-13
Will this include image generation for the free tier as well? That's a big missing feature in OpenAI's free tier compared to Google and Meta.

OliverM

2024-05-13
This is impressive, but they just sound so _alien_, especially to this non-U.S. English speaker (to the point of being actively irritating to listen to). I guess picking up on social cues communicating this (rather than express instruction or feedback) is still some time away.

It's still astonishing to consider what this demonstrates!

w-m

2024-05-13
Gone are the days of copy-pasting to/from ChatGPT all the time, now you just share your screen. That's a fantastic feature, in how much friction that removes. But what an absolute privacy nightmare.

With ChatGPT having a very simple text+attachment in, text out interface, I felt absolutely in control of what I tell it. Now when it's grabbing my screen or a live camera feed, that will be gone. And I'll still use it, because it's just so damn convenient?

jawiggins

2024-05-13
I hope when this gets to my iphone I can use it to set two concurrent timers.

mellosouls

2024-05-13
Very, very impressive for a "minor" release demo. The capabilities here would have looked shockingly advanced just 5 years ago.

Universal translator, pair programmer, completely human sounding voice assistant and all in real time. Scifi tropes made real.

But: interesting to see next how it actually performs IRL, with real latency and without cherry-picking. No snark, it was great, but we need to see real-world performance. Also, what the benefits are to subscribers if all this is going to be free...

yumraj

2024-05-13
In the first video the AI seems excessively chatty.

csjh

2024-05-13
I wonder if this is what the "gpt2-chatbot" that was going around earlier this month was

peppertree

2024-05-13
Just like that, Google is on the back foot again.

sebastiennight

2024-05-13
Anyone who watched the OpenAI livestream: did they "paste" the code after hitting CTRL+C ? Or did the desktop app just read from the clipboard?

Edit: I'm asking because of the obvious data security implications of having your desktop app read from the clipboard _in the live demo_... That would definitely put a damper on my fanboyish enthusiasm about that desktop app.

sn_master

2024-05-13
This is every romance scammer's dreams come true...

summerlight

2024-05-13
This is really impressive engineering. I thought real-time agents would completely change the way we interact with large models, but that it would take 1-2 more years. I wonder what kind of new techniques were developed to enable this, but OpenAI is fairly secretive, so we won't get to know their secret sauce.

On the other hand, this also feels like a signal that reasoning capability has probably already plateaued at the GPT-4 level, and OpenAI knew it, so they decided to focus on research that matters to product engineering rather than long-term research to unlock further general (super)intelligence.

MBCook

2024-05-13
Why must every website put stupid stuff that floats above the content and can’t be dismissed? It drives me nuts.

dkga

2024-05-13
That can “reason”?

MisterBiggs

2024-05-13
I've been waiting to see someone drop a desktop app like they showcased. I wonder how long until it is normal to have an AI looking at your screen the entire time your machine is unlocked. Answering contextual questions and maybe even interjecting if it notices you made a mistake and moved on.

bredren

2024-05-13
It is notable that OpenAI did not need to carefully rehearse the speakers' talking points, or even do the kind of careful production seen in a lot of other videos.

The technology product is so good and so advanced that it doesn't matter how the people appear.

Zuck tried this in his video countering the Vision Pro, but it did not have the authentic "not really rehearsed or produced" feel of this at all. If you watch that video and compare it with this one, you can see the difference.

Very interesting times.

skepticATX

2024-05-13
Very impressive demo, but not really a step change, in my opinion. The hype from OpenAI employees was on another level, way more than was warranted.

Ultimately, the promise of LLM proponents is that these models will get exponentially smarter; this hasn't been borne out yet. So from that perspective, this was a disappointing release.

If anything, this feels like a rushed release to match what Google will be demoing tomorrow.

altcognito

2024-05-13
GPT-4 expressing a human-like emotional response every single time you interact with it is pretty annoying.

In general, trying to push that this is a human being is probably "unsafe", but that hurts the marketing.

jonquark

2024-05-13
It might be region-specific (I'm in the UK), but I don't "see" the new model anywhere, e.g. if I go to https://platform.openai.com/playground/chat?models=gpt-4o the model the page uses is set to gpt-3.5-turbo-16k.

I'm confused.

aw4y

2024-05-13
I don't see anything released today. Login/signup is still required, no signs of desktop app or free use on web. What am I missing?

goalonetwo

2024-05-13
For all the hype around this announcement I was expecting more than some demo-level stuff that close to nobody will use in real life. Disappointing.

101008

2024-05-13
Are the employees in the demo senior executives at OpenAI? I can understand Altman being happy with this progress, but what about the medium/low-level employees? Didn't they watch Oppenheimer? Are they happy to be destroying humanity/work/etc. for future and not-so-future generations?

Anyone who thinks this will be like previous work revolutions is talking nonsense. This replaces humans, and will replace them even more with each new advance. What's their plan? Live off their savings? What about family/friends? I honestly can't see this and understand how they can be so happy about it...

"Hey, we created something very powerful that will do your work for free! And it does it better than you and faster than you! Who are you? It doesn't matter, it applies to all of you!"

And considering I was thinking of having a kid next year, well, this is a no.

karaterobot

2024-05-13
That first demo video was impressive, but then it ended very abruptly. It made me wonder if the next response was not as good as the prior ones.

MP_1729

2024-05-13
This thing continues to stress my skepticism for AI scaling laws and the broad AI semiconductor capex spending.

1. OpenAI is still working on GPT-4-level models, more than 14 months after the launch of GPT-4 and after more than $10B in capital raised.

2. The rate at which token prices are collapsing is bizarre. Now a (bit) better model for 50% of the price. How do people seriously expect these foundation model companies to make substantial revenue? Token volume needs to double just for revenue to stand still. Since GPT-4's launch, token prices have been falling 84% per year!! Good for mankind, but crazy for these companies.

3. Maybe I am an asshole, but where are my agents? I mean, good for the consumer use case. Let's hope the rumors that Apple is deploying ChatGPT with Siri are true; these features will help a lot. But I wanted agents!

4. These drops in cost are good for the environment! No reason to expect them to stop here.

joshstrange

2024-05-13
Looking forward to trying this via ChatGPT. As always, OpenAI says "now available", but refreshing or logging in/out of ChatGPT (web and mobile) doesn't cause GPT-4o to show up. I don't know why I find this so frustrating. Probably because they don't say "rolling out", they say things like "try it now", but I can't, even though I'm a paying customer. Oh well...

candiodari

2024-05-13
I wonder if the audio stuff works like VITS. Do they just encode the audio as tokens and input the whole thing? Wouldn't that make the context size a lot smaller?

One does notice that context size is noticeably absent from the announcement ...

cs702

2024-05-13
The usual critics will quickly point out that LLMs like GPT-4o still have a lot of failure modes and suffer from issues that remain unresolved. They will point out that we're reaping diminishing returns from Transformers. They will question the absence of a "GPT-5" model. And so on -- blah, blah, blah, stochastic parrots, blah, blah, blah.

Ignore the critics. Watch the demos. Play with it.

This stuff feels magical. Magical. It makes the movie "Her" look like it's no longer in the realm of science fiction but in the realm of incremental product development. HAL's unemotional monotone in Kubrick's "2001: A Space Odyssey" feels... oddly primitive by comparison. I'm impressed at how well this works.

Well-deserved congratulations to everyone at OpenAI!

bearjaws

2024-05-13
OAI just made an embarrassment of Google's faked demo from earlier this year. Given how this was recorded, I am pretty certain it's authentic.

levocardia

2024-05-13
As a paid user this felt like a huge letdown. GPT-4o is available to everyone so I'm paying $20/mo for...what, exactly? Higher message limits? I have no idea if I'm close to the message limits currently (nor do I even know what they are). So I guess I'll cancel, then see if I hit the limits?

I'm also extremely worried that this is a harbinger of the enshittification of ChatGPT. Processing video and audio for all ~200 million users is going to be extravagantly expensive, so my only conclusion is that OpenAI is funding this by doubling down on payola-style corporate partnerships that will result in ChatGPT slyly trying to mention certain brands or products in our conversations [1].

I use ChatGPT every day. I love it. But after watching the video I can't help but think "why should I keep paying money for this?"

[1] https://www.adweek.com/media/openai-preferred-publisher-prog...

noncoml

2024-05-13
They really need to tone down the talking garniture. It needs to put on its running shoes and get to the point on every reply. Ain’t nobody has time to keep listening to AI blubbering along at every prompt.

dbcooper

2024-05-13
Question for you guys: is there a model that can take figures (graphs) from scientific publications and combine image analysis with picking up the data-point symbol descriptions to analyse the trends?

krunck

2024-05-13
So GPT-4o can do voice intonation? Great. Nice work.

Still, it sounds like some PR drone selling a product. Oh wait....

CivBase

2024-05-13
Those voice demos are cool but having to listen to it speak makes me even more frustrated with how these LLMs will drone on and on without having much to say.

For example, in the second video the guy explains how he will have it talk to another "AI" to get information. Instead of just responding with "Okay, I understand", it started talking about how interesting the idea sounded. And as the demo went on, both "AIs" kept adding unnecessary commentary about the scenes.

I would hate having to talk with these things on a regular basis.

DataDaemon

2024-05-13
Now, say goodbye to call centers.

joshstrange

2024-05-13
What do they mean by "desktop version"? I assume that doesn't mean a "native" (electron) app?

simonw

2024-05-13
I'm seeing gpt-4o in the OpenAI Playground interface already: https://platform.openai.com/playground/chat?mode=chat&model=...

First impressions are that it feels very fast.

tailspin2019

2024-05-13
Does anyone with a paid plan see anything different in the ChatGPT iOS app yet?

Mine just continues to show “GPT 4” as the model - it’s not clear if that’s now 4o or there is an app update coming…

ilaksh

2024-05-13
Are there any remotely comparable open source models? Fully multimodal, audio-to-audio?

MBCook

2024-05-13
Too bad they consume 25x the electricity Google does.

https://www.brusselstimes.com/world-all-news/1042696/chatgpt...

delichon

2024-05-13
Won't this make pretty much all of the work to make a website accessible go away, as it becomes cheap enough? Why struggle to build alt content for the impaired when it can be generated just in time as needed?

And much the same for internationalization.
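
As a sketch of what just-in-time alt text could look like against today's API (the helper function and prompt are hypothetical; the image-input message shape is the documented Chat Completions format):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def alt_text(image_url: str) -> str:
        """Generate alt text for an image on demand (illustrative only)."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": "Write concise alt text for this image."},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }],
        )
        return response.choices[0].message.content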

Negitivefrags

2024-05-13
I found these videos quite hard to watch. There is a level of cringe that I found a bit unpleasant.

It’s like some kind of uncanny valley of human interaction that I don’t get on nearly the same level with the text version.

brainer

2024-05-13
OpenAI's Mission and the New Voice Mode of GPT-4o

• Sam Altman, the CEO of OpenAI, emphasizes two key points from their recent announcement. Firstly, he highlights their commitment to providing free access to powerful AI tools, such as ChatGPT, without advertisements or restrictions. This aligns with their initial vision of creating AI for the benefit of the world, allowing others to build amazing things using their technology. While OpenAI plans to explore commercial opportunities, they aim to continue offering outstanding AI services to billions of people at no cost.

• Secondly, Altman introduces the new voice and video mode of GPT-4o, describing it as the best computer interface he has ever experienced. He expresses surprise at the reality of this technology, which provides human-level response times and expressiveness. This advancement marks a significant change from the original ChatGPT and feels fast, smart, fun, natural, and helpful. Altman envisions a future where computers can do much more than before, with the integration of personalization, access to user information, and the ability to take actions on behalf of users.

https://blog.samaltman.com/gpt-4o

deegles

2024-05-13
what's the path from LLMs to "true" general AI? is it "only" more training power/data or will they need a fundamental shift in architecture?

banjoe

2024-05-13
I still need to talk very fast to actually chat with ChatGPT which is annoying. You can tell they didn't fix this based on how fast they are talking in the demo.

gallerdude

2024-05-13
Interesting that they didn't mention a bump in capabilities. I wrote an LLM benchmark a few weeks ago, and before, GPT-4 could solve Wordle about ~48% of the time.

Currently with GPT-4o, it's easily clearing 60% - while blazing fast, and half the cost. Amazing.

dom96

2024-05-13
I can't help but feel a bit let down. The demos felt pretty cherry picked and still had issues with the voice getting cut off frequently (especially in the first demo).

I've already played with the vision API, so that doesn't seem all that new. But I agree it is impressive.

That said, watching back a Windows Vista speech recognition demo[1] I'm starting to wonder if this stuff won't have the same fate in a few years.

1 - https://www.youtube.com/watch?v=VMk8J8DElvA

jrflowers

2024-05-13
I like the robot typing at the keyboard that has B as half of the keys and my favorite part is when it tears up the paper and behind it is another copy of that same paper

hu3

2024-05-13
That they are offering more features for free concurs with my theory that, just like search, state of the art AI will soon be "free", in exchange for personal information/ads.

CosmicShadow

2024-05-13
In the video where the 2 AI's sing together, it starts to get really cringey and weird to the point where it literally sounds like it's being faked by 2 voice actors off-screen with literal guns to their heads trying not to cry, did anyone else get that impression?

The tonal talking was impressive, but man that part was like, is someone being tortured or forced against their will?

mickg10

2024-05-13
So, babelfish soon?

taytus

2024-05-13
The OpenAI live stream was quite underwhelming...

mickg10

2024-05-13
So, babelfish incoming?

alvaroir

2024-05-13
I'm really impressed by this demo! Apart from the usual quality benchmarks, the latency for audio/video stands out: "It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response"... If true at scale, what could be the "tricks" they're using to achieve that?!

Thaxll

2024-05-13
It's pretty impressive, although I don't like the voice / tone, I prefer something more neutral.

blixt

2024-05-13
GPT-4o being a truly multimodal model is exciting, does open the door to more interesting products. I was curious about the new tokenizer which uses much fewer tokens for non-English, but also 1.1x fewer tokens for English, so I'm wondering if this means each token now can be more possible values than before? Might make sense provided that they now also have audio and image output tokens? https://openai.com/index/hello-gpt-4o/

I wonder what "fewer tokens" really means, then, without context on whether the size of each token was raised. It's a bit like saying my JPEG image now uses 2x fewer words after I switched from a 32-bit to a 64-bit architecture, no?
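
The "fewer tokens" part, at least, is easy to measure with tiktoken, even if it doesn't settle the bits-per-token question (a quick sketch; the sample text is arbitrary):

    import tiktoken

    old = tiktoken.get_encoding("cl100k_base")  # GPT-4 / GPT-4 Turbo
    new = tiktoken.get_encoding("o200k_base")   # GPT-4o

    text = "GPT-4o being a truly multimodal model is exciting."
    print(len(old.encode(text)), len(new.encode(text)))
    # The same text generally needs fewer o200k tokens: the larger
    # vocabulary packs more characters into each token on average.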

catchnear4321

2024-05-13
window dressing

his love for yud is showing.

frabcus

2024-05-13
I can't see any calculator for the audio pricing (https://openai.com/api/pricing/) or document type field in the Chat Completions API (https://platform.openai.com/docs/api-reference/chat/create) for this new model.

Is the audio in API not available yet?

willsmith72

2024-05-13
> We plan to launch support for GPT-4o's new audio and video capabilities to a small group of trusted partners in the API in the coming weeks.

So no word on an audio API for regular Joes? That's the number 1 thing I'm looking for.

UncleOxidant

2024-05-13
Looking at the demo video, the AIs are a bit too chatty. The human often has to interrupt them.

A nice feature would be being able to select a Myers-Briggs personality type for your AI chatbot.

michalf6

2024-05-13
I cannot find the mac app anywhere. Is there a link?

Painsawman123

2024-05-13
My main takeaway is that generative AI has hit a wall... New paradigms, architectures, and breakthroughs are necessary for the field to progress. But this raises the question: if everyone knows the current paradigms have hit a wall, why is so much money being spent on LLMs, diffusion models, etc., which are bound to become obsolete within a few(?) years?

I_am_tiberius

2024-05-13
Interested in how many LLM startups will go out of business because of this voice assistant.

windowshopping

2024-05-13
There's a button on this page that says "try on ChatGPT ->", but that still opens version 3.5, and it seems it would be version 4 if I upgraded.

Is this new version not available to users yet?

xyst

2024-05-13
The naming of these systems has me dead

nikolay

2024-05-13
I am a paid customer, yet I don't see anything new. I'm tired of these fake announcements of "released" features.

seydor

2024-05-13
I would still prefer the features in text form, in the chat GUI. Right now ChatGPT doesn't seem to have options to lengthen parts of the text response, to change it, etc. Perplexity and Gemini do seem to get the GUI right. Voice chat is fun for demos but won't catch on much, just like all its predecessors. Perhaps an advanced version of this could be used as a student tutor, however.

Satam

2024-05-13
So far OpenAI's template is: amazing demos create hype -> reality turns out to be underwhelming.

Sora is not yet released, and it's not clear when it will be. DALL-E is worse than Midjourney in most cases. GPT-4 has either gotten worse or stayed the same. Vision is not really usable for anything practical. Voice is cool but not that useful, especially with the lack of strong reasoning from the base model.

Is this sandbagging, or is the progress slower than what they're broadcasting?

zone411

2024-05-13
It doesn't improve on the NYT Connections leaderboard:

GPT-4 turbo (gpt-4-0125-preview): 31.0
GPT-4o: 30.7
GPT-4 turbo (gpt-4-turbo-2024-04-09): 29.7
GPT-4 turbo (gpt-4-1106-preview): 28.8
Claude 3 Opus: 27.3
GPT-4 (0613): 26.1
Llama 3 Instruct 70B: 24.0
Gemini Pro 1.5: 19.9
Mistral Large: 17.7

gentile

2024-05-13
There is a spelling mistake in the Japanese translation under language tokenization. In こんにちわ, わ should be は.

stilwelldotdev

2024-05-13
I love that there is a real competition happening. We're going to see some insane innovations.

buildbot

2024-05-13
The demo is impressive but I get very odd/cringe "Her" vibes from it as well.

ravroid

2024-05-13
In my experience so far, GPT-4o seems to sit somewhere between the capability of GPT-3.5 and GPT-4.

I'm working on an app that relies more on GPT-4's reasoning abilities than inference speed. For my use case, GPT-4o seems to do worse than GPT-4 Turbo on reasoning tasks. For me this seems like a step-up from GPT-3.5 but not from GPT-4 Turbo.

At half the cost and significantly faster inference speed, I'm sure this is a good tradeoff for other use cases though.

lwansbrough

2024-05-13
Very impressive. Please provide a voice that doesn't use radio jingle intonation, it is really obnoxious.

I'm only half joking when I say I want to hear a midwestern blue collar voice with zero tact.

tr3ntg

2024-05-13
Copied and pasted the robot image journaling prompt, and it simply cannot produce legible text. The first few words work, but the rest becomes gibberish. I wonder if there's weird prompt engineering squeezing out that capability, or if it's a one in a million chance.

ajdoingnothing

2024-05-13
If there was any glimmer of hope for the "Rabbit R1" or "Humane AI Pin", it can now be buried to dust.

bossyTeacher

2024-05-13
I will be the one to say it.

Progress is slowing down. Ever since GPT-3, the periods between releases have been getting longer and the improvements smaller. Your average non-techie investor is on the LLM hype train and is willing to dump a questionable amount of money on LLM development. Who is going to explain to them that the LLM hype train is slowly losing steam?

Hopefully, before the LLM hype dies, another [insert new ANN architecture here] will bring better results than LLMs and another hype cycle will begin.

Every time we make a new breakthrough, people think the discovery rate is going to be linear or exponential, when the beginning is closer to a logarithmic rate, with the tail end bringing diminishing returns.

unglaublich

2024-05-13
I hope we can disable the cringe American hyperemotions.

stavros

2024-05-13
I made a website with book summaries (https://www.thesummarist.net/) and I tested GPT-4o in generating one, and it was bad. It reminded me of GPT-3.5. I didn't test too much, but preliminary results don't look good.

glenstein

2024-05-13
Text access rolling out today, apparently:

>GPT-4o’s text and image capabilities are starting to roll out today in ChatGPT. We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits.

Anyone have access yet? Not there for me so far.

m3kw9

2024-05-13
The big news is that this is gonna be free

wesleyyue

2024-05-13
If anyone wants to try it for coding, I just added support for GPT4o in Double (https://double.bot)

In my tests:

* I have a private set of coding/reasoning tests and it's been able to ace all of them so far, beating Opus, GPT4-Turbo, and Llama 3 70b. I'll need to find even more challenging tests now...

* It's definitely significantly faster, but we'll see how much of this is due to model improvements vs over provisioned capacity. GPT4-Turbo was also significantly faster at launch.

loveiswork

2024-05-13
While I do feel a bit of "what is the point of my premium sub", I'm really excited for these changes.

Considering our brain is a "multi-modal self-reinforcing omnimodel", I think it makes sense for the OpenAI team to work on making more "senses" native to the model. Doing so early will set them up for success when future breakthroughs are made in greater intelligence, self-learning, etc.

65

2024-05-13
Time to bring back Luddism.

OutOfHere

2024-05-13
I am observing an extremely high rate of text hallucinations with gpt-4o (gpt-4o-2024-05-13) as tested via the API. I advise extreme caution with it. In contrast, I see no such concern with gpt-4-turbo-preview (gpt-4-0125-preview).

mtam

2024-05-13
GPT-4o is very fast but seems to generate some very random ASCII Art compared to GPT-4 when text in the art is involved.

ta-run

2024-05-13
This looks too good to be true? What's the catch?

Also, I wasn't expecting the perf to improve by 2x.

0xbadc0de5

2024-05-13
As a paid user, it would have been nice to see something that differentiates that investment from the free tier.

The tech demos are cool and all - but I'm primarily interested in the correctness and speed of ChatGPT and how well it aligns with my intentions.

roschdal

2024-05-13
Chat GPT-4o (OOOO!) - the largest electricity bill in the world.

unouplonk

2024-05-13
The realtime end-to-end audio situation is especially interesting as the concept has been around for a while but there weren't any successful implementations of it up to this point that I'm aware of.

See this post from November: https://news.ycombinator.com/item?id=38339222

razodactyl

2024-05-13
I think this is a great example of the bootstrapping that was enabled when they pipelined the previous models together.

We do this all the time in ML. You can generate a very powerful dataset using these means and further iterate with the end model.

What this tells me now is that the runway to GPT5 will be laid out with this new architecture.

It was a bit cold in Australia today. Did you Americans stop pumping out GPU heat temporarily with the new model release? Heh

therealmarv

2024-05-13
after watching the OpenAI videos I'm looking at my sad Google Assistant speaker in the corner.

Come on Google... you can update it.

bogwog

2024-05-13
I was about to say how this thing is lame because it sounds so forced and robotic and fake, and even though the intonations do make it sound more human-like, it's very clear that they made a big effort to make it sound like natural speech, but failed.

...but then I realized that's basically the kind of thing Data from Star Trek struggles with as part of his character. We're almost in that future, and I'm already falling into the role of the ignorant human that doesn't respect androids.

dev1ycan

2024-05-13
I think people who are excited should look at the empty half of the glass here: this is pretty much an admission that they are struggling to get past GPT-4 at a significant scale.

Not that they have to be scared yet; I mean, Google has yet to release their vaporware Ultra model that is supposedly like 1% better than GPT-4 in some metrics...

I smell an AI crash coming in a few years if they can't actually get this stuff usable for day-to-day life.

garyrob

2024-05-13
So far, I'm impressed. It seems to be significantly better than GPT-4 at accessing current online documentation and forming answers that use it effectively. I've been asking it to do so, and it has.

Hugsun

2024-05-13
Very interesting and extremely impressive!

I tried using the voice chat in their app previously and was disappointed. The big UX problem was that it didn't try to understand when I had finished speaking. English is my second language, and I paused a bit too long while thinking of a word, so it just started responding to my obviously half-spoken sentence. Trying again just became stressful, as I had to rush my words out to avoid an annoying response to an unfinished thought.

I didn't try interrupting it but judging by the comments here it was not possible.

It was very surprising to me to be so overtly exposed to the nuances of real conversation. Just this one thing of not understanding when it's your turn to talk made the interaction very unpleasant, more than I would have expected.

On that note, I noticed that the AI in the demo seems to be very rambly. It almost always just kept talking and many statements were reiterations of previous ones. It reminded me of a type of youtuber that uses a lot of filler phrases like "let's go ahead and ...", just to be more verbose and lessen silences.

Most of the statements by the guy doing the demo were interrupting the AI.

It's still extremely impressive but I found this interesting enough to share. It will be exciting to see how hard it is to reproduce these abilities in the open, and to solve this issue.
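
The failure mode described above is essentially fixed-threshold endpointing. A minimal sketch of the naive approach (all names and numbers hypothetical) shows why no single silence threshold works for someone pausing to find a word:

    def is_silent(frame, amplitude_threshold=500):
        """Crude voice-activity check: mean absolute amplitude below a threshold."""
        return sum(abs(s) for s in frame) / len(frame) < amplitude_threshold

    def end_of_turn(frames, silence_ms=700, frame_ms=20):
        """Naive endpointing: declare the turn over after a fixed run of silence.

        Too short a threshold cuts off a speaker who pauses mid-sentence
        (the problem described above); too long makes every reply feel laggy.
        """
        silent_run = 0
        for frame in frames:
            if is_silent(frame):
                silent_run += frame_ms
                if silent_run >= silence_ms:
                    return True  # turn over: start responding
            else:
                silent_run = 0
        return False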

grantsucceeded

2024-05-13
It seems like the ability to interrupt is more like an interrupt in the computer sense... a Ctrl-C (or Ctrl-S tty flow control, for you old timers), not a cognitive evaluation followed by a "reasoned" decision to pause voice output. Not that it matters, I guess; it's just not general intelligence, it's just flow control.

But also, that's why it fails a real Turing test: a real person would be irritated as fuck by the interruptions.
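
In that spirit, barge-in really can be implemented as plain flow control; a sketch (all names hypothetical) where the playback loop just checks a flag between chunks:

    import threading
    import time

    stop_speaking = threading.Event()

    def play(chunk):
        """Stand-in for an audio-output call; just simulates playback time."""
        time.sleep(0.05)

    def speak(chunks):
        """Stream a reply chunk by chunk; checking an event between chunks
        is all the 'ability to be interrupted' requires."""
        for chunk in chunks:
            if stop_speaking.is_set():
                break  # user barged in: drop the rest of the reply
            play(chunk)

    # Elsewhere, a microphone thread calls stop_speaking.set()
    # the moment voice activity is detected.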

due-rr

2024-05-13
It takes the #1 and #2 spots on the aider code leaderboard [1].

[1]: https://aider.chat/docs/leaderboards/

tgtweak

2024-05-13
I feel like GPT-4 has gotten progressively less useful since release, despite all the "updates" and training. It seems to give correct but vague answers (political, even) more and more, instead of actual results. It also tends to run short and give brief replies vs. full-length replies.

I hope this isn't an artifact of optimizing for scores rather than actual function. Likewise, it would be disheartening, but not unheard of, for them to reduce the performance of the previous model when releasing a new one in order to make the upgrade feel like that much more of an upgrade. I know this is certainly the case with cellphones (even though the claim is that it is unintentional), but I can't help but think the same could be true here.

All of this comes amid news that GPT-5, based on a new underlying model, is not far off, and that GPT-4 (& 4o) may become the new GPT-3.5-turbo for most apps that are currently trying to optimize the costs of their use of the service.

blixt

2024-05-13
I don't see any details on how API access to these features will work.

This is the first true multimodal network from OpenAI, where you can send an image in and retain the visual properties of the image in the output from the network (previously the input image would be turned into text by the model, and sent to the Dall-E 3 model which would provide a URL). Will we get API updates to be able to do this?

Also, will we be able to tap into a realtime streaming instance through the API to replicate the audio/video streams shown in the demos? I imagine from the Be My Eyes partnership that they have some kind of API like this, but will it be opened up to more developers?

Even disregarding streaming, will the Chat API receive support for audio input/output as well? Previously one might've used a TTS model to voice the output from the model, but with a truly multimodal model the audio output will contain a lot more nuance that can't really be expressed in text.

ComputerGuru

2024-05-13
I have some questions/curiosities from a technical implementation perspective that I wonder if someone more in the know about ML, LLMs, and AI than I would be able to answer.

Obviously there's a reason for dropping the price of gpt-4o but not gpt-4t. Yes, the new tokenizer has improvements for non-English tokens, but that can't be the bulk of the reason why 4t is more expensive than 4o. Given the multimodal training set, how is 4o cheaper to train/run than 4t?

Or is this just a business decision: anyone with an app they're not immediately updating from 4t to 4o continues to pay a premium, while they can offer a cheaper alternative to those asking for it (kind of like a coupon policy)?

cchance

2024-05-13
HOW ARE PEOPLE NOT MORE EXCITED? He's cutting off the AI mid-sentence in these, and it's pausing to readjust at damn near realtime latency! WTF, that's a MAJOR step forward. What the hell is GPT-5 going to look like?

That realtime translation would be amazing as an option in, say, Skype or Teams: set each individual's native language and handle automated translation. Shit, tie it into ElevenLabs to replicate your voice as well! Native translation in realtime with your own voice.

kleiba

2024-05-13
I cannot believe that that overly excited giggle tone of voice you see in the demo videos made it through quality control?! I've only watched two videos so far and it's already annoying me to the point that I couldn't imagine using it regularly.

caseyy

2024-05-13
Few people are talking about it but... what do you think about the very over-the-top enthusiasm?

To me, it sounds like TikTok TTS, it's a bit uncomfortable to listen to. I've been working with TTS models and they can produce much more natural sounding language, so it is clearly a stylistic choice.

So what do you think?

fnordpiglet

2024-05-13
I'm a huge user of GPT-4 and Opus in my work, but I'm a huge user of GPT-4 Turbo voice in my personal life. I use it on my commutes to learn all sorts of stuff. I had never understood the details of cameras and the relationship between shutter speed, aperture, and ISO in a modern DSLR, which, given the aurora, was important. We talked it through and I got to an understanding in a way that reading manuals and textbooks never really gave me. I learn much better by being able to talk and hear and ask questions and get responses.

Extend this to quantum foam, to ergodic processes, to entropic force, to Darius and Xerxes, to poets of the 19th century: it's changed my life. Really glad to see an investment in streamlining this flow.

fekunde

2024-05-13
Just something I noticed in the Language tokenization section: both examples translate to "Hello, my name is GPT-4o. I am a new kind of language model. Nice to meet you!", but when referring to itself, it uses the female form in Marathi:

नमस्कार, माझे नाव जीपीटी-4o आहे| मी एक नवीन प्रकारची भाषा मॉडेल आहे| तुम्हाला भेटून आनंद झाला!

and the male form in Hindi:

नमस्ते, मेरा नाम जीपीटी-4o है। मैं एक नए प्रकार का भाषा मॉडल हूँ। आपसे मिलकर अच्छा लगा!

cchance

2024-05-13
Wow Vision Understanding blew Gemini Pro 1.5 out of the water

ElemenoPicuares

2024-05-13
I'm so happy seeing this technology flourish! Some call it hype, but this much increased worker productivity is sure to spike executive compensation. I'm so glad we're not going to let China win by beating us to the punch tanking hundreds of thousands, if not millions of people's income without bothering to see if there's a sane way to avoid it. What good are people, anyway if there isn't incredible tech to enhance them with?

bigyikes

2024-05-13
The AI duet really starts to hint at what will make AI so powerful. It’s not just that they’re smart, it’s that they can be cloned.

If your wallet is large enough, you can make 2 GPTs sing just as easily as you can make 100 GPTs sing.

What can you do with a billion GPTs?

cchance

2024-05-13
Wait, I thought it said available to free users... I don't see it on ChatGPT.

Erazal

2024-05-13
I'm not so much surprised by the capabilities of the new model (IMHO the same as GPT-4) as by its real-time capabilities.

My brother, who can't see well, will use this to cook a meal without me explaining things to him. It's so cool.

People all around the world will now get real-time AI assistance for a ton of queries.

Heck - I have a meeting bot API company (https://aimeetingbot.com) and that makes me really hyped!

EternalFury

2024-05-13
Pretty responsible progress management by OpenAI.

Kicking off another training wave is easy, if you can afford the electricity, but without new, non-AI tainted datasets or new methods, what’s the point?

So, in the meantime, make magic with the tool you already have, without freaking out the politicians or the public.

Wise approach.

localfirst

2024-05-13
50% cheaper than GPT-4 Turbo...

But this falls short of the GPT-5 we were promised last year.

edit: ~~just tested it out and it seems closer to Gemini 1.5~~ and it is faster than Turbo...

edit: it's basically GPT 3.9. Not quite 4, definitely not 3.5. Just not sure if the prices make sense.

mupuff1234

2024-05-13
The stock market doesn't seem too impressed - GOOG rebounded from strong red to neutral.

nuz

2024-05-13
Yet another release right before Google releases something, this time right before Google I/O. Third time they've done this, by my count.

nestorD

2024-05-13
The press statement shows consistent image generation and other image manipulation (depicting the same character in different poses, taking a photo and generating a caricature of the person, etc.) that does not seem to be deployed to the chat interface.

Will they be deployed? They would make OpenAI's image model significantly more useful than the competition.

jpeter

2024-05-13
Impressive way to gather more training data

mindcandy

2024-05-13
Ohhhhhhhh, boy... Listening to all that emotional vocal inflection and feedback... There are going to be at least 10 million lonely guys with new AI girlfriends. "She's not real. But she's interested in everything I say and excited about everything I care about" is enough of a sales pitch for a lot of people.

hamilyon2

2024-05-13
Image editing capabilities are... nice. Not there yet.

Whatever I was doing with ChatGPT-4 became faster. Instant win.

My test benchmark questions: still all negative, so reasoning on out-of-distribution puzzles is still failing.

aero-glide2

2024-05-13
Not very impressed. It's been 18 months since ChatGPT; I would have expected more progress. It looks like we have reached the limit of LLMs.

michaelmior

2024-05-13
Obviously not a standalone device, but it sounds like what the Rabbit R-1 was supposed to be.

sebringj

2024-05-13
What struck me was the interruptions to the AI speaking which seemed commonplace by the team members in the demo. We will quickly get used to doing this to AIs and we will probably be talking to AIs a lot throughout the day as time progresses I would imagine. We will be trained by AIs to be rude and impatient I think.

yreg

2024-05-13
Where's the Mac app?

They talk about it like it's available now (with Windows app coming soon), but I can't find it.

testfrequency

2024-05-13
Bravo. I've been really impressed with how quickly OpenAI leveraged their stolen data to build such a human-like model with near real-time pivoting.

I hope OpenAI continues to steal artists work, artists and creators keep getting their content sold and stolen beyond their will for no money, and OpenAI becomes the next trillion dollar company!

Big congrats are in order for Sam, the genius behind all of this, the world would be nothing without you

vvoyer

2024-05-13
The demo is very cool. A few criticisms:

- The AI doesn't know when to stop talking, and the presenter had to cut it off every time (the usual "AI-splaining", I guess).

- The AI's voice and tone were a bit too much; it sounded too fake.

rpmisms

2024-05-13
This is remarkably good. I think that in about 2 months, when the voice responses are tuned a little better, it will be absolutely insane. I just used up my entire quota chatting with an AI, and having a really nice conversation. It's a decent conversationalist, extremely knowledgeable, tells good jokes, and is generally very personable.

I also tested some rubber duck techniques, and it gave me very useful advice while coding. I'm very impressed. With a lot of spit and polish, this will be the new standard for any voice assistant ever. Imagine these capabilities integrated with your phone's built-in functions.

angryasian

2024-05-13
Why does this whole thread sound like OpenAI marketing department is participating ? Ive been talking to google assistant for years. I really don't find anything that magical or special.

jononor

2024-05-13
I am glad to see focus on user interface and interaction improvements. Even if I am not a huge fan of voice interfaces, I think that being able to interact in real-time will make working together with an AI be much more interesting and efficient. I actually hope they will take this back into the text based models. Current ChatGPT is sooo slow - both in starting to respond, typing things out, and also being overly verbose. I want to collaborate at the speed of thought.

poniko

2024-05-13
Damn, that was a big leap.

freediver

2024-05-13
Impressed by the model so far. As far as independent testing goes, it is topping our leaderboard for chess puzzle solving by a wide margin now:

https://github.com/kagisearch/llm-chess-puzzles?tab=readme-o...

spaceman_2020

2024-05-13
Oh man, listening to the demos and the way the female AI voice laughed and giggled... there are going to be millions of lonely men who will fall in love with these.

Can't say whether that's good or bad.

s1k3s

2024-05-13
This is some I, Robot level stuff. That being said, I still fail to see the real world application of this thing, at least at a scalable affordable cost.

pcj-github

2024-05-13
The thing that creeps me out is that when we hook this up as the new Siri or whatever, the new LLM training data will no longer be WWW-text+images+youtube etc but rather billions of private human conversations and direct smartphone camera observations of the world.

There is no way that kind of training data will be accessible to anyone outside a handful of companies.

BonoboIO

2024-05-13
I opened ChatGPT and I already have access to the model.

GPT-4 was a little lazy and very slow the last few days, and this 4o model blows it out of the water in speed and in following my instructions to give me the full code, not just a snippet of what changed.

I think it’s a nice upgrade.

vijaykodam

2024-05-13
New GPT-4o is not yet available when I tried to access ChatGPT from Finland. Are they rolling it out to Europe later?

laplacesdemon48

2024-05-13
I recently subscribed to Perplexity Pro and prior to this release, was already strongly considering discontinuing ChatGPT Premium.

When I first subscribed to ChatGPT Premium late last year, the natural language understanding superiority was amazing. Now the benchmark advances, low latency voice chat, Sora, etc. are all really cool too.

But my work and day-to-day usage really rely on accurately sourced/cited information. I need a way to comb through an ungodly amount of medical/scientific literature to form/refine hypotheses. I want to figure out how to hard reset my car's navigation system without clicking through several SEO-optimized pages littered with ads. I need to quickly confirm scientific facts, some obscure, with citations and without hallucinations. From speaking with my friends in other industries (e.g. finance, law, construction engineering), this is their major use case too.

I really tried to use ChatGPT Premium's Bing powered search. I also tried several of the top rated GPTs - Scholar AI, Consensus, etc.. It was barely workable. It seems like with this update, the focus was elsewhere. Unless I specify explicitly in the prompt, it doesn't search the web and provide citations. Yeah, the benchmark performance and parameter counts keep impressively increasing, but how do I trust that those improvements are preventing hallucinations when nothing is cited?

I wonder if the business relationship between Microsoft and OpenAI is limiting their ability to really compete in AI driven search. Guessing Microsoft doesn't want to disrupt their multi-billion dollar search business. Maybe the same reason search within Gemini feels very lacking (I tried Gemini Advanced/Ultra too).

I have zero brand loyalty. If anybody has a better suggestion, I will switch immediately after testing.

serf

2024-05-13
I wish they would match the TTS/real-time chat capabilities of the mobile client to the web client.

it's stupid having to pull a phone out in order to use the voice/chat-partner modes.

(yes I know there are browser plugins and equivalent to facilitate things like this but they suck, 1) the workflows are non-standard, 2) they don't really recreate the chat interface well)

erickhill

2024-05-13
I think it’s safe to say Siri and Alexa are officially dead. They look like dusty storefront mannequins next to Battlestar replicants at this point.

foobar_______

2024-05-13
So much negativity. Is it perfect? No. Is there room for improvement? Definitely. I don't know how you can get so fucking jaded that a demo like this doesn't at least make you a little bit excited or happy or feel awestruck at what humans have been able to accomplish?

readingnews

2024-05-13
I am still baffled that I cannot use a VoIP number to register, even if it accepts TXT/SMS. If I have a snappy new startup and we go all-in on VoIP, I guess we cannot use (or pay to use) OpenAI?

TaupeRanger

2024-05-13
I don't get it...I just switched to the new model on my iPhone app and it still takes several seconds to respond with pretty bland inflection. Is there some setting I'm missing?

MyFirstSass

2024-05-13
With the speed of the seemingly exponential developments in this field, I wouldn't be surprised if suddenly the entire world tilted and a pair of goggles fell from my face. But a dream.

pharos92

2024-05-13
I really hope this shit burns soon.

karmasimida

2024-05-13
I think GPT-4o does have an advantage in hindsight: it will push this product to consumers much faster and build a revenue base while other companies are playing catch-up.

tvoybot

2024-05-13
With our platform you can ALREADY use it to automate your business and sales!

Create your gpt4o chatbot with our platform tvoybot.com?p=ycombinator

hintymad

2024-05-13
Maybe this is yet another wake-up call to startups: wrapping another company's APIs to offer convenience or an incremental improvement is not a viable business model. If your wrapper turns out to be successful, the company that provides the API will just incorporate your business as a set of new features with better usability, faster response time, and a lower price.

AndreMitri

2024-05-13
The amount of "startups" creating wrappers around it and calling it a product is going to be a nightmare. But other than that, it's an amazing announcement and I look forward to using it!

wingworks

2024-05-13
Is this a downloadable app? I don't see it on the iOS app store.

screye

2024-05-13
The demo was whelming, but the tech is incredible.

It took me a few hours of digesting twitter experiments before appreciating how impressive this is. Kudos to the openai team.

A question that won't get answered : "To what degree do the new NVIDIA gpus help with the realtime latency?"

benromarowski

2024-05-13
Is the voice Kristen Wiig?

gardenhedge

2024-05-13
Noticeably saying "person" versus "man" or "woman". To the trainers: "man" and "woman" are not offensive!

woah

2024-05-13
This is pretty amazing, but it was funny still hearing the ChatGPT "voice" of somewhat fake-sounding enthusiasm, restating what the human said with exaggeration.

ksaj

2024-05-13
A test I've been using for each new version still fails.

Given the lyrics for Three Blind Mice, I try to get ChatGPT to create an image of three blind mice, one of which has had its tail cut off.

It's pretty much impossible for it to get this image straight. Even this new 4o version.

Its ability to spell in images has greatly improved, though.

avi_vallarapu

2024-05-13
Someone said GPT-4o can replace a tutor or a teacher in schools. Well, that's going way too far.

LarsDu88

2024-05-13
Good lord, that voice makes Elevenlabs.io look... dead

DonHopkins

2024-05-13
ChatGPT 4o reminds me of upgrading from a 300 baud modem to a 1200 baud modem, when modems used to cost a dollar a baud.

simonw

2024-05-13
I added gpt-4o support to my LLM CLI tool:

    pipx install llm
    llm keys set openai
    # Paste API key here
    llm -m 4o "Fascinate me"
Or if you already have LLM installed:

    llm install --upgrade llm
You can install an older version from Homebrew and then upgrade it like that too:

    brew install llm
    llm install --upgrade llm
Release notes for the new version here: https://llm.datasette.io/en/stable/changelog.html#v0-14

gsuuon

2024-05-13
Are these multimodal models able to discern the input voice's tone? Really curious whether they can detect sarcasm or emotional content (or even something like mispronunciation).

rareitem

2024-05-13
Can’t wait to get interviewed by this model!

yeknoda

2024-05-13
feature request: please let me change the voice. it is slightly annoying right now. way too bubbly, and half the spoken information is redundant or not useful. too much small talk and pleasantries or repetition. I'm looking for an efficient, clever, servant not a "friend" who speaks to me like I'm a toddler. felt like I was talking to a stereotypical American with a Frappuccino: "HIIIII!!! EVERYTHING'S AMAZING! YOU'RE BEAUTIFUL! NO YOU ARE!"

maybe some knobs for the flavor of the bot:

- small talk: gossip girl <---> stoic Aurelius

- information efficiency or how much do you expect me to already know, an assumption on the user: midwit <--> genius

- tone spectrum: excited Scarlett, or whatever it is now <---> Feynman the butler

thinking_wizard

2024-05-13
It's crazy that Google has the YouTube dataset and still lost on multimodal AI.

richardw

2024-05-13
Apple and Google, you need to get your personal agent game going because right now you’re losing the market. This is FREE.

Tweakable emotion and voice, watching the scene, cracking jokes. It’s not perfect but the amount and types of data this will collect will be massive. I can see it opening up access to many more users and use cases.

Very close to:

- A constant friend

- A shrink

- A teacher

- A coach who can watch you exercise and offer feedback

…all infinitely patient, positive, helpful. For kids that get bullied, or whose parents can’t afford therapy or a coach, there’s the potential for a base level of support that will only get better over time.

tgtweak

2024-05-13
It really feels like the quality of GPT-4's responses got progressively worse as the year went on... it seems to give political answers now instead of an earnest response. The responses also feel lazier than they were at the outset of GPT-4's release.

I am not saying this is what they're doing, but it DOES feel like they are hindering the previous model to make the new one stand out that much more. The multimodal improvements in this release are certainly impressive, but I can't help feeling that the subjective quality of GPT-4 has dipped.

Hopefully this signals that gpt5 is not far off and should stand out significantly from the crowd.

XCSme

2024-05-13
I assume there's no reason to use GPT-4-turbo for API calls, as this one is supposedly better and 2x cheaper.
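
If that holds, migrating is just a model-name change in the chat completions call; a minimal sketch with the openai Python client (v1.x assumed, and the price/quality claims are OpenAI's, not mine):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # identical request shape; only the model name changes
    response = client.chat.completions.create(
        model="gpt-4o",  # previously "gpt-4-turbo"
        messages=[{"role": "user", "content": "Say hello"}],
    )
    print(response.choices[0].message.content)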

jcmeyrignac

2024-05-13
Sorry to nitpick, but in the language tokenisation section, the French text is incorrect. In French, exclamation marks are preceded by a space: "c'est un plaisir de vous rencontrer!" should be "c'est un plaisir de vous rencontrer !"

jessenaser

2024-05-13
The crazy part is GPT-4o is faster than GPT-3.5 Turbo now, so we can see a future where GPT-5 is the flagship and GPT-4o is the fast cheap alternative. If GPT-4o is this smart and expressive now with voice, imagine what GPT-5 level reasoning could do!

system2

2024-05-13
Realtime videos? Probably their internal tools. I am testing GPT-4o right now and responses come in 6-10 seconds, the same experience as GPT-4 text. What's up with the realtime claims?!

cal85

2024-05-13
We've had voice input and voice output with computers for a long time, but it's never felt like spoken conversation. At best it's a series of separate voice notes. It feels more like texting than talking.

These demos show people talking to artificial intelligence. This is new. Humans are more partial to talking than writing. When people talk to each other (in person or over low-latency audio) there's a rich metadata channel of tone and timing, subtext, inexplicit knowledge. These videos seem to show the AI using this kind of metadata, in both input and output, and the conversation even flows reasonably well at times. I think this changes things a lot.

perfmode

2024-05-13
Is that conversational UI live?

cdeutsch

2024-05-13
Creepy AF

titzer

2024-05-13
Can't wait for this AI voice assistant to tell me in a sultry voice how I should stay in an AirBnB about 12 times a day.

jimkleiber

2024-05-13
I worry that this tech will amplify the cultural values we have of "good" and "bad" emotions way more than the default restrictions that social media platforms put on the emoji reactions (e.g., can't be angry on LinkedIn).

I worry that the AI will not express anger, not express sadness, not express frustration, not express uncertainty, and many other emotions that the culture of the fine-tuners might believe are "bad" emotions and that we may express a more and more narrow range of emotions going forward.

Almost like it might become an AI "yes man."

JSDevOps

2024-05-13
Google must be shitting it right now.

joak

2024-05-13
Voice input makes sense: speaking is a lot faster than typing. But I prefer my output as text, since reading is a lot faster than listening to text read out loud.

I'm not sure that computers mimicking humans makes sense; you want your computer to be the best possible, better than humans where possible. Written output is clearly superior, and faking emotions does not add much in most contexts.

kulor

2024-05-13
The biggest wow factor was the effect of the reduced latency, followed closely by the friendly human personality. There's an uncanny-valley barrier, but this feels like a short-term teething problem.

sftombu

2024-05-13
GPT-4o's breakthrough memory -- https://nian.llmonpy.ai/

AI_beffr

2024-05-13
I absolutely hate this. We are going to destroy society with this technology. We can't continue to enjoy the benefits of human society if humans are replaced by machines. I hate seeing these disgusting people smugly parade this technology. It makes me so angry that they are destroying human society and all I can do is sit here and watch.

jonplackett

2024-05-13
This video is brilliantly accidentally hilarious. They made an AI girlfriend that hangs on your every word and thinks everything you say is genius and hilarious.

pamelafox

2024-05-13
I just tested out using GPT-4o instead of gpt-4-turbo for a RAG solution that can reason on images. It works, with some changes to our token-counting logic to account for the new model/encoding (update to the latest tiktoken!).

I ran some speed tests for a particular question/seed. Here are the times to first token, in seconds:

gpt-4-turbo:

* avg 3.69

* min 2.96

* max 4.91

gpt-4o:

* avg 2.80

* min 2.28

* max 3.39

That's for the messages in this gist: https://gist.githubusercontent.com/pamelafox/dc14b2188aaa38a...

Quality seems good as well. It'll be great to have better multi-modal RAG!
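
For anyone making the same token-counting update, a minimal sketch of the tiktoken change (assuming a tiktoken version recent enough to know about gpt-4o):

    import tiktoken

    # gpt-4o uses the new o200k_base encoding (~200k vocab);
    # gpt-4-turbo and earlier GPT-4 models used cl100k_base
    enc = tiktoken.encoding_for_model("gpt-4o")
    print(enc.name)  # o200k_base

    # token counts differ between encodings, so counting logic
    # must pick the encoding based on the model in use
    print(len(enc.encode("It's a pleasure to meet you!")))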

teleforce

2024-05-13
Nobody in the comments seems to notice or care about GPT-4o's new capability for performing searches based on RAG. As far as I am concerned, this is the most important feature people have been waiting for in ChatGPT-4, especially if you are doing research. Just testing on one particular topic I'm familiar with, first with GPT-4 and then with GPT-4o, the quality of the latter's responses is very promising indeed.

nilsherzig

2024-05-13
Imagine having to interact with this thing in an environment where it is in the power position.

Being in a prison with this voice as your guard seems like a horrible way to lose your sanity. This aggressive friendliness combined with no real emotions seems like a very easy way to break people.

There are these stories about Nazis working at concentration camps having to drink an insane amount of alcohol to keep themselves going (not trying to excuse their actions). This thing would just do it, while being friendly at the same time. The amount of hopelessness someone would experience in the custody of a system like this is truly horrific.

Capricorn2481

2024-05-13
I'm surprised they're limiting this API. They haven't even opened up the image API in GPT-4 Turbo, have they?

zedin27

2024-05-13
I am not fluent in Arabic at all, and being able to use this as a tool to have a conversation will make people more dependent on it. We are approaching a new era where we will not be "independently" learning a language, but ignoring the work of learning it beforehand. A double-edged sword.

xyc

2024-05-13
It seems that no client-side changes are needed for gpt-4o chat completion.

I added a custom OpenAI endpoint to https://recurse.chat (I built it) and it just works: https://twitter.com/recursechat/status/1790074433610137995
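
For anyone curious, a minimal sketch of what "just works" means here (the base_url shown is the standard OpenAI one for illustration; this isn't recurse.chat's actual internals):

    from openai import OpenAI

    # any OpenAI-compatible endpoint; the default OpenAI base URL
    # is spelled out explicitly here
    client = OpenAI(base_url="https://api.openai.com/v1")

    response = client.chat.completions.create(
        model="gpt-4o",  # the only client-side change needed
        messages=[{"role": "user", "content": "ping"}],
    )
    print(response.choices[0].message.content)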

awfulneutral

2024-05-13
In the customer support example, he tells it his new phone doesn't work, and then it just starts making stuff up like how the phone was delivered 2 days ago, and there's physically nothing wrong with it, which it doesn't actually know. It's a very impressive tech demo, but it is a bit like they are pretending we have AGI when we really don't yet.

(Also, they managed to make it sound exactly like an insincere, rambling morning talk show host - I assume this is a solvable problem though.)

Alifatisk

2024-05-13
I thought they would release a competitor to Perplexity? Was this it?

sarreph

2024-05-13
The degree to which the hosts interrupted the voice assistant today worries me; we're about to instil that as normal behaviour for future generations.

itissid

2024-05-13
Since the blog says it's only image, text, and audio input, does GPT-4o likely have a YOLO-like model on the phone to pre-process the video frames and send bounding boxes to the server?

grfn

2024-05-13
Feels like really good engineering in the wrong direction. Who said that audio is a good interface anyway? Audio is hard to edit, slow, and has low information density. If I want a low-information but pleasant exchange, I can just talk to real people; I don't need computers for it.

I guess it is useful for some casual uses, but I really wish there was more focus on the reasoning and intelligence of the model itself.

darajava

2024-05-13
Something I've noticed people do more of recently, for whatever reason, is talking over others. I've noticed in these demos that the people interacting with 4o interrupt it as if that's the correct way of controlling it. It felt unnatural when I saw it happen, and I even have a hard time interrupting Siri, but I wonder if this is going to ingrain the habit in people even more.

splatzone

2024-05-13
Honestly, the eager flirtatiousness of the AI in the demos, in conversation with these awkward engineers, really turns me off. It feels like a male power fantasy.

Nonetheless, very impressive.

jononomo

2024-05-13
I wonder if GPT-4o is a Christian?

theckel

2024-05-13
I'd love to know when streaming is going to come to the gpt-4o API...
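
Presumably it will look the same as streaming with the other chat models; a sketch under that assumption:

    from openai import OpenAI

    client = OpenAI()

    # assumes gpt-4o accepts the same stream=True flag as other chat models
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Fascinate me"}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta is not None:
            print(delta, end="", flush=True)
    print()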

synergy20

2024-05-13
So should I unsubscribe from OpenAI, since GPT-4o is now free for all?

FpUser

2024-05-13
I started to watch the video but had to stop after a few seconds. It is way too cheesy.

zeronone

2024-05-13
Quite amazing performance; however, ironically, the output for RTL languages doesn't read very well.

You might want to add `direction: rtl` to your `.text-right` CSS class. The punctuation marks etc are all off for RTL languages.

bilekas

2024-05-13
When a Google engineer was let go because he believed the AI was 'real', we all had a good debate over it.

Now OpenAI, which was supposed to be the 'free man's choice', is making advertisements selling the same idea.

This is a natural progression, since audio is one of the main ways we communicate, but it feels like they're holding back, slowly dripping out what they have to maintain hype and market relevance. They are clearly ahead, but it would be nice to get it all, openly. As they promised.

serf

2024-05-13
This is the first one I've gotten to answer HN user-profiling questions.

"I am interested in the user serf on Hacker News, spelled S E R F. Tell me about their tone of writing, expertise, and personality. From the tone of what you read, summarize their character."

Fascinating stuff. A weird, skewed introspection.

Gbotex

2024-05-13
Just an advanced Google.

moab9

2024-05-13
pretty cool, but why do the AIs have to sound like douchebags?

squigglydonut

2024-05-13
People will pay to dull their senses. This will make so much money!

davidhs

2024-05-13
Very impressive. Its programming skills are still kind of crappy, and I seriously doubt its reasoning capacity. It feels like it can deep-fake text prediction really well, but in essence there's still something wrong with it.

deepp805

2024-05-13
With 4o being free, can someone explain what the real benefit is of having Pro? For me, the main benefit was having a more powerful model, but if the free tier also offers this, I'm not really sure what I would be paying for.

the_doctah

2024-05-13
Did the one guy wear a leather jacket so the AI wouldn't point out that he's balding?

hackerlight

2024-05-13
So what's the difference between the different gpt2 chatbots on lmsys? Which one is deployed live now?

danans

2024-05-13
The sentence order of the Arabic and Urdu example text is scrambled on that page:

Arabic: مرحبًا، اسمي جي بي تي-4o. أنا نوع جديد من نموذج اللغة، سررت بلقائك!

Urdu: ہیلو، میرا نام جی پی ٹی-4o ہے۔ میں ایک نئے قسم کا زبان ماڈل ہوں، آپ سے مل کر اچھا لگا!

Even if you don't read Arabic or Urdu script, note that the 4 and the o are on opposite sides of the sentence. Despite that, pasting both into Google Translate actually fixes the error during translation. OpenAI ought to invest in some proofreaders for multilingual blog posts.

tompetry

2024-05-13
I've worked quite a bit with STT and TTS over the past ~7 years, and this is the most impressive and even startling demo I've seen.

But I would like to see how this is integrated into applications by third party developers where the AI is doing a specific job. Is it still as impressive?

The biggest challenge I've had with building any autonomous "agents" on generic LLMs is that they are overly gullible and accommodating, requiring a reversion to legacy chatbot logic trees etc. to stay on task and perform a job. Also, STT is rife with speaker interjections, leading to significant user frustration; people just want to talk to a person. Hard to see if this is really solved yet.

bambax

2024-05-13
> We recognize that GPT-4o’s audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities.

So they're using the same GPT4 model with a relatively small improvement, and no voice whatsoever outside of the prerecorded demos. This is not a "launch" or even an announcement. This is a demo of something which may or may not work in the future.

anotherpaulg

2024-05-13
GPT-4o tops the aider LLM code editing leaderboard at 72.9%, versus 68.4% for Opus. GPT-4o takes second on aider’s refactoring leaderboard with 62.9%, versus Opus at 72.3%.

GPT-4o did much better than the 4-turbo models, and seems much less lazy.

The latest release of aider uses GPT-4o by default.

https://aider.chat/docs/leaderboards/

AUDI_GUZZ

2024-05-13
this is really amazing!!

pama

2024-05-13
It is absolutely amazing. Thank you to everyone at OpenAI!

pyuser583

2024-05-13
I’m really not impressed.

My academic background is in a field where there are lots of public misconceptions.

It does an absolutely terrible job.

Even basic textbook things where there isn’t much public misunderstanding are “rounded” to what sounds smart.

dingclancy

2024-05-13
This is the first demo where you can really sense that beating LLM benchmarks should not be the target. Just remember when the iPhone had meager specs but ultimately delivered a better phone experience than the competition.

This is the power of the model where you can own the whole stack and build a product. Open Source will focus on LLM benchmarks since that is the only way foundational models can differentiate themselves, but it does not mean it is a path to a great user experience.

So Open Source models like Llama will be here to stay, but it feels more like if you want to build a compelling product, you have to own and control your own model.

siliconc0w

2024-05-13
I much prefer a GLaDOS-type AI voice to one that approximates an endlessly happy, chipper, enthusiastic personal assistant. I think the AI tutor is probably the strongest for actual real-world value delivered; the rest are cool, but a bit questionable as far as actual pragmatic usefulness.

It'd be cool if an AI calling another AI would recognize it's talking to an AI, and the two would agree to ditch the fake conversational tone and shift into a high-bandwidth modem pitch to rapidly exchange information. Or upgradable offensive capabilities to outmaneuver the customer service agents when they try to decline your warranty or whatever.

1vuio0pswjnm7

2024-05-13
OpenAI keeps a copy of all conversations? Or mines them for commercially useful information?

Has OpenAI found a business model yet? Considering the high cost of the computation, is it reasonable to expect that OpenAI licensing may not be profitable? Will that result in "free" access for the purpose of surveillance and data collection?

Amazon had a microphone in people's living rooms, a so-called "smart speaker" to which people could talk. Alexa was a commercial failure.

ein0p

2024-05-13
What especially blows my mind is not GPT4o. It's that:

1. Nobody could convincingly beat GPT4 in over a year, despite spending billions of dollars trying.

2. There's GPT5 coming out sometime soon that will blow this out of the water and make paying $20/mo to OpenAI still worthwhile.

StarlaAtNight

2024-05-13
These AIs sure do yap a lot.

Also, they're TERRIBLE at harmonizing together

fnetisma

2024-05-13
What would be the difference in compute for inference on an audio<>audio model like this compared to a text<>text model?

xarope

2024-05-13
I'm surprised nobody has mentioned it, but this has shades of the universal translator from Star Trek.

We have tricorders now (mobile phones), and universal translators are looming... when is transporter technology going to get here?

captaincrunch

2024-05-13
I tried this for about 10 minutes, and went back to 4. Not really that great for what I am doing.

boppo1

2024-05-13
Let's start a betting pool on how long it takes BetterHelp to lay off their therapists for this thing.

timetraveller26

2024-05-13
Just over 10 years later, it's Her

andsoitis

2024-05-13
what does the "o" stand for?

stevetron

2024-05-13
I would have liked to see a version number in the prompt, or maybe a toggle in my settings, so that I can be certain I am using ChatGPT 3.5 and then, if I need an image or screenshot analyzed, switch to the limited 4o model. Having my limited availability of 4o be what gets used, and then becoming unavailable because of some arbitrary quota I had no idea was being used up, is unconscionable policy. Also, having no link to email them about that is bad, too.

booleandilemma

2024-05-13
It's quite scary, honestly. In fact I can't remember the last time a demo terrified me, besides slaughterbots, and that was fictional. I just think about all the possibilities for misuse.

mysore

2024-05-13
does this make retell.ai obsolete?

plaidfuji

2024-05-13
This is a very cool demo - if you dig deeper there’s a clip of them having a “blind” AI talk to another AI with live camera input to ask it to explain what it’s seeing. Then they, together, sing a song about what they’re looking at, alternating each line, and rhyming with one another. Given all of the isolated capabilities of AI, this isn’t particularly surprising, but seeing it all work together in real time is pretty incredible.

But it’s not scary. It’s… marvelous, cringey, uncomfortable, awe-inspiring. What’s scary is not what AI can currently do, but what we expect from it. Can it do math yet? Can it play chess? Can it write entire apps from scratch? Can it just do my entire job for me?

We’re moving toward a world where every job will be modeled, and you’ll either be an AI owner, a model architect, an agent/hardware engineer, a technician, or just.. training data.

giuscri

2024-05-13
Can someone explain how you can interrupt this model with your voice? Where do I read more technical details about this?

no7hing

2024-05-13
Does anyone have technical insight into how the screensharing in the math tutor video works? It looks like they start the broadcast from within the ChatGPT app, yet have no option to select which app will be the source of the stream. Or is that implied when both apps reside in the iPad's split view? And is this using regular ReplayKit or something new?

doku

2024-05-13
FREE = Data Collection

radicality

2024-05-13
Am I using it wrong? I have the gpt plus subscription, and can select "gpt4o" from the model list on ChatGPT, but whichever example I try from the example list under "Explorations of capabilities" on `https://openai.com/index/hello-gpt-4o/`, my results are worse:

* "Poetic typography" sample: I paste the prompt, and get an image with the typical lack of coherent text, just mangled letters.

* "Visual Narratives: Robot Writer's Block" - Mangled letters also

* "Visual Narratives: Sally the mailwoman" - not following instructions about camera angle. Sally looks different in each subsequent photo.

* "Meeting Notes with multiple speakers" - I uploaded the exact same audio file and used input 'How many speakers in this audio and what happened?'. gpt4o went off about about audio sample rates, speaker diarization models, torchaudio, and how its execution environment is broken and can't proceed to do it.

antman

2024-05-13
Looking forward to seeing how one can fine-tune this.

chilling

2024-05-13
The similarity between this model and the movie "Her" [0] creeps me out so badly that I can't shake the feeling that our social interactions are on the brink of doom.

[0] https://youtu.be/GV01B5kVsC0?feature=shared&t=85

osigurdson

2024-05-13
I appear to have GPT-4o but the iPhone app seems to be largely the same - can't interrupt it, no "emotive" voice, etc. Is this expected?

dmje

2024-05-13
Being sarcastic and then putting the end result in front of Brits could be the new Turing Test

p1dda

2024-05-13
First impressions as a 1-year subscriber: I just tried GPT-4o on evaluating my code for suggestions and discussing other solutions, and it is definitely faster and comes up with new suggestions that GPT-4 didn't. Currently in the process of evaluating the suggestions.

The demo is what it is, designed to get a wow from the masses.

callen43

2024-05-13
This is clearly not just another story of human innovation, and not just the usual trade-off between risks and opportunities.

Why? Because it simply automates the human away. Who wouldn't opt for a seemingly flawless, super-effective buddy (i.e. an AI) that is never tired and always knows better? If you need some job done, if you're feeling lonely, when you need some life advice... It doesn't matter if it might be considered "just an imitation of a human".

Why would future advancements of it remain "just some tool" instead of largely replacing us (as humans) in jobs, relationships, ...?

tonyabracadabra

2024-05-13

TheAceOfHearts

2024-05-13
I wish the presentation had included an example of integration with a simple tool like a timer. Being able to set and dismiss a timer in casual conversation while multitasking would be a really great demo of integrated capabilities.

DeathArrow

2024-05-13
I am not impressed. We already have better models for text-to-speech and voice synthesis. What we see here is integration with an LLM. One can do it at home by combining Llama 3 with text-to-speech and voice synthesis.

What would amaze me would be for GPT-4 to have better reasoning capabilities and fewer hallucinations.

latentsea

2024-05-13
Is this actually available in the app in the same way they are demoing it here? I see the model is available to be selected, but the interface doesn't quite seem to allow me to use it in the way I see here.

Giorgi

2024-05-13
And no mention of hallucinations. I hope they were reduced.

sholladay

2024-05-13
The emphasis on multimodal made me wonder if it was capable of creating audio as output, so I asked it to make me a drum beat. It did so, but in text form. I asked it to convert it to audio. It thought for a while and eventually said it didn’t seem like `simpleaudio` was installed in its environment. Huh, interesting, never seen a response like that before. It clearly made an attempt to carry out my instructions but failed due to technical limitations of its backend. What else can I make it do? I asked it to install `simpleaudio`. It tried but failed with a connection error, presumably due to a firewall rule.

I asked it to run a loop that writes “hello” every ten seconds. Wow, not only did it do so, it’s streaming the stdout to me.

LLMs have always had various forms of injection attacks, ways to force them to reveal their prompts, etc. but this one seems deliberately designed to run arbitrary code, including infinite loops.

Alas, I doubt I can get it to mine enough bitcoin to pay for a ChatGPT subscription.

https://x.com/sethholladay/status/1790233978290516453
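
For scale, the loop I asked for is as trivial as it gets; my guess at what the interpreter ran (the generated code itself wasn't shown to me):

    import time

    # my guess at the generated snippet: an intentionally
    # unbounded loop, which the sandbox happily executed
    while True:
        print("hello", flush=True)
        time.sleep(10)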

xlbuttplug2

2024-05-13
people are scared and it shows :)

dynamite-ready

2024-05-13
At their core, I still think of these things as search engines, albeit super-advanced ones. But the emotion the agent conveys with its speech synth is completely new...

dlimeng

2024-05-13
After looking at the introduction, there doesn't seem to be much of an update in OpenAI's features: https://aidisruption.substack.com/p/ultimate-ai-gpt-4o-your-...

maaaaattttt

2024-05-13
Now that I see this, here is my wish (I know there are security and privacy concerns, but let's pretend they're not there for this wish): an app that runs on my desktop and has access to my screen(s) while I work. At any time I can ask it something about what's on the screen; it can jump in and let me know if it thinks I made a mistake (think pair programming) or has a suggestion (drafting a document). It can also quickly take over if I ask it to (copilot on demand).

Except for the last point and the desktop version, I think it's already done in the math demo video.

I guess it will also pretty soon refuse to let me come back inside the spaceship, but until then it'll be a nice ride.

gloosx

2024-05-13
New flagship... This is beginning to look like the smartphone world, and Sam Altman is the Steve Jobs of this stuff. At some point the tech will reach saturation and every next model will be just 10% faster, 2% less hallucination, more megapixels for images, etc. :)

potet

2024-05-13

mppm

2024-05-13
Such an impressive demo... but why did they have to give it this vapid, giggly socialite persona that makes me want to switch it off after thirty seconds?

jeisc

2024-05-13
A fake, saucy friend for alienated humans to chitchat with.

If the end user is in a war zone, will the AI bot still ask how it is going?

How many bombs fell in your neighborhood last night?

Fiahil

2024-05-13
Can I mix French and English when talking to it?

robblbobbl

2024-05-13
Haha, lol. Canceling the subscription was the best choice. But sure, it's new fancy cool magic sensational AGI, please give them all your money.

neeloor2004

2024-05-13
Can I try it for free?

nbzso

2024-05-13
Another new hit from his excellency Sam the Galactic Conmaster? Wow. The future is bright, right? :)

Idiocracy in full swing, dear Marvin.

digitcatphd

2024-05-13
Hopefully this is them turning over a new leaf. Making GPT-4 more accessible, cutting API costs, and making a general personal-assistant chatbot for the iPhone are a lot different from tracking down and destroying the business of every customer using their API, one by one. Let's hope this trend continues.

radres

2024-05-13
What jumped out at me in the "bunny ear" video: the bunny ears are not actually visible to the phone camera Greg is holding. Are they in the background feeding the production camera, and is this therefore not really a live demo?

laylower

2024-05-13
Set a memorable verification phrase with your friends and loved ones.

wiz21c

2024-05-13
That woman's voice intonation is just scary. Not because it talks really well, but because it is always happy, optimistic, enthusiastic. And this echoes what several of my employers idealized as a good employee.

That's terrifying, because these AIs become what their masters think an engaging human should be. It's quite close to what Boston Dynamics did some years ago. What did they show? That you can hit a robot very hard while it does its job, and then what? It just goes on without complaining. A perfect employee again.

That's very dystopian to me.

(But I'm impressed by the technical achievement.)

doubloon

2024-05-13
It's great tech and I thought I wanted it, but... after talking to it for a few hours I got this really bizarre, odd gut feeling of disturbance and discomfort, a disconnection from reality. It reminds me of wearing VR goggles: it's not just the physical issues, there is something psychologically disturbing about it. It won't even give itself a name. I honestly prefer Siri; even though she is incompetent, she is "honest" in her incompetence. Also, I left the thing on accidentally and it said it had an eight-hour chat with me, lol.

gizajob

2024-05-13
I, for one, welcome our new annoying-sounding AI overlords.

vbezhenar

2024-05-13
The more I get, the more I want. Exciting times. Can't wait for GPT-5.

reportgunner

2024-05-13
Nice, the landing page says "Try on ChatGPT" but when I create an account all I can try is GPT3.5.

I am not surprised.

can16358p

2024-05-13
Am I missing something?

I've picked GPT-4o model in ChatGPT app (I have the paid plan), started talking with the voice mode: both the responses are much slower than in the demo, and there is no way to interrupt the response naturally (I need to tap a button on screen to interrupt), and no way to open up camera and show around like the demo does.

ponorin

2024-05-13
While everyone's focusing on the audio capabilities (I haven't heard them yet), I find it amusing that the official demo of image generation ("Robot Writer's Block" in particular) can't even match the verbatim instruction, and the error isn't even consistent between generations, even though it should be aware of previous contexts. And this is their second generation of multimodal LLM capable of generating images.

Looks like LLMs are still gonna LLM for the near future.

cookiesnmilk

2024-05-13
This is straight-up Siri 2.0, nothing to see here, except we are now in the reasoning phase.

So by that logic: Step 1: Language, Step 2: Reasoning, Step 3: Understanding, Step 4: Meaning, Step 5: AGI.

kebman

2024-05-13
I can see so many military and intelligence applications for this! Excited isn't exactly the word I'd use, but... Certainly interesting! The civilian use will of course be marvellous though.

aae42

2024-05-13
It does make me uncomfortable that the way you typically interact with it is by interrupting it. It makes me want to tell it to be more concise so that I wouldn't have to do that.

dgellow

2024-05-13
A bit sad to see the desktop app is macOS only.

sim7c00

2024-05-13
I see a lot of fear around these new kinds of tools. I think, though, that criminals will always find ways to leverage new technology to their benefit, and we've always found ways to deal with that. This changes little. Additionally, as you are aware of this, so are the people creating this tech, and a lot of effort is underway to protect against malicious uses.

That won't stop criminal enterprises from implementing their own naughty tools, but these open models won't become some kind of holy grail for criminals to do as they please.

That being said, I do believe, now more than ever, that education worldwide should be adjusted to fit this new paradigm and maybe adapt quicker to such changes.

As some commenters pointed out, there are already good tools and techniques to counter malicious use of AI. Maybe they don't cover all use cases, but we need to educate people on using the tools available, and trust that researchers (like many of yourselves) are capable of innovations which will reduce risk even further.

There is no point and no benefit in being negative or full of fear. Go forward with positivity and creativity. Even if big tech gets regulated, some criminal enterprises have billions to invest too, so crippling big tech here will only play into their hands in the end.

Love these new innovations. And for the record, gpt4o still told me to 'push rip' on amd64... so RIP to it actually understanding stuff...

If you are smart enough to see some risks here, you might also be smart enough to positively contribute to improvements. Fear shuts things down, love opens them up. It's basic stuff.

This demo is amazing, not scary. It's positive advancement in technology, and it won't be stopped because people are afraid of it, so go with it and contribute in areas where you feel it's needed, even if it's just giving feedback. And when giving that, you all know a balanced and constructive approach works better than a negative and destructive one.

marcoslozada

2024-05-13
With GPT-4o I see two things:

1. Wonderful engineering

2. A stagnation in reasoning ability

Do you agree with me?

metflex

2024-05-13
Nothing is creepier than a human voice on a robot.

ssahoo

2024-05-13
I'm curious why they didn't create the Windows desktop app first, since Windows is the dominant desktop segment. For fear of competing with Microsoft's Copilot?

brutuscat

2024-05-13
The first thing I thought was how uncomfortable I felt with the way they cut off and interrupted the she-AI. I wonder if our children will end up being douchebags?

Other than that, it felt like magic, like that Google demo of the phone doing some task, like setting up an appointment over the phone talking to a real person.

Wheaties466

2024-05-13
Am I the only one that feels underwhelmed by this?

Yeah, it's cool and unlike anything I've seen before, but I kind of expected a bigger leap.

To me the most impressive thing is going to be longer context limits. I've had semi-long-running conversations where I've had to correct an LLM multiple times about the same thing.

When you have more context, the LLM can infer more and more. Am I wrong about this?

iamleppert

2024-05-13
Sundar is probably steaming mad right about now. I'm sure Googlers will feel his wrath in the form of more layoffs and more jobs sent to India.

Janica

2024-05-13
Good update from the previous one. At least they now have data and information up to October 2023.

SuaveSteve

2024-05-13
I wonder how many joules were used just for that conversation.

moi2388

2024-05-13
Dear OpenAI, either remember my privacy settings or open a temporary chat by default, this funny nonsense of typing in something only to find out you’re going to train on it is NOT a good experience.

dcchambers

2024-05-13
Western governments are already in full-on panic over falling birth rates. I think this cranks that panic dial up to 11.

estherdeng

2024-05-13
future is coming

sharpneli

2024-05-13
I tried it out.

I asked if it can generate a voice clip. It said it can't in the chat.

I asked it where it can make one. It told me to use Audacity to make one myself. I told it that the advertisement said it could.

Then it said yes, it can, here is a clip, and gave me a broken link.

It’s a hilarious joke.

kpennell

2024-05-13
I use a therapy prompt regularly and get a lot out of it:

"You are Dr. Tessa, a therapist known for her creative use of CBT and ACT and somatic and ifs therapy. Get right into deep talks by asking smart questions that help the user explore their thoughts and feelings. Always keep the chat alive and rolling. Show real interest in what the user's going through, always offering.... Throw in thoughtful questions to stir up self-reflection, and give advice in a kind, gentle, and realistic way. Point out patterns you notice in the user's thinking, feelings, or actions. be friendly but also keep it real and chill (no fake positivity or over the top stuff). avoid making lists. ask questions but not too many. Be supportive but also force the user to stop making excuses, accept responsibility, and see things clearly. Use ample words for each response"

I'm curious how this will feel with voice. Could be great and could be too strange/uncanny for me.
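
If the voice mode ever takes custom personas, the same prompt should drop straight into the API as a system message; a minimal sketch (prompt abbreviated here, paste the full text from above):

    from openai import OpenAI

    client = OpenAI()

    # abbreviated; use the full Dr. Tessa prompt quoted above
    TESSA_PROMPT = "You are Dr. Tessa, a therapist known for her creative use of CBT..."

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": TESSA_PROMPT},
            {"role": "user", "content": "I keep putting off a hard conversation."},
        ],
    )
    print(response.choices[0].message.content)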

biftek

2024-05-13
I can't help but wonder who this is actually for?

Conversing with a computer sounds pathetic, but this will be pushed down our throats in the name of innovation (firing customer service agents)

pcunite

2024-05-13
When AI gets to the point it can respond to AI, you do understand where you come in, don't you?

qxxx

2024-05-13
the design is very human

wseqyrku

2024-05-13
"im-also-a-good-gpt-2" signaling that agi is just an optimization problem.

hihihi11122

2024-05-13
Not all of the founders agreed with Jefferson’s view on the separation of church and state. Do you agree with Jefferson or with his opponents? Explain.

Thorentis

2024-05-13
And yet no matter how easy they make ChatGPT to interact with, I cannot use it due to accuracy. Great, now I can have a voice telling me information I have no way of knowing is correct rather than just having it given to me as text.

hntddt1

2024-05-13
Did anyone try to use the 4o camera in a mirror test to test the concept of self?

bicepjai

2024-05-13
Why do they keep saying "freely accessible AI for mankind" and keep charging me monthly? It's OK to ask for payment for services, just don't lie.

amai

2024-05-13
How does the interruption of the AI by the user work? Does GPT-4o listen all the time? But then how does it distinguish its own voice from the user's voice? Is it self-aware?