Edit: maybe not; the name of that bot was just "gpt2-chatbot". Maybe that one was some initial iteration?
[1] https://twitter.com/LiamFedus/status/1790064963966370209/pho...
Overall I am highly skeptical of newer models as they risk worsening the completion quality to make them cheaper for OpenAI to run.
It has an increased vocab size of 200k.
Being able to interrupt while GPT is talking
2x faster/cheaper
not really a much smarter model
Desktop app that can see screenshots
Can display emotions and change the sound of its voice
Now imagine this in a $16k humanoid robot, also announced this morning: https://www.youtube.com/watch?v=GzX1qOIO1bE The future is going to be wild.
Other than that, looks good. The desktop app is great, but I didn't see any mention of being able to use your own API key, so open-source projects might still be needed.
The biggest thing is bringing GPT-4 to free users, that is an interesting move. Depending on what the limits are, I might cancel my subscription.
That seems to represent an entirely new depth of understanding of human reality.
It feels like a pretty strong illustration of the awkwardness of getting value from recent AI developments. Like, this is technically super impressive, but also I'm not sure it gives us anything we couldn't have one year ago with GPT-4 and ElevenLabs.
Because I have the Plus membership, which is expensive ($25/month).
But if the limit is high enough (or my usage low enough), there is no point in me paying that much money.
> For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies.
I wonder if they’ll ever allow truly custom voices from audio samples.
Given the competitive pressures I was expecting a much bigger price drop than that.
For non-multimodal uses, I don't think their API is at all competitive any more.
I hate video players without volume control.
It's interesting that OpenAI is highlighting the Elo score instead of showing results for the many benchmarks where all models are stuck at 50-70% success.
[1] https://twitter.com/LiamFedus/status/1790064963966370209
Edit: according to @gdb this is coming in "weeks"
It's still astonishing to consider what this demonstrates!
With ChatGPT having a very simple text+attachment in, text out interface, I felt absolutely in control of what I tell it. Now when it's grabbing my screen or a live camera feed, that will be gone. And I'll still use it, because it's just so damn convenient?
Universal translator, pair programmer, completely human sounding voice assistant and all in real time. Scifi tropes made real.
But: it will be interesting to see how it actually performs with real-world latency and without cherry-picking. No snark, it was great, but I need to see real-world performance. Also, what are the benefits to subscribers if all this is going to be free...
Edit: I'm asking because of the obvious data security implications of having your desktop app read from the clipboard _in the live demo_... That would definitely put a damper on my fanboyish enthusiasm about that desktop app.
On the other hand, this also feels like a signal that reasoning capability has probably already plateaued at the GPT-4 level, and OpenAI knew it, so they decided to focus on research that matters to product engineering rather than long-term research to unlock further general (super)intelligence.
The technology product is so good and so advanced it doesn't matter how the people appear.
Zuck tried this in his video countering the Vision Pro, but it did not have the authentic "not really rehearsed or produced" feel of this at all. If you watch that video and compare it with this one, you can see the difference.
Very interesting times.
Ultimately, the promise of LLM proponents is that these models will get exponentially smarter; this hasn't been borne out yet. So from that perspective, this was a disappointing release.
If anything, this feels like a rushed release to match what Google will be demoing tomorrow.
In general, trying to push that this is a human being is probably "unsafe", but that hurts the marketing.
I'm confused
Anyone who thinks this will be like the previous work revolutions is talking nonsense. This replaces humans, and will replace them even more with each new advance. What's their plan? Live off their savings? What about family/friends? I honestly can't see this and understand how they can be so happy about it...
"Hey, we created something very powerful that will do your work for free! And it does it better than you and faster than you! Who are you? It doesn't matter, it applies to all of you!"
And considering I was thinking of having a kid next year, well, this is a no.
1. OpenAI is still working on GPT-4-level models, more than 14 months after the launch of GPT-4 and after more than $10B in capital raised.
2. The rate at which token prices are collapsing is bizarre. Now a (bit) better model for 50% of the price. How do people seriously expect these foundational model companies to make substantial revenue? Token volume needs to double just for revenue to stand still. Since the GPT-4 launch, token prices have been falling 84% per year!! Good for mankind, but crazy for these companies.
3. Maybe I am an asshole, but where are my agents? I mean, good for the consumer use case. Let's hope the rumors that Apple is deploying ChatGPT with Siri are true; these features will help a lot. But I wanted agents!
4. These drops in cost are good for the environment! No reason to expect them to stop here.
One does notice that context size is conspicuously absent from the announcement ...
Ignore the critics. Watch the demos. Play with it.
This stuff feels magical. Magical. It makes the movie "Her" look like it's no longer in the realm of science fiction but in the realm of incremental product development. HAL's unemotional monotone in Kubrick's "2001: A Space Odyssey" feels... oddly primitive by comparison. I'm impressed at how well this works.
Well-deserved congratulations to everyone at OpenAI!
I'm also extremely worried that this is a harbinger of the enshittification of ChatGPT. Processing video and audio for all ~200 million users is going to be extravagantly expensive, so my only conclusion is that OpenAI is funding this by doubling down on payola-style corporate partnerships that will result in ChatGPT slyly trying to mention certain brands or products in our conversations [1].
I use ChatGPT every day. I love it. But after watching the video I can't help but think "why should I keep paying money for this?"
[1] https://www.adweek.com/media/openai-preferred-publisher-prog...
Still, it sounds like some PR drone selling a product. Oh wait....
For example, in the second video the guy explains how he will have it talk to another "AI" to get information. Instead of just responding with "Okay, I understand", it started talking about how interesting the idea sounded. And as the demo went on, both "AIs" kept adding unnecessary commentary about the scenes.
I would hate having to talk with these things on a regular basis.
First impressions are that it feels very fast.
Mine just continues to show “GPT 4” as the model - it’s not clear if that’s now 4o or there is an app update coming…
https://www.brusselstimes.com/world-all-news/1042696/chatgpt...
And much the same for internationalization.
It’s like some kind of uncanny valley of human interaction that I don’t get on nearly the same level with the text version.
• Sam Altman, the CEO of OpenAI, emphasizes two key points from their recent announcement. Firstly, he highlights their commitment to providing free access to powerful AI tools, such as ChatGPT, without advertisements or restrictions. This aligns with their initial vision of creating AI for the benefit of the world, allowing others to build amazing things using their technology. While OpenAI plans to explore commercial opportunities, they aim to continue offering outstanding AI services to billions of people at no cost.
• Secondly, Altman introduces the new voice and video mode of GPT-4o, describing it as the best computer interface he has ever experienced. He expresses surprise at the reality of this technology, which provides human-level response times and expressiveness. This advancement marks a significant change from the original ChatGPT and feels fast, smart, fun, natural, and helpful. Altman envisions a future where computers can do much more than before, with the integration of personalization, access to user information, and the ability to take actions on behalf of users.
Currently with GPT-4o, it's easily clearing 60% - while blazing fast, and half the cost. Amazing.
I've already played with the vision API, so that doesn't seem all that new. But I agree it is impressive.
That said, watching back a Windows Vista speech recognition demo[1] I'm starting to wonder if this stuff won't have the same fate in a few years.
The tonal talking was impressive, but man that part was like, is someone being tortured or forced against their will?
I wonder what "fewer tokens" really means then, without context on raising the size of each token? It's a bit like saying my JPEG image is now using 2x fewer words after I switched from a 32-bit to a 64-bit architecture, no?
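For what it's worth, "fewer tokens" presumably means the larger 200k vocabulary covers the same text with fewer pieces (especially for non-English scripts); each token is still just an ID, not a bigger unit of storage. A rough way to check this yourself, assuming the new encoding is the one tiktoken exposes as o200k_base (the older GPT-4 models use cl100k_base):

    # Sketch: compare token counts between the GPT-4 encoding (cl100k_base)
    # and what appears to be the GPT-4o encoding (o200k_base) in tiktoken.
    import tiktoken

    samples = {
        "English": "Hello, my name is GPT-4o. I am a new kind of language model.",
        "Hindi": "नमस्ते, मेरा नाम जीपीटी-4o है। मैं एक नए प्रकार का भाषा मॉडल हूँ।",
    }

    for name, text in samples.items():
        counts = {enc: len(tiktoken.get_encoding(enc).encode(text))
                  for enc in ("cl100k_base", "o200k_base")}
        print(name, counts)

The byte count of the text doesn't change; only the number of vocabulary entries needed to represent it does, and that is what per-token pricing is based on.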
his love for yud is showing.
Is the audio in API not available yet?
So no word on an audio api for regular joes? that's the number 1 thing i'm looking for
A nice feature would be to be able to select a Meyer's Briggs personality type for your AI chatbot.
Is this new version not available to users yet?
Sora is not yet released and it's not clear when it will be. DALL-E is worse than Midjourney in most cases. GPT-4 has either gotten worse or stayed the same. Vision is not really usable for anything practical. Voice is cool but not that useful, especially with the lack of strong reasoning from the base model.
Is this sandbagging or is the progress slower than what they're broadcasting?
GPT-4 turbo (gpt-4-0125-preview) 31.0
GPT-4o 30.7
GPT-4 turbo (gpt-4-turbo-2024-04-09) 29.7
GPT-4 turbo (gpt-4-1106-preview) 28.8
Claude 3 Opus 27.3
GPT-4 (0613) 26.1
Llama 3 Instruct 70B 24.0
Gemini Pro 1.5 19.9
Mistral Large 17.7
I'm working on an app that relies more on GPT-4's reasoning abilities than inference speed. For my use case, GPT-4o seems to do worse than GPT-4 Turbo on reasoning tasks. For me this seems like a step-up from GPT-3.5 but not from GPT-4 Turbo.
At half the cost and significantly faster inference speed, I'm sure this is a good tradeoff for other use cases though.
I'm only half joking when I say I want to hear a midwestern blue collar voice with zero tact.
Progress is slowing down. Ever since GPT-3, the periods of time between releases are getting longer and the improvements are smaller. Your average non-techie investor is on the LLM hype train and is willing to dump a questionable amount of money on LLM development. Who is going to explain to him/her/them that the LLM hype train is slowly losing steam?
Hopefully, before the LLM hype dies, another [insert here new ANN architecture], will bring better results than LLMs and another hype cycle will begin.
Every time we make a new breakthrough, people think that the discovery rate is going to be linear or exponential, when the beginning is closer to a logarithmic rate, with the tail end resulting in diminishing returns.
>GPT-4o’s text and image capabilities are starting to roll out today in ChatGPT. We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits.
Anyone have access yet? Not there for me so far.
In my tests:
* I have a private set of coding/reasoning tests and it's been able to ace all of them so far, beating Opus, GPT4-Turbo, and Llama 3 70b. I'll need to find even more challenging tests now...
* It's definitely significantly faster, but we'll see how much of this is due to model improvements vs over provisioned capacity. GPT4-Turbo was also significantly faster at launch.
Considering our brain is a "multi-modal self-reinforcing omnimodel", I think it makes sense for the OpenAI team to work on making more "senses" native to the model. Doing so early will set them up for success when future breakthroughs are made in greater intelligence, self-learning, etc.
Also, wasn't expecting the perf to improve by 2x
The tech demos are cool and all - but I'm primarily interested in the correctness and speed of ChatGPT and how well it aligns with my intentions.
See this post from November: https://news.ycombinator.com/item?id=38339222
We do this all the time in ML. You can generate a very powerful dataset using these means and further iterate with the end model.
What this tells me now is that the runway to GPT5 will be laid out with this new architecture.
It was a bit cold in Australia today. Did you Americans stop pumping out GPU heat temporarily with the new model release? Heh
Come on Google... you can update it.
...but then I realized that's basically the kind of thing Data from Star Trek struggles with as part of his character. We're almost in that future, and I'm already falling into the role of the ignorant human that doesn't respect androids.
Not like they have to be scared yet, I mean Google has yet to release their vaporware Ultra model that is supposedly like 1% better than GPT 4 in some metrics...
I smell an AI crash coming in a few years if they can't actually get this stuff usable for day to day life.
I tried using the voice chat in their app previously and was disappointed. The big UX problem was that it didn't try to understand when I had finished speaking. English is my second language, and I paused a bit too long thinking of a word, and it just started responding to my obviously half-spoken sentence. Trying again, it just became stressful as I had to rush my words out to avoid an annoying response to an unfinished thought.
I didn't try interrupting it but judging by the comments here it was not possible.
It was very surprising to me to be so overtly exposed to the nuances of real conversation. Just this one thing of not understanding when it's your turn to talk made the interaction very unpleasant, more than I would have expected.
On that note, I noticed that the AI in the demo seems to be very rambly. It almost always just kept talking and many statements were reiterations of previous ones. It reminded me of a type of youtuber that uses a lot of filler phrases like "let's go ahead and ...", just to be more verbose and lessen silences.
Most of the statements by the guy doing the demo were interrupting the AI.
It's still extremely impressive but I found this interesting enough to share. It will be exciting to see how hard it is to reproduce these abilities in the open, and to solve this issue.
but also, that's why it fails a real Turing test. A real person would be irritated as fuck by the interruptions.
I hope this isn't an artifact from optimization for scores and not actual function. Likewise it would be disheartening but not unheard of for them to reduce the performance of the previous model when releasing a new one in order to make the upgrade feel like that much more of an upgrade. I know this is certainly the case with cellphones (even though the claim is that it is unintentional) but I can't help but think the same could be true here.
All of this comes as news arrives that GPT-5, based on a new underlying model, is not far off, and that GPT-4 (and 4o) may take over the GPT-3.5-turbo role for most apps that are currently trying to optimize costs with their use of the service.
This is the first true multimodal network from OpenAI, where you can send an image in and retain the visual properties of the image in the output from the network (previously the input image would be turned into text by the model, and sent to the Dall-E 3 model which would provide a URL). Will we get API updates to be able to do this?
Also, will we be able to tap into a realtime streaming instance through the API to replicate the audio/video streams shown in the demos? I imagine from the Be My Eyes partnership that they have some kind of API like this, but will it be opened up to more developers?
Even disregarding streaming, will the Chat API receive support for audio input/output as well? Previously one might've used a TTS model to voice the output from the model, but with a truly multimodal model the audio output will contain a lot more nuance that can't really be expressed in text.
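For image input at least, the existing Chat Completions image_url content parts appear to work with gpt-4o already (the announcement describes API access as a text and vision model); a minimal sketch, with a placeholder image URL, and with the caveat that image output and streamed audio/video are not exposed through this API today:

    # Sketch: text + image input to gpt-4o via the Chat Completions API.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                # placeholder URL for illustration
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }],
    )
    print(response.choices[0].message.content)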
Obviously there's a reason for dropping the price of gpt-4o but not gpt-4t. Yes, the new tokenizer has improvements for non-English tokens, but that can't be the bulk of the reason why 4t is more expensive than 4o. Given the multimodal training set, how is 4o cheaper to train/run than 4t?
Or is this just a business decision, anyone with an app they're not immediately updating from 4t to 4o continues to pay a premium while they can offer a cheaper alternative for those asking for it (kind of like a coupon policy)?
That realtime translation would be amazing as an option in, say, Skype or Teams: set each individual's native language and handle automated translation. Shit, tie it into ElevenLabs to replicate your voice as well! Native translation in realtime, with your own voice.
To me, it sounds like TikTok TTS, it's a bit uncomfortable to listen to. I've been working with TTS models and they can produce much more natural sounding language, so it is clearly a stylistic choice.
So what do you think?
Extend this to quantum foam, to ergodic processes, to entropic force, to Darius and Xerxes, to poets of the 19th century - it's changed my life. Really glad to see an investment in streamlining this flow.
When referring to itself, it uses the feminine form in Marathi: नमस्कार, माझे नाव जीपीटी-4o आहे। मी एक नवीन प्रकारची भाषा मॉडेल आहे। तुम्हाला भेटून आनंद झाला!
and the masculine form in Hindi: नमस्ते, मेरा नाम जीपीटी-4o है। मैं एक नए प्रकार का भाषा मॉडल हूँ। आपसे मिलकर अच्छा लगा!
(Both say: "Hello, my name is GPT-4o. I am a new kind of language model. Nice to meet you!")
If your wallet is large enough, you can make 2 GPTs sing just as easily as you can make 100 GPTs sing.
What can you do with a billion GPTs?
My brother, who can't see well, will use this to cook a meal without me explaining it to him. It's so cool.
People all around the world will now get real-time AI assistance for a ton of queries.
Heck - I have a meeting bot API company (https://aimeetingbot.com) and that makes me really hyped!
Kicking off another training wave is easy, if you can afford the electricity, but without new, non-AI tainted datasets or new methods, what’s the point?
So, in the meantime, make magic with the tool you already have, without freaking out the politicians or the public.
Wise approach.
But this falls short of the ChatGPT-5 we were promised last year
edit: ~~just tested it out and it seems closer to Gemini 1.5~~ and it is faster than turbo....
edit: it's basically ChatGPT 3.9. Not quite 4, definitely not 3.5. Just not sure if the prices make sense.
Will they be deployed? They would make the OpenAI image model significantly more useful than the competition.
Whatever I was doing with ChatGPT 4 became faster. Instant win.
My test benchmark questions: still all negative, so reasoning on out-of-distribution puzzles is still failing.
They talk about it like it's available now (with Windows app coming soon), but I can't find it.
I hope OpenAI continues to steal artists work, artists and creators keep getting their content sold and stolen beyond their will for no money, and OpenAI becomes the next trillion dollar company!
Big congrats are in order for Sam, the genius behind all of this, the world would be nothing without you
- the AI doesn't know when to stop talking, and the presenter had to cut every time (the usual "AI-splaining" I guess).
- the AI voice and tone were a bit too much, sounded too fake
I also tested some rubber duck techniques, and it gave me very useful advice while coding. I'm very impressed. With a lot of spit and polish, this will be the new standard for any voice assistant ever. Imagine these capabilities integrated with your phone's built-in functions.
https://github.com/kagisearch/llm-chess-puzzles?tab=readme-o...
Can't say whether that's good or bad.
There is no way that kind of training data will be accessible to anyone outside a handful of companies.
GPT-4 was a little lazy and very slow the last few days, and this 4o model blows it out of the water in terms of speed and in following my instructions to give me the full code, not just the snippet that changed.
I think it’s a nice upgrade.
When I first subscribed to ChatGPT Premium late last year, the natural language understanding superiority was amazing. Now the benchmark advances, low latency voice chat, Sora, etc. are all really cool too.
But my work and day-to-day usage really rely on accurately sourced/cited information. I need a way to comb through an ungodly amount of medical/scientific literature to form/refine hypotheses. I want to figure out how to hard reset my car's navigation system without clicking through several SEO-optimized pages littered with ads. I need to quickly confirm scientific facts, some obscure, with citations and without hallucinations. From speaking with my friends in other industries (e.g. finance, law, construction engineering), this is their major use case too.
I really tried to use ChatGPT Premium's Bing-powered search. I also tried several of the top-rated GPTs (Scholar AI, Consensus, etc.). It was barely workable. It seems like with this update, the focus was elsewhere. Unless I specify explicitly in the prompt, it doesn't search the web and provide citations. Yeah, the benchmark performance and parameter counts keep impressively increasing, but how do I trust that those improvements are preventing hallucinations when nothing is cited?
I wonder if the business relationship between Microsoft and OpenAI is limiting their ability to really compete in AI driven search. Guessing Microsoft doesn't want to disrupt their multi-billion dollar search business. Maybe the same reason search within Gemini feels very lacking (I tried Gemini Advanced/Ultra too).
I have zero brand loyalty. If anybody has a better suggestion, I will switch immediately after testing.
it's stupid having to pull a phone out in order to use the voice/chat-partner modes.
(yes I know there are browser plugins and equivalent to facilitate things like this but they suck, 1) the workflows are non-standard, 2) they don't really recreate the chat interface well)
Create your gpt4o chatbot with our platform tvoybot.com?p=ycombinator
It took me a few hours of digesting Twitter experiments before appreciating how impressive this is. Kudos to the OpenAI team.
A question that won't get answered: "To what degree do the new NVIDIA GPUs help with the realtime latency?"
Given the lyrics for Three Blind Mice, I try to get ChatGPT to create an image of three blind mice, one of which has had its tail cut off.
It's pretty much impossible for it to get this image straight. Even this new 4o version.
Its ability to spell in images has greatly improved, though.
pipx install llm
llm keys set openai
# Paste API key here
llm -m 4o "Fascinate me"
Or if you already have LLM installed, upgrade it like this:
llm install --upgrade llm
You can install an older version from Homebrew and then upgrade it like that too:
brew install llm
llm install --upgrade llm
Release notes for the new version here: https://llm.datasette.io/en/stable/changelog.html#v0-14
maybe some knobs for the flavor of the bot:
- small talk: gossip girl <---> stoic Aurelius
- information efficiency or how much do you expect me to already know, an assumption on the user: midwit <--> genius
- tone spectrum: excited Scarlett, or whatever it is now <---> Feynman the butler
Tweakable emotion and voice, watching the scene, cracking jokes. It’s not perfect but the amount and types of data this will collect will be massive. I can see it opening up access to many more users and use cases.
Very close to:
- A constant friend
- A shrink
- A teacher
- A coach who can watch you exercise and offer feedback
…all infinitely patient, positive, helpful. For kids that get bullied, or whose parents can’t afford therapy or a coach, there’s the potential for a base level of support that will only get better over time.
I am not saying this is what they're doing, but it DOES feel like they are hindering the previous model to make the new one stand out that much more. The multimodal improvements here and the release are certainly impressive, but I can't help but feel like the subjective quality of GPT-4 has dipped.
Hopefully this signals that gpt5 is not far off and should stand out significantly from the crowd.
These demos show people talking to artificial intelligence. This is new. Humans are more partial to talking than writing. When people talk to each other (in person or over low-latency audio) there's a rich metadata channel of tone and timing, subtext, inexplicit knowledge. These videos seem to show the AI using this kind of metadata, in both input and output, and the conversation even flows reasonably well at times. I think this changes things a lot.
I worry that the AI will not express anger, not express sadness, not express frustration, not express uncertainty, and many other emotions that the culture of the fine-tuners might believe are "bad" emotions and that we may express a more and more narrow range of emotions going forward.
Almost like it might become an AI "yes man."
I'm not sure that computers mimicking humans makes sense; you want your computer to be the best possible, better than humans when possible. Written output is clearly superior, and faking emotions does not add much in most contexts.
I ran some speed tests for a particular question/seed. Here are the times to first token, in seconds:
gpt-4-turbo:
* avg 3.69
* min 2.96
* max 4.91
gpt-4o:
* avg 2.80
* min 2.28
* max 3.39
That's for the messages in this gist: https://gist.githubusercontent.com/pamelafox/dc14b2188aaa38a...
Quality seems good as well. It'll be great to have better multi-modal RAG!
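In case anyone wants to reproduce this, here is roughly how such a time-to-first-token measurement can be done with the streaming API; this is a sketch, not the exact script used above, and the prompt, seed, and sample count are placeholders:

    # Sketch: measure time to first streamed token for a model.
    import time
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set

    def time_to_first_token(model: str, prompt: str) -> float:
        start = time.perf_counter()
        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            stream=True,
            seed=1,  # fixed seed, mirroring the question/seed setup above
        )
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                return time.perf_counter() - start
        return float("nan")

    for model in ("gpt-4-turbo", "gpt-4o"):
        samples = [time_to_first_token(model, "Explain RAG briefly.") for _ in range(5)]
        print(model, "avg", sum(samples) / len(samples),
              "min", min(samples), "max", max(samples))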
Being in a prison with this voice as your guard seems like a horrible way to lose your sanity. This aggressive friendliness combined with no real emotions seems like a very easy way to break people.
There are these stories about Nazis working at concentration camps having to drink an insane amount of alcohol to keep themselves going (not trying to excuse their actions). This thing would just do it, while being friendly at the same time. The amount of hopelessness someone would experience if they happened to be in the custody of a system like this is truly horrific.
Added a custom OpenAI endpoint to https://recurse.chat (i built it) and it just works: https://twitter.com/recursechat/status/1790074433610137995
(Also, they managed to make it sound exactly like an insincere, rambling morning talk show host - I assume this is a solvable problem though.)
I guess it is useful for some casual uses, but I really wish there was more focus on the reasoning and intelligence of the model itself.
Nonetheless, very impressive.
You might want to add `direction: rtl` to your `.text-right` CSS class. The punctuation marks etc are all off for RTL languages.
Now OpenAI, which was supposed to be the 'free man's choice', is making advertisements selling the same idea.
This is a natural progression, audio is one of the main ways we communicate obviously, but it feels like they're holding back. Like they're slow dropping what they have to maintain hype/market relevance. They clearly are ahead, but would be nice to get it all, openly. As they promised.
"I am interested in the user serf on Hacker News, spelled S E R F. Tell me about their tone of writing, expertise, and personality. From the tone of what you read, summarize their character."
Fascinating stuff. A weird, skewed introspection.
Arabic: مرحبًا، اسمي جي بي تي-4o. أنا نوع جديد من نموذج اللغة، سررت بلقائك!
Urdu: ہیلو، میرا نام جی پی ٹی-4o ہے۔ میں ایک نئے قسم کا زبان ماڈل ہوں، آپ سے مل کر اچھا لگا!
Even if you don't read Arabic or Urdu script (both say "Hello, my name is GPT-4o. I am a new kind of language model, nice to meet you!"), note that the 4 and o are on opposite sides of the sentence. Despite that, pasting both into Google Translate actually fixes the error during translation. OpenAI ought to invest in some proofreaders for multilingual blog posts.
But I would like to see how this is integrated into applications by third party developers where the AI is doing a specific job. Is it still as impressive?
The biggest challenge I've had with building any autonomous "agents" with generic LLMs is that they are overly gullible and accommodating, requiring a reversion to legacy chatbot logic trees etc. to stay on task and perform a job. Also, STT is rife with speaker interjections, leading to significant user frustration, and users just want to talk to a person. Hard to tell if this is really solved yet.
So they're using the same GPT4 model with a relatively small improvement, and no voice whatsoever outside of the prerecorded demos. This is not a "launch" or even an announcement. This is a demo of something which may or may not work in the future.
GPT-4o did much better than the 4-turbo models, and seems much less lazy.
The latest release of aider uses GPT-4o by default.
My academic background is in a field where there are lots of public misconceptions.
It does an absolutely terrible job.
Even basic textbook things where there isn’t much public misunderstanding are “rounded” to what sounds smart.
This is the power you get when you own the whole stack, model included, and build a product on top of it. Open source will focus on LLM benchmarks, since that is the only way foundational models can differentiate themselves, but that does not mean it is a path to a great user experience.
So open-source models like Llama will be here to stay, but it feels more like, if you want to build a compelling product, you have to own and control your own model.
It'd be cool if an AI calling another AI would recognize it's talking to an AI, and then they agree to ditch the fake conversational tone and just shift into a high-bandwidth modem pitch to rapidly exchange information. Or upgradeable offensive capabilities to outmaneuver the customer service agents when they try to decline your warranty or whatever.
Has OpenAI found a business model yet? Considering the high cost of the computation, is it reasonable to expect that OpenAI licensing may not be profitable? Will that result in "free" access for the purpose of surveillance and data collection?
Amazon had a microphone in people's living rooms, a so-called "smart speaker" to which people could talk. Alexa was a commercial failure.
1. Nobody could convincingly beat GPT4 in over a year, despite spending billions of dollars trying.
2. There's GPT5 coming out sometime soon that will blow this out of the water and make paying $20/mo to OpenAI still worthwhile.
Also, they're TERRIBLE at harmonizing together
We have tricorders now (mobile phones), universal translators are looming... when is transporter technology going to get here?
But it’s not scary. It’s… marvelous, cringey, uncomfortable, awe-inspiring. What’s scary is not what AI can currently do, but what we expect from it. Can it do math yet? Can it play chess? Can it write entire apps from scratch? Can it just do my entire job for me?
We’re moving toward a world where every job will be modeled, and you’ll either be an AI owner, a model architect, an agent/hardware engineer, a technician, or just.. training data.
* "Poetic typography" sample: I paste the prompt, and get an image with the typical lack of coherent text, just mangled letters.
* "Visual Narratives: Robot Writer's Block" - Mangled letters also
* "Visual Narratives: Sally the mailwoman" - not following instructions about camera angle. Sally looks different in each subsequent photo.
* "Meeting Notes with multiple speakers" - I uploaded the exact same audio file and used input 'How many speakers in this audio and what happened?'. gpt4o went off about about audio sample rates, speaker diarization models, torchaudio, and how its execution environment is broken and can't proceed to do it.
The demo is what it is, designed to get a wow from the masses.
Why? Because it simply automates the human away. Who wouldn't opt for a seemingly flawless, super-effective buddy (i.e., an AI) that is never tired and always knows better, whether you need some job done, you're feeling lonely, or you need some life advice? It doesn't matter if it might be considered "just an imitation of a human".
Why would future advancements of it keep being "just some tool" instead of largely replacing us as (humans) in jobs, relationships, ...?
What would amaze me would be for GPT-4 to have better reasoning capabilities and fewer hallucinations.
I asked it to run a loop that writes “hello” every ten seconds. Wow, not only did it do so, it’s streaming the stdout to me.
LLMs have always had various forms of injection attacks, ways to force them to reveal their prompts, etc. but this one seems deliberately designed to run arbitrary code, including infinite loops.
Alas, I doubt I can get it to mine enough bitcoin to pay for a ChatGPT subscription.
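For context, the loop being asked for is trivial; a sketch of what such a script might look like (not the exact code ChatGPT generated):

    # Prints "hello" every ten seconds until interrupted.
    import time

    while True:
        print("hello", flush=True)  # flush so streamed stdout appears immediately
        time.sleep(10)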
Except for the last point and the desktop version, I think it's already shown in the math demo video.
I guess it will also pretty soon refuse to let me come back inside the spaceship, but until then it'll be a nice ride.
if the end user is in a war zone, will the AI bot still ask how it is going?
how many bombs fell in your neighborhood last night?
Idiocracy in full swing, dear Marvin.
That's terrifying because those AIs become what their masters think an engaging human should be. It's quite close to what Boston Dynamics did some years ago. What did they show? You can hit a robot very hard while it does its job, and then what? It just goes on without complaining. A perfect employee again.
That's very dystopic to me.
(but I'm impressed by the technical achievement)
I am not surprised.
I've picked the GPT-4o model in the ChatGPT app (I have the paid plan) and started talking with the voice mode: the responses are much slower than in the demo, there is no way to interrupt the response naturally (I need to tap a button on screen to interrupt), and no way to open up the camera and show things around like the demo does.
looks like llms still gonna llm for the near future.
So by that logic: Step 1: Language, Step 2: Reasoning, Step 3: Understanding, Step 4: Meaning, Step 5: AGI.
That won't stop criminal enterprises from implementing their own naughty tools, but these open models won't become some kind of holy grail for criminals to do as they please.
That being said, I do believe, now more than ever, that education worldwide should be adjusted to fit this new paradigm and maybe adapt quicker to such changes.
As some commenters pointed out, there are already good tools and techniques to counter malicious use of AI. Maybe not covering all use cases, but we need to educate people on using the tools available, and trust that researchers (like many of yourselves) are capable of innovations which will reduce risk even further.
There is no point and no benefit in trying to be negative or full of fear. Go forward with positivity and creativity. Even if big tech gets regulated, some criminal enterprises have billions to invest too, so crippling big tech here will only play into their hands in the end.
Love these new innovations. And for the record, gpt4o still told me to 'push rip' on amd64... so rip to it actually understanding stuff...
If you are smart enough to see some risks here, you might also be smart enough to positively contribute to improvements. Fear shuts things down, love opens them up. Its basic stuff.
This demo is amazing, not scary. It's a positive advancement in technology and it won't be stopped because people are afraid of it, so go with it, and contribute in areas where you feel it's needed. Even if it's just giving feedback. And when giving that, you all know a balanced and constructive approach works better than a negative and destructive one.
1. Wonderful engineering. 2. A stagnation in reasoning ability.
Do you agree with me?
Other than that it felt like magic, like that Google demo of the phone doing some task like setting up an appointment over phone talking to a real person.
Yeah, it's cool and unlike anything I've seen before, but I kind of expected a bigger leap.
To me the most impressive thing is going to be longer context limits. I've had semi-long-running conversations where I've had to correct an LLM multiple times about the same thing.
When you have more context, the LLM can infer more and more. Am I wrong about this?
I asked if it can generate a voice clip. It said it can’t on the chat.
I asked it where it can make one. It told me to use Audacity to make one myself. I told it that the advertisement said it could.
Now it said yes it can here is a clip and gave me a broken link.
It’s a hilarious joke.
"You are Dr. Tessa, a therapist known for her creative use of CBT and ACT and somatic and ifs therapy. Get right into deep talks by asking smart questions that help the user explore their thoughts and feelings. Always keep the chat alive and rolling. Show real interest in what the user's going through, always offering.... Throw in thoughtful questions to stir up self-reflection, and give advice in a kind, gentle, and realistic way. Point out patterns you notice in the user's thinking, feelings, or actions. be friendly but also keep it real and chill (no fake positivity or over the top stuff). avoid making lists. ask questions but not too many. Be supportive but also force the user to stop making excuses, accept responsibility, and see things clearly. Use ample words for each response"
I'm curious how this will feel with voice. Could be great and could be too strange/uncanny for me.
Conversing with a computer sounds pathetic, but this will be pushed down our throats in the name of innovation (firing customer service agents)