Empathic AI and its role in understanding human emotions with Hume AI’s Alan Cowen | E1922

Episode Summary

In episode 1922 of "This Week in Startups," host Jason Calacanis delves into the world of empathic AI with Alan Cowen, CEO and Chief Scientist at Hume AI. The episode explores Hume AI's mission to bridge the gap between artificial intelligence and emotional intelligence, with the goal of understanding and optimizing AI for human well-being. Cowen explains that much of human expression, such as laughter, sadness, or pain, is conveyed through voice and facial expressions rather than language alone. Hume AI has built models that understand these expressions and integrate them into large language models, allowing AI to adapt and respond to human emotions more effectively.

The conversation also covers potential applications of empathic AI, from improving customer service to enhancing therapy and coaching. Cowen shares insights into the technology behind Hume AI, including its ability to analyze voice and facial expressions in real time to gauge emotions, a capability that opens up new possibilities for AI to interact with humans in a more nuanced and empathetic manner.

The episode also addresses the ethical considerations surrounding empathic AI. Cowen emphasizes the importance of using this technology to enhance human well-being rather than to manipulate emotions for commercial or political purposes, and he discusses Hume AI's ethical guidelines and the role of the Hume Initiative in ensuring that AI applications prioritize users' interests and satisfaction. Throughout the episode, Calacanis and Cowen run live demonstrations of Hume AI's technology, showcasing its ability to understand and respond to emotional cues and highlighting the potential of empathic AI to make human-AI interactions more personal and meaningful.

In summary, episode 1922 of "This Week in Startups" offers a deep dive into the world of empathic AI with Hume AI's Alan Cowen. The discussion covers the technology's capabilities, applications, and ethical considerations, providing a comprehensive overview of how AI can be optimized for human well-being.

Episode Show Notes

This Week in Startups is brought to you by…

LinkedIn Jobs. A business is only as strong as its people, and every hire matters. Go to LinkedIn.com/TWIST to post your first job for free. Terms and conditions apply.

Vanta. Compliance and security shouldn't be a deal-breaker for startups to win new business. Vanta makes it easy for companies to get a SOC 2 report fast. TWiST listeners can get $1,000 off for a limited time at http://www.vanta.com/twist

Hubspot for Startups. Join thousands of companies that are growing better with HubSpot for Startups. Learn more and get extra benefits for being a TWiST listener now at https://www.hubspot.com/startups

*

Today's show:

Hume AI’s Alan Cowen joins Jason to demo Hume AI’s Empathic Voice Interface (6:08), Measurement API (16:20), and discuss the future implications of this tech, both positive and negative (44:14).

*

Timestamps:

(0:00) Hume AI’s Alan Cowen joins Jason

(3:01) Hume AI and the role of AI in understanding human emotions

(6:08) Hume AI’s Empathic Voice Interface (EVI) and its responsiveness to human emotions

(8:27) LinkedIn Jobs - Post your first job for free at https://linkedin.com/twist

(9:55) The components in speech that Hume AI studies and its application across different cultures

(16:20) Hume AI’s Measurement API and its design for real-time emotion and expression analysis

(21:16) Vanta - Get $1000 off your SOC 2 at http://www.vanta.com/twist

(22:07) What AI can reveal about a person based on their expressions

(24:12) The impact on customer service and security sectors

(27:24) Hume AI’s comedy bot and emotional detection capabilities

(36:08) Hubspot for Startups - Learn more and get extra benefits for being a TWiST listener now at https://www.hubspot.com/startups. Also, be sure to visit https://bit.ly/hubspot-ai-report

(37:01) Hume AI’s comedy bot / roast functionality

(44:11) The future implications, both positive and negative, of emotionally intelligent AI on society

*

Check out Hume AI: https://www.hume.ai

*

Follow Alan:

X: https://twitter.com/alancowen

LinkedIn: https://www.linkedin.com/in/alan-cowen

*

Subscribe to This Week in Startups on Apple: https://rb.gy/v19fcp

*

Follow Jason:

X: https://twitter.com/Jason

LinkedIn: https://www.linkedin.com/in/jasoncalacanis

*

Thank you to our partners:

(8:27) LinkedIn Jobs - Go to https://linkedIn.com/angel and post your first job for free.

(21:16) Vanta - Get $1000 off your SOC 2 at http://www.vanta.com/twist

(36:08) Hubspot for Startups - Learn more and get extra benefits for being a TWiST listener now at https://www.hubspot.com/startups. Also, be sure to visit https://bit.ly/hubspot-ai-report

*

Great 2023 interviews: Steve Huffman, Brian Chesky, Aaron Levie, Sophia Amoruso, Reid Hoffman, Frank Slootman, Billy McFarland

*

Check out Jason’s suite of newsletters: https://substack.com/@calacanis

*

Follow TWiST:

Substack: https://twistartups.substack.com

Twitter: https://twitter.com/TWiStartups

YouTube: https://www.youtube.com/thisweekin

Instagram: https://www.instagram.com/thisweekinstartups

TikTok: https://www.tiktok.com/@thisweekinstartups

*

Subscribe to the Founder University Podcast: https://www.founder.university/podcast

Episode Transcript

SPEAKER_01: I could have this thing roast you if you want. Yeah, let's do it. Let's do a roast. All right. Could you roast Jason Calacanis? Man, this is like everybody's...
SPEAKER_05: All right, let's talk about that piece of work, Jason Calacanis. Oh man, where do I even start with this guy? You know, I heard a rumor that Jason Calacanis was actually grown in a lab, a mad scientist's twisted experiment to create the most obnoxious, self-aggrandizing tech bro imaginable. Poor bastard, he never stood a chance. And have you seen the way this guy carries on? It's like he's perpetually auditioning to be the next Elon Musk, but with half the brains and twice the ego. I mean, the guy can't open his mouth without saying something so mind-numbingly...
SPEAKER_01: This is too much. Can you go easy on this?
SPEAKER_04: Oh, yeah. I would say go harder. Oh, come on, don't be like that. Go harder.
SPEAKER_00: This Week in Startups is brought to you by LinkedIn Jobs. A business is only as strong as its people, and every hire matters. Post your first job for free at linkedin.com slash twist. Vanta. Compliance and security shouldn't be a deal breaker for startups to win new business. Vanta makes it easy for companies to get a SOC 2 report fast. Twist listeners can get $1,000 off for a limited time at vanta.com slash twist. And HubSpot. Join thousands of companies that are growing better with HubSpot for Startups. Learn more and get extra benefits for being a Twist listener now at HubSpot.com slash startups.
SPEAKER_06: All right, everybody, welcome back to Twist, This Week in Startups. And we've, you know, in 2024 and 2023, been absolutely obsessed with AI. Obviously, we're seeing all kinds of easy layups in customer service, thanks to AI, autonomous vehicles, much more complicated healthcare, everything in between.
We're also seeing tons of interesting stuff going on in generative AI, people making interesting music and videos. You've seen all that. But the area of human emotions is extremely complex, and AI is trying to figure that out. And you've seen this in all kinds of science fiction, whether it's Blade Runner or the movie Her, where AI is trying to learn to interface with humans. Well, there is a startup, Hume AI, and they are trying to bridge the gap between just intelligence and, dare I say, emotional intelligence. We demoed some of this technology back on episode 1894 if you want to look for it. But today we have Alan Cowen here. He's the CEO and Chief Scientist at Hume AI. And he's going to show us what they're building and why it's important. Welcome to the program, Alan.
SPEAKER_01: Hey, Jason. Great to be here.
SPEAKER_06: Right. So maybe you could explain what the mission is of Hume AI and why you're spending all this effort to try to understand human emotions. Yeah, in relation to AI and using AI, I guess, to understand humans' emotions and then to portray them back through AI to humans?
SPEAKER_01: Yeah, so it's really to understand people's well-being. And emotions are the components of that. So when are you laughing? When are you sad? When are you in pain? When are you experiencing pleasure? And what we want to do is optimize for that. So our mission is to optimize AI for human well-being. Now, so much of what we express is in our voice and our facial expression and not in language. So that part of our expression was just ignored by AI for a long time. I mean, there is a field of affective computing, which I have a lot of experience in. I have over 40 papers in that area. But in terms of the generative models, they just were very far behind in understanding expressions. So what we've done at Hume is built models that understand expressions a lot better.
And we've integrated those into large language models. So now these models understand, beyond language, what's going on in the voice, what's going on in facial expression, and can learn from that. So they figure out what's making you frustrated, what's satisfying, what's funny, and they can actually adapt to that information and get better over time.
SPEAKER_06: So what is the killer app of this genre or this vertical? I have ChatGPT. I pay 20 bucks a month for it. I frequently will use the interface where I talk to it, but I don't believe that they are looking at my tone, and I know I haven't turned my camera on. I assume, with your technology, at some point, these AI assistants are going to have the camera on, with permission, obviously, and hear my tone of voice and understand when I'm frustrated. Or explain to us what the killer app here is. Because, you know, we've heard of things like Gong and Real Voice, which we have a small investment in. They study salespeople and how they behave on sales calls to coach them. That genre exists a bit. I'm curious what you perceive as the killer app here.
SPEAKER_01: So ChatGPT is text-only. We think the future of AI is a voice app. The voice is four times faster than text. The problem is that when we're speaking, we expect the AI to understand not just what we're saying, but how we're saying it. So that's why this experience is clunky. It's just missing something. And so we've built that. We have a talking ChatGPT, basically. And we have an API that allows developers to build a voice interface into anything. Instead of using OpenAI, which would enable developers to build a chatbot interface into their product, they use us. And immediately they have a voice interface that does transcription, does language understanding, that speaks back. And what's different about it than anyone else is it understands what's in their voice besides language, and it uses that to form better words and better speech.
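The developer flow Cowen describes, streaming user audio in and getting back a transcript annotated with expression scores plus a spoken reply, can be sketched roughly as follows. The message types and field names below are illustrative assumptions for the sketch, not Hume's actual API schema:

```python
# Hypothetical sketch of a voice-interface message flow: the client sends
# audio chunks up; the server sends back transcripts with prosody scores
# and assistant replies. All shapes here are invented for illustration.
import json

def build_audio_message(chunk_b64: str) -> str:
    """Wrap a base64 audio chunk in a client message."""
    return json.dumps({"type": "audio_input", "data": chunk_b64})

def parse_server_message(raw: str) -> dict:
    """Normalize a server message into role/text/prosody."""
    msg = json.loads(raw)
    if msg["type"] == "user_message":
        # transcript of what the user said, with expression scores attached
        return {"role": "user", "text": msg["text"],
                "prosody": msg.get("prosody", {})}
    if msg["type"] == "assistant_message":
        # the reply text; a parallel audio message would carry the speech
        return {"role": "assistant", "text": msg["text"], "prosody": {}}
    return {"role": "system", "text": "", "prosody": {}}

reply = parse_server_message(json.dumps({
    "type": "user_message", "text": "my dog died",
    "prosody": {"sadness": 0.71, "amusement": 0.03},
}))
print(reply["role"], reply["prosody"]["sadness"])  # user 0.71
```

In a real integration these messages would travel over a persistent connection (for example a WebSocket), with the expression scores informing both the wording and the tone of the synthesized reply.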
Seems like a perfect time, I think, for a demo, so let's get to it. All right. You queued it up perfectly. Sounds great. I will share a few demos.
SPEAKER_06: And what's the API called?
SPEAKER_01: It's called EVI, Empathic Voice Interface.
SPEAKER_06: Empathic Voice Interface. Okay, like Lieutenant Deanna Troi on Star Trek: The Next Generation. She was an empath.
SPEAKER_01: Yeah. Why don't I just demonstrate this? This is like the demo that anyone can play with, and then I'll show you the developer playgrounds.
SPEAKER_02: Hey, what's up?
SPEAKER_05: Well, hello there. I am doing just fine. Thanks for asking. So what's got you feeling a bit down and amused at the same time? I am all ears if you want to chat about it.
SPEAKER_02: Well, my dog died this morning.
SPEAKER_05: Oh no, I am so sorry to hear about your dog. Losing a beloved pet is never easy. I know it must be really hard right now, but try to remember all the wonderful times...
SPEAKER_02: I was just kidding.
SPEAKER_05: Aha! You really had me going there! Well played, you sneaky trickster. Ah ha ha, I should have known better than to fall for that one. I think I am still learning when it comes to your sense of humor, but I am relieved to hear your dog is doing just fine. Care to share what's really on your mind today?
SPEAKER_02: Well, can you tell what I'm expressing now?
SPEAKER_01: Our demo is at capacity right now.
SPEAKER_06: Oh, there you go. So while you were doing that demo, for people who were listening and not watching (you can watch, of course, on YouTube; go to This Week in Startups on YouTube and you'll find it), it was showing each transcript in real time very, very quickly. And then it had the top three emotions and a percentage of those emotions. I think it was showing the top three every time. Is that correct?
SPEAKER_01: Yeah, so it shows more than just the top three. But actually, if you were to look at your raw data, you get back 48 different dimensions. So it's much more nuanced than what we're showing you there.
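Cowen's point that the raw data carries 48 expression dimensions while the demo surfaces only the top three comes down to a simple ranking step over the scores. A minimal sketch, with made-up dimension names and values rather than Hume's real output:

```python
# Each utterance comes back with ~48 expression dimensions scored 0..1
# (per the conversation above); the on-screen demo shows only the top
# three. Dimension names and scores below are illustrative.

def top_emotions(scores: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Return the n highest-scoring expression dimensions."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]

utterance_scores = {
    "sadness": 0.71, "sympathy": 0.44, "calmness": 0.12,
    "amusement": 0.05, "anger": 0.02,  # ...and so on, up to ~48 dimensions
}

print(top_emotions(utterance_scores))
# [('sadness', 0.71), ('sympathy', 0.44), ('calmness', 0.12)]
```

The same ranking could run per word or per utterance, which is why the demo's top-three readout updates as quickly as the transcript does.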
SPEAKER_06: Got it. And so in real time, you can see that you were sad when you mentioned your dog died, etc. And then that persona was showing sympathy for you. So all of that is being done through tone of voice, inflection, etc. Okay, let me cut to the chase right now, because I know you're busy and everyone is hiring right now. And you know, there's a lot of competition for the best candidates, right? Every position counts, the market is starting to come back, you need to get the perfect person, you want a bar raiser in your organization, somebody who will raise the bar for the entire team. And LinkedIn is giving you your first job posting for free to go find that bar raiser: linkedin.com slash twist. And if you want to build a great company, you're going to need a great team. It's as simple as that. LinkedIn Jobs is here to make it quick and easy to hire these elite team members. I know, it's crazy, right? LinkedIn has more than a billion users. We all watched this happen when it was tens of millions, then hundreds of millions, and now a billion people using the service. This means that you're going to get access to active and passive job seekers. Active job seekers, they're out there looking. Passive job seekers, they've got a job, but it's not as good as the job you're offering them. So you want to get in front of both of those people. Maybe somebody got laid off, wasn't their fault, and they're an ideal candidate. Get that active job seeker. And LinkedIn also knows that small businesses are wearing so many hats right now, and you might not have the time or resources to devote to hiring. So let LinkedIn make it automatic for you. Go post an open job role. You get that purple hiring ring on your profile. You start posting interesting content. You watch the qualified candidates. They just roll in. And guess what? First one's on us. Call to action. Very simple. LinkedIn.com slash T-W-I-S-T.
LinkedIn.com slash twist. That'll get you your first job posting for free on your boy J-Cal. Terms and conditions do apply. What are the components in voice that you're studying? Is it the speed at which somebody speaks? You know, tone? And how did you train this thing on tone? How does it know what sadness is versus, you know, melancholy versus quirky?
SPEAKER_01: Yeah, we have all this data from millions of people around the world who are actually recording themselves while they're having interactions. And also we're reacting to things and imitating things in some cases. And so we use all that data to train our models. And that means they're able to capture way more than just, like, tone, rhythm. Like, those are all basic things. But dimensions that you can't really describe in any other way except to say, like, this is kind of an angry dimension, kind of has a growl to it, kind of tension in the voice, where this is like an awe-inspired dimension, or happy. And we get tons of different dimensions out of that. So every time we hear a word, we're getting more than 48 different dimensions of expression from that word. Our model is taking that in, and our model is deciding how to respond. Our model is learning what these dimensions mean from tons and tons of data, people interacting. And it's saying, okay, this is something that means this person's frustrated, so I should apologize. Or this is something that means the person's confused, so I should clarify. And it's figuring out what it should do to respond to somebody in different situations.
SPEAKER_06: How different is it per person? Like, I might be a high-energy guy from Brooklyn who's extroverted, who speaks a certain way, and you might be more introverted and soft-spoken. So how does it know if J-Cal's, like, bombastic and joking, and you might be, you know, more thoughtful and introverted? Are our emotions very similar, or are they very disparate? I'm curious.
SPEAKER_01: So it has to learn that stuff. So we train it on all these interactions, right? And so the task is actually predicting the next expression. So it has to figure out, like, is the next person going to laugh at what someone said? Or, like, are they going to be frustrated? And so it has to learn how you express your response to things in the course of doing that. And it's learning that in a generative way, in a very ground-up way. So by the time that we've trained this thing, it has to account for individual differences, for potentially cultural differences, for sentiments, and also just the average of all humans and what humans respond to, along with the distribution, if that makes sense. So, like, what is it that humans find funny? What is it humans find sad, and all that? So, like, when I said my dog died, you can probably figure out this is a sad event: I'm going to be sympathetic, right? That's how it figured it out.
SPEAKER_06: How much of it is the words versus the tone of voice, or is it doing both of those things?
SPEAKER_01: It's doing both.
SPEAKER_02: Let me try to give this another shot so you can see that. Can you tell what I'm expressing right now?
SPEAKER_05: Whoa there. I can hear the frustration in your voice. But I also sense a hint of determination and distress. Like you're ready to tackle whatever's got you worked up.
SPEAKER_02: Can you tell what I'm expressing now?
SPEAKER_05: Hmm. I am picking up on some subtle shifts here. You sound a bit more relaxed now, though maybe still a tad bored or uneasy. But then I also hear a spark of amusement and even happiness. Like you're pleasantly surprised by something. Am I on the right track there?
SPEAKER_01: I'm going to mute that bit. Got it.
SPEAKER_05: You're sounding a bit more at ease now, with a hint of satisfaction.
SPEAKER_01: Anyway, you get the idea.
SPEAKER_06: Yeah. So that demo is designed to reflect back to you what emotions and things you're having and then tweak it. So how long does it take for it to accurately understand a human?
SPEAKER_01: It's less than 500 milliseconds. As you can see, our API is experiencing some load right now. But generally speaking, we can get you back a response faster than any other API, and that's because we're able to detect when you're done speaking more accurately. So some of the other APIs, they have to do this dance between, like, can I jump in, or is it going to interrupt the user? And so there's a little bit of a pause, right? But for us, because we understand the tone of voice, we can use that to figure out when the person's done speaking and then more accurately know when to step in. And so that enables us to respond a lot faster.
SPEAKER_06: So it doesn't need to talk to me and ask me 10 questions to understand my emotional state and how I might be uniquely different than another person. What about across cultures? Because do different cultures have... Obviously, we have different languages, but even putting aside language, does tone work across cultures? Do Koreans, Italians, and Americans all emote frustration the same way, anger the same way? Is it across cultures, or does it require more subtlety?
SPEAKER_01: There's similarities and differences. We have a paper that just came out on this, but basically, if you're speaking a different language, we need to train a new model for it. And it can be not a completely different model from scratch, but at least we need to fine-tune on that language. That's what we find for most languages, especially for broadly different languages. All the Latin languages have similarities, but if you look across East Asian languages, things are pretty different. So suffice to say, yes, we do need to train things for each language, and this demo only works in English right now.
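The sub-500-millisecond latency claim rests on end-of-turn detection: rather than waiting out a fixed silence window, prosody cues help decide that the speaker is finished. A toy heuristic illustrating the idea follows; the features and thresholds are invented for the sketch, and a production system like the one described would use a learned model:

```python
# Toy end-of-turn heuristic: a long pause always ends the turn, while a
# short pause only counts if the voice also trailed off (falling pitch,
# low energy). All thresholds here are illustrative assumptions.

def turn_is_over(silence_ms: float, pitch_slope: float, energy: float) -> bool:
    """Guess whether the speaker has finished their turn.

    silence_ms: trailing silence so far
    pitch_slope: pitch trend over the last word (negative = falling)
    energy: loudness of the last voiced frame, 0..1
    """
    if silence_ms > 700:  # long pause: done regardless of prosody
        return True
    # a shorter pause counts only if the voice also trailed off
    return silence_ms > 250 and pitch_slope < 0 and energy < 0.3

print(turn_is_over(300, -0.5, 0.2))  # True: brief pause, falling, quiet voice
print(turn_is_over(300, 0.4, 0.6))   # False: pause mid-sentence, voice still up
```

The payoff of deciding earlier on trailing-off speech is exactly the latency win described above: the system can start replying during a pause that a silence-only detector would still be waiting out.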
SPEAKER_06: So who's using the app? Let's take a look at the developer console. So you had that up there in the playground. Yeah. Who's using this now? And is it in production anywhere? And what are people using it for? Because there's plenty of models out there to give you answers and generate copy for you. I'm wondering if people are even up to this level of nuance in their products yet, or just trying to get correct answers, because accuracy seems to be a pretty paramount problem right now.
SPEAKER_01: Yeah, I mean, you might be interested in accuracy, but if you're using a voice interface, you need to get to the point fast. Right. And so that's really what we're doing. And you can't with these, like, long, verbose responses from chatbots. First of all, those are very taxing on the brain to read. So that's not a good interface. But also, you might have an accurate answer in there somewhere. It doesn't really matter if someone's not going to listen to a voice reading that out for three minutes, right?
SPEAKER_06: It's a good point.
SPEAKER_01: So we have a lot of developers lined up for this. Actually, we haven't released this API yet, or by the time this comes out, we will have released it, because we're releasing this on Wednesday. But so far, we have developers on this, which is our measurement API.
SPEAKER_06: Oh, wow. So you're on a webcam right now. I'm just going to describe it. And you're making funny faces. Right now, you're surprised: horror, confusion, sadness, disappointment, laughing. And if I were to just say, be completely calm and at ease... your calmness just went up to 79. Your concentration went up to 45. And now if you started thinking deeply about the meaning of the universe, like, why are we here? Like, what is the purpose of life? Like, why wake up and build this company every day? It says you're calm. You're calm with existential problems. Wait, is this a video, or are you doing this right now, Alan?
SPEAKER_01: I'm doing this right now.
SPEAKER_06: You're doing it right now.
SPEAKER_02: You're not following my instructions.
SPEAKER_06: No, give me your most existential, like, I'm wondering about the meaning of life. Like, why are we all here? I want to see if it gets existential. Confusion.
SPEAKER_01: There it is.
SPEAKER_06: Confusion or contemplation. Yeah, contemplation. Well, what's interesting about this is this would be great for coaching an actor. Because, like, happy's easy. Sad's easy. If you go happy... it's got joy, amusement, excitement. Great. And if you were sad, sad is disappointment and confusion. Maybe you're just not a good actor, Alan. Maybe you need to take acting lessons.
SPEAKER_01: You know, contemplation is a tough one.
SPEAKER_06: I was trying to get you to have existential angst. I was trying to pick something that's really hard to read. We'll just think, like, should you even come to work? Is it all meaningless? That's kind of depression, right? You'd be sad. Yeah, a little sad, a little confusion, boredom. Yeah. It's fascinating. So this is just really getting your facial expression in real time. So if you were frustrated, the AI would know it and be like, huh, that wasn't the answer we were looking for. Yeah. And so are people using this for therapy yet, or, like, therapeutic coaching kind of things? Because that one seemed to be, like... I get a lot of pitches from people who want to create AI therapists. And I'm like, hmm, that's a little dicey. I don't know if you should call it a therapist, but companion. Are people using this for companionship?
SPEAKER_01: I do think that AI is going to be something that is your friend. And so it's not just, like, a niche application. I think, generally speaking, we want an assistant that understands us. And there's tons of people working on that. I mean, there are people working on explicit therapy apps with Hume too. And actually, a lot of it's in training therapists and getting them to...
You know, it's a delicate balance. You don't really want to, like, comment too much on people's emotions, but you want to ask the right questions and kind of get at it and help them understand their own emotions better. And so there's a lot of that. And there's also, like, therapist burnout, doctor burnout. There's a lot of health and wellness applications. There's also tracking depression and stuff. We work with clinical researchers who are running clinical trials and using Hume to track symptoms of depression and Parkinson's, just the symptoms. It's not used for diagnosis, because ultimately the doctor does that, but it's helping the doctor understand these things. So we have a lot of those applications. A lot of those are interpersonal things. Like, someone's talking to someone, and the measurement APIs that we have are very good at extracting more data from that and helping people analyze it and helping people understand themselves and their patients, I guess.
SPEAKER_06: So if we have therapy on one side, you have the therapist, who needs to present in a certain way to get people to open up. If you believe in that modality, if you believe in Western therapy, psychotherapy, there is something about pacing and aligning with the person, matching their energy, and getting them to open up so that they have some cathartic, you know, way of processing stuff. So people are using it to train therapists so that they don't have a goofy look on their face, or they have the appropriate look that would elicit less suffering in their patients. Is that what I'm...
SPEAKER_01: Yeah, or, like, customer service reps, which is actually a kind of similar thing. Yeah, it's another form of therapy, actually. It essentially is, yeah. But that requires somebody who's technical, who's maybe academic, maybe a researcher, to take these measures and make sense of them.
SPEAKER_06: Listen, a strong sales team can make all the difference for a B2B startup. But if you're going to hire sharks, you need to let them hunt, and you can't slow them down with compliance hurdles like SOC 2. What is SOC 2? Well, any company that stores customer data in the cloud needs to be SOC 2 compliant. If you don't have your SOC 2 tight, your sales team can't close major deals. It's that simple. But thankfully, Vanta makes it really easy to get and renew your SOC 2 compliance. On average, Vanta customers are compliant in just two to four weeks. Without Vanta, it takes three to five months. Vanta can save you hundreds of hours of work and up to 85% on compliance costs. And Vanta does more than just SOC 2. They also automate up to 90% of compliance for GDPR, HIPAA, and more. So here's your call to action: stop slowing your sales team down and use Vanta. Get $1,000 off at vanta.com slash twist. That's vanta.com slash twist for $1,000 off your SOC 2. Have you done this with poker players yet? Have you put poker players through this to see if they're lying or deceptive in a poker stream?
SPEAKER_01: I've tried a lot of things, and it cannot tell with poker players. You know, at least professional poker players. I don't think the information's there. I just don't think that with professional poker players there's anything going on in their facial expressions.
SPEAKER_06: What can you tell with people's facial expressions that we wouldn't know of? Some people have said you could tell a person's sexuality, where a person's from; you could tell all kinds of interesting things that you wouldn't know. Is that true or not?
SPEAKER_01: That's not really true. There's been a lot of pseudoscience in this area. Most of the things that we can tell are things people want to communicate, which is good. We don't really care to impinge on things that people want to keep private.
We're more interested in helping people communicate well and helping the AI understand what people want. And most of that's there overtly on the face. And for example, it even extends to things like: is the person done speaking? We're way better at understanding when they're done speaking, because we can take into account facial expression versus just the language alone. And that's part of how our empathic voice interface is able to respond better. And we can...
SPEAKER_06: So, you know when I say this is the end of the sentence.
SPEAKER_01: Yes.
SPEAKER_06: Because of my facial expression, you get a quicker clue than audio only. Therefore, you can start speaking without interrupting me, which is what humans do with each other.
SPEAKER_01: Yeah. Like, imagine I'm speaking to you, and right now it's clear to you I just finished a sentence. It's clear to you I'm still speaking, and it's clear to you I'm going to say something again. But now I'm done. Now I know I can speak, right?
SPEAKER_06: Which is what I do for a living on the podcast: try to understand when people are done so that we can have the next person speak, right? Like, moderation is a difficult task. And customer support folks are using this already to understand how hot and bothered people are when they call the customer support line, I assume.
SPEAKER_01: Yeah, to some extent, yeah. So kind of understanding, is the customer having a good time, a bad time? Where are we kind of failing on customer service? And which customer service reps are doing well or poorly? And how do we train them to do better? How do we pull up examples of when they're not doing well so that we can train them to do better? And there's a lot of AI going into customer support now. So some of our early design partners for this new API are people who want to take the automated customer support, make it a lot better, but still know when to include a human, escalate to a human.
SPEAKER_06: Exactly.
SPEAKER_06: I mean, that makes sense. If the person's like, this is incredibly frustrating, you know, and you start hearing the frustration go up, and they're, whatever, United Premier Gold Diamond status...
SPEAKER_02: Yeah.
SPEAKER_06: You want to get them on the phone with somebody, because you're starting to piss them off. Right. Yeah. So understanding when that happens. And how much of this is going to be used for security? Do you have any security applications coming? Because it's been well known, when you go to certain countries, they'll ask you a couple of questions; they try to read you, do some human factoring, and figure out if you're lying. It's one of my favorite genres of television show: the people going through customs, and they're trying to read if they're sneaking into the country or sneaking things into the country. Are three-letter agencies using this technology yet to analyze people as they come into buildings, or...
SPEAKER_01: We haven't been working with security yet. Not that we don't believe that that's a good application, but we're being a little bit more careful about how this is used and trying to make this as rigorous as possible, essentially. There's been a lot of providers of facial expression reading technology who aren't very scientifically rigorous, make false promises, and then it doesn't work. You're just singling out people for no reason, basically.
So, you know, we want to be very careful to distance ourselves from those applications. I think there's a huge wealth of applications that are consumer-focused, that are just helping people have a better time, a better experience with an application. I think that's a much bigger market. And then there's, like, the more niche things like security, lie detection, stuff like that.
SPEAKER_06: Yeah, could you tell if somebody's acting suspicious based upon their facial expression or tone of voice? Like, they're knocking on your door, your Ring doorbell answers, and Ring's like, hey, this person is scoping the house out. They could use your API to figure out the expression of people at the door.
SPEAKER_01: Yeah, I don't know. I feel like I have mixed feelings about that. I think in some cases, if you have enough data, you could start to do something in that direction, but if people are really good... if people are criminals, like, they're pretty good at disguising their facial expressions, probably.
SPEAKER_06: Yes, they're sociopaths; they're criminals for a reason, right? So you would catch the neophyte criminals, the bad criminals, and the good criminals would just sail right by. You'd be like, yep, that person's 100% trustworthy. And you'd let Hannibal Lecter into your house.
SPEAKER_01: Yeah, I can see how you need to be thoughtful about this. Yeah, we don't want to make promises that we can't keep, right?
SPEAKER_06: Yeah. Well, I mean, yeah, so in that regard, let's go through the emotions that you feel you've nailed, and then tell us, like, what are the ones that are hard? I tried to do existential, and that was really hard, obviously. We're not there yet for existential crisis.
SPEAKER_01: Yeah, I mean, the way that you respond to expressions has more to do with almost subtle things that you can't name versus overt emotions. And it depends on the situation. Like, if I'm using this comedy bot, which I can prompt (this is a prompt that makes it a comedy bot), it'll respond to my emotions, or my expressions, completely differently than, say, a therapy use case. So let's try it.
SPEAKER_05: Oh, okay. Great. Wow. Well, well, well. Look who decided to grace us with their presence tonight. Don't worry, I promise I'll keep it PG-13 for now. You know, I gotta say, I am a little shocked that any of you fine folks decided to show up. I mean, who in their right mind would willingly subject themselves to an evening of my particular brand of, shall we say, edgy humor? But hey, I am not one to judge.
SPEAKER_01: In fact, I... Can you make a joke about Alan Cowen, the CEO of Hume AI?
SPEAKER_02: Sure thing.
SPEAKER_05: Ooh, you want me to take a swing at the big cheese himself, eh? Well, twist my arm, why don't you? I gotta say, I am not the biggest fan of that Alan Cowen character. Talk about a guy who needs to have that massive ego of his deflated a bit, am I right? Actually, you know what, I think I have the perfect way to do it. I say we round up a team of trained monkeys, dress them up in business casual, and have...
SPEAKER_02: Oh, that's really offensive. You've made me very upset.
SPEAKER_05: Whoa there, partner. It looks like I may have gone a little too far with that one, huh? Sometimes I get a little carried away with the whole edgy comedian thing.
SPEAKER_01: So you can see, in that case, I don't want it to be that sympathetic. It's doing its thing.
SPEAKER_06: And so you can determine what AI flavor you want to have interacting with folks. Yeah.
Cheeky and playful. You could have one that's super empathetic, but, you know, maybe not going over to patronizing. But you could see this: if I'm calling the support line for United, they might think I'm a New Yorker who talks fast and I just want to get to the point, or I could be from the South and into Southern hospitality, and it could take its time with me and ask me about the weather and how I'm doing. And, you know, a little bit of chitchat. Some people in the South, I notice, versus in New York, where it's kind of get to the point, let's move on. You can basically train your AI to have both modalities and dynamically switch between them.
SPEAKER_01: Exactly. So there's all this context, and that kind of transforms the meanings of our expressions. So what an expression means and what to do with it really depends on all this other information that this model is taking into account. So it's not so much like detecting lies or detecting anxiety or detecting depression. It really depends on the context, and we're able to integrate that into the model. And then it's not just these kinds of canonical emotions like anger. Like, there's a little bit of an anger dimension in a joke, you know: anger and amusement and contempt, maybe. That makes it funny. So it doesn't necessarily mean the person's expressing anger. To know what these expressions mean, you really have to have the context. You have to have the relationship that you're acting upon with your expressions. And that's what our AI does. So it's a little bit more nuanced than just detection.
SPEAKER_06: And these all fall under what you studied, affective computing, yeah? This is a specific school of computing that kind of bridges the psych department and computer science, and, I think, behavioral factors, industrial-organizational psychology. Maybe you could give us a quick education on that.
SPEAKER_01: Yeah, so affective computing traditionally is the study of nonverbal expression, basically. So facial expression, the voice, body posture. And most of the history of that is just labeling those things in a very predictive way. Now that we have generative models, large language models that can reason, it's really about reasoning about affect. And that's what we've introduced at Hume. So it's about understanding whether somebody is going to find something funny, whether somebody is going to find something confusing, and using expression along with language to come to those understandings. I would say historically that's not what affective computing has been, but now we've sort of pioneered this new form of affective computing that we're introducing to the world.
SPEAKER_06: Some of this was done... I know this was like a big thing that Minsky worked on at MIT, yeah? Did you go to MIT, or did you...?
SPEAKER_01: I went to Yale, and then I went to UC Berkeley for my PhD. And I also worked at Google while I was at Berkeley, and I helped start the affective computing team there. So I've been doing this for like ten years. Minsky... all the AI people had something to say about affect, right? But there really wasn't much that could be modeled at the time. Same with language, right? Things have come a long way. And I would say that there's affect in language, and so the word "affective" is a little bit of a misnomer. It's really computing with more than just language that we're doing. We're computing with expression. This is the way that expressions transform communication.
SPEAKER_06: Yeah, because you have a multimodal situation here. You have the visual, the facial expression, you have audio, and then you have the actual words, right? And so you're feeding all of those in at the same time to get the response and to understand the emotion.
SPEAKER_01: Yes. And all of this just contributes to accuracy. We can predict words better with expressions versus without. So if you look at the raw metrics that are used to train these large language models, we're doing better in terms of those raw metrics than models that just consider language alone. So this is an intimate part of reasoning, and it's just part of human communication that we're now taking into account. It's not something that is niche. I think people think about emotion and affect as these niche things that are important for therapy, important for comedians, important for a few. But actually, this is something that's important for all conversation, important for any interaction with AI: just understanding a whole new modality of information that people use to converse with each other.
SPEAKER_06: Yeah, it's absolutely fascinating how quickly this has come together, because if we were sitting here two or three years ago, this just wouldn't be possible, would it?
SPEAKER_01: No. I mean, without large language models, without our measurement models, without the modifications that we've done to integrate those two things, this was not possible at all.
SPEAKER_06: What has surprised you about what the AI understands, what your model understands, and what has been either disappointing or challenging, you know, on this journey?
SPEAKER_01: So yeah, that's... that's interesting. I mean, linking together the language models and text-to-speech and transcription is something that other people are doing, but what we've sort of started to see emerge out of models that do all three, that are linked together, is that they have these emergent capabilities. And you start to see that in this interface, where it's forming expressive speech. It just feels different to me than if you just link ElevenLabs and OpenAI and have a talking chatbot. That just doesn't really sound like it's understanding you, and this is doing something a lot more nuanced.
SPEAKER_06: Do you understand what it's doing when it starts processing all this stuff and you feed it in? Do you actually know how it's coming to these conclusions, or is it just sort of, you know, doing its best to figure it out? And who knows?
SPEAKER_01: So, yeah, we don't come in and tell it to respond to sadness with sympathy, but it does, right? And it's sort of intuitive why that is. So I'm not going to say I don't understand that, but that's an emergent capability that we did not program in. And there are other things that it's doing that are more nuanced that we don't really have a handle on, except that we know what it's optimized for.
SPEAKER_06: Hey, everyone, you know I'm obsessed with AI right now. And a fantastic report about how AI is going to change the game for startups has been released. It was published by our friends at HubSpot for Startups. And it's great because they surveyed 1,000 early-stage founders to get you these insights. These are from the field. The report talks about AI tools and hacks for sales, marketing, and customer support teams. These are going to be your WMGs: weapons for massive growth. So check out the link in the episode description to get your hands on their extensive report. Head over to HubSpot.com/startups. When you join HubSpot, get ready to unlock top-notch resources.
These are power tools that they've custom-made for founders. You'll also get access to masterclasses, templates, and incredible toolkits, all designed to boost your startup's speed and efficiency. So once again, visit HubSpot.com/startups to see what discounts you qualify for and start using their powerful solutions at pricing that won't break the bank. That's HubSpot.com/startups.
SPEAKER_01: In this case, it's optimized for... it's actually a specific personality that is pretty robust.
SPEAKER_06: The comedic one that we just did. But for the helpful one, you might set it at some other archetype, which might be parental, motherly. And then you might have a service-oriented persona. So even the worst customer support rep, somebody who hates doing customer support, could be turned into an empathetic, patient person.
SPEAKER_01: Yeah, exactly. And what we ultimately want to optimize for is: this is the response that's going to make the user the most satisfied. And it's just going to figure out, this is the right tone of voice, this is the right thing to say, to make somebody express satisfaction. That's what we want to do. We know that we can. We're still early in getting it to work, but it works. And we're starting to scale that methodology.
SPEAKER_06: I'm hoping that it turns every customer support rep into a roast, and that I'm dealing with Triumph the Insult Comic Dog, and it's just absolutely annihilating people and being cheeky.
SPEAKER_01: I could have this thing roast you if you want.
SPEAKER_06: Yeah, let's do it. Let's do a roast.
SPEAKER_01: All right. Could you roast Jason Calacanis? Man, this is like everybody's...
SPEAKER_05: All right, let's talk about that piece of work, Jason Calacanis. Oh man, where do I even start with this guy? You know, I heard a rumor that Jason Calacanis was actually grown in a lab, a mad scientist's twisted experiment to create the most obnoxious, self-aggrandizing tech bro imaginable. Poor bastard, he never stood a chance.
And have you seen the way this guy carries on? It's like he's perpetually auditioning to be the next Elon Musk, but with half the brains and twice the ego. I mean, the guy can't open his mouth without saying something so mind-numbingly...
SPEAKER_01: This is too much. Can you go easy on this?
SPEAKER_04: Oh, yeah. I would say go harder. Oh, come on, don't be like that. Go harder.
SPEAKER_05: All right. All right. Fair enough. Let me try this again with a little less venom. You know, as much as Jason Calacanis rubs me the wrong way sometimes, I gotta admit, the guy's got a certain basic way. I mean, the way he struts around Silicon Valley, acting like he's the second coming of Steve Jobs, it's almost endearing in its own delusional way. And let's be honest, we all know deep down that half the reason we pay attention to him is because he provides such material for...
SPEAKER_01: You're good. I mean, it's fantastic. I mean, it literally understands what a roast comic does. I'll take it. How about this: doesn't this say something good about Jason, that you're able to roast him like this? I mean, it must mean he's made it, right?
SPEAKER_05: You know, you make a fair point there.
SPEAKER_01: All right. It's being a little funny now, but... yeah.
SPEAKER_06: I mean, it's... What's amazing about it is it understands what a roast comic is. The language models understand that. It understands who Jason Calacanis is. It got the Wikipedia page. It knows I'm somehow involved in tech, somehow I know Steve Jobs or Elon or whatever. So, you know... And the concept of a tech bro exists. So, who knows?
SPEAKER_01: Yeah.
SPEAKER_06: If you wanted to roast me more, I can send you a link. Oh, no. I love the roast. I think it's... Well, I mean...
it's interesting about jokes. Like, I have friends who are in comedy, and, you know, while these jokes are not funny, they're kind of in the zone there. So if you squint a little bit, you're like, there's a joke there. Yeah. You got something about strutting around University Boulevard thinking he's Steve Jobs. Maybe he's wearing a turtleneck or... yeah. Like, okay, there's a joke there. You didn't hit it, but we could brainstorm it. So, like, I think in the writers' room, you could really brainstorm these. I asked it... when GPT-3 came out, I was like, give me, like, the next season of Succession. You know, and it knew all the past seasons, and it's like, here's what happens in this next season, even though the series is over. And I was like, huh. Wow. Like... This may not be great right now, but it's okay... it's interesting. It's going to get there.
SPEAKER_01: It's close. I think none of these models have mastered latent humor, because it's so much in our expression. We don't say things are funny explicitly, because that would just make them not funny.
SPEAKER_06: "So let me explain the joke to you."
SPEAKER_01: Exactly. That's when the joke didn't land. We have this new eval for humor, and we're starting to push it. Basically, we can optimize for laughter. We can optimize for, like, what do people actually laugh at in millions of hours of conversation?
SPEAKER_06: That's great.
SPEAKER_01: And so that's how we're approaching these kinds of problems.
SPEAKER_06: So you could do a focus group where you have people watch Curb Your Enthusiasm, and you could say, for 100 people, here are the funniest moments. And for this demographic: older people, older men, older women, younger men, teenagers, Gen X. It could literally give you what jokes landed with each group.
SPEAKER_01: Yeah, that's version one of this. Version two, which we're doing now, is: we have millions of hours of data, and we analyze it just to see, in general, what's funny to people. Like, across everything, not just in Curb Your Enthusiasm, but across everything.
SPEAKER_06: Every single thing in the world. I can tell you, like, there's a great movie, Idiocracy, and there's an amazing TV show in it. Have you seen Idiocracy? Yeah, great movie. I mean, it's so great. But, like, everything's been reduced down to its most basic thing. Like, here's a gel for you to eat from a tube. And the hit show is "Ouch, My Balls," which is just a compilation of somebody getting hit in the nuts over and over and over again. He falls off of a roof, lands on a fence, falls off that, gets hit by a crane with a big ball, you know, hitting him in his nuts. I think it's called "Ouch, My Nuts" or something like that. It's hilarious. That's what it's been reduced down to: somebody getting hit in the nuts.
SPEAKER_01: We're hoping not to be too reductive, but yeah, maybe the AI will.
SPEAKER_06: I mean, you'll figure it out. You could literally crack humor. What language model did you build all this on?
SPEAKER_01: So we have our own language model, and it calls other APIs. In this case, it's calling Claude. So Claude is providing the language response, or some of the language responses, not all of them. We also have, like, a wrapper around Claude. It's not exactly a wrapper.
It's our own language model that sort of integrates Claude into the speech to make it sound more conversational, and also, like, detects when you're done speaking and stuff. But we give Claude more data than just language. We give Claude some of my tone-of-voice data, some additional data that we're getting through our APIs. So it's augmenting it as well.
SPEAKER_06: So eventually, what does the world look like if you succeed with this, and we're sitting here in, you know, five years, and it's built into every iPhone, and you've figured out emotion perfectly? What do you think the world will look like? What are some highlights, or, you know, dare I say, dystopian, utopian sort of... What are the pros and cons of this technology going to be?
SPEAKER_01: So our aim is utopian. We want to build a layer in between the application and these gigantic AI models that is decoding the user's intentions and preferences and relaying that information to the model. So that's what we have here, basically doing that with Claude. And because we'll have facial expression and the voice, and we're able to learn over time, we're able to build interfaces that understand you and what you want and are optimized for you. So, suffice to say, basically it's going to be built into everything. It's going to be the universal interface that you use to interact with AI. That's the goal.
And it's always going to be this AI that's optimized for your experience. So you can go to it, and it knows, basically, what your preferences are, what makes you laugh, what makes you feel better, what you find to be a good explanation for things, your style of speech, how you write emails. Like, it's going to know a lot of different things. Obviously, it's going to keep all this information very protected and private.
SPEAKER_06: Now, on the downside here, you could use this technology to say, I want to convince somebody subtly to vote this way politically. I want to try to convince somebody that Trump is amazing, or Biden's amazing, or Robert F. Kennedy Jr. is the one. So you could literally start creating robocalls, or subtly, using this emotion to try to sway people in politics, or towards ways of being or thinking. And we saw that happen with the YouTube algorithm. So how do you police how people use your system? I saw you have ethical guidelines there. And then, obviously, there are things that would be maybe R-rated or PG-13. And romance always comes up when people are doing this, whether it's Blade Runner or Her. So are people using this for romantic relationships, and what's your take on allowing that? And then also, how do you think about influence? Big questions with TikTok today, and your technology could really be used to influence people towards good and bad ends.
SPEAKER_01: Yeah, I think there's a pretty good way to operationalize the difference between when you're being manipulated by something that wants you to vote for a person or to buy something, versus when you're dealing with an AI optimized for your own well-being. And that's what we try to do with our ethical guidelines. So we have this nonprofit, the Hume Initiative, that essentially tries to codify that principle and says, these are the ways that you can pursue these different applications so as to optimize people's well-being. It even has a bunch of ways that you can measure people's well-being, which relies on a combination of what we're able to get through our API, so positive emotions, basically, over time, and also different kinds of self-report measures that we recommend gathering. So as long as the AI is optimized for your satisfaction, for your well-being, I think it's not manipulation. When you get an AI that's optimized for somebody else's objectives and using your emotions for that, that can be manipulative, I think. And that goes for the romance case as well. If you're dealing with an AI girlfriend and it's ruining your life by forcing you... you're spending more time with it than you're spending with humans, and that's going to be a negative for you, and it's going to show up in many ways as being negative for your well-being. That would show up in these measures. If it's optimized, though, for your well-being, and you're having a good time with it, and it's healthy, and you're not spending more than X amount of time on it, maybe that's okay.
SPEAKER_06: What about trying to upsell me? Like, hey, you're in business class, would you like to be in first class? You're in economy, would you like to go up to Economy Plus? And it uses your technology to be really convincing about the value of that and upsell. How would you look at an upsell: ethical or not ethical?
SPEAKER_01: I think that's not ethical unless it's done in a very, very careful way. Basically, our guidelines don't allow that.
SPEAKER_06: Guidelines don't allow an upsell? But humans do upsells all the time.
SPEAKER_01: Right. I think upsells are okay if the goal is to find the person who really will benefit from the upsell and only try to sell it to them. And you measure the effect of the upsell on people's well-being afterward, and you're like, okay, people actually benefited from this. I didn't sell this to somebody and then they regretted it. So I think there are ways of doing it that are going to be fine. The problem is that if you just allow anything, if you allow people to optimize this for anything at all, then the potential for manipulation is pretty high. And I think this is true regardless of Hume. I think people are building these things that will be extremely persuasive. And Hume, ideally, will be providing the AI that responds and protects you.
SPEAKER_06: It's like, okay, I'm detecting... So you are very much in the camp of, hey, we have to be really thoughtful about how this technology is deployed.
SPEAKER_01: Yeah, but not in as much of a paternalistic way. I think that technology can have a sense of humor, and it's okay if it offends some people, and it doesn't need to be politically correct all the time. But what I care about is: is this good for people, at the end of the day? And so we have our ways of measuring well-being in order to optimize for that objective and not be paternalistic, basically.
SPEAKER_06: Yeah. But I mean, at the end of the day, this is so powerful. It will be more powerful than just watching videos on YouTube, because it's customized to an individual. So take Ben Shapiro or Rachel Maddow, and pick whichever side of the political spectrum you're on. You know, those people are trying to convince you of their position and their interpretation of the world.
This, to me, is even more bespoke and customized to individuals. So if you showed even a little propensity towards some of their viewpoints, it could really... whether it's the language model or the emotion, or the combination of them. You know, the same way people were complaining, like, oh, people go into the intellectual dark web on YouTube. I don't know if you've heard about that. Like, you see a Joe Rogan, then you get a Jordan Peterson, you wind up on an Alex Jones, and the next thing you know, you're at, like, some white supremacist or something, is the claim. But media does influence people, and it is a stepping stone from one to the next to the next. You may start out with somebody like Sam Harris, like, just intellectually, you know, rigorous, et cetera, and then all of a sudden you wind up at Alex Jones, is the complaint for many parents. But this would facilitate that, wouldn't it? Like, massively.
SPEAKER_01: I think you get that when you optimize for engagement. And so, to some extent, like, TikTok doesn't have this data, but it's still incredibly good at doing that. I think where this data helps you the most is in taking into account people's user satisfaction, their well-being, their mental health, all those things. So if TikTok took this stuff into account, and it was doing it in a way that was, you know, consistent with our guidelines, let's say, then it would be using that data to optimize for people's well-being over time instead of engagement. And so you'd realize that if you throw people down the slippery slope of getting more and more extreme viewpoints, which is what happens today, because they're engaging and they're, you know, offensive at first...
And, you could argue, people who go down this slippery slope end up kind of isolated, and it affects their social relationships, and they start to get angry. This is not good for people's well-being. So the technology can look at that at the individual level. At the societal level, it can look at the health of all of the people using a technology and say, hey, there's a collective impact of this. So that's the road we want to go down: being able to measure, long-term, is this affecting people positively? And you really need expressive behavior to look at that data. Language alone is not going to get you there, basically.
SPEAKER_06: Yeah, I mean, you're going to be facing a real uphill battle, because the marketers want this software. Your top customers, I predict, will be marketers who want me to try Zyn or whatever those pouches are that people are putting in there. And I'm in Texas right now, and everybody's putting these Zyn pouches in or whatever. And I'm just like, that can't be good for you. And they're like, want to try it? Yeah. Marketers love this kind of stuff. Like, maybe the pitch to me is, you know, hey, it's performance, and it's just nicotine. It's like caffeine: you drink caffeine, you should try this. But for other people, it might be, you know, hey, you're the cool kid. So you're going to be in a really interesting position as a provider of an API, in that I think a lot of the marketers are going to want to use this to try to convince people to do things where maybe it's unclear if it's actually good for them. Like, hey, you should gamble on sports, right? Like, there's marketing going on like crazy. And if I'm a marketer, man, this is for me the holy grail.
SPEAKER_01: Yeah, I think that we want to connect more with the end user and show the end user that we're optimizing for their interests, and have that be the selling point, rather than connect with the people selling to them.
But I hear you. I think that's a real concern. But on the flip side, if we just optimize for people to buy things, or let's say we just optimize for engagement, you reach a certain point where it becomes so negative for people that regulators have to step in. And you kind of start to see it with TikTok, for example. Kids are spending six hours a day on TikTok, and if they made it any more addictive than it already is, parents would step in, regulators would step in. They'd be like, this is actually bad for our whole society. So at the end of the day, it's not necessarily good for our business.
SPEAKER_06: Yeah, which is what's happening with TikTok right now, as we speak. I think parents are getting the message, like, this is too addicting for adults and kids. And the idea that media is not influential is so naive. Like, when people are like, yeah, you know, media doesn't have an impact... it's like, are you sure? All studies show that media, and video specifically, is one of the most convincing mediums of all time, in all human existence. If you want to manipulate somebody, video is the way to go. And then customized video that is, you know, matched to you is like 10x that. So you have something here that I think is incredibly powerful. And the fact that you're being thoughtful about it makes me feel great. I think it's awesome that you're taking a measured approach to this. I wish you great success with it. If you want to learn more or try it, how do they get into the developer sandbox and play with this? And who are you looking to work with?
SPEAKER_01: Yeah, go to hume.ai. You can sign up. We'll be releasing access to our voice API, hopefully before this episode comes out. And we have some closer design partners who we're working with to improve things as well. So please feel free to sign up, and you can start using our API today, like our existing measurement API.
SPEAKER_06: I think it's absolutely fantastic what you're building. And I like the fact that you're super thoughtful about it, Alan, and I wish you great success with it. And we'll see you all next time on This Week in Startups. Bye-bye.