The VoiceFirst Roundtable - Episode 2
Host: Bradley Metrock (CEO, Score Publishing)
Guest: Dr. Ahmed Bouzid (CEO, Witlingo)
Duration: 30 minutes, 16 seconds
[intro music]
Bradley Metrock: [00:00:06] Hi. And welcome to the second episode of The VoiceFirst Roundtable. My name is Bradley Metrock. I'm CEO of a company called Score Publishing, based in Nashville, Tennessee. Our mission is to help people become better interactive content creators.
Bradley Metrock: [00:00:20] Our guest today is Dr. Ahmed Bouzid, who is the founder and CEO of Witlingo, a software company based in Virginia that builds products and solutions for voice-first devices and platforms such as Alexa, Google Assistant, and Microsoft's Cortana. Prior to Witlingo, Dr. Bouzid was head of Alexa's smart home product at Amazon, as well as V.P. of product and innovation at Angel.com. Dr. Bouzid is also co-founder and director of the Ubiquitous Voice Society, a non-profit organization dedicated to the mission of evangelizing the emerging voice interface as well as author of two books on voice user interface design. And, happy to say, the keynote speaker of the Alexa Conference coming up in January. Dr. Bouzid, say hello.
Dr. Ahmed Bouzid: [00:01:06] Hello! Hello, Bradley. Thank you so much for your invitation. I'm very honored to be with you.
Dr. Ahmed Bouzid: [00:01:11] First of all congratulations on the initiative. I think you're doing a great service as well as you're being smart in terms of being first out there to, I believe, lead in a field that is only going to emerge and hopefully grow in a very, very interesting way as we follow it from the beginning of it. So congratulations.
Bradley Metrock: [00:01:34] I appreciate that. Certainly, thank you for your time and sharing your expertise and your perspectives with us.
Bradley Metrock: [00:01:41] Before we get into things, let me thank our two sponsors. The first one is the Alexa Conference, which is the annual gathering of Alexa developers and enthusiasts. You can learn more, and get registered, at AlexaConference.com.
Bradley Metrock: [00:01:54] The second is Fourthcast. Fourthcast - F O U R T H C A S T - turns your podcast into a custom Alexa skill. Get started at Fourthcast.com.
Bradley Metrock: [00:02:07] So Dr. Bouzid, let's get started, and again thank you for your generosity, sharing your time with us. You're very much appreciated.
Bradley Metrock: [00:02:19] I want to start by simply asking you: what is Witlingo?
Dr. Ahmed Bouzid: [00:02:26] Well, Witlingo it is a, I'll call it a "software as a service" company, and it's important to note that. Meaning our clients, and many, they are enterprises. They come to us, and we take care of launching, from beginning to end, meaning researching, designing, building, hosting, developing, and so forth, an Alexa skill, a Google action, or Cortana skill. Our key focus really is on ensuring that we deploy highly usable, value-delivering skills or actions.
Dr. Ahmed Bouzid: [00:03:08] I think what we are witnessing now is a first generation of these, let's call them voice conversations, or conversation experiences. And just like we saw with the iPhone and Android, the mobile apps (the first generation), and web sites as well. The first generation of these products was, you know, not highly usable, and was not very clear on the value. And so we saw an opportunity to, from the very get go, since I've been in this field forever and I understood the challenge - the challenge that anyone who wants to get into the field of building and launching these skills. Especially for the enterprise, meaning for companies that have a logo and have a client base and they care about their brand. And they have money to spend, and they buy into and understand that the Alexas and the Google Homes and Cortanas of the world are going to be a channel in their own right, just like today, you know, the mobile app. I don't think you can call the company "kosher," so to speak, if they don't have a mobile app, or if they don't have a Web site, right? I think, in the very near future, I think it's going to be the same, more or less, where the customer is going to expect to be able to interact with the brand by just speaking.
Bradley Metrock: [00:04:29] And you guys are an Amazon preferred developer...?
Dr. Ahmed Bouzid: [00:04:34] Well we are a partner of Amazon, we are a developer, as well as Google as well as for Cortana. We are not positioned to do only Amazon or Google. We understand just like a company is not going to build only an iPhone app, they are going to build an iPhone or Google and a Microsoft app. We believe - at least this is our bet - that companies will want to be on as many platforms as possible.
Bradley Metrock: [00:05:04] Sure, and you guys are vetted for all of those.
Dr. Ahmed Bouzid: [00:05:07] Yes. Yes, we are.
Bradley Metrock: [00:05:08] That's great. And so tell me...so you guys develop skills and voice applications for the enterprise, as you said, and I want to get into one of the topics that you actually provided. What about the individual that wants to develop an Alexa skill or a Google skill? Talk to me about what you think needs to happen with monetization of voice skills in order to progress the industry forward.
Dr. Ahmed Bouzid: [00:05:44] Yes, of course. So I think we saw, at least in the App Store, both the App Store and Google Home, we saw definitely a spike in both numbers of apps, as well as the quality of the apps, when monetization was introduced.
Dr. Ahmed Bouzid: [00:06:03] Now, maybe the listeners are not aware of how much it costs to build a mobile app, but a mobile app is not a cheap thing to do if you're not building something that is a toy. If you're building mobile apps for Taco Bell or mobile app that does something that is not trivial, the average cost to build a mobile app is $270,000. That is the average cost. There are apps that - I have figures in front of me here - but 11 percent of apps are more than a million dollars. Right? 80 percent are more than $500,000. So it's very expensive to build a mobile app of high quality.
Dr. Ahmed Bouzid: [00:06:45] It is very expensive to build a skill that is of high quality. In fact, I would venture to say that it's much more expensive to build a business-grade skill than it is to build a business-grade mobile app today. If, for no other reason, than the fact that the people who have the skill to build these skills - pardon the redundancy there - the talent to build these skills is very scarce, right? People who know how to design for voice are not a commodity. People who know how to build skills are not a commodity. And so, if you want to build a great skill that will have both value, and is highly usable, it is not going to be done by a single developer. It's going to be done by a team. So somebody who knows how to research for voice will value somebody who knows how to go to the field and has been targeted by a product marketer who says "I would like to build a skill for seniors who are of a certain nation." That person, that field researcher, will need to go out and they will need to collect information from the ground, from real people, and then hand over that information to the product manager, who then will go over the findings and identify the MVP: the minimum viable product. And they will work with the UX designer who then will think about the experience, and they will design the experience, and they will test the experience, and they will identify assumptions, and so on and so forth.
Dr. Ahmed Bouzid: [00:08:16] And only after all of that is done, specs are written up by the product manager and UX person - product manager for features, and UX person for exactly how the experience should sound like. I'm talking about what should it say? How should it say it? How does it behave with a first time user? How does it behave with the multiple time user? How does it handle errors? How does it handle errors of a certain type? What happens when there is a lot of noise? And so on.
Dr. Ahmed Bouzid: [00:08:45] All those are things that need to be considered, before you hand over a development-ready document. Then, and only then, does development begin. What we see today is a coder, by virtue of having access to the Alexa Skills Kit, or the Google kit, or the Cortana...by virtue of having access, they can then just very quickly zip through everything and spend 10 minutes thinking about "hey, it'd be cool to do this, this, and that, and of course I can design a flow" - you know, doesn't take a lot of brain to design a flow - and what we end up with are highly unusable skills that don't deliver value, because they were not researched, and they were not done by a professional. So I think when you monetize something, what you are attracting are business types. People who want to make money on the platform, and those are the people who are going to assemble the team to build something that will make them a million dollars.
Dr. Ahmed Bouzid: [00:09:46] Let's say a skill: I build a skill that a million people will want and each one of them is going to pay 99 cents. Right? I can imagine that happening, and I can imagine somebody pitching it to a VC and saying "I'm going to focus on building these skills." And let's say on educational skills. And moms will buy these skills. These skills allow you to memorize a poem, you know, a mom will buy it for their kid for 99 cents, and a million moms will buy it. And that's just one idea, and there are millions of ideas. So you can see that businesses, actually whole businesses, can be built around building skills that people want to pay something for.
Bradley Metrock: [00:10:28] So you know Amazon well. Why, in your opinion, have they not monetized the Alexa Skills Marketplace yet? In other words, allowing developers to sell their Alexa skill - obviously you can monetize it through in-app purchases - but selling the skill out of the marketplace, why have they not done that yet? They just haven't gotten to it? Or some strategic reason?
Dr. Ahmed Bouzid: [00:10:51] I have no idea. I have no idea. I can only say that what we are seeing is not an anomaly. Saw the same thing with Apple. Saw the same thing with Google. You don't come out of the gates with monetization. You come out of the gates with an SDK and just see where is the traction, and all that. But my point simply is that monetization needs to happen if you are going to see the value, if you are going to see the ecosystem thrive.
Dr. Ahmed Bouzid: [00:11:26] And it is an ecosystem. It's not a collection of developers. It's an ecosystem of field researchers and designers and testers and coders and beta managers and marketers and people who know how to, you know, how to iterate, and product managers, and all that.
Dr. Ahmed Bouzid: [00:11:42] It's an ecosystem of people that will be assembled by entrepreneurs who will want to make money, and see a path to making money, because they have done their research, and are able to capitalize, meaning able to raise funds, and so forth. So I don't think it's an anomaly that Amazon and Google and Cortana have not yet come up with the monetization. I think it's going to happen sooner or later.
Dr. Ahmed Bouzid: [00:12:10] But I think whoever comes out with it first, they are going to see the traction in terms of both volume as well as in terms of quality. More importantly, I think the quality of the skills that then by necessity, because somebody is putting a lot of money into building a skill, will think very very hard about making sure that they've crossed their T's and dotted their I's, and make sure that whatever they're building is not something that is junk.
Bradley Metrock: [00:12:38] Sure. Yeah. We need that, the Alexa skills marketplace needs that, and I'm sure the other ones do too. I think the world's ready for that.
Bradley Metrock: [00:12:45] Let me ask you something else that I thought was interesting from what you have put forth: share, with me and the people listening to this podcast, your vision for discovery in a skills marketplace. You know, you wrote in this Google doc that you shared with me that you know you view the app store model as a good MVP, but a "no store" model is the way to go. Explain that.
Dr. Ahmed Bouzid: [00:13:19] Well, again, this is an opinion.
Bradley Metrock: [00:13:24] Sure. It's a very informed opinion.
Dr. Ahmed Bouzid: [00:13:28] I have been wrong many, many times, so this is simply an opinion of mine. But I believe that I think it's very natural in the visual ecosystem, where you have something you are looking at, and you have a store you're looking at, and you have logos, and you can read at your leisure, and you can say "I like this app, I don't like...I don't want this, I'm going to try this app." And, if you don't like it, later on you delete it. I think it's a very natural interface to be able to go and discover them through searching and so forth. I think voice is all about being as natural as possible. Meaning being able to ask for things with minimal effort, and without having to remember what you need to say, or what you need to click on. And so that's what makes the UI, the voice UI, both challenging and thrilling. It's challenging because you have a lot of constraints. How are you going to surface this skill to people, and how are they going to know what to say, and what not to say, and how are you going to recover without causing a lot of grief. You know we've had those experiences with the IVR over the phone and it doesn't understand you and so on. So it's very challenging. It's a very challenging interface, but it's thrilling in that it allows you to speak naturally, and when done well, when the value and the design of the experience align nicely, you can have a wonderful experience that you can't top.
Dr. Ahmed Bouzid: [00:15:07] So I think discovery, as such, needs to happen naturally, meaning I think a default when you engage with an assistant like Alexa or with Google Assistant, or with Cortana, I think the default should be: just ask it. Just say, you know, "Alexa, what's the Facebook stock? How's the Facebook stock doing?" And she's going to figure it out, and go and either give you that information out of the box or, if it has a skill somewhere, it needs to say this particular skill here is doing pretty well. So let me use that skill to deliver the information. As opposed to you need to go and enable the skill and so forth. To the credit of Google, they are trying to do that. They are trying to work towards you just speak and we'll take care of things behind the scenes. There are pluses and minuses on both sides, right? So the plus of the Alexa store is the fact that you can piggyback on something that everybody understands, which is the app. So you go to the store, you look for stuff, and all that, which is why I said it's good for an MVP.
Dr. Ahmed Bouzid: [00:16:24] I think we'll see this as, especially that Alexa has now tens of thousands of skills out there, I believe and I hope, and I think this is where it's going, by necessity that we will go towards a model where you just ask for stuff and then you get something back that's useful, and you, the user, you really don't care where it came from, whether it came from Amazon, or a third party. You don't care, as long as the experience is great, and as long as you get what you want. So I think discovery is going to, you're not going to discover anything, really. I think discovery is going to be moot as a concept. I think what will happen is you will speak, and you'll get things done, and then you expect it to behave like that. And just like Steve Jobs said, "it just works."
Bradley Metrock: [00:17:17] Well, yeah, I think everyone hopes that things work out like that. I think the cynic in me would say, "alright, if I'm asking Alexa or Google Assistant, you know I've got a couple hours to kill." I say "Google Assistant, or 'OK Google,' or you know 'Alexa,' I want to play a game for an hour while I'm just sitting here. What should I play?" Now, at that point where I've asked the assistant to give me that sort of information that previously would have been searchable, one of two things is going to happen. Number one is what you suggest, which is it gives me the actual best answer.
Bradley Metrock: [00:18:13] The other thing that could happen is that it ignores entirely whatever the actual best answer is and then gives me some advertised product placement. So if really the best game...you know my assistant knows me, it knows my context. You know it knows my searches I've made, it knows I've listened to baseball news before, and it knows I'm a sports guy. And it knows there's this new baseball game for Alexa that's on Amazon's platform that could be a perfect fit, but it doesn't tell me that that's what I should play for the next hour because instead this other company has a much worse, a very mediocre game, that they've paid to have that placement and be advertised instead.
Bradley Metrock: [00:19:05] So you know, I hear your hope for the vision that you've laid out. I just hope that advertising doesn't get in the way.
Dr. Ahmed Bouzid: [00:19:20] I think Google is very well aware, or was very well aware of that from the get go. So their mission in their search engine is to give you the best results by using the algorithms they have. And they are very adamant about ensuring that nobody can in any way insinuate or hint that they are promoting anything because they are being given money. So they clearly demarcate between ads and content that is delivered with the best intentions to deliver the best quality. Right? Curated content and so forth. So that ethos, the ethos of delivering quality to customers, I think is expected.
Dr. Ahmed Bouzid: [00:20:13] And as soon as suspicion arises that this assistant here is not getting the best result, is giving you a result that's sponsored, I think there'll be a backlash. I think these companies are mature enough not to go that way, because the user is sophisticated enough to understand, or feel, or suspect that there is something that is not completely on the up and up as far as that is concerned. So I, again, this is completely my opinion. I think they will steer away from that. If they are going to do advertising, it is going to be clear advertising, and they will make sure that nobody suspects that they're delivering something not putting the interests of the customer first.
Bradley Metrock: [00:21:02] Yeah, I hope you're right. I certainly hope you're right. Let me shift gears a moment, and ask you another question.
Bradley Metrock: [00:21:11] So you're very involved with spreading the word about voice and voice-first technology and you're the co-founder of the Ubiquitous Voice Society. Let me ask you: if you were speaking to a group of high school seniors, or a group of college undergraduate seniors, or even folks in graduate school or in some stage of education, and someone asked you "what skills do I need to be cultivating in my own personal skill set?" "What do I need to be learning to participate in this new computing era of voice technology that we're quickly finding ourselves in?"
Dr. Ahmed Bouzid: [00:22:06] Yeah, that's a great question and that's what motivated me to launch this organization, because I believe that the voice and the conversational interface is going to pull into the fold lots of people who were excluded from, say, the mobile revolution and the social revolution perhaps and so forth. I think by virtue of it being language-centric - it's all about language, it's all about conversation, it's all about being able to craft prompts that are crisp and clean and clear - and where you're able to think about the customer or the user in the context of what is usually a human-to-human conversation. In this case, a human-to-computer conversation, but still, where the computer is acting like a human.
Dr. Ahmed Bouzid: [00:23:01] So people who have been in psychology, for example, or people who are anthropologists, who know how to study people in terms of their environment, because looking at two people talking to each other you can learn a lot about how to build and design a great conversation.
Dr. Ahmed Bouzid: [00:23:18] People who know how to write clearly and cleanly, who are good English majors, people who are in the theater who can or can write - playwrights - who are in, or think about, 'roles.' So you have a role - that's what the assistant is assuming - and that role has a character and has integrity and they say things in a certain way because they have a personality and having those sensitivities and sensibilities allow you to play a part in that collection of people who make a great skill.
Dr. Ahmed Bouzid: [00:24:01] You could be a field researcher, anthropologist, you could be a prompt crafter and UX designer or if you are good at English and you know how to convert thought into language. If you're a playwright you can be part of UX design. So what's exciting, really, is we can pull a lot of people in from the humanities and the social sciences, linguistics for example, that just didn't fit. I mean a linguist doesn't fit very well in, say, a mobile app development environment. Right? Because language is not that central. Somebody who is visual, who thinks about the visual interaction or understands colors and clashes and fitting colors and so forth, would fit there.
Dr. Ahmed Bouzid: [00:24:48] But somebody who is not a visual designer, or somebody who is a designer of conversation, designer of language, crafter of language didn't have a spot there, and I think that's what makes me very excited. I'm very much a fan of the humanities and social sciences, and so it just makes me thrilled that I'm able to go and evangelize to a whole swath of very smart kids who now hopefully will be able to partake in the new revolution. You know, an immense revolution I think that awaits us.
Bradley Metrock: [00:25:22] That's awesome. So I appreciate you sharing that with us.
Bradley Metrock: [00:25:26] Let me address your other things you've got going on. You've got your second edition of "Don't Make Me Tap" coming out. When is that being released?
Dr. Ahmed Bouzid: [00:25:37] So we're finalizing the manuscript, the second version of it, the second edition of it, sometime end of this month, early next month. So probably sometime in early August, mid-August, you should be seeing it being available to those folks out there who would be interested in reading it.
Bradley Metrock: [00:25:56] And what is "Don't Make Me Tap" about?
Dr. Ahmed Bouzid: [00:25:59] So it's all about providing designers with the basics of how to design a great voice experience - a far-field one, eyes free, hands free - not one that is based on a mobile. So we believe - I believe - that the form factor, the natural form factor for voice, is something that you don't have to touch and look at. Something that you speak with, speak to, and so "Don't Make Me Tap" is a play on "Don't Make Me Think," which is a classic in design for the visual. Here, "Don't Make Me Tap" is "don't force me to tap on something, or type on something, or swipe on something." I just want to say something. And have you please respond to me quickly, succinctly, and let me go on with my life.
Bradley Metrock: [00:26:53] And that's available on Amazon, and the second edition will be available on Amazon?
Dr. Ahmed Bouzid: [00:26:57] Yes. First one is available today, and the second one definitely will be available sometime in mid-August.
Bradley Metrock: [00:27:03] Y'all need to go buy that. If you're listening to this, go buy that. Stop what you're doing and go buy that.
Bradley Metrock: [00:27:07] And then the other thing that we have not covered so far is the Witlingo Conference. So tell us what you're planning on doing with this Witlingo Conference.
Dr. Ahmed Bouzid: [00:27:17] Yes, so the conference is slated for early second week of January. It is going to happen in Salt Lake City. It's going to coincide with the Sundance Festival. And so we will be, in essence, as I was saying, the theme really is to pull in people from the social sciences, the humanities, and talk about voice but also talk about topics that are dear to my heart, which is what's happening to our society as a result of automation.
Dr. Ahmed Bouzid: [00:27:50] What has happened to our society as a result of the mobile app being everywhere, and the way it has, I believe, mangled the way we interact with each other. And what are the pluses and minuses of the new interface, called the conversational interface, which I'm hoping is going to give us some of that natural way of interacting with the world back, where you don't have to always be looking at something, and tapping on something, to do something. You're able to read a book, and ask for the time, without having to stop and look at your smartphone and then find out that you have a text and see a message there, and before you know it, you're one hour into your Facebook and forgot completely about your book.
Dr. Ahmed Bouzid: [00:28:34] So the conference is really all about voice but also about AI and about the adjacent issues of automation and labor displacement...there's a lot of interesting stuff. The program, we'll be announcing at the end of June, so you'll have the program published on our website and I'll make sure you know it's on social media and I'll let you know as well so you can help spread the word.
Bradley Metrock: [00:28:58] Sure. Yeah. And for folks who, you know, people can see you there, or people can see you keynote the Alexa Conference, either way and have access to your expertise. That's exciting...and you've got a lot going on.
Dr. Ahmed Bouzid: [00:29:10] Thank you sir. You too. Again, I want to congratulate you. This initiative is fantastic. I'm really thrilled and I'm going to be looking forward to listening to future podcasts as you invite smart people who'll be talking about this emerging space.
Bradley Metrock: [00:29:30] We need as many smart people as I can find because I'm just not that smart. It's interesting to see all the different companies coming into the space, and doing different things, and what you're doing with Witlingo is something people should pay attention to.
Bradley Metrock: [00:29:48] Dr. Bouzid, thank you very much for your time and your generosity today. And, until next time.
[exit music]