© 2017-2019 VoiceFirst.FM, a division of Score Publishing

Feedback? Email VoiceFirstFM@gmail.com.

Artificial Intelligence - Episode 3

Host: Bradley Metrock (CEO, Score Publishing)

Guest: Peter Cahill (CEO, Voysis)

Duration: 28 minutes, 9 seconds

Podcast Links:

Apple Podcasts

Google Play Music

SoundCloud

Stitcher Radio

TuneIn

YouTube (+ closed captioning)

In this episode, host Bradley Metrock (CEO, Score Publishing, and host of VoiceFirst.FM podcasts This Week In Voice and The VoiceFirst Roundtable) interviews Dr. Peter Cahill, CEO of Voysis. The discussion spans Cahill's background in speech technology, the founding of Voysis, the company's vertical-focused approach to voice AI, privacy and GDPR compliance, the rise of smart speakers, and how soon retailers need to embrace voice.

Transcript:

[intro music]

 

Bradley Metrock: [00:00:13] Hi, and welcome back to Artificial Intelligence, Episode 3, for October 2017. My name is Bradley Metrock - I'm CEO of a company called Score Publishing, based here in Nashville, Tennessee.

 

Bradley Metrock: [00:00:29] Our sponsor for this podcast is VoiceXP, blazing the trail in voice technology. VoiceXP is taking the lead in developing Alexa skills for some of the best brands in the world. With VoiceXP, all you have to do is say it, to revolutionize your marketing strategy. I encourage you to go and check them out at www.VoiceXP.com.

 

Bradley Metrock: [00:00:52] We are very thrilled to have Dr. Peter Cahill with us today from Voysis - Peter, say hello!

 

Peter Cahill: [00:01:00] Hi, Bradley! How are you?

 

Bradley Metrock: [00:01:02] I'm doing fine, Peter. So I got your name right, I assume?

 

Peter Cahill: [00:01:06] Yes. Perfect.

 

Bradley Metrock: [00:01:07] Excellent, excellent. It wouldn't be the first thing that I have messed up, but thank you very, very much for joining us today, and sharing some of your insight and experience and expertise with us.

 

Peter Cahill: [00:01:17] Thanks. Looking forward to it.

 

Bradley Metrock: [00:01:19] Sure. So let's start with you. You've got a rich background in language, in voice, and in speech. Share with me and the audience about your background and your educational background and your professional background and what led you up to starting Voysis.

 

Peter Cahill: [00:01:40] I started working in, more so studying voice technology back in 2002. So 15 years ago I think....so I studied computer science, just finishing up my undergraduate degree. And I think at the time then what attracted me to voice was that it's a hard problem in comparison to a lot of the typical software engineering challenges that you would encounter while in universities. So if you had to use a database for something or build a website, there's very little technology risk there. The engineering challenges with it are much more about architecture and building out something. But at the end of the day you always know that it was possible to do so. Whereas in the case of voice technologies....you could build a very robotic sounding voice or you could build a relatively inaccurate speech recognition system but at that time nobody had built highly accurate systems, not large teams and not small teams. I think that's what attracted me to it most. And so after I finished my undergraduate degree, I ended up going down the academic route and completing a PhD. And then after a fairly short post-doc I became faculty at a university in Ireland and while I was faculty, I and six other academics here in Ireland attracted a fairly significant funding round which was about 17 million euros at the time. That helped me broaden my experience of voice and AI technology considerably because up until that point I'd mostly worked on text-to-speech....Once that happened and the project we worked on speech recognition....we still did text-to-speech. There was a very large team working on machine translation. There were teams working on many other topics and subtopics of language technologies. And after working on that some years, within academic circles I got elected to chair an academic group called Sensic(?) which is the global speech synthesis academic group. 
Anyone who works in speech synthesis, academia or industry, would all be members of that group, and that group organizes annual conferences. Around that time, this would have been 2010, maybe 2011, we started to see deep learning in products. So deep learning was pretty popular in research labs at the time and some people had prototypes of deep learning based voice recognition systems. And so we could see from that it was quite clear that it was going to be very transformative for the industry in general. And I think at that time then we also saw the likes of Google, Amazon, Apple, Microsoft, and all of them went on acquisition sprees, and they acquired pretty much any of the companies that existed out there in the space which had a solid team behind them and pretty much all of them got acquired around that time frame. At that time I was looking at the landscape saying that there were no smaller companies to challenge these bigger guys anymore.

 

Peter Cahill: [00:04:31] And at the same time it was quite clear from my connections in all these companies that they were working on making their own platforms better. While there was this kind of consolidation, nobody was really coming to market making voice technologies available. So a good example is even the Amazon Alexa, where there are skills available today, and you can add your own skills very easily....but you're still kind of limited in terms of what a skill can actually do. And so that motivated me to set up Voysis as an independent complete voice AI platform which could not just make the technologies available for everyone else but also make it easy for people to actually use their products.

 

Bradley Metrock: [00:05:09] Interesting. Thank you for all of that insight. Obviously this sector and this nexus of voice technology and artificial intelligence is deep in your blood. Give me the 30, 60 second elevator pitch and you were starting to get into it just there at the end of that. Give me the 30 to 60 second elevator pitch of "I am with company X...", let's say I'm with Delta Airlines and I want to have a voice assistant I come to you I ask you what do you do....and you tell me what?

 

Peter Cahill: [00:05:40] We build out solutions for particular verticals where we see if voice can add a lot of value and the verticals are ready for them. So our primary focus right now is on e-commerce, so not quite travel. But in the case of either what we would do is we make an automated platform available online to eight guys(?). Where any company within that vertical could just push some of their data into our service and behind the scenes then that will trigger all of the AI models required be it voice recognition, language understanding, and dialogue conversational-type technologies. All of them will be either tuned towards the company's data, or trained on the company's data. And so what the company actually gets access to then, are our APIs which are not generic cloud APIs, but instead are APIs that have been trained based on their data, designed for their use case, and so they end up with something that's very straightforward to integrate. But it's also highly accurate.

 

Bradley Metrock: [00:06:35] I'm fascinated by your approach. And let's just roll with the Delta Airlines example for a minute. But let's just assume we're talking about their web portal where you go and buy a ticket for this reason, that reason, or the other when Amazon Alexa is not sufficient, or Google's assistant is not sufficient. We want our own voice assistant and we want Voysis to help us create that. Give me some insight into the stuff that you're looking for in the data sets of the customer. So what are you looking for? I won't put words in your mouth. You tell me what y'all look for in the data sets as you get started creating a voice assistant or you know an artificial intelligence backed voice technology for a client.

 

Peter Cahill: [00:07:27] Sure. So I think at a very high level we see that once you can define the use case, and generally modern AI technologies can work extremely well if you have the use case and the problem that you're asking them to solve is reasonably well-defined. And in the case of something like travel, regardless of what airline you may interact with, the general use case is the same; that you'll have various tasks the user is going to want to use when they interact with an airline. So the obvious ones are probably to check prices on tickets at certain dates or certain times. The user may then want to follow up about the availability of flights. So, for example, maybe I checked how much is a flight to New York next week. Then maybe I want to follow up with it and say if I was to depart around 1:00 p.m., how much would it be then, or even just to check the availability of the flights. Maybe I want to constrain it to certain airlines. And so there's a whole bunch of different parameters I can provide as a consumer of data technology in terms of what date I want to fly, what airlines I want to fly with,...what class do I want to sit in, even like my seating preference and so that's kind of one category which is kind of close to discovery and search. And at the same time then you also have another side of it which is closer to customer service where maybe I've bought the flight previously and I want to query certain details about that flight. What is the date of my flight? Can I check in online? Are there restrictions on the weight of my luggage? How many bags can I check in? And these types of queries-regardless of the airline in question, it's the same set of queries that are valid for pretty much all of them. And so that's kind of our motivation to focus on a particular vertical because it's possible to define it well, it's possible to build a highly accurate fully automated solution around it. And we can get the information from the airline itself. 
And that's useful; in many cases it's the names of the airports they fly into. You know, traditionally most modern language technologies can struggle when dealing with the names of places, and you would have witnessed that in the various iterations of voice-based navigation systems, where the streets and cities can be quite difficult. And if the airline can provide us with the names of the airports they fly into, that helps us adjust our models in a way that ensures all the airports they fly into are integrated into the system.
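
[Editor's sketch: the two ideas in this answer, a follow-up query that inherits the details of the previous one and a customer-supplied list of airport names used to correct what was heard, can be illustrated roughly as below. This is a minimal illustration, not Voysis code; `FlightQuery`, `apply_follow_up`, and `match_airport` are hypothetical names, and plain string similarity stands in for whatever model adaptation a production system would actually use.]

```python
import difflib
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class FlightQuery:
    """Parameters a traveller might give across several utterances (hypothetical)."""
    destination: Optional[str] = None
    date: Optional[str] = None
    depart_time: Optional[str] = None
    airline: Optional[str] = None
    cabin: Optional[str] = None


def apply_follow_up(previous: FlightQuery, **new_slots: Optional[str]) -> FlightQuery:
    """Keep everything already known; override only what the follow-up mentions."""
    updates = {name: value for name, value in new_slots.items() if value is not None}
    return replace(previous, **updates)


def match_airport(heard: str, airports: List[str]) -> Optional[str]:
    """Snap a possibly misrecognised place name to the airline's own airport list."""
    by_lower = {name.lower(): name for name in airports}
    hits = difflib.get_close_matches(heard.lower(), list(by_lower), n=1, cutoff=0.6)
    return by_lower[hits[0]] if hits else None


# "How much is a flight to New York next week?"
first = FlightQuery(destination="New York", date="next week")
# "If I was to depart around 1:00 p.m., how much would it be then?"
second = apply_follow_up(first, depart_time="13:00")
```

[Here `second` still carries the destination and date from the first question, which is what lets the follow-up be so short.]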

 

Bradley Metrock: [00:09:59] The magic of being able to integrate data into a product like what Voysis offers, it's not so much having a gigantic data set. Everybody's got a gigantic data set. It's the magic of knowing what things to exclude.

 

Peter Cahill: [00:10:14] Absolutely. Yeah. So when we build these systems it's not that we just use data off the customer. We have enormous amounts of data ourselves that we combine with the data they provide. And so we don't expect large data sets from the customer and we also never expect them to have audio recordings in the data sets either. So they just need to provide the text, be it their airports or their products or whatever.

 

Bradley Metrock: [00:10:36] Sure that's great. So Voysis raised eight million dollars earlier this year. Congratulations on that.

 

Peter Cahill: [00:10:45] Thank you.

 

Bradley Metrock: [00:10:46] So what is that allowing you to do, what are Voysis' short term goals toward the end of 2017, heading into 2018?

 

Peter Cahill: [00:10:58] So we used most of our capital to grow the team. So as you can imagine for the fairly broad range of voice and natural language technologies that we've developed and because we tune(?) all the models for our domain it requires us to have all of the technologies in house. We don't depend on some generic black box cloud service that's outside of our control. The full stack of technology is something that the company developed. And so naturally with that we need to have a fairly sizable team and in practice I think a constantly growing team to accommodate the requirements we have there. A lot of the 8 million has been used to scale the size of the team and the company, in addition to building out our brand further and engaging more partners at a single point in time. And later this year we're releasing our product called Voysis Commerce, which is a fully automated solution for any e-commerce company that wants to create a voice AI that can talk to their customers about their products, help people find products, order products, etc. And it's a fully automated solution so when anyone signs up and pushes their product out into the service, in a matter of hours, they have this voice AI available that they can integrate into their products.

 

Bradley Metrock: [00:12:14] Wow. Yeah that's cool.

 

Peter Cahill: [00:12:16] We are looking forward to getting that out the door.

 

Bradley Metrock: [00:12:18] I bet you are. Oh that's great. And you said that's coming before the end of the year?

 

Peter Cahill: [00:12:23] Yep, absolutely.

 

Bradley Metrock: [00:12:25] Interesting. Very cool. So I think it's a reality of this sector, this voice technology, artificial intelligence, machine learning universe that you've got your big players as you mentioned earlier your Amazon, your Microsoft, your Apple, your Samsung, all of those who are creating mainstream experiences. But then you've got a lot of other players and I'm sure Voysis fits into this as well that the insights that you provide and the experience that you provide for your clients' customers is one thing but then another part of the value proposition that you bring to the table is enhanced privacy and security. Can you talk to me about that a little bit?

 

Peter Cahill: [00:13:12] Yes sure. So there are a lot of people who do have concerns about how their data is being used, rightfully so I think. And in the case of Voysis, later next year, I think it's in May next year, there are some new regulations in Europe which are very, very strict on what companies can do with data. It's called GDPR, and even when we launch Voysis Commerce this year we're already fully compliant with that and it's a very strict set of regulations about how the customer's data is handled. And so with that, anyone who's using any Voysis-based solution can directly log into Voysis or maybe even through the companies that they are directly speaking to and request for all of their data to be deleted or to view all of the data that a company has. And so the end-user gets full control and full visibility into the data that's from them.
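
[Editor's sketch: the two end-user rights Cahill describes, viewing all of the data held about you and requesting its deletion, amount to two operations on a per-user store. The class below is a hypothetical in-memory illustration only; it is not Voysis's API, and a real system would also have to cover backups, logs, and data held by downstream processors.]

```python
from collections import defaultdict
from typing import Dict, List


class UserDataStore:
    """Hypothetical per-user record store supporting 'view' and 'delete' requests."""

    def __init__(self) -> None:
        self._records: Dict[str, List[str]] = defaultdict(list)

    def record(self, user_id: str, utterance: str) -> None:
        """Store one piece of data attributed to a user."""
        self._records[user_id].append(utterance)

    def export(self, user_id: str) -> List[str]:
        """Return a copy of everything held for this user (the 'view my data' request)."""
        return list(self._records.get(user_id, []))

    def erase(self, user_id: str) -> int:
        """Delete everything held for this user and report how many records were removed."""
        return len(self._records.pop(user_id, []))


store = UserDataStore()
store.record("user-1", "show me red running shoes")
store.record("user-1", "like this but a bit bigger")
store.record("user-2", "blue jacket")

seen = store.export("user-1")     # what user-1 is shown on request
removed = store.erase("user-1")   # user-1 asks for deletion
```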

 

Bradley Metrock: [00:14:03] Yeah, that's very appealing. So that's legislation that's on the books now in Europe?

 

Peter Cahill: [00:14:08] It's already fully defined. And Europe allowed some time for companies to become compliant. We're already fully compliant with that and I do think in some cases you know this data can help companies understand their customers better and build better products as a result, but naturally there is also a threshold with it and you want to respect people's privacy too. I think in e-commerce we did encounter a fairly interesting scenario here because one of the features of voice interfaces is that they're unconstrained. If you consider how much effort companies put into analyzing where the users click on a screen and even for example in e-commerce, what products are people looking at, how much do people scroll, or do they click on the screen, these types of things. But it's all very constrained because of the visual interface, whereas in the world of voice and particularly with something like Voysis Commerce where consumers can just directly tell a mobile app what type of products they're looking for. That type of information can actually be genuinely very useful for the retailers, because they can see if someone maybe does a search for a certain product, maybe then they click on a certain product, and then they actually ask it, "Oh I want this but a different color", or "I want a product like this that's a bit bigger." And so I think there's a whole world of possibilities for what analysis of voice interactions may be able to do in the future.

 

Peter Cahill: [00:15:32] We're not doing it initially with Voysis Commerce but I do think in terms of the value that these types of interface provide helping companies build better products and deliver a better experience is probably in everyone's interest. But the need for privacy I think is critical. It's just finding the right balance.

 

Bradley Metrock: [00:15:50] Sure. I absolutely agree. And for the voice sector, really the charge has been led by Amazon in many ways here in the U.S. but it's on the rise everywhere and as you know and I know it's here to stay, it's not some sort of fad or whatever. This is a permanent shift that we're experiencing. And it's all happening so fast that we've really opened a Pandora's Box of ethical concerns around privacy that I don't think a lot of people are thinking about. So then it becomes paramount just for a company like yours to advocate for the end-user, and think about stuff that they're not thinking about themselves.

 

Peter Cahill: [00:16:38] Yeah, I think so. And in the case of what we're doing, if we integrate into a mobile app or on a Web site, we're not always listening which is something that has bigger privacy concerns, so if you take something like an Amazon Echo or a Google Home and then those devices are always listening to people in their home.

 

Bradley Metrock: [00:17:00] Sure.

 

Peter Cahill: [00:17:00] And I think the privacy concerns are much greater there. And I did hear at an academic conference recently that one of the big platform companies had done a study, because these devices are always listening and they're not just listening to voice, they listen to everything because it's just a microphone that records audio, and it's quite straightforward to identify what TV channel somebody is watching, because the device can actually hear the audio from the TV. Also it was possible to identify who is in any room at any point in time because the microphone could hear the footsteps and the timing of them and people apparently have quite distinct footsteps. And so I think there's a whole range of privacy concerns around those types of devices that isn't really being talked about that much right now.

 

Bradley Metrock: [00:17:48] Well yeah and you're touching on something that came up on an earlier episode, I think on the VoiceFirst Roundtable, one of our VoiceFirst FM shows, the idea actually came up on the Voice of Health Care as well. This idea was about smart speakers. And this just sort of is a great example of the duality of this technology.

 

Bradley Metrock: [00:18:14] Smart speakers can serve as fantastic security devices for the home because you give them a week or some amount of days and they're going to determine the average volume in the house, they're going to ascertain that every day from about 6:42 p.m. to 7:15 there's a TV that's on that's 50 feet away, a mailman shows up at 1:00 in the afternoon. They don't just listen, but the capability to document is also there. And so that's incredibly powerful. Like in the case of a senior citizen or somebody who you know has trouble looking after themselves if the Echo device, just to use that as an example, if the Echo device in that senior citizen's house does not hear the bathroom being used 2.3 times a day and it goes for three hours without hearing any activity in the hall. Well, there may be a problem. That person may be collapsed on the floor or something. So if you take that in all of its potential to serve the human race and then you look at it from the other direction of all the potential to betray the human race then it becomes a fascinating discussion.
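
[Editor's sketch: the "three hours without hearing any activity" check in this example reduces to looking for long gaps between detected sounds. The function below is a hypothetical illustration with made-up timestamps; it describes no real product, and the three-hour threshold is simply the figure mentioned above.]

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def silent_gaps(
    events: List[datetime],
    now: datetime,
    max_gap: timedelta = timedelta(hours=3),
) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) pairs where no activity was heard for longer than max_gap."""
    points = sorted(events) + [now]
    return [(a, b) for a, b in zip(points, points[1:]) if b - a > max_gap]


# Illustrative timestamps only: activity at 08:00, 09:00 and 13:30, checked at 14:00.
day = datetime(2017, 10, 1)
heard = [day.replace(hour=8), day.replace(hour=9), day.replace(hour=13, minute=30)]
gaps = silent_gaps(heard, now=day.replace(hour=14))
```

[With these timestamps the only flagged stretch is the four and a half hours between 09:00 and 13:30.]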

 

Peter Cahill: [00:19:46] And I think it ultimately boils down to, can you trust the providers of those devices? It's inevitable that someday in the future there will be more transparent devices and right now all of the devices are very, very closed platforms, even if you may be able to add in some basic commands on any of them but other than that they're very, very closed platforms. There's no transparency to see. And if....by accident people don't really have any visibility or any idea of how frequently does that occur....and what exactly is being tracked in the audio recordings.

 

Bradley Metrock: [00:20:22] Let me ask you two more things then we'll wrap up, both of which are related to news stories that we reported on This Week in Voice.

 

Bradley Metrock: [00:20:29] The first one is that smart speaker purchases are up 300 percent in 2017. I think the number is 24 and a half million that's projected at this point for smart speakers sold across the world.

 

Bradley Metrock: [00:20:44] And I want to ask you, I saw somebody, I couldn't remember who the other day, and I still can't remember it now, but some analyst was discussing the fact that he thought smart speakers were going to increase exponentially in terms of sales over the next two years. And then they're almost going to vanish just as quickly because they're going to get subsumed into other existing technology like appliances or other existing hardware in the home or the office. Do you agree with that and what do you think about the rise of smart speakers?

 

Peter Cahill: [00:21:16] I could imagine that smart speakers will become less intrusive. Many of the devices to date have been fairly big and bulky but I think it's hard to see them being integrated that much into external appliances because you're still going to need that microphone somewhere. And if it's in a small non-intrusive device ideally somewhere where you don't even have to locate it in a visible place, I could see it existing on its own. I think to date, the original Amazon Echo only came out just over two years ago and already the microphone technology being used in these devices has changed fairly dramatically from that to now and even in the past year the mix of technology has changed quite dramatically and I would say particularly, if the microphone technology keeps improving, then it's going to keep justifying itself for people to have these devices. Although maybe ones more like the Amazon Echo Dot and the Google Home Mini may be the more popular devices because they're just smaller and less intrusive.

 

Peter Cahill: [00:22:12] And I don't see an advantage in adding that into the next appliance if it makes that appliance less upgrade-able in the future because you can buy these microphone devices for $40 at the moment.

 

Bradley Metrock: [00:22:24] The other thing I wanted to ask you about, there is an article from last week that Brent Leary did with your colleague Eric Bisceglia, am I pronouncing that right? And this is a great article. We're going to link to it in the show notes. One of the things that this article contemplates is, what is the time frame that a retailer has to embrace voice technology before they sort of really start to shoot themselves in the foot? What is your perspective on that?

 

Peter Cahill: [00:23:03] I think in the world of e-commerce, a lot of the retailers do seem quite progressive and they do want to move forward. But voice is different from probably any of the technologies that are used today. It takes a bit of trial and error to build a good voice interface. So even just from the user experience point of view alone, where is the best place to put a microphone icon, how do you teach a customer what the capabilities of this microphone icon are? How do you tell the customer about extra features or some functionality that may help them? For retailers, there's a lot of unknowns still. And now is the time to start figuring it out because things are moving very, very fast and probably the best example of that is two years ago, when nobody knew that they needed a smart speaker. Now look at how many people have them and they're not only increasing, they're increasing at an incredible speed. You know a customer would accept an unpolished experience now and the voice interface will still save them 10x the amount of time. It does provide enough value but you really will see certain retailers who will start polishing their voice experiences pretty soon and then the expectations will rise.

 

Bradley Metrock: [00:24:19] So your message to retailers is they need to go ahead and get on this now, so they can, if not for any other reason, start learning the ropes.

 

Peter Cahill: [00:24:28] Oh, absolutely, because I think in another 12 months it will be too late to start thinking about it.

 

Bradley Metrock: [00:24:34] Yeah, I agree with that. I think 2018 is going to be a transformative year and you guys at Voysis are pretty well-positioned to take advantage of it. Peter, thank you very, very much for setting your time aside to talk with me and share your wealth of experience with the audience.

 

Peter Cahill: [00:24:50] Thank you. Enjoyed it.

 

Bradley Metrock: [00:24:51] For Artificial Intelligence, thank you for listening. Until next time. 

[exit music]