
The VoiceFirst Roundtable - Episode 3


Host: Bradley Metrock (CEO, Score Publishing)

Guests: Mark Webster, CEO; and Scott Werner, CTO, Sayspring

Duration: 30 minutes, 12 seconds

Podcast Links:

Apple Podcasts

Google Play Music



Stitcher Radio

YouTube (+ closed captioning)



Bradley Metrock: [00:00:00] In the third episode of The VoiceFirst Roundtable, I interview Mark Webster and Scott Werner of Sayspring. We talk about the company's origins, about voice design, about the Echo Show, and at the end of the episode we even talk basketball, as Mark Webster spent time working for the NBA earlier in his career.


Bradley Metrock: [00:00:24] Hi, and welcome to the third episode of The VoiceFirst Roundtable. The purpose of this show is to examine all aspects of voice technology and voice-first technology as it starts to get off the ground. Our first sponsor is Fourthcast: Fourthcast turns your podcast into an Alexa skill. Get started today at


Bradley Metrock: [00:00:52] Our other sponsor is The Alexa Conference, the annual gathering of Alexa developers and enthusiasts. Learn more and get registered at


Bradley Metrock: [00:01:03] We're very fortunate today to have as our guests Mark Webster and Scott Werner of Sayspring. Guys, thank you very much for joining us. Sayspring is a very interesting young company that solves a specific problem in this growing voice-first industry. And we'll get into that in just a minute. But first, Mark and Scott, let's take a moment to have you explain your background a bit - where you came from, and what led you to start Sayspring.


Mark Webster: [00:01:48] Sure. My background is in product design, which I've been doing for close to 20 years. I've worked in large and small companies. I started my career with the National Basketball Association, then worked at a few different startups. In 2011 I launched my own startup called SideTour, which was a marketplace for activities very similar to what Airbnb is doing with their Experiences business. Then we went through the Techstars accelerator here in New York and raised money from some great investment firms. In 2013 we ended up being acquired by Groupon, where I was director of products for our division for two years - then I decided to leave to explore what was going to be the next thing. I actually spent a lot of time looking at senior care, which was a pretty exciting space, and through that landed on voice assistants and interfaces and where voice was going. And so I got working on Alexa projects.


Mark Webster: [00:02:58] In the world of web and mobile, there's a design process and a product process that I'm used to; we have toolsets for how we design, how we prototype, how we wireframe. And it became pretty apparent when I started working with voice as a medium that teams were going to need those same tools in the world of voice design. So that ultimately led to building Sayspring. It sort of feels like the business found me while I was looking for a different business.


Bradley Metrock: [00:03:32] Funny how that works.


Mark Webster: [00:03:32] It is! I think a lot of second-time entrepreneurs have the experience of leaving to work on the next big thing and having it turn out to be something pretty unexpected.


Bradley Metrock: [00:03:47] That is exactly how I would describe VoiceFirst.FM - we're very involved in publishing, and sort of on the vanguard of that with interactive books and interactive content, and all this voice stuff somehow found us. So you're preaching to the choir.


Mark Webster: [00:04:08] Yeah! It's funny how ideas find you! So we got working on what became Sayspring, and Scott - who will talk a bit about his background in a moment - was our second engineer at that last startup, SideTour, and we had the opportunity to work together there and also at Groupon. We have a very similar belief in the process that leads to great products. So I was pretty excited when there was an opportunity for him to come on as CTO to help drive this business. I'll let Scott talk a bit more about himself.


Scott Werner: [00:04:43] Thanks, Mark. I started my career at a design agency, and pretty much since then I've been working with Mark, as he mentioned, first at SideTour and then at Groupon. I've had a lot of experience seeing the different ways designers work, and seeing the whole product process from ideation all the way to getting it into the engineers' hands and getting the product built. At an agency you do a lot of the prototyping, the click-throughs, those kinds of things that you present to clients. I think back to that a lot in our product meetings here, since it was such a big part of the agency work. And then at SideTour and Groupon, working with Mark and seeing this process, it was a no-brainer when we started talking about the product. It's the part where you figure out what you're building BEFORE you commit engineering resources, because the last thing you want is to spend a whole bunch of time in engineering only to find out that the thing isn't actually what customers want, or that customers can't even figure out how to use it. So when Mark brought this idea to me I was sold pretty much immediately.


Bradley Metrock: [00:06:13] Nice. So let's talk about Sayspring and take the vantage point of someone who has never developed a piece of software at all - like me. I'm technologically savvy and the day is going to come when we develop our own Alexa skill or voice application; we've got some things that we want to do. But from the standpoint of somebody who has never developed a piece of software at all, who's vaguely familiar with the process that one might go through, familiar with the growing ecosystem of voice hardware - walk us through the problem that Sayspring solves within the larger context of what developers have to do to bring a voice application to life.


Mark Webster: [00:07:08] Sure. I think a great way to conceptualize what's involved in product design and development is to just imagine a house: imagine that I have a family and I want to build a brand-new house. My requirements are a certain number of bedrooms for the number of people in my family. So I sit down with an architect and I tell them the requirements: we need a five-bedroom house and we want to live in this general area. The architect starts asking questions about what I want out of the house and what our preferences are. Then the architect will start to sketch out what that house looks like, and that design process would end with fully fleshed-out blueprints of what we're going to build. And only once you have those blueprints do you actually start construction of a house. You would never say "I want to build a five-bedroom house" and start pouring a foundation. So the idea is that every design process should walk through defining what the requirements are, what you're trying to accomplish, answering detailed questions. What finishes are going to be on this house? What appliances are going to be put in there? You go through that process, and then when all of the big questions have been answered, you start construction. Software works exactly the same way. If I'm building a mobile app I should go through wireframing it, deciding what the main features are going to be, then put together what would be called high-fidelity design mocks where I can actually see what each screen of that mobile application will look like. I would then put it into prototyping software, like InVision or Flinto, which simulates the interactivity that you'd see in a mobile application. In the world of mobile there's a lot of user experience convention that we're all used to: you pull down to refresh, you swipe to move ahead. So you want the experience in the design process to be as close as possible to the final product before you code anything. And only once you've put all those prototypes in front of users do you actually begin development.


Mark Webster: [00:09:31] Voice is the same exact thing. We need to design what that user experience is going to be - what you're trying to accomplish with your voice application or your Alexa skill. What are all the commands that a user could say to it? What are all the different variants of the ways that they could ask the same thing? And then what's all the speech that's going to be part of the response back to the user, that is on-brand and on-message for my company, and that properly prompts the user to know what to do next? We need to go through that process before we start coding anything. Right now, a lot of voice creation basically starts with development, and that's extremely expensive. It's extremely slow. You end up iterating in the actual development and creation of it, which is NOT the place to do it; it should be done where it's cheap and easy, in the design process. And so, Sayspring is all about empowering designers ... empowering product people ... empowering non-technical users to figure out what this voice application is going to be and do, and what it feels like, before you invest the resources to actually build it.


Bradley Metrock: [00:10:47] That's a great description. So … you guys just a month or two ago closed a funding round. It appears investors like what you're doing. You've got a portfolio of companies that are working with you, so it looks as if industry likes what you're doing too. Share with us a bit of the feedback you've received on Sayspring thus far, positive and negative, as well as any challenges you've encountered.


Mark Webster: [00:11:20] Sure. I actually started working on what became Sayspring last February, so it's been a year-and-a-half-long journey for me so far. Going through building a startup the second time around, I really wanted to make sure that what we were creating was helpful for users and the industry before it was something that investors would get excited about. We spent about a year building the first version and we launched in December, and immediately saw traction and saw companies getting excited about what we were building. We ultimately used that traction to fundraise, and we just closed our financing about four weeks ago. So now we're in the midst of building our team.


Mark Webster: [00:12:11] It's been great to see the reception that our tool has gotten. It really has brought designers and User Experience (UX) professionals into the conversation, which they hadn't really been part of prior to that. If you're not using Sayspring you tend to do voice design in Google Docs and with flowcharts, and that tends not to be very interactive. So bringing the medium of voice to the design community is something that designers are excited about. It's been exciting to see who's using our product. There are people we wouldn't necessarily have guessed. Financial services is becoming a surprisingly big category for us, both mutual funds in the retirement space as well as retail banking. A lot of financial services companies are looking at how to extend their relationship with the customer into people's homes.


Mark Webster: [00:13:15] Some of the challenges we've come up against are really about getting people to understand what voice design is, and bringing the best practices of product process into voice. Helping people figure out that they should treat the conversations that happen with their application almost separately from the technical construction is an area where there's a lot of new ground for us to help break. We've gone from a world where there are maybe a thousand or fewer voice designers buried in companies like Nuance who know how to do this really well -- to a world where there will be tens of thousands and hundreds of thousands of people creating voice applications. And we're basically helping bring them into that space. It's challenging but super exciting. We're on pretty fast-moving ground here almost every day. There are changes from Amazon or Google about what technology is out there, what devices are out there, so we constantly have to catch up.


Bradley Metrock: [00:14:28] You got that right! And I'll transition into what I wanted to ask you about as well: the Echo Show. I got my Echo Show in the mail yesterday and spent some time playing with it. And I don't mean this negatively: it feels like we're back to ground zero again because it's such a different device. It almost feels like alien technology. It feels like this is an untapped blue ocean where you can only begin to imagine what people will create for this form factor of voice assistant + touch screen + camera. To your point, the ground is shifting every day. I had to sit there a moment and just marvel at what Amazon's managed to do. And it makes sense to me that they're not out there advertising this thing on TV even though they have advertised the heck out of the Echo itself. Evidently there was a decision made to let this thing land and generate some feedback before they make that push. Have you had a chance to play with the Echo Show? Share your thoughts on that, and how you want to integrate Sayspring into a world where voice all of a sudden means you have a screen.


Mark Webster: [00:16:18] Yeah! The Echo Show is an incredible device. Amazon is both an incredible and a terrifying company in terms of how fast they're moving and how fast they're changing everything. I see two big shifts happening. One is this idea of a voice-only device. The introduction of Echo and Dot was a pretty transformative move, and what's crazy is to see how fast it's been adopted. Sure, they're advertising it now, and had it in a Super Bowl commercial - but when it first came out, it was basically under the radar, only available for Prime customers. It only became widely available in July two years ago - and it's already in something like 10-12% of U.S. households. It took the PC about 10 years to get to that level of penetration. To have a device adopted so quickly by a mass audience is incredible. And then to introduce the Show - bringing a screen to it as well as the idea of multi-modality - is exciting. And I think it paints an interesting vision of the future and brings up a lot of questions, especially around Echo users and Echo Show users: does the screen become a requirement for certain experiences? Or does it become complementary? When we design voice experiences, especially on the product side, it's very common for us to look at what you do on a mobile phone and then think about how to do it on an Amazon Echo, instead of thinking about how much we get done with just conversation with other people. For example, purchasing concert tickets: before we all did that on a computer, we'd call a ticket broker or go down to a video store with a Ticketmaster outlet to buy those tickets. And that was a voice-driven experience. Then, once we could buy concert tickets by computer, we got to look at seat maps, which have gotten better and better, and now you can see a view of the venue. So ticket purchasing is something that doesn't REQUIRE a visual element for the transaction to happen - but it makes it much better.


Mark Webster: [00:18:49] It's not hard to see how any ticket company could create an Amazon Echo skill that lets you purchase a ticket. But Show users also get the benefit of seeing a seat map and different views, and deciding how that complements the experience. It'll be interesting to see how product designers determine whether a screen is a requirement or a complement for some of these different experiences.


Mark Webster: [00:19:17] Here's a vision of the future: our kids are not going to grow up with TV remotes. Our kids are going to grow up asking to change the channel. So I think the Show is painting this vision of a screen you talk to that does the thing you want it to and is still hands-free. And I think that's going to drive a lot of different kinds of experiences across entertainment, ordering food at McDonald's, or using an ATM. I think that sort of multi-modality, and how people jump back and forth between voice and screen, is super interesting.


Bradley Metrock: [00:19:57] Not only do I not think kids will grow up with remotes - I don't think they're going to grow up with anywhere from 50 to 100% of the devices that we've got right this moment. The controversial one is the smartphone. I've got a theory that one of Amazon's dream objectives is to take a big eraser and erase iPhone dominance by using Alexa in some device that doesn't necessarily exist now. And it dates back to what an absolute disaster the Fire Phone was. I don't think Amazon has given up on the idea of destroying Apple's main source of revenue ... but that's probably another subject for another time. Going back a step further, let me ask either one of you: what is the definition of voice design, and what is it trying to accomplish?


Mark Webster: [00:21:07] I could say that almost every organization in the world does voice design currently; they just call it "training." You have retail salespeople who interact with customers; you have inside salespeople who pick up the phone and call customers; you have lots of different people across lots of different companies who interact with customers. So there's training about how to be helpful, how to deliver an experience that the customer will be happy with, and that represents your company's values and brand appropriately. The difference is we're just training other human beings how to do that. In a world of voice assistants and voice interfaces, all those same needs will now be transitioned to the voice assistants and the software and these platforms. So voice design - for instance, how do you greet somebody when they walk into a store? What do you say when they call your 800 number? How does an airline handle a customer calling with an issue? - all the different pieces of how you interact with people through an audio experience are already a big part of lots of people's businesses. As we move to do the design for these platforms, we need a set of tools to help people work with those platforms, because that's ultimately where your customer will interact. So the idea behind Sayspring is: how do we remove technical limitations and technical barriers to working with voice as a medium, to empower companies and designers to build out experiences that will make their customers happy, the same way they train all their frontline customer support and salespeople to make customers happy now.


Bradley Metrock: [00:23:05] Interesting. I wanted to ask that because on this show we attempt to deal with a lot of the bigger issues ... and just simply asking "what is voice design?" brings out an interesting perspective that laypeople just don't have yet. But they soon will.


Mark Webster: [00:23:26] Let me just jump in here. I think intuitively we all know what good voice design is, because conversation and voice are the foundation of how we all communicate with one another, right? And I think there's a reason we have certain expectations. Let's say you sit down at Applebee's - your server greets you in a certain way, and you expect them to follow up at certain points throughout the meal - a lot of that interaction is actually a function of voice design. And that's why those experiences are relatively similar across all the places you visit. When you walk into a retail store and you need help, I think consumers have a certain expectation of the training the staff will have. And deep down as people, we have expectations about how we're understood, how the words I choose and the tone of my voice will indicate whether I'm happy or unhappy. We expect other people to understand us. And I think that's why frustration levels with voice interfaces can be so high when you're not understood, because deep down we expect to be understood when we talk to another human being. When you think about voice design, think about all the interactions that you have with people, all the expectations you have about how they will deal with you, all the expectations you have about how they will communicate what they can help you with - and then think about putting that into a voice interface.


Mark Webster: [00:25:05] And I think that's going to completely change how we interact with brands. Calling an airline can be a terrible experience when you go through a phone tree ... but if you had the personal phone number for somebody at the airline who knows everything about you, who remembers every conversation they had with you, and you didn't have to explain anything or provide your PIN or card number and all that ... it would be an amazing experience! And now voice assistants are going to let us all have experiences like that every time we call a company.


Bradley Metrock: [00:25:40] Well, that just means it's a much quicker conversation for them to tell me things I don't want to hear. But your point is well taken. Having grown up in the South, I find it fascinating to think about voice design. Any Southerner can tell you: we can use the same words and mean two totally different things, depending on intonation, on context, or on facial expression or body language. Southerners like to be polite in speech, but that doesn't mean we agree. And this is just one example in a million of how psychologists and this emerging field of conversational design will shape everybody's world from here on out. It's fascinating to think about, and something I personally find very interesting.


Mark Webster: [00:27:03] That's a great point. Or even how geography and region and culture and age can all influence how people communicate. Human communication is probably one of the most complicated things to deal with when it comes to building an interface. Our belief at Sayspring is that the adoption of voice interfaces will quickly become a design problem more than a technology problem. The technology for voice has become excellent in the last two years and now our view is that it's really more of a design challenge. So that's why we're building the tools to bring more people to the table to help solve these problems.


Bradley Metrock: [00:27:47] Well, the work you're doing is fantastic. Before we go, since I'm a big basketball fan, let me ask Mark: you said that you started your career with the NBA?


Mark Webster: [00:28:00] I did, at the National Basketball Association.


Bradley Metrock: [00:28:02] Then can you please pass along a message to those folks? I'm tired of the Golden State Warriors winning, and I'm ready for LeBron James to continue his unadulterated run of championships - not just finals appearances, but championships.


Mark Webster: [00:28:18] I worked at the NBA during the Lakers dynasty which came right after the Bulls dynasty. Everything gets broken up eventually.


Bradley Metrock: [00:28:29] This is true. And - talk about conversation design! - Steph Curry is saying all the right stuff, but it doesn't look like he's entirely thrilled, because he's never going to win an MVP again on that roster, nor are any of those players. It'll be somebody on a team that doesn't have as much talent.


Mark Webster: [00:28:58] Sure. Nothing lasts forever, and the tides always change in the NBA.


Bradley Metrock: [00:29:04] We can only hope that it doesn't involve LeBron joining the Warriors. Anyway - Mark and Scott, we very much appreciate your spending time with us today. For folks who want to learn more about Sayspring, of course they can go to - but what's the best way to contact either of you?


Scott Werner: [00:29:26] Our e-mails are on the site: ... We also have a widget on the site to send us questions in real time.


Bradley Metrock: [00:29:40] Excellent. Again, guys, we greatly appreciate you. Thank you very much. And for the third episode of The VoiceFirst Roundtable - thank you for listening. And until next time.
