
The Voice of Healthcare - Episode 2

Co-hosts: Dr. Matt Cybulsky (Principal, Ionia) and Bradley Metrock (CEO, Score Publishing)

Guest: Dr. Joe Marks, Executive Director, Center for Machine Learning and Health, Carnegie Mellon University

Duration: 30 minutes, 32 seconds

Podcast Links

Apple Podcasts

Google Play Music

Stitcher Radio

YouTube (+ closed captioning)


Bradley Metrock: [00:00:00] In the second episode of The Voice of Healthcare, we interviewed Dr. Joe Marks, the Executive Director for the Center for Machine Learning and Health at Carnegie Mellon, about voice technology, the Echo Look's applications in healthcare, and even his favorite U2 song. He also reveals he has known Bono since their school days. Enjoy.


Bradley Metrock: [00:00:29] Hi and welcome to the second episode of The Voice of Healthcare. My name is Bradley Metrock, I'm CEO of Score Publishing, a company based in Nashville, Tennessee. My co-host is Dr. Matt Cybulsky, principal of Ionia. And tonight, we're fortunate - very fortunate - to have with us Dr. Joe Marks, Executive Director for the Center for Machine Learning and Health at Carnegie Mellon University. Joe, say hello.


Joe Marks: [00:00:53] Hi everybody.


Bradley Metrock: [00:00:55] Thank you very much for joining us. We're very grateful for you taking the time out of your schedule to share with us some of your experience and your perspective on voice technology and health care. Joe, I want to start by asking you to share with us a little bit about your background. You have a very prestigious background and have done a whole lot of stuff; share a little of that with us and the audience.


Joe Marks: [00:01:19] So health care is actually the sixth different industry that I've done technology R&D in, and many of the other industries have also involved speech technology. I set up and ran Disney Research, the technology arm of Disney Imagineering. Before that, I ran a research lab for Mitsubishi Electric, a Japanese company, and speech technologies were a part of all of that. So I've seen speech and voice and related technologies in many different applications. It's exciting now to be working on them in the healthcare arena.


Bradley Metrock: [00:01:55] Cool. So Amazon just came to Carnegie Mellon, and there were several presentations to the Alexa Fund. Share with us a little bit about how that came to be.


Joe Marks: [00:02:12] Well, Carnegie Mellon - maybe not a lot of your listeners know about it, because it's a university in Pittsburgh. If you're not in computer science or drama - the two areas where it's really strong - you might not know about it, but it is ranked number one for computer science and for information and technology management by U.S. News and World Report, and those of us in the field consider it right up there with MIT, Stanford, and UC Berkeley. They tend to have a bigger presence, but Carnegie Mellon, again, for the insiders, is a really top-notch place. So for companies like Amazon, it's often the first place they'll stop. And in particular, for language technologies and human-computer interaction: the real strengths of the department at CMU are robotics, human-computer interaction, speech, and machine learning. And given where health care is going, those are technologies that people see a lot of potential for applying. So it's not surprising that Amazon would show up there, and indeed every week or two another big company comes along. So it may be an unusual place to think of for high tech, but it is actually one of the very, very best.


Bradley Metrock: [00:03:29] So with voice technology and health care - I'm not in the industry, I just sort of observe it. I have received health care from time to time. What's the low-hanging fruit for voice technology in health care? What problems can the layman, like me, expect to see voice technology address first, in your estimation?


Joe Marks: [00:03:56] That's a really great question. I'd say, up until now, if you asked that question of a number of people in the field, they would say that voice and speech technology would primarily be applied to the editing and management of the electronic health record. Pretty much every practice now has an electronic health record system, and getting the data into it, and also getting the data out of it, is a major burden on the medical staff. It's basically a lot of typing that needs to be done, and there's been a lot of work on trying to use speech recognition to automatically transcribe what the doctor says and put it into the electronic health record. So that's the thing that people most often think about currently. I'm excited about some of the work in that space, and I do think there's a real opportunity there, but the thing that I'm really getting excited about now is having voice systems in the home, because home health care is going to become a much bigger thing in the coming years because of the Medicaid MLTSS program (managed long-term services and supports) for the disabled and elderly. This is going to basically move the locus of health care for a significant chunk of the population from the doctor's office to their home. And these are people who will be at a stage of life, or have conditions, that require frequent interactions.


Joe Marks: [00:05:30] So there I think there's a huge opportunity to have a way to connect with them and for them to connect with their caregivers in a natural and easy way from the home and voice could be the preferred modality for that for a variety of reasons.


Joe Marks: [00:05:47] And so that's the thing that's gotten me excited, because it's a new opportunity. States are signing up for it. I think I read somewhere that by 2014 maybe 26 states had enrolled, and states continue to enroll, but it's a relatively new thing. So that's what's getting me excited.


Matt Cybulsky: [00:06:05] Joe, I could not agree with you more about the home health application of voice-first technology, especially with tools like Amazon's Alexa. You know, for quite a while we tried to use voice recognition software for physicians to speed their work - Dragon, I think, has been the prevailing one, if I'm not mistaken.


Joe Marks: [00:06:31] Well, I think there's actually a subtle distinction here between voice and speech technology. Speech recognition is about understanding the spoken word: breaking it into phonemes, then assembling it into sentences and interpreting them. And that certainly has a role in home health care as well, so that people could request a visit, or order a prescription, or simply alert people that they're not feeling well. But voice processing looks at just the voice signal alone, without really thinking about speech. And there are some exciting technologies in interpreting voice and, from that, discerning whether people have taken their medications, or have had some kind of medical episode, or are just feeling stressed. And I think that's the kind of thing where, particularly for the disabled and elderly population, all they have to do is interact naturally. Say what they want to say. The system will be paying some attention to their words, but it will also be paying attention to their voice, and it can infer a lot of the things that medical staff would want to know about them. So I just think it's a great fit for the disabled and elderly. And that's on the input side - listening to their voices. A whole other side to this is communicating by voice to them, including something that should also be close to your heart - behavioral nudging, for everything from getting them to do exercises to taking medication. So the two-way communication of voice is just a very, very rich channel, and particularly well-suited, I believe, to the disabled and elderly in this home health care situation.


Matt Cybulsky: [00:08:16] Yeah, I've actually heard it described as kind of discovering the deep sea when it comes to connecting speech recognition to things like neural networks, and what we can get from people interacting iteratively through speech, and what we can learn from the data sets that develop from that. Now, when we look inside of those things, we don't necessarily have a program that sits back and says, "I can see how this neural network came together and made this connection." Do you have any thoughts on machine learning and the neural networks that could be developed through something like a home health device, like Alexa?


Joe Marks: [00:08:57] Yes. We have some excellent researchers at CMU in this space, as I mentioned, and one of the early questions I had for them, given the work they have shown in interpreting voice and speech, was: would their techniques work over a potentially difficult connection or audio environment? And it turns out that many of the things they want to do don't encounter difficulties when the audio is not top-notch. That's very encouraging, because it means they can apply the techniques that they have used. I think you were referring to machine learning techniques that can do some amazing things these days but that are often a black-box technology - you don't quite know how it's doing it. And there's a lot of resistance in health care to using machine learning in that black-box fashion when it comes to diagnosis. Medical professionals are very uncomfortable with just passing on a diagnosis from a machine without any rationale behind it, or without knowing why the machine made that diagnosis, even when it has a very strong track record. But the capabilities we're talking about here are not diagnostic. They're about health care delivery, and that's where the economic opportunity really lies at the moment, when you compare the U.S. system to other systems.


Joe Marks: [00:10:20] It's the cost of our health care delivery that's really become a problem for us, and home health care is addressing that in one way and this could be a key enabling technology. But I think people would feel a lot more comfortable using some of the machine learning technologies in what is a non-diagnostic setting. But it is all about healthcare delivery.


Matt Cybulsky: [00:10:41] Yeah, I mean, I couldn't agree more when it comes to the delivery aspect of this. Something that I've come across regularly, which I know is echoed in some of the subtext of what you just said, is the looming shortage of practitioners - nurses, physicians, and ancillary staff - to take care of the growing aged population, as well as the growing burden of chronic conditions. As a comparison, take COPD, which is something I've been working on in China: they estimate 200 million Chinese are diagnosable with COPD. In the United States it's not nearly that much - it's more like 24 million. But in both populations, there is certainly a shortage of providers who can show people how to take care of themselves, maybe alter their health, or be, as you mentioned, a behavioral nudge to modify behaviors - to get people to stop smoking or to be more compliant with medication. That's where I really see this voice technology, from a delivery perspective, changing the game completely. In fact, it might even become more of the norm, especially with things coming out like the Echo Show. I think it's called Echo Show, right, Bradley? It's the one with the monitor on it.


Bradley Metrock: [00:11:59] Yes.


Matt Cybulsky: [00:11:59] The idea that you could connect to a practitioner, or even better, someone else that has the same disease as you do, inside of some sort of social connectivity, to talk about how to take care of yourself in an optimal way.


Joe Marks: [00:12:13] I think the presence of a visual monitor certainly helps with the social connection, as well as with making the systems friendlier to use. There's also something we learned back when I was at Mitsubishi Electric: a screen paired with a voice interface is often a very good combination, because the screen can give feedback and correct the voice. It can, for example, show which voice utterances are valid right now, in the current state. What are we expecting you to say? And maybe give a list of things. The usability of voice interfaces went up a lot when there was this visual side channel to go with them. It's a sort of subtle point that you won't really appreciate until you actually build these systems, but pairing a visual display with voice often makes the interface a whole lot more effective.


Joe Marks: [00:13:12] And then, you know, just reiterating what you said about the social aspects - being able to put a face on the display, that's very real as well. Maybe even being able to put data on there. But one of the exciting things to me about the technologies that are pairing the display with the voice is that it actually makes the interface a lot easier to design. And it's the subtleties like that, that can make voice interaction usable or not usable.


Bradley Metrock: [00:13:41] Well let me tack on to that, and get your thoughts on adding a camera to it. You know Amazon just announced the Echo Look, as well, which is marketed by Amazon as a fashion device that goes in the bedroom in the home, and ostensibly helps people have a better fashion sense by using a camera and studying your body type and helping make recommendations. But as was noted back in one of our earlier VoiceFirst.FM podcasts, that Echo Look has a lot of health care applications too, potentially. And I just thought I'd get your thoughts on how you think that might fit in as well.


Joe Marks: [00:14:35] So certainly, it's additional information. Facial expressions can be an important indicator, combined with voice, for assessing mental state - whether people are depressed or not. And so I think that additional signal certainly doesn't hurt at all. I think that's one obvious use. For other things - where, you know, I might hold up my injured arm, or something like that, and get a diagnosis that way - I'm not sure about that. If you've got something that looks that bad visually, maybe we should just get a caregiver out to take a look at you, if you're part of a very vulnerable population.


Matt Cybulsky: [00:15:27] Well, you know, Joe, when I was working at a medical school, there were some famous internists who claimed that they could watch people's necks and, based on what they saw with the veins and arteries pumping in the neck, comparing that to pulses in the feet, diagnose various heart conditions. So maybe it's not too inconceivable to think that, with thermal imaging, with micro-views of the skin, and with what's happening with respiration rate, these kinds of diagnoses might not be too far off - being able to put a camera on a patient, let's say, in a bed, or even when they're speaking to the camera at home, to the practitioner, and have them scanned all the while.


Joe Marks: [00:16:15] That's actually a great point. There's some work from the last couple of years by my former Mitsubishi Electric colleague, now a professor at MIT, Bill Freeman, that showed that the tiny change in color due to the blood being pumped could be seen in the skin and in the face by cameras, and be exaggerated so that you could visually see it. We can't see it when we're looking at somebody - I can't see your heart beating due to the changing color of your skin - but the computer can actually see that. And he did some work showing how you could measure heart rate. And I've seen other work where you can measure respiration as well. So being able to get some vital signs out of the camera is not crazy at all - it's doable. There's some great video out there; if you Google for Bill Freeman / MIT, you can see it. So I think that's also a possibility. Although, you know, we should be careful about not using the wrong modality. While you can do things with vision and with voice that are very powerful, there are also things you can do with wearables and other sensors, and maybe the whole package is really what will drive this forward for the homebound population: voice, vision, a wearable sensor for blood pressure, maybe an implanted sensor for glucose levels, or something like that. So you could imagine instrumenting the population. You can also imagine instrumenting the environment. One of my favorite projects that I watched for many years was the "aware home" at Georgia Tech - a project where they instrumented the house to see what they could learn about the inhabitants and their behaviors.
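[Editor's note: the heart-rate idea Joe describes can be illustrated with a toy calculation. This is a hypothetical sketch, not the actual Eulerian video magnification method: it assumes the average skin brightness per video frame has already been extracted (synthesized here), and it estimates pulse rate from the spacing of the signal's zero crossings.]

```python
# Hypothetical sketch: estimating pulse rate from a subtle periodic
# brightness signal, in the spirit of the camera-based vital-signs
# work Joe mentions. Real systems would average pixel intensities
# over a skin region in each video frame; here the samples are synthetic.
import math

def estimate_bpm(samples, fps):
    """Estimate beats per minute from the spacing of rising zero
    crossings of the mean-subtracted signal."""
    avg = sum(samples) / len(samples)
    centered = [s - avg for s in samples]
    # Indices where the signal crosses zero going upward.
    crossings = [i for i, (a, b) in enumerate(zip(centered, centered[1:]))
                 if a < 0 <= b]
    if len(crossings) < 2:
        return 0.0
    beats = len(crossings) - 1
    seconds = (crossings[-1] - crossings[0]) / fps
    return beats / seconds * 60.0

# Synthetic 10-second clip at 30 fps: a faint 1.2 Hz (72 bpm) pulse
# riding on a constant baseline brightness.
fps = 30
samples = [100.0 + 0.5 * math.sin(2 * math.pi * 1.2 * t / fps + 0.1)
           for t in range(fps * 10)]
print(round(estimate_bpm(samples, fps)))  # prints 72
```

A real pipeline would band-pass filter around plausible heart rates and use a spectral peak rather than zero crossings, but the principle - a periodic signal invisible to the eye is recoverable by computation - is the same.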


Matt Cybulsky: [00:18:07] Oh wow.


Joe Marks: [00:18:08] And just a very simple, again, audio application - they put a microphone on the main sewer drain of the house. So all it was listening to was the water gurgling out of the house. But from that, you could tell that somebody was taking a shower, somebody was flushing the toilet, somebody was cooking, and there's a rhythm to that. And when something changed in the house, or something changed in behavior, that was noticeable just from listening to the gurgles of the sewer. And, you know, somebody stops washing, or somebody stops cooking, or somebody stops going to the bathroom - that could be very indicative of something seriously wrong, particularly for the disabled and elderly. And the fact that you could pick that up from just that simple signal suggests there are other uses: if you've got a microphone in there listening for a voice, it can listen for a lot of other things as well.


Joe Marks: [00:19:02] So once you get that beachhead in the home - something that's a microphone, a camera, and a display - who knows what the potential is for providing services to people who need them.


Bradley Metrock: [00:19:17] It's funny that you would reference microphones in the house. I was just speaking to my brother, who's a software developer, about the idea of having a home security Alexa skill - or a voice skill, it could be on any platform. I don't think it's technologically feasible yet, but basically the gist would be that you would tell your Amazon Echo (just to use that as an example), "Hey, I'm going out of town, so Echo, please begin listening to the house." And at that point, if it listens to your kitchen for a day or two - possibly less is needed, maybe certainly a week; scientists can figure out what the exact amount of time needs to be - it's going to establish a baseline level of volume. Some amount of time is going to go by, and within a 95 percent confidence interval, it will have recorded all the different sounds that your house makes.


Bradley Metrock: [00:20:28] And so, from that point forward, if your house makes a different sound, guess what? You've got somebody in your house. Or you've got some other sort of problem.
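[Editor's note: Bradley's "learn the baseline, then flag anything different" idea maps onto simple anomaly detection. Below is a minimal, hypothetical sketch: it stands in for real audio processing by treating each observation as a single peak-volume reading, and it flags readings more than three standard deviations from the calibration mean. The class and its names are illustrative, not any actual Alexa API.]

```python
# Hypothetical sketch of the home-listening idea: gather volume
# readings during a calibration period, then flag readings that fall
# outside the learned baseline. Real systems would use richer audio
# features (spectrograms, acoustic fingerprints), not one volume number.
from statistics import mean, stdev

class BaselineListener:
    def __init__(self, threshold_sigmas=3.0):
        self.calibration = []            # readings seen while learning
        self.threshold_sigmas = threshold_sigmas
        self.mu = None
        self.sigma = None

    def calibrate(self, reading):
        """Record one volume reading during the learning period."""
        self.calibration.append(reading)

    def finish_calibration(self):
        """Freeze the baseline statistics from the calibration readings."""
        self.mu = mean(self.calibration)
        self.sigma = stdev(self.calibration)

    def is_anomaly(self, reading):
        """Flag readings far outside the learned baseline."""
        return abs(reading - self.mu) > self.threshold_sigmas * self.sigma

# A few days of ordinary kitchen readings, then a loud unexpected noise.
listener = BaselineListener()
for reading in [10, 12, 11, 13, 9, 10, 12, 11]:
    listener.calibrate(reading)
listener.finish_calibration()
print(listener.is_anomaly(12))   # False - within the normal range
print(listener.is_anomaly(40))   # True - "somebody in your house"
```

The 95 percent confidence interval Bradley mentions corresponds roughly to a two-standard-deviation band under this kind of model; the threshold here is just a tunable parameter.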


Bradley Metrock: [00:20:38] It's just funny to hear you talk about it from a health context, which makes perfect sense. The first thing I thought of when I saw the Echo Look, and saw the camera, is that when people think of a camera, they think of the pictures it's taking right there in that instantaneous moment in time. But for me, as a melanoma survivor, how perfect it would be to have a camera that can capture your body every day over months and years. And to have it be able to say to somebody: "Hey, that mole on your leg isn't red anymore, it's kind of black. You know what I mean? You may want to go see the doctor. You want me to call the doctor? Your doctor? You want me to call and set up an appointment? Or, if a call is not even necessary, would you just want me to set an appointment?" Something like that obviously would save lives. So, you know, it is an exciting new world. I just want to throw that in there. Matt - Matt, it's all you.


Matt Cybulsky: [00:21:45] That's a great point, Bradley, based on what you just shared, and what Joe was sharing earlier about these data points in the home, including sounds and the gurgling from the drainpipe in the house. What comes to mind here is that we're going to be constantly deriving relationships between, instead of millions and millions of data points, billions and billions of data points that we get through speech, through sounds in the home, and through visuals of the body. Joe, given the massive increase in data points that we will now be able to connect with these kinds of technologies, what does this mean for how we generate guesses about the world we live in, including health care?


Joe Marks: [00:22:29] Well, more data is always better, at least in good hands. The point you made there, about the skin lesion detection - there was a paper in Nature in the last few months that showed a machine learning system that outperformed, I think, 21 dermatologists in diagnosing melanoma. And this was a very detailed study where they had actually done biopsies of all of the lesions that they had imaged, so it was ground truth. And so, that future is here, and voice can be used for the same thing.


Joe Marks: [00:23:03] One of my colleagues at CMU is working on being able to detect, for example, early signs of Parkinson's from the voice signal. So, whether it's visual or audio-based, there is the ability to do some diagnosis. Again, I want to be careful about that. I think everybody in the medical profession feels that these should be used as indicators, but the final decision should be made by medical staff. But, in the context that we're talking about - the disabled and elderly, particularly on Medicaid - things can fall through the cracks, so having these automatic checks, I think, is quite important. But, yeah, you could imagine - and for good reasons - that if you're at home with, you know, the full package in there, your life is being monitored and processed at a level that really hasn't happened before in human history. Basically, all of this data is being gathered about you, and it's all for a good reason: to provide you quality health care in your home. But for some of us, it will take some getting used to the idea that, you know, every little flush of the toilet and every utterance that I make is maybe being recorded. It's all for a good reason, but still a little scary.


Matt Cybulsky: [00:24:25] Yeah, it can be, which brings me to another question for you. I don't think anyone listening, or even participating in the conversation we're having today, would disagree that these days we are hopelessly hooked on our technology. Our phones are the primary example. But in a short period of time, these kinds of voice technologies in the home will become another - perhaps even the primary - example of the disconnection anxiety you can feel when you're not using one of these tools. Given the concerns of health care around HIPAA and other privacy issues, what comes to mind for you, Joe, when you think about benefiting from and optimizing this technology, while also protecting our privacy and making people feel comfortable in a world where so much of your behavior can be predicted, in a way in which you can't hide anything?


Joe Marks: [00:25:24] Yes. I think, well, there are a number of things in here. I think the principle that data can only be used for the stated purpose for which it was gathered - that's something that people really need to adhere to. It can go beyond HIPAA in many ways, but I think that principle is sort of the golden rule of this data gathering. And I think that's a key issue.


Joe Marks: [00:25:49] I think the other thing is the level of security around biometric data, which I'm concerned about. If your financial data gets stolen, there are remedies to fix that. The credit card company will send you a new credit card. But if data about your body - your medical state - gets stolen, you can't change that. You can't grow a new finger if somebody steals your fingerprint. You can't grow a new kidney if one of your kidneys was removed for cancer, or something like that. So once that data gets out, it's sort of permanent, at least for your lifetime. And it scares me a little bit that we're not really up to protecting that data, because the norms have been set for financial data, which, while bad when stolen, is fixable. So I think it comes down to having the right principle in mind about how the data gets used, and being more careful about security protocols - health care biometric data is the place where the rubber really meets the road on security. And I think the place where most people seem to be willing to give up some privacy and some data is where they get real benefit from it. If they do, they're willing to do it. If you're giving up the data and you're not getting anything for it - and it's being used for something else - that really rubs people the wrong way, as it should. But the scenario that we're talking about here has undeniably strong benefits: providing better health care at home, allowing people to live in their homes longer, providing them better health care at a more reasonable price. These are real, real benefits. So I think that, in general, people will be willing to give up their data, because they'll get that benefit, but they'll want to know that the use of that data is limited to that purpose. And they should expect that it's being protected and secured at the highest level that technology will allow.


Matt Cybulsky: [00:28:01] Yeah. And, you know, Joe and Bradley, throughout the millennia there have been technologies that people have said "this is it, this is going to ruin us" about. And we seem to get through it all. Microbiology, nuclear energy, cloning, even technologies now like CRISPR and other genetic modification tools - something that humanity has been really good at doing is setting up systems around these things to keep us from having a total meltdown. And we've done a pretty good job of it, in my opinion. So maybe the future isn't so scary with this. Maybe humans can be relied on to use this the right way.


Bradley Metrock: [00:28:41] Joe, I'm going to wrap it up. We greatly appreciate you. Greatly appreciate you setting the time aside. Before we let you go, I want to ask you: we were talking about music before the podcast began, and how you recently saw U2, as I just did at Bonnaroo. I want to ask you, for the record: what is your favorite U2 song?


Joe Marks: [00:29:02] Oh I like them. I like them all.


Bradley Metrock: [00:29:05] That's not an acceptable answer.


Joe Marks: [00:29:10] Well, if you've got to pick one, I like "With or Without You," but there you go. Bono and I used to play on the neighborhood chess team in Dublin, Ireland, which is how I know them.


Bradley Metrock: [00:29:21] Are you serious?


Joe Marks: [00:29:22] Absolutely. And, for folks on the call, I sent around earlier a photograph of me backstage with them at their concert here in Pittsburgh.


Joe Marks: [00:29:33] And we were chess-playing buddies. Both he and The Edge have a strong interest in investing in health care. And so, you know, they are also technologists, and very interested in this.


Bradley Metrock: [00:29:49] Awesome. Thank you for sharing that with us. Thank you for sharing, just like I said, some of your time, and some of your perspective. Brilliant guy. We greatly appreciate it. And for The Voice of Healthcare, thank you for listening. And until next time.
