How Outlook Mobile Mastered Its Release Process

Kristel Kruustük, Testlio’s co-founder

May 5th, 2017

Testlio co-founder and CEO Kristel Kruustük recently interviewed Microsoft Partner Director of Engineering Kevin Henrikson in our webinar, How Microsoft Masters Its QA. The following is a transcript of the interview, followed by audience Q&A. Enjoy!

INTERVIEW

Kristel: Hey everyone, we’re still waiting for a couple of more minutes for everyone to join, but we will start in a couple of minutes, so very excited about that. We have Kevin already on as well and so we’ll talk to you soon.

It’s on, hello everyone!

So we are ready to start with our webinar, we have quite a lot of activities today, so thank you so much everyone for signing up and thank you Kevin from Microsoft for taking the time to speak with us and share your knowledge about how testing is being done at Microsoft.

To start off, I’d like to introduce myself. I’m Kristel, Testlio’s co-founder and CEO. Testlio connects enterprises with a global community of software testers so that we can provide really amazing customer experiences. Today we’re really here to talk about engineering and quality with our industry veteran Kevin Henrikson, and Kevin is an awesome guy. He has founded companies, he has led multiple software development teams in very, very big companies, and recently he founded a company called Acompli, which was acquired by Microsoft two and a half years ago, and now he’s a partner director of engineering at Microsoft. So Kevin, thank you so much for coming today, and we’re super happy to have you here. You can tell everyone hi, as well.

Kevin: You bet, hi thanks for inviting me. This is going to be great, I’m very excited. Looking forward to all these challenging questions, to see if you can stump me. So I’m ready to go.

Kristel: Oh yes, I’m ready to go as well. I just draw like myself for the board as well, so I have someone watching my back as well.

Kevin: Nice.

Kristel: But yeah, maybe I can give a quick agenda for today as well. We have a thirty-five-minute fireside discussion with Kevin, but I am really encouraging everyone to ask questions during our conversation, so we’re actually streaming all of the questions as well, and if there are some interesting ones coming up then we will ask them immediately. So don’t hold yourself back, we’re going to try answering as many questions as possible.

And then the second thing is we might have a couple of polls coming up on your screen a couple of times during the conversations, so feel free to answer that as well so we can get some more points to talk about. Let’s start. So Kevin, can you talk a little bit about yourself, and what you’ve done in the past, and yeah, let’s start from there.

Kevin: Sure, so I don’t know how far you want me to go back but I grew up on a pig farm in Central California, so I had lots of motivation to go to college and learn something else so I didn’t end up being a pig farmer forever. I totally loved growing up on a farm but it was great to go to school and learn a little bit about … I actually was a mechanical engineering major, but I was coming out of school as the internet was kind of blowing up, so it made more sense to do software. So I’ve been working at the various software companies you talked about for a while, mostly focused on the email space, so email and calendaring types of applications for various types of uses, some for larger carriers and now more recently for enterprises and consumers, and then with Acompli, really focused on the mobile aspects of that.

And so today I manage the engineering teams at Microsoft, very focused on Outlook for the mobile client and also the Mac client. So iOS, Android and Mac, which is great. So we’ve been able to make a lot of changes and really try to bring Outlook to a bunch of new devices where Microsoft traditionally didn’t have a lot of strength. But over the last couple of years has really grown significantly.

Kristel: And so how did you end up in this position today at Outlook team? As I mentioned before you were forming a company called Acompli, so maybe you can talk a little bit about that story and company.

Kevin: Yeah about four, four and a half years ago now we started a company called Acompli with the idea that we felt mobile email and mobile productivity was broken. And so we picked email as kind of the base case and ended up extending into calendar and contacts as a mobile expression of how you get things done. Right? You know more and more today people are with their phone all day long and so how can you use the contacts and the phone to help you not send longer emails and that but to send better emails. Right? So how can you be more productive on your phone?

And then Microsoft showed an interest in the company about a year and a half after we started it and ended up acquiring the company. As part of that acquisition we renamed Acompli to Outlook and launched it about six or eight weeks after the acquisition, and since then it’s been Outlook. So today we have tens of millions of users using Outlook on their phones, and that kind of grew into what I have today. As part of that, as big companies do, we picked up the leadership of the Mac team, and so it’s been great to kind of take some of the things that we’ve learned building Acompli and share that with the Mac team. That’s a team that’s been around a long time at Microsoft as part of Office, and Outlook is kind of core to Office. It’s a big part of how Microsoft sees the future, in that users spend an enormous amount of their day in Outlook on a variety of different platforms, whether it’s on the web, or Outlook on their Windows machine, or Outlook on their Mac, or on their phone. And so being part of that has been pretty cool over the last couple of years that, as you said, I’ve been at Microsoft.

Kristel: That’s super exciting, and I just wanted to say to everyone as well that we even have signs up in our office at Testlio saying “What will Kevin do?” Because we’ve followed Kevin’s journey for such a long time, and the way he approaches things and how fast he is able to release stuff, and in high quality, has been so impressive. And so we have huge posters up of Kevin. So we’re super excited to have you and it’s really, really good. But now, let’s dig a bit deeper into Outlook’s mobile development. How is your team structured? What are the roles and how big is your team?

Kevin: So the total team’s around 100 today and that’s growing. When Acompli was acquired we were around 19 people, so we had basically one or two people working on each of the mobile apps for iOS and Android, a couple of people working on the service component that runs in the cloud, a couple of people in ops, and a couple of people doing marketing and sales. But today that team’s grown significantly. Today we have more than 10 people working on each of the clients on iOS and Android, there’s more than 20 people working on the Mac client, and then again 5 to 6 people on the operations side, and then around 10 or so people on the server side. Each of those teams is a little bigger than that, but it roughly adds up to around 100.

So the team’s grown significantly I think since we were acquired but the core of the way we do things I think is still the same structurally. We had to scale some of the processes and change some of the approaches. Obviously you can’t have a stand-up with 100 people in there and everybody takes 30 seconds to say what they’re doing. So you move to more of a scrum of scrum type model where teams meet each in their own groups and then we kind of roll that up to a larger kind of team meeting.

Kristel: Very cool. So you have the bigger team now. What are your team’s biggest priorities and where does testing fit in?

Kevin: Yeah so I mean the high level you say you want to build an app that users love and so if you look at the traditional way that IT deployed software, Outlook was really built for the IT manager, the IT admin if you think about it. There was a way for them to control and provide security around the email traffic or the communication of their users. And currently today, people don’t think of email as something that IT enforces. They think of it as a way that they communicate and get work done.

And so our view at Acompli, as we translated that into Outlook, was that first and foremost our job was to build something that users love and to build a team culture where people like working on the team. And so a lot of the processes that we have around the [Joli 00:11:09] and around the way we test and release software are really there to create a great environment for the engineers. Because we feel like if we have an incredible team with a good culture, they will, in turn, build a great product, which users will then love, and then we have a virtuous cycle.

And then that growth ends up being something that Microsoft monetizes, we as a team can monetize our own Office because people love the software. And so this notion of kind of inverting the model from building something that IT asked you to build or a company comes to you and says here’s all the features I need, which is kind of a traditional way of enterprise software being built and moving into a model and saying how do we be incredibly in tune with what our users want and understanding and listening to them.

But at the scale we are at today that becomes very hard. So that is part of the reason the team’s grown, is that with tens of millions of users, that’s a lot of people to listen to and the opportunity to really impact them is great but it also comes with incredible responsibility to make sure that you deliver high quality because one little bug and you’re saying that impacts half a percent or one percent of users, well one percent of tens of millions is millions of users.

So today that’s probably one of the biggest things we focus on: how do we really maintain quality and continue to have an agility inside of the engineering process so that we can release quickly and get features out, and so that the amount of waiting code, code that has been written but not released, is very small. Because that’s something that from a developer productivity or developer happiness … developers are not happy writing code that just sits on their laptop or sits in some code depot. They really want to get that out to users and the quicker we can do that, the better the engineering teams feel. But as part of that obviously there is incredible responsibility to make sure we do that with high quality, which is very challenging.

Kristel: Yeah, I really love that and I totally believe in your vision as well. Do you think that something has changed over time, now that you’re at Outlook, compared with being at Acompli before? Has anything changed, or do you see a cultural change as well between these two companies?

Kevin: Yeah I mean for sure. I mean Microsoft has an enormous amount of resources to go and get things done. So it’s very easy to go and get certain things done at Microsoft. There’s the capacity to run a big marketing campaign, or I want to run … I need a large investment. We started a team in China, for example; we already had a team in India and New York and San Francisco and Mountain View and Seattle. So we have a relatively distributed team, which is cool, and again, it’s part of the culture that we can work where we want to live and really hire the best people from anywhere in the world. And so Microsoft helps enable that by giving us the ability to say hey, we want to set up a team at this location, and we can do that.

But I think similarly, some of the things that traditionally happen … if you think back a decade ago, Microsoft was releasing software every three years. So they would work for three years and release a new Office version and then again go dark for essentially three years and work on the next one. And Windows to some extent had some of those changes and now with Windows 10 we’re releasing every week with Patch Tuesday but then there’s also kind of monthly releases in the apps. We’re moving to a much faster era and Windows is shipping twice a year. And it’s kind of this continual upgrade. So as the service around the core platform has changed, we’ve even pushed that to another limit inside of the Mac and mobile teams, where we’re releasing every week.

So being able to move that quickly is something that was initially a challenge inside of a company that was not used to shipping that quickly. And so with a lot of processes that they thought were fast, they’re like hey, this used to take a month in the three-year build cycle, and they’re like well great, we’ve got it down to a week so we can ship every month. And we’re like no, no, we want to ship every week. So you need to do it in like an hour or a day. And in some cases seconds. So a lot of what we’ve done as we’ve come to Microsoft is figure out how we can automate and remove a lot of these inflexibilities.

Like a lot of the testing that we did at Acompli was built around having a very rapid release process, so we work Monday through Friday writing code, but then every Friday we ship basically to QA to get the device certification over the weekend. And so even on the weekend we still are kind of working, in the sense that we’re validating and checking off that build. So come Monday morning we can address any critical issues and then start the release process and then start the cycle again.

So that kind of seven-day cycle and basically being able to work every day has been one of the keys to our success at Acompli. Being able to be incredibly agile and decide quickly about what we wanted to build and how we wanted to add features. But now inside of Microsoft that’s also a similar strength of the team. Even as we scale we’ve been able to maintain that incredibly rapid sprint cycle. And it’s kind of like the trains always run on time, and that’s important to us, it’s important to the team. But it also … at some level, that seems stressful, you’re moving so quickly and never taking a break. But it’s also incredibly relieving in the sense that if something’s not done and something’s not perfect on Friday, there’s no pressure to make this ship train, because you’re like hey, it’s only going to miss by a day, that’s fine. Just take your time and do it right so then we can ship in the following week, and you’re only going to have to wait four or five days and the next train will be going.

So that’s something we’ve been kind of blessed with, to keep that model, and you change the traditional model from saying hey, if we make a mistake we need to roll back. We just roll forward; for us it’s the ability to fix and correct something. It’s much easier to just make the next build better than it is to try and find a way to completely roll back or undo something you did.

Kristel: So we’re talking a lot about scale and speed and trying to release everything very fast and in high quality. But always something suffers. What are, in your opinion, some of the key challenges of doing QA in such an environment?

Kevin: So QA’s interesting right, just before we were acquired Microsoft actually removed QA as a discipline, so Microsoft moved to a model where they call it combined engineering. Where there’s a single engineering team that includes development and quality but there’s no, the role of tester just doesn’t exist. We don’t have a testing role and so the idea of that was similar to DevOps in the sense that we no longer have a separate ops team. The ops team is part of the engineering team and they’re responsible for writing code to automate their work.

The quality is sort of the same way. We build an incredible amount of automation around the low-level functionalities of the cloud, but for some of the device testing, with the diversity of devices and the diversity of the content in an email and calendaring app, it’s nearly impossible to build all of those matrices given the number of different possibilities. And so early on we’ve been working with Testlio to help us manage that. In two ways.

So one, we don’t get to pick our users. People can come to the App Store, and anybody from anywhere in the world (we’ve been worldwide since the day we released, and I think we have the app translated into 63 languages now) can download that app and try it and use it. And so we don’t get to choose that user, we don’t get a chance to meet them in Best Buy when they buy their computer like in the traditional way, or help them install Office and set it up. It’s all on their phone and they get to go for it and use it.

And we have a short number of seconds to impress them and show them how to use the app. It has to be intuitive, they have to open it, because one press and they slide it up and they’ve deleted the app and it’s gone. And so your window before losing them, or basically failing on the promises that you made, is incredibly short.

And so the benefit of having … you know, there’s no way we could own all of those devices, there’s no way we could pick every use case, and so the way we work with Testlio is that every week on Friday we release that build. And then Testlio sort of rotates through a variety of testers to give us diversity in the users. You know, different languages, different locations, different types of connections (cellular, corporate, wifi), different devices. And so the general kind of test game plan is we don’t tell Testlio hey, please give us this exact test matrix; we essentially let it be random in what comes in.

And certain days we’ll say we need more focus on this particular Android release or we need more focus on this particular device. Or hey, let’s do more tablet testing this week because we have a bunch of changes in tablet for the larger size. But in the general kind of cadence of the way we run, we purposely don’t want to know who’s going to test, because it’s just like our users. When we push the button to release on the App Store or the Play Store, we no longer get a choice in terms of what devices they have. If it’s compatible, if they have the right OS version, it’s going to go and they’re going to use it. And so we need to have a kind of device validation or testing framework that matches with that.

And so for us, even though Microsoft as a company went to a combined engineering phase, there’s still an importance in this notion of compatibility testing and device testing. Because, for example, we do have five engineering offices that are pretty well distributed, but obviously of the 63 languages we support we probably only have eight or ten native language speakers on our team. And so we still need the other 50 covered and to be able to work through that.

And again we probably have 20-some devices across the people on the team, because generally the team has newer and more updated devices, and again we want to be able to test on the wider selection of devices that’s around in the world. So it’s been a good partnership.

Kristel: Awesome. So we actually, I wanted to mention that we’re putting up a poll on this as well because we’re really trying to understand what the key challenges are for companies doing QA. But we also have a question from the audience. Actually quite a few, but I’m going to start with the first one. So yeah, you mentioned … You talked a lot about how two years ago Microsoft introduced combined engineering, and so now the question from the audience comes: how does Microsoft value manual testing? Because I think a lot of people felt that manual QA is being replaced by engineers only.

Kevin: Yeah so I think that’s a great question. So I think there’s still some value in manually validating things that just are hard to automate. I mean if you look back 10 to 15 years, maybe even a little less than that, 6 to 8 years, we were building mostly SaaS apps and traditional web apps. There wasn’t as large a focus on mobile.

And some of the testing frameworks were relatively primitive in that sense, and for a very simple application it was relatively straightforward to walk through and log in and do something on a web browser and automate it. But for an incredibly complex app, something like an email application where there’s a lot of diversity of data as well as the UI, and something that’s iterating that quickly, where the changes are, it’s very hard for the automation to keep up.

And so the company before Acompli was a company called Zimbra and we had about 40,000 test cases that we would run and it would take about three or four days to run those across a huge cluster of VMs. And the team was probably about 35 people in QA that worked on that automation. I always kind of look back today and say that was incredibly negative ROI, I think. Because most of the time the team was just fixing the tests and rebuilding them. Because it was just so fragile from a UI and UX point of view.

And so I think for a lot of these cases, especially things that are interfacing, we’ve moved to … the team dogfoods and kind of manually tests by using the application all the time. Like every check-in from our builds goes to all of the DEV team, and so in a sense everybody in the DEV team, everybody in the foc team, the entire company every week gets a rollout of our build. So once a week we update the entire company’s app, but on every check-in we update the DEV team. So the 40, 50 folks on the DEV team, they’re literally manually testing every build. And so if it’s something like flag or forward, or something that happens when you rotate the device, those issues are caught very quickly because people are using the app.

I think we’ve got manual testing, but we don’t think of it as an independent discipline where there’s just one person that just sits there and pushes the buttons and runs the tests. We think of that as a distributed job on a team. And so that gives us coverage on the primary devices and kind of the more popular newer devices. But again we leverage Testlio to be able to give us coverage on some of the older devices but also coverage on language and things we could have missed during that week.

And then in terms of rollout we use flighted rollouts. And so we roll out things gradually. In the App Store, the iOS App Store, you have to basically push everything at once. And so the flights are features inside of the app. But in the Play Store for Google it’s great, we can turn on one percent, two percent and kind of roll it out in a very controlled fashion and look for crashes and watch telemetry.
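Editor’s note: the percentage-based flighting Kevin describes can be sketched as stable hashing of a user into one of 100 buckets. This is only an illustration under assumptions, not Outlook’s or the Play Store’s actual mechanism; the names rollout_bucket and is_enabled and the feature name "new-compose" are hypothetical.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100) for one feature (hypothetical helper)."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(user_id: str, feature: str, percent: int) -> bool:
    """A user is in the flight if their bucket falls below the rollout percentage."""
    return rollout_bucket(user_id, feature) < percent


# Simulated population; real user IDs would come from the service.
users = [f"user-{i}" for i in range(10_000)]
enabled_at_1 = {u for u in users if is_enabled(u, "new-compose", 1)}
enabled_at_2 = {u for u in users if is_enabled(u, "new-compose", 2)}
```

Because the bucket depends only on the user and the feature, raising the percentage from one to two keeps everyone who already had the feature and adds new users, which is what makes a gradual, watch-the-crashes rollout workable.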

So there’s still a place for manual testing. We’ve kind of integrated it into the development process. Like I said the device piece, Testlio partners and helps us with, but the role of it is still there. I’ve seen some new start ups that are incredibly interesting that are using either machine learning or computer vision. To try to actually generate automatic testing without having to write code and I think some of the advancements that are going to come, just like today very complex web applications can be automated in testing, based on user data.

So I think some of the more interesting stuff is how do you capture the use case of how users use your app in the wild through telemetry and app recording and then play that back into automated testing. Rather than the kind of traditional way of doing automated testing where you have like an Excel spreadsheet with a list of tasks and a list of things. I think that model is going to decrease over time. I think that more and more teams are going to move to putting automation in places where there’s good ROI, where you can write tests and the APIs and the functionality you’re testing are very stable. So unit tests and integration tests. But to get comprehensive UI look and polish and feel, those kinds of things will end up being manual at some level. And really use telemetry to kind of listen to that.

And so we have a couple of things in the app, like if our app ever hangs for more than 15 seconds on the UI thread we actually crash the app. And so then, A, it’s like a hard assert that forces that bug to go get fixed. But it also enables us to kill our app before the OS does, because if your app hangs for that long Apple will just quit the app and say hey, something’s wrong. And then you don’t get that feedback.

So by forcefully quitting that app as an example we collect it. We also set in telemetry alerts much earlier than that. So three to five seconds if we see a UI hang. So those kind of things give us a lot more leverage for the manual testing. Because a lot of times if you’re a user and you’re manually testing something, you’re like oh that hung for three seconds. You may not report that as a bug because you can’t really reproduce it but you may see it and it’s annoying.
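Editor’s note: the two thresholds mentioned here (a telemetry alert after a few seconds of UI hang, a deliberate crash after 15 seconds) can be sketched as a simple heartbeat watchdog. This is a minimal illustration, not the Outlook implementation; the Watchdog class, the 3-second alert value (the low end of the "three to five seconds" quoted), and the injected clock values are all assumptions.

```python
from dataclasses import dataclass

ALERT_AFTER_S = 3.0   # assumed: low end of the "three to five seconds" alert window
CRASH_AFTER_S = 15.0  # from the interview: hang threshold for a deliberate crash


@dataclass
class Watchdog:
    """Tracks the last time the UI reported it was responsive (hypothetical class)."""
    last_heartbeat: float = 0.0

    def beat(self, now: float) -> None:
        # Called by the UI whenever it finishes handling an event.
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        # Called from a background timer; decides what to do about a stall.
        stalled = now - self.last_heartbeat
        if stalled >= CRASH_AFTER_S:
            return "crash"   # capture state and terminate before the OS does
        if stalled >= ALERT_AFTER_S:
            return "alert"   # record a telemetry event, keep running
        return "ok"


wd = Watchdog()
wd.beat(now=100.0)
status_after_1s = wd.check(now=101.0)
status_after_4s = wd.check(now=104.0)
status_after_15s = wd.check(now=115.0)
```

Passing the clock in as an argument keeps the sketch testable; a real app would read a monotonic clock and run the check off the UI thread.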

So what we do today is we use data to collect that and then we can machine learn and run regressions off that and say hey, every time somebody opened a message, the average time was 300 milliseconds, but in a certain case, if the message is over this size, hey, it was popping up to four seconds. Oh, there’s something wrong with the HTML renderer and we can dig in. But we have the data to go dig into that, so the manual testing, or the manual use I guess I’d say, becomes kind of our test data.
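Editor’s note: a much simpler stand-in for the regression analysis Kevin mentions is to group open-time telemetry by message size and flag the slow groups. The sketch below does only that grouping; the function names, the 100 KB bucket width, the 1,000 ms threshold, and the sample events are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_open_time_by_size(events, bucket_kb=100):
    """Average open time (ms) per message-size bucket; events are (size_kb, open_ms)."""
    buckets = defaultdict(list)
    for size_kb, open_ms in events:
        buckets[size_kb // bucket_kb * bucket_kb].append(open_ms)
    return {start: mean(times) for start, times in sorted(buckets.items())}


def slow_buckets(summary, threshold_ms=1000):
    """Size buckets whose average open time exceeds the threshold."""
    return [start for start, avg in summary.items() if avg > threshold_ms]


# Illustrative telemetry: small messages open in about 300 ms, very large ones in about 4 s.
events = [(20, 280), (40, 320), (150, 310), (900, 4100), (950, 3900)]
summary = mean_open_time_by_size(events)
suspects = slow_buckets(summary)
```

With this sample data only the 900 KB bucket is flagged, which is the kind of signal that would point someone at the rendering path for large messages.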

And so a lot of the quality work, while not done by dedicated engineers in the traditional manual testing sense, is done as a team, again in this combined engineering fashion, with as much automation and telemetry as we can, to kind of report back on what we learn.

Kristel: I think my next question kind of comes from the same thing. So you’re mentioning that you gather a lot of data about the users, but does data help you make decisions as well? Like how do you prioritize your resources and how do you decide which issues to fix in each release? Because your release schedule is still very fast, right? So how do you make that decision on prioritizing your issues?

Kevin: That’s great. So I think all software has bugs and I think that’s just the nature and it’s great job security for both of us, I think at some level. But it’s an important piece of the puzzle. I think the way that we use the data is two factor. As you know from Testlio we only allow them to give us three bugs. So we say what are … This app is in production it has millions of users, we know it’s working but we also know there’s issues and things we could improve. And so to focus that attention on what are the top three issues, we only basically allow what are the three best, most impactful issues and all the issues are recorded. But in our report each week we only get highlighted the top three to go look at.

Similarly, we use HockeyApp, which is another acquisition that Microsoft made and a super popular data tool for crash analytics. And they give us an incredible amount of data, not only on crash frequency and histograms and when a crash first appeared and in which build, but also the number of impacted users. Because traditionally you would just look at the crash count and say oh, this is the top crash, but in many cases that top crash is a small number of devices that are crashing in a continuous loop or, you know, in Android’s case there are people that have modified or done something to their phone that’s triggering this crash.

You may see one or two users that have gotten into this bad state, so that looks like the most important issue or crash to go fix, yet maybe number seven on the list is more important because it’s impacting thousands of users even though it’s only crashing maybe once or twice for each of them. Whereas something that is impacting maybe one or two users but crashing tens of thousands of times a day, because somebody’s kind of done something with their device, is kind of a less impactful thing.
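Editor’s note: the difference between ranking by raw crash count and ranking by impacted users can be shown in a few lines. This is an illustrative sketch only, not how any particular crash analytics tool works; rank_crashes, the crash signatures, and the report data are hypothetical.

```python
from collections import defaultdict


def rank_crashes(reports):
    """Rank crash signatures by distinct impacted users, then by raw crash count.

    `reports` is an iterable of (signature, user_id) pairs, one per crash.
    Returns a list of (signature, impacted_users, crash_count) tuples.
    """
    counts = defaultdict(int)
    users = defaultdict(set)
    for signature, user_id in reports:
        counts[signature] += 1
        users[signature].add(user_id)
    rows = [(sig, len(users[sig]), counts[sig]) for sig in counts]
    return sorted(rows, key=lambda r: (-r[1], -r[2], r[0]))


# Two devices stuck in a crash loop versus 1,000 users who each crashed once.
reports = [("loop_crash", "u1")] * 10_000 + [("loop_crash", "u2")] * 5_000
reports += [("render_crash", f"u{i}") for i in range(100, 1100)]
ranked = rank_crashes(reports)
```

By raw count the loop crash dominates (15,000 reports against 1,000), but sorting on distinct users puts the crash that touches 1,000 people first.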

And so we use data to both decide the priority and really try to mimic and focus on the most important things because we want to get it done in a week and improve the product and keep moving forward. And then also what’s the impact to the user base. And so really having deep telemetry in terms of like is that something that our users do commonly.

So there may be some low-level feature in settings … Maybe if you do some crazy combination of things the app will do a bad thing or it will crash. But that’s just not a frequent use case. Whereas something that happens in like the message list or in reply or in calendar those are incredibly heavily used services. And so even a small imperfection or a small thing is disastrous to our users and the intent.

And so we look for any kind of slow scrolling speed, if you don’t have that video game type 50, 60 frames per second scrolling, that really, really sharp-looking feel. We report telemetry really accurately on that, so when we see the frame rate drop, we know something’s wrong. We’re not reusing enough cells or we’re having a memory blowup or there’s some issue. We take that data and go back and say that is a hard problem we need to go solve, potentially, because it impacts so many users and it’s such a critical kind of home screen or front screen to the folks that are using our app every day for email.

Our heaviest users are using the app upwards of 100 times a day. Launching it, going in there, average session length is like 23, 24 seconds, and so if you think about somebody coming in and basically being inside your app for 23 to 24 seconds, our job is to make sure that they can get the most done inside of that small window. And so our job is how many taps, how many things can they get done?

I’m kind of this crazy inbox zero person so I’m always swiping and going in there, triaging as quick as I can. Find the two important ones. We have this thing called Focused Inbox, which allows us to kind of push down messages that are more from mailing lists or senders you haven’t seen. And really prioritize based on people you send to frequently. So if you send me a mail and we have a lot of communication, your mail’s going to bubble up versus somebody who may be a cold intro or somebody that I haven’t emailed in a long time. They’re going to have a lower weight.

And so all of those kind of features and functionality are built in the app and are really designed to be, how efficient can you be when you’re in the application. Because today’s mobile apps the session length is incredibly small. And so we want people to have a great experience for the few seconds that they’re going to be in our app before they toggle off to whatever they’re going to go do next.

Kristel: Yeah. I agree. So we actually have the results from our first poll. The question was, “What is the biggest pain point in testing?” And first, with 30%, was testing speed and reporting. So I think we can elaborate more on how you really get your cycles fast enough to satisfy your customers, considering that Outlook mobile is releasing on a weekly basis. How do you continue with that pace and because there’s so many different …

Kevin: So data’s great, right? I think if you look at this data today, you say okay, testing and reporting speed. Well, how quickly can you get that feedback, again by reproducing common bugs or sort of tying them together? So being able to collect that data and have that data available. And so having good telemetry and kind of baseline telemetry and then correlating that with your crashes.

So when we see, like in the crashes case, we have HockeyApp and we’re able to tie that back and say for these people crashing, here’s the stack trace, here’s the breadcrumbs of the type of things they were doing. Oh, they were on compose, or they had just archived a message, or they had just tried to multi-select three messages and move them. Whatever the case may be, you can go in and identify that and find those issues.

The speed thing’s actually an interesting one. If you think about it, how did we settle on a week? We tried different things. As a small startup, when we had two or three people, we were releasing every day because the impact was low. If we made a mistake, only a small number of users were impacted, and they were mostly our friends because it was early in testing.

But as we started to get a little more rigorous, we said, hey, two weeks … most agile companies have two-week sprints, and that’s great, so we tried that for a while. What we found was that most of the work would get done on the Thursday before the second Friday. So basically, the day before the end of the sprint you’d see this crazy spike in check-ins as everybody piled in their work to finish the sprint.

And we said, well, that’s weird, because then you get all these landing and integration issues, you have a lot more crashes, and you’re trying to shake out a bunch of fresh code and end the sprint in 24 hours. So what if we try a week? And sure enough, the same thing happened: on Thursday you get a good spike, but it’s great because that spike is half the size, since the amount of work we’re taking on is half of what we were doing. And now we see a few check-ins earlier in the week, stuff that was in code review, and those kind of trickle in.

Then you see a ramp and spike on Wednesday and Thursday. And then on Friday it dies off, because people are like, hey, I don’t want to press it. I don’t want to be that engineer who’s stuck on Friday having checked in something that broke the build, or put in something that we then have to revert the next week. So you see that similar spike on Thursday or Friday morning, depending on what time you’re checking in.

And that felt like a good model. Then we hand off: like I said, on Friday evening we cut a staging build, and we hand that to Testlio and our internal teams to use over the weekend, and then on Monday we’re able to take that feedback and use it. So there are two levels. One is that we’re able to look at the data over a two- or three-day period, from Friday to Monday morning, and ask, hey, were there any top crashers? That’s a separate build that we have HockeyApp and telemetry on, so we can make sure the numbers look good.

Then we have the Testlio feedback around specific use cases that were either new or broken, or were bugs that we had fixed, or things that we had touched. So we write very detailed release notes of what changed. We have a script today that just generates them from our source control system. When people check in, they must check in with a bug number and what they did, so that makes it pretty easy to print out a list of new things, bug fixes, and developer notes, like, hey, we updated this as best we can, or we changed this OS. It gives us a nice little chunk, at least for ourselves to know what changed, but also to share with partners like yourself, so they know where to look and can say, oh, they did a bunch of work and changed the network stack, let’s go do network stack testing and put a bunch of randomization in the network stack to see what happens.
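The release-notes script Kevin describes is not shown in the interview. As a rough illustration only, here is a minimal Python sketch of the idea, assuming a hypothetical commit-message convention of `kind(#bug-number): summary`; the convention, the section names, and the function `build_release_notes` are all assumptions made for this example, not the team's actual tooling.

```python
import re
from collections import defaultdict

# Hypothetical commit-message convention (an assumption for this sketch):
#   "<kind>(#<bug number>): <summary>", where kind is feature, fix, or dev.
COMMIT_PATTERN = re.compile(r"^(feature|fix|dev)\(#(\d+)\):\s*(.+)$")

# Section order mirrors the interview: new things, bug fixes, developer notes.
SECTION_TITLES = {
    "feature": "New features",
    "fix": "Bug fixes",
    "dev": "Developer notes",
}


def build_release_notes(commit_messages):
    """Group commit messages into release-note sections.

    Returns (notes_text, rejected), where rejected lists the messages that
    carry no bug number in the expected format.
    """
    sections = defaultdict(list)
    rejected = []
    for message in commit_messages:
        match = COMMIT_PATTERN.match(message.strip())
        if match is None:
            rejected.append(message)
            continue
        kind, bug_number, summary = match.groups()
        sections[kind].append(f"#{bug_number}: {summary}")

    lines = []
    for kind, title in SECTION_TITLES.items():
        if sections[kind]:
            lines.append(title)
            lines.extend(f"  - {entry}" for entry in sections[kind])
    return "\n".join(lines), rejected
```

In practice the commit messages would come from the source control system for the range of commits since the previous weekly build, and the rejected list could be used to enforce the "must check in with a bug number" rule.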

So on some level, our team gets to focus on adding telemetry and making sure that as we see problems, we’re able to debug them quickly. We’ve built a ton of tooling around telemetry and diagnosing issues, and we use it in our support too. When someone reports an issue in the app, that generates the logs and telemetry we need, and we get all of that captured so we can analyze it and look for the problem. That not only helps as validation in the dev stages, it is also immensely useful in the production environment, because at this huge scale we can’t talk to all users. In support, we talk to nine to ten thousand people a day through support tickets, and we respond to them within a few minutes.

Engineering can’t look at 9,000 tickets, so we process those through our support team. They’re very efficient, and they’re able to bubble up the one to few percent that stand out and say, hey, here are five or six people having this specific problem, and maybe there are hundreds of people. Engineering can go back and grab a snapshot of all of those; they already have the telemetry and data. That again lets us focus our resources on the most important piece, which is diagnosing those problems and solving them quickly, because we spent the work up front to engineer a really good debugging pipeline for diagnosing issues.

Kristel: I have a next question. We have a couple of questions from the audience as well, and I’m going to dig into those a bit later on, but since we were talking about speed and hearing your customers’ feedback, my next question is: how do you handle big feature releases at Microsoft? For example, when you introduced Focused Inbox, did you engage with a beta user group where you gathered that data, or did you do beta releases? What’s your take on that?

Kevin: All of that. So we have multiple rings. We have the dev ring, which is people on the development team, 40 or 50 people, including product people and people who are very close to development. If you know the names of 80% of the engineers, then you’re kind of good.

Because you’re ready to give them feedback, and that’s great. Then you move out to what we call the dogfood or Microsoft ring, which is basically everybody at Microsoft: 110,000 full-time people and another 120,000 contractors, so about a quarter of a million people who have the ability to use one of our mobile apps. As you would imagine, most of them have a phone, and so they do. So we’re able to get that testing. That’s our weekly testing, and by running that ring we’re able to get internal feedback. It’s safer in the sense that they work for our company. The next ring, kind of in parallel with dogfood, is external beta. For Android we’re able to make that public, so the public can go and get our beta builds and test them weekly with us.

On iOS it’s a little different because of the way TestFlight works, so you have to opt in to that. There is a way to sign up for folks who want to test on that and be a kind of early validation.

Then we roll to production. In production, in the case of a large feature like Focused Inbox, it’s rolled out a percentage at a time, so we’re able to increment it up to a subset of the users step by step, hour by hour, day by day, or at any other rate we want to pick. From there we’re again watching key telemetry: say, is the user session time changing the way we expect? Are the number of sessions or the number of composed emails going down? We watch those key KPIs to understand whether the changes are going in a good or a bad way, and to decide whether we go to the next phase of the rollout.

Now we’ve started to use some of the work that Microsoft has done on experimentation, so we’re able to almost automate the rollout. What we do is set up an experiment and say, we’re going to make a change, we’re going to move this spot and change the color of something, and that’s an experiment.

We mark that as an experiment and say, hey, here is what the cases look like, A, B, C, three different options, and it randomizes them across a small set of the user base. Each of those has a scorecard tied to it, so we have a list of expectations. We expect these five KPIs to not change or not degrade. We expect these two to get better or not be harmed. And this one we expect to go down, as an example. Then, as the experiment automatically rolls out, it looks at those KPIs at each step of the way, say 1%, 2%, 6%, 12%, and it rolls out in incrementally increasing phases. As long as those KPIs stay within the range we expect, it’ll roll out to 100% of production automatically based on the experiment.

If the experiment fails, say option C is bad, it will basically stop the rollout of C and continue with A and B, and eventually it will get down to one answer, which is the one that meets the goal of the scorecard. For setting up these goals and scorecards we have our own system inside Microsoft, but there are lots of tools out there that help you do experiments, test things like that, and launch things in a phased way. The important piece is not just testing for some high-level metric but actually understanding what your key metrics are: what is the retention, what are the things that are really the most important to your app? If your app is selling something, is the change impacting users buying things in a good or a bad way? If so, you know how to make the decision to stop the rollout or pause it.
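To make the scorecard-gated rollout more concrete, here is a minimal Python sketch of the general technique. It is not Microsoft's experimentation system: the names `Expectation`, `within_scorecard`, and `run_rollout` are hypothetical, only the first four rollout steps (1%, 2%, 6%, 12%) come from the interview, and each KPI is assumed to be compared against a nonzero control value as a relative change.

```python
from dataclasses import dataclass

# The first four steps are the ones mentioned in the interview; the remaining
# steps up to 100% are assumptions added for illustration.
ROLLOUT_STEPS = [1, 2, 6, 12, 25, 50, 100]


@dataclass
class Expectation:
    """One scorecard line: the allowed relative change of a KPI vs. control."""
    kpi: str
    min_change: float  # lowest acceptable relative change, e.g. -0.02 for -2%
    max_change: float  # highest acceptable relative change


def within_scorecard(scorecard, control, variant):
    """Return True if every KPI's relative change stays inside its range."""
    for expectation in scorecard:
        baseline = control[expectation.kpi]  # assumed to be nonzero
        change = (variant[expectation.kpi] - baseline) / baseline
        if not expectation.min_change <= change <= expectation.max_change:
            return False
    return True


def run_rollout(scorecard, read_metrics):
    """Advance through the rollout steps while the scorecard holds.

    read_metrics(percent) is a caller-supplied function that returns
    (control_kpis, variant_kpis) measured at that exposure level.
    Returns (last_percent_that_passed, completed).
    """
    reached = 0
    for percent in ROLLOUT_STEPS:
        control, variant = read_metrics(percent)
        if not within_scorecard(scorecard, control, variant):
            return reached, False  # stop this variant's rollout
        reached = percent
    return reached, True
```

A KPI that must not degrade could be written as `Expectation("session_seconds", -0.02, float("inf"))`, and one that is expected to go down as `Expectation("support_tickets", float("-inf"), 0.0)`. Running one such loop per option (A, B, C) gives the behavior described above, where a failing option stops while the others continue.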

Kristel: We now have another question for you. Let’s say we’ve released a feature to production and you really need to monitor user feedback. The question is about App Store reviews: how does Microsoft handle App Store reviews from end users?

Kevin: Yes, great question. So that process changed recently, as the person asking probably knows. We can now, as a developer, respond to a rating, which was always possible on the Android Play Store but not possible on iOS until just recently, when the latest [inaudible 00:42:20] iOS enabled that.

In our process, we would grab the data and Testlio would actually analyze it and give us a report every month. We did one-time reports that looked at a snapshot of our support data, a snapshot of our app ratings, and a snapshot of our user base’s feedback on feature value. We found that in support we were getting lots of requests for features, so we decided to say, hey, we want you to voice that, and people just vote on those features. It gives them the satisfaction of knowing their vote was heard, but they can also see the top 10 list and say, oh, I’m asking for something very rare, like printing in this certain layout, but then I look and see, oh, tasks, or fingerprint support, which was number one for a long time, so we added that in iOS 9. That helps us focus development and add the new features that people would love the most.

Also, specifically for the App Store feedback, we would read the reviews, and Testlio would basically analyze them for us and give us some rough categories that they fell into. More recently we have a small team inside our support team that actually looks at those and responds to them. So we’re literally responding to them manually. We have a channel set up in Microsoft Teams where we go and talk and say, hey, look at this support thing, here’s what we’re thinking, let’s talk to them, and we reach out to folks and ask them to [inaudible 00:43:41]. We’ve had some early success where we’ve contacted them via the App Store’s developer response process and said, hey, sorry you’re having this problem, we’re about to go dig into it. Because as everybody knows, with App Store reviews you live and die by them: you want a great rating, but without being able to talk to the user you can’t help in many cases, right?

So now we’ve got that ability, and we basically respond to them and say, hey, we’d love for you to come contact us through our support. We give them a key or something so that we can quickly find and prioritize their issue. They go in, reproduce the problem or contact us, and we’ve had early [inaudible 00:44:20] where people have come back and we diagnosed the problem, figured out what it was, and fixed it, or in a particular case worked around it, or showed them, oh, that is possible, it’s just that the [ewire 00:44:29] wasn’t ideal and you couldn’t find it; we’ll take that and fix it, but here is how you do it. And then they went back and actually changed their reviews. We’ve seen several reviews change from a one-star review to a five-star review. It’s usually that boundary, right? The ones who are the most critical, when you help them, also become the biggest fans. So we’ve turned a lot of people around that way.

It’s a recent thing; we started less than a month ago, and so far we’re seeing a lot of success with it, so we’re going to continue to work on that. Again, at the scale we’re at, it’s a lot of work to go through all of those, because we do get a lot of reviews across the apps.

Kristel: I think that’s the future as well: your users should drive a lot of the features and issues that you’re fixing, right? That really helps you prioritize. So we actually had another poll in the middle of our conversation, Kevin, and I’m just seeing the results. The question was, “What drives test strategy and planning?” First, with 37%, was “Use cases”; second, with 34%, was “Product updates”; and third, with 25%, was “User feedback”. Interesting.

Kevin: Interesting. So that’s a different thing. The product updates and new features make sense. When I hear “use case” I think of a Microsoft spreadsheet listing all the use cases, and to me that’s a little outdated. But then again, it depends on how people thought about the question when answering. I would dig into that, I would double-click on the use cases and ask: are those use cases based on telemetry, on data showing that this is exactly what users do? Depending on your telemetry system, you should be able to see the five most common click streams or task streams in and out, and those should be the use cases you’re testing. You’re not your user in most cases. The product manager or the engineering team lists what they think the common use cases are, but as we’ve found out over and over again, we’re wrong.

So being able to use telemetry to drive what we would call the production use cases, the common use cases from production data, has been the most insightful for us. I think it’s similar with product updates: we all want to build the great new feature in the app or make a change to the product, but we need the data to prove that we’re right.
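One simple way to derive "production use cases" from telemetry is to count the most frequent short sequences of events across user sessions. The Python sketch below assumes each session is available as an ordered list of event names; the function name `top_click_streams` and the event names in the usage note are hypothetical, and real telemetry systems will have their own query tools for this.

```python
from collections import Counter


def top_click_streams(sessions, length=3, top=5):
    """Return the most common runs of `length` consecutive events.

    sessions: iterable of per-session event lists, in the order they occurred.
    Returns a list of (event_tuple, count) pairs, most frequent first.
    """
    counts = Counter()
    for events in sessions:
        # Slide a window of `length` events across the session.
        for start in range(len(events) - length + 1):
            counts[tuple(events[start:start + length])] += 1
    return counts.most_common(top)
```

For example, calling `top_click_streams(sessions, length=3, top=5)` on a day of sessions such as `["open", "inbox", "archive"]` would list the five most common three-step flows, which can then be compared against the use cases the team assumed were common.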

So it’s one thing to test and validate that, but it’s also important to have telemetry so that as you roll out a new product update, whether it’s a version-to-version change or you’re rolling it out in a more phased model like we do, you can compare and say, did this make the product better? The beauty of moving quickly and releasing something every week is that you need to have systems in place to understand whether you’re making the product better. What are the KPIs that define better? You know, between the [inaudible 00:47:22] you need to understand what better is, and say we’re trying to improve retention, or number of actions taken, or number of things bought in the store, or number of people that watch videos, depending on what the goal of the app is. What are those key metrics, and is each product update or each change making them better? I think people would be surprised, like we were when we first started tracking this, at how many times we were wrong: what we assumed would change was not actually what changed.

So it was super insightful having telemetry around that. And user feedback, it’s just the way of doing business, right? If you don’t have a strong way to get direct feedback from your users, you’re just not going to have that signal. As you scale, you need a way to process that. So we use a set of tools, but then Testlio also gives us a more objective review across that data, in a monthly report that helps us pick up some of the bigger trends and make sure we’re focused on the right things.

Kristel: Yeah, I think that’s a great answer. Considering the poll results and the focus on use cases, product updates and user feedback, you’ve talked a lot about your team and the culture as well, like how you really make your team care about your users. The next question is about QA training: with DevOps, the lines between developer and tester have become very blurry, so how should I, as a QA manager, now structure and train my team?

Kevin: Yeah, it’s good. I always like to think about it like this: when we hire support, we almost don’t train them. It sounds crazy, but I don’t get the chance to train my users. I don’t get to go to my users on the App Store and say, hey, all the people into [inaudible 00:49:13], and then train them how to use email and how to use my app. So part of the training is really untraining, more than training people to do the work a certain way. You really want to turn the model around to be incredibly customer focused, with this notion of having user feedback come directly to you. I would make them go and work in support. I mean, the way I would train my QA team, if I had a dedicated QA team, is I would cancel their QA titles and put them all in [minor 00:49:38] support and say, go talk to users. Very quickly they would know the pain of the users, because they’d be sitting in technical support hearing it.

Again, if you have QA already, you probably have a support org or team, so have them go sit and do [paraformatting 00:49:54]: switch chairs with the support team for a while, have the support team come test the app, and have the QA team go do support. We’ve merged all that into one function, so for us that culture is already inverted. But if I were running a team with a solid traditional structure, I would just invert it, because there is no better way to learn how to test the app, or to find gaps in your testing, than to sit there and take all the support requests about what’s wrong.

Like I said, we don’t have separate QA and engineering; it’s all really merged into one. Our engineers sit in the support queue for whatever they’re working on, and they have logins to the support system to take tickets and look at that data and at telemetry. It’s just a cross-functional system we’ve built. There is nothing more gratifying than when you see somebody say, wow, you added this feature, I love it. But equally they’re giving feedback like, hey, there’s this feature and I hate it, or I don’t understand it, or why did you break what I was doing, why did you change it?

Providing that direct user feedback, that’s the game we’re in. Nobody wants to build things that users hate. We all want to build things that are great. So it’s about providing that direct conduit. It goes back to what traditional enterprise software talked about as the ride-along: the engineers would go with the salesperson to a client’s company and sit there and listen to the business person ask questions, or beat them up with, this is what I need, why don’t you have this feature, and this is why it’s so important to my business. Whereas in engineering, you can sit, develop, and separate yourself from that.

So I think in today’s world, where we’re shipping weekly, the velocity is so fast that we don’t have time to train people. I look at some of the teams inside Microsoft that have had a traditional model, with these big kickoffs you would do. Again, this is what I heard, and some teams still operate in small offices, and it’s a little tougher to change.

But if you have a clear development cycle, you would get there in [inaudible 00:51:46], and in year two you start spending a year training the support staff and the sales staff on the new product and how it works. We release every week; our app changes every week. You have to be super comfortable with change and be able to read release notes, consume them, and understand what’s going on and how users are going to use this, again using data to drive that. I mean, by the time you’d spent the time to build a training module, it would already be outdated before you finished it. If you took two weeks, you’d be two weeks behind. If you took a month or two to build the training module or run people through training, or you had traditional support staff with, say, a six-week training, after six weeks they’re already six releases behind. They’ve lost.

So that agility of moving to every week also helps right-size the training modules. It’s like, well, you can do whatever you want with training as long as it’s done in an hour on Monday, that’s it. Because if it’s more than a few hours, you’re burning up the week, and then you’re already on the next version. So you need to be able to pick up changes and read the release notes.

That then becomes one of the requirements, because our support agents need that data. When engineering makes changes to features, that’s why we have what changed, what bugs we fixed, what features we added, and developer notes, every week. That is religion; that’s part of the training. That’s your ticket to get on the train. The train is running every week, and you’ve got to have that data, because so do all the other people who are consuming the app, whether they’re testing it or something else. We know you have testers in various places around the world, just like our team is distributed, so you need that currency to transfer that data to your own team, and similarly we give it to our support team, or to our marketing team if they want to go write a blog post. It’s like, great, whatever you can do, you’ve got a week to bust it out. That’s the model we’re running now.

With bigger features, we roll them out in smaller steps, but we don’t try to special-case them. Like we said, it’s all in there; it’s either in the flight or out of the flight, but once you’re in the flight, you’ve got the data and you need to go with it. So it’s a great question.

Kristel: So you mentioned the QA team, and I think the team in general, designers, engineers, product managers, everyone has to breathe the vision for the product, and they really need to understand how to use it. You mentioned that nobody wants to release stuff that people don’t like, but there are cases in a lot of big companies where they push a big, huge feature or a release with a new design and everybody hates it. Sometimes people don’t know what they want. How do you make it happen?

Kevin: That’s why we have multiple rings. That’s why we try not to do big huge things. You can do the big huge things as long as it’s done in that week and part of that process.

We make mistakes. We’re human, we’re not perfect. We’ve released builds or made changes where hundreds of thousands of people were complaining or something was wrong. There are two things we optimize for. One is to fix it quickly: be able to roll forward and make it better very quickly. So if you make a mistake on Monday, you have to fix it on Tuesday, or sometimes on Monday. You have to have a process in the build where you can make a change and ship it through, where you’re comfortable with the data and the testing and it goes right through. We’ve sent Testlio a build when we changed something on Tuesday and said, hey, test this thing tonight, because on Wednesday we’re going to release it. We’re releasing it [inaudible 00:55:15]; we just want to know if there is something we can tweak or change, because if we break something and make a stupid mistake, we’re going to get fired, right? I don’t want to ship an app come close to [inaudible 00:55:27] work, like the fundamental promise of what the app is. You need to have a way to ensure that.

Have testing rings, have rings at least to get early feedback. No surprises, right? You never want to surprise your boss, you never want to surprise your team, and you don’t want to surprise your users. It’s like, hey, things are changing, this is why we changed it, here is the testing, we had a beta group, get in it. The customers that are the most vocal, we bring them closer to us, without question.

If there are customers saying, I can’t believe you did this or did that, it’s, oh, sorry, come closer. You can be in an earlier ring; come be in the beta ring. Give us more feedback. We want that feedback, because the most vocal users, the most vocal customers, are also the best partners in what you’re doing. I always think of them like the coach when you’re playing sports: once the coach stops yelling at you, he’s given up. You’re at football practice, kind of lagging behind, and they’re not yelling, and you’re like, oh yeah, he’s not yelling at me because I’m not playing on Friday. Very simple. They’re like, oh, you’re going to be slow and slack, no problem.

So in that same way, when your customers stop complaining, they’re not using your app. The reason they’re giving you feedback is that they care. They really, really want the app to work. They’re not giving you feedback because they hate you. They may say bad things about what you did, but you have to invert that. Those are gifts: that feedback, those bad support tickets, those are all gifts. They’re giving you their time and they’re giving you data that you can go use to improve. So you have to take that feedback and receive it as a gift, and thank them for it, not be like, oh, dude, another [inaudible 00:57:04], their app is slow, it doesn’t have this feature, that’s the stupidest feature ever. You’re not the user. So thank them for that. Then figure out, okay, will this work for you? Can you come onto our beta? We’d love that feedback. Thank you so much for being so critical of us, please continue.

As you build that, you build fans. You’ve got people that are invested in you being successful, being better, improving.

Kristel: I really like what you said, that feedback is a gift, and I totally agree. It’s crucial for building products. Even when we came out with Testlio, we really needed that feedback from the potential customers who would be using the product, right? So it really is a gift.

As we’re getting closer to 11 a.m. here, Pacific time, I would like to ask my last question, and we might have a poll coming up as well.

The question is how do you measure quality? I think there are companies out there that think about quality in terms of how many issues do we have in the backlog, what’s our App Store rating, there are multiple things, how do you do that?

Kevin: Yes, so we look at multiple things, right? We have an NPS score, the traditional way, like 0 to 10 or 1 to 5, and there’s another scale that we’re using, but recently we’ve looked at that for, hey, how happy are people with the app from day zero to day seven. We obviously look at App Store ratings, and that’s probably the one that I lose the most sleep over, because at App Store scale, with the diversity of the features, the diversity of the content, and the diversity of the user base, it’s incredibly hard. But the goal, and we always talked about this, was a five-star app released every seven days on both platforms simultaneously. That was kind of our motto at Acompli. We know what the goal is: five stars. And we got there a few times with the company, but the shoes we were filling as a cool-kid startup in Silicon Valley were very different from Microsoft Outlook, walking in the shoes of giants. That’s three decades of a known application with expectations that are well regarded, really.

There is a massive number of features with a massive number of expectations. You remind yourself that you should be perfect, you should never release a bug, and you should find everything. But we’re human, right? So we try to make that culture as inclusive and as open as we can, and extend upon that with all our diversity.

So we look at App Store ratings, and we look at the really hard-core numbers around quality, like crash-free sessions, and how many nines you’re going for. You talk about three or five nines, whatever your numbers are in the DevOps world, and you similarly want that in crash-free sessions, right? How can you get three or five nines in crash-free sessions for your users, and then in which cases and on which devices have you drilled down and learned from that?
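For readers unfamiliar with "nines": three nines means at least 99.9% of sessions are crash-free, five nines means at least 99.999%. The short Python sketch below shows one way to compute both figures from raw session counts. The function names are hypothetical, and the integer loop is used instead of a logarithm so that exact boundaries such as 99.9% are not affected by floating-point rounding.

```python
def crash_free_rate(total_sessions, crashed_sessions):
    """Fraction of sessions that ended without a crash."""
    if total_sessions <= 0 or not 0 <= crashed_sessions <= total_sessions:
        raise ValueError("need 0 <= crashed_sessions <= total_sessions, total > 0")
    return 1 - crashed_sessions / total_sessions


def crash_free_nines(total_sessions, crashed_sessions):
    """Number of leading nines in the crash-free rate (99.9% -> 3).

    Returns None when no crashes were observed, since the count is unbounded.
    """
    if total_sessions <= 0 or not 0 <= crashed_sessions <= total_sessions:
        raise ValueError("need 0 <= crashed_sessions <= total_sessions, total > 0")
    if crashed_sessions == 0:
        return None
    nines = 0
    # n nines means the crash fraction is at most 10**-n, i.e.
    # crashed_sessions * 10**n <= total_sessions (pure integer arithmetic).
    while crashed_sessions * 10 ** (nines + 1) <= total_sessions:
        nines += 1
    return nines
```

Breaking the same calculation down per device model or per app version is one way to do the kind of drill-down Kevin mentions.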

We’re looking at telemetry, and we’re looking at retention. If you had to pick one thing, specifically in mobile apps, there is nothing more important to focus on than retention. Great retention makes everything else easy. Whether you’re acquiring users organically or buying them or whatever, if you have great retention, they’re staying with you, which means they like it, and then as you get more users you have a very efficient funnel. If your retention is terrible and you’re losing 90% or 80% of your users, then every time I buy 10 users I’m only really getting two, so now my costs are higher. Whether you’re buying users, doing brand marketing, or it’s just organic because they’re coming in, if you’re losing them you’re going to have a hard time growing.
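The arithmetic behind "buy 10 users, really get two" can be written out directly. In this small Python sketch the function names and the cost of 1.0 per acquired user are illustrative assumptions; only the 10 users and the 80% loss come from the interview.

```python
def retained_users(acquired_users, retention_rate):
    """Users still active after churn, given a retention rate in (0, 1]."""
    return acquired_users * retention_rate


def cost_per_retained_user(cost_per_acquired_user, retention_rate):
    """Effective cost of each user who actually stays."""
    return cost_per_acquired_user / retention_rate


# Losing 80% of users means a retention rate of 0.2.
kept = retained_users(10, 0.2)                 # about 2 of the 10 users stay
effective_cost = cost_per_retained_user(1.0, 0.2)  # each kept user costs about 5x
```

The same formula shows why retention dominates: doubling retention from 0.2 to 0.4 halves the effective cost of every user, however they were acquired.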

So if I had to pick one thing it’d be retention, so we look at that closely. The other one is the App Store ratings, because that’s kind of your scoreboard. In the old days of enterprise software, there was no product scoreboard; people were like, oh, this sucks, or this is great, and there was no data. But in the app world the scoreboard is very public, and there is nothing worse than waking up and seeing your app ratings are down and not knowing how to fix it and get on it. So we look at that often. It keeps us employed, but it’s also one of the things you work on really, really hard, just trying to figure out how to constantly improve, close that gap, and build processes to make it better so that we prevent mistakes, regressions, and other things we don’t want impacting our apps.

Kristel: Yeah, thanks. The last question; we have a couple of minutes left. How do you see the future of testing? What role will manual testers play in the future? There was even a question from the audience saying that some people might be concerned about their role as a manual tester. What do you think? What’s the future of testing?

Kevin: As I said a little bit earlier, the manual tester of the past, the one who gets an Excel list of test cases, or who is in some test case software checking off different apps [inaudible 01:02:18], that tester is not going to last very long. I just don’t think there is a lot of ROI and value in it; it’s not efficient because it [inaudible 01:02:26]. The kind of manual tester that’s going to be rewarded, that has a future, is a manual tester who’s not just a tester. They go in and look at the App Store reviews: oh, I’m testing this app, where are the app reviews? What are people complaining about? We love it when folks on the team go and look at one-star reviews and say, yeah, I found like five one-star reviews in this particular area, and I found a test that will produce an [inaudible 01:02:52], like that feature is missing or that thing doesn’t work in this case, and we say, that’s correct, let’s go fix it.

I think it’s the manual tester who’s able to make an impact. I always think of it like this: Facebook has this model where the engineering rewards are really tied to developer efficiency. So I talked to my team and said, if you can find a way to cut the time to build something in half, like the [inaudible 01:03:11] build time is five minutes and you can reduce it to two minutes, you literally can take the whole [inaudible 01:03:15], because given the number of developers on our team and the number of builds we do every day, if you can reduce that by half you’ve made everybody more efficient. Your job is done. Making the engineers more efficient and making the product team more efficient, there is no bigger reward from an individual.

So if you think of a manual tester, if you can go and distill data from a very wild source like the app reviews or support tickets, that’s what I said, you want to sit in support, go sit in the support queue for a while and handle support tickets for a couple of days I can guarantee you’ll come up with better bugs than running that test that you’ve been running for years, right? That will give great insight for data and you can come back to the engineering team and say look, here is what I found. Like it models a QA person in how to the QA role, that’s just your job, I would go and work all the extra hours, sitting in support, looking at app reviews and coming to work and I’ll do my checklist if that was my boss’s job but I would really detail it, they want to say oh, here is this great issue, set down this thing that was missing, or I found this common bug, and this is going to impact our star ratings because as you know, 10% of our one star reviews are this, or 12% of our tickets are around this issue, if we just make this one change to [inaudible 01:04:19] and the way we log in, the way we handle errors, that would be just more efficient.

It’s very hard for machines to do that, to understand from a specific app words that the [inaudible 01:04:31] oh, maybe it’s unclear, go do a search on whatever is your app and get a bunch of feedback where great people are complaining or call it our Facebook group or if you have forums, like whatever is this. Because people will find a way to complain into this world or give you feedback to get yes. So we’ll collect those yes, organize them, come back and say hey, there are three things that are super impactful to improve the product like that’s manual work but the leverage that you have to take that kind of data and distill down to something that’s super actionable. If I was managing a team of new managers, [inaudible 01:05:01] coming back and saying damn, I’m busting up these bugs that are not bugs, they’re like this is down to my office data to prove that like 10% of our users are seeing this like that’s way more important than like oh, line 72 on coding table B failed the test for the seventh time and I told you guys go finish this.

Like that’s a value [inaudible 01:05:25] but probably not. People are using that feature that much or go find the things that they care about, go dig into those [inaudible 01:05:32] become a data geek right?

I think there is clearly room for smart people that require time to go do that, the question is how do you direct that manual entity if you want to call it that into the most value that you can drive for the team or the project.

Kristel: Yeah, seriously, I couldn’t agree with you more, and I think the more time goes by there are going to be more blurred lines between different roles in companies, but at the end of the day quality is the team’s work, and I think it’s really important for everyone to understand how your users think and what makes them happy and what doesn’t make them happy, and really have the data to prove it as well, like what to target, how we are going to prioritize the changes in the app as well.

So, we are now out of time unfortunately. I would like to chat with you forever Kevin. Thank you so much for taking the time again today.

I also wanted to mention that we got a lot of questions and we will definitely follow up with these if we didn’t get to your question. So we’ll probably write, send an email afterwards, after that we’ve been-

Kevin: Yes, email. Reach me out or something and I’ll get back to you guys and we can sort it out.

Kristel: Thanks so much.

Kevin: Thanks.

Kristel: We’ll make a follow up as well.

So thank you everyone for coming and thanks Kevin a lot for this awesome chat, and yeah.

Kevin: No problem, glad I could help. Take care. Bye.

Kristel: Bye.

AUDIENCE Q&A

Questions

#1: As a QA Manager of a product that is an internal service or an API, how do you allocate your team’s resources, given the fact that you have a limited amount of user feedback and these users are other product teams that use your service in their implementations?

KH> The way we do it in our team is to use passive signals to track usage and telemetry of the API: performance, error counts, latency, call frequency, etc. This data will help inform you what those teams are using, and your goal should be to improve these metrics with each release, i.e., making the API/service faster, more stable, and better each time.
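
As an illustration of the passive-signal approach described above, here is a minimal sketch in Python. It is not Outlook's actual tooling: the `Call` record, the choice of a nearest-rank 95th percentile, and the function names are all assumptions made for the example. It summarizes call frequency, error rate, and latency per release, then flags any metric that got worse.

```python
import math
from dataclasses import dataclass


@dataclass
class Call:
    """One recorded API call (hypothetical telemetry record)."""
    release: str
    latency_ms: float
    ok: bool


def summarize(calls, release):
    """Aggregate passive signals for one release."""
    rows = [c for c in calls if c.release == release]
    if not rows:
        raise ValueError(f"no calls recorded for release {release!r}")
    latencies = sorted(c.latency_ms for c in rows)
    # Nearest-rank 95th percentile: smallest value covering 95% of calls.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    errors = sum(1 for c in rows if not c.ok)
    return {
        "calls": len(rows),
        "error_rate": errors / len(rows),
        "p95_latency_ms": latencies[p95_index],
    }


def regressions(before, after):
    """Return the metrics that got worse between two release summaries."""
    return [m for m in ("error_rate", "p95_latency_ms") if after[m] > before[m]]
```

An empty list from `regressions(previous, current)` means the release did not get worse on either metric, which matches the goal of improving the numbers with each release.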

#2: Where does defect prevention fit into your process?

KH> We have some automated unit tests and integration tests as the first line of defense. The primary way we prevent defects is dogfooding our app. Each commit is pushed within a few minutes to a live dev system used by the entire team. Once a week the entire company gets updated with the latest code. We then roll out to production in rings, ramping up by small percentages. At each step we watch telemetry and use in-app support for feedback or to capture issues. The mode is roll forward vs. roll back: always be increasing quality, and if you find an issue, have the processes and ability to fix it fast.
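
The ring-based rollout described above can be sketched roughly as follows. This is a simplified illustration, not the team's real release system: the ring names, the production percentages, and the crash-rate threshold are all hypothetical, since the answer only mentions "rings" and a small-percentage ramp.

```python
# Hypothetical rings, ordered from closest to engineering to full production.
RINGS = [
    "team dogfood",
    "company dogfood",
    "production 1%",
    "production 10%",
    "production 100%",
]


def next_ring(current: str, crash_rate: float, max_crash_rate: float = 0.01) -> str:
    """Decide which ring a build should be in after watching telemetry.

    A healthy build advances one ring. An unhealthy build stays where it is
    until a fix is shipped forward; it is never rolled back to an earlier ring.
    """
    position = RINGS.index(current)
    if crash_rate > max_crash_rate:
        return current  # hold here and roll a fix forward
    return RINGS[min(position + 1, len(RINGS) - 1)]
```

For example, `next_ring("production 1%", crash_rate=0.05)` keeps the build at 1% of production users, while a crash rate below the threshold moves it to the next ring.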

#3: How does Microsoft value manual testing?

KH> We do very little pure manual testing in my team. Testlio provides testing for coverage of various devices and user configurations. We do heavily dogfood our app, so we get lots of feedback: since it is an email app, everyone on the team is a user.

#4: How does Microsoft handle App Store reviews from end users?

KH> We have a small number of folks on the team who reply to them. In general, though, lots of the engineering and product team review the reviews frequently, as we deeply value user feedback. At the scale of many apps in Microsoft we can’t reply to them all, but we do use tools to better categorize feedback and report on trends.
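
To make the idea of categorizing feedback and reporting on trends concrete, here is a small Python sketch. The keyword lists and category names are invented for illustration, and real tooling would be far more sophisticated; the point is only to show how a statement like "10% of our one star reviews are this" can be backed by a simple count.

```python
from collections import Counter

# Hypothetical categories and keywords, chosen only for this example.
CATEGORIES = {
    "login": ("login", "sign in", "password"),
    "sync": ("sync", "refresh"),
    "crash": ("crash", "freeze"),
}


def categorize(text: str) -> str:
    """Assign a review to the first category whose keyword it mentions."""
    lowered = text.lower()
    for category, keywords in CATEGORIES.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


def one_star_shares(reviews):
    """Return each category's share of the one-star reviews.

    `reviews` is an iterable of (stars, text) pairs.
    """
    one_star = [text for stars, text in reviews if stars == 1]
    if not one_star:
        return {}
    counts = Counter(categorize(text) for text in one_star)
    return {category: n / len(one_star) for category, n in counts.items()}
```

Sorting the result by share gives a quick ranking of which problem areas are driving the lowest ratings.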

#5: Why are random tester demographics beneficial vs. targeted ones, assuming that you know 1) your existing users’ location/language and 2) your expansion goals?

KH> You may know your goals and you can obviously lean towards your user base, but in today’s mobile world you don’t always get to pick your users. They pick you and the app store(s) allow users to get the app even if you try to geographically limit it. So a bit of randomness in your tester selection for device/validation is a good thing I feel.

#6: With relation to user feedback, CS, how do you support helpdesk multilingually?

KH> We have native language speakers for the top 15 languages or so. We use Bing translation for the less than 10% of tickets outside those top 15 languages. As we see languages grow in popularity with our apps we add additional native languages. We use Helpshift for in-app support and their tools allow us to call Bing Translate on the fly both for incoming text as well as outgoing responses when needed.

#7: Why do end-users have to play the role of testers at all? Where are those times when software development process looked like: development – testing – release and not like: development – release – testing by users – bug fixing? When the release cycle wasn’t constant bug fixing. Using the approach when a/b testing and testing by dev team using some primitive use cases are the main sources of QA – is this actually possible to create a well-qualified product that millions of people love (not just use)?

KH> Production users should never be thought of as QA. Internal dogfood users, yes, and even beta users to some extent, but only in the cases where engineering and automated testing miss something. The goal is to find issues quickly and as close to engineering as possible.

#8: How do you manage Google Play and Apple App Store delays in the release process?

KH> We just include this in our process. Since we release weekly we don’t worry about it much. In general Google is one day and Apple a few days, but we just keep releasing each week. This means that every few weeks we end up skipping the app store for iOS, as the next weekly release is already ready.

#9: Kevin mentioned pushing builds to the DEV team for manual testing. This works for a publicly used app – mail, calendar, etc. How would you do this with an institutional application that is much less common?

KH> Yes, Outlook is easier as everyone can be a user. In the past, when I ran more line-of-business apps or apps where the team wasn’t the user, we would find ways to encourage use. Pick simulated scenarios weekly for the entire team to participate in. Act like your users. Also, in these cases a strong beta community is a great way to keep feedback coming in on early releases.

#10: Question for Kevin regarding roles of the Testing team. With DevOps, the lines between dev and Tester have become very blurry. So how as a QA manager should I now structure and train my team?

KH> For us, we keep the team as one, so we don’t face that issue today. If I ran a traditional QA team today, I’d be pushing them to get closer and closer to engineering and the users. In a sense, take on the identity of a developer when you test and of a user when you think about how customers use the product. If it’s a mobile app, I’d assign QA members to spend time in the app store reading reviews and working to reproduce those issues.

#11: I have another question for Kevin: How do you keep your automation bed up to date if the main functional processes keep changing? Is there any guideline, like no more than 30% of key functional processes should change, so that I can go back to my business team and explain to them how their constant changes are negatively impacting automation and CI/CD?

KH> Today we have very little UI/UX automation, as that would expire and be out of date too quickly. Rather than try to slow down the business, you want to try to adapt to change and help the business speed up. This is key as apps and new companies move faster and faster. The goal is to move faster, not look for ways to slow down.