Name: Ashish Jha
Location: New Delhi, India
What I Do: Conversational Design Engineer
What's your background, how did you get started in voice and what are you currently working on?
I started in voice tech back in late 2017. I believe the first time I heard about Alexa was in an episode of Mr. Robot where the character talked to a device and it lit up and spoke back to her. That was quite mesmerizing for me. Later in the same year, the Alexa team had their first-ever workshop in India and that happened to be in my city. I attended that session and at the end of the day, I had this gut feeling that this is a field I have a good chance of succeeding in.
I'm currently a final-year engineering student here in India, majoring in IT, maths, and management. As a student, I tried a lot of different paths like mobile development, web development, and all the common ones we all go through. In most of them, I felt that if I put in the hard work, I might excel at it by the end of the day. But when I started building for Alexa, from the very beginning I had the feeling that I had a really good chance of excelling, and that difference is what tempted me to go forward in this field.
Currently, I'm working as an independent contractor building Alexa skills and Google Actions. I usually work with voice agencies located in the US and UK. Apart from that, I build skills of my own for hackathons and also work on freelancing platforms like Fiverr and Upwork.
Can you tell us more about your experience with voice app hackathons and what your thought process is?
I have won first place in three hackathons and been in the top three in around six - all of them were conducted in India. I’ve only participated in a few international hackathons, and I haven't had a major win in any of them yet - hopefully soon. But in India, I have won a prize in every Alexa hackathon that has been conducted.
One thing which works for me in a hackathon is that they have preset themes. So my ideas are bound by those themes and all my focus is aligned in that direction. When I'm building a skill which is not for a hackathon, then it's in an open world environment and I have a lot of things to think about, different directions I can go in. But say that there is a hackathon and the theme is productivity, then I only have to look into that direction and think of ideas of how Alexa devices can make the user's life more productive.
Once I have the theme then I start bringing in ideas from my daily life. For example, I recently built a skill called Learn Lingo for an Alexa hackathon where the theme was productivity. Learn Lingo teaches users new languages like Spanish, German, and Italian through gamified methods. Since I mostly stay at home these days, I picked up learning languages through Duolingo, and some of the ideas from that aligned in the same direction, so I thought why not?
The one thing I like most about these language learning mobile apps is that they speak back to you and you hear the pronunciation rather than just reading the word. Alexa seemed like a perfectly suited medium for that because it can speak back in a foreign accent and you can pick up a new language much easier that way.
One observation I’ve made is that in most of these hackathons, any element of gamification actually increases your skill experience and your chances of winning. So rather than just having a productivity skill, try to add some gamification elements like leaderboards, quizzes at optimal times, badges, or streaks. All those elements and some good quality audio massively contribute to how your skill stands out from competitors.
Finally, once you have built these features, a good demo is all it takes. In offline hackathons, the demo is the most important thing. Even if you have a half-baked product, if you can demo it working perfectly then the game is yours because you’re not expected to have a fully-fledged product in 24 hours. You just need an MVP and it should showcase everything that you have. In online hackathons, it's the reverse. You have to have a fully-fledged product with a good demo and everything. That's my thought process behind every hackathon.
Do you believe tactics like gamification which help you in a hackathon translate over to the success of a real-world voice app?
Yes, because the idea behind every voice app is getting the user to come back to it. If you can retain the user, that’s what’s most important in terms of building a voice app. And retention involves not giving the user the same boring and monotone experience every time they come back to your app. You have to add elements like session attributes so that you can make the greeting different each time instead of just saying “welcome back.” You could also say something like “good morning” by taking the time of day into account. It’s the little things that really enhance the user experience. If you are building an educational skill or anything which involves learning - adding quizzes or giving the user some badges for their streak enhances the experience. I’ve found that these methods help a lot in making sure that the users return to my apps.
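As a rough illustration of the greeting idea described above, here is a minimal sketch in plain Python. It does not use the Alexa SDK: the `attributes` dictionary simply stands in for whatever session or persistent attributes a skill stores, and the function names, the hour boundaries, the greeting lines, and the three-day badge threshold are all assumptions made for the example.

```python
from datetime import date, datetime, timedelta

# Hypothetical lines to rotate through so returning users hear some variety.
RETURNING_LINES = [
    "Great to have you back.",
    "Ready for another round?",
    "Let's pick up where you left off.",
]


def time_of_day_greeting(hour: int) -> str:
    """Pick an opener from the hour (0-23); the boundaries are assumptions."""
    if 5 <= hour < 12:
        return "Good morning"
    if 12 <= hour < 17:
        return "Good afternoon"
    return "Good evening"


def update_streak(attributes: dict, today: date) -> int:
    """Update and return the daily streak stored in the attributes dict."""
    last_visit = attributes.get("last_visit")  # ISO date string or None
    streak = attributes.get("streak", 0)
    if last_visit == today.isoformat():
        streak = max(streak, 1)  # already counted today
    elif last_visit == (today - timedelta(days=1)).isoformat():
        streak += 1  # came back on consecutive days
    else:
        streak = 1  # first visit, or the streak was broken
    attributes["last_visit"] = today.isoformat()
    attributes["streak"] = streak
    return streak


def build_greeting(attributes: dict, now: datetime) -> str:
    """Build a greeting that varies with visit count, time of day, and streak."""
    visits = attributes.get("visits", 0)
    streak = update_streak(attributes, now.date())
    attributes["visits"] = visits + 1
    opener = time_of_day_greeting(now.hour)
    if visits == 0:
        return f"{opener}! Welcome to the skill."
    line = RETURNING_LINES[visits % len(RETURNING_LINES)]
    badge = f" You're on a {streak}-day streak!" if streak >= 3 else ""
    return f"{opener}! {line}{badge}"
```

In a real skill, the dictionary would be loaded at launch and saved again before the response is sent, so the visit count and streak survive between sessions.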
When developing voice apps, what's the biggest challenge that you've had to overcome?
My biggest challenge while developing voice apps is working under people who don't understand how a voice app works. For example, if I work with technical managers who come from a mobile or web app background, the hardest thing is to convince them that what works there doesn't work here. I usually have to go through how the design differs, how to rethink all the interactions from a voice-first perspective, and how Alexa cannot capture the entire user input or maintain context and state by itself. All of these technical nitty-gritties are the biggest challenges right now. When I first started, the biggest challenge for me was definitely getting the resources to learn. I started almost two and a half years ago, and at that time it was a very niche community so I only had the docs to refer to.
In 2020 what would you say is your most useful resource?
I mostly refer to the Alexa Devs Twitter and Twitch channel, and also to some podcasts (links at the bottom). I’ve been going back to concepts like design guidelines and monetization strategies, which are something I need to work on because of the skill limitations here in India. In-skill purchases don't work here and we don't get developer rewards for skills built outside of India. So, monetization is not something which people here focus on yet. Instead, they just build skills and let them go out into the wild.
Initially, I started with the Dabble Lab tutorials and, to be honest, I feel that's the one-stop shop for every voice dev right now because they cover everything. The way Steve explains concepts is superb. You can easily understand him and whatever he teaches, and I definitely have a lot of respect for all the work Steve has done.
Apart from that - meetup events. If you are an experienced voice dev, it's really fun to just engage with people from all sorts of domains. In the meetups that I conduct, or used to conduct before the pandemic, in Delhi, almost 80% of the crowd that showed up were beginners who had no previous experience with voice. So hearing their thought process behind every design jam that we had was quite useful. It was like an idea bucket for me as well because these people from all sorts of domains turn up and they bring different elements to the table. So yeah, as an experienced developer, these events can be a huge idea market for you, and if you are just learning, you get to meet a lot of people who already have their foot in this domain. It's a win-win for everyone.
For someone who is just getting started, what's the number one piece of advice that you would give to them?
I would say don't be shy or hold back in getting started in Alexa development. With most of the students I meet, the first question they ask me, before even getting to know what Alexa or Google Assistant is, is: what is the scope of this thing? Can I get a job doing it? Can I make a career out of it? These are questions they start asking me straight away before sitting through the session or webinar or whatever I'm involved in.
The industry is still very new and even the most experienced person in this industry will have maybe five years of experience at most. Compared to mobile or web, where you have people whose experience is longer than I've been alive, it's pretty good to be in a field that’s still this new. We are still in the days of early adoption and if you devote time to it and are always open to learning, you will definitely achieve a lot here. Since it's a new industry, all you have to do is work hard, have faith in your dreams and goals, and get involved with the community. It's a very small community right now and everyone is so welcoming here.
There are lots of opportunities to learn and getting on any of these platforms is super easy compared to trying to learn to be a full-stack engineer, so the barrier to entry is also pretty low. If you can't code, there are tools like Voiceflow which can help you. So that's my only advice to voice devs, don't hold back because of concerns like “is there a career opportunity here?” Look at it as a hobby initially and if you do enjoy building for voice, you can always continue with it.
What would you say is the most important part of building a voice app to get right?
I think in the complete process what matters most is how we handle out-of-context queries. In my experience building voice apps, the user can say anything, and they will say anything inside the app at any point in time. They will say something like a random “c'mon” and if your skill breaks down at that time, that's the worst experience you can give, which can definitely lead to one-star reviews. I’ve looked inside some of my skills and users are trying to say random words like “banana”, “chicken”, and anything you can think of.
So, we all talk about designing a good experience or building a complicated skill with multiple features, but what ties it together and makes it complete is how we handle those out-of-context questions and how we maintain states. For example, if the user says “help” in one part of the skill, they should get a different response than in another part. Managing states is something that I believe is the most important in any skill for every interaction. For every response that Alexa gives, you should try to have a specific fallback message, a specific reprompt message, a specific help message or something like that. You have to keep the conversation open at all times, control the flow, and use states to the maximum.
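One way to picture the per-state help, fallback, and reprompt messages described above is a small lookup table keyed by state. The sketch below is plain Python rather than Alexa SDK code. `AMAZON.HelpIntent` and `AMAZON.FallbackIntent` are Alexa's built-in intent names, while the `MENU` and `QUIZ` states, `StartQuizIntent`, the prompt wording, and the `respond` function are all hypothetical and exist only for illustration.

```python
# Hypothetical prompts: every state gets its own help, fallback, and reprompt.
RESPONSES = {
    "MENU": {
        "help": "You can say start a quiz, or ask for your score.",
        "fallback": "Sorry, I didn't catch that. Try saying start a quiz.",
        "reprompt": "What would you like to do?",
    },
    "QUIZ": {
        "help": "Answer the question, or say skip to move on.",
        "fallback": "That doesn't sound like an answer. You can also say skip.",
        "reprompt": "What's your answer?",
    },
}
DEFAULT_STATE = "MENU"


def respond(session: dict, intent: str) -> dict:
    """Return a response that depends on both the intent and the current state."""
    state = session.get("state", DEFAULT_STATE)
    prompts = RESPONSES.get(state, RESPONSES[DEFAULT_STATE])

    if intent == "AMAZON.HelpIntent":
        speech = prompts["help"]
    elif intent == "StartQuizIntent" and state == "MENU":
        # Move to the quiz state so later help and fallback messages match it.
        session["state"] = "QUIZ"
        prompts = RESPONSES["QUIZ"]
        speech = "First question: how do you say hello in Spanish?"
    else:
        # Anything unexpected ("banana", "c'mon", ...) gets the state's fallback
        # instead of breaking the skill.
        speech = prompts["fallback"]

    # Always include a reprompt and keep the session open.
    return {"speech": speech, "reprompt": prompts["reprompt"], "end_session": False}
```

With this shape, asking for help in the menu and asking for help mid-quiz produce different answers, and an out-of-context utterance never ends the conversation.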
What do you spend the majority of your time on when building a voice app?
The most time I think is spent on defining the architecture because that's the phase where you also partly do the scripting side, the dialogue side, and also check for potential issues. Like this might not work, this doesn't work on this platform because of whatever reason. In building any voice app, I usually work with some voice designers and they initially give me basic requirements and an architecture and then I have to figure out what doesn't work on Alexa.
Once we have that architecture finalized, we just need to write the scripts and build it in code, and building it on the code side is definitely easier because I have approved that architecture and I have built some skills like that, so I know it is achievable. The only important aspect is finalizing the architecture, where we have to work with the voice designers and make sure we are all on the same page before getting started on the build side. It really comes back to bite you if you design a V1 and then directly code it all out before getting the architecture reviewed. Then you send it to the user for feedback, and if the user is like, no, no, there are too many questions, then you have to redo it and it’s back to square one. So, it's better to get all the expectations right: the client, designer, and developers all need to be on the same page from day one.
What are your goals for 2020?
My number one goal is definitely to win an international hackathon. I hope the Alexa team conducts one soon and I build a super cool skill for that. Apart from that, I would like to build more personal Alexa skills, mostly games, and I would also like to devote more time to building for Bixby because currently I spend almost 90% of my time building for Alexa and the rest of it building for Google Assistant. I did start learning Bixby Capsule Development but I haven’t quite caught up with it. Learning it will be another of my goals this year. And finally, I would like to make a skill that earns me some developer rewards because I have tried and achieved a lot of things here, but developer rewards are something that has always eluded me.
You can try my latest Alexa Game here - https://www.amazon.com/dp/B089GVJQYS
Connect with Ashish:
Voice Apps Mentioned:
List of resources mentioned:
Amazon Alexa Twitch Channel: https://www.twitch.tv/amazonalexa?linkId=90519427
Dabble Lab YouTube videos: https://www.youtube.com/channel/UCfY-LopSxGekh9LruXLjffg
VUX World Podcast: https://vux.world/podcast/
The Artificial Podcast: https://www.stitcher.com/podcast/the-artificial-podcast-2