We have an active research program on virtual assistant technology. The following outlines the research problems that we are actively pursuing. Published work can be found under Publications.
Accuracy. Our goal is to create a dialogue tool set that can automatically bootstrap a semantic parser for a given domain at an accuracy of about 80%. After that, we expect the domain expert to refine the parser using our development tool. We are pursuing three directions: (a) generating domain annotations automatically using pretrained networks, (b) better named-entity recognition, and (c) a more comprehensive library of generic templates.
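The bootstrapping idea can be illustrated with a toy sketch: generic templates paired with logical-form skeletons are expanded against developer-supplied domain annotations to synthesize training pairs for the parser. The template syntax, domain values, and logical forms below are all hypothetical simplifications, not the actual template library.

```python
import itertools

# Hypothetical generic templates pairing an utterance pattern with a
# logical-form skeleton; a real library is far richer than this.
TEMPLATES = [
    ("find a {noun} with {prop} {value}",
     'filter({noun}, {prop} == "{value}")'),
    ("show me {noun}s", "list({noun})"),
]

# Hypothetical domain annotations a developer would supply for one domain.
DOMAIN = {
    "noun": ["restaurant"],
    "prop": ["cuisine"],
    "value": ["italian", "thai"],
}

def synthesize(templates, domain):
    """Expand every template against every combination of domain values,
    yielding (utterance, logical form) training pairs."""
    samples = []
    for utt_tpl, lf_tpl in templates:
        slots = [s for s in domain if "{" + s + "}" in utt_tpl]
        for combo in itertools.product(*(domain[s] for s in slots)):
            binding = dict(zip(slots, combo))
            samples.append((utt_tpl.format(**binding),
                            lf_tpl.format(**binding)))
    return samples

pairs = synthesize(TEMPLATES, DOMAIN)
```

Even this toy expansion shows why synthesized data scales: a handful of templates and annotations multiply into many parallel training examples.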
Extending our conversational model. We have created, and are actively refining, a conversational model for queries and transactions based on structured data, such as booking a restaurant or playing a song. We plan to extend it to handle other use cases, such as answering FAQs and holding on-topic discussions.
Scalability. We have created conversational agents for 6 of the Schema.org domains: restaurants, hotels, people, music, books, and movies. How do we scale to all of them? And how do we handle Wikidata in a scalable manner? We are exploring how to teach neural networks the concept of type systems.
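One reason type systems help with scaling can be shown with a toy fragment of the Schema.org hierarchy: a skill or template written once against a supertype automatically applies to every subtype. The hierarchy fragment and helper below are an illustrative simplification, not the project's actual representation.

```python
# A toy, simplified fragment of the Schema.org type hierarchy (assumed).
PARENT = {
    "Restaurant": "LocalBusiness",
    "Hotel": "LocalBusiness",
    "LocalBusiness": "Organization",
    "Organization": "Thing",
    "Movie": "CreativeWork",
    "Book": "CreativeWork",
    "CreativeWork": "Thing",
}

def is_subtype(t, ancestor):
    """Walk up the hierarchy to test whether type `t` derives from `ancestor`."""
    while t is not None:
        if t == ancestor:
            return True
        t = PARENT.get(t)
    return False

# A template written once for LocalBusiness covers every domain below it.
applicable = [t for t in PARENT if is_subtype(t, "LocalBusiness")]
```

A network that has internalized this kind of subtyping could, in principle, transfer what it learned on one domain to sibling domains without per-domain retraining.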
Error recovery. Today's assistants tend to answer "I don't know how" whenever they fail to handle a command. Instead, an assistant should use its conversational capability to resolve mistakes. We are exploring how to use confidence scores on different components of the parsed sentence to seed a conversation that resolves the error. Every resolved error provides an important training sample. The system must also learn from user corrections.
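The idea of seeding a clarification from component-level confidence can be sketched as follows. The parse structure, slot names, and threshold here are hypothetical; the point is only that the least-confident component determines what the assistant asks about.

```python
# Hypothetical parse: each component carries the parser's confidence score.
parse = {
    "intent": ("book_restaurant", 0.95),
    "cuisine": ("italian", 0.91),
    "time": ("7pm", 0.42),
}

CONFIDENCE_THRESHOLD = 0.6  # assumed cutoff for triggering a clarification

def clarification_question(parse, threshold=CONFIDENCE_THRESHOLD):
    """Pick the least-confident component; if it falls below the
    threshold, ask the user to confirm it."""
    slot, (value, score) = min(parse.items(), key=lambda kv: kv[1][1])
    if score >= threshold:
        return None  # confident enough; no clarification needed
    return f'Did you mean "{value}" for the {slot}?'

question = clarification_question(parse)
```

If the user confirms or corrects the value, the repaired (utterance, parse) pair becomes exactly the kind of training sample the paragraph above describes.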
Localization. People want to talk to their assistants in their native tongues. How do we leverage machine translation to avoid redoing all the work done for English? We have obtained promising results by training neural networks on translated versions of synthesized data, with entities substituted in the target language. We are initiating an open, worldwide collaboration to develop this technology and make it available in all languages. Currently, we are working with teams at Inria in France and Yonsei University in Korea.
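A minimal sketch of the entity-substitution step: mask entity mentions with placeholders so machine translation cannot mangle them, translate the masked sentence, then substitute target-language entities back in. The translation here is mocked with a hard-coded French string; a real pipeline would call an MT system.

```python
def mask_entities(utterance, entities):
    """Replace entity mentions with numbered placeholders before translation."""
    masked = utterance
    slots = {}
    for i, ent in enumerate(entities):
        token = f"E{i}"
        masked = masked.replace(ent, token)
        slots[token] = ent
    return masked, slots

def unmask(translated, slots, target_entities):
    """Substitute target-language entity values back into the translation."""
    for token, ent in slots.items():
        translated = translated.replace(token, target_entities.get(ent, ent))
    return translated

# Hypothetical round trip for one synthesized sentence.
masked, slots = mask_entities("play songs by Taylor Swift", ["Taylor Swift"])
translated = "joue des chansons de E0"  # pretend MT output for the masked sentence
localized = unmask(translated, slots, {"Taylor Swift": "Taylor Swift"})
```

Because synthesized data carries its entity annotations, the same masking can be applied at scale to produce target-language training sets without re-annotating anything by hand.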
Development environment. Just as there are over 20M web developers today, there will be over 20M voice web developers. We are investing in development tools that facilitate the whole process, from classifying errors to updating domain annotations, generic sentence templates, and conversational models.
In the future, assistants will be able to hold conversations across many topics. This research builds on the Chirpy Cardinal project, which won second prize in the 2020 Amazon Alexa Socialbot Competition.
Longtail Skill Creation. Virtual assistants today can handle only head commands, not the long tail. Our goal is to let users automate anything they do on the web. We are creating a multimodal system that lets users create skills by demonstration in their browser, using voice together with their keyboard and mouse.
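One way to think about skills by demonstration: a recording of the user's browser actions, annotated with spoken parameter names, becomes a parameterized script that can be replayed with new values. The action schema, field names, and URL below are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical recording of a demonstration: each step captures a browser
# action, with an optional voice annotation naming a parameter.
recording = [
    {"action": "goto", "target": "https://example.com/search"},
    {"action": "type", "target": "#query", "value": "sushi", "param": "dish"},
    {"action": "click", "target": "#submit"},
]

def instantiate(recording, **params):
    """Replay the demonstration with new parameter values substituted in,
    leaving the original recording untouched."""
    steps = []
    for step in recording:
        step = dict(step)  # copy so the recording is not mutated
        if "param" in step and step["param"] in params:
            step["value"] = params[step["param"]]
        steps.append(step)
    return steps

skill = instantiate(recording, dish="ramen")
```

The voice channel is what turns a one-off macro into a reusable skill: saying "search for the dish" while typing marks that field as a parameter the assistant can later fill from a spoken command.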
Customer-support agents. Customer-support calls are, in a sense, sophisticated transactions, and an important subset of them can be modeled as filling out a form on the company's website. The Genie toolkit currently translates API signatures into a dialogue agent. We plan to generalize API signatures to web forms and automatically generate a multimodal agent that facilitates such customer-support calls.
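The form-to-agent idea can be sketched as slot filling driven by a form schema: each form field becomes a slot, and the agent asks for the first required field the user has not yet provided. The schema and prompts below are hypothetical, not Genie's actual representation.

```python
# Hypothetical form schema extracted from a customer-support web page.
FORM = [
    {"name": "order_id", "prompt": "What is your order number?", "required": True},
    {"name": "issue", "prompt": "How can we help you today?", "required": True},
    {"name": "phone", "prompt": "What number can we reach you at?", "required": False},
]

def next_prompt(form, filled):
    """Return the question for the first required field not yet filled,
    or None when the form is ready to submit."""
    for field in form:
        if field["required"] and field["name"] not in filled:
            return field["prompt"]
    return None

first = next_prompt(FORM, {})
second = next_prompt(FORM, {"order_id": "12345"})
```

Driving the dialogue from the form schema rather than a hand-written script is what would let such agents be generated automatically, one per support form, in the same way Genie generates agents from API signatures.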