To understand diverse natural language commands, today's virtual assistants are trained on large numbers of labor-intensive, manually annotated sentences. Companies like Amazon and Google use proprietary technology to train their linguistic interfaces from the skills and sample data supplied by developers. These interfaces are unavailable outside their platforms, and reproducing them requires an investment that is prohibitive for all but the largest companies. Meanwhile, even these proprietary linguistic interfaces can only accurately understand simple, popular commands.
We envision a future where cost-effective assistant technology is open and freely available, so that every company can create and own its voice interface. Towards this goal, we have developed Genie, a new methodology and toolkit that makes it easy and cheap to create conversational natural language interfaces.
Instead of relying on manually annotated training data, which is not only expensive to acquire but also often poorly annotated due to human error, Genie takes a different path: it automatically synthesizes a large, high-quality dataset from the skill's manifest.
This is made possible by our executable representation language, ThingTalk, and a massive set of domain-independent templates that capture the grammar of natural language. Genie then trains a contextual semantic parser on the synthesized dataset, and the resulting model can be deployed with a Genie server to provide a voice or web user interface for the skill.
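For concreteness, the sketch below lays out, as a plain Python structure, the kind of information a skill manifest provides: the skill's queries, their parameters and types, and natural-language annotations for each parameter. The field names and layout are illustrative assumptions; a real Genie skill manifest is written in ThingTalk.

```python
# Illustrative sketch only: a real Genie skill manifest is written in ThingTalk;
# the field names and layout below are assumptions for exposition.
restaurant_skill = {
    "name": "com.example.restaurant",   # hypothetical skill identifier
    "queries": {
        "restaurant": {
            "parameters": {
                "id":      {"type": "Entity"},
                "cuisine": {"type": "String",
                            # natural-language annotations, grouped by part of speech
                            "annotations": {"verb": ["serves {value} food"],
                                            "adjective": ["{value}"]}},
                "rating":  {"type": "Number"},
            },
        },
    },
}

# Genie combines this information with its domain-independent templates to
# synthesize (utterance, ThingTalk code) training pairs automatically.
```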
Genie generates high-quality synthetic data with a large, comprehensive set of templates that capture the grammar of natural language.
For example, the templates make use of Genie annotations for each parameter and construct utterances according to its part of speech (POS). The cuisine parameter in the restaurant domain can have an active-verb annotation, "serves <value> food", from which Genie synthesizes utterances such as "Search for restaurants that serve Chinese food" and "What restaurant serves Italian food?". The cuisine parameter can also have an adjective annotation, with which Genie synthesizes "Show me Chinese restaurants". A detailed reference to Genie templates is coming soon.
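As a concrete illustration, the sketch below mimics in plain Python how such part-of-speech annotations can be expanded into full utterances paired with a formal target. The dictionary layout, the placeholder syntax, and the simplified target string are assumptions for exposition; the real annotations live in the skill's ThingTalk manifest, and the real templates are part of the Genie toolkit.

```python
# Illustration only: not Genie's real annotation format or template language.
cuisine_annotations = {
    "verb": ["serves {value} food"],   # active-verb phrasing
    "adjective": ["{value}"],          # adjective phrasing
}

def expand_templates(annotations, value):
    """Combine parameter annotations with a few generic sentence patterns."""
    # Every synthesized utterance is paired with a formal target; this string is
    # a simplified placeholder, not valid ThingTalk.
    code = f"restaurants filtered by cuisine == {value!r}"
    examples = []
    for phrase in annotations["verb"]:
        examples.append((f"what restaurant {phrase.format(value=value)}?", code))
    for phrase in annotations["adjective"]:
        examples.append((f"show me {phrase.format(value=value)} restaurants", code))
    # The real templates cover far more sentence patterns and also handle
    # morphology such as verb agreement, which this sketch ignores.
    return examples

for utterance, code in expand_templates(cuisine_annotations, "chinese"):
    print(utterance, "->", code)
```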
Genie also provides an auto-annotator that generates Genie annotations automatically, by extracting from large pretrained language models the different ways a command can be phrased. To further improve the variety and naturalness of the training examples, Genie uses a neural paraphraser to automatically generate paraphrases of the synthetic sentences.
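As an illustration of the paraphrasing step, the sketch below generates a few candidate paraphrases of one synthetic sentence with a generic sequence-to-sequence paraphrase model through HuggingFace Transformers. This is not Genie's actual paraphraser or model; the checkpoint name is a placeholder to be replaced with a real paraphrase model.

```python
# Minimal paraphrasing sketch (assumption: not Genie's actual paraphraser).
# "some-org/paraphrase-model" is a placeholder, not a real checkpoint name.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "some-org/paraphrase-model"  # substitute a real seq2seq paraphrase checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

sentence = "search for restaurants that serve chinese food"
inputs = tokenizer(sentence, return_tensors="pt")
outputs = model.generate(
    **inputs,
    num_beams=8,
    num_return_sequences=4,   # several candidate paraphrases per synthetic sentence
    max_new_tokens=32,
)
for ids in outputs:
    print(tokenizer.decode(ids, skip_special_tokens=True))
```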
All examples synthesized by Genie are automatically annotated with ThingTalk, a precise, extensible, executable representation for dialogues. It captures multiple turns of user input, the results returned for the user's request, and the agent's responses. We show that it can capture 98% of the turns in MultiWOZ 3.0.
You can find a more detailed introduction to ThingTalk here.
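To give a feel for what ThingTalk-annotated dialogue data carries, the sketch below lays out a single turn as a plain Python structure. The field names are assumptions for exposition, and the "<ThingTalk: ...>" strings are schematic placeholders rather than valid ThingTalk; see the ThingTalk introduction linked above for the real representation.

```python
# Schematic only: field names are assumptions, and the "<ThingTalk: ...>" strings
# are placeholders, not valid ThingTalk syntax.
turn = {
    # formal representation of the dialogue so far: the previously executed
    # user state and the results the skill returned for it
    "context": "<ThingTalk: restaurant search with cuisine = chinese; 3 results>",
    "agent_utterance": "I found 3 Chinese restaurants nearby.",
    # the new user input to be parsed in this turn
    "user_utterance": "which one has the best rating?",
    # the parser's target output: the formal user state for this turn
    "user_state": "<ThingTalk: sort the previous results by rating, take the top 1>",
}
```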
Genie trains a contextual semantic parsing (CSP) model on the data it generates. The neural model is fine-tuned from the pretrained BART model: it encodes a concatenation of the formal dialogue state and the user utterance, and generates the user state as its output.
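The sketch below shows the general idea of framing this as sequence-to-sequence fine-tuning of BART with HuggingFace Transformers. It is not Genie's actual training code or input format; the context and target strings are schematic placeholders, and real training would batch many examples and run an optimizer over many steps.

```python
# Minimal seq2seq sketch of a contextual semantic parser (not Genie's actual code).
from transformers import BartForConditionalGeneration, BartTokenizerFast

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# Input: formal dialogue state concatenated with the new user utterance.
# The strings below are schematic placeholders, not real ThingTalk.
context = "<formal dialogue state: restaurant search, 3 results>"
utterance = "which one has the best rating?"
source = context + " ; " + utterance
target = "<user state: sort results by rating, take the top 1>"

# One supervised training step on the synthesized pair (no optimizer shown).
batch = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids
loss = model(**batch, labels=labels).loss
loss.backward()

# At inference time, the model generates the user state from the same input format.
generated = model.generate(**batch, num_beams=4, max_new_tokens=64)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```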
Once the model is trained, it can be deployed with a Genie server, which provides the backend runtime for the skills and a voice/web interface for users. Check the deployment step of our step-by-step Genie guide for more details.
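For a rough idea of how a client might talk to a deployed server, the sketch below posts a user utterance over HTTP with the Python requests library. The port, endpoint path, and response fields are hypothetical placeholders, not the actual Genie server API; consult the deployment guide for the real interface.

```python
# Hypothetical client sketch: the port, endpoint path, and response fields are
# placeholders, not the actual Genie server API.
import requests

SERVER = "http://localhost:3000"            # wherever the Genie server is deployed
utterance = "show me chinese restaurants"

resp = requests.post(f"{SERVER}/parse", json={"q": utterance})  # hypothetical endpoint
resp.raise_for_status()
print(resp.json())  # e.g. the parsed formal command and the agent's reply
```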
Genie assistant is an open-source, privacy-preserving virtual assistant built with the Genie toolkit.
Genie is available at https://genie.stanford.edu. Check Try Genie for more details. Alternatively, you can set up Genie locally on your own device. Check this installation guide for more details.
Genie assistant supports some of the most popular virtual assistant commands, covering domains such as weather, Spotify, Yelp, and IoT devices. Check the Genie cheatsheet for more commands supported by Genie.