This documentation is outdated, update coming soon.
Almond runs commands in ThingTalk, but it speaks English. In the following, we introduce how to get the best natural language support from us, so that Almond understands users' commands related to your device and presents the results to the users properly.
To understand natural language, Almond uses Genie, a tool that generates hundreds of thousands of possible commands based on a general grammar of English, plus a few annotations and domain-specific templates for each Thingpedia device.
Each developer provides the domain-specific information for their device as canonical forms associated with each API element in Thingpedia, as well as primitive templates, which annotate entire snippets of ThingTalk with a short phrase. Almond's natural language understanding relies heavily on the quality and quantity of these canonical forms and primitive templates.
Canonical forms are short phrases defining how to compose commands that use a specific API element in Thingpedia. Canonical forms are attached to a class, function, or parameter using the `#_[canonical]` natural language annotation. For example, here is the natural language used to talk about songs in Spotify:
```tt
class @com.spotify
#_[canonical="spotify"]
{
  list query song(out id : Entity(com.spotify:song)
                  #_[canonical={
                    default="base",
                    base=["name", "title", "track name"],
                    passive_verb=["named #", "called #", "titled #"],
                    property=["name #", "title #"]
                  }],
                  out artists : Array(Entity(com.spotify:artist))
                  #_[canonical={
                    default="preposition",
                    base=["artist", "author", "song writer", "band", "artist name",
                          "singer", "composer"],
                    property=["artist #", "song writer #"],
                    preposition=["by #", "from #"],
                    adjective=["#"],
                    passive_verb=["written by #", "released by #", "produced by #",
                                  "composed by #", "recorded by #"],
                    verb=["# wrote", "# released", "# produced", "# composed",
                          "# did", "# recorded", "# sang", "# made"]
                  }],
                  out release_date : Date
                  #_[canonical={
                    default="passive_verb",
                    base=["date", "release date", "release year", "release day"],
                    property=["release date #"],
                    adjective=["#"],
                    preposition=["from #", "in #"],
                    passive_verb=["released in #", "released #", "published in #"],
                    adjective_argmin=["least recent", "oldest", "first"],
                    adjective_argmax=["most recent", "newest", "latest"],
                    passive_verb_argmax=["released most recently"]
                  }],
                  out popularity : Number
                  #_[canonical={
                    default="base",
                    base=["popularity", "average popularity"],
                    passive_verb=["rated # popularity"],
                    property=["# popularity"],
                    adjective_argmin=["least popular", "most niche"],
                    adjective_argmax=["most popular", "most well-known", "best", "greatest", "top"]
                  }])
  #_[canonical=["song", "music", "track"]];

  action play_song(in req song : Entity(com.spotify:song)
                   #_[canonical={
                     default="base",
                     base=["name"],
                     preposition=["named #", "called #", "titled #"],
                     property=["name #", "title #"]
                   }])
  #_[canonical="play a song"];
}
```
From this example, Genie is able to generate commands of the form:
The canonical form of each function defines how that function is referred to in natural language. For a query, it defines what the query returns, as a noun phrase. For an action, it defines what the action does, as a verb phrase. The canonical form of a function is a string, or an array of strings.
The canonical form of each parameter defines how the parameter is specified to the action, or how it can be used in natural language to filter the result of the query. The canonical form of a parameter is an object, whose keys are the following:
- `base`: the base form of the parameter, used in explicit filters such as "with author equal to Taylor Swift" and in explicit projections such as "what is the artist of Blank Space?"
- part-of-speech forms such as `property`, `passive_verb`, `preposition`, `adjective`, and `verb` (as in the example above), in which the `#` character replaces the value
- forms suffixed with `_argmin` or `_argmax`, which are used to form a ranking question
- forms suffixed with `_enum`, which are used to provide domain-specific forms for enumerated values
- `default`: the grammatically preferred form to use, which is then given slight priority

All canonical forms should be lowercase. They should use no punctuation. They cannot use any placeholder, other than the `#` placeholder for a filter canonical form.
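As a compact illustration, the annotation below (for a hypothetical `rating` output parameter, not part of any real device) combines a base form, `#` placeholder forms, an `_argmax` suffix, and a `default` key:

```tt
out rating : Number
#_[canonical={
  default="passive_verb",
  base=["rating", "average rating"],
  passive_verb=["rated #"],
  adjective_argmax=["best", "highest rated"]
}]
```

With such an annotation, Genie can understand filters like "rated 4 stars" as well as ranking questions like "the best one".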
On occasion, especially with actions, the general grammar templates are not sufficient to capture the variety of real-world commands. In that case, one can specify example commands in the form of primitive templates. Primitive templates are composable snippets of natural language and ThingTalk code that allow Genie to construct complex commands (with multiple joins and filters).
Primitive templates are specified in a `dataset` file. Here is a subset of the dataset used by Spotify:
```tt
dataset @com.spotify language "en" {
  action (p_song : Entity(com.spotify:song)) := @com.spotify.play_song(song=p_song)
  #_[utterances=["play ${p_song}",
                 "i would like to hear ${p_song}",
                 "i would like to listen to ${p_song}"]]
  #[id=27924741]
  #[name=""];

  program (p_artist : Entity(com.spotify:artist)) :=
    now => @com.spotify.song(), contains(artists, p_artist) => @com.spotify.play_song(song=id)
  #_[utterances=["play ${p_artist:no-undefined}",
                 "play some ${p_artist:no-undefined}",
                 "i would like to listen to ${p_artist:no-undefined}"]]
  #[id=27924875]
  #[name=""];
}
```
With these primitive templates, combined with the previous class definition, Genie is now able to generate commands of the form:
In particular, primitive templates for actions allow Genie to compose an action with a complex query ("play" + "the most popular song released this month"). This composition is based on the common type `Entity(com.spotify:song)`. Primitive templates also make it possible to express common constructs that refer to queries implicitly, such as "Play Coldplay" to mean "Play songs by Coldplay".
When creating a new device, a `dataset` file containing the example commands is required. These example commands provide the training data for your device.
There are four types of primitive templates, denoted with the keywords `action`, `query`, `stream`, and `program`. A `program` primitive template contains a stream, a query (optional), and an action. It cannot be composed further. Other primitive templates contain only one of those elements, and are composed with other program parts to form the full ThingTalk command.
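The sketch below shows one primitive template of each type, using a hypothetical `@com.example` device (all function names here are for illustration only):

```tt
dataset @com.example language "en" {
  // stream: a when phrase, composed with queries and actions
  stream := monitor (@com.example.item())
  #_[utterances=["when there is a new item"]];

  // query: a noun phrase; a generic verb is added automatically
  query := @com.example.item()
  #_[utterances=["an item"]];

  // action: a verb phrase in the imperative form
  action (p_item : Entity(com.example:item)) := @com.example.open(item=p_item)
  #_[utterances=["open ${p_item}"]];

  // program: a complete command; it cannot be composed further
  program := now => @com.example.item() => notify
  #_[utterances=["show me my items"]];
}
```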
Primitive templates can also have parameters, which will be replaced with concrete values in the generated program. For example, given the following templates:
```tt
action (p_song : Entity(com.spotify:song)) := @com.spotify.play_song(song=p_song);
```
we automatically generate the programs:
```tt
now => @com.spotify.play_song(song="..."^^com.spotify:song("Titanium"));
now => @com.spotify.play_song(song="..."^^com.spotify:song("She Wolf"));
now => @com.spotify.play_song(song="..."^^com.spotify:song("Chandelier"));
...
```
By convention, parameters in primitive templates begin with `p_`, to distinguish them from function parameters.
The parameters of a primitive template can be used anywhere in the body, not just as input parameters. For example, they can be used as filters:
```tt
query (p_artist : Entity(com.spotify:artist)) := @com.spotify.song(), contains(artists, p_artist);
```
A primitive template cannot be executed by Almond right away, but it can be composed with other primitive templates and builtin functions (e.g., `now`, `notify`, `timer`) to form a full program. For example, if we have the following two primitive templates:
```tt
query := @com.thecatapi.get();
action := @com.slack.send();
```
We can generate a list of full programs including:
```tt
now => @com.thecatapi.get() => notify;
now => @com.slack.send();
now => @com.thecatapi.get() => @com.slack.send();
attimer(time=...) => @com.thecatapi.get() => notify;
attimer(time=...) => @com.slack.send();
attimer(time=...) => @com.thecatapi.get() => @com.slack.send();
```
Thus, given composable primitive templates, Almond can generate a large number of full programs for training, and thus achieve better accuracy.
An `utterances` annotation is used to provide different ways of expressing a command. It takes a list of strings. In each utterance, concrete values for parameters are replaced by placeholders, written `$param` or `${param}`, where param is the name of a declared parameter of the primitive template. The braces are needed if the parameter is immediately followed by a letter, a number, or an underscore.

You also need the braces if you want to pass an option. Available options are `const` (with the syntax `${param:const}`) and `no-undefined`. `const` means that the placeholder must not be a parameter passed from a previous function; `no-undefined` means that the placeholder cannot be replaced with a generic word such as "something" or "a certain value". The syntax is as follows:
```tt
query (p_count : Number) := @com.thecatapi.get(count=p_count)
#_[utterances=["${p_count:const} cat pictures", "$p_count cats"]];
```
The utterances will be used to generate the synthetic sentences for the full programs composed by the primitive templates. For example, if we have the following two primitive templates with the corresponding utterances:
```tt
query := @com.thecatapi.get() #_[utterances=["a cat picture"]];
action := @com.slack.send() #_[utterances=["send a message to Slack"]];
```
Then when we compose the full program `now => @com.thecatapi.get() => @com.slack.send();`, we will generate synthetic sentences such as: "get/search/show me/find a cat picture, then send a message to slack".
Placeholders in the primitive templates will be replaced with:
For example, given the following primitive templates:
```tt
query := @com.thecatapi.get() #_[utterances=["a cat picture"]];
action (p_picture_url : Entity(tt:picture)) := @com.slack.send_picture(picture_url=p_picture_url)
#_[utterances=["send ${p_picture_url} to Slack"]];
```
The following sentences are generated:
Hint: To decide whether or not to use `const`, try replacing the placeholder with "the" followed by a noun, for example "the message" or "the picture". If the sentence does not flow grammatically, or it sounds awkward, then it is appropriate to use `const`. If you do so, you should also think of a different utterance that allows such parameter passing. For example, this is the correct way to annotate a primitive template that selects the channel for Slack:
```tt
action (p_channel : Entity(tt:hashtag)) := @com.slack.send(channel=p_channel)
#_[utterances=["send a message to channel ${p_channel:const}", "send a message to ${p_channel}"]];
```
Hint: To decide whether or not to use `no-undefined`, replace the placeholder with the word "something" (or "someone", "somewhere", etc.), and compare it against "anything" or "a certain thing". If the use of "something" is likely to mean "anything", then you should use `no-undefined`. For example, this is how you annotate filters for Slack:
```tt
stream (p_sender : Entity(tt:username)) := monitor ((@com.slack.channel_history()), sender == p_sender)
#_[utterances=["when ${p_sender:no-undefined} messages me on slack"]];
```
The motivation is that sentences of the form “when someone messages me on Slack” should be interpreted the same as “when anyone messages me on Slack”, and not “when a certain person messages me on Slack”.
Hint: `no-undefined` does not make sense for required parameters, because the user always specifies a specific value before the program is executed.
The rules through which Almond generates full programs from primitive templates are defined in Genie.
By default, the utterances for a query should be noun phrases. When we compose the sentence, we add generic verbs before the noun phrase, such as `get`, `show`, or `search`. As in our example, we have the utterance "a cat picture" instead of "get a cat picture".
Using noun phrases is necessary for parameter passing. Going back to the previous example of cats and Slack:
```tt
query := @com.thecatapi.get() #_[utterances=["a cat picture"]];
action (p_picture_url : Entity(tt:picture)) := @com.slack.send_picture(picture_url=p_picture_url)
#_[utterances=["send ${p_picture_url} to Slack"]];
```
in addition to the sentence "get a cat picture, then send the picture to Slack", which is long and verbose, we can also generate "send a cat picture to Slack". In this case, the placeholder is replaced with the entire noun phrase of the function that generates the result to send.
For streams, write the utterances as when phrases, such as “when it's raining”, “when I receive an email”. For actions, write the utterances as verb phrases in the imperative form, such as “send a message”.
Additionally, if you want to use a non-generic verb for your query, put a comma `,` before your utterance. The comma is a marker for a verb phrase, and is automatically removed when generating sentences. For example, the following is the primitive template for translation:
```tt
query (p_text : String) := @com.yandex.translate.translate(text=p_text)
#_[utterances=["the translation of $p_text", ", translate $p_text"]];
```
These templates (combined with a Slack template) result in the following sentences:
Note: the first utterance of each distinct example will be presented in the Thingpedia Cheatsheet, so put the most common and natural utterance first in the list.
Note: internally, the examples are not stored as `.tt` files, so any comment or formatting in the dataset file will be lost, and multiple examples with the same code will be collapsed.
Note: the dataset you provide as part of your device submission must use American English. Translations can be provided separately, and the `language` field of the dataset file is ignored.
To help Almond do a better job of handling commands with parameters, you must specify example values for each of your parameters (of type String or Entity) when declaring the function. The syntax is `#[string_values=<dataset-name>]`.
For example, the `play_song` function for Spotify can be declared as follows:
```tt
action play_song(in req song : Entity(com.spotify:song)
                 #[string_values="tt:song_names"])
```
In this case, we tell the system to use the values in the dataset `tt:song_names` as example values for the parameter `song`. Almond will then randomly replace `song` with values from `tt:song_names` when generating the synthetic sentences.
You can choose any of the existing datasets from the Available String Datasets page, or submit your own, tailored to your device. If your device must understand values from an open ontology (that is, you expect the user to try values not seen at training time), it is recommended to include at least 10,000 to 100,000 training examples.
Note that example values are necessary for both input and output parameters since an output parameter can also be used in the command as a filter.
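For instance, building on the Spotify example above, an output parameter that users may filter on can be annotated the same way (sketch, with the canonical annotations omitted for brevity):

```tt
list query song(out id : Entity(com.spotify:song)
                #[string_values="tt:song_names"])
#_[canonical=["song", "music", "track"]];
```

This lets the generated filters (e.g. "songs called ...") use realistic song names rather than made-up strings.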
Unlike in most other programming languages, the choice of parameter names is important in ThingTalk: it affects the performance of the natural language translation, because parameters of the same name share knowledge across devices in the neural network.
To achieve the best accuracy, we suggest using the same parameter names as other similar devices. Here are some naming conventions to follow:
- If a parameter contains the URL of a picture, name it `picture_url`; if it accepts any URL, name it `url`; if it accepts any URL of videos, name it `video_url`.
- For news articles and similar feed items, name the title `title`, the blurb or description `description`, the URL `link`, the update date `updated`, and the author name `author`.
- Name a free-text search parameter `query`.
- When posting a picture, name the caption `caption` and the picture `picture_url`.
- Name a status message parameter `status`.
- For messages with a title and a body, use `title` and `body`.
- For an on/off switch, name the function `set_power`, name the argument `power`, and make it of type `Enum(on,off)`.
- For a numeric range, name the lower bound `low` and the upper bound `high`.
- To limit the number of results, use a `count` parameter of type `Number`.
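A hypothetical thermostat-style device (for illustration only; the function and device names are not real) following these conventions might look like:

```tt
class @com.example.thermostat {
  // on/off switch: function set_power, argument power of type Enum(on,off)
  action set_power(in req power : Enum(on,off));

  // a numeric range uses low and high; the result limit is count of type Number
  list query history(in opt low : Measure(C),
                     in opt high : Measure(C),
                     in opt count : Number,
                     out temperature : Measure(C));
}
```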
In addition to understanding the user, we would also like Almond to reply back to the user in natural language when the user issues a command. The replies from Almond are controlled by the canonical forms, using similar templates to those used to generate commands for training. This is often sufficient, but sometimes the replies from Almond are clunky. In that case, you can provide additional annotations to control how Almond replies.
If a command is missing a required input parameter, Almond will ask the user for the value, by asking a slot-filling question. Similarly, if a query returns too many results and it's pointless to list them all, Almond will ask the user if they want to refine by adding a new filter. By default, the question looks something like "What <parameter-name> are you interested in?" (e.g. "What artist are you interested in?"). Users might not understand such a question, especially when the parameter name is not informative enough.
To give users a better experience, you can provide a customized slot-filling question with a `prompt` annotation for each parameter in your function. For example, in Spotify we declare `play_song` as:
```tt
action play_song(in req song : Entity(com.spotify:song)
                 #_[prompt=["what song do you want to play"]]
                 ...);
```
By default, the slot filling question will be “What song are you interested in?”. Now with the prompt provided, Almond will ask “What song do you want to play?” instead.
`prompt` annotations should be lowercase and should not include a question mark. The question mark will be included automatically (this allows Almond to ask two questions at once).

You must provide a `prompt` annotation on any required input parameter, or the device will not be valid. This restriction might be lifted in the future.
In addition to the agent asking questions during the dialogue, the user might also have a question about the current result. For example, when searching songs, the agent might be recommending some tracks to play, and the user might want to know what genre they are in, or how popular they are.
By default, the following commands are used to train follow-up questions: "what is the $param of the $query" (e.g. "what is the genre of the song", "what is the popularity of the song") and "is that a $filter $query" (e.g. "is that a pop song", "is that a song with popularity greater than 70").
To understand more follow-up questions, you can add the `#_[questions]` annotation. For example, for Spotify we might have:
```tt
query song(...,
           popularity : Number
           #_[questions=["how popular is that song", "is that a popular song"]],
           ...);
```
When the agent replies to a question from the user, it needs to form a coherent sentence that describes the answer (a database row). Similarly, when the agent executes an action, it needs to describe to the user what just happened, so the user has confidence that the agent executed the right action.
For questions, the default is to form a phrase that describes each found item using the canonical forms. For example, when searching for songs, the agent might reply with "I have found Delicate. It is a song by Taylor Swift released in 2017." or "I have found Shake It Off and Bad Blood. Both are songs by Taylor Swift.". For actions, the default is to use a verb phrase for the action and convert it to the past tense; for example, "I played Welcome To New York for you." These kinds of descriptions are often appropriate for queries and actions that operate over named entities (songs, movies, restaurants, hotels, etc.) but do not cover all possible skills. For example, it would be very odd for a weather forecast skill to reply with "I have found Today's Weather Forecast. It is a cloudy forecast with a temperature of 90F".
To overcome the limitations of generic reply templates, developers can provide customized result phrases using the `#_[result]` annotation. This annotation is used to describe a single result from a query (the top result, if the query returns multiple results), or to describe the successful execution of an action. For example, the weather device is declared as:
```tt
class @org.thingpedia.weather {
  monitorable query current(in opt location : Location,
                            out temperature : Measure(C),
                            out wind_speed : Measure(mps),
                            out humidity : Number,
                            out cloudiness : Number,
                            out fog : Number,
                            out status : Enum(raining,cloudy,sunny,snowy,sleety,drizzling,windy),
                            out icon : Entity(tt:picture))
  #_[result=["the current weather in ${location} is ${status} . the temperature is ${temperature} and the humidity is ${humidity} % .",
             "the current weather in ${location} is ${status}",
             "the weather in ${location} is ${status}",
             "it is ${status} today in ${location} and the temperature is ${temperature}"]]
  #[minimal_projection=["status"]];
}
```
Result phrases can use placeholders to refer to input or output parameters. The syntax is `$name` or `${name}` (similar to primitive templates). Unlike primitive templates, no options are valid for placeholders. The agent uses the phrase with the most parameters that have a valid value (not `null` or `undefined`). Which parameters have a value depends on the projection applied to the command. For example, if the user asks for the weather temperature explicitly, only the "temperature" field will be projected, and the agent will choose a phrase that only uses the "temperature". Input parameters are always available in a result phrase. You can ensure that certain output parameters are part of the result phrase, regardless of projection, with a `#[minimal_projection]` annotation. In the example, the `minimal_projection` is set to "status", so the agent can always talk about the weather status. If `minimal_projection` is unspecified, it defaults to "id" if an "id" parameter is present, and is empty otherwise.
If an error occurs while calling a query or action API (in the form of a JavaScript exception), the agent will display an error to the user. By default, the raw exception message will be displayed. Often, the exception message will be cryptic and unsuitable for displaying directly to the user. It will also not be translated.
Instead, "normal" errors that are to be expected in the course of using the device should be specified using an `#_[on_error]` annotation on the query or action, describing both the error codes and the associated messages. For example, in the Twitter skill:
```tt
action post(in req status : String)
#_[on_error={
  too_long="your tweet exceeds 240 characters",
  duplicate="you already tweeted this"
}];
```
Given that annotation, in case of an overlong tweet, Genie would generate replies of the form:
The format of the annotation should be an object whose keys are error codes, and whose values are phrases describing the error condition. You can use placeholders to refer to input parameters (but not output parameters, because no output was generated). You should not include a description of the action that was attempted, or an invitation to try different inputs, because both of those will be added automatically.
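Placeholders referring to input parameters can make these messages more specific. For example, a hypothetical thermostat action (for illustration only) could echo the rejected value back to the user:

```tt
action set_temperature(in req value : Measure(C))
#_[on_error={
  out_of_range="${value} is outside the range supported by the thermostat"
}];
```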
At runtime, your JS code should catch any low-level API error, and then throw an exception with a `code` property equal to one of the declared error codes. For example:
```javascript
async do_post() {
  try {
    await callTwitterPostAPI();
  } catch (e) {
    if (isTweetTooLongError(e)) {
      const newError = new Error("Tweet too long"); // included in debugging logs
      newError.code = 'too_long';
      throw newError;
    }
    // rethrow unexpected errors here
    throw e;
  }
}
```
Error codes starting with `E`, `E_`, and `ERR_` are reserved for predefined Node.js errors and internal errors.
You should not catch network connectivity or authentication errors, unless you are able to recover from them. Instead, you should propagate the low-level error as-is, and the agent will handle it appropriately.
In addition to a textual reply, you can specify that your agent should show a graphical or interactive element when it answers. You do so with the `formatted` annotation. The annotation takes a list of messages, using object syntax. For each result from your query that the agent presents to the user, all messages specified in the annotation will be instantiated. Each property of a message is a string with placeholders, which are replaced based on the results of the query. Five types of messages are supported: `rdl`, `sound`, `picture`, `audio`, and `video`.
Not all platforms support all types of non-textual output. You should design your skill so that the textual reply is sufficient for the user, or account for all supported platforms.
An RDL message is a clickable link with a title, an optional description, and an optional picture. It is suitable for website links and news articles. It has the following properties:

- `webCallback` (the link, required)
- `displayTitle` (the title, required)
- `displayText` (the description, optional)
- `pictureUrl` (the picture, optional)

See Tutorial 2 for an example of this format type.
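For example, a hypothetical news query (the function and parameter names here are illustrative, not from a real device) could attach an RDL message like this:

```tt
query article(out title : String,
              out link : Entity(tt:url),
              out description : String)
#_[formatted=[{
  type="rdl",
  webCallback="${link}",
  displayTitle="${title}",
  displayText="${description}"
}]];
```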
A `sound` message plays a predefined sound effect, specified with the `name` property. At the moment, the available sound effects include those in the Freedesktop sound theme specification.
A picture message shows a picture to the user. It has only one property: `url`. E.g., `{ type="picture", url="${picture_url}" }` (see Tutorial 3).
An audio message plays an audio file to the user, specified by the `url` property, which should point to a publicly accessible URL for the audio stream. To maximize compatibility, it is recommended to use a patent-free format such as Ogg/Vorbis.
On voice platforms, the audio is played in the background after the agent is done speaking. If multiple audio files are played for the same agent reply, they are played consecutively. The user can say "stop" to stop playing audio. Audio messages can also be interleaved with sound effect messages, and will be played sequentially.
On supported web-based and graphical platforms, the message will appear as an interactive audio player.
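For example, a hypothetical podcast query (illustrative names only) could attach an audio message:

```tt
query episode(out title : String,
              out audio_url : Entity(tt:url))
#_[formatted=[{ type="audio", url="${audio_url}" }]];
```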
A video message plays a video to the user, specified by the `url` property, which should point to a publicly accessible URL for the video stream. To maximize compatibility, it is recommended to use a patent-free format such as WebM.
On supported web-based and graphical platforms, the message will appear as an interactive video player. This message is not supported on voice-based platforms, and has no effect.