[Previous: Write a Genie Skill] [Next: Implement a Genie Skill Backend]
The natural language understanding ability of Genie heavily relies on the quality and quantity of the canonical forms and primitive templates. (Primitive templates are explained in the Primitive Templates Section in Natural Language Support For Thingpedia Devices.)
An utterance-ThingTalk program pair sampler tool is available to help you evaluate your canonical forms. This tool is capable of generating samples of single-turn commands and ThingTalk programs with projection and filter properties (definitions of database operations can be found here). Single-turn commands are the natural language commands that Genie will attempt to generate based on the canonical forms you defined in the manifest file. ThingTalk programs can help give you a sense of what query operations will be executed relative to the generated natural language command.
The resulting pairs are saved to a TSV file where each row contains the id, command utterance, and ThingTalk program for each example.
id: A unique identifier of each generated sample
command utterance: The natural language command generated by Genie
ThingTalk program: The ThingTalk program corresponding to the natural language command utterance
An example of the resulting file can be found here.
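To inspect the file programmatically, the rows can be loaded with a few lines of Python. This is a minimal sketch, assuming the file is tab-delimited with exactly three columns and no header row; the `Sample` type and the `read_samples` helper are illustrative names and not part of Genie.

```python
import csv
from typing import NamedTuple


class Sample(NamedTuple):
    id: str
    utterance: str
    program: str


def read_samples(path):
    """Load (id, utterance, program) rows from a sampler output file."""
    samples = []
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE: ThingTalk programs contain literal double quotes,
        # which must not be interpreted as CSV quoting.
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) != 3:
                continue  # skip blank or malformed lines
            samples.append(Sample(*row))
    return samples
```

Calling `read_samples` on the output path returns a list whose items expose `.id`, `.utterance`, and `.program`, which is convenient for filtering or grouping the samples.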
Let's use the Yelp and Spotify examples we wrote in the Natural Language Support For Thingpedia Devices guide to illustrate how to check canonical forms.
First, we take only a snippet from Yelp's manifest and focus on the name of the restaurant. (You can temporarily reduce Yelp's manifest to this snippet.)
class @com.yelp
#_[canonical="yelp"]
{
  entity restaurant #_[description="Restaurants on Yelp"];

  list query restaurant(out id: Entity(com.yelp:restaurant)
                        #_[canonical={
                          default="base",
                          base=["name"]
                        }])
  #_[canonical=["restaurant"]];
}
In this class declaration:
The restaurant query returns a list of results (marked by the list keyword)
The id output parameter is an Entity with the same name as the query
The id parameter uses name as its base canonical and default grammar forms
Now, run the following command:
genie sample-synthetic-data --output $OUTPUT_FILE_PATH --thingpedia $PATH_TO_SKILL_MANIFEST --constants $CONSTANTS_FILE --device $SKILL_NAME --function $FUNCTION_NAME
Parameters:
--output Output file path (a TSV file)
--thingpedia Path to the skill manifest.tt file
--constants Path to the skill constants (sample values) TSV file
--device Skill name
--function A specific function name in your skill device
Example:
The skill manifest.tt file is located at /home/user/genie-sdk/thingpedia-common-devices/main/com.yelp/manifest.tt
The constants file const.tsv is placed under the ~/ directory for now (we will explain the purpose of this file in detail later in the More on the canonical form tests Section)
The skill name is com.yelp
The function name is restaurant
The output file is ~/output.tsv
The command to run is:
genie sample-synthetic-data --output ~/output.tsv --thingpedia /home/user/genie-sdk/thingpedia-common-devices/main/com.yelp/manifest.tt --constants ~/const.tsv --device com.yelp --function restaurant
After the execution, you should see some utterance-ThingTalk program pairs in ~/output.tsv, with an id, utterance command, and ThingTalk program in each row, like these:
com.yelp-000 what is the name of the restaurant ? [ id ] of @com.yelp . restaurant ( ) ;
com.yelp-001 what is the restaurant 's name ? [ id ] of @com.yelp . restaurant ( ) ;
com.yelp-002 what name does the restaurant have ? [ id ] of @com.yelp . restaurant ( ) ;
In accordance with the class declaration, the word restaurant invokes the restaurant query, and name is used to get the restaurant's id (or the name).
Now, to see the effects of an ill-defined canonical form, let's make two changes to the class definition:
Change the base canonical form of the id parameter from base=["name"] to base=["restaurant's id"]
Change the canonical form of the query from #_[canonical=["restaurant"]] to #_[canonical=["eatery"]]
class @com.yelp
#_[canonical="yelp"]
{
  entity restaurant #_[description="Restaurants on Yelp"];

  list query restaurant(out id: Entity(com.yelp:restaurant)
                        #_[canonical={
                          default="base",
                          base=["restaurant's id"]
                        }])
  #_[canonical=["eatery"]];
}
Then run the same command again:
genie sample-synthetic-data --output ~/output.tsv --thingpedia /home/user/genie-sdk/thingpedia-common-devices/main/com.yelp/manifest.tt --constants ~/const.tsv --device com.yelp --function restaurant
Now we should see something like this in ~/output.tsv, where the utterances are less meaningful and sound awkward:
com.yelp-000 what is the restaurant 's id of the eatery ? [ id ] of @com.yelp . restaurant ( ) ;
com.yelp-001 what is the eatery 's restaurant 's id ? [ id ] of @com.yelp . restaurant ( ) ;
com.yelp-002 what restaurant 's id does the eatery have ? [ id ] of @com.yelp . restaurant ( ) ;
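Awkward phrasings like these can be spotted by hand, but a rough automated check helps when the output is long. The sketch below flags utterances that chain two possessives or that mention a raw parameter name. Both rules are ad hoc heuristics chosen for this example, not checks that Genie performs, and `find_awkward` is a hypothetical helper.

```python
def find_awkward(utterances, raw_names=("id",)):
    """Return (utterance, reason) pairs for phrasings worth a second look."""
    flagged = []
    for text in utterances:
        # Sampler output is already tokenized, so splitting on whitespace
        # yields tokens such as "'s" and "?" as separate items.
        tokens = text.split()
        if tokens.count("'s") >= 2:
            flagged.append((text, "chained possessives"))
        elif any(name in tokens for name in raw_names):
            flagged.append((text, "raw parameter name in utterance"))
    return flagged


for text, reason in find_awkward([
    "what is the restaurant 's name ?",
    "what is the eatery 's restaurant 's id ?",
]):
    print(f"{reason}: {text}")
```

A flagged utterance is only a hint: the decision about whether a canonical form reads naturally still rests with the skill author.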
This example uses our previously defined Spotify manifest.
class @com.spotify
#_[canonical="Spotify"]
{
  action create_playlist(in req name : String
                         #[string_values="com.spotify:playlist"]
                         #[raw_mode=true]
                         #_[prompt=["what do you want to name your playlist"]]
                         #_[canonical={
                           default="base",
                           base=["name"],
                           preposition=["named #", "called #", "titled #"],
                           property=["name #", "title #"]
                         }])
  #_[canonical=["create a new playlist", "create playlist"]]
  ;
}
In this class declaration:
The create_playlist action takes a required input parameter name whose type is a string.
Base, preposition, and property canonical forms are defined for the name parameter.
First, let's create a new constants file ~/const.tsv and add a sample value to it. In each row, we specify a parameter type and its value, delimited by a tab. The naming convention for the parameter type is param:$DEVICE_NAME.$FUNCTION_NAME:$PARAMETER:$TYPE.
param:@com.spotify.create_playlist:name:String radiohead
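Rows in this format can also be produced by a small script, which avoids accidentally typing spaces where a tab is required. The sketch follows the example row above, including the leading `@` before the device name; `constant_row` is a hypothetical helper, not a Genie API.

```python
def constant_row(device, function, parameter, type_, value):
    """Format one constants-file row: the parameter key, a tab, then the value."""
    key = f"param:@{device}.{function}:{parameter}:{type_}"
    return f"{key}\t{value}"


# Reproduces the sample row for the Spotify create_playlist action.
print(constant_row("com.spotify", "create_playlist", "name", "String", "radiohead"))
```

Writing the returned string, followed by a newline, to ~/const.tsv gives the same one-row constants file as the manual step.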
Then, we run this command to generate the samples, assuming the manifest is at /home/user/genie-sdk/thingpedia-common-devices/main/com.spotify/manifest.tt and the constants file is at ~/const.tsv:
genie sample-synthetic-data --output ~/output.tsv --thingpedia /home/user/genie-sdk/thingpedia-common-devices/main/com.spotify/manifest.tt --constants ~/const.tsv --device com.spotify --function create_playlist
In ~/output.tsv, we should see something like this:
com.spotify-000 create a new playlist named radiohead . create_playlist ( name = " radiohead " ) ;
com.spotify-001 create a new playlist called radiohead . create_playlist ( name = " radiohead " ) ;
com.spotify-002 create a new playlist titled radiohead . create_playlist ( name = " radiohead " ) ;
com.spotify-003 create a new playlist with name radiohead . create_playlist ( name = " radiohead " ) ;
com.spotify-004 create a new playlist with title radiohead . create_playlist ( name = " radiohead " ) ;
This part explains in more detail how to use the testing tool to check your canonical form declarations.
The testing tool requires three prerequisites:
skill manifest: The manifest.tt file for your skill device
parameter datasets: A collection of realistic, real-world values
skill constants: A subset of values sampled from your skill's parameter datasets
Parameter datasets provide realistic, real-world values for Genie to synthesize natural sentences. If your skill defines new entities, or you have parameters of string type with domain-specific values, you will need the parameter datasets. Details of how to prepare the datasets can be found here under the Preparing parameter datasets Section.
The constants file generally holds a subset of values sampled from your skill-specific parameter datasets, if you have prepared any. The values are in this format:
For Entity type:
param:$DEVICE_NAME.$FUNCTION_NAME:$PARAMETER:$TYPE $VALUE $DISPLAY_VALUE
e.g.
param:@com.yelp.restaurant:id:Entity(com.yelp:restaurant) null Bolero Meatballs
For String type:
param:$DEVICE_NAME.$FUNCTION_NAME:$PARAMETER:$TYPE $VALUE
e.g.
param:@com.yelp.restaurant:id:String Bolero Meatballs
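The two layouts differ only in the optional display-value column, so one parser can handle both. This is a minimal sketch, assuming tab-delimited rows and the key layout shown in the examples above; `parse_constant` is an illustrative name and not part of Genie.

```python
def parse_constant(line):
    """Split one constants-file row into its named parts."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) not in (2, 3):
        raise ValueError(f"expected 2 or 3 tab-separated columns, got {len(columns)}")

    # The type itself may contain ':' (e.g. Entity(com.yelp:restaurant)),
    # so split the key at most three times from the left.
    prefix, qualified, parameter, type_ = columns[0].split(":", 3)
    if prefix != "param":
        raise ValueError(f"unexpected key prefix: {prefix!r}")

    # The function name is the part after the last '.' in "@device.function".
    device, _, function = qualified.lstrip("@").rpartition(".")
    return {
        "device": device,
        "function": function,
        "parameter": parameter,
        "type": type_,
        "value": columns[1],
        # Only Entity rows carry a display value.
        "display": columns[2] if len(columns) == 3 else None,
    }
```

Running every line of a constants file through such a parser is a quick way to catch rows where a tab was replaced by spaces, since those rows end up with the wrong number of columns.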
If your skill has no newly-defined entities and/or domain-specific values, you can just leave this file as an empty file.
If you are unsure, you can still run the command below; an empty file will be created at the specified output location if no new entity or domain-specific parameter value is found for your skill.
Now, run the following command to obtain the constants. The sample size is determined by the --sample-size parameter:
genie sample-constants --output $OUTPUT_FILE_PATH --thingpedia $PATH_TO_SKILL_MANIFEST --parameter-datasets $PATH_TO_PARAMETER_DATASETS_FILE --sample-size $SAMPLE_SIZE --devices $SKILL_NAME --locale $LOCALE --random-seed $RANDOM_SEED
Example:
genie sample-constants --output ~/const.tsv --thingpedia /home/user/genie-sdk/workdir/thingpedia-common-devices/main/com.yelp/manifest.tt --parameter-datasets /home/user/genie-sdk/workdir/parameter-datasets.tsv --sample-size 1 --devices com.yelp --locale "en-US" --random-seed 777
Parameters:
Required:
--output TSV file output path
--thingpedia Path to the skill manifest.tt file
--parameter-datasets Path to the skill parameter dataset tsv file
Optional:
--sample-size Number of samples per entity (defaults to 10)
--devices Skill name
--locale Locale tag of the natural language (defaults to en-US)
--random-seed Value to initialize the random number generator
In ~/const.tsv, we may see something like this:
param:@com.yelp.restaurant:id:Entity(com.yelp:restaurant) null McDonald
Below is the full command for generating utterance-ThingTalk program pairs based on your skill class declaration in Thingpedia.
genie sample-synthetic-data --output $OUTPUT_FILE_PATH --thingpedia $PATH_TO_SKILL_MANIFEST --constants $CONSTANTS_FILE --locale $LOCALE --device $SKILL_NAME --function $FUNCTION_NAME
Usage:
Required parameters:
--thingpedia Path to the skill manifest.tt file
--constants Path to the skill constants (sample values) TSV file
--device Thingpedia skill name
Optional parameters:
--output Output file path (defaults to samples.tsv in a test folder under your skill directory)
--locale Locale tag of the natural language (defaults to en-US)
--function A specific skill function name to generate samples for (defaults to all functions in your skill)
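When the sampler is driven from a script, the required and optional parameters map naturally onto a function signature. The sketch below only assembles the argument list from the flags documented above and does not invoke Genie; `sampler_command` is a hypothetical helper.

```python
def sampler_command(thingpedia, constants, device,
                    output=None, locale=None, function=None):
    """Build the argument list for `genie sample-synthetic-data`."""
    argv = [
        "genie", "sample-synthetic-data",
        "--thingpedia", thingpedia,
        "--constants", constants,
        "--device", device,
    ]
    # Optional flags are added only when given, so the tool's own
    # defaults apply otherwise.
    for flag, value in (("--output", output),
                        ("--locale", locale),
                        ("--function", function)):
        if value is not None:
            argv += [flag, value]
    return argv


print(" ".join(sampler_command("manifest.tt", "const.tsv", "com.yelp",
                               function="restaurant")))
```

On a machine where the genie command is installed, the returned list can be passed to `subprocess.run(argv, check=True)`. Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.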
Now, we continue working on the Yelp skill from Example 1 and expand it by adding more parameters to its restaurant function.
class @com.yelp
#_[canonical="yelp"]
{
  entity restaurant #_[description="Restaurants on Yelp"];
  entity restaurant_cuisine #_[description="Cuisines in Yelp"];

  list query restaurant(out id: Entity(com.yelp:restaurant)
                        #_[canonical={
                          default="base",
                          base=["name"]
                        }],
                        out link: Entity(tt:url)
                        #_[canonical="link"],
                        out cuisines: Array(Entity(com.yelp:restaurant_cuisine))
                        #_[canonical={
                          default="adjective",
                          base=["cuisines", "category"],
                          property=["# food"],
                          adjective=["#"],
                          verb=["serves # cuisine", "serves # food"],
                          preposition=["in the # category"],
                          base_projection=["food", "cuisine"],
                          verb_projection=["serve", "offer", "have"],
                        }],
                        out price : Enum(cheap, moderate, expensive, luxury)
                        #_[canonical={
                          default="adjective",
                          base=["price range", "price"],
                          property=["# price"],
                          preposition=["in the # price range"],
                          adjective=["#"]
                        }],
                        out rating: Number
                        #[min_number=1]
                        #[max_number=5]
                        #_[canonical={
                          default="passive_verb",
                          base=["rating"],
                          passive_verb=["rated ${value} ${value:plural: one{star} other{stars}}"],
                          adjective=["${value} ${value:plural: one{star} other{stars}}"],
                          property=["rating", "${value} star rating"],
                          adjective_argmax=["best", "top rated"],
                          adjective_argmin=["worst"],
                          passive_verb_projection=["rated"]
                        }]
                        #_[counted_object="stars"],
                        out geo: Location
                        #_[canonical={
                          default="preposition",
                          base=["address", "location"],
                          preposition=["at #", "near #", "in #", "around #"]
                        }],
                        out phone: Entity(tt:phone_number)
                        #[filterable=false]
                        #_[canonical={
                          default="property",
                          base=["phone number", "telephone"],
                          property=["phone number #"],
                          verb=["can be reached at #"]
                        }],
                        out opening_hours: RecurrentTimeSpecification
                        #_[canonical={
                          default="property",
                          base=["opening hours", "business hours", "hours"],
                          verb=["opens #"]
                        }]
                        )
  #_[canonical=["restaurant"]];
}
After regenerating the skill constants and re-executing the test tool, the samples may look something like this:
com.yelp-000 what is the name of the restaurant ? [ id ] of @com.yelp . restaurant ( ) ;
...
com.yelp-003 what is the link of the restaurant ? [ link ] of @com.yelp . restaurant ( ) ;
...
com.yelp-006 what is the cuisines of the restaurant ? [ cuisines ] of @com.yelp . restaurant ( ) ;
...
com.yelp-010 what is the restaurant 's category ? [ cuisines ] of @com.yelp . restaurant ( ) ;
...
com.yelp-013 what is the restaurant 's price range ? [ price ] of @com.yelp . restaurant ( ) ;
...
com.yelp-018 show me a restaurant with cheap price . @com.yelp . restaurant ( ) filter price == enum cheap ;
...
com.yelp-020 show me a restaurant in the cheap price range . @com.yelp . restaurant ( ) filter price == enum cheap ;
...
com.yelp-022 what is the rating of the restaurant ? [ rating ] of @com.yelp . restaurant ( ) ;
...
com.yelp-025 what is the address of the restaurant ? [ geo ] of @com.yelp . restaurant ( ) ;
com.yelp-026 what is the restaurant 's address ? [ geo ] of @com.yelp . restaurant ( ) ;
...
com.yelp-029 what is the restaurant 's location ? [ geo ] of @com.yelp . restaurant ( ) ;
...
com.yelp-031 show me a restaurant at madison . @com.yelp . restaurant ( ) filter geo == new Location ( " madison " ) ;
...
com.yelp-033 show me a restaurant near madison . @com.yelp . restaurant ( ) filter geo == new Location ( " madison " ) ;
com.yelp-034 which restaurant is near madison ? @com.yelp . restaurant ( ) filter geo == new Location ( " madison " ) ;
...
com.yelp-037 show me a restaurant around madison . @com.yelp . restaurant ( ) filter geo == new Location ( " madison " ) ;
...
com.yelp-044 what is the restaurant 's phone number ? [ phone ] of @com.yelp . restaurant ( ) ;
...
com.yelp-050 what is the restaurant 's opening hours ? [ opening_hours ] of @com.yelp . restaurant ( ) ;
...
com.yelp-053 what is the restaurant 's business hours ? [ opening_hours ] of @com.yelp . restaurant ( ) ;
...
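With this many parameters it is easy to miss one that produced no samples. A rough coverage check can pull the projected or filtered parameter out of each program. The regular expressions below are an assumption based only on the sample programs shown here (`[ name ] of ...` and `... filter name ...`); they are not a ThingTalk parser, and `parameter_coverage` is a hypothetical helper.

```python
import re
from collections import Counter

# Patterns derived from the tokenized sample programs above.
PROJECTION = re.compile(r"^\[ (\w+) \] of ")
FILTER = re.compile(r" filter (\w+) ")


def parameter_coverage(programs, expected):
    """Count samples per parameter and list expected parameters with none."""
    counts = Counter()
    for program in programs:
        match = PROJECTION.search(program) or FILTER.search(program)
        if match:
            counts[match.group(1)] += 1
    missing = [name for name in expected if counts[name] == 0]
    return counts, missing


counts, missing = parameter_coverage(
    ["[ id ] of @com.yelp . restaurant ( ) ;",
     "@com.yelp . restaurant ( ) filter price == enum cheap ;"],
    expected=["id", "price", "rating"],
)
print(dict(counts), missing)
```

A parameter that appears in `missing` may simply lack usable constants or canonical forms, so the list is a starting point for inspection rather than a definite error report.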
Once the natural language support is functionally working, it's time to wire up the backend.