Suppose you are looking for the best restaurant in an area, and you have reviews available from two different sources. Rather than displaying some restaurants twice, you need a way to merge duplicates. Bixby handles this by setting up equivalence definitions for each concept.
Because real-world inputs can be messy, a simple comparison is not enough to decide whether or not two inputs are equivalent. For example, you might want to treat two locations as the same as long as they are close together. Or, you might want to accept business names that contain minor typos or variations. You might also have complex structures where equivalence depends on a subset of the structure's properties, such as the name and the author.
To handle this, use an equivalence definition, which specifies how the system should compare two instances of the same concept. If the function returns true for two concept instances, the system will merge and present them as a single instance. If the function returns false, they are not the same instance. The function can also return uncertain when information is missing or when fuzzy matching returns a value below the confidence threshold. At the top-most level, only values that are considered true matches will be merged. You can modify this behavior when comparing structures.
The equivalence functions discussed below are only used for merging results. They are not used by the Natural Language understanding system and have nothing to do with user input.
By default, two primitive values will match with true when identical and false otherwise. The equivalence function fuzzy-string-equality relaxes this threshold for strings. Here's an example of this:
name (BusinessName) {
  description (The name of a business.)
  equivalence: fuzzy-string-equality {
    true-tolerance (0.9)
    uncertain-tolerance (0.7)
    similarity-measure (Edit)
  }
}
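To make the two tolerances concrete, here is a minimal Python sketch of how a fuzzy string comparison could produce a three-valued result. It is not Bixby's implementation. It assumes that similarity is a normalized Levenshtein (edit) distance between 0 and 1, that a score at or above true-tolerance yields true, and that a score at or above uncertain-tolerance yields uncertain. The names Match, edit_distance, similarity, and fuzzy_string_equality are hypothetical.

```python
from enum import IntEnum


class Match(IntEnum):
    """Three-valued comparison result, ordered so that FALSE < UNCERTAIN < TRUE."""
    FALSE = 0
    UNCERTAIN = 1
    TRUE = 2


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def fuzzy_string_equality(a: str, b: str,
                          true_tolerance: float = 0.9,
                          uncertain_tolerance: float = 0.7) -> Match:
    """Map a similarity score onto true / uncertain / false (assumed thresholds)."""
    score = similarity(a, b)
    if score >= true_tolerance:
        return Match.TRUE
    if score >= uncertain_tolerance:
        return Match.UNCERTAIN
    return Match.FALSE
```

Under these assumptions, a single missing apostrophe ("Joe's Pizza" versus "Joes Pizza") still scores above 0.9 and merges, while a longer variation such as "Joe's Pizzeria" falls between the two tolerances and is reported as uncertain.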
You can also set tolerances for float values (primitive type decimal), using fuzzy-numeric-equality. The syntax is the same as fuzzy-string-equality.
You cannot use non-numeric concepts with fuzzy-numeric-equality.
You can learn more about primitive equivalence in the reference documentation.
Comparing two structures is more complicated. By default, the system walks through all the properties and compares each, descending into sub-properties as needed. Each comparison uses any available equivalence definitions for the properties. Comparison of structures with any missing properties will always return uncertain. Otherwise, comparison returns true if and only if all property comparisons return true.
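The default walk described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Bixby's algorithm: structures are modeled as plain dictionaries, primitives are compared by exact equality, and when nothing is missing the result is taken to be the weakest of the property results (the text above only specifies when the result is true). The names Match, compare_primitive, and compare_structures are hypothetical.

```python
from enum import IntEnum


class Match(IntEnum):
    """Three-valued comparison result, ordered so that FALSE < UNCERTAIN < TRUE."""
    FALSE = 0
    UNCERTAIN = 1
    TRUE = 2


def compare_primitive(a, b) -> Match:
    """Default primitive rule: true when identical, false otherwise."""
    return Match.TRUE if a == b else Match.FALSE


def compare_structures(a: dict, b: dict) -> Match:
    """Compare every property, descending into nested structures."""
    results = []
    for key in a.keys() | b.keys():
        # A property missing on either side makes the comparison uncertain.
        if key not in a or key not in b:
            return Match.UNCERTAIN
        left, right = a[key], b[key]
        if isinstance(left, dict) and isinstance(right, dict):
            results.append(compare_structures(left, right))
        else:
            results.append(compare_primitive(left, right))
    # Assumption: the overall result is the weakest property result,
    # so it is TRUE only when every property comparison is TRUE.
    return min(results, default=Match.TRUE)
```

For example, two dictionaries with identical name and address values compare as true, a differing postal code inside the nested address makes the result false, and dropping the name from one side makes it uncertain.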
We can modify this behavior by defining equivalence as part of the concept structure. To do this, there are two primitive constraints and three conjunctions that join them together.
Here are the primitive constraints:
convertible-concepts: This returns true if two concept instances can be converted to each other. That is, both instances have the same concept type (for example, they are both Business concepts), or one is a sub-type of the other. For example, if Restaurant extends Business, then a Business and a Restaurant are convertible types. A Restaurant and a MovieTheater that both extend Business are not convertible types: they have no inheritance relationship with each other.
equivalent-values: This returns true if two concept instances have the same value for the specified property. For instance, equivalent-values (name) will return true if the two structures being compared both have a name property with the same value in each concept.
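The convertible-concepts rule maps closely onto an ordinary subclass check. The following Python sketch models concept types as classes purely for illustration; the classes and the convertible_concepts function are hypothetical stand-ins, not part of any Bixby API.

```python
class Business:
    """Stand-in for a Business concept."""


class Restaurant(Business):
    """Stand-in for a concept that extends Business."""


class MovieTheater(Business):
    """Another concept that extends Business."""


def convertible_concepts(a, b) -> bool:
    """True when both instances share a type or one type extends the other."""
    return isinstance(a, type(b)) or isinstance(b, type(a))
```

With these stand-ins, a Business and a Restaurant are convertible in either order, while a Restaurant and a MovieTheater are not, even though both extend Business.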
We use joins to aggregate the results of other constraints:
join: This acts like a min function across the nested constraints. If any nested constraint returns false, that is the result. Otherwise, if any nested constraint returns uncertain, that is the result. The result is true if and only if all the nested constraints return true.
optimistic-join: This modifies the behavior of a join by treating uncertain as true. It returns true if all the nested constraints return true or uncertain, and false otherwise. This conjunction never returns uncertain.
pessimistic-join: This modifies the behavior of a join by treating uncertain as false. It returns true if all the nested constraints return true, and false otherwise. This conjunction never returns uncertain.
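The three conjunctions can be expressed as a small three-valued logic. The Python sketch below follows the definitions above directly, ordering the results as false < uncertain < true so that join is literally min. The names Match, join, optimistic_join, and pessimistic_join are hypothetical, and each function assumes at least one nested result is supplied.

```python
from enum import IntEnum


class Match(IntEnum):
    """Three-valued comparison result, ordered so that FALSE < UNCERTAIN < TRUE."""
    FALSE = 0
    UNCERTAIN = 1
    TRUE = 2


def join(*results: Match) -> Match:
    """Weakest nested result wins: FALSE, then UNCERTAIN, then TRUE."""
    return min(results)


def optimistic_join(*results: Match) -> Match:
    """Treat UNCERTAIN as TRUE before joining; never returns UNCERTAIN."""
    return min(Match.TRUE if r == Match.UNCERTAIN else r for r in results)


def pessimistic_join(*results: Match) -> Match:
    """Treat UNCERTAIN as FALSE before joining; never returns UNCERTAIN."""
    return min(Match.FALSE if r == Match.UNCERTAIN else r for r in results)
```

For a pair of nested results where one is uncertain and the other is true, join yields uncertain, optimistic_join yields true, and pessimistic_join yields false.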
Here are some examples of equivalence definitions:
structure (Business) {
  property (address) {
    type (viv.geo.Address)
  }
  // ... more properties ...
  // Businesses get merged if their name and addresses match in a fuzzy
  // way with an "uncertain" tolerance:
  equivalence: optimistic-join {
    convertible-concepts
    equivalent-values (name)
    equivalent-values (address)
  }
}
Concepts that extend Business, for instance a Restaurant concept, can be compared to a Business and return true because of the convertible-concepts constraint. Because of the equivalent-values constraints, only the name and address properties will be compared to determine whether the structures are equivalent. Finally, the join is optimistic, so the result will be true as long as the name and address comparisons return either true or uncertain.
This illustrates the utility of returning uncertain. It might not seem very useful when comparing two instances directly, but it can bubble up to any parent concept comparison. For example, a name comparison might return uncertain, while the address comparison returns true. The Business concept specifies an optimistic join across these two properties, so the result would be true.
structure (GeoPoint) {
  property (latitude) {
    type (geo.Latitude)
    min (Required)
  }
  property (longitude) {
    type (geo.Longitude)
    min (Required)
  }
  // The confidence for a point will be true, false or uncertain
  // depending on the specified location tolerances.
  equivalence: join {
    fuzzy-numeric-equality (latitude) {
      true-tolerance (0.00005)
      uncertain-tolerance (0.005)
    }
    fuzzy-numeric-equality (longitude) {
      true-tolerance (0.00005)
      uncertain-tolerance (0.005)
    }
  }
}
GeoPoint structures are compared using the latitude and longitude properties. Two points are equivalent if and only if both properties are within the specified tolerances. If either property comparison returns false, the points are not equivalent. Otherwise, the result is uncertain.
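The GeoPoint rule can be traced with a short Python sketch. It assumes that the numeric tolerances are absolute differences: a difference no greater than true-tolerance yields true, a difference no greater than uncertain-tolerance yields uncertain, and anything larger yields false. That interpretation, along with the names Match, fuzzy_numeric_equality, and geopoints_match, is an assumption made for illustration.

```python
from enum import IntEnum


class Match(IntEnum):
    """Three-valued comparison result, ordered so that FALSE < UNCERTAIN < TRUE."""
    FALSE = 0
    UNCERTAIN = 1
    TRUE = 2


def fuzzy_numeric_equality(a: float, b: float,
                           true_tolerance: float,
                           uncertain_tolerance: float) -> Match:
    """Assumed rule: classify by the absolute difference between the values."""
    difference = abs(a - b)
    if difference <= true_tolerance:
        return Match.TRUE
    if difference <= uncertain_tolerance:
        return Match.UNCERTAIN
    return Match.FALSE


def geopoints_match(p: tuple, q: tuple) -> Match:
    """Join (min) of the latitude and longitude comparisons.

    Each point is a (latitude, longitude) pair in degrees.
    """
    latitude = fuzzy_numeric_equality(p[0], q[0], 0.00005, 0.005)
    longitude = fuzzy_numeric_equality(p[1], q[1], 0.00005, 0.005)
    return min(latitude, longitude)
```

Two points that differ by a few hundred-thousandths of a degree on both axes are merged, a latitude difference of about 0.001 degrees makes the result uncertain, and a difference of several hundredths of a degree on either axis makes the points distinct.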
As a special case, you can define equivalence rules for GeoPoint properties using the distance-equality constraint. This returns true if two points are within a specified geographic distance of each other. In this example, the property centroid is a GeoPoint, and comparison returns true if two centroids are separated by 0.2 miles or less.
equivalence: join {
  distance-equality (centroid) {
    unit (Miles)
    magnitude (0.2)
  }
}
You must use viv.core.BaseGeoPoint concepts with distance-equality.
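As a rough illustration of a distance-based comparison, the Python sketch below measures great-circle distance with the haversine formula and checks it against a 0.2 mile limit. How Bixby actually computes geographic distance is not stated here, so the haversine formula, the mean Earth radius constant, and the function names are all assumptions.

```python
import math

# Assumed mean Earth radius in miles, used only for this illustration.
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def distance_equality(p: tuple, q: tuple, magnitude_miles: float = 0.2) -> bool:
    """True when two (latitude, longitude) points are within the given distance."""
    return haversine_miles(p[0], p[1], q[0], q[1]) <= magnitude_miles
```

One thousandth of a degree of latitude is roughly 0.07 miles, so two centroids that far apart would be treated as the same place under a 0.2 mile limit, while a hundredth of a degree (roughly 0.7 miles) would not.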
You can learn more about structure equivalence in the reference documentation.
By default, Bixby will ensure nodes with max (Many) cardinality have only unique elements by merging duplicate values. For example, imagine an Item structure concept:
structure (Item) {
  property (name) {
    type (Name)
    min (Required) max (One)
  }
  property (departments) {
    type (Department)
    min (Required) max (Many)
  }
}
The departments property can contain multiple Department values, but those values cannot be duplicates of one another. If departments had the values ["Hardware", "Toys", "Home Goods"], you could add the value "Kitchen" to it, but if you added the value "Toys", it would automatically be merged with the existing value "Toys" and the values would remain unique.
This behavior can be overridden with the no-auto-property-value-merging flag. When this override is set, Bixby will allow multi-value nodes to contain the same value more than once. You can use the Expression Language function dedupe to merge equivalent elements of a specified node. The following action, for example, takes a node with multiple strings and outputs a new node that removes any duplicates.
action (ReduceStrings) {
  type (Constructor)
  collect {
    input (strings) {
      type (String)
      min (Optional)
      max (Many)
    }
  }
  output (String) {
    evaluate {
      $expr(dedupe(strings))
    }
  }
}
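Conceptually, deduplication keeps one representative of each group of equivalent values. The Python sketch below shows that idea with a pluggable equivalence test. It is a stand-in for illustration rather than the Expression Language function itself, and it assumes that the first occurrence is kept and that the original order is preserved, neither of which is stated above.

```python
from typing import Callable, Iterable, List


def dedupe(values: Iterable,
           equivalent: Callable[[object, object], bool] = lambda a, b: a == b) -> List:
    """Keep the first of each group of equivalent values, in original order."""
    unique = []
    for value in values:
        # Skip the value if an equivalent one has already been kept.
        if not any(equivalent(kept, value) for kept in unique):
            unique.append(value)
    return unique
```

With the default exact-match test, a list containing "Toys" twice is reduced to a single "Toys" entry. Passing a looser test, such as a case-insensitive comparison, merges values that differ only in capitalization.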