TextSimilarity
Text similarity metrics for comparing documents and strings
vec1 := [1.0, 2.0, 3.0];
vec2 := [2.0, 4.0, 6.0];
similarity := System.NLP.TextSimilarity->CosineSimilarity(vec1, vec2);
tokens1 := ["the", "cat", "sat"];
tokens2 := ["the", "dog", "sat"];
jaccard := System.NLP.TextSimilarity->JaccardSimilarity(tokens1, tokens2);
distance := System.NLP.TextSimilarity->LevenshteinDistance("kitten", "sitting");
Operations
CosineSimilarity
Calculates cosine similarity between two vectors
function : CosineSimilarity(vec1:Float[], vec2:Float[]) ~ Float
Parameters
| Name | Type | Description |
|---|---|---|
| vec1 | Float[] | first vector |
| vec2 | Float[] | second vector |
Return
| Type | Description |
|---|---|
| Float | cosine similarity (0.0 to 1.0 for vectors with non-negative components, such as term counts) |
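For readers who want to see what the metric computes, here is a minimal reference sketch in Python. It is not the library's implementation: the function name `cosine_similarity`, the length check, and the choice to return 0.0 for a zero vector are all assumptions made for illustration.

```python
import math


def cosine_similarity(vec1, vec2):
    """Cosine of the angle between two equal-length vectors (illustrative helper)."""
    if len(vec1) != len(vec2):
        raise ValueError("vectors must have the same length")

    # Dot product and Euclidean norms.
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))

    # Assumption: a zero vector has no direction, so report 0.0 instead of dividing by zero.
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


# Parallel vectors point the same way, so the result is close to 1.0.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Note that the mathematical definition yields a negative value when the vectors point in opposing directions, which can only happen if some components are negative.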
JaccardSimilarity
Calculates Jaccard similarity between two sets of tokens
function : JaccardSimilarity(tokens1:String[], tokens2:String[]) ~ Float
Parameters
| Name | Type | Description |
|---|---|---|
| tokens1 | String[] | first token array |
| tokens2 | String[] | second token array |
Return
| Type | Description |
|---|---|
| Float | Jaccard similarity (0.0 to 1.0) |
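Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The Python sketch below illustrates that definition only. The name `jaccard_similarity` is hypothetical, and treating two empty inputs as identical (1.0) is an assumption; the library may handle that case differently.

```python
def jaccard_similarity(tokens1, tokens2):
    """|A intersect B| / |A union B| over the distinct tokens (illustrative helper)."""
    set1, set2 = set(tokens1), set(tokens2)
    union = set1 | set2

    # Assumption: two empty token lists are treated as identical.
    if not union:
        return 1.0
    return len(set1 & set2) / len(union)


# Shared tokens: "the", "sat" (2). Distinct tokens overall: "the", "cat", "sat", "dog" (4).
print(jaccard_similarity(["the", "cat", "sat"], ["the", "dog", "sat"]))  # 0.5
```

Because the inputs are converted to sets, repeated tokens count once and token order is ignored.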
LevenshteinDistance
Calculates Levenshtein distance (edit distance) between two strings
function : LevenshteinDistance(str1:String, str2:String) ~ Int
Parameters
| Name | Type | Description |
|---|---|---|
| str1 | String | first string |
| str2 | String | second string |
Return
| Type | Description |
|---|---|
| Int | edit distance (minimum number of single-character insertions, deletions, or substitutions needed to turn one string into the other) |
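The standard way to compute Levenshtein distance is dynamic programming over prefixes of the two strings. The following Python sketch shows that technique with two rolling rows. It is an illustration under the usual unit-cost assumption, not the library's code, and the name `levenshtein_distance` is hypothetical.

```python
def levenshtein_distance(str1, str2):
    """Minimum number of single-character edits between two strings (illustrative helper)."""
    # previous[j] holds the distance between the empty prefix of str1 and str2[:j].
    previous = list(range(len(str2) + 1))

    for i, ch1 in enumerate(str1, start=1):
        # Turning str1[:i] into the empty string takes i deletions.
        current = [i]
        for j, ch2 in enumerate(str2, start=1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(
                previous[j] + 1,         # delete ch1
                current[j - 1] + 1,      # insert ch2
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current

    return previous[-1]


# kitten -> sitten -> sittin -> sitting
print(levenshtein_distance("kitten", "sitting"))  # 3
```

Keeping only two rows brings memory use down to the length of `str2`, while the running time stays proportional to the product of the two lengths.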
NormalizedEditDistance
Calculates normalized edit distance (0.0 = identical, 1.0 = completely different)
function : NormalizedEditDistance(str1:String, str2:String) ~ Float
Parameters
| Name | Type | Description |
|---|---|---|
| str1 | String | first string |
| str2 | String | second string |
Return
| Type | Description |
|---|---|
| Float | normalized distance (0.0 to 1.0) |
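This reference does not state how the distance is normalized. One common convention, assumed here purely for illustration, divides the Levenshtein distance by the length of the longer string, which keeps the result between 0.0 and 1.0. The Python sketch below follows that assumption; both function names are hypothetical and the library may use a different normalizer.

```python
def levenshtein_distance(str1, str2):
    """Unit-cost edit distance using two rolling rows (illustrative helper)."""
    previous = list(range(len(str2) + 1))
    for i, ch1 in enumerate(str1, start=1):
        current = [i]
        for j, ch2 in enumerate(str2, start=1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def normalized_edit_distance(str1, str2):
    """Edit distance scaled to [0.0, 1.0]; the normalizer is an assumption."""
    longest = max(len(str1), len(str2))

    # Two empty strings are identical, and this also avoids dividing by zero.
    if longest == 0:
        return 0.0
    # Assumption: divide by the longer string's length.
    return levenshtein_distance(str1, str2) / longest


print(normalized_edit_distance("kitten", "sitting"))  # 3 edits over 7 characters
print(normalized_edit_distance("abc", "abc"))         # 0.0, identical strings
```

Under this convention the result reaches 1.0 only when no character can be kept in place, for example when every position of two equal-length strings differs.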