Bundle
Natural language processing toolkit. Provides tokenization, text preprocessing, TF-IDF vectorization, cosine similarity, and sentiment analysis. Works with plain strings; no external model is required. Compile with -lib nlp.
TextSimilarity
Text similarity metrics for comparing documents and strings
vec1 := [1.0, 2.0, 3.0];
vec2 := [2.0, 4.0, 6.0];
similarity := System.NLP.TextSimilarity->CosineSimilarity(vec1, vec2);
tokens1 := ["the", "cat", "sat"];
tokens2 := ["the", "dog", "sat"];
jaccard := System.NLP.TextSimilarity->JaccardSimilarity(tokens1, tokens2);
distance := System.NLP.TextSimilarity->LevenshteinDistance("kitten", "sitting");
Operations
CosineSimilarity # function
Calculates cosine similarity between two vectors
function : CosineSimilarity(vec1:Float[], vec2:Float[]) ~ Float
Parameters
| Name | Type | Description |
|---|---|---|
| vec1 | Float[] | first vector |
| vec2 | Float[] | second vector |
Return
| Type | Description |
|---|---|
| Float | cosine similarity (0.0 to 1.0 for non-negative vectors such as TF-IDF weights) |
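
For readers who want to see what the returned value measures, here is a minimal reference sketch of cosine similarity in Python. It is not the library's implementation: the function name `cosine_similarity`, the length check, and the choice to return 0.0 for a zero-length vector are assumptions made for illustration.

```python
import math


def cosine_similarity(vec1, vec2):
    """Return dot(vec1, vec2) / (|vec1| * |vec2|).

    Assumption: a zero-magnitude vector yields 0.0 instead of an error.
    """
    if len(vec1) != len(vec2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


# Parallel vectors point in the same direction, so the score is about 1.0.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
# These vectors share one of two non-zero components, so the score is about 0.5.
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]))
```

Because the score depends only on direction, scaling either vector leaves it unchanged, which is why `[1.0, 2.0, 3.0]` and `[2.0, 4.0, 6.0]` compare as effectively identical.
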
Example
v1 := [1.0, 0.0, 1.0];
v2 := [1.0, 1.0, 0.0];
sim := TextSimilarity->CosineSimilarity(v1, v2);
"Similarity: {$sim}"->PrintLine();
JaccardSimilarity # function
Calculates Jaccard similarity between two sets of tokens
function : JaccardSimilarity(tokens1:String[], tokens2:String[]) ~ Float
Parameters
| Name | Type | Description |
|---|---|---|
| tokens1 | String[] | first token array |
| tokens2 | String[] | second token array |
Return
| Type | Description |
|---|---|
| Float | Jaccard similarity (0.0 to 1.0) |
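
The Jaccard score is the size of the intersection of the two token sets divided by the size of their union. The Python sketch below illustrates that definition only. The name `jaccard_similarity` and the choice to return 0.0 when both inputs are empty are assumptions, and the library may handle duplicates or empty input differently.

```python
def jaccard_similarity(tokens1, tokens2):
    """Return |A intersect B| / |A union B| for the two token collections.

    Duplicate tokens are collapsed because the inputs are treated as sets.
    Assumption: two empty inputs yield 0.0.
    """
    set1, set2 = set(tokens1), set(tokens2)
    union = set1 | set2
    if not union:
        return 0.0
    return len(set1 & set2) / len(union)


# Shared tokens: {"cat", "sat"}; union: {"cat", "sat", "mat", "log"}.
print(jaccard_similarity(["cat", "sat", "mat"], ["cat", "sat", "log"]))  # 0.5
```
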
Example
t1 := ["cat", "sat", "mat"];
t2 := ["cat", "sat", "log"];
jaccard := TextSimilarity->JaccardSimilarity(t1, t2);
"Jaccard: {$jaccard}"->PrintLine();
LevenshteinDistance # function
Calculates Levenshtein distance (edit distance) between two strings
function : LevenshteinDistance(str1:String, str2:String) ~ Int
Parameters
| Name | Type | Description |
|---|---|---|
| str1 | String | first string |
| str2 | String | second string |
Return
| Type | Description |
|---|---|
| Int | edit distance (minimum number of single-character insertions, deletions, or substitutions needed to turn str1 into str2) |
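
Levenshtein distance is usually computed with dynamic programming over the prefixes of the two strings. The Python sketch below shows the standard two-row version of that algorithm. It is offered as an illustration of the metric, not as the library's code, and the name `levenshtein_distance` is hypothetical.

```python
def levenshtein_distance(str1, str2):
    """Return the minimum number of single-character edits between the strings."""
    # previous[j] holds the distance between the processed prefix of str1
    # and the first j characters of str2.
    previous = list(range(len(str2) + 1))
    for i, ch1 in enumerate(str1, start=1):
        current = [i]  # turning i characters into an empty prefix takes i deletions
        for j, ch2 in enumerate(str2, start=1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(
                previous[j] + 1,         # delete ch1
                current[j - 1] + 1,      # insert ch2
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


# kitten -> sitten -> sittin -> sitting: two substitutions and one insertion.
print(levenshtein_distance("kitten", "sitting"))  # 3
```
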
Example
dist := TextSimilarity->LevenshteinDistance("kitten", "sitting");
"Edit distance: {$dist}"->PrintLine();
NormalizedEditDistance # function
Calculates normalized edit distance (0.0 = identical, 1.0 = completely different)
function : NormalizedEditDistance(str1:String, str2:String) ~ Float
Parameters
| Name | Type | Description |
|---|---|---|
| str1 | String | first string |
| str2 | String | second string |
Return
| Type | Description |
|---|---|
| Float | normalized distance (0.0 to 1.0) |
Example
nd := TextSimilarity->NormalizedEditDistance("kitten", "sitting");
"Normalized distance: {$nd}"->PrintLine();
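
The documentation does not state how the distance is normalized. A common convention, assumed here, is to divide the Levenshtein distance by the length of the longer string, which keeps the result between 0.0 and 1.0. The Python sketch below follows that assumption. The function names and the choice to return 0.0 for two empty strings are hypothetical and may not match the library.

```python
def levenshtein_distance(str1, str2):
    """Standard two-row dynamic-programming edit distance."""
    previous = list(range(len(str2) + 1))
    for i, ch1 in enumerate(str1, start=1):
        current = [i]
        for j, ch2 in enumerate(str2, start=1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def normalized_edit_distance(str1, str2):
    """Return edit distance divided by the longer string's length.

    Assumption: two empty strings are treated as identical (0.0).
    """
    longest = max(len(str1), len(str2))
    if longest == 0:
        return 0.0
    return levenshtein_distance(str1, str2) / longest


# 3 edits over a longest length of 7 gives roughly 0.43.
print(round(normalized_edit_distance("kitten", "sitting"), 2))
```

Under this convention, identical strings score 0.0 and strings with no characters in common at any position score 1.0.
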