v2026.5.3
Bundle: Natural language processing toolkit. Provides tokenization, text preprocessing, TF-IDF vectorization, cosine similarity, and sentiment analysis. Works with plain strings; no external model is required. Compile with -lib nlp.

TextSimilarity

Text similarity metrics for comparing documents and strings

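# vec2 is vec1 scaled by 2, so the cosine similarity is 1.0 (up to floating-point rounding)
vec1 := [1.0, 2.0, 3.0];
vec2 := [2.0, 4.0, 6.0];
similarity := System.NLP.TextSimilarity->CosineSimilarity(vec1, vec2);

# 2 shared tokens {the, sat} out of 4 distinct tokens {the, cat, dog, sat}: 0.5
tokens1 := ["the", "cat", "sat"];
tokens2 := ["the", "dog", "sat"];
jaccard := System.NLP.TextSimilarity->JaccardSimilarity(tokens1, tokens2);

# substitute k -> s, substitute e -> i, insert g: 3 edits
distance := System.NLP.TextSimilarity->LevenshteinDistance("kitten", "sitting");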

Operations

CosineSimilarity (function)

Calculates cosine similarity between two vectors

function : CosineSimilarity(vec1:Float[], vec2:Float[]) ~ Float

Parameters

Name   Type      Description
vec1   Float[]   first vector
vec2   Float[]   second vector

Return

Type    Description
Float   cosine similarity: 0.0 to 1.0 for non-negative vectors such as TF-IDF weights, -1.0 to 1.0 in general

Example

v1 := [1.0, 0.0, 1.0];
v2 := [1.0, 1.0, 0.0];
sim := TextSimilarity->CosineSimilarity(v1, v2);
"Similarity: {$sim}"->PrintLine();
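Assuming CosineSimilarity uses the standard definition (the reference above does not spell out the formula), the example's result can be checked by hand: the dot product of the two vectors divided by the product of their Euclidean lengths.

```latex
\begin{aligned}
\cos(\mathbf{v}_1,\mathbf{v}_2)
  &= \frac{\sum_i v_{1,i}\,v_{2,i}}{\sqrt{\sum_i v_{1,i}^2}\,\sqrt{\sum_i v_{2,i}^2}} \\
  &= \frac{1\cdot 1 + 0\cdot 1 + 1\cdot 0}{\sqrt{1^2+0^2+1^2}\,\sqrt{1^2+1^2+0^2}}
   = \frac{1}{\sqrt{2}\,\sqrt{2}} = 0.5
\end{aligned}
```

Because the computation goes through square roots in floating point, the printed value may differ from 0.5 in the last digits.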

JaccardSimilarity (function)

Calculates Jaccard similarity between two sets of tokens

function : JaccardSimilarity(tokens1:String[], tokens2:String[]) ~ Float

Parameters

Name      Type       Description
tokens1   String[]   first token array
tokens2   String[]   second token array

Return

Type    Description
Float   Jaccard similarity (0.0 to 1.0)

Example

t1 := ["cat", "sat", "mat"];
t2 := ["cat", "sat", "log"];
jaccard := TextSimilarity->JaccardSimilarity(t1, t2);
"Jaccard: {$jaccard}"->PrintLine();
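Assuming the standard set-based definition, with each token array treated as a set so that duplicate tokens count once (an assumption; the reference does not say how duplicates are handled), the example works out as the size of the intersection over the size of the union:

```latex
\begin{aligned}
J(A,B) &= \frac{|A \cap B|}{|A \cup B|} \\
A &= \{\text{cat}, \text{sat}, \text{mat}\}, \qquad B = \{\text{cat}, \text{sat}, \text{log}\} \\
A \cap B &= \{\text{cat}, \text{sat}\}, \qquad A \cup B = \{\text{cat}, \text{sat}, \text{mat}, \text{log}\} \\
J(A,B) &= \frac{2}{4} = 0.5
\end{aligned}
```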

LevenshteinDistance (function)

Calculates Levenshtein distance (edit distance) between two strings

function : LevenshteinDistance(str1:String, str2:String) ~ Int

Parameters

Name   Type     Description
str1   String   first string
str2   String   second string

Return

Type   Description
Int    edit distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn str1 into str2

Example

dist := TextSimilarity->LevenshteinDistance("kitten", "sitting");
"Edit distance: {$dist}"->PrintLine();
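Assuming the classic Levenshtein definition (each insertion, deletion, and substitution costs 1), the distance follows the standard dynamic-programming recurrence below, where a is str1, b is str2, d(i, j) is the distance between the first i characters of a and the first j characters of b, and [a_i ≠ b_j] is 1 when the characters differ and 0 otherwise. The result is d(|a|, |b|); for "kitten" (6 characters) and "sitting" (7 characters) the cheapest path uses 3 edits:

```latex
\begin{aligned}
d(i,0) &= i, \qquad d(0,j) = j \\
d(i,j) &= \min\bigl\{\, d(i-1,j) + 1,\; d(i,j-1) + 1,\; d(i-1,j-1) + [a_i \neq b_j] \,\bigr\} \\[4pt]
&\text{kitten} \to \text{sitten}\ (\text{k} \to \text{s}) \to \text{sittin}\ (\text{e} \to \text{i}) \to \text{sitting}\ (+\,\text{g}) \\
d(6,7) &= 3
\end{aligned}
```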

NormalizedEditDistance (function)

Calculates normalized edit distance (0.0 = identical, 1.0 = completely different)

function : NormalizedEditDistance(str1:String, str2:String) ~ Float

Parameters

Name   Type     Description
str1   String   first string
str2   String   second string

Return

Type    Description
Float   normalized distance (0.0 to 1.0)

Example

nd := TextSimilarity->NormalizedEditDistance("kitten", "sitting");
"Normalized distance: {$nd}"->PrintLine();
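The reference does not say what the edit distance is divided by. A common choice, assumed here, is the length of the longer string; since the Levenshtein distance d never exceeds that length, the result stays within 0.0 to 1.0. With d("kitten", "sitting") = 3 (substitute k with s, e with i, insert g), the example would give:

```latex
\begin{aligned}
\operatorname{nd}(a,b) &= \frac{d(a,b)}{\max(|a|,\,|b|)} \\
\operatorname{nd}(\text{kitten},\text{sitting}) &= \frac{3}{\max(6,\,7)} = \frac{3}{7} \approx 0.4286
\end{aligned}
```

If the bundle normalizes differently (for example by the sum of the two lengths), the printed value will differ from this sketch.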