v2026.2.1

TF_IDF

Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer for document analysis

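# corpus used to learn the vocabulary and IDF weights
documents := ["The cat sat on the mat", "The dog sat on the log", "Cats and dogs are pets"];
tfidf := System.NLP.TF_IDF->New();
tfidf->Fit(documents);

# vectorize a new document against the fitted model
test_doc := "The cat and the dog";
vector := tfidf->Transform(test_doc);
# word -> vector index mapping learned by Fit
vocab := tfidf->GetVocabulary();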
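For background, one common TF-IDF formulation is shown below. This page does not document which term-frequency normalization, IDF smoothing, or vector normalization the bundle applies, so read this as the general idea rather than the bundle's exact weighting. Here f_{t,d} is the count of term t in document d, N is the number of documents passed to Fit, and n_t is the number of those documents that contain t.

```latex
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \log\frac{N}{n_t}, \qquad
\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
```

In the example corpus above (N = 3), "sat" occurs in two documents, so under this formulation idf("sat") = log(3/2), while "pets" occurs in one, so idf("pets") = log 3. Rarer terms receive larger weights.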

Operations

Fit

Fits the TF-IDF model on a corpus of documents

method : public : Fit(documents:String[]) ~ Nil

Parameters

Name       Type      Description
documents  String[]  array of document strings

GetVocabulary

Gets the vocabulary (word to index mapping)

method : public : GetVocabulary() ~ Hash<String,IntRef>

Return

Type                 Description
Hash<String,IntRef>  vocabulary mapping each word to its vector index
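A minimal sketch of looking up the index assigned to a single word. It assumes the returned hash supports Find, returning Nil for a missing key, and that IntRef exposes Get; the class name VocabularyExample is hypothetical. Whether the tokenizer lowercases words (so that "The" and "the" share one entry) is not documented here, so the lookup of lowercase "cat" is itself an assumption.

```objeck
class VocabularyExample {
  function : Main(args : String[]) ~ Nil {
    documents := ["The cat sat on the mat", "The dog sat on the log"];
    tfidf := System.NLP.TF_IDF->New();
    tfidf->Fit(documents);

    vocab := tfidf->GetVocabulary();
    # assumption: Find returns Nil when the word was not seen by Fit
    # assumption: the tokenizer normalizes case, so "cat" is the stored form
    index := vocab->Find("cat");
    if(index <> Nil) {
      "Index of 'cat': "->Print();
      index->Get()->PrintLine();
    }
    else {
      "'cat' is not in the vocabulary"->PrintLine();
    };
  }
}
```

Printing a fallback message rather than dereferencing the result keeps the sketch safe if the tokenizer stores words in a different form.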

GetVocabularySize

Gets the size of the vocabulary

method : public : GetVocabularySize() ~ Int

Return

Type  Description
Int   number of entries in the vocabulary
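A short sketch that fits a small corpus and prints the resulting vocabulary size. It uses only the methods documented on this page; the class name VocabSizeExample is hypothetical. No expected number is given because the bundle's tokenization and case handling are not documented here.

```objeck
class VocabSizeExample {
  function : Main(args : String[]) ~ Nil {
    documents := ["The cat sat on the mat", "The dog sat on the log"];
    tfidf := System.NLP.TF_IDF->New();
    tfidf->Fit(documents);

    # number of distinct terms learned by Fit
    size := tfidf->GetVocabularySize();
    "Vocabulary size: "->Print();
    size->PrintLine();
  }
}
```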

New

Constructor. Creates a new vectorizer; call Fit before Transform or GetVocabulary.

New()

Transform

Transforms a document into a TF-IDF vector

method : public : Transform(document:String) ~ Float[]

Parameters

Name      Type    Description
document  String  input document string

Return

Type     Description
Float[]  TF-IDF vector as a float array
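A minimal sketch of reading a transformed vector, printing only the non-zero weights with their positions. It assumes, without confirmation from this page, that the vector has one slot per vocabulary entry and that slot i corresponds to the index GetVocabulary reports for a word; the class name TransformExample is hypothetical.

```objeck
class TransformExample {
  function : Main(args : String[]) ~ Nil {
    documents := ["The cat sat on the mat", "The dog sat on the log", "Cats and dogs are pets"];
    tfidf := System.NLP.TF_IDF->New();
    tfidf->Fit(documents);

    # assumption: slot i of the vector matches vocabulary index i
    vector := tfidf->Transform("The cat and the dog");
    "Vector length: "->Print();
    vector->Size()->PrintLine();

    each(i : vector) {
      # skip terms that contribute nothing to this document
      if(vector[i] <> 0.0) {
        i->Print();
        ": "->Print();
        vector[i]->PrintLine();
      };
    };
  }
}
```

Words in the input that were not seen during Fit would typically have no slot in the vector, so only the fitted vocabulary contributes weights.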