TF_IDF
Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer for document analysis
documents := ["The cat sat on the mat", "The dog sat on the log", "Cats and dogs are pets"];
tfidf := System.NLP.TF_IDF->New();
tfidf->Fit(documents);
test_doc := "The cat and the dog";
vector := tfidf->Transform(test_doc);
vocab := tfidf->GetVocabulary();
Operations
Fit
Fits the TF-IDF model on a corpus of documents
method : public : Fit(documents:String[]) ~ Nil
Parameters
| Name | Type | Description |
|---|---|---|
| documents | String[] | array of document strings |
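For orientation, fitting a TF-IDF model usually means scanning the corpus once to assign each distinct word an index and to count how many documents contain it. The sketch below illustrates that idea in Python; it is not the Objeck implementation. The function name `fit`, the regex tokenization, and the smoothed IDF formula are all assumptions chosen for illustration, and the library may tokenize and weight terms differently.

```python
import math
import re

def fit(documents):
    """Build a word-to-index vocabulary and IDF weights from a corpus."""
    vocabulary = {}          # word -> index, in order of first appearance
    document_frequency = {}  # word -> number of documents containing it
    for doc in documents:
        # Assumed tokenization: lowercase, keep runs of ASCII letters.
        tokens = re.findall(r"[a-z]+", doc.lower())
        for word in tokens:
            if word not in vocabulary:
                vocabulary[word] = len(vocabulary)
        # Count each word at most once per document.
        for word in set(tokens):
            document_frequency[word] = document_frequency.get(word, 0) + 1
    n = len(documents)
    # Smoothed IDF, one common variant: log((1 + n) / (1 + df)) + 1
    idf = {w: math.log((1 + n) / (1 + df)) + 1
           for w, df in document_frequency.items()}
    return vocabulary, idf

documents = ["The cat sat on the mat",
             "The dog sat on the log",
             "Cats and dogs are pets"]
vocabulary, idf = fit(documents)
```

Under these assumptions, a word that appears in fewer documents (such as "cat") receives a larger IDF weight than one that appears in many (such as "the").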
GetVocabulary
Gets the vocabulary (word to index mapping)
method : public : GetVocabulary() ~ Hash<String,IntRef>
Return
| Type | Description |
|---|---|
| Hash<String,IntRef> | vocabulary hash map |
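A word-to-index mapping of this kind is typically used to decide which position of the output vector each term's weight is written to. The Python sketch below shows that lookup on a toy example; it does not reproduce the Objeck library's behavior. The helper `to_vector`, the toy vocabulary, the toy IDF weights, and the whitespace tokenization are hypothetical.

```python
from collections import Counter

def to_vector(text, vocabulary, idf):
    """Map a document to a dense TF-IDF vector via a word-to-index vocabulary."""
    vector = [0.0] * len(vocabulary)
    tokens = text.lower().split()  # assumed tokenization: lowercase, whitespace split
    counts = Counter(tokens)
    for word, count in counts.items():
        index = vocabulary.get(word)
        if index is None:
            continue  # words not seen during fitting have no slot and are skipped
        tf = count / len(tokens)       # relative term frequency
        vector[index] = tf * idf[word]
    return vector

vocabulary = {"the": 0, "cat": 1, "dog": 2}  # hypothetical toy vocabulary
idf = {"the": 1.0, "cat": 2.0, "dog": 2.0}   # hypothetical toy weights
vector = to_vector("The cat and the dog", vocabulary, idf)
```

Here "and" is absent from the toy vocabulary, so it contributes nothing to the vector, while the index stored for each known word selects the slot that receives its weight.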
GetVocabularySize
Gets the size of the vocabulary
method : public : GetVocabularySize() ~ Int
Return
| Type | Description |
|---|---|
| Int | vocabulary size |