Bundle
Natural language processing toolkit. Provides tokenization, text preprocessing, TF-IDF vectorization, cosine similarity, and sentiment analysis. Works with plain strings; no external model is required. Compile with -lib nlp.
TextPreprocessor
Text preprocessing utilities for cleaning and normalizing text
text := "The Quick Brown FOX!";
lower := System.NLP.TextPreprocessor->Lowercase(text);
clean := System.NLP.TextPreprocessor->RemovePunctuation(text);
tokens := System.NLP.Tokenizer->WordTokenize(lower);
filtered := System.NLP.TextPreprocessor->RemoveStopwords(tokens);
stemmed := System.NLP.TextPreprocessor->StemAll(filtered);
Operations
Lowercase # function
Converts text to lowercase
function : Lowercase(text:String) ~ String
Parameters
| Name | Type | Description |
|---|---|---|
| text | String | input text |
Return
| Type | Description |
|---|---|
| String | lowercase text |
Example
lower := TextPreprocessor->Lowercase("Hello WORLD");
lower->PrintLine();
RemovePunctuation # function
Removes punctuation from text
function : RemovePunctuation(text:String) ~ String
Parameters
| Name | Type | Description |
|---|---|---|
| text | String | input text |
Return
| Type | Description |
|---|---|
| String | text with punctuation removed |
Example
clean := TextPreprocessor->RemovePunctuation("Hello, world! How's it going?");
clean->PrintLine();
RemoveStopwords # function
Removes common English stopwords from token array
function : RemoveStopwords(tokens:String[]) ~ String[]
Parameters
| Name | Type | Description |
|---|---|---|
| tokens | String[] | array of word tokens |
Return
| Type | Description |
|---|---|
| String[] | filtered token array without stopwords |
Example
words := Tokenizer->WordTokenize("The cat sat on the mat");
filtered := TextPreprocessor->RemoveStopwords(words);
each(w in filtered) {
w->PrintLine();
};
Stem # function
Basic Porter stemmer - reduces words to their root form
function : Stem(word:String) ~ String
Parameters
| Name | Type | Description |
|---|---|---|
| word | String | input word |
Return
| Type | Description |
|---|---|
| String | stemmed word |
Example
TextPreprocessor->Stem("running")->PrintLine();
TextPreprocessor->Stem("happily")->PrintLine();
StemAll # function
Applies stemming to all tokens in array
function : StemAll(tokens:String[]) ~ String[]
Parameters
| Name | Type | Description |
|---|---|---|
| tokens | String[] | array of word tokens |
Return
| Type | Description |
|---|---|
| String[] | array of stemmed tokens |
Example
words := Tokenizer->WordTokenize("running quickly and happily");
stemmed := TextPreprocessor->StemAll(words);
each(s in stemmed) {
s->PrintLine();
};
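Chained pipeline
The operations documented above can be chained so that each step consumes the previous step's output. The following is a minimal sketch rather than a reference implementation. It assumes the standard Objeck entry point (a class with a Main function), uses a hypothetical class name PreprocessDemo, and calls only the functions listed on this page by their fully qualified names. Per the bundle description, it would be compiled with -lib nlp.

```objeck
# Hypothetical demo class; the name PreprocessDemo is not part of the bundle.
class PreprocessDemo {
  function : Main(args : String[]) ~ Nil {
    text := "The Quick Brown FOX!";

    # Normalize case, then strip punctuation from the lowercased text.
    lower := System.NLP.TextPreprocessor->Lowercase(text);
    clean := System.NLP.TextPreprocessor->RemovePunctuation(lower);

    # Tokenize the cleaned string, drop stopwords, and stem what remains.
    tokens := System.NLP.Tokenizer->WordTokenize(clean);
    filtered := System.NLP.TextPreprocessor->RemoveStopwords(tokens);
    stemmed := System.NLP.TextPreprocessor->StemAll(filtered);

    # Print one stemmed token per line.
    each(s in stemmed) {
      s->PrintLine();
    };
  }
}
```

Passing the punctuation-free string to the tokenizer keeps a trailing "!" from ending up inside a token before stopword removal and stemming. Whether the stopword list matches case-insensitively is not stated here, so lowercasing first is the conservative choice.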