v2026.5.3
Bundle: Natural language processing toolkit. Provides tokenization, text preprocessing, TF-IDF vectorization, cosine similarity, and sentiment analysis. Works with plain strings; no external model is required. Compile with -lib nlp.

TextPreprocessor

Text preprocessing utilities for cleaning and normalizing text

text := "The Quick Brown FOX!";
lower := System.NLP.TextPreprocessor->Lowercase(text);
clean := System.NLP.TextPreprocessor->RemovePunctuation(lower);
tokens := System.NLP.Tokenizer->WordTokenize(clean);
filtered := System.NLP.TextPreprocessor->RemoveStopwords(tokens);
stemmed := System.NLP.TextPreprocessor->StemAll(filtered);
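
The calls above are fragments; a minimal sketch of a complete program that chains them might look like the following. The `use System.NLP;` line and the class name `PreprocessExample` are assumptions made for illustration (the bundle name is inferred from the fully qualified calls above), and the order of steps is one reasonable choice rather than a required one.

```objeck
use System.NLP;

class PreprocessExample {
  function : Main(args : String[]) ~ Nil {
    text := "The Quick Brown FOX!";

    # normalize case first, then strip punctuation from the lowercased text
    lower := TextPreprocessor->Lowercase(text);
    clean := TextPreprocessor->RemovePunctuation(lower);

    # split into word tokens, drop stopwords, and stem what remains
    tokens := Tokenizer->WordTokenize(clean);
    filtered := TextPreprocessor->RemoveStopwords(tokens);
    stemmed := TextPreprocessor->StemAll(filtered);

    # print one stemmed token per line
    each(s in stemmed) {
      s->PrintLine();
    };
  }
}
```

As noted in the bundle description, the program needs to be compiled with -lib nlp.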

Operations

Init # function

Initializes common English stopwords

function : Init() ~ Nil
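
Example

This page does not say whether Init has to be called before the other operations or whether the stopword list is set up automatically. The sketch below calls it once at startup, on the assumption that an explicit call is harmless; the class name `InitExample` and the `use System.NLP;` line are illustrative.

```objeck
use System.NLP;

class InitExample {
  function : Main(args : String[]) ~ Nil {
    # Assumption: an explicit call is safe even if the list is already loaded
    TextPreprocessor->Init();

    # stopword filtering relies on the list prepared by Init
    words := Tokenizer->WordTokenize("the cat sat on the mat");
    filtered := TextPreprocessor->RemoveStopwords(words);
    each(w in filtered) {
      w->PrintLine();
    };
  }
}
```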

Lowercase # function

Converts text to lowercase

function : Lowercase(text:String) ~ String

Parameters

Name  Type    Description
text  String  input text

Return

Type    Description
String  lowercase text

Example

lower := TextPreprocessor->Lowercase("Hello WORLD");
lower->PrintLine();

RemovePunctuation # function

Removes punctuation from text

function : RemovePunctuation(text:String) ~ String

Parameters

Name  Type    Description
text  String  input text

Return

Type    Description
String  text with punctuation removed

Example

clean := TextPreprocessor->RemovePunctuation("Hello, world! How's it going?");
clean->PrintLine();

RemoveStopwords # function

Removes common English stopwords from token array

function : RemoveStopwords(tokens:String[]) ~ String[]

Parameters

Name    Type      Description
tokens  String[]  array of word tokens

Return

Type      Description
String[]  filtered token array without stopwords

Example

words := Tokenizer->WordTokenize("The cat sat on the mat");
filtered := TextPreprocessor->RemoveStopwords(words);
each(w in filtered) {
  w->PrintLine();
};
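
The page does not state whether stopword matching is case-sensitive. A cautious sketch, assuming it might be, lowercases the text before tokenizing so that a capitalized "The" is treated the same as "the". The class name `StopwordExample`, the `use System.NLP;` line, and the use of `Size()` to compare token counts are illustrative choices.

```objeck
use System.NLP;

class StopwordExample {
  function : Main(args : String[]) ~ Nil {
    # lowercase first, in case the stopword list only holds lowercase entries
    lower := TextPreprocessor->Lowercase("The cat sat on the mat");
    words := Tokenizer->WordTokenize(lower);
    filtered := TextPreprocessor->RemoveStopwords(words);

    # compare token counts before and after filtering
    words->Size()->PrintLine();
    filtered->Size()->PrintLine();

    each(w in filtered) {
      w->PrintLine();
    };
  }
}
```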

Stem # function

Basic Porter stemmer - reduces words to their root form

function : Stem(word:String) ~ String

Parameters

Name  Type    Description
word  String  input word

Return

Type    Description
String  stemmed word

Example

TextPreprocessor->Stem("running")->PrintLine();
TextPreprocessor->Stem("happily")->PrintLine();

StemAll # function

Applies stemming to all tokens in array

function : StemAll(tokens:String[]) ~ String[]

Parameters

Name    Type      Description
tokens  String[]  array of word tokens

Return

Type      Description
String[]  array of stemmed tokens

Example

words := Tokenizer->WordTokenize("running quickly and happily");
stemmed := TextPreprocessor->StemAll(words);
each(s in stemmed) {
  s->PrintLine();
};