v2026.2.1

TextPreprocessor

Text preprocessing utilities for cleaning and normalizing text

text := "The Quick Brown FOX!";
# normalize case, then strip punctuation before tokenizing
lower := System.NLP.TextPreprocessor->Lowercase(text);
clean := System.NLP.TextPreprocessor->RemovePunctuation(lower);
tokens := System.NLP.Tokenizer->WordTokenize(clean);
# drop stopwords, then reduce each remaining token to its stem
filtered := System.NLP.TextPreprocessor->RemoveStopwords(tokens);
stemmed := System.NLP.TextPreprocessor->StemAll(filtered);

Operations

Init

Initializes the list of common English stopwords used by RemoveStopwords

function : Init() ~ Nil

Lowercase

Converts text to lowercase

function : Lowercase(text:String) ~ String

Parameters

Name | Type | Description
text | String | input text

Return

Type | Description
String | lowercase text

RemovePunctuation

Removes punctuation from text

function : RemovePunctuation(text:String) ~ String

Parameters

Name | Type | Description
text | String | input text

Return

Type | Description
String | text with punctuation removed
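The page does not say which characters count as punctuation. A minimal Python sketch of the behavior, assuming the ASCII set in Python's `string.punctuation` and assuming punctuation is deleted rather than replaced with a space (`remove_punctuation` is a hypothetical name, not the bundle's code):

```python
import string

# Assumption: "punctuation" means the ASCII characters in string.punctuation;
# the Objeck bundle may use a different character set.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(text: str) -> str:
    """Delete every ASCII punctuation character; letters, digits and spaces are kept."""
    return text.translate(_PUNCT_TABLE)


print(remove_punctuation("The Quick Brown FOX!"))  # The Quick Brown FOX
```

Deleting rather than replacing means "don't" becomes "dont"; replacing with a space would instead split it into two tokens.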

RemoveStopwords

Removes common English stopwords from an array of tokens

function : RemoveStopwords(tokens:String[]) ~ String[]

Parameters

Name | Type | Description
tokens | String[] | array of word tokens

Return

Type | Description
String[] | filtered token array with stopwords removed
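The stopword list itself is not documented. The Python sketch below uses a short hypothetical list, populated by a hypothetical `init()` that plays the role of Init, and assumes tokens are already lowercase because the example pipeline lowercases before tokenizing:

```python
from typing import List, Optional, Set

# Hypothetical, deliberately short stopword list; the bundle's real list is not documented.
_STOPWORDS: Optional[Set[str]] = None


def init() -> None:
    """Populate the stopword set (mirrors the role of Init)."""
    global _STOPWORDS
    _STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def remove_stopwords(tokens: List[str]) -> List[str]:
    """Return a new list without stopwords, preserving the order of the remaining tokens."""
    if _STOPWORDS is None:
        init()  # assumption: initialize lazily if the caller has not done so
    return [t for t in tokens if t not in _STOPWORDS]


print(remove_stopwords(["the", "quick", "brown", "fox"]))  # ['quick', 'brown', 'fox']
```

The membership test is case-sensitive, so "The" would survive the filter; this is why lowercasing comes first in the pipeline.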

Stem

Basic Porter stemmer: reduces a word to its stem by stripping common English suffixes

function : Stem(word:String) ~ String

Parameters

Name | Type | Description
word | String | input word

Return

Type | Description
String | stemmed word
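The bundle's exact rule set is not shown, and "Basic" suggests it may be a reduced form of Porter's algorithm. To illustrate how Porter's rules work, here is a Python sketch of only the algorithm's first step, Step 1a, which rewrites plural suffixes; the full algorithm adds further steps conditioned on the word's vowel-consonant measure. `porter_step1a` is a hypothetical name:

```python
def porter_step1a(word: str) -> str:
    """Porter Step 1a: rewrite plural suffixes.

    Rules are tried in order and only the first matching suffix is applied,
    so the longer suffixes ("sses", "ies") must come before "s".
    """
    rules = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step1a(w))
```

"ponies" becomes "poni": Porter stems are matching keys, not necessarily dictionary words.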

StemAll

Applies stemming to every token in an array

function : StemAll(tokens:String[]) ~ String[]

Parameters

Name | Type | Description
tokens | String[] | array of word tokens

Return

Type | Description
String[] | array of stemmed tokens
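StemAll presumably amounts to applying Stem to each element in order, returning an array of the same length. A Python sketch of that element-wise mapping follows. Unlike the Objeck signature, the stemmer is passed in as a parameter so the block runs on its own; both `stem_all` and the stand-in `strip_plural_s` are hypothetical:

```python
from typing import Callable, List


def stem_all(tokens: List[str], stem: Callable[[str], str]) -> List[str]:
    """Apply a single-word stemmer to each token, keeping order and length."""
    return [stem(t) for t in tokens]


def strip_plural_s(word: str) -> str:
    """Hypothetical stand-in stemmer: drops one trailing 's' from words longer than 3 letters."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


print(stem_all(["cats", "jumps", "dog"], strip_plural_s))  # ['cat', 'jump', 'dog']
```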