TextPreprocessor
Text preprocessing utilities for cleaning and normalizing text
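```objeck
class PreprocessExample {
  function : Main(args : String[]) ~ Nil {
    text := "The Quick Brown FOX!";
    # lowercase, then strip punctuation from the lowercased text
    lower := System.NLP.TextPreprocessor->Lowercase(text);
    clean := System.NLP.TextPreprocessor->RemovePunctuation(lower);
    # tokenize the cleaned text, drop stopwords, and stem what remains
    tokens := System.NLP.Tokenizer->WordTokenize(clean);
    filtered := System.NLP.TextPreprocessor->RemoveStopwords(tokens);
    stemmed := System.NLP.TextPreprocessor->StemAll(filtered);
  }
}
```
Operations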
Lowercase
Converts text to lowercase
function : Lowercase(text:String) ~ String
Parameters
| Name | Type | Description |
|---|---|---|
| text | String | input text |
Return
| Type | Description |
|---|---|
| String | lowercase text |
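The sketch below shows a minimal call to `Lowercase`. It assumes the `System.NLP` library is linked into your Objeck build (how to do that depends on your installation and is not covered here); the class name `LowercaseExample` is hypothetical.

```objeck
class LowercaseExample {
  function : Main(args : String[]) ~ Nil {
    text := "The Quick Brown FOX!";
    # convert the input to lowercase
    lower := System.NLP.TextPreprocessor->Lowercase(text);
    # print the normalized string
    lower->PrintLine();
  }
}
```

Lowercasing first is a common normalization step, so that `The` and `the` produce the same token in later steps.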
RemovePunctuation
Removes punctuation from text
function : RemovePunctuation(text:String) ~ String
Parameters
| Name | Type | Description |
|---|---|---|
| text | String | input text |
Return
| Type | Description |
|---|---|
| String | text with punctuation removed |
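A similar sketch for `RemovePunctuation`, again assuming `System.NLP` is available and using a hypothetical class name. This reference does not list which characters count as punctuation, or whether they are deleted or replaced by spaces, so the example prints the result instead of predicting it.

```objeck
class RemovePunctuationExample {
  function : Main(args : String[]) ~ Nil {
    text := "Hello, world! How are you?";
    # strip punctuation; the exact character set is defined by the library
    clean := System.NLP.TextPreprocessor->RemovePunctuation(text);
    # print the cleaned string
    clean->PrintLine();
  }
}
```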
RemoveStopwords
Removes common English stopwords from an array of tokens
function : RemoveStopwords(tokens:String[]) ~ String[]
Parameters
| Name | Type | Description |
|---|---|---|
| tokens | String[] | array of word tokens |
Return
| Type | Description |
|---|---|
| String[] | token array with stopwords removed |
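This sketch feeds `RemoveStopwords` the output of `System.NLP.Tokenizer->WordTokenize`, as the example at the top of this page does. Because the stopword list itself is not published in this reference, it prints only the token counts before and after filtering rather than asserting which words are removed; the class name is hypothetical.

```objeck
class RemoveStopwordsExample {
  function : Main(args : String[]) ~ Nil {
    # split a sentence into word tokens
    tokens := System.NLP.Tokenizer->WordTokenize("the quick brown fox jumps over the lazy dog");
    # drop tokens that appear in the library's English stopword list
    filtered := System.NLP.TextPreprocessor->RemoveStopwords(tokens);
    # compare token counts before and after filtering
    tokens->Size()->PrintLine();
    filtered->Size()->PrintLine();
  }
}
```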