v2026.2.1

Tokenizer

Text tokenization utilities for breaking text into words, sentences, and n-grams

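text := "Hello, world! This is a test.";
# split the text into word tokens
words := System.NLP.Tokenizer->WordTokenize(text);
# split the text into sentences
sentences := System.NLP.Tokenizer->SentenceTokenize(text);
# build two-word n-grams from the word tokens
bigrams := System.NLP.Tokenizer->WordNGrams(words, 2);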

Operations

CharNGrams

Generates character-level n-grams from text

function : CharNGrams(text:String, n:Int) ~ String[]

Parameters

Name  Type    Description
text  String  input text
n     Int     size of n-grams

Return

Type      Description
String[]  array of n-gram strings
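
To make the idea of character-level n-grams concrete, here is a minimal Python sketch of a sliding window over the input text. It is not the Objeck implementation: the name char_ngrams is hypothetical, and the handling of edge cases (n larger than the text, n less than 1) is an assumption, since this reference does not specify it.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Return every contiguous run of n characters, left to right."""
    # Assumed edge-case behavior: no n-grams when n is out of range.
    if n <= 0 or n > len(text):
        return []
    # Slide a window of width n across the text one character at a time.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


trigrams = char_ngrams("token", 3)
print(trigrams)  # ['tok', 'oke', 'ken']
```

A text of length L yields L - n + 1 n-grams under this scheme, which is why "token" produces three trigrams.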

SentenceTokenize

Tokenizes text into sentences using common sentence delimiters

function : SentenceTokenize(text:String) ~ String[]

Parameters

Name  Type    Description
text  String  input text to tokenize

Return

Type      Description
String[]  array of sentence tokens
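
As a rough illustration of delimiter-based sentence splitting, the Python sketch below breaks text after '.', '!' or '?' when whitespace follows. It is not the Objeck implementation: the function name sentence_tokenize, the exact delimiter set, and the choice to keep the delimiter attached to its sentence are all assumptions, because this reference only says "common sentence delimiters".

```python
import re


def sentence_tokenize(text: str) -> list[str]:
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    # The lookbehind keeps the punctuation with the sentence it ends;
    # only the whitespace between sentences is consumed by the split.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop the empty string produced by empty or all-whitespace input.
    return [p for p in parts if p]


sentences = sentence_tokenize("Hello, world! This is a test.")
print(sentences)  # ['Hello, world!', 'This is a test.']
```

A simple rule like this will also split after abbreviations such as "Dr. Smith", so results on real text should be checked against the library's actual output.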

WordNGrams

Generates word-level n-grams from tokenized text

function : WordNGrams(tokens:String[], n:Int) ~ String[]

Parameters

Name    Type      Description
tokens  String[]  array of word tokens
n       Int       size of n-grams

Return

Type      Description
String[]  array of n-gram strings (words joined by spaces)
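
The "words joined by spaces" behavior can be sketched in Python as a sliding window over the token array. This is an illustration rather than the Objeck implementation: the name word_ngrams is hypothetical, and returning an empty result when n is out of range is an assumption.

```python
def word_ngrams(tokens: list[str], n: int) -> list[str]:
    """Return each run of n consecutive tokens, joined by single spaces."""
    # Assumed edge-case behavior: no n-grams when n is out of range.
    if n <= 0 or n > len(tokens):
        return []
    # Each window of n tokens becomes one space-separated string.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


bigrams = word_ngrams(["this", "is", "a", "test"], 2)
print(bigrams)  # ['this is', 'is a', 'a test']
```

Because each n-gram is a single string, a token that itself contains a space would make the result ambiguous; the sketch assumes tokens come from a word tokenizer and contain none.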

WordTokenize

Tokenizes text into words using whitespace and punctuation as delimiters

function : WordTokenize(text:String) ~ String[]

Parameters

Name  Type    Description
text  String  input text to tokenize

Return

Type      Description
String[]  array of word tokens
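
One way to picture tokenizing on whitespace and punctuation is the Python sketch below, which keeps runs of word characters and treats everything else as a delimiter. It is not the Objeck implementation: the name word_tokenize is hypothetical, and details such as case handling, apostrophes, and hyphenated words are assumptions that this reference does not settle.

```python
import re


def word_tokenize(text: str) -> list[str]:
    """Return runs of word characters; whitespace and punctuation separate them."""
    # \w+ matches letters, digits and underscores, so commas, periods,
    # exclamation marks and spaces all act as delimiters and are dropped.
    return re.findall(r"\w+", text)


words = word_tokenize("Hello, world! This is a test.")
print(words)  # ['Hello', 'world', 'This', 'is', 'a', 'test']
```

Under this rule a contraction such as "don't" splits into two tokens, which may differ from what the library returns.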