Phi3Tokenizer
Phi-3 tokenizer for encoding text to token IDs and decoding token IDs back to text. Loads a HuggingFace tokenizer.json file and supports BPE encoding/decoding. Example tokenizer := Phi3Tokenizer->New("phi3-mini/tokenizer.json"); ids := tokenizer->Encode("What is 2+2?"); text := tokenizer->Decode(ids); text->PrintLine();
Operations
Decode
Decode token IDs to human-readable text
method : public : Decode(token_ids:Int[]) ~ StringParameters
| Name | Type | Description |
|---|---|---|
| token_ids | Int | array of token IDs |
Return
| Type | Description |
|---|---|
| String | decoded text string |
Encode
Encode text to token IDs using BPE with special token handling. Added/special tokens are matched as whole strings before BPE is applied.
method : public : Encode(text:String) ~ Int[]Parameters
| Name | Type | Description |
|---|---|---|
| text | String | input text to tokenize |
Return
| Type | Description |
|---|---|
| Int | array of token IDs |
EncodeBPE
Encode text to token IDs using BPE (no special token handling)
method : private : EncodeBPE(text:String) ~ Int[]Parameters
| Name | Type | Description |
|---|---|---|
| text | String | input text to tokenize |
Return
| Type | Description |
|---|---|
| Int | array of token IDs |
GetVocabSize
Gets the vocabulary size
method : public : GetVocabSize() ~ IntReturn
| Type | Description |
|---|---|
| Int | vocabulary size |
IsLoaded
Checks if the tokenizer was loaded successfully
method : public : IsLoaded() ~ BoolReturn
| Type | Description |
|---|---|
| Bool | true if loaded |
New
Constructor.
New(tokenizer_path:String)Parameters
| Name | Type | Description |
|---|---|---|
| tokenizer_path | String | path to tokenizer.json file |