v2026.4.1

Phi3VisionSession

Phi-3 Vision multimodal session for image understanding and text generation. It uses three separate ONNX models: a vision encoder, a text embedding model, and a text decoder. It accepts raw image bytes (JPEG/PNG) and pre-tokenized prompt tokens, split into a prefix segment (before the image placeholder) and a suffix segment (after the image placeholder).

Example

    model_dir := "phi3v-directml/";
    session := Phi3VisionSession->New(
      model_dir + "phi-3-v-128k-instruct-vision.onnx",
      model_dir + "phi-3-v-128k-instruct-text-embedding.onnx",
      model_dir + "model.onnx");
    image_bytes := System.IO.File.FileReader->ReadBinaryFile("photo.jpg");
    prefix := [32010, 29871];
    suffix := [32007, 32001];
    eos := [32000, 32007];
    result := session->Generate(image_bytes, prefix, suffix, 256, 0.0, eos);
    each(token in result->GetTokens()) {
      token->PrintLine();
    };
    session->Close();

Operations

Close

Closes all three underlying sessions (vision, embedding, and decoder).

method : public : Close() ~ Nil

Generate

Generate text tokens from an image and prompt tokens. The prompt is split into prefix tokens (before the image) and suffix tokens (after the image).

method : public : Generate(image_bytes:Byte[], prefix_tokens:Int[], suffix_tokens:Int[], max_tokens:Int, temperature:Float, eos_tokens:Int[]) ~ API.Onnx.Phi3Result

Parameters

Name           Type    Description
image_bytes    Byte[]  raw image file bytes (JPEG/PNG)
prefix_tokens  Int[]   token IDs before the image placeholder
suffix_tokens  Int[]   token IDs after the image placeholder
max_tokens     Int     maximum number of tokens to generate
temperature    Float   sampling temperature (0.0 for greedy decoding)
eos_tokens     Int[]   end-of-sequence token IDs
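
The prefix/suffix split described above can be sketched as follows. This is only an illustration, not part of the Phi3VisionSession API: it assumes a hypothetical tokenizer output in which a single marker value (here -1, chosen arbitrarily) stands in for the image placeholder, and the class name SplitSketch is made up for the example. The marker itself is dropped, and the tokens on either side become the two arrays that Generate expects.

```objeck
class SplitSketch {
  function : Main(args : String[]) ~ Nil {
    # hypothetical tokenized prompt; -1 marks the image position (an assumption)
    tokens := [32010, 29871, -1, 32007, 32001];
    placeholder := -1;

    # find the index of the first placeholder, if any
    idx := -1;
    for(i := 0; i < tokens->Size(); i += 1;) {
      if(idx < 0) {
        if(tokens[i] = placeholder) {
          idx := i;
        };
      };
    };

    if(idx >= 0) {
      # tokens before the placeholder
      prefix := Int->New[idx];
      for(i := 0; i < idx; i += 1;) {
        prefix[i] := tokens[i];
      };

      # tokens after the placeholder
      suffix := Int->New[tokens->Size() - idx - 1];
      for(i := 0; i < suffix->Size(); i += 1;) {
        suffix[i] := tokens[idx + 1 + i];
      };

      prefix->Size()->PrintLine();
      suffix->Size()->PrintLine();
    };
  }
}
```

The resulting prefix and suffix arrays could then be passed as the prefix_tokens and suffix_tokens arguments.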

Return

Type        Description
Phi3Result  generation result with output token IDs
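
A minimal sketch of a sampled (non-greedy) call, based on the class example. Several points are assumptions rather than documented behavior: that `use API.Onnx;` makes the class visible (inferred from the return type API.Onnx.Phi3Result), that a failed call returns Nil, and that 0.7 and 64 are sensible values for temperature and max_tokens. The class name GenerateSketch is made up for the example; model paths and token IDs are copied from the class example.

```objeck
use API.Onnx;

class GenerateSketch {
  function : Main(args : String[]) ~ Nil {
    # model paths as in the class example
    model_dir := "phi3v-directml/";
    session := Phi3VisionSession->New(
      model_dir + "phi-3-v-128k-instruct-vision.onnx",
      model_dir + "phi-3-v-128k-instruct-text-embedding.onnx",
      model_dir + "model.onnx");

    image_bytes := System.IO.File.FileReader->ReadBinaryFile("photo.jpg");
    prefix := [32010, 29871];
    suffix := [32007, 32001];
    eos := [32000, 32007];

    # 0.7 is an illustrative temperature; 0.0 would select greedy decoding
    result := session->Generate(image_bytes, prefix, suffix, 64, 0.7, eos);

    # assumption: a failed call yields Nil rather than raising an error
    if(result <> Nil) {
      each(token in result->GetTokens()) {
        token->PrintLine();
      };
    };

    # release the vision, embedding, and decoder sessions
    session->Close();
  }
}
```

The output is a sequence of token IDs, so a separate detokenization step is still needed to obtain readable text.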

New

Constructor.

New(vision_model:String, embed_model:String, decoder_model:String)

Parameters

Name           Type    Description
vision_model   String  path to vision encoder ONNX model
embed_model    String  path to text embedding ONNX model
decoder_model  String  path to text decoder ONNX model

New

Constructor with configuration.

New(vision_model:String, embed_model:String, decoder_model:String, config:Map<String,String>)

Parameters

Name           Type                Description
vision_model   String              path to vision encoder ONNX model
embed_model    String              path to text embedding ONNX model
decoder_model  String              path to text decoder ONNX model
config         Map<String,String>  session configuration parameters
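
A rough sketch of the configuration constructor. The accepted configuration keys are not listed on this page, so "example_key" and "example_value" are placeholders, not real settings. The sketch also assumes that Map comes from the Collection bundle, that the class is visible through `use API.Onnx;`, and uses the made-up class name ConfigSketch.

```objeck
use API.Onnx;
use Collection;

class ConfigSketch {
  function : Main(args : String[]) ~ Nil {
    # placeholder entry; substitute keys the runtime actually recognizes
    config := Map->New()<String, String>;
    config->Insert("example_key", "example_value");

    # model paths as in the class example
    model_dir := "phi3v-directml/";
    session := Phi3VisionSession->New(
      model_dir + "phi-3-v-128k-instruct-vision.onnx",
      model_dir + "phi-3-v-128k-instruct-text-embedding.onnx",
      model_dir + "model.onnx",
      config);

    session->Close();
  }
}
```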