Trait Tokenizer 

pub trait Tokenizer: Send + Sync {
    // Required methods
    fn count(&self, text: &str) -> i32;
    fn model_family(&self) -> &str;
    fn encode(&self, text: &str) -> Vec<u32>;
    fn decode(&self, tokens: &[u32]) -> String;
}

Trait for counting tokens in text.

Used for token budget management in context assembly. Implementations can provide exact counts (using an actual tokenizer) or heuristic estimates based on character ratios.
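As an illustration of the heuristic option, here is a minimal sketch of an implementor that estimates counts from a character ratio. The `CharRatioTokenizer` type, its constructor, and the ratio of 4.0 characters per token used in the note below are assumptions made for this example, not part of this crate. The trait is repeated from the declaration above only so the block compiles on its own.

```rust
// Trait repeated from the declaration above so the sketch is self-contained.
pub trait Tokenizer: Send + Sync {
    fn count(&self, text: &str) -> i32;
    fn model_family(&self) -> &str;
    fn encode(&self, text: &str) -> Vec<u32>;
    fn decode(&self, tokens: &[u32]) -> String;
}

/// Hypothetical heuristic implementor: estimates tokens from the character count.
pub struct CharRatioTokenizer {
    family: String,
    /// Assumed average number of characters per token; tune per model.
    chars_per_token: f64,
}

impl CharRatioTokenizer {
    pub fn new(family: &str, chars_per_token: f64) -> Self {
        // A non-positive ratio would make the estimate meaningless.
        assert!(chars_per_token > 0.0, "chars_per_token must be positive");
        Self { family: family.to_string(), chars_per_token }
    }
}

impl Tokenizer for CharRatioTokenizer {
    fn count(&self, text: &str) -> i32 {
        // Count Unicode scalar values rather than bytes, then round up so
        // the estimate errs on the side of using more of the budget.
        let chars = text.chars().count() as f64;
        (chars / self.chars_per_token).ceil() as i32
    }

    fn model_family(&self) -> &str {
        &self.family
    }

    // A heuristic has no vocabulary, so encoding and decoding are
    // unsupported and follow the documented empty-value convention.
    fn encode(&self, _text: &str) -> Vec<u32> {
        Vec::new()
    }

    fn decode(&self, _tokens: &[u32]) -> String {
        String::new()
    }
}
```

With a ratio of 4.0, an 11-character string such as `"hello world"` is estimated at 3 tokens. Rounding up is a design choice here: overestimating slightly is usually safer than overflowing a context window.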

Required Methods

fn count(&self, text: &str) -> i32

Count tokens in the given text.
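To show how `count` might drive token budget management, the following sketch keeps text chunks, in order, while they still fit in a budget. The helper `select_within_budget` and the stand-in `WordTokenizer` (one token per whitespace-separated word) are hypothetical names invented for this example; the trait is repeated only to keep the block self-contained.

```rust
// Trait repeated from the declaration above so the sketch is self-contained.
pub trait Tokenizer: Send + Sync {
    fn count(&self, text: &str) -> i32;
    fn model_family(&self) -> &str;
    fn encode(&self, text: &str) -> Vec<u32>;
    fn decode(&self, tokens: &[u32]) -> String;
}

/// Hypothetical helper: keep chunks, in order, until the next one would
/// exceed `budget` tokens.
pub fn select_within_budget<'a>(
    tok: &dyn Tokenizer,
    chunks: &[&'a str],
    budget: i32,
) -> Vec<&'a str> {
    let mut remaining = budget;
    let mut kept = Vec::new();
    for &chunk in chunks {
        // `count` returns a signed integer, so clamp defensively at zero.
        let cost = tok.count(chunk).max(0);
        if cost > remaining {
            break;
        }
        remaining -= cost;
        kept.push(chunk);
    }
    kept
}

/// Stand-in tokenizer for the example: one token per whitespace-separated word.
pub struct WordTokenizer;

impl Tokenizer for WordTokenizer {
    fn count(&self, text: &str) -> i32 {
        text.split_whitespace().count() as i32
    }

    fn model_family(&self) -> &str {
        "example"
    }

    fn encode(&self, _text: &str) -> Vec<u32> {
        Vec::new()
    }

    fn decode(&self, _tokens: &[u32]) -> String {
        String::new()
    }
}
```

Taking `&dyn Tokenizer` lets the same helper work with an exact or a heuristic implementation, at the cost of dynamic dispatch on each call.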

fn model_family(&self) -> &str

Returns the model family this tokenizer targets (e.g., “gpt-4”, “claude”).

fn encode(&self, text: &str) -> Vec<u32>

Encode text to token IDs (for advanced use cases). Returns an empty vec if encoding is not supported.

fn decode(&self, tokens: &[u32]) -> String

Decode token IDs back to text. Returns an empty string if decoding is not supported.
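Because an empty result signals "not supported", callers may want to tell that apart from the legitimate empty encoding of an empty input. The sketch below shows one way to do so. `try_encode`, `ByteTokenizer` (one token ID per UTF-8 byte), and `CountOnly` are all hypothetical names invented for this example, and the trait is repeated only so the block compiles on its own.

```rust
// Trait repeated from the declaration above so the sketch is self-contained.
pub trait Tokenizer: Send + Sync {
    fn count(&self, text: &str) -> i32;
    fn model_family(&self) -> &str;
    fn encode(&self, text: &str) -> Vec<u32>;
    fn decode(&self, tokens: &[u32]) -> String;
}

/// Hypothetical wrapper: `None` when a non-empty input yields no IDs,
/// which is interpreted here as "encoding not supported".
pub fn try_encode(tok: &dyn Tokenizer, text: &str) -> Option<Vec<u32>> {
    let ids = tok.encode(text);
    if ids.is_empty() && !text.is_empty() {
        None
    } else {
        Some(ids)
    }
}

/// Toy implementor that supports both directions: one ID per UTF-8 byte.
pub struct ByteTokenizer;

impl Tokenizer for ByteTokenizer {
    fn count(&self, text: &str) -> i32 {
        text.len() as i32
    }

    fn model_family(&self) -> &str {
        "bytes"
    }

    fn encode(&self, text: &str) -> Vec<u32> {
        text.bytes().map(u32::from).collect()
    }

    fn decode(&self, tokens: &[u32]) -> String {
        // Skip IDs outside the byte range, then decode lossily so that
        // invalid UTF-8 sequences do not panic.
        let bytes: Vec<u8> = tokens
            .iter()
            .filter(|&&t| t <= u8::MAX as u32)
            .map(|&t| t as u8)
            .collect();
        String::from_utf8_lossy(&bytes).into_owned()
    }
}

/// Toy implementor that only counts, so encode and decode return empty values.
pub struct CountOnly;

impl Tokenizer for CountOnly {
    fn count(&self, text: &str) -> i32 {
        text.split_whitespace().count() as i32
    }

    fn model_family(&self) -> &str {
        "count-only"
    }

    fn encode(&self, _text: &str) -> Vec<u32> {
        Vec::new()
    }

    fn decode(&self, _tokens: &[u32]) -> String {
        String::new()
    }
}
```

The same check does not carry over cleanly to `decode`: a supported tokenizer could in principle decode a non-empty ID list to an empty string, so treating an empty result as "unsupported" there is only a heuristic.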

Implementors