Implementation of a tokenizer derived from here. Used to convert a string into a sequence of numbers, where smaller numbers indicate more frequently occurring tokens.

The maximum size of the vocabulary.
A mapping between words (tokens) and their corresponding frequencies.
A mapping between words (tokens) and their corresponding indices.
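Taken together, these properties suggest internal state along the following lines. This is a minimal sketch; the names `maxVocabSize`, `wordFrequencies`, and `wordIndex` are assumptions, not the library's actual identifiers.

```ts
// Hypothetical shape of the tokenizer's state; all names are assumptions.
interface TokenizerState {
  // The maximum size of the vocabulary.
  maxVocabSize: number;
  // Words (tokens) mapped to how often they appeared in the fitting strings.
  wordFrequencies: Map<string, number>;
  // Words (tokens) mapped to their index; smaller indices correspond to
  // more frequently occurring tokens.
  wordIndex: Map<string, number>;
}
```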
Fits the tokenizer to a given set of strings. The frequencies with which tokens appear in these strings are used when generating sequences for strings not seen during fitting.
The strings to fit the tokenizer with.
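The description implies a frequency-counting pass followed by rank-based index assignment. Below is a self-contained sketch of that idea, assuming whitespace splitting and the function name shown; none of these details are confirmed by the source.

```ts
// Sketch only: counts token frequencies across the fitting strings, then
// assigns smaller indices to more frequent tokens, capped at the maximum
// vocabulary size.
function buildWordIndex(texts: string[], maxVocabSize: number): Map<string, number> {
  const frequencies = new Map<string, number>();
  for (const text of texts) {
    for (const token of text.toLowerCase().split(/\s+/)) {
      if (token.length === 0) continue; // skip empties from extra whitespace
      frequencies.set(token, (frequencies.get(token) ?? 0) + 1);
    }
  }
  const wordIndex = new Map<string, number>();
  [...frequencies.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequent first
    .slice(0, maxVocabSize)
    .forEach(([token], rank) => wordIndex.set(token, rank + 1));
  return wordIndex;
}
```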
Loads a tokenizer from a given object.
The object to load the tokenizer from.
Tokenizes a given string.
The string to tokenize.
The tokenized string, as a sequence of token indices.
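Consistent with the overview's "sequence of numbers", a call might look like this. The `declare` block stands in for the real class, whose actual signatures are not shown here.

```ts
// Hypothetical usage; the method name and return type are assumptions.
declare const tokenizer: { tokenize(raw: string): number[] };

const sequence = tokenizer.tokenize('the cat sat on the mat');
// e.g. [1, 2, 3, 4, 1, 5] - a frequent token like 'the' keeps its
// small index everywhere it appears.
```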
Converts the tokenizer to an object that can be JSON encoded.
The JSON encodable tokenizer object.
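Paired with the load method above, this enables a save/restore round trip through plain JSON. A sketch, with all identifiers assumed:

```ts
// Hypothetical round trip; toJSON/load and their shapes are assumptions.
declare const tokenizer: { toJSON(): object };
declare const Tokenizer: { load(obj: object): unknown };

const saved = JSON.stringify(tokenizer.toJSON()); // JSON-encodable object
// ...later, e.g. after reading the string back from storage...
const restored = Tokenizer.load(JSON.parse(saved));
```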
Converts a string into an array of tokens.
The following rules are followed:
The raw string.
An array of tokens.
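A minimal sketch of this conversion, assuming common conventions (lowercasing, stripping punctuation, splitting on whitespace); the actual rules may differ.

```ts
// A sketch only; the tokenization rules here are assumptions:
// lowercase, strip punctuation, split on whitespace.
function textToTokens(raw: string): string[] {
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, '') // strip punctuation (assumed rule)
    .split(/\s+/)
    .filter((token) => token.length > 0); // drop empties from extra spaces
}
```

Under these assumptions, `textToTokens('The cat sat!')` would yield `['the', 'cat', 'sat']`.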
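Putting the pieces above together, an end-to-end flow might look like the following; every identifier in the `declare class` is an assumption standing in for the real API.

```ts
// Hypothetical end-to-end usage; all identifiers are assumptions.
declare class Tokenizer {
  constructor(maxVocabSize: number);
  fit(texts: string[]): void;
  tokenize(raw: string): number[];
  toJSON(): object;
  static load(obj: object): Tokenizer;
}

const tokenizer = new Tokenizer(1000);
tokenizer.fit(['to be or not to be', 'to do is to be']);
// 'to' and 'be' are the most frequent tokens across the fitting
// strings, so they receive the smallest indices.
const sequence = tokenizer.tokenize('to be or not');
const restored = Tokenizer.load(JSON.parse(JSON.stringify(tokenizer.toJSON())));
```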