A token is the fundamental unit of text that a large language model (LLM) reads, processes, and generates — roughly equivalent to a word, a word fragment, or a punctuation character, depending on how the model has been trained to segment language.
When you send a message to an AI system such as GPT-4 or Claude, the text is not processed word by word or character by character. Instead, it is first broken down into tokens by a component called a tokenizer. Common words like "the" or "is" typically map to a single token, while longer or less frequent words are often split into multiple tokens. The word "tokenization," for example, might be represented as two or three tokens. Numbers, punctuation marks, and whitespace are also counted as tokens, which means the total token count of a given text is rarely identical to its word count.
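The splitting step can be illustrated with a toy greedy longest-match tokenizer. Everything here is invented for illustration: the vocabulary below is hypothetical and tiny, whereas real tokenizers learn vocabularies of tens of thousands of entries from training data.

```python
# Illustrative only: a toy greedy longest-match tokenizer over a tiny,
# invented vocabulary. Real tokenizers learn their vocabularies from data.
TOY_VOCAB = {"the", "is", "token", "ization", "izer", " ", ",", "."}

def toy_tokenize(text: str) -> list[str]:
    """Split text by greedily matching the longest vocabulary entry,
    falling back to single characters for anything unknown."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible match first.
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in TOY_VOCAB:
                tokens.append(piece)
                i += length
                break
        else:
            tokens.append(text[i])  # unknown character becomes its own token
            i += 1
    return tokens

print(toy_tokenize("tokenization is the tokenizer."))
# → ['token', 'ization', ' ', 'is', ' ', 'the', ' ', 'token', 'izer', '.']
```

Note how common words ("the", "is") come out as single tokens while "tokenization" splits into two, and how the spaces and period each count as a token of their own, so the token count exceeds the word count.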
Understanding tokens matters in practice because LLMs operate within a context window — a hard limit on the total number of tokens the model can consider at one time. This window encompasses both the input you provide (the prompt) and the output the model generates (the completion). A model with a 128,000-token context window can process roughly 90,000 to 100,000 words in a single interaction, though exact figures vary by language and content type. Once that limit is reached, earlier parts of the conversation must be truncated or dropped, so the model can no longer reference them — which can hurt coherence in long exchanges.
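The budget arithmetic above can be sketched in a few lines. The 0.75 words-per-token figure is a common rule of thumb for English text, not a property of any specific model, and real applications should count tokens with the model's own tokenizer rather than estimate.

```python
# A minimal budget sketch, assuming the rough English heuristic of
# about 0.75 words per token (figures vary by language and content).
CONTEXT_WINDOW = 128_000  # tokens; e.g. a 128k-token model

def max_completion_tokens(prompt_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the model's output once the prompt is counted,
    since prompt and completion share the same window."""
    return max(window - prompt_tokens, 0)

def rough_word_estimate(tokens: int, words_per_token: float = 0.75) -> int:
    """Very rough word-count estimate for an English token budget."""
    return int(tokens * words_per_token)

print(max_completion_tokens(100_000))  # → 28000 tokens left for the output
print(rough_word_estimate(128_000))    # → 96000 (within the 90k-100k range)
```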
Token limits are also directly tied to cost. Most AI APIs, including those offered by OpenAI and Anthropic, charge per token consumed — counting both input tokens and output tokens separately. This makes token awareness an important consideration in prompt engineering, where practitioners aim to craft instructions that are precise and concise without sacrificing the information the model needs to perform well.
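Since input and output tokens are billed separately, estimating the cost of a request is a simple calculation. The per-token prices below are hypothetical placeholders, not any provider's actual rates; real pricing is usually quoted per million tokens and differs by model.

```python
# Hypothetical prices for illustration only; real API pricing varies
# by provider and model. Output tokens often cost more than input tokens.
INPUT_PRICE_PER_MTOK = 3.00    # dollars per 1M input tokens (assumed)
OUTPUT_PRICE_PER_MTOK = 15.00  # dollars per 1M output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call, billing input and output tokens separately."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# A 2,000-token prompt producing a 500-token completion:
print(f"${request_cost(2_000, 500):.4f}")  # → $0.0135
```

This is why trimming a verbose prompt matters at scale: the same savings apply to every request, and input tokens often dominate when large reference material is included.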
The tokenization process itself is not universal. Different models use different tokenization schemes. GPT models, for instance, use a method called Byte Pair Encoding (BPE), which builds a vocabulary of common character sequences from training data. This means the same sentence can produce a different token count depending on which model processes it. Tools such as OpenAI's Tokenizer playground allow developers to inspect exactly how a given piece of text is split before it reaches the model.
For developers building applications on top of LLMs, and for marketers or content teams using AI writing tools, token limits shape what is possible in a single request. Summarizing a long document, maintaining a multi-turn conversation, or injecting large amounts of reference material into a prompt all require careful management of token budgets. As models evolve, context windows have grown substantially, but the token remains the core unit through which all language model interactions are measured and priced.
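One common form of token-budget management is splitting long reference material into chunks that each fit a budget. The sketch below is illustrative: `count_tokens` is a crude word-count stand-in (assuming roughly 1.3 tokens per English word), where a real application would count with the target model's own tokenizer.

```python
# A hedged sketch of token-budget management. count_tokens is a crude
# stand-in (about 1.3 tokens per word, assumed); real code should use
# the model's actual tokenizer for exact counts.
def count_tokens(text: str) -> int:
    return int(len(text.split()) * 1.3) + 1  # rough overestimate

def chunk_by_budget(words: list[str], budget_tokens: int) -> list[str]:
    """Greedily pack words into chunks that each stay within the budget."""
    chunks, current = [], []
    for word in words:
        candidate = current + [word]
        if count_tokens(" ".join(candidate)) > budget_tokens and current:
            chunks.append(" ".join(current))  # close the full chunk
            current = [word]
        else:
            current = candidate
    if current:
        chunks.append(" ".join(current))
    return chunks

doc = ("word " * 1000).split()  # stand-in for a long document
chunks = chunk_by_budget(doc, budget_tokens=300)
print(len(chunks), [count_tokens(c) for c in chunks])
```

Each chunk can then be summarized or processed in a separate request, with the per-chunk results combined afterwards — a standard workaround when a document exceeds the context window.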