LLM Core Concepts
Introduction
This page explains the core concepts to help you use Large Language Models (LLMs) more effectively.
Overview:
Evolution of LLMs
Core terms and concepts
1. Evolution of LLMs: a High-level History
LLMs existed before 2022 in research and developer APIs, but they became widely accessible to the public with the release of ChatGPT.
2022: Just LLMs
LLMs are made available to the wider public (e.g., ChatGPT)
Models are trained on large datasets and then frozen at a knowledge cut-off date.
Example:
Model released: March 2026
Knowledge cut-off: November 2025
The model does not know events after that date unless given external data.
LLMs are inherently non-deterministic:
The same prompt can produce different outputs
Outputs are probabilistic predictions of the next token.
Examples:
An LLM cannot reliably perform exact calculations the way a calculator can.
It cannot reliably count the letter 'r' in "strawberry" - it makes a probabilistic guess.
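The "strawberry" case is trivial for deterministic code; a short sketch of the contrast:

```python
def count_letter(text: str, letter: str) -> int:
    """Deterministic counting: the same input always gives the same
    output, unlike an LLM's probabilistic next-token prediction."""
    return text.count(letter)

# A plain string method answers exactly; an LLM may guess 2 or 3.
print(count_letter("strawberry", "r"))  # → 3
```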
Typical flow: prompt → LLM → response
2023: LLMs + Tools
LLMs begin integrating with deterministic external tools:
Calculators
APIs
Databases
Search
The LLM decides whether a prompt requires:
reasoning only, or
an external tool call.
An LLM uses (or should use) tools when any of the following is required:
Exactness (math, dates, counts)
Freshness (current events)
Execution (code, simulations)
Grounding (retrieval, verification)
Tools return precise outputs, which the LLM then incorporates into the response.
Strictly speaking, it is an orchestrator coordinating the LLM and its tools that manages this flow, not the model itself.
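A minimal sketch of that routing decision. The `orchestrate` function, the keyword heuristic, and the stubbed LLM answer are all illustrative, not a real API:

```python
import re

def calculator(expression: str) -> str:
    # Deterministic tool: exact arithmetic, unlike token prediction.
    return str(eval(expression, {"__builtins__": {}}))

def orchestrate(prompt: str) -> str:
    """Hypothetical orchestrator: routes math prompts to a tool,
    everything else to the LLM alone."""
    match = re.search(r"[\d\s+\-*/().]+$", prompt)
    if "calculate" in prompt.lower() and match:
        result = calculator(match.group().strip())
        # The tool's exact output is folded into the final answer.
        return f"The result is {result}."
    return "LLM reasoning-only answer (stubbed)."

print(orchestrate("Please calculate 12 * (3 + 4)"))
```

A real orchestrator would let the model itself decide which tool to call; the keyword check here just keeps the sketch self-contained.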
New flow: prompt → LLM → tool call → tool result → LLM → response
2024: LLMs + Tools in a loop
Instead of one tool call, the model can iteratively reason and act.
The system alternates between:
reasoning
tool invocation
observing results
This pattern is often called ReAct (Reason + Act).
New alternative flow where needed: prompt → loop of [reason → tool call → observe] → response
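The ReAct loop can be sketched as below. `llm_step` is a scripted stand-in for a real model call, and every name is illustrative:

```python
def react_loop(task, tools, llm_step, max_steps=5):
    """Minimal ReAct sketch: alternate reasoning and tool calls until
    the model emits a final answer."""
    observations = []
    for _ in range(max_steps):
        thought, action, arg = llm_step(task, observations)  # reason
        if action == "finish":
            return arg
        result = tools[action](arg)   # act: invoke the chosen tool
        observations.append(result)   # observe: feed result back
    return "stopped: step budget exhausted"

def scripted_llm(task, observations):
    # Stand-in for the model: first call a tool, then finish.
    if not observations:
        return ("need the count", "count_r", "strawberry")
    return ("done", "finish", f"'strawberry' has {observations[0]} r's")

tools = {"count_r": lambda word: word.count("r")}
print(react_loop("How many r's in strawberry?", tools, scripted_llm))
```

The `max_steps` cap matters in practice: without it, a confused model can loop on tool calls indefinitely.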
2025 onwards: AI Agents with skills
Systems evolve into AI agents capable of handling multi-step tasks.
Agents can execute workflows such as research, coding, data analysis, and task automation.
Typical capabilities:
Planning: break a task into steps.
Skill selection: choose the right tool or capability.
Execution: run one or more specialized tools or sub-agents.
Memory: store and retrieve intermediate knowledge.
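The four capabilities above can be sketched in a toy agent. Every class, method, and skill name here is hypothetical; real agents delegate planning and skill selection to the model:

```python
class Agent:
    """Toy agent sketch: plans steps, selects a skill per step,
    executes it, and keeps intermediate results in memory."""

    def __init__(self, skills):
        self.skills = skills   # pool for skill selection
        self.memory = {}       # intermediate knowledge

    def plan(self, task):
        # A real agent would ask an LLM to decompose the task;
        # here the plan is fixed for the sketch.
        return ["research", "summarize"]

    def run(self, task):
        for step in self.plan(task):                        # planning
            skill = self.skills[step]                       # skill selection
            self.memory[step] = skill(task, self.memory)    # execution + memory
        return self.memory["summarize"]

skills = {
    "research": lambda task, mem: ["fact A", "fact B"],
    "summarize": lambda task, mem: " / ".join(mem["research"]),
}
print(Agent(skills).run("write a brief"))  # → fact A / fact B
```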
2. Core Terminology and Concepts
Prompt Engineering
The practice of writing and structuring prompts to achieve higher-quality output. In other words: "ask better questions, get better answers".
Most structures revolve around RTCF: Role - Context - Task - Format
Role
You are a professional tester, skilled in Black-Box techniques, Test Heuristics, etc.
Context
Any relevant input - documents, remote data, plain text prompt, etc.
Task
Ask questions, find gaps and contradictions in the requirements, produce test cases, file bug reports, etc. The task may also specify which tools and methods to use.
Format
Produce test cases in the following table format... Write a bug report following this template...
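A minimal helper that assembles the four RTCF parts into one prompt (the function name and example values are illustrative):

```python
def rtcf_prompt(role, context, task, fmt):
    """Assemble a prompt following the RTCF structure."""
    return (
        f"Role: {role}\n"
        f"Context: {context}\n"
        f"Task: {task}\n"
        f"Format: {fmt}"
    )

prompt = rtcf_prompt(
    role="You are a professional tester skilled in black-box techniques.",
    context="Requirements document for the login feature (attached).",
    task="Find gaps and contradictions in the requirements.",
    fmt="A numbered list, one issue per line.",
)
print(prompt)
```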
Alternative conceptually similar formats or mnemonics exist, such as:
AIM: Actor - Input - Mission
MAAP: Memory - Assets - Actions - Prompt
System Prompt
Special instructions that set the model’s role, style, or rules for consistent output across chats/sessions.
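In chat-style APIs the system prompt is usually sent as a separate message. The `role`/`content` field names below follow the common OpenAI-style convention; other providers may differ:

```python
# The system message applies to every turn of the conversation,
# while user messages change turn by turn.
messages = [
    {"role": "system",
     "content": "You are a concise QA assistant. Always answer in bullet points."},
    {"role": "user",
     "content": "How should I retest a fixed bug?"},
]
```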
Prompt techniques
Zero-shot
Just the task, no examples
Simple or common tasks
Fast, cheap, and minimal prompting, but lower reliability.
Few-shot
Task + a few examples
Patterned or format-sensitive tasks
Better accuracy and consistency, but uses more context.
Chain-of-thought (CoT)
Ask for step-by-step reasoning
Multi-step reasoning, logic, math
Improved reasoning, but can become verbose and use more tokens.
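The three techniques differ only in how the prompt text is built; a small illustration (the example task is made up):

```python
task = "Classify the sentiment of: 'The build failed again.'"

# Zero-shot: the task only -- cheapest, least reliable.
zero_shot = task

# Few-shot: examples teach the expected pattern and output format.
few_shot = (
    "Review: 'Great release!' -> positive\n"
    "Review: 'Crashes on start.' -> negative\n"
    + task
)

# Chain-of-thought: explicitly request step-by-step reasoning.
chain_of_thought = task + "\nThink step by step before answering."

for name, p in [("zero-shot", zero_shot),
                ("few-shot", few_shot),
                ("chain-of-thought", chain_of_thought)]:
    print(f"--- {name} ---\n{p}\n")
```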
RAG: Retrieval-Augmented Generation
In short: providing extra data for the LLM to consider before answering.
Retrieved documents are inserted into the prompt context.
The model itself is not retrained; the retrieved data is simply added to the prompt.
"Manual" RAG example: attach documents to the chat that the LLM must take into account.
Enterprise RAG: Give LLMs access to internal Wikis, APIs, and Databases to query (retrieve) for better, context-aware answers.
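A toy end-to-end sketch: a keyword-overlap retriever (real systems use embeddings and a vector index) whose top result is pasted into the prompt. All names and documents are made up:

```python
def retrieve(query, documents, k=1):
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(query, documents):
    context = "\n".join(retrieve(query, documents))
    # Retrieved text is simply prepended -- the model is not retrained.
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Release 2.1 ships on 14 June.",
    "The test environment resets nightly.",
]
print(build_rag_prompt("When does release 2.1 ship?", docs))
```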
Context Window
The maximum amount of text (tokens) an LLM can consider at once. The context window typically contains:
Current and recent messages (prompts)
Retrieved docs if any (RAG)
Tool outputs
When the limit is exceeded:
Older tokens are removed from the conversation context.
The LLM gradually loses earlier information.
This can lead to:
contradictions
forgotten constraints
degraded output quality
RAG consumes part of the context window, so large retrieved documents can reduce the amount of conversation history the model can remember.
Therefore, to maintain accuracy:
Restate important constraints + key requirements succinctly in bullet points
Summarize long discussions (or have the LLM do it for you) and start a new chat
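The truncation behaviour described above can be sketched like this; the word-count `count_tokens` is a crude stand-in for a real tokenizer:

```python
def trim_to_window(messages, max_tokens,
                   count_tokens=lambda m: len(m.split())):
    """Sketch of context-window trimming: drop the oldest messages
    until the total fits the token budget."""
    trimmed = list(messages)
    while trimmed and sum(count_tokens(m) for m in trimmed) > max_tokens:
        trimmed.pop(0)  # oldest context is lost first
    return trimmed

history = [
    "early constraint: use UTC everywhere",
    "long design discussion " * 10,
    "latest question: what timezone?",
]
# Over budget: the earliest constraint is silently dropped,
# which is exactly how models "forget" early instructions.
print(trim_to_window(history, max_tokens=35))
```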