LLM Core Concepts
Introduction
This page explains the core concepts to help you use Large Language Models (LLMs) more effectively.
Overview:
Evolution of LLMs
Core terms and concepts
1. Evolution of LLMs: a High-level History
LLMs existed before 2022 in research and developer APIs, but they became widely accessible to the public with the release of ChatGPT.
2022: Just LLMs
LLMs are made available to the wider public (e.g., ChatGPT)
Models are trained on large datasets and then frozen at a knowledge cut-off date.
Example:
Model released: March 2026
Knowledge cut-off: November 2025
The model does not know events after that date unless given external data.
LLMs are inherently non-deterministic:
The same prompt can produce different outputs
Outputs are probabilistic predictions of the next token.
Examples:
An LLM cannot reliably perform exact calculations the way a calculator can.
It cannot reliably count the letter 'r' in "strawberry" - it makes a probabilistic guess.
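The "strawberry" case is trivial for deterministic code; a short sketch of the contrast:

```python
def count_letter(text: str, letter: str) -> int:
    """Deterministic counting: the same input always gives the same
    output, unlike an LLM's probabilistic next-token prediction."""
    return text.count(letter)

# A plain string method answers exactly; an LLM may guess 2 or 3.
print(count_letter("strawberry", "r"))  # → 3
```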
Typical flow: prompt → LLM → response
2023: LLMs + Tools
LLMs begin integrating with deterministic external tools:
Calculators
APIs
Databases
Search
The LLM decides whether a prompt requires:
reasoning only, or
an external tool call.
An LLM uses (or should use) tools when any of the following is required:
Exactness (math, dates, counts)
Freshness (current events)
Execution (code, simulations)
Grounding (retrieval, verification)
Tools return precise outputs, which the LLM then incorporates into the response.
Strictly speaking, it is an orchestrator coordinating the LLM and its tools that manages this flow, not the model itself.
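A minimal sketch of that routing decision. The `orchestrate` function, the keyword heuristic, and the stubbed LLM answer are all illustrative, not a real API:

```python
import re

def calculator(expression: str) -> str:
    # Deterministic tool: exact arithmetic, unlike token prediction.
    return str(eval(expression, {"__builtins__": {}}))

def orchestrate(prompt: str) -> str:
    """Hypothetical orchestrator: routes math prompts to a tool,
    everything else to the LLM alone."""
    match = re.search(r"[\d\s+\-*/().]+$", prompt)
    if "calculate" in prompt.lower() and match:
        result = calculator(match.group().strip())
        # The tool's exact output is folded into the final answer.
        return f"The result is {result}."
    return "LLM reasoning-only answer (stubbed)."

print(orchestrate("Please calculate 12 * (3 + 4)"))
```

A real orchestrator would let the model itself decide which tool to call; the keyword check here just keeps the sketch self-contained.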
New flow: prompt → LLM → tool call → tool result → LLM → response
2024: LLMs + Tools in a loop
Instead of one tool call, the model can iteratively reason and act.
The system alternates between:
reasoning
tool invocation
observing results
This pattern is often called ReAct (Reason + Act).
New alternative flow where needed: prompt → loop of [reason → tool call → observe] → response
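The ReAct loop can be sketched as below. `llm_step` is a scripted stand-in for a real model call, and every name is illustrative:

```python
def react_loop(task, tools, llm_step, max_steps=5):
    """Minimal ReAct sketch: alternate reasoning and tool calls until
    the model emits a final answer."""
    observations = []
    for _ in range(max_steps):
        thought, action, arg = llm_step(task, observations)  # reason
        if action == "finish":
            return arg
        result = tools[action](arg)   # act: invoke the chosen tool
        observations.append(result)   # observe: feed result back
    return "stopped: step budget exhausted"

def scripted_llm(task, observations):
    # Stand-in for the model: first call a tool, then finish.
    if not observations:
        return ("need the count", "count_r", "strawberry")
    return ("done", "finish", f"'strawberry' has {observations[0]} r's")

tools = {"count_r": lambda word: word.count("r")}
print(react_loop("How many r's in strawberry?", tools, scripted_llm))
```

The `max_steps` cap matters in practice: without it, a confused model can loop on tool calls indefinitely.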
2025 onwards: AI Agents with skills
Systems evolve into AI agents capable of handling multi-step tasks.
Agents can execute workflows such as research, coding, data analysis, and task automation.
Typical capabilities:
Planning: break a task into steps.
Skill selection: choose the right tool or capability.
Execution: run one or more specialized tools or sub-agents.
Memory: store and retrieve intermediate knowledge.
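The four capabilities above can be sketched in a toy agent. Every class, method, and skill name here is hypothetical; real agents delegate planning and skill selection to the model:

```python
class Agent:
    """Toy agent sketch: plans steps, selects a skill per step,
    executes it, and keeps intermediate results in memory."""

    def __init__(self, skills):
        self.skills = skills   # pool for skill selection
        self.memory = {}       # intermediate knowledge

    def plan(self, task):
        # A real agent would ask an LLM to decompose the task;
        # here the plan is fixed for the sketch.
        return ["research", "summarize"]

    def run(self, task):
        for step in self.plan(task):                        # planning
            skill = self.skills[step]                       # skill selection
            self.memory[step] = skill(task, self.memory)    # execution + memory
        return self.memory["summarize"]

skills = {
    "research": lambda task, mem: ["fact A", "fact B"],
    "summarize": lambda task, mem: " / ".join(mem["research"]),
}
print(Agent(skills).run("write a brief"))  # → fact A / fact B
```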
2. Core Terminology and Concepts
Prompt Engineering
The practice of writing and structuring prompts to achieve higher-quality output. In other words: "ask better questions, get better answers".
Most structures revolve around RTCF: Role - Context - Task - Format
Role
You are a professional tester, skilled in Black-Box techniques, Test Heuristics, etc.
Context
Any relevant input - documents, remote data, plain text prompt, etc.
Task
Ask questions, find gaps and contradictions in the requirements, produce test cases, file bug reports, etc. The task may also specify which tools and methods to use.
Format
Produce test cases in the following table format... Write a bug report following this template...
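A minimal helper that assembles the four RTCF parts into one prompt (the function name and example values are illustrative):

```python
def rtcf_prompt(role, context, task, fmt):
    """Assemble a prompt following the RTCF structure."""
    return (
        f"Role: {role}\n"
        f"Context: {context}\n"
        f"Task: {task}\n"
        f"Format: {fmt}"
    )

prompt = rtcf_prompt(
    role="You are a professional tester skilled in black-box techniques.",
    context="Requirements document for the login feature (attached).",
    task="Find gaps and contradictions in the requirements.",
    fmt="A numbered list, one issue per line.",
)
print(prompt)
```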
Alternative conceptually similar formats or mnemonics exist, such as:
AIM: Actor - Input - Mission
MAAP: Memory - Assets - Actions - Prompt
System Prompt
Special instructions that set the model’s role, style, or rules for consistent output across chats/sessions.
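In chat-style APIs the system prompt is usually sent as a separate message. The `role`/`content` field names below follow the common OpenAI-style convention; other providers may differ:

```python
# The system message applies to every turn of the conversation,
# while user messages change turn by turn.
messages = [
    {"role": "system",
     "content": "You are a concise QA assistant. Always answer in bullet points."},
    {"role": "user",
     "content": "How should I retest a fixed bug?"},
]
```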
Prompt techniques
Zero-shot
Just the task, no examples
Simple or common tasks
Fast, cheap, and minimal prompting, but lower reliability.
Few-shot
Task + a few examples
Patterned or format-sensitive tasks
Better accuracy and consistency, but uses more context.
Chain-of-thought (CoT)
Ask for step-by-step reasoning
Multi-step reasoning, logic, math
Improved reasoning, but can become verbose and use more tokens.
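The three techniques differ only in how the prompt text is built; a small illustration (the example task is made up):

```python
task = "Classify the sentiment of: 'The build failed again.'"

# Zero-shot: the task only -- cheapest, least reliable.
zero_shot = task

# Few-shot: examples teach the expected pattern and output format.
few_shot = (
    "Review: 'Great release!' -> positive\n"
    "Review: 'Crashes on start.' -> negative\n"
    + task
)

# Chain-of-thought: explicitly request step-by-step reasoning.
chain_of_thought = task + "\nThink step by step before answering."

for name, p in [("zero-shot", zero_shot),
                ("few-shot", few_shot),
                ("chain-of-thought", chain_of_thought)]:
    print(f"--- {name} ---\n{p}\n")
```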
RAG: Retrieval-Augmented Generation
In short: providing extra data for the LLM to consider before answering.
Retrieved documents are inserted into the prompt context.
The model itself is not retrained; the retrieved data is simply added to the prompt.
"Manual" RAG example: attach documents to the chat that the LLM must take into account.
Enterprise RAG: Give LLMs access to internal Wikis, APIs, and Databases to query (retrieve) for better, context-aware answers.
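A toy end-to-end sketch: a keyword-overlap retriever (real systems use embeddings and a vector index) whose top result is pasted into the prompt. All names and documents are made up:

```python
def retrieve(query, documents, k=1):
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(query, documents):
    context = "\n".join(retrieve(query, documents))
    # Retrieved text is simply prepended -- the model is not retrained.
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Release 2.1 ships on 14 June.",
    "The test environment resets nightly.",
]
print(build_rag_prompt("When does release 2.1 ship?", docs))
```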
Context Window
The maximum amount of text (tokens) an LLM can consider at once. The context window typically contains:
Current and recent messages (prompts)
Retrieved docs if any (RAG)
Tool outputs
When the limit is exceeded:
Older tokens are removed from the conversation context.
The LLM gradually loses earlier information.
This can lead to:
contradictions
forgotten constraints
degraded output quality
RAG consumes part of the context window, so large retrieved documents can reduce the amount of conversation history the model can remember.
Therefore, to maintain accuracy:
Restate important constraints + key requirements succinctly in bullet points
Summarize long discussions (or have the LLM do it for you) and start a new chat
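The truncation behaviour described above can be sketched like this; the word-count `count_tokens` is a crude stand-in for a real tokenizer:

```python
def trim_to_window(messages, max_tokens,
                   count_tokens=lambda m: len(m.split())):
    """Sketch of context-window trimming: drop the oldest messages
    until the total fits the token budget."""
    trimmed = list(messages)
    while trimmed and sum(count_tokens(m) for m in trimmed) > max_tokens:
        trimmed.pop(0)  # oldest context is lost first
    return trimmed

history = [
    "early constraint: use UTC everywhere",
    "long design discussion " * 10,
    "latest question: what timezone?",
]
# Over budget: the earliest constraint is silently dropped,
# which is exactly how models "forget" early instructions.
print(trim_to_window(history, max_tokens=35))
```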