LLM Core Concepts

Introduction

This page explains the core concepts to help you use Large Language Models (LLMs) more effectively.

Overview:

  1. Evolution of LLMs

  2. Core terms and concepts

1. Evolution of LLMs: A High-Level History


LLMs existed before 2022 in research and developer APIs, but they became widely accessible to the public with the release of ChatGPT.

2022: Just LLMs

  • LLMs are made available to the wider public (e.g., ChatGPT)

  • Models are trained on large datasets and then frozen at a knowledge cut-off date.

  • Example:

    • Model released: March 2026

    • Knowledge cut-off: November 2025

    • The model does not know events after that date unless given external data.

  • LLMs are inherently non-deterministic:

    • The same prompt can produce different outputs

    • Outputs are probabilistic predictions of the next token.

    • Examples:

      • It cannot make accurate, reliable calculations like a calculator

      • It cannot reliably count the letters 'r' in "strawberry"; it makes a probabilistic guess instead.
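The probabilistic next-token behaviour can be sketched with a toy sampler. The vocabulary and probabilities below are invented for illustration; real models sample from distributions over tens of thousands of tokens:

```python
import random

# Toy next-token distribution for the prompt "The capital of France is"
# (probabilities are made up for illustration).
next_token_probs = {"Paris": 0.90, "a": 0.05, "the": 0.03, "Lyon": 0.02}

def sample_next_token(probs: dict[str, float]) -> str:
    """Sample one token according to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]

# The same prompt can yield different tokens across runs:
samples = [sample_next_token(next_token_probs) for _ in range(10)]
print(samples)  # usually mostly "Paris", but occasionally something else
```

This is why the same prompt can produce different outputs, and why exact counting or arithmetic is unreliable without tools.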

Typical flow: Prompt → LLM → Response (generated from frozen training data)


2023: LLMs + Tools

  • LLMs begin integrating with deterministic external tools:

    • Calculators

    • APIs

    • Databases

    • Search

  • The LLM decides whether a prompt requires:

    • reasoning only, or

    • an external tool call.

  • The LLM uses (or should use) tools when any of the following is required:

    • Exactness (math, dates, counts)

    • Freshness (current events)

    • Execution (code, simulations)

    • Grounding (retrieval, verification)

  • Tools return precise outputs, which the LLM then incorporates into the response.

  • Strictly speaking, it is the orchestrator (the software coordinating the LLM and its tools) that manages this flow, not the LLM itself.
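The tool-augmented flow above can be sketched as a minimal orchestrator. The routing rule (a digit check standing in for the LLM's decision) and the calculator tool are simplified assumptions, not a real framework API:

```python
import ast
import operator

# A deterministic "calculator" tool: safely evaluates arithmetic like "17 * 23".
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expr: str) -> float:
    def ev(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def orchestrate(prompt: str) -> str:
    # Stand-in for the LLM's routing decision: does this need exactness?
    if any(ch.isdigit() for ch in prompt):
        expr = prompt.rstrip("?").split("is")[-1].strip()
        result = calculator(expr)          # tool call
        return f"The answer is {result}."  # LLM incorporates the tool output
    return "LLM answers from its own knowledge."

print(orchestrate("What is 17 * 23?"))  # → The answer is 391.
```

The key point: the exact arithmetic is done by the deterministic tool, and the orchestrator feeds its output back into the response.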

New flow: Prompt → LLM → Tool call → Tool output → LLM → Response


2024: LLMs + Tools in a loop

  • Instead of one tool call, the model can iteratively reason and act.

  • The system alternates between:

    • reasoning

    • tool invocation

    • observing results

  • This pattern is often called ReAct (Reason + Act).
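A minimal ReAct-style loop might look like the sketch below. The `reason` step is a hard-coded stand-in for the model, and the `search` tool is invented for illustration:

```python
# Each iteration: reason -> act (call a tool) -> observe the result.
def search(query: str) -> str:
    # Invented stand-in for a real search tool.
    return "Mount Everest is 8,849 m tall."

def reason(goal: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM: decide the next action, or finish."""
    if not observations:
        return ("search", goal)            # need fresh data first
    return ("finish", observations[-1])    # enough information gathered

def react_loop(goal: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        action, arg = reason(goal, observations)
        if action == "finish":
            return arg
        observations.append(search(arg))   # act, then observe
    return "gave up"

print(react_loop("How tall is Mount Everest?"))
```

The cap on iterations (`max_steps`) matters in real systems too: without it, a model that never decides to finish would loop forever.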

New alternative flow where needed: Prompt → loop of (Reason → Act → Observe) → Response


2025 onwards: AI Agents with skills

  • Systems evolve into AI agents capable of handling multi-step tasks.

  • Agents can execute workflows such as research, coding, data analysis, and task automation.

  • Typical capabilities:

    • Planning: break a task into steps.

    • Skill selection: choose the right tool or capability.

    • Execution: run one or more specialized tools or sub-agents.

    • Memory: store and retrieve intermediate knowledge.
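Put together, the four capabilities above can be sketched as a toy agent. The plan, the skills, and the task are all invented for illustration; real agents delegate planning and skill selection to the model:

```python
# Toy agent: plan -> select skill -> execute -> remember.
def fetch_data(topic: str) -> str:       # invented skill
    return f"raw data about {topic}"

def summarize(text: str) -> str:         # invented skill
    return f"summary of {text}"

SKILLS = {"fetch": fetch_data, "summarize": summarize}

def plan(task: str) -> list[str]:
    """Planning: break the task into ordered steps (hard-coded here)."""
    return ["fetch", "summarize"]

def run_agent(task: str) -> str:
    memory: list[str] = []               # store intermediate results
    result = task
    for step in plan(task):
        skill = SKILLS[step]             # skill selection
        result = skill(result)           # execution
        memory.append(result)            # memory
    return result

print(run_agent("LLM evolution"))  # → summary of raw data about LLM evolution
```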


2. Core Terminology and Concepts

Prompt Engineering

A way to write and structure prompts to achieve higher-quality output. In other words: "ask better questions, get better answers".

Most structures revolve around RTCF: Role - Context - Task - Format

Template parts with examples:

  • Role: "You are a professional tester, skilled in Black-Box techniques, Test Heuristics, etc."

  • Context: any relevant input: documents, remote data, plain-text prompt, etc.

  • Task: "Ask questions, find gaps and contradictions in the requirements, produce test cases, file bug reports, etc." May specify which tools and methods can be used to achieve the task.

  • Format: "Produce test cases in the following table format...", "Write a bug report following this template..."

Alternative conceptually similar formats or mnemonics exist, such as:

  • AIM: Actor - Input - Mission

  • MAAP: Memory - Assets - Actions - Prompt
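The RTCF structure can also be assembled programmatically. The helper below is a simple sketch, not a standard library; the example values are invented:

```python
def rtcf_prompt(role: str, context: str, task: str, fmt: str) -> str:
    """Assemble a Role-Context-Task-Format prompt."""
    return "\n\n".join([
        f"Role: {role}",
        f"Context: {context}",
        f"Task: {task}",
        f"Format: {fmt}",
    ])

prompt = rtcf_prompt(
    role="You are a professional tester, skilled in black-box techniques.",
    context="The attached requirements document for the checkout flow.",
    task="Find gaps and contradictions in the requirements.",
    fmt="A bullet list, one finding per bullet.",
)
print(prompt)
```

Templating like this keeps prompts consistent across a team, which matters more than the exact mnemonic chosen.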


System Prompt

Special instructions that set the model’s role, style, or rules for consistent output across chats/sessions.
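In chat-style APIs, this is typically passed as a message with a `system` role. The structure below mirrors the common messages format, but the exact field names and roles vary by provider:

```python
# The system message persists across turns and shapes every response.
messages = [
    {"role": "system",
     "content": "You are a concise assistant. Always answer in bullet points."},
    {"role": "user", "content": "Explain what a context window is."},
]

def system_instructions(msgs: list[dict]) -> str:
    """Extract the system-level rules the model is operating under."""
    return " ".join(m["content"] for m in msgs if m["role"] == "system")

print(system_instructions(messages))
```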


Prompt techniques

  • Zero-shot
    • What it is: just the task, no examples
    • When to use: simple or common tasks
    • Pros/cons: fast, cheap, minimal prompt, but lower reliability

  • Few-shot
    • What it is: task plus a few examples
    • When to use: patterned or format-sensitive tasks
    • Pros/cons: better accuracy and consistency, but uses more context

  • Chain-of-thought (CoT)
    • What it is: ask for step-by-step reasoning
    • When to use: multi-step reasoning, logic, math
    • Pros/cons: improved reasoning, but may get overly verbose and uses more tokens
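The difference between zero-shot and few-shot is simply what goes into the prompt, as the sketch below shows. The sentiment-classification examples are invented:

```python
def zero_shot(task: str) -> str:
    # Zero-shot: the task alone, no examples.
    return task

def few_shot(task: str, examples: list[tuple[str, str]]) -> str:
    """Few-shot: prepend worked examples so the model imitates the pattern."""
    shots = "\n".join(f"Input: {i}\nOutput: {o}" for i, o in examples)
    return f"{shots}\nInput: {task}\nOutput:"

examples = [("great movie!", "positive"), ("waste of time", "negative")]
print(zero_shot("Classify the sentiment: 'loved it'"))
print(few_shot("loved it", examples))
```

The few-shot variant spends extra context-window tokens on the examples; that trade-off is exactly the "uses more context" con above.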


RAG: Retrieval-Augmented Generation

Basically, providing extra data for the LLM to consider before answering.

  • Retrieved documents are inserted into the prompt context.

  • The model itself is not retrained; the retrieved data is simply added to the prompt.

    • "Manual" RAG example: attach documents to the chat that the LLM must take into account.

    • Enterprise RAG: give LLMs access to internal wikis, APIs, and databases to query (retrieve) for better, context-aware answers.
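A minimal RAG pipeline can be sketched as below. Real systems rank documents with vector embeddings; this sketch substitutes naive keyword overlap, and the document store is invented:

```python
# Tiny document store; real systems use vector embeddings, not word overlap.
docs = [
    "Refunds are processed within 14 days of the return request.",
    "Shipping is free for orders above 50 EUR.",
    "Support is available Monday to Friday, 9:00-17:00.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by shared words with the query (naive retrieval)."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    # Retrieved text is inserted into the prompt; the model is not retrained.
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long do refunds take?"))
```

Note that retrieval changes only the prompt: the model's weights stay frozen, which is the whole point of RAG.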


Context Window

The maximum amount of text (tokens) an LLM can consider at once. The context window typically contains:

  • Current and recent messages (prompts)

  • Retrieved docs if any (RAG)

  • Tool outputs

When the limit is exceeded:

  • Older tokens are removed from the conversation context.

  • The LLM gradually loses earlier information.

This can lead to:

  • contradictions

  • forgotten constraints

  • degraded output quality


Therefore, to maintain accuracy:

  • Restate important constraints and key requirements succinctly in bullet points

  • Summarize long discussions (or have the LLM do it for you) and start a new chat
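The trimming behaviour described above can be illustrated with a sketch that drops the oldest turns once a token budget is exceeded while keeping the system prompt. The word-count tokenizer and the budget are crude stand-ins for real tokenizers and window sizes:

```python
def count_tokens(text: str) -> int:
    # Crude approximation: one token per word; real tokenizers differ.
    return len(text.split())

def fit_to_window(messages: list[dict], budget: int) -> list[dict]:
    """Drop the oldest non-system messages until the budget is met."""
    kept = list(messages)
    def total(msgs):
        return sum(count_tokens(m["content"]) for m in msgs)
    while total(kept) > budget:
        for i, m in enumerate(kept):
            if m["role"] != "system":
                del kept[i]           # oldest non-system message is forgotten
                break
        else:
            break                     # only the system prompt is left
    return kept

history = [
    {"role": "system", "content": "Answer briefly."},
    {"role": "user", "content": "My project uses Python 3.12 and PostgreSQL."},
    {"role": "assistant", "content": "Noted."},
    {"role": "user", "content": "Now write the data access layer."},
]
trimmed = fit_to_window(history, budget=10)
# The early constraint message may be gone, so the model "forgets" it.
print([m["content"] for m in trimmed])
```

Here the message naming the project's stack is the first to go, which is exactly how forgotten constraints and contradictions arise in long chats.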
