# LLM Core Concepts

### Introduction

This page explains the core concepts to help you use Large Language Models (LLMs) more effectively.

Overview:

1. Evolution of LLMs
2. Core terms and concepts

### 1. Evolution of LLMs: A High-Level History

{% hint style="info" %}
LLMs existed before 2022 in research and developer APIs, but they became widely accessible to the public with the release of ChatGPT.
{% endhint %}

#### **2022: Just LLMs**

* LLMs are made available to the wider public (e.g., ChatGPT)
* Models are trained on large datasets and then frozen at a **knowledge cut-off date**.
* Example:
  * Model released: March 2026
  * Knowledge cut-off: November 2025
  * The model does not know events after that date unless given external data.
* LLMs are inherently **non-deterministic:**
  * The same prompt can produce different outputs
  * Outputs are probabilistic predictions of the next token.
  * Examples:
    * It cannot perform exact, reliable calculations the way a calculator does
    * It cannot reliably count the letter 'r' in "strawberry" - it makes a probabilistic guess instead
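The probabilistic nature of next-token prediction can be illustrated with a toy sketch. The token list and probabilities below are made up for demonstration; real models score tens of thousands of tokens per step:

```python
import random

# Toy next-token distribution for the prompt "The sky is" (invented probabilities).
next_token_probs = {"blue": 0.70, "clear": 0.20, "falling": 0.10}

def sample_next_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Two runs with different random states can yield different continuations
# for the exact same prompt - this is the non-determinism described above.
print(sample_next_token(next_token_probs, random.Random(0)))
print(sample_next_token(next_token_probs, random.Random(1)))
```

Over many samples, "blue" dominates, but any run may produce a less likely token.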

Typical flow:

```mermaid
flowchart TD
    A[User Prompt] --> B[LLM]
    B --> C[Probabilistic Answer]
```

***

#### **2023: LLMs + Tools**

* LLMs begin integrating with **deterministic external tools**:
  * Calculators
  * APIs
  * Databases
  * Search
* The LLM decides whether a prompt requires:
  * reasoning only, or
  * an external tool call.
* LLM uses (or should use) tools when any of the following are required:
  * Exactness (math, dates, counts)
  * Freshness (current events)
  * Execution (code, simulations)
  * Grounding (retrieval, verification)
* Tools return **precise outputs**, which the LLM then incorporates into the response.
* Strictly speaking, an **orchestrator** around the LLM manages this flow - the model only indicates which tool to call, and the orchestrator executes it.

New flow:

```mermaid
flowchart TD
    A[User Prompt] --> B[LLM decides if tool needed]
    B -->|No| C[Direct Response]
    B -->|Yes| D[Invoke Tool]
    D --> E[Tool Result]
    E --> F[LLM formats answer]
    F --> G[Response]
```
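The flow above can be sketched as a tiny router. The `calculator` tool and the keyword-based "tool needed?" check are simplifications invented for this sketch; real systems let the model itself emit a structured tool call:

```python
def calculator(expression: str) -> str:
    """Deterministic tool: evaluates a simple arithmetic expression."""
    # eval() is acceptable only in this self-contained sketch; never use it
    # on untrusted input in real code.
    return str(eval(expression, {"__builtins__": {}}, {}))

def llm_answer(prompt: str) -> str:
    """Stand-in for a model call answering from its trained weights alone."""
    return f"(model's best probabilistic answer to: {prompt!r})"

def respond(prompt: str) -> str:
    """Route: exact math goes to the tool, everything else to the model."""
    if any(op in prompt for op in "+-*/"):       # crude 'tool needed?' check
        result = calculator(prompt)
        return f"The exact result is {result}."  # LLM formats the tool output
    return llm_answer(prompt)

print(respond("12345 * 6789"))          # tool path: precise
print(respond("Why is the sky blue?"))  # direct path: probabilistic
```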

***

#### **2024: LLMs + Tools in a loop**

* Instead of one tool call, the model can **iteratively reason and act**.
* The system alternates between:
  * reasoning
  * tool invocation
  * observing results
* This pattern is often called **ReAct (Reason + Act)**.

New flow, with a loop where needed:

```mermaid
flowchart TD
    A[User Goal] --> B[LLM Reasoning]
    B --> C[Choose Action]
    C -->|Call Tool| D[Tool Execution]
    D --> E[Tool Result]
    E --> B
    C -->|Final Answer| F[Return Answer]
```
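The loop above can be sketched as follows. The "policy" here is hard-coded and the search tool returns canned results - both are stand-ins; a real ReAct agent calls an LLM to choose each action:

```python
def search_tool(query: str) -> str:
    """Stand-in for a search tool with canned results."""
    canned = {"capital of France": "Paris"}
    return canned.get(query, "no result")

def scripted_policy(goal: str, observations: list) -> tuple:
    """Decide the next action from what has been observed so far."""
    if not observations:                        # Reason: nothing known yet
        return ("tool", goal)                   # Act: call the search tool
    return ("final", f"The answer is {observations[-1]}.")

def react_loop(goal: str, max_steps: int = 5) -> str:
    observations = []
    for _ in range(max_steps):                  # reason -> act -> observe
        action, arg = scripted_policy(goal, observations)
        if action == "final":
            return arg
        observations.append(search_tool(arg))   # Observe the tool result
    return "step budget exhausted"

print(react_loop("capital of France"))  # → The answer is Paris.
```

Note the `max_steps` guard: production agents always bound the loop so a confused model cannot iterate forever.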

***

#### **2025 onwards: AI Agents with skills**

* Systems evolve into **AI agents** capable of handling multi-step tasks.
* Agents can execute workflows such as research, coding, data analysis, and task automation.
* Typical capabilities:
  * **Planning**: break a task into steps.
  * **Skill selection**: choose the right tool or capability.
  * **Execution**: run one or more specialized tools or sub-agents.
  * **Memory**: store and retrieve intermediate knowledge.

```mermaid
flowchart TD
    A[User Goal] --> B[Agent Planning]
    B --> C[Task Decomposition]
    C --> D[Select Skill or Tool]
    D --> E[Execute Step]
    E --> F[Store Result in Memory]
    F --> G{More Steps Needed?}
    G -->|Yes| C
    G -->|No| H[Final Result]
```
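The planning/execution/memory cycle above can be sketched in a few lines. The skill names, the fixed three-step plan, and the toy data are all invented for illustration; a real agent would ask an LLM to produce the plan and pick skills:

```python
# Toy agent: decompose a goal into steps, run each step with a matching
# "skill", and keep intermediate results in a memory dict.

SKILLS = {
    "fetch": lambda memory: {"raw": [3, 1, 2]},  # pretend data source
    "sort": lambda memory: {"sorted": sorted(memory["raw"])},
    "summarize": lambda memory: {
        "summary": f"{len(memory['sorted'])} items, max {max(memory['sorted'])}"
    },
}

def plan(goal: str) -> list:
    """Task decomposition; a real agent would ask an LLM for this."""
    return ["fetch", "sort", "summarize"]

def run_agent(goal: str) -> dict:
    memory = {}                      # store-and-retrieve scratchpad
    for step in plan(goal):          # loop until no more steps remain
        memory.update(SKILLS[step](memory))
    return memory

result = run_agent("summarize the dataset")
print(result["summary"])  # → 3 items, max 3
```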

***

### 2. Core Terminology and Concepts

#### Prompt Engineering

The practice of writing and structuring prompts to get higher-quality output. In other words: *"ask better questions, get better answers"*.<br>

Most structures revolve around RTCF: Role - Context - Task - Format

<table><thead><tr><th width="177.6363525390625">Template Part</th><th>Example</th></tr></thead><tbody><tr><td><strong>Role</strong></td><td>You are a professional tester, skilled in Black-Box techniques, Test Heuristics, etc.</td></tr><tr><td><strong>Context</strong></td><td>Any relevant input - documents, remote data, plain text prompt, etc.</td></tr><tr><td><strong>Task</strong></td><td>Ask questions, find gaps and contradictions in the requirements, produce test cases, file bug reports, etc.<br>May specify what tools and methods can be used to achieve the task.</td></tr><tr><td><strong>Format</strong></td><td>Produce test cases in the following table format...<br>Write a bug report following this template...</td></tr></tbody></table>
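Assembling an RTCF prompt is just concatenating the four parts in order. The helper below is a sketch; the sample texts follow the table above:

```python
def build_rtcf_prompt(role: str, context: str, task: str, fmt: str) -> str:
    """Join the four RTCF parts into one prompt, separated by blank lines."""
    return "\n\n".join([
        f"Role: {role}",
        f"Context: {context}",
        f"Task: {task}",
        f"Format: {fmt}",
    ])

prompt = build_rtcf_prompt(
    role="You are a professional tester, skilled in black-box techniques.",
    context="The attached requirements document for the checkout flow.",
    task="Find gaps and contradictions in the requirements.",
    fmt="List each finding as a bullet with a severity label.",
)
print(prompt)
```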

Alternative conceptually similar formats or mnemonics exist, such as:

* AIM: Actor - Input - Mission
* MAAP: Memory - Assets - Actions - Prompt

***

#### System Prompt

Special instructions that set the model’s role, style, or rules for consistent output **across chats/sessions.**

***

#### Prompt techniques

<table><thead><tr><th width="130.09088134765625">Technique</th><th width="204.63641357421875">What it is</th><th width="182.45458984375">When to use</th><th width="219.908935546875">Pros / Cons</th></tr></thead><tbody><tr><td>Zero-shot</td><td>Just the task, no examples</td><td>Simple or common tasks</td><td>Fast, cheap, minimal prompt. But lower reliability.</td></tr><tr><td>Few-shot</td><td>Task + a few examples</td><td>Patterned or format-sensitive tasks</td><td>Better accuracy, consistency. But uses more context.</td></tr><tr><td>Chain-of-thought <br>(CoT)</td><td>Ask for step-by-step reasoning</td><td>Multi-step reasoning, logic, math</td><td>Improved reasoning. May get overly verbose, uses more tokens.</td></tr></tbody></table>
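The difference between zero-shot and few-shot is purely in prompt construction. A sketch with an invented sentiment task and made-up examples:

```python
TASK = "Classify the sentiment of the review as positive or negative."

# Invented worked examples; few-shot simply prepends them to the task.
FEW_SHOT_EXAMPLES = [
    ("Great battery life, love it!", "positive"),
    ("Broke after two days.", "negative"),
]

def zero_shot(review: str) -> str:
    """Just the task and the input - no examples."""
    return f"{TASK}\nReview: {review}\nSentiment:"

def few_shot(review: str) -> str:
    """Task plus worked examples, then the input in the same format."""
    shots = "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in FEW_SHOT_EXAMPLES)
    return f"{TASK}\n{shots}\nReview: {review}\nSentiment:"

print(zero_shot("Very noisy fan."))
print("---")
print(few_shot("Very noisy fan."))
```

The few-shot variant costs more context-window tokens (every example is sent with every request) in exchange for more consistent formatting.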

***

#### RAG: Retrieval-Augmented Generation

In short: supplying extra, relevant data for the LLM to consider **before** it answers.

* Retrieved documents are **inserted into the prompt context.**
* The model itself is **not retrained**; the retrieved data is simply added to the prompt.
  * "Manual" RAG example: attach documents to the chat that the LLM must take into account.
  * Enterprise RAG: Give LLMs access to internal Wikis, APIs, and Databases to query (retrieve) for better, context-aware answers.

```mermaid
flowchart TD
    A[User Question] --> B[Context]

    C[Wiki] --> B
    D[Database] --> B
    E[Documents] --> B
    F[APIs] --> B

    B --> G[LLM]
    G --> H[Answer]

```
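A minimal retrieval step can be sketched with keyword overlap. The documents and the scoring are toy stand-ins; real RAG pipelines use embeddings and a vector store, but the shape - retrieve, then paste into the prompt - is the same:

```python
DOCS = [
    "refund policy: customers may return items within 30 days",
    "shipping: standard delivery takes 3-5 business days",
]

def retrieve(question: str, docs: list) -> str:
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().replace("?", "").split())
    return max(docs, key=lambda d: len(q_words & set(d.split())))

def build_prompt(question: str) -> str:
    """Insert the retrieved document into the prompt context."""
    context = retrieve(question, DOCS)
    return (f"Context: {context}\n\n"
            f"Question: {question}\n"
            f"Answer using only the context above.")

print(build_prompt("How many days do customers have to return items?"))
```

Note that the model is never retrained: the retrieved text travels inside the prompt, exactly as the section above describes.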

***

#### Context Window

The **maximum amount of text (tokens)** an LLM can consider at once. The context window typically contains:

* Current and recent messages (prompts)
* Retrieved docs if any (RAG)
* Tool outputs

When the limit is exceeded:

* Older tokens are **removed from the conversation context**.
* The LLM gradually **loses earlier information**.

This can lead to:

* contradictions
* forgotten constraints
* degraded output quality

{% hint style="warning" %}
RAG consumes part of the **context window**, so large retrieved documents can reduce the amount of conversation history the model can remember.
{% endhint %}

```mermaid
flowchart TB
    subgraph Context_Window["Context Window"]
        direction TB
        A[System Instructions]
        B[Conversation History]
        C[RAG / Retrieved Docs]
        D[Tool Outputs]
        E[Current Prompt]
    end
    Context_Window --> F[LLM Generates Response]
```

Therefore, to maintain accuracy:

* Restate important constraints and key requirements succinctly in bullet points
* Summarize long discussions (or have the LLM summarize for you) and start a fresh chat
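The "older tokens are removed" behavior can be sketched as a trimming function. Counting whitespace-separated words is a crude stand-in for a real tokenizer, and the messages are invented:

```python
def count_tokens(message: str) -> int:
    """Crude token count; real APIs count model-specific tokens."""
    return len(message.split())

def trim_history(messages: list, budget: int) -> list:
    """Keep the newest messages that fit within the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        cost = count_tokens(msg)
        if used + cost > budget:
            break                   # everything older is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [
    "please always answer in French",  # early constraint...
    "here is a very long discussion about something else entirely",
    "what is the capital of Spain?",
]
# With a small budget, the earliest constraint silently falls out of
# context - exactly the "forgotten constraints" failure described above.
print(trim_history(history, budget=12))
```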
