# LLM Core Concepts

### Introduction

This page explains the core concepts to help you use Large Language Models (LLMs) more effectively.

Overview:

1. Evolution of LLMs
2. Core terms and concepts

### 1. Evolution of LLMs: A High-Level History

{% hint style="info" %}
LLMs existed before 2022 in research and developer APIs, but they became widely accessible to the public with the release of ChatGPT.
{% endhint %}

#### **2022: Just LLMs**

* LLMs are made available to the wider public (e.g., ChatGPT)
* Models are trained on large datasets and then frozen at a **knowledge cut-off date**.
* Example:
  * Model released: March 2026
  * Knowledge cut-off: November 2025
  * The model does not know events after that date unless given external data.
* LLMs are inherently **non-deterministic:**
  * The same prompt can produce different outputs
  * Outputs are probabilistic predictions of the next token.
  * Examples:
    * It cannot perform exact, reliable calculations the way a calculator does
    * It cannot reliably count the letter 'r' in "strawberry" - it makes a probabilistic guess instead
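The probabilistic nature of next-token prediction can be illustrated with a toy sketch. The token list and probabilities below are made up for demonstration; real models score tens of thousands of tokens per step:

```python
import random

# Toy next-token distribution for the prompt "The sky is" (invented probabilities).
next_token_probs = {"blue": 0.70, "clear": 0.20, "falling": 0.10}

def sample_next_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Two runs with different random states can yield different continuations
# for the exact same prompt - this is the non-determinism described above.
print(sample_next_token(next_token_probs, random.Random(0)))
print(sample_next_token(next_token_probs, random.Random(1)))
```

Over many samples, "blue" dominates, but any run may produce a less likely token.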

Typical flow:

```mermaid
flowchart TD
    A[User Prompt] --> B[LLM]
    B --> C[Probabilistic Answer]
```

***

#### **2023: LLMs + Tools**

* LLMs begin integrating with **deterministic external tools**:
  * Calculators
  * APIs
  * Databases
  * Search
* The LLM decides whether a prompt requires:
  * reasoning only, or
  * an external tool call.
* LLM uses (or should use) tools when any of the following are required:
  * Exactness (math, dates, counts)
  * Freshness (current events)
  * Execution (code, simulations)
  * Grounding (retrieval, verification)
* Tools return **precise outputs**, which the LLM then incorporates into the response.
* Strictly speaking, an **orchestrator** around the LLM manages this flow - the model only indicates which tool to call, and the orchestrator executes it.

New flow:

```mermaid
flowchart TD
    A[User Prompt] --> B[LLM decides if tool needed]
    B -->|No| C[Direct Response]
    B -->|Yes| D[Invoke Tool]
    D --> E[Tool Result]
    E --> F[LLM formats answer]
    F --> G[Response]
```
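The flow above can be sketched as a tiny router. The `calculator` tool and the keyword-based "tool needed?" check are simplifications invented for this sketch; real systems let the model itself emit a structured tool call:

```python
def calculator(expression: str) -> str:
    """Deterministic tool: evaluates a simple arithmetic expression."""
    # eval() is acceptable only in this self-contained sketch; never use it
    # on untrusted input in real code.
    return str(eval(expression, {"__builtins__": {}}, {}))

def llm_answer(prompt: str) -> str:
    """Stand-in for a model call answering from its trained weights alone."""
    return f"(model's best probabilistic answer to: {prompt!r})"

def respond(prompt: str) -> str:
    """Route: exact math goes to the tool, everything else to the model."""
    if any(op in prompt for op in "+-*/"):       # crude 'tool needed?' check
        result = calculator(prompt)
        return f"The exact result is {result}."  # LLM formats the tool output
    return llm_answer(prompt)

print(respond("12345 * 6789"))          # tool path: precise
print(respond("Why is the sky blue?"))  # direct path: probabilistic
```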

***

#### **2024: LLMs + Tools in a loop**

* Instead of one tool call, the model can **iteratively reason and act**.
* The system alternates between:
  * reasoning
  * tool invocation
  * observing results
* This pattern is often called **ReAct (Reason + Act)**.

New flow, with a loop where needed:

```mermaid
flowchart TD
    A[User Goal] --> B[LLM Reasoning]
    B --> C[Choose Action]
    C -->|Call Tool| D[Tool Execution]
    D --> E[Tool Result]
    E --> B
    C -->|Final Answer| F[Return Answer]
```
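The loop above can be sketched as follows. The "policy" here is hard-coded and the search tool returns canned results - both are stand-ins; a real ReAct agent calls an LLM to choose each action:

```python
def search_tool(query: str) -> str:
    """Stand-in for a search tool with canned results."""
    canned = {"capital of France": "Paris"}
    return canned.get(query, "no result")

def scripted_policy(goal: str, observations: list) -> tuple:
    """Decide the next action from what has been observed so far."""
    if not observations:                        # Reason: nothing known yet
        return ("tool", goal)                   # Act: call the search tool
    return ("final", f"The answer is {observations[-1]}.")

def react_loop(goal: str, max_steps: int = 5) -> str:
    observations = []
    for _ in range(max_steps):                  # reason -> act -> observe
        action, arg = scripted_policy(goal, observations)
        if action == "final":
            return arg
        observations.append(search_tool(arg))   # Observe the tool result
    return "step budget exhausted"

print(react_loop("capital of France"))  # → The answer is Paris.
```

Note the `max_steps` guard: production agents always bound the loop so a confused model cannot iterate forever.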

***

#### **2025 onwards: AI Agents with skills**

* Systems evolve into **AI agents** capable of handling multi-step tasks.
* Agents can execute workflows such as research, coding, data analysis, and task automation.
* Typical capabilities:
  * **Planning**: break a task into steps.
  * **Skill selection**: choose the right tool or capability.
  * **Execution**: run one or more specialized tools or sub-agents.
  * **Memory**: store and retrieve intermediate knowledge.

```mermaid
flowchart TD
    A[User Goal] --> B[Agent Planning]
    B --> C[Task Decomposition]
    C --> D[Select Skill or Tool]
    D --> E[Execute Step]
    E --> F[Store Result in Memory]
    F --> G{More Steps Needed?}
    G -->|Yes| C
    G -->|No| H[Final Result]
```
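The planning/execution/memory cycle above can be sketched in a few lines. The skill names, the fixed three-step plan, and the toy data are all invented for illustration; a real agent would ask an LLM to produce the plan and pick skills:

```python
# Toy agent: decompose a goal into steps, run each step with a matching
# "skill", and keep intermediate results in a memory dict.

SKILLS = {
    "fetch": lambda memory: {"raw": [3, 1, 2]},  # pretend data source
    "sort": lambda memory: {"sorted": sorted(memory["raw"])},
    "summarize": lambda memory: {
        "summary": f"{len(memory['sorted'])} items, max {max(memory['sorted'])}"
    },
}

def plan(goal: str) -> list:
    """Task decomposition; a real agent would ask an LLM for this."""
    return ["fetch", "sort", "summarize"]

def run_agent(goal: str) -> dict:
    memory = {}                      # store-and-retrieve scratchpad
    for step in plan(goal):          # loop until no more steps remain
        memory.update(SKILLS[step](memory))
    return memory

result = run_agent("summarize the dataset")
print(result["summary"])  # → 3 items, max 3
```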

***

### 2. Core Terminology and Concepts

#### Prompt Engineering

The practice of writing and structuring prompts to get higher-quality output. In other words: *"ask better questions, get better answers"*.<br>

Most structures revolve around RTCF: Role - Context - Task - Format

<table><thead><tr><th width="177.6363525390625">Template Part</th><th>Example</th></tr></thead><tbody><tr><td><strong>Role</strong></td><td>You are a professional tester, skilled in Black-Box techniques, Test Heuristics, etc.</td></tr><tr><td><strong>Context</strong></td><td>Any relevant input - documents, remote data, plain text prompt, etc.</td></tr><tr><td><strong>Task</strong></td><td>Ask questions, find gaps and contradictions in the requirements, produce test cases, file bug reports, etc.<br>May specify what tools and methods can be used to achieve the task.</td></tr><tr><td><strong>Format</strong></td><td>Produce test cases in the following table format...<br>Write a bug report following this template...</td></tr></tbody></table>
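Assembling an RTCF prompt is just concatenating the four parts in order. The helper below is a sketch; the sample texts follow the table above:

```python
def build_rtcf_prompt(role: str, context: str, task: str, fmt: str) -> str:
    """Join the four RTCF parts into one prompt, separated by blank lines."""
    return "\n\n".join([
        f"Role: {role}",
        f"Context: {context}",
        f"Task: {task}",
        f"Format: {fmt}",
    ])

prompt = build_rtcf_prompt(
    role="You are a professional tester, skilled in black-box techniques.",
    context="The attached requirements document for the checkout flow.",
    task="Find gaps and contradictions in the requirements.",
    fmt="List each finding as a bullet with a severity label.",
)
print(prompt)
```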

Alternative conceptually similar formats or mnemonics exist, such as:

* AIM: Actor - Input - Mission
* MAAP: Memory - Assets - Actions - Prompt

***

#### System Prompt

Special instructions that set the model’s role, style, or rules for consistent output **across chats/sessions.**

***

#### Prompt techniques

<table><thead><tr><th width="130.09088134765625">Technique</th><th width="204.63641357421875">What it is</th><th width="182.45458984375">When to use</th><th width="219.908935546875">Pros / Cons</th></tr></thead><tbody><tr><td>Zero-shot</td><td>Just the task, no examples</td><td>Simple or common tasks</td><td>Fast, cheap, minimal prompt. But lower reliability.</td></tr><tr><td>Few-shot</td><td>Task + a few examples</td><td>Patterned or format-sensitive tasks</td><td>Better accuracy, consistency. But uses more context.</td></tr><tr><td>Chain-of-thought <br>(CoT)</td><td>Ask for step-by-step reasoning</td><td>Multi-step reasoning, logic, math</td><td>Improved reasoning. May get overly verbose, uses more tokens.</td></tr></tbody></table>
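The difference between zero-shot and few-shot is purely in prompt construction. A sketch with an invented sentiment task and made-up examples:

```python
TASK = "Classify the sentiment of the review as positive or negative."

# Invented worked examples; few-shot simply prepends them to the task.
FEW_SHOT_EXAMPLES = [
    ("Great battery life, love it!", "positive"),
    ("Broke after two days.", "negative"),
]

def zero_shot(review: str) -> str:
    """Just the task and the input - no examples."""
    return f"{TASK}\nReview: {review}\nSentiment:"

def few_shot(review: str) -> str:
    """Task plus worked examples, then the input in the same format."""
    shots = "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in FEW_SHOT_EXAMPLES)
    return f"{TASK}\n{shots}\nReview: {review}\nSentiment:"

print(zero_shot("Very noisy fan."))
print("---")
print(few_shot("Very noisy fan."))
```

The few-shot variant costs more context-window tokens (every example is sent with every request) in exchange for more consistent formatting.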

***

#### RAG: Retrieval-Augmented Generation

In short: supplying extra, relevant data for the LLM to consider **before** it answers.

* Retrieved documents are **inserted into the prompt context.**
* The model itself is **not retrained**; the retrieved data is simply added to the prompt.
  * "Manual" RAG example: attach documents to the chat that the LLM must take into account.
  * Enterprise RAG: Give LLMs access to internal Wikis, APIs, and Databases to query (retrieve) for better, context-aware answers.

```mermaid
flowchart TD
    A[User Question] --> B[Context]

    C[Wiki] --> B
    D[Database] --> B
    E[Documents] --> B
    F[APIs] --> B

    B --> G[LLM]
    G --> H[Answer]

```
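A minimal retrieval step can be sketched with keyword overlap. The documents and the scoring are toy stand-ins; real RAG pipelines use embeddings and a vector store, but the shape - retrieve, then paste into the prompt - is the same:

```python
DOCS = [
    "refund policy: customers may return items within 30 days",
    "shipping: standard delivery takes 3-5 business days",
]

def retrieve(question: str, docs: list) -> str:
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().replace("?", "").split())
    return max(docs, key=lambda d: len(q_words & set(d.split())))

def build_prompt(question: str) -> str:
    """Insert the retrieved document into the prompt context."""
    context = retrieve(question, DOCS)
    return (f"Context: {context}\n\n"
            f"Question: {question}\n"
            f"Answer using only the context above.")

print(build_prompt("How many days do customers have to return items?"))
```

Note that the model is never retrained: the retrieved text travels inside the prompt, exactly as the section above describes.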

***

#### Context Window

The **maximum amount of text (tokens)** an LLM can consider at once. The context window typically contains:

* Current and recent messages (prompts)
* Retrieved docs if any (RAG)
* Tool outputs

When the limit is exceeded:

* Older tokens are **removed from the conversation context**.
* The LLM gradually **loses earlier information**.

This can lead to:

* contradictions
* forgotten constraints
* degraded output quality

{% hint style="warning" %}
RAG consumes part of the **context window**, so large retrieved documents can reduce the amount of conversation history the model can remember.
{% endhint %}

```mermaid
flowchart TB
    subgraph Context_Window["Context Window"]
        direction TB
        A[System Instructions]
        B[Conversation History]
        C[RAG / Retrieved Docs]
        D[Tool Outputs]
        E[Current Prompt]
    end
    Context_Window --> F[LLM Generates Response]
```

Therefore, to maintain accuracy:

* Restate important constraints and key requirements succinctly in bullet points
* Summarize long discussions (or have the LLM summarize for you) and start a fresh chat
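The "older tokens are removed" behavior can be sketched as a trimming function. Counting whitespace-separated words is a crude stand-in for a real tokenizer, and the messages are invented:

```python
def count_tokens(message: str) -> int:
    """Crude token count; real APIs count model-specific tokens."""
    return len(message.split())

def trim_history(messages: list, budget: int) -> list:
    """Keep the newest messages that fit within the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        cost = count_tokens(msg)
        if used + cost > budget:
            break                   # everything older is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [
    "please always answer in French",  # early constraint...
    "here is a very long discussion about something else entirely",
    "what is the capital of Spain?",
]
# With a small budget, the earliest constraint silently falls out of
# context - exactly the "forgotten constraints" failure described above.
print(trim_history(history, budget=12))
```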
