ERLM

ERLM (Episodic Retrieval Language Model)

ERLM is a hybrid language-model architecture that combines a global probabilistic base model (n-gram) with an episodic memory system built on contextual Retrieval-Augmented Generation (RAG).

Its goal is to combine strong local generalization with deterministic recall of long contexts, while counteracting stylistic drift within sub-conversations through Sticky Gating.

Model Architecture

The system follows a robust sequential structure, modeled in the following steps:

  1. Adaptive Tokenization (BPE): Builds a compact vocabulary from the corpus with the Byte Pair Encoding algorithm, enabling sub-word encoding without an oversized dictionary.
  2. Backoff Sparsity (N-gram): Trains an order-4 sparse n-gram model with a backoff algorithm. This pillar secures syntax and handles unseen contexts.
  3. Episodic Indexing (Retrieval Memory): Segments the corpus into “episodes”. Each episode maintains its own probabilistic index that records context histories up to a maximum anchor depth (RETR_MAX_CTX = 64 tokens).
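The episodic index in step 3 can be sketched as a per-episode map from context windows to next-token counts. The names (`EpisodeIndex`, `build`, `longest_match`) are illustrative, not the engine's actual API; only RETR_MAX_CTX = 64 comes from the text above.

```rust
use std::collections::HashMap;

const RETR_MAX_CTX: usize = 64; // anchor depth stated in the README

/// Per-episode retrieval index: maps a context window (a suffix of the
/// episode's token stream) to counts of the tokens that followed it.
#[derive(Default)]
struct EpisodeIndex {
    continuations: HashMap<Vec<u32>, HashMap<u32, u32>>,
}

impl EpisodeIndex {
    /// Index every context of length 1..=RETR_MAX_CTX ending just before
    /// each position, recording the token that followed it.
    fn build(tokens: &[u32]) -> Self {
        let mut idx = EpisodeIndex::default();
        for i in 1..tokens.len() {
            let max_len = i.min(RETR_MAX_CTX);
            for len in 1..=max_len {
                let ctx = tokens[i - len..i].to_vec();
                *idx.continuations
                    .entry(ctx)
                    .or_default()
                    .entry(tokens[i])
                    .or_insert(0) += 1;
            }
        }
        idx
    }

    /// Length of the longest exact suffix of `history` present in the index
    /// (the L_match used for episode selection below).
    fn longest_match(&self, history: &[u32]) -> usize {
        let max_len = history.len().min(RETR_MAX_CTX);
        (1..=max_len)
            .rev()
            .find(|&len| {
                self.continuations
                    .contains_key(&history[history.len() - len..])
            })
            .unwrap_or(0)
    }
}

fn main() {
    let idx = EpisodeIndex::build(&[1, 2, 3, 1, 2, 4]);
    // The suffix [1, 2] was seen in the episode, so L_match = 2.
    println!("{}", idx.longest_match(&[9, 1, 2]));
}
```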

Core Algorithms and Equations

1. Episodic Selection & “Sticky Gating”

To prevent context confusion (sudden topic switching), ERLM dynamically identifies the statistically most relevant episode based on the recently generated text. Each episode is evaluated according to the longest exact match ($L_{match}$) present in its index.

Sticky Gating is introduced to stabilize the current episode against competing episodes.
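The exact gating rule is not spelled out here, so the following is a hypothetical sketch of the idea: the active episode keeps control unless a challenger's longest match ($L_{match}$) beats it by an inertia margin. `STICKY_MARGIN` and `select_episode` are placeholder names, not the engine's actual constants.

```rust
/// Hypothetical inertia margin: how many extra matched tokens a
/// challenger episode needs before it can displace the active one.
const STICKY_MARGIN: usize = 4;

/// Sticky-gating sketch: `matches[i]` is episode i's longest exact
/// match against the recent history; `current` is the active episode.
fn select_episode(current: usize, matches: &[usize]) -> usize {
    let (best, &best_len) = matches
        .iter()
        .enumerate()
        .max_by_key(|&(_, &len)| len)
        .expect("at least one episode");
    if best != current && best_len >= matches[current] + STICKY_MARGIN {
        best // challenger wins decisively: switch episodes
    } else {
        current // inertia: stay on the active episode
    }
}

fn main() {
    // Episode 0 is active with L_match = 5; episode 1's 7 is not a
    // decisive lead (< 5 + 4), so the gate stays put.
    println!("{}", select_episode(0, &[5, 7, 2]));
    // With L_match = 9 (>= 5 + 4) the challenger takes over.
    println!("{}", select_episode(0, &[5, 9, 2]));
}
```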

2. Distributional Mixology

For any given state, the token distribution is a linear combination of the global model's distribution and the distribution retrieved by the active episode. The interpolation weight $\alpha$ expresses fidelity to the episode and is capped at an empirical maximum: \(\alpha = \alpha_{max} \times \min\left(1.0,\; \frac{L_{match}}{\text{RETR\_MAX\_CTX}}\right)\)

The unified probability mass function for each candidate $x$ is formally written as: \(P_{final}(x) = (1 - \alpha) \cdot P_{global}(x) + \alpha \cdot P_{episode}(x)\)
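The two equations above translate directly into code. RETR_MAX_CTX = 64 comes from the text; the value of $\alpha_{max}$ is not stated, so `ALPHA_MAX = 0.9` below is an illustrative placeholder.

```rust
const RETR_MAX_CTX: usize = 64;
const ALPHA_MAX: f64 = 0.9; // hypothetical empirical cap on alpha

/// alpha = alpha_max * min(1.0, L_match / RETR_MAX_CTX)
fn alpha(l_match: usize) -> f64 {
    ALPHA_MAX * (l_match as f64 / RETR_MAX_CTX as f64).min(1.0)
}

/// P_final(x) = (1 - alpha) * P_global(x) + alpha * P_episode(x)
fn p_final(p_global: f64, p_episode: f64, l_match: usize) -> f64 {
    let a = alpha(l_match);
    (1.0 - a) * p_global + a * p_episode
}

fn main() {
    // A full-depth match (64 tokens) pins alpha at its cap of 0.9.
    println!("{}", alpha(64));
    // A 32-token match gives alpha = 0.45: 0.55*0.1 + 0.45*0.8 = 0.415.
    println!("{}", p_final(0.1, 0.8, 32));
}
```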

3. Sampling Modulators

Generation finally passes through a sampler equipped with strict penalties that safeguard output quality.
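The specific modulators are not listed here, so the following is a hedged sketch of two common ones a sampler of this kind would apply: a repetition penalty on recently emitted tokens and temperature scaling. The function name and parameter values are illustrative, not the engine's actual ones.

```rust
/// Sampler-modulator sketch: down-weight recently emitted tokens, apply
/// temperature, then renormalize into a valid probability mass function.
fn modulate(probs: &mut [f64], recent: &[usize], temperature: f64, rep_penalty: f64) {
    // Repetition penalty: divide the mass of recently seen tokens.
    for &t in recent {
        probs[t] /= rep_penalty;
    }
    // Temperature: sharpen (< 1.0) or flatten (> 1.0) the distribution.
    for p in probs.iter_mut() {
        *p = p.powf(1.0 / temperature);
    }
    // Renormalize so the probabilities sum to 1.
    let z: f64 = probs.iter().sum();
    for p in probs.iter_mut() {
        *p /= z;
    }
}

fn main() {
    let mut probs = vec![0.5, 0.3, 0.2];
    // Token 0 was just emitted, so its mass is penalized.
    modulate(&mut probs, &[0], 0.8, 1.5);
    println!("{:?}", probs);
}
```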

Installation & Usage

The engine keeps distinct styles or semantic “episodes” cleanly separated so they no longer contaminate each other, preserving each episode's specific tone and style over extended generations.

1. Prerequisites (Rust)

Windows:

winget install --id Rustlang.Rustup -e
rustup default stable

Linux / macOS:

curl https://sh.rustup.rs -sSf | sh
source "$HOME/.cargo/env"

2. Execution

From the project root, you can compile and run the engine like so:

Development Mode:

cargo run --bin erlm

Optimized Build (Release):

cargo build --release --bin erlm
# The executable will be available in target/release/

Hyperparameters (src/main.rs)

Retrieval Mechanics:

Sticky Gating Inertia:

Sampling Heuristics:
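The values under the three headings above are not reproduced in this README. As a sketch, the constants one would expect near the top of src/main.rs might look like the following; only RETR_MAX_CTX = 64 is stated in the text, and every other name and value is an illustrative placeholder.

```rust
// Retrieval mechanics
const RETR_MAX_CTX: usize = 64;  // maximum anchor depth for episode indexing
const ALPHA_MAX: f64 = 0.9;      // hypothetical cap on the interpolation weight

// Sticky-gating inertia
const STICKY_MARGIN: usize = 4;  // hypothetical lead a challenger episode needs

// Sampling heuristics
const TEMPERATURE: f64 = 0.8;        // hypothetical sampling temperature
const REPETITION_PENALTY: f64 = 1.5; // hypothetical repeat down-weighting

fn main() {
    println!("RETR_MAX_CTX = {}", RETR_MAX_CTX);
}
```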