ERLM

ERLM (Episodic Retrieval Language Model)

ERLM is a hybrid language-model architecture that combines a global probabilistic base model (n-gram) with an episodic memory system built on contextual Retrieval-Augmented Generation (RAG).

Its goal is to combine strong local generalization with deterministic recall of long contexts, while counteracting stylistic drift within sub-conversations through Sticky Gating.

Model Architecture

The system follows a robust sequential structure, modeled in the following steps:

  1. Adaptive Tokenization (BPE): Builds a compact vocabulary from the corpus with the Byte Pair Encoding algorithm, enabling sub-word encoding without an oversized dictionary.
  2. Backoff Sparsity (N-gram): Trains an order-4 sparse n-gram model with a backoff algorithm. This pillar secures syntax and handles unseen contexts.
  3. Episodic Indexing (Retrieval Memory): Segments the corpus into “episodes”. Each episode maintains its own probabilistic index that records context histories up to a maximum anchor depth (RETR_MAX_CTX = 64 tokens).
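The episodic index in step 3 can be sketched as a per-episode map from context windows to next-token counts. The names (`EpisodeIndex`, `build`, `longest_match`) are illustrative, not the engine's actual API; only RETR_MAX_CTX = 64 comes from the text above.

```rust
use std::collections::HashMap;

const RETR_MAX_CTX: usize = 64; // anchor depth stated in the README

/// Per-episode retrieval index: maps a context window (a suffix of the
/// episode's token stream) to counts of the tokens that followed it.
#[derive(Default)]
struct EpisodeIndex {
    continuations: HashMap<Vec<u32>, HashMap<u32, u32>>,
}

impl EpisodeIndex {
    /// Index every context of length 1..=RETR_MAX_CTX ending just before
    /// each position, recording the token that followed it.
    fn build(tokens: &[u32]) -> Self {
        let mut idx = EpisodeIndex::default();
        for i in 1..tokens.len() {
            let max_len = i.min(RETR_MAX_CTX);
            for len in 1..=max_len {
                let ctx = tokens[i - len..i].to_vec();
                *idx.continuations
                    .entry(ctx)
                    .or_default()
                    .entry(tokens[i])
                    .or_insert(0) += 1;
            }
        }
        idx
    }

    /// Length of the longest exact suffix of `history` present in the index
    /// (the L_match used for episode selection below).
    fn longest_match(&self, history: &[u32]) -> usize {
        let max_len = history.len().min(RETR_MAX_CTX);
        (1..=max_len)
            .rev()
            .find(|&len| {
                self.continuations
                    .contains_key(&history[history.len() - len..])
            })
            .unwrap_or(0)
    }
}

fn main() {
    let idx = EpisodeIndex::build(&[1, 2, 3, 1, 2, 4]);
    // The suffix [1, 2] was seen in the episode, so L_match = 2.
    println!("{}", idx.longest_match(&[9, 1, 2]));
}
```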

Core Algorithms and Equations

1. Episodic Selection & “Sticky Gating”

To prevent context confusion (sudden topic switching), ERLM dynamically identifies the statistically most relevant episode based on the recently generated text. Each episode is evaluated according to the longest exact match ($L_{match}$) present in its index.

Sticky Gating is introduced to stabilize the current episode against competing episodes.
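The exact gating rule is not spelled out here, so the following is a hypothetical sketch of the idea: the active episode keeps control unless a challenger's longest match ($L_{match}$) beats it by an inertia margin. `STICKY_MARGIN` and `select_episode` are placeholder names, not the engine's actual constants.

```rust
/// Hypothetical inertia margin: how many extra matched tokens a
/// challenger episode needs before it can displace the active one.
const STICKY_MARGIN: usize = 4;

/// Sticky-gating sketch: `matches[i]` is episode i's longest exact
/// match against the recent history; `current` is the active episode.
fn select_episode(current: usize, matches: &[usize]) -> usize {
    let (best, &best_len) = matches
        .iter()
        .enumerate()
        .max_by_key(|&(_, &len)| len)
        .expect("at least one episode");
    if best != current && best_len >= matches[current] + STICKY_MARGIN {
        best // challenger wins decisively: switch episodes
    } else {
        current // inertia: stay on the active episode
    }
}

fn main() {
    // Episode 0 is active with L_match = 5; episode 1's 7 is not a
    // decisive lead (< 5 + 4), so the gate stays put.
    println!("{}", select_episode(0, &[5, 7, 2]));
    // With L_match = 9 (>= 5 + 4) the challenger takes over.
    println!("{}", select_episode(0, &[5, 9, 2]));
}
```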

2. Distributional Mixology

For any given state, the token distribution is a linear combination of the global model's distribution and the distribution retrieved by the active episode. The interpolation weight $\alpha$ expresses fidelity to the episode and is capped at an empirical maximum: \(\alpha = \alpha_{max} \times \min\left(1.0,\; \frac{L_{match}}{\text{RETR\_MAX\_CTX}}\right)\)

The unified probability mass function for each candidate $x$ is formally written as: \(P_{final}(x) = (1 - \alpha) \cdot P_{global}(x) + \alpha \cdot P_{episode}(x)\)
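The two equations above translate directly into code. RETR_MAX_CTX = 64 comes from the text; the value of $\alpha_{max}$ is not stated, so `ALPHA_MAX = 0.9` below is an illustrative placeholder.

```rust
const RETR_MAX_CTX: usize = 64;
const ALPHA_MAX: f64 = 0.9; // hypothetical empirical cap on alpha

/// alpha = alpha_max * min(1.0, L_match / RETR_MAX_CTX)
fn alpha(l_match: usize) -> f64 {
    ALPHA_MAX * (l_match as f64 / RETR_MAX_CTX as f64).min(1.0)
}

/// P_final(x) = (1 - alpha) * P_global(x) + alpha * P_episode(x)
fn p_final(p_global: f64, p_episode: f64, l_match: usize) -> f64 {
    let a = alpha(l_match);
    (1.0 - a) * p_global + a * p_episode
}

fn main() {
    // A full-depth match (64 tokens) pins alpha at its cap of 0.9.
    println!("{}", alpha(64));
    // A 32-token match gives alpha = 0.45: 0.55*0.1 + 0.45*0.8 = 0.415.
    println!("{}", p_final(0.1, 0.8, 32));
}
```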

3. Sampling Modulators

Generation finally passes through a sampler equipped with strict penalties that safeguard output quality.
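The specific modulators are not listed here, so the following is a hedged sketch of two common ones a sampler of this kind would apply: a repetition penalty on recently emitted tokens and temperature scaling. The function name and parameter values are illustrative, not the engine's actual ones.

```rust
/// Sampler-modulator sketch: down-weight recently emitted tokens, apply
/// temperature, then renormalize into a valid probability mass function.
fn modulate(probs: &mut [f64], recent: &[usize], temperature: f64, rep_penalty: f64) {
    // Repetition penalty: divide the mass of recently seen tokens.
    for &t in recent {
        probs[t] /= rep_penalty;
    }
    // Temperature: sharpen (< 1.0) or flatten (> 1.0) the distribution.
    for p in probs.iter_mut() {
        *p = p.powf(1.0 / temperature);
    }
    // Renormalize so the probabilities sum to 1.
    let z: f64 = probs.iter().sum();
    for p in probs.iter_mut() {
        *p /= z;
    }
}

fn main() {
    let mut probs = vec![0.5, 0.3, 0.2];
    // Token 0 was just emitted, so its mass is penalized.
    modulate(&mut probs, &[0], 0.8, 1.5);
    println!("{:?}", probs);
}
```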

Installation & Usage

The engine keeps distinct styles or semantic “episodes” cleanly separated so they no longer contaminate each other, preserving each episode's specific tone and style over extended generations.

1. Prerequisites (Rust)

Windows:

winget install --id Rustlang.Rustup -e
rustup default stable

Linux / macOS:

curl https://sh.rustup.rs -sSf | sh
source "$HOME/.cargo/env"

2. Execution

From the project root, you can compile and run the engine like so:

Development Mode:

cargo run --bin erlm

Optimized Build (Release):

cargo build --release --bin erlm
# The executable will be available in target/release/

Hyperparameters (src/main.rs)

Retrieval Mechanics:

Sticky Gating Inertia:

Sampling Heuristics:
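The values under the three headings above are not reproduced in this README. As a sketch, the constants one would expect near the top of src/main.rs might look like the following; only RETR_MAX_CTX = 64 is stated in the text, and every other name and value is an illustrative placeholder.

```rust
// Retrieval mechanics
const RETR_MAX_CTX: usize = 64;  // maximum anchor depth for episode indexing
const ALPHA_MAX: f64 = 0.9;      // hypothetical cap on the interpolation weight

// Sticky-gating inertia
const STICKY_MARGIN: usize = 4;  // hypothetical lead a challenger episode needs

// Sampling heuristics
const TEMPERATURE: f64 = 0.8;        // hypothetical sampling temperature
const REPETITION_PENALTY: f64 = 1.5; // hypothetical repeat down-weighting

fn main() {
    println!("RETR_MAX_CTX = {}", RETR_MAX_CTX);
}
```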