ERLM is a hybrid language model architecture that combines a global probabilistic base model (n-gram) with an advanced episodic memory system via Contextual RAG (Retrieval-Augmented Generation).
Its design goal is to blend strong local generalization with deterministic recall of extended contexts, while counteracting stylistic drift within sub-conversations through Sticky Gating.
The system follows a sequential pipeline, organized in the following steps:
Retrieval searches each episode's index for the longest exact-match sequence (up to RETR_MAX_CTX = 64 tokens). To prevent context confusion (sudden topic switching), ERLM dynamically identifies the statistically most relevant episode based on the recently generated text: each episode is scored by the longest exact match ($L_{match}$) present in its index.
Sticky Gating is introduced to stabilize the current episode against competing episodes:
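The gating rule can be sketched in Rust using the constants listed later (STICKY_BONUS, SWITCH_MARGIN, MIN_MATCH_TO_STICK); the function name sticky_choice and the exact comparison are illustrative assumptions, not the engine's actual code:

```rust
const STICKY_BONUS: f64 = 3.0;       // raw advantage for keeping the current episode
const SWITCH_MARGIN: f64 = 2.5;      // extra delta a challenger must exceed to take over
const MIN_MATCH_TO_STICK: usize = 6; // below this match length, the current episode loses protection

/// Decide whether to keep the current episode or switch to the best challenger.
/// `current_match` and `challenger_match` are the longest exact-match lengths
/// (L_match) of each episode; returns the index of the winning episode.
fn sticky_choice(
    current: usize,
    current_match: usize,
    challenger: usize,
    challenger_match: usize,
) -> usize {
    // Too short a match: the current episode forfeits its protection,
    // and the decision degenerates to a plain comparison of match lengths.
    if current_match < MIN_MATCH_TO_STICK {
        return if challenger_match > current_match { challenger } else { current };
    }
    // Otherwise the challenger must beat the bonus-boosted score by SWITCH_MARGIN.
    let held = current_match as f64 + STICKY_BONUS;
    if (challenger_match as f64) > held + SWITCH_MARGIN {
        challenger
    } else {
        current
    }
}
```

Under these values, a challenger needs a match roughly 5.5 tokens longer than the current episode's to seize priority, which is what gives the episode its inertia.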
At any given state, token expectations arise from a linear combination of the global n-gram model and the targeted distribution generated by the reigning episode's retrieval. The interpolation weight $\alpha$ represents fidelity to the episode, capped at an empirical maximum: \(\alpha = \alpha_{max} \times \min\left(1.0,\; \frac{L_{match}}{\text{RETR\_MAX\_CTX}}\right)\)
The unified probability mass function for each candidate $x$ is formally written as: \(P_{final}(x) = (1 - \alpha) \cdot P_{global}(x) + \alpha \cdot P_{episode}(x)\)
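In Rust, the weight and the mixture could be computed roughly like this (a sketch; alpha and blend are hypothetical helper names, and the distributions are assumed to share a vocabulary-indexed layout):

```rust
const RETR_MAX_CTX: usize = 64;
const RETR_ALPHA_MAX: f64 = 0.92;

/// Interpolation weight: fidelity to the episode, proportional to the
/// match length and capped at RETR_ALPHA_MAX.
fn alpha(l_match: usize) -> f64 {
    RETR_ALPHA_MAX * (l_match as f64 / RETR_MAX_CTX as f64).min(1.0)
}

/// P_final(x) = (1 - alpha) * P_global(x) + alpha * P_episode(x), element-wise.
fn blend(p_global: &[f64], p_episode: &[f64], a: f64) -> Vec<f64> {
    p_global
        .iter()
        .zip(p_episode)
        .map(|(g, e)| (1.0 - a) * g + a * e)
        .collect()
}
```

A full 64-token match yields the maximum weight of 0.92, so even a perfectly matched episode leaves 8% of the mass to the global model.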
Generation finally passes through the terminal sampler, equipped with strict penalties that safeguard output quality:
A recency penalty over a sliding window (RECENT_WINDOW) locally dampens loops. For a token that has appeared $k$ times within the window:
\(P_{penalized}(x) = P_{T}(x) \times (\text{RECENT\_PENALTY})^{k}\)

As a result, the engine keeps distinct styles or semantic “episodes” cleanly separated: they no longer contaminate each other, and the specific tone and style of each is preserved over extended stretches.
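The recency penalty can be sketched as follows; the window size of 32 is an assumed value (the source names RECENT_WINDOW but does not state it), and penalize is a hypothetical helper:

```rust
const RECENT_PENALTY: f64 = 0.55;
const RECENT_WINDOW: usize = 32; // assumed value; not specified in the source

/// Multiply a token's probability by RECENT_PENALTY^k, where k is the number
/// of times it appeared in the last RECENT_WINDOW generated tokens.
fn penalize(p: f64, recent_tokens: &[u32], token: u32) -> f64 {
    let start = recent_tokens.len().saturating_sub(RECENT_WINDOW);
    let k = recent_tokens[start..].iter().filter(|&&t| t == token).count();
    p * RECENT_PENALTY.powi(k as i32)
}
```

A token seen twice in the window is thus attenuated to 0.55² ≈ 0.30 of its post-temperature probability, which softens loops without forbidding repetition outright.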
Windows:

```shell
winget install --id Rustlang.Rustup -e
rustup default stable
```

Linux / macOS:

```shell
curl https://sh.rustup.rs -sSf | sh
source "$HOME/.cargo/env"
```
From the project root, you can compile and run the engine like so:

Development Mode:

```shell
cargo run --bin erlm
```

Optimized Build (Release):

```shell
cargo build --release --bin erlm
# The executable will be available in target/release/
```
All tuning constants are defined in src/main.rs.

Retrieval Mechanics:
- RETR_MAX_CTX (64): maximum length of the exact-match sequence the system searches for.
- RETR_ALPHA_MAX (0.92): maximum authority (92%) the local episode can impose over the global grammar.

Sticky Gating Inertia:
- STICKY_BONUS (3.0): raw advantage given to maintaining the current episode or section.
- SWITCH_MARGIN (2.5): the required delta for a competing episode to seize priority from the current one.
- MIN_MATCH_TO_STICK (6): the lower limit below which the current episode forfeits its protection.

Sampling Heuristics:
- TEMPERATURE (0.90), TOP_P (0.90).
- REPEAT_PENALTY (0.25) & RECENT_PENALTY (0.55): multiplicative penalty factors governing repetition (0.25 means a 75% reduced chance).
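Gathered together, these constants might be declared in src/main.rs like this (names and values taken from the list above; the exact types and visibility are assumptions):

```rust
// Retrieval mechanics
pub const RETR_MAX_CTX: usize = 64;    // exact-match search capacity, in tokens
pub const RETR_ALPHA_MAX: f64 = 0.92;  // cap on the episode's interpolation weight

// Sticky gating inertia
pub const STICKY_BONUS: f64 = 3.0;       // raw advantage for the current episode
pub const SWITCH_MARGIN: f64 = 2.5;      // delta a challenger must exceed
pub const MIN_MATCH_TO_STICK: usize = 6; // below this, protection is forfeited

// Sampling heuristics
pub const TEMPERATURE: f64 = 0.90;
pub const TOP_P: f64 = 0.90;
pub const REPEAT_PENALTY: f64 = 0.25;  // 75% reduced chance per repetition
pub const RECENT_PENALTY: f64 = 0.55;  // applied per occurrence in the recent window
```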