What is the MoM Model Family?

The MoM (Mixture of Models) Model Family is a curated collection of specialized, lightweight models designed for intelligent routing, content safety, and semantic understanding. These models power the core capabilities of Semantic Router, enabling fast, accurate, and privacy-preserving AI operations.

Overview

The MoM family consists of purpose-built models that handle specific tasks in the routing pipeline:

  • Classification Models: Domain detection, PII identification, jailbreak detection
  • Embedding Models: Semantic similarity, caching, retrieval
  • Safety Models: Hallucination detection, content moderation
  • Feedback Models: User intent understanding, conversation analysis

All MoM models are:

  • Lightweight: 33M-600M parameters for fast inference
  • Specialized: Fine-tuned for specific routing tasks
  • Efficient: Many use LoRA adapters for minimal memory footprint
  • Open Source: Available on HuggingFace for transparency and customization

Model Categories

1. Classification Models

Domain/Intent Classifier

  • Model ID: models/mom-domain-classifier
  • HuggingFace: LLM-Semantic-Router/lora_intent_classifier_bert-base-uncased_model
  • Purpose: Classify user queries into 14 MMLU categories (math, science, history, etc.)
  • Architecture: BERT-base (110M) + LoRA adapters
  • Use Case: Route queries to domain-specific models or experts
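The classifier's output can gate routing with a simple confidence threshold. A minimal sketch, where the expert model names are illustrative and `route` stands in for the router's real decision logic (the 0.6 threshold matches the `category_model` setting shown later in the Configuration section):

```python
# Hypothetical expert mapping; only high-confidence domain labels are routed.
DOMAIN_EXPERTS = {"math": "math-expert-llm", "history": "history-expert-llm"}
DEFAULT_MODEL = "general-llm"
THRESHOLD = 0.6

def route(domain: str, confidence: float) -> str:
    """Route to a domain expert only when the classifier is confident enough."""
    if confidence >= THRESHOLD and domain in DOMAIN_EXPERTS:
        return DOMAIN_EXPERTS[domain]
    return DEFAULT_MODEL
```

Low-confidence or unknown domains fall through to the general-purpose model, so a miscalibrated classifier degrades gracefully instead of misrouting.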

PII Detector

  • Model ID: models/mom-pii-classifier
  • HuggingFace: LLM-Semantic-Router/lora_pii_detector_bert-base-uncased_model
  • Purpose: Detect 35 types of personally identifiable information
  • Architecture: BERT-base (110M) + LoRA adapters
  • Use Case: Privacy protection, compliance, data masking
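Token-level detection yields character spans that can be masked before text leaves the boundary. A sketch of the masking step, assuming the detector reports `(start, end, label)` spans (the span format is an assumption, not the router's actual output schema):

```python
def mask_pii(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Replace detected PII spans with placeholder tags.

    Spans are applied right-to-left so earlier character offsets
    remain valid after each substitution.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text
```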

Jailbreak Detector

  • Model ID: models/mom-jailbreak-classifier
  • HuggingFace: LLM-Semantic-Router/lora_jailbreak_classifier_bert-base-uncased_model
  • Purpose: Detect prompt injection and jailbreak attempts
  • Architecture: BERT-base (110M) + LoRA adapters
  • Use Case: Content safety, prompt security

Feedback Detector

  • Model ID: models/mom-feedback-detector
  • HuggingFace: llm-semantic-router/feedback-detector
  • Purpose: Classify user feedback into 4 types (satisfied, need clarification, wrong answer, want different)
  • Architecture: ModernBERT-base (149M)
  • Use Case: Adaptive routing, conversation improvement
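Adaptive routing turns each of the four feedback classes into a next-step action. The label strings and action names below are illustrative stand-ins, not the router's actual vocabulary:

```python
# Hypothetical mapping from the four feedback classes to routing actions.
FEEDBACK_ACTIONS = {
    "satisfied": "keep_current_model",
    "need_clarification": "ask_follow_up",
    "wrong_answer": "escalate_to_stronger_model",
    "want_different": "reroute_to_alternative",
}

def feedback_action(label: str) -> str:
    """Choose a routing action; unknown labels conservatively keep the status quo."""
    return FEEDBACK_ACTIONS.get(label, "keep_current_model")
```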

2. Embedding Models

Embedding Pro (High Quality)

  • Model ID: models/mom-embedding-pro
  • HuggingFace: Qwen/Qwen3-Embedding-0.6B
  • Purpose: High-quality embeddings with 32K context support
  • Architecture: Qwen3 (600M parameters)
  • Embedding Dimension: 1024
  • Use Case: Long-context semantic search, high-accuracy caching

Embedding Flash (Balanced)

  • Model ID: models/mom-embedding-flash
  • HuggingFace: google/embeddinggemma-300m
  • Purpose: Fast embeddings with Matryoshka support
  • Architecture: Gemma (300M parameters)
  • Embedding Dimension: 768 (supports 512/256/128 via Matryoshka)
  • Use Case: Balanced speed/quality, multilingual support
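Matryoshka support means the 768-dim vector can be truncated to its leading components and re-normalized, trading a little accuracy for smaller storage and faster similarity search. A minimal sketch of the truncation step:

```python
import math

def truncate_matryoshka(embedding: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components and re-normalize to unit length,
    as Matryoshka-trained embeddings are designed to allow."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]
```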

Embedding Light (Fast)

  • Model ID: models/mom-embedding-light
  • HuggingFace: sentence-transformers/all-MiniLM-L12-v2
  • Purpose: Lightweight semantic similarity
  • Architecture: MiniLM (33M parameters)
  • Embedding Dimension: 384
  • Use Case: Fast semantic caching, low-latency retrieval
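Semantic caching with these embeddings reduces to a nearest-neighbor lookup with a similarity floor. A self-contained sketch (the 0.9 threshold and linear scan are illustrative; a production cache would use an ANN index):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def cache_lookup(query_emb, cache, threshold=0.9):
    """Return the cached response most similar to the query embedding,
    or None if nothing clears the similarity threshold."""
    best, best_sim = None, threshold
    for emb, response in cache:
        sim = cosine(query_emb, emb)
        if sim >= best_sim:
            best, best_sim = response, sim
    return best
```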

3. Hallucination Detection Models

Halugate Sentinel

  • Model ID: models/mom-halugate-sentinel
  • HuggingFace: LLM-Semantic-Router/halugate-sentinel
  • Purpose: First-stage hallucination screening
  • Architecture: BERT-base (110M)
  • Use Case: Fast hallucination detection, pre-filtering

Halugate Detector

  • Model ID: models/mom-halugate-detector
  • HuggingFace: KRLabsOrg/lettucedect-base-modernbert-en-v1
  • Purpose: Accurate hallucination verification
  • Architecture: ModernBERT-base (149M)
  • Context Length: 8192 tokens
  • Use Case: Factual accuracy verification, grounding check
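The Sentinel/Detector pairing suggests a two-stage cascade: the fast sentinel screens every answer, and only suspicious ones reach the slower, more accurate detector. A minimal sketch, where `sentinel` and `detector` stand in for the two model calls (each returning a hallucination score in [0, 1]) and both thresholds are arbitrary:

```python
def check_hallucination(answer, sentinel, detector,
                        screen_threshold=0.3, verify_threshold=0.5):
    """Cascade: run the cheap sentinel on every answer; invoke the
    expensive detector only when the sentinel score looks suspicious."""
    if sentinel(answer) < screen_threshold:
        return False  # confidently grounded; skip the expensive check
    return detector(answer) >= verify_threshold
```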

Halugate Explainer

  • Model ID: models/mom-halugate-explainer
  • HuggingFace: tasksource/ModernBERT-base-nli
  • Purpose: Explain hallucination reasoning via NLI
  • Architecture: ModernBERT-base (149M)
  • Classes: 3 (entailment/neutral/contradiction)
  • Use Case: Explainable AI, hallucination analysis
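The three NLI classes map naturally onto a grounding verdict: a claim contradicted by its source is a hallucination, while a merely neutral one is unsupported. A sketch of that mapping (the verdict strings are illustrative):

```python
def nli_verdict(probs: dict[str, float]) -> str:
    """Map NLI class probabilities to a grounding verdict for a claim
    checked against its source context."""
    label = max(probs, key=probs.get)
    return {
        "entailment": "grounded",
        "neutral": "unsupported",
        "contradiction": "hallucination",
    }[label]
```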

Model Selection Guide

By Use Case

| Use Case | Recommended Model | Why |
| --- | --- | --- |
| Domain routing | mom-domain-classifier | 14 MMLU categories, LoRA efficient |
| Privacy protection | mom-pii-classifier | 35 PII types, token-level detection |
| Content safety | mom-jailbreak-classifier | Prompt injection detection |
| Semantic caching | mom-embedding-light | Fast, 384-dim, low latency |
| Long-context search | mom-embedding-pro | 32K context, 1024-dim |
| Hallucination check | mom-halugate-detector | ModernBERT, 8K context |
| User feedback | mom-feedback-detector | 4 feedback types, ModernBERT |

By Performance Requirements

| Requirement | Model Tier | Examples |
| --- | --- | --- |
| Ultra-fast (<10ms) | Light | mom-embedding-light, mom-jailbreak-classifier |
| Balanced (10-50ms) | Flash | mom-embedding-flash, mom-domain-classifier |
| High-quality (50-200ms) | Pro | mom-embedding-pro, mom-halugate-detector |

Configuration

Using MoM Models in Router

MoM models are pre-configured in `router-defaults.yaml`:

```yaml
# Domain classification
classifier:
  category_model:
    model_id: "models/mom-domain-classifier"
    threshold: 0.6
    use_cpu: true

  # PII detection
  pii_model:
    model_id: "models/mom-pii-classifier"
    threshold: 0.9
    use_cpu: true

# Jailbreak protection
prompt_guard:
  model_id: "models/mom-jailbreak-classifier"
  threshold: 0.7
  use_cpu: true
```

Custom Model Registry

Override the default registry in your `config.yaml`:

```yaml
mom_registry:
  "models/mom-domain-classifier": "your-org/custom-domain-classifier"
  "models/mom-pii-classifier": "your-org/custom-pii-detector"
  "models/mom-embedding-pro": "your-org/custom-embeddings"
```

Model Architecture

LoRA-Based Models

Many MoM models use LoRA (Low-Rank Adaptation) for efficiency:

  • Base Model: BERT-base-uncased (110M parameters)
  • LoRA Adapters: <1M parameters per task
  • Memory Footprint: ~440MB base + ~4MB per adapter
  • Inference Speed: Same as base model (~10-20ms on CPU)
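The quoted footprint figures follow directly from fp32 storage at 4 bytes per parameter:

```python
# Sanity check of the footprint figures above, assuming fp32 weights
# (4 bytes per parameter) and decimal megabytes.
BYTES_PER_PARAM = 4

base_params = 110_000_000    # BERT-base-uncased
adapter_params = 1_000_000   # upper bound for one LoRA adapter

base_mb = base_params * BYTES_PER_PARAM / 1_000_000
adapter_mb = adapter_params * BYTES_PER_PARAM / 1_000_000

print(f"base ~ {base_mb:.0f} MB, adapter ~ {adapter_mb:.0f} MB")
```

Because an adapter adds only ~1% to the base model's memory, many task-specific adapters can share one loaded BERT-base.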

ModernBERT Models

Newer models use ModernBERT for better performance:

  • Architecture: ModernBERT-base (149M parameters)
  • Context Length: 8192 tokens (vs 512 for BERT)
  • Performance: Better accuracy on long-context tasks
  • Use Cases: Hallucination detection, feedback classification

Next Steps