Skip to main content
Version: 🚧 Next

Advanced Tool Filtering for Tool Selection

Issue: #1002


Current​

Currently tool selection only uses embedding similarity, similarity threshold, and top-k. When embeddings are similar but intent is inconsistent, tools from incorrect domains may be selected.

#1002 proposes the need to introduce advanced tool filtering capabilities to reduce these misselections through sophisticated relevance filtering, while maintaining default behavior unchanged.

Solution​

After embedding candidate set retrieval, add an optional advanced filtering stage. This stage applies deterministic filtering (allow/block lists, optional category gating, lexical overlap thresholds) and a combined score reranker that fuses embedding similarity with lexical, tag, name, and category signals. If advanced_filtering.enabled=false, existing behavior remains unchanged.

Solution advantages: maintains controllable latency, introduces no new model dependencies, and is fully explainable through configuration.

Comparative Test Results​

Test configuration:

  • Query set: 20 queries (17 positive examples, 3 negative examples), covering weather, email, search, calculation, calendar, and other scenarios
  • Tool library: 5 tools (get_weather, search_web, calculate, send_email, create_calendar_event)
  • Iterations: 10
  • Advanced filtering configuration: min_lexical_overlap=1, min_combined_score=0.35, weights={embed:0.7, lexical:0.2, tag:0.05, name:0.05}

Evaluation results:

Advanced Tool Filtering Evaluation Results

MetricBaselineAdvancedDelta
Accuracy55.00%90.00%+35.00%
Precision78.57%94.12%+15.55%
Recall64.71%94.12%+29.41%
False Positive Rate100.00%33.33%-66.67%
Avg Latency0.0162 ms0.0197 ms+0.0036 ms
P95 Latency0.0256 ms0.0288 ms+0.0032 ms

Data Flow​

Configuration Changes​

Advanced filtering is disabled by default. When enabled, the following fields take effect.

FieldTypeDefaultRange / Description
enabledboolfalseEnable advanced filtering.
candidate_pool_sizeintmax(top_k*5, 20)If set and >0, use directly.
min_lexical_overlapint0Minimum unique token overlap between query and tool vocabulary.
min_combined_scorefloat0.0Combined score threshold, range [0.0, 1.0].
weights.embedfloat1.0If no weights are set, embed defaults to 1.0.
weights.lexicalfloat0.0Optional weight, range [0.0, 1.0].
weights.tagfloat0.0Optional weight, range [0.0, 1.0].
weights.namefloat0.0Optional weight, range [0.0, 1.0].
weights.categoryfloat0.0Optional weight, range [0.0, 1.0].
use_category_filterboolfalseIf true, filter by category when confidence is sufficient.
category_confidence_thresholdfloatnilIf set, category filtering only applies when decision confidence â‰Ĩ threshold.
allow_tools[]string[]Tool name whitelist; when non-empty, only these tools are retained.
block_tools[]string[]Tool name blacklist.

Scoring and Filtering Implementation​

Tokenization​

Tokenization rules: convert to lowercase and split at non-alphanumeric characters. Only Unicode letters and numbers are counted as tokens. Implementation: src/semantic-router/pkg/tools/relevance.go#L240.

Lexical Overlap​

Lexical overlap counts the intersection of the following unique tokens:

  • Tool name
  • Tool description
  • Tool category

Tags are not included. Tags are a separate signal.

Combined Score Formula​

For each candidate tool:

combined = (w_embed * embed + w_lexical * lexical + w_tag * tag + w_name * name + w_category * category) / (w_embed + w_lexical + w_tag + w_name + w_category)
  • embed is the similarity score, clamped to [0,1].
  • lexical and tag are overlap scores, normalized by query token count / tag token count.
  • name and category are binary scores (0 or 1).
  • If no weights are set, embed defaults to 1.0.
  • If all weights are explicitly set to 0, the combined score is 0; when min_combined_score > 0, all candidates will be filtered.

Category Confidence Gating​

Category filtering only takes effect when all of the following conditions are met:

  • use_category_filter is true,
  • A category exists, and
  • Decision confidence â‰Ĩ category_confidence_threshold (if set).

Error Handling and Fallback​

  • When tool selection fails and tools.fallback_to_empty=true: the request continues with no tools and a warning is logged.
  • If fallback_to_empty=false: the request returns a classification error.
  • Invalid advanced configuration values are rejected during configuration loading (range validation in validator.go).

API Changes​

New or modified APIs and signatures:

// src/semantic-router/pkg/tools/tools.go
func (db *ToolsDatabase) FindSimilarToolsWithScores(query string, topK int) ([]ToolSimilarity, error)

// src/semantic-router/pkg/tools/relevance.go
func FilterAndRankTools(query string, candidates []ToolSimilarity, topK int, advanced *config.AdvancedToolFilteringConfig, selectedCategory string) []openai.ChatCompletionToolParam