Dynamic LLM Router
Get the best answer, every time
What if you could have confidence that the best Large Language Model would answer your question? Now you can: our Dynamic LLM Router analyzes each question you ask and selects the best LLM to answer it.
Why an LLM Router?
Since Storytell is built to be enterprise-grade, our Dynamic LLM router lets enterprise customers “Bring your own LLM” to add to our LLM farm, and then set custom rules that enable scenarios like the following:
- Restrict sensitive queries from being answered by foundational models: Storytell is built to make using AI inside the enterprise safe and secure, with a robust multi-tenant structure, end-to-end encryption, and a guarantee that no LLMs are trained on your data — even for free users. However, some enterprises want to go even further, ensuring that the most sensitive queries, which might contain non-public financial, customer, roadmap, or other data, are answered by bespoke fine-tuned open-source LLMs specific to that enterprise. Our LLM router enables exactly this type of control out of the box, with the ability to create custom rule sets like: “Ensure any queries by the finance team on company data are routed to in-house LLMs.”
- Prioritize accuracy, speed, and cost with granular controls: Our Dynamic LLM router can choose, on a per-query basis, the best LLM based on highest accuracy, fastest response speed, lowest cost, or a dynamic mixture of all three. Enterprise customers can optimize across these vectors based on the needs of each user, team, or department.
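To make the rule-set idea concrete, here is a minimal sketch of how a restriction rule like the finance example above might be expressed. All names here (`RoutingRule`, `Query`, `route`) are hypothetical illustrations, not Storytell’s actual API:

```python
from dataclasses import dataclass

# Hypothetical sketch of custom routing rules — names and structure
# are illustrative, not Storytell's actual configuration format.

@dataclass
class RoutingRule:
    """Routes matching queries to a restricted pool of models."""
    name: str
    teams: set            # teams the rule applies to
    data_classes: set     # data-sensitivity labels that trigger the rule
    allowed_models: list  # the only models permitted to answer

@dataclass
class Query:
    text: str
    team: str
    data_class: str

def route(query, rules, default_models):
    """Return the pool of models allowed to answer this query."""
    for rule in rules:
        if query.team in rule.teams and query.data_class in rule.data_classes:
            return rule.allowed_models  # restricted pool wins
    return default_models  # no rule matched: full LLM farm is available

# Example: finance-team queries on company data stay on in-house LLMs.
finance_rule = RoutingRule(
    name="finance-on-company-data",
    teams={"finance"},
    data_classes={"company-confidential"},
    allowed_models=["in-house-finetuned-llm"],
)

pool = route(Query("What was Q3 revenue?", "finance", "company-confidential"),
             [finance_rule], ["gpt-4.1", "claude-4-sonnet"])
print(pool)  # ['in-house-finetuned-llm']
```

The key design point this sketch illustrates: restriction rules are evaluated before model selection, so sensitive queries never reach the foundational-model pool at all.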
Experiencing the Dynamic LLM Router:
The Dynamic LLM router works by evaluating each prompt you put into Storytell. You don’t have to take any action as a user. The LLM chosen to answer your question will be displayed below the answer. You can hover over the LLM name to see why Storytell chose it.
You’re still in control: Override the LLM router’s selection
If you’d like a specific LLM to answer your query, you can override the Dynamic LLM router’s selection by choosing that LLM from the dropdown menu.
Enterprise Customers: Tune our Dynamic LLM Router to your needs
Our enterprise customers can fine-tune how our Dynamic LLM router prioritizes across any of these vectors — down to the query type, department, team, or individual user level.
Enterprises can also bring private LLMs to our LLM farm to add them into the router mix. Contact us to learn more.
Behind the scenes: How Storytell’s Dynamic LLM router works
Here’s a video with Alex, Storytell’s lead engineer on the Dynamic LLM router, showing DROdio, our CEO, how the router works:
By default, Storytell’s Dynamic LLM router works with the following Large Language Models:
OpenAI
- GPT 4.1
- GPT 4.1 Mini
- GPT 4.1 Nano
- o3
- o4 Mini
Anthropic
- Claude 4 Sonnet
- Claude 4 Sonnet Thinking
- Claude 3.7 Sonnet
- Claude 3.7 Sonnet Thinking
- Claude 3.5 Sonnet
- Opus (available to enterprise customers)
Google
- Gemini 2.5 Pro
- Gemini 2.0 Flash
- Gemini 2.5 Flash
- Gemini 2.5 Flash Thinking
Meta
- Llama 3.3 70B
- Llama 4 Scout
- Llama 4 Maverick
DeepSeek
- DeepSeek R1 Distill Llama 70B
- DeepSeek R1 Distill Qwen 7B
Open Source
We run OSS models on Groq for optimal inference and security.
- Llama 3.1 (available to enterprise customers)
- DeepSeek R1 (available to enterprise customers)
Enterprise LLMs
- Storytell allows enterprises to “Bring your own LLM” to our platform for us to route employee queries to.
Enterprise AI Agents
- Storytell allows enterprises to “Bring your own AI Agent” to our platform for us to route employee queries to.
Choosing the best LLM to answer your query
Our Dynamic LLM router evaluates your query to determine what category it falls into. Available categories include:
- Reasoning & Knowledge: General queries that require the LLM to draw on company or world knowledge and arrive at an answer
- Scientific Reasoning & Knowledge: Scientific queries that require the LLM to draw on company or world knowledge and arrive at an answer
- Quantitative Reasoning: Queries that require the LLM to do math and computations
- Coding: Queries that require the LLM to write computer code
- Communication: Queries that require the LLM to respond in ways that communicate concepts effectively to a human (like writing an effective email to your boss)
The Dynamic LLM router will select the LLM with the highest benchmark score for the selected category while also considering costs and response time tradeoffs based on configurable preferences.
One of Storytell’s product principles is speed — the highest quality answer isn’t helpful if it takes a long time for you to receive it. If there is an LLM that scores nearly as well as the highest quality scoring LLM, but is substantially faster, we will automatically prioritize the faster LLM to respond.
Cost optimization is another key factor in our LLM selection process, which enterprises can configure based on their needs. Building on our quality and speed analysis, Storytell’s router will identify if there are more cost-effective options among the high-performing LLMs. When an alternative model delivers comparable quality and speed at a significantly lower cost, our system will select that option, ensuring you get optimal value without compromising on performance.
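The quality/speed/cost tradeoff described above can be sketched as a simple weighted score. The numbers and weights below are hypothetical placeholders, and this is an illustration of the general technique, not Storytell’s actual selection algorithm:

```python
# Illustrative sketch of picking a model by combining the query category's
# benchmark quality with speed and cost, under configurable weights.
# Candidate stats are made-up example numbers, not real benchmark data.

candidates = {
    # model: (category benchmark score 0-1, seconds per answer, $ per 1K tokens)
    "model-a": (0.92, 8.0, 0.030),  # highest quality, but slow and pricey
    "model-b": (0.90, 2.0, 0.006),  # nearly as good, much faster and cheaper
    "model-c": (0.75, 1.0, 0.001),  # fastest and cheapest, lower quality
}

def select(candidates, w_quality=0.6, w_speed=0.25, w_cost=0.15):
    """Return the model with the best weighted quality/speed/cost score."""
    max_latency = max(lat for _, lat, _ in candidates.values())
    max_cost = max(c for _, _, c in candidates.values())

    def score(stats):
        quality, latency, cost = stats
        # Normalize latency and cost so faster/cheaper maps closer to 1.
        return (w_quality * quality
                + w_speed * (1 - latency / max_latency)
                + w_cost * (1 - cost / max_cost))

    return max(candidates, key=lambda m: score(candidates[m]))

print(select(candidates))  # model-b: near-top quality, far faster and cheaper
```

With these example weights, the near-top-quality but much faster and cheaper model wins over the highest-quality one, which mirrors the speed-first principle described above; raising `w_quality` toward 1.0 would flip the choice back to the top scorer.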
Seeing the router in action: Reporting and audit logs
Storytell provides robust enterprise reporting and audit logs showing the router in action. Here are some screenshots from an enterprise reporting dashboard:
What Each LLM Does Best (And When to Use It)
When working with Storytell, choosing the right model for your task can significantly improve results. Below is a complete guide to each of our supported models, including what they are, what they’re best at, and real-world prompt examples.
OpenAI
- GPT 4.1: OpenAI’s flagship model, known for state-of-the-art reasoning, creativity, and conversational performance.
  - Best for:
    - Deep analysis
    - Complex research and reasoning
    - High-quality, long-form content generation
    - Coding assistance
  - Prompt Example: You’re a senior data analyst. Interpret the following user engagement metrics across three cohorts, identify which cohort is underperforming, and provide three hypotheses as to why. Also recommend a next step for validation. [Insert metrics table]
- GPT 4.1 Mini: A smaller, faster variant of GPT-4.1 optimized for cost and performance tradeoffs.
  - Best for:
    - Fast prototyping
    - Medium-complexity writing or summarization
    - Reasonably complex Q&A
  - Prompt Example: Summarize this research paper into three key insights, written for a general audience without technical jargon. Highlight what makes the findings novel. [Insert research paper]
- GPT 4.1 Nano: Lightweight variant of GPT-4.1, ideal for speed and affordability.
  - Best for:
    - Basic summarization
    - Quick idea generation
    - Short-form content (e.g., tweets, headlines)
  - Prompt Example: Give me five tweet-length summaries of the core idea behind the book *Atomic Habits*. Keep each tweet under 280 characters.
- o3: A next-generation OpenAI model known for strong logical reasoning and factual accuracy.
  - Best for:
    - Advanced analytical tasks
    - Critical thinking workflows
    - Strategy recommendations
  - Prompt Example: Analyze three competitive positioning statements from different companies in the health tech space. Identify the implicit strengths and weaknesses in each, and recommend how a new entrant should differentiate. [Insert statements]
- o4 Mini: A smaller sibling to o3, optimized for balance between reasoning and speed.
  - Best for:
    - Business writing
    - Brainstorming ideas
    - Strategy drafts
  - Prompt Example: You’re a brand strategist. Generate five positioning statements for a new eco-friendly detergent line aimed at Gen Z. Each should reflect values around sustainability, affordability, and innovation.
Anthropic
- Claude 4 Sonnet: Excellent at nuanced language, long-context understanding, and safe, reliable generation.
  - Best for:
    - Long-form generative writing
    - Document summarization
    - Fiction and storytelling
  - Prompt Example: Write a 1,000-word short story set in a near-future society where memories can be traded. Focus on character development, emotional stakes, and world-building.
- Claude 4 Sonnet Thinking: Emphasizes deliberate chain-of-thought reasoning.
  - Best for:
    - Multi-step reasoning
    - Decision trees
    - Analytical thinking with less hallucination
  - Prompt Example: You’re a policy advisor. Analyze three possible outcomes of implementing universal basic income in a mid-sized urban economy. For each, list the economic, political, and social implications in bullet point format.
- Claude 3.7 Sonnet: Slightly faster than Claude 4 with less contextual reasoning ability.
  - Best for:
    - Structured writing
    - Concise content creation
    - Friendly tone copywriting
  - Prompt Example: Write a 500-word blog post explaining how parents can use technology to support their kids' education. Keep the tone conversational and practical.
- Claude 3.7 Sonnet Thinking: Structured, logic-oriented variant of Claude 3.7.
  - Best for:
    - Logic-heavy workflows
    - Case study reviews
    - Decision support tools
  - Prompt Example: Compare three different approaches to employee performance evaluation: peer reviews, manager assessments, and self-evaluations. Provide pros/cons and recommend the best fit for a 50-person startup.
- Claude 3.5 Sonnet: Clean and safe Claude model with strong summarization abilities.
  - Best for:
    - User-facing copy
    - FAQs
    - Summarization
  - Prompt Example: Rewrite this 800-word article for a 6th grade reading level while keeping the key facts and tone intact.
- Opus (available to enterprise customers): Anthropic’s most powerful model for high-stakes reasoning and multi-document synthesis.
  - Best for:
    - Multi-document synthesis
    - Sensitive topic exploration
    - Enterprise-level Q&A
  - Prompt Example: You’re an enterprise AI assistant. A user has uploaded three legal contracts, each with different clauses for IP ownership. Summarize the key differences and flag any risks where clauses conflict.
Google
- Gemini 2.5 Pro: Google’s most capable model for technical content and research.
  - Best for:
    - Cross-source research
    - Technical writing
    - Multimodal input tasks
  - Prompt Example: Given this academic abstract, find three real-world applications of its findings in public health or urban planning. Also list two counterpoints or critiques of the methods used. [Insert abstract]
- Gemini 2.0 Flash: Lightweight Gemini model for fast execution.
  - Best for:
    - Real-time queries
    - Live feedback
    - Chatbot UX
  - Prompt Example: Suggest five SEO-optimized blog titles for a piece on the benefits of cold showers, targeting health-conscious millennials.
- Gemini 2.5 Flash: Stronger, faster Flash model with better performance.
  - Best for:
    - On-the-fly suggestions
    - Lightweight summarization
    - Structured data interpretation
  - Prompt Example: Summarize this Slack conversation thread and extract three follow-up actions with responsible owners. [Insert conversation]
- Gemini 2.5 Flash Thinking: Balanced variant for speed and thoughtfulness.
  - Best for:
    - Quick but structured ideation
    - Reflective generation
    - Brainstorming outlines
  - Prompt Example: Brainstorm five new podcast episode ideas around the theme of “Digital Privacy in Daily Life.” For each, write a title, guest suggestion, and two discussion questions.
Meta
- Llama 3.3 70B: Meta’s general-purpose model, useful for internal docs and Q&A.
  - Best for:
    - General Q&A
    - Documentation drafting
    - Internal knowledgebase writing
  - Prompt Example: Create an internal help guide for new employees on how to submit reimbursement forms. Include clear step-by-step instructions and a friendly tone.
- Llama 4 Scout: Optimized for knowledge synthesis and factual consistency.
  - Best for:
    - Long-form structured writing
    - Policy documentation
    - FAQ generation
  - Prompt Example: Write an internal FAQ for new AI safety guidelines based on this 5-page company policy document. Organize by theme and simplify legal language.
- Llama 4 Maverick: Creative-leaning variant best for expressive writing.
  - Best for:
    - Creative writing
    - Marketing campaigns
    - Brand voice exploration
  - Prompt Example: You’re a brand copywriter. Write three ad scripts for a smartwatch brand targeting Gen Z, each with a different tone: rebellious, humorous, and heartfelt.
DeepSeek
- DeepSeek R1 Distill Llama 70B: Distilled Meta model optimized for lower compute tasks.
  - Best for:
    - Internal tooling
    - FAQ automation
    - Cost-effective inference
  - Prompt Example: Draft templated responses for a customer service chatbot handling shipping delay inquiries. Include variations based on length of delay.
- DeepSeek R1 Distill Qwen 7B: Distilled Qwen model with strong multilingual support.
  - Best for:
    - Scripted tasks
    - Localization
    - Structured outputs
  - Prompt Example: Translate this product onboarding script from English to Spanish and Chinese, and adjust the tone to be appropriate for a technical audience.