Dynamic LLM Router
Get the best answer, every time
What if you could have confidence that the best Large Language Model would answer your question? Now you can: our Dynamic LLM Router analyzes each question you ask and selects the best LLM to answer it.
Why an LLM Router?
Since Storytell is built to be enterprise-grade, our Dynamic LLM router lets enterprise customers “Bring your own LLM” to add to our LLM farm, and then set custom rules that enable scenarios like the following:
- Restrict sensitive queries from being answered by foundational models: Storytell is built to make using AI inside the enterprise safe and secure, with a robust multi-tenant structure, end-to-end encryption, and a guarantee that no LLMs are trained on your data — even for free users. However, some enterprises want to go even further, ensuring that the most sensitive queries, which might contain non-public financial, customer, roadmap, or other data, are answered by bespoke fine-tuned open-source LLMs specific to that enterprise. Our LLM router enables exactly this type of control out of the box, with the ability to create custom rule sets like: “Ensure any queries by the finance team on company data are routed to in-house LLMs.”
- Prioritize accuracy, speed, and cost with granular controls: Our Dynamic LLM router can choose, on a per-query basis, the best LLM based on highest accuracy, fastest response speed, lowest cost, or a dynamic mixture of all three. Enterprise customers can optimize across these vectors based on the needs of each user, team, or department.
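To make the rule-set idea concrete, here is a minimal sketch of how a restriction rule like the finance example above might be expressed. All names here (`RoutingRule`, `Query`, `route`) are hypothetical illustrations, not Storytell’s actual API:

```python
from dataclasses import dataclass

# Hypothetical sketch of custom routing rules — names and structure
# are illustrative, not Storytell's actual configuration format.

@dataclass
class RoutingRule:
    """Routes matching queries to a restricted pool of models."""
    name: str
    teams: set            # teams the rule applies to
    data_classes: set     # data-sensitivity labels that trigger the rule
    allowed_models: list  # the only models permitted to answer

@dataclass
class Query:
    text: str
    team: str
    data_class: str

def route(query, rules, default_models):
    """Return the pool of models allowed to answer this query."""
    for rule in rules:
        if query.team in rule.teams and query.data_class in rule.data_classes:
            return rule.allowed_models  # restricted pool wins
    return default_models  # no rule matched: full LLM farm is available

# Example: finance-team queries on company data stay on in-house LLMs.
finance_rule = RoutingRule(
    name="finance-on-company-data",
    teams={"finance"},
    data_classes={"company-confidential"},
    allowed_models=["in-house-finetuned-llm"],
)

pool = route(Query("What was Q3 revenue?", "finance", "company-confidential"),
             [finance_rule], ["gpt-4.1", "claude-4-sonnet"])
print(pool)  # ['in-house-finetuned-llm']
```

The key design point this sketch illustrates: restriction rules are evaluated before model selection, so sensitive queries never reach the foundational-model pool at all.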
Experiencing the Dynamic LLM Router:
The Dynamic LLM router works by evaluating each prompt you put into Storytell. You don’t have to take any action as a user. The LLM chosen to answer your question will be displayed below the answer. You can hover over the LLM name to see why Storytell chose it.
You’re still in control: Override the LLM router’s selection
If you’d like a specific LLM to answer your query, you can override the Dynamic LLM router’s selection by choosing that LLM from the dropdown menu.
Enterprise Customers: Tune our Dynamic LLM Router to your needs
Our enterprise customers can fine-tune how our Dynamic LLM router prioritizes across any of these vectors — down to the query type, department, team, or individual user level.
Enterprises can also bring private LLMs to our LLM farm to add them into the router mix. Contact us to learn more.
Behind the scenes: How Storytell’s Dynamic LLM router works
Here’s a video with Alex, Storytell’s lead engineer on the Dynamic LLM router, showing DROdio, our CEO, how the router works:
By default, Storytell’s Dynamic LLM router works with the following Large Language Models:
OpenAI
- GPT 4.1
- GPT 4.1 Mini
- GPT 4.1 Nano
- o3
- o4 Mini
Anthropic
- Claude 4 Sonnet
- Claude 4 Sonnet Thinking
- Claude 3.7 Sonnet
- Claude 3.7 Sonnet Thinking
- Claude 3.5 Sonnet
- Opus (available to enterprise customers)
Google
- Gemini 2.5 Pro
- Gemini 2.0 Flash
- Gemini 2.5 Flash
- Gemini 2.5 Flash Thinking
Meta
- Llama 3.3 70B
- Llama 4 Scout
- Llama 4 Maverick
DeepSeek
- DeepSeek R1 Distill Llama 70B
- DeepSeek R1 Distill Qwen 7B
Open Source
We run OSS models on Groq for optimal inference and security.
- Llama 3.1 (available to enterprise customers)
- DeepSeek R1 (available to enterprise customers)
Enterprise LLMs
- Storytell allows enterprises to “Bring your own LLM” to our platform for us to route employee queries to.
Enterprise AI Agents
- Storytell allows enterprises to “Bring your own AI Agent” to our platform for us to route employee queries to.
Choosing the best LLM to answer your query
Our Dynamic LLM router evaluates your query to determine what category it falls into. Available categories include:
- Reasoning & Knowledge: General queries that require the LLM to draw on company or world knowledge and arrive at an answer
- Scientific Reasoning & Knowledge: Scientific queries that require the LLM to draw on company or world knowledge and arrive at an answer
- Quantitative Reasoning: Queries that require the LLM to do math and computations
- Coding: Queries that require the LLM to write computer code
- Communication: Queries that require the LLM to respond in ways that communicate concepts effectively to a human (like writing an effective email to your boss)
The Dynamic LLM router will select the LLM with the highest benchmark score for the selected category while also considering costs and response time tradeoffs based on configurable preferences.
One of Storytell’s product principles is speed — the highest quality answer isn’t helpful if it takes a long time for you to receive it. If there is an LLM that scores nearly as well as the highest quality scoring LLM, but is substantially faster, we will automatically prioritize the faster LLM to respond.
Cost optimization is another key factor in our LLM selection process, which enterprises can configure based on their needs. Building on our quality and speed analysis, Storytell’s router will identify if there are more cost-effective options among the high-performing LLMs. When an alternative model delivers comparable quality and speed at a significantly lower cost, our system will select that option, ensuring you get optimal value without compromising on performance.
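The quality/speed/cost tradeoff described above can be sketched as a simple weighted score. The numbers and weights below are hypothetical placeholders, and this is an illustration of the general technique, not Storytell’s actual selection algorithm:

```python
# Illustrative sketch of picking a model by combining the query category's
# benchmark quality with speed and cost, under configurable weights.
# Candidate stats are made-up example numbers, not real benchmark data.

candidates = {
    # model: (category benchmark score 0-1, seconds per answer, $ per 1K tokens)
    "model-a": (0.92, 8.0, 0.030),  # highest quality, but slow and pricey
    "model-b": (0.90, 2.0, 0.006),  # nearly as good, much faster and cheaper
    "model-c": (0.75, 1.0, 0.001),  # fastest and cheapest, lower quality
}

def select(candidates, w_quality=0.6, w_speed=0.25, w_cost=0.15):
    """Return the model with the best weighted quality/speed/cost score."""
    max_latency = max(lat for _, lat, _ in candidates.values())
    max_cost = max(c for _, _, c in candidates.values())

    def score(stats):
        quality, latency, cost = stats
        # Normalize latency and cost so faster/cheaper maps closer to 1.
        return (w_quality * quality
                + w_speed * (1 - latency / max_latency)
                + w_cost * (1 - cost / max_cost))

    return max(candidates, key=lambda m: score(candidates[m]))

print(select(candidates))  # model-b: near-top quality, far faster and cheaper
```

With these example weights, the near-top-quality but much faster and cheaper model wins over the highest-quality one, which mirrors the speed-first principle described above; raising `w_quality` toward 1.0 would flip the choice back to the top scorer.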
Seeing the router in action: Reporting and audit logs
Storytell provides robust enterprise reporting and audit logs showing the router in action. Here are some screenshots from an enterprise reporting dashboard:
What Each LLM Does Best (And When to Use It)
When working with Storytell, choosing the right model for your task can significantly improve results. Below is a complete guide to each of our supported models, including what they are, what they’re best at, and real-world prompt examples.
OpenAI
- GPT 4.1: OpenAI’s flagship model, known for state-of-the-art reasoning, creativity, and conversational performance.
  - Best for:
    - Deep analysis
    - Complex research and reasoning
    - High-quality, long-form content generation
    - Coding assistance
  - Prompt Example: You’re a senior data analyst. Interpret the following user engagement metrics across three cohorts, identify which cohort is underperforming, and provide three hypotheses as to why. Also recommend a next step for validation. [Insert metrics table]
- GPT 4.1 Mini: A smaller, faster variant of GPT-4.1 optimized for cost and performance tradeoffs.
  - Best for:
    - Fast prototyping
    - Medium-complexity writing or summarization
    - Reasonably complex Q&A
  - Prompt Example: Summarize this research paper into three key insights, written for a general audience without technical jargon. Highlight what makes the findings novel. [Insert research paper]
- GPT 4.1 Nano: Lightweight variant of GPT-4.1, ideal for speed and affordability.
  - Best for:
    - Basic summarization
    - Quick idea generation
    - Short-form content (e.g., tweets, headlines)
  - Prompt Example: Give me five tweet-length summaries of the core idea behind the book *Atomic Habits*. Keep each tweet under 280 characters.
- o3: A next-generation OpenAI model known for strong logical reasoning and factual accuracy.
  - Best for:
    - Advanced analytical tasks
    - Critical thinking workflows
    - Strategy recommendations
  - Prompt Example: Analyze three competitive positioning statements from different companies in the health tech space. Identify the implicit strengths and weaknesses in each, and recommend how a new entrant should differentiate. [Insert statements]
- o4 Mini: A smaller sibling to o3, optimized for balance between reasoning and speed.
  - Best for:
    - Business writing
    - Brainstorming ideas
    - Strategy drafts
  - Prompt Example: You’re a brand strategist. Generate five positioning statements for a new eco-friendly detergent line aimed at Gen Z. Each should reflect values around sustainability, affordability, and innovation.
Anthropic
- Claude 4 Sonnet: Excellent at nuanced language, long-context understanding, and safe, reliable generation.
  - Best for:
    - Long-form generative writing
    - Document summarization
    - Fiction and storytelling
  - Prompt Example: Write a 1,000-word short story set in a near-future society where memories can be traded. Focus on character development, emotional stakes, and world-building.
- Claude 4 Sonnet Thinking: Emphasizes deliberate chain-of-thought reasoning.
  - Best for:
    - Multi-step reasoning
    - Decision trees
    - Analytical thinking with less hallucination
  - Prompt Example: You’re a policy advisor. Analyze three possible outcomes of implementing universal basic income in a mid-sized urban economy. For each, list the economic, political, and social implications in bullet point format.
- Claude 3.7 Sonnet: Slightly faster than Claude 4 with less contextual reasoning ability.
  - Best for:
    - Structured writing
    - Concise content creation
    - Friendly tone copywriting
  - Prompt Example: Write a 500-word blog post explaining how parents can use technology to support their kids' education. Keep the tone conversational and practical.
- Claude 3.7 Sonnet Thinking: Structured, logic-oriented variant of Claude 3.7.
  - Best for:
    - Logic-heavy workflows
    - Case study reviews
    - Decision support tools
  - Prompt Example: Compare three different approaches to employee performance evaluation: peer reviews, manager assessments, and self-evaluations. Provide pros/cons and recommend the best fit for a 50-person startup.
- Claude 3.5 Sonnet: Clean and safe Claude model with strong summarization abilities.
  - Best for:
    - User-facing copy
    - FAQs
    - Summarization
  - Prompt Example: Rewrite this 800-word article for a 6th grade reading level while keeping the key facts and tone intact.
- Opus (available to enterprise customers): Anthropic’s most powerful model for high-stakes reasoning and multi-document synthesis.
  - Best for:
    - Multi-document synthesis
    - Sensitive topic exploration
    - Enterprise-level Q&A
  - Prompt Example: You’re an enterprise AI assistant. A user has uploaded three legal contracts, each with different clauses for IP ownership. Summarize the key differences and flag any risks where clauses conflict.
Google
- Gemini 2.5 Pro: Google’s most capable model for technical content and research.
  - Best for:
    - Cross-source research
    - Technical writing
    - Multimodal input tasks
  - Prompt Example: Given this academic abstract, find three real-world applications of its findings in public health or urban planning. Also list two counterpoints or critiques of the methods used. [Insert abstract]
- Gemini 2.0 Flash: Lightweight Gemini model for fast execution.
  - Best for:
    - Real-time queries
    - Live feedback
    - Chatbot UX
  - Prompt Example: Suggest five SEO-optimized blog titles for a piece on the benefits of cold showers, targeting health-conscious millennials.
- Gemini 2.5 Flash: Stronger, faster Flash model with better performance.
  - Best for:
    - On-the-fly suggestions
    - Lightweight summarization
    - Structured data interpretation
  - Prompt Example: Summarize this Slack conversation thread and extract three follow-up actions with responsible owners. [Insert conversation]
- Gemini 2.5 Flash Thinking: Balanced variant for speed and thoughtfulness.
  - Best for:
    - Quick but structured ideation
    - Reflective generation
    - Brainstorming outlines
  - Prompt Example: Brainstorm five new podcast episode ideas around the theme of “Digital Privacy in Daily Life.” For each, write a title, guest suggestion, and two discussion questions.
Meta
- Llama 3.3 70B: Meta’s general-purpose model, useful for internal docs and Q&A.
  - Best for:
    - General Q&A
    - Documentation drafting
    - Internal knowledgebase writing
  - Prompt Example: Create an internal help guide for new employees on how to submit reimbursement forms. Include clear step-by-step instructions and a friendly tone.
- Llama 4 Scout: Optimized for knowledge synthesis and factual consistency.
  - Best for:
    - Long-form structured writing
    - Policy documentation
    - FAQ generation
  - Prompt Example: Write an internal FAQ for new AI safety guidelines based on this 5-page company policy document. Organize by theme and simplify legal language.
- Llama 4 Maverick: Creative-leaning variant best for expressive writing.
  - Best for:
    - Creative writing
    - Marketing campaigns
    - Brand voice exploration
  - Prompt Example: You’re a brand copywriter. Write three ad scripts for a smartwatch brand targeting Gen Z, each with a different tone: rebellious, humorous, and heartfelt.
DeepSeek
- DeepSeek R1 Distill Llama 70B: Distilled Meta model optimized for lower compute tasks.
  - Best for:
    - Internal tooling
    - FAQ automation
    - Cost-effective inference
  - Prompt Example: Draft templated responses for a customer service chatbot handling shipping delay inquiries. Include variations based on length of delay.
- DeepSeek R1 Distill Qwen 7B: Distilled Qwen model with strong multilingual support.
  - Best for:
    - Scripted tasks
    - Localization
    - Structured outputs
  - Prompt Example: Translate this product onboarding script from English to Spanish and Chinese, and adjust the tone to be appropriate for a technical audience.