11,000 Models – So Which One Do I Actually Pick?

Over 11,000 models in the Microsoft Foundry catalog – and the most common question I get is still: "Which one do I actually pick?" In this article I walk through a pragmatic approach to model selection: where to start, what specializations really matter, how token costs can surprise you, and where compliance quietly becomes a deal-breaker.


Why I’m writing about this

It was near the end of my last session at Microsoft. Slides done, coffee gone cold – and then the questions started. Not one or two. The same topic, over and over – from the audience, in the follow-up conversations, on LinkedIn: "There are over 11,000 models in the catalog – but which one is actually right for me?"

Honestly? I had to take a breath. Because the question sounds simple and is completely legitimate at the same time. 11,000 models. That’s not a selection anymore – that’s a supermarket aisle that tripled in size overnight with no new signage.

And the second thing that stuck with me: the people asking weren’t beginners. These were admins, architects, power users – people who know what they’re doing. And still, the question was there. That tells me something: it’s not a knowledge problem. It’s an orientation problem.

That’s exactly what this article is for.

What is the Model Catalog in Microsoft Foundry, anyway?

Microsoft Foundry (formerly Azure AI Studio) is Microsoft’s central platform for building, testing, and deploying AI applications. The Model Catalog is the control center for model selection – and it’s growing fast.

You’ll find models from Microsoft itself (Phi family), OpenAI (GPT-4o, o1, o3…), Meta (Llama), Mistral, Cohere, Stability AI, xAI, Google (Gemma), Anthropic (Claude) – and many more. Curated, categorized, with benchmarks and descriptions included.

The key advantage over direct API access: most models in the catalog run within Azure infrastructure. That means your data doesn’t leave the Azure region your tenant is configured in. For enterprise environments with data protection and compliance requirements, that’s not a nice-to-have – it’s a prerequisite.

Important note on Anthropic models (current status): Proceed with caution here. Anthropic models (the Claude family) are available in the Foundry catalog, but as of now, data processing still runs through US-based infrastructure – not directly through Azure. For GDPR-sensitive workloads or environments with EU data residency requirements, that’s a blocker. This isn’t a permanent situation, but today it’s a relevant one. Check before you deploy.

The real work: How do I choose the right model?

Step 1: Understand what you actually want
Sounds obvious. It isn’t. Before you open the catalog, answer three questions for yourself:

  • What’s the use case? Generating text, writing code, analyzing images, summarizing documents, understanding speech?
  • How much context do I need? Short prompts or long documents with lots of context?
  • What are my compliance requirements? EU data residency, industry regulations, internal policies?

Only then does the catalog start making sense.
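If it helps to make this concrete, the three questions can be expressed as a simple filter. The sketch below is purely illustrative – the catalog entries, capability sets, and residency flags are made up for the example and are not the real Foundry API; always verify against the actual model descriptions:

```python
from dataclasses import dataclass

@dataclass
class ModelInfo:
    name: str
    tasks: set           # capabilities, e.g. {"text", "code", "vision"}
    context_window: int  # in tokens
    eu_data_residency: bool

# Hypothetical catalog excerpt -- the flags here are examples only,
# check each model's real description and data processing location.
CATALOG = [
    ModelInfo("gpt-4o-mini", {"text", "code", "vision"}, 128_000, True),
    ModelInfo("phi-3.5-mini", {"text", "code"}, 128_000, True),
    ModelInfo("claude-example", {"text", "code"}, 200_000, False),
]

def shortlist(task: str, min_context: int, require_eu_residency: bool):
    """Answer the three questions first, then let them filter the catalog."""
    return [
        m.name for m in CATALOG
        if task in m.tasks
        and m.context_window >= min_context
        and (m.eu_data_residency or not require_eu_residency)
    ]

print(shortlist("code", min_context=100_000, require_eu_residency=True))
```

Note how the residency flag alone removes a model from the shortlist before any benchmark is even looked at – which is exactly the order the questions should be asked in.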

Step 2: Start small – seriously
This is my clear recommendation and I stand by it: start with the mini models.

GPT-4o mini, Phi-3.5-mini, smaller Llama 3.2 variants – they’re faster, cheaper per token, and for most test scenarios more than sufficient. When you notice the model hitting its limits – quality drops, context gets lost, answers become shallow – then you move up to the larger variants.

Why this matters: larger models don’t just cost more per token. They’re also slower in inference, which becomes relevant in production scenarios with real throughput. And here’s the honest truth: if you can’t get a concept working with a mini model, the problem usually isn’t the model – it’s the prompt or the design of your solution.
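The "start small, escalate only when needed" pattern can even live in code. This is a minimal sketch, not a Foundry API: `run_model` and `good_enough` are stand-ins that in practice would call your deployed endpoint and apply your own evaluation criteria.

```python
# Escalation ladder: try the cheapest model first and step up only when a
# quality check fails. `run_model` and `good_enough` are stand-ins here --
# in a real project they'd call your deployed Foundry endpoint and your
# own eval criteria.
MODEL_LADDER = ["gpt-4o-mini", "gpt-4o"]  # small -> large

def pick_model(prompt, run_model, good_enough):
    answer = None
    for model in MODEL_LADDER:
        answer = run_model(model, prompt)
        if good_enough(answer):
            return model, answer
    # Even the largest model failed: revisit the prompt, not the model.
    return None, answer

# Toy run: pretend only the large model clears the quality bar.
fake = {"gpt-4o-mini": "short", "gpt-4o": "a much more detailed answer"}
chosen, _ = pick_model("...", lambda m, p: fake[m], lambda a: len(a) > 10)
print(chosen)
```

The useful property of this shape: the day a mini model becomes good enough for your use case, the ladder stops escalating on its own – and your token bill drops without a code change.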

Step 3: Models have specialties – use them
A few concrete examples worth knowing:

  • GPT-4o (OpenAI) – Strengths: multimodal, reasoning, long context – Typical use case: document analysis, complex agents
  • Phi-3.5 / Phi-4 (Microsoft) – Strengths: efficient, strong reasoning for its size – Typical use case: edge scenarios, cost-sensitive workloads
  • Llama 3.x (Meta) – Strengths: open-weight, highly adaptable – Typical use case: fine-tuning projects, on-prem scenarios
  • Mistral Large (Mistral) – Strengths: strong in European languages, code – Typical use case: multilingual applications, code assistance
  • Sora – Strengths: video generation, visual understanding – Typical use case: image & video workflows only
  • Deepseek – Strengths: math, coding, analytical tasks – Typical use case: technical analysis, STEM domains

On Sora: This is not a language model in the traditional sense. Sora is built for visual generation and video workflows. Using it for text tasks is the wrong tool for the job – and it burns token budget without delivering value.

On Deepseek: Very capable in analytical and technical domains, but again: check data processing location and model origin in your compliance context before going to production.

Step 4: Read the model descriptions. Actually read them.
I know, it sounds like "read the manual" – but it’s worth it. In the Foundry Model Catalog, every model comes with:

  • Benchmarks and comparison values
  • Recommended use cases
  • Context window size
  • License model and terms of use
  • Pricing per token

That last point is consistently underestimated. Token costs between a mini model and the full version of the same family can differ by a factor of 10–20. In production workloads with high throughput, that’s not an academic detail – that’s budget planning.
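To see why this factor matters, run the arithmetic once. The prices below are purely illustrative placeholders, not current Foundry list prices – always check the pricing page for the actual numbers:

```python
def monthly_cost(tokens_per_day: int, price_per_1m_tokens: float, days: int = 30) -> float:
    """Simple throughput-based cost projection."""
    return tokens_per_day / 1_000_000 * price_per_1m_tokens * days

# Illustrative prices only -- check the current pricing in the catalog.
# Assume a mini model at $0.60 and its full-size sibling at $10.00 per 1M tokens.
DAILY_TOKENS = 5_000_000  # a modest production workload

mini = monthly_cost(DAILY_TOKENS, 0.60)
full = monthly_cost(DAILY_TOKENS, 10.00)
print(f"mini: ${mini:,.0f}/month, full: ${full:,.0f}/month, factor {full / mini:.1f}x")
```

With these assumed prices, the same workload costs $90 a month on the mini model and $1,500 on the full one – the factor-of-10-to-20 gap in one line of math.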

Step 5: Keep Enterprise Data Protection in focus
For Foundry projects in enterprise environments, the rule is: use models that demonstrably process data through Azure infrastructure. That gives you:

  • Data residency in your configured Azure region
  • Integration into existing Azure security architecture (Private Endpoints, VNet, Managed Identity)
  • Compliance auditability

Models you’re unsure about? Deploy them in a Foundry project, test them – and if they don’t fit, remove them. The platform is built for exactly this. Use it as a sandbox, not as a production environment from day one.

Practical check

Signs you’re on the right track:

  • The model responds consistently and in the expected format
  • Latency fits your use case requirements
  • Token costs stay within your planned budget
  • Your compliance requirements are documented and met

When things aren’t working – check these 3 things first:

  1. Wrong model for the use case? Did you use a language model for a visual workflow, or vice versa?
  2. Context window too small? Long documents need models with large context windows – and that varies significantly across models.
  3. Data processing location unclear? Before going live, explicitly check in the Foundry catalog where the model processes data. When in doubt: ask, don’t assume.
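For the context window check, a rough estimate is often enough to catch a mismatch before the first API call. The heuristic below (~4 characters per token for English text) is deliberately crude – for exact counts, use the model’s actual tokenizer (e.g. tiktoken for the OpenAI family):

```python
def rough_token_count(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    # Use the model's real tokenizer when you need exact numbers.
    return max(1, len(text) // 4)

def fits_context(text: str, context_window: int, reserve_for_output: int = 1024) -> bool:
    """Leave headroom for the model's answer, not just the input."""
    return rough_token_count(text) + reserve_for_output <= context_window

doc = "x" * 400_000  # roughly a 100k-token document
print(fits_context(doc, 128_000))  # fits a 128k window
print(fits_context(doc, 8_192))    # far too large for an 8k window
```

Reserving output tokens is the part people forget: a document that technically fits the window can still fail in practice because the model has no room left to answer.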

Wrap-up

The 11,000 models aren’t a threat – they’re an opportunity. But like any large toolbox, if you reach in without a plan, you lose time and money.

My recommendation is simple: do your research, read the descriptions, start with mini models, and only scale up when you know what you need. Foundry gives you the ability to test models safely and remove them again – that’s a feature, not an escape hatch.

And with all the enthusiasm for new models: compliance first. A model that sends your data to the wrong region is not a model for your organization – no matter how impressive the benchmark numbers look.

Espresso moment: The right model isn’t the most powerful one – it’s the one that solves your use case, respects your budget, and keeps your data exactly where it belongs.

QUICK CHECKLIST: Model Selection in Microsoft Foundry

