Using Azure Key Vault with Azure AI Services - Stop Putting API Keys in Config Files
Here's a test you can run on your own codebase this afternoon. Search your git history for "OPENAI_API_KEY" or "CognitiveServices" or just the string "Endpoint=". If you're like most of the Australian organisations we audit, you'll find at least one AI service key sitting in an appsettings.json, a .env file that was committed "just once" in 2024, or a pipeline variable that's actually plain text. Every one of those is a credential that can call your AI services, run up your bill, and read whatever data flows through those endpoints.
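If you'd rather script that search than eyeball it, here is a rough sketch. It reads text from standard input, so you can feed it the output of git log -p --all, and flags any line containing one of the strings above. The KeyScan class and its pattern list are our own illustration, not a substitute for a proper secret scanner, and the patterns will produce false positives.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class KeyScan
{
    // Markers taken from the search suggested above; extend for your own services.
    private static readonly Regex Suspect = new Regex(
        @"OPENAI_API_KEY|CognitiveServices|Endpoint=",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Streams the input and yields (1-based line number, line text) for each match.
    public static IEnumerable<(int LineNumber, string Text)> FindSuspectLines(TextReader input)
    {
        int lineNumber = 0;
        string? line;
        while ((line = input.ReadLine()) != null)
        {
            lineNumber++;
            if (Suspect.IsMatch(line)) yield return (lineNumber, line);
        }
    }

    // Intended use: git log -p --all | dotnet run
    // The output includes the matched text, so treat it as sensitive.
    public static void Main()
    {
        foreach (var (lineNumber, text) in FindSuspectLines(Console.In))
            Console.WriteLine($"{lineNumber}: {text}");
    }
}
```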
This matters more for AI services than for most Azure resources, for a boring financial reason: an Azure OpenAI or AI Services key is a licence to spend your money at volume. A leaked storage key is bad. A leaked key to a GPT deployment with a high rate limit is bad and expensive, and we've seen exactly that play out - a key scraped from a public repo, and a five-figure consumption bill before anyone noticed the usage graph. Microsoft's documentation on using Azure Key Vault with AI services covers the mechanics well, so this post is about the practice: what to actually do, in what order, and where teams get it wrong.
The basic pattern
Key Vault is Azure's secrets store. The pattern for AI services is straightforward: instead of your application holding the AI service key directly, the key lives in Key Vault, and your application authenticates to Key Vault at runtime to fetch it. In .NET that looks like this:
using Azure.Identity;
using Azure.Security.KeyVault.Secrets;

var client = new SecretClient(
    new Uri("https://my-vault.vault.azure.net/"),
    new DefaultAzureCredential());

KeyVaultSecret secret = await client.GetSecretAsync("AIServicesKey");
// secret.Value is your AI services key
The important part is what's missing: there is no credential in that code. DefaultAzureCredential figures out how to authenticate based on where the code is running. On your laptop it uses your Azure CLI or Visual Studio login. In Azure it uses the managed identity of the App Service, Function, or Container App hosting the code. Same code, no secrets, promotes cleanly through dev, test and prod. This is the single pattern I'd tattoo onto every Azure project if I could.
The equivalent exists in Python with azure-identity and azure-keyvault-secrets, and in JavaScript, Java and Go. The docs walk through each. The shape is identical everywhere: build a credential, point a SecretClient at your vault, ask for the secret by name.
Do it in this order
When we run this cleanup for clients, the sequence matters, so here it is.
First, create the vault and put the AI service keys in it. Use RBAC authorisation, not the legacy access policies. Access policies still work, but RBAC is where Microsoft's investment is: it composes with the rest of your Azure permissions model, and auditors understand it. Grant yourself Key Vault Secrets Officer to load the secrets, and grant applications Key Vault Secrets User, which is read-only. Applications never need write access to secrets, and giving it to them is the kind of thing that looks fine for two years and then features in an incident report.
Second, give your compute a managed identity and grant it access to the vault. System-assigned identity is the simple default - it lives and dies with the resource. User-assigned identities earn their keep when several apps legitimately share an identity, but start simple.
Third, switch the application code to read from Key Vault, deploy it, and confirm it works.
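For an ASP.NET Core app, step three can be as small as the sketch below. It assumes the Azure.Identity and Azure.Extensions.AspNetCore.Configuration.Secrets packages and a web project with implicit usings enabled. The vault URL and the AIServicesKey secret name are the placeholders used elsewhere in this post, and the /health/ai-key route is purely illustrative.

```csharp
using Azure.Identity;

var builder = WebApplication.CreateBuilder(args);

// Load the vault's secrets into configuration once, at startup.
// No key or connection string appears here; the credential is resolved at runtime.
builder.Configuration.AddAzureKeyVault(
    new Uri("https://my-vault.vault.azure.net/"),
    new DefaultAzureCredential());

var app = builder.Build();

// Illustrative check only: reports whether the key resolved, never the key itself.
app.MapGet("/health/ai-key", (IConfiguration config) =>
    string.IsNullOrEmpty(config["AIServicesKey"])
        ? Results.Problem("AIServicesKey not found in configuration")
        : Results.Ok("AIServicesKey loaded from Key Vault"));

app.Run();
```

Because the secrets land in ordinary configuration, the rest of the app keeps reading config["AIServicesKey"] as it did before. That is usually what makes this step a small diff.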
Fourth, and this is the step everyone skips: rotate the AI service key. If the key ever sat in a config file, a git history, or a Slack message, treat it as leaked, because you cannot prove it wasn't. Regenerating the key in the Azure portal takes ten seconds. AI Services resources give you two keys precisely so you can rotate without downtime - move everything to key 2, regenerate key 1, then make key 1 the primary again at your leisure. If you don't rotate, all you've done is add Key Vault on top of an already-compromised credential, which is security theatre.
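Written down as code, the rotation order looks like the sketch below. It captures the sequence only. Both delegates are hypothetical stand-ins: regenerateKey for whatever regenerates a named key on the AI resource and returns the new value, and publishToVault for whatever writes the value your apps should use into the vault secret (SecretClient.SetSecretAsync would be one way to do that).

```csharp
using System;
using System.Threading.Tasks;

public static class KeyRotation
{
    // key2Value: current value of key 2, assumed never to have been exposed.
    // regenerateKey: hypothetical; regenerates the named key and returns its new value.
    // publishToVault: hypothetical; stores the value applications should read.
    public static async Task RotateAsync(
        string key2Value,
        Func<string, Task<string>> regenerateKey,
        Func<string, Task> publishToVault)
    {
        // 1. Move everything to key 2 so key 1 can change without downtime.
        await publishToVault(key2Value);

        // 2. Regenerate key 1; the old, possibly leaked value stops working here.
        string newKey1 = await regenerateKey("Key1");

        // 3. Make key 1 the primary again.
        await publishToVault(newKey1);
    }
}
```

In a real rotation, leave enough time between steps 1 and 2 for every application to reload the secret. Otherwise anything still holding the old key 1 fails the moment it is regenerated.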
The better answer - skip keys entirely
Here's my actual opinion, and it's one the Key Vault documentation only gestures at: for Azure AI services, the best key management is no keys. Most Azure AI services, including Azure OpenAI, support Microsoft Entra ID authentication directly. Your application's managed identity gets the Cognitive Services User role on the AI resource (or the narrower Cognitive Services OpenAI User role if it only calls Azure OpenAI), and the SDK authenticates with DefaultAzureCredential straight to the AI endpoint. No key exists in your application's world at all, so there's nothing to store, leak or rotate.
using Azure.AI.OpenAI;
using Azure.Identity;

var openAiClient = new AzureOpenAIClient(
    new Uri("https://my-resource.openai.azure.com/"),
    new DefaultAzureCredential());
When we build AI systems on Azure, this is the default we reach for, and Key Vault then holds only the secrets that genuinely have to be secrets - third-party API keys, connection strings to systems that don't speak Entra ID, webhook signing secrets. Some organisations go further and disable key-based access on the AI resource entirely with the disableLocalAuth property, which turns "someone found our key" from an incident into a non-event. If you're subject to APRA CPS 234 or Essential Eight uplift, this single setting removes a whole category of finding. It's a standard part of the landing zone work our Azure AI consulting team delivers.
The honest caveat: not every tool in the ecosystem supports Entra auth cleanly. Some third-party libraries, low-code connectors and older samples want a key and nothing else. That's fine - that's what the Key Vault pattern is for. Entra ID where you can, Key Vault where you can't, plain-text config nowhere.
Costs and gotchas
Key Vault pricing is effectively a rounding error - a few cents per ten thousand secret operations. Cost is not a reason to avoid this. But there are real operational gotchas worth knowing before they find you.
Don't fetch secrets on every request. Key Vault throttles at a few thousand requests per ten seconds per vault, and an application that calls GetSecret in a hot path will hit that ceiling at the worst possible moment. Fetch at startup and cache, or use the built-in Key Vault references in App Service and Container Apps, which handle caching for you. We diagnosed exactly this at a client whose chatbot fell over every morning at 9am - it wasn't the AI service, it was a thousand cold-started function instances all hammering the vault simultaneously.
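If you do the caching yourself, the shape is roughly the sketch below. CachedSecret is a hypothetical helper, not part of any Azure SDK. The fetch delegate would wrap the Key Vault call in real code, and the time-to-live is an assumption you tune to how quickly you need a rotated key picked up.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class CachedSecret
{
    // Immutable snapshot so readers never see a value paired with the wrong expiry.
    private sealed class Entry
    {
        public Entry(string value, DateTimeOffset expiresAt) { Value = value; ExpiresAt = expiresAt; }
        public string Value { get; }
        public DateTimeOffset ExpiresAt { get; }
    }

    private readonly Func<Task<string>> _fetch;
    private readonly TimeSpan _ttl;
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private volatile Entry? _entry;

    public CachedSecret(Func<Task<string>> fetch, TimeSpan ttl)
    {
        _fetch = fetch;
        _ttl = ttl;
    }

    public async Task<string> GetAsync()
    {
        // Hot path: serve from memory, no call to the vault.
        var entry = _entry;
        if (entry != null && DateTimeOffset.UtcNow < entry.ExpiresAt) return entry.Value;

        // Slow path: one caller refreshes while the others wait.
        await _gate.WaitAsync();
        try
        {
            entry = _entry; // another caller may have refreshed while we waited
            if (entry == null || DateTimeOffset.UtcNow >= entry.ExpiresAt)
            {
                entry = new Entry(await _fetch(), DateTimeOffset.UtcNow + _ttl);
                _entry = entry;
            }
            return entry.Value;
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

Wired to the SecretClient from earlier, construction would look something like new CachedSecret(async () => (await client.GetSecretAsync("AIServicesKey")).Value.Value, TimeSpan.FromMinutes(30)). Register it as a singleton so every request shares the one cached value.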
Watch the networking. Enterprises love locking vaults behind private endpoints, which is correct, but then the developer running code locally can't reach the vault and the local dev experience breaks. Decide deliberately: either allow developer access through the firewall for a dev vault, or run separate vaults per environment. Separate vaults per environment is the right answer anyway. A dev app with read access to the prod vault is a finding waiting to be written up.
Soft delete is on by default for new vaults and can no longer be switched off. Purge protection is different: it's opt-in, so enable it, and know that once it's on it can't be turned off again. And send Key Vault's audit logs to Log Analytics. The vault records every secret read with the identity that read it, which is exactly the trail you want when something looks wrong, but only if you turned the diagnostics on before the day you need them.
Where this sits in the bigger picture
Secret management is unglamorous, and that's exactly why it gets deferred while everyone works on prompts and evals. But every AI system we've taken to production - RAG over corporate documents, agents calling internal APIs, document intelligence pipelines - has a pile of credentials at its base, and the difference between a demo and a production system is largely whether that pile is managed or scattered. It's one of the first things we look at in the architecture reviews our Azure AI Foundry consultants run, because it's cheap to fix early and painful to fix after the system has ten integrations.
If you take one thing away from this post, make it this: search your repos for keys today, move them to Key Vault or replace them with managed identity this sprint, and rotate anything that was ever committed. The Microsoft documentation has working samples in every major language. And if you'd like someone who has done this dance a few dozen times to look over your setup, get in touch - it's a short engagement with a long payoff.