Unlocking the Power of GPUs for AI and ML Workloads on Azure Kubernetes Service - The series
Whether we want to run Large Language Models as part of a brand-new, self-built Copilot or train new models on the latest and greatest GPU hardware, Azure Kubernetes Service can support all of it. Throughout this series we will explore and implement different use cases, each leveraging different technologies provided by Kubernetes and Microsoft Azure, always with the same goal: build, deploy and run AI and Machine Learning workloads that are scalable, secure, sustainable and cost efficient.
Introduction
Let's talk about the big "Why?!" first. AI is hot and its use cases and applications are growing rapidly. With solutions such as Azure OpenAI Service, Microsoft gives us the ability to consume AI models as a service, charging for usage and making AI broadly affordable. As AI's popularity keeps growing, we will move beyond proofs of concept and the early stages of production, and AI will become part of the majority of software solutions that are released.
More usage and more popularity mean more requirements. Out-of-the-box services such as Azure OpenAI Service might not always fulfill them: perhaps you want to train your own model, or run a model that is not currently available through public services. What we want is to be able to say "Yes" to each and every use case out there. But can we? Yes we can.
With the power of Azure, the rapid advancements in GPU technologies and their adoption by Microsoft (Microsoft becomes first cloud to offer Nvidia Blackwell system), running your AI workloads on Azure just makes sense, whether that is for Generative AI or Machine Learning. Add the extensibility and flexibility of Azure Kubernetes Service to that and we've got ourselves a winner.
The technical options
We have a number of technical options, and then an almost infinite number of combinations and configurations we can create. When it comes to AKS, we can group the technical options into five categories:
- Running LLMs / Generative AI on AKS with Kaito
- AKS as a compute target for Azure Machine Learning
- Building and deploying data and Machine Learning pipelines
- All the awesome NVIDIA CUDA capabilities, using technologies such as GPU time slicing and Multi-Instance GPU (MIG)
- Integration with Azure OpenAI Service (running a workload that leverages Azure OpenAI Service)
As you might have guessed, some categories complement each other. For example, building and deploying your Machine Learning pipelines while leveraging the NVIDIA CUDA toolkit sounds like a very valid combination.
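Most of these options share a common prerequisite: an AKS node pool backed by GPU-capable hardware. As a first taste of what the implementations in this series will look like, here is a minimal sketch using the Azure SDK for Python (azure-identity and azure-mgmt-containerservice) that adds a GPU node pool to an existing cluster. The subscription ID, resource group, cluster name and node pool name are placeholders, and the VM size is only an example; pick a GPU SKU for which you have quota in your region.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.containerservice import ContainerServiceClient

# Placeholder values - replace with your own subscription, resource group and cluster.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-cloud-adventures"
CLUSTER_NAME = "aks-cloud-adventures"

credential = DefaultAzureCredential()
aks_client = ContainerServiceClient(credential, SUBSCRIPTION_ID)

# Add a user-mode node pool backed by a GPU SKU. The taint keeps regular
# workloads off these (expensive) nodes unless they carry a matching toleration.
poller = aks_client.agent_pools.begin_create_or_update(
    resource_group_name=RESOURCE_GROUP,
    resource_name=CLUSTER_NAME,
    agent_pool_name="gpunp",
    parameters={
        "count": 1,
        "vm_size": "Standard_NC6s_v3",  # example GPU SKU; check regional availability and quota
        "os_type": "Linux",
        "mode": "User",
        "node_taints": ["sku=gpu:NoSchedule"],
        "node_labels": {"workload": "gpu"},
    },
)
gpu_pool = poller.result()
print(f"Provisioned node pool '{gpu_pool.name}' with VM size {gpu_pool.vm_size}")
```

The same result can of course be achieved through the Azure CLI, Bicep or Terraform; we will revisit node pool configuration in detail in the NVIDIA-focused posts.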
Decision making
The hard part: deciding which technology to implement. This might even be harder than the actual implementation. Running AI workloads and/or leveraging GPUs in general comes at a cost, so we want to think this through. Whenever we make a decision, we want to take the following into account:
- We want the most cost-effective solution
- The solution needs to scale both in performance and in cost: if we need more processing power, we will pay for it; if we don't, we don't want to pay for it either
- We want to minimize manual tasks
- Compliance: where does data travel to and through, and where is it stored? And if we are doing Machine Learning: are we following laws and regulations?
And, if there is any chance it fits your use case: we do not want to be limited to a specific region. AI workloads require a lot of hardware, the availability of that hardware differs per region, and some regions offer resources at lower rates than others. For availability, sustainability and cost reasons, you simply do not want to limit yourself to a single Azure region.
It really comes down to careful planning, and that is also where it becomes challenging. Many AI workloads start out as a proof of concept. You don't want to deploy your proof of concept to five regions and start paying for that scalability before you have decided whether the solution is viable and valuable to your business. However, we can already select multiple regions that comply with our requirements (quota, capacity, laws and regulations) and use them later.
Then there is the big question of which technologies to use, and that all depends on what you want to do. Let's simplify this by taking our categories and matching each one to a fitting use case.
| Use Case | Technology/Configuration |
| --- | --- |
| Running third-party Large Language Models for inferencing | Running LLMs / Generative AI on AKS with Kaito |
| Flexible compute for training models from Azure Machine Learning | AKS as a compute target for Azure Machine Learning |
| Machine Learning Ops (MLOps) | Building and deploying data and Machine Learning pipelines |
| Effective, efficient and scalable deployments that leverage GPUs | AKS with Node Pools configured to support NVIDIA solutions |
| Containerized solutions that leverage Azure OpenAI Service | AKS with Azure OpenAI Service integration |
That already makes the decision making a little bit easier! And as discussed before, we can combine different use cases for even more features and efficiency.
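To make the NVIDIA row a little more concrete: once the NVIDIA device plugin runs on a GPU node pool, GPU time slicing is configured through a ConfigMap that the plugin reads, advertising each physical GPU as multiple schedulable resources. Below is a minimal sketch using the official Kubernetes Python client; the namespace, ConfigMap name and replica count are assumptions for illustration (the NVIDIA GPU Operator is commonly installed into a gpu-operator namespace).

```python
from kubernetes import client, config

# Device plugin configuration that advertises every physical GPU as
# four schedulable nvidia.com/gpu resources (time slicing).
TIME_SLICING_CONFIG = """\
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4
"""

config.load_kube_config()  # assumes kubectl access to the cluster
core_v1 = client.CoreV1Api()

config_map = client.V1ConfigMap(
    metadata=client.V1ObjectMeta(name="time-slicing-config"),
    data={"any": TIME_SLICING_CONFIG},
)

# Assumed namespace: adjust if the device plugin / GPU Operator lives elsewhere.
core_v1.create_namespaced_config_map(namespace="gpu-operator", body=config_map)
print("Time-slicing ConfigMap created.")
```

The device plugin (or the GPU Operator's ClusterPolicy) still needs to be pointed at this ConfigMap before the extra GPU replicas show up; we will walk through that wiring, and the Multi-Instance GPU alternative, in the dedicated NVIDIA post.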
Wrapping up
We have looked at the different use cases and technologies available for AI workloads on AKS. In the next posts we will highlight and implement each individual use case, walking through the thought process, decision making and implementation for a fictional company called "Cloud Adventures". Stay tuned!