Unlocking the Power of GPUs for AI and ML Workloads on Azure Kubernetes Service - The series
Whether we want to run Large Language Models as part of a brand-new, self-built Copilot or train new models on the latest and greatest GPU hardware, Azure Kubernetes Service can support all of it. Throughout this series we will explore and implement different use cases, each leveraging different technologies provided by Kubernetes and Microsoft Azure, always with the same goal: build, deploy and run AI and Machine Learning workloads that are scalable, secure, sustainable and cost efficient.
Introduction
Let's talk about the big "Why?!" first. AI is hot and its use cases and applications are growing rapidly. With solutions such as Azure OpenAI Service, Microsoft gives us the ability to consume AI models as a service, charging for usage and making AI broadly affordable. As AI's popularity keeps growing, we will move beyond proofs of concept and the early stages of production, and AI will become part of the majority of software solutions that are released.
More usage and more popularity mean more requirements. Out-of-the-box services such as Azure OpenAI Service might not always fulfill them: perhaps you want to train your own model, or run a model that is not currently available through public services. What we want is to be able to say "Yes" to each and every use case out there. But can we? Yes we can.
With the power of Azure, the rapid advancements in GPU technologies and their adoption by Microsoft (Microsoft becomes first cloud to offer Nvidia Blackwell system), running your AI workloads on Azure just makes sense, whether that is for Generative AI or Machine Learning. Add the extensibility and flexibility of Azure Kubernetes Service to that and we've got ourselves a winner.
The technical options
We have a number of technical options, and then an almost infinite number of combinations and configurations we can create. When it comes to AKS, we can group the technical options into five categories:
- Running LLMs / Generative AI on AKS with Kaito
- AKS as a compute target for Azure Machine Learning
- Building and deploying data and Machine Learning pipelines
- All the awesome NVIDIA CUDA capabilities, using technologies such as GPU time slicing and Multi-Instance GPU (MIG)
- Integration with Azure OpenAI Service (running a workload that leverages Azure OpenAI Service)
As you might have guessed, some categories complement each other. For example, building and deploying your Machine Learning pipelines while leveraging the NVIDIA CUDA toolkit sounds like a very valid combination.
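Most of these options share a common prerequisite: an AKS node pool backed by GPU-capable hardware. As a first taste of what the implementations in this series will look like, here is a minimal sketch using the Azure SDK for Python (azure-identity and azure-mgmt-containerservice) that adds a GPU node pool to an existing cluster. The subscription ID, resource group, cluster name and node pool name are placeholders, and the VM size is only an example; pick a GPU SKU for which you have quota in your region.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.containerservice import ContainerServiceClient

# Placeholder values - replace with your own subscription, resource group and cluster.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-cloud-adventures"
CLUSTER_NAME = "aks-cloud-adventures"

credential = DefaultAzureCredential()
aks_client = ContainerServiceClient(credential, SUBSCRIPTION_ID)

# Add a user-mode node pool backed by a GPU SKU. The taint keeps regular
# workloads off these (expensive) nodes unless they carry a matching toleration.
poller = aks_client.agent_pools.begin_create_or_update(
    resource_group_name=RESOURCE_GROUP,
    resource_name=CLUSTER_NAME,
    agent_pool_name="gpunp",
    parameters={
        "count": 1,
        "vm_size": "Standard_NC6s_v3",  # example GPU SKU; check regional availability and quota
        "os_type": "Linux",
        "mode": "User",
        "node_taints": ["sku=gpu:NoSchedule"],
        "node_labels": {"workload": "gpu"},
    },
)
gpu_pool = poller.result()
print(f"Provisioned node pool '{gpu_pool.name}' with VM size {gpu_pool.vm_size}")
```

The same result can of course be achieved through the Azure CLI, Bicep or Terraform; we will revisit node pool configuration in detail in the NVIDIA-focused posts.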
Decision making
The hard part: deciding which technology to implement. This might even be harder than the actual implementation. Running AI workloads and/or leveraging GPUs in general comes at a cost, so we want to think this through. Whenever we make a decision, we want to take the following into account:
- We want the most cost-effective solution
- The solution needs to scale both in performance and in cost: if we need more processing power, we will pay for it; if we don't, we don't want to pay for it either
- We want to minimize manual tasks
- Compliance: where does data travel to and through, and where is it stored? And if we are doing Machine Learning: are we following laws and regulations?
And, if there is any chance it fits your use case: we do not want to be limited to a specific region. AI workloads require a lot of hardware, the availability of that hardware differs per region, and some regions offer resources at lower rates than others. For availability, sustainability and cost reasons, you simply do not want to limit yourself to a single Azure region.
It really comes down to careful planning, and that is also where it becomes challenging. Many AI workloads start out as a proof of concept. You don't want to deploy your proof of concept to five regions and start paying for that scalability before you have decided whether the solution is viable and valuable to your business. However, we can already select multiple regions that comply with our requirements (quota, capacity, laws and regulations) and use them later.
Then there is the big question of which technologies to use, and that all depends on what you want to do. Let's simplify this by taking our categories and matching each one to a fitting use case.
| Use Case | Technology/Configuration |
| --- | --- |
| Running third-party Large Language Models for inferencing | Running LLMs / Generative AI on AKS with Kaito |
| Flexible compute for training models from Azure Machine Learning | AKS as a compute target for Azure Machine Learning |
| Machine Learning Ops (MLOps) | Building and deploying data and Machine Learning pipelines |
| Effective, efficient and scalable deployments that leverage GPUs | AKS with Node Pools configured to support NVIDIA solutions |
| Containerized solutions that leverage Azure OpenAI Service | AKS with Azure OpenAI Service integration |
That already makes the decision making a little bit easier! And as discussed before, we can combine different use cases for even more features and efficiency.
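To make the NVIDIA row a little more concrete: once the NVIDIA device plugin runs on a GPU node pool, GPU time slicing is configured through a ConfigMap that the plugin reads, advertising each physical GPU as multiple schedulable resources. Below is a minimal sketch using the official Kubernetes Python client; the namespace, ConfigMap name and replica count are assumptions for illustration (the NVIDIA GPU Operator is commonly installed into a gpu-operator namespace).

```python
from kubernetes import client, config

# Device plugin configuration that advertises every physical GPU as
# four schedulable nvidia.com/gpu resources (time slicing).
TIME_SLICING_CONFIG = """\
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4
"""

config.load_kube_config()  # assumes kubectl access to the cluster
core_v1 = client.CoreV1Api()

config_map = client.V1ConfigMap(
    metadata=client.V1ObjectMeta(name="time-slicing-config"),
    data={"any": TIME_SLICING_CONFIG},
)

# Assumed namespace: adjust if the device plugin / GPU Operator lives elsewhere.
core_v1.create_namespaced_config_map(namespace="gpu-operator", body=config_map)
print("Time-slicing ConfigMap created.")
```

The device plugin (or the GPU Operator's ClusterPolicy) still needs to be pointed at this ConfigMap before the extra GPU replicas show up; we will walk through that wiring, and the Multi-Instance GPU alternative, in the dedicated NVIDIA post.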
Wrapping up
We have looked at the different use cases and technologies available for AI workloads on AKS. In the next posts we will highlight and implement each individual use case, walking through the thought process, decision making and implementation for a fictional company called "Cloud Adventures". Stay tuned!