Transcript for:
Understanding AI: Foundation Models and Workflow

Deep learning has enabled us to build detailed, specialized AI models. And we can do that provided we gather enough data, label it, and use that to train and deploy these models. Models like customer service chatbots or fraud detection in banking.

Now, in the past, if you wanted to build a new model for your specialization, say a model for predictive maintenance in manufacturing, well, you'd need to start again with data selection and creation, labeling, model development, training, and validation. But foundation models are changing that paradigm. So what is a foundation model? Well, a foundation model is a more focused, centralized effort to create a base model.

And through fine-tuning, that base foundation model can be adapted to a specialized model. Need an AI model for programming language translation? We'll start with a foundation model and then fine-tune it with programming language data. Fine-tuning and adapting base foundation models rapidly speeds up AI model development.
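To make that concrete, here is a minimal sketch of what fine-tuning a base model on programming language data could look like, assuming the Hugging Face transformers library, a small "gpt2" model as a stand-in for the foundation model, and a tiny invented set of Python-to-Java examples. None of these choices come from the video itself.

```python
# A minimal sketch of fine-tuning a base foundation model on
# programming-language-translation pairs. The model name ("gpt2") and the
# tiny in-memory dataset are placeholders for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical fine-tuning examples: translate Python snippets to Java.
examples = [
    "### Python\nprint('hello')\n### Java\nSystem.out.println(\"hello\");",
    "### Python\nx = [1, 2, 3]\n### Java\nint[] x = {1, 2, 3};",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()

for epoch in range(3):                      # a few passes over the tiny dataset
    for text in examples:
        inputs = tokenizer(text, return_tensors="pt")
        # For causal language modeling, the labels are the input ids themselves.
        outputs = model(**inputs, labels=inputs["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Save the adapted, specialized model alongside its tokenizer.
model.save_pretrained("my-specialized-translator")
tokenizer.save_pretrained("my-specialized-translator")
```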

So how do we do that? Well, let's take a look at the five stages of the workflow to create an AI model. Stage one is to prepare the data. Now, in this stage, we need to gather the data that we're going to use to train our AI model.

And we're going to need a lot of data, potentially petabytes of data across dozens of domains. The data can combine both available open source data and proprietary data. Now, this stage performs a series of data processing tasks, and those include things like categorization, which describes what the data is. So which data is English, which is German, which is Ansible, which is Java, that sort of thing.

Then we also apply filters to the data. So filtering allows us to, for example, apply filters for hate speech and profanity and abuse and that sort of thing, stuff that we want to filter out of the system so we don't train the model on it. Other filters may flag copyrighted material or private or sensitive information. Something else we're going to take out is duplicate data, so we remove that as well. And that leaves us with something called a base data pile. That's really the output of stage one.

And this base data pile can be versioned and tagged. That allows us to say: this is what I'm training the AI model on, and here are the filters that I used. It's perfect for governance.
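As an illustration of the kind of processing stage one performs, here is a small sketch that categorizes, filters, de-duplicates, and versions a handful of documents into a base data pile. The filter terms, field names, and helper functions are invented placeholders, not the actual pipeline described here.

```python
# Illustrative stage-one pipeline: categorize, filter, de-duplicate, and
# version the result as a "base data pile". All rules here are simplified
# stand-ins for the much larger data-processing tasks described above.
import hashlib
import json

BLOCKLIST = {"<profanity>", "<hate-speech>"}          # placeholder filter terms

def categorize(doc: dict) -> dict:
    """Attach a coarse category label (e.g. code vs. natural language)."""
    doc["category"] = "code" if doc["text"].strip().startswith("def ") else "text"
    return doc

def passes_filters(doc: dict) -> bool:
    """Drop documents that trip the content filters."""
    lowered = doc["text"].lower()
    return not any(term in lowered for term in BLOCKLIST)

def build_base_data_pile(raw_docs, version: str) -> dict:
    seen_hashes = set()
    pile = []
    for doc in raw_docs:
        doc = categorize(doc)
        if not passes_filters(doc):
            continue
        digest = hashlib.sha256(doc["text"].encode("utf-8")).hexdigest()
        if digest in seen_hashes:                     # remove exact duplicates
            continue
        seen_hashes.add(digest)
        pile.append(doc)
    # Version and tag the pile so we can say exactly what the model was trained on.
    return {"version": version, "filters": sorted(BLOCKLIST), "documents": pile}

if __name__ == "__main__":
    raw = [{"text": "def add(a, b): return a + b"},
           {"text": "The quick brown fox."},
           {"text": "The quick brown fox."}]          # duplicate, will be dropped
    print(json.dumps(build_base_data_pile(raw, version="v1.0"), indent=2))
```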

Now, stage two is to train the model. And we're going to train the model on those base data piles. So we start this stage by picking the foundation model that we want to use.

So we will select our model. Now, there are many different types of foundation models. There are generative foundation models, encoder-only models, lightweight models, high-parameter models. Are you looking to build an AI model to use as a chatbot or as a classifier?

So pick the foundation model that matches your use case, then match the data pile with that model. Next, we take the data pile and we tokenize it. Now, foundation models work with tokens rather than words, and a data pile could result in potentially trillions of tokens. And now we can begin the process of training using all of those tokens.
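For a sense of what the tokenization step looks like in practice, here is a small sketch using an off-the-shelf tokenizer. The choice of the "gpt2" tokenizer is purely illustrative; a real pipeline would run this across the whole data pile.

```python
# Tokenization sketch: foundation models consume token ids, not words.
# The tokenizer choice ("gpt2") is illustrative; a real pipeline would run
# this over the entire data pile, producing potentially trillions of tokens.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Foundation models work with tokens rather than words."
token_ids = tokenizer.encode(text)

print(tokenizer.convert_ids_to_tokens(token_ids))   # the subword pieces
print(token_ids)                                     # what the model actually sees
print(f"{len(token_ids)} tokens for {len(text.split())} words")
```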

Now, training can take a long time depending on the size of the model. Large-scale foundation models can take months with many thousands of GPUs. But once it's done, the longest and most computationally expensive part of the work is behind us. Stage three is to validate.

When training is finished, we need to benchmark the model. This involves running the model and assessing its performance against a set of benchmarks that help define the quality of the model. And from there we can create a model card that says: this is the model I've trained, and these are the benchmark scores it has achieved. Now, up until this point, the main persona performing these tasks has been the data scientist.
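As a rough sketch of how benchmark results might feed a model card, the snippet below runs a placeholder benchmark function and records the scores. The benchmark names, scores, model identifier, and card fields are all invented for illustration.

```python
# Sketch of stage three: run the trained model against a benchmark suite and
# record the results in a simple model card. The benchmark names, scores, and
# card fields below are invented for illustration.
import json

def run_benchmark(model_id: str, benchmark: str) -> float:
    """Placeholder: in practice this would run the model on the benchmark's
    evaluation set and compute a real score."""
    fake_scores = {"code-translation": 0.71, "summarization": 0.64}
    return fake_scores[benchmark]

def build_model_card(model_id: str, data_pile_version: str, benchmarks) -> dict:
    return {
        "model_id": model_id,
        "trained_on": data_pile_version,      # links back to the stage-one data card
        "benchmarks": {b: run_benchmark(model_id, b) for b in benchmarks},
    }

card = build_model_card("example-foundation-model-v1", "base-data-pile v1.0",
                        ["code-translation", "summarization"])
print(json.dumps(card, indent=2))
```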

Now, stage four is tune, and this is where we bring in the persona of the application developer. This persona does not need to be an AI expert. They engage with the model, generating, for example, prompts that elicit good performance from the model.

They can also provide additional local data to fine-tune the model and improve its performance. And this stage is something that you can do in hours or days, much quicker than building a model from scratch.
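Here is a small sketch of the kind of prompt work this persona might do: building a few-shot prompt and running it through a text-generation pipeline. The sentiment-classification examples and the "gpt2" model are placeholders, not a recommendation from the video.

```python
# Stage-four sketch: an application developer engineering a few-shot prompt.
# The examples and the small "gpt2" model are placeholders; the point is the
# prompt structure, not the specific model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

few_shot_prompt = (
    "Classify the sentiment of each customer message.\n"
    "Message: The product arrived broken. Sentiment: negative\n"
    "Message: Support resolved my issue in minutes. Sentiment: positive\n"
    "Message: I love the new dashboard. Sentiment:"
)

result = generator(few_shot_prompt, max_new_tokens=5, do_sample=False)
print(result[0]["generated_text"])
```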

And now we're ready for stage five, which is to deploy the model. Now, this model could run as a service offering deployed to a public cloud, or we could alternatively embed the model into an application that runs much closer to the edge of the network. Either way, we can continue to iterate and improve the model over time.
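To sketch the run-as-a-service option, here is a minimal REST endpoint that wraps a tuned model, assuming FastAPI and the hypothetical model directory saved in the earlier fine-tuning sketch. The framework, route, and model path are illustrative choices, not part of the workflow itself.

```python
# Stage-five sketch: expose the tuned model as a simple REST service.
# FastAPI is just one illustrative choice; "my-specialized-translator" is the
# hypothetical directory saved in the earlier fine-tuning sketch.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
generator = pipeline("text-generation", model="my-specialized-translator")

class GenerateRequest(BaseModel):
    prompt: str

@app.post("/generate")
def generate(req: GenerateRequest):
    # Run the model on the caller's prompt and return the completion.
    output = generator(req.prompt, max_new_tokens=64, do_sample=False)
    return {"completion": output[0]["generated_text"]}

# Run with, for example: uvicorn serve:app --host 0.0.0.0 --port 8000
```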

Now, here at IBM, we've announced a platform that enables all five stages of this workflow. It's called watsonx, and it's composed of three elements: watsonx.data, watsonx.governance, and watsonx.ai. And this is all built on IBM's hybrid cloud platform, which is Red Hat OpenShift.

Now, watsonx.data is a modern data lakehouse, and it establishes connections with the data repositories that make up the data in stage one. watsonx.governance manages the data cards from stage one and the model cards from stage three, enabling a collection of fact sheets that ensure a well-governed AI process and lifecycle. And watsonx.ai provides a means for the application developer persona to engage with the model in stage four. Overall, foundation models are changing the way we build specialized AI models.

And this five-stage workflow allows teams to create AI and AI-derived applications with greater sophistication while rapidly speeding up AI model development.