You're into LLMs, so you've probably heard about RAG (retrieval-augmented generation), right? Well, I'm going to throw it out there: it's the best way to get bang for your buck when using LLMs in a business.
But when you're doing it at scale, there's more to think about than a Jupyter notebook. And unlike when you're debugging with the service desk, "works on my laptop" won't cut it here. Standing up vector databases, managing embeddings, creating authenticated APIs, and hooking up your LLM all take work.
And even more so when you're dealing with large data volumes or tons of users. So what if I told you that you could get RAG up and running for a business in three steps? What if it handled all the hard stuff, like tokenization and retrieval, but also guardrails? And what if it calculated hallucination metrics for you automatically?
I'm going to show you how to do it in three steps. It begins with every developer's favorite bit: installing stuff. I'm going to be using IBM watsonx.ai Flows Engine.
The first goal is to be able to run wxflows --version on my MacBook. If I get back a version number, we're cooking. To do this, I just need to download the installer from here and install it using this command.
Don't let the commands scare you. Just do it the same way I use Excel every day: copy, paste, and pray. Now, if I run wxflows --version, I get a version number back.
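If you want to script that check, say in a setup script or CI job, here's a minimal Python sketch. It assumes only what's described here: that the CLI is called wxflows and accepts --version. The helper name cli_version is hypothetical, and because the version output format isn't specified, it simply returns whatever the command prints.

```python
import shutil
import subprocess
from typing import Optional


def cli_version(binary: str = "wxflows") -> Optional[str]:
    """Return the output of `<binary> --version`, or None if the CLI isn't usable."""
    # shutil.which searches PATH the same way a shell would.
    path = shutil.which(binary)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    # Assumption: the version is printed on stdout; its exact format is not known here.
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = cli_version()
    print(version if version else "wxflows not found on PATH")
```

A non-None result is the scripted equivalent of "we're cooking."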
Side note: I can also run wxflows --help to see all the available commands. Right now, though, watsonx.ai Flows Engine and I are like strangers, like meeting your co-worker on the weekend.
So, goal two: I need to authenticate, so that wxflows recognizes me when I run the whoami command. Run wxflows login to kick off the authentication process.
It prompts for the environment, domain, and admin key. These are all available from this link. Once that's done, if I run wxflows whoami again, I get back the domain, environment, admin key, and API key.
I'm in. Final goal: upload the data and deploy a flow. Once I'm done with this step, I'll have an API endpoint that I can use.
First, run wxflows init --interactive. This takes me through a wizard to chunk up my data. It prompts for the data location (in this case, IBM's annual report in Markdown format) as well as some chunking parameters.
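The wizard's chunking parameters aren't spelled out here, so this is only a sketch of what chunking generally means: splitting a long document into overlapping windows before embedding it. The word-based units and the chunk_size and overlap names are assumptions for illustration, not the actual wxflows parameters.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of `chunk_size` words, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk reaches the end, so no trailing chunk is pure overlap.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, chunk_size=4, overlap=2):
        print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still shows up whole in at least one chunk, which tends to help retrieval.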
Once that's done, I get back three new files. This is the kicker with watsonx.ai Flows Engine: I can build an entire RAG or LLM flow just by changing the steps in the flow. Need a prompt template? Easy.
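The idea that a flow is just an ordered list of steps can be sketched in plain Python. Everything here is hypothetical: run_flow, prompt_template, and fake_completion are illustrative stand-ins, not wxflows step names or its flow syntax, and the "LLM" just echoes a document.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Step = Callable[[State], State]


def run_flow(steps: List[Step], state: State) -> State:
    """Run each step in order, passing along the state returned by the previous one."""
    for step in steps:
        state = step(state)
    return state


# Hypothetical steps for illustration only; these are not wxflows step names.
def prompt_template(state: State) -> State:
    """Build a prompt from the question and the retrieved documents."""
    context = "\n".join(state["documents"])
    prompt = (
        "Answer using only this context:\n"
        f"{context}\n\n"
        f"Question: {state['question']}"
    )
    return {**state, "prompt": prompt}


def fake_completion(state: State) -> State:
    """Stand-in for an LLM call: simply echoes the first retrieved document."""
    documents = state["documents"]
    return {**state, "completion": documents[0] if documents else ""}


if __name__ == "__main__":
    flow = [prompt_template, fake_completion]
    result = run_flow(flow, {"question": "What was revenue?", "documents": ["Revenue was X."]})
    print(result["completion"])
```

In this sketch, adding something like a hallucination-score step just means appending another function to the list.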
Want hallucination metrics calculated? Add in the hallucination score step. Need distance metrics?
The RAG info step has that. I can load the data into the vector store by running wxflows collection deploy, choose the RAG flow by uncommenting the flow I want in the TOML file, and deploy it by running wxflows flows deploy. This returns an API endpoint. I can plug the environment details into my application, and I've now got an enterprise RAG application up and running.
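Under the hood, retrieving from a vector store boils down to finding the stored chunks whose embeddings are closest to the query's embedding, which is also where distance metrics come from. Here's a toy, brute-force version using cosine distance, assuming tiny hand-written vectors in place of real embeddings. InMemoryVectorStore is hypothetical and is not how wxflows stores or searches collections.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0.0 means same direction, 1.0 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Toy store: keeps (text, embedding) pairs and searches them exhaustively."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, text: str, embedding: Sequence[float]) -> None:
        self._items.append((text, list(embedding)))

    def search(self, query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        """Return the k closest texts with their distances, closest first."""
        scored = [(text, cosine_distance(query, emb)) for text, emb in self._items]
        scored.sort(key=lambda pair: pair[1])
        return scored[:k]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add("chunk about revenue", [1.0, 0.0])
    store.add("chunk about headcount", [0.0, 1.0])
    print(store.search([0.9, 0.2], k=1))
```

A production vector store replaces the linear scan with an approximate nearest-neighbor index, but the distance it reports plays the same role.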
When I query it, we can see the completion, any groundedness warnings, the hallucination metrics, and the source documents.
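How the groundedness and hallucination scores are computed isn't covered here, so don't read this as the wxflows metric. It's a deliberately crude proxy, assuming simple word overlap is good enough for illustration: it measures how much of the answer's vocabulary appears in the retrieved sources and raises a warning below an arbitrary threshold. The function names are hypothetical, and production metrics are usually far more sophisticated.

```python
import re
from typing import Iterable, Set


def _words(text: str) -> Set[str]:
    """Lowercased alphanumeric words; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_groundedness(completion: str, sources: Iterable[str]) -> float:
    """Fraction of the completion's distinct words that appear in at least one source."""
    answer = _words(completion)
    if not answer:
        return 1.0  # an empty answer makes no unsupported claims
    vocabulary: Set[str] = set()
    for doc in sources:
        vocabulary |= _words(doc)
    return len(answer & vocabulary) / len(answer)


def groundedness_warning(completion: str, sources: Iterable[str], threshold: float = 0.6) -> bool:
    """True when the overlap score falls below `threshold` (an arbitrary cutoff for this sketch)."""
    return overlap_groundedness(completion, sources) < threshold


if __name__ == "__main__":
    sources = ["Revenue grew compared with the prior year."]
    print(overlap_groundedness("Revenue grew.", sources))
    print(groundedness_warning("Revenue fell sharply.", sources))
```

Word overlap can't tell "grew" from "did not grow" in context, which is exactly why real hallucination metrics go beyond this.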