Hi, I'm Chris Kirsten, part of the Technical Marketing Engineering team at NVIDIA working on NVIDIA Flare. Flare, our federated learning application runtime environment, is, at the highest level, an open-source platform for collaborative computing. In this 101 video series, we'll give an overview of the platform and highlight some of the tools and examples you can use to quickly come up to speed with NVIDIA Flare.
This first video will cover the basics, the key components of the platform, along with options for installation to get started on your first proof-of-concept or application development. NVIDIA Flare is an open-source, Python-based SDK for federated learning licensed under the Apache 2.0 license. The platform was designed with a base runtime to enable secure, distributed, multi-party collaborative compute with a flexible API that allows users and developers to adapt existing machine learning and deep learning workflows to the federated paradigm.
On top of this base platform, Flare provides reference implementations of commonly used federated learning algorithms and tools to enable privacy preservation and encryption, and a suite of tools to manage provisioning and operating a federated learning deployment. To highlight some of the key components, we'll start with the central Flare stack. This includes the base communication layer with APIs and reference algorithms to provide authentication and authorization policies, resource management, high availability, and privacy preservation.
On this base runtime, we provide reference workflows including trainers, aggregators, and validators that can be applied in higher-level federated training and evaluation workflows, like scatter-and-gather, cyclic weight transfer, and global and cross-site evaluation. These workflows can be implemented with common federated learning algorithms for aggregation and optimization like FedAvg, FedOpt, FedProx, SCAFFOLD, and Ditto.
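To give a flavor of what an aggregation algorithm like FedAvg does, here's a minimal sketch of data-weighted model averaging in plain Python. This is an illustration only, not the NVFlare API; the `fed_avg` function and the toy weights are made up for the example.

```python
def fed_avg(client_updates):
    """FedAvg sketch: average client weights, weighted by local dataset size.

    client_updates: list of (weights, num_samples) pairs, where
    weights is a dict mapping parameter names to values.
    """
    total_samples = sum(n for _, n in client_updates)
    param_names = client_updates[0][0].keys()
    return {
        name: sum(w[name] * n for w, n in client_updates) / total_samples
        for name in param_names
    }

# Two clients: one trained on 100 samples, one on 300
global_weights = fed_avg([({"w": 1.0}, 100), ({"w": 3.0}, 300)])
print(global_weights)  # → {'w': 2.5}
```

The client that saw more data pulls the global model further toward its local weights, which is the core idea behind FedAvg-style aggregation.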
Alongside this central stack are tools that enable algorithm and application developers to simulate a federated learning deployment on a local workstation using the built-in proof-of-concept, or POC, mode. This allows you to deploy algorithms and applications on a set of local Flare server and clients. You can then translate these applications to real-world deployment using the bundled tools for provisioning, secure deployment, orchestration, monitoring, and experiment management. Next, we'll walk through some of the options for installing the NVIDIA Flare platform. A great starting point is the NVIDIA Developer page.
This has links to both the NVFlare GitHub and the most recent documentation. The simplest way to get started is by following the quick start. This walks through a simple installation using a Python virtual environment.
Here we'll be using a Linux workstation running Ubuntu 20.04 with Python 3.8, the virtual environment module, and the NVIDIA Container Toolkit. First, we'll create a Python virtual environment and then activate it. Once the virtual environment is activated, we'll update the pip and setuptools dependencies, and then pip install NVFlare. Once the platform is installed, we'll have access to tools like POC and provision that can be used to set up a local deployment for testing, or to provision a secure, distributed deployment. We'll walk through this in more detail in a follow-up video that covers the POC workspace
and the steps required to deploy an example NVIDIA Flare application. Another good option is installing in a Docker container. This lets you build a container image that includes all the dependencies required for your application, for example by basing it on the NGC PyTorch container for a deep learning application.
When running in a Docker container, the host will need to have the NVIDIA Container Toolkit installed to enable GPUs in the Docker container runtime. This process is also outlined in the quick start for a containerized deployment. All that's required here is a simple Dockerfile defining a base image and dependencies.
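A minimal Dockerfile along those lines might look like this (the NGC image tag is illustrative; pick a current release from the NGC catalog):

```dockerfile
# Base image: an NGC PyTorch container (illustrative tag)
FROM nvcr.io/nvidia/pytorch:22.04-py3

# Same pip sequence as the virtual-environment install
RUN python3 -m pip install --upgrade pip setuptools
RUN python3 -m pip install nvflare
```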
In this case we're using the NGC PyTorch container image as the base and walking through essentially the same pip installation sequence as in the virtual environment, with a few additional dependencies for working with PyTorch. Here we build the container using the docker build command, tagging it nvflare. Once the build completes, we have access to the full set of NVFlare tools and libraries built on top of this PyTorch base image. You can use this container to ensure a consistent environment across a group of server and clients. For example, let's create a test workspace and map it into the running container.
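The build-and-run sequence could be sketched like this. The `nvflare` image tag and workspace path are illustrative, and the POC subcommand syntax has changed across NVFlare releases, so check the documentation for your version:

```shell
# Build the image from the Dockerfile in the current directory,
# tagging it nvflare
docker build -t nvflare .

# Create a test workspace on the host and map it into the container
# (--gpus all requires the NVIDIA Container Toolkit on the host)
mkdir -p workspace
docker run --rm -it --gpus all \
    -v "$(pwd)/workspace:/workspace" \
    nvflare /bin/bash

# Inside the container (or a virtual environment), prepare and start
# a local POC deployment with a server and two clients
nvflare poc prepare -n 2
nvflare poc start
```

Mapping the workspace with `-v` lets the POC artifacts persist on the host after the container exits.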
Once that's done, we can create a POC workspace or provision a set of clients in the same way as in the Python virtual environment. Another option is installing from the GitHub source. This may be useful if you're actively working on platform development.
The first step is cloning the NVFlare GitHub repository, which is linked from the main developer page. Once the repository has been cloned, pip install wheel, and then use the included setup.py script to build a wheel. After the build completes, just pip install the resulting wheel file.
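Those steps could be sketched as follows (the wheel filename depends on the version you build):

```shell
# Clone the repository and enter it
git clone https://github.com/NVIDIA/NVFlare.git
cd NVFlare

# Build a wheel and install it
pip install wheel
python setup.py bdist_wheel
pip install dist/nvflare-*.whl
```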
In this video, we've covered an overview of NVIDIA Flare and walked through some basic methods of installation. Hopefully this serves as a good starting point for exploring the platform.
Stay tuned for additional videos that cover some of the more advanced features, and feel free to reach out on our developer channels or in the discussions on the NVFlare GitHub.