
Running GPT Locally with GPU Support

Sep 21, 2024

Installing and Running GPT Locally with GPU Support

Introduction

Running GPT models locally on your PC keeps your data private and lets you use a wide range of models, including uncensored ones.
Local GPT models used to be slow on consumer hardware; recent GPU support addresses much of that.

Nomic AI's Solution

Nomic AI released a version of GPT4All that supports GPU acceleration through the Vulkan API.
Compatibility: Works with AMD, Nvidia, and Intel Arc GPUs.
Demonstrated speed: Over five times faster with GPU support compared to CPU.

Installation and Setup Guide

Step 1: Download and Install

Locate GPT4All on Nomic AI's GitHub page.
License: Open source under the MIT license.
Installer available for various operating systems, including Windows.
Simple installation process: select directory, accept license, and finish.

Step 2: Configure Settings

Check and set a suitable download path for model files.
Configure the number of CPU threads and enable the GPU (the auto setting is recommended).

Step 3: Download Models

Available models include Mistral-based LLMs.
Example: Download Mistral OpenOrca and make sure the GPU is selected for accelerated performance.
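
For readers who would rather script Steps 2 and 3 than click through the desktop app, the sketch below mirrors the same settings with the gpt4all Python package. The video does not cover that package, so treat all of this as an assumption: the model filename is a placeholder, and build_settings and load_model are hypothetical helper names.

```python
import os
from pathlib import Path


def build_settings(download_dir, threads=None, device="gpu"):
    """Collect the Step 2 settings as keyword arguments for the model loader."""
    if threads is None:
        # Fall back to the number of logical CPUs, or 1 if it cannot be detected.
        threads = os.cpu_count() or 1
    return {
        "model_path": str(Path(download_dir)),  # where model files are stored
        "n_threads": int(threads),              # CPU threads to use
        "device": device,                       # "gpu" requests a GPU, "cpu" forces CPU
    }


def load_model(model_file, settings):
    """Load a model with the gpt4all package (assumed installed via pip)."""
    from gpt4all import GPT4All  # imported lazily so the helper above runs without it

    return GPT4All(model_file, allow_download=True, **settings)


if __name__ == "__main__":
    settings = build_settings("models", threads=8)
    print(settings)
    # Placeholder filename: use the exact name shown in the app's model list.
    # model = load_model("mistral-7b-openorca.Q4_0.gguf", settings)
    # print(model.generate("Hello", max_tokens=32))
```

The load call is left commented out because it downloads several gigabytes on first use.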

Additional Models

Uncensored models such as Llama 2 variants are available.
Use Hugging Face to find more models, and choose files in the GGUF format.
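
A file downloaded from Hugging Face can be sanity-checked before it is placed in the model folder. GGUF files begin with the four ASCII bytes "GGUF" followed by a 32-bit version number. The sketch below reads that header. It assumes a little-endian file, and read_gguf_version is a hypothetical helper name.

```python
import os
import struct
import tempfile

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def read_gguf_version(path):
    """Return the GGUF version number, or None if the file is not GGUF."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    # Assumption: little-endian file, the common case on desktop PCs.
    (version,) = struct.unpack("<I", header[4:8])
    return version


if __name__ == "__main__":
    # Demo on a fake 8-byte header so the sketch runs without a real model.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".gguf") as tmp:
        tmp.write(GGUF_MAGIC + struct.pack("<I", 3))
    try:
        print(read_gguf_version(tmp.name))
    finally:
        os.remove(tmp.name)
```

A result of None usually means the download is incomplete or the file uses an older, incompatible format.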

Troubleshooting GPU Support

Key Considerations

Quantization Format: Only Q4_0 models currently support GPU acceleration.
Model Size Limitation: Models larger than 7B parameters may not yet run on the GPU.
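
The two considerations above can be turned into a quick filename check before downloading. This is a minimal sketch, not GPT4All's own logic: it encodes only the rules stated in these notes (Q4_0 quantization, at most 7B parameters), and it assumes the common naming pattern where the size and quantization appear in the filename. The filenames and function names are illustrative.

```python
import re

# Rules taken from the notes above, not from GPT4All itself.
GPU_QUANTS = {"Q4_0"}
MAX_GPU_PARAMS_B = 7.0


def parse_model_name(filename):
    """Return (quantization, size in billions) parsed from a model filename.

    Either value is None when it cannot be found in the name.
    """
    quant = re.search(r"(?<![a-z0-9])q\d+(?:_[a-z0-9]+)*", filename, re.IGNORECASE)
    size = re.search(r"(?<![a-z0-9.])(\d+(?:\.\d+)?)b(?![a-z0-9])", filename, re.IGNORECASE)
    return (
        quant.group(0).upper() if quant else None,
        float(size.group(1)) if size else None,
    )


def gpu_expected(filename):
    """True if the notes' rules suggest the model will run on the GPU."""
    quant, size_b = parse_model_name(filename)
    if quant not in GPU_QUANTS:
        return False
    # An unknown size is not treated as a failure; only a known oversize is.
    return size_b is None or size_b <= MAX_GPU_PARAMS_B


if __name__ == "__main__":
    for name in ("mistral-7b-openorca.Q4_0.gguf",
                 "mistral-7b-openorca.Q8_0.gguf",
                 "llama-2-13b.Q4_0.gguf"):
        print(name, gpu_expected(name))
```

If a model silently falls back to the CPU, checking its filename against these two rules is a reasonable first step.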

Observations

Mistral OpenOrca on the GPU reached 44 tokens/second.
Attempts with Q8 models fell back to CPU use.
Successful GPU use was confirmed only with Q4_0 models, despite the README's claim of Q6 support.
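
To put the 44 tokens/second figure in perspective, the arithmetic below estimates how long a reply takes on GPU versus CPU. It assumes the "five times faster" figure from Nomic AI's demonstration also applies to this setup, which the notes do not confirm, and the 500-token reply length is an arbitrary example.

```python
GPU_TOKENS_PER_SECOND = 44.0  # observed with Mistral OpenOrca on the GPU
ASSUMED_SPEEDUP = 5.0         # assumption: borrowed from Nomic AI's demo figure


def seconds_for(tokens, tokens_per_second):
    """Time in seconds to generate a given number of tokens at a steady rate."""
    return tokens / tokens_per_second


# Implied CPU rate if the GPU is exactly five times faster.
cpu_tokens_per_second = GPU_TOKENS_PER_SECOND / ASSUMED_SPEEDUP

if __name__ == "__main__":
    reply_tokens = 500  # arbitrary example length
    gpu_s = seconds_for(reply_tokens, GPU_TOKENS_PER_SECOND)
    cpu_s = seconds_for(reply_tokens, cpu_tokens_per_second)
    print(f"GPU: {gpu_s:.1f} s, CPU: {cpu_s:.1f} s")
```

Under these assumptions a 500-token reply takes roughly 11 seconds on the GPU and roughly 57 seconds on the CPU.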

Conclusion

In testing, only Q4_0 models ran on the GPU; support for larger models is expected in future updates.
Feedback encouraged through comments and likes on demonstration videos.

Additional Resources

Links available in video description for further guidance and documentation.
Explore additional documentation and models on the Nomic AI and Hugging Face platforms.

Full transcript