Run LLMs Locally using Ollama
A step-by-step guide to running large language models locally on your laptop.
Introduction
Since the release of ChatGPT, there has been a drastic rise in the popularity of large language models (LLMs). Most people interact with LLMs via externally hosted APIs; Ollama, by contrast, lets you host LLMs locally on your own laptop.
Ollama provides the ability to interact with open-source and customisable LLMs via a command line interface (CLI), REST API, or Jupyter Notebook. It is extremely simple to install and will have you interacting with local LLMs in a matter of minutes.
Installing Ollama
Ollama can be installed on macOS, Windows, and Linux. On Linux, it can be installed with the following command:
curl -fsSL https://ollama.com/install.sh | sh
Once installed, run the following command in your CLI:
ollama run <MODEL_NAME>
This will download your LLM of choice and initiate a conversation. The easiest approach to interacting with LLMs using Ollama is via the CLI.
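Beyond the interactive session, `ollama run` also accepts a prompt as an argument and prints a single reply before exiting, which makes it easy to script. As a minimal sketch (assuming the Ollama CLI is installed and the model has been pulled; the `ask` helper name is my own):

```python
import subprocess

def build_ollama_cmd(model: str, prompt: str) -> list[str]:
    """Build a one-shot `ollama run` command (no interactive session)."""
    return ["ollama", "run", model, prompt]

def ask(model: str, prompt: str) -> str:
    """Run the command and return the model's reply as text.

    Requires the Ollama CLI to be installed and the model pulled.
    """
    result = subprocess.run(
        build_ollama_cmd(model, prompt),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# Example (only works on a machine with Ollama installed):
# print(ask("llama2", "Why is the sky blue?"))
```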
What models are available?
Ollama supports most state-of-the-art (SOTA) open-source LLMs. As of March 2024, the full list of available models can be browsed in the Ollama model library.
Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
Model Customisation
All the models listed above can be customised by composing your own system prompt. For example, to customise the llama2 model, first run the following command:
ollama pull llama2
Once you have pulled the model, create a Modelfile consisting of your system prompt and other parameters:
FROM llama2
PARAMETER temperature 1
SYSTEM """
You are a Python programmer specialising in machine learning.
"""
The Modelfile above creates a customised llama2 model that acts as a machine learning specialist; the temperature parameter controls how creative the model's outputs are.
To create and run your new custom LLM, run the following command:
ollama create ml_spec -f ./Modelfile
ollama run ml_spec
>>> hello, what is your profession?
Hello! I am an ML specialist working with the programming language Python.
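If you find yourself generating several custom models, the Modelfile can also be composed programmatically. A small sketch (the `write_modelfile` helper is my own, not part of Ollama; it simply writes the same text shown above):

```python
from pathlib import Path

def write_modelfile(base: str, system_prompt: str, temperature: float,
                    path: str = "Modelfile") -> str:
    """Compose a Modelfile like the one above and write it to disk."""
    content = (
        f"FROM {base}\n"
        f"PARAMETER temperature {temperature}\n"
        f'SYSTEM """\n{system_prompt}\n"""\n'
    )
    Path(path).write_text(content)
    return content

text = write_modelfile(
    "llama2",
    "You are a Python programmer specialising in machine learning.",
    1,
)
# Then create and run the model as before:
#   ollama create ml_spec -f ./Modelfile
#   ollama run ml_spec
```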
Running LLMs Outside the CLI
Now that we have our custom LLM, we are not limited to the CLI. Thanks to LangChain, Ollama models can be used from Jupyter Notebooks, and Ollama also exposes its own REST API.
REST API
Ollama’s REST API allows you to run and manage your local LLMs. To generate a response from your model, run the following command:
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt":"Why is the sky blue?"
}'
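Note that by default the generate endpoint streams its reply as newline-delimited JSON objects, each carrying a `response` fragment, with `"done": true` on the final object. A sketch of calling it from Python with only the standard library (assumes an Ollama server is running on its default port 11434; the helper names are my own):

```python
import json
import urllib.request

def parse_generate_stream(lines):
    """Join the incremental "response" fields from Ollama's
    newline-delimited JSON stream into the full reply."""
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

def generate(model, prompt, url="http://localhost:11434/api/generate"):
    """POST a prompt to a locally running Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt}).encode()
    req = urllib.request.Request(url, data=body)
    with urllib.request.urlopen(req) as resp:
        return parse_generate_stream(resp)

# Example (requires a running Ollama server):
# print(generate("llama2", "Why is the sky blue?"))
```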
If you want to interact with the LLM in a conversational style, run the following command:
curl http://localhost:11434/api/chat -d '{
"model": "mistral",
"messages": [
{ "role": "user", "content": "why is the sky blue?" }
]
}'
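The chat endpoint accepts a list of role-tagged messages, and setting `"stream": false` in the request body asks the server to return the whole reply as a single JSON object whose text lives under `message.content`. A minimal sketch of building the request and reading the reply (the helper names are my own):

```python
import json

def build_chat_request(model, messages, stream=False):
    """Build the JSON body for Ollama's /api/chat endpoint.

    stream=False requests a single JSON object rather than a
    line-per-token stream.
    """
    return json.dumps({"model": model, "messages": messages, "stream": stream})

def reply_text(response_body: str) -> str:
    """Pull the assistant's text out of a non-streaming /api/chat response."""
    return json.loads(response_body)["message"]["content"]

body = build_chat_request(
    "mistral",
    [{"role": "user", "content": "why is the sky blue?"}],
)
```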
For more information, see the API documentation here.
Jupyter Notebook
One of the most common approaches when experimenting with Python is via Jupyter Notebooks.
To run your models using Jupyter you’ll need to leverage LangChain:
from langchain.llms import Ollama
By using LangChain, it is now really easy to call your local LLM:
ollama = Ollama(base_url="http://localhost:11434", model="llama2")
TEXT_PROMPT = "Why is the sky blue?"
print(ollama(TEXT_PROMPT))
>> The sky appears blue due to a phenomenon called Rayleigh Scattering.
Conclusion
Ollama is a great way to work with LLMs locally. Not only is it extremely simple to set up, but the combination of Ollama and LangChain lets users work with their custom LLMs directly in Jupyter Notebooks.
As new open-source LLMs are released, Ollama will make these available with only a single command. As long as you have enough compute on your laptop, you will encounter no issues when working with new SOTA open-source LLMs.
For more information on Ollama, visit their GitHub page here.