How to quickly set up an Ollama load balancer:
Example: 4 GPUs, where one model fits exactly into one GPU.
For other setups, adjust `CUDA_VISIBLE_DEVICES` and the number of instances accordingly.

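- install nginx
- set nginx up to load-balance one local port across several other local ports (see the config below)
- copy the ollama systemd service file once per port and set the relevant environment variables (`OLLAMA_HOST`, `CUDA_VISIBLE_DEVICES`)
- run `systemctl daemon-reload`, then start the ollama services
- reload nginx with `systemctl reload nginx`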
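
If the number of instances differs from the example, the list of upstream servers can be generated instead of typed by hand. This is only a minimal sketch: the function name `render_upstream` is hypothetical, and it assumes the instances listen on consecutive ports starting at 11434, as in the config below.

```bash
#!/usr/bin/env bash
# Sketch: print an nginx upstream block for N Ollama instances.
# Assumption: instances use consecutive ports starting at base_port.
# render_upstream is an illustrative helper, not part of nginx or ollama.
render_upstream() {
    local count="$1"
    local base_port="${2:-11434}"
    local i
    echo "upstream myapp1 {"
    for ((i = 0; i < count; i++)); do
        # One "server" line per backend instance.
        printf '\tserver localhost:%d;\n' "$((base_port + i))"
    done
    echo "}"
}

# Example: upstream block for 4 instances on ports 11434 to 11437.
render_upstream 4
```

The output can be pasted over the `upstream` block in the nginx site config before reloading nginx.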

```bash
$ cat /etc/nginx/sites-enabled/default 
upstream myapp1 {
	server localhost:11434;
	server localhost:11435;
	server localhost:11436;
	server localhost:11437;
}
server {
	listen 5051 default_server;
	listen [::]:5051 default_server;

	location / {
		proxy_set_header Host $host;
		proxy_set_header X-Real-IP $remote_addr;

		proxy_pass http://myapp1;
	}
}
$ cat /etc/systemd/system/ollama2.service 
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/home/ubuntu/.local/bin:/home/ubuntu/miniforge3/bin:/home/ubuntu/miniforge3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
Environment="OLLAMA_HOST=127.0.0.1:11435"
Environment="CUDA_VISIBLE_DEVICES=1"

[Install]
WantedBy=default.target
```
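
Rather than copying the service file by hand for each GPU, the per-instance units could be rendered from a template. The following is a sketch under assumptions, not the setup used above: the helper name `render_unit` and the `ollama-gpu<N>.service` file names are hypothetical, GPU `N` is mapped to port `11434 + N` (matching the example, where GPU 1 listens on 11435), and the `PATH` line from the original unit is left out and would need to be added back if your installation relies on it.

```bash
#!/usr/bin/env bash
# Sketch: print a systemd unit for one Ollama instance pinned to one GPU.
# Assumptions: render_unit and the ollama-gpu<N>.service naming are
# illustrative; GPU index N is mapped to port BASE_PORT + N.

BASE_PORT=11434   # first upstream port in the nginx config

render_unit() {
    local gpu="$1"
    local port=$((BASE_PORT + gpu))
    cat <<EOF
[Unit]
Description=Ollama Service (GPU ${gpu})
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="OLLAMA_HOST=127.0.0.1:${port}"
Environment="CUDA_VISIBLE_DEVICES=${gpu}"

[Install]
WantedBy=default.target
EOF
}

# Usage (needs root): write one unit per GPU, then let systemd pick them up.
#   for gpu in 0 1 2 3; do
#       render_unit "$gpu" | sudo tee "/etc/systemd/system/ollama-gpu${gpu}.service" >/dev/null
#   done
#   sudo systemctl daemon-reload
```

Keeping the port and GPU index derived from one number means each nginx upstream entry corresponds to exactly one GPU.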
