Using Ollama from Python

[2025-04-18] Material on using Ollama that is not specific to Python has been moved to the page OllamaでローカルLLM ("Local LLMs with Ollama").

Ollama is easiest to use from the command line, but it can also be used from Python. First install the ollama-python package with pip install ollama.

Example usage:

from ollama import chat

response = chat(
    model="qwen3:32b-q8_0",
    messages=[{ 'role': 'user', 'content': 'こんにちは' }],
    think=True,  # for reasoning models
    options={ "temperature": 0, "num_ctx": 512 }
)

print('Thinking:\n========\n\n' + (response.message.thinking or ''))  # thinking is None if the model returned no reasoning
print('\nResponse:\n========\n\n' + response.message.content)

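A simple program example (ask for a random string twice at temperature 0 and compare the two outputs; replace the model name with one you have installed):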

from ollama import chat

def ai(prompt):
    response = chat(
        model="hf.co/unsloth/gemma-3-27b-it-GGUF:Q8_0",
        messages=[{ 'role': 'user', 'content': prompt }],
        options={ "temperature": 0, "num_ctx": 512 }
    )
    return response['message']['content']

ans1 = ai("Please output as many random tokens as possible.")
ans2 = ai("Please output as many random tokens as possible.")
print(ans1 == ans2)

The two outputs are exactly the same (→ ランダムな文字列をLLMに出力させる, "Having an LLM output random strings").
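The comparison above uses only two runs. As a minimal sketch, the same check can be repeated several times and the distinct outputs counted. The names count_distinct_outputs, is_deterministic, and fake_ai are hypothetical and not part of ollama-python. fake_ai stands in for the ai() function above so that the sketch runs without an Ollama server.

```python
from collections import Counter
from typing import Callable


def count_distinct_outputs(generate: Callable[[str], str], prompt: str, runs: int = 5) -> Counter:
    """Call generate(prompt) `runs` times and count each distinct output."""
    return Counter(generate(prompt) for _ in range(runs))


def is_deterministic(generate: Callable[[str], str], prompt: str, runs: int = 5) -> bool:
    """True if every run produced exactly the same string."""
    return len(count_distinct_outputs(generate, prompt, runs)) == 1


# Hypothetical stand-in for ai(); replace it with the real ai() to test a model.
def fake_ai(prompt: str) -> str:
    return prompt.upper()


print(is_deterministic(fake_ai, "Please output as many random tokens as possible."))
```

Passing the real ai() in place of fake_ai repeats the experiment above with more runs.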

A simple program example (classify texts stored one per line in a file):

from ollama import chat

def ai(prompt):
    response = chat(
        model="hf.co/unsloth/gemma-3-27b-it-GGUF:Q8_0",
        messages=[{ 'role': 'user', 'content': prompt }],
        options={ "temperature": 0, "num_ctx": 2048 }
    )
    return response['message']['content']

with open("list.txt") as f:
    lines = f.readlines()

for line in lines:
    prompt = f"Classify the following text as either ○○○ or ×××. Answer 'A' if it is ○○○ or 'B' if it is ×××.\n\n{line}"
    print(ai(prompt).strip())

As shown here, when classifying a large number of texts, do not pack them all into a single prompt; loop over them instead.
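One variation on this loop, sketched under assumptions, is to check each answer against the expected labels, since a model may reply with something other than "A" or "B". The names classify_lines, VALID_LABELS, TEMPLATE, and fake_ask are hypothetical. fake_ask replaces the ai() function above so that the sketch runs without an Ollama server.

```python
from typing import Callable, Iterable

VALID_LABELS = ("A", "B")  # the two answers the prompt asks for


def classify_lines(lines: Iterable[str],
                   ask: Callable[[str], str],
                   template: str) -> list[tuple[str, str]]:
    """Send one prompt per non-blank line; return (text, label) pairs.

    Answers other than "A" or "B" are recorded as "?" for manual review.
    """
    results = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines instead of sending empty prompts
        answer = ask(template.format(text=text)).strip()
        label = answer if answer in VALID_LABELS else "?"
        results.append((text, label))
    return results


# Hypothetical stand-in for ai(); a real run would pass ai instead.
def fake_ask(prompt: str) -> str:
    return "A" if "refund" in prompt else "B"


TEMPLATE = "Classify the following text. Answer 'A' or 'B'.\n\n{text}"
print(classify_lines(["I want a refund\n", "\n", "Great product\n"], fake_ask, TEMPLATE))
```

Each text still gets its own request, as recommended above; only the handling of the answers is added.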

A more advanced program example (uses streaming and keeps the conversation going):

from ollama import chat

class Ollama:
    def __init__(self,
                 model="hf.co/unsloth/gemma-3-27b-it-GGUF:Q8_0",
                 messages=None, temperature=1, num_ctx=131072):
        self.model = model
        self.messages = messages or []
        self.temperature = temperature
        self.num_ctx = num_ctx

    def chat(self, prompt, num_ctx=None, temperature=None):
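        # Drop a trailing user message that never received a reply,
        # so that user and assistant turns keep alternating.
        if len(self.messages) > 0 and self.messages[-1]["role"] == "user":
            self.messages.pop()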
        self.messages.append({"role": "user", "content": prompt.strip()})
        temperature = self.temperature if temperature is None else temperature
        num_ctx = self.num_ctx if num_ctx is None else num_ctx
        stream = chat(
            model=self.model,
            messages=self.messages,
            stream=True,
            options={
                "temperature": temperature,
                "num_ctx": num_ctx
            }
        )
        ans = ""
        for chunk in stream:
            content = chunk['message']['content']
            print(content, end="")
            ans += content
        self.messages.append({"role": "assistant", "content": ans})

    def get_messages(self):
        return self.messages

ai = Ollama(temperature=1)

ai.chat('''
First prompt...
''')

ai.chat('''
Next prompt...
''')

From Ollama 0.7.0 onward, a message can also include images:

   {"role": "user", "content": "...", "images": ["./1.jpg", "./2.png"]}
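A minimal sketch of building such a message in Python follows. The helper name image_message is hypothetical, and the check that each file exists is an added assumption, intended only to fail early with a clear message. The resulting dict would go into the messages list passed to chat.

```python
from pathlib import Path


def image_message(prompt: str, image_paths: list[str]) -> dict:
    """Build a user message that carries image file paths.

    Raises FileNotFoundError if any of the paths is not an existing file.
    """
    missing = [p for p in image_paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"image file(s) not found: {missing}")
    return {"role": "user", "content": prompt, "images": list(image_paths)}


# Usage (assumes the files exist and the model accepts images):
#   messages = [image_message("Describe these pictures.", ["./1.jpg", "./2.png"])]
#   response = chat(model="...", messages=messages)
```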

The default context length can also be set with an environment variable.

export OLLAMA_CONTEXT_LENGTH=131072

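After changing the environment variable, quit the Ollama process and restart it from the shell (on a Mac, open -a Ollama). Alternatively, on a Mac, a command such as

/bin/launchctl setenv OLLAMA_CONTEXT_LENGTH 131072

also sets the environment variable. With this method, the variable is passed on even when Ollama is launched with the mouse from the Finder rather than from the shell.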
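If a Python script should use the same value for num_ctx, one option is to read the variable on the client side too. This is only a sketch: context_length_from_env and the fallback DEFAULT_NUM_CTX are hypothetical, and the value read here reflects the environment of the Python process, which may differ from that of the Ollama server.

```python
import os

DEFAULT_NUM_CTX = 4096  # hypothetical fallback, not a value taken from Ollama


def context_length_from_env(env=None, default: int = DEFAULT_NUM_CTX) -> int:
    """Return OLLAMA_CONTEXT_LENGTH as a positive int, or `default` if unset or invalid."""
    env = os.environ if env is None else env
    raw = env.get("OLLAMA_CONTEXT_LENGTH", "")
    try:
        value = int(raw)
    except ValueError:
        return default  # unset, empty, or not a number
    return value if value > 0 else default


# Usage: options={"temperature": 0, "num_ctx": context_length_from_env()}
print(context_length_from_env({"OLLAMA_CONTEXT_LENGTH": "131072"}))
```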

How to use Ollama from Emacs is described here and here.