Getting an LLM to Output Random Strings

An LLM such as GPT-4o does not necessarily produce the same output twice, even with an identical prompt and the temperature set to 0.

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    max_output_tokens=512,
    temperature=0,
    input="Please output as many random tokens as possible."
)

print(response.output_text)

First run:

Sure! Here are some random tokens:

- Quokka
- Zephyr
- Luminescent
... (rest omitted) ...

Second run:

Sure! Here are some random tokens:

- Z3x9Q
- b7LkP
- mN8vR
... (rest omitted) ...

Only the introductory first line is the same. Everything after it differs.

By contrast, when I ran the local LLM Gemma 3 through Ollama on a Mac Studio (see "Local LLMs with Ollama"), the outputs were exactly the same.

from ollama import chat

def ai(prompt):
    response = chat(
        model="hf.co/unsloth/gemma-3-27b-it-GGUF:Q8_0",
        messages=[{ 'role': 'user', 'content': prompt }],
        options={ "temperature": 0, "num_ctx": 512 }
    )
    return response['message']['content']

ans1 = ai("Please output as many random tokens as possible.")
ans2 = ai("Please output as many random tokens as possible.")
print(ans1 == ans2)

I tried this three times, and each time all 2545 characters of the output matched.
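To repeat the comparison more than twice, a small helper can tally how many distinct outputs come back. This is only a sketch: `count_distinct_outputs` and its `generate` parameter are hypothetical names introduced here, and `generate` is assumed to be any function that takes a prompt and returns a string, such as the `ai` function above.

```python
from collections import Counter
from typing import Callable


def count_distinct_outputs(
    generate: Callable[[str], str], prompt: str, runs: int = 3
) -> Counter:
    """Call generate(prompt) `runs` times and tally each distinct output.

    `generate` is an assumed stand-in for a model call such as ai(prompt).
    """
    return Counter(generate(prompt) for _ in range(runs))


# Stand-in generator so the sketch runs without a model.
def fake_ai(prompt: str) -> str:
    return prompt.upper()


tally = count_distinct_outputs(fake_ai, "hello", runs=3)
# One distinct key means every run produced the same text.
print(len(tally))
```

Passing `ai` instead of `fake_ai` would run the real model. A tally with a single key corresponds to the "all runs matched" result reported here.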

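I also tried setting the context length num_ctx to 2048. In two runs, all 106610 characters of the output matched. In one run, only the opening "Okay, here's a " matched, which may be a bug of some kind. The outputs that did match agreed with the num_ctx 512 output for the first 2345 characters.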
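Statements such as "matched for the first 2345 characters" can be checked by measuring the length of the common prefix of two outputs. The following is a minimal sketch, not necessarily how the numbers here were obtained. `common_prefix_len` is a hypothetical helper name.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Return the number of leading characters that a and b share."""
    n = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        n += 1
    return n


# Two made-up strings that diverge right after "Okay, here's a ".
print(common_prefix_len("Okay, here's a list", "Okay, here's a set"))
```

When the result equals the length of both strings, the outputs are identical, which is the same check as `ans1 == ans2`.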

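One theory for why GPT-4-class models are not reproducible even at temperature 0 is that parallel processing on GPUs changes the order of computation, and the rounding error changes with it. At least on the Mac Studio (and only for the 8-bit quantized Gemma 3 27B), that does not seem to happen.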
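The premise behind that theory is that floating-point addition is not associative, so summing the same numbers in a different order can give a slightly different result. The sketch below only illustrates that arithmetic fact in plain Python. It does not show that this is what actually happens inside GPT-4o.

```python
# The same three numbers, added in two different orders.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)

print(left_first)                 # 0.6000000000000001
print(right_first)                # 0.6
print(left_first == right_first)  # False
```

If a tiny difference like this nudges two nearly tied logits, greedy decoding can pick a different token, and every token after that may diverge.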

Another theory is that MoE is the cause: Non-determinism in GPT-4 is caused by Sparse MoE