Brunner-Munzel検定

はじめに

ノンパラメトリックなMann-WhitneyのU検定は、パラメトリックな場合の（等分散を仮定する）$t$ 検定に相当するもので、二つの確率変数が同じ分布に従うという帰無仮説を検定するものでした。一方、Brunner（ブルンナー）とMunzel（ムンツェル）によるBrunner-Munzel検定（Brunner-Munzel test）は、同じ分布という仮定をせずに

\[ P(X < Y) + \frac{1}{2}P(X = Y) = 1/2 \]

を検定するもので、ざっくり言えば、パラメトリックな場合のWelchの検定（等分散を仮定しない $t$ 検定）に相当します。

上の式の左辺は（不等号の向きが逆ですが）Mann-WhitneyのU検定の $U$ をXとYの個数の積で割ったものと実質同じものですが、こちらはXとYの分布が同じことを仮定しません。

分布が同じと仮定できない限り、Mann-WhitneyのU検定ではなく、Brunner-Munzel検定を使うべきです。

Brunner-Munzelの $p$ 値の並べ替え検定による定義は次の通りです。まず (x[i] < y[j]) + 0.5 * (x[i] == y[j]) を $h(x_i,y_j)$ とすると、最初の式の左辺に相当する量は $\hat{p} = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} h(x_i,y_j)$ です。それぞれの $x_i$ について $q_i = \frac{1}{m} \sum_{j=1}^{m} h(x_i,y_j)$ を求めます。また、それぞれの $y_j$ について $r_j = \frac{1}{n} \sum_{i=1}^{n} h(x_i,y_j)$ を求めます。$q_i$ と $r_j$ の標本分散をそれぞれ $s_x^2$、$s_y^2$ とします。そして、$T = (\hat{p} - \frac{1}{2}) / \sqrt{s_x^2/n + s_y^2/m}$ と定義します。この $T$ を最初のX、Yについて求めます。XとYを合わせた集合をランダムに元と同じ大きさに2分割したものについて求めた $T$ が最初の $T$ と等しいかまたはより極端な値になる確率が、求める $p$ 値です。

Pythonでの計算

Pythonには SciPy パッケージに scipy.stats.brunnermunzel() があります。t検定で使った Lumley (1996) のデータで試してみましょう。

from scipy import stats

x = [1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 1, 1]
y = [3, 3, 4, 3, 1, 2, 3, 1, 1, 5, 4]
stats.brunnermunzel(x, y)

結果は次のようになりました：

BrunnerMunzelResult(statistic=np.float64(3.1374674823029505), pvalue=np.float64(0.005786208666151469))

つまり $p = 0.006$ 程度です。

効果量 $P(X < Y) + \frac{1}{2}P(X = Y)$ の推定値と95%信頼区間も求めておきます：

import numpy as np

m, n = len(x), len(y)
r = stats.rankdata(np.concatenate([x, y]))
rx, ry = r[:m], r[m:]
p_hat = (ry.mean() - (n + 1) / 2) / m
v1 = np.var(rx - stats.rankdata(x), ddof=1)
v2 = np.var(ry - stats.rankdata(y), ddof=1)
se = np.sqrt(v1 / (m * n**2) + v2 / (n * m**2))
df = (m * v1 + n * v2) ** 2 / ((m * v1) ** 2 / (m - 1) + (n * v2) ** 2 / (n - 1))
ci = p_hat + np.array([-1, 1]) * stats.t.ppf(0.975, df) * se
print(f"P(X<Y) + 0.5P(X=Y) = {p_hat:.3f}")
print(f"95% confidence interval: [{ci[0]:.3f}, {ci[1]:.3f}]")

P(X<Y) + 0.5P(X=Y) = 0.789
95% confidence interval: [0.595, 0.983]

並べ替えBrunner-Munzel検定

サンプルが小さい場合には，並べ替え検定を使えば，厳密な計算ができます。

Pythonでは permutations-stats パッケージが使えます。あらかじめ pip install permutations-stats しておきます。パッケージのGitHubリポジトリはここにあります。純Pythonで書かれていながら、JITコンパイラ numba を使っているので高速です。

さきほどと同じデータでやってみましょう。

from permutations_stats.permutations import permutation_test

x = [1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 1, 1]
y = [3, 3, 4, 3, 1, 2, 3, 1, 1, 5, 4]

permutation_test(x, y, test="brunner_munzel")

結果は次のようになりました。

PermutationsResults(statistic=3.1374674823029505, pvalue=0.008037645264055279, permutations=4457400, test='brunner_munzel', alternative='TWO_SIDED', method='exact')

つまり $p = 0.008$ 程度です。

この permutation_test の仕様をまとめておきます。

permutation_test(
    x,
    y,
    test="brunner_munzel",
    stat_func_dict=None,
    alternative="two-sided",
    method="exact",
    n_iter=1e4,
    force_simulations=False,
    seed=None,
)

test="mann_whitney" で厳密なMann-WhitneyのU検定も行えます。

alternative は "greater" や "less" も指定できます。

method は "simulation" とすると n_iter 回のシミュレーションをします。ただし、シミュレーション数が全組合せ数以上になれば、デフォルトでは exact な計算になりますが、force_simulations=True でシミュレーション強制になります。

Friedman検定やWilcoxonの符号つき順位検定のための repeated_permutation_test という関数も用意されています：

repeated_permutation_test(
    x,
    test="friedman",
    stat_func_dict=None,
    alternative="two-sided",
    method="exact",
    n_iter=1e4,
    force_simulations=False,
    seed=None,
)

x は2次元NumPy配列で、各列が処理・条件、各行が測定（被験者）です。

test は "friedman"、"friedman_t2"、"wilcoxon" が指定できます。それぞれFriedman検定、タイがある場合にConoverのT2統計量を使うFriedman検定、Wilcoxonの符号つき順位検定です。

Codexで作った並べ替え検定コード

brunner_munzel_permutation.py はCodex（GPT-5.5）に作ってもらったものです。

from brunner_munzel_permutation import bm_permutation_test

x = [1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 1, 1]
y = [3, 3, 4, 3, 1, 2, 3, 1, 1, 5, 4]

bm_permutation_test(x, y)

BMResult(effect=0.788961038961039, standard_error=0.09210009046816862, degrees_freedom=17.682841979481548, p_value=0.008037645264055279, asymptotic_ci_low=0.5952168642537363, asymptotic_ci_high=0.9827052136683416, permutation_ci_low=0.5886308159550016, permutation_ci_high=0.9892912619670764, ci_level=0.95, exact=True, permutations=160, total_permutations=4457400, method='count', count_allocations=160)

permutation_ci_low、permutation_ci_high はstudentized permutation分位点による信頼区間、asymptotic_ci_low、asymptotic_ci_high は $t$ 分布近似による信頼区間です。