Pythonからrpy2でRを使う

Python から R を使うには、古くは PypeR（パイパー）というライブラリがありましたが、開発が止まっているようです。ここでは rpy2 を使う方法を説明します。rpy2 は pip install rpy2 でインストールできます。詳しい使い方はドキュメントを見ればいいのですが、以下に簡単な使い方をまとめておきます。

from rpy2 import robjects  # Rを起動

print(robjects.r("x = 123"))

[1] 123

print(robjects.r("x * 2"))

[1] 246

複数行からなるプログラムは次のように打ち込みます：

print(robjects.r('''
x = 123
x * 2
'''))

[1] 246

Python で print() する前のオブジェクトを保存しておくと便利です：

res = robjects.r("binom.test(2, 10)")
print(res)


	Exact binomial test

data:  2 and 10
number of successes = 2, number of trials = 10, p-value = 0.1094
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.02521073 0.55609546
sample estimates:
probability of success 
                   0.2

個々の正確な値は res の要素として取り出せます：

res[2][0]

0.10937499999999994

res[3][0], res[3][1]

(0.025210726326833365, 0.5560954623076414)

R の関数オブジェクトを Python 側に持ってくることもできます：

rbinomtest = robjects.r["binom.test"]
print(rbinomtest(2, 10))


	Exact binomial test

data:  2L and 10L
number of successes = 2, number of trials = 10, p-value = 0.1094
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.02521073 0.55609546
sample estimates:
probability of success 
                   0.2

行列を R に送る例です：

v = robjects.IntVector([3, 1, 2, 4])  # FloatVectorでも可
m = robjects.r["matrix"](v, nrow=2)
robjects.globalenv["m"] = m
res = robjects.r("fisher.test(m)")
# 前2行は res = robjects.r["fisher.test"](m) でもよい
print(res)


	Fisher's Exact Test for Count Data

data:  m
p-value = 0.5238
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
   0.218046 390.562917
sample estimates:
odds ratio 
  4.918388

Python と R で結果を比較してみます：

import numpy as np
from scipy import stats
from rpy2 import robjects

rng = np.random.default_rng()
fisher_test = robjects.r['fisher.test']

x = 0
for i in range(10000):
    a = rng.integers(0, 30, 4)
    m = a.reshape(2, 2)
    p_scipy = stats.fisher_exact(m).pvalue  # scipyによるp値
    m_r = robjects.r.matrix(robjects.IntVector(a), nrow=2)
    res = fisher_test(m_r)
    p_r = res[0][0]  # Rによるp値
    e = abs(p_scipy - p_r)
    if e > x:
        print(e, p_scipy, p_r, a)
        x = e

同様なことをBarnard検定でもやってみます。Rの exact.test() に to.plot=FALSE というオプションも与えたいのですが、Pythonからは to_plot=False のように指定します：

robjects.r("library(Exact)")
exact_test = robjects.r['exact.test']

def compare():
    a = rng.integers(1, 30, 4)
    m = a.reshape(2, 2)
    p_scipy = stats.barnard_exact(m).pvalue
    m_r = robjects.r.matrix(robjects.IntVector(a), nrow=2)
    res = exact_test(m_r, to_plot=False)
    p_r = res[2][0]
    return p_scipy - p_r

r = [compare() for _ in range(1000)]

r の分布を調べてみてください（このスレッド、このissue参照）。

奥村晴彦

Last modified: 2023-10-30 12:57:20 JST