COVID-19 の番外編。
WHO の日報 Coronavirus disease (COVID-2019) situation reports から最新のものを読んで,感染者数・死者数をプロットする。
あらかじめ poppler と pdftotext をインストールしておく。Homebrew を使った Mac なら
brew install pkg-config poppler
brew link --overwrite poppler
pip install pdftotext
でよいはず。
Homebrew の python3 ならできたが,Python.org の Python 3.8.3 にしたらできなくなった。次のようにしたらできた:
brew install little-cms2 # すでに入っていた
ln -s /usr/local/Cellar/little-cms2/2.9 /usr/local/opt/little-cms2
sudo /Applications/Python\ 3.8/Install\ Certificates.command
import matplotlib.pyplot as plt
import pandas as pd
import re
import numpy as np
import pdftotext
import requests
import urllib
まず最新のPDFのURLを見つける:
url = 'https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/'
r = requests.get(url)
a = re.findall(' href="(.*?\.pdf)', r.text)
if a[0][0] == '/':
url = 'https://www.who.int' + a[0]
elif a[0][0:8] == 'https://':
url = a[0]
else:
url = url + a[0]
国名が長い場合に短くする辞書を作っておく:
dic = {
'United States of America': 'US',
'Republic of Korea': 'Korea',
'The United Kingdom': 'UK',
'United Arab Emirates': 'Arab',
'occupied Palestinian territory': 'Palestine',
'Bosnia and Herzegovina': 'Bosnia',
'Russian Federation': 'Russia',
'Republic of Moldova': 'Moldova',
'Iran (Islamic Republic of)': 'Iran'
}
PDFを読んで表部分をスクレイプする:
countries = []
confirmed = []
death = []
with urllib.request.urlopen(url) as f:
for line in "".join(pdftotext.PDF(f)).split("\n"):
line = line.strip()
m = re.search(r'^([^\d]+)?(\d+) +(-?\d+) +(\d+) +(-?\d+) +([A-Za-z ]+)(\d+)$', line)
if m and m[1].strip() != 'Grand total':
print(line)
country = m[1].strip()
if country in dic:
country = dic[country]
countries.append(country)
confirmed.append(int(m[2]))
death.append(int(m[4]))
countries = np.array(countries)
confirmed = np.array(confirmed)
death = np.array(death)
プロットする:
plt.figure(figsize=[6.4, 6.4])
bottom = 100
plt.plot(confirmed[death >= bottom], death[death >= bottom], 'o')
plt.xscale('log')
plt.yscale('log')
plt.axis('equal')
for x in zip(confirmed[death >= bottom], death[death >= bottom], countries[death >= bottom]):
plt.text(x[0], x[1], x[2], horizontalalignment='left', verticalalignment='bottom')
plt.xlabel('Confirmed')
plt.ylabel('Deaths')
中国と日本の履歴をプロット:
plt.autoscale(False)
df1 = pd.read_csv("../data/COVID-19.csv",
index_col='Date', parse_dates=['Date'])
plt.plot(df1['China Confirmed'], df1['China Deaths'], 'x-')
df2 = pd.read_csv("../data/COVID-jp.csv",
index_col='Date', parse_dates=['Date'])
plt.plot(df2['Confirmed'], df2['Deaths'], 'x-')
x = np.array([min(confirmed[death >= bottom]), max(confirmed[death >= bottom])])
y = x * df1['China Deaths'][-1] / df1['China Confirmed'][-1]
plt.plot(x, y, color='lightgray',
label=f"Deaths / Confirmed = {df1['China Deaths'][-1] / df1['China Confirmed'][-1]:.4f}")
plt.legend(loc='upper left')
plt.savefig('../img/COVID-world.svg', bbox_inches="tight")
全部の国の推移を見たいのだが,いちいち上のようにして WHO のサイトの PDF から抜き出すのはたいへんなので,Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE という有名なサイトのデータレポジトリ 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE から時系列データ csse_covid_19_time_series/time_series_covid19_confirmed_global.csv をいただいてきて,そのまま描く:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import numpy as np
from dateutil.parser import parse
url = 'https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
df = pd.read_csv(url)
t = [parse(i) for i in df.columns[4:]]
x = [df.groupby('Country/Region')[i].sum() for i in df.columns[4:]]
locator = mdates.AutoDateLocator()
formatter = mdates.ConciseDateFormatter(locator)
fig, ax = plt.subplots(figsize=[7, 7])
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)
ax.plot(t, x)
ax.set_yscale('log')
for i in x[-1].index:
if x[-1][i] > 0:
ax.text(t[-1], x[-1][i], i)
japan = [x[i]['Japan'] for i in range(len(x))]
ax.plot(t, japan, 'o-k', label='Japan')
fig.savefig('../img/COVID-csse.svg', bbox_inches="tight")
黒で黒丸マーカーを付けたものが日本である。データは WHO のものと微妙に異なり,日本については累積確認数の減少が2箇所ある。
Last modified: