COVID-19: Extra Edition 1

A side note to the COVID-19 page.

This page reads the latest of WHO's daily Coronavirus disease (COVID-2019) situation reports and plots the numbers of confirmed cases and deaths.

Install poppler and pdftotext beforehand. On a Mac with Homebrew,

brew install pkg-config poppler
brew link --overwrite poppler
pip install pdftotext

should be enough.

This worked with Homebrew's python3 but stopped working after switching to Python 3.8.3 from Python.org. The following got it working again:

brew install little-cms2  # was already installed
ln -s /usr/local/Cellar/little-cms2/2.9 /usr/local/opt/little-cms2
sudo /Applications/Python\ 3.8/Install\ Certificates.command

The script starts with the imports:

import matplotlib.pyplot as plt
import pandas as pd
import re
import numpy as np
import pdftotext
import requests
import urllib.request  # urlopen lives in the urllib.request submodule

First, find the URL of the latest PDF:

url = 'https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/'
r = requests.get(url)
a = re.findall(r' href="(.*?\.pdf)', r.text)  # raw string so that \. is not treated as a string escape
if a[0][0] == '/':
    url = 'https://www.who.int' + a[0]
elif a[0][0:8] == 'https://':
    url = a[0]
else:
    url = url + a[0]
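
The three branches above (root-relative, absolute, and page-relative links) can also be handled by `urllib.parse.urljoin` from the standard library. The following is a minimal sketch, not the code used on this page; the helper name `first_pdf_url` and the sample HTML in the usage note are made up for illustration, and the pattern is tightened slightly so that a match cannot run across a closing quote.

```python
import re
from urllib.parse import urljoin

BASE = 'https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/'

def first_pdf_url(html, base=BASE):
    """Return the absolute URL of the first .pdf link in html, or None.

    Hypothetical helper for illustration only.
    """
    # [^"]* keeps the match inside a single href="..." attribute
    m = re.search(r' href="([^"]*?\.pdf)', html)
    if m is None:
        return None
    # urljoin resolves '/path', 'https://...', and 'relative.pdf' uniformly
    return urljoin(base, m[1])
```

For example, `first_pdf_url('<a href="/docs/x.pdf">')` resolves the link against the host of `BASE`, while a page without any PDF link gives `None` instead of raising an IndexError.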

Prepare a dictionary that shortens long country names:

dic = {
    'United States of America': 'US',
    'Republic of Korea': 'Korea',
    'The United Kingdom': 'UK',
    'United Arab Emirates': 'UAE',
    'occupied Palestinian territory': 'Palestine',
    'Bosnia and Herzegovina': 'Bosnia',
    'Russian Federation': 'Russia',
    'Republic of Moldova': 'Moldova',
    'Iran (Islamic Republic of)': 'Iran'
}
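
As a small illustration of how such a table is applied, `dict.get` with the original name as the default returns the short form when there is one and leaves other names untouched. This is only a sketch; `SHORT_NAMES` holds two entries copied from the dictionary above, and `shorten` is a hypothetical helper name.

```python
SHORT_NAMES = {
    'United States of America': 'US',
    'Republic of Korea': 'Korea',
}

def shorten(name, table=SHORT_NAMES):
    """Return the short label for name, or name itself if no entry exists."""
    return table.get(name, name)
```

Here `shorten('Republic of Korea')` gives `'Korea'`, and `shorten('Japan')` gives `'Japan'` unchanged.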

Read the PDF and scrape the table:

countries = []
confirmed = []
death = []

with urllib.request.urlopen(url) as f:
    for line in "".join(pdftotext.PDF(f)).split("\n"):
        line = line.strip()
        m = re.search(r'^([^\d]+)?(\d+) +(-?\d+) +(\d+) +(-?\d+) +([A-Za-z ]+)(\d+)$', line)
        # m[1] is None when the line starts with a digit, so test it before calling strip()
        if m and m[1] and m[1].strip() != 'Grand total':
            print(line)
            country = m[1].strip()
            if country in dic:
                country = dic[country]
            countries.append(country)
            confirmed.append(int(m[2]))
            death.append(int(m[4]))

countries = np.array(countries)
confirmed = np.array(confirmed)
death = np.array(death)
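
To see what the regular expression picks up, here is a minimal sketch that applies the same pattern to a single line of text. It assumes that a table row consists of a name, four numbers, a text label, and a final number, which is what the pattern expects; the helper name `parse_row` and the sample rows in the usage note are invented for illustration and are not real report data.

```python
import re

# Same pattern as in the loop above; \D is equivalent to [^\d]
ROW = re.compile(r'^(\D+)?(\d+) +(-?\d+) +(\d+) +(-?\d+) +([A-Za-z ]+)(\d+)$')

def parse_row(line):
    """Return (country, confirmed, deaths) for a table row, else None.

    Hypothetical helper for illustration only.
    """
    m = ROW.search(line.strip())
    if not m or not m[1]:
        return None  # not a table row, or no name in front of the numbers
    name = m[1].strip()
    if name == 'Grand total':
        return None  # skip the summary row
    # group 2 is the first number (confirmed), group 4 the third (deaths)
    return name, int(m[2]), int(m[4])
```

With the made-up row `'Exampleland   12345   67   890   -1   Community transmission   0'`, `parse_row` returns `('Exampleland', 12345, 890)`. Lines that are not table rows return `None`.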

Plot:

plt.figure(figsize=[6.4, 6.4])

bottom = 100

plt.plot(confirmed[death >= bottom], death[death >= bottom], 'o')
plt.xscale('log')
plt.yscale('log')
plt.axis('equal')

for x in zip(confirmed[death >= bottom], death[death >= bottom], countries[death >= bottom]):
    plt.text(x[0], x[1], x[2], horizontalalignment='left', verticalalignment='bottom')

plt.xlabel('Confirmed')
plt.ylabel('Deaths')
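
The condition `death >= bottom` is a NumPy boolean mask, and the same mask is applied to all three arrays so that they stay aligned. A minimal sketch of that selection, with a hypothetical helper name `select_at_least` and made-up numbers in the usage note:

```python
import numpy as np

def select_at_least(confirmed, death, countries, bottom=100):
    """Keep only the entries whose death count is at least bottom."""
    mask = death >= bottom  # boolean array, one element per country
    return confirmed[mask], death[mask], countries[mask]
```

For example, with `death = np.array([50, 100, 900])` only the second and third entries of each array are kept.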

Overlay the histories of China and Japan:

plt.autoscale(False)

df1 = pd.read_csv("../data/COVID-19.csv",
                  index_col='Date', parse_dates=['Date'])
plt.plot(df1['China Confirmed'], df1['China Deaths'], 'x-')

df2 = pd.read_csv("../data/COVID-jp.csv",
                  index_col='Date', parse_dates=['Date'])
plt.plot(df2['Confirmed'], df2['Deaths'], 'x-')

x = np.array([min(confirmed[death >= bottom]), max(confirmed[death >= bottom])])
# Latest deaths-to-confirmed ratio for China; .iloc[-1] selects the last row by position
ratio = df1['China Deaths'].iloc[-1] / df1['China Confirmed'].iloc[-1]
y = x * ratio
plt.plot(x, y, color='lightgray',
         label=f"Deaths / Confirmed = {ratio:.4f}")

plt.legend(loc='upper left')

plt.savefig('../img/COVID-world.svg', bbox_inches="tight")

[Figure: deaths versus confirmed cases by country on log-log axes, saved as COVID-world.svg]
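
The light gray reference line in this plot is a line of constant deaths-to-confirmed ratio. On log-log axes, y = ratio × x becomes log y = log x + log ratio, a straight line of slope 1 whose vertical offset is log ratio. The sketch below checks this numerically; the helper name `constant_ratio_line` and the ratio 0.05 are made up for illustration and are not values from the data.

```python
import numpy as np

def constant_ratio_line(x_min, x_max, ratio):
    """Return the two end points of the line y = ratio * x."""
    x = np.array([x_min, x_max], dtype=float)
    return x, x * ratio

x, y = constant_ratio_line(1_000, 100_000, 0.05)
# On log axes the vertical distance from the line y = x is the same at both ends
offsets = np.log10(y) - np.log10(x)
```

Both elements of `offsets` equal log10 of the ratio, so countries above the gray line have a higher deaths-to-confirmed ratio than the reference and countries below it have a lower one.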

To see the trend for every country, extracting the numbers from the PDFs on the WHO site as above each time is tedious. Instead, this takes the time-series file csse_covid_19_time_series/time_series_covid19_confirmed_global.csv from 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE, the data repository behind the well-known site Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE, and plots it as is:

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import numpy as np
from dateutil.parser import parse

url = 'https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'

df = pd.read_csv(url)

t = [parse(i) for i in df.columns[4:]]
x = [df.groupby('Country/Region')[i].sum() for i in df.columns[4:]]

locator = mdates.AutoDateLocator()
formatter = mdates.ConciseDateFormatter(locator)

fig, ax = plt.subplots(figsize=[7, 7])
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)
ax.plot(t, x)
ax.set_yscale('log')

for i in x[-1].index:
    if x[-1][i] > 0:
        ax.text(t[-1], x[-1][i], i)

japan = [x[i]['Japan'] for i in range(len(x))]
ax.plot(t, japan, 'o-k', label='Japan')

fig.savefig('../img/COVID-csse.svg', bbox_inches="tight")
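
The per-country totals above are built one date column at a time. As an alternative sketch, the same aggregation can be done in a single `groupby` over all date columns and then transposed into a table with dates as rows and countries as columns. The tiny DataFrame here is made up; it only imitates the layout assumed by the code above (four leading metadata columns followed by date columns named like 1/22/20).

```python
import pandas as pd

# Made-up miniature of the CSV layout, for illustration only
df = pd.DataFrame({
    'Province/State': ['A', 'B', None],
    'Country/Region': ['Xland', 'Xland', 'Yland'],
    'Lat': [0.0, 1.0, 2.0],
    'Long': [0.0, 1.0, 2.0],
    '1/22/20': [1, 2, 0],
    '1/23/20': [3, 4, 5],
})

date_cols = list(df.columns[4:])
# Sum provinces within each country, then put dates on the rows
by_country = df.groupby('Country/Region')[date_cols].sum().T
by_country.index = pd.to_datetime(by_country.index, format='%m/%d/%y')
```

With this layout, `by_country['Xland']` is the time series for one country, and `by_country.plot(logy=True)` would draw all countries at once.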

Japan is drawn in black with filled circle markers. The data differ slightly from WHO's; for Japan, the cumulative confirmed count decreases at two points.
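
A cumulative count should never go down, so decreases like these can be located by looking for negative day-to-day differences. The following is a minimal sketch with a made-up series, not the actual figures for Japan; `find_decreases` is a hypothetical helper name.

```python
import pandas as pd

def find_decreases(cumulative):
    """Return the entries where a cumulative series is lower than the previous one."""
    diff = cumulative.diff()  # first element is NaN and is never selected
    return diff[diff < 0]

# Made-up cumulative counts with two drops
s = pd.Series([10, 12, 11, 15, 15, 14],
              index=pd.date_range('2020-01-01', periods=6))
drops = find_decreases(s)
```

Here `drops` contains the two dates on which the made-up count fell, each with a difference of -1. Applied to a real per-country series, the same call would list the dates of any downward corrections.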

[Figure: cumulative confirmed cases by country on a log scale, saved as COVID-csse.svg]
