日本の超過死亡？

厚労省人口動態調査から，人口動態統計（確定数），人口動態統計月報（概数），人口動態統計速報がリンクされている。確定数は翌年9月ごろにならないと出ない。概数はほぼ5ヶ月後，速報はほぼ2ヶ月後に出る。

データはe-Statの人口動態調査で公開されている。ここから集めた確定数・概数・速報データを japandeaths.csv というファイルに収めた。確定数は上記ページから確定数→死亡→年次→2020年とたどって，5-4「死亡月別にみた年次別死亡数及び死亡率（人口千対）」を選ぶ。このCSVファイルは整然としていないのでデータの整然化の例のようにして整然化する。

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import datetime

df = pd.read_csv("https://okumuralab.org/~okumura/python/data/japandeaths.csv")
t = [datetime.datetime(int(x["年"]), int(x["月"]), 1) for i, x in df.iterrows()]

def days(year, month=None):
    year = int(year)
    leap = (year % 4 == 0) and (year % 100 != 0) or (year % 400 == 0)
    if month is None:
        return 365 + leap
    else:
        month = int(month)
        m = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
        return m[month - 1] + ((month == 2) and leap)

perday1 = np.array([r[2] / days(r[0], r[1]) for i, r in df.iterrows()]) # 確定数
perday2 = np.array([r[3] / days(r[0], r[1]) for i, r in df.iterrows()]) # 概数
perday3 = np.array([r[4] / days(r[0], r[1]) for i, r in df.iterrows()]) # 速報値

plt.plot(t, perday1, "o-")
plt.plot(t, perday2, "s-")
plt.plot(t, perday3, "*-")
plt.ylabel("1日あたり死亡数")
plt.legend(["確定数", "概数", "速報値"])

df["死亡数"] = [n3 if np.isnan(n1) and np.isnan(n2) else n2 if np.isnan(n1) else n1
                for n1, n2, n3 in zip(df["確定数"], df["概数"], df["速報値"])]
df["日数"] = [days(year, month) for year, month in zip(df["年"], df["月"])]

for y in sorted(set(df["年"])):
    df1 = df[df["年"] == y]
    plt.plot(df1["月"], df1["死亡数"] / df1["日数"],
             alpha=0.5, marker=f"${y % 10}$", label=y)
plt.xlabel("月")
plt.ylabel("1日あたり死亡数")
plt.legend(loc=(0.38, 0.60), labelspacing=0.1)

for m in range(1, 13):
    df1 = df[df["月"] == m]
    plt.plot(df1["年"], df1["死亡数"] / df1["日数"], marker=f"${m}$")
plt.ylabel("1日あたり死亡数")
plt.xticks(sorted(set(df["年"] // 2 * 2).intersection(set(df["年"]))))

df1 = df.groupby("年")[["死亡数", "日数"]].sum()
df2 = df1[df1.index < 2023]
y = df2.index
x = df2["死亡数"] / df2["日数"]

plt.plot(y, x, "o-")
plt.xlabel("年")
plt.ylabel("1日あたり死亡数")
plt.xticks(sorted(set(y // 2 * 2).intersection(set(y))))

非常に大雑把に2020〜2022年の超過死亡数を求めてみよう。2011年は東日本大地震で死亡数が増えたと考えられるので，ベースラインを2012〜2019年に設定して，最小2乗法で直線をフィットし，そこからの外れを調べよう。

u = (2012 <= y) & (y <= 2019)
slope, intercept = np.polyfit(y[u], x[u], 1)
plt.clf()
plt.plot(y, x - (slope * y + intercept), "o-")
plt.axhline(linewidth=0.5, color="black")
plt.xlabel("年")
plt.ylabel("1日あたり超過死亡数")
plt.xticks(sorted(set(y // 2 * 2).intersection(set(y))))

厚労省オープンデータによれば，COVID-19による2020年12月31日までの累積死亡数は3459人，2021年12月31日までの累積死亡数は18385人，2022年12月31日までの累積死亡数は57266人である。これをオレンジ色で描き足す。

plt.plot([2020, 2021, 2022], [3459 / 366, (18385 - 3459) / 365, (57266 - 18385) / 365], "o-")

ちなみに月ごとの1日あたりのCOVID-19による死亡数は次の通りである。

月	1日あたりCOVID-19死亡数
2020-06	2.7
2020-07	1.2
2020-08	9.2
2020-09	9.2
2020-10	6.3
2020-11	12.4
2020-12	42.6
2021-01	72.9
2021-02	77.3
2021-03	41.1
2021-04	35.6
2021-05	90.9
2021-06	57.7
2021-07	13.2
2021-08	27.4
2021-09	53.6
2021-10	20.0
2021-11	3.1
2021-12	1.1
2022-01	12.9
2022-02	172.9
2022-03	144.0
2022-04	49.1
2022-05	33.8
2022-06	19.0
2022-07	42.1
2022-08	235.3
2022-09	164.1
2022-10	60.1
2022-11	99.5
2022-12	245.9

df = pd.read_csv("https://covid19.mhlw.go.jp/public/opendata/deaths_cumulative_daily.csv",
                 parse_dates=['Date'])
a = np.arange(np.datetime64("2020-06"), np.datetime64("2023-02"))
da = pd.to_datetime(a)
da1 = da - np.timedelta64(1,"D")
da1s = pd.Series(da1, name='Date')
df2 = pd.merge(da1s, df).diff()
x = df2.ALL / (df2.Date / np.timedelta64(1,"D"))
for i in range(1, len(x)):
    print(f"{a[i-1]},{x[i]:3.1f}")