[Stata] Plotting the COVID-19 situation

비조 2020. 3. 28. 21:28

Building on code by Chuck Huber of Stata that reads the Johns Hopkins University GitHub data, I wrote a do-file that cleans the data and visualizes recent COVID-19 confirmed cases.
First, let's load the GitHub data using Chuck's code.

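* Base URL of the JHU CSSE daily reports on GitHub
local URL = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/"
* Try every month-day combination in 2020; capture suppresses the error for dates with no file
forvalues month = 1/12 {
   forvalues day = 1/31 {
      * Zero-pad month and day to match file names such as 03-28-2020.csv
      * (separate macros so the loop counters are not overwritten)
      local mm = string(`month', "%02.0f")
      local dd = string(`day', "%02.0f")
      local year = "2020"
      local today = "`mm'-`dd'-`year'"
      local FileName = "`URL'`today'.csv"
      clear
      capture import delimited "`FileName'"
      * Some files carry a byte-order mark that garbles the first variable name
      capture confirm variable ïprovincestate
      if _rc == 0 {
         rename ïprovincestate provincestate
         label variable provincestate "Province/State"
      }
      * Column names differ across files; bring them to one naming scheme
      capture rename province_state provincestate
      capture rename country_region countryregion
      capture rename last_update lastupdate
      capture rename lat latitude
      capture rename long longitude
      * Record the report date taken from the file name
      generate tempdate = "`today'"
      capture save "`today'", replace
   }
}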
clear
* Stack the daily files; capture skips dates for which no file was saved
forvalues month = 1/12 {
   forvalues day = 1/31 {
      local mm = string(`month', "%02.0f")
      local dd = string(`day', "%02.0f")
      local year = "2020"
      local today = "`mm'-`dd'-`year'"
      capture append using "`today'"
   }
}

save raw, replace

The data loaded this way had a few problems.

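The date format sometimes differs from day to day, which is why Chuck creates a new variable, tempdate, from the file name. My earlier version had several lines of commands to handle this, but they are no longer needed.
Country names are reported differently depending on the date. Korea, for example, appears as "South Korea", "Republic of Korea", and "Korea, South". I therefore harmonized the names as shown below.
Within a single day, a country can be reported in several rows, one per region. Such rows have to be summed by country. Here I used collapse.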
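The three points above can be tried out on a tiny dataset before running them on the full file. The following is a minimal sketch, assuming made-up rows: the counts and the province entries are hypothetical and are not actual JHU figures; only the variable names follow the ones used in this post.

```stata
* Toy rows (hypothetical numbers, not real JHU data)
clear
input str10 provincestate str20 countryregion str10 tempdate confirmed
"Hubei"   "Mainland China"    "01-25-2020" 700
"Beijing" "Mainland China"    "01-25-2020" 40
""        "South Korea"       "01-25-2020" 2
""        "Republic of Korea" "03-10-2020" 7500
""        "Korea, South"      "03-11-2020" 7700
end

* tempdate holds a month-day-year string, so parse it with the "MDY" mask
generate date = date(tempdate, "MDY")
format date %tdNN/DD/CCYY

* Map alternative spellings to one name per country
replace countryregion = "China" if countryregion == "Mainland China"
replace countryregion = "South Korea" if inlist(countryregion, "Korea, South", "Republic of Korea")

* Sum regional rows so that each country has one row per date
collapse (sum) confirmed, by(date countryregion)

* Errors out if a country-date pair still appears more than once
isid countryregion date
list, sepby(countryregion)
```

In this toy case the two Chinese rows for the same day are merged into a single row, and the three Korean spellings end up under one name with one row per date.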

Reflecting the points above, I ran the following do-file.

* Convert the month-day-year string into a Stata daily date
generate date = date(tempdate, "MDY")
format date %tdNN/DD/CCYY

* Harmonize country names that change across reports
replace countryregion = "China" if countryregion=="Mainland China"
replace countryregion = "South Korea" if countryregion == "Korea, South"
replace countryregion = "South Korea" if countryregion == "Republic of Korea"

* Sum regional rows to one row per country and date
collapse (sum) confirmed deaths recovered, by(date countryregion)
* Numeric country code, needed to declare the panel
encode countryregion, gen(country)
tsset country date, daily

save covid19_long, replace

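This produces a country-level panel dataset.

One more step: to put countries on a common footing, I use the day on which confirmed cases reached 100 as the starting point instead of the calendar date. The following commands do this.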

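use covid19_long, clear

* Panel identifier (country is already numeric after encode, so id mirrors it)
egen id = group(country)
order id country date

xtset id date

* Keep only days with at least 100 confirmed cases
drop if confirmed < 100
* Count days within each country, explicitly in date order
bysort id (date): gen date_100 = _n
* Store the country name on each country's last observation only
bysort id (date): gen country_lab = countryregion if date_100 == _N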

The country_lab variable in the last line above exists so that the country name can be attached to the end of each time series in the plots below. It turns out to be very handy for labeling.
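To see what date_100 and country_lab contain, here is a minimal sketch on a toy panel. The countries "A" and "B", the day variable, and all counts are hypothetical; the real data use date and the encoded country instead.

```stata
* Toy panel (hypothetical countries and counts)
clear
input str1 countryregion day confirmed
"A" 1 50
"A" 2 120
"A" 3 300
"B" 1 200
"B" 2 400
"B" 3 800
end

* Drop days before the 100-case threshold
drop if confirmed < 100

* Day counter within country, in time order
bysort countryregion (day): generate date_100 = _n

* Country name only on the last observation of each country
bysort countryregion (day): generate country_lab = countryregion if date_100 == _N

list, sepby(countryregion)
```

Country A loses its first day, so its counter starts at day 2, while B keeps all three days. Because country_lab is empty everywhere except the final point of each series, mlabel(country_lab) prints one label per line in the graph rather than one per marker.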

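Time series can be plotted in levels, but they are often log-transformed depending on their characteristics. The log transformation is especially useful when, as now, confirmed counts differ greatly in level across countries, because all series can then be read at a glance. Because the slope of a log-transformed series represents the growth rate, it also helps when gauging how fast cases are increasing. I picked a few countries and plotted the series as follows.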

gen ln_confirmed = ln(confirmed)
label var ln_confirmed "the logarithm of confirmed"

local y = ln(100)
local opt = "mlabel(country_lab) msize(tiny)  lcolor(%15) mcolor(%20) mlabsize(tiny)"

twoway ///
    (connected ln_confirmed date_100 if countryregion == "South Korea", sort mlabel(country_lab) msize(tiny)  mcolor(blue) mlabcolor(blue) lcolor(blue)) ///
    (connected ln_confirmed date_100 if countryregion == "Japan", sort `opt') ///
    (connected ln_confirmed date_100 if countryregion == "China", sort `opt') ///
    (connected ln_confirmed date_100 if countryregion == "United Kingdom", sort `opt') ///
    (connected ln_confirmed date_100 if countryregion == "Italy", sort  `opt') ///
    (connected ln_confirmed date_100 if countryregion == "Spain", sort  `opt') ///
    (connected ln_confirmed date_100 if countryregion == "Switzerland", sort  `opt') ///
    (connected ln_confirmed date_100 if countryregion == "US", sort  `opt'), ///
    legend(off) yline(`y', lpattern(dash) ) ylab(`y' "100th case" 3(3)12,  labsize(small)) ///
    xlab(0(5)60) ///
    xtitle("days since 100th confirmed case") ytitle("no. of confirmed (log-scale)") ///
    note("Source: JHU Github") ///
    scheme(s1color)

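This gives the graph below. Only Korea is drawn in a strong color; the other countries are faded by adjusting the opacity.

This time I plotted the growth rate instead of the level. The growth rate can be computed directly, but econometricians often use the log-difference instead, and so did I. I multiplied by 100 to express it in percent.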
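The log-difference is an approximation to the ordinary growth rate, since ln(x_t) - ln(x_{t-1}) = ln(1 + g_t), which is close to g_t when g_t is small. The following is a minimal sketch comparing the two on made-up counts that grow by exactly 10% per day; the variable names growth_exact and growth_logdiff are introduced here only for illustration.

```stata
* Toy series growing 10% per day (hypothetical counts)
clear
input day confirmed
1 100
2 110
3 121
end
tsset day

* Exact day-on-day growth rate in percent
generate double growth_exact = 100 * (confirmed / L.confirmed - 1)

* Log-difference in percent, as used in this post
generate double ln_confirmed = ln(confirmed)
generate double growth_logdiff = 100 * D.ln_confirmed

list day confirmed growth_exact growth_logdiff
```

Here the exact rate is 10% while the log-difference is about 9.53%, so the log-difference understates large growth rates somewhat. The first day has no previous value, so both measures are missing there.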

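To allow a comparison with Japan, Japan is also drawn in a strong color. Korea is shown in blue and Japan in pink (red at 50% opacity in the code); I should say up front that the color choice carries no particular meaning.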

gen d_ln_confirmed = D.ln_confirmed*100
label var d_ln_confirmed "log-difference of the confirmed"

local y = "d_ln_confirmed"
local opt = "mlabel(country_lab) msize(tiny)  lcolor(%15) mcolor(%20) mlabsize(tiny)"

twoway ///
    (connected `y' date_100 if countryregion == "South Korea", sort mlabel(country_lab) mlabcolor(blue) msize(tiny)  mcolor(blue) lcolor(blue) mlabp(6)) ///
    (connected `y'  date_100 if countryregion == "Japan", sort  mlabel(country_lab) mlabcolor(red) msize(tiny)  mcolor(red%50) lcolor(red%50) mlabp(1)) ///
    (connected `y'  date_100 if countryregion == "China", sort `opt') ///
    (connected `y'  date_100 if countryregion == "United Kingdom", sort `opt') ///
    (connected `y'  date_100 if countryregion == "Italy", sort  `opt') ///
    (connected `y'  date_100 if countryregion == "Spain", sort  `opt') ///
    (connected `y'  date_100 if countryregion == "Switzerland", sort  `opt') ///
    (connected `y'  date_100 if countryregion == "US", sort  `opt'), ///
    legend(off) xlab(0(10)70) ///
    xtitle("days since 100th confirmed case") ytitle("growth rate (%)") ///
    note("Source: JHU Github") ///
    scheme(s1color)

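This gives the figure below. Quite a few countries are clearly in serious trouble. One interesting point: in most countries the level of cases is high but the growth rate is gradually declining, whereas Japan alone has just started to accelerate.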
