COVID-19 Analysis in South Kalimantan using R

Overview

This project is adapted from a DQLab project. The COVID-19 pandemic spread throughout the world in 2020, including to Indonesia. The Indonesian government formed a special task force to handle COVID-19 and to keep the public informed about the spread of the disease. The task force collects COVID-19 data and makes it publicly available on the covid19.go.id website, both as web pages and as visualizations. The website also reports on the development of COVID-19 in each province, including South Kalimantan.

Project

Accessing the API and Checking the Status Code

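library(httr)
# Disable SSL certificate verification for all later httr requests
# (a workaround for certificate errors on the server; use with care)
set_config(config(ssl_verifypeer = 0L))
# Request South Kalimantan's provincial data from the covid19.go.id API
resp_kalsel <- GET("https://data.covid19.go.id/public/api/prov_detail_KALIMANTAN_SELATAN.json")
status_code(resp_kalsel)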
## [1] 200

The status code is 200, which means the request succeeded and the response content is ready to be extracted.
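As a small safeguard, a script can stop before parsing when the request fails. The sketch below uses a hypothetical helper, `check_success()`, which is not part of httr. It treats any 2xx code as success and raises an R error otherwise. httr's own `stop_for_status()` offers a similar check directly on a response object.

```r
# Hypothetical helper (not part of httr): stop early unless the status code is 2xx
check_success <- function(code) {
  if (code < 200 || code >= 300) {
    stop(sprintf("Request failed with HTTP status %d", code), call. = FALSE)
  }
  invisible(TRUE)
}

# Intended usage with the response above:
# check_success(status_code(resp_kalsel))
```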

Extract Data

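# Parse the JSON body into an R list; simplifyVector = TRUE lets jsonlite simplify arrays into vectors and data frames
cov_kalsel_raw <- content(resp_kalsel, as = "parsed", simplifyVector = TRUE)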
names(cov_kalsel_raw)
##  [1] "last_date"            "provinsi"             "kasus_total"         
##  [4] "kasus_tanpa_tgl"      "kasus_dengan_tgl"     "meninggal_persen"    
##  [7] "meninggal_tanpa_tgl"  "meninggal_dengan_tgl" "sembuh_persen"       
## [10] "sembuh_tanpa_tgl"     "sembuh_dengan_tgl"    "list_perkembangan"   
## [13] "data"

Check Total Cases, Death Percentage and Recovery Percentage

# Check total cases and the death and recovery percentages
cov_kalsel_raw$kasus_total
## [1] 87563
cov_kalsel_raw$meninggal_persen
## [1] 2.95216
cov_kalsel_raw$sembuh_persen
## [1] 96.96447

As of October 24th, 2022, the total number of cases is 87,563, with a death percentage of 2.95% and a recovery percentage of 96.96%.
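Both percentages are shares of `kasus_total`, so a little arithmetic recovers the underlying counts. This sketch reuses the three printed values. The variable names mirror the API fields.

```r
# Values copied from the output above
kasus_total      <- 87563
meninggal_persen <- 2.95216
sembuh_persen    <- 96.96447

# Percentages are shares of kasus_total, so count = total * percent / 100
meninggal <- round(kasus_total * meninggal_persen / 100)
sembuh    <- round(kasus_total * sembuh_persen / 100)

# Whatever is neither recovered nor dead is still active
aktif <- kasus_total - meninggal - sembuh
c(meninggal = meninggal, sembuh = sembuh, aktif = aktif)
```

The result is 2,585 deaths, 84,905 recoveries and 73 active cases. These match the last row (2022-09-30) of the cumulative table computed in the Case Accumulation step.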

More Information

# More detailed information: the daily development data
cov_kalsel <- cov_kalsel_raw$list_perkembangan
str(cov_kalsel)
## 'data.frame':    915 obs. of  9 variables:
##  $ tanggal                     : num  1.59e+12 1.59e+12 1.59e+12 1.59e+12 1.59e+12 ...
##  $ KASUS                       : int  5 2 0 0 0 0 8 2 0 4 ...
##  $ MENINGGAL                   : int  0 0 0 0 0 0 1 0 2 0 ...
##  $ SEMBUH                      : int  0 0 0 0 0 0 0 0 0 1 ...
##  $ DIRAWAT_OR_ISOLASI          : int  5 2 0 0 0 0 7 2 -2 3 ...
##  $ AKUMULASI_KASUS             : int  5 7 7 7 7 7 15 17 17 21 ...
##  $ AKUMULASI_SEMBUH            : int  0 0 0 0 0 0 0 0 0 1 ...
##  $ AKUMULASI_MENINGGAL         : int  0 0 0 0 0 0 1 1 3 3 ...
##  $ AKUMULASI_DIRAWAT_OR_ISOLASI: int  5 7 7 7 7 7 14 16 14 17 ...
head(cov_kalsel)
##        tanggal KASUS MENINGGAL SEMBUH DIRAWAT_OR_ISOLASI AKUMULASI_KASUS
## 1 1.585526e+12     5         0      0                  5               5
## 2 1.585613e+12     2         0      0                  2               7
## 3 1.585699e+12     0         0      0                  0               7
## 4 1.585786e+12     0         0      0                  0               7
## 5 1.585872e+12     0         0      0                  0               7
## 6 1.585958e+12     0         0      0                  0               7
##   AKUMULASI_SEMBUH AKUMULASI_MENINGGAL AKUMULASI_DIRAWAT_OR_ISOLASI
## 1                0                   0                            5
## 2                0                   0                            7
## 3                0                   0                            7
## 4                0                   0                            7
## 5                0                   0                            7
## 6                0                   0                            7

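The output reveals two problems. First, the tanggal (date) column is stored as a number, a Unix timestamp in milliseconds, rather than as a date. Second, the column names are written inconsistently in uppercase. We therefore tidy up the data before analysing it.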
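Dividing by 1000 fixes the date because it turns milliseconds into seconds. The sketch below runs the conversion on a single value. The printed 1.585526e+12 is rounded, so the exact timestamp used here, 1585526400000, is an assumption. It corresponds to midnight UTC on 2020-03-30, the first date shown after tidying.

```r
# Assumed raw value; the printed 1.585526e+12 is rounded
tanggal_ms <- 1585526400000

# Milliseconds -> seconds -> POSIXct; pinning tz = "UTC" keeps the date
# independent of the machine's local time zone
tanggal_ct <- as.POSIXct(tanggal_ms / 1000, origin = "1970-01-01", tz = "UTC")
tanggal    <- as.Date(tanggal_ct)
tanggal
## [1] "2020-03-30"

# Equivalent shortcut: there are 86,400,000 milliseconds in a day
tanggal2 <- as.Date(tanggal_ms / 86400000, origin = "1970-01-01")
```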

Tidy Up the Data

We tidy up the data in several steps:
- Delete the “DIRAWAT_OR_ISOLASI” and “AKUMULASI_DIRAWAT_OR_ISOLASI” columns
- Delete all remaining columns that contain cumulative values
- Rename the “KASUS” column to “kasus_baru” (new cases)
- Rename the “MENINGGAL” and “SEMBUH” columns to lowercase
- Convert the “tanggal” column from a millisecond timestamp to a date

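library(dplyr)
new_cov_kalsel <-
  cov_kalsel %>% 
  # Drop DIRAWAT_OR_ISOLASI and AKUMULASI_DIRAWAT_OR_ISOLASI
  select(-contains("DIRAWAT_OR_ISOLASI")) %>% 
  # Drop the remaining cumulative columns
  select(-starts_with("AKUMULASI")) %>% 
  rename(
    kasus_baru = KASUS,
    meninggal = MENINGGAL,
    sembuh = SEMBUH
  ) %>% 
  mutate(
    # tanggal is a Unix timestamp in milliseconds; convert it in UTC so the
    # resulting date does not depend on the local time zone
    tanggal = as.POSIXct(tanggal / 1000, origin = "1970-01-01", tz = "UTC"),
    tanggal = as.Date(tanggal)
  )
str(new_cov_kalsel)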
## 'data.frame':    915 obs. of  4 variables:
##  $ tanggal   : Date, format: "2020-03-30" "2020-03-31" ...
##  $ kasus_baru: int  5 2 0 0 0 0 8 2 0 4 ...
##  $ meninggal : int  0 0 0 0 0 0 1 0 2 0 ...
##  $ sembuh    : int  0 0 0 0 0 0 0 0 0 1 ...
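Readers who prefer not to use dplyr can sketch the same tidy-up steps in base R. The three-row input below is a toy copy of the first rows of `list_perkembangan`. Its exact timestamps are assumed, since `str()` only shows them rounded.

```r
# Toy input: first three rows of list_perkembangan (timestamps assumed)
raw <- data.frame(
  tanggal = 1585526400000 + (0:2) * 86400000,
  KASUS = c(5L, 2L, 0L), MENINGGAL = 0L, SEMBUH = 0L,
  DIRAWAT_OR_ISOLASI = c(5L, 2L, 0L),
  AKUMULASI_KASUS = c(5L, 7L, 7L), AKUMULASI_SEMBUH = 0L,
  AKUMULASI_MENINGGAL = 0L, AKUMULASI_DIRAWAT_OR_ISOLASI = c(5L, 7L, 7L)
)

# Steps 1-2: drop columns mentioning DIRAWAT_OR_ISOLASI or starting with AKUMULASI
drop <- grepl("DIRAWAT_OR_ISOLASI", names(raw)) | startsWith(names(raw), "AKUMULASI")
tidy <- raw[, !drop]

# Steps 3-4: rename KASUS, then lowercase MENINGGAL and SEMBUH
names(tidy)[names(tidy) == "KASUS"] <- "kasus_baru"
lower <- names(tidy) %in% c("MENINGGAL", "SEMBUH")
names(tidy)[lower] <- tolower(names(tidy)[lower])

# Step 5: epoch milliseconds -> Date (in UTC)
tidy$tanggal <- as.Date(as.POSIXct(tidy$tanggal / 1000, origin = "1970-01-01", tz = "UTC"))
str(tidy)
```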

Visualization

Daily Cases

library("ggplot2")

# Daily positive cases chart
ggplot(new_cov_kalsel, aes(tanggal, kasus_baru)) +
  geom_col(fill = "salmon") +
  labs(
    x = NULL,
    y = "New cases",
    title = "Daily Cases of COVID-19 in South Kalimantan",
    caption = "Data source: covid19.go.id"
  ) +
  theme(plot.title.position = "plot")

The chart shows that daily cases were highest between July and August 2021, peaking at more than 800 new cases in a single day. Cases fell in September 2021 and rose again between February and March 2022, with a peak above 750 new cases per day.

Daily Recovery

# Daily recovered cases chart
ggplot(new_cov_kalsel, aes(tanggal, sembuh)) +
  geom_col(fill = "olivedrab2") +
  labs(
    x = NULL,
    y = "Recoveries",
    title = "Daily Recovery of COVID-19 in South Kalimantan",
    caption = "Data source: covid19.go.id"
  ) +
  theme(plot.title.position = "plot")

Daily recoveries were also highest between July and August 2021, exceeding 1,125 on the peak day.

Daily Death

# Daily deaths chart
ggplot(new_cov_kalsel, aes(tanggal, meninggal)) +
  geom_col(fill = "darkslategray4") +
  labs(
    x = NULL,
    y = "Deaths",
    title = "Daily Death of COVID-19 in South Kalimantan",
    caption = "Data source: covid19.go.id"
  ) +
  theme(plot.title.position = "plot")

Daily deaths were likewise highest between July and August 2021, with more than 45 deaths on the peak day.
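The peaks above are read off the charts by eye. A small base-R sketch can look them up in the data instead. `peak_days()` is a hypothetical helper. The toy input holds only the first ten days from the `str()` output, so its peaks are not the ones discussed above.

```r
# Hypothetical helper: date and value of the largest entry in each daily column
peak_days <- function(df, cols = c("kasus_baru", "sembuh", "meninggal")) {
  do.call(rbind, lapply(cols, function(col) {
    i <- which.max(df[[col]])  # first occurrence of the maximum
    data.frame(kategori = col, tanggal = df$tanggal[i], jumlah = df[[col]][i])
  }))
}

# Toy input: the first ten days shown by str(new_cov_kalsel)
toy <- data.frame(
  tanggal = seq(as.Date("2020-03-30"), by = "day", length.out = 10),
  kasus_baru = c(5L, 2L, 0L, 0L, 0L, 0L, 8L, 2L, 0L, 4L),
  meninggal = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 2L, 0L),
  sembuh = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L)
)
peak_days(toy)
```

On the full data, `peak_days(new_cov_kalsel)` would return the date and size of the largest daily value for each column.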

Weekly Cases

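Next, we aggregate the daily cases into weekly totals. The lubridate functions year() and week() extract the year and the week number from each date. dplyr's count() with wt = kasus_baru then sums the new cases within each year-week group instead of counting rows.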

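# Is this week better than last week?
library(lubridate)

cov_kalsel_pekanan <- new_cov_kalsel %>% 
  count(
    tahun = year(tanggal),     # year
    pekan_ke = week(tanggal),  # week number within the year
    wt = kasus_baru,           # sum new cases instead of counting rows
    name = "jumlah"            # name of the weekly total column
  )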
  
glimpse(cov_kalsel_pekanan)
## Rows: 133
## Columns: 3
## $ tahun    <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2…
## $ pekan_ke <dbl> 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 2…
## $ jumlah   <int> 7, 10, 20, 60, 53, 62, 62, 208, 146, 339, 468, 685, 565, 463,…
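lubridate's `week()` counts the complete seven-day periods since January 1st, plus one. As a result, weeks start on the weekday of January 1st rather than on a Monday, and the last one or two days of a year fall into a short week 53. The base-R sketch below reproduces that rule with `as.POSIXlt()` and `aggregate()`. `weekly_sum()` is a hypothetical helper, and its ten-day toy input is copied from the `str()` output above.

```r
# Hypothetical helper: weekly totals of kasus_baru using lubridate::week()'s rule
weekly_sum <- function(df) {
  lt <- as.POSIXlt(df$tanggal)
  df$tahun <- lt$year + 1900
  # yday is 0-based: complete 7-day periods since Jan 1, plus one
  df$pekan_ke <- lt$yday %/% 7 + 1
  out <- aggregate(kasus_baru ~ tahun + pekan_ke, data = df, FUN = sum)
  out <- out[order(out$tahun, out$pekan_ke), ]
  names(out)[names(out) == "kasus_baru"] <- "jumlah"
  rownames(out) <- NULL
  out
}

# Toy input: the first ten days of new cases
toy <- data.frame(
  tanggal = seq(as.Date("2020-03-30"), by = "day", length.out = 10),
  kasus_baru = c(5L, 2L, 0L, 0L, 0L, 0L, 8L, 2L, 0L, 4L)
)
weekly_sum(toy)
```

The first two weekly totals, 7 and 10 for weeks 13 and 14, match the `glimpse()` output above. The third total (week 15) is only partial because the toy data stops on 2020-04-08.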

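Next, we compare each week's total with the previous week's. dplyr::lag() shifts jumlah down by one row. The first week has no predecessor, so its missing value is replaced with 0. The lebih_baik ("better") column is TRUE when a week has fewer new cases than the week before.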

cov_kalsel_pekanan <-
  cov_kalsel_pekanan %>% 
  mutate(
    jumlah_pekanlalu = dplyr::lag(jumlah, 1),
    jumlah_pekanlalu = ifelse(is.na(jumlah_pekanlalu), 0, jumlah_pekanlalu),
    lebih_baik = jumlah < jumlah_pekanlalu
  )
glimpse(cov_kalsel_pekanan)
## Rows: 133
## Columns: 5
## $ tahun            <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020,…
## $ pekan_ke         <dbl> 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 2…
## $ jumlah           <int> 7, 10, 20, 60, 53, 62, 62, 208, 146, 339, 468, 685, 5…
## $ jumlah_pekanlalu <dbl> 0, 7, 10, 20, 60, 53, 62, 62, 208, 146, 339, 468, 685…
## $ lebih_baik       <lgl> FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE…
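The lag comparison can be checked by hand on the first seven weekly totals from the `glimpse()` output. This base-R sketch shifts the vector down by one position instead of calling `dplyr::lag()`. Like the code above, it fills the first week with 0.

```r
# Weekly totals copied from the glimpse() output above (first seven weeks)
jumlah <- c(7L, 10L, 20L, 60L, 53L, 62L, 62L)

# Previous week's total: shift down by one, with 0 for the first week
jumlah_pekanlalu <- c(0L, head(jumlah, -1))

# A week is "better" only when it has strictly fewer cases than the week before
lebih_baik <- jumlah < jumlah_pekanlalu
lebih_baik
## [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
```

Because the comparison is strict, the seventh week (62 after 62) does not count as better.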

Weekly Cases Graph

From the weekly data, we can plot the weekly cases of COVID-19 in South Kalimantan for 2022.

ggplot(
  cov_kalsel_pekanan[cov_kalsel_pekanan$tahun == 2022, ],
  aes(pekan_ke, jumlah, fill = lebih_baik)
) +
  geom_col(show.legend = FALSE) +
  scale_x_continuous(breaks = 1:52, expand = c(0, 0)) +
  scale_fill_manual(values = c("TRUE" = "seagreen3", "FALSE" = "salmon")) +
  labs(
    x = NULL,
    y = "Total Cases",
    title = "Weekly Cases of COVID-19 in South Kalimantan",
    subtitle = "Green columns show weeks with fewer new cases than the previous week",
    caption = "Data source: covid19.go.id"
  ) +
  theme(plot.title.position = "plot")

Case Accumulation

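From the daily data, we can compute running totals for each category and look at the most recent values. The DIRAWAT_OR_ISOLASI columns were dropped earlier, so active cases are derived as cumulative new cases minus cumulative recoveries and cumulative deaths.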

cov_kalsel_akumulasi <- 
  new_cov_kalsel %>% 
  transmute(
    tanggal,
    akumulasi_aktif = cumsum(kasus_baru) - cumsum(sembuh) - cumsum(meninggal),
    akumulasi_sembuh = cumsum(sembuh),
    akumulasi_meninggal = cumsum(meninggal)
  )

tail(cov_kalsel_akumulasi)
##        tanggal akumulasi_aktif akumulasi_sembuh akumulasi_meninggal
## 910 2022-09-25              66            84903                2585
## 911 2022-09-26              67            84905                2585
## 912 2022-09-27              68            84905                2585
## 913 2022-09-28              72            84905                2585
## 914 2022-09-29              72            84905                2585
## 915 2022-09-30              73            84905                2585
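The active-case formula can be checked on the first ten days from the `str()` output. Because `cumsum()` is linear, subtracting the running totals gives the same result as taking the running total of the daily differences. The result reproduces the AKUMULASI_DIRAWAT_OR_ISOLASI column from the raw API data.

```r
# First ten days of new cases, recoveries and deaths (from the str() output)
kasus_baru <- c(5, 2, 0, 0, 0, 0, 8, 2, 0, 4)
sembuh     <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
meninggal  <- c(0, 0, 0, 0, 0, 0, 1, 0, 2, 0)

# Active = all cases so far minus those who have recovered or died
akumulasi_aktif <- cumsum(kasus_baru) - cumsum(sembuh) - cumsum(meninggal)

# Same result in a single running total of the daily net change
akumulasi_aktif2 <- cumsum(kasus_baru - sembuh - meninggal)
akumulasi_aktif
## [1]  5  7  7  7  7  7 14 16 14 17
```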

Data Transformation

cov_kalsel_akumulasi is in wide format, with one column per category. To plot all three categories in a single chart, we reshape it into long format, with one row per date and category.

# Data transformation: wide to long
# (gather() is superseded by pivot_longer() since tidyr 1.0.0, but still works)
library(dplyr)
library(tidyr)

dim(cov_kalsel_akumulasi)
## [1] 915   4
cov_kalsel_akumulasi_pivot <- 
  cov_kalsel_akumulasi %>% 
  gather(
    key = "kategori",
    value = "jumlah",
    -tanggal
  ) %>% 
  mutate(
    kategori = sub(pattern = "akumulasi_", replacement = "", kategori)
  )

dim(cov_kalsel_akumulasi_pivot)
## [1] 2745    3
glimpse(cov_kalsel_akumulasi_pivot)
## Rows: 2,745
## Columns: 3
## $ tanggal  <date> 2020-03-30, 2020-03-31, 2020-04-01, 2020-04-02, 2020-04-03, …
## $ kategori <chr> "aktif", "aktif", "aktif", "aktif", "aktif", "aktif", "aktif"…
## $ jumlah   <int> 5, 7, 7, 7, 7, 7, 14, 16, 14, 17, 14, 21, 20, 25, 25, 26, 37,…
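The row count triples, from 915 to 915 × 3 = 2,745, because every date now has one row for each of the three value columns. The sketch below does the same wide-to-long step in base R with `stack()`, using a two-row toy table taken from the first dates above. Like `gather()`, `stack()` places all values of one column before moving on to the next.

```r
# Toy wide table: first two dates, three cumulative columns
wide <- data.frame(
  tanggal = as.Date(c("2020-03-30", "2020-03-31")),
  akumulasi_aktif = c(5L, 7L),
  akumulasi_sembuh = c(0L, 0L),
  akumulasi_meninggal = c(0L, 0L)
)

# stack() stacks the value columns one after another (column by column)
st <- stack(wide[-1])
long <- data.frame(
  tanggal  = rep(wide$tanggal, times = ncol(wide) - 1),
  kategori = sub("akumulasi_", "", as.character(st$ind)),
  jumlah   = st$values
)
dim(long)
## [1] 6 3
```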

Case Comparison

ggplot(cov_kalsel_akumulasi_pivot, aes(tanggal, jumlah, colour = kategori)) +
  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_line(linewidth = 0.9) +
  scale_y_continuous(sec.axis = dup_axis(name = NULL)) +
  scale_colour_manual(
    values = c(
      "aktif" = "salmon",
      "meninggal" = "darkslategray4",
      "sembuh" = "olivedrab2"
    ),
    # labels follow the alphabetical order of the categories: aktif, meninggal, sembuh
    labels = c("Active", "Death", "Recovered")
  ) +
  labs(
    x = NULL,
    y = "Cases accumulation",
    colour = NULL,
    title = "Case Dynamics of COVID-19 in South Kalimantan",
    caption = "Data source: covid19.go.id"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5),
    legend.position = "top")

The chart shows that cumulative recoveries increase steadily over time. Active cases have two distinct peaks. The cumulative death line stays comparatively flat on this scale, because deaths are small relative to recoveries.