This project is adapted from a DQLab project. The COVID-19 pandemic spread throughout the world in 2020, including Indonesia. The Indonesian government formed a special COVID-19 task force to handle the outbreak and to report on how the spread is developing. The task force therefore collects COVID-19 data and makes it publicly available. The data is presented on the covid19.go.id website, both as web pages and as image-based visualizations. The website also covers the development of COVID-19 in each province, including South Kalimantan.
library(httr)
# Note: this turns off TLS certificate verification for all httr requests
set_config(config(ssl_verifypeer = 0L))
resp_kalsel <- GET("https://data.covid19.go.id/public/api/prov_detail_KALIMANTAN_SELATAN.json")
status_code(resp_kalsel)
## [1] 200
The status code is 200, which means the request succeeded and the content is ready to extract.
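If the request fails, the status code is not 200 and the later steps would run on an error body. A minimal sketch of guarding against that: `fetch_prov` is a hypothetical helper name that is not part of the original project, while `stop_for_status()` is httr's own check that raises an R error for 4xx and 5xx responses.

```r
library(httr)

# Hypothetical helper (not in the original project): download one province's
# JSON and stop with an error if the server reports a failure.
fetch_prov <- function(url) {
  resp <- GET(url)
  stop_for_status(resp)  # raises an R error for 4xx/5xx responses
  content(resp, as = "parsed", simplifyVector = TRUE)
}
```

It would be called with the same URL used above, for example `fetch_prov("https://data.covid19.go.id/public/api/prov_detail_KALIMANTAN_SELATAN.json")`.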
cov_kalsel_raw <- content(resp_kalsel, as="parsed", simplifyVector = TRUE)
names(cov_kalsel_raw)
## [1] "last_date" "provinsi" "kasus_total"
## [4] "kasus_tanpa_tgl" "kasus_dengan_tgl" "meninggal_persen"
## [7] "meninggal_tanpa_tgl" "meninggal_dengan_tgl" "sembuh_persen"
## [10] "sembuh_tanpa_tgl" "sembuh_dengan_tgl" "list_perkembangan"
## [13] "data"
# Check the total cases and the death and recovery percentages
cov_kalsel_raw$kasus_total
## [1] 87563
cov_kalsel_raw$meninggal_persen
## [1] 2.95216
cov_kalsel_raw$sembuh_persen
## [1] 96.96447
As of October 24, 2022, the total number of cases is 87,563, with a death percentage of 2.95% and a recovery percentage of 96.96%.
# More detailed information
cov_kalsel <- cov_kalsel_raw$list_perkembangan
str(cov_kalsel)
## 'data.frame': 915 obs. of 9 variables:
## $ tanggal : num 1.59e+12 1.59e+12 1.59e+12 1.59e+12 1.59e+12 ...
## $ KASUS : int 5 2 0 0 0 0 8 2 0 4 ...
## $ MENINGGAL : int 0 0 0 0 0 0 1 0 2 0 ...
## $ SEMBUH : int 0 0 0 0 0 0 0 0 0 1 ...
## $ DIRAWAT_OR_ISOLASI : int 5 2 0 0 0 0 7 2 -2 3 ...
## $ AKUMULASI_KASUS : int 5 7 7 7 7 7 15 17 17 21 ...
## $ AKUMULASI_SEMBUH : int 0 0 0 0 0 0 0 0 0 1 ...
## $ AKUMULASI_MENINGGAL : int 0 0 0 0 0 0 1 1 3 3 ...
## $ AKUMULASI_DIRAWAT_OR_ISOLASI: int 5 7 7 7 7 7 14 16 14 17 ...
head(cov_kalsel)
## tanggal KASUS MENINGGAL SEMBUH DIRAWAT_OR_ISOLASI AKUMULASI_KASUS
## 1 1.585526e+12 5 0 0 5 5
## 2 1.585613e+12 2 0 0 2 7
## 3 1.585699e+12 0 0 0 0 7
## 4 1.585786e+12 0 0 0 0 7
## 5 1.585872e+12 0 0 0 0 7
## 6 1.585958e+12 0 0 0 0 7
## AKUMULASI_SEMBUH AKUMULASI_MENINGGAL AKUMULASI_DIRAWAT_OR_ISOLASI
## 1 0 0 5
## 2 0 0 7
## 3 0 0 7
## 4 0 0 7
## 5 0 0 7
## 6 0 0 7
The result reveals two problems: the dates are stored as numeric timestamps rather than dates, and the column names are written inconsistently. The data therefore needs tidying.
We have several steps to tidy up the data:

- Delete the “DIRAWAT_OR_ISOLASI” and “AKUMULASI_DIRAWAT_OR_ISOLASI” columns
- Delete all columns that contain cumulative values
- Rename the column “KASUS” to “kasus_baru”
- Change the writing format of the “MENINGGAL” and “SEMBUH” columns to lowercase
- Correct the data in the “tanggal” column
library(dplyr)
new_cov_kalsel <-
  cov_kalsel %>%
  select(-contains("DIRAWAT_OR_ISOLASI")) %>%
  select(-starts_with("AKUMULASI")) %>%
  rename(
    kasus_baru = KASUS,
    meninggal = MENINGGAL,
    sembuh = SEMBUH
  ) %>%
  mutate(
    # tanggal is in milliseconds since 1970-01-01, so divide by 1000 to get seconds
    tanggal = as.POSIXct(tanggal / 1000, origin = "1970-01-01"),
    tanggal = as.Date(tanggal)
  )
str(new_cov_kalsel)
## 'data.frame': 915 obs. of 4 variables:
## $ tanggal : Date, format: "2020-03-30" "2020-03-31" ...
## $ kasus_baru: int 5 2 0 0 0 0 8 2 0 4 ...
## $ meninggal : int 0 0 0 0 0 0 1 0 2 0 ...
## $ sembuh : int 0 0 0 0 0 0 0 0 0 1 ...
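To see what the date conversion does, here is a worked example on a single value. The API stores dates as milliseconds since 1970-01-01 UTC. The exact value below is an assumption, chosen to be consistent with the rounded 1.585526e+12 printed for the first row.

```r
# Milliseconds since the Unix epoch (assumed exact value for the first row)
ms <- 1585526400000

# Divide by 1000 to get seconds, then convert. The time zone is set
# explicitly so the result does not depend on the local machine.
waktu <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
tanggal <- as.Date(waktu, tz = "UTC")

format(tanggal)
# [1] "2020-03-30"
```

This matches the first date in the `str()` output above.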
library("ggplot2")
# Chart of daily positive cases
ggplot(new_cov_kalsel, aes(tanggal, kasus_baru)) +
  geom_col(fill = "salmon") +
  labs(
    x = NULL,
    y = "Daily new cases",
    title = "Daily Cases of COVID-19 in South Kalimantan",
    caption = "Data source: covid19.go.id"
  ) +
  theme(plot.title.position = "plot")
The chart shows that daily cases peaked around July to August 2021, when the highest daily count exceeded 800. Cases then declined in September 2021 and rose again in February to March 2022, with a peak above 750 per day.
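The peak can also be read from the data instead of the chart. A minimal sketch, using only the first ten daily values shown in the `str()` output earlier (so the "peak" here is the peak of that small slice, not of the full series); `sepuluh_hari` and `puncak` are illustrative names:

```r
library(dplyr)

# First ten days of the series, copied from the str() output
sepuluh_hari <- data.frame(
  tanggal = as.Date("2020-03-30") + 0:9,
  kasus_baru = c(5L, 2L, 0L, 0L, 0L, 0L, 8L, 2L, 0L, 4L)
)

# Keep the single row with the largest number of new cases
puncak <- sepuluh_hari %>%
  slice_max(kasus_baru, n = 1, with_ties = FALSE)

format(puncak$tanggal)
# [1] "2020-04-05"
```

Applying the same `slice_max()` call to `new_cov_kalsel` would return the peak day of the whole period.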
# Chart of daily recoveries
ggplot(new_cov_kalsel, aes(tanggal, sembuh)) +
  geom_col(fill = "olivedrab2") +
  labs(
    x = NULL,
    y = "Daily recoveries",
    title = "Daily Recovery of COVID-19 in South Kalimantan",
    caption = "Data source: covid19.go.id"
  ) +
  theme(plot.title.position = "plot")
The chart shows that daily recoveries peaked around July to August 2021, when the highest daily count exceeded 1,125.
# Chart of daily deaths
ggplot(new_cov_kalsel, aes(tanggal, meninggal)) +
  geom_col(fill = "darkslategray4") +
  labs(
    x = NULL,
    y = "Daily deaths",
    title = "Daily Death of COVID-19 in South Kalimantan",
    caption = "Data source: covid19.go.id"
  ) +
  theme(plot.title.position = "plot")
The chart shows that daily deaths peaked around July to August 2021, when the highest daily count exceeded 45.
Next, we aggregate the daily data into weekly cases. The lubridate package can be used to convert the daily data into weekly data.
# Is this week better?
library(lubridate)
cov_kalsel_pekanan <- new_cov_kalsel %>%
  count(
    tahun = year(tanggal),
    pekan_ke = week(tanggal),
    wt = kasus_baru,
    name = "jumlah"
  )
glimpse(cov_kalsel_pekanan)
## Rows: 133
## Columns: 3
## $ tahun <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2…
## $ pekan_ke <dbl> 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 2…
## $ jumlah <int> 7, 10, 20, 60, 53, 62, 62, 208, 146, 339, 468, 685, 565, 463,…
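To make the weekly aggregation concrete, the sketch below rebuilds the first ten days from the `str()` output and shows that `count()` with `wt` gives the same totals as an explicit `group_by()` plus `summarise()`. The names `harian`, `via_count` and `via_summarise` are illustrative. Note that lubridate's `week()` counts 7-day blocks starting on 1 January, which is not the same as the ISO week returned by `isoweek()`.

```r
library(dplyr)
library(lubridate)

# First ten days of the series, copied from the str() output
harian <- data.frame(
  tanggal = as.Date("2020-03-30") + 0:9,
  kasus_baru = c(5L, 2L, 0L, 0L, 0L, 0L, 8L, 2L, 0L, 4L)
)

# Weighted count: sums kasus_baru within each (year, week) group
via_count <- harian %>%
  count(tahun = year(tanggal), pekan_ke = week(tanggal),
        wt = kasus_baru, name = "jumlah")

# The same aggregation written out explicitly
via_summarise <- harian %>%
  group_by(tahun = year(tanggal), pekan_ke = week(tanggal)) %>%
  summarise(jumlah = sum(kasus_baru), .groups = "drop")

via_count$pekan_ke  # 13 14 15
via_count$jumlah    # 7 10 4

# week() and isoweek() disagree for the first date
week(as.Date("2020-03-30"))     # 13
isoweek(as.Date("2020-03-30"))  # 14
```

The totals for weeks 13 and 14 (7 and 10) match the `glimpse()` output above. Week 15 differs because the sample contains only its first day.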
Then, we can compare each week's total with that of the previous week.
cov_kalsel_pekanan <-
  cov_kalsel_pekanan %>%
  mutate(
    jumlah_pekanlalu = dplyr::lag(jumlah, 1),
    jumlah_pekanlalu = ifelse(is.na(jumlah_pekanlalu), 0, jumlah_pekanlalu),
    lebih_baik = jumlah < jumlah_pekanlalu
  )
glimpse(cov_kalsel_pekanan)
## Rows: 133
## Columns: 5
## $ tahun <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020,…
## $ pekan_ke <dbl> 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 2…
## $ jumlah <int> 7, 10, 20, 60, 53, 62, 62, 208, 146, 339, 468, 685, 5…
## $ jumlah_pekanlalu <dbl> 0, 7, 10, 20, 60, 53, 62, 62, 208, 146, 339, 468, 685…
## $ lebih_baik <lgl> FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE…
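The comparison can be traced by hand on the first seven weekly totals from the output above. This sketch uses the `default` argument of `dplyr::lag()` to fill the first week with 0 in one step, as an alternative to the separate `ifelse()` call; it is a variation for illustration, not the code used in this project.

```r
library(dplyr)

# First seven weekly totals, copied from the glimpse() output
jumlah <- c(7L, 10L, 20L, 60L, 53L, 62L, 62L)

# Shift by one week; the first week has no predecessor, so use 0
jumlah_pekanlalu <- dplyr::lag(jumlah, 1, default = 0L)

# A week is "better" only if it has strictly fewer cases than the one before
lebih_baik <- jumlah < jumlah_pekanlalu

jumlah_pekanlalu  # 0 7 10 20 60 53 62
lebih_baik        # FALSE FALSE FALSE FALSE TRUE FALSE FALSE
```

Only the fifth week (53 after 60) counts as better. The seventh week ties with the sixth (62 and 62), so it is not flagged.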
From the weekly data, we can plot the weekly cases of COVID-19 in South Kalimantan for 2022.
ggplot(cov_kalsel_pekanan[cov_kalsel_pekanan$tahun == 2022, ], aes(pekan_ke, jumlah, fill = lebih_baik)) +
  geom_col(show.legend = FALSE) +
  scale_x_continuous(breaks = 1:52, expand = c(0, 0)) +
  scale_fill_manual(values = c("TRUE" = "seagreen3", "FALSE" = "salmon")) +
  labs(
    x = NULL,
    y = "Total Cases",
    title = "Weekly Cases of COVID-19 in South Kalimantan",
    subtitle = "Green columns mark weeks with fewer new cases than the previous week",
    caption = "Data source: covid19.go.id"
  ) +
  theme(plot.title.position = "plot")
From this data, we can compute the cumulative count for each case category and inspect the most recent values.
cov_kalsel_akumulasi <-
  new_cov_kalsel %>%
  transmute(
    tanggal,
    akumulasi_aktif = cumsum(kasus_baru) - cumsum(sembuh) - cumsum(meninggal),
    akumulasi_sembuh = cumsum(sembuh),
    akumulasi_meninggal = cumsum(meninggal)
  )
tail(cov_kalsel_akumulasi)
## tanggal akumulasi_aktif akumulasi_sembuh akumulasi_meninggal
## 910 2022-09-25 66 84903 2585
## 911 2022-09-26 67 84905 2585
## 912 2022-09-27 68 84905 2585
## 913 2022-09-28 72 84905 2585
## 914 2022-09-29 72 84905 2585
## 915 2022-09-30 73 84905 2585
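As a sanity check, the last row can be compared with the summary fields returned by the API. The sketch below uses only numbers already printed in this document: active, recovered and death counts should add up to `kasus_total`, and the two percentages should follow from the cumulative totals.

```r
# Values copied from the outputs shown earlier
kasus_total <- 87563
akumulasi_aktif <- 73
akumulasi_sembuh <- 84905
akumulasi_meninggal <- 2585

# Every confirmed case is either active, recovered, or deceased
akumulasi_aktif + akumulasi_sembuh + akumulasi_meninggal == kasus_total
# [1] TRUE

# Percentages recomputed from the cumulative totals
persen_meninggal <- akumulasi_meninggal / kasus_total * 100
persen_sembuh <- akumulasi_sembuh / kasus_total * 100

round(persen_meninggal, 2)  # 2.95
round(persen_sembuh, 2)     # 96.96
```

Both percentages agree with `meninggal_persen` and `sembuh_persen` from the raw response.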
Because cov_kalsel_akumulasi is in wide format, we will reshape it into long format so that the case categories can be compared in a single plot.
# Data transformation
library(dplyr)
library(tidyr)
dim(cov_kalsel_akumulasi)
## [1] 915 4
cov_kalsel_akumulasi_pivot <-
  cov_kalsel_akumulasi %>%
  gather(
    key = "kategori",
    value = "jumlah",
    -tanggal
  ) %>%
  mutate(
    kategori = sub(pattern = "akumulasi_", replacement = "", kategori)
  )
dim(cov_kalsel_akumulasi_pivot)
## [1] 2745 3
glimpse(cov_kalsel_akumulasi_pivot)
## Rows: 2,745
## Columns: 3
## $ tanggal <date> 2020-03-30, 2020-03-31, 2020-04-01, 2020-04-02, 2020-04-03, …
## $ kategori <chr> "aktif", "aktif", "aktif", "aktif", "aktif", "aktif", "aktif"…
## $ jumlah <int> 5, 7, 7, 7, 7, 7, 14, 16, 14, 17, 14, 21, 20, 25, 25, 26, 37,…
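In current versions of tidyr, `gather()` is superseded by `pivot_longer()`. The sketch below shows an equivalent reshape on the last three rows of the `tail()` output; it is an alternative for illustration, not the code used in this project. The `names_prefix` argument strips `akumulasi_` during the reshape, so the separate `sub()` step is not needed. One difference is row order: `pivot_longer()` keeps the three categories of each date together, while `gather()` stacks one whole column after another.

```r
library(tidyr)

# Last three rows of cov_kalsel_akumulasi, copied from the tail() output
akumulasi <- data.frame(
  tanggal = as.Date(c("2022-09-28", "2022-09-29", "2022-09-30")),
  akumulasi_aktif = c(72L, 72L, 73L),
  akumulasi_sembuh = c(84905L, 84905L, 84905L),
  akumulasi_meninggal = c(2585L, 2585L, 2585L)
)

# Wide to long: one row per date and category
akumulasi_long <- pivot_longer(
  akumulasi,
  cols = -tanggal,
  names_to = "kategori",
  values_to = "jumlah",
  names_prefix = "akumulasi_"
)

dim(akumulasi_long)
# [1] 9 3
```

Three dates times three categories give nine rows, just as 915 dates give the 2,745 rows shown above.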
ggplot(cov_kalsel_akumulasi_pivot, aes(tanggal, jumlah, colour = kategori)) +
  # In ggplot2 >= 3.4.0, `linewidth` replaces `size` for line thickness
  geom_line(size = 0.9) +
  scale_y_continuous(sec.axis = dup_axis(name = NULL)) +
  scale_colour_manual(
    values = c(
      "aktif" = "salmon",
      "meninggal" = "darkslategray4",
      "sembuh" = "olivedrab2"
    ),
    labels = c("Active", "Death", "Recovered")
  ) +
  labs(
    x = NULL,
    y = "Cases accumulation",
    colour = NULL,
    title = "Case Dynamics of COVID-19 in South Kalimantan",
    caption = "Data source: covid19.go.id"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5),
    legend.position = "top"
  )
From the graph, we can conclude that cumulative recoveries rise steadily over time. Active cases show two distinct peaks, while cumulative deaths show no sharp increase.