Exploratory Data Analysis of Turkey Earthquakes II

The main purpose of this study is to analyze the Turkey earthquake data set, obtained from the online database of the Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center, using data mining techniques. The aim is to raise awareness.

The data set includes earthquakes with magnitudes between 3.0 and 9.0. It contains 50,000 observations of 15 variables and forms a time series from 1979 to 2019-10-29. The variables are defined below (Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center).

  1. No: Event Sequence.
  2. Event ID: Unique ID for the event [YYYYMMDDHHMMSS (YearMonthDayHourMinuteSecond)].
  3. Date: Date of event specified in the following format YYYY.MM.DD (Year.Month.Day).
  4. Origin Time: Origin time of event (UTC) specified in the following format HH:MM:SS.MS (Hour:Minute:Second.Millisecond).
  5. Latitude: in decimal degrees.
  6. Longitude: in decimal degrees.
  7. Depth(km): Depth of the event in kilometers.
  8. xM: Largest of the reported magnitude values (MD, ML, Mw, Ms and Mb).
  9. MD, ML, Mw, Ms, Mb, Type: Magnitude types (MD: Duration, ML: Local, Mw: Moment, Ms: Surface wave, Mb: Body-wave). A value of 0.0 (zero) means that this type of magnitude was not calculated.
  10. Location: Nearest settlement.
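As a minimal sketch of how two of these definitions fit together, the base-R code below decodes a made-up Event ID into a UTC date-time and recomputes xM as the row-wise maximum of the magnitude columns. The helper `parse_event_id()` and the ID value are hypothetical; the magnitudes are taken from the first and third rows of the str() output further down.

```r
# Hypothetical helper: convert an Event ID (YYYYMMDDHHMMSS) to a UTC date-time.
# readr reads the ID as a double, so format it without scientific notation first.
parse_event_id <- function(id) {
  as.POSIXct(sprintf("%.0f", id), format = "%Y%m%d%H%M%S", tz = "UTC")
}

origin <- parse_event_id(20191029204853)  # made-up ID for illustration
format(origin, "%Y.%m.%d %H:%M:%S")       # "2019.10.29 20:48:53"

# xM is the largest reported magnitude. A value of 0.0 means "not calculated",
# so zeros never win the maximum as long as one magnitude is positive.
mags <- data.frame(MD = c(0, 0), ML = c(3.0, 3.7), Mw = c(2.9, 3.8),
                   Ms = c(0, 0), Mb = c(0, 0))
xM <- do.call(pmax, mags)
xM                                        # 3.0 3.8
```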

The exploratory analysis of the earthquake data set is presented step by step in the following sections. The analysis is carried out in the R programming language.

Loading Libraries

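library(readr)
library(tibble)
library(tidyr)
library(dplyr)
library(lubridate)   # hour(), minute(), second()
library(formattable)
library(ggplot2)
library(ggpubr)      # ggdensity(), ggqqplot(), ggscatter()
library(GGally)
library(ggrepel)     # geom_text_repel()
library(tidyverse)   # also attaches stringr, used for str_detect()
library(leaflet)     # interactive maps
library(sf)
library(widgetframe)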

Loading Data Set

df <- read_delim("data_bogazici.txt", 
    "\t", escape_double = FALSE, trim_ws = TRUE)
df<-as_tibble(df)
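readr reports two parsing problems for this file, listed in the `problems` attribute of the str() output in the next section (`readr::problems(df)` returns the same table). One of them is the origin time "10:03:73.00", whose seconds field is out of range. As a minimal sketch, the hypothetical base-R helper `valid_origin_time()` below flags such strings; apart from that invalid value, the example times are made up.

```r
# Hypothetical check: flag "HH:MM:SS.ms" strings whose fields are out of range,
# such as the "10:03:73.00" value that readr could not parse.
valid_origin_time <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 3) return(FALSE)
    v <- suppressWarnings(as.numeric(p))
    !anyNA(v) && v[1] >= 0 && v[1] < 24 && v[2] >= 0 && v[2] < 60 &&
      v[3] >= 0 && v[3] < 60
  }, logical(1))
}

times <- c("20:48:53.00", "10:03:73.00", "06:36:14.50")
valid_origin_time(times)   # TRUE FALSE TRUE
```

Rows flagged this way could be read as character and corrected or dropped before the time components are extracted.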

Classification by Year, Month, Day, Hour, Minute, and Second

# Drop the No, Deprem Kodu (Event ID) and Tip (Type) columns
df1<-df[,-c(1,2, 14)]
str(df1)
# Split the date (YYYY.MM.DD) and origin time (HH:MM:SS) into separate components
depth<-tibble(Depth= as.numeric(df1$`Der(km)`))
location<-as_tibble(df1[, 12])
year<-tibble(Year=as.integer(substring(df1$`Olus tarihi`,1,4)))
month<-tibble(Month=as.integer(substring(df1$`Olus tarihi`,6,7)))
day<-tibble(Day=as.integer(substring(df1$`Olus tarihi`,9,10)))
hour<-tibble(Hour=as.integer(hour(df1$`Olus zamani`)))
minute<-tibble(Minute=as.integer(minute(df1$`Olus zamani`)))
second<-tibble(Second=as.integer(second(df1$`Olus zamani`)))
# cbind() keeps the original (Turkish) column names, which are renamed below
df2<-cbind(year, month, day, hour, minute, second, Latitude= as_tibble(df1[,3]),Longitude=as_tibble(df1[,4]), Depth= depth, Magnitude= as_tibble(df1[,6]), Location=location)
head(df2)
df2<-df2 %>% rename(Latitude = Enlem, Longitude=Boylam, Location=Yer, Magnitude=xM)
# Add a column that classifies earthquakes by magnitude
df2<-mutate(df2, Magnitude_Class=cut(Magnitude, breaks=c(2.9, 4, 5, 6, 7, 8), labels=c("3-4", "4-5", "5-6", "6-7", "7-8")))
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	50000 obs. of  12 variables:
 $ Olus tarihi: chr  "2019.10.29" "2019.10.29" "2019.10.29" "2019.10.27" ...
 $ Olus zamani: 'hms' num  20:48:53 15:38:41 06:36:14 10:18:46 ...
  ..- attr(*, "units")= chr "secs"
 $ Enlem      : num  38.2 40.7 40.7 40.9 39.7 ...
 $ Boylam     : num  42.9 27.4 32.9 28.2 26.4 ...
 $ Der(km)    : chr  "005.0" "003.4" "007.4" "010.8" ...
 $ xM         : num  3 3.3 3.8 3.5 3.2 3 3.5 3.5 3 3.5 ...
 $ MD         : num  0 0 0 0 0 0 0 0 0 0 ...
 $ ML         : num  3 3.3 3.7 3.5 3.1 3 3.5 3.5 3 3.5 ...
 $ Mw         : num  2.9 3.1 3.8 3.3 3.2 2.8 3.4 3.4 2.9 3.4 ...
 $ Ms         : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Mb         : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Yer        : chr  "UNLUCE-BAHCESARAY (VAN) [North East  7.8 km]" "GUZELKOY ACIKLARI-TEKIRDAG (MARMARA DENIZI)" "HACILAR-CERKES (CANKIRI) [North West  4.1 km]" "SILIVRI ACIKLARI-ISTANBUL (MARMARA DENIZI)" ...
 - attr(*, "problems")=Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	2 obs. of  5 variables:
  ..$ row     : int  41487 43525
  ..$ col     : chr  "MD" "Olus zamani"
  ..$ expected: chr  "no trailing characters" "valid date"
  ..$ actual  : chr  "R" "10:03:73.00"
  ..$ file    : chr  "'data_bogazici.txt'" "'data_bogazici.txt'"
 - attr(*, "spec")=
  .. cols(
  ..   No = col_character(),
  ..   `Deprem Kodu` = col_double(),
  ..   `Olus tarihi` = col_character(),
  ..   `Olus zamani` = col_time(format = ""),
  ..   Enlem = col_double(),
  ..   Boylam = col_double(),
  ..   `Der(km)` = col_character(),
  ..   xM = col_double(),
  ..   MD = col_double(),
  ..   ML = col_double(),
  ..   Mw = col_double(),
  ..   Ms = col_double(),
  ..   Mb = col_double(),
  ..   Tip = col_character(),
  ..   Yer = col_character()
  .. )
  Year Month Day Hour Minute Second   Enlem  Boylam Depth  xM
1 2019    10  29   20     48     53 38.1520 42.9158   5.0 3.0
2 2019    10  29   15     38     41 40.7248 27.3940   3.4 3.3
3 2019    10  29    6     36     14 40.7342 32.9457   7.4 3.8
4 2019    10  27   10     18     46 40.8810 28.2057  10.8 3.5
5 2019    10  27    9     17     31 39.6660 26.3607   6.2 3.2
6 2019    10  27    8     18     53 40.8760 28.2063   6.1 3.0
                                                 Yer
1       UNLUCE-BAHCESARAY (VAN) [North East  7.8 km]
2        GUZELKOY ACIKLARI-TEKIRDAG (MARMARA DENIZI)
3      HACILAR-CERKES (CANKIRI) [North West  4.1 km]
4         SILIVRI ACIKLARI-ISTANBUL (MARMARA DENIZI)
5 CAKMAKLAR-AYVACIK (CANAKKALE) [South East  1.4 km]
6         SILIVRI ACIKLARI-ISTANBUL (MARMARA DENIZI)
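The Magnitude_Class column relies on the interval rules of cut(). The short sketch below, using made-up magnitudes with the same breaks and labels, shows that the intervals are closed on the right, so a magnitude of exactly 4.0 lands in "3-4", and that any value above the last break (8) becomes NA. If the file contains events above 8.0, extending the breaks would keep them classified.

```r
# Same breaks and labels as the Magnitude_Class column.
breaks <- c(2.9, 4, 5, 6, 7, 8)
labels <- c("3-4", "4-5", "5-6", "6-7", "7-8")

# cut() uses right-closed intervals by default: (2.9, 4], (4, 5], ...
mags <- c(3.0, 4.0, 4.1, 7.9, 8.5)   # made-up values
classes <- cut(mags, breaks = breaks, labels = labels)
as.character(classes)                # "3-4" "3-4" "4-5" "7-8" NA
```

Passing `right = FALSE` would switch to left-closed intervals such as [4, 5), which changes where boundary values like 4.0 fall.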

Density of Earthquakes by Year

df2%>%ggplot(aes(Year, Magnitude))+
   geom_point(size=1, col="red")+
   ggtitle("Density of Earthquakes by Year") +
   xlab("Year") + ylab("Magnitude")+
   scale_x_continuous(breaks=seq(min(df2$Year),max(df2$Year), 4)) +
   labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
   theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=14, hjust=0.5),
         axis.title = element_text(family = "Trebuchet MS", face="bold", size=12)) +
   # Green line: mean magnitude; blue lines: class boundaries at 4, 5, 6 and 7
   geom_hline(yintercept=mean(df2$Magnitude), linetype="twodash", color="green", linewidth=1)+
   geom_hline(yintercept=c(4, 5, 6, 7), linetype="twodash", color="blue", linewidth=1)

Number of Earthquakes by Year

year<-df2 %>% group_by(Year) %>% tally()
formattable(year)
year%>%ggplot(aes(Year, n))+
  geom_line(linewidth=1, col="red")+
  scale_x_continuous(breaks=seq(min(year$Year),max(year$Year), 10))+
  ggtitle("Number of Earthquakes by Year") +
  xlab("Year") + ylab("Number of Cases")+
  labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
  theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=14, hjust=0.5),
        axis.title = element_text(family = "Trebuchet MS", face="bold", size=12))+
  # Dashed green line: mean number of earthquakes per year
  geom_hline(yintercept=mean(year$n), linetype="twodash", color = "green", linewidth=1)

Density of Earthquakes by Year and Magnitude Class

df2%>%ggplot(aes(Year, Magnitude, col=Magnitude_Class))+
  # Jitter the points slightly to reduce overplotting
  geom_jitter(size=1)+
  facet_grid(Magnitude_Class~., scales="free")+
  ggtitle("Density of Earthquakes by Year") +
  xlab("Year") + ylab("Magnitude")+
  labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
  theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=12, hjust=0.5),
        axis.title = element_text(family = "Trebuchet MS", face="bold", size=10))

Number of Earthquakes by Year and Category

year<-df2 %>% group_by(Year, Magnitude_Class) %>% tally()
formattable(year)
year%>%ggplot(aes(Year, n))+
  geom_point(size=1, col="red")+
  facet_wrap(~Magnitude_Class, ncol=2, scales="free")+
  ggtitle("Number of Earthquakes by Year and Category") +
  xlab("Year") + ylab("Number of Cases")+
  labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
  theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=14, hjust=0.5),
        axis.title = element_text(family = "Trebuchet MS", face="bold", size=12))
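Because tally() only returns combinations that actually occur, a year with no event in a rare class (6-7 or 7-8, say) has no point at all in the faceted plot, rather than a point at zero. A minimal base-R sketch with made-up events shows one way to keep the zeros: give both columns their full set of factor levels before counting. On an ungrouped tally, tidyr::complete() with `fill = list(n = 0)` can serve the same purpose.

```r
# Made-up events: year and magnitude class of six earthquakes.
events <- data.frame(
  Year = c(1999, 1999, 2001, 2003, 2003, 2003),
  Magnitude_Class = c("3-4", "7-8", "3-4", "3-4", "4-5", "3-4")
)

# Full factor levels force table() to report zero counts for
# year/class combinations that have no events.
counts <- as.data.frame(table(
  Year = factor(events$Year, levels = 1999:2003),
  Magnitude_Class = factor(events$Magnitude_Class,
                           levels = c("3-4", "4-5", "5-6", "6-7", "7-8"))
), responseName = "n")

nrow(counts)                                  # 25 (5 years x 5 classes)
counts[counts$Magnitude_Class == "7-8", "n"]  # 1 0 0 0 0
```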

Number of Earthquakes in Each Category

year<-df2 %>% group_by(Magnitude_Class) %>% tally()
formattable(year)
year%>%ggplot(aes(Magnitude_Class, n))+
  geom_point(size=1, col="red")+
  facet_grid(~Magnitude_Class)+
  ggtitle("Number of Earthquakes in Each Category") +
  xlab("Categories of Earthquakes") + ylab("Number of Cases")+
  labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
  theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=14, hjust=0.5),
        axis.title = element_text(family = "Trebuchet MS", face="bold", size=12))+
  # Label each point with its count; legend.position expects lower-case "none"
  geom_text_repel(aes(label=n), size=3, data=year) + theme(legend.position = "none")
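Raw counts per category are easier to compare when expressed as shares of all events. A minimal sketch with made-up magnitudes, classified with the same breaks as Magnitude_Class, converts the counts to percentages with prop.table(); for the real data the same two lines would be applied to df2$Magnitude_Class.

```r
# Made-up magnitudes, classified as in the Magnitude_Class column.
mags <- c(3.1, 3.4, 3.9, 4.2, 4.8, 5.5, 3.0, 3.6, 4.1, 6.2)
classes <- cut(mags, breaks = c(2.9, 4, 5, 6, 7, 8),
               labels = c("3-4", "4-5", "5-6", "6-7", "7-8"))

# Counts per class, then the percentage of all events in each class.
n <- table(classes)
pct <- round(100 * prop.table(n), 1)
pct   # 3-4: 50, 4-5: 30, 5-6: 10, 6-7: 10, 7-8: 0
```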

Number of Earthquakes by Month

month<-df2 %>% group_by(Month) %>% tally()
month<-month %>% select(Month, n)%>%
  arrange(desc(n))
month
# geom_line() connects the points in order of Month, so sorting by n does not change the plot
month %>% ggplot(aes(Month, n))+
  geom_line(linewidth=1, col="brown")+
  scale_x_continuous(breaks=seq(1, 12, 1))+
  ggtitle("Number of Cases by Month") +
  xlab("Month") + ylab("Number of Cases")+
  labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
  theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=14, hjust=0.5),
        axis.title = element_text(family = "Trebuchet MS", face="bold", size=12))+
  geom_hline(yintercept=mean(month$n), linetype="twodash", color = "red", linewidth=1)

Number of Earthquakes by Month and Category

# Reclassify into three broader classes; the hourly and map sections below use these labels
df2<-mutate(df2, Magnitude_Class=cut(Magnitude, breaks=c(2.9, 5, 6, 8), labels=c("3-5", "5-6", "6-8")))
m<-df2 %>% group_by(Month, Magnitude_Class) %>% tally()
formattable(m)
m %>% ggplot(aes(Month, n))+
  geom_point(size=1, col="red")+
  ggtitle("Number of Earthquakes by Month and Category") +
  scale_x_continuous(breaks=seq(1, 12, 1))+
  xlab("Month") + ylab("Number of Cases")+
  labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
  theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=14, hjust=0.5),
        axis.title = element_text(family = "Trebuchet MS", face="bold", size=12))+
  geom_text_repel(aes(label=n), size=3, data=m) + theme(legend.position = "none")+
  facet_grid(Magnitude_Class~.)

Number of Earthquakes by Hour

hour<-df2 %>% group_by(Hour) %>% tally()
hour<-hour %>% select(Hour, n)%>%
  arrange(desc(n))
hour %>% ggplot(aes(Hour, n))+
  geom_line(linewidth=1, col="red")+
  scale_x_continuous(breaks=seq(0, 24, 2))+
  ggtitle("Number of Cases by Hour") +
  # Origin times in the catalogue are in UTC
  xlab("Hour (UTC)") + ylab("Number of Cases")+
  labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
  theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=14, hjust=0.5),
        axis.title = element_text(family = "Trebuchet MS", face="bold", size=12))+
  geom_hline(yintercept=mean(hour$n), linetype="twodash", color = "blue", linewidth=1)
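Since origin times are recorded in UTC, the hour axis is not Turkish local time. As a minimal sketch, the base-R code below converts two origin times from the head() output above to local hours using the IANA time zone "Europe/Istanbul"; with lubridate, with_tz() does the same conversion. Applying this before counting by hour may shift where the night and evening peaks fall.

```r
# Two origin times from the head() output, recorded in UTC.
origin_utc <- as.POSIXct(c("2019-10-29 20:48:53", "2019-10-29 06:36:14"),
                         tz = "UTC")

# "Europe/Istanbul" is the IANA time zone for Turkey; the time zone database
# also handles the daylight-saving rules Turkey used before 2016.
local_hour <- as.integer(format(origin_utc, "%H", tz = "Europe/Istanbul"))
local_hour   # 23 9
```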

Number of Earthquakes by Categories and Hour

h<-df2 %>% group_by(Hour, Magnitude_Class) %>% tally()
h%>%ggplot(aes(Hour, n))+
  geom_point(size=1, col="red")+
  facet_wrap(~Magnitude_Class, ncol=2, scales="free")+
  ggtitle("Number of Earthquakes by Hour and Category") +
  # Origin times in the catalogue are in UTC
  xlab("Hour (UTC)") + ylab("Number of Cases")+
  labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
  theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=14, hjust=0.5),
        axis.title = element_text(family = "Trebuchet MS", face="bold", size=12))

Map of the earthquakes with magnitudes between 5.0 and 6.0

(y <- df2 %>%
  filter(`Magnitude_Class` == "5-6"))
leaflet() %>%
  addTiles() %>%
  addMarkers(data = y, clusterOptions = markerClusterOptions())

Map of the earthquakes with magnitudes between 6.0 and 8.0

(y <- df2 %>%
  filter(`Magnitude_Class` == "6-8"))
leaflet() %>% addTiles() %>%
  addCircleMarkers(data=y,
    label=~as.character(Magnitude),
    labelOptions = labelOptions(noHide = TRUE, direction = 'top'))

Earthquakes of İstanbul City

istanbul<-df2 %>% filter(str_detect(Location, "ISTANBUL"))
leaflet() %>%
  addTiles() %>%
  addMarkers(data = istanbul, clusterOptions = markerClusterOptions())

Earthquakes of Manisa City

manisa<-df2 %>% filter(str_detect(Location, "MANISA"))
leaflet() %>%
  addTiles() %>%
  addMarkers(data = manisa, clusterOptions = markerClusterOptions())

Earthquakes of Elazığ City

elazig<-df2 %>% filter(str_detect(Location, "ELAZI"))
leaflet() %>%
  addTiles() %>%
  addMarkers(data = elazig, clusterOptions = markerClusterOptions())
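The city maps select events by substring matching on Location, so a short name could also match inside a longer place name. A minimal sketch using the six Location strings from the head() output counts events per city with word boundaries (\\b) around each name; `cities` is an illustrative choice, and str_detect() accepts the same regular expressions. A deliberate prefix such as "ELAZI" would need the trailing \\b dropped.

```r
# Location strings from the head() output above.
locations <- c(
  "UNLUCE-BAHCESARAY (VAN) [North East  7.8 km]",
  "GUZELKOY ACIKLARI-TEKIRDAG (MARMARA DENIZI)",
  "HACILAR-CERKES (CANKIRI) [North West  4.1 km]",
  "SILIVRI ACIKLARI-ISTANBUL (MARMARA DENIZI)",
  "CAKMAKLAR-AYVACIK (CANAKKALE) [South East  1.4 km]",
  "SILIVRI ACIKLARI-ISTANBUL (MARMARA DENIZI)"
)

# Word boundaries keep a short name such as "VAN" from matching inside
# a longer word; perl = TRUE selects PCRE regular expressions.
cities <- c("ISTANBUL", "VAN", "MANISA")
counts <- vapply(cities, function(city) {
  sum(grepl(paste0("\\b", city, "\\b"), locations, perl = TRUE))
}, integer(1))
counts   # ISTANBUL 2, VAN 1, MANISA 0
```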

Density plot of magnitude

ggdensity(df2$Magnitude, 
          main = "Density plot of magnitude",
          xlab = "Magnitude")

Density plot of depth

ggdensity(df2$Depth, 
          main = "Density plot of depth",
          xlab = "Depth")

QQ plot of magnitude

ggqqplot(df2$Magnitude)

QQ plot of depth

ggqqplot(df2$Depth)
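The density and QQ plots judge normality by eye. As a numeric companion, the sketch below computes the moment-based sample skewness, which is about 0 for a symmetric sample and positive for a long right tail. The helper `sample_skewness()` and the example vector are hypothetical; for the real data one would call, for example, sample_skewness(df2$Depth).

```r
# Hypothetical helper: moment-based sample skewness (g1), ignoring NAs.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Made-up values with one large observation, giving a long right tail.
x <- c(1, 2, 3, 4, 10)
sample_skewness(x)         # about 1.14
sample_skewness(c(1, 2, 3)) # 0 for a symmetric sample
```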

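Kolmogorov-Smirnov Test

# Shapiro-Wilk's test (shapiro.test) accepts at most 5000 observations, so the Kolmogorov-Smirnov test is used instead.
# Note: given two numeric vectors, ks.test() runs a two-sample test of whether magnitude and depth share the same distribution; it does not test either variable for normality.
ks.test(df2$Magnitude, df2$Depth)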
p-value will be approximate in the presence of ties
	Two-sample Kolmogorov-Smirnov test
data:  df2$Magnitude and df2$Depth
D = 0.8073, p-value < 2.2e-16
alternative hypothesis: two-sided
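The output above is a two-sample test comparing the distribution of magnitude with that of depth. To test a single variable for normality, ks.test() takes the variable and the name of a distribution function, plus that distribution's parameters. The sketch below uses synthetic normal data as an assumed stand-in for a column such as df2$Magnitude; since the real columns are rounded to one decimal, R will again warn about ties there.

```r
set.seed(42)
# Synthetic stand-in for a 50,000-value column such as df2$Magnitude.
x <- rnorm(50000, mean = 3.5, sd = 0.5)

# One-sample K-S test against a normal distribution with the sample's own
# mean and sd. Estimating the parameters from the data makes this p-value
# only approximate; the Lilliefors test corrects for it.
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

# shapiro.test() accepts 3 to 5000 observations, so test a random subsample.
sw <- shapiro.test(sample(x, 5000))

c(ks = ks$p.value, shapiro = sw$p.value)
```

The nortest package provides lillie.test() if the corrected version is needed.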

Correlation between depth and magnitude of earthquakes

ggscatter(df2, x = "Magnitude", y = "Depth", 
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "Magnitude", ylab = "Depth (km)", main="Correlation between depth and magnitude of earthquakes")

Correlation Analysis

# The correlation is weak (r about 0.15): statistically significant with n = 50,000, but far from a strong relationship
cor.test(df2$Magnitude, df2$Depth, method = "pearson")
	Pearson's product-moment correlation
data:  df2$Magnitude and df2$Depth
t = 33.943, df = 49998, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.1415024 0.1586381
sample estimates:
      cor 
0.1500815 
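The numbers in this output fit together: cor.test() derives the t statistic and the Fisher z confidence interval from r and n alone. Recomputing them from the printed r is a worked check, and it also gives r², which makes "weak" concrete: magnitude accounts linearly for only about 2% of the variance in depth.

```r
# Values from the cor.test() output above.
r <- 0.1500815
n <- 50000

# t statistic for testing rho = 0, with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transformation.
z <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

# Share of the variance in depth explained linearly by magnitude.
r_squared <- r^2

round(c(t = t_stat, lower = ci[1], upper = ci[2], r2 = r_squared), 4)
# t 33.943, lower 0.1415, upper 0.1586, r2 0.0225
```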

Conclusion

This study presents an exploratory data analysis of Turkey earthquakes using data mining techniques. The descriptive statistics suggest that earthquakes are recorded more often at night and in the evening (origin times are in UTC). Earthquakes with magnitudes from 5.0 to 8.0 occur more often in the 10th to 12th months (October to December) than in other months.

The findings show only a weak positive correlation (r = 0.15) between the depth and the magnitude of earthquakes. Factors such as soil and rock structure may affect this relationship, and they also need to be evaluated.

I hope this work helps to raise awareness.

I dedicate this work to our citizens who died in the earthquake.

References

https://rpubs.com/tevfik1461/Turkey

https://tevfikbulut.com/2020/01/31/exploratory-data-analysis-of-turkey-earthquakes/

https://www.r-project.org/

https://cfss.uchicago.edu/notes/raster-maps-with-ggmap/

http://www.koeri.boun.edu.tr/sismo/zeqdb/indexeng.asp

http://www.koeri.boun.edu.tr/sismo/zeqdb/
