R Markdown for the Lecture 1 exercise.

Load the flights dataset.

Calculate the mean arrival delay for Delta Airlines (DL), using filter().

Calculate the associated 95% confidence interval.

Do the same for United Airlines (UA) and compare the two. Do their confidence intervals overlap?

Calculate the mode of the arrival delay for flights departing from JFK airport.

Save a dataset as .sav (SPSS format) containing only the flights departing from JFK airport.

Load the flights data.

library(nycflights13)
flights<-nycflights13::flights

Select just the Delta flights.

require(dplyr)
## Loading required package: dplyr
## Warning: package 'dplyr' was built under R version 3.5.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
delta<-filter(flights, carrier=="DL")

Mean arrival delay

# remove the missings (is.na() is the proper test for a missing value)
# this overwrites the delta dataset; to be safe, you could store the result in a new one instead
delta<-filter(delta, !is.na(arr_delay))
mean(delta$arr_delay)
## [1] 1.644341
# store it
mean_delta<-mean(delta$arr_delay)

95% CI

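First get the standard error (SE) of the mean: the standard deviation divided by the square root of the number of non-missing observations.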

se_delta<-sd(delta$arr_delay)/sqrt(length(delta$arr_delay))
se_delta
## [1] 0.2033937

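Now calculate the 95% CI as the mean ± 1.96 × SE (normal approximation; 1.96 is the 97.5th percentile of the standard normal distribution).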

UL_delta<- (mean_delta + 1.96*se_delta)
LL_delta<- (mean_delta - 1.96*se_delta)
UL_delta
## [1] 2.042993
LL_delta
## [1] 1.245689
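Written out with the stored Delta values ($s$ is the sample standard deviation and $n$ the number of non-missing flights), the calculation above is:

$$
\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad
\bar{x} \pm 1.96\,\mathrm{SE} = 1.644341 \pm 1.96 \times 0.2033937 = 1.644341 \pm 0.398652 \;\rightarrow\; [1.245689,\ 2.042993]
$$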

United Airlines (UA).

The same steps, all in one go.

require(dplyr)
united<-filter(flights, carrier=="UA")
united<-filter(united, !is.na(arr_delay))
mean(united$arr_delay)
## [1] 3.558011
# store it
mean_united<-mean(united$arr_delay)
se_united<-sd(united$arr_delay)/sqrt(length(united$arr_delay))
se_united
## [1] 0.1704989
UL_united<- (mean_united + 1.96*se_united)
LL_united<- (mean_united - 1.96*se_united)
UL_united
## [1] 3.892189
LL_united
## [1] 3.223833
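Because the Delta and United steps are identical, they could be wrapped in a small function. The sketch below uses a hypothetical helper, mean_ci() (not part of any package), which drops missing values and returns the mean with the same normal-approximation 95% CI as above. It is demonstrated on toy values rather than the flight data.

```r
# Hypothetical helper: mean and normal-approximation CI of a numeric vector.
# Missing values are dropped first, as in the steps above.
mean_ci <- function(x, z = 1.96) {
  x  <- x[!is.na(x)]
  m  <- mean(x)
  se <- sd(x) / sqrt(length(x))
  c(mean = m, lower = m - z * se, upper = m + z * se)
}

# Toy values (not flight data); the NA is dropped, leaving 1, 3, 5, 3
toy <- c(1, 3, 5, NA, 3)
ci <- mean_ci(toy)
ci
```

Applied as mean_ci(flights$arr_delay[flights$carrier == "DL"]), it should reproduce the Delta mean and bounds above. Passing z = qt(0.975, df = n - 1) would give a t-based interval instead; with this many flights per carrier the difference is negligible.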

Conclusion: Delta vs. United.

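The 95% CIs do not overlap: United [3.22 to 3.89] vs. Delta [1.25 to 2.04]. United's mean arrival delay is therefore significantly larger than Delta's, i.e. United flights arrive later on average. (Non-overlapping 95% CIs imply a significant difference at the 5% level, but the converse does not hold: overlapping CIs do not rule out a significant difference.)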
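The overlap check can be made explicit. A minimal sketch with a hypothetical intervals_overlap() helper, applied to the bounds printed above:

```r
# Hypothetical helper: do the closed intervals [lo1, hi1] and [lo2, hi2] overlap?
intervals_overlap <- function(lo1, hi1, lo2, hi2) {
  lo1 <= hi2 && lo2 <= hi1
}

# Bounds copied from the output above
delta_ci  <- c(1.245689, 2.042993)
united_ci <- c(3.223833, 3.892189)
overlap <- intervals_overlap(delta_ci[1], delta_ci[2], united_ci[1], united_ci[2])
overlap  # FALSE
```

A more direct comparison would be a two-sample test such as t.test(delta$arr_delay, united$arr_delay), which runs Welch's t-test by default in base R; its result is not reproduced here.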

JFK airport

Make a dataset of flights departing from JFK.

jfk<- filter(flights, origin=="JFK")
# remove the missings
jfk<- filter(jfk, !is.na(arr_delay))

Calculate the mode.

library(modeest)
## 
## This is package 'modeest' written by P. PONCET.
## For a complete list of functions, use 'library(help = "modeest")' or 'help.start()'.
mlv(jfk$arr_delay,  method='mfv')
## Mode (most likely value): -13 
## Bickel's modal skewness: 0.3091337 
## Call: mlv.default(x = jfk$arr_delay, method = "mfv")

The mode is -13: the most common arrival delay for flights departing from JFK is 13 minutes early.
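The most frequent value can also be found without modeest. A base-R sketch using a hypothetical stat_mode() helper, shown on toy values:

```r
# Hypothetical helper: most frequent value of a numeric vector, in base R.
# table() sorts the values in increasing order and which.max() returns the
# first maximum, so ties resolve to the smallest value.
stat_mode <- function(x) {
  counts <- table(x[!is.na(x)])
  as.numeric(names(counts)[which.max(counts)])
}

# Toy values (not flight data)
toy <- c(-13, 5, -13, 2, 5, -13, NA)
stat_mode(toy)  # -13
```

On the real data, stat_mode(jfk$arr_delay) should agree with mlv(jfk$arr_delay, method = 'mfv') unless several values tie for most frequent.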

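Write the data to an SPSS .sav file with write_sav() from haven. Note that jfk at this point excludes flights with a missing arrival delay.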

require(haven)
## Loading required package: haven
## Warning: package 'haven' was built under R version 3.5.2
write_sav(jfk, 'jfk.sav')
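To check the export, the file can be read back with read_sav() from haven. A sketch on a toy data frame standing in for jfk, written to a temporary file so that jfk.sav is not overwritten:

```r
library(haven)

# Toy data frame standing in for jfk (illustrative values only)
toy <- data.frame(origin = c("JFK", "JFK"), arr_delay = c(-13, 5))

# Write to a temporary .sav file and read it back
path <- tempfile(fileext = ".sav")
write_sav(toy, path)
back <- read_sav(path)

nrow(back)       # 2
back$arr_delay   # -13 5
```

read_sav() may attach SPSS format attributes to the columns, so compare values with == rather than identical().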

The end.