Introduction.

This is a worksheet for use with Lecture 2.

You should have R and RStudio installed on your own machine (see the course manual). Alternatively, use the PCs in the common room or the postgraduate teaching room.

You have a video of me narrating these slides.

When you answer correctly, the colour of the box will change!

Slides

Try it yourself - select dataframe (Slide 9)

Use the example code to make a smaller dataframe.

Look back at Slide 8.

Rename (Slide 11)

How would you rename 'tailnum' to 'tail_number' in the flights dataframe using the rename(data, newvar = oldvar) method?

Did you leave additional spaces? Did you assign the dataframe correctly ('flights<-')?
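As a reminder of the argument order, here is a minimal sketch on a made-up dataframe. The names 'pets', 'nm' and 'name' are hypothetical and have nothing to do with the flights data. It assumes the dplyr package is installed.

```r
library(dplyr)

# A tiny, made-up dataframe (not the flights data)
pets <- data.frame(nm = c("Rex", "Tom"), age = c(3, 5))

# rename(data, newvar = oldvar): the new name goes on the left.
# Assign the result back, otherwise the change is not stored.
pets <- rename(pets, name = nm)

names(pets)
```

The same pattern should carry over to the question above once you swap in the right dataframe and variable names.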

The pipe (Slide 13)

Since R 4.1.0, base R also has its own pipe: |>.
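To see what a pipe does, here is a small sketch in base R only, using a made-up vector 'x'. It needs R 4.1.0 or later and is not the code from the slide.

```r
# Made-up numbers, purely for illustration
x <- c(1, 4, 9)

# Nested calls: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# The same steps with the native pipe: read from left to right.
# The left-hand side becomes the first argument of the next call.
piped <- x |> sqrt() |> mean() |> round(1)

identical(nested, piped)
```

Both versions do the same work; the pipe only changes the order in which you read it.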

How would the code have looked if we hadn't used the pipe (%>%)?

  1. flights<- mutate(speed = distance / (air_time * 60), log_speed=log(speed))
  2. flights<- mutate(speed = distance / (air_time * 60), log_speed=log(speed), flights)
  3. flights<- mutate(flights, speed = distance / (air_time * 60), log_speed=log(speed))

My answer:

Centering and standardizing (Slide 15)

What is an alternative name for the curve in the figure?

  1. Gauss curve
  2. Bell curve
  3. Both of the above
  4. None of the above

My answer:
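If you want to check your understanding of centering and standardizing, here is a rough sketch in base R. The vector 'x' is made up, and this is not necessarily how the slide does it.

```r
# Made-up scores, purely for illustration
x <- c(2, 4, 6, 8)

# Centering: subtract the mean, so the new mean is 0
centered <- x - mean(x)

# Standardizing: also divide by the standard deviation, so the new SD is 1
standardized <- (x - mean(x)) / sd(x)

# scale() does both steps at once; it returns a one-column matrix,
# so convert it back to a plain vector before comparing
all.equal(standardized, as.numeric(scale(x)))
```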

Try it yourself (dplyr functions) (Slide 18)

Use the information from the previous slides to work this out.

Look back over the previous slides. Remember that you have air_time in the dataframe. Remember you can look at 'help' to find out more!

geom_smooth (Slide 28)

Use the help function to find out more about geom_smooth.

Which method is used in the figure?

  1. lm
  2. glm
  3. gam
  4. loess
  5. NULL, and therefore 'loess', as there are fewer than 1,000 observations.
  6. NULL, and therefore 'loess', as there are more than 1,000 observations.

My answer:

Do you see 'method = ' specified? What happens when it is not specified? How many rows are there in 'flights_jfk_new'?
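For contrast, here is a sketch in which the method is specified. It is not the code from the slide: it uses the built-in 'mtcars' data as a stand-in and assumes ggplot2 is installed.

```r
library(ggplot2)

# mtcars ships with R; it stands in for the lecture data here
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # method is named explicitly, so ggplot2 does not have to choose one
  geom_smooth(method = "lm", formula = y ~ x)

# nrow() tells you how many rows (observations) a dataframe has
nrow(mtcars)

p
```

Compare this with the call on the slide, then check the help page to see what happens when 'method' is left out.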

Try it yourself (La Guardia airport) (Slide 29)

Use the information from the previous slides to work this out.

Look back over the previous slides. Remember that you need to filter the data based on the origin, so use the filter function. Next, implement the steps from the previous slides.
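In case the filter step is the sticking point, here is a minimal sketch on a made-up dataframe. The names 'trips' and 'from_aaa' and the airport codes are hypothetical. It assumes dplyr is installed.

```r
library(dplyr)

# A tiny, made-up dataframe (not the flights data)
trips <- data.frame(
  origin = c("AAA", "BBB", "AAA", "CCC"),
  delay  = c(5, 12, -3, 40)
)

# Keep only the rows where origin matches; note the double == for comparison
from_aaa <- filter(trips, origin == "AAA")

nrow(from_aaa)
```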

Beautifying graphs (Slide 30)

Find out more about shapes in the ggplot2 aesthetic specifications vignette: in R, run vignette("ggplot2-specs").

If I wanted a blue rectangle, I would use shape=

Basics (Histogram) (Slide 35)

I might not have been very clear here. When you don't specify the binwidth, ggplot2 will pick the number of bins for you.

What is the default number of bins when binwidth is left unspecified?
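Rather than relying on the default, you can set either argument yourself. Here is a rough sketch, again with the built-in 'mtcars' data as a stand-in. The values 5 and 10 are arbitrary choices for illustration. It assumes ggplot2 is installed.

```r
library(ggplot2)

# Option 1: fix the width of each bin (here 5 units of mpg)
p_width <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 5)

# Option 2: fix the number of bins instead
p_bins <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

p_width
p_bins
```

Setting one of these also silences the message ggplot2 prints when it has to pick for you.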

Change labels (Slide 38)

What are the units used on the X-axis?

  1. minutes
  2. seconds
  3. hours
  4. Z scores.

My answer:

Try it yourself (Slide 43)

Try not to peek at the solution :).

ggsave (Slide 44)

Have a look at the ggplot2 manual via the help (or use Google!).

If I set the option dpi = 'print', what would the resolution be, in dpi?

Add some 'oomph' (Slide 50)

What does '14' in theme_stata() refer to?

  1. colour
  2. font size
  3. shape
  4. axis size

My answer:

No SD? (Slide 55)

If there were missing values in the 'jfk_delta$delay_no_miss' variable, what would the output have been?

  1. NA
  2. 51.17242
  3. A value but not 51.17242
  4. A warning

My answer:

Medians? (Slide 60)

Can you complete the statement below about medians?

The median is always:

  1. The most frequently occurring score in a data set
  2. The middle score when results are ranked in order of magnitude
  3. The same as the mean
  4. The difference between the maximum and minimum scores.

My answer:

95% CI for p? (Slide 67)

Suppose I wanted an 89% confidence interval. What would I put within the brackets of the boot.ci command?

  1. boot.out=.445
  2. boot.out=.89
  3. conf=.445
  4. conf=.89

My answer:

Have a look at the help page for the boot.ci command and find out which arguments it takes. Can you work out, from the default values and the output on Slide 67, which value to input?

Test it yourself (Slide 69)

You have the example on Slide 63.

You need to replace the dependent variable! (so the bit that goes before ~ )
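To illustrate what "the bit that goes before ~" means, here is a sketch with a made-up dataframe 'd'. It uses aggregate() from base R as a stand-in, since the code on Slide 63 is not reproduced here. All names are hypothetical.

```r
# Made-up data: two outcome variables and one grouping variable
d <- data.frame(
  group = rep(c("a", "b"), each = 3),
  y1    = c(1, 2, 3, 4, 5, 6),
  y2    = c(10, 20, 30, 40, 50, 60)
)

# Formula: dependent variable ~ grouping variable
res1 <- aggregate(y1 ~ group, data = d, FUN = mean)

# Same call, only the dependent variable (left of the ~) is replaced
res2 <- aggregate(y2 ~ group, data = d, FUN = mean)

res1
res2
```

Everything to the right of the ~ stays the same; only the left-hand side changes.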

Going further.

Session Info.

Thanks to Lisa DeBruine for the webexercises package. Please see general disclaimer.

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/London
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] lubridate_1.9.3    forcats_1.0.0      stringr_1.5.1      dplyr_1.1.4       
##  [5] purrr_1.0.2        readr_2.1.5        tidyr_1.3.1        tibble_3.2.1      
##  [9] ggplot2_3.5.1      tidyverse_2.0.0    webexercises_1.1.0
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6       jsonlite_1.8.9     compiler_4.4.2     tidyselect_1.2.1  
##  [5] nycflights13_1.0.2 jquerylib_0.1.4    scales_1.3.0       yaml_2.3.10       
##  [9] fastmap_1.2.0      R6_2.5.1           generics_0.1.3     knitr_1.49        
## [13] munsell_0.5.1      bslib_0.8.0        pillar_1.9.0       tzdb_0.4.0        
## [17] rlang_1.1.4        utf8_1.2.4         stringi_1.8.4      cachem_1.1.0      
## [21] xfun_0.49          sass_0.4.9         timechange_0.3.0   cli_3.6.3         
## [25] withr_3.0.2        magrittr_2.0.3     digest_0.6.37      grid_4.4.2        
## [29] rstudioapi_0.17.1  hms_1.1.3          lifecycle_1.0.4    vctrs_0.6.5       
## [33] evaluate_1.0.1     glue_1.8.0         fansi_1.0.6        colorspace_2.1-1  
## [37] rmarkdown_2.28     tools_4.4.2        pkgconfig_2.0.3    htmltools_0.5.8.1

The end...