Introduction.

This is a worksheet for use with Lecture 1.

You should have installed R and RStudio on your own machine (see the course manual), or you can use the PCs in either the common room or the postgraduate teaching room.

You have videos of me narrating these slides. Note that there are minor discrepancies between the current set of slides and the set in the video; the slide numbers refer to the current set. I do not cover every single slide, but you can code along!

If you answer correctly, the colour of the box will change from red dashed to solid blue!

Slides

Markdown tutorial (Slide 11).

Work through the markdown tutorial. You don't have to complete it in its entirety; getting up to 'Lists' is the most important part. (You will likely never need block quotes in lists.) You can see me doing an attempt in this video.

Make an example web page (Slide 16).

Follow the steps and make a web page in RStudio.

Open it in the browser and save it as a .pdf.

Date/Time (Slide 18)

The play button runs the current chunk.

See my cursor.

This should display today's date (or, in my case, the date when I made this script).

Sys.Date()
## [1] "2025-01-06"
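Besides the date, R can also report the current time. A minimal sketch using base R's Sys.time() and format(); the format strings are just one possible choice, and the output will of course depend on when you run it.

```r
today <- Sys.Date()          # current date (class "Date")
now   <- Sys.time()          # current date and time (class "POSIXct")

format(today, "%d/%m/%Y")    # date as day/month/year text
format(now, "%H:%M")         # time as hours:minutes text
weekdays(today)              # name of the weekday (depends on your locale)
```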

Install a package (Slide 20)

Proceed to install the 'ggplot2' package (or another package if you feel adventurous).

See my cursor. First click install and then install packages. It'll autocomplete, neat!
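Clicking through the menus runs install.packages() for you in the Console. A sketch of the typed equivalent, which only installs ggplot2 if it is not already available; the repos URL (the cloud CRAN mirror) is an assumption, as RStudio normally sets a mirror for you.

```r
# Install ggplot2 only when it is not already installed (needs internet access).
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}
```

You only need to install a package once per machine; loading it is a separate step that you repeat in every session.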

Load a package (Slide 21)

You can either remove 'Sys.Date()' from the previous chunk you made or make a new chunk (Code --> Insert Chunk).


Next, you need to load it with library(). You can also use require().

library(ggplot2) # loads package
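library() and require() differ in what happens when a package is missing: library() stops with an error, while require() returns FALSE (with a warning), which makes it convenient inside if() statements. A small sketch; 'notARealPackage' is a deliberately made-up name.

```r
# require() returns TRUE/FALSE instead of stopping, so it can be tested directly.
has_stats <- require(stats)    # 'stats' ships with R, so this is TRUE

# A made-up package name: require() would normally warn; suppressWarnings() hides it.
has_fake <- suppressWarnings(require("notARealPackage", character.only = TRUE))

# library() on the same name stops with an error; tryCatch() captures the message.
msg <- tryCatch(library("notARealPackage", character.only = TRUE),
                error = function(e) conditionMessage(e))
print(msg)
```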

Calculations (Slides 23 - 27)

You can check these calculations either at the command line in the Console window or, again, in chunks.

See my cursor. I have calculated 2+2. Note how the Console is a different window; here we are 'directly' talking to R.
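A few more things to try in the Console or in a chunk. These are standard base R operators and functions; the numbers are arbitrary examples, not the answer to the question that follows.

```r
2 + 2          # addition: 4
7 - 3          # subtraction: 4
6 * 7          # multiplication: 42
10 / 4         # division: 2.5
2^10           # exponent: 1024
sqrt(16)       # square root: 4
round(pi, 5)   # round to 5 decimals: 3.14159
```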

Question

What is the solution to \(\sqrt{(17+11+1981)}\)? (5 decimals)

Storing things (Slides 34 - 37)

You can again follow along either via chunks or via the command line.

See my cursor. I have used the command line here. You can also see how the result is now stored in the Environment (top right corner). Pressing the broom icon in that top right corner will wipe it, i.e. remove it!
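A minimal sketch of storing, reusing and removing objects with base R; the name 'height' and its value are just examples. Clearing everything, roughly what the broom icon does, is shown as a comment so you don't wipe your workspace by accident.

```r
height <- 180              # store a value under the name 'height'
height                     # typing the name prints the stored value
height_m <- height / 100   # use it in a new calculation

ls()                       # list the objects now in the environment
rm(height)                 # remove one object
exists("height")           # FALSE: it is gone

# rm(list = ls())          # would remove everything, like the broom icon
```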

Try it yourself (storing things) (Slide 37)

You can't work in pairs here, so you'll have to try it on your own!

Try it yourself (combining vectors) (Slide 43)

  1. As you do not have a partner, you will have to generate a second value of height. Perhaps your flatmate? Or just make it up.
  2. You will have to do the same for age.
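A sketch of combining two heights and two ages. The values are made up, as suggested above, and data.frame() is one common way to put vectors side by side; the slide may use a different function such as cbind().

```r
# Made-up example values: your own height/age plus a second, invented person
height <- c(180, 165)   # in cm
age    <- c(25, 31)     # in years

# Combine the two vectors column-wise into a small data set
mydata <- data.frame(height = height, age = age)
mydata
mean(mydata$height)     # average height: 172.5
```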

Try it yourself (writing away data) (Slide 47)

Note that you need to install 'haven' and make it available via the require() or library() command, as in the ggplot2 example you saw before.

If it works, you will see that haven is ticked in the Packages pane.

Have a go at writing away your data. And opening the data. Don't worry if you don't have SPSS on your personal machine.
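A sketch of writing a data frame to an SPSS .sav file and reading it back with haven's write_sav() and read_sav(). The small data frame is made up, and the temporary path is only there so the example runs anywhere; use a path such as "mydata.sav" to keep the file.

```r
library(haven)   # provides write_sav() and read_sav() for SPSS files

# Made-up example data (replace with your own data frame from the previous step)
mydata <- data.frame(height = c(180, 165), age = c(25, 31))

# Write to a temporary .sav file
path <- file.path(tempdir(), "mydata.sav")
write_sav(mydata, path)

# Read it back in to check that it worked (returns a tibble)
mydata_back <- read_sav(path)
mydata_back
```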

Read in data (Slide 48)

I have used a different URL (it was http://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat before) because the SSL certificate has expired, which throws an error. You can still right-click the link, download the file and then load it.

The new URL loads data from the CDC's Youth Risk Behavior Survey (YRBS): https://www.cdc.gov/healthyyouth/data/yrbs/data.htm
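Once a file has been downloaded, read.table() (or read.csv() for comma-separated files) loads it into a data frame. A self-contained sketch: since the exact file on the slide is not reproduced here, it first writes a tiny made-up whitespace-separated file to a temporary location and then reads it back. For a real file, pass its path (or URL) instead.

```r
# Create a small made-up data file so the example runs anywhere
path <- tempfile(fileext = ".dat")
writeLines(c("id age weight",
             "1 15 60.2",
             "2 16 58.9"), path)

# header = TRUE: the first line holds the column names
dat <- read.table(path, header = TRUE)
str(dat)     # check column names and types
head(dat)    # first rows
```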

Data wrangling (Slides 49 - 64)

You can follow along and do all the operations on the 'nycflights13' data.
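A minimal sketch of typical wrangling steps on the nycflights13 data, assuming the slides use dplyr verbs (filter(), select(), mutate(), group_by(), summarise(), arrange()); both packages need to be installed first.

```r
library(dplyr)        # data-wrangling verbs and the %>% pipe
library(nycflights13) # provides the 'flights' data frame

jan_delays <- flights %>%
  filter(month == 1) %>%                                       # January flights only
  select(carrier, dep_delay, arr_delay) %>%                    # keep a few columns
  mutate(total_delay = dep_delay + arr_delay) %>%              # add a new column
  group_by(carrier) %>%                                        # one group per airline
  summarise(mean_delay = mean(total_delay, na.rm = TRUE)) %>%  # average per group
  arrange(desc(mean_delay))                                    # largest delays first

jan_delays
```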

95% confidence interval (Slide 64)

Questions

Can you answer the questions below?

  • The 1.96 in the formula is based on the ______.
  • What would the multiplier for the SE in the formula be if you wanted a 90% confidence interval? You might want to go over your undergraduate statistics notes ("critical values") or have a look at this page. The value would be ______ (3 decimals needed).

Have a go at calculating this with this calculator. Enter a value for Cumulative probability, leaving Mean and Standard deviation as they are and keeping Standard score (z) blank. What do you get when you enter .975 under Cumulative probability? (Why is this value not .95? Remember the two tails of the distribution.) What value would you have to enter to get a 90% confidence interval, and what z value does it give you?
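The calculator's z value can also be obtained with base R's qnorm(), the inverse of the standard normal cumulative distribution function. A sketch that builds a confidence interval for a mean using the normal multiplier; the helper name ci_mean() and the data are made up for illustration.

```r
qnorm(0.975)   # z with 2.5% in the upper tail: about 1.96

# Hypothetical helper: confidence interval for a mean with a normal (z) multiplier
ci_mean <- function(x, level = 0.95) {
  z  <- qnorm(1 - (1 - level) / 2)   # two tails: split (1 - level) in half
  se <- sd(x) / sqrt(length(x))      # standard error of the mean
  c(lower = mean(x) - z * se, upper = mean(x) + z * se)
}

x <- c(4, 5, 7, 7, 6, 8, 5, 6)   # made-up example data
ci_mean(x)                        # 95% interval
ci_mean(x, level = 0.90)          # try other levels yourself
```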

Mode estimation (Slide 68)

'mfv' stands for 'most frequent value', one of the methods of modeest's mlv() function.

Questions.

  • What is the mode of this series (most frequent value): "4,5,7,7,7,6,8,5,6,8,7,9,7,5,6,4,4,4,8,9,5,4,7,4,6,5,7,7,5" ?
  • What is the result when you change the method from 'mfv' to 'meanshift'? (2 decimals needed; use scientific rounding.) Use ??mlv to find out more about this method. What do you make of the output?
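A sketch of mlv() with the 'mfv' method on a small made-up vector (deliberately not the series in the question), plus a base R cross-check that tabulates the values and picks the most common one.

```r
library(modeest)   # provides mlv() ("most likely value") and mfv()

x <- c(1, 2, 2, 3, 3, 3)   # made-up example, not the series above

mlv(x, method = "mfv")     # most frequent value: 3

# Base R cross-check: tabulate and take the most common value
counts <- table(x)
as.numeric(names(counts)[which.max(counts)])   # also 3
```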

'skimr' (Slide 69)

The example tells you to use devtools and install from GitHub, but note that you can now also install skimr 'the normal way' (from CRAN). Nonetheless, some packages are only available on GitHub, so pay attention to this slide.
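A sketch of the two installation routes and a first call to skim() on R's built-in iris data. The install lines are commented out so they don't run every time; the GitHub repository name 'ropensci/skimr' is an assumption about where the development version lives, and devtools must be installed for that route.

```r
# Route 1: install from CRAN, 'the normal way'
# install.packages("skimr")
# Route 2: install the development version from GitHub (needs devtools)
# devtools::install_github("ropensci/skimr")

library(skimr)
skim(iris)   # summary statistics for every column of the built-in iris data
```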

Exercise (Slide 72)

Complete the exercise and submit via Blackboard!

A note on Jupyter notebooks (Slide 73)

Going further.

Session Info.

Thanks to Lisa DeBruine for the webexercises package. Please see the general disclaimer.

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/London
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] modeest_2.4.0      webexercises_1.1.0
## 
## loaded via a namespace (and not attached):
##  [1] cli_3.6.3           knitr_1.49          rlang_1.1.4        
##  [4] statip_0.2.3        xfun_0.49           stabledist_0.7-2   
##  [7] clue_0.3-65         jsonlite_1.8.9      timeSeries_4041.111
## [10] htmltools_0.5.8.1   sass_0.4.9          rmarkdown_2.28     
## [13] evaluate_1.0.1      jquerylib_0.1.4     fastmap_1.2.0      
## [16] yaml_2.3.10         lifecycle_1.0.4     cluster_2.1.6      
## [19] compiler_4.4.2      timeDate_4041.110   rstudioapi_0.17.1  
## [22] digest_0.6.37       stable_1.1.6        R6_2.5.1           
## [25] rpart_4.1.23        bslib_0.8.0         tools_4.4.2        
## [28] rmutil_1.1.10       fBasics_4041.97     spatial_7.3-17     
## [31] cachem_1.1.0

The end...