Exercise… .

  1. Calculate a summary table which has the mean/SD of the horse power variable organized by number of gears. (Bonus: export it to .html or Word.)

  2. Make a new dataframe called my_cars which contains the columns mpg and hp, renamed to miles_per_gallon and horse_power respectively.

  3. Create a new variable in the dataframe called km_per_litre using the mutate function. Note: 1 mpg ≈ 0.425 km/l.

  4. Look at the sample_frac() function. Use it to make a new dataframe with a random selection of half the data.

  5. Look at the slice function. From the original dataframe select rows 10 to 35.

  6. Look at the tibble package and the rownames_to_column function. Make a dataset with just the “Lotus Europa” model. What would be an alternative way of reaching the same goal?

1.

setwd("~/Dropbox/Teaching_MRes_Northumbria/Lecture9")
library(datasets)
cars<-datasets::mtcars
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
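# Group the cars by number of gears, then summarise horsepower within each group
grouped <- group_by(cars, gear)
# Note: this object shares its name with base::table()
table <- summarise(grouped, Mean = mean(hp), Sd = sd(hp))
require(stargazer)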
## Loading required package: stargazer
## 
## Please cite as:
##  Hlavac, Marek (2015). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2. http://CRAN.R-project.org/package=stargazer
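# as.data.frame() converts the tibble returned by summarise() to a plain data frame before export
stargazer(as.data.frame(table), summary = FALSE, type = "html", out = "horsepower.html", header = FALSE)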
## 
## <table style="text-align:center"><tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td>gear</td><td>Mean</td><td>Sd</td></tr>
## <tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">1</td><td>3</td><td>176.133333333333</td><td>47.6892720291122</td></tr>
## <tr><td style="text-align:left">2</td><td>4</td><td>89.5</td><td>25.8931370338657</td></tr>
## <tr><td style="text-align:left">3</td><td>5</td><td>195.6</td><td>102.833846568141</td></tr>
## <tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr></table>

2.

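dplyr is still loaded from the previous answer, so it is not loaded again here. select() can rename columns while it picks them. mutate() would also get you there, but you would then have to remove the surplus columns.

my_cars <- cars %>% select(miles_per_gallon = mpg, horse_power = hp)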
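
The mutate() route mentioned above could look roughly like the sketch below. The name my_cars_alt is made up for illustration and is not part of the exercise.

```r
library(dplyr)

cars <- datasets::mtcars

# Route 1 (as in the answer): select() renames and drops columns in one step
my_cars <- cars %>% select(miles_per_gallon = mpg, horse_power = hp)

# Route 2: mutate() adds the renamed copies, then select() removes the surplus columns
my_cars_alt <- cars %>%
  mutate(miles_per_gallon = mpg, horse_power = hp) %>%
  select(miles_per_gallon, horse_power)

# Compare the values column by column
all.equal(my_cars$miles_per_gallon, my_cars_alt$miles_per_gallon)
all.equal(my_cars$horse_power, my_cars_alt$horse_power)
```

Both routes end with the same two columns; select() alone is simply shorter.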

3.

The new variable is added to the my_cars dataframe from the previous answer.

my_cars <- my_cars %>% mutate(km_per_litre = 0.425*miles_per_gallon)
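
As a quick check on the 0.425 factor, it can be recomputed from the unit definitions. This sketch assumes US gallons, which is how the mtcars help page describes mpg; the variable names are made up for illustration.

```r
# Exact by definition of the international mile
km_per_mile <- 1.609344
# Exact by definition of the US liquid gallon
litres_per_us_gallon <- 3.785411784

# km per litre for each mile per gallon
mpg_to_kml <- km_per_mile / litres_per_us_gallon
round(mpg_to_kml, 3)
```

With imperial gallons the factor would be different, so the 0.425 used here only applies to US figures.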

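4.

sample_frac() draws a random fraction of the rows. With size = 0.5 it returns half of them, sampled without replacement.

my_cars_sample <- my_cars %>% sample_frac(size = 0.5, replace = FALSE)

5.

slice() selects rows by position. The exercise asks for the original dataframe, so this uses cars rather than my_cars. Note that mtcars has only 32 rows, so slice(10:35) returns rows 10 to 32.

cars_slice <- cars %>% slice(10:35)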
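
Two things about the sampling and slicing answers are worth checking by hand: the random half changes on every run unless a seed is set, and the row counts show what was actually returned. A minimal sketch follows; the seed value and the names half and rows are arbitrary choices for illustration.

```r
library(dplyr)

cars <- datasets::mtcars

# Fix the random seed so the same half is drawn on every run
set.seed(123)  # arbitrary value, chosen for illustration
half <- cars %>% sample_frac(size = 0.5, replace = FALSE)
nrow(half)

# Positions 33 to 35 do not exist in a 32-row dataframe;
# slice() ignores indices beyond the last row
rows <- cars %>% slice(10:35)
nrow(rows)
```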

6.

rownames_to_column() comes from the tibble package, which is attached when you load the tidyverse.

require(tidyverse)
## Loading required package: tidyverse
## Warning: package 'tidyverse' was built under R version 3.4.2
## ── Attaching packages ────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.0 ──
## ✔ ggplot2 2.2.1     ✔ readr   1.1.1
## ✔ tibble  1.3.4     ✔ purrr   0.2.3
## ✔ tidyr   0.7.1     ✔ stringr 1.2.0
## ✔ ggplot2 2.2.1     ✔ forcats 0.2.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
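# Pass the dataframe by position: its argument is named df in tibble 1.3.4 but .data in current releases
mycars_final <- rownames_to_column(mtcars, var = "model")
Lotus_europa <- mycars_final %>% filter(model == "Lotus Europa")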

Alternatively, filter on the row names directly, without creating a new column:

Lotus_europa2 <- mtcars %>% filter(rownames(mtcars) %in% "Lotus Europa")

Or, with == instead of %in%:

Lotus_europa3 <- mtcars %>% filter(rownames(mtcars) == "Lotus Europa")

You could also create a new variable holding the row names via mutate() and then filter on that variable.
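
A minimal sketch of that mutate() route, assuming dplyr is loaded. The name Lotus_europa4 is made up for illustration.

```r
library(dplyr)

# Copy the row names into an ordinary column, then filter on that column
Lotus_europa4 <- mtcars %>%
  mutate(model = rownames(mtcars)) %>%
  filter(model == "Lotus Europa")

Lotus_europa4
```

This does by hand what rownames_to_column() does in one call, except that the new column is added at the end rather than at the front.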