In-class Exercise 9

Exercise…

1. Calculate a summary table with the mean and SD of the horsepower variable (hp), organized by number of gears. (Bonus: export it to .html or Word.)
2. Make a new dataframe called my_cars which contains the mpg and hp columns, renamed to miles_per_gallon and horse_power respectively.
3. Create a new variable in the dataframe called km_per_litre using the mutate function. Note: 1 mpg ≈ 0.425 km/l (US gallons).
4. Look at the slice function. From the original dataframe select rows 10 to 35.
5. Look at the sample_frac() function. Use it to make a new dataframe with a random selection of half the data.
6. Look at the tibble package and the rownames_to_column function. Make a dataset with just the “Lotus Europa” model. What would be an alternative way of reaching the same goal?

1.

setwd("~/Dropbox/Teaching_MRes_Northumbria/Lecture9")
library(datasets)
cars <- datasets::mtcars
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

grouped <- group_by(cars, gear)
table <- summarise(grouped, Mean = mean(hp), Sd = sd(hp))
require(stargazer)

## Loading required package: stargazer

## 
## Please cite as:

##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer

stargazer(table, summary = FALSE, type = "html", out = "horsepower.html", header = FALSE)

## 
## <table style="text-align:center"><tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td>gear</td><td>Mean</td><td>Sd</td></tr>
## <tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">1</td><td>3</td><td>176.133333333333</td><td>47.6892720291122</td></tr>
## <tr><td style="text-align:left">2</td><td>4</td><td>89.5</td><td>25.8931370338657</td></tr>
## <tr><td style="text-align:left">3</td><td>5</td><td>195.6</td><td>102.833846568141</td></tr>
## <tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr></table>
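The same summary can also be written as a single pipe, which avoids the intermediate grouped object. This is a minimal sketch rather than the only way to do it. The name hp_by_gear is an assumption introduced here for illustration, and the code reads mtcars directly instead of the cars copy.

```r
library(dplyr)

# Group by number of gears, then summarise horsepower in one pipe.
# hp_by_gear is an illustrative name, not one used elsewhere in this handout.
hp_by_gear <- mtcars %>%
  group_by(gear) %>%
  summarise(Mean = mean(hp), Sd = sd(hp))

hp_by_gear
```

The resulting table has one row per gear value and should match the means and SDs in the HTML output above.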

2.

Note that dplyr is still loaded from step 1, so I have not loaded it again. select() can rename columns as it picks them. mutate() would also get you there, but you would then have to remove the surplus columns.

my_cars <- cars %>% select(miles_per_gallon = mpg, horse_power=hp)
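For comparison, the mutate route mentioned above might look like the sketch below. It creates the renamed copies first and then keeps only those two columns. The name my_cars_alt is an assumption introduced for illustration, and the code reads mtcars directly so that it runs on its own.

```r
library(dplyr)

# mutate() adds the renamed copies alongside the 11 original columns,
# so a select() is still needed afterwards to drop the surplus columns.
# my_cars_alt is an illustrative name.
my_cars_alt <- mtcars %>%
  mutate(miles_per_gallon = mpg, horse_power = hp) %>%
  select(miles_per_gallon, horse_power)

names(my_cars_alt)
```

Because select() can rename and subset in one call, it is the shorter option here.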

3.

I have added the new variable to the my_cars dataframe.

my_cars <- my_cars %>% mutate(km_per_litre = 0.425*miles_per_gallon)
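The factor 0.425 is a rounded value. As a quick check, it can be derived from the exact definitions of the mile and the US gallon (mtcars records mpg in miles per US gallon). The variable names below are assumptions chosen for this sketch.

```r
# Exact definitions: 1 mile = 1.609344 km, 1 US gallon = 3.785411784 litres.
km_per_mile <- 1.609344
litres_per_us_gallon <- 3.785411784

# km per litre for each mile per US gallon
mpg_to_kml <- km_per_mile / litres_per_us_gallon

round(mpg_to_kml, 3)  # 0.425
```

With an imperial gallon (4.54609 litres) the factor would be smaller, so the rounded 0.425 only applies to US gallons.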

4.

The exercise asks for rows from the original dataframe, so I slice cars rather than my_cars.

cars_slice <- cars %>% slice(10:35)
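One detail worth checking for the slicing step: mtcars has only 32 rows, so a request for rows 10 to 35 runs past the end of the data. dplyr's slice() silently ignores indices beyond the last row, which the sketch below illustrates. The names rows_10_to_35 and rows_10_to_end are assumptions introduced here.

```r
library(dplyr)

nrow(mtcars)  # 32

# Indices 33 to 35 do not exist and are silently dropped by slice().
rows_10_to_35 <- mtcars %>% slice(10:35)
nrow(rows_10_to_35)  # 23, i.e. rows 10 to 32

# A more explicit version that caps the upper bound at the last row.
rows_10_to_end <- mtcars %>% slice(10:min(35, n()))
nrow(rows_10_to_end)  # 23
```

Capping the range with n() makes the intent visible to a reader who does not know how many rows the data has.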

5.

my_cars_sample <- my_cars %>% sample_frac(size = 0.5, replace = FALSE)
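Because the selection is random, the result changes on every run unless a seed is set first. The sketch below fixes a seed and also shows slice_sample(), which supersedes sample_frac() in dplyr 1.0.0 and later. The seed value and the names half_a and half_b are arbitrary choices made for illustration.

```r
library(dplyr)

set.seed(123)  # arbitrary seed, only so that the draw can be repeated
half_a <- mtcars %>% sample_frac(size = 0.5, replace = FALSE)

set.seed(123)
half_b <- mtcars %>% slice_sample(prop = 0.5)

nrow(half_a)  # 16
nrow(half_b)  # 16
```

Both calls return half of the 32 rows, drawn without replacement.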

6.

This requires the tibble package. If you have loaded the tidyverse, tibble is attached along with it.

require(tidyverse)

## Loading required package: tidyverse

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.5.1     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

mycars_final <- rownames_to_column(mtcars, var = "model")
Lotus_europa <- mycars_final %>% filter(model == "Lotus Europa")

Alternatively, filter on the row names directly.

Lotus_europa2 <- mtcars %>% filter(rownames(mtcars) %in% "Lotus Europa")

Or, with == instead of %in%.

Lotus_europa3 <- mtcars %>% filter(rownames(mtcars) == "Lotus Europa")

You could also make a new variable of row names via mutate.
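A minimal sketch of that mutate approach follows, together with a base R lookup by row name for comparison. The names Lotus_europa4 and Lotus_europa5 are assumptions that continue the numbering used above.

```r
library(dplyr)

# Copy the row names into an ordinary column, then filter on that column.
Lotus_europa4 <- mtcars %>%
  mutate(model = rownames(mtcars)) %>%
  filter(model == "Lotus Europa")

# Base R alternative: index the data frame by row name.
Lotus_europa5 <- mtcars["Lotus Europa", ]

Lotus_europa4$hp
Lotus_europa5$hp
```

Both objects hold the single Lotus Europa row. Only the mutate version carries the model name as a regular column.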

Dr. Thomas Pollet, Northumbria University (thomas.pollet@northumbria.ac.uk)

2025-03-17
