Let’s continue to explore how to use R.

Objectives

Reminder: R is strict about syntax and is case-sensitive. Always check that your capitalization is correct and that every opening parenthesis has a matching closing parenthesis.

Shortcuts:

Symbol or Command          Keyboard Shortcut
<-                         Alt + -
#                          Shift + 3
Run one line in script     Ctrl + Enter
Run entire script          Ctrl + Shift + Enter
Open new script            Ctrl + Shift + N


Exercise 2
1. Open RStudio and prepare a new script.

  1. Open a new script. All of your code for today’s exercises, and your notes and comments will go in this script.
  2. Write your filename, title, author, date, and description of the script.
# Filename: (what you will save the script as)
# Title: (give script a title)
# Author: (write your full name here)
# Date: Month Day Year (write the actual date here)

# Description: (describe what this script is for)


2. In our last exercise, we saw that we can assign a number to a variable in R. For example, we can assign the variable height a value of 64 inches by typing:

     height <- 64

Real data will usually contain multiple observations. R stores a collection of values, such as a set of numbers, as a vector. We can use assignment to store a whole vector, rather than a single number, in a variable.

Let’s first see how R creates vectors. Suppose we have daily temperatures in Fahrenheit for New York City in January: 32°F, 31°F, 31°F, 29°F, 29°F, 24°F, and 22°F. We will tell R to put these numbers in a vector.

a. Type the following into your script and run it.

     c(32, 31, 31, 29, 29, 24, 22)

In your console, your output should look like this:

## [1] 32 31 31 29 29 24 22

Notice that we are using c( ): a letter followed by parentheses, inside which we enter values. This means that c( ) is a function! Remember that functions are code that tells R to perform a specific task. We specify what the function works on, and how, through the arguments we write inside the parentheses. In this case, c( ) tells R to “combine” (c for combine) some elements (numbers or words) into a vector.

b. To assign this vector to a variable, say temperatureF, we follow the same steps as for assigning a single value. We can then type the variable’s name to ask R to retrieve its contents.

Type this in your script and run each line. Does the vector appear in the console? What do you see in your Environment?

     temperatureF <- c(32, 31, 31, 29, 29, 24, 22)
     temperatureF
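
As a small illustration of what c( ) accepts, the sketch below combines numbers, combines words, and joins an existing vector with new values. The names and values are made up for this example and are not part of the exercise data.

```r
# Hypothetical example values, not part of the exercise data
counts <- c(3, 7, 12)               # a numeric vector
species <- c("robin", "sparrow")    # a character vector (words go in quotes)

# c() can also join an existing vector with more values
more_counts <- c(counts, 20, 25)
more_counts

# length() reports how many elements a vector holds
length(more_counts)
```

Each new variable should also appear in your Environment once its line has been run.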


3. In our last exercise, we saw that R can be used as a calculator, via basic operations or functions, and that we can run those calculations on our variables. This is also true of variables that hold vectors. Try the following in your script.

  1. Convert temperatureF to Celsius, and assign the converted numbers to a new variable, temperatureC. The conversion and the assignment can be done in one line!
     temperatureC <- (temperatureF - 32) * 5/9
     temperatureC
  2. Calculate the mean of temperatureF.
     mean(temperatureF)
  3. Calculate the sum of temperatureF.
     sum(temperatureF)
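
To make the calculations above more concrete, here is a minimal sketch of how R treats a vector in arithmetic and in summary functions. It reuses the temperatureF values from this exercise; round( ) and length( ) are extra base R functions that have not been introduced yet.

```r
temperatureF <- c(32, 31, 31, 29, 29, 24, 22)

# Arithmetic is applied to every element of the vector in turn
temperatureF - 32

# The mean is the sum divided by the number of values
sum(temperatureF) / length(temperatureF)
mean(temperatureF)

# round() shortens long decimals, here to 1 decimal place
temperatureC <- (temperatureF - 32) * 5/9
round(temperatureC, 1)
```

The two mean calculations should print the same number, which is a quick way to check your understanding of what mean( ) does.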


4. Sometimes we want to know which value is in the nth position of a vector. To do this, we can use square brackets to specify the value we want to retrieve from a variable. Obtain the 2nd value in the variable temperatureF. What about the 5th value?

     temperatureF[2]
     temperatureF[5]
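
Square brackets can retrieve more than one value at a time. The sketch below shows a few common patterns, again using the temperatureF values from this exercise; treat it as an optional extension rather than part of the required steps.

```r
temperatureF <- c(32, 31, 31, 29, 29, 24, 22)

temperatureF[c(1, 7)]                 # the 1st and 7th values
temperatureF[2:4]                     # the 2nd through 4th values
temperatureF[-1]                      # every value except the 1st
temperatureF[length(temperatureF)]    # the last value
```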


5. But what if we already have a beautiful dataset with multiple columns and rows, and we want to import it into R? To do this, we have to read in the file. All files we use in this class will be “.csv” files. CSV (comma-separated values) files are plain text files that store tabular data. You can easily save data from Excel in CSV format.

To read in a .csv file, we use the function: read.csv( ).

a. Reading in a file from a URL (easiest in-class option): We can read files into R directly from the internet. To do this, we need the URL. Let’s read in a file directly from our GitHub. Type this into your script and hit Run.

     birds <- read.csv("https://github.com/lczawadzki/biostats/raw/main/data/bird-richness.csv", header = TRUE)

read.csv( ) fetches the CSV file from GitHub via the URL. The first argument is the location of the file, and the second argument, header = TRUE, tells R that the first row of the file contains column names. Notice that we assign the data to an object when we read it in; here we have named it birds. From now on, R will recognise that birds contains all of the bird richness data. (To see more read.csv arguments, type ?read.csv into your console.)

You will now see the object birds in your Environment. You can view the dataset by clicking on it directly in the Environment, or typing in:

     View(birds)

To view only the first few rows, type in:

     head(birds)

Or the last few rows:

     tail(birds)
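
If you want to see what header = TRUE changes without needing an internet connection, the sketch below reads a tiny made-up table. The data and the names csv_text, with_header, and no_header are invented for illustration, and the text argument is used here only as a stand-in for a real file or URL.

```r
# Hypothetical CSV content, supplied as text so no file or URL is needed
csv_text <- "site,richness\nA,12\nB,7\nC,9"

# header = TRUE: the first row becomes the column names
with_header <- read.csv(text = csv_text, header = TRUE)
names(with_header)   # "site" "richness"
dim(with_header)     # number of rows, then number of columns

# header = FALSE: the first row is treated as data instead
no_header <- read.csv(text = csv_text, header = FALSE)
names(no_header)     # R supplies the default names V1 and V2
nrow(no_header)      # one more row than with_header

# str() summarizes each column's type and first few values
str(with_header)
```

The same functions (names( ), dim( ), nrow( ), str( )) can be applied to birds once it has been read in.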


b. If you have already downloaded the .csv file from GitHub, or have a dataset on your computer, R needs to know the location of the file. We can provide it in two ways:

(1) Copy the pathname and read the file in by pathname (good at-home option): Go to GitHub and download the file to your computer. Locate the file, and copy the pathname. On Windows, hold Shift and right-click on the file, then select “Copy as path”. On Mac, right-click on the file, hold Option, and scroll down to where it says “Copy ‘bird-richness.csv’ as Pathname” and click. You should now be able to paste the pathname into your first argument below.

Note: IF YOU USE WINDOWS, PASTING THE PATHNAME AS COPIED WILL GIVE AN ERROR. Windows pathnames use back slashes \, and R treats a back slash inside quotes as the start of a special character rather than as part of the path. On Windows, either change ALL back slashes \ to forward slashes / OR replace ALL back slashes with double back slashes \\. It should now work. If you are on a Mac, you are safe.

     birds <- read.csv("YOUR PATHNAME HERE; USE / only", header = TRUE)
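
For Windows users, here is a minimal sketch of the back slash fix. The pathname is a made-up placeholder ("yourname" is not a real folder), and using gsub( ) for the conversion is just one possible approach; retyping the slashes by hand works equally well.

```r
# Hypothetical Windows pathname; "yourname" is a placeholder
# Each back slash is doubled so that R accepts it inside quotes
win_path <- "C:\\Users\\yourname\\Documents\\bird-richness.csv"

# Convert every back slash to a forward slash
r_path <- gsub("\\", "/", win_path, fixed = TRUE)
r_path

# file.exists() returns TRUE only if a file is really at that location
file.exists(r_path)
```

Checking file.exists( ) before calling read.csv( ) is a quick way to tell a wrong pathname apart from other problems.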


(2) Set a working directory, and read the file in directly from that location (best at-home option): Go to the toolbar at the top of RStudio, go to Session -> Set Working Directory -> Choose Directory. Select a folder on your computer where you will save your datasets for R to read from. Once set, make sure you put “bird-richness.csv” in this folder.

First, let’s check that R is reading in from the correct directory.

     getwd() #checks directory
     list.files() #lists files in folder



Once confirmed, we can read in our data by just using the file name as our argument.

     birds <- read.csv("bird-richness.csv", header = TRUE)
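
The sketch below walks through the same check-then-read steps on a throwaway file, so it runs even if you have not downloaded anything yet. A temporary folder stands in for your working directory, and the file name example-richness.csv and its contents are made up for illustration.

```r
# A temporary folder stands in for your working directory
folder <- tempdir()
writeLines(c("site,richness", "A,12", "B,7"),
           file.path(folder, "example-richness.csv"))

# Check that the file is in the folder before trying to read it
"example-richness.csv" %in% list.files(folder)

# Read the file using its location, then look at the result
example_data <- read.csv(file.path(folder, "example-richness.csv"), header = TRUE)
example_data
```

If the check prints FALSE for your own file, the file is not in the folder R is looking in, and read.csv( ) will fail until it is moved there.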


6. R also has some pre-loaded datasets that can be accessed directly for practice. We will use some of these during the course. To list all of these datasets, type data() into the console. Type in the following dataset names and look at the data. To learn more about a dataset, type ?datasetname (e.g. ?iris).
     iris
     mtcars
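
Printing a whole dataset can fill the console, so here is a short sketch of some gentler ways to look at the two datasets above. summary( ) is an extra base R function that has not been introduced yet.

```r
head(iris)        # first 6 rows of the iris dataset
dim(iris)         # number of rows, then number of columns
names(mtcars)     # column names in mtcars
summary(mtcars)   # minimum, mean, maximum, etc. for each column
```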



Now it’s time to practice on your own.

  1. People are notoriously dishonest about revealing how often they perform antisocial behaviors like peeing in swimming pools. In addition to being disgusting, urine contains nitrogenous chemicals that combine with the pool’s chlorine to produce toxic chemicals like trichloramine, the source of most skin irritations for swimmers. A group of researchers (Jmaiff Blackstock et al. 2017) recently realized that an artificial sweetener called ACE is excreted in urine unmetabolized and in known average quantities. Therefore, by measuring ACE concentrations we can measure the amount of urine in a pool. (Question adapted from Whitlock & Schluter 2020.)

    Here is a list of measurements, each from a different pool, of the concentration of ACE (measured in ng/L) for 23 different pools in Canada.

    640, 1070, 780, 70, 160, 130, 60, 50, 2110, 70, 350, 30, 210, 90, 470, 580, 250, 310, 460, 430, 140, 1070, 130

    a. In R, create a vector of these data, and name it appropriately.

    b. What is the mean ACE concentration of these 23 pools?

    c. Urine on average has 4000 ng ACE/mL. We want to know how many mL of urine are in the pool per L. Make a new vector that shows the concentration of urine in mL/L in these 23 pools. Give it a suitable name. (conversion note: there are 4000 ng ACE/mL of urine, so we need to divide our ACE values by 4000 to get mL of urine/L).

    d. What is the mean concentration of urine per liter? How did this change relative to the mean measurement of ng ACE/L?

    e. Use R to calculate the average amount of urine (in mL) in a 500,000 L pool.

  2. Read in the file “student-height.csv”.

    a. Directly from GitHub using the following URL: https://github.com/lczawadzki/biostats/raw/main/data/student-height.csv

    b. By saving it to your working directory, and reading it into R directly.


    **Finally, save your script: File -> Save As -> Save to Desktop. Email the file to yourself, or save to USB to use for future study/practice.**