Intro to R and the Tidyverse¶

What is R?¶

R is a programming language designed for statistical computing. It is not just a statistics package: it is a language.

What is RStudio?¶

RStudio is a free integrated development environment (IDE) for R. It is cleaner and simpler to use than the default R GUI (graphical user interface), and it has many useful features, such as syntax highlighting and code auto-completion with the Tab key.

Additionally, it has a 4-pane workspace:

Top left: the code editor, where you write and save scripts
Bottom left: the interactive console, where code runs
Top right: your workspace (Environment tab), listing the objects currently in memory, plus the History tab
Bottom right: plots, the packages installed on your system, the files in your working directory, and help pages

Useful RStudio shortcuts:

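Tab: auto-completes function and object names
Ctrl+↑ (Windows) or Cmd+↑ (macOS): in the interactive console only, shows a pop-up of previous commands that start with what you have typed
Ctrl+Enter (Windows) or Cmd+Return (macOS): runs the current line, or the selected lines, of code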

Things to keep in mind¶

R is case sensitive, so be careful while typing.
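# is used for comments
- Keyboard shortcut to comment or uncomment the selected lines: Ctrl+Shift+C (Windows) or Cmd+Shift+C (macOS).
R ignores extra spaces between commands and arguments, but not spaces inside an operator: x <- 1 assigns 1 to x, while x < -1 compares x with -1.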
Names should start with a letter and should not contain spaces.
You can use . in object names (e.g., my.data).
Use forward slash (/) in path names, even on Windows.
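The points above can be tried directly in the console. This is a short sketch. The object name myvalue and the folder names are only illustrations.

```r
# A comment: R ignores everything after the # on this line
myvalue <- 5   # comments can also follow code on the same line

# R is case sensitive: myvalue and MyValue are different names
exists("myvalue")   # TRUE
exists("MyValue")   # FALSE, because no object with this capitalization exists

# Extra spaces between arguments make no difference
round(3.14159, digits = 2)   # 3.14
round(3.14159,digits=2)      # 3.14

# file.path() joins folder names with forward slashes on every platform
file.path("Documents", "Workshops")   # "Documents/Workshops"
```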

Working directory¶

Your working directory is the folder where R looks for files to read and where it saves files by default. We can find it with the getwd() function.

# Current working directory
getwd()

Output

[1] "/Users/fordfishman"

We can also set our working directory with setwd(PATH).

# an example of the path to your workshop materials
# USE YOUR OWN PATH
setwd("Documents/Workshops/Intro to R and the Tidyverse 20220928/")

To see the files in your working directory, you can use list.files().

list.files()

Output

[1] "IntroR_Tidyverse_code_along.R" "IntroR_Tidyverse_code.R"       "penguins.csv"
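setwd() has a useful extra behavior: it invisibly returns the directory you were in before the change. Saving that value makes it easy to switch back afterwards.

The sketch below moves to R's temporary folder (tempdir()) so it does not touch your real files. The file name example.csv is hypothetical. It also uses the pattern argument of list.files(), which keeps only the file names that match a regular expression.

```r
# setwd() returns the previous directory invisibly; keep it so we can go back
old_wd <- setwd(tempdir())   # temporarily move to R's temporary folder
getwd()                      # now points at the temporary folder

# Create an empty example file, then list only the .csv files in this folder
file.create("example.csv")
list.files(pattern = "\\.csv$")

# Return to the original working directory
setwd(old_wd)
getwd()
```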

Replacing/adding new elements¶

We can also use indexing to replace or add new elements to a vector.

greeting[2] <- "How are you?"
greeting
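The workshop's greeting vector was created in an earlier section that is not included here. This sketch therefore starts from a hypothetical three-element version.

It shows replacing an element, adding one just past the end, and what happens when you assign further past the end: R fills the gap with NA.

```r
# Hypothetical starting vector (the workshop's own `greeting` may differ)
greeting <- c("Hello", "Hi", "Good morning")

# Replace the 2nd element
greeting[2] <- "How are you?"

# Add a 4th element by assigning one position past the end
greeting[4] <- "Nice to meet you"

# Assigning further past the end fills the skipped position with NA
greeting[6] <- "Bye"
greeting
length(greeting)   # 6
```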

Exercise 2¶

Replace the 3rd element in Myvector2 with a 10.

Solution

Myvector2[3] <- 10

Data types¶

A vector made with c() can hold only one data type (for example, all numbers or all characters). If we mix types, R converts (coerces) every element to a single common type, here character:

Myvector4 <- c(1,2,"hello")
Myvector4

Output

[1] "1"     "2"     "hello"
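We can check which type R settled on with class(). When types are mixed, R roughly follows the order logical, then numeric, then character, and the most flexible type present wins.

This sketch also shows why coercion matters in practice: text that looks like a number no longer behaves like one.

```r
class(c(1, 2, 3))         # "numeric"
class(c(1, 2, "hello"))   # "character": the numbers became text
class(c(1, TRUE, FALSE))  # "numeric": TRUE/FALSE became 1/0
c(1, TRUE, FALSE)         # 1 1 0

Myvector4 <- c(1, 2, "hello")
# sum(Myvector4) would stop with an error, because the elements are characters
as.numeric(Myvector4)     # 1 2 NA, plus a warning that NAs were introduced by coercion
```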

To keep elements of different types without converting them, we need a list, created with the list() function.

Mylist <- list(1,3, "hello", TRUE)

Mylist

Output

[[1]]
[1] 1

[[2]]
[1] 3

[[3]]
[1] "hello"

[[4]]
[1] TRUE

Functions¶

A function is a piece of code that carries out a specific task. R has many built-in functions, for example:

sum(1,3,5)
mean(Myvector1)
length(Myvector1)
max(Myvector1)
rep("hi", times=3)

Output

[1] 9

[1] 3.25

[1] 4

[1] 5

[1] "hi" "hi" "hi"
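Myvector1 was defined in an earlier section that is not included here. This self-contained sketch uses a hypothetical vector x that happens to give the same mean, length, and maximum as the output above.

It also shows two further points. Arguments can be passed by name. Many summary functions return NA when data are missing unless we ask them to drop the NAs with na.rm = TRUE.

```r
# Hypothetical vector (not necessarily the workshop's Myvector1)
x <- c(1, 2, 5, 5)

sum(x)      # 13
mean(x)     # 3.25
length(x)   # 4
max(x)      # 5

# Arguments can be given by name; ?rep lists the ones rep() accepts
rep("hi", times = 3)          # "hi" "hi" "hi"
rep(c("a", "b"), each = 2)    # "a" "a" "b" "b"

# Missing values propagate unless na.rm = TRUE
y <- c(1, NA, 3)
mean(y)                 # NA
mean(y, na.rm = TRUE)   # 2
```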

To learn more about a function, we can open its help page with help() or ?.

help(mean)
?rep

Packages¶

We can also bring in extra functions by installing packages. Packages are collections of functions, often bundled with data and documentation. Thousands of add-on packages are available on CRAN (the Comprehensive R Archive Network).

For instance, we have the tidyverse, an “opinionated collection of R packages designed for data science” (www.tidyverse.org). These packages are designed to make data wrangling, analysis, and graphing much simpler and more enjoyable.

Tidyverse packages share a philosophy of data organization: they all expect tidy data. In tidy data, each row is an observation, each column is a variable, and each cell holds a single value.
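To see the difference, compare a "wide" table with its tidy version. In the wide table below, the variable "year" is hidden in the column names.

This is a sketch using pivot_longer() from tidyr, one of the tidyverse packages. The island names come from the penguin data, but the counts and the y2007/y2008 column names are made up for illustration.

```r
library(tidyr)

# Hypothetical, untidy data: one row per island, one column per year
wide <- data.frame(
  island = c("Biscoe", "Dream"),
  y2007  = c(10, 12),
  y2008  = c(14, 9)
)

# Gather the year columns so each row is one observation (one island in one year)
tidy <- pivot_longer(wide,
                     cols         = c(y2007, y2008),
                     names_to     = "year",    # new column holding the old column names
                     names_prefix = "y",       # strip the "y" so years read 2007, 2008
                     values_to    = "count")   # new column holding the cell values
tidy
```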

Using the tidyverse packages¶

To install a package, we use the function install.packages("package name"). We only need to install a package once on each computer.

install.packages("tidyverse")

To use the functions in a package, we need to load it with the library() function. Unlike installing, loading must be repeated in every new R session.

library(tidyverse)
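Because installing happens once and loading happens every session, scripts often combine the two steps. This is a common pattern, not part of the original workshop code: it installs the tidyverse only if it is missing, then loads it.

requireNamespace() with quietly = TRUE returns FALSE instead of raising an error when a package is not installed.

```r
# Install the tidyverse only if it is not already available (one-time step)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Load (attach) it; this is needed in every new R session
library(tidyverse)
```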

Importing data¶

Let’s explore penguins! In our file called penguins.csv, we have data for three penguin species observed in the Palmer Archipelago, Antarctica, collected by Dr. Kristen Gorman with Palmer Station LTER.

penguins <- read_csv("penguins.csv")
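read_csv() comes from readr, one of the tidyverse packages, and returns a tibble (a tidyverse data frame).

penguins.csv is not reproduced here. This sketch therefore writes a tiny made-up CSV with a few of the same column names to a temporary file, reads it back, and inspects it. The show_col_types = FALSE argument, available in readr 2.0 and later, only silences the column-type message.

```r
library(readr)

# Write a small, invented CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "species,island,bill_length_mm,bill_depth_mm",
  "Adelie,Torgersen,40.0,18.0",
  "Gentoo,Biscoe,47.0,14.0",
  "Chinstrap,Dream,48.0,18.5"
), tmp)

small <- read_csv(tmp, show_col_types = FALSE)
small          # a tibble with 3 rows and 4 columns
nrow(small)    # number of rows (observations)
names(small)   # column names (variables)
```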

Simple graphs¶

To make a simple scatter plot in R, we can use the plot() function.

plot(penguins$bill_depth_mm, penguins$bill_length_mm)

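Output

[Figure: scatter plot with penguins$bill_depth_mm on the x-axis and penguins$bill_length_mm on the y-axis]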
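By default, plot() labels each axis with the expression we passed in. Base graphics arguments let us tidy that up: xlab and ylab set the axis labels, main sets the title, pch sets the point symbol, and col sets the point color.

This is a sketch on a few hypothetical measurements rather than the real penguin data.

```r
# Hypothetical measurements, used only to illustrate the arguments
bill_depth  <- c(18.0, 17.5, 14.0, 15.0, 19.0)
bill_length <- c(39.0, 40.5, 47.0, 46.0, 49.5)

plot(bill_depth, bill_length,
     xlab = "Bill depth (mm)",        # x-axis label
     ylab = "Bill length (mm)",       # y-axis label
     main = "Bill length vs. depth",  # plot title
     pch  = 19,                       # solid circles
     col  = "steelblue")              # point color
```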

We can also use ggplot2, one of the tidyverse packages, to make more polished graphs with many customizations.

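# Scatter plot of body mass against flipper length, one color/shape per species
mass_flipper <- ggplot(data = penguins,
                       aes(x = flipper_length_mm,
                           y = body_mass_g)) +
  geom_point(aes(color = species,
                 shape = species),
             size = 3,
             alpha = 0.8) +
  # colors are matched to species in alphabetical order: Adelie, Chinstrap, Gentoo
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(title = "Penguin size, Palmer Station LTER",
       subtitle = "Flipper length and body mass for Adelie, Chinstrap and Gentoo Penguins",
       x = "Flipper length (mm)",
       y = "Body mass (g)",
       color = "Penguin species",
       shape = "Penguin species") +
  # numeric legend.position places the legend inside the plot (values from 0 to 1);
  # ggplot2 3.5.0 and later prefer legend.position = "inside" with
  # legend.position.inside = c(0.2, 0.7)
  theme(legend.position = c(0.2, 0.7),
        plot.title.position = "plot",
        plot.caption = element_text(hjust = 0, face = "italic"),
        plot.caption.position = "plot")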

mass_flipper

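Output

[Figure: "Penguin size, Palmer Station LTER"; flipper length (mm) on the x-axis, body mass (g) on the y-axis, points colored and shaped by penguin species]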
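The long ggplot2 call above is built from layers joined with +. Stripped down, a ggplot2 graph needs only three things: the data, an aesthetic mapping from columns to x, y, or color, and a geom that draws the marks.

This is a minimal sketch on a small hypothetical data frame that uses the same column names as penguins.csv. The values are invented.

```r
library(ggplot2)

# Hypothetical data with the same column names used in the penguin plot
df <- data.frame(
  species           = c("Adelie", "Chinstrap", "Gentoo", "Gentoo"),
  flipper_length_mm = c(190, 195, 215, 220),
  body_mass_g       = c(3700, 3750, 5000, 5400)
)

# ggplot(): data + mapping; geom_point(): draw one point per row; labs(): labels
p <- ggplot(df, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point(size = 3) +
  labs(x = "Flipper length (mm)", y = "Body mass (g)", color = "Penguin species")

p   # printing the plot object draws it
```

Each extra piece in the full example (scale_color_manual(), labs() titles, theme()) is just another layer added with + to this core.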
