Intro to R and the Tidyverse#

What is R?#

R is a programming language designed for statistical computing. It is not just a statistics package: it is a language.

What is RStudio?#

RStudio is a free integrated development environment (IDE) for R. It is cleaner and simpler than the default R GUI (graphical user interface), and it has many useful features, such as syntax highlighting and tab completion that suggests code as you type.

Additionally, it has a 4-pane workspace:

Top left: the R code editor
Bottom left: the interactive console
Top right: your workspace, including a list of the objects currently in memory and a history tab
Bottom right: plots, external packages available on your system, files in your working directory, and help files

Useful RStudio shortcuts:

Tab: auto-completes the code you are typing
Ctrl+↑ or Cmd+↑: recalls previous commands that match what you have typed so far (works only in the interactive console)
Ctrl+Enter or Cmd+Return: executes the selected lines of code

Things to keep in mind#

R is case sensitive, so be careful while typing.
# starts a comment; R ignores the rest of that line.
- Keyboard shortcut to comment or uncomment lines: Ctrl+Shift+C (Windows), Cmd+Shift+C (macOS).
R ignores extra spaces around operators and between arguments.
Object names should start with a letter and should not contain spaces.
You can use . in object names (e.g., my.data).
Use forward slash (/) in path names, even on Windows.
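
A few of these rules can be checked directly in the console. The short sketch below uses made-up object names (total, Total, my.data) purely for illustration.

```r
# R is case sensitive: 'total' and 'Total' are two different objects
total <- 10
Total <- 20
total == Total        # FALSE

# Extra spaces between arguments do not change the result
a <- sum(1,2,3)
b <- sum( 1 , 2 , 3 )
a == b                # TRUE

# A dot is allowed inside an object name
my.data <- c(1, 2, 3)
my.data
```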

Working directory#

Your working directory is the folder on your computer where R looks for files to read and saves files by default. We can find it with the getwd() command.

# Current working directory
getwd()

Output

[1] "/User/fordfishman/"

We can also set our working directory with setwd(PATH).

# an example of the path to your workshop materials
# USE YOUR OWN PATH
setwd("Documents/Workshops/Intro to R and the Tidyverse 20220928/")

To see the files in your working directory, you can use list.files().

list.files()

Output

[1] "IntroR_Tidyverse_code_along.R" "IntroR_Tidyverse_code.R"       "penguins.csv"
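
The three functions above can be tried together without touching your own folders. This is a minimal sketch that assumes a scratch folder from tempdir() and a hypothetical file name, example_notes.txt; neither is part of the workshop materials.

```r
# Remember the current working directory so we can return to it
old_wd <- getwd()

# tempdir() is a scratch folder that R creates for each session
setwd(tempdir())

# Create an empty file (hypothetical name) and confirm list.files() sees it
file.create("example_notes.txt")
found <- "example_notes.txt" %in% list.files()
found

# file.path() joins folder names with forward slashes on every operating system
file.path("Documents", "Workshops", "penguins.csv")

# Tidy up and go back to where we started
file.remove("example_notes.txt")
setwd(old_wd)
```

Saving the old directory first is a small habit that makes it easy to undo a setwd() call.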

Replacing/adding new elements#

We can also use indexing to replace or add new elements to a vector.

greeting[2] <- "How are you?"
greeting
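
As a self-contained illustration of the same idea, the sketch below uses a made-up vector called fruits (not part of the workshop code) to show replacing an element, adding one at the end, and what happens when you skip a position.

```r
# Hypothetical vector used only for this illustration
fruits <- c("apple", "banana", "cherry")

# Replace the 2nd element
fruits[2] <- "blueberry"

# Add a new 4th element by assigning to the next free position
fruits[4] <- "date"
fruits    # "apple" "blueberry" "cherry" "date"

# Skipping a position fills the gap with NA
fruits[6] <- "fig"
fruits    # "apple" "blueberry" "cherry" "date" NA "fig"
```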

Exercise 2#

Replace the 3rd element in Myvector2 with a 10.

Solution

# R is case sensitive, so the name must match Myvector2 exactly
Myvector2[3] <- 10

Data types#

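When we use c(), R assumes that everything in your vector is of the same data type (all numbers or all characters). If the types are mixed, R converts every element to a common type; below, the numbers become character strings.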

Myvector4 <- c(1,2,"hello")
Myvector4

Output

[1] "1"     "2"     "hello"
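
One way to see this conversion is to ask R for the type of each vector with class(). The vector names below (numbers, mixed, flags) are made up for illustration.

```r
# All numbers: the vector stays numeric
numbers <- c(1, 2, 3.5)
class(numbers)   # "numeric"

# Numbers mixed with text: everything becomes character
mixed <- c(1, 2, "hello")
class(mixed)     # "character"

# Logical values mixed with numbers: TRUE becomes 1 and FALSE becomes 0
flags <- c(TRUE, FALSE, 2)
flags            # 1 0 2
class(flags)     # "numeric"
```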

If we want to keep different types of data together without any conversion, we need to use the list() function.

Mylist <- list(1,3, "hello", TRUE)

Mylist

Output

[[1]]
[1] 1

[[2]]
[1] 3

[[3]]
[1] "hello"

[[4]]
[1] TRUE
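
Unlike a vector, a list keeps the original type of each element. The sketch below rebuilds the same list and inspects it; the use of sapply() and double square brackets is an addition for illustration.

```r
# Same list as above
Mylist <- list(1, 3, "hello", TRUE)

# Check the type of every element
types <- sapply(Mylist, class)
types                # "numeric" "numeric" "character" "logical"

# Double square brackets pull out the element itself
Mylist[[3]]          # "hello"
class(Mylist[[3]])   # "character"

# Single square brackets return a shorter list instead
class(Mylist[3])     # "list"
```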

Functions#

A function is a piece of code to carry out a specified task. R has many built-in functions.

sum(1,3,5)
mean(Myvector1)
length(Myvector1)
max(Myvector1)
rep("hi", times=3)

Output

[1] 9

[1] 3.25

[1] 4

[1] 5

[1] "hi" "hi" "hi"
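
The calls above rely on a vector defined earlier in the workshop. For a version you can run on its own, here is a small sketch with a made-up vector named scores; it also shows that arguments such as times and each are passed by name.

```r
# Hypothetical vector used only for this illustration
scores <- c(2, 4, 6, 8)

sum(scores)      # 20
mean(scores)     # 5
length(scores)   # 4
max(scores)      # 8

# Named arguments change how a function behaves
rep("hi", times = 3)          # "hi" "hi" "hi"
rep(c("a", "b"), each = 2)    # "a" "a" "b" "b"
```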

If we want to learn more about a function we can ask for help with help() or ?.

help(mean)
?rep

Packages#

We can also bring in extra functions by downloading packages. Packages are collections of functions. There are thousands of add-on packages available on CRAN (the Comprehensive R Archive Network).

For instance, we have the tidyverse, an “opinionated collection of R packages designed for data science” (www.tidyverse.org). These packages are designed to make data wrangling, analysis, and graphing much simpler and more enjoyable.

Tidyverse packages share a philosophy of data organization: they all expect tidy data. Tidy data is set up so that each row is an observation and each column is a variable.
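
To make the idea of tidy data concrete, here is a minimal sketch of a tidy table built with base R. The measurements are made up for illustration and are not taken from penguins.csv.

```r
# Each row is one observation (one penguin); each column is one variable
tidy <- data.frame(
  species           = c("Adelie", "Gentoo", "Chinstrap"),
  body_mass_g       = c(3750, 5000, 3700),
  flipper_length_mm = c(181, 217, 195)
)
tidy

nrow(tidy)               # 3 observations
ncol(tidy)               # 3 variables

# Because each variable has its own column, summaries are one call away
mean(tidy$body_mass_g)   # 4150
```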

Using the tidyverse packages#

To install a package we use the function install.packages("package name"). We only need to install a package once.

install.packages("tidyverse")

If we want to use the functions in a package, we need to load it in R using the library() function.

library(tidyverse)
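
Because installation happens once but loading happens in every session, it can help to check whether a package is already installed before loading it. The following is one possible sketch using requireNamespace(); the variable name pkg is an illustrative choice.

```r
pkg <- "tidyverse"

if (requireNamespace(pkg, quietly = TRUE)) {
  # The package is installed, so load it for this session
  library(pkg, character.only = TRUE)
} else {
  # The package is missing; tell the user how to install it
  message("Package '", pkg, "' is not installed. ",
          "Run install.packages(\"", pkg, "\") once, then try again.")
}
```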

Importing data#

Let’s explore penguins! In our file called penguins.csv, we have data for three penguin species observed in the Palmer Archipelago, Antarctica, collected by Dr. Kristen Gorman with Palmer Station LTER.

penguins <- read_csv("penguins.csv")
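
If you do not have penguins.csv at hand, you can still see how read_csv() behaves. The sketch below writes a tiny CSV with made-up values to a temporary file and reads it back; it assumes the readr package (part of the tidyverse) is installed.

```r
library(readr)  # read_csv() comes from readr, which the tidyverse loads for you

# Write a small, made-up CSV to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("species,bill_length_mm",
             "Adelie,39.1",
             "Gentoo,46.1"), tmp)

# Read it back; show_col_types = FALSE hides the column-type message
small <- read_csv(tmp, show_col_types = FALSE)
small

nrow(small)    # 2
names(small)   # "species" "bill_length_mm"
```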

Simple graphs#

To make a simple scatter plot in R, we can use the plot() function.

plot(penguins$bill_depth_mm, penguins$bill_length_mm)

Output: a scatter plot of bill depth (x-axis) against bill length (y-axis).
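
plot() also accepts arguments for axis labels and a title. Here is a minimal sketch with made-up measurements, so it runs without the penguins data; the vector names are hypothetical.

```r
# Made-up measurements used only for this illustration
bill_depth  <- c(18.7, 17.4, 18.0, 19.3)
bill_length <- c(39.1, 39.5, 40.3, 36.7)

# xlab, ylab, and main set the axis labels and the title
plot(bill_depth, bill_length,
     xlab = "Bill depth (mm)",
     ylab = "Bill length (mm)",
     main = "Bill dimensions")
```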

We can also use ggplot2 to get nicer graphs with many customizations.

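# Map flipper length to the x-axis and body mass to the y-axis
mass_flipper <- ggplot(data = penguins,
                       aes(x = flipper_length_mm,
                           y = body_mass_g)) +
   # Color and shape vary by species; size and alpha are the same for every point
   geom_point(aes(color = species,
                  shape = species),
              size = 3,
              alpha = 0.8) +
   # One color per species
   scale_color_manual(values = c("darkorange","purple","cyan4")) +
   # Giving color and shape the same legend title merges them into one legend
   labs(title = "Penguin size, Palmer Station LTER",
        subtitle = "Flipper length and body mass for Adelie, Chinstrap and Gentoo Penguins",
        x = "Flipper length (mm)",
        y = "Body mass (g)",
        color = "Penguin species",
        shape = "Penguin species") +
   # Place the legend inside the plot panel (coordinates run from 0 to 1).
   # ggplot2 3.5.0 and later prefer legend.position = "inside" together with
   # legend.position.inside = c(0.2, 0.7); the form below still runs with a warning.
   theme(legend.position = c(0.2, 0.7),
         plot.title.position = "plot",
         plot.caption = element_text(hjust = 0, face= "italic"),
         plot.caption.position = "plot")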

mass_flipper

Output: a scatter plot of body mass against flipper length, with points colored and shaped by species.
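
A ggplot2 figure is built by adding layers with +. To see the pieces one at a time, here is a stripped-down sketch that uses a small made-up data frame (toy) instead of the penguins data; it assumes ggplot2 is installed.

```r
library(ggplot2)

# Made-up data used only for this illustration
toy <- data.frame(
  flipper_length_mm = c(181, 195, 217, 210),
  body_mass_g       = c(3750, 3700, 5000, 4500),
  species           = c("Adelie", "Chinstrap", "Gentoo", "Gentoo")
)

# Step 1: the data and the axis mappings (this alone draws an empty panel)
p <- ggplot(data = toy, aes(x = flipper_length_mm, y = body_mass_g))

# Step 2: add points, colored by species
p <- p + geom_point(aes(color = species), size = 3)

# Step 3: add readable labels
p <- p + labs(x = "Flipper length (mm)", y = "Body mass (g)")

# Typing the object name draws the plot
p
```

Storing the plot in an object, as with mass_flipper above, lets you keep adding layers before you draw it.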
