We’ve got a new case to crack. Let’s open RStudio.
We are Kristie, Dorian, and Derek and we like R.
We are cats and not computer scientists.
We make lots of mistakes. You will see us make mistakes. Feel free to laugh at us. It’s okay.
Spelling is very important in R.
Okay detective, let’s open a new dossier for today’s case.
Start a new project
"cat_mystery"
R “scripts” are where you write your R code and document your work. They’re like recipes or prescriptions you write to tell the computer what you want to happen to your data.
Open a new R script
Save script
cat_code.R
1. Code editor
This is where you write your scripts and document your work. The tabs at the top of the code editor allow you to view scripts and data sets you have open. This is where you’ll spend most of your time.
2. Console
This is where code is actually executed by the computer. It shows code that you have run and any errors, warnings, or other messages resulting from that code. You can input code directly into the console and run it, but it won’t be saved for later. That’s why we like to run all of our code directly from a script in the code editor.
3. Workspace
This pane shows all of the objects and functions that you have created, as well as a history of the code you have run during your current session. The environment tab shows all of your objects and functions. The history tab shows the code you have run. Note the broom icon below the Connections tab. This cleans shop and allows you to clear all of the objects in your workspace.
4. Plots and files
These tabs allow you to view and open files in your current directory, view plots and other visual objects like maps, view your installed packages and their functions, and access the help window. If at any time you’re unsure what a function or package does, enter its name after a question mark. For example, try entering ?mean into the console and pressing ENTER.
Let’s add a little style so we feel more at home. Follow these steps to change the font size and color scheme:
Global Options...
Appearance
with the paint bucket.
You can create objects and assign values to them using the “left arrow” <-, more officially known as the assignment operator. Try adding the code below to your R script and creating an object called cat.
Once you add the code to your script, you can run the code by moving the blinking cursor to that line and pressing CTRL + ENTER.
# Create a new object
cat <- "Derek"
cat
# When saving text to a character object you need quotation marks.
# This won't work.
cat <- Demitri
# Without quotes, R looks for an object called Demitri, and then lets you know that it couldn't find one.
# You can copy an object by saving it to a new name.
cat2 <- cat
# Overwrite an object
cat <- "Demitri III"
cat
# Did cat2 change as well?
cat2
If you create a name you don’t like you can drop it with the function rm().
# Delete objects to clean-up your environment
rm(cat)
rm(cat2)
# How can you get the original 'cat' object back?
HOORAY! Don’t worry about deleting data or making a mistake in R. When you load data files into R it only copies the contents. That means all your original data files will remain safe and won’t suffer from any accidental changes. If anything disappears or goes wrong in R, it’s okay! You can always re-load the data using your script. No more worries about whether you remembered to save the latest data file or not.
Everything has a name in R and you can name things almost anything you like. You can even name your data TOP_SECRET_Shhhhhh... or unicorn_sightings or the_worst_data_ever.
Sadly, there are a few minor restrictions. Object names can’t include spaces or special characters such as +, -, *, \, /, =, !, or ). However, on the plus side, object names can include a _.
Your turn! Try running some of these examples in your R console.
n cats <- 5
n*cats <- 5
n_cats <- 5
n.cats <- 5
all_the_cats! <- "A very big number"
# You can add one cat
n_cats <- n_cats + 1
# But what if you have 10,000 cats?
n_cats <- 10,000
# They also cannot begin with a number.
1st_cat <- "Fluffer Puff"
# But they can contain numbers.
cat1 <- "Fluffer Puff"
NOTE: What happened when you created n_cats the second time?
When you create a new object that has the same name as something that already exists, the new object will replace the old one. Sometimes you’ll want to update an existing object and replace the old version. Other times you may want to copy an object to a new name to preserve the original. This is similar to choosing between Save and Save As when you save a file.
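If you’d rather ask R than guess whether a name is allowed, here is one rough way to check. It leans on the base R function make.names(), which rewrites any text into a valid object name. If the text comes back unchanged, it was already valid. The candidate names below are made up for this sketch.

```r
# Hypothetical candidate names, made up for this example
candidates <- c("n_cats", "n cats", "2nd_cat", "cat.2", "cats!")

# make.names() converts each string into a syntactically valid name.
# A name that comes back unchanged was already valid.
is_valid <- make.names(candidates) == candidates

# Show each candidate next to its verdict
data.frame(name = candidates, valid = is_valid)
```

Only n_cats and cat.2 pass. The others contain a space, start with a number, or include a special character.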
You can put multiple values inside c() to make a vector of items. Each additional item is separated by a comma. The c stands for conCATenate, or combine.
Let’s use c() to create a few vectors of cat names and their ages.
# Create a character vector and name it cat_names
cat_names <- c("Fluffy", "Longmire", "Lucy")
# Print cat_names to the console
cat_names
## [1] "Fluffy" "Longmire" "Lucy"
# Create a numeric vector and name it cat_ages
cat_ages <- c(3,7,14.5)
# Print cat_ages to the console
cat_ages
## [1] 3.0 7.0 14.5
A table in R is known as a data frame. Data frames have columns of data, each made from a named vector. Let’s make a data frame with two columns by using the cat names and ages from above.
# Create table with columns "names" and "ages" with values from the cat_names and cat_ages vectors
cat_df <- data.frame(names = cat_names,
ages = cat_ages)
# Print the cat_df data frame to the console
cat_df
## names ages
## 1 Fluffy 3.0
## 2 Longmire 7.0
## 3 Lucy 14.5
To see the values in one of your columns, use the $ sign after the name of your table.
# Print the "ages" column in cat_df
cat_df$ages
## [1] 3.0 7.0 14.5
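The $ sign works in both directions: you can use it to read a column and to create a new one. Here is a small sketch that rebuilds the cat table from above, counts its rows and columns, and adds a column. The column name age_months is made up for this example.

```r
# Rebuild the small cat table from the vectors above
cat_names <- c("Fluffy", "Longmire", "Lucy")
cat_ages  <- c(3, 7, 14.5)
cat_df    <- data.frame(names = cat_names, ages = cat_ages)

# Count the rows and columns
nrow(cat_df)
ncol(cat_df)

# Add a new column; `age_months` is a name made up for this example
cat_df$age_months <- cat_df$ages * 12
cat_df
```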
Which of these names are valid for a new object? (Hint: You are allowed to test them.)
my cat fred
my_CAT55
5cats
my-cat
Whatever!
my_CAT55
Yes!! That was purrfect!
You may have noticed the text in the scripts with the # in front. These are called comments. Any line that starts with a # won’t be executed as R code. You can use the # to add notes in your script to make it easier for others and yourself to understand what is happening and why. You can also use comments to add warnings or instructions for others, add references to data you’re using, or point out things that need to be looked into further.
Now that you know what objects are and how to create them, let’s learn how to use them. Functions take one or more inputs called “arguments”. They perform steps based on the arguments and usually return an output object.
You can think of a function like ordering pizza.
order_pizza(address = "140 3rd place, Bingo MN",
toppings = c("mouse whiskers", "catnip", "anchovies"),
time = "ASAP")
The function above calls a pizza place and provides several arguments (your address, pizza toppings, and delivery time). With some luck, the function will successfully return a new object (your cat’s favorite pizza). Note that when you have more than one argument, you will use a comma to separate them.
We already covered two functions: c() and data.frame(). Now let’s use the mean() function to find the average age of our cats.
# Call the mean function with cat_ages as input
cat_ages_mean <- mean(cat_ages) # Assigns the output to cat_ages_mean
# Print the cat_ages_mean value to the console
cat_ages_mean
## [1] 8.166667
The mean() function takes the cat_ages vector as input, performs some calculations, and returns a single numeric object. Note that we assigned the output object to the name cat_ages_mean. If you don’t assign the output object it will be printed to the console and won’t be saved. Sometimes this is okay, especially when you’re still in the exploratory stages.
# Alternative without assigning output
mean(cat_ages)
## [1] 8.166667
Note that our cat_ages vector has not changed at all. Each function has its own “environment”, and all of its calculations happen inside its own bubble. Usually anything that happens inside a function won’t change objects outside of the function’s environment.
cat_ages
## [1] 3.0 7.0 14.5
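To see the bubble in action, here is a small sketch with a made-up function called add_birthday(). It changes the ages inside the function, but the original vector outside stays the same.

```r
cat_ages <- c(3, 7, 14.5)

# A made-up function that changes its own copy of the ages
add_birthday <- function(ages) {
  ages <- ages + 1   # only the copy inside the function changes
  ages               # the last line is what the function returns
}

older_cats <- add_birthday(cat_ages)

older_cats  # the returned copy, one year older
cat_ages    # the original vector, unchanged
```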
There are functions in R that are more complex, but most boil down to the same general setup:
new_output <- function(input1, input2)
You call the function with input arguments inside parentheses and get an output object in return. You can make your own functions in R and call them almost anything you like, even my_amazing_cat_function(). Naming rules for functions and arguments are the same as those for objects (no spaces or special characters like +, -, *, /, \, =, or !, and they can’t begin with a number, sorry).
Which of these is a valid function call?
lick(“paws” “tail”)
scratch, “couch”, “door”
sleep(3, “hours”)
meow(until fed)
shed(1 million, “hairs”)
sleep(3, "hours")
Correct! You’re quite good at this.
The first step of a good mystery is finding some clues. Here’s an example data table showing my favorite cats. It is saved on the internet as an Excel file here.
Cat Name | Cat Age | Cat Color |
---|---|---|
Mr. Sauce | 3 | Salt-n-pepper |
Sad Face | 2 | Tabby |
Noodles | 6 | Calico |
But how do we get this data into R?
The most common data format for getting data into R is the CSV (comma-separated values). A CSV is a simple text file that can be opened in R and most other stats software, including Excel.
Here’s how the example cat table looks when it is saved as a .CSV file.
my_cats.csv
Cat Name,Cat Age,Cat Color
Mr. Sauce,3,Salt-n-pepper
Sad Face,2,Tabby
Noodles,6,Calico
It looks squished together right now, but that’s okay. When it’s opened in R the text will become a familiar looking table with columns and rows.
First, open the Excel cat table by copying this path into a new window or into the Windows search bar: X:\Agency_Files\Outcomes\Risk_Eval_Air_Mod\_Air_Risk_Evaluation\R\R_Camp\Student Folder\my_cats.xlsx
And then follow these instructions to save the Excel file as a CSV file.
Now let’s check that it worked. Return to RStudio and open your new data folder by finding it in your Files tab in the lower right window. Click on your CSV file and choose View File.
Copy the code below to your R script and run the line with the read.csv() function (press CTRL + ENTER). It will return a nice cat data frame and print it to the console. The character string "data/my_cats.csv" inside the parentheses of the read.csv() function is the path of the data file.
read.csv("data/my_cats.csv")
## Cat.Name Cat.Age Cat.Color
## 1 Mr. Sauce 3 Salt-n-pepper
## 2 Sad Face 2 Tabby
## 3 Noodles 6 Calico
Note: The location of the CSV file above will only work if you have your project open. When your project is open, R sets the working directory automatically to your project folder. We will demonstrate reading a file directly from the X-drive a bit later.
If you want to work with the data in R, you will need to give it a name by using the assignment operator <-.
Try the code below.
my_cat_file <- "data/my_cats.csv"
my_cats <- read.csv(my_cat_file)
# Type the name of the table to view it in the console
my_cats
## Cat.Name Cat.Age Cat.Color
## 1 Mr. Sauce 3 Salt-n-pepper
## 2 Sad Face 2 Tabby
## 3 Noodles 6 Calico
You can save the file path as an object, such as cat_file <- "data/my_cats.csv". Then you can use that object as a shortcut to the location of your data. Now when you want to load the cat table you can write read.csv(cat_file). This handy trick will make it easier down the road when you want to update your code to use with new data.
# Assign the file path character string as an object with the name cat_file.
cat_file <- "data/my_cats.csv"
# Use read.csv with the object cat_file which refers to "data/my_cats.csv"
read.csv(cat_file)
## Cat.Name Cat.Age Cat.Color
## 1 Mr. Sauce 3 Salt-n-pepper
## 2 Sad Face 2 Tabby
## 3 Noodles 6 Calico
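When read.csv() can’t find a file, the path is usually the culprit. Here is a rough sketch of one way to build a path and check it before reading. It assumes a throwaway folder made with tempfile() instead of your real project, so you can run it anywhere. The functions file.path() and file.exists() are part of base R.

```r
# Make a temporary project-like folder with a data/ subfolder (for this demo only)
project_dir <- tempfile("cat_mystery_")
dir.create(file.path(project_dir, "data"), recursive = TRUE)

# Build the file path piece by piece, then write the example cat table to it
cat_file <- file.path(project_dir, "data", "my_cats.csv")
writeLines(c("Cat Name,Cat Age,Cat Color",
             "Mr. Sauce,3,Salt-n-pepper",
             "Sad Face,2,Tabby",
             "Noodles,6,Calico"),
           cat_file)

# Check that the path points to a real file before reading it
file.exists(cat_file)

# Read the file using the saved path
my_cats <- read.csv(cat_file)
my_cats
```

Notice that read.csv() swaps the spaces in the column names for periods, which is why the columns show up as Cat.Name, Cat.Age, and Cat.Color.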
📦 What is a package?
A package is a small add-on for R, like a phone App for your phone. They add capabilities like statistical functions, mapping powers, and special charts. In order to use a new package we first need to install it.
The readr package helps import data into R in different formats. It does extra work for you like cleaning the data of extra white space and formatting tricky dates. Your packages are stored in your R library.
Add a package to your library
Run install.packages("readr") in the lower left console.
Open the Packages tab in the lower right window of RStudio to see the packages in your library.
Look for the readr package.
The Packages tab only shows the packages that are installed. To use one of them, you will need to load it. Loading a package is like opening an App on your phone. To load a package we need to use the library() function. After loading the readr package you will be able to read the cat data with the shiny new function read_csv(). This function is 300% better than read.csv().
library("readr")
my_cat_file <- "data/my_cats.csv"
read_csv(my_cat_file)
## Parsed with column specification:
## cols(
## `Cat Name` = col_character(),
## `Cat Age` = col_integer(),
## `Cat Color` = col_character()
## )
## # A tibble: 3 x 3
## `Cat Name` `Cat Age` `Cat Color`
## <chr> <int> <chr>
## 1 Mr. Sauce 3 Salt-n-pepper
## 2 Sad Face 2 Tabby
## 3 Noodles 6 Calico
You may have noticed the row of three-letter abbreviations under the column names. These describe the data type of each column.
chr stands for character vector, or a string of characters. Examples: “apple”, “apple5”, “5 red apples”
int stands for integer. Examples: 5, 34, 1071
We’ll see more data types, such as dates and logical, in later lessons.
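You can also ask R directly what type something is. A quick sketch using the base R function class(); the tiny cats table here is made up for the example.

```r
# class() reports the data type of an object
class("apple5")   # text in quotes is character
class(5L)         # the L suffix makes a whole number an integer
class(5)          # a plain number is stored as numeric (decimals allowed)

# The same check works on a whole column of a data frame
cats <- data.frame(name = c("Mr. Sauce", "Noodles"), age = c(3L, 6L))
class(cats$age)
```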
What data type is the Color column?
letters
character
words
numbers
integer
character
## --------------
## Meow! Not too shabby for a tabby.
## --------------
Let’s look a bit closer at the read_csv() function.
read_csv(my_cat_file)
# Get help
?read_csv
Function arguments
For read_csv(), the character object argument my_cat_file tells the function where to find the data file to read. Functions often have more than one argument. Type ?read_csv into your console to see help in the lower-right pane that describes all of the function’s arguments and what they do. Many of the arguments have default values (such as col_names = TRUE), which the function will use if you don’t provide an alternative. A short scroll down in the help window will show you more details about the arguments and the values they take.
function(arg1 = input1, arg2 = input2, arg3…)
The file argument tells us that the function expects a path to a file. It can be many types of files, even a ZIP file. Below that, you’ll see the col_names argument. This argument takes either TRUE, FALSE, or a character vector of column names. The default is TRUE, which means the first row in the CSV is used as the column names for your data.
Don’t like the column names? We can give new column names to the col_names argument like this:
my_cat_file <- "data/my_cats.csv"
#Assign desired column names as a character vector named column_names
column_names <- c("Name", "Age", "Color")
read_csv(my_cat_file, column_names)
## # A tibble: 4 x 3
## Name Age Color
## <chr> <chr> <chr>
## 1 Cat Name Cat Age Cat Color
## 2 Mr. Sauce 3 Salt-n-pepper
## 3 Sad Face 2 Tabby
## 4 Noodles 6 Calico
We now have the column names we want, but now the original column names in our CSV file show up as a row in our data. We want read_csv() to ignore the first row. Let’s look through the help window and try to find an argument that can help us. The skip argument looks like it could be helpful. Sure enough, the description is exactly what we’re looking for here. The default is skip = 0 (read every line), but we can skip the first line by providing skip = 1.
column_names <- c("Name", "Age", "Color")
read_csv(my_cat_file, column_names, skip = 1)
## Parsed with column specification:
## cols(
## Name = col_character(),
## Age = col_integer(),
## Color = col_character()
## )
## # A tibble: 3 x 3
## Name Age Color
## <chr> <int> <chr>
## 1 Mr. Sauce 3 Salt-n-pepper
## 2 Sad Face 2 Tabby
## 3 Noodles 6 Calico
Success!
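For comparison, base R’s read.csv() has its own versions of these options, which can be handy in a script where readr isn’t loaded. The sketch below is not the readr approach shown above. It assumes a temporary file created just for this demo and uses the read.csv() arguments header, skip, and col.names.

```r
# Write the example cat table to a temporary CSV file (for this demo only)
cat_file <- tempfile(fileext = ".csv")
writeLines(c("Cat Name,Cat Age,Cat Color",
             "Mr. Sauce,3,Salt-n-pepper",
             "Sad Face,2,Tabby",
             "Noodles,6,Calico"),
           cat_file)

column_names <- c("Name", "Age", "Color")

# header = FALSE: don't treat the first line read as column names
# skip = 1: jump over the original header line
# col.names: use our own column names instead
my_cats <- read.csv(cat_file,
                    header    = FALSE,
                    skip      = 1,
                    col.names = column_names)
my_cats
```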
You may be wondering why we included skip = for the skip argument, but only provided the objects for the other two arguments. When you pass inputs to a function, R will assume you’ve entered them in the same order that is shown on the ?help page. Let’s say you had a function called feed_pets() with 3 arguments:
feed_pets(dogs = "dogfood", cats = "catnip", fish = "pellets")
A shorthand way to write this would be feed_pets("dogfood", "catnip", "pellets"). If we write feed_pets("dogfood", "pellets", "catnip"), the function will send fish pellets to your cat and catnip to your fish. No good. If you really wanted to write “pellets” second, you would need to tell R which food item belongs to each animal, such as feed_pets("dogfood", fish = "pellets", cats = "catnip").
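You can try this yourself. The sketch below defines a pretend feed_pets() function (it doesn’t exist in any package, we made it up) that simply reports who gets which food.

```r
# A made-up function for illustration; it only reports who gets what
feed_pets <- function(dogs, cats, fish) {
  paste0("dogs: ", dogs, ", cats: ", cats, ", fish: ", fish)
}

# Positional matching: values are assigned in the order of the arguments
feed_pets("dogfood", "catnip", "pellets")

# Swapping two unnamed values sends the wrong food to the wrong pet
feed_pets("dogfood", "pellets", "catnip")

# Naming the arguments lets you write them in any order
feed_pets("dogfood", fish = "pellets", cats = "catnip")
```

The first and last calls feed everyone correctly. The middle call gives the cat a bowl of fish pellets.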
The same thing goes for read_csv(). In read_csv(my_cat_file, column_names, skip = 1), R assumes the file is my_cat_file and that col_names should be set to column_names. The skip = argument has to be included explicitly because skip comes much later in the argument list (it was the 10th argument of read_csv() in the version of readr used here). If we don’t include skip =, R will assume the value we entered is meant for the function’s 3rd argument.
A handy shortcut to see the arguments of a function is to enter the name of the function in the console and the first parenthesis, such as read_csv(, and then hit TAB on the keyboard. This will bring up a drop-down menu of all the available arguments for that function.
Key terms
package
An add-on for R that contains new functions someone created to help you. It’s like an App for R.
library
The name of the folder that stores all your packages, and the function used to load a package.
What package does read_csv() come from?
dinosaur
get_data
readr
dplyr
tidyr
readr
Great job! You’re fur real.
How would you load the package catfinder?
catfinder()
library(“catfinder”)
load(“catfinder”)
package(“catfinder”)
library("catfinder")
Excellent! Keep the streak going.
On your way home from work you find a wet and mopey cat sitting on your front stoop. Oh my! It looks so sad, maybe you can bring it inside and give it a treat.
When you pick up the cat you notice a collar with a number on it.
HINT: Your cat’s tag number is the same as the last digit of your birthday. Weird coincidence right?
This gives you an idea! Your friend recently mentioned a list that has all the missing cats people have reported in the city. Maybe you can use the tag number to help find the lost kitty’s home.
Let’s get our script ready for some detective work. Since we bill by the hour we’re going to be very thorough with our documentation.
Script comments
Add a brief description of your new case to the top of the script using the comment symbol #.
# This file documents my search for the home of a lost cat I found this evening.
# What I know so far
animal <- "cat"
tag_num <- 444
pet_detective <- "Agent Cooper"
# Next steps
# 1. Find the missing cat database.
# 2. Load the cat data.
#
Now let’s find ourselves some clues.
Thanks to your friend, you can download a list of all the missing cats in your town. Follow the steps below to read the cat data into R.
Open this URL in your browser https://github.com/MPCA-air/RCamp/tree/master/data, and find the file missing_cat_list.csv. To download, right-click on the file and select Save Link As…. Navigate to your project folder and save the file into a folder named data\. Don’t have a data folder? Go ahead and create a new one.
Now you can paste the file’s location into the read_csv() function. Here’s a code snippet to get you started.
library("readr")
# Replace the `...` with the name of the file
all_cats_file <- "data/..."
# Replace the `...` with 'all_cats_file'
all_cats <- read_csv(...)
library("readr")
all_cats_file <- "X:/Agency_Files/Outcomes/Risk_Eval_Air_Mod/_Air_Risk_Evaluation/R/R_Camp/Student Folder/missing_cat_list.csv"
all_cats <- read_csv(all_cats_file)
## Parsed with column specification:
## cols(
## name = col_character(),
## color = col_character(),
## age = col_integer(),
## gender = col_character(),
## country = col_character(),
## grumpy = col_integer(),
## fearful_of_people = col_integer(),
## playful = col_integer(),
## friendly_to_people = col_integer(),
## clumsy = col_integer(),
## greedy = col_integer(),
## owner_phone = col_character()
## )
In Windows there’s a handy trick to copy the path to a file on your computer:
File paths in R use forward slashes (/). In Windows you’ll need to switch backslashes (\) to forward slashes (/). A file on your desktop located at C:\Desktop\file.csv would be read into R as read_csv("C:/Desktop/file.csv").
One trick to quickly fix a long path name is to press CTRL + F. Then you can use the search tool to find any (\) and replace it with (/).
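If you’d rather let R do the swapping, here is one way to sketch it with the base R function gsub(). One wrinkle to know about: inside quotes in R code, a single backslash has to be typed as two.

```r
# In R code a backslash inside quotes must be doubled,
# so this string holds the Windows path C:\Desktop\file.csv
windows_path <- "C:\\Desktop\\file.csv"

# Swap every backslash for a forward slash.
# fixed = TRUE treats "\\" as a plain backslash, not a search pattern.
r_path <- gsub("\\", "/", windows_path, fixed = TRUE)
r_path
```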
NOTE: It’s good practice to load the packages you will need at the top of your script. You will need to run these lines every time you open R or switch projects. If you forget, you’re likely to see this error message.
read_csv(my_cat_file)
## Error in read_csv(my_cat_file): could not find function "read_csv"
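That error means R doesn’t currently know about read_csv(). Either the package was never installed, or it was installed but not loaded this session. Here is a rough sketch of one way to tell the two apart, using the base R function requireNamespace(), which returns TRUE if the package is installed and FALSE if it is not.

```r
# Check whether readr is installed
has_readr <- requireNamespace("readr", quietly = TRUE)

if (has_readr) {
  library("readr")   # load the package so read_csv() can be found
} else {
  message("readr is not installed yet. Run install.packages(\"readr\") first.")
}
```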
Look in the upper right hand window of RStudio. This is the Environment window that shows all of the data frames you have created this session. You can see the names of the data frames, the number of observations (rows), and the number of variables (columns). This is helpful if you ever need to count your data (Hint Hint).
To see all the cat data click on the table called all_cats in the Environment window.
“There’s over 2,000 missing cats!”
That’s a lot of cats. And there’s more bad news. There’s no column for tag numbers. So which cat is yours? Let’s see… Does your cat look more like a Mr. Buttons or a Furry Potter? That’s a tough call. We need more information!
Let’s explore the missing cat data a bit more. Hopefully we can find some way to pick out your cat.
What is the second column in the all_cats data frame?
age
gender
nametag
color
fur
color
You sure aren’t kitten around! Great work!
You’ve unlocked a new package!
The dplyr package is the go-to tool for exploring, re-arranging, and summarizing data.
Use install.packages("dplyr") to add dplyr to your detective library.
Quick stretch break
Stand up. Move around.
Your analysis toolbox: The key dplyr functions
Function | Returns |
---|---|
select() | select individual columns to drop, keep, or reorder |
arrange() | reorder or sort rows by value of a column |
mutate() | Add new columns or update existing columns |
filter() | pick a subset of rows by the value of a column |
group_by() | split data into groups by values in a column |
summarize() | calculate a single summary row for the entire table |
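Here is a quick taste of mutate(), group_by(), and summarize() working together. This is only a sketch on a tiny made-up table (mini_cats is not the real missing cat data), and it assumes you have already installed dplyr. When summarize() follows group_by(), you get one summary row per group instead of one row for the whole table.

```r
library("dplyr")

# A tiny made-up table, not the real missing cat data
mini_cats <- data.frame(
  name    = c("Biscuit", "Pickle", "Waffle", "Mango"),
  country = c("Peru", "Peru", "Spain", "Spain"),
  age     = c(2, 6, 10, 4)
)

# mutate(): add a column with each cat's age in months
mini_cats <- mutate(mini_cats, age_months = age * 12)

# group_by(): split the rows into one group per country
by_country <- group_by(mini_cats, country)

# summarize(): collapse each group into a single summary row
country_summary <- summarize(by_country,
                             n_cats   = n(),
                             mean_age = mean(age))
country_summary
```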
select()
Use the select() function to drop a column you no longer need, to select a few columns to create a new sub-table, or rearrange the order of your table’s columns.
Drop a single column with a minus sign
library("dplyr")
# Drop the grumpy column
select(all_cats, -grumpy)
## # A tibble: 2,785 x 11
## name color age gender country fearful_of_peop~ playful
## <chr> <chr> <int> <chr> <chr> <int> <int>
## 1 Latte grey and brown 3 Male Austral~ 2 5
## 2 Othello butterscotch 15 Female Thailand 1 7
## 3 Hershey tabby 5 Male Peru 1 6
## 4 Aboulomri brown 3 Female UK 1 2
## 5 Scratches white 11 Male Canada 1 5
## 6 Tuna salt and pepp~ NA Male Germany 1 7
## 7 Zonker white 4 Female China 5 5
## 8 Ovidius brown 10 Male Spain 2 5
## 9 Delia tabby 2 Female Canada 4 5
## 10 Hotdog salt and pepp~ 8 Female Spain 4 6
## # ... with 2,775 more rows, and 4 more variables:
## # friendly_to_people <int>, clumsy <int>, greedy <int>,
## # owner_phone <chr>
Drop multiple columns with -c(col_1, col_2)
# Drop the grumpy and greedy columns
select(all_cats, -c(grumpy, greedy))
## # A tibble: 2,785 x 10
## name color age gender country fearful_of_peop~ playful
## <chr> <chr> <int> <chr> <chr> <int> <int>
## 1 Latte grey and brown 3 Male Austral~ 2 5
## 2 Othello butterscotch 15 Female Thailand 1 7
## 3 Hershey tabby 5 Male Peru 1 6
## 4 Aboulomri brown 3 Female UK 1 2
## 5 Scratches white 11 Male Canada 1 5
## 6 Tuna salt and pepp~ NA Male Germany 1 7
## 7 Zonker white 4 Female China 5 5
## 8 Ovidius brown 10 Male Spain 2 5
## 9 Delia tabby 2 Female Canada 4 5
## 10 Hotdog salt and pepp~ 8 Female Spain 4 6
## # ... with 2,775 more rows, and 3 more variables:
## # friendly_to_people <int>, clumsy <int>, owner_phone <chr>
Keep only three columns
# Keep the name, grumpy and greedy columns
select(all_cats, c(name, grumpy, greedy))
## # A tibble: 2,785 x 3
## name grumpy greedy
## <chr> <int> <int>
## 1 Latte 2 2
## 2 Othello 2 2
## 3 Hershey 1 3
## 4 Aboulomri 7 7
## 5 Scratches 3 7
## 6 Tuna 1 2
## 7 Zonker 3 3
## 8 Ovidius 3 6
## 9 Delia 5 6
## 10 Hotdog 3 3
## # ... with 2,775 more rows
Rearrange: Move the age and country columns directly after name
# Move the `age` and `country` columns directly after `name`
# Leave `everything()` else in same order
select(all_cats, name, age, country, everything())
## # A tibble: 2,785 x 12
## name age country color gender grumpy fearful_of_peop~ playful
## <chr> <int> <chr> <chr> <chr> <int> <int> <int>
## 1 Latte 3 Austral~ grey and~ Male 2 2 5
## 2 Othello 15 Thailand buttersc~ Female 2 1 7
## 3 Hershey 5 Peru tabby Male 1 1 6
## 4 Aboulo~ 3 UK brown Female 7 1 2
## 5 Scratc~ 11 Canada white Male 3 1 5
## 6 Tuna NA Germany salt and~ Male 1 1 7
## 7 Zonker 4 China white Female 3 5 5
## 8 Ovidius 10 Spain brown Male 3 2 5
## 9 Delia 2 Canada tabby Female 5 4 5
## 10 Hotdog 8 Spain salt and~ Female 3 4 6
## # ... with 2,775 more rows, and 4 more variables:
## # friendly_to_people <int>, clumsy <int>, greedy <int>,
## # owner_phone <chr>
arrange()
Use the arrange() function to sort data based on one or more of the columns in the table. Let’s use arrange() to answer some questions about the missing cats.
library("dplyr")
# Sort by the grumpy column
all_cats <- arrange(all_cats, grumpy)
# Enter the table name to view the ordered data
all_cats
## # A tibble: 2,785 x 12
## name color age gender country grumpy fearful_of_peop~ playful
## <chr> <chr> <int> <chr> <chr> <int> <int> <int>
## 1 Hershey tabby 5 Male Peru 1 1 6
## 2 Tuna salt an~ NA Male Germany 1 1 7
## 3 Ludwig white 15 Male Peru 1 3 6
## 4 Beastie peaches~ 13 Male Thailand 1 2 5
## 5 Hayley grey an~ 1 Female USA 1 6 7
## 6 Fuzzina~ peaches~ 5 Male New Zea~ 1 2 6
## 7 Desiree black 9 Male Thailand 1 1 7
## 8 Zoloft butters~ 20 Female UK 1 2 7
## 9 Dinky salt an~ 4 Male Peru 1 1 7
## 10 Butters~ brown 6 Male Thailand 1 1 7
## # ... with 2,775 more rows, and 4 more variables:
## # friendly_to_people <int>, clumsy <int>, greedy <int>,
## # owner_phone <chr>
# To sort in descending order (highest to lowest)
# Add desc() around the column name
all_cats <- arrange(all_cats, desc(grumpy))
# View the top 6 rows using head()
head(all_cats)
## # A tibble: 6 x 12
## name color age gender country grumpy fearful_of_peop~ playful
## <chr> <chr> <int> <chr> <chr> <int> <int> <int>
## 1 Aboulo~ brown 3 Female UK 7 1 2
## 2 Abigail grey and b~ 5 Male Canada 7 3 7
## 3 Broody brown 17 Female China 7 1 7
## 4 Albert butterscot~ 15 Male Canada 7 7 1
## 5 Barbra striped 11 Female Thaila~ 7 1 6
## 6 Alfred peaches an~ 14 Male Peru 7 7 5
## # ... with 4 more variables: friendly_to_people <int>, clumsy <int>,
## # greedy <int>, owner_phone <chr>
# Find the Aussie cats
# Find the ancient Aussie cats
When you save an arranged data table it maintains its order. This is perfect for sending people a quick Top 10 list of pollutants or sites.
filter()
The filter() function creates a subset of the data based on the value of one or more columns. Use filter() to answer the questions below.
What are the names of the cats that are only 1 year old?
filter(all_cats, age == 1)
We use a == (double equals sign) for comparing values. A == makes the comparison “is it equal to?” and returns a TRUE or FALSE answer. So the code above returns all the rows where the condition age == 1 is TRUE.
A single equals sign (=) is used within functions to set options, for example read_csv(file = "mydata.csv"). Don’t worry too much. If you use the wrong symbol R is often helpful and will let you know which one is needed.
To use filtering effectively you’ll want to know how to select observations using various comparison operators.
Key comparison operators
Symbol | Comparison |
---|---|
> | greater than |
>= | greater than or equal to |
< | less than |
<= | less than or equal to |
== | equal to |
!= | not equal to |
%in% | value is in a list |
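To get a feel for these operators, try them on a plain vector first. Each comparison checks every value and answers TRUE or FALSE, and filter() keeps the rows where the answer is TRUE. The ages and colors below are made up for this sketch.

```r
# A made-up vector of cat ages
ages <- c(1, 5, 12, 20)

# Each comparison checks every value and answers TRUE or FALSE
ages > 5     # FALSE FALSE  TRUE  TRUE
ages >= 5    # FALSE  TRUE  TRUE  TRUE
ages == 1    #  TRUE FALSE FALSE FALSE
ages != 1    # FALSE  TRUE  TRUE  TRUE

# %in% asks whether each value appears anywhere in a list of options
colors <- c("tabby", "white", "striped")
colors %in% c("orange", "striped")   # FALSE FALSE  TRUE
```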
What are the names of the cats that have a grumpy score greater than 6?
filter(all_cats, grumpy ...)
What are the names of the cats that are either striped or orange in color?
filter(all_cats, color %in% c("orange", "striped"))
What is that c() thing all about again?
You can put multiple values inside c() to make a vector of items. Each item in the vector is separated by a comma. Let’s create a short vector of your favorite colors.
# This is an example vector
my_fave_colors <- c("green", "orange", "cornflower")
my_fave_colors
## [1] "green" "orange" "cornflower"
Create a small table called uk_oldies that only has cats that are from the UK and older than 20.
Work with your neighbor to create the table.
Are there more cats with the color striped or butterscotch?
more butterscotch
more striped
same amount in both groups
Hint: Start by creating a new table called striped using the code filter(all_cats, color == ...)
more striped
Feline fine after that exercise! Great job!
Ask your neighbor a question about the cat data that requires the %in% operator.
Try to answer your neighbor’s question.
Team up with your neighbor to think of a summary question to ask the table next to you about the missing cat data. Ask them the question and they’ll ask you a question about the cats in return.
Work with your neighbor to find the answer.
Now that you are more familiar with the data, maybe you can use the personality traits of the missing cats to find which one is yours.
If only you knew the personality of your cat.
Spend some time getting to know your cat. Is it already hiding from you under the couch? Does it seem Australian? Is it a young cat? Once you’ve gotten to know your cat, move on to the next section.
Congratulations! You’ve unlocked your cat’s personality.
Thanks to all the quality time you spent with your cat, you were able to collect some high quality data describing your cat’s personality. You can use your cat’s tag number to find the CSV file for your cat. Download your file from the Cat personality files.
To download, right-click on your cat’s file and select Save Link As…. Navigate to your project folder and save the file into a folder named data\.
Now you can read in the file with the personality traits of your cat.
library("readr")
# Replace the `##` with your tag number
my_cat <- read_csv("data/foundcat_tag##.csv")
Take a minute to explore your cat’s personality traits. Can you guess its name?
Almost there! Now you can use filter() to check if the all_cats table has a cat with a personality similar to your cat’s. Let’s hope so.
Use filter() to find your cat in the big all_cats table.
# Replace the `?` marks with your cat's personality traits
the_one <- filter(all_cats,
greedy == ?,
playful == ?,
age ...)
Hint: You can also use the cat’s age range to help narrow down the number of potential cats.
You found your cat a home and you’ve earned yourself a detective badge.
Meet the R cats
Let’s introduce our rescued cats.
Possible topics:
- Your found cat’s name and best personality trait.
- Your name or detective name.
- Something you want to learn to do with data?
- Are you a Cat person? Dog person? Lizard person? Fish person? Plant person?
RECAP:
What packages have we added to our library?
What new functions have we learned?
On the front of your sticky note answer one of these:
On the back:
We will compile the questions and send out answers before next class. If you think of something later, please e-mail us any questions you have. If you’re uncertain about something I guarantee someone else is as well. So help a friend, and ask a question.
When you close RStudio it will ask you about saving your workspace and other files. This can get tiresome after a while. Follow the steps below to set these options once and for all.
Turn off “Save Workspace”
Global Options...