A vector is a sequence of data elements of the same basic type. The parts that make up a vector are called components or elements.
vec1 <- c(1,4,6,8,10)
vec1
## [1] 1 4 6 8 10
A vector such as vec1 is explicitly constructed by the concatenation function c().
vec1[5]
## [1] 10
Elements in vectors can be addressed by standard [i] indexing. R counts positions starting from 1.
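As a small sketch of the same idea, the brackets also accept a vector of positions or a negative position. The name vec_demo is only an illustrative stand-in that reuses the values of vec1; it is not part of the original example.

```r
# Illustrative vector (hypothetical name, same values as vec1 above)
vec_demo <- c(1, 4, 6, 8, 10)

# R counts from 1, so the first element sits at position 1
first <- vec_demo[1]

# A vector of positions returns several elements at once
second_and_fourth <- vec_demo[c(2, 4)]

# A negative position drops that element instead of selecting it
without_first <- vec_demo[-1]

print(first)
print(second_and_fourth)
print(without_first)
```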
vec1[3] <- 12
vec1
## [1] 1 4 12 8 10
One of the elements in the vector is replaced with a new number.
vec2 <- seq(from=0, to=1, by=0.25)
vec2
## [1] 0.00 0.25 0.50 0.75 1.00
This shows another useful way of creating a vector: the seq() or sequence function.
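For comparison, here is a minimal sketch of two other ways to build the same kind of sequence: the length.out argument of seq() and the colon operator. The variable names are made up for illustration.

```r
# by sets the step size between values
by_quarter <- seq(from = 0, to = 1, by = 0.25)

# length.out asks for a fixed number of evenly spaced values instead
five_values <- seq(from = 0, to = 1, length.out = 5)

# The colon operator is shorthand for a sequence of whole numbers
one_to_five <- 1:5

print(by_quarter)
print(five_values)
print(one_to_five)
```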
sum(vec1)
## [1] 35
Matrices are two-dimensional vectors.
A matrix looks like this:
mat <- matrix(data=c(9,2,3,4,5,6), ncol=3)
mat
##      [,1] [,2] [,3]
## [1,]    9    3    5
## [2,]    2    4    6
The argument data specifies which numbers should be in the matrix. Use either ncol to specify the number of columns or nrow to specify the number of rows.
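The output above shows that the numbers are placed column by column. The sketch below, with made-up variable names, compares that default with the byrow argument of matrix(), and shows that nrow = 2 gives the same result as ncol = 3 for six values.

```r
values <- c(9, 2, 3, 4, 5, 6)

# Default: the values fill the matrix column by column
by_column <- matrix(data = values, ncol = 3)

# byrow = TRUE fills the matrix row by row instead
by_row <- matrix(data = values, ncol = 3, byrow = TRUE)

# With six values, nrow = 2 implies three columns
same_shape <- matrix(data = values, nrow = 2)

print(by_column)
print(by_row)
print(identical(by_column, same_shape))
```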
Matrix operations are similar to vector operations.
mat[1,2]
## [1] 3
Elements of a matrix can be addressed in the usual way, with the row number first and the column number second.
mat[2,1]
## [1] 2
When you want to select a whole row, leave the spot for the column number empty. To select a whole column, leave the spot for the row number empty.
mat[,3]
## [1] 5 6
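To put row selection, column selection, and vector-style operations side by side, here is a short sketch. mat_demo is a hypothetical copy of the matrix used above.

```r
# Illustrative copy of the matrix from the example above
mat_demo <- matrix(data = c(9, 2, 3, 4, 5, 6), ncol = 3)

first_row <- mat_demo[1, ]   # whole first row: column spot left empty
third_col <- mat_demo[, 3]   # whole third column: row spot left empty

# As with vectors, arithmetic is applied to every element
doubled <- mat_demo * 2

# sum() adds up every element of the matrix
total <- sum(mat_demo)

print(first_row)
print(third_col)
print(doubled)
print(total)
```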
If you’re used to working with spreadsheets, then data frames will make the most sense to you in R.
This is how to create a data frame from vectors. You don’t have to fully understand this at this point; the data you’ll be working with will come pre-structured if you’re importing spreadsheets.
patientID <- c(111, 208, 113, 408)
age <- c(25, 34, 28, 52)
sex <- c(1,2,1,1)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c(1,2,3,1)
patientdata <- data.frame(patientID, age, sex, diabetes, status)
patientdata
##   patientID age sex diabetes status
## 1       111  25   1    Type1      1
## 2       208  34   2    Type2      2
## 3       113  28   1    Type1      3
## 4       408  52   1    Type1      1
But this is what’s happening: a set of vectors is created, and a function called data.frame() joins them together into a data frame structure, one column per vector.
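A minimal sketch of the same step with only two columns is shown below. The names patient_id, patients_demo and result are illustrative. It also shows, as an aside, that data.frame() raises an error when the vectors cannot be lined up into rows of equal length.

```r
patient_id <- c(111, 208, 113, 408)
age <- c(25, 34, 28, 52)

# Naming the arguments sets the column names explicitly
patients_demo <- data.frame(id = patient_id, age = age)

print(names(patients_demo))
print(nrow(patients_demo))

# Vectors of length 3 and 2 cannot form complete rows, so this fails;
# tryCatch() turns the error into a plain string for inspection
result <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) "error"
)
print(result)
```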
How to pull elements from a data frame:
# A colon (:) means "through"
patientdata[1:2]
##   patientID age
## 1       111  25
## 2       208  34
## 3       113  28
## 4       408  52
# So 1:2 means 1 through 2
patientdata[c("diabetes", "status")]
##   diabetes status
## 1    Type1      1
## 2    Type2      2
## 3    Type1      3
## 4    Type1      1
patientdata$age
## [1] 25 34 28 52
patientdata[c(1,3),1:2]
##   patientID age
## 1       111  25
## 3       113  28
patientdata[2:3, 1:2]
##   patientID age
## 2       208  34
## 3       113  28
You can reference a column with patientdata$age, and you can also refer to it by its index. In this instance the index is 2, so patientdata[,2] is the equivalent. If you only wanted the third row, it would look like patientdata[3,]. Think of it as data[Row, Column]. I remember it as data[rocks], as in data[Ro,Cks].
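The data[Row, Column] pattern can be sketched as follows. patients_demo is a hypothetical, cut-down version of patientdata with only its first two columns.

```r
# Illustrative two-column version of patientdata
patients_demo <- data.frame(
  patientID = c(111, 208, 113, 408),
  age = c(25, 34, 28, 52)
)

third_row <- patients_demo[3, ]      # row 3, all columns
age_by_index <- patients_demo[, 2]   # all rows, column 2
age_by_name <- patients_demo$age     # the same column, by name
one_cell <- patients_demo[3, 2]      # row 3, column 2

print(third_row)
print(identical(age_by_index, age_by_name))
print(one_cell)
```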
Instead of using mean(patientdata[,2]), you can select the column age from the patientdata data frame with the $ sign.
mean(patientdata$age)
## [1] 34.75
Here’s an alternative way to refer to the age column of the patientdata data frame. But you will rarely use this method.
mean(patientdata[["age"]])
## [1] 34.75
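One situation where the double-bracket form can come in handy is when the column name is stored in a variable, because the $ sign takes the name after it literally. The sketch below uses hypothetical names (patients_demo, column_name) to show the difference.

```r
# Illustrative one-column data frame
patients_demo <- data.frame(age = c(25, 34, 28, 52))

# The column name is held in a variable
column_name <- "age"

# [[ ]] evaluates the variable and finds the age column
mean_age <- mean(patients_demo[[column_name]])

# $ looks for a column literally called column_name, which does not exist
missing_col <- patients_demo$column_name

print(mean_age)
print(is.null(missing_col))
```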
Another basic structure in R is a list.
The main advantage of lists is that the “columns” (they’re not really ordered in columns any more, but are more of a collection of vectors) don’t have to be of the same length, unlike matrices and data frames.
This is similar to how JSON files are structured.
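As a quick sketch of that point, the hypothetical list below holds three elements of different lengths and types, and lengths() reports the length of each one.

```r
# Illustrative list: elements differ in both length and type
list_demo <- list(
  title = "My First List",
  ages = c(25, 26, 18, 39),
  words = c("one", "two", "three")
)

# lengths() returns the length of every element
element_lengths <- lengths(list_demo)

# sapply() with class shows the type of every element
element_classes <- sapply(list_demo, class)

print(element_lengths)
print(element_classes)
```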
g <- "My First List"
h <- c(25, 26, 18, 39)
# The line below creates a matrix with 5 rows from the numbers 1 through 10 (the ":" means "through")
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")
mylist <- list(title = g, ages = h, j, k)
This is how a list would appear in the work space
names(mylist)
## [1] "title" "ages" "" ""
How to find out what’s in the list. The two empty strings stand for the elements j and k, which were added without names.
mylist[[2]]
## [1] 25 26 18 39
mylist[["ages"]][[1]]
## [1] 25
The code above extracts data from the list
mylist$ages + 10
## [1] 35 36 28 49
How to refer to and use the numbers in the example list
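The difference between single brackets, double brackets, and the $ sign on a list can be sketched like this. list_demo is a hypothetical two-element version of mylist.

```r
# Illustrative two-element list
list_demo <- list(title = "My First List", ages = c(25, 26, 18, 39))

as_list <- list_demo["ages"]       # single brackets: still a list, of length 1
as_vector <- list_demo[["ages"]]   # double brackets: the numeric vector itself
via_dollar <- list_demo$ages       # $ with the full name: same as [["ages"]]

print(class(as_list))
print(class(as_vector))

# Arithmetic works on the extracted vector, not on the one-element list
plus_ten <- as_vector + 10
print(plus_ten)
```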
Let’s start with the sample_df dataframe below.
# Run the lines of code below
sample_df <- data.frame(id=c(1001,1002,1003,1004), name=c("Steve", "Pam", "Jim", "Dwight"), age=c(26, 65, 15, 7), race=c("White", "Black", "White", "Hispanic"))
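# In R versions before 4.0, data.frame() turns text columns into factors by default,
# so name is converted back to character here. From R 4.0 on, text columns stay
# character, and race will show as chr rather than Factor in the str() output below.
sample_df$name <- as.character(sample_df$name)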
sample_df
##     id   name age     race
## 1 1001  Steve  26    White
## 2 1002    Pam  65    Black
## 3 1003    Jim  15    White
## 4 1004 Dwight   7 Hispanic
length(x)
- Find out how many things there are in an object or array
length(sample_df$name)
## [1] 4
nchar(x)
- If x is a string, finds out how many characters there are
sample_df$name[1]
## [1] "Steve"
nchar(sample_df$name[1])
## [1] 5
dim(x)
- Gives the dimensions of x
dim(sample_df)
## [1] 4 4
ncol(x)
- Counts the number of columns
ncol(sample_df)
## [1] 4
nrow(x)
- Returns the number of rows of x
nrow(sample_df)
## [1] 4
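Because sample_df happens to have four rows and four columns, the outputs above all look alike. The sketch below uses a hypothetical data frame with four rows and two columns to show how dim(), nrow(), ncol() and length() relate.

```r
# Illustrative data frame: 4 rows, 2 columns
df_demo <- data.frame(id = c(1001, 1002, 1003, 1004), age = c(26, 65, 15, 7))

# dim() returns the number of rows followed by the number of columns
dims <- dim(df_demo)
print(dims)

# So dim() agrees with nrow() and ncol() taken together
matches <- identical(dims, c(nrow(df_demo), ncol(df_demo)))
print(matches)

# For a data frame, length() counts columns, not rows
n_cols <- length(df_demo)
print(n_cols)
```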
str(x)
- Returns the structure of x
str(sample_df)
## 'data.frame': 4 obs. of 4 variables:
## $ id : num 1001 1002 1003 1004
## $ name: chr "Steve" "Pam" "Jim" "Dwight"
## $ age : num 26 65 15 7
## $ race: Factor w/ 3 levels "Black","Hispanic",..: 3 1 3 2
summary(x)
- Summarizes the object as understood by R
summary(sample_df)
##        id           name                age            race
##  Min.   :1001   Length:4           Min.   : 7.00   Black   :1
##  1st Qu.:1002   Class :character   1st Qu.:13.00   Hispanic:1
##  Median :1002   Mode  :character   Median :20.50   White   :2
##  Mean   :1002                      Mean   :28.25
##  3rd Qu.:1003                      3rd Qu.:35.75
##  Max.   :1004                      Max.   :65.00
View(x)
- A command to open the object to browse in RStudio
View(sample_df)
rm(x)
- Removes x
rm(sample_df)
sample_df
## Error in eval(expr, envir, enclos): object 'sample_df' not found
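If you would rather check whether an object is still around without triggering an error, one option is exists(), which takes the object's name as a string. This is a small sketch with a throwaway object called temp_df, a name chosen only for illustration.

```r
# Throwaway object used only for this illustration
temp_df <- data.frame(id = c(1001, 1002))

# exists() takes the object's name as a string
before <- exists("temp_df")

# rm() removes the object from the workspace
rm(temp_df)

after <- exists("temp_df")

print(before)
print(after)
```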
Challenge yourself with these exercises so you’ll retain the knowledge of this section.
Instructions on how to run the exercise app are in the intro page to this section.
© Copyright 2018, Andrew Ba Tran