A vector is a sequence of data elements of the same basic type. The parts that make up a vector are called components or elements.
vec1 <- c(1,4,6,8,10)
vec1
## [1] 1 4 6 8 10
The vector vec1 is explicitly constructed with the concatenation function c().
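Because every element of a vector must share one basic type, c() coerces mixed inputs to a common type. A minimal sketch; the names nums and mixed are illustrative only:

```r
# All elements of a vector share one type, so c() coerces mixed input
nums  <- c(1, 4, 6)          # numeric vector
mixed <- c(1, "a", TRUE)     # the number and the logical are converted to character
class(nums)    # "numeric"
class(mixed)   # "character"
print(mixed)   # "1" "a" "TRUE"
```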
vec1[5]
## [1] 10
Elements of a vector are addressed with standard [i] indexing. Indexing in R starts at 1, so vec1[5] returns the fifth element.
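The [i] index does not have to be a single number. A short sketch of other index forms on the original vec1:

```r
vec1 <- c(1, 4, 6, 8, 10)
vec1[c(1, 3)]     # several positions at once: 1 6
vec1[-1]          # a negative index drops that element: 4 6 8 10
vec1[vec1 > 5]    # a logical index keeps the matching elements: 6 8 10
vec1[6]           # beyond the end of the vector: NA
```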
vec1[3] <- 12
vec1
## [1] 1 4 12 8 10
Assigning to an indexed position replaces that element: here the third element, 6, becomes 12.
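Replacement also works for several positions at once, and assigning past the end extends the vector. A minimal sketch using a copy named v (an illustrative name):

```r
v <- c(1, 4, 6, 8, 10)
v[3] <- 12             # replace a single element
v[c(1, 2)] <- c(0, 0)  # replace several elements at once
v[7] <- 99             # assigning past the end extends the vector; position 6 becomes NA
print(v)               # 0 0 12 8 10 NA 99
```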
vec2 <- seq(from=0, to=1, by=0.25)
vec2
## [1] 0.00 0.25 0.50 0.75 1.00
This shows another useful way of creating a vector: the seq() (sequence) function, which here counts from 0 to 1 in steps of 0.25.
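seq() can also be told how many values to produce instead of the step size, and the colon operator is a shorthand for steps of 1. A sketch with illustrative variable names:

```r
by_step   <- seq(from = 0, to = 1, by = 0.25)       # fixed step size
by_length <- seq(from = 0, to = 1, length.out = 5)  # fixed number of values
all.equal(by_step, by_length)  # TRUE: both are 0.00 0.25 0.50 0.75 1.00
ints <- 1:5                    # colon operator: integer sequence with step 1
```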
sum(vec1)
## [1] 35
Functions such as sum() operate on a whole vector: sum(vec1) adds up all five elements, 1 + 4 + 12 + 8 + 10 = 35.
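A few other common functions that take a whole vector, applied to vec1 as it stands after the replacement above:

```r
vec1 <- c(1, 4, 12, 8, 10)
sum(vec1)     # 35
length(vec1)  # 5
mean(vec1)    # 7, i.e. 35 / 5
max(vec1)     # 12
```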
vec1 + vec2
## [1] 1.00 4.25 12.50 8.75 11.00
Adding two vectors of the same length works element by element: the first elements of both vectors are summed, then the second elements, and so on. The result is a new vector of the same length, here 5.
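The other arithmetic operators are element-wise too. When the lengths differ, R recycles the shorter vector; the last line below shows that behaviour with illustrative values:

```r
vec1 <- c(1, 4, 12, 8, 10)
vec2 <- seq(from = 0, to = 1, by = 0.25)
prod12   <- vec1 * vec2              # element-wise product: 0 1 6 6 10
doubled  <- vec1 * 2                 # a single number applies to every element: 2 8 24 16 20
recycled <- c(1, 2, 3, 4) + c(10, 20) # shorter vector is recycled: 11 22 13 24
```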
A matrix is a two-dimensional vector: like a vector, all of its elements have the same type, but they are arranged in rows and columns. It looks like this:
mat <- matrix(data=c(9,2,3,4,5,6), ncol=3)
mat
## [,1] [,2] [,3]
## [1,] 9 3 5
## [2,] 2 4 6
The argument data specifies which numbers go into the matrix. Use either ncol to specify the number of columns or nrow to specify the number of rows. By default, R fills the matrix column by column, which is why 9 and 2 end up in the first column.
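The sketch below checks that nrow = 2 gives the same matrix as ncol = 3, and shows the byrow argument for filling row by row instead. The object names are illustrative:

```r
by_col  <- matrix(data = c(9, 2, 3, 4, 5, 6), ncol = 3)  # same as mat above
by_nrow <- matrix(data = c(9, 2, 3, 4, 5, 6), nrow = 2)  # same shape via nrow
identical(by_col, by_nrow)  # TRUE
by_row <- matrix(data = c(9, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE)  # fill row by row
by_row[1, ]   # 9 2 3
dim(by_row)   # 2 3
```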
Matrix operations are similar to vector operations.
mat[1,2]
## [1] 3
Elements of a matrix are addressed with [row, column] indexing: mat[1,2] is the element in row 1, column 2.
mat[2,1]
## [1] 2
To select a whole row, leave the column index empty; to select a whole column, leave the row index empty.
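Applied to the mat from above, empty indices select whole rows or columns, and a range selects a block:

```r
mat <- matrix(data = c(9, 2, 3, 4, 5, 6), ncol = 3)
row2 <- mat[2, ]    # whole second row: 2 4 6
col1 <- mat[, 1]    # whole first column: 9 2
sub  <- mat[, 2:3]  # columns 2 and 3 as a 2 x 2 matrix
```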
mean(mat)
## [1] 4.833333
Functions such as mean() also accept a matrix as an argument: mean(mat) averages all six elements, 29 / 6 = 4.833333.
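Besides whole-matrix functions, base R has row- and column-wise summaries, and arithmetic stays element-wise. A short sketch on the same mat:

```r
mat <- matrix(data = c(9, 2, 3, 4, 5, 6), ncol = 3)
sum(mat)       # 29, over all elements
rowSums(mat)   # 17 12, one value per row
colMeans(mat)  # 5.5 3.5 5.5, one value per column
mat * 2        # element-wise, as with vectors
```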
If you’re used to working with spreadsheets, data frames will make the most sense to you in R. A data frame is like a matrix with names above the columns as headers, which means you can refer to a column by name without knowing its position.
t <- data.frame(x=c(11,12,14), y=c(19,20,21), z=c(10,9,7))
t
## x y z
## 1 11 19 10
## 2 12 20 9
## 3 14 21 7
A typical data frame built from vectors. The columns have the names x, y, and z.
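Column names can be listed with names(), and assigning to a new name adds a column. The column name total is illustrative:

```r
t <- data.frame(x = c(11, 12, 14), y = c(19, 20, 21), z = c(10, 9, 7))
names(t)                     # "x" "y" "z"
dim(t)                       # 3 rows, 3 columns
t$total <- t$x + t$y + t$z   # a new column, computed row by row
t$total                      # 40 41 42
```

Naming the data frame t still works because R looks up t() as a function separately when it is called, but a name such as df avoids confusion with the transpose function.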
mean(t$z)
## [1] 8.666667
Instead of using mean(t[,3]) as you would with a matrix, you can select the column z from the t data frame with the $ sign.
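The sketch below checks that selecting by name and by position give the same column, and uses a column in a condition to select rows:

```r
t <- data.frame(x = c(11, 12, 14), y = c(19, 20, 21), z = c(10, 9, 7))
identical(t$z, t[, 3])    # TRUE: by name or by position
identical(t$z, t[, "z"])  # TRUE: matrix-style indexing with a column name
big_z <- t[t$z > 8, ]     # rows where z is greater than 8 (rows 1 and 2)
```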
mean(t[["z"]])
## [1] 8.666667
Here’s an alternative way to refer to the z column of the t data frame. You will rarely need this form; it is mainly useful when the column name is stored in a variable.
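A minimal sketch of that case, with an illustrative variable col holding the column name:

```r
t <- data.frame(x = c(11, 12, 14), y = c(19, 20, 21), z = c(10, 9, 7))
col <- "z"        # column name held in a variable
mean(t[[col]])    # 8.666667, same as mean(t$z)
is.null(t$col)    # TRUE: $ looks for a column literally called "col"
```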
Another basic structure in R is the list. The main advantage of lists is that their components (they are not really ordered in columns any more, but are more of a collection of vectors) do not have to be the same length or even the same type, unlike the columns of matrices and data frames. This is similar to how JSON files are structured.
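A small list whose components differ in both length and type; the record and its fields are hypothetical:

```r
rec <- list(
  name     = "station A",       # character, length 1
  readings = c(2.5, 3.1, 2.9),  # numeric, length 3
  valid    = TRUE               # logical, length 1
)
sapply(rec, length)  # name 1, readings 3, valid 1
sapply(rec, class)   # "character" "numeric" "logical"
```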
L <- list(one=1, two=c(1,2), five=seq(0,1, length.out=5))
L
## $one
## [1] 1
##
## $two
## [1] 1 2
##
## $five
## [1] 0.00 0.25 0.50 0.75 1.00
This is how a list is printed: each component appears under its name, prefixed with $.
names(L)
## [1] "one" "two" "five"
names() shows what is in the list: the names of its components.
L$five + 10
## [1] 10.00 10.25 10.50 10.75 11.00
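Components of a list are accessed with $, just like the columns of a data frame. The result is an ordinary vector, so it can be used in calculations; here 10 is added to every element of five.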
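Single and double brackets behave differently on lists, and new components can be added by name. A sketch on the example list L; the component name note is illustrative:

```r
L <- list(one = 1, two = c(1, 2), five = seq(0, 1, length.out = 5))
class(L["five"])    # "list": single brackets return a smaller list
class(L[["five"]])  # "numeric": double brackets return the component itself
L$two[2]            # 2: index into the extracted vector
L$note <- "added"   # assigning to a new name adds a component
names(L)            # "one" "two" "five" "note"
```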
Functions for working with objects
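length(x): number of elements
dim(x): dimensions (rows and columns) of a matrix or data frame
ncol(x): number of columns
nrow(x): number of rows
str(x): compact display of the structure of x
summary(x): summary statistics
View(x): open x in a spreadsheet-style data viewer
rm(x): remove x from the workspace
save(x, file="myfilename.rdata"): save x to a file
load(file="myfilename.rdata"): load the saved objects back into the workspace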
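A short round trip through several of these functions. It assumes a temporary file from tempfile() in place of myfilename.rdata, and obj is an illustrative name:

```r
obj <- matrix(1:6, nrow = 2)
length(obj)    # 6 elements
dim(obj)       # 2 3
str(obj)       # compact description of type and shape
f <- tempfile(fileext = ".rdata")  # temporary file instead of myfilename.rdata
save(obj, file = f)
rm(obj)
exists("obj")  # FALSE after rm()
load(file = f)
exists("obj")  # TRUE: load() restores the object under its original name
unlink(f)      # clean up the temporary file
```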
Data structures
scalar / vector / matrix / array / data frame / list
Vectors
One-dimensional arrays
a <- c(1, 2, 5, 3, 6, -2, 4)                  # numeric vector
b <- c("one", "two", "three")                 # character vector
c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)  # logical vector
Vectors (2)
Identifying elements
a <- c(1, 2, 5, 3, 6, -2, 4)
a[3]
## [1] 5
a[c(1, 3, 5)]
## [1] 1 5 6
a[2:6]
## [1] 2 5 3 6 -2
Data frame
A rectangular array of data. More general than a matrix: different columns can contain different modes of data (numeric, character, etc.). Similar to datasets in SAS, SPSS, and Stata.
mydata <- data.frame(col1, col2, …, coln)
Creating a data frame
patientID <- c(111, 208, 113, 408)
age <- c(25, 34, 28, 52)
sex <- c(1, 2, 1, 1)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c(1, 2, 3, 1)
patientdata <- data.frame(patientID, age, sex, diabetes, status)
patientdata
Specifying elements of a data frame
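patientdata[1:2]                       # first two columns
patientdata[c("diabetes", "status")]   # columns selected by name
patientdata$age                        # the age column as a vector
patientdata[c(1,3), 1:2]               # rows 1 and 3, columns 1 and 2
patientdata[2:3, 1:2]                  # rows 2 and 3, columns 1 and 2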
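Rows can also be chosen with a condition on a column, combined with columns chosen by name. A sketch that rebuilds patientdata from the vectors above; older is an illustrative name:

```r
patientID <- c(111, 208, 113, 408)
age <- c(25, 34, 28, 52)
sex <- c(1, 2, 1, 1)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c(1, 2, 3, 1)
patientdata <- data.frame(patientID, age, sex, diabetes, status)
older <- patientdata[patientdata$age > 30, ]  # rows where age exceeds 30
older$patientID                               # 208 408
patientdata[patientdata$diabetes == "Type1", c("patientID", "age")]  # condition on rows, columns by name
```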
Factors
Data structure specifying categorical (nominal) or ordered categorical (ordinal) variables
Tells R how to handle that variable in analyses
Very important and often misunderstood: any variable that is categorical or ordinal should usually be stored as a factor.
patientdata$sex <- factor(patientdata$sex, levels=c(1, 2), labels=c("Male", "Female"))
Associates 1 = Male and 2 = Female, and treats sex as a categorical variable in all analyses. What happens to sex = 5? Because 5 is not one of the listed levels, it becomes NA.
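A minimal sketch of the sex = 5 case, using a hypothetical vector with one unexpected code:

```r
sex <- c(1, 2, 1, 1, 5)   # hypothetical data with an unexpected code 5
sex_f <- factor(sex, levels = c(1, 2), labels = c("Male", "Female"))
sex_f           # Male Female Male Male <NA>
levels(sex_f)   # "Male" "Female"
table(sex_f)    # counts per level; the NA is not counted by default
```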
patientdata$status <- factor(patientdata$status, ordered=TRUE, levels=c(1, 2, 3), labels=c("Poor", "Improved", "Excellent"))
Associates 1 = Poor, 2 = Improved, 3 = Excellent, and treats status as an ordinal variable in all analyses.
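Because the factor is ordered, comparisons and max() respect the level order. A sketch on the status values from above; status_f is an illustrative name:

```r
status <- c(1, 2, 3, 1)
status_f <- factor(status, ordered = TRUE, levels = c(1, 2, 3),
                   labels = c("Poor", "Improved", "Excellent"))
status_f[1] < status_f[3]         # TRUE: Poor < Excellent
max(status_f)                     # Excellent
status_f[status_f >= "Improved"]  # Improved Excellent
```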
When do factors matter? They are important for statistical analysis, and the order of the levels also determines the order of the categories in charts.
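By default the levels of a character variable are sorted alphabetically; setting levels explicitly changes the order used by table() and by plotting functions such as barplot(). A sketch on the diabetes values:

```r
diabetes <- c("Type1", "Type2", "Type1", "Type1")
levels(factor(diabetes))   # "Type1" "Type2": alphabetical by default
d2 <- factor(diabetes, levels = c("Type2", "Type1"))
table(d2)                  # Type2 listed first, then Type1
```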
List example
g <- "My First List"
h <- c(25, 26, 18, 39)
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")
mylist <- list(title = g, ages = h, j, k)
mylist[[2]]
## [1] 25 26 18 39
mylist[["ages"]]
## [1] 25 26 18 39
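In this list j and k were added without names, so they can only be reached by position. A short sketch:

```r
g <- "My First List"
h <- c(25, 26, 18, 39)
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")
mylist <- list(title = g, ages = h, j, k)
names(mylist)      # "title" "ages" "" "": the last two components have no name
dim(mylist[[3]])   # 5 2: the matrix j, reached by position
mylist[[4]][2]     # "two"
```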
Formats for R