Overview
Every variable in R belongs to a data type (or ‘class’), which depends on the information it contains. While some variables can contain very complex types of information, there are a few basic types that you will encounter most commonly.
In this session, I will give a brief introduction to these commonly used basic data types:
- Single-value (atomic) data types
  - Numeric types (float and integer)
  - Character strings
  - Logical values
  - Converting data types and no-data values
- Combining multiple values
  - Vectors
  - Factors
  - Lists
  - Data frames
  - Matrices
As before, I recommend this cheatsheet, which gives an overview of functions for working with different data types (thanks to Mhairi McNeill for making this available).
And again, you should try these things for yourself. If you haven’t yet installed the R software, you can run simple code using this great website.
Atomic Data Types
Numeric Types
Most scientific data are stored as numbers. By default, R stores numbers using the ‘numeric’ type:
myNumeric <- 1.1
myNumeric
## [1] 1.1
class(myNumeric)
## [1] "numeric"
As we saw in the previous session, we can manipulate these numeric variables, for example by conducting some simple arithmetic:
myNumeric2 <- myNumeric * 2
myNumeric2
## [1] 2.2
By default, R also stores whole numbers using the numeric class:
myInteger <- 1
class(myInteger)
## [1] "numeric"
If we have very large datasets of whole numbers, we can save memory by storing them using the integer class. R has a series of functions for converting between data types. In this case, we can use the as.integer function:
myInteger <- as.integer(1)
myInteger
## [1] 1
class(myInteger)
## [1] "integer"
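To see the memory saving, compare one million zeros stored each way. Integers take 4 bytes each and numeric (double) values take 8, so object.size() should report roughly half the size for the integer version. The L suffix is R's shorthand for typing an integer directly. The variable names here are only illustrative.

```r
# Writing a number with an L suffix creates an integer directly
myInteger5 <- 1L
class(myInteger5)   # "integer"

# Compare the memory used by one million zeros stored each way
integerZeros <- integer(1e6)
numericZeros <- numeric(1e6)
print(object.size(integerZeros), units = "Kb")
print(object.size(numericZeros), units = "Kb")
```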
If we convert a non-integer value to an integer, the decimal part is discarded:
myInteger2 <- as.integer(1.1)
myInteger2
## [1] 1
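NOTE: When you convert a non-integer to an integer, R simply drops the decimal part. Positive numbers are therefore always rounded down, and negative numbers are rounded towards zero. If you want to round to the nearest whole number, apply the round function first: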
myInteger3 <- as.integer(1.9)
myInteger3
## [1] 1
myInteger4 <- as.integer(as.integer(round(1.9)))
myInteger4
## [1] 2
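Negative numbers show most clearly how the rounding options differ. as.integer moves them towards zero, which means rounding up. The base R functions trunc, floor, ceiling and round let you choose the behaviour explicitly. The variable name below is only illustrative.

```r
myNegative <- -1.9
as.integer(myNegative)  # -1: the decimal part is simply dropped
trunc(myNegative)       # -1: the same, but the result stays numeric
floor(myNegative)       # -2: always rounds down
ceiling(myNegative)     # -1: always rounds up
round(myNegative)       # -2: rounds to the nearest whole number
round(2.5)              # 2: exact halves are rounded to the even number
```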
Character Strings
Character strings (i.e., text) are another very commonly used data type in R:
myCharacter <- "Some text"
class(myCharacter)
## [1] "character"
You can convert other data types into strings, should you wish to, using the as.character function:
myCharacter2 <- as.character(myNumeric2)
myCharacter2
## [1] "2.2"
Now that we have converted this number into a character string, we can no longer use it in arithmetic operations:
TIP: The try function allows you to attempt an operation without stopping your R script if an error occurs.
try(myCharacter2*2)
## Error in myCharacter2 * 2 : non-numeric argument to binary operator
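To do arithmetic on a number stored as text, convert it back with as.numeric first. Text itself is handled with character functions instead. paste and nchar, shown here, are two base R examples.

```r
myCharacter2 <- as.character(2.2)
as.numeric(myCharacter2) * 2         # convert back to numeric first: 4.4
paste("The value is", myCharacter2)  # joins strings with a space: "The value is 2.2"
nchar("Some text")                   # number of characters, including the space: 9
```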
Logical Values
The other data type that you will commonly encounter in R is logical (i.e., TRUE or FALSE values):
myLogical <- TRUE
class(myLogical)
## [1] "logical"
We can perform arithmetic operations on logical values, as we do with numbers. In doing so, R treats FALSE as equal to 0 and TRUE as equal to 1:
myNumeric4 <- TRUE * 2
myNumeric4
## [1] 2
myNumeric5 <- FALSE * 2
myNumeric5
## [1] 0
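In practice, logical values usually come from comparisons rather than being typed in directly. The variable name below is only illustrative.

```r
myComparison <- 5 > 3   # a comparison returns a logical value
myComparison            # TRUE
class(myComparison)     # "logical"
myComparison + 1        # TRUE is treated as 1, so this gives 2
(2 > 3) + 1             # FALSE is treated as 0, so this gives 1
```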
TIP: By default, R recognises T and F as being TRUE and FALSE, respectively. But be very careful: T and F are ordinary variables that can be overwritten with other values, whereas TRUE and FALSE are reserved words that cannot. To avoid errors in your code, it is strongly recommended that you always write TRUE and FALSE in full when working with logical values:
T
## [1] TRUE
T <- FALSE
T
## [1] FALSE
try(TRUE <- FALSE)
## Error in TRUE <- FALSE : invalid (do_set) left-hand side to assignment
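If you have overwritten T by accident, delete your own variable with rm. R will then find its built-in T again. This sketch assumes T was assigned in your current workspace.

```r
T <- FALSE   # our own variable now hides the built-in T
T            # FALSE
rm(T)        # delete our variable
T            # TRUE: the built-in value is visible again
```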
TIP: You can find out about the different functions available for working with a particular data type using the help function:
help(numeric)
help(character)
help(logical)
Converting Data Types and No-data Values
We have already come across the as.integer function for converting to integer values. Each of the basic data types has an equivalent conversion function: as.numeric, as.integer, as.character and as.logical:
myInteger <- as.integer(1)
myNumeric2 <- as.numeric(myInteger)
myNumeric2
## [1] 1
class(myNumeric2)
## [1] "numeric"
myLogical2 <- as.logical("TRUE")
myLogical2
## [1] TRUE
We can also convert numbers to logical values. We saw earlier that when converting logical values to numbers, R converts FALSE to 0 and TRUE to 1. Similarly, converting 0 and 1 to logical values creates FALSE and TRUE, respectively:
myLogical3 <- as.logical(0)
myLogical3
## [1] FALSE
myLogical4 <- as.logical(1)
myLogical4
## [1] TRUE
In fact, R will convert all non-zero numbers (even negative numbers) to a TRUE logical value:
myLogical5 <- as.logical(10)
myLogical5
## [1] TRUE
myLogical6 <- as.logical(-10)
myLogical6
## [1] TRUE
Finally, a note on no-data values, which R stores as NA. If we try to convert something to an incompatible data type, we will obtain an NA value:
myNumeric6 <- as.numeric("Some text")
## Warning: NAs introduced by coercion
myNumeric6
## [1] NA
myLogical7 <- as.logical("Some text")
myLogical7
## [1] NA
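as.logical recognises only a small set of spellings in text, and any other string gives NA. The is.na function tests for no-data values. suppressWarnings is used here only to hide the coercion warning shown above. Any arithmetic involving NA also returns NA.

```r
as.logical("TRUE")    # TRUE
as.logical("true")    # TRUE
as.logical("T")       # TRUE
as.logical("yes")     # NA: not a recognised spelling
myNumeric7 <- suppressWarnings(as.numeric("Some text"))
is.na(myNumeric7)     # TRUE
myNumeric7 + 1        # NA: arithmetic with NA gives NA
```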
I will talk more about NAs later, when dealing with data structures that contain multiple values.
TIP: You can check whether a variable is of the expected data type using another series of functions: for example, is.numeric, is.integer, is.character and is.logical:
myNumeric <- 1.1
is.numeric(myNumeric)
## [1] TRUE
myNumeric2 <- 1
is.integer(myNumeric2)
## [1] FALSE
myLogical <- TRUE
is.numeric(myLogical)
## [1] FALSE
is.logical(myLogical)
## [1] TRUE
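One subtlety: is.numeric returns TRUE for integers as well, whereas is.integer returns TRUE only for values actually stored as integers. A whole number such as 1 is stored as numeric unless you convert it or type it as 1L.

```r
is.numeric(1L)              # TRUE: integers count as numeric
is.integer(1L)              # TRUE
is.integer(1)               # FALSE: 1 is stored as a (double) numeric value
is.integer(as.integer(1))   # TRUE
```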
Combining Multiple Values
Often, when working in R, we don’t want to use just single values, but rather to work with sets of data.
Vectors
The simplest way to combine values in R is into a vector. A vector is a single, one-dimensional set of values.
You can combine values into a vector using the c function:
myVector <- c(2,4,6,8,10)
myVector
## [1] 2 4 6 8 10
The class of the vector is the class of the individual data values it contains:
class(myVector)
## [1] "numeric"
Single values, ranges of values, or specific sets of values can be extracted from a vector as follows.
Single values are returned by putting the position of the value you want to return in square brackets.
myVector[2]
## [1] 4
To obtain a range of values, you can specify the start and end positions, separated by a colon:
myVector[3:5]
## [1] 6 8 10
To return individually specified values, you can give a series of positions using the c function (in other words you specify another vector to give the positions of the values that you want to return):
myVector[c(1,4)]
## [1] 2 8
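The colon in 3:5 is itself a shortcut that builds the vector c(3,4,5), so both forms of indexing behave the same. You can also index with a logical vector of the same length, which keeps only the positions where it is TRUE.

```r
myVector <- c(2, 4, 6, 8, 10)
3:5                                              # 3 4 5
identical(myVector[3:5], myVector[c(3, 4, 5)])   # TRUE
myVector > 5                                     # FALSE FALSE TRUE TRUE TRUE
myVector[myVector > 5]                           # 6 8 10
```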
You can perform arithmetic on a vector. If your operation combines the vector with a single number, the calculation is applied to every value in the vector:
myVector2 <- myVector * 2
myVector2
## [1] 4 8 12 16 20
If instead you apply an arithmetic operation to two vectors of equal length, then the operation will be applied to corresponding pairs of numbers:
myVector * c(1,2,3,4,5)
## [1] 2 8 18 32 50
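When the two vectors have different lengths, R "recycles" the shorter one, repeating it from the start. If the longer length is not a multiple of the shorter one, R still returns a result but issues a warning. It is therefore worth checking lengths first.

```r
c(1, 2, 3, 4) * c(1, 10)       # 1 20 3 40: c(1, 10) is used twice
c(1, 2, 3, 4, 5) * c(1, 10)    # 1 20 3 40 5, plus a warning about the lengths
```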
If your vector contains NA values, the result of the operation will contain corresponding NA values:
myVector3 <- c(2,4,NA,8,10)
myVector4 <- myVector3 * 2
myVector4
## [1] 4 8 NA 16 20
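NA values also affect summary functions such as sum and mean: a single NA makes the result NA. Many of these functions accept an na.rm argument, which drops NA values before calculating.

```r
myVector3 <- c(2, 4, NA, 8, 10)
sum(myVector3)                 # NA
sum(myVector3, na.rm = TRUE)   # 24
mean(myVector3, na.rm = TRUE)  # 6
is.na(myVector3)               # FALSE FALSE TRUE FALSE FALSE
```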
Vectors can hold values of any of the atomic data types we encountered earlier (although any one vector can only contain one type):
myLogicalVector <- c(TRUE,FALSE,TRUE,TRUE)
myLogicalVector
## [1] TRUE FALSE TRUE TRUE
class(myLogicalVector)
## [1] "logical"
Just as with single logical values, we can apply arithmetic to a logical vector:
myVector5 <- myLogicalVector * 2
myVector5
## [1] 2 0 2 2
Of course, arithmetic operations on a character vector will not work (returning an error):
myCharacterVector <- c("Text 1","Text 2","Text 3")
myCharacterVector
## [1] "Text 1" "Text 2" "Text 3"
try(myCharacterVector * 2)
## Error in myCharacterVector * 2 : non-numeric argument to binary operator
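A vector can hold only one type, so combining values of different types with c converts them to a common type. If any value is text, everything becomes character. Otherwise, logical values become numbers.

```r
c(1, 2, "three")           # "1" "2" "three": the numbers become text
class(c(1, 2, "three"))    # "character"
c(TRUE, FALSE, 2)          # 1 0 2: the logical values become numbers
class(c(TRUE, FALSE, 2))   # "numeric"
```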
You can use the length function to find out how many values your vector contains:
myVector <- c(2,4,6,8,10)
length(myVector)
## [1] 5
You can change specific values, ranges of values, or specific sets of values in a vector. Specifying values is done in the same way as when we asked to return specific values:
myVector <- c(2,4,6,8,10)
myVector[4] <- 24
myVector
## [1] 2 4 6 24 10
myVector <- c(2,4,6,8,10)
myVector[3:5] <- c(22,24,26)
myVector
## [1] 2 4 22 24 26
myVector <- c(2,4,6,8,10)
myVector[c(1,3,5)] <- 0
myVector
## [1] 0 4 0 8 0
You can also add new values at a specified position that is not already found within the vector (note that any intermediate values are filled with NA):
myVector <- c(2,4,6,8,10)
myVector[10] <- 20
myVector
## [1] 2 4 6 8 10 NA NA NA NA 20
length(myVector)
## [1] 10
You can also remove values by giving their positions as negative numbers:
myVector <- c(2,4,6,8,10)
myVector <- myVector[-4]
myVector
## [1] 2 4 6 10
length(myVector)
## [1] 4
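You can also give negative indices as a vector to drop several positions at once. To add values to the end of an existing vector, use c.

```r
myVector <- c(2, 4, 6, 8, 10)
myVector[-c(1, 3)]           # drops the 1st and 3rd values: 4 8 10
myVector <- c(myVector, 12)  # appends 12 to the end
myVector                     # 2 4 6 8 10 12
length(myVector)             # 6
```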
You can also initialise an empty vector using either the numeric, integer, character or logical functions:
myVector6 <- numeric()
length(myVector6)
## [1] 0
As before, you can then add values to this vector into specified positions (with intermediate positions then being filled with NA values):
myVector6[6] <- 6.4
myVector6
## [1] NA NA NA NA NA 6.4
length(myVector6)
## [1] 6
NOTE: The data type of a vector is not fixed. If you enter a value of an incompatible type, the data type of the whole vector may change. Alternatively, the value you enter may itself be converted. Care is therefore advised when entering data into an existing vector (or data frame, of which more later):
myVector7 <- numeric()
myVector7[5] <- "Some text"
class(myVector7)
## [1] "character"
myVector7[1] <- 1.1
myVector7
## [1] "1.1" NA NA NA "Some text"
class(myVector7)
## [1] "character"
You can also initialise a vector that contains default values (0 for numeric, FALSE for logical, or empty strings for character). Use the same numeric, integer, character and logical functions as before, but this time specify the number of values you want in your vector:
myVector8 <- numeric(10)
myVector8
## [1] 0 0 0 0 0 0 0 0 0 0
Or you can do the same thing using the generic vector function:
myVector9 <- vector(mode = "numeric", length = 10)
myVector9
## [1] 0 0 0 0 0 0 0 0 0 0
myVector10 <- vector(mode = "logical", length = 10)
myVector10
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
myVector11 <- vector(mode = "character", length = 10)
myVector11
## [1] "" "" "" "" "" "" "" "" "" ""
Factors
Factors are a special type of vector, where there is a set of specified values (or ‘levels’) that a grouping variable is allowed to take. These ‘levels’ are stored with the variable in R:
myFactor <- factor(c("Treatment1","Treatment2","Treatment3",
                     "Treatment1","Treatment2","Treatment3"))
myFactor
## [1] Treatment1 Treatment2 Treatment3 Treatment1 Treatment2 Treatment3
## Levels: Treatment1 Treatment2 Treatment3
levels(myFactor)
## [1] "Treatment1" "Treatment2" "Treatment3"
If you try to add a new value that does not belong to one of the specified levels, an NA value will be inserted (note that NA values are shown as <NA> in factors):
myFactor[7] <- "Treatment4"
## Warning in `[<-.factor`(`*tmp*`, 7, value = "Treatment4"): invalid factor level,
## NA generated
myFactor
## [1] Treatment1 Treatment2 Treatment3 Treatment1 Treatment2 Treatment3 <NA>
## Levels: Treatment1 Treatment2 Treatment3
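If a new value genuinely belongs in the data, one option is to add it to the factor's levels first. Use the levels function on the left-hand side of an assignment, after which the value can be inserted without producing an NA. The factor below is a fresh, illustrative example.

```r
myFactor3 <- factor(c("Treatment1", "Treatment2", "Treatment3"))
levels(myFactor3) <- c(levels(myFactor3), "Treatment4")  # add the new level
myFactor3[4] <- "Treatment4"   # now a valid value, so no NA is generated
myFactor3
levels(myFactor3)              # "Treatment1" "Treatment2" "Treatment3" "Treatment4"
```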
As with the atomic data types, we can coerce a vector (or indeed an atomic value) to be a factor, this time using the as.factor function:
myCharacter <- c("Treatment1","Treatment2","Treatment3",
                 "Treatment1","Treatment2","Treatment3")
myFactor2 <- as.factor(myCharacter)
myFactor2
## [1] Treatment1 Treatment2 Treatment3 Treatment1 Treatment2 Treatment3
## Levels: Treatment1 Treatment2 Treatment3
You can also create a factor with pre-specified values. In this case, any values that don’t correspond with these pre-specified levels will become NA values:
myFactor <- factor(c("Treatment1","Treatment2","Treatment3",
                     "Treatment1","Treatment2","Treatment3"),
                   levels=c("Treatment1","Treatment2"))
myFactor
## [1] Treatment1 Treatment2 <NA> Treatment1 Treatment2 <NA>
## Levels: Treatment1 Treatment2
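Behind the scenes, R stores a factor as whole-number codes that point to its levels, which are sorted alphabetically by default. You can see these codes with as.integer. This matters when a factor contains numbers stored as text: as.numeric returns the codes, not the values, so convert via as.character first. The factor below is illustrative.

```r
myFactor4 <- factor(c("10", "20", "10", "30"))
levels(myFactor4)                    # "10" "20" "30"
as.integer(myFactor4)                # the codes: 1 2 1 3
as.numeric(as.character(myFactor4))  # the original values: 10 20 10 30
```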
Lists
Lists are similar to vectors, but more flexible in terms of data types within them. A basic list can be created using the list function:
myList <- list(1,2,3,4,5)
myList
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3
##
## [[4]]
## [1] 4
##
## [[5]]
## [1] 5
Unlike with vectors, the class of a list object is ‘list’, rather than corresponding with the type of the individual data values:
class(myList)
## [1] "list"
The individual elements within the list have their own class. They can be extracted in a similar way to vectors, but this time using double rather than single square brackets:
myList[[1]]
## [1] 1
class(myList[[1]])
## [1] "numeric"
The values within a list can themselves be vectors of numbers:
myList2 <- list(c(1,2,3,4,5))
myList2
## [[1]]
## [1] 1 2 3 4 5
myList3 <- list(c(1,2,3,4,5),c(6,7,8,9,10))
myList3
## [[1]]
## [1] 1 2 3 4 5
##
## [[2]]
## [1] 6 7 8 9 10
If you extract an element from one of these lists, you will get a vector:
myList3[[2]]
## [1] 6 7 8 9 10
Alternatively, you can combine double and single square brackets. The double brackets select an element of the list, and the single brackets then select a position within that vector:
myList3[[2]][3]
## [1] 8
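The difference between double and single square brackets matters. [[ ]] returns the element itself, whereas [ ] returns a smaller list that still contains it.

```r
myList3 <- list(c(1, 2, 3, 4, 5), c(6, 7, 8, 9, 10))
class(myList3[[2]])   # "numeric": the vector itself
class(myList3[2])     # "list": a list holding that vector
myList3[[2]] * 2      # arithmetic works on the vector: 12 14 16 18 20
try(myList3[2] * 2)   # error: arithmetic does not work on a list
```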
The elements within a list can be named, which helps with storing and retrieving complex data:
myList4 <- list(Item1=1.0,Item2=4.0)
myList4
## $Item1
## [1] 1
##
## $Item2
## [1] 4
Specific named items in a list can be extracted either by putting the name into the double square brackets, or by using the $ symbol:
myList4[["Item2"]]
## [1] 4
myList4$Item2
## [1] 4
If you want to, you can apply names to the elements of an existing list using the names function:
myList3 <- list(c(1,2,3,4,5),c(6,7,8,9,10))
myList3
## [[1]]
## [1] 1 2 3 4 5
##
## [[2]]
## [1] 6 7 8 9 10
names(myList3) <- c("Vector1","Vector2")
myList3
## $Vector1
## [1] 1 2 3 4 5
##
## $Vector2
## [1] 6 7 8 9 10
myList3$Vector2
## [1] 6 7 8 9 10
Lists are very flexible. They can take mixed data types:
myList5 <- list(Name="Tim",Role="Tutor",Years=5)
myList5
## $Name
## [1] "Tim"
##
## $Role
## [1] "Tutor"
##
## $Years
## [1] 5
class(myList5[[1]])
## [1] "character"
class(myList5[[3]])
## [1] "numeric"
Lists can also contain elements of different lengths:
myList5$Modules <- c("BIOS0002","BIOL0032")
myList5
## $Name
## [1] "Tim"
##
## $Role
## [1] "Tutor"
##
## $Years
## [1] 5
##
## $Modules
## [1] "BIOS0002" "BIOL0032"
Data frames
Data frames are tremendously useful for scientific research. They are a special form of lists, where each element must have the same length. This is good for ensuring that each variable in your dataset has the same number of entries. In a later session, I will show you how to import data from a spreadsheet into an R data frame.
myDataFrame <- data.frame(
  Treatment=factor(c("Treatment1","Treatment2","Treatment3",
                     "Treatment1","Treatment2","Treatment3")),
  Measurement=c(2.0,4.5,1.2,1.0,6.0,2.3))
myDataFrame
## Treatment Measurement
## 1 Treatment1 2.0
## 2 Treatment2 4.5
## 3 Treatment3 1.2
## 4 Treatment1 1.0
## 5 Treatment2 6.0
## 6 Treatment3 2.3
class(myDataFrame)
## [1] "data.frame"
We can extract the elements of data frames in exactly the same way as for lists:
myDataFrame$Treatment
## [1] Treatment1 Treatment2 Treatment3 Treatment1 Treatment2 Treatment3
## Levels: Treatment1 Treatment2 Treatment3
myDataFrame[["Treatment"]]
## [1] Treatment1 Treatment2 Treatment3 Treatment1 Treatment2 Treatment3
## Levels: Treatment1 Treatment2 Treatment3
myDataFrame[[1]]
## [1] Treatment1 Treatment2 Treatment3 Treatment1 Treatment2 Treatment3
## Levels: Treatment1 Treatment2 Treatment3
myDataFrame$Treatment[1]
## [1] Treatment1
## Levels: Treatment1 Treatment2 Treatment3
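Because data frames are two-dimensional, you can also index them with square brackets containing a row and a column, separated by a comma. Leaving one side empty selects everything on that side. nrow and ncol report the dimensions. The sketch below recreates the data frame from above.

```r
myDataFrame <- data.frame(
  Treatment = factor(c("Treatment1", "Treatment2", "Treatment3",
                       "Treatment1", "Treatment2", "Treatment3")),
  Measurement = c(2.0, 4.5, 1.2, 1.0, 6.0, 2.3))
nrow(myDataFrame)                 # 6
ncol(myDataFrame)                 # 2
myDataFrame[2, "Measurement"]     # row 2 of the Measurement column: 4.5
myDataFrame[myDataFrame$Treatment == "Treatment1", ]  # only the Treatment1 rows
```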
We can also add new elements to a data frame, just as we can with lists:
myDataFrame$Measurement2 <- c(2.1,4.4,1.0,1.4,7.2,2.4)
myDataFrame
## Treatment Measurement Measurement2
## 1 Treatment1 2.0 2.1
## 2 Treatment2 4.5 4.4
## 3 Treatment3 1.2 1.0
## 4 Treatment1 1.0 1.4
## 5 Treatment2 6.0 7.2
## 6 Treatment3 2.3 2.4
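Unlike with a list, if we try to create a data frame whose elements have different lengths (i.e., numbers of values), we will usually get an error. R only accepts a shorter element if it can be repeated a whole number of times to match the longest one: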
myList6 <- list(Component1=c(1,2,3,4,5),Component2=c(6,7))
myList6
## $Component1
## [1] 1 2 3 4 5
##
## $Component2
## [1] 6 7
try(data.frame(Component1=c(1,2,3,4,5),Component2=c(6,7)))
## Error in data.frame(Component1 = c(1, 2, 3, 4, 5), Component2 = c(6, 7)) :
## arguments imply differing number of rows: 5, 2
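The main exception is recycling: data.frame repeats a shorter element if it fits a whole number of times into the longest one. This is most often used to fill a column with a single value. The Group column here is purely illustrative.

```r
myDataFrame2 <- data.frame(Component1 = c(1, 2, 3, 4, 5), Group = "A")
myDataFrame2        # "A" is repeated on all five rows
nrow(myDataFrame2)  # 5
```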
Matrices
Although you may not encounter matrices when running basic statistics in R, you may do so if you get into more advanced statistics. They are also useful if you use R for modelling or maths. Like data frames, matrices have a rectangular (two-dimensional) structure of rows and columns, but unlike data frames they can only hold a single data type. You can create a matrix using the matrix function. The byrow option determines whether data are entered along each row (byrow = TRUE) or down each column (byrow = FALSE):
myMatrix <- matrix(data = 1:12, nrow = 4, ncol = 3, byrow = TRUE)
myMatrix
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [4,] 10 11 12
myMatrix2 <- matrix(data = 1:12, nrow = 4, ncol = 3, byrow = FALSE)
myMatrix2
## [,1] [,2] [,3]
## [1,] 1 5 9
## [2,] 2 6 10
## [3,] 3 7 11
## [4,] 4 8 12
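As with data frames, matrix elements are extracted with a row and a column position, and dim reports the numbers of rows and columns. The examples below use the byrow = TRUE matrix from above.

```r
myMatrix <- matrix(data = 1:12, nrow = 4, ncol = 3, byrow = TRUE)
dim(myMatrix)     # 4 3
myMatrix[2, 3]    # row 2, column 3: 6
myMatrix[, 1]     # the whole first column: 1 4 7 10
t(myMatrix)       # transpose: swaps rows and columns, giving a 3 x 4 matrix
```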
We can convert objects of a different class to a matrix using the as.matrix function. If we convert the data frame that we created earlier into a matrix, all values become strings, because matrices can’t handle mixed data types:
myMatrix3 <- as.matrix(myDataFrame)
myMatrix3
## Treatment Measurement Measurement2
## [1,] "Treatment1" "2.0" "2.1"
## [2,] "Treatment2" "4.5" "4.4"
## [3,] "Treatment3" "1.2" "1.0"
## [4,] "Treatment1" "1.0" "1.4"
## [5,] "Treatment2" "6.0" "7.2"
## [6,] "Treatment3" "2.3" "2.4"
There are many mathematical operations that you can perform on matrices. Many of them (element-wise arithmetic, for example) also work on data frames, as long as all the columns contain numbers. Matrix maths can get very complex, and is beyond the scope of these sessions. If you want a quick introduction to the basics, I recommend this webpage.
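One basic distinction is between two kinds of multiplication. The ordinary * operator multiplies matrices element by element, whereas %*% performs true matrix multiplication. The 2 x 2 matrix A below is illustrative.

```r
A <- matrix(1:4, nrow = 2)   # filled by column: rows are (1, 3) and (2, 4)
A * A      # element-wise: rows (1, 9) and (4, 16)
A %*% A    # matrix product: rows (7, 15) and (10, 22)
```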
There are two main advantages of using matrices: 1) it ensures that all the values are of the same data type; and 2) the amount of memory used up by a matrix tends to be much smaller than that of a data frame, which can be important when working with very large datasets.
Next Time
That’s it for this session. In the next session, I introduce some of the functions that can be used to conduct arithmetic operations in R, including those that calculate the summary statistics that are indispensable in scientific research.