Data Types in R

Tim Newbold

2021-10-06

Overview

Every variable in R has a type (or ‘class’), determined by the information it contains. While some variables can contain very complex types of information, there are a few basic types that you will encounter most commonly.

In this session, I will give a brief introduction to these commonly used basic data types:

  • Single-value (atomic) data types
    • Numeric types (float and integer)
    • Character strings
    • Logical values
    • Converting data types and no-data values
  • Combining multiple values
    • Vectors
    • Factors
    • Lists
    • Data frames
    • Matrices

As before, I recommend this cheatsheet, which gives an overview of functions for working with different data types (thanks to Mhairi McNeill for making this available).

And again, you should try these things for yourself. If you haven’t yet installed the R software, you can run simple code using this great website.

Atomic Data Types

Numeric Types

Most of the important scientific data are stored as numbers. By default, R stores numbers using the ‘numeric’ type:

myNumeric <- 1.1
myNumeric
## [1] 1.1
class(myNumeric)
## [1] "numeric"

As we saw in the previous session, we can manipulate these numeric variables, for example by conducting some simple arithmetic:

myNumeric2 <- myNumeric * 2
myNumeric2
## [1] 2.2

By default, R will set single integer values to use the numeric class:

myInteger <- 1
class(myInteger)
## [1] "numeric"

If we have very large datasets, we can save memory by storing these using the integer class. R has a series of functions for converting between data types. In this case, we can use the as.integer function:

myInteger <- as.integer(1)
myInteger
## [1] 1
class(myInteger)
## [1] "integer"
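
As a small sketch of the two points above: an integer can also be written directly by adding the L suffix to a whole number, and the object.size function reports how much memory an object uses. The variable names here are only illustrative, and exact byte counts depend on your R installation, so only the ratio is compared.

```r
# The L suffix creates an integer directly, without calling as.integer
myInteger5 <- 1L
class(myInteger5)   # "integer"

# Memory (in bytes) used by one million numeric and one million integer values
numericBytes <- as.numeric(object.size(numeric(1e6)))
integerBytes <- as.numeric(object.size(integer(1e6)))

# Numeric values take roughly twice the memory of integer values
numericBytes / integerBytes
```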

If we convert a non-integer value to an integer, the fractional part is discarded:

myInteger2 <- as.integer(1.1)
myInteger2
## [1] 1

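NOTE: If you convert a non-integer to an integer, R does not round to the nearest whole number. Instead, it truncates towards zero, so positive values are rounded down (1.9 becomes 1) and negative values are rounded up (-1.9 becomes -1). If you want to round to the nearest whole number, you can use the round function.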

myInteger3 <- as.integer(1.9)
myInteger3
## [1] 1
myInteger4 <- as.integer(round(1.9))
myInteger4
## [1] 2
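
The difference between the conversion and rounding functions is easiest to see with a negative number. The following sketch (the variable name is just for illustration) compares as.integer with the base R functions trunc, floor and round:

```r
negativeValue <- -1.9

as.integer(negativeValue)   # drops the fractional part: -1
trunc(negativeValue)        # also drops the fractional part, but stays numeric: -1
floor(negativeValue)        # always rounds down: -2
round(negativeValue)        # rounds to the nearest whole number: -2
```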

Character Strings

Character strings (i.e., text) are another very commonly used data type in R:

myCharacter <- "Some text"
class(myCharacter)
## [1] "character"

You can convert other data types into strings, should you wish to, using the as.character function:

myCharacter2 <- as.character(myNumeric2)
myCharacter2
## [1] "2.2"

Now we have converted this number into a character string, we can no longer use it in arithmetic operations:

TIP: The try function allows you to attempt an operation without stopping your R script if an error occurs.

try(myCharacter2*2)
## Error in myCharacter2 * 2 : non-numeric argument to binary operator
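
If a character string contains a valid number, you can recover a usable number by converting it back with the as.numeric function (which is covered in more detail below). A minimal sketch, with illustrative variable names:

```r
myCharacter3 <- "2.2"

# Convert the text back into a number before doing arithmetic
myNumeric7 <- as.numeric(myCharacter3)
class(myNumeric7)   # "numeric"
myNumeric7 * 2      # 4.4
```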

Logical Values

The other data type that you will commonly encounter in R is logical (i.e. TRUE or FALSE values):

myLogical <- TRUE

class(myLogical)
## [1] "logical"

We can perform arithmetic operations on logical values, as we do with numbers. In doing so, R treats FALSE as being equal to 0 and TRUE equal to 1:

myNumeric4 <- TRUE * 2
myNumeric4
## [1] 2
myNumeric5 <- FALSE * 2
myNumeric5
## [1] 0

TIP: By default, R recognises T and F as being TRUE and FALSE, respectively. But, be very careful: T and F can be overwritten with other values, whereas TRUE and FALSE cannot. Therefore, to avoid errors in your code, it is very strongly recommended always to use the full TRUE and FALSE when working with logical values:

T
## [1] TRUE
T <- FALSE
T
## [1] FALSE
try(TRUE <- FALSE)
## Error in TRUE <- FALSE : invalid (do_set) left-hand side to assignment

TIP: You can find out about the different functions available for working with a particular data type using the help function:

help(numeric)
help(character)
help(logical)

Converting Data Types and No-data Values

We have already come across the as.integer function for converting to integer values. All data types have an equivalent function: for example, as.numeric, as.integer, as.character and as.logical:

myInteger <- as.integer(1)
myNumeric2 <- as.numeric(myInteger)
myNumeric2
## [1] 1
class(myNumeric2)
## [1] "numeric"
myLogical2 <- as.logical("TRUE")
myLogical2
## [1] TRUE

We can also convert numbers to logical values. We saw earlier that, when logical values are used in arithmetic, R treats FALSE as 0 and TRUE as 1. Similarly, converting 0 and 1 to logical values creates FALSE and TRUE, respectively:

myLogical3 <- as.logical(0)
myLogical3
## [1] FALSE
myLogical4 <- as.logical(1)
myLogical4
## [1] TRUE

In fact, R will convert all non-zero numbers (even negative numbers) to a TRUE logical value:

myLogical5 <- as.logical(10)
myLogical5
## [1] TRUE
myLogical6 <- as.logical(-10)
myLogical6
## [1] TRUE

Finally, a note on no-data values, which R stores as NA. If we try to convert something to an incompatible data type, we will obtain an NA value:

myNumeric6 <- as.numeric("Some text")
## Warning: NAs introduced by coercion
myNumeric6
## [1] NA
myLogical7 <- as.logical("Some text")
myLogical7
## [1] NA

I will talk more about NAs later, when dealing with data structures that contain multiple values.
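
In the meantime, one practical point: to check whether a value is NA, use the is.na function rather than the == operator, because any comparison with NA itself returns NA. A short sketch (suppressWarnings is used here only to hide the coercion warning shown above):

```r
myMissing <- suppressWarnings(as.numeric("Some text"))

is.na(myMissing)    # TRUE
myMissing == NA     # NA, not TRUE: comparing with NA gives NA
```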

TIP: You can check whether a variable is of the expected data type using another series of functions: for example, is.numeric, is.integer, is.character and is.logical:

myNumeric <- 1.1
is.numeric(myNumeric)
## [1] TRUE
myNumeric2 <- 1
is.integer(myNumeric2)
## [1] FALSE
myLogical <- TRUE
is.numeric(myLogical)
## [1] FALSE
is.logical(myLogical)
## [1] TRUE

Combining Multiple Values

Often, when working in R, we don’t want to use just single values, but rather to work with sets of data.

Vectors

The simplest way to combine values in R is into a vector. A vector is a single, one-dimensional set of values.

You can combine values into a vector using the c function:

myVector <- c(2,4,6,8,10)
myVector
## [1]  2  4  6  8 10

The class of the vector is the class of the individual data values it contains:

class(myVector)
## [1] "numeric"

Single values, ranges of values or specific sets of values can be extracted from a vector as follows.

Single values are returned by putting the position of the value you want to return in square brackets.

myVector[2]
## [1] 4

To obtain a range of values, you can specify the start and end positions, separated by a colon:

myVector[3:5]
## [1]  6  8 10

To return individually specified values, you can give a series of positions using the c function (in other words you specify another vector to give the positions of the values that you want to return):

myVector[c(1,4)]
## [1] 2 8

You can perform arithmetic on a vector. If your arithmetic operation is based on your vector and one other number, the calculation is applied to all values in the vector:

myVector2 <- myVector * 2
myVector2
## [1]  4  8 12 16 20

If instead you apply an arithmetic operation to two vectors of equal length, then the operation will be applied to corresponding pairs of numbers:

myVector * c(1,2,3,4,5)
## [1]  2  8 18 32 50
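
If the two vectors have different lengths, R does not stop with an error. Instead, it reuses ('recycles') the values of the shorter vector until it matches the length of the longer one, giving a warning only if the longer length is not a whole multiple of the shorter. This can hide mistakes, so it is worth checking vector lengths before doing arithmetic. A small sketch with illustrative names:

```r
longVector <- c(2, 4, 6, 8, 10, 12)
shortVector <- c(1, 2)

# shortVector is reused as 1, 2, 1, 2, 1, 2
longVector * shortVector   # 2 8 6 16 10 24
```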

If your vector contains NA values, the result of the operation will contain corresponding NA values:

myVector3 <- c(2,4,NA,8,10)
myVector4 <- myVector3 * 2
myVector4
## [1]  4  8 NA 16 20
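
The same applies to functions that summarise a whole vector: a single NA makes the result NA. Many base R functions, including sum and mean, accept an na.rm argument that removes NA values before the calculation. For example:

```r
myVector3 <- c(2, 4, NA, 8, 10)

sum(myVector3)                  # NA
sum(myVector3, na.rm = TRUE)    # 24
mean(myVector3, na.rm = TRUE)   # 6
```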

Vectors can hold values of any of the atomic data types we encountered earlier (although any one vector can only contain one type):

myLogicalVector <- c(TRUE,FALSE,TRUE,TRUE)
myLogicalVector
## [1]  TRUE FALSE  TRUE  TRUE
class(myLogicalVector)
## [1] "logical"

Just as with single logical values, we can apply arithmetic to a logical vector:

myVector5 <- myLogicalVector * 2
myVector5
## [1] 2 0 2 2
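
Because TRUE counts as 1 and FALSE as 0, a logical vector is a convenient way to count or select values that meet a condition. The following sketch assumes a comparison operator (>), which creates a logical vector from a numeric one. The name isLarge is just for illustration:

```r
myVector <- c(2, 4, 6, 8, 10)

# Comparing a vector with a number gives a logical vector
isLarge <- myVector > 4   # FALSE FALSE TRUE TRUE TRUE

sum(isLarge)              # number of values greater than 4: 3
mean(isLarge)             # proportion of values greater than 4: 0.6
myVector[isLarge]         # the values themselves: 6 8 10
```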

Of course, arithmetic operations on a character vector will not work (returning an error):

myCharacterVector <- c("Text 1","Text 2","Text 3")
myCharacterVector
## [1] "Text 1" "Text 2" "Text 3"
try(myCharacterVector * 2)
## Error in myCharacterVector * 2 : non-numeric argument to binary operator

You can use the length function to find out how many values your vector contains:

myVector <- c(2,4,6,8,10)
length(myVector)
## [1] 5

You can change specific values, ranges of values, or specific sets of values in a vector. Specifying values is done in the same way as when we asked to return specific values:

myVector <- c(2,4,6,8,10)
myVector[4] <- 24
myVector
## [1]  2  4  6 24 10
myVector <- c(2,4,6,8,10)
myVector[3:5] <- c(22,24,26)
myVector
## [1]  2  4 22 24 26
myVector <- c(2,4,6,8,10)
myVector[c(1,3,5)] <- 0
myVector
## [1] 0 4 0 8 0

You can also add new values at a position beyond the current end of the vector (note that any intermediate positions are filled with NA):

myVector <- c(2,4,6,8,10)
myVector[10] <- 20
myVector
##  [1]  2  4  6  8 10 NA NA NA NA 20
length(myVector)
## [1] 10

And you can also remove specified values:

myVector <- c(2,4,6,8,10)
myVector <- myVector[-4]
myVector
## [1]  2  4  6 10
length(myVector)
## [1] 4

You can also initialise an empty vector using either the numeric, integer, character or logical functions:

myVector6 <- numeric()
length(myVector6)
## [1] 0

As before, you can then add values to this vector into specified positions (with intermediate positions then being filled with NA values):

myVector6[6] <- 6.4
myVector6
## [1]  NA  NA  NA  NA  NA 6.4
length(myVector6)
## [1] 6

NOTE: the data type of a vector is not fixed, so if you enter a value of an incompatible data type, the data type of the whole vector may change. Alternatively, the data type of the value you enter may change to match the vector. Therefore, care is advised when entering data into an existing vector (or data frame, of which more later):

myVector7 <- numeric()
myVector7[5] <- "Some text"
class(myVector7)
## [1] "character"
myVector7[1] <- 1.1
myVector7
## [1] "1.1"       NA          NA          NA          "Some text"
class(myVector7)
## [1] "character"
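
The same conversion happens when you combine mixed types with the c function. As a rough guide, R picks the most flexible type present, in the order logical, integer, numeric, character. A short sketch:

```r
class(c(TRUE, 1L))             # "integer"
class(c(1L, 2.5))              # "numeric"
class(c(2.5, "Some text"))     # "character"

# Everything is converted to text when a character string is present
c(TRUE, 2.5, "Some text")      # "TRUE" "2.5" "Some text"
```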

You can also initialise a vector, containing default values (0 for numeric, FALSE for logical or empty strings for character), using the same numeric, integer, character and logical functions as before, but this time specifying the number of values you want in your vector:

myVector8 <- numeric(10)
myVector8
##  [1] 0 0 0 0 0 0 0 0 0 0

Or you can do the same thing using the generic vector function:

myVector9 <- vector(mode = "numeric",length = 10)
myVector9
##  [1] 0 0 0 0 0 0 0 0 0 0
myVector10 <- vector(mode = "logical",length = 10)
myVector10
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
myVector11 <- vector(mode = "character",length = 10)
myVector11
##  [1] "" "" "" "" "" "" "" "" "" ""

Factors

Factors are a special type of vector, where there is a set of specified values (or ‘levels’) that a grouping variable is allowed to take. These ‘levels’ are stored with the variable in R:

myFactor <- factor(c("Treatment1","Treatment2","Treatment3",
                     "Treatment1","Treatment2","Treatment3"))
myFactor
## [1] Treatment1 Treatment2 Treatment3 Treatment1 Treatment2 Treatment3
## Levels: Treatment1 Treatment2 Treatment3
levels(myFactor)
## [1] "Treatment1" "Treatment2" "Treatment3"

If you try to add a new value that does not belong to one of the specified levels, an NA value will be inserted (note that NA values are shown as <NA> in factors):

myFactor[7] <- "Treatment4"
## Warning in `[<-.factor`(`*tmp*`, 7, value = "Treatment4"): invalid factor level,
## NA generated
myFactor
## [1] Treatment1 Treatment2 Treatment3 Treatment1 Treatment2 Treatment3 <NA>      
## Levels: Treatment1 Treatment2 Treatment3
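
If you do want to add a value that is not yet among the levels, one approach is to extend the levels first using the levels function, and then add the value. A minimal sketch, using a new factor (myFactor3, an illustrative name) so as not to alter the one above:

```r
myFactor3 <- factor(c("Treatment1", "Treatment2", "Treatment3"))

# Add the new level to the set of allowed values
levels(myFactor3) <- c(levels(myFactor3), "Treatment4")

# The new value is now accepted without generating an NA
myFactor3[4] <- "Treatment4"
myFactor3
```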

As with the atomic data types, we can coerce a vector (or indeed an atomic value) to be a factor, this time using the as.factor function:

myCharacter <- c("Treatment1","Treatment2","Treatment3",
                 "Treatment1","Treatment2","Treatment3")
myFactor2 <- as.factor(myCharacter)
myFactor2
## [1] Treatment1 Treatment2 Treatment3 Treatment1 Treatment2 Treatment3
## Levels: Treatment1 Treatment2 Treatment3

You can also create a factor with pre-specified levels. In this case, any values that don’t correspond with these pre-specified levels will become NA values:

myFactor <- factor(c("Treatment1","Treatment2","Treatment3",
                     "Treatment1","Treatment2","Treatment3"),
                   levels=c("Treatment1","Treatment2"))
myFactor
## [1] Treatment1 Treatment2 <NA>       Treatment1 Treatment2 <NA>      
## Levels: Treatment1 Treatment2
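
One thing to be aware of is that R stores a factor internally as integer codes that point to its levels. This matters if the levels look like numbers: converting such a factor directly to a number returns the codes, not the original values. A common workaround, sketched here with an illustrative factor, is to convert to character first:

```r
myNumberFactor <- factor(c("10", "20", "10", "30"))

# Direct conversion returns the internal level codes
as.integer(myNumberFactor)                   # 1 2 1 3

# Converting via character recovers the original numbers
as.numeric(as.character(myNumberFactor))     # 10 20 10 30
```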

Lists

Lists are similar to vectors, but more flexible in terms of data types within them. A basic list can be created using the list function:

myList <- list(1,2,3,4,5)
myList
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3
## 
## [[4]]
## [1] 4
## 
## [[5]]
## [1] 5

Unlike with vectors, the class of a list object is ‘list’, rather than corresponding with the type of the individual data values:

class(myList)
## [1] "list"

The individual elements within the list have their own class, and can be extracted in a similar way as with vectors, but this time using double rather than single square brackets:

myList[[1]]
## [1] 1
class(myList[[1]])
## [1] "numeric"
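
Single square brackets also work on lists, but they return a smaller list rather than the element itself. The following sketch shows the difference:

```r
myList <- list(1, 2, 3, 4, 5)

class(myList[1])      # "list": a list containing the first element
class(myList[[1]])    # "numeric": the first element itself
length(myList[2:3])   # 2: single brackets can return several elements
```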

The values within a list can themselves be vectors of numbers:

myList2 <- list(c(1,2,3,4,5))
myList2
## [[1]]
## [1] 1 2 3 4 5
myList3 <- list(c(1,2,3,4,5),c(6,7,8,9,10))
myList3
## [[1]]
## [1] 1 2 3 4 5
## 
## [[2]]
## [1]  6  7  8  9 10

If you extract an element from one of these lists, you will get a vector:

myList3[[2]]
## [1]  6  7  8  9 10

Alternatively, you can use both double and single square brackets to return a specific position within the vector from a specified position in the list:

myList3[[2]][3]
## [1] 8

The elements within a list can be named, which helps with storing and retrieving complex data:

myList4 <- list(Item1=1.0,Item2=4.0)
myList4
## $Item1
## [1] 1
## 
## $Item2
## [1] 4

Specific named items in a list can be extracted either by putting the name into the double square brackets, or by using the $ symbol:

myList4[["Item2"]]
## [1] 4
myList4$Item2
## [1] 4

If you want to, you can apply names to the elements of an existing list using the names function:

myList3 <- list(c(1,2,3,4,5),c(6,7,8,9,10))
myList3
## [[1]]
## [1] 1 2 3 4 5
## 
## [[2]]
## [1]  6  7  8  9 10
names(myList3) <- c("Vector1","Vector2")
myList3
## $Vector1
## [1] 1 2 3 4 5
## 
## $Vector2
## [1]  6  7  8  9 10
myList3$Vector2
## [1]  6  7  8  9 10

Lists are very flexible. They can take mixed data types:

myList5 <- list(Name="Tim",Role="Tutor",Years=5)
myList5
## $Name
## [1] "Tim"
## 
## $Role
## [1] "Tutor"
## 
## $Years
## [1] 5
class(myList5[[1]])
## [1] "character"
class(myList5[[3]])
## [1] "numeric"

Lists can also contain elements of different lengths:

myList5$Modules <- c("BIOS0002","BIOL0032")
myList5
## $Name
## [1] "Tim"
## 
## $Role
## [1] "Tutor"
## 
## $Years
## [1] 5
## 
## $Modules
## [1] "BIOS0002" "BIOL0032"
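
When a list holds mixed contents like this, a few base R functions help to summarise what it contains. This sketch recreates the list above and inspects it with names, lengths, sapply and str:

```r
myList5 <- list(Name = "Tim", Role = "Tutor", Years = 5,
                Modules = c("BIOS0002", "BIOL0032"))

names(myList5)           # the names of the elements
lengths(myList5)         # the number of values in each element
sapply(myList5, class)   # the class of each element
str(myList5)             # a compact summary of the whole structure
```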

Data frames

Data frames are tremendously useful for scientific research. They are a special form of list, where each element must have the same length. This is good for ensuring that each variable in your dataset has the same number of entries. In a later session, I will show you how to import data from a spreadsheet into an R data frame.

myDataFrame <- data.frame(
  Treatment=factor(c("Treatment1","Treatment2","Treatment3",
                     "Treatment1","Treatment2","Treatment3")),
  Measurement=c(2.0,4.5,1.2,1.0,6.0,2.3))
myDataFrame
##    Treatment Measurement
## 1 Treatment1         2.0
## 2 Treatment2         4.5
## 3 Treatment3         1.2
## 4 Treatment1         1.0
## 5 Treatment2         6.0
## 6 Treatment3         2.3
class(myDataFrame)
## [1] "data.frame"

We can extract the elements of data frames in exactly the same way as for lists:

myDataFrame$Treatment
## [1] Treatment1 Treatment2 Treatment3 Treatment1 Treatment2 Treatment3
## Levels: Treatment1 Treatment2 Treatment3
myDataFrame[["Treatment"]]
## [1] Treatment1 Treatment2 Treatment3 Treatment1 Treatment2 Treatment3
## Levels: Treatment1 Treatment2 Treatment3
myDataFrame[[1]]
## [1] Treatment1 Treatment2 Treatment3 Treatment1 Treatment2 Treatment3
## Levels: Treatment1 Treatment2 Treatment3
myDataFrame$Treatment[1]
## [1] Treatment1
## Levels: Treatment1 Treatment2 Treatment3
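
Because a data frame is arranged in rows and columns, you can also extract values by giving a row and a column inside single square brackets, separated by a comma. Leaving the column position empty returns whole rows. A short sketch, recreating the data frame from above:

```r
myDataFrame <- data.frame(
  Treatment = factor(c("Treatment1", "Treatment2", "Treatment3",
                       "Treatment1", "Treatment2", "Treatment3")),
  Measurement = c(2.0, 4.5, 1.2, 1.0, 6.0, 2.3))

nrow(myDataFrame)                # number of rows: 6
ncol(myDataFrame)                # number of columns: 2
myDataFrame[2, "Measurement"]    # row 2 of the Measurement column: 4.5

# All rows belonging to Treatment1 (rows 1 and 4)
treatment1Rows <- myDataFrame[myDataFrame$Treatment == "Treatment1", ]
treatment1Rows
```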

We can also add new elements to a data frame, just as we can with lists:

myDataFrame$Measurement2 <- c(2.1,4.4,1.0,1.4,7.2,2.4)
myDataFrame
##    Treatment Measurement Measurement2
## 1 Treatment1         2.0          2.1
## 2 Treatment2         4.5          4.4
## 3 Treatment3         1.2          1.0
## 4 Treatment1         1.0          1.4
## 5 Treatment2         6.0          7.2
## 6 Treatment3         2.3          2.4

Unlike with a list, if we try to create a data frame where the elements have different lengths (i.e., numbers of values), we will get an error:

myList6 <- list(Component1=c(1,2,3,4,5),Component2=c(6,7))
myList6
## $Component1
## [1] 1 2 3 4 5
## 
## $Component2
## [1] 6 7
try(data.frame(Component1=c(1,2,3,4,5),Component2=c(6,7)))
## Error in data.frame(Component1 = c(1, 2, 3, 4, 5), Component2 = c(6, 7)) : 
##   arguments imply differing number of rows: 5, 2
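
One exception is worth knowing about. If the length of the longer element is a whole multiple of the length of the shorter one, data.frame recycles the shorter vector instead of giving an error. The following sketch (with an illustrative data frame) shows this behaviour, which can hide mistakes in your data:

```r
# 6 is a whole multiple of 2, so Component2 is recycled three times
myDataFrame2 <- data.frame(Component1 = c(1, 2, 3, 4, 5, 6),
                           Component2 = c(6, 7))
myDataFrame2$Component2   # 6 7 6 7 6 7
```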

Matrices

Although you may not encounter matrices when running basic statistics in R, you may do if you get into more advanced statistics, and they are useful if you use R for modelling or maths. Like data frames, matrices have a rectangular structure of rows and columns, but unlike data frames they can only hold a single data type. You can create a matrix using the matrix function. The byrow option determines whether data are entered along each row (byrow = TRUE) or down each column (byrow = FALSE):

myMatrix <- matrix(data = 1:12,nrow = 4,ncol = 3,byrow = TRUE)
myMatrix
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## [4,]   10   11   12
myMatrix2 <- matrix(data = 1:12,nrow = 4,ncol = 3,byrow = FALSE)
myMatrix2
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12
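
Values in a matrix are extracted using single square brackets containing a row position and a column position, separated by a comma. Leaving one of the positions empty returns a whole row or column. A brief sketch, using the first matrix from above:

```r
myMatrix <- matrix(data = 1:12, nrow = 4, ncol = 3, byrow = TRUE)

dim(myMatrix)      # number of rows and columns: 4 3
myMatrix[2, 3]     # row 2, column 3: 6
myMatrix[2, ]      # the whole of row 2: 4 5 6
myMatrix[, 1]      # the whole of column 1: 1 4 7 10
```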

We can convert objects of a different class to a matrix using the as.matrix function. If we convert the data frame that we created earlier into a matrix, all values become character strings, because matrices can’t handle mixed data types:

myMatrix3 <- as.matrix(myDataFrame)
myMatrix3
##      Treatment    Measurement Measurement2
## [1,] "Treatment1" "2.0"       "2.1"       
## [2,] "Treatment2" "4.5"       "4.4"       
## [3,] "Treatment3" "1.2"       "1.0"       
## [4,] "Treatment1" "1.0"       "1.4"       
## [5,] "Treatment2" "6.0"       "7.2"       
## [6,] "Treatment3" "2.3"       "2.4"
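
If you want the result to stay numeric, one option is to select only the numeric columns before converting. A minimal sketch, using a shortened version of the data frame (three rows rather than six) so that the example is self-contained:

```r
myDataFrame <- data.frame(
  Treatment = factor(c("Treatment1", "Treatment2", "Treatment3")),
  Measurement = c(2.0, 4.5, 1.2),
  Measurement2 = c(2.1, 4.4, 1.0))

# Keep only the two measurement columns, then convert to a matrix
myNumericMatrix <- as.matrix(myDataFrame[, c("Measurement", "Measurement2")])

is.numeric(myNumericMatrix)   # TRUE
myNumericMatrix * 2           # arithmetic now works on every value
```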

There are many mathematical operations that you can perform on matrices. In fact, you can also do the same with data frames so long as all the columns contain numbers. Matrix maths can get very complex, and is beyond the scope of these sessions. If you want a quick introduction to the basics, I recommend this webpage.

There are two main advantages of using matrices: 1) it ensures that all the values are of the same data type; and 2) the amount of memory used up by a matrix tends to be much smaller than that of a data frame, which can be important when working with very large datasets.

Next Time

That’s it for this session. In the next session, I introduce some of the functions that can be used to conduct arithmetic operations in R, including functions for calculating the summary statistics that are indispensable in scientific research.