Arithmetic in R

Tim Newbold

2021-10-08

Overview

In this session, we will cover some of the basic functions for performing summary arithmetic on sets of numbers in R:

  • Basic arithmetic
  • Summary statistics
  • Dealing with no-data values
  • Working with complex data types

As usual, this cheatsheet gives you a useful overview of the key operations and functions, and you can try out some R code for yourself on this website.

We will first create a vector of numbers to use:

x <- c(8.033245,10.499592,9.983152,10.860751,
       10.857530,12.865557,8.428087,11.646942,
       12.121439,10.773926,12.054675,10.578910,
       8.560825,10.623534,10.580913,13.219849,
       8.741934,13.927849,11.615910,10.653642,
       8.034322,9.426207,8.404283,12.127669,
       9.821571,13.785322,10.524268,7.572063,
       9.593128,13.225378)

Basic Arithmetic

In the first session, I showed you how to conduct arithmetic on individual numbers in R. It is also useful to be able to summarise sets of numbers, and R has many functions for doing so. Here, I will cover sums, products and cumulative sums.

To calculate the sum of a vector of numbers, you can use the built-in sum function:

sum(x)
## [1] 319.1425

To calculate the product of all numbers in a set, use the prod function:

prod(x)
## [1] 4.229897e+30
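To see exactly what sum and prod do, it can help to try them on a small vector whose results are easy to work out by hand. The vector v below is a made-up example, not part of the session's data:

```r
# A small, hypothetical vector whose sum and product are easy to check by hand
v <- c(2, 3, 4)
sum(v)    # 2 + 3 + 4 = 9
prod(v)   # 2 * 3 * 4 = 24
```

Note that products grow very quickly, which is why prod(x) above is printed in scientific notation.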

Finally, to calculate a cumulative sum, there is the cumsum function. Instead of returning a single number, this function returns a vector of the same length as the input vector. The first value in the returned vector is equal to the first value in the input vector, the second value is the sum of the first and second values in the input vector, the third value is the sum of the first, second and third values, and so on:

cumsum(x)
##  [1]   8.033245  18.532837  28.515989  39.376740  50.234270  63.099827
##  [7]  71.527914  83.174856  95.296295 106.070221 118.124896 128.703806
## [13] 137.264631 147.888165 158.469078 171.688927 180.430861 194.358710
## [19] 205.974620 216.628262 224.662584 234.088791 242.493074 254.620743
## [25] 264.442314 278.227636 288.751904 296.323967 305.917095 319.142473
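As a sketch of what cumsum is doing, the example below (using a small, made-up vector v) rebuilds the cumulative sum step by step: element i of the result is the sum of the first i input values. It also confirms that the last element of the cumulative sum equals the ordinary sum:

```r
# A small, hypothetical vector
v <- c(2, 3, 4, 1)
cumsum(v)   # 2, 5, 9, 10

# The same result built step by step: element i is sum(v[1:i])
manual <- sapply(seq_along(v), function(i) sum(v[1:i]))
manual      # 2, 5, 9, 10

# The final cumulative value equals the total sum
tail(cumsum(v), 1) == sum(v)   # TRUE
```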

Summary Statistics

If you are using R to run statistics, some of the most useful functions are those that allow you to calculate summary statistics on sets of numbers. Again, there are many functions available in R. Here I will deal with means, medians, variances, standard deviations and standard errors.

The mean is calculated with the built-in mean function:

mean(x)
## [1] 10.63808
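The mean is simply the sum of the values divided by how many values there are, so we can check mean against sum and length. The vector v here is a made-up example:

```r
# A small, hypothetical vector
v <- c(2, 3, 4, 7)
sum(v) / length(v)   # 16 / 4 = 4
mean(v)              # also 4
```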

And the median with the median function:

median(x)
## [1] 10.60222
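The median is the middle value once the data are sorted. When there is an even number of values (as in our vector x, which has 30), it is the mean of the two middle values. A short sketch with a made-up vector v:

```r
# A small, hypothetical vector with an even number of values
v <- c(7, 2, 9, 4)
sorted <- sort(v)   # 2, 4, 7, 9
n <- length(sorted)

# Mean of the two middle values
(sorted[n / 2] + sorted[n / 2 + 1]) / 2   # (4 + 7) / 2 = 5.5
median(v)                                  # also 5.5
```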

Variance can be calculated with the var function:

var(x)
## [1] 3.172763
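The var function returns the sample variance: the sum of squared deviations from the mean, divided by the number of values minus one. The sketch below repeats that calculation by hand on a made-up vector v:

```r
# A small, hypothetical vector (mean = 5)
v <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(v)

# Squared deviations from the mean, divided by n - 1
sum((v - mean(v))^2) / (n - 1)   # 32 / 7
var(v)                           # same value
```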

And standard deviation with the sd function:

sd(x)
## [1] 1.781225
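The standard deviation is the square root of the variance, which we can confirm with the same made-up vector v:

```r
# A small, hypothetical vector
v <- c(2, 4, 4, 4, 5, 5, 7, 9)
sqrt(var(v))   # square root of the sample variance
sd(v)          # same value
```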

We could calculate standard error manually as the standard deviation divided by the square root of the sample size:

sd(x)/sqrt(length(x))
## [1] 0.3252057
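If you calculate standard errors often, one option is to wrap the manual calculation in a small function of your own. The name se below is hypothetical (it is not a built-in R function), and this sketch assumes the vector contains no no-data values:

```r
# Hypothetical helper, not part of base R:
# standard error = standard deviation / square root of sample size
se <- function(v) {
  sd(v) / sqrt(length(v))
}

se(c(2, 4, 4, 4, 5, 5, 7, 9))
```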

Alternatively, the function std.error will calculate standard error directly (for this we need to install and load an additional package, plotrix):

install.packages("plotrix")  # only needs to be run once
library(plotrix)
std.error(x)
## [1] 0.3252057

Dealing with No-data Values

Things are a little more complicated if our data contain no-data values (NA in R). To demonstrate, we will create a new vector that contains some no-data values:

x <- c(8.033245,10.499592,9.983152,10.860751,
       10.857530,12.865557,8.428087,11.646942,
       12.121439,10.773926,12.054675,10.578910,
       8.560825,10.623534,10.580913,13.219849,
       8.741934,13.927849,11.615910,10.653642,
       8.034322,9.426207,8.404283,12.127669,
       9.821571,13.785322,10.524268,7.572063,
       9.593128,13.225378,NA,NA)

If we apply any of the arithmetic or summary-statistic functions above to this vector, the result is a no-data value:

sum(x)
## [1] NA
prod(x)
## [1] NA
mean(x)
## [1] NA
median(x)
## [1] NA
var(x)
## [1] NA
sd(x)
## [1] NA
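Before deciding how to handle no-data values, it can be useful to check how many there are and where they occur. A minimal sketch using is.na and which on a made-up vector v:

```r
# A small, hypothetical vector containing two no-data values
v <- c(8.1, NA, 9.4, NA, 10.2)
is.na(v)          # FALSE TRUE FALSE TRUE FALSE
sum(is.na(v))     # 2 no-data values
which(is.na(v))   # positions 2 and 4
```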

The solution is very simple: all of these functions have an na.rm argument, which tells the function to remove no-data values before calculating. We need to set it to TRUE:

sum(x,na.rm=TRUE)
## [1] 319.1425
prod(x,na.rm=TRUE)
## [1] 4.229897e+30
mean(x,na.rm=TRUE)
## [1] 10.63808
median(x,na.rm=TRUE)
## [1] 10.60222
var(x,na.rm=TRUE)
## [1] 3.172763
sd(x,na.rm=TRUE)
## [1] 1.781225
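Setting na.rm = TRUE gives the same result as dropping the no-data values first. One thing to watch: length still counts the NAs, so if you calculate the standard error by hand you should use the number of non-missing values instead. A sketch with a made-up vector v:

```r
# A small, hypothetical vector containing no-data values
v <- c(2, NA, 4, 6, NA)

# These two give the same answer
mean(v, na.rm = TRUE)   # 4
mean(v[!is.na(v)])      # 4

# length(v) is 5 because it counts the NAs; count only non-missing values
n <- sum(!is.na(v))     # 3
sd(v, na.rm = TRUE) / sqrt(n)
```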

Working with Complex Data Types

As a reminder, you can extract vectors of numbers from complex data types (such as lists and data frames) in order to calculate summary arithmetic or statistics.

So, if your data are in a list:

myList <- list(x = c(8.033245,10.499592,9.983152,10.860751,
       10.857530,12.865557,8.428087,11.646942,
       12.121439,10.773926,12.054675,10.578910,
       8.560825,10.623534,10.580913,13.219849,
       8.741934,13.927849,11.615910,10.653642,
       8.034322,9.426207,8.404283,12.127669,
       9.821571,13.785322,10.524268,7.572063,
       9.593128,13.225378),
       y = c(1,2,3))

mean(myList[[1]])
## [1] 10.63808
mean(myList$x)
## [1] 10.63808
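List elements can also be extracted by name with double square brackets, and sapply can apply a summary function to every element of a list at once. The list exampleList below is a small, hypothetical example (named so that it does not overwrite myList):

```r
# A small, hypothetical list
exampleList <- list(x = c(2, 4, 6), y = c(1, 2, 3))

mean(exampleList[["x"]])   # extract by name: 4

# Apply mean to every element of the list
sapply(exampleList, mean)  # x = 4, y = 2
```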

Or if your data are in a data-frame:

myDataFrame <- data.frame(x = c(8.033245,10.499592,9.983152,10.860751,
       10.857530,12.865557,8.428087,11.646942,
       12.121439,10.773926,12.054675,10.578910,
       8.560825,10.623534,10.580913,13.219849,
       8.741934,13.927849,11.615910,10.653642,
       8.034322,9.426207,8.404283,12.127669,
       9.821571,13.785322,10.524268,7.572063,
       9.593128,13.225378),
       y = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,
             3,3,3,3,3,3,3,3,3,3))
mean(myDataFrame[[1]])
## [1] 10.63808
mean(myDataFrame$x)
## [1] 10.63808
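Columns of a data frame can likewise be extracted by name with double square brackets, and combined with a condition on another column to summarise only some of the rows. The data frame exampleDF is a small, hypothetical example:

```r
# A small, hypothetical data frame
exampleDF <- data.frame(x = c(2, 4, 6, 8), y = c(1, 1, 2, 2))

mean(exampleDF[["x"]])                # extract by name: 5

# Mean of x only for the rows where y equals 1
mean(exampleDF$x[exampleDF$y == 1])   # (2 + 4) / 2 = 3
```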

TIP: When extracting data from a data frame, it is generally safer to extract columns by name rather than by column number. If the columns in the dataset are reordered, extracting by number will silently return the wrong column, whereas extracting by name still returns the column you intended.
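The sketch below illustrates the problem using a small, hypothetical data frame whose columns are then swapped:

```r
# A small, hypothetical data frame
exampleDF <- data.frame(x = c(2, 4, 6), y = c(1, 1, 1))

# Swap the column order, as might happen if a dataset is re-exported
reordered <- exampleDF[, c("y", "x")]

mean(reordered[[1]])   # 1: column 1 is now y, not x
mean(reordered$x)      # 4: extracting by name is unaffected
```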

Next Time

In the next session, I will show you how to install R and RStudio, so that you can move on to reading in and saving data (which is not possible on the website we have been using so far).