R Data Input and Output

Tim Newbold

2021-10-11

Overview

In this session, I will introduce the basics of reading data into R and saving data from R.

Data Input
- Built-in datasets
- CSV format
- Other table formats
- RDS format
Data Output
- CSV format
- Other table formats
- RDS format

You will be able to load all of the example datasets in this session for yourself. However, to do so, you will need to have installed the R software on to your computer (see previous session).

The cheatsheet from previous sessions will still be useful here.

Data Input

There are a few ways that you can read data into R.

Built-in Datasets

Firstly, R comes with built-in datasets. For example, there is the mtcars dataset, which we will use again in later sessions:

TIP: The head and tail functions display the first and last rows, respectively, of a data-frame, which is useful when the data-frame is large. You can set the number of rows to display with the n argument; here I have asked for 10 rows. If you do not specify n, 6 rows are displayed by default.

data(mtcars)
head(mtcars,n = 10)

##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
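A short sketch of a few other ways to use head and tail on the same built-in dataset. The object names (first_six, last_three, all_but_last_two) are made up for illustration; the function calls are standard base R.

```r
# Load the built-in mtcars dataset
data(mtcars)

# With no n argument, head() and tail() return 6 rows
first_six <- head(mtcars)
nrow(first_six)         # 6

# tail() with n = 3 returns the last three rows
last_three <- tail(mtcars, n = 3)
rownames(last_three)    # "Ferrari Dino" "Maserati Bora" "Volvo 142E"

# A negative n drops rows from the other end:
# head(mtcars, n = -2) keeps every row except the last two
all_but_last_two <- head(mtcars, n = -2)
nrow(mtcars)            # 32
nrow(all_but_last_two)  # 30
```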

CSV Format

It is also common to need to read data from comma-separated values (CSV) files. For example, you can read my moth-trapping data directly from my website using the read.csv function:

moths <- read.csv(file = "https://timnewbold.github.io/TimNewboldMothDataPublicRelease.csv")
head(moths,10)

##    TrapNumber       Date            Species NumberCaught Location
## 1           1 08/08/2020   Pebble prominent            1     Home
## 2           1 08/08/2020 Tree lichen beauty            1     Home
## 3           1 08/08/2020      Common carpet            4     Home
## 4           1 08/08/2020             Rustic            1     Home
## 5           1 08/08/2020      Common rustic            7     Home
## 6           1 08/08/2020    Flounced rustic            3     Home
## 7           1 08/08/2020               Clay            3     Home
## 8           1 08/08/2020    Straw underwing            8     Home
## 9           1 08/08/2020             Turnip            5     Home
## 10          1 08/08/2020       Yellow shell            2     Home
##                   Trap  Type Month Month2 Year Year_Month
## 1  Heath - 40W Actinic Macro     8      8 2020    2020_08
## 2  Heath - 40W Actinic Macro     8      8 2020    2020_08
## 3  Heath - 40W Actinic Macro     8      8 2020    2020_08
## 4  Heath - 40W Actinic Macro     8      8 2020    2020_08
## 5  Heath - 40W Actinic Macro     8      8 2020    2020_08
## 6  Heath - 40W Actinic Macro     8      8 2020    2020_08
## 7  Heath - 40W Actinic Macro     8      8 2020    2020_08
## 8  Heath - 40W Actinic Macro     8      8 2020    2020_08
## 9  Heath - 40W Actinic Macro     8      8 2020    2020_08
## 10 Heath - 40W Actinic Macro     8      8 2020    2020_08

You can also use the read.csv function to read local files: just set the file argument to the location of the file on your computer.
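A minimal sketch of reading a local file, assuming you do not yet have a file of your own: it writes a tiny, invented CSV to a temporary location (tempfile gives a throwaway path) and reads it back with read.csv. With your own data, you would replace csv_path with the path to your file; the example path in the comment is hypothetical.

```r
# Invented example data, written to a temporary file so the code runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Species,NumberCaught,Location",
             "Common carpet,4,Home",
             "Yellow shell,2,Home"),
           con = csv_path)

# read.csv() accepts a local path just as it accepts a URL.
# On your own computer this might be, e.g., file = "C:/Data/my_moths.csv"
# (a hypothetical path; R accepts forward slashes on Windows too).
# A relative path such as "my_moths.csv" is looked up in the working directory (see getwd()).
local_moths <- read.csv(file = csv_path)
str(local_moths)
```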

Other Table Formats

The read.csv function is a specific instance of the more general read.table function. You can also read CSV files via the more general function, but you have to specify the character that separates entries in the dataset (commas in the case of CSVs), and also that the data contains a header row (i.e., the column names):

moths <- read.table(file = "https://timnewbold.github.io/TimNewboldMothDataPublicRelease.csv",
                    header = TRUE,sep = ",")
tail(moths,10)

##     TrapNumber       Date                Species NumberCaught Location
## 757        112 06/08/2021 Large yellow underwing            2     Home
## 758        112 06/08/2021            Riband wave            1     Home
## 759        112 06/08/2021         Flame shoulder            1     Home
## 760        112 06/08/2021           Least carpet            1     Home
## 761        112 06/08/2021      Elephant hawkmoth            1     Home
## 762        112 06/08/2021     Single-dotted wave            1     Home
## 763        112 06/08/2021          Scalloped Oak            1     Home
## 764        112 06/08/2021           Yellow shell            1     Home
## 765        112 06/08/2021                 Dagger            1     Home
## 766        112 06/08/2021          Common rustic            1     Home
##                    Trap  Type Month Month2 Year Year_Month
## 757 Heath - 40W Actinic Macro     8      8 2021    2021_08
## 758 Heath - 40W Actinic Macro     8      8 2021    2021_08
## 759 Heath - 40W Actinic Macro     8      8 2021    2021_08
## 760 Heath - 40W Actinic Macro     8      8 2021    2021_08
## 761 Heath - 40W Actinic Macro     8      8 2021    2021_08
## 762 Heath - 40W Actinic Macro     8      8 2021    2021_08
## 763 Heath - 40W Actinic Macro     8      8 2021    2021_08
## 764 Heath - 40W Actinic Macro     8      8 2021    2021_08
## 765 Heath - 40W Actinic Macro     8      8 2021    2021_08
## 766 Heath - 40W Actinic Macro     8      8 2021    2021_08
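One way to check the relationship between the two functions is to read the same small, invented CSV file with each and compare the results. This sketch uses only base R. Note that read.csv also changes a few other defaults (for example quote, fill and comment.char), so the two calls can give different results on files containing apostrophes or # characters.

```r
# Write a small invented CSV to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("TrapNumber,Species,NumberCaught",
             "1,Pebble prominent,1",
             "1,Common carpet,4"),
           con = csv_path)

# read.csv() is read.table() with CSV-friendly defaults already set
via_read_csv   <- read.csv(csv_path)
via_read_table <- read.table(csv_path, header = TRUE, sep = ",")
identical(via_read_csv, via_read_table)  # TRUE for this simple file

# Leaving out header = TRUE makes read.table() treat the column names as data
no_header <- read.table(csv_path, sep = ",")
nrow(no_header)   # 3: the header line became the first row
names(no_header)  # "V1" "V2" "V3": default names are used instead
```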

The read.table function is more useful if you want to read in formats other than CSV, for example tab-separated text files. Here we will read data from the PanTHERIA database of the traits of mammal species (Jones et al. 2009) (the ‘\t’ in the sep argument specifies that this dataset uses the tab character as the data separator):

pantheria <- read.table("https://www.dropbox.com/s/zj3ydfwo79t1n4f/PanTHERIA_1-0_WR05_Aug2008.txt?dl=1",sep = "\t",header = TRUE)
str(pantheria,list.len=10)

## 'data.frame':    5416 obs. of  55 variables:
##  $ MSW05_Order                  : chr  "Artiodactyla" "Carnivora" "Carnivora" "Carnivora" ...
##  $ MSW05_Family                 : chr  "Camelidae" "Canidae" "Canidae" "Canidae" ...
##  $ MSW05_Genus                  : chr  "Camelus" "Canis" "Canis" "Canis" ...
##  $ MSW05_Species                : chr  "dromedarius" "adustus" "aureus" "latrans" ...
##  $ MSW05_Binomial               : chr  "Camelus dromedarius" "Canis adustus" "Canis aureus" "Canis latrans" ...
##  $ X1.1_ActivityCycle           : num  3 1 2 2 2 2 -999 2 3 -999 ...
##  $ X5.1_AdultBodyMass_g         : num  492714 10392 9659 11989 31757 ...
##  $ X8.1_AdultForearmLen_mm      : num  -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 ...
##  $ X13.1_AdultHeadBodyLen_mm    : num  -999 745 828 872 1055 ...
##  $ X2.1_AgeatEyeOpening_d       : num  -999 -999 7.5 11.9 14 ...
##   [list output truncated]

REMINDER: The str function reports the type and contents of columns in a data-frame (or elements in other R data structures). Specifying the list.len option as 10 restricts the function to displaying the first 10 columns only. I am using that here, because this dataset contains many columns, and so using the head function would clutter the console.
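The str output above shows many values of -999, which PanTHERIA uses to code missing data. If they are left as numbers, they will distort any calculations. The sketch below uses a small invented tab-separated file, with hypothetical column names, to show the na.strings argument converting these codes to NA. It also shows read.delim, a shortcut for tab-separated files with a header row.

```r
# Invented tab-separated data in which -999 marks a missing value
tsv_path <- tempfile(fileext = ".txt")
writeLines(c("Binomial\tBodyMass_g",
             "Species A\t1500",
             "Species B\t-999",
             "Species C\t2500"),
           con = tsv_path)

# na.strings tells read.table() which entries should become NA
traits <- read.table(tsv_path, sep = "\t", header = TRUE, na.strings = "-999")

# read.delim() already assumes sep = "\t" and header = TRUE
traits2 <- read.delim(tsv_path, na.strings = "-999")

traits$BodyMass_g                      # 1500 NA 2500
mean(traits$BodyMass_g, na.rm = TRUE)  # 2000

# Without na.strings, -999 is treated as a real body mass
raw <- read.delim(tsv_path)
mean(raw$BodyMass_g)                   # about 1000.3, pulled down by the -999
```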

RDS Format

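Another format you may come across is R's own RDS format, which saves a single R object (such as a data-frame) as a compressed binary file. This can be handy for very large datasets, because RDS files are usually much smaller and faster to load than text files such as CSVs and tab-delimited text files. RDS files also preserve R-specific details such as column types and factor levels.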

Here we will read the PREDICTS database (Hudson et al. 2017), which is a very large dataset (over 3.2 million rows). Because the CSV version of this database is huge, it is more convenient to use the RDS format. We will come across the PREDICTS database again in later sessions.

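TIP: The readRDS function cannot read from a web address supplied as plain text, so loading an RDS file from an online repository takes two steps: first create a connection to the address with the url function, then pass that connection to readRDS. To load a local RDS file on your computer, you can simply give readRDS the file path.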

myFile <- url("https://www.dropbox.com/s/pb1mdiel8o22186/database.rds?dl=1")
predicts <- readRDS(myFile)
str(predicts,list.len=10)

## 'data.frame':    3250404 obs. of  67 variables:
##  $ Source_ID                              : Factor w/ 480 levels "AD1_2001__Liow",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Reference                              : Factor w/ 490 levels "Aben et al. 2008",..: 250 250 250 250 250 250 250 250 250 250 ...
##  $ Study_number                           : int  1 1 1 1 1 2 2 2 2 2 ...
##  $ Study_name                             : Factor w/ 593 levels "1 Western Ghat",..: 505 505 505 505 505 506 506 506 506 506 ...
##  $ SS                                     : Factor w/ 666 levels "AD1_2001__Liow 1",..: 1 1 1 1 1 2 2 2 2 2 ...
##  $ Diversity_metric                       : Factor w/ 15 levels "abundance","biomass",..: 1 1 1 1 1 15 15 15 15 15 ...
##  $ Diversity_metric_unit                  : Factor w/ 29 levels "effort-corrected individuals",..: 6 6 6 6 6 18 18 18 18 18 ...
##  $ Diversity_metric_type                  : Factor w/ 3 levels "Abundance","Occurrence",..: 1 1 1 1 1 3 3 3 3 3 ...
##  $ Diversity_metric_is_effort_sensitive   : logi  TRUE TRUE TRUE TRUE TRUE FALSE ...
##  $ Diversity_metric_is_suitable_for_Chao  : logi  TRUE TRUE TRUE TRUE TRUE FALSE ...
##   [list output truncated]
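A sketch of reading a local RDS file, which also shows one practical advantage of the format: it stores the R object exactly, including column types and the order of factor levels, whereas a CSV stores only text. The small data frame here is invented, and saveRDS (covered under Data Output below) is used only to create a file to read. The CSV result assumes R 4.0 or later, where strings are no longer converted to factors by default.

```r
# Invented data with a factor whose levels are in a non-alphabetical order
survey <- data.frame(
  Site  = factor(c("B", "A", "B"), levels = c("B", "A")),
  Count = c(3L, 5L, 0L)
)

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")
saveRDS(survey, file = rds_path)
write.csv(survey, file = csv_path, row.names = FALSE)

# Reading a local RDS file only needs the path
from_rds <- readRDS(rds_path)
from_csv <- read.csv(csv_path)

identical(from_rds, survey)  # TRUE: an exact copy
levels(from_rds$Site)        # "B" "A": level order kept
class(from_csv$Site)         # "character": the factor information is lost
```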

Data Output

We will now deal with saving data from R. Let’s say, for example, that you want to add a new column containing a manipulation of the data, and then save the result. Here, we will create a new column in the mtcars dataset expressing the power-to-weight ratio of each car model (horsepower divided by weight in thousands of pounds; not an ecological example, but a simple one for demonstration!):

mtcars$PowerWeightRatio <- mtcars$hp/mtcars$wt
head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
##                   PowerWeightRatio
## Mazda RX4                 41.98473
## Mazda RX4 Wag             38.26087
## Datsun 710                40.08621
## Hornet 4 Drive            34.21462
## Hornet Sportabout         50.87209
## Valiant                   30.34682
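Before saving, it is worth knowing that the car model names in mtcars are stored as row names rather than in a column, so writing with row.names = FALSE would drop them. A minimal sketch of one workaround is to copy them into an ordinary column first. It works on a copy called car_data (a name chosen for illustration) taken from datasets::mtcars, so your own mtcars is left unchanged, and it writes to a temporary file rather than CarData.csv.

```r
# Fresh copy of the built-in data; the version in the datasets package is never modified
car_data <- datasets::mtcars
car_data$PowerWeightRatio <- car_data$hp / car_data$wt

# The car model names are row names, not a column
head(rownames(car_data), 3)  # "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"

# Copy the row names into a Model column and move it to the front
car_data$Model <- rownames(car_data)
car_data <- car_data[, c("Model", setdiff(names(car_data), "Model"))]

# The model names now survive writing with row.names = FALSE
csv_path <- tempfile(fileext = ".csv")
write.csv(car_data, file = csv_path, row.names = FALSE)
car_back <- read.csv(csv_path)
head(car_back[, c("Model", "hp", "wt", "PowerWeightRatio")], 3)
```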

CSV Format

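You can write data to a CSV file using the write.csv function. I prefer not to include row names (row.names = FALSE). Be aware, though, that the car model names in mtcars are stored as row names, so this setting leaves them out of the saved file. Specifying quote = FALSE prevents quotation marks from being placed around character strings. Keep the default (quote = TRUE) if any of your character strings contain commas; otherwise those commas will be read as separators between columns: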

write.csv(x = mtcars,file = "CarData.csv",quote = FALSE,row.names = FALSE)
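To see why quoting matters, this sketch writes a small invented data frame, in which one character string contains a comma, once with quote = FALSE and once with quote = TRUE. It then uses count.fields to count the comma-separated fields on each line of the two files; every line should have two.

```r
# Invented data: the first recorder's name contains a comma
records <- data.frame(Recorder = c("Smith, J.", "Jones"), Count = c(4, 2))

unquoted_path <- tempfile(fileext = ".csv")
quoted_path   <- tempfile(fileext = ".csv")
write.csv(records, file = unquoted_path, quote = FALSE, row.names = FALSE)
write.csv(records, file = quoted_path,   quote = TRUE,  row.names = FALSE)

# count.fields() reports how many fields each line of a file has
count.fields(unquoted_path, sep = ",")  # 2 3 2: "Smith, J." was split in two
count.fields(quoted_path,   sep = ",")  # 2 2 2: the quotes keep the name together
```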

Other Table Formats

If you want to write a text file with a separator other than commas, you can use the more general write.table function. Here, you have to specify the separator (a tab, '\t', in this example), in addition to the other arguments:

write.table(x = mtcars,file = "CarData.txt",quote = FALSE,row.names = FALSE,sep = '\t')
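As a quick check of the output, this sketch writes a copy of mtcars (with the power-to-weight column added) to a temporary tab-separated file and reads it back with read.delim, which assumes a tab separator and a header row. The temporary path stands in for a real file name such as CarData.txt, and car_data is an illustrative name.

```r
car_data <- datasets::mtcars
car_data$PowerWeightRatio <- car_data$hp / car_data$wt

txt_path <- tempfile(fileext = ".txt")
write.table(x = car_data, file = txt_path, quote = FALSE, row.names = FALSE, sep = "\t")

# Look at the first two lines: a tab-separated header, then the first car
readLines(txt_path, n = 2)

# read.delim() reads the tab-separated file back into a data-frame
car_back <- read.delim(txt_path)
dim(car_back)  # 32 12
```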

RDS Format

Finally, if you are going to keep working in R, and especially if you have a large dataset, you may want to consider using the RDS format. You can output an RDS file using the saveRDS function. Note, though, that RDS files are intended to be read back into R; you will not be able to open them in general-purpose software such as a spreadsheet program:

saveRDS(object = mtcars,file = "CarData.rds")
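To illustrate where the size advantage comes from, this sketch saves mtcars twice to temporary files, once with the default settings (gzip compression) and once with compress = FALSE, then compares the file sizes. For large datasets, this compression can save a lot of disk space. The file and object names are illustrative.

```r
car_data <- datasets::mtcars

compressed_path   <- tempfile(fileext = ".rds")
uncompressed_path <- tempfile(fileext = ".rds")
saveRDS(object = car_data, file = compressed_path)  # gzip compression by default
saveRDS(object = car_data, file = uncompressed_path, compress = FALSE)

# The default, compressed file is the smaller of the two
file.size(compressed_path) < file.size(uncompressed_path)  # TRUE

# Either file reads back as an exact copy, row names included
identical(readRDS(compressed_path), car_data)    # TRUE
identical(readRDS(uncompressed_path), car_data)  # TRUE
```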

Next Time

In the next session, I will give a very brief and rather superficial introduction to the vast plotting capabilities in R.

References

Hudson, Lawrence N., Tim Newbold, Sara Contu, Samantha L. L. Hill, Igor Lysenko, Adriana De Palma, Helen R. P. Phillips, et al. 2017. “The Database of the PREDICTS (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems) Project.” Ecology and Evolution 7: 145–88. https://doi.org/10.1002/ece3.2579.

Jones, Kate E., Jon Bielby, Marcel Cardillo, Susanne A. Fritz, Justin O’Dell, C. David L. Orme, Kamran Safi, et al. 2009. “PanTHERIA: A Species-Level Database of Life History, Ecology, and Geography of Extant and Recently Extinct Mammals.” Ecology 90: 2648. https://esapubs.org/archive/ecol/E090/184/metadata.htm.