Overview
In this session, I will introduce the basics of reading data into R and saving data from R.
- Data Input
  - Built-in datasets
  - CSV format
  - Other table formats
  - RDS format
- Data Output
  - CSV format
  - Other table formats
  - RDS format
You will be able to load all of the example datasets in this session for yourself. To do so, you will need to have installed R on your computer (see the previous session).
The cheatsheet from previous sessions will still be useful here.
Data Input
There are a few ways that you can read data into R.
Built-in Datasets
Firstly, R comes with a number of built-in datasets. For example, the mtcars dataset, which we will use again in later sessions:
TIP: The head and tail functions allow you to view the first and last lines, respectively, of a large data-frame. You can specify a number of rows to display. Here I have asked for 10 rows. Alternatively, 6 rows will be displayed by default.
data(mtcars)
head(mtcars,n = 10)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
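As a quick check of the defaults described in the tip above, the following sketch compares the number of rows returned by head and tail with and without the n argument. It uses only the built-in mtcars dataset; the object names are just for illustration.

```r
# Load the built-in dataset
data(mtcars)

# Without n, head() (and tail()) return 6 rows
default_rows <- nrow(head(mtcars))

# With n, you choose how many rows to display
first_three <- head(mtcars, n = 3)
last_three <- tail(mtcars, n = 3)

print(default_rows)
print(rownames(last_three))
```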
CSV Format
It is also common to need to read data from Comma-Separated-Values files. For example, you can read my moth-trapping data directly from my website using the read.csv function:
moths <- read.csv(file = "https://timnewbold.github.io/TimNewboldMothDataPublicRelease.csv")
head(moths,10)
## TrapNumber Date Species NumberCaught Location
## 1 1 08/08/2020 Pebble prominent 1 Home
## 2 1 08/08/2020 Tree lichen beauty 1 Home
## 3 1 08/08/2020 Common carpet 4 Home
## 4 1 08/08/2020 Rustic 1 Home
## 5 1 08/08/2020 Common rustic 7 Home
## 6 1 08/08/2020 Flounced rustic 3 Home
## 7 1 08/08/2020 Clay 3 Home
## 8 1 08/08/2020 Straw underwing 8 Home
## 9 1 08/08/2020 Turnip 5 Home
## 10 1 08/08/2020 Yellow shell 2 Home
## Trap Type Month Month2 Year Year_Month
## 1 Heath - 40W Actinic Macro 8 8 2020 2020_08
## 2 Heath - 40W Actinic Macro 8 8 2020 2020_08
## 3 Heath - 40W Actinic Macro 8 8 2020 2020_08
## 4 Heath - 40W Actinic Macro 8 8 2020 2020_08
## 5 Heath - 40W Actinic Macro 8 8 2020 2020_08
## 6 Heath - 40W Actinic Macro 8 8 2020 2020_08
## 7 Heath - 40W Actinic Macro 8 8 2020 2020_08
## 8 Heath - 40W Actinic Macro 8 8 2020 2020_08
## 9 Heath - 40W Actinic Macro 8 8 2020 2020_08
## 10 Heath - 40W Actinic Macro 8 8 2020 2020_08
You can also use the read.csv function to read local files (just point the file option to the location of the file on your computer).
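A minimal sketch of reading a local file follows. The file here is a temporary one created just for the illustration, and its contents (column names and values) are invented; in practice you would replace the path with the location of your own CSV file.

```r
# Create a small CSV file in a temporary location
# (a stand-in for a file that already exists on your computer)
local_path <- tempfile(fileext = ".csv")
writeLines(c("Species,NumberCaught",
             "Yellow shell,2",
             "Common rustic,7"),
           con = local_path)

# Point the file argument at the local path
local_moths <- read.csv(file = local_path)
str(local_moths)
```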
Other Table Formats
The read.csv function is a specific instance of the more general read.table function. You can also read CSV files via the more general function, but you have to specify the character that separates entries in the dataset (commas in the case of CSVs), and also that the data contains a header row (i.e., the column names):
moths <- read.table(file = "https://timnewbold.github.io/TimNewboldMothDataPublicRelease.csv",
                    header = TRUE,sep = ",")
tail(moths,10)
## TrapNumber Date Species NumberCaught Location
## 757 112 06/08/2021 Large yellow underwing 2 Home
## 758 112 06/08/2021 Riband wave 1 Home
## 759 112 06/08/2021 Flame shoulder 1 Home
## 760 112 06/08/2021 Least carpet 1 Home
## 761 112 06/08/2021 Elephant hawkmoth 1 Home
## 762 112 06/08/2021 Single-dotted wave 1 Home
## 763 112 06/08/2021 Scalloped Oak 1 Home
## 764 112 06/08/2021 Yellow shell 1 Home
## 765 112 06/08/2021 Dagger 1 Home
## 766 112 06/08/2021 Common rustic 1 Home
## Trap Type Month Month2 Year Year_Month
## 757 Heath - 40W Actinic Macro 8 8 2021 2021_08
## 758 Heath - 40W Actinic Macro 8 8 2021 2021_08
## 759 Heath - 40W Actinic Macro 8 8 2021 2021_08
## 760 Heath - 40W Actinic Macro 8 8 2021 2021_08
## 761 Heath - 40W Actinic Macro 8 8 2021 2021_08
## 762 Heath - 40W Actinic Macro 8 8 2021 2021_08
## 763 Heath - 40W Actinic Macro 8 8 2021 2021_08
## 764 Heath - 40W Actinic Macro 8 8 2021 2021_08
## 765 Heath - 40W Actinic Macro 8 8 2021 2021_08
## 766 Heath - 40W Actinic Macro 8 8 2021 2021_08
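To see that the two functions give the same result, the sketch below reads the same comma-separated text with both and compares the resulting data frames. The text is a small invented sample supplied through the text argument, so the example runs without an internet connection.

```r
# Invented comma-separated sample, with a header row
csv_text <- "Species,NumberCaught\nYellow shell,2\nCommon rustic,7"

# read.csv assumes a header row and a comma separator
via_csv <- read.csv(text = csv_text)

# read.table needs both to be stated explicitly
via_table <- read.table(text = csv_text, header = TRUE, sep = ",")

# Compare the two data frames
print(all.equal(via_csv, via_table))
```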
The read.table function is more useful if you want to read in formats other than CSV, for example tab-separated text files. Here we will read data from the PanTHERIA database of the traits of mammal species (Jones et al. 2009) (the "\t" in the sep argument specifies that this dataset uses the tab character as the data separator):
pantheria <- read.table("https://www.dropbox.com/s/zj3ydfwo79t1n4f/PanTHERIA_1-0_WR05_Aug2008.txt?dl=1",sep = "\t",header = TRUE)
str(pantheria,list.len=10)
## 'data.frame': 5416 obs. of 55 variables:
## $ MSW05_Order : chr "Artiodactyla" "Carnivora" "Carnivora" "Carnivora" ...
## $ MSW05_Family : chr "Camelidae" "Canidae" "Canidae" "Canidae" ...
## $ MSW05_Genus : chr "Camelus" "Canis" "Canis" "Canis" ...
## $ MSW05_Species : chr "dromedarius" "adustus" "aureus" "latrans" ...
## $ MSW05_Binomial : chr "Camelus dromedarius" "Canis adustus" "Canis aureus" "Canis latrans" ...
## $ X1.1_ActivityCycle : num 3 1 2 2 2 2 -999 2 3 -999 ...
## $ X5.1_AdultBodyMass_g : num 492714 10392 9659 11989 31757 ...
## $ X8.1_AdultForearmLen_mm : num -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 ...
## $ X13.1_AdultHeadBodyLen_mm : num -999 745 828 872 1055 ...
## $ X2.1_AgeatEyeOpening_d : num -999 -999 7.5 11.9 14 ...
## [list output truncated]
REMINDER: The str function reports the type and contents of columns in a data-frame (or elements in other R data structures). Specifying the list.len option as 10 restricts the function to displaying the first 10 columns only. I am using that here, because this dataset contains many columns, and so using the head function would clutter the console.
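The str output above shows many values of -999, which appear to be placeholders for missing data rather than real measurements. Assuming that is the case, one way to convert them to NA while reading is the na.strings argument of read.table. The sketch below uses a small invented tab-separated table (hypothetical species names and values) in place of the real file.

```r
# Toy tab-separated table; names and values are invented for illustration
toy_text <- "Species\tBodyMass_g\tForearmLen_mm\nspecies_a\t492714\t-999\nspecies_b\t-999\t40.5\n"

# Treat the string "-999" (as well as "NA") as a missing value
toy <- read.table(text = toy_text, sep = "\t", header = TRUE,
                  na.strings = c("NA", "-999"))

# Count the missing values in each column
print(colSums(is.na(toy)))
```

Note that na.strings matches the text exactly as it is written in the file, so a placeholder written in another form (for example -999.00) would need to be listed as well.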
RDS Format
Another format you may come across is R's native RDS format. This can be handy for very large datasets, because it stores the data in a compressed binary form that is generally more compact and faster to read than text files, such as CSVs and tab-delimited text files.
We will read here the PREDICTS database (Hudson et al. 2017), which is a very large dataset (3.2 million rows). The CSV version of this database is huge, so it is convenient to use the RDS format. We will come across the PREDICTS database again in later sessions.
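TIP: The readRDS function treats a character string as the name of a local file, so it cannot be given a web address directly. To load an RDS file from an online repository, we therefore use a 2-stage process: first open a connection with the url function, then pass that connection to readRDS. Alternatively, you can just point to a local RDS file on your computer.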
myFile <- url("https://www.dropbox.com/s/pb1mdiel8o22186/database.rds?dl=1")
predicts <- readRDS(myFile)
str(predicts,list.len=10)
## 'data.frame': 3250404 obs. of 67 variables:
## $ Source_ID : Factor w/ 480 levels "AD1_2001__Liow",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Reference : Factor w/ 490 levels "Aben et al. 2008",..: 250 250 250 250 250 250 250 250 250 250 ...
## $ Study_number : int 1 1 1 1 1 2 2 2 2 2 ...
## $ Study_name : Factor w/ 593 levels "1 Western Ghat",..: 505 505 505 505 505 506 506 506 506 506 ...
## $ SS : Factor w/ 666 levels "AD1_2001__Liow 1",..: 1 1 1 1 1 2 2 2 2 2 ...
## $ Diversity_metric : Factor w/ 15 levels "abundance","biomass",..: 1 1 1 1 1 15 15 15 15 15 ...
## $ Diversity_metric_unit : Factor w/ 29 levels "effort-corrected individuals",..: 6 6 6 6 6 18 18 18 18 18 ...
## $ Diversity_metric_type : Factor w/ 3 levels "Abundance","Occurrence",..: 1 1 1 1 1 3 3 3 3 3 ...
## $ Diversity_metric_is_effort_sensitive : logi TRUE TRUE TRUE TRUE TRUE FALSE ...
## $ Diversity_metric_is_suitable_for_Chao : logi TRUE TRUE TRUE TRUE TRUE FALSE ...
## [list output truncated]
Data Output
We will now deal with saving data from R. Let’s say for example that you want to add a new column containing a manipulation of the data, and then save the result. Here, we will create a new column in the mtcars dataset expressing the power-to-weight ratio of the car models (not an ecological example but a simple example for demonstration!):
mtcars$PowerWeightRatio <- mtcars$hp/mtcars$wt
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
## PowerWeightRatio
## Mazda RX4 41.98473
## Mazda RX4 Wag 38.26087
## Datsun 710 40.08621
## Hornet 4 Drive 34.21462
## Hornet Sportabout 50.87209
## Valiant 30.34682
CSV Format
You can write data to a CSV file using the write.csv function. I prefer to exclude row names (row.names = FALSE). Specifying quote = FALSE prevents the inclusion of quotation marks around character strings (you may need to use quote = TRUE if any of your character strings contain commas):
write.csv(x = mtcars,file = "CarData.csv",quote = FALSE,row.names = FALSE)
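The role of the quote argument can be illustrated with a character string that itself contains a comma. This sketch uses an invented two-row data frame and temporary files; count.fields reports how many comma-separated fields each line of a file contains.

```r
# Invented example data: the second species name contains a comma
df <- data.frame(Species = c("Yellow shell", "Rustic, common"),
                 NumberCaught = c(2, 1),
                 stringsAsFactors = FALSE)

unquoted_file <- tempfile(fileext = ".csv")
quoted_file <- tempfile(fileext = ".csv")

write.csv(x = df, file = unquoted_file, quote = FALSE, row.names = FALSE)
write.csv(x = df, file = quoted_file, quote = TRUE, row.names = FALSE)

# Without quotes, the extra comma makes the last line look like three fields
fields_unquoted <- count.fields(unquoted_file, sep = ",")

# With quotes, every line has two fields
fields_quoted <- count.fields(quoted_file, sep = ",")

# The quoted file reads back with the species names intact
read_back <- read.csv(quoted_file, stringsAsFactors = FALSE)
print(read_back)
```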
Other Table Formats
If you want to write a text file with a separator other than commas, you can use the more generic write.table function. Here, you have to specify the separator, in addition to the other arguments:
write.table(x = mtcars,file = "CarData.txt",quote = FALSE,row.names = FALSE,sep = '\t')
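In mtcars the car model names are stored as row names, so row.names = FALSE leaves them out of the file. If you want to keep them, one option (a sketch, not the only approach) is to write the row names as a first column and tell read.table to use that column as row names when reading back. A temporary file is used here in place of CarData.txt.

```r
data(mtcars)
tab_file <- tempfile(fileext = ".txt")

# col.names = NA adds a blank header entry above the row-name column
write.table(x = mtcars, file = tab_file, quote = FALSE, sep = "\t",
            row.names = TRUE, col.names = NA)

# row.names = 1 uses the first column of the file as the row names
cars_back <- read.table(tab_file, header = TRUE, sep = "\t", row.names = 1)
print(head(rownames(cars_back)))
```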
RDS Format
Finally, if you are going to keep working in R, and especially if you have a large dataset, you may want to consider using the RDS format. You can output an RDS file using the saveRDS function. Note, though, that RDS is an R-specific format, so the files cannot readily be opened in other software:
saveRDS(object = mtcars,file = "CarData.rds")
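A round trip shows what the RDS format preserves. In this sketch (using a temporary file in place of CarData.rds), the object read back with readRDS is compared with the original using identical, which checks that the values, column types and row names all match.

```r
data(mtcars)
rds_file <- tempfile(fileext = ".rds")

# Save the data frame, then read it straight back in
saveRDS(object = mtcars, file = rds_file)
cars_rds <- readRDS(rds_file)

# TRUE if the restored object is exactly the same as the original
print(identical(cars_rds, mtcars))
```

Because the whole R object is stored, there is no need for arguments such as sep, quote or row.names when writing or reading.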
Next Time
In the next session, I will give a very brief and rather superficial introduction to the vast plotting capabilities in R.