In this tutorial we’ll learn about read csv in R, importing and exporting data in R. We’ll start with importing csv and txt files in R using base R functions and then move on to different packages that are used for data management tasks. We’ll also learn how to handle missing observations. Finally we’ll take a brief look at how to import and export data from other software such as SAS, STATA and SPSS.
You can download the data files for this tutorial here
To perform data analysis in R , you should have data in a file either stored locally on your device or if it’s on the web , you should download and save the data in your system and then load it into R to work on it. Loading Data is known as importing data in R and it supports importing data files stored in various formats, for example csv, txt, xlsx and so on. The most commonly used file formats to store data are csv, txt and xlsx.
This is what CSV files and TXT files look like. CSV is the abbreviation for comma separated value. In a CSV file data is stored in table format, with commas acting as separators. TXT is the abbreviation for text file . A text file stores data in plain text format, separated by different types of delimiters such as blank spaces, tabs (“\t”), commas (“,”), pipes (“|”) and so on.
Let’s first take a look at a data snapshot
In this example, we have a snapshot of a dataset named basic_salary. The data describes employees’ salary components, along with their grade and location. Each row corresponds to information about a single employee.
The read csv in R (read.csv) is a commonly used function for importing a csv file into R
Read csv in R function (read.csv)
If your file is not stored in the default working directory, then you will have to indicate its path for importing it into R. Ensure the path uses forward slashes (/) and the file name is accurate. Note that the read.csv function assumes header = TRUE and sep = “,” by default.
salary_data <- read.csv("C:/Users/Documents/basic_salary.csv")
The read.table function
The read.table function is a more general function to import datasets into R.
Here, header = TRUE indicates that the first row of the file contains the names of the columns. The default is header = FALSE and sep = ”,” specifies that the data is separated by comma. Without this command, R imports data in a single column.
salary_data <-read.table("C:/Users/Documents/basic_salary.csv", header = TRUE, sep = ",")
We already know that the read.table function is used to import data in R. If within the read.table command we add another argument, fill = TRUE it implicitly adds blank fields, in case the rows have unequal length.
salary_data <-read.table("C:/Users/Documents/basic_salary.txt", header = TRUE, sep = "\t", fill= TRUE)
In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation. It’s important to handle missing values efficiently so as to reduce bias and obtain accurate results. While importing data in R, missing values are automatically saved as NA.
Sometimes data may contain random values that are considered as missing values. It’s important to detect those values and overcome them. Therefore, we have a special tutorial for Handling missing values at a later stage, where we’ll learn how to deal with different types of missing data.
The readr package
readr is another package used to import data files. It may work faster if you are importing a very large amount of data.
In order to install the readr package we use the install.packages command and the function library is used to load the package into the ongoing session.
Install and load readr
Importing a csv file
Importing a txt file (white space delimited)
Importing a .txt file (tab delimited)
We can also use the fread function for importing very large data sets. This function is available in the data.table package . This is much faster and more convenient than other methods and one of the great things about this function is that all controls, expressed in arguments such as sep, are automatically detected.
install.packages("data.table") library(data.table) salary_data<-fread("C:/Users/Documents/basic_salary.csv")
In addition to readr and data.table, readxl is a package that provides functions to read Excel worksheets in both .xls and .xlsx formats.
read_excel is used to import an excel file in R.
install.packages("readxl") library(readxl) salary_data<-read_excel("C:/Users/Documents/basic_salary.xlsx")
R also provides manual selection of the directory and file where the dataset is located.
We use file.choose() to do this.
After executing the read.csv function with file.choose() inside it, a dialog box opens that allows you to choose the file interactively.
salary_data <- read.csv(file.choose())
Importing data in R from SAS, STATA and SPSS
We can also import data from SAS, STATA and SPSS software. The foreign package is used to do this.
We use read.xport to import a SAS file which is stored in transport format. Similarly, we have read.spss and read.dta functions for importing SPSS and Stata files.
Install and load the foreign package
# Save SAS dataset in transport format. Requires SAS on your system.
# For Stata binary
Importing data in R from MySQL
R also supports importing data from MySQL using the RMySQL package
The first step while importing data from MySQL is to create a database connection object using the dbConnect function.
After creating a database connection object, a result set object named “rs” is defined to retrieve data from the database using the dbSendQuery command.
Finally the results are accessed in R using the fetch function .
Create a database connection object.
mydb <- dbConnect(MySQL(), user='user', password='password', dbname='database_name', host='host')
Save a results set object to retrieve data from the database
rs = dbSendQuery(mydb, "select * from some_table")
Access the result in R
data = fetch(rs, n=-1)
Importing data in R from PostgreSQL
The RPostgreSQL package is used to work with PostgreSQL.
The first step while importing data from PostgreSQL is to create a database connection object using the dbDriver function.
The table is accessed in R using the function dbReadTable
Create a database connection object.
drv <- dbDriver("PostgreSQL") con <- dbConnect(drv, host='host', port=‘port',dbname='database_name', user='user', password='password')
Obtain a table into R data frame
myTable <- dbReadTable(con, "tablename")
Importing in R from Oracle
There are several packages in R that allow us to connect to an Oracle database. The most common ones are RODBC, RJDBC and ROracle.
In our example, we use the RODBC package to implement a database connection with Oracle. Here, we establish a connection using the odbcConnect function.
The sqlQuery function queries the database and put the results into a data frame.
Create a database connection object.
con <-odbcConnect("data", uid="user", pwd="password")
Query the database and put the results into a data frame
mydata <- sqlQuery(con, "SELECT * FROM TABLENAME.DATATABLE")
Oracle databases can be imported using RODBC, RJDBC or ROracle packages.
ROracle is an open source R package supporting a DBI or (database interface driver)- compliant Oracle driver based on the high performance OCI (oracle call interface) library.
Export csv from r to txt, excel, SAS, SPSS, STATA..
Files can be exported from R into different file formats including csv, txt, xlsx etc. To export a csv file from R, the write.csv function is used.
To export a file as a Tab delimited Text file, the write.table function is used with a separator. To export a file as an Excel spreadsheet, write.xlsx is used.
To a CSV File
write.csv(mydata, file = "MyData.csv")
To a Tab Delimited Text File
write.table(mydata, “c:/mydata.txt”, sep=”\t”)
To a Excel Spreadsheet
install.packages("xlsx") library(xlsx) write.xlsx(mydata, "c:/mydata.xlsx")
We’ve already seen that the “foreign” package is used to import and export files from SPSS, SAS and STATA
The write.foreign function in the foreign package can be used to export data to various formats such as SAS,SPSS and STATA
write.foreign(mydata, "c:/mydata.txt", "c:/mydata.sps",package="SPSS")
write.foreign(mydata, "c:/mydata.txt", "c:/mydata.sas", package="SAS")
This is a quick recap of concepts in this tutorial. We covered how to import files of varied formats using basic functions, and looks at the various packages that make importing data easy in the case of large data sets. After importing files we also learned how to export files from R into specific formats.
Finally we looked at importing and exporting data from other software and databases including SAS, SPSS, STATA, MySQL and Oracle.
This tutorial lesson is taken from the Postgraduate Diploma in Data Science.