Read CSV in R – Importing data – A Tutorial

In this tutorial we’ll learn about read csv in R, importing and exporting data in R. We’ll start with importing csv and txt files in R using base R functions and then move on to different packages that are used for data management tasks. We’ll also learn how to handle missing observations. Finally we’ll take a brief look at how to import and export data from other software such as SAS, STATA and SPSS.

Read CSV in R and importing Data

To perform data analysis in R , you should have data in a file either stored locally on your device or if it’s on the web , you should download and save the data in your system and then load it into R to work on it. Loading Data is known as importing data in R and it supports importing data files stored in various formats, for example csv, txt, xlsx and so on. The most commonly used file formats to store data are csv, txt and xlsx.

read csv in R

This is what CSV files and TXT files look like. CSV is the abbreviation for comma separated value. In a CSV file data is stored in table format,  with commas acting as separators. TXT is the abbreviation for text file . A text file stores data in plain text format, separated by different types of delimiters such as blank spaces, tabs (“\t”), commas (“,”), pipes (“|”) and so on.

TXT and CSV file formats

Let’s first take a look at a data snapshot

In this example, we have  a snapshot of a dataset named  basic_salary. The data describes employees’ salary components, along with their grade and location. Each row corresponds to information about a single employee.

Data Snapshot

The read csv in R (read.csv) is a commonly used function for importing a csv file into R

Read csv in R function (read.csv)

If your file is not stored in the default working directory, then you will have to indicate its path for importing it into R. Ensure the path uses forward slashes (/) and the file name is accurate. Note that the read.csv function assumes header = TRUE and sep = “,”  by default.

 salary_data <- read.csv("C:/Users/Documents/basic_salary.csv")   

The read.table function

The read.table function is a more general function to import datasets into R.

Here, header = TRUE indicates that the first row of the file contains the names of the columns. The default is header = FALSE and sep = ”,” specifies that the data is separated by comma. Without this command, R imports data in a single column.

 salary_data <-read.table("C:/Users/Documents/basic_salary.csv", 
 header = TRUE, sep = ",") 

We already know that the read.table function is used to import data in R. If within the read.table command we add another argument,  fill = TRUE it implicitly adds blank fields, in case the rows have unequal length.

 salary_data <-read.table("C:/Users/Documents/basic_salary.txt", 
 header = TRUE, sep = "\t", fill= TRUE) 

In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation. It’s important to handle missing values efficiently so as to reduce bias and obtain accurate results. While importing data in R,  missing values are automatically saved as NA.

Sometimes data may contain random values that are considered as missing values. It’s important to detect those values  and overcome them. Therefore, we have a special tutorial for Handling missing values at a later stage,  where we’ll learn how to deal with different types of missing data.

The readr package

readr is another package used to import data files. It may work faster if you are importing a very large amount of data. 

In order to install the readr package we use the install.packages command and the function library  is used to load the package into the ongoing session.

Install and load readr

 install.packages(“readr”)
 library(readr) 

Importing a csv file

 salary_data<-read_csv("C:/Users/Documents/basic_salary.csv") 

Importing a txt file (white space delimited)

 salary_data<-read_table("C:/Users/Documents/basic_salary.txt") 

Importing a .txt file (tab delimited)

salary_data<-read_tsv("C:/Users/Documents/basic_salary.txt")

We can also use the fread function for importing very large data sets. This function is available in the data.table package . This is much faster and more convenient than other methods and one of the great things  about this function is that all controls, expressed in arguments such as sep, are automatically detected.

 install.packages("data.table")
 library(data.table)
 salary_data<-fread("C:/Users/Documents/basic_salary.csv") 

In addition to readr and data.table, readxl is a package that provides functions to read Excel worksheets in both .xls and .xlsx formats.

read_excel is used to import an excel file in R.

 install.packages("readxl")
 library(readxl)
 salary_data<-read_excel("C:/Users/Documents/basic_salary.xlsx") 

R also provides manual selection of the directory and file where the dataset is located. 

We use file.choose() to do this.

After executing the read.csv function  with file.choose() inside it, a dialog box opens that allows you to choose the file interactively.

 salary_data <- read.csv(file.choose())  

Importing data in R from SAS, STATA and SPSS

We can also import data from SAS, STATA and SPSS software. The foreign package is used to do this.

We use read.xport to import a SAS file which is stored in transport format. Similarly, we have read.spss and read.dta functions for importing SPSS and Stata files.

Install and load the foreign package

 install.packages("foreign")
 library(foreign) 

For SAS

# Save SAS dataset in transport format. Requires SAS on your system.

 read.xport(“dataset.xpt”)

For SPSS

 read.spss(“dataset.sav”,use.value.labels=TRUE) 

# For Stata binary

 read.dta(“dataset.dta”) 

Importing data in R from MySQL

R also supports importing data from MySQL using the RMySQL package

The first step while importing data from MySQL is to create a database connection object using the dbConnect function.

After creating a database connection object, a result set object named “rs” is defined to retrieve data from the database using the dbSendQuery command.

Finally the results are accessed in R using the fetch  function .

 install.packages("RMySQL")
 library(RMySQL) 

Create a database connection object.

mydb <- dbConnect(MySQL(), user='user', password='password', dbname='database_name', host='host')

Save a results set object to retrieve data from the database

rs = dbSendQuery(mydb, "select * from some_table")

Access the result in R

 data = fetch(rs, n=-1) 

Importing data in R from PostgreSQL

The RPostgreSQL package is used to work with PostgreSQL.

The first step while importing data from PostgreSQL is to create a database connection object using the dbDriver function.

The table is accessed in R using the function dbReadTable

 install.packages("RPostgreSQL")
 library(RPostgreSQL) 

Create a database connection object.

drv <- dbDriver("PostgreSQL")  
con <- dbConnect(drv, host='host', port=‘port',dbname='database_name', user='user', password='password') 

Obtain a table into R data frame

 myTable <- dbReadTable(con, "tablename") 

Importing in R from Oracle

There are several packages in R that allow us to connect to an Oracle database. The most common ones are RODBC, RJDBC and ROracle.

In our example, we use the RODBC package to implement a database connection with Oracle. Here, we establish a connection using the odbcConnect function.

The sqlQuery function  queries the database and put the results into a data frame.

 install.packages("RODBC")
 library(RODBC) 

Create a database connection object.

 con <-odbcConnect("data", uid="user", pwd="password") 

Query the database and put the results into a data frame

mydata <- sqlQuery(con, "SELECT * FROM TABLENAME.DATATABLE")

Oracle databases can be imported using RODBC, RJDBC or ROracle packages.

ROracle is an open source R package supporting a DBI or (database interface driver)- compliant Oracle driver based on the high performance OCI (oracle call interface) library. 

Export csv from r to txt, excel, SAS, SPSS, STATA..

Files can be exported from R into different file formats including csv, txt, xlsx etc. To export a csv file from R, the write.csv   function is used.

To export a file as  a Tab delimited Text file, the write.table function is used with a separator. To export a file as an Excel spreadsheet, write.xlsx is used.

To a CSV File

 write.csv(mydata, file = "MyData.csv") 

To a Tab Delimited Text File

write.table(mydata, “c:/mydata.txt”, sep=”\t”)

To a Excel Spreadsheet

 install.packages("xlsx")
 library(xlsx)
 write.xlsx(mydata, "c:/mydata.xlsx") 

We’ve already seen that the “foreign” package is used to import and export files from SPSS, SAS and STATA

The write.foreign function in the foreign package can be used to export data to various formats such as SAS,SPSS and STATA

 install.packages("foreign")
 library(foreign) 

To SPSS

 write.foreign(mydata, "c:/mydata.txt", "c:/mydata.sps",package="SPSS") 

To SAS

write.foreign(mydata, "c:/mydata.txt", "c:/mydata.sas", package="SAS")

To STATA

 write.dta(mydata, "c:/mydata.dta") 

This is a quick recap of concepts in this tutorial.  We covered how to import files of varied formats using basic functions,  and looks at the various packages that make importing data easy in the case of large data sets. After importing files we also learned how to export files from R into specific formats.

Finally we looked at importing and exporting data from other software and databases including SAS, SPSS, STATA, MySQL and Oracle.

This tutorial is based on lessons from the Data Analytics in R unit of the Digita Schools Advanced Diploma in Data Analytics.

Importing data in R - a summary

This tutorial lesson is taken from Digita Schools Advanced Diploma in Data Analytics and the Postgraduate Diploma in Data Science.

You can try our courses for free to learn more