{{%attachments title=“Files for this section” style=“grey” /%}}
Comma separated files are the most common way to save spreadsheets that doesn’t require a paid program from Microsoft to open.
CSV file names end with a .csv
This explains the values separated with commas part of the file name.
If you have the link to a CSV file, right click the link of the data and click Copy Link Address. This data set can be found on the Connecticut Open Data Portal.
The Base R function to import a CSV file is read.csv()
. Just put the URL address in quotation marks and add the stringsAsFactors=F
(In this code we’re using the function head()
– this returns 6 rows by default, but we want to look at 10, so we’ll specify that when we call the function head(data, 10))
df_csv <- read.csv("https://data.ct.gov/api/views/iyru-82zq/rows.csv?accessType=DOWNLOAD", stringsAsFactors=F)
head(df_csv, 10)
## FiscalYear MonthYear Town AdmMonth FYMonthOrder AdmYear
## 1 2014 July-13 Ansonia 7 1 2013
## 2 2014 August-13 Ansonia 8 2 2013
## 3 2014 September-13 Ansonia 9 3 2013
## 4 2014 October-13 Ansonia 10 4 2013
## 5 2014 November-13 Ansonia 11 5 2013
## 6 2014 December-13 Ansonia 12 6 2013
## 7 2014 January-14 Ansonia 1 7 2014
## 8 2014 February-14 Ansonia 2 8 2014
## 9 2014 March-14 Ansonia 3 9 2014
## 10 2014 April-14 Ansonia 4 10 2014
## MonthTotal
## 1 42
## 2 43
## 3 39
## 4 33
## 5 38
## 6 45
## 7 35
## 8 38
## 9 45
## 10 46
When you right click on the link, instead of clicking Copy Link Address– this time, click Save Link As…
Save to your working directory.
After saving to the directory, click on the circle arrow on the right to refresh the files to make sure it’s there.
Either by typing setwd("/directory/where/you/want")
or by clicking in the menu up top Session > Set Working Directory > Choose Directory…
Just like before, except instead of the URL, it’s the name of the file.
Note: This will only work if the working directory is set to where the csv file is.
df_csv <- read.csv("data/Admissions_to_DMHAS_Addiction_Treatment_by_Town__Year__and_Month.csv", stringsAsFactors=F)
Why?
Blame statisticians.
Back when R was created the users weren’t using it as we use it now, with all these different strings.
What happens when you don’t use stringsAsFactors=F
df_csv <- read.csv("data/Admissions_to_DMHAS_Addiction_Treatment_by_Town__Year__and_Month.csv")
str(df_csv)
## 'data.frame': 3420 obs. of 7 variables:
## $ FiscalYear : int 2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
## $ MonthYear : Factor w/ 36 levels "April-14","April-15",..: 16 4 34 31 28 7 13 10 22 1 ...
## $ Town : Factor w/ 102 levels "Ansonia","Berlin",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ AdmMonth : int 7 8 9 10 11 12 1 2 3 4 ...
## $ FYMonthOrder: int 1 2 3 4 5 6 7 8 9 10 ...
## $ AdmYear : int 2013 2013 2013 2013 2013 2013 2014 2014 2014 2014 ...
## $ MonthTotal : int 42 43 39 33 38 45 35 38 45 46 ...
readr is a package that read rectangular data quickly and assumes characters are strings and not factors by default.
## If you don't have readr installed yet, uncomment and run the line below
#install.packages("readr")
library(readr)
df_csv <- read_csv("data/Admissions_to_DMHAS_Addiction_Treatment_by_Town__Year__and_Month.csv")
## Parsed with column specification:
## cols(
## FiscalYear = col_double(),
## MonthYear = col_character(),
## Town = col_character(),
## AdmMonth = col_double(),
## FYMonthOrder = col_double(),
## AdmYear = col_double(),
## MonthTotal = col_double()
## )
As you can see, the read_csv()
function interpreted the MonthYear and Town columns as characters and not as Factors as read.csv()
did.
When you’re done analyzing or transforming your data, you can save your dataframe as a CSV file with write_csv()
from the readr package.
# Pass the write_csv() function the name of the dataframe and what you want to call the file
write_csv(df_csv, "transformed_data.csv")
The file will save to your working directory, but you can specify sub directories with the function.
# Pass the write_csv() function the name of the dataframe and what you want to call the file
write_csv(df_csv, "data/transformed_data.csv")
NA
Weird quirk alert: Exported files will include NA
s so to replace them, pass the variable na="whatever"
.
# This replaces the NAs with blanks
write_csv(df_csv, "data/transformed_data.csv", na="")
Challenge yourself with these exercises so you’ll retain the knowledge of this section.
Instructions on how to run the exercise app are on the intro page to this section.
© Copyright 2018, Andrew Ba Tran
© Copyright 2018, Andrew Tran