Have you ever dealt with a large folder of spreadsheets that you need to combine into one?
With spreadsheets, you have to copy and paste over and over again.
But with R, you can script that process.
In fact, I’ve created a function for you that simplifies the process.
You have to download my muckrakr package off of GitHub with the devtools package.
# If you don't have devtools installed yet, uncomment and run the line below
#install.packages("devtools")
devtools::install_github("andrewbtran/muckrakr")
library(muckrakr)
The function we want to use is called bulk_csv()
.
It looks like this (Don’t run this command, it’s theoretical):
bulk_csv(folder = "DEFAULTBULKCSV2017", export = "filenamedefaultbulkcsv2018.csv"")
Two arguments:
Alright, so within this project folder is a sub folder full of CSVs.
It’s in data/csvs
This package is based on readr, so passing on variables from read_csv will work in bulk_csv. These variables will apply to the import of each CSV sheet.
To create a new dataframe with all the data sets combined, the command would be:
combined_data <- bulk_csv(folder="data/csvs")
##
|
| | 0%
|
|================ | 25%
|
|================================ | 50%
|
|================================================= | 75%
|
|=================================================================| 100%
To simply export the data as a new combined CSV file, the command would be:
bulk_csv(folder="data/csvs", export="combined_data.csv")
##
|
| | 0%
|
|================ | 25%
|
|================================ | 50%
|
|================================================= | 75%
|
|=================================================================| 100%
## # A tibble: 3,558 x 11
## STATEFP COUNTYFP TRACTCE GEOID NAME NAMELSAD subid.GEOID subid.NAME
## <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
## 1 2 198 300 2.20e9 3.00e0 Census … 213039010 Ketchikan
## 2 2 198 940100 2.20e9 9.40e3 Census … 219864310 Prince of…
## 3 2 198 200 2.20e9 2.00e0 Census … 219864310 Prince of…
## 4 2 198 100 2.20e9 1.00e0 Census … 219864310 Prince of…
## 5 2 164 100 2.16e9 1.00e0 Census … 215041200 Kodiak Is…
## 6 2 16 200 2.02e9 2.00e0 Census … 201601615 Aleutians…
## 7 2 16 100 2.02e9 1.00e0 Census … 201601615 Aleutians…
## 8 2 170 502 2.17e9 5.02e0 Census … 212238460 Kenai-Coo…
## 9 2 170 501 2.17e9 5.01e0 Census … 217047440 Matanuska…
## 10 2 170 1202 2.17e9 1.20e1 Census … 217047440 Matanuska…
## # … with 3,548 more rows, and 3 more variables: legit_geo <dbl>,
## # legit_name <chr>, legit_area <dbl>
This would save the file to the current working directory.
You can specify the location to download in export as long as you put the folder structure before the CSV file name, like:
bulk_csv(folder="data/csvs", export="data/data_export/combined_data.csv")
##
|
| | 0%
|
|================ | 25%
|
|================================ | 50%
|
|================================================= | 75%
|
|=================================================================| 100%
## # A tibble: 3,558 x 11
## STATEFP COUNTYFP TRACTCE GEOID NAME NAMELSAD subid.GEOID subid.NAME
## <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
## 1 2 198 300 2.20e9 3.00e0 Census … 213039010 Ketchikan
## 2 2 198 940100 2.20e9 9.40e3 Census … 219864310 Prince of…
## 3 2 198 200 2.20e9 2.00e0 Census … 219864310 Prince of…
## 4 2 198 100 2.20e9 1.00e0 Census … 219864310 Prince of…
## 5 2 164 100 2.16e9 1.00e0 Census … 215041200 Kodiak Is…
## 6 2 16 200 2.02e9 2.00e0 Census … 201601615 Aleutians…
## 7 2 16 100 2.02e9 1.00e0 Census … 201601615 Aleutians…
## 8 2 170 502 2.17e9 5.02e0 Census … 212238460 Kenai-Coo…
## 9 2 170 501 2.17e9 5.01e0 Census … 217047440 Matanuska…
## 10 2 170 1202 2.17e9 1.20e1 Census … 217047440 Matanuska…
## # … with 3,548 more rows, and 3 more variables: legit_geo <dbl>,
## # legit_name <chr>, legit_area <dbl>
Same concept as above, but the function name is bulk_excel()
.
Read more about the function on the muckrakr github repo.
© Copyright 2018, Andrew Ba Tran
© Copyright 2018, Andrew Tran