This article illustrates how to download stock price data files from Google, save it into a local drive and merge them into a single data frame. This script is slightly modified from a script which downloads RStudio package download log data. The original source can be found here.
First of all, the following three packages are used.
1library(knitr)
2library(lubridate)
3library(stringr)
4library(plyr)
5library(dplyr)
The script begins with creating a folder to save data files.
1# create data folder
2dataDir <- paste0("data","_","2014-11-20-Download-Stock-Data-1")
3if(file.exists(dataDir)) {
4 unlink(dataDir, recursive = TRUE)
5 dir.create(dataDir)
6} else {
7 dir.create(dataDir)
8}
After creating urls and file paths, files are downloaded using Map
function - it is a warpper of mapply
. Note that, in case the function breaks by an error (eg when a file doesn’t exist), download.file
is wrapped by another function that includes an error handler (tryCatch
).
1# assumes codes are known beforehand
2codes <- c("MSFT", "TCHC") # codes <- c("MSFT", "1234") for testing
3urls <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
4 codes,"&output=csv")
5paths <- paste0(dataDir,"/",codes,".csv") # back slash on windows (\\)
6
7# simple error handling in case file doesn't exists
8downloadFile <- function(url, path, ...) {
9 # remove file if exists already
10 if(file.exists(path)) file.remove(path)
11 # download file
12 tryCatch(
13 download.file(url, path, ...), error = function(c) {
14 # remove file if error
15 if(file.exists(path)) file.remove(path)
16 # create error message
17 c$message <- paste(substr(path, 1, 4),"failed")
18 message(c$message)
19 }
20 )
21}
22# wrapper of mapply
23Map(downloadFile, urls, paths)
Finally files are read back using llply
and they are combined using rbind_all
. Note that, as the merged data has multiple stocks’ records, Code
column is created.
1# read all csv files and merge
2files <- dir(dataDir, full.name = TRUE)
3dataList <- llply(files, function(file){
4 data <- read.csv(file, stringsAsFactors = FALSE)
5 # get code from file path
6 pattern <- "/[A-Z][A-Z][A-Z][A-Z]"
7 code <- substr(str_extract(file, pattern), 2, nchar(str_extract(file, pattern)))
8 # first column's name is funny
9 names(data) <- c("Date","Open","High","Low","Close","Volume")
10 data$Date <- dmy(data$Date)
11 data$Open <- as.numeric(data$Open)
12 data$High <- as.numeric(data$High)
13 data$Low <- as.numeric(data$Low)
14 data$Close <- as.numeric(data$Close)
15 data$Volume <- as.integer(data$Volume)
16 data$Code <- code
17 data
18}, .progress = "text")
19
20data <- rbind_all(dataList)
Some of the values are shown below.
Date | Open | High | Low | Close | Volume | Code |
---|---|---|---|---|---|---|
2014-11-26 | 47.49 | 47.99 | 47.28 | 47.75 | 27164877 | MSFT |
2014-11-25 | 47.66 | 47.97 | 47.45 | 47.47 | 28007993 | MSFT |
2014-11-24 | 47.99 | 48.00 | 47.39 | 47.59 | 35434245 | MSFT |
2014-11-21 | 49.02 | 49.05 | 47.57 | 47.98 | 42884795 | MSFT |
2014-11-20 | 48.00 | 48.70 | 47.87 | 48.70 | 21510587 | MSFT |
2014-11-19 | 48.66 | 48.75 | 47.93 | 48.22 | 26177450 | MSFT |
This way wouldn’t be efficient compared to the way where files are read directly without being saved into a local drive. This option may be useful, however, if files are large and the API server breaks connection abrubtly.
I hope this article is useful and I’m going to write an article to show the second way.
Comments