In an earlier article, I showed a way to download stock price data files from Google, save them to a local drive and merge them into a single data frame. If the files are not large, however, saving them to disk first is an unnecessary step, so in this article the files are read directly from their URLs and merged in memory.

The following packages are used.

library(knitr)
library(lubridate)
library(stringr)
library(plyr)
library(dplyr)

Taking URLs as file locations, the files are read directly using llply and combined using rbind_all. As the merged data holds records for multiple stocks, a Code column is created. Note that, when an error occurs, the function returns a dummy data frame so that the loop does not break; the rows of the dummy data frame(s) are filtered out at the end.
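The error-handling idea can be seen in isolation in a minimal sketch that needs no network access. It uses base R only (lapply and rbind in place of llply and rbind_all), and read_prices and safe_read are hypothetical names introduced just for this illustration; read_prices stands in for reading a CSV file from a URL and fails for any code other than "MSFT".

```r
# Hypothetical reader: stands in for read.csv() on a URL and
# raises an error for any code it does not know.
read_prices <- function(code) {
  if (code != "MSFT") stop("no data for ", code)
  data.frame(Date = as.Date("2014-11-26"), Close = 47.75, Code = code,
             stringsAsFactors = FALSE)
}

# Wrap each read in tryCatch so that one failure does not stop the loop.
safe_read <- function(code) {
  tryCatch(
    read_prices(code),
    error = function(e) {
      message(code, " failed")
      # placeholder row, marked with the string "NA" (not a missing value)
      data.frame(Date = Sys.Date(), Close = 0, Code = "NA",
                 stringsAsFactors = FALSE)
    }
  )
}

results <- lapply(c("MSFT", "1234"), safe_read)
merged <- do.call(rbind, results)

# drop the placeholder rows
merged <- merged[merged$Code != "NA", ]
print(merged)
```

Only the "MSFT" row survives the final filter, while the failure for "1234" is reported as a message rather than stopping the loop.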

# assumes codes are known beforehand
codes <- c("MSFT", "TCHC") # codes <- c("MSFT", "1234") for testing
files <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
                codes, "&output=csv")

dataList <- llply(files, function(file, ...) {
      # get code from file url (the pattern assumes a four-character code)
      pattern <- "Q:[0-9a-zA-Z][0-9a-zA-Z][0-9a-zA-Z][0-9a-zA-Z]"
      code <- substr(str_extract(file, pattern), 3, nchar(str_extract(file, pattern)))

      # read data directly from a URL with only simple error handling
      # for further error handling: http://adv-r.had.co.nz/Exceptions-Debugging.html
      tryCatch({
            data <- read.csv(file, stringsAsFactors = FALSE)
            # first column's name is funny
            names(data) <- c("Date","Open","High","Low","Close","Volume")
            data$Date <- dmy(data$Date)
            data$Open <- as.numeric(data$Open)
            data$High <- as.numeric(data$High)
            data$Low <- as.numeric(data$Low)
            data$Close <- as.numeric(data$Close)
            data$Volume <- as.integer(data$Volume)
            data$Code <- code
            data
      },
      error = function(c) {
            c$message <- paste(code,"failed")
            message(c$message)
            # return a dummy data frame; Code is kept as a character column
            # so that it matches the Code column of the real data
            data <- data.frame(Date=dmy(format(Sys.Date(),"%d%m%Y")), Open=0, High=0,
                               Low=0, Close=0, Volume=0, Code="NA",
                               stringsAsFactors = FALSE)
            data
      })
})

# dummy data frame values are filtered out
# (rbind_all() is deprecated in later dplyr versions; bind_rows() replaces it)
data <- filter(rbind_all(dataList), Code != "NA")
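The pattern used in the script to recover the code from the URL matches exactly four characters, so a shorter or longer code would not be extracted correctly. A minimal sketch of a length-independent alternative follows, assuming the URL always has the form shown in the script ("NASDAQ:" followed by the code and then "&"). extract_code is a hypothetical helper, and "GOOGL" and "FB" are used only as illustrative codes of other lengths.

```r
# Same URL layout as in the script; the codes are illustrative only.
urls <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
               c("MSFT", "GOOGL", "FB"), "&output=csv")

# Hypothetical helper: capture whatever sits between "NASDAQ:" and the next "&".
extract_code <- function(url) {
  sub(".*NASDAQ:([^&]+)&.*", "\\1", url)
}

codes_found <- extract_code(urls)
print(codes_found)
```

Because sub is vectorised, the helper can be applied to the whole vector of URLs at once or to a single URL inside the loop.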

Some of the values are shown below.

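| Date       | Open  | High  | Low   | Close | Volume   | Code |
|:-----------|------:|------:|------:|------:|---------:|:-----|
| 2014-11-26 | 47.49 | 47.99 | 47.28 | 47.75 | 27164877 | MSFT |
| 2014-11-25 | 47.66 | 47.97 | 47.45 | 47.47 | 28007993 | MSFT |
| 2014-11-24 | 47.99 | 48.00 | 47.39 | 47.59 | 35434245 | MSFT |
| 2014-11-21 | 49.02 | 49.05 | 47.57 | 47.98 | 42884795 | MSFT |
| 2014-11-20 | 48.00 | 48.70 | 47.87 | 48.70 | 21510587 | MSFT |
| 2014-11-19 | 48.66 | 48.75 | 47.93 | 48.22 | 26177450 | MSFT |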

It took a bit longer to complete the script, as I had to teach myself how to handle errors in R. This is also why I started writing articles on this blog.

I hope this article is useful.