This article illustrates how to download stock price data files from Google, save it into a local drive and merge them into a single data frame. This script is slightly modified from a script which downloads RStudio package download log data. The original source can be found here.

First of all, the following three packages are used.

1library(knitr)
2library(lubridate)
3library(stringr)
4library(plyr)
5library(dplyr)

The script begins with creating a folder to save data files.

1# create data folder
2dataDir <- paste0("data","_","2014-11-20-Download-Stock-Data-1")
3if(file.exists(dataDir)) { 
4      unlink(dataDir, recursive = TRUE)
5      dir.create(dataDir)
6} else {
7      dir.create(dataDir)
8}

After creating urls and file paths, files are downloaded using Map function - it is a warpper of mapply. Note that, in case the function breaks by an error (eg when a file doesn’t exist), download.file is wrapped by another function that includes an error handler (tryCatch).

 1# assumes codes are known beforehand
 2codes <- c("MSFT", "TCHC") # codes <- c("MSFT", "1234") for testing
 3urls <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
 4               codes,"&output=csv")
 5paths <- paste0(dataDir,"/",codes,".csv") # back slash on windows (\\)
 6 
 7# simple error handling in case file doesn't exists
 8downloadFile <- function(url, path, ...) {
 9      # remove file if exists already
10      if(file.exists(path)) file.remove(path)
11      # download file
12      tryCatch(            
13            download.file(url, path, ...), error = function(c) {
14                  # remove file if error
15                  if(file.exists(path)) file.remove(path)
16                  # create error message
17                  c$message <- paste(substr(path, 1, 4),"failed")
18                  message(c$message)
19            }
20      )
21}
22# wrapper of mapply
23Map(downloadFile, urls, paths)

Finally files are read back using llply and they are combined using rbind_all. Note that, as the merged data has multiple stocks’ records, Code column is created.

 1# read all csv files and merge
 2files <- dir(dataDir, full.name = TRUE)
 3dataList <- llply(files, function(file){
 4      data <- read.csv(file, stringsAsFactors = FALSE)
 5      # get code from file path
 6      pattern <- "/[A-Z][A-Z][A-Z][A-Z]"
 7      code <- substr(str_extract(file, pattern), 2, nchar(str_extract(file, pattern)))
 8      # first column's name is funny
 9      names(data) <- c("Date","Open","High","Low","Close","Volume")
10      data$Date <- dmy(data$Date)
11      data$Open <- as.numeric(data$Open)
12      data$High <- as.numeric(data$High)
13      data$Low <- as.numeric(data$Low)
14      data$Close <- as.numeric(data$Close)
15      data$Volume <- as.integer(data$Volume)
16      data$Code <- code
17      data
18}, .progress = "text")
19 
20data <- rbind_all(dataList)

Some of the values are shown below.

DateOpenHighLowCloseVolumeCode
2014-11-2647.4947.9947.2847.7527164877MSFT
2014-11-2547.6647.9747.4547.4728007993MSFT
2014-11-2447.9948.0047.3947.5935434245MSFT
2014-11-2149.0249.0547.5747.9842884795MSFT
2014-11-2048.0048.7047.8748.7021510587MSFT
2014-11-1948.6648.7547.9348.2226177450MSFT

This way wouldn’t be efficient compared to the way where files are read directly without being saved into a local drive. This option may be useful, however, if files are large and the API server breaks connection abrubtly.

I hope this article is useful and I’m going to write an article to show the second way.