Download Stock Data - Part I

This article illustrates how to download stock price data files from Google, save them to a local drive and merge them into a single data frame. The script is slightly modified from one that downloads RStudio package download log data. The original source can be found here.

First of all, the following five packages are used.

library(knitr)
library(lubridate)
library(stringr)
library(plyr)
library(dplyr)

The script begins with creating a folder to save data files.

# create data folder (start from an empty one if it already exists)
dataDir <- paste0("data", "_", "2014-11-20-Download-Stock-Data-1")
if(file.exists(dataDir)) unlink(dataDir, recursive = TRUE)
dir.create(dataDir)
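
The reset logic above can also be folded into a small helper. The following is only a sketch: resetDir is a hypothetical name that is not part of the original script, and it is tried on a temporary directory so that nothing in the working directory is touched.

```r
# hypothetical helper: delete a folder if it exists, then recreate it empty
resetDir <- function(path) {
      if (dir.exists(path)) unlink(path, recursive = TRUE)
      dir.create(path, recursive = TRUE)
      invisible(path)
}

# try it on a throwaway location
demoDir <- file.path(tempdir(), "data_demo")
resetDir(demoDir)
file.create(file.path(demoDir, "old.csv"))
resetDir(demoDir)        # the second call wipes old.csv
length(dir(demoDir))     # number of files left in the folder
```

Deleting first and creating afterwards keeps the two branches of the original if/else in a single code path.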

After creating the URLs and file paths, the files are downloaded using the Map function, which is a wrapper of mapply. Note that download.file is wrapped in another function that includes an error handler (tryCatch), so that a single failure (eg when a file doesn’t exist) does not stop the remaining downloads.

# assumes codes are known beforehand
codes <- c("MSFT", "TCHC") # codes <- c("MSFT", "1234") for testing
urls <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
               codes, "&output=csv")
paths <- file.path(dataDir, paste0(codes, ".csv")) # file.path works on all platforms

# simple error handling in case a file doesn't exist
downloadFile <- function(url, path, ...) {
      # remove file if it exists already
      if(file.exists(path)) file.remove(path)
      # download file
      tryCatch(
            download.file(url, path, ...), error = function(e) {
                  # remove partial file if error
                  if(file.exists(path)) file.remove(path)
                  # report which code failed (file name without extension)
                  message(sub("\\.csv$", "", basename(path)), " failed")
            }
      )
}
# Map is a wrapper of mapply
Map(downloadFile, urls, paths)
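
To see how Map and tryCatch work together without touching the network, the pattern can be tried on a stand-in function. This is a minimal sketch: fakeDownload and safeDownload are hypothetical names used only for illustration, and the rule that non-alphabetic codes fail is an assumption made up for the demo.

```r
# hypothetical stand-in for download.file: fails for codes that are not all letters
fakeDownload <- function(code) {
      if (!grepl("^[A-Z]+$", code)) stop("not found")
      paste(code, "ok")
}

# same pattern as downloadFile: catch the error, report it, carry on
safeDownload <- function(code) {
      tryCatch(fakeDownload(code), error = function(e) {
            message(code, " failed")
            NA_character_
      })
}

codes <- c("MSFT", "1234")
results <- Map(safeDownload, codes)
# Map names the result after its first argument when that is a character vector
names(results)
```

Because the handler returns a value instead of re-raising the error, Map keeps going and the failed code simply shows up as NA in the result list.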

Finally, the files are read back using llply and combined using bind_rows, which replaces rbind_all in current versions of dplyr. As the merged data holds records for multiple stocks, a Code column is created to identify each one.

# read all csv files and merge
files <- dir(dataDir, full.names = TRUE)
dataList <- llply(files, function(file){
      data <- read.csv(file, stringsAsFactors = FALSE)
      # get code from file name, eg "MSFT.csv" -> "MSFT"
      code <- str_extract(basename(file), "^[^.]+")
      # first column's name is funny
      names(data) <- c("Date","Open","High","Low","Close","Volume")
      data$Date <- dmy(data$Date)
      data$Open <- as.numeric(data$Open)
      data$High <- as.numeric(data$High)
      data$Low <- as.numeric(data$Low)
      data$Close <- as.numeric(data$Close)
      data$Volume <- as.integer(data$Volume)
      data$Code <- code
      data
}, .progress = "text")

data <- bind_rows(dataList)
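
The read-tag-merge step can be checked on its own with a couple of throwaway files. The sketch below uses base R only (lapply and do.call(rbind, ...) in place of the plyr and dplyr functions), and the file names and values are made up for illustration rather than taken from real stock data.

```r
# write two tiny made-up CSV files to a fresh temporary folder
demoDir <- file.path(tempdir(), "merge_demo")
unlink(demoDir, recursive = TRUE)
dir.create(demoDir)
write.csv(data.frame(Date = "26-Nov-14", Close = 1.5),
          file.path(demoDir, "AAAA.csv"), row.names = FALSE)
write.csv(data.frame(Date = "26-Nov-14", Close = 2.5),
          file.path(demoDir, "BBBB.csv"), row.names = FALSE)

files <- dir(demoDir, pattern = "\\.csv$", full.names = TRUE)
dataList <- lapply(files, function(file) {
      d <- read.csv(file, stringsAsFactors = FALSE)
      # file name without folder or extension becomes the Code column
      d$Code <- sub("\\.csv$", "", basename(file))
      d
})
merged <- do.call(rbind, dataList)
merged
```

Taking the code from the file name rather than from a fixed-width pattern means codes of any length are handled the same way.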

Some of the values are shown below.

Date	Open	High	Low	Close	Volume	Code
2014-11-26	47.49	47.99	47.28	47.75	27164877	MSFT
2014-11-25	47.66	47.97	47.45	47.47	28007993	MSFT
2014-11-24	47.99	48.00	47.39	47.59	35434245	MSFT
2014-11-21	49.02	49.05	47.57	47.98	42884795	MSFT
2014-11-20	48.00	48.70	47.87	48.70	21510587	MSFT
2014-11-19	48.66	48.75	47.93	48.22	26177450	MSFT

This approach is less efficient than reading the files directly without saving them to a local drive. It may be useful, however, if the files are large and the API server drops the connection abruptly.

I hope this article is useful. I’m going to write another article to show the second way.
