An earlier article showed a way to download stock price data files from Google, save them to a local drive and merge them into a single data frame. If the files are not large, however, saving them to disk first is not efficient, so in this article the files are downloaded and merged in memory.
The following packages are used.
library(knitr)
library(lubridate)
library(stringr)
library(plyr)
library(dplyr)
Taking URLs as file locations, the files are read directly using `llply` and combined using `rbind_all`. As the merged data holds records for multiple stocks, a `Code` column is created to identify them. Note that, when an error occurs, the function returns a dummy data frame so as not to break the loop; the values of the dummy data frame(s) are filtered out at the end.
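The error-handling pattern described here can be tried in isolation, without a network connection. The following is only a minimal sketch in base R: `read_prices()` and its `known` table are hypothetical stand-ins for reading a CSV from a URL, and the two closing prices are illustrative values rather than downloaded data.

```r
# Hypothetical reader standing in for read.csv() on a URL: it fails for
# any code that is not in the made-up `known` table.
read_prices <- function(code) {
  known <- list(MSFT = data.frame(Close = c(47.75, 47.47)))
  if (!code %in% names(known)) stop("no data for ", code)
  known[[code]]
}

codes <- c("MSFT", "1234")  # "1234" is expected to fail

results <- lapply(codes, function(code) {
  tryCatch({
    data <- read_prices(code)
    data$Code <- code
    data
  },
  error = function(e) {
    message(code, " failed")
    # dummy row marked with Code "NA" so that it can be dropped later
    data.frame(Close = 0, Code = "NA", stringsAsFactors = FALSE)
  })
})

# combine everything, then filter out the dummy rows
merged <- do.call(rbind, results)
merged <- merged[merged$Code != "NA", ]
```

Because the handler returns a data frame with the same columns as a successful read, the loop finishes for every code and the failed ones leave no trace after the final filter.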
# assumes codes are known beforehand
codes <- c("MSFT", "TCHC") # codes <- c("MSFT", "1234") for testing
files <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
                codes, "&output=csv")

dataList <- llply(files, function(file, ...) {
  # get code from file url (any run of letters or digits after "NASDAQ:")
  code <- str_match(file, "NASDAQ:([0-9a-zA-Z]+)")[, 2]

  # read data directly from a URL with only simple error handling
  # for further error handling: http://adv-r.had.co.nz/Exceptions-Debugging.html
  tryCatch({
    data <- read.csv(file, stringsAsFactors = FALSE)
    # first column's name is funny, so reset all column names
    names(data) <- c("Date", "Open", "High", "Low", "Close", "Volume")
    data$Date <- dmy(data$Date)
    data$Open <- as.numeric(data$Open)
    data$High <- as.numeric(data$High)
    data$Low <- as.numeric(data$Low)
    data$Close <- as.numeric(data$Close)
    data$Volume <- as.integer(data$Volume)
    data$Code <- code
    data
  },
  error = function(e) {
    message(paste(code, "failed"))
    # return a dummy data frame with the same column types as the real data
    data.frame(Date = dmy(format(Sys.Date(), "%d%m%Y")), Open = 0, High = 0,
               Low = 0, Close = 0, Volume = 0L, Code = "NA",
               stringsAsFactors = FALSE)
  })
})

# dummy data frame values are filtered out
# (rbind_all() is deprecated in later dplyr releases in favour of bind_rows())
data <- filter(rbind_all(dataList), Code != "NA")
Some of the values are shown below.
Date | Open | High | Low | Close | Volume | Code |
---|---|---|---|---|---|---|
2014-11-26 | 47.49 | 47.99 | 47.28 | 47.75 | 27164877 | MSFT |
2014-11-25 | 47.66 | 47.97 | 47.45 | 47.47 | 28007993 | MSFT |
2014-11-24 | 47.99 | 48.00 | 47.39 | 47.59 | 35434245 | MSFT |
2014-11-21 | 49.02 | 49.05 | 47.57 | 47.98 | 42884795 | MSFT |
2014-11-20 | 48.00 | 48.70 | 47.87 | 48.70 | 21510587 | MSFT |
2014-11-19 | 48.66 | 48.75 | 47.93 | 48.22 | 26177450 | MSFT |
It took a bit longer than expected to complete the script, as I had to teach myself how to handle errors in R. This is why I started writing articles on this blog.
I hope this article is useful.