R Merging XTS time series causing duplicate time of day


I've never found an efficient way to solve a problem I've encountered every time I try to combine different sources of time series data. By different sources, I mean combining say a data source from the internet (yahoo stock prices) with say a local csv time series.

yahoo.xts  # variable containing security prices from yahoo
local.xts  # local time series data 
cbind(yahoo.xts,local.xts)  # combine them

The result is as follows:

enter image description here

I get a combined xts data frame with different time for a given date. What I want is ignore the time for a given day and align them. The way I've been solving this problem is to extract the two separate sources of data's index and converting using as.Date function and then re-wrapping them as xts object. My question is if there is another better more efficient way that I missed.

Note: I am unsure how to provide a good example of a local data source to give you guys a good way to replicate the problem but the following is a snippet of how to get data from online.

require(quantmod)
data.etf = env()
getSymbols.av(c('XOM','AAPL'), src="av", api.key="your-own-key",from = '1970-01-01',adjusted=TRUE,
            output.size="full",env = data.etf,  set.symbolnames = T, auto.assign = T)
yahoo.xts = Cl(data.etf$XOM)

Heres some data:

Yahoo:

structure(c(112.68, 109.2, 107.86, 104.35, 104.68, 110.66), class = c("xts", 
"zoo"), .indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct", 
"POSIXt"), .indexTZ = "America/Chicago", tzone = "America/Chicago", index = structure(c(1508457600, 
1508716800, 1508803200, 1508889600, 1508976000, 1509062400), tzone = "America/Chicago", tclass = c("POSIXct", 
"POSIXt")), .Dim = c(6L, 1L), .Dimnames = list(NULL, "XIV"))

Local structure:

structure(c(0.176601541324807, -0.914132074513824, -0.0608652702022332, 
-0.196679777210441, -0.190397155984135, 0.915313388202916, -0.0530280808936784, 
0.263895885521142, 0.10844973759151, 0.0547864992300319, 0.0435149080877898, 
-0.202388932508539, 0.0382888645282672, -0.00800908217028123, 
-0.0798424223984417, 0.00268898461896916, 0.00493307845560457, 
0.132697099147406, 0.074267173330532, -0.336299384720176, -0.0859815663679892, 
-0.0597168456705514, -0.0867777000321366, 0.283394650847026, 
-0.0100414455118704, 0.106355723615723, -0.0640682814821423, 
0.0481841070155836, -0.00321273561708742, -0.13182105331959), .indexCLASS = c("POSIXct", 
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = structure("America/Chicago", .Names = "TZ"), tzone = structure("America/Chicago", .Names = "TZ"), class = c("xts", 
"zoo"), na.action = structure(1L, class = "omit", index = 1080540000), index = structure(c(1508475600, 
1508734800, 1508821200, 1508907600, 1508994000, 1509080400), tzone = structure("America/Chicago", .Names = "TZ"), tclass = c("POSIXct", 
"POSIXt")), .Dim = c(6L, 5L), .Dimnames = list(NULL, c("D.30", 
"D.60", "D.90", "D.120", "D.150")))

Answers:


If you understand the sources of your problem, perhaps you can avoid the problem in the first place.

Your problem is that the 19:00:00 stamps in your printed results correspond to UTC dates (as at 12AM UTC) converted to "America/Chicago" POSIXct timestamps, when the merge happens.

As you've pointed out, one solution is to make new xts time indexes which are all of date format. But it does get annoying. It's best to avoid the situation in the first place, if you can, otherwise you have to resort to changing the date time series to a POSIXct time series with appropriate timezones.

The key thing you need to understand when you have misaligning xts objects with date data (or more precisely, what you think is date data), is that the time zones are not aligning in the objects. If the timezones are aligning in the time indexes of your xts objects, then you will get the correct merging without the undesirable behaviour. Of course, date objects don't have timezones, and by default they will be given the timezone "UTC" if they are merged with xts objects with time indexes of type POSIXct.

# reproduce your data (your code isn't reproducible fully for me:

require(quantmod)
data.etf = new.env()
getSymbols(c('XOM','AAPL'), src="yahoo", api.key="your-own-key",from = '1970-01-01',adjusted=TRUE,output.size="full",env = data.etf,  set.symbolnames = T, auto.assign = T)
yahoo.xts = Cl(data.etf$XOM)

z <- structure(c(0.176601541324807, -0.914132074513824, -0.0608652702022332, 
                 -0.196679777210441, -0.190397155984135, 0.915313388202916, -0.0530280808936784, 
                 0.263895885521142, 0.10844973759151, 0.0547864992300319, 0.0435149080877898, 
                 -0.202388932508539, 0.0382888645282672, -0.00800908217028123, 
                 -0.0798424223984417, 0.00268898461896916, 0.00493307845560457, 
                 0.132697099147406, 0.074267173330532, -0.336299384720176, -0.0859815663679892, 
                 -0.0597168456705514, -0.0867777000321366, 0.283394650847026, 
                 -0.0100414455118704, 0.106355723615723, -0.0640682814821423, 
                 0.0481841070155836, -0.00321273561708742, -0.13182105331959), .indexCLASS = c("POSIXct", 
                                                                                               "POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = structure("America/Chicago", .Names = "TZ"), tzone = structure("America/Chicago", .Names = "TZ"), class = c("xts", 
                                                                                                                                                                                                                                                                  "zoo"), na.action = structure(1L, class = "omit", index = 1080540000), index = structure(c(1508475600, 
                                                                                                                                                                                                                                                                                                                                                             1508734800, 1508821200, 1508907600, 1508994000, 1509080400), tzone = structure("America/Chicago", .Names = "TZ"), tclass = c("POSIXct", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          "POSIXt")), .Dim = c(6L, 5L), .Dimnames = list(NULL, c("D.30", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 "D.60", "D.90", "D.120", "D.150")))

#inspect the index timezones and classes:
> class(index(z))
# [1] "POSIXct" "POSIXt" 
> class(index(yahoo.xts))
# [1] "Date"

indexTZ(z)
# TZ 
# "America/Chicago" 
indexTZ(yahoo.xts)
# [1] "UTC"

You can see that yahoo.xts is using a date class. When this is merged with a POSIXct class (i.e. with z, it will be converted to the "UTC" timestamp.

# Let's see what happens if the timezone of the yahoo.xts2 object is the same as z:
yahoo.xts2 <- xts(coredata(yahoo.xts), order.by = as.POSIXct(as.character(index(yahoo.xts)), tz = "America/Chicago"))

str(yahoo.xts2)
An ‘xts’ object on 1970-01-02/2017-10-27 containing:
    Data: num [1:12067, 1] 1.94 1.97 1.96 1.95 1.96 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "XOM.Close"
Indexed by objects of class: [POSIXct,POSIXt] TZ: America/Chicago
xts Attributes:  
    NULL


u2 <- merge(z,yahoo.xts2)
tail(u2)
class(index(u2))
# [1] "POSIXct" "POSIXt" 

tail(u2, 3)
# D.30        D.60        D.90       D.120        D.150 XOM.Close
# 2017-10-25 -0.1966798  0.05478650 0.002688985 -0.05971685  0.048184107     83.17
# 2017-10-26 -0.1903972  0.04351491 0.004933078 -0.08677770 -0.003212736     83.47
# 2017-10-27  0.9153134 -0.20238893 0.132697099  0.28339465 -0.131821053     83.71

Everything is as expected now.

A shortcut that you might find useful is to this:

z3 <- as.xts(as.data.frame(z), dateFormat="Date")
tail(merge(z3, yahoo.xts))

# D.30        D.60         D.90       D.120        D.150 XOM.Close
# 2017-10-20  0.17660154 -0.05302808  0.038288865  0.07426717 -0.010041446     83.11
# 2017-10-23 -0.91413207  0.26389589 -0.008009082 -0.33629938  0.106355724     83.24
# 2017-10-24 -0.06086527  0.10844974 -0.079842422 -0.08598157 -0.064068281     83.47
# 2017-10-25 -0.19667978  0.05478650  0.002688985 -0.05971685  0.048184107     83.17
# 2017-10-26 -0.19039716  0.04351491  0.004933078 -0.08677770 -0.003212736     83.47
# 2017-10-27  0.91531339 -0.20238893  0.132697099  0.28339465 -0.131821053     83.71

Convert to a data.frame, then convert back to an xts with the appropriate parameter setting : dateFormat="Date". Now you are working with an xts object with a time index that is of type date with no timezone issues:

class(index(merge(z3, yahoo.xts)))
#[1] "Date"