How to fill NAs with LOCF by factor in a data frame, split by country
I have the following data frame (simplified), in which country is a factor and value has missing values:
country value
AUT     NA
AUT     5
AUT     NA
AUT     NA
GER     NA
GER     NA
GER     7
GER     NA
GER     NA
The following generates the above data frame:
data <- data.frame(country = c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
                   value = c(NA, 5, NA, NA, NA, NA, 7, NA, NA),
                   stringsAsFactors = TRUE)  # keeps country a factor in R >= 4.0
Now I would like to replace the NA values in each country subset using last observation carried forward (LOCF). I know the function na.locf from the zoo package.
Running data <- na.locf(data) on the whole data frame gives the following:
country value
AUT     NA
AUT     5
AUT     5
AUT     5
GER     5
GER     5
GER     7
GER     7
GER     7
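To see why the AUT value spills over into GER, here is a minimal base R sketch of LOCF applied to the whole value column. It uses a hypothetical helper, locf(), which is not part of zoo; it carries the last non-NA value forward and keeps leading NAs, so on a plain vector it should behave like zoo::na.locf(x, na.rm = FALSE).

```r
# Example data from the question; stringsAsFactors = TRUE makes country a factor
data <- data.frame(
  country = c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
  value   = c(NA, 5, NA, NA, NA, NA, 7, NA, NA),
  stringsAsFactors = TRUE
)

# Hypothetical helper (not from zoo): last observation carried forward.
# Leading NAs stay NA, similar to zoo::na.locf(x, na.rm = FALSE).
locf <- function(x) {
  last_seen <- cumsum(!is.na(x))       # 0 until the first non-NA value appears
  c(NA, x[!is.na(x)])[last_seen + 1]   # position 1 (NA) covers leading NAs
}

# Applied to the whole column, only row order matters, not country
ungrouped <- locf(data$value)
print(ungrouped)
```

The helper only looks at row order, so when the first GER rows are reached, the 5 from the last AUT rows is still the most recent observation.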
However, the function should be applied to each country's subset separately, so that no value is carried over from AUT into GER. This is the output I need:
country value
AUT     NA
AUT     5
AUT     5
AUT     5
GER     NA
GER     NA
GER     7
GER     7
GER     7
I can't think of an easy way to implement this. Before resorting to for-loops, I was wondering whether anyone knows a simpler way to solve it.
Here is a ddply solution. Try this:
library(plyr)
library(zoo)  # provides na.locf
ddply(DF, .(country), na.locf)
  country value
1     AUT  <NA>
2     AUT     5
3     AUT     5
4     AUT     5
5     GER  <NA>
6     GER  <NA>
7     GER     7
8     GER     7
9     GER     7
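For comparison, the same split-apply-combine idea can be sketched in base R without plyr: split the data frame by country, fill each piece, and bind the pieces back together. The locf() helper is again hypothetical, a stand-in for zoo::na.locf(x, na.rm = FALSE), and it is applied only to the value column so that column stays numeric.

```r
data <- data.frame(
  country = c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
  value   = c(NA, 5, NA, NA, NA, NA, 7, NA, NA),
  stringsAsFactors = TRUE
)

# Hypothetical helper (not from zoo): LOCF that keeps leading NAs
locf <- function(x) {
  last_seen <- cumsum(!is.na(x))
  c(NA, x[!is.na(x)])[last_seen + 1]
}

pieces <- split(data, data$country)   # one data frame per country
filled <- lapply(pieces, function(d) {
  d$value <- locf(d$value)            # LOCF within this country only
  d
})
result <- do.call(rbind, filled)      # stack the pieces back together
rownames(result) <- NULL              # drop the "AUT.1"-style row names
print(result)
```

Like ddply, split() returns the pieces in factor-level order, so the rows come back sorted by country. Here the data is already sorted, so the order is unchanged.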
In the ddply help page (?ddply) you can see that the .variables argument is described as:
.variables: variables to split data frame by, as quoted variables, a formula or character vector.
So other ways to get the same result are:
ddply(DF, "country", na.locf)
ddply(DF, ~country, na.locf)
Note that using DF$country (in general, DF$variable) in place of .(country) is not allowed; that's why you got an error when you did this. Here, DF is your data frame.
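If only the value column needs filling, a shorter base R sketch uses ave(), which applies a function within each group and returns the results in the original row order. The locf() helper below is hypothetical; if zoo is installed, FUN = function(x) zoo::na.locf(x, na.rm = FALSE) should do the same job.

```r
data <- data.frame(
  country = c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
  value   = c(NA, 5, NA, NA, NA, NA, 7, NA, NA),
  stringsAsFactors = TRUE
)

# Hypothetical helper (not from zoo): LOCF that keeps leading NAs
locf <- function(x) {
  last_seen <- cumsum(!is.na(x))
  c(NA, x[!is.na(x)])[last_seen + 1]
}

# ave() splits value by country, runs locf on each piece and writes the
# results back into the original positions, so no for-loop is needed
data$value <- ave(data$value, data$country, FUN = locf)
print(data)
```

Unlike the split() and rbind() approach, ave() never reorders rows, so this also works when the countries are interleaved in the data frame.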