How to select features for predictive model from temporal dataset with irregular measurements?

by MachineEpsilon   Last Updated July 12, 2019 06:19 AM

I have a series of water pollution measurements taken roughly weekly at each of several locations. In total, there is a measurement every 3 days, each time at a different location. I also have regular environmental measurements (wind, tides, precipitation) taken daily at the same locations, with no missing data. I know that the environmental conditions are a strong predictor of the water pollution level, and I want to forecast the water pollution.
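One way to avoid the mismatch in timescales is to keep only the days on which pollution was actually measured as training rows, and attach to each row the environmental values from that day and the preceding days. This is offered as a sketch under stated assumptions, not as the method the question is after. The function `build_lagged_rows` and the nested-dictionary layout (location, then variable name, then a list of daily values indexed by day number) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: location -> variable name -> daily values, where index = day number.
EnvData = Dict[str, Dict[str, List[float]]]


def build_lagged_rows(
    env: EnvData,
    obs: List[Tuple[str, int, float]],  # (location, day, pollution) for measured days only
    max_lag: int,
) -> Tuple[List[List[float]], List[float]]:
    """Build one feature row per pollution measurement.

    Each row holds, for every environmental variable (in sorted name order),
    its values on the measurement day and the max_lag preceding days:
    var[day], var[day - 1], ..., var[day - max_lag].
    """
    X: List[List[float]] = []
    y: List[float] = []
    for loc, day, value in obs:
        variables = env[loc]
        n_days = min(len(series) for series in variables.values())
        if day - max_lag < 0 or day >= n_days:
            continue  # skip measurements without a full window of environmental data
        row = [variables[name][day - k] for name in sorted(variables) for k in range(max_lag + 1)]
        X.append(row)
        y.append(value)
    return X, y
```

The days without a pollution measurement simply never become rows, so the dependent variable does not need to be imputed. Pooling rows from all locations assumes the relationship is similar everywhere; a location indicator could be added as an extra feature if it is not.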

This difference in timescales seems to be problematic. If I use the union of the two sets of measurements, approximately 6/7 of the values of the dependent variable are missing per day and per location. It is known that there can be a lag between a change in the environmental conditions and its effect on the pollution measurements, but how can I quantify this lag given the amount of missing data?
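A lagged cross-correlation does not strictly need the gaps to be filled: for each candidate lag k, pair every measured pollution value on day t with the environmental value on day t - k, and correlate only those pairs. The sketch below assumes a single location and a single environmental variable; `lagged_correlations` and the synthetic two-day response in the demo are hypothetical.

```python
import math
import random
from typing import Dict, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def lagged_correlations(
    env: Sequence[float],         # daily environmental values at one location, index = day
    pollution: Dict[int, float],  # day -> pollution, only for days that were measured
    max_lag: int,
) -> Dict[int, float]:
    """Correlate pollution on measured days t with env[t - k] for k = 0..max_lag.

    Only days with an actual pollution measurement are used, so nothing is interpolated.
    """
    result = {}
    for k in range(max_lag + 1):
        days = [t for t in sorted(pollution) if 0 <= t - k < len(env)]
        result[k] = pearson([env[t - k] for t in days], [pollution[t] for t in days])
    return result


if __name__ == "__main__":
    # Synthetic demo (assumption): pollution responds to the environment 2 days later.
    rng = random.Random(0)
    env = [rng.gauss(0, 1) for _ in range(400)]
    pollution = {t: 3.0 * env[t - 2] + rng.gauss(0, 0.3) for t in range(2, 400, 3)}
    corr = lagged_correlations(env, pollution, max_lag=7)
    print(max(corr, key=lambda k: abs(corr[k])))  # lag with the strongest correlation
```

The demo uses independent daily values, which makes the peak sharp. Real wind, tide and precipitation series are themselves autocorrelated (tides strongly so), which smears the peak across neighbouring lags; pre-whitening the environmental series before correlating is the usual remedy.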

As mentioned, I want to use the environmental measurements to build a predictive model, but I am not sure how many days of environmental measurements I should use, or how to select that number appropriately, e.g. by interpolating the missing data and looking at the ACF as suggested in this question: Use ACF and PACF for irregular time series?
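As an alternative to interpolating and reading the ACF (a swapped-in technique, not one the question proposes), the number of lag days can be treated as a hyperparameter: fit a distributed-lag linear regression for several candidate maximum lags and compare their errors on later, held-out measurements. The sketch below uses only the standard library, a single environmental variable, and a time-ordered 70/30 split; `holdout_mse_by_max_lag`, `train_frac` and the other names are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares coefficients [intercept, b0, b1, ...] via the normal equations."""
    Xi = [[1.0] + row for row in X]
    p = len(Xi[0])
    XtX = [[sum(r[i] * r[j] for r in Xi) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xi, y)) for i in range(p)]
    return solve(XtX, Xty)


def lag_row(env: Sequence[float], t: int, max_lag: int) -> List[float]:
    """Environmental values env[t], env[t - 1], ..., env[t - max_lag]."""
    return [env[t - k] for k in range(max_lag + 1)]


def holdout_mse_by_max_lag(
    env: Sequence[float],         # daily environmental values at one location, index = day
    pollution: Dict[int, float],  # day -> pollution, measured days only
    candidate_lags: Sequence[int],
    train_frac: float = 0.7,
) -> Dict[int, float]:
    """Fit pollution[t] ~ env[t] + ... + env[t - L] for each candidate L on the earliest
    measurements and return the mean squared error on the later ones."""
    largest = max(candidate_lags)
    days = [t for t in sorted(pollution) if t - largest >= 0]  # same rows for every L
    cut = int(len(days) * train_frac)
    train, test = days[:cut], days[cut:]  # time-ordered split, no shuffling
    scores = {}
    for L in candidate_lags:
        beta = fit_ols([lag_row(env, t, L) for t in train], [pollution[t] for t in train])
        errors = [
            pollution[t] - (beta[0] + sum(b * x for b, x in zip(beta[1:], lag_row(env, t, L))))
            for t in test
        ]
        scores[L] = sum(e * e for e in errors) / len(errors)
    return scores


if __name__ == "__main__":
    # Synthetic demo (assumption): the true response uses lags 0, 1 and 3.
    rng = random.Random(1)
    env = [rng.gauss(0, 1) for _ in range(600)]
    pollution = {t: env[t] + 2.0 * env[t - 1] + 1.5 * env[t - 3] + rng.gauss(0, 0.3)
                 for t in range(3, 600, 3)}
    for L, mse in holdout_mse_by_max_lag(env, pollution, range(7)).items():
        print(L, round(mse, 3))
```

In the synthetic demo the held-out error should drop sharply once L reaches 3 and then level off. Picking the smallest L at which the curve flattens, rather than the strict minimum, guards against adding lags that only fit noise. With several environmental variables and only a few hundred measurements, a rolling-origin evaluation or a regularised model (e.g. ridge regression) would likely be sturdier than a single split.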
