I have a series of water pollution measurements taken on a roughly weekly basis at different locations. Pooled over all locations, there is a measurement about every 3 days, each time at a different location. I also have series of regular environmental measurements (wind, tides, precipitation) taken daily at the same locations, with no missing data. I know that the environmental information is a strong predictor of the water pollution level, and I want to forecast the water pollution.
This difference in the timescales of the measurements seems to be problematic. If I take the union of the two sets of measurements, approximately 6/7 of the values of the dependent variable are missing per day and per location. It is known that there can be a lag between a change in the environmental conditions and its effect on the pollution measurements. How can I quantify this lag, given the amount of missing data?
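To make the lag question concrete, here is a minimal sketch of one possible approach that avoids interpolation: because the environmental series is complete, each observed pollution value can be paired with the environmental value from k days earlier, and the correlation computed over the observed days only. The data are synthetic, and the names `lagged_correlations`, `rain`, `pollution_weekly` and the chosen true lag of 2 days are illustrative assumptions, not part of the actual data set.

```python
import numpy as np

def lagged_correlations(env, pollution, max_lag):
    """Correlate pollution[t] with env[t - k] for k = 0..max_lag.

    env       : 1-D array of daily values with no gaps.
    pollution : 1-D array of the same length, NaN on days without a sample.
    Only days with an observed pollution value are used; nothing is interpolated.
    """
    env = np.asarray(env, dtype=float)
    pollution = np.asarray(pollution, dtype=float)
    observed = np.flatnonzero(~np.isnan(pollution))
    result = {}
    for k in range(max_lag + 1):
        days = observed[observed >= k]  # env[t - k] must exist
        if len(days) > 2:
            result[k] = float(np.corrcoef(env[days - k], pollution[days])[0, 1])
        else:
            result[k] = float("nan")
    return result

# Synthetic illustration (made-up data): pollution follows rain with a 2-day delay.
rng = np.random.default_rng(0)
n_days = 700
true_lag = 2
rain = rng.gamma(shape=2.0, scale=1.0, size=n_days)
pollution_full = np.full(n_days, np.nan)
pollution_full[true_lag:] = 3.0 * rain[:-true_lag] + rng.normal(0, 0.5, n_days - true_lag)

# Keep one sample per week, so about 6/7 of the days are missing.
pollution_weekly = np.full(n_days, np.nan)
pollution_weekly[::7] = pollution_full[::7]

corr = lagged_correlations(rain, pollution_weekly, max_lag=6)
best_lag = max(corr, key=lambda k: abs(corr[k]))
print(corr, best_lag)
```

In this toy setting the synthetic rain is independent from day to day, so the correlation peaks cleanly at the true lag. Real environmental series are usually autocorrelated, which would smear the peak across neighbouring lags, so this is only a starting point.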
As mentioned, I want to use the environmental measurements to build a predictive model, but I am not sure how many days of environmental measurements I should use, or how to select that number in an appropriate way. One option I have considered is interpolating the missing data and looking at the ACF, as per this question: Use ACF and PACF for irregular time series?.
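As an alternative to interpolating, one way the choice of "how many days" could be framed is as a distributed-lag regression: regress each observed pollution value on the environmental values of the preceding days, and compare candidate window lengths by error on later, held-out observations. The sketch below assumes a single environmental variable and synthetic data; `lag_matrix`, `holdout_rmse`, the 70/30 split and the simulated lags of 1 and 3 days are all hypothetical choices for illustration.

```python
import numpy as np

def lag_matrix(env, days, n_lags):
    """Columns: intercept, env[t], env[t-1], ..., env[t-n_lags] for each t in days."""
    cols = [np.ones(len(days))] + [env[days - k] for k in range(n_lags + 1)]
    return np.column_stack(cols)

def holdout_rmse(env, pollution, n_lags, max_lags, train_frac=0.7):
    """Fit on the earlier observed days and score on the later ones."""
    days = np.flatnonzero(~np.isnan(pollution))
    days = days[days >= max_lags]  # same sample for every candidate n_lags
    split = int(len(days) * train_frac)
    train, test = days[:split], days[split:]
    coef, *_ = np.linalg.lstsq(
        lag_matrix(env, train, n_lags), pollution[train], rcond=None
    )
    pred = lag_matrix(env, test, n_lags) @ coef
    return float(np.sqrt(np.mean((pollution[test] - pred) ** 2)))

# Synthetic illustration (made-up data): pollution depends on rain 1 and 3 days ago.
rng = np.random.default_rng(1)
n_days = 1400
rain = rng.gamma(shape=2.0, scale=1.0, size=n_days)
t = np.arange(3, n_days)
pollution_full = np.full(n_days, np.nan)
pollution_full[t] = 2.0 * rain[t - 1] + 1.5 * rain[t - 3] + rng.normal(0, 0.5, len(t))

# Weekly sampling: only every 7th day has a pollution value.
pollution_weekly = np.full(n_days, np.nan)
pollution_weekly[::7] = pollution_full[::7]

# Held-out RMSE for windows of 0 to 6 lagged days.
scores = {p: holdout_rmse(rain, pollution_weekly, p, max_lags=6) for p in range(7)}
for p, rmse in scores.items():
    print(p, round(rmse, 3))
```

The held-out error should drop sharply once the window covers the longest lag that actually matters and then flatten, which gives a rough criterion for the window length. With several environmental variables and several locations the number of lagged columns grows quickly, so some form of regularisation or pooling across locations would probably be needed.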