How to Correct the Precipitation Gauge

If you look at the precipitation gauge's raw data during the summer, you will often see a daily cycle of variation. On some days, the reported amount can swing over a range of more than an inch.

Since this variation happens in dry weather, the gauge is evidently responding to factors other than precipitation. Here is one method for reducing it.

The raw data for the gauge is available online and looks like this:

Hogg Pass
2005 07 30 00 52.7 54.9
2005 07 30 01 52.5 53.1
2005 07 30 02 52.6 51.3
2005 07 30 03 52.6 49.5
2005 07 30 04 52.6 47.5
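If you prefer a script to a spreadsheet, here is a minimal Python sketch that loads a saved copy of the page. It assumes, from the values and from the equations later in this article, that the columns are year, month, day, hour, accumulated precipitation (inches) and air temperature (degrees F), and that the first line is the site name. The function name parse_gauge_page is hypothetical.

```python
from datetime import datetime

# Sample rows copied from the page above.
SAMPLE = """Hogg Pass
2005 07 30 00 52.7 54.9
2005 07 30 01 52.5 53.1
2005 07 30 02 52.6 51.3
2005 07 30 03 52.6 49.5
2005 07 30 04 52.6 47.5
"""


def parse_gauge_page(text):
    """Return (site_name, rows), each row being (timestamp, precip, temp).

    Assumed column layout: year month day hour precipitation temperature.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    site, rows = lines[0], []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != 6:
            continue  # skip anything that is not a six-column data row
        year, month, day, hour = (int(f) for f in fields[:4])
        rows.append((datetime(year, month, day, hour), float(fields[4]), float(fields[5])))
    return site, rows


site, rows = parse_gauge_page(SAMPLE)
for stamp, precip, temp in rows:
    print(stamp.strftime("%Y-%m-%d %H:00"), precip, temp)
```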

If we save this "page" to a file, load it into a spreadsheet program, and plot the temperature (sixth column) and precipitation (fifth column) data against time, the daily cycle is easy to see.

From this plot, it is evident that as the temperature rises, the reported precipitation amount falls. Perhaps a temperature correction factor would help. (Note that we don't have to prove "cause and effect", only that a reasonable correlation exists.)

This technique assumes that no rain fell during the sample data period; otherwise, real precipitation would be mixed into the apparent temperature effect.

Correlating Precipitation to Temperature

Suppose the temperature that should be used in a correction equation is not the current air temperature. The sensor is, after all, like a large, unstirred metal pot filled with water, so perhaps the temperature of the water (or of the pressure gauge) is what really needs to be estimated. Whatever the mechanism, that temperature is probably driven by, and lags behind, changes in the air temperature.

The estimated temperature could be calculated as a weighted average of the current and a few of the previous hours' air temperatures. We need to determine how much "weight" to give to these factors. We can do this by creating an "estimated temperature" equation, plotting its values versus the precipitation values, and looking for a set of weights (or coefficients) which provide the best correlation.

The "cloud" of data points showing the correlation of precipitation to temperature changes shape as the coefficients change. The better the correlation, the more tightly the cloud coalesces along a simple curve.

To find these coefficients, add another data column to the spreadsheet, "T_estimated", which is the result of an equation:

T_estimated = (c_0 * T_current) + (c_1 * T_current-1) + (c_2 * T_current-2) + (c_3 * T_current-3)

The coefficients (c_0 ... c_3) should be placed in their own cells so that they can be changed easily and the effect of each change observed. When the coefficients sum to 1.0, "T_estimated" can be treated as a real temperature, blended from portions of the current and previous hours' data.
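The same weighted average can be written as a short Python function, a sketch of what the spreadsheet column computes. The name t_estimated is hypothetical; coefficient c_k multiplies the reading k hours back, and hours without enough earlier readings are left empty, much as the first few spreadsheet rows would be.

```python
def t_estimated(temps, coeffs):
    """Weighted blend of the current and previous hours' air temperatures.

    temps  -- hourly air temperatures, oldest first
    coeffs -- [c_0, c_1, ...]; c_k weights the reading k hours back
    Returns one value per hour, or None where earlier readings are missing.
    """
    lag = len(coeffs) - 1
    out = []
    for i in range(len(temps)):
        if i < lag:
            out.append(None)  # not enough earlier hours yet
            continue
        out.append(sum(c * temps[i - k] for k, c in enumerate(coeffs)))
    return out


temps = [54.9, 53.1, 51.3, 49.5, 47.5]
print(t_estimated(temps, [1.0, 0.0, 0.0, 0.0]))  # the current temperature
print(t_estimated(temps, [0.0, 1.0, 0.0, 0.0]))  # the previous hour's temperature
```

The two calls reproduce the two special cases listed next.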

If c_0 = 1.0 and all other coefficients = 0.0, then "T_estimated" = the current temperature.
If c_1 = 1.0 and all other coefficients = 0.0, then "T_estimated" = the previous hour's temperature.

Now change the graph to a "scatter plot", with "T_estimated" on the X-axis and Precipitation on the Y-axis. Add a second-degree polynomial "trend line" and display its R² value (the coefficient of determination). We will use R² to find the optimal coefficient weightings ("c_0" ... "c_3").

An R² of 1.0 indicates a perfect fit, which is the maximum possible value; any scatter about the trend line makes R² smaller. Whichever combination of coefficients yields the highest R² is the "best" one to use in the temperature estimation equation. For example:
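To check the spreadsheet's numbers outside of it, the R² of a second-degree trend line can be computed with the standard definition, R² = 1 - SS_res/SS_tot, after an ordinary least-squares fit. The sketch below uses only the Python standard library; the function names are hypothetical, and it has not been checked digit-for-digit against any particular spreadsheet's trend-line output.

```python
def quad_fit(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    m = sum(xs) / len(xs)
    us = [x - m for x in xs]  # centring x keeps the normal equations well conditioned
    s = [sum(u ** k for u in us) for k in range(5)]                  # s[k] = sum of u^k
    t = [sum(u ** k * y for u, y in zip(us, ys)) for k in range(3)]  # t[k] = sum of u^k * y
    # Augmented normal equations; unknowns are (c0, b0, a0) of the centred fit.
    aug = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(3):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [p - f * q for p, q in zip(aug[r], aug[col])]
    c0, b0, a0 = (aug[i][3] / aug[i][i] for i in range(3))
    # Expand a0*(x - m)**2 + b0*(x - m) + c0 back into powers of x.
    return a0, b0 - 2 * a0 * m, a0 * m * m - b0 * m + c0


def r_squared(xs, ys):
    """R^2 = 1 - SS_res / SS_tot for the second-degree trend line."""
    a, b, c = quad_fit(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("precipitation is constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
print(quad_fit(xs, [2 * x * x - 3 * x + 1 for x in xs]))  # approximately (2.0, -3.0, 1.0)
print(r_squared(xs, [1.0, 3.0, 2.0, 5.0, 4.0]))           # below 1: not an exact parabola
```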

With c_0 = 1.0 and all other coefficients = 0.0, R² = 0.866.

With c_1 = 1.0 and all other coefficients = 0.0, R² = 0.9268.

This means that the reported precipitation amount is slightly more closely correlated with the previous hour's temperature than with the current hour's!

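Start adding non-zero values for the rest of the coefficients, and look for the set that gives the best R² value. The sum of c_0 to c_3 need not be constrained to 1.00 until you are ready to finalize their values. At that point, normalize the coefficients by dividing each one by their sum. (R² will not be affected: scaling "T_estimated" by a constant only stretches the X-axis, and the second-degree trend line fits the stretched data equally well.)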
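The trial-and-error search can also be automated. The Python sketch below is a brute-force grid search, a stand-in for adjusting cells by hand rather than the author's procedure: it tries every combination of a few weight values, keeps the one with the highest R², and then normalizes it. The function names and the grid are assumptions, and the demonstration uses synthetic data in which precipitation depends only on the previous hour's temperature.

```python
import itertools
import random


def t_estimated(temps, coeffs):
    """Blend of current and earlier hours; only hours with full history are returned."""
    lag = len(coeffs) - 1
    return [sum(c * temps[i - k] for k, c in enumerate(coeffs))
            for i in range(lag, len(temps))]


def quad_r_squared(xs, ys):
    """R^2 of a least-squares second-degree trend line (x is centred for stability)."""
    m = sum(xs) / len(xs)
    us = [x - m for x in xs]
    s = [sum(u ** k for u in us) for k in range(5)]
    t = [sum(u ** k * y for u, y in zip(us, ys)) for k in range(3)]
    aug = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(3):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [p - f * q for p, q in zip(aug[r], aug[col])]
    c, b, a = (aug[i][3] / aug[i][i] for i in range(3))
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * u * u + b * u + c)) ** 2 for u, y in zip(us, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def best_coefficients(temps, precip, n_coeffs=4, steps=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Grid search for the weights with the highest R^2, then normalize them."""
    ys = precip[n_coeffs - 1:]  # align with hours that have full history
    best_r2, best = -1.0, None
    for coeffs in itertools.product(steps, repeat=n_coeffs):
        if sum(coeffs) == 0:
            continue  # all-zero weights would make T_estimated constant
        r2 = quad_r_squared(t_estimated(temps, coeffs), ys)
        if r2 > best_r2:
            best_r2, best = r2, coeffs
    total = sum(best)
    return best_r2, tuple(c / total for c in best)  # weights now sum to 1.0


# Synthetic data (not real gauge data): precipitation driven only by the
# PREVIOUS hour's temperature, so the search should pick c_1 alone.
rng = random.Random(1)
temps = [45 + 15 * rng.random() for _ in range(72)]
precip = [52.6] + [52.6 - 0.002 * (temps[i - 1] - 45) ** 2 for i in range(1, 72)]
r2, coeffs = best_coefficients(temps, precip)
print(round(r2, 6), coeffs)
```

With this synthetic data the search should settle on c_1 alone, normalized to (0.0, 1.0, 0.0, 0.0). Because R² does not change when "T_estimated" is scaled, the grid never needs the weights to sum to 1.00 during the search; a finer grid simply costs more fits.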

It appears that a combination of the current and two previous hours' data has the best correlation (R² = 0.9325), for which the estimated temperature's equation becomes:

T_estimated = (0.24 * T_current) + (0.57 * T_current-1) + (0.19 * T_current-2)
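As a quick worked check, here are the final weights applied to the sample rows shown earlier. This is a minimal sketch; t_estimated_final is a hypothetical name. Each hour needs the two preceding readings, so the first estimate is for hour 02.

```python
COEFFS = (0.24, 0.57, 0.19)  # c_0, c_1, c_2 from the equation above


def t_estimated_final(temps):
    """T_estimated for each hour that has two earlier readings available."""
    return [sum(c * temps[i - k] for k, c in enumerate(COEFFS))
            for i in range(len(COEFFS) - 1, len(temps))]


temps = [54.9, 53.1, 51.3, 49.5, 47.5]  # air temperatures, hours 00-04
for hour, t in zip(range(2, 5), t_estimated_final(temps)):
    print(f"hour {hour:02d}: air {temps[hour]:.1f}, T_estimated {t:.2f}")
```

This prints estimates of 53.01, 51.21 and 49.36 for hours 02 to 04. Each is above that hour's air temperature, as expected for an estimate that lags a falling air temperature.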

Other data sets will yield somewhat different coefficients, so if you try this, your results will probably differ slightly. Short of automating the fit and regenerating it periodically, the equation above is probably "close enough".

Making an Estimated Correction

If the scatter plot's trend line had been perfectly level, "Precipitation" would be a constant and would not be correlated with changes in temperature. The difference between a level line and the trend line is the amount of temperature correction needed. The trend line in this example is level for temperatures near 45 degrees, but it falls off increasingly as the temperature rises, so larger corrections are needed at higher temperatures.

Correction = (Trend line(45 degrees) - Trend line(T_estimated_current))

To generalize this correction, we scale it by the ratio of the current precipitation to the sample data's precipitation amount:

Correction = (Trend line(45 degrees) - Trend line(T_estimated_current)) * (Precipitation_current / Precipitation_sample)
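The general form of the correction can be sketched in Python with the trend line passed in as a function. This is an illustration under assumptions: temperature_correction and example_trend are hypothetical names, and the example uses the trend-line coefficients quoted for this data set, rounded as printed.

```python
def temperature_correction(trend, t_est, precip_now, precip_sample, t_level=45.0):
    """Correction = (trend(t_level) - trend(t_est)) * (precip_now / precip_sample).

    trend         -- callable returning the trend-line precipitation at a temperature
    t_est         -- current estimated temperature (T_estimated_current)
    precip_now    -- current reported precipitation
    precip_sample -- precipitation amount of the sample data used for the fit
    t_level       -- temperature where the trend line is treated as level
    """
    return (trend(t_level) - trend(t_est)) * (precip_now / precip_sample)


def example_trend(t):
    # Trend-line coefficients quoted for this data set, rounded as printed.
    return -0.0009 * t * t + 0.0754 * t + 50.95


full = temperature_correction(example_trend, 55.0, 52.6, 52.6)
half = temperature_correction(example_trend, 55.0, 26.3, 52.6)
print(f"{full:.3f} {half:.3f}")
```

At T_estimated = 55 degrees, the correction is about 0.146 with a full 52.6 reading and about 0.073 when the current reading is half the sample amount, which shows the effect of the scaling ratio.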

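If we ask the spreadsheet for the equation of the trend line, the correction equation looks something like the following, where T_est stands for T_estimated_current and 52.6 stands in for both Trend line(45 degrees) and the sample data's precipitation amount:

Correction = (52.6 - ((-0.0009 * T_est²) + (0.0754 * T_est) + 50.95)) * (Precipitation_current / 52.6)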

The corrected precipitation is the reported amount plus this correction. The correction should probably be applied only when "T_estimated" is above 45 degrees.
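Putting the specific equation and the 45-degree limit together gives the sketch below. It assumes the correction is added to the reported amount (the correction is positive when the trend line sags at warm temperatures); the names are hypothetical. With the rounded coefficients as printed, the trend line evaluates to about 52.52 at 45 degrees rather than 52.6, so this gated version jumps by roughly 0.08 as T_estimated crosses 45; the spreadsheet's unrounded coefficients may reduce that step.

```python
REFERENCE = 52.6  # level value and sample precipitation, as used in the text
T_LEVEL = 45.0    # at or below this temperature, no correction is applied


def trend(t_est):
    """Second-degree trend line from the spreadsheet fit (coefficients as printed)."""
    return -0.0009 * t_est ** 2 + 0.0754 * t_est + 50.95


def corrected_precip(precip_now, t_est):
    """Reported precipitation plus the temperature correction, gated at T_LEVEL."""
    if t_est <= T_LEVEL:
        return precip_now
    correction = (REFERENCE - trend(t_est)) * (precip_now / REFERENCE)
    return precip_now + correction


print(corrected_precip(52.6, 44.0))            # 52.6 (below the limit, unchanged)
print(round(corrected_precip(52.6, 55.0), 2))  # 52.83
```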

Using these equations, the "corrected precipitation" shows about one-third of the variation seen in the original data. The daily component is mostly gone, leaving some noise and a slowly varying component that appears to be related to atmospheric pressure.

However, the temperature-corrected precipitation correlated relatively poorly with atmospheric pressure (R² = 0.2175), and attempting to correct for pressure produced no observable reduction in the variation.

A simple temperature correction is not perfect, but that is to be expected when using simple models and quantized data.

2005 08/06,07,08