If you looked at the precipitation gauge's raw data during the summer, you would often see a daily cycle of variation. On some days, the reported amount might swing over a range of more than an inch.

Since this variation is happening in dry weather, the gauge is apparently being affected by factors other than just precipitation. Here's one method to try to reduce this variation.

The raw data for the gauge is available on the net, and looks like:

    Hogg Pass
    2005 07 30 00 52.7 54.9
    2005 07 30 01 52.5 53.1
    2005 07 30 02 52.6 51.3
    2005 07 30 03 52.6 49.5
    2005 07 30 04 52.6 47.5

If we saved this "page" to a file, fed it to a spreadsheet program, and plotted the "Temperature" and "Precipitation" data columns, we might see something like:

From this plot, it is quite evident that as the temperature rises, the reported precipitation amount falls. Perhaps a temperature correction factor would help. (Note that we don't have to prove "cause and effect", only that a reasonable correlation exists.)
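If you would rather work outside a spreadsheet, the raw lines shown above can be read into columns with a few lines of Python. This is a minimal sketch that assumes each data line is `year month day hour precipitation temperature`. That column order is inferred from the sample (the fifth field barely changes while the sixth falls overnight) and is not documented here. `Reading` and `parse_raw` are hypothetical names.

```python
from typing import List, NamedTuple


class Reading(NamedTuple):
    """One hourly record; the column meanings are assumed, not documented."""
    year: int
    month: int
    day: int
    hour: int
    precip: float  # assumed: accumulated precipitation, inches
    temp: float    # assumed: air temperature


def parse_raw(text: str) -> List[Reading]:
    """Parse whitespace-separated hourly lines, skipping lines that don't fit."""
    readings = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # e.g. the station-name line
        try:
            y, m, d, h = (int(f) for f in fields[:4])
            p, t = float(fields[4]), float(fields[5])
        except ValueError:
            continue  # not a data line
        readings.append(Reading(y, m, d, h, p, t))
    return readings


SAMPLE = """Hogg Pass
2005 07 30 00 52.7 54.9
2005 07 30 01 52.5 53.1
2005 07 30 02 52.6 51.3
2005 07 30 03 52.6 49.5
2005 07 30 04 52.6 47.5
"""

if __name__ == "__main__":
    for r in parse_raw(SAMPLE):
        print(r.hour, r.precip, r.temp)
```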

This technique assumes that no rainfall occurred during the sample data period.

Since the gauge may not respond to changes in air temperature immediately, the temperature used for a correction could be an estimate, calculated as a weighted average of the current and a few previous hours' air temperatures. We need to determine how much "weight" to give each hour. We can do this by creating an "estimated temperature" equation, plotting its values against the precipitation values, and looking for the set of weights (or coefficients) that provides the best correlation.

The "cloud" of data points that shows how precipitation correlates with temperature will change as the coefficients change. For better correlations, the cloud will appear to coalesce along a simple curve.

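To find these coefficients, add another data column to the spreadsheet, "T_estimated", which is the result of an equation:

    "T_estimated" = c_0 T_{h} + c_1 T_{h-1} + c_2 T_{h-2} + c_3 T_{h-3}

where T_{h} is the current hour's air temperature and T_{h-1}, T_{h-2}, and T_{h-3} are the temperatures of the previous three hours.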

The coefficients (c_0 ... c_3) should be defined in cells so that they may be easily changed and the effects of that change observed. When the sum of the values of these coefficients is 1.0, "T_estimated" can be considered to be a real temperature, blended from portions of the current and previous hours' data.

If c_0 = 1.0 and all other coefficients = 0.0, then "T_estimated" = the
current temperature.

If c_1 = 1.0 and all other coefficients = 0.0, then "T_estimated" = the
previous hour's temperature.
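The same blend can be computed outside the spreadsheet. Below is a minimal Python sketch of the weighted average described above, with c_0 applied to the current hour, c_1 to the previous hour, and so on. `t_estimated` is a hypothetical helper name. Hours without enough history are left empty (`None`), much as the first rows of the spreadsheet column would be. The two calls reproduce the two special cases just described.

```python
from typing import List, Optional, Sequence


def t_estimated(temps: Sequence[float],
                coeffs: Sequence[float]) -> List[Optional[float]]:
    """Blend the current and previous hours' temperatures.

    T_est[h] = coeffs[0]*temps[h] + coeffs[1]*temps[h-1] + ...
    Hours without enough history (h < len(coeffs) - 1) return None.
    """
    lags = len(coeffs)
    out: List[Optional[float]] = []
    for h in range(len(temps)):
        if h < lags - 1:
            out.append(None)  # not enough previous hours yet
        else:
            out.append(sum(c * temps[h - k] for k, c in enumerate(coeffs)))
    return out


temps = [54.9, 53.1, 51.3, 49.5, 47.5]  # hourly temperatures from the sample data
print(t_estimated(temps, [1.0, 0.0, 0.0, 0.0]))  # → [None, None, None, 49.5, 47.5]
print(t_estimated(temps, [0.0, 1.0, 0.0, 0.0]))  # → [None, None, None, 51.3, 49.5]
```

With c_0 = 1.0 the column simply repeats the current temperature; with c_1 = 1.0 it is shifted by one hour.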

Now change the graph to a "scatter plot", with "T_estimated" on the X-axis
and Precipitation on the Y-axis. Add a second-degree polynomial "trend line"
and display its R^{2} value (the coefficient of determination). We are going to use
R^{2} to find the optimal coefficient weightings ("c_0" ... "c_3").
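To check the spreadsheet's number, or to work without a spreadsheet, here is a minimal plain-Python sketch of the same quantity. It assumes the trend line's R^{2} is the usual coefficient of determination, 1 - SS_res/SS_tot, of an ordinary least-squares fit of y = a + b*x + c*x^2. `fit_quadratic` and `r_squared_quadratic` are hypothetical names.

```python
from typing import List, Sequence


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares coefficients [a, b, c] of y = a + b*x + c*x**2.

    Solves the 3x3 normal equations by Gauss-Jordan elimination.
    """
    s = [sum(x ** k for x in xs) for k in range(5)]                  # sums of x^0..x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # sums of y*x^k
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, 4):
                    m[r][k] -= f * m[col][k]
    return [m[i][3] / m[i][i] for i in range(3)]


def r_squared_quadratic(xs: Sequence[float], ys: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot for a second-degree trend line."""
    x0 = sum(xs) / len(xs)
    u = [x - x0 for x in xs]  # centre x to keep the equations well conditioned
    a, b, c = fit_quadratic(u, ys)
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * v + c * v * v)) ** 2 for v, y in zip(u, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

Centring x before fitting does not change the fitted curve or R^{2}. It only keeps the sums of x^4 from growing large when temperatures are around 50 degrees.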

An R^{2} of 1.0 means the trend line passes through every point: a perfect
correlation, and the maximum possible value. Any scatter around the line
makes R^{2} smaller. Whichever combination of coefficients yields the highest
R^{2} value is the "best" one to use in the temperature estimation equation. For example:

With c_0 = 1.0 and all other coefficients = 0.0, "R^{2}" = 0.866.

With c_1 = 1.0 and all other coefficients = 0.0, "R^{2}" = 0.9268.

This means that the reported precipitation amount is slightly more correlated
to the *previous hour's* temperature than to the temperature of its own hour!

Start adding non-zero values for the rest of the coefficients, and try to find
the set that gives the best "R^{2}" value. The sum of c_0 to c_3 need
not be constrained to equal "1.00" until you are ready to finalize their values.
At that point, normalize the coefficients by dividing each one by their sum.
(R^{2} will not be affected: normalizing only rescales the X-axis, and a
second-degree trend line fits the rescaled data equally well.)
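Rather than adjusting cells by hand, the search can be automated. The sketch below is a brute-force grid search over c_0 ... c_3. It is a swapped-in technique, not the hand-tuning procedure described here. It also checks that normalizing the coefficients leaves R^{2} unchanged. The step values, the helper names (`t_estimated`, `r_squared_quadratic`, `score`, `normalize`, `grid_search`), and the synthetic demo data are all hypothetical. The demo data is not gauge data.

```python
import itertools
import math
import random


def t_estimated(temps, coeffs):
    """c_0*T[h] + c_1*T[h-1] + ...; None until enough previous hours exist."""
    n = len(coeffs)
    return [None if h < n - 1
            else sum(c * temps[h - k] for k, c in enumerate(coeffs))
            for h in range(len(temps))]


def r_squared_quadratic(xs, ys):
    """R^2 = 1 - SS_res/SS_tot of a least-squares fit y = a + b*u + c*u^2."""
    x0 = sum(xs) / len(xs)
    u = [x - x0 for x in xs]  # centred x keeps the normal equations well conditioned
    s = [sum(v ** k for v in u) for k in range(5)]
    t = [sum(y * v ** k for v, y in zip(u, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, 4):
                    m[r][k] -= f * m[col][k]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * v + c * v * v)) ** 2 for v, y in zip(u, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def score(temps, precip, coeffs):
    """R^2 of precipitation against T_estimated, skipping hours without history."""
    pairs = [(x, y) for x, y in zip(t_estimated(temps, coeffs), precip)
             if x is not None]
    return r_squared_quadratic([x for x, _ in pairs], [y for _, y in pairs])


def normalize(coeffs):
    """Scale the coefficients so that they sum to 1.0."""
    total = sum(coeffs)
    return tuple(c / total for c in coeffs)


def grid_search(temps, precip, steps=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Try every (c_0, c_1, c_2, c_3) from `steps`; return (best R^2, normalized coeffs)."""
    best_r2, best = -math.inf, None
    for coeffs in itertools.product(steps, repeat=4):
        if sum(coeffs) == 0:
            continue  # all-zero weights give a constant T_estimated
        r2 = score(temps, precip, coeffs)
        if r2 > best_r2:
            best_r2, best = r2, coeffs
    return best_r2, normalize(best)


if __name__ == "__main__":
    # Synthetic data (NOT gauge data): a noisy daily cycle, with the "reading"
    # driven by the previous hour's temperature only.
    rng = random.Random(1)
    temps = [50 + 10 * math.sin(2 * math.pi * h / 24) + rng.uniform(-2, 2)
             for h in range(72)]
    precip = [52.6 - 0.05 * (temps[h - 1] - 50) if h else 52.6 for h in range(72)]
    print(grid_search(temps, precip))
    c = (0.5, 1.0, 0.25, 0.0)
    print(score(temps, precip, c), score(temps, precip, normalize(c)))
```

With five steps per coefficient this evaluates 624 combinations. The count grows as steps^4, so a coarse pass followed by a finer grid around the best result may be more practical than one fine grid.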

It appears that a combination of the current and two previous hours' data has the
best correlation (R^{2} = 0.9325), for which the estimated temperature's
equation becomes:

Different data sets can be expected to yield different results, so if you try this, your results will probably differ slightly. Short of automating and periodically re-generating these results, the above equation is probably "Close Enough".

To generalize this correction factor, so that it can be applied when the gauge
holds a different amount than it did in the sample data, we need to scale it by
a factor proportional to the *current precipitation* divided by the *sample data's
precipitation* amount:

If we ask the spreadsheet for the equation of the trend line, the correction equation looks something like:

This correction should probably only be used when "T_estimated" is above 45 degrees.
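The actual trend-line and correction equations are not reproduced here, so the following Python sketch is only a guess at their general shape. It assumes the correction removes the trend line's change relative to its value at a reference temperature `t_ref`, scales that change by the current reading divided by the sample's reading, and is skipped when "T_estimated" is at or below 45 degrees. `make_trend`, `corrected_precip`, `t_ref`, and the coefficient values in the example are hypothetical placeholders, not values fitted to the sample data.

```python
from typing import Callable


def make_trend(a: float, b: float, c: float) -> Callable[[float], float]:
    """Trend line y = c*T^2 + b*T + a (map the spreadsheet's coefficients onto a, b, c)."""
    return lambda t: c * t * t + b * t + a


def corrected_precip(p_raw: float, t_est: float, trend: Callable[[float], float],
                     p_sample: float, t_ref: float, t_min: float = 45.0) -> float:
    """Hypothetical temperature correction for one hourly reading.

    Removes the trend line's change from t_ref to t_est, scaled by
    p_raw / p_sample. Readings with t_est <= t_min are returned unchanged.
    """
    if t_est <= t_min:
        return p_raw
    scale = p_raw / p_sample  # current precipitation / sample precipitation
    return p_raw - scale * (trend(t_est) - trend(t_ref))


# Placeholder coefficients for illustration only (NOT fitted to real data).
trend = make_trend(a=55.1, b=-0.05, c=0.0)
print(corrected_precip(52.1, 60.0, trend, p_sample=52.6, t_ref=50.0))
```

Passing the trend line in as a function keeps the correction independent of how the spreadsheet happens to display its coefficients.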

If we use these equations, the "corrected precipitation" has about one-third of the variation seen in the original data. The daily component is mostly gone, leaving some noise and a slowly varying component that appears to be related to atmospheric pressure.

The correlation of the temperature-corrected precipitation with atmospheric
pressure was relatively poor (R^{2} = 0.2175), and attempting to correct for
pressure produced no observable reduction in the variation.
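The two checks above can be sketched in Python. This sketch assumes "variation" means the peak-to-peak range of the hourly readings, and that the pressure comparison used a straight-line trend, for which R^{2} equals the squared Pearson correlation. Both assumptions, and the function names `peak_to_peak`, `variation_ratio`, and `linear_r_squared`, are hypothetical.

```python
from typing import Sequence


def peak_to_peak(values: Sequence[float]) -> float:
    """Spread between the highest and lowest reading."""
    return max(values) - min(values)


def variation_ratio(raw: Sequence[float], corrected: Sequence[float]) -> float:
    """Fraction of the raw data's peak-to-peak variation left after correction."""
    return peak_to_peak(corrected) / peak_to_peak(raw)


def linear_r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """R^2 of a straight-line least-squares fit (the squared Pearson correlation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


print(linear_r_squared([1, 2, 3, 4], [1, 3, 2, 4]))  # → 0.64
```

In use, `xs` would be the hourly pressure readings and `ys` the temperature-corrected precipitation for the same hours.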

A simple temperature correction is not perfect, but such is to be expected when using simple models and quantized data.

2005 08/06,07,08