DATABASE DESCRIPTION
NEW YORK
Hi-Rez Data(TM) Climatological Series
Copyright(c) 1990-95 by ZedX, Inc.
INTRODUCTION
The databases comprising the Hi-Rez Data(TM) Climatological Series consist of geographically addressed integer values in a compressed raster form. The databases are identified by state, variable, and month. The high resolution data in each monthly database were generated by applying a mathematical algorithm to the 30-year (1951-80 and 1961-90) climatological station records, which were compiled as part of the "Climatographies of the United States" by the National Climatic Data Center. The high resolution data in each annual database were derived by either averaging or summing the twelve interpolated monthly databases. The data are representative of a standard station setting at a spatial resolution of approximately one square kilometer. State and county political boundaries are included as a separate database. The high resolution boundary data were derived from the "1980 Census Digital Boundary File," which was created by the Geography Division of the Bureau of the Census. The boundary database, which can be overlaid on the climatological data, has been provided for reference purposes. This printout provides a description of the database structure, a statement concerning the accuracy of the Hi-Rez Data(TM), and reference station data for checking the installation of the Hi-Rez Data(TM) into a geographic information system (GIS) or similar program.
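The derivation of an annual database from the twelve monthly databases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2 x 2 grids and their values are hypothetical, the function names are not part of the product, and the source does not say how averaged values are rounded back to integers, so no rounding is applied here.

```python
def annual_mean(monthly_grids):
    """Cell-by-cell average of the monthly grids (e.g. temperature)."""
    months = len(monthly_grids)
    rows = len(monthly_grids[0])
    cols = len(monthly_grids[0][0])
    return [
        [sum(grid[r][c] for grid in monthly_grids) / months for c in range(cols)]
        for r in range(rows)
    ]


def annual_total(monthly_grids):
    """Cell-by-cell sum of the monthly grids (e.g. precipitation)."""
    rows = len(monthly_grids[0])
    cols = len(monthly_grids[0][0])
    return [
        [sum(grid[r][c] for grid in monthly_grids) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical 2 x 2 monthly grids, twelve of each (illustrative values only).
temperature_grids = [[[30 + m, 31 + m], [28 + m, 29 + m]] for m in range(12)]
precip_grids = [[[3, 4], [2, 5]] for _ in range(12)]

print(annual_mean(temperature_grids))  # [[35.5, 36.5], [33.5, 34.5]]
print(annual_total(precip_grids))      # [[36, 48], [24, 60]]
```

Averaging suits a variable such as temperature, while summing suits an accumulated quantity such as precipitation.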
DATA ACCURACY
The "accuracy" of the Hi-Rez Data(TM), or the extent to which an interpolated value agrees with a station record, depends on a number of factors. These factors include the quality and protocol of the original station observations, the density of stations, the spatial scale represented by a datum, the variation and height of the surface topography across a region, the proximity of large water bodies to station locations, and, most importantly, how the data are to be integrated into an application or interpreted for a particular decision. Before any evaluation is made to determine the accuracy of the interpolated values, a number of assumptions must be stated to clarify the physical setting represented by a datum. The first assumption is that the climatological records derived from station observations are the true values for any potential site within a geographic area. The second assumption is that the observations at a site were made with current instrument placement practices and observation protocols. The third assumption is that all station sites are over a gently sloping, clipped grass surface that has no nearby obstructions to air flow. With these assumptions, no accounting is made for a non-standard station design, poor instrument placement, damaged or miscalibrated instruments, poor observing practices, poor exposure, or a non-grass surface cover when arriving at a number to represent the accuracy of a datum.
The accuracy figures presented in this printout are limited to the inherent spatial variability in a data set for a region defined by state boundaries. There are a number of statistical measures that can be used to judge the collective or "average" accuracy of a data sample across a state, but few reveal the risk of using an interpolated datum in place of a station record. However, this risk can be quantified with three arithmetic measures that can be computed by comparing the Hi-Rez Data(TM) to matched samples of the original station records. In order to match an interpolated datum to a station record, it was assumed that both data sets satisfied the assumptions above about the physical setting, and that the climatological data in both sets represent average values for the approximately one-square-kilometer areas (pixels) that define the spatial scale of the databases.
The three arithmetic measures are the difference between means of the two matched data sets, the mean absolute difference between the two data sets, and the absolute differences for five progressive percentages of the two matched data samples. The difference between means can be interpreted as the "bias" in the Hi-Rez Data(TM) relative to the station records serving as a truth set. The bias gives a measure of how far off, as a state average, the interpolated values are from the original station values. The mean absolute difference is a measure of how much an interpolated value differs from its station counterpart regardless of whether the difference is positive or negative. As a mean, the absolute difference, like the bias, is an average for all locations within a state. The third measure, which is also based on the absolute difference, accounts for the variability in the differences between interpolated and station values for different sites across a state. This measure consists of five absolute differences that correspond to progressive percentages of the sample size of the matched data sets.
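The first two measures can be sketched as follows. This is a minimal illustration, not the vendor's code: the function names and the four matched pairs are hypothetical. The sign convention (interpolated minus station) is taken from the tables in this printout, where, for example, a Hi-Rez mean of 30.8 F and a station mean of 30.4 F give a bias of .4 F.

```python
def bias(interpolated, station):
    """Difference between means: mean(interpolated) - mean(station)."""
    return sum(interpolated) / len(interpolated) - sum(station) / len(station)


def mean_absolute_difference(interpolated, station):
    """Average size of the pairwise differences, ignoring their sign."""
    pairs = list(zip(interpolated, station))
    return sum(abs(h - s) for h, s in pairs) / len(pairs)


# Hypothetical matched pairs (degrees F), one pair per station location.
interpolated = [31.0, 29.5, 30.2, 32.1]
station = [30.0, 30.5, 30.0, 31.5]

print(round(bias(interpolated, station), 2))                      # 0.2
print(round(mean_absolute_difference(interpolated, station), 2))  # 0.7
```

In this toy sample the positive and negative differences partly cancel, so the bias (0.2 F) is much smaller than the mean absolute difference (0.7 F). That is why the two measures are reported separately.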
The absolute differences as a percentage of sample size were derived in three steps. First, the absolute difference was calculated for each matched pair of interpolated and station values for all station locations within a state. Second, the absolute differences were sorted in ascending order. That is, the station with the lowest absolute difference, regardless of its geographic location, was the first entry and the station with the largest difference was the last entry. Third, an absolute difference was assigned to a percentage of the sample size based on its sorted position. Five percentages were selected: 50, 75, 90, 95, and 99%.
Since the absolute difference is being presented in an unconventional form, an example may be useful to illustrate its derivation and its usefulness for quantifying the accuracy of the Hi-Rez Data(TM). Suppose a small state had matched temperature data sets of ten values each and that the absolute differences computed for the two sets were 2.8, 1.5, 0.8, 3.5, 2.1, 0.4, 1.9, 1.0, 2.5, and 1.2 F. After being sorted without regard to location, the values were stored as 0.4, 0.8, 1.0, 1.2, 1.5, 1.9, 2.1, 2.5, 2.8, and 3.5 F. The absolute difference corresponding to 50% of the sample size would be 1.5 F, the fifth of the ten sorted values. The absolute difference corresponding to 90% of the sample would be 2.8 F, the ninth value.
The absolute differences at these set percentages can be interpreted as a measure of Hi-Rez Data(TM) accuracy. A user could infer that 50% of all locations represented by the interpolated data had an absolute difference of 1.5 F or less. Similarly, the same user could infer that 90% of all locations had an absolute difference of 2.8 F or less. The user should understand that the signed differences behind these absolute values can be positive or negative. The user should also understand that the progressive percentages represent a decrease in the risk of using an interpolated value as a substitute for a station value. That is, an absolute difference of 2.8 F for 90% of a sample has less risk than the same absolute difference at 50% of the same sample.
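The three-step derivation can be reproduced with the ten example values given above. The sketch below is an illustration only. The function name is hypothetical, and the rule for non-integer positions (rounding the position up) is an assumption: the source only shows the 50% and 90% cases, which fall on whole positions for a sample of ten.

```python
def abs_diff_at_percent(abs_diffs, percent):
    """Absolute difference at a given percentage of the sorted sample.

    Assumption: the 1-based sorted position is ceil(percent * n / 100).
    """
    ordered = sorted(abs_diffs)                      # step 2: ascending sort
    rank = (percent * len(ordered) + 99) // 100      # integer ceiling
    rank = max(rank, 1)
    return ordered[rank - 1]                         # step 3: value at that position


# Step 1 results from the example in the text (degrees F).
example = [2.8, 1.5, 0.8, 3.5, 2.1, 0.4, 1.9, 1.0, 2.5, 1.2]

print(abs_diff_at_percent(example, 50))  # 1.5
print(abs_diff_at_percent(example, 90))  # 2.8
```

Integer arithmetic is used for the position so that a case such as 90% of ten values lands exactly on the ninth entry without floating-point rounding.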
The choice of percentage for judging data accuracy depends on the data requirements and the acceptable level of risk for a particular application. It is important to note that the user is basing the above inferences on a limited sample size within a state. However, the inferences are based on the original station records, and the pattern displayed by the absolute differences is, for most states, representative of most locations within a state. As a final note, the user should be aware that the greatest absolute differences are generally found at higher elevations, along the borders of large water bodies, and along the slopes of large mountains.
The three arithmetic measures chosen to evaluate the accuracy of the Hi-Rez Data(TM) are presented in Table 3 for the listed state. It should be noted that the sample size given in the table represents a subset of the total number of stations used to generate the Hi-Rez Data(TM). The larger sample, used by the interpolation algorithm, defines a climatological zone that extends over several states including the one listed in the table. However, the sample used to compute the measures in Table 3 does include nearly all the stations in the listed state.
Table 1. The difference between means (bias), the mean absolute difference, and the absolute difference as a percentage of sample size are presented for the monthly maximum temperature, minimum temperature, and precipitation totals for the listed state. The three measures were computed from matched data sets of Hi-Rez Data(TM) and station records.
ACCURACY FILES
COMPARISON OF ZEDX HI-REZ DATA TO STATION VALUES
DATA SET: 1961-90 STATE: NEW YORK
Maximum Temperature (F)
Columns 50% through 99%: Abs dif as % of Sample

| Month | Sample Size | Hi-Rez Data | Station Data | Bias | Mean Abs Dif | 50% | 75% | 90% | 95% | 99% |
|-------|-------------|-------------|--------------|------|--------------|-----|-----|-----|-----|-----|
| JAN   | 110         | 30.8        | 30.4         | 0.4  | 1.0          | 0.9 | 1.5 | 2.1 | 2.5 | 2.8 |
| FEB   | 110         | 33.4        | 32.8         | 0.6  | 1.1          | 1.0 | 1.5 | 2.2 | 2.6 | 3.0 |
| MAR   | 110         | 43.7        | 42.9         | 0.8  | 1.4          | 1.2 | 2.0 | 2.6 | 2.8 | 3.2 |
| APR   | 110         | 56.0        | 55.5         | 0.5  | 1.5          | 1.2 | 2.1 | 3.1 | 3.5 | 3.9 |
| MAY   | 110         | 68.0        | 67.7         | 0.3  | 1.6          | 1.2 | 2.2 | 3.7 | 3.9 | 4.2 |
| JUN   | 110         | 76.6        | 76.2         | 0.4  | 1.4          | 1.1 | 1.9 | 3.0 | 3.6 | 4.2 |
| JUL   | 110         | 81.1        | 80.9         | 0.2  | 1.2          | 0.9 | 1.8 | 2.4 | 3.3 | 3.7 |
| AUG   | 110         | 79.1        | 78.7         | 0.4  | 1.1          | 0.9 | 1.6 | 2.2 | 2.7 | 3.1 |
| SEP   | 110         | 71.7        | 71.2         | 0.5  | 1.1          | 0.9 | 1.7 | 2.0 | 2.4 | 2.9 |
| OCT   | 110         | 60.4        | 60.1         | 0.3  | 0.9          | 0.8 | 1.3 | 1.7 | 2.0 | 2.8 |
| NOV   | 110         | 47.7        | 47.5         | 0.2  | 0.8          | 0.7 | 1.2 | 1.8 | 2.0 | 2.2 |
| DEC   | 110         | 35.2        | 35.0         | 0.2  | 1.0          | 0.8 | 1.5 | 1.8 | 2.2 | 2.4 |
| ANN.  | 110         | 57.0        | 56.6         | 0.4  | 1.0          | 0.9 | 1.4 | 1.9 | 2.1 | 2.8 |

Minimum Temperature (F)
Columns 50% through 99%: Abs dif as % of Sample

| Month | Sample Size | Hi-Rez Data | Station Data | Bias | Mean Abs Dif | 50% | 75% | 90% | 95% | 99% |
|-------|-------------|-------------|--------------|------|--------------|-----|-----|-----|-----|-----|
| JAN   | 110         | 11.7        | 12.3         | -0.6 | 2.7          | 2.4 | 4.0 | 5.2 | 5.6 | 6.5 |
| FEB   | 110         | 13.0        | 13.4         | -0.4 | 2.5          | 2.2 | 3.4 | 4.7 | 5.7 | 6.4 |
| MAR   | 110         | 23.3        | 23.4         | -0.1 | 1.7          | 1.4 | 2.2 | 3.2 | 4.5 | 5.1 |
| APR   | 110         | 33.7        | 34.0         | -0.3 | 1.5          | 1.3 | 1.9 | 3.2 | 3.8 | 4.7 |
| MAY   | 110         | 44.2        | 44.4         | -0.2 | 1.6          | 1.3 | 2.3 | 3.3 | 3.9 | 5.0 |
| JUN   | 110         | 53.2        | 53.5         | -0.3 | 1.6          | 1.3 | 2.3 | 3.4 | 3.8 | 5.2 |
| JUL   | 110         | 58.2        | 58.6         | -0.4 | 1.9          | 1.6 | 2.6 | 3.6 | 4.2 | 5.7 |
| AUG   | 110         | 56.7        | 57.1         | -0.4 | 1.9          | 1.5 | 2.8 | 3.5 | 4.0 | 5.9 |
| SEP   | 110         | 49.2        | 49.8         | -0.6 | 2.1          | 1.7 | 2.9 | 4.0 | 4.8 | 6.5 |
| OCT   | 110         | 38.7        | 39.4         | -0.7 | 2.2          | 2.0 | 3.0 | 4.6 | 5.2 | 7.0 |
| NOV   | 110         | 30.5        | 31.1         | -0.6 | 1.9          | 1.6 | 2.4 | 4.0 | 4.7 | 6.2 |
| DEC   | 110         | 18.5        | 19.1         | -0.6 | 2.3          | 2.2 | 3.4 | 4.6 | 5.0 | 5.9 |
| ANN.  | 110         | 35.9        | 36.3         | -0.4 | 1.9          | 1.8 | 2.6 | 3.4 | 4.3 | 5.7 |

Precipitation (in)
Columns 50% through 99%: Abs dif as % of Sample

| Month | Sample Size | Hi-Rez Data | Station Data | Bias  | Mean Abs Dif | 50%  | 75%  | 90%  | 95%  | 99%   |
|-------|-------------|-------------|--------------|-------|--------------|------|------|------|------|-------|
| JAN   | 196         | 2.66        | 2.61         | 0.05  | 0.49         | 0.43 | 0.73 | 0.94 | 1.10 | 2.20  |
| FEB   | 196         | 2.50        | 2.44         | 0.06  | 0.39         | 0.35 | 0.55 | 0.73 | 0.88 | 1.48  |
| MAR   | 196         | 3.02        | 2.90         | 0.12  | 0.44         | 0.38 | 0.62 | 0.85 | 0.98 | 1.27  |
| APR   | 196         | 3.38        | 3.36         | 0.02  | 0.35         | 0.29 | 0.49 | 0.70 | 0.77 | 0.97  |
| MAY   | 196         | 3.74        | 3.67         | 0.07  | 0.33         | 0.29 | 0.46 | 0.65 | 0.75 | 0.90  |
| JUN   | 196         | 3.94        | 3.89         | 0.05  | 0.26         | 0.21 | 0.38 | 0.56 | 0.65 | 0.87  |
| JUL   | 196         | 3.77        | 3.58         | 0.19  | 0.36         | 0.31 | 0.52 | 0.73 | 0.88 | 1.02  |
| AUG   | 196         | 3.87        | 3.83         | 0.04  | 0.40         | 0.37 | 0.58 | 0.78 | 0.95 | 1.31  |
| SEP   | 196         | 3.66        | 3.73         | -0.07 | 0.37         | 0.29 | 0.48 | 0.79 | 0.96 | 1.61  |
| OCT   | 196         | 3.32        | 3.31         | 0.01  | 0.34         | 0.26 | 0.48 | 0.64 | 0.88 | 1.39  |
| NOV   | 196         | 3.80        | 3.76         | 0.04  | 0.52         | 0.44 | 0.69 | 0.93 | 1.26 | 1.86  |
| DEC   | 196         | 3.36        | 3.30         | 0.06  | 0.55         | 0.48 | 0.79 | 1.03 | 1.21 | 2.09  |
| ANN.  | 196         | 41.01       | 40.40        | 0.61  | 4.14         | 3.65 | 5.74 | 8.13 | 9.48 | 12.29 |

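One way to read the percentage columns against an application's error tolerance is sketched below. This is an illustration only: the helper name and the tolerances are hypothetical, and only the January rows are copied from the New York 1961-90 tables in this printout.

```python
PERCENTS = (50, 75, 90, 95, 99)

# January rows copied from the New York 1961-90 accuracy tables.
JAN_ABS_DIF = {
    "max_temp_F": (0.9, 1.5, 2.1, 2.5, 2.8),
    "min_temp_F": (2.4, 4.0, 5.2, 5.6, 6.5),
    "precip_in": (0.43, 0.73, 0.94, 1.10, 2.20),
}


def share_within_tolerance(abs_difs, tolerance):
    """Largest tabulated percentage whose absolute difference is <= tolerance.

    Returns None when even the 50% entry exceeds the tolerance.
    """
    best = None
    for percent, dif in zip(PERCENTS, abs_difs):
        if dif <= tolerance:
            best = percent
    return best


print(share_within_tolerance(JAN_ABS_DIF["max_temp_F"], 2.0))  # 75
print(share_within_tolerance(JAN_ABS_DIF["min_temp_F"], 2.0))  # None
print(share_within_tolerance(JAN_ABS_DIF["precip_in"], 1.0))   # 90
```

With a hypothetical tolerance of 2.0 F, the January maximum temperature stays within tolerance at the 75% level of the matched sample, while the January minimum temperature already exceeds it at the 50% level.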
Hi-Rez Data(TM) 6/95