DATABASE DESCRIPTION

NEW YORK

Hi-Rez Data(TM) Climatological Series

Copyright(c) 1990-95 by ZedX, Inc.

INTRODUCTION

The databases comprising the Hi-Rez Data(TM) Climatological Series consist of geographically addressed integer values in a compressed raster form. The databases are identified by state, variable, and month. The high-resolution data in each monthly database were generated by applying a mathematical algorithm to the 30-year (1951-80 and 1961-90) climatological station records, which were compiled as part of the "Climatographies of the United States" by the National Climatic Data Center. The high-resolution data in each annual database were derived by either averaging or summing the twelve interpolated monthly databases. The data are representative of a standard station setting at a spatial resolution of approximately one square kilometer. State and county political boundaries are included as a separate database. The high-resolution boundary data were derived from the "1980 Census Digital Boundary File," which was created by the Geography Division of the Bureau of Census. The boundary database, which can be overlaid on the climatological data, has been provided for reference purposes. This printout provides a description of the database structure, a statement concerning the accuracy of the Hi-Rez Data(TM), and reference station data for checking the installation of the Hi-Rez Data(TM) into a geographic information system (GIS) or similar program.
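The derivation of an annual database from the twelve monthly databases can be sketched as a pixel-by-pixel combination. The following is a minimal illustration only, not the vendor's procedure: the function name annual_grid, the nested-list grid layout, and the toy values are all assumptions, and the choice of averaging for temperature and summing for precipitation is inferred rather than stated in this printout.

```python
def annual_grid(monthly_grids, how):
    """Combine twelve equally shaped monthly grids pixel by pixel.

    how="mean" averages the twelve values (assumed here for temperature);
    how="sum" totals them (assumed here for precipitation).
    """
    if len(monthly_grids) != 12:
        raise ValueError("expected twelve monthly grids")
    if how not in ("mean", "sum"):
        raise ValueError("how must be 'mean' or 'sum'")
    rows = len(monthly_grids[0])
    cols = len(monthly_grids[0][0])
    annual = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = sum(grid[r][c] for grid in monthly_grids)
            row.append(total / 12 if how == "mean" else total)
        annual.append(row)
    return annual


# Toy 1x2 grids with made-up integer values (not taken from any database).
toy = [[[m, 2 * m]] for m in range(1, 13)]
print(annual_grid(toy, "sum"))   # [[78, 156]]
print(annual_grid(toy, "mean"))  # [[6.5, 13.0]]
```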

DATA ACCURACY

The "accuracy" of the Hi-Rez Data(TM), or the extent to which an interpolated value agrees with a station record, depends on a number of factors. These factors include the quality and protocol of the original station observations, the density of stations, the spatial scale represented by a datum, the variation and height of the surface topography across a region, the proximity of large water bodies to station locations, and, most importantly, how the data are to be integrated into an application or interpreted for a particular decision. Before any evaluation is made to determine the accuracy of the interpolated values, a number of assumptions must be stated to clarify the physical setting represented by a datum. The first assumption is that the climatological records derived from station observations are the true values of any potential site within a geographic area. The second assumption is that the observations at a site were made with current instrument placement practices and observation protocols. The third assumption is that all station sites are over a gently sloping, clipped grass surface that has no nearby obstructions to air flow. With these assumptions, no accounting is made for a non-standard station design, poor instrument placement, damaged or miscalibrated instruments, poor observing practices, poor exposure, or a non-grass surface cover when arriving at a number to represent the accuracy of a datum.

The accuracy figures presented in this printout are limited to the spatial variability inherent in a data set for a region defined by state boundaries. There are a number of statistical measures that can be used to judge the collective or "average" accuracy of a data sample across a state, but few reveal the risk of using an interpolated datum in place of a station record. However, this risk can be quantified with three arithmetic measures that can be computed by comparing the Hi-Rez Data(TM) to matched samples of the original station records. In order to match an interpolated datum to a station record, it was assumed that both data sets satisfied the assumptions above about the physical setting, and that the climatological data in both sets represent average values for the approximately one-square-kilometer areas (pixels) that define the spatial scale of the databases.

The three arithmetic measures are the difference between means of the two matched data sets, the mean absolute difference between the two data sets, and the absolute differences for five progressive percentages of the two matched data samples. The difference between means can be interpreted as the "bias" in the Hi-Rez Data(TM) relative to the station records serving as a truth set. The bias gives a measure of how far off, as a state average, the interpolated values are from the original station values. The mean absolute difference is a measure of how much an interpolated value differs from its station counterpart regardless of whether the difference is positive or negative. As a mean, the absolute difference, like the bias, is an average for all locations within a state. The third measure, which is also based on the absolute difference, accounts for the variability in the differences between interpolated and station values for different sites across a state. This measure consists of five absolute differences that correspond to progressive percentages of the sample size of the matched data sets.
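The first two measures can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names and sample values are hypothetical, and the sign convention (interpolated mean minus station mean) is inferred from the accuracy tables in this printout, where a Hi-Rez mean above the station mean gives a positive bias.

```python
def bias(interpolated, station):
    """Difference between the means of two matched data sets.

    Assumed sign convention: interpolated mean minus station mean.
    """
    if len(interpolated) != len(station) or not interpolated:
        raise ValueError("data sets must be matched and non-empty")
    n = len(interpolated)
    return sum(interpolated) / n - sum(station) / n


def mean_abs_diff(interpolated, station):
    """Mean of the absolute differences between matched pairs."""
    if len(interpolated) != len(station) or not interpolated:
        raise ValueError("data sets must be matched and non-empty")
    return sum(abs(i - s) for i, s in zip(interpolated, station)) / len(interpolated)


# Made-up matched values for four hypothetical stations (degrees F).
hi_rez = [30.0, 32.0, 35.0, 28.0]
station = [31.0, 30.0, 35.5, 27.5]

print(bias(hi_rez, station))           # 0.25
print(mean_abs_diff(hi_rez, station))  # 1.0
```

Note that positive and negative differences cancel in the bias but not in the mean absolute difference, which is why the second value is larger.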

The absolute differences as a percentage of sample size were derived in three steps. First, the absolute difference was calculated for each matched pair of interpolated and station values for all station locations within a state. Second, the absolute differences were sorted in ascending order. That is, the station with the lowest absolute difference, regardless of its geographic location, was the first entry and the station with the largest difference was the last entry. Third, an absolute difference was assigned to a percentage of the sample size based on its sorted position. Five percentages were selected: 50, 75, 90, 95, and 99%.

Since the absolute difference is being presented in an unconventional form, an example may be useful to illustrate its derivation and its usefulness for quantifying the accuracy of the Hi-Rez Data(TM). Suppose a small state had matched temperature data sets of ten values each and that the absolute differences computed for the two sets were 2.8, 1.5, 0.8, 3.5, 2.1, 0.4, 1.9, 1.0, 2.5, and 1.2 F. After being sorted without regard to location, the values were stored as 0.4, 0.8, 1.0, 1.2, 1.5, 1.9, 2.1, 2.5, 2.8, and 3.5 F. The absolute difference corresponding to 50% of the sample size would be 1.5 F. The absolute difference corresponding to 90% of the sample would be 2.8 F. The absolute differences at these set percentages can be interpreted as a measure of Hi-Rez Data(TM) accuracy. A user could infer that 50% of all locations represented by the interpolated data had an absolute difference of 1.5 F or less. Similarly, the same user could infer that 90% of all locations had an absolute difference of 2.8 F or less. The user should keep in mind that the differences underlying these absolute values can be positive or negative, and that the progressive percentages represent a decrease in the risk of using an interpolated value as a substitute for a station value. That is, an absolute difference of 2.8 F for 90% of a sample has less risk than the same absolute difference at 50% of the same sample.
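The three steps can be reproduced with the ten example values. The sketch below is illustrative only: the function name is hypothetical, and the rule for turning a percentage into a sorted position (rounding percent times sample size, divided by 100, up to the next whole entry) is an assumption chosen because it reproduces the 50% and 90% values quoted in the example; the printout does not state the exact rule.

```python
def abs_diff_at_percent(abs_diffs, percent):
    """Absolute difference at a given whole-number percentage of the sample.

    Assumed positional rule: take the entry at position
    ceil(percent * n / 100) of the ascending sorted list (1-based).
    """
    if not abs_diffs:
        raise ValueError("sample must be non-empty")
    ordered = sorted(abs_diffs)            # step 2: ascending sort
    n = len(ordered)
    position = -(-percent * n // 100)      # integer ceiling, avoids float error
    position = min(max(position, 1), n)    # clamp to a valid entry
    return ordered[position - 1]           # step 3: value at that position


# Step 1 is assumed done: these are the ten absolute differences (F)
# from the example in the text.
diffs = [2.8, 1.5, 0.8, 3.5, 2.1, 0.4, 1.9, 1.0, 2.5, 1.2]

for pct in (50, 75, 90, 95, 99):
    print(pct, abs_diff_at_percent(diffs, pct))
# 50 1.5
# 75 2.5
# 90 2.8
# 95 3.5
# 99 3.5
```

With only ten values, the 95% and 99% entries both fall on the largest difference, which illustrates why the higher percentages are meaningful only for larger samples.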

The choice of percentage for judging data accuracy depends on the data requirements and the acceptable level of risk for a particular application. It is important to note that the user is basing the above inferences on a limited sample size within a state. However, the inferences are based on the original station records, and the pattern displayed by the absolute differences is, for most states, representative of most locations within a state. As a final note, the user should be aware that the greatest absolute differences are generally found at higher elevations, along the borders of large water bodies, and along the slopes of large mountains.

The three arithmetic measures chosen to evaluate the accuracy of the Hi-Rez Data(TM) are presented in Table 1 for the listed state. It should be noted that the sample size given in the table represents a subset of the total number of stations used to generate the Hi-Rez Data(TM). The larger sample, used by the interpolation algorithm, defines a climatological zone that extends over several states including the one listed in the table. However, the sample used to compute the measures in Table 1 does include nearly all the stations in the listed state.

Table 1. The difference between means (bias), the mean absolute difference, and the absolute difference as a percentage of sample size are presented for the monthly maximum temperature, minimum temperature, and precipitation totals for the listed state. The three measures were computed from matched data sets of Hi-Rez Data(TM) and station records.

ACCURACY FILES

COMPARISON OF ZEDX HI-REZ DATA TO STATION VALUES

DATA SET: 1961-90 STATE: NEW YORK

Maximum Temperature (F)

                                                    Abs dif as % of Sample
       Sample  Hi-Rez  Station            Mean
Month    Size    Data     Data   Bias  Abs Dif    50%    75%    90%    95%    99%
JAN       110    30.8     30.4     .4      1.0     .9    1.5    2.1    2.5    2.8
FEB       110    33.4     32.8     .6      1.1    1.0    1.5    2.2    2.6    3.0
MAR       110    43.7     42.9     .8      1.4    1.2    2.0    2.6    2.8    3.2
APR       110    56.0     55.5     .5      1.5    1.2    2.1    3.1    3.5    3.9
MAY       110    68.0     67.7     .3      1.6    1.2    2.2    3.7    3.9    4.2
JUN       110    76.6     76.2     .4      1.4    1.1    1.9    3.0    3.6    4.2
JUL       110    81.1     80.9     .2      1.2     .9    1.8    2.4    3.3    3.7
AUG       110    79.1     78.7     .4      1.1     .9    1.6    2.2    2.7    3.1
SEP       110    71.7     71.2     .5      1.1     .9    1.7    2.0    2.4    2.9
OCT       110    60.4     60.1     .3       .9     .8    1.3    1.7    2.0    2.8
NOV       110    47.7     47.5     .2       .8     .7    1.2    1.8    2.0    2.2
DEC       110    35.2     35.0     .2      1.0     .8    1.5    1.8    2.2    2.4
ANN.      110    57.0     56.6     .4      1.0     .9    1.4    1.9    2.1    2.8

Minimum Temperature (F)

                                                    Abs dif as % of Sample
       Sample  Hi-Rez  Station            Mean
Month    Size    Data     Data   Bias  Abs Dif    50%    75%    90%    95%    99%
JAN       110    11.7     12.3    -.6      2.7    2.4    4.0    5.2    5.6    6.5
FEB       110    13.0     13.4    -.4      2.5    2.2    3.4    4.7    5.7    6.4
MAR       110    23.3     23.4    -.1      1.7    1.4    2.2    3.2    4.5    5.1
APR       110    33.7     34.0    -.3      1.5    1.3    1.9    3.2    3.8    4.7
MAY       110    44.2     44.4    -.2      1.6    1.3    2.3    3.3    3.9    5.0
JUN       110    53.2     53.5    -.3      1.6    1.3    2.3    3.4    3.8    5.2
JUL       110    58.2     58.6    -.4      1.9    1.6    2.6    3.6    4.2    5.7
AUG       110    56.7     57.1    -.4      1.9    1.5    2.8    3.5    4.0    5.9
SEP       110    49.2     49.8    -.6      2.1    1.7    2.9    4.0    4.8    6.5
OCT       110    38.7     39.4    -.7      2.2    2.0    3.0    4.6    5.2    7.0
NOV       110    30.5     31.1    -.6      1.9    1.6    2.4    4.0    4.7    6.2
DEC       110    18.5     19.1    -.6      2.3    2.2    3.4    4.6    5.0    5.9
ANN.      110    35.9     36.3    -.4      1.9    1.8    2.6    3.4    4.3    5.7

Precipitation (in)

                                                    Abs dif as % of Sample
       Sample  Hi-Rez  Station            Mean
Month    Size    Data     Data   Bias  Abs Dif    50%    75%    90%    95%    99%
JAN       196    2.66     2.61    .05      .49    .43    .73    .94   1.10   2.20
FEB       196    2.50     2.44    .06      .39    .35    .55    .73    .88   1.48
MAR       196    3.02     2.90    .12      .44    .38    .62    .85    .98   1.27
APR       196    3.38     3.36    .02      .35    .29    .49    .70    .77    .97
MAY       196    3.74     3.67    .07      .33    .29    .46    .65    .75    .90
JUN       196    3.94     3.89    .05      .26    .21    .38    .56    .65    .87
JUL       196    3.77     3.58    .19      .36    .31    .52    .73    .88   1.02
AUG       196    3.87     3.83    .04      .40    .37    .58    .78    .95   1.31
SEP       196    3.66     3.73   -.07      .37    .29    .48    .79    .96   1.61
OCT       196    3.32     3.31    .01      .34    .26    .48    .64    .88   1.39
NOV       196    3.80     3.76    .04      .52    .44    .69    .93   1.26   1.86
DEC       196    3.36     3.30    .06      .55    .48    .79   1.03   1.21   2.09
ANN.      196   41.01    40.40    .61     4.14   3.65   5.74   8.13   9.48  12.29
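As a rough illustration of how the tabulated percentages might be matched to an application's tolerance, the sketch below looks up the highest percentage of the sample whose absolute difference stays within a chosen limit. The function name and the tolerance values are hypothetical; the row values are the January maximum temperature entries from the table above.

```python
# January maximum temperature, 1961-90, New York:
# percentage of sample -> absolute difference (F), copied from the table.
JAN_MAX_TEMP = {50: 0.9, 75: 1.5, 90: 2.1, 95: 2.5, 99: 2.8}


def highest_percent_within(row, tolerance):
    """Largest tabulated percentage whose absolute difference is <= tolerance.

    Returns None when even the 50% entry exceeds the tolerance.
    """
    within = [pct for pct, diff in row.items() if diff <= tolerance]
    return max(within) if within else None


print(highest_percent_within(JAN_MAX_TEMP, 2.0))  # 75
print(highest_percent_within(JAN_MAX_TEMP, 3.0))  # 99
print(highest_percent_within(JAN_MAX_TEMP, 0.5))  # None
```

Under this reading, an application that can tolerate a 2.0 F difference in January maximum temperature would find that tolerance met at 75% of the matched stations, but not at 90%.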

Hi-Rez Data(TM) 6/95
