Generically, this is similar to the Histogram plots except that one histogram is plotted versus another and the number of counts per bin is represented by a color.
So far, these plots make lots of pretty pictures, but I have not found them to provide anything useful.
On the other hand, the fact that I haven't found anything weird indicates that there are no significant errors in the data beyond those already identified using the other tools.
Currently, only the following properties are supported - these are the same properties supported on the Histogram tab.
|xy Scatter Plot - Lat/Long plot
|Red and Orange show where the most sites are located
The linear regression formula used to compute trends requires only 3 data points (3 years of data). However, data records that short are pretty meaningless - trends of more than 4°C/decade are fairly common near the beginning or end of a data record, or if the range width is too short. As a result, the application default is to require at least 10 years of data, but this can be overridden via the Trend Lines tab. The number of sites that don't meet the minimum requirement is shown in the excluded field.
Checkboxes and number fields are provided to control the minimum and maximum values of the x- and y-axes.
Of course, the application makes both the colors and ranges user configurable. Depending on your need, I think that Light Grey might be better than Black for the Greater than zero bins. However, I have left the default as Black to make sure the markers are visible on most displays.
If you only want to see the bins with counts between two values, set the limits accordingly and color the other bins White (the background color).
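The count-to-color mapping described above can be sketched as a simple threshold lookup. This is only an illustration, not the application's code: the function name `bin_color` and the threshold lists are hypothetical, with the values borrowed from the examples on this page (Black for counts greater than zero, Orange at 5, Red at 15).

```python
def bin_color(count, thresholds, background="White"):
    """Return the color of the highest threshold that `count` reaches.

    `thresholds` is a list of (minimum_count, color) pairs in ascending
    order. Counts below the first threshold get the background color.
    (Hypothetical helper - the real application may work differently.)
    """
    color = background
    for minimum_count, name in thresholds:
        if count >= minimum_count:
            color = name
    return color


# Assumed example: Black for any non-empty bin, Orange at 5, Red at 15.
custom = [(1, "Black"), (5, "Orange"), (15, "Red")]

# To show only bins with counts from 5 to 14, color everything else
# White so it blends into the background.
only_middle = [(1, "White"), (5, "Orange"), (15, "White")]
```

With `only_middle`, a bin holding 7 sites is drawn Orange while bins holding 2 or 20 sites disappear into the background.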
When the Compute bins for 100x200 grid box is unchecked, the user has the option to control the bin sizes. The current configuration is visually indicated by the background color of the number fields - white is editable, grey is read-only.
When only one pixel is used per bin, the image is large enough to display the bins without any overlaps. However, those are pretty hard to see. To improve visibility, the default markers are significantly larger (6x6), but they frequently produce a significant amount of marker overlap. You can use the Marker size control to adjust the overlap. Another approach is to increase the bin sizes, which also removes the overlaps but produces a grainier image.
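For readers who want to see the idea behind the bins, here is a minimal sketch of two-dimensional binning. It assumes each site contributes one (x, y) pair and that a bin is identified by the integer index of its lower edge; the names `bin_counts` and `auto_bin_size` are hypothetical and the application may compute its grid differently.

```python
import math
from collections import Counter


def auto_bin_size(axis_min, axis_max, number_of_bins):
    """Bin width that splits an axis range into the given number of bins."""
    return (axis_max - axis_min) / number_of_bins


def bin_counts(xs, ys, x_bin_size, y_bin_size):
    """Count how many (x, y) points fall into each bin.

    Keys are (x_index, y_index) pairs, where each index is the value
    divided by the bin size, rounded down. Larger bin sizes merge
    neighboring bins, which gives higher counts and a grainier picture.
    """
    counts = Counter()
    for x, y in zip(xs, ys):
        key = (math.floor(x / x_bin_size), math.floor(y / y_bin_size))
        counts[key] += 1
    return counts


# Made-up values, binned with the custom sizes shown in the example
# captions (trend 0.03, baseline 2).
trends = [0.01, 0.02, 0.05]
baselines = [0.5, 1.5, 3.0]
counts = bin_counts(trends, baselines, 0.03, 2)
```

Each entry in `counts` then becomes one marker, colored according to its count.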
|xy Scatter Plots
|Default bin sizes and color mapping
|Custom bin sizes and color mapping
|Trend = 0.018 °C/bin Baseline = 0.433 °C/bin
Default color definitions
|Trend = 0.03 °C/bin Baseline = 2 °C/bin
Custom color definitions - 5 Orange, 15 Red
The range control is placed under the graph - as with all number fields in the application, you can type a value or use the mouse wheel (hold the shift key to modify digits in the tens place). As the dates change, the graph is automatically updated. Normally, the range is included with the x-axis label if either the Trend or Number of Years option is selected for either axis. The associated checkbox controls whether the range is included in the x-axis label or not.
When a new range is set, the application determines the number of data points and the slope (via least squares linear regression) in that range. If there are not enough points to compute a trend, then that site is excluded from the plot and the number of selected, but excluded, sites is reported in the provided field. By default, 10 years of data must exist within the selected range to compute a trend, but you can control that via the Trend Lines tab - only 3 are absolutely necessary, but trends with too few points can be misleading. You really want a large sample, unless you are specifically looking for problem data.
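The trend computation and the exclusion rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's code: the function name `trend_per_decade` is hypothetical, one value per year is assumed, and reporting the slope in °C/decade is my choice for the example.

```python
def trend_per_decade(years, temps, start, end, min_years=10):
    """Least squares slope over [start, end], in degrees per decade.

    Returns None when the site has fewer than `min_years` data points in
    the range (never fewer than 3), meaning the site would be excluded.
    """
    points = [(y, t) for y, t in zip(years, temps) if start <= y <= end]
    n = len(points)
    if n < max(min_years, 3):
        return None

    mean_year = sum(y for y, _ in points) / n
    mean_temp = sum(t for _, t in points) / n
    sxx = sum((y - mean_year) ** 2 for y, _ in points)
    sxy = sum((y - mean_year) * (t - mean_temp) for y, t in points)
    if sxx == 0:
        return None  # all points share one year, so no slope exists

    return 10 * sxy / sxx  # slope per year, scaled to per decade


# Made-up site: 12 years warming by 0.02 degrees per year.
years = list(range(2000, 2012))
temps = [10 + 0.02 * (y - 2000) for y in years]

full = trend_per_decade(years, temps, 2000, 2011)
short = trend_per_decade(years, temps, 2000, 2005)
forced = trend_per_decade(years, temps, 2000, 2005, min_years=3)
```

Here `short` is None because only 6 years fall in the range, so the site would be counted as excluded; lowering `min_years` to 3 lets the same range produce a trend.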
Since the Baseline Average Temperatures are computed via the Basic Filters tab, the date range controlling those is there.
Remember, the sites used in these plots are selected using a number of filters.
Raw vs Adjusted
As explained above, using any of the available datasets, you can plot one of the provided properties versus another. In addition, assuming that a raw/adjusted pair of datasets is loaded (the options are greyed out if they are not), you can compare the following properties
The resulting plots were pretty much what I expected - except for the amount of deleted data. I have always been aware that the adjusted data had fewer data points (years per station) than the raw data, but I never expected so many sites to have more than 10 years of data removed via the adjustment process.
|Raw vs Adjusted examples
|Change in Trend versus Raw Trend
|Number of Years excluded
|Raw negative trends were made more positive
Strong positive trends were made less positive
MouseOver - Similar plot, but with identical sites removed
|The "adjustments" removed lots of data
|Markers: Black - only one :: Red - 8 or more
4561 - 3917 = 644 sites with no change from Raw to Adjusted
|Map showing sites used in the plots above
Selected (green) and Identical (red)
|They were all selected for the base plots
The identical sites were deselected for the extra Trend plot
The x-axis selection is located just below the Plot (Adjusted - Raw) vs selection radio button and is disabled unless the option is selected. Only the raw versions of the available properties are available.
These 2 radio buttons only work with the Trend, Baseline Temperature, and Num of Yrs options. When either radio button is selected, the other 2 data options (Lat/Long) are disabled and greyed out since their values are identical in both datasets. In addition, all the x-axis radio buttons are disabled since both datasets MUST use the same property.
Since every adjusted site has an associated unadjusted (raw) site, these features use only the sites selected on the adjusted dataset. This means that it does not matter if sites on the raw dataset are selected or not. It also does not matter which dataset is displayed on the map.
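The pairing described above can be sketched like this. It is only an illustration with made-up data: the function name `adjusted_minus_raw` and the dictionary layout (site id mapped to a property value, with None meaning the value could not be computed) are assumptions, not the application's internals.

```python
def adjusted_minus_raw(raw, adjusted):
    """Pair each selected adjusted site with its raw counterpart.

    Only the sites in `adjusted` drive the comparison, so extra sites in
    `raw` are ignored. Returns (points, excluded), where each point is
    (raw_value, adjusted_value - raw_value) and `excluded` counts sites
    missing a value in either dataset.
    """
    points = []
    excluded = 0
    for site, adjusted_value in adjusted.items():
        raw_value = raw.get(site)
        if adjusted_value is None or raw_value is None:
            excluded += 1
            continue
        points.append((raw_value, adjusted_value - raw_value))
    return points, excluded


# Made-up trend values for four raw sites; only three are selected
# on the adjusted dataset, and one of those has no adjusted trend.
raw = {"A": 0.5, "B": -0.5, "C": 0.25, "D": 1.0}
adjusted = {"A": 0.5, "B": 0.25, "C": None}

points, excluded = adjusted_minus_raw(raw, adjusted)
identical = sum(1 for _, change in points if change == 0)
```

Counting the points with zero change gives the number of sites that are identical in both datasets, the same kind of figure as the 644 sites reported above.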
When the displayed data is changed from one where both raw and adjusted are available to one where only one is loaded, then
The baseline temperature is an average based on the dates set on the Basic Filters tab. When this is changed on one dataset, be sure to also change it on the other - otherwise, the 2 datasets will not be in sync. When Baseline Temp is selected, there should never be excluded sites - if there are, then check to make sure the baselines are in sync. (I have seen this problem - user error.)
|The reported ocean temperatures are only anomaly values (relative to an unspecified baseline). I verified that the baseline is not the same as the application default by observing that the values simply cluster around zero. For more information, see my Histograms page discussing oceans.
These are some of the other parameters that might be useful in an xy-plot - min, max, mean, median, σ², R², etc - each associated with some time period.
Obviously, with a large selection, radio buttons are a problem. (They require too much space.) Normally a pull-down listbox is more appropriate with a large number of options. (I strongly dislike write-in data fields because then the user must know in advance what the available options are.)
At any rate, these xy-plots are already of questionable value and I don't want to make it harder to understand and/or use by simply adding a lot of parameters that no one wants.