|Introduction to SQL|
|Sample SQL Queries|
|SkyServer Traffic Page|
|Contact Help Desk|
Match and MatchHead Tables
Jim Gray, Alex Szalay, Robert Lupton, Jeff Munn
May 2003, revised January, May, June, July, December 2004
The SDSS data can be used for temporal studies of objects that are re-observed at different times. The SDSS survey observes about 10% of the Northern survey area 2 or more times, and observes the Southern stripe more than a dozen times.
The match table is intended to make temporal queries easy by providing a precomputed list of all objects that were observed multiple times. More formally,
But, as always, there are complications.
Green, Yellow, Red: What if ObjID2 in Run2 is missing?
Surrogate: When an object is missing in Run2, what do we put in
the match table?
Computing the Match table
The Match table is computed by using the Neighbors table and has a very similar schema (the Neighbors table only stores mode (1,2) (aka primary/secondary) and type (3,5,6) (aka galaxy, unknown, star) objects.
CREATE TABLE Match ( objID1 bigint not null, objID2 bigint not null, -- object pair run1 smallint not null, run2 smallint not null, -- their run numbers type1 tinyint not null, type2 tinyint not null, -- star, galaxy,... mode1 tinyint not null, mode2 tinyint not null, -- primary, secondary,... distance float not null, -- in arcminutes miss char not null, -- " " no miss, RGY:red,green,yellow matchHead bigint not null, -- see below. primary key (objID1, ObjID2) ) ON [Neighbors] -- now populate the table INSERT Match SELECT objID as objID1, neighborObjID as objID2, (objID & 0x0000FFFF00000000)/power(cast(2 as bigint),32) as run1, (NeighborObjID & 0x0000FFFF00000000)/power(cast(2 as bigint),32) run2, type as type1, neighborType as type2, mode as mode1, neighborMode as mode2, distance, ' ' as miss, 0 as matchHead FROM Neighbors WHERE distance < 1.0/60.0 -- within 1 arcsecond of one another
One arcsecond is a large error in Sloan Positioning - the vast majority is within a ˝ arcsecond (95%). But a particular cluster may not form a complete graph (all members connected to all others). To make the graph fully transitive, we repeatedly execute the query to add the "curved" arcs in Figure 1. Notice that that figure shows two objects observed in four runs, and that the two objects are observed only once in the middle two runs. The whole collection is closed to make a "bundle" that will have a matchHead object (the smallest objID of the bundle).
declare @Trip table ( objid1 bigint,ObjID2 bigint, run1 smallint, run2 smallint, type1 tinyint, type2 tinyint, mode1 tinyInt, mode2 tinyInt, primary key (objID1,ObjID2) )
Computing the MatchHead table
Now each cluster of objects in the Match table is fully connected. We can name the clusters in the Match table by the minimum (non zero) objID in the cluster and can compute the MatchHead table that describes the global properties of the cluster: its name, its average RA and DEC and the variance in RA, DEC.
-- build a table of cluster IDs (minimum object ID of each cluster).
-- compute the minimum object IDs.
Matching the Missing Objects
There may be an object in camcol A that should have matching objects in an overlapping camcol B (see figure 2). In particular, any object in the green part of A should have a matching object in B (in Figure 2). Objects in A that are near the edge of B (10 pixels ~4 arc seconds = the yellow part of B) may have matching objects in B.In some cases the B area is masked (red) and that explains why there is not a match.
If a "green" A object does not match a B object then either the object is moving or variable or masked. We can check the masks to see if the (A.ra, A.dec) is masked in B. If not, we assume that A is just "missing."
Similarly, if a "yellow" A object does not match a B object, then either the object is moving or variable or masked or the edge effects caused the object to be missing.In these edge cases we check to see if (A.ra, A.dec) is masked in B, if not we call the object missing-edge.
So, missing objects come in 3 varieties:
In each of these cases we create a match object as the closest object in B to A and Match.flag is set to Green, Red or Yellow.These "fake" objects do not contribute to the cluster average or variance or centroid.
We add this object to A’s cluster (along with all the edges), and we increment the cluster miss count by the number of records we add to the cluster.
The logic for computing missing objects is as follows.
For each RunA in the Regions table.
The actual code is a little more complex (about 700 lines of SQL). In the personal SkyServerDR1 there are about 20,000 matches and 10,000 object misses, so it seems that the misses will make an interesting study.
The results of this are that a bundle can have dangling pointers to these surrogate objects.Figure 3 shows the diagram of Figure 1 where a fifth overlapping run has been added. The leftmost object is masked in this new run and so we find a surrogate "red" object for it. The other objects are also have no match in this run but are not masked and are closest to the green (right) object in the figure.
It takes 4 minutes to compute on the personal SkyServer DR1, It will take a bit longer on the thousand times larger Dr2, but ...
As per Robert’s request, surrogate match objects are found rather than invented. Sometimes we have to look far away for them (500 arcseconds in some cases).
Misses are painted Yellow (near the edge), Red (masked), and Green (well inside the overlap). Most misses are Green.
The graphs of distances are shown in Figure 4.