Jim Gray, Alex Szalay, Robert Lupton, Jeff Munn
May 2003, revised January, May, June, July, December 2004
The SDSS data can be used for temporal studies of objects that are re-observed
at different times. The SDSS survey observes about 10% of the Northern survey
area 2 or more times, and observes the Southern stripe more than a dozen
times.
The match table is intended
to make temporal queries easy by providing a precomputed list of all objects that
were observed multiple times. More formally,
Match = { (ObjID1,ObjID2) | Objid1 and ObjID2 are both from different
runs (i.e. different observations). |
|
And they are within 1 arcsecond of one another |
|
And are both good (star or galaxy or unknown) |
|
And are both fully deblended (no children) |
|
And they are primary or secondary (not family or outside) } |
But, as always, there are complications.
Green, Yellow, Red: What if ObjID2 in Run2 is missing?
It could be missing because it was not seen or because it was masked. What
about the "edge cases" at the edge of run1 or of run2. Perhaps it is
an edge object. We color-code these edge cases as "yellow" objects and the
masked objects as "red" objects. And of course, we flag the missing but
matched objects as "green".
Surrogate: When an object is missing in Run2, what do we put in
the match table?
We could fabricate ObjID2, or we could find the closest ObjID2 in Run2 and
just flag it and record the distance to it (which will be more than 1
arcsecond).
Computing the Match table
The Match table is computed by using the Neighbors table and has a very
similar schema (the Neighbors table only stores mode (1,2) (aka
primary/secondary) and type (3,5,6) (aka galaxy, unknown, star) objects.
CREATE TABLE Match (
objID1 bigint not null, objID2 bigint not null, -- object pair
run1 smallint not null, run2 smallint not null, -- their run numbers
type1 tinyint not null, type2 tinyint not null, -- star, galaxy,...
mode1 tinyint not null, mode2 tinyint not null, -- primary, secondary,...
distance float not null, -- in arcminutes
miss char not null, -- " " no miss, RGY:red,green,yellow
matchHead bigint not null, -- see below.
primary key (objID1, ObjID2)
) ON [Neighbors]
-- now populate the table
INSERT Match
SELECT objID as objID1, neighborObjID as objID2,
(objID & 0x0000FFFF00000000)/power(cast(2 as bigint),32) as run1,
(NeighborObjID & 0x0000FFFF00000000)/power(cast(2 as bigint),32) run2,
type as type1, neighborType as type2,
mode as mode1, neighborMode as mode2,
distance, ' ' as miss, 0 as matchHead
FROM Neighbors
WHERE distance < 1.0/60.0 -- within 1 arcsecond of one another
One arcsecond is a large error in Sloan Positioning - the vast majority is
within 0.5 arcsecond (95%). But a particular cluster may not form a
complete graph (all members connected to all others). To make the graph fully
transitive, we repeatedly execute the query to add the "curved" arcs in Figure
1. Notice that that figure shows two objects observed in four runs, and that
the two objects are observed only once in the middle two runs. The whole
collection is closed to make a "bundle" that will have a matchHead object (the
smallest objID of the bundle).
 |
Figure 1: Six objects from 4 runs
forming a bundle.
|
declare @Trip table (
objid1 bigint,ObjID2 bigint,
run1 smallint, run2 smallint,
type1 tinyint, type2 tinyint,
mode1 tinyInt, mode2 tinyInt,
primary key (objID1,ObjID2)
)
WHILE (1=1)
BEGIN
delete @Trip -- empty the triples table
insert @Trip -- compute new pairs based on triples
select distinct a.objID1, b.ObjID2,
a.run1, b.run2,
a.type1, b.type2,
a.mode1, b.mode2
from Match a join Match b on a.ObjID2 = b.objID1
where a.objID1 != b.ObjID2 -- different objects
and not exists (
select objID1 -- and pair not already in Match
from match c
where c.objID1 = a.objID1
and c.objID2 = b.objID2)
IF @@rowcount = 0 BREAK -- stop if none found.
select "adding " + cast(count(*) as varchar(20)) + " triples."
from @Trip
-- now add these into Match and repeat till no more rows.
insert Match -- else add pair to Match.
select T.*,N.distance, 0,0
from @Trip T join Neighbors N
on t.objID1 = N.objID and t.ObjID2 = N.NeighborObjID
END
Computing the MatchHead table
Now each cluster of objects in the Match table is fully connected. We can
name the clusters in the Match table by the minimum (non zero) objID in the
cluster and can compute the MatchHead table that describes the global
properties of the cluster: its name, its average RA and DEC and the variance
in RA, DEC.
-- build a table of cluster IDs (minimum object ID of each cluster).
CREATE TABLE MatchHead (
objID bigint not null primary key, -- id of match head object
averageRa float not null default 0, -- average right ascension
averageDec float not null default 0, -- average declination
varRa float not null default 0, -- variance in RA
varDec float not null default 0, -- variance in DEC
matchCount tinyInt not null default 0, -- real number in cluster
missCounttinyInt not null default 0 -- fake objects in cluster
) ON [Neighbors]
 |
Figure 2: Green area of A
must match B, yellow area may match B. Red areas of B (masks) have a
good reason for the A object to have a missing object. In these
missing cases, we pick the "nearest" object in B to be a surrogate
for A in the match table. |
-- compute the minimum object IDs.
declare @MinID table (
objID bigint
primary key
)
insert @MinID
select distinct objID1
from Match MinId
where 0 = (
select count(*)
from Match m
where
MinId.objID1 = m.objID1
and
MinId.objID1 > m.ObjID2
)
-- compute all pairs of objIDs in a
-- cluster (including x,x for headID)
declare @pairs table (
objID1 bigint not null,
objID2 bigint not null,
primary key(objID1,objID2)
)
insert @pairs
select minID.objID, m.ObjID2
from @MinID minID join Match m on
minID.objID = m.objID1
insert @pairs select objID, objID from @MinID
-- now populate the MatchHead table with minObjID and statistics
insert MatchHead
select MinID.objID, -- for each MatchHead ID
avg(ra), avg(dec), -- get avg ra,dec of group
coalesce(stdev(ra),0),
coalesce(stdev(dec),0), -- get stddev of group
count(distinct -- count runs
(p.objid2 & 0x0000FFFF00000000)),
0 -- count misses later
From @MinID as MinID
join @pairs as p on MinID.objID = p.objID1
join PhotoObjAll as o on p.objID2 = o.objID
group by MinID.objID
-- now set MatchHead on each object
UPDATE Match
SET Matchhead = P.objID1
FROM Match M join @Pairs P on M.objID1 = P.ObjID2
 |
Figure 3: A green surrogate for
the right 4 objects and a red (masked) surrogate for the
leftmost one. |
The number missing from the cluster is computed in the next section.
Matching the Missing Objects
There may be an object in camcol A that should have matching objects in an
overlapping camcol B (see figure 2). In particular, any object in the green
part of A should have a matching object in B (in Figure 2). Objects in
A that are near the edge of B (10 pixels ~4 arc seconds = the yellow part of
B) may have matching objects in B.In some cases the B area is masked
(red) and that explains why there is not a match.
If a "green" A object does not match a B object then either the object is
moving or variable or masked. We can check the masks to see if the (A.ra,
A.dec) is masked in B. If not, we assume that A is just "missing."
Similarly, if a "yellow" A object does not match a B object, then either the
object is moving or variable or masked or the edge effects caused the object
to be missing.In these edge cases we check to see if (A.ra, A.dec) is masked
in B, if not we call the object missing-edge.
So, missing objects come in 3 varieties:
Object | Color | Flag |
Hit | | 0 |
Missing | Green | 1 |
Missing Edge | Yellow | 2 |
Masked | Red | 3 |
In each of these cases we create a match object as the closest object in B to
A and Match.flag is set to Green, Red or Yellow.These "fake" objects do not
contribute to the cluster average or variance or centroid.
We add this object to A's cluster (along with all the edges), and we increment
the cluster miss count by the number of records we add to the cluster.
 |
Figure 4: Graph showing distance
vs frequency of missies of various colors. This is data from a small
sample of the BestDR1 database. |
The logic for computing missing objects is as follows.
For each RunA in the Regions table.
For each RunB overlapping RunA other than RunA
Let R_ABYellow be the Region RunAÇ((runB+ ε))
Find (x,y) where x and y in R_ABYellow and run.x = RunA and run.y = RunB
And x not in match and y the closest object in RunB.
If the position of x is in runB - ε we mark the pair as green.
If the position of x is masked in RunB then we mark the pair as red.
We now add the (x,y) pairs to the match table and
If x is not in match, we add x to matchhead with a miss count of 1
If x is in match, we propagate x.matchHead to this match entry
and increment the matchHead miss count
The actual code is a little more complex (about 700 lines of SQL). In the
personal SkyServerDR1 there are about 20,000 matches and 10,000 object misses,
so it seems that the misses will make an interesting study.
 |
Figure 5: Statistics from full
DR1 dataset showing distribution of miss distances. Red objects are
not being computed. On BestDR1, the statistics are:
Match | 12,431,518 |
Green | 5,446,930 |
Yellow | 912,571 |
Red | ?? |
|
The results of this are that a bundle can have dangling pointers to these
surrogate objects.Figure 3 shows the diagram of Figure 1 where a fifth
overlapping run has been added. The leftmost object is masked in this new run
and so we find a surrogate "red" object for it. The other objects are also
have no match in this run but are not masked and are closest to the green
(right) object in the figure.
It takes 4 minutes to compute on the personal SkyServer DR1, It will take a
bit longer on the thousand times larger Dr2, but ...
As per Robert's request, surrogate match objects are found rather than
invented. Sometimes we have to look far away for them (500 arcseconds in some
cases).
Misses are painted Yellow (near the edge), Red (masked), and Green (well
inside the overlap). Most misses are Green.
color | count |
Y | 3,516 |
R | 58 |
G | 15,264 |
The graphs of distances are shown in Figure 4.
|