Algorithm Descriptions
DR3 Help
 Archive Intro
 Table Descriptions
 Schema Browser
 Glossary
 Algorithms
 Introduction to SQL
 Form Query User Guide
 Query Limits
 How To
 FAQ
 API
 sdssQA
 Download
 SkyServer Sites
 SkyServer Traffic Page
 Web Browsers
 Site News
 Contact Help Desk

Match and MatchHead Tables

Computing the Match table

Jim Gray, Alex Szalay, Robert Lupton, Jeff Munn, Ani Thakar
Aug 20, 2003

The SDSS data can be used for temporal studies of objects that are re-observed at different times. The SDSS survey observes about 10% of the Northern survey area 2 or more times, and observes the Southern stripe more than a dozen times.

The match table is intended to make temporal queries easy by providing a precomputed list of all objects that were observed multiple times. More formally,

Match = { (ObjID1,ObjID2) | Objid1 and ObjID2 are both from different runs (==observations)

And they are within 1 arcsecond of one another
And are both good (star or galaxy or unknown)
And are both fully deblended (no children)
And they are primary or secondary (not family or outside)

The following count from the DR1 dataset says gives.
Mode Total nChild=0
primary 52,525,57652,525,576
secondary 14,596,93114,596,931
family 17,074,000 6,153,714
outside 126,819 126,819

And here are the flag counts for DR1
Dr1 Count Flag Description
72,926,906 SET Object's status has been set in reference to its own run
72,926,906 GOOD Object is good as determined by its object flags. Absence implies bad.
10,186,591 DUPLICATE Object has one or more duplicate detections in an adjacent field of the same Frames Pipeline Run.
67,029,849 OK_RUN Object is usable, it is located within the primary range of rows for this field.
66,894,914 RESOLVED Object has been resolved against other runs.
66,839,376 PSEGMENT Object Belongs to a PRIMARY segment. This does not imply that this is a primary object.
387,964 FIRST_FIELD Object belongs to the first field in its segment. Used to distinguish objects in fields shared by two segments.
62,728,244 OK_SCANLINE Object lies within valid nu range for its scanline.
53,60,3453 OK_STRIPE Object lies within valid eta range for its stripe.

Computing the Match table

The Match table is computed by using the Neighbors table and has a very similar schema (the Neighbors table only stores mode (1,2) (aka primary/secondary) and type (3,5,6) (aka galaxy, unknown, star) objects;
Create table Match (objID 			bigint not null,
			matchObjID 		bigint not null, 
			distance 		float not null,
			type			tinyint not null,
			matchType 		tinyint not null,
			Mode			tinyint not null,
			matchMode 		tinyint not null,
			primary key (objID, matchObjID)
) ON [Neighbors]
-- now populate the table
insert Match
select N.*
  from  (Neighbors N join PhotoObj P1 on N.objID = P1.objID) 
                    join PhotoObj P2 on N.NeighborObjID = P2.objID
  where ((N.objID ^ N.neighborObjID) & 0x0000FFFF00000000) != 0 -- dif  runs
  and distance < 1.0/60.0 	             -- within 1 arcsecond of one another

One arcsecond is a large error in Sloan Positioning - the vast majority (95%) are within 0.5 arcsecond. But a particular cluster may not form a complete graph (all members connected to all others). To make the graph fully transitive, we repeatedly execute the query to add the "curved" arcs in the figure below.

 
-- compute triples
create table ##Trip(objid bigint, matchObjID bigint,  distance float,  
	type tinyint, neighborType tinyint, 
        mode tinyInt, matchMode tinyInt,
	primary key (objID, matchObjID))
again: truncate table ##trip
-- compute triples
insert ##trip
select distinct a.objID, b.matchObjID, 0,
       a.type, b.matchType, a.mode, b.matchMode 
from Match a join Match b on a.matchObjID = b.objID 
where a.objID != b.matchObjID
and (a.objid     & 0x0000FFFF00000000)!=
    (b.matchObjID& 0x0000FFFF00000000) -- Different runs
-- now delete the pairs we already have in Match
delete ##trip  
where 0 != (
	select count(*) 
	from Match p 
	where p.objID = ##trip.objID and p.matchObjID = ##trip.matchObjID
	)
-- compute the distance between the remaining tripples
select 'adding ' + cast(count(*) as varchar(20)) + ' tripples.'
update ##trip 
set distance = 
	(select min(N.distance) 
	from ##trip t join Neighbors N
	     on t.objID = N.objID and t.matchObjID = N.NeighborObjID)
-- now add these into Match and repeat till no more rows.
insert Match   select * from ##trip
if @@rowcount > 0 goto again
drop table ##trip

Computing the MatchHead table

Now each cluster of objects in the Match table is fully connected. We can name the clusters in the Match table by the minimum objID in the cluster. We can compute the MatchHead table that describes the global properties of the cluster: its name, its average RA and DEC and the variance in RA, DEC.
-- build a table of cluster IDs (minimum object ID of each cluster).
Create table MatchHead (
objID 		bigint not null primary key,
		averageRa	float not null default 0,
		averageDec	float not null default 0,
		varRa		float not null default 0, 	-- variance in RA	
		varDec		float not null default 0,	-- variance in DEC
		matchCount	tinyInt not null default 0,	-- number in cluster
		missCount	tinyInt not null default 0	-- runs missing from cluster
		) ON [Neighbors]	
-- compute the minimum object IDs.
Create table ##MinID (objID bigint primary key)
Insert ##MinID
select distinct objID 
from Match MinId
where 0 = (	select count(*)
	 	from Match m
		where MinId.objID = m.objID  
    		and MinId.objID > m.matchObjID)
-- compute all pairs of objIDs in a cluster (including x,x for the headID)
create table ##pairs (objID 			bigint not null,
			matchObjID 		bigint not null
			primary key(objID, matchObjID)) 
insert ##pairs 
select h.objID, m.matchObjID 
from ##MinID h join Match m on h.objID = m.objID
insert ##pairs select objID, objID from ##MinID
-- now populate the MatchHead table with minObjID and statistics 
Insert MatchHead  
Select MinID.objID, avg(ra), avg(dec), 
 	coalesce(stdev(ra),0), coalesce(stdev(dec),0), 
	count(m.objid & 0x0000FFFF00000000), -- count runs
	0	-- count misses later
from  	##MinID as MinID,
		##pairs	as m,
		PhotoObj as o
where  MinID.objID = m.objID 
	   and   m.matchObjID = o.objID
group by MinID.objID
order by MinID.objID
-- cleanup	 
Drop table ##MinID
Drop table ##pairs
The number missing from the cluster is computed in the next section.

Computing the MatchMiss table

It is also of interest to have a list of objects that are in areas that were observed multiple times but that were only observed once. To do this we need:
a description of each multiple-observation region.
A count of how many times it was observed.
An efficient way to test if a point is in a region
Alex will provide 1 and 2, jim will provide 3 (right?).

We will create a table of "dropouts", places where a match cluster should have an object but does not.

Create  table MatchMiss (objID  	bigint not null,  	--- the unique ID of the cluster	
Run 	int not null,	-- the run that is missing a member of this cluster.
Primary key (objID, Run)
)
Logic:
	From Match find all pairs of runs that overlap
	Form the domain that is the union of the intersection of these pairs.
	Now build T, a list of all objects primary/secondary type (3,5, 6) objects that are in this domain.
	Subtract from T all objects that appear in Match 
	Add these objects and the missing run number(s) to MatchMiss
	For each object in MatchHead, count the number of overlaps it is a member of. (MatchHead, runs)
	If this is equals the number of runs the match list then  

Performance

Building Match and MatcHead takes about an hour on SdssDr1 with the Best database of 85M objects. The cardinalities of each step are:
Match 12,294,016
add from triples 19,040
add from triples 322
add from triples 16
add from triples 2
add from triples 0
MinID 5,545,446
Mirror Pairs 5,849,459
Paris from match 5,545,446
MatchHead 5,545,446