Algorithm Descriptions

QSO Catalog

Building the QsoCatalogAll and QsoConcordanceAll tables

Jim Gray, Sebastian Jester, Gordon Richards, Alex Szalay, Ani Thakar
March 2006

Abstract: We constructed a catalog of all quasar candidates and gathered their "vital signs" from the many different SDSS data sources into one Quasar Concordance table.

1. The Target, Best, and Spec SDSS Datasets

The SDSS Target Database is used to select the targets that will be observed with the SDSS spectrographs. Once made, these targeting decisions are never changed, but the targeting algorithm has improved over time. The SDSS pipeline software is always improving, so the underlying pixels are re-analyzed with each data release. To have a consistent catalog, all the mosaiced pixels, from both early and recent observations, are reprocessed with the new software in subsequent data releases. The output of each of these uniform processing steps is called a Best Database. So at any instant there is the historical cumulative Target database and the current Best database. As of early 2006 we have the Early Data Release (EDR) databases and then five "real" data releases: DR1, DR2, DR3, DR4, and DR5.

The target selection is done by the various branches (galaxy, quasar, serendipity) of the TARGET selection algorithm. These targets are organized for spectroscopic follow-up by the TILING (Blanton et al. 2003) [0] algorithm as part of a tiling run that works within a tiling geometry. The tiling run places a 2.5 deg. circle over a tiling geometry and then assigns spectroscopic targets to be observed.  The circle corresponds to a plate that can be mounted on the SDSS telescope to observe 640 targets at a time. The plates are "drilled" and "plugged" with optical fibers and then "observed".   These spectroscopic observations are fed through a pipeline that builds the Spec dataset. Because Spec is relatively small (2% the size of Best), it is included in the Best database. Unfortunately, only the "main" SDSS target photometry is exported to the Target database (the target photometry for Southern and Special plates is not exported - at best we have the later Best photometry for these objects in the database).

The SDSS catalogs are cross-matched with the FIRST, ROSAT, Stetson, USNO, and USNO-B catalogs and some vital signs from some of those catalogs are included in the Quasar Concordance.

2. Overview:  Finding Everything That MIGHT be a Quasar

We look in the Target..PhotoObjAll, Best..SpecObjAll, and Best..PhotoObjAll tables to find any object that might be a quasar (a QSO).   We build a QsoCatalogAll table that has a row for every combination of nearby TargPhoto-Spec-BestPhoto objects from these lists that are within 1.5 arcseconds of one another. If no matching object can be found from the QSO candidate list we find a surrogate object --  the nearest primary object from the corresponding catalog (Spec, BestPhoto, TargPhoto) if one can be found (again using the 1.5" radius.) If an object is still unmatched, we look for a secondary object, or put a zero for that ObjectID (in general, we use zero rather than the SQL null value to represent missing data).

2.1. Overview: QSO Tables

The tables and views created by the quasar concordance algorithm on the Best, Target and Spectro datasets are part of the Best database.  The following sections explain how they are computed.

QSO Table/View descriptions

Name               Type   Description
QsoCatalog         View   A view of QsoCatalogAll limited to only the best QSO from each bunch
QsoConcordance     View   A view of QsoConcordanceAll limited to only the best QSO from each bunch
QsoCatalogAll      Table  The superset of all QSO candidates identified by the algorithm described below
QsoConcordanceAll  Table  The wide table that combines the Best, Spec and Target fields for each QSO candidate
QsoBunch           Table  The QSO neighbors organized into neighborhood bunches with a head QSO associated with each bunch
QsoBest            Table  The fields from the Best PhotoObjAll table associated with each QSO candidate
QsoSpec            Table  The fields from the Best SpecObjAll table associated with each QSO candidate
QsoTarget          Table  The fields from the Target PhotoObjAll table associated with each QSO candidate

2.2. Overview: Quasar Bunches

Figure 1:  A bunch of 2 targets, 2 bests and one spec object that are within 1.5" of another bunch member. This bunch produces 4 (target,best,spec) triples in the concordance. The first target is the bunch head.  

The algorithm uses spatial proximity (aka: "is it nearby?") to cross-correlate objects in the Target, Best, and Spec databases. The definition of nearby is fairly loose: the SDSS Photo Survey pixels are 0.4 arcsecond and the positioning is accurate to 0.1 arcsecond, but the Spectroscopic survey has fibers that are 1.5 arcseconds in radius (3 arcseconds in diameter). Therefore, the QSO concordance uses the 1.5" fiber radius to define nearby for all 3 datasets.

In a perfect world, one SpecObj matches one BestObj and one TargetObj, and they are all marked as QSOs.  Some objects have no match in the other catalogs -- so we have zeros in those slots of that object’s row.    But, sometimes 2 SpecObj match 3 TargetObj and 4 BestObj, and all 9 objects are marked as QSOs.   In this case we get 2x3x4 rows. We group together all the objects that are related in this way as a bunch.  Each bunch has a head object ID: the first member of the bunch to be recognized as a possible QSO.  The precedence is TargetObjID first, if there is no target in the bunch then the first SpecObjID (highest S/N primary first), else the first BestObjID. This ordering reflects the first time the object was considered for follow-up spectroscopy.  This order avoids a selection bias in the dataset (e.g., Malmquist bias if we were to order on decreasing S/N).
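The row count described above is a cross product over the three classes of a bunch. The following is a minimal sketch of that counting rule only; the function name and the sample object IDs are made up for illustration, and it skips the surrogate search, going straight to the zero placeholder when a class is empty.

```python
from itertools import product

def bunch_triples(targets, specs, bests):
    """Return every (target, spec, best) combination for one bunch.

    An empty class contributes a single 0, the document's marker for
    "no matching object", so the bunch still yields at least one row.
    (Assumption: the surrogate lookup is ignored in this sketch.)
    """
    slots = [ids if ids else [0] for ids in (targets, specs, bests)]
    return list(product(*slots))

# The bunch of Figure 1: 2 targets, 1 spec, 2 bests (IDs are hypothetical).
figure1 = bunch_triples([101, 102], [201], [301, 302])
print(len(figure1))   # 2 x 1 x 2 combinations

# The larger case from the text: 3 targets, 2 specs, 4 bests.
print(len(bunch_triples([1, 2, 3], [4, 5], [6, 7, 8, 9])))   # 3 x 2 x 4
```

Treating an empty class as a one-element list holding 0 keeps the product from collapsing to no rows at all.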

2.3 The QSO Catalog and Concordance

Figure 2: The Qso schema.

The premise is that any Target-Spec-Best triple may be interesting, so all such triples are in the QsoCatalogAll table. The vital signs (e.g. position, flags, flux, ...) of each object are copied from the corresponding database to small tables along with some derived measurements special to QSOs (these are the QsoTarget, QsoSpec, and QsoBest tables). All these tables are unified by the QsoConcordanceAll view that "glues" the vital signs together. Most people just want to see the best triple of each bunch - primary only and best S/N. So the QsoConcordance view shows just the "primary" triple of each bunch.

3. Overview: A Walkthrough of the Algorithm.

Phase 1: Gather the Quasars and Quasar Candidates: As a first step, gather the Target, Spec, and Best quasar candidate or confirmed objects into a Zones table [1] containing their object identifiers and positions. These are copied from the Best and Target PhotoObjAll tables and the Best SpecObjAll table. These copies are filtered by flags indicating that the objects are QSOs or are targeted as QSOs.   For the photo objects (target and best), this means they are primary or secondary and flagged (primTarget) as:  TARGET_QSO_HIZ OR TARGET_QSO_CAP OR TARGET_QSO_SKIRT OR TARGET_QSO_FIRST_CAP OR TARGET_QSO_FIRST_SKIRT ( = 0x0000001F).   For the spectroscopic objects, they must have one or more of the following properties:
  1. recognized as a QSO or is of Unknown type or    -- specClass in {UNKNOWN, QSO, or HIZ_QSO}
  2. have high redshift (z > 0.6), or     -- High Redshift objects are likely QSOs
  3. they must be a QSO target ((primTarget & 0x1F) ≠ 0).    -- or the object was targeted as a QSO   

That logic is fine for most Spectroscopic objects, but there are "special plates" whose authors overloaded the primary target flags (yes, they made it much harder to understand the data and cost many hours of discussion trying to disambiguate the data.) One can recognize the standard cases with the predicate plate.programType = 0, meaning that the plate was processed as a "Main" chunk, not as a "Southern" (programType=1) or "special" (programType=2) plate. The three-case logic above works fine for "main" targets. The "targets for special plates" have SpecObj.primTarget & 0x80000000 ≠ 0. Once you know it is a "special" plate, you have to ask if it is a "special target". If it is, you have to ask: is it in the "Fstar72" group? If not, you can use the standard test ((primTarget & 0x1F) ≠ 0) - those nice people did not "overload" the primTarget flags. But the folks who did "Fstar72" overloaded the flags, and so we get the following complex logic:

-- select SpecObjects that are either declared QSOs from their spectra
-- or that were targeted as likely QSOs
select so.SpecObjID
from BestDr5.dbo.platex     as px
join BestDr5.dbo.specobjall as so on px.plateid = so.plateid
where
       so.specClass in (3,4,0)          -- class is QSO or HiZ_QSO or Unknown
    or so.z > 0.6                       -- or high redshift
    or (                                -- standard-survey plates
            px.programtype = 0          -- MAIN targeting survey
        and (so.primtarget & 0x1f) != 0
       )
    or (                                -- special quasar targets from special plates
                                        -- see http://www.sdss.org/dr4/products/spectra/special.html
            (so.primtarget & 0x80000000) != 0
        and (   (    px.programname in ('merged48','south22')
                 and (so.primtarget & 0x1f) != 0
                )
             or (    px.programname = 'fstar72'
                 and (so.primtarget & 4) != 0
                )
             or (    -- bent double-lobed FIRST source counterparts from special plates.
                     -- The "straight double" counterparts have already been snuck
                     -- into the usual FIRST counterpart quasar category 0x10.
                     px.programname = 'merged48'
                 and (so.primtarget & 0x200000) != 0
                )
            )
       )
    or (                                -- non-special quasar targets from special plates
            (so.primtarget & 0x80000000) = 0
        and px.programname in ('merged73','merged48','south22')
        and (so.primtarget & 0x1f) != 0
       )
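The same selection can be read as a predicate on a single spectroscopic object, which makes the individual cases easier to test by hand. This is a sketch that mirrors the WHERE clause above rather than a replacement for it; the function name, argument names, and constant names are illustrative, while the bit values and program names are the ones in the query.

```python
QSO_MASK = 0x1F               # the five TARGET_QSO_* bits from the text
SPECIAL_TARGET = 0x80000000   # set for "targets for special plates"
FSTAR72_QSO = 0x4             # overloaded bit tested for the 'fstar72' program
BENT_DOUBLE = 0x200000        # bent double-lobed FIRST counterparts ('merged48')

def is_qso_spec_candidate(spec_class, z, prim_target, program_type, program_name):
    """True if a spectroscopic object passes the Phase 1 selection."""
    qso_bits = (prim_target & QSO_MASK) != 0
    special = (prim_target & SPECIAL_TARGET) != 0

    # Declared a QSO (3), HiZ_QSO (4) or Unknown (0), or at high redshift.
    by_spectrum = spec_class in (0, 3, 4) or z > 0.6
    # Main-survey plate with a standard QSO target bit.
    main_plate = program_type == 0 and qso_bits
    # Special target on a special plate; 'fstar72' overloads the flags.
    special_target = special and (
        (program_name in ("merged48", "south22") and qso_bits)
        or (program_name == "fstar72" and (prim_target & FSTAR72_QSO) != 0)
        or (program_name == "merged48" and (prim_target & BENT_DOUBLE) != 0)
    )
    # Ordinary (non-special) QSO target that happens to sit on a special plate.
    ordinary_on_special_plate = (
        not special
        and program_name in ("merged73", "merged48", "south22")
        and qso_bits
    )
    return by_spectrum or main_plate or special_target or ordinary_on_special_plate

# A galaxy-class, low-redshift 'fstar72' special target with bit 0x4 set.
print(is_qso_spec_candidate(1, 0.1, 0x80000004, 2, "fstar72"))
```

Naming each branch separately keeps the OR structure of the query visible and lets each case be checked on its own.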

Phase 2: Find the Neighbors. Once the zone table is assembled containing all the candidates, a zones algorithm [1] is used to build a neighbors table among all these objects. Two objects are QSO neighbors if they are within 1.5 arcseconds of one another.  The relationship is made transitive so that friends of friends are all part of the same neighborhood.  
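To make the "friends of friends" rule concrete, here is a small sketch that groups positions with a union-find structure. It is not the zones algorithm of [1]: it compares every pair directly, which is only reasonable for a handful of objects, and all function names are illustrative.

```python
import math

RADIUS_ARCSEC = 1.5   # neighbor radius used throughout the concordance

def unit_vector(ra_deg, dec_deg):
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def separation_arcsec(a, b):
    """Angular separation of two (ra, dec) positions given in degrees."""
    chord = math.dist(unit_vector(*a), unit_vector(*b))
    # Chord length to angle; well behaved for very small separations.
    return math.degrees(2.0 * math.asin(min(1.0, chord / 2.0))) * 3600.0

def neighborhoods(positions):
    """Group object indices so that friends of friends share one group."""
    parent = list(range(len(positions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # Brute-force pair test (assumption: stands in for the zones join).
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if separation_arcsec(positions[i], positions[j]) <= RADIUS_ARCSEC:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(positions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Four objects on the equator at offsets of 0", 1", 2" and 10" in RA.
arcsec = 1.0 / 3600.0
objs = [(10.0, 0.0), (10.0 + arcsec, 0.0), (10.0 + 2 * arcsec, 0.0), (10.0 + 10 * arcsec, 0.0)]
print(neighborhoods(objs))
```

In this example the first and third objects are 2" apart, so they are not direct neighbors, but both are within 1.5" of the second object and therefore land in the same neighborhood.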

Phase 3: Build the Bunches. The Neighbors relationship partitions the objects into bunches.   We pick a distinguished member from each bunch to represent that bunch - called the bunch head. The selection favors Target then Spec, then Photo Objects and within that category it favors primary, then secondary, then outside objects if there is a tie within one group (e.g. multiple target objects in a bunch.) If there are multiple selections within these groups, the tie is broken by taking the minimum object ID for PhotoObj (again, to avoid any selection bias) and the highest S/N for specObjs.  Given these bunch heads, we record a summary record for each bunch in the QsoBunch table:
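The head-selection precedence can be expressed as a sort key. The sketch below assumes each bunch member carries a kind, a primary/secondary/outside mode, and (for spectra) a signal-to-noise value; the Member class and its field names are hypothetical and are not columns of any SDSS table.

```python
from dataclasses import dataclass

KIND_RANK = {"TARGET": 0, "SPEC": 1, "BEST": 2}             # Target, then Spec, then Best
MODE_RANK = {"primary": 0, "secondary": 1, "outside": 2}    # within one kind

@dataclass
class Member:
    obj_id: int
    kind: str        # "TARGET", "SPEC" or "BEST"
    mode: str        # "primary", "secondary" or "outside"
    sn: float = 0.0  # signal-to-noise; only meaningful for SPEC members

def head_key(m):
    # Tie-break: highest S/N for spectra, smallest object ID for photo objects.
    tie = -m.sn if m.kind == "SPEC" else m.obj_id
    return (KIND_RANK[m.kind], MODE_RANK[m.mode], tie)

def bunch_head(members):
    """Pick the member that represents the bunch."""
    return min(members, key=head_key)

bunch = [
    Member(500, "BEST", "primary"),
    Member(42, "TARGET", "secondary"),
    Member(17, "TARGET", "secondary"),
    Member(900, "SPEC", "primary", sn=30.0),
]
print(bunch_head(bunch).obj_id)   # a Target object wins; lowest ID breaks the tie
```

Because tuples compare element by element, the key applies the kind first, then the mode, and only then the tie-break.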

QsoBunch table

Name             Type     Description
HeadID           bigint   Unique identifier of the head object of this bunch of objects (all nearby one another).
HeadType         Char(6)  TARGET, SPEC, or BEST depending on what type of object the head is
RA               Float    RA of bunch head object
Dec              Float    DEC of bunch head object
TargetObjs       int      Count of the number of Target objects in the bunch.
SpecObjs         int      Count of the number of Spectroscopic objects in the bunch.
BestObjs         int      Count of the number of Best objects in the bunch.
TargetPrimaries  int      Count of Primary Target objects in the bunch.
SpecPrimaries    int      Count of the SciencePrimary Spectroscopic objects in the bunch.
BestPrimaries    int      Count of Primary Best objects in the bunch.

Where the difference between TargetObjs and TargetPrimaries (etc.) is that TargetObjs indicates multiple entries of the same object in the database (e.g. both as a primary and a secondary), whereas TargetPrimaries helps us to identify objects that are either very close together or that were deblended into two objects separated by less than 1.5" (or are in a circle of 1.5" radius).  Because the object primary flags are not handy at this point of the computation, the Bunch statistics are actually computed in Phase 9.

Phase 4: Build the Catalog. Now we grow the QsoCatalogAll table which, for each bunch, has triples drawn from each class of the bunch (a target, a spec, and a best object).  For example, the bunch of Figure 1 would produce 4 triples.    If there is no object in one of the classes, we fill in with a non-QSO surrogate object - the primary object from that database (Targ, Photo, Spec) closest to the bunch head, or if there is no primary then a secondary (the test insists on the 1.5 arcsecond radius.) If no such object can be found we fill in that slot with a zero object.   The resulting table looks like this:

QsoCatalogAll table

Name               Type    Description
HeadID             bigint  Unique identifier of this bunch of objects (all nearby one another).
TripleID           bigint  Unique identifier of this (spec, best, target) triple
QsoPrimary         bit     This is the best triple of the bunch.
TargetObjID        bigint  Unique ID in Target DB or 0 if there is no matching object.
SpecObjID          bigint  Unique ID of spectrographic object or 0 if there is no such object.
BestObjID          bigint  Unique ID in BestDB, or 0 if there is no such object.
TargetQsoTargeted  bit     Flag: 1 means this PhotoObj was flagged as a QSO in the target flags.
SpecQsoConfirmed   bit     Flag: 1 means this SpecObj.SpecClass is QSO or HiZ_QSO
SpecQsoUnknown     bit     Flag: 1 means this SpecObj.SpecClass is unknown
SpecQsoLargeZ      bit     Flag: 1 means this SpecObj Z > 0.6
SpecQsoTargeted    bit     Flag: 1 means this SpecObj was picked as a QSO target
BestQsoTargeted    bit     Flag: 1 means this PhotoObj was flagged as a QSO in the target flags.
dist_Target_Best   float   Distance in arcminutes between Target and Best
dist_Target_Spec   float   Distance in arcminutes between Target and Spec
dist_Best_Spec     float   Distance in arcminutes between Best and Spec
psfmag_i_diff      float   target.psfmag_i - best.psfmag_i
psfmag_g_i_diff    float   (target.psfmag_g-target.psfmag_i) - (best.psfmag_g-best.psfmag_i)

The last 5 "quality fields" are computed in Phase 9.
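The five quality fields are simple functions of the positions and PSF magnitudes of the triple's members. The sketch below assumes positions are given as unit vectors (the cx, cy, cz values), that magnitudes arrive in plain dictionaries, and that all members are present; how the real computation treats zero (missing) objects is not covered here, and the function names are illustrative.

```python
import math

def dist_arcmin(a, b):
    """Angle in arcminutes between two unit vectors (cx, cy, cz)."""
    chord = math.dist(a, b)
    return math.degrees(2.0 * math.asin(min(1.0, chord / 2.0))) * 60.0

def magnitude_diffs(target, best):
    """The two psf-magnitude quality fields, as defined in the table above."""
    return {
        "psfmag_i_diff": target["psfmag_i"] - best["psfmag_i"],
        "psfmag_g_i_diff": (target["psfmag_g"] - target["psfmag_i"])
                           - (best["psfmag_g"] - best["psfmag_i"]),
    }

# Hypothetical magnitudes for one object measured in Target and in Best.
target = {"psfmag_g": 19.5, "psfmag_i": 19.0}
best = {"psfmag_g": 19.4, "psfmag_i": 19.1}
print(magnitude_diffs(target, best))

# Two perpendicular directions are 90 degrees apart.
print(dist_arcmin((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
```

The second quantity is the change in g-i color between the Target and Best photometry, so it is insensitive to a magnitude shift that is the same in both bands.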

Phase 5: Find Surrogates for missing objects. Some Catalog entries have no matching Target, Best, or Spec objects. In these cases we look in the database to find a surrogate object (which was not a QSO candidate) that is near the bunch head object - as usual the search radius is 1.5 arcseconds, and we favor primary over secondary objects and favor high signal-to-noise ratio SpecObjs.

Phase 6: Get the Vital Signs. We now go to the source databases and get the "vital signs" of these photo and spectro objects (both quasar candidates and surrogates), building the QsoSpec, QsoTarget, and QsoBest tables to hold these values and, for the photo objects, some additional values from ROSAT and FIRST if there is a match. We then define QsoConcordanceAll as a view on these base tables with the following (~100) fields.

Phase 7: Define QsoConcordanceAll and QsoConcordance Views: Now we are ready to "glue together" the QsoCatalog with the vital signs to make a "fat table" with all the attributes.

From QsoTarget:
HeadObjID, tripleID, QsoPrimary, TargetQsoTargeted, SpecQsoConfirmed, SpecQsoUnknown, SpecQsoLargeZ, SpecQsoTargeted, BestQsoTargeted, dist_Target_Best, dist_Target_Spec, dist_Best_Spec, psfmag_i_diff, psfmag_g_i_diff, targetObjID, targetRa, targetDec, targetCx, targetCy, targetCz, targetPsfMag_u, targetPsfMag_g, targetPsfMag_r, targetPsfMag_i, targetPsfMag_z, targetPsfMagErr_u, targetPsfMagErr_g, targetPsfMagErr_r, targetPsfMagErr_i, targetPsfMagErr_z, targetExtinction_u, targetExtinction_g, targetExtinction_r, targetExtinction_i, targetExtinction_z, targetType, targetMode, targetStatus, targetFlags, targetFlags_u, targetFlags_g, targetFlags_r, targetFlags_i, targetFlags_z, targetRowC_i, targetColC_i, targetInsideMask, targetPrimTarget, targetPriTargHiZ, targetPriTargLowZ, targetPriTargFirst, targetFieldID, targetFieldMjd, targetFieldQuality, targetFieldCulled, targetSectorID, targetFirstID, targetFirstPeak, targetRosatID, targetRosatCps, targetMi, targetUniform

From QsoSpec:
SpecObjID, SpecRa, SpecDec, SpecCx, SpecCy, SpecCz, SpecZ, SpecZerr, SpecZConf, SpecZStatus, SpecZWarning, SpecClass, SpecPlate, SpecFiberID, SpecMjd, SpecSciencePrimary, SpecPrimTarget, SpecLineID, SpecMaxVelocity, SpecBestObjID, SpecTargetObjID, SpecTarget, SpecSn1_i, SpecSn2_i

From QsoBest:
bestObjID, bestRa, bestDec, bestCx, bestCy, bestCz, bestPsfMag_u, bestPsfMag_g, bestPsfMag_r, bestPsfMag_i, bestPsfMag_z, bestPsfMagErr_u, bestPsfMagErr_g, bestPsfMagErr_r, bestPsfMagErr_i, bestPsfMagErr_z, bestExtinction_u, bestExtinction_g, bestExtinction_r, bestExtinction_i, bestExtinction_z, bestType, bestMode, bestFlags, bestFlags_u, bestFlags_g, bestFlags_r, bestFlags_i, bestFlags_z, bestRowC_i, bestColC_i, bestInsideMask, bestPrimTarget, bestPriTargHiZ, bestPriTargLowZ, bestPriTargFirst, bestFieldID, bestFieldMjd, bestFieldQuality, bestFieldCulled, bestFirstID, bestFirstPeak, bestRosatID, bestRosatCps, bestMi


Bunch members  Bunches
1              238,073
2              10,619
3              1,397
4              14,470
5              202
6              170
7              36
8              551
9              115
12             61
16             2

Phase 9: Mark the primary triple of each bunch, compute some derived magnitude values and cleanup:  Having the QsoConcordanceAll view and all the vital signs in place we compute some derived values: Picking the best triple of each bunch, computing the distances among members of the triple and computing some derived psf magnitudes.

In the end, the DR5 database has 265,697 bunches, 329,871 triples in the concordance, and 114,883 confirmed quasars. Most bunches have one catalog entry, but about 10% have multiple matches (generally a primary and secondary best or target object where both are flagged as QSO candidates, or multiple observations of a spectroscopic object). The catalog itself has some interesting cases. In DR5 there are 83,142 cases where the Target, Spec, and Best all agree that it is a quasar. Since SDSS spectroscopy lags the imaging, it is not surprising that there are 82,011 objects where both the Target and Best indicate a likely QSO, but there is no spectrogram for the object (the Spec Zero case).

With the QsoCatalogAll and QsoConcordanceAll in place we define two views: QsoCatalog (the best of the bunch) and QsoConcordance (the wide version) by picking the best targetObj, spec, and bestObj of each bunch.

DR5 QsoCatalogAll

Target     Spec       Best       Count
Surrogate  Confirmed  Surrogate  24,348
Surrogate  Confirmed  Targeted   1,080
Surrogate  Confirmed  Zero       88
Targeted   Confirmed  Surrogate  5,556
Targeted   Confirmed  Targeted   83,142
Targeted   Confirmed  Zero       102
Zero       Confirmed  Surrogate  108
Zero       Confirmed  Targeted   32
Zero       Confirmed  Zero       427
Surrogate  LargeZ     Surrogate  1,458
Surrogate  LargeZ     Targeted   31
Surrogate  LargeZ     Zero       32
Targeted   LargeZ     Surrogate  110
Targeted   LargeZ     Targeted   209
Targeted   LargeZ     Zero       1
Zero       LargeZ     Surrogate  26
Zero       LargeZ     Targeted   3
Zero       LargeZ     Zero       25
Surrogate  other      Surrogate  93
Surrogate  other      Targeted   1,627
Targeted   other      Surrogate  301
Targeted   other      Targeted   593
Zero       other      Targeted   2
Surrogate  Targeted   Surrogate  8,514
Surrogate  Targeted   Targeted   728
Surrogate  Targeted   Zero       28
Targeted   Targeted   Surrogate  24,460
Targeted   Targeted   Targeted   39,354
Targeted   Targeted   Zero       194
Zero       Targeted   Surrogate  80
Zero       Targeted   Targeted   25
Zero       Targeted   Zero       71
Surrogate  Unknown    Surrogate  6,049
Surrogate  Unknown    Targeted   122
Surrogate  Unknown    Zero       344
Targeted   Unknown    Surrogate  1,367
Targeted   Unknown    Targeted   1,772
Targeted   Unknown    Zero       9
Zero       Unknown    Surrogate  262
Zero       Unknown    Targeted   16
Zero       Unknown    Zero       2,635
Surrogate  Zero       Targeted   31,661
Targeted   Zero       Surrogate  8,659
Targeted   Zero       Targeted   82,011
Targeted   Zero       Zero       162
Zero       Zero       Targeted   1,954

References

[0] "An Efficient Targeting Strategy for Multiobject Spectrograph Surveys: The Sloan Digital Sky Survey," Blanton et al., AJ 125:2276 (2003)

[1] "There Goes the Neighborhood: Relational Algebra for Spatial Data Search," Alexander S. Szalay, Gyorgy Fekete, Wil O’Mullane, Maria A. Nieto-Santisteban, Aniruddha R. Thakar, Gerd Heber, Arnold H. Rots, MSR-TR-2004-32, April 2004

[2] "Creating Sectors," Alex Szalay, Gyorgy Fekete, Tamas Budavari, Jim Gray, Adrian Pope, Ani Thakar, August 2003, http://cas.sdss.org/dr4/en/help/docs/algorithm.asp?search=sector

Resolving Multiple Detections and Defining Samples

In addition to reading this section, we recommend that users familiarize themselves with the "status" bits, which indicate what happened to each object during the Resolve procedure.

SDSS scans overlap, leading to duplicate detections of objects in the overlap regions. A variety of unique (i.e., containing no duplicate detections of any objects) well-defined (i.e., areas with explicit boundaries) samples may be derived from the SDSS database. This section describes how to define those samples. The resolve figure is a useful visual aid for the discussion presented below.

Consider a single drift scan along a stripe, called a run. The camera has six columns of CCDs, which scan six swaths across the sky. A given camera column is referred to throughout with the abbreviation camCol. The unit for data processing is the data from a single camCol for a single run. The same data may be processed more than once; repeat processing of the same run/camCol is assigned a unique rerun number. Thus, the fundamental unit of data processing is identified by run/rerun/camCol.

While the data from a single run/rerun/camCol is a scan line of data 2048 columns wide by a variable number of rows (approximately 133000 rows per hour of scanning), for purposes of data processing the data is split up into frames 2048 columns wide by 1361 rows long, resulting in approximately 100 frames per scan line per hour of scanning. Additionally, the first 128 rows from the next frame are added to the previous frame, leading to frames 2048 columns wide by 1489 rows long, where the first and last 128 rows overlap the previous and next frame, respectively. Each frame is processed separately. This leads to duplicate detections for objects in the overlap regions between frames. For each frame, we split the overlap regions in half, and consider only those objects whose centroids lie between rows 64 and 1361+64 as the unique detection of that object for that run/rerun/camCol. These objects have the OK_RUN bit set in the "status" bit mask. Thus, if you want a unique sample of all objects detected in a given run/rerun/camCol, restrict yourself to all objects in that run/rerun/camCol with the OK_RUN bit set. The boundaries of this sample are poorly defined, as the area of sky covered depends on the telescope tracking. Objects must satisfy other criteria as well to be labeled OK_RUN: an object must not be flagged BRIGHT (as there is a duplicate "regular" detection of the same object), and must not be a deblended parent (as the children are already included); thus it must not be flagged BLENDED unless the NODEBLEND flag is set. Such objects have their GOOD bit set.
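The frame arithmetic above can be checked with a few lines of code. This sketch assumes frame k starts at scan-line row k x 1361 and treats the unique range as half-open, [64, 1361+64), so that adjacent frames tile the scan line without gaps; the text does not say which boundary is inclusive, and the function names are illustrative.

```python
FRAME_ROWS = 1361            # rows of new data per frame
OVERLAP_ROWS = 128           # rows borrowed from the next frame
HALF = OVERLAP_ROWS // 2     # overlap regions are split in half

def frames_containing(scan_row):
    """Indices of the 1489-row frames that include a given scan-line row."""
    last = scan_row // FRAME_ROWS
    return [k for k in (last - 1, last)
            if k >= 0 and 0 <= scan_row - k * FRAME_ROWS < FRAME_ROWS + OVERLAP_ROWS]

def unique_frame(scan_row):
    """The frame in which a centroid at this row counts as the unique detection."""
    for k in frames_containing(scan_row):
        if HALF <= scan_row - k * FRAME_ROWS < FRAME_ROWS + HALF:
            return k
    return None   # first 64 rows of the scan line: no frame claims them here

print(FRAME_ROWS + OVERLAP_ROWS)     # rows per processed frame
print(frames_containing(1371))       # row 10 of frame 1 is also row 1371 of frame 0
print(unique_frame(1371), unique_frame(1461))
```

A row that falls in the first half of an overlap region stays with the earlier frame, and a row in the second half goes to the later one.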

For each stripe, 12 non-overlapping but contiguous scan lines are defined parallel to the stripe great circle (that is, they are bounded by two lines of constant great circle latitude). Each scan line is 0.20977 arcdegrees wide (in great circle latitude). Each run/camCol scans along one of these scan lines, completely covering the extent of the scan line in latitude, and overlapping the adjacent scan lines by approximately 1 arcmin. Six of these scan lines are covered when the "north" strip of the stripe is scanned, and the remaining six are covered by the "south" strip. The fundamental unit for defining an area of the sky considered as observed at sufficient quality is the segment. A segment consists of all OK_RUN objects for a given run/rerun/camCol contained within a rectangle defined by two lines of constant great circle longitude (the east and west boundaries) and two lines of constant great circle latitude (the north and south boundaries, being the same two lines of constant great circle latitude which define the scan line). Such objects have their OK_SCANLINE bit set in the status bit mask. A segment consists of a contiguous set of fields, but only portions of the first and/or last field may be contained within the segment, and indeed a given field could be divided between two adjacent segments. If an object is in the first field in a segment, then its FIRST_FIELD bit is set, along with the OK_SCANLINE bit; if it is not in the first field in the segment, then the OK_SCANLINE bit is set but the FIRST_FIELD bit is not set. This extra complication is necessary for fields which are split between two segments; those OK_SCANLINE objects without the FIRST_FIELD bit set would belong to the first segment (the segment for which this field is the last field in the segment), and those OK_SCANLINE objects with the FIRST_FIELD bit set would belong to the second segment (the segment for which this field is the first field in the segment).

A chunk consists of a non-overlapping contiguous set of segments which span a range in great circle longitude over all 12 scan lines for a single stripe. Thus, the set of OK_SCANLINE (with appropriate attention to the FIRST_FIELD bit) objects in all segments for a given chunk comprises a unique sample of objects in an area bounded by two lines of constant great circle longitude (the east and west boundaries of the chunk) and two lines of constant great circle latitude (+- 1.25865 degrees, the north and south boundaries of the chunk).

Segments and chunks are defined in great circle coordinates along their given stripe, and contain unique detections only when limited to other segments and chunks along the same stripe. Each stripe is defined by a great circle, which is a line of constant latitude in survey coordinates (in survey coordinates, lines of constant latitude are great circles while lines of constant longitude are small circles, switched from the usual meaning of latitude and longitude). Since chunks are 2.51729 arcdegrees wide, but stripes are separated by 2.5 degrees (in survey latitude), chunks on adjacent stripes can overlap (and towards the poles of the survey coordinate system chunks from more than two stripes can overlap in the same area of sky). A unique sample of objects spanning multiple stripes may then be defined by applying additional cuts in survey coordinates. For a given chunk, all objects that lie within +- 1.25 degrees in survey latitude of its stripe's great circle have the OK_STRIPE bit set in the "status" bit mask. All OK_STRIPE objects comprise a unique sample of objects across all chunks, and thus across the entire survey area. The southern stripes (stripes 76, 82, and 86) do not have adjacent stripes, and thus no cut in survey latitude is required; for the southern stripes only, all OK_SCANLINE objects are also marked as OK_STRIPE, with no additional survey latitude cuts.

Finally, the official survey area is defined by two lines of constant survey longitude for each stripe, with the lines being different for each stripe. All OK_STRIPE objects falling within the specified survey longitude boundaries for their stripe have the PRIMARY bit set in the "status" bit mask. Those objects comprise the unique SDSS sample of objects in that portion of the survey which has been finished to date. Those OK_RUN objects in a segment which fail either the great circle latitude cut for their segment, or the survey latitude or longitude cut for their stripe, have their SECONDARY bit set. They do not belong to the primary sample, and represent either duplicate detections of PRIMARY objects in the survey area, or detections outside the area of the survey which has been finished to date.
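The successive cuts form a cascade: each status bit is only considered once the previous one is set. The sketch below illustrates that ordering under simplifying assumptions: the bit values are placeholders rather than the real SDSS values, the four boolean inputs stand for geometric tests that are not implemented here, and the FIRST_FIELD handling and the southern-stripe exception are left out.

```python
from enum import IntFlag

class Status(IntFlag):
    # Placeholder bit positions (assumption), not the real status values.
    OK_RUN = 1
    OK_SCANLINE = 2
    OK_STRIPE = 4
    PRIMARY = 8
    SECONDARY = 16

def resolve(ok_run, in_segment, in_stripe_lat, in_survey_lon):
    """Combine the outcome of each cut into one status mask.

    ok_run         unique detection within its run/rerun/camCol
    in_segment     inside the segment's great circle rectangle
    in_stripe_lat  within +-1.25 degrees survey latitude of the stripe
    in_survey_lon  inside the stripe's survey longitude limits
    """
    status = Status(0)
    if not ok_run:
        return status
    status |= Status.OK_RUN
    if in_segment:
        status |= Status.OK_SCANLINE
        if in_stripe_lat:
            status |= Status.OK_STRIPE
            if in_survey_lon:
                status |= Status.PRIMARY
    if not (status & Status.PRIMARY):
        # OK_RUN objects that fail any later cut are SECONDARY.
        status |= Status.SECONDARY
    return status

print(resolve(True, True, True, True))
print(resolve(True, True, False, True))
```

An object that passes every cut ends up PRIMARY, while an OK_RUN object that fails any later cut is marked SECONDARY, as in the paragraph above.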

Objects that lie close to the bisector between frames, scan lines, or chunks present some difficulty. Errors in the centroids or astrometric calibrations can place such objects on either side of the bisector. A resolution is performed at all bisectors, and if two objects lie within 2 arcsec of each other, then one object is declared OK_RUN/OK_SCANLINE/OK_STRIPE (depending on the test), and the other is not.

Creating Sectors

Alex Szalay, Gyorgy Fekete, Tamas Budavari, Jim Gray, Adrian Pope, Ani Thakar

August 2003, revised March 2004, December 2004, November 2005
The Problem

The SDSS spectroscopic survey will consist of about 2000 circular Tiles, about 1.5° in radius, which contain the objects for a given spectroscopic observation. There are more opportunities to target (get the spectrum of) an object if it is covered by multiple tiles. If three tiles cover an area, the objects in that area have three times the opportunity to be targeted. At the same time, objects are not targeted uniformly over a plate. The targeting is driven by a program that uses the SDSS photographic observations to schedule the spectroscopic observations. These photographic observations are 2.5° wide stripes across the sky. The strips overlap about 15%, so the sky is partitioned into disjoint staves and the tiling is actually done in terms of these staves (see Figure 1.) Staves are often misnamed stripes in the database and in other SDSS documentation.
Figure 1. Observations consist of overlapping stripes partitioned into disjoint staves. Tiling Runs work on a set of staves, and each Tiling Geometry region is contained within a stave.

Spectroscopic targeting is done by a tiling run that works with a collection of staves - actually not whole staves but segments of them called chunks. The tiling run generates tiles that define which objects are going to be observed (actually, which holes to drill in a SDSS spectroscopic plate.) The tiling run also generates a list of TilingGeometry rectangular regions that describe the sections of the staves that were used to make the tiles. Some TilingGeometry rectangles are positive, others are negative (masks or holes.) Subsequent tiling runs may use the same staves (chunks) and so tiling runs are not necessarily disjoint. So, TilingGeometries form rather complex intersections that we call SkyBoxes.

The goal is to compute contiguous sectors covered by some number of plates and at least one positive TilingGeometry. We also want to know how many plates cover the sector.

This is a surprisingly difficult task because there are subtle interactions. We will develop the algorithm to compute sectors in steps. First we will ignore the TilingGeometry and just compute the wedges (Boolean combinations of tiles). Then we will build TilingBoxes, positive quadrilateral partitions of each tiling region that cover the regions. SkyBoxes are the synthesis of the TilingBoxes from several tiling runs into a partitioning of the survey footprint into disjoint positive quadrilaterals. Now, to compute sectors, we simply intersect all wedges with all SkyBoxes. The residue is the tile coverage of the survey. A tile contributes to a sector if the tile contributes to the wedge and the tile was created by one of the tile runs that contain the SkyBox (you will probably understand that last sentence better after you read to the end of this paper.)

Wedges
Figure 2. A wedge and sector covered by one plate. There are adjoining wedges covered by 2, 3, 4 plates. The lower left corner is an area that is not part of any wedge or sector. SkyBoxes break wedges into sectors and may mask parts of a wedge.
A wedge is the intersection of one or more tiles or the intersection of some tiles with the complements of some others. Each wedge has a depth: the number of positive tiles covering the wedge (see figures 2, 3). The two intersecting tiles in figure 2, A and B, have (A-B) and (B-A) wedges of depth 1, and the intersection (AB) is a depth 2 wedge.
Figure 3. Tile A has a blue boundary; tile B has the red boundary, both regions of depth 1. Their intersection is yellow, a Region of depth 2. The crescents shaded in blue and green are the two wedges of depth 1, and the yellow area is a wedge of depth 2. Nodes are purple dots.

A sector is a wedge modified by intersections with overlapping TilingGeometry regions. If the TilingGeometry regions are complex (multiple convexes) or if they are holes (isMask=1), then the result of the intersection may also be complex (a region of multiple wedges). By going to a SkyBox model we keep things simple. Since SkyBoxes partition the sky into areas of known tile-run depth, SkyBox boundaries do not add any depth to the sectors; they just truncate them to fit in the box boundary and perhaps mask a tile if that tile is in a TilingGeometry hole or if the tile that contributes to that wedge is not part of the TilingGeometry (one of the tiling runs) that make up that SkyBox (Figure 4 shows a simple example of these concepts).
Figure 4. This shows how the tiles and TilingGeometry rectangles intersect to form sectors. The figure shows a layout that has wedges of various depths: depth 1 is gray, depth 2 is light blue, depth 3 is yellow and depth 4 is magenta. The wedges are clipped by the TilingGeometry boundary to form sectors.

To get started, spCreateWedges() computes the wedge regions, placing them in the Sectors table, and for each wedge W and each tile T that adds to or subtracts from W, records the T->W in the Sectors2Tiles table (both positive and negative parents). So, in Figure 3, the green wedge (the leftmost wedge) would have tile A as a positive parent and tile B as a negative parent.
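The wedge bookkeeping described above can be illustrated with a small Python sketch. This is not the spCreateWedges implementation: it assumes a toy model in which each tile is a set of flat grid cells rather than a spherical region, and the function name create_wedges and the dictionary layout are hypothetical. Cells covered by the same set of tiles form one wedge, the depth is the number of positive parents, and the negative parents are the overlapping tiles that were subtracted.

```python
from collections import defaultdict

def create_wedges(tiles):
    """Toy wedge computation. tiles: dict tile name -> set of grid cells.
    Returns dict frozenset(positive tiles) -> wedge description."""
    # Record which tiles cover each cell.
    cover = defaultdict(set)
    for name, cells in tiles.items():
        for cell in cells:
            cover[cell].add(name)
    # Cells with an identical covering set belong to the same wedge.
    wedge_cells = defaultdict(set)
    for cell, names in cover.items():
        wedge_cells[frozenset(names)].add(cell)
    wedges = {}
    for positive, cells in wedge_cells.items():
        # Negative parents: tiles that overlap a positive parent
        # but do not cover this wedge (they were subtracted).
        negative = {t for t in tiles
                    if t not in positive
                    and any(tiles[t] & tiles[p] for p in positive)}
        wedges[positive] = {"cells": cells,
                            "depth": len(positive),
                            "positive": set(positive),
                            "negative": negative}
    return wedges

# Two overlapping tiles, as in Figure 3 (hypothetical cell coordinates).
tiles = {
    "A": {(x, y) for x in range(0, 4) for y in range(0, 4)},
    "B": {(x, y) for x in range(2, 6) for y in range(0, 4)},
}
wedges = create_wedges(tiles)
for parents, w in sorted(wedges.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(parents), "depth", w["depth"], "negative", sorted(w["negative"]))
```

In this toy layout the wedge A-B has A as its positive parent and B as its negative parent, matching the green wedge of Figure 3.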

Boxes
A particular tiling run works on a set of (contiguous) staves, and indeed only a section of each stave called a chunk. These areas are defined by disjoint TilingRegions. To complicate matters, some TilingRegions have rectangular holes in them that represent bad seeing (bright stars, cosmic rays or other flaws). So a tiling run looks something like Figure 5, and each TilingGeometry is a spherical rectangle with spherical-rectangular holes (see Figure 5).
Figure 5. Staves (convex sides not illustrated) are processed in chunks. TilingGeometry is a chunk/stave subset with holes (masks). TilingBoxes cover a TilingGeometry with disjoint spherical rectangles. There are many such coverings; two are shown for TG1. The one at left has 23 TileBoxes while the one at right has 7 TileBoxes.
To simplify matters, we want to avoid the holes and work only with simple convex regions. So we decompose each TilingGeometry into a disjoint set of TileBoxes. As Figure 5 shows, there are many different TileBox decompositions. We want a TileBox decomposition with very few TileBoxes. Fewer is better - but the answer will be the same in the end since we will merge adjacent sectors if they have the same depth.

It is not immediately obvious how to construct the TileBoxes. Figure 6 gives some idea.

First, the whole operation of subtracting out the masks happens inside the larger TilingGeometry, called the Universe, U. We are going to construct nibbles, which are a disjunctive normal form of the blue area in which each term has at least one negative hole edge to make sure we exclude the hole. These nibbles are disjoint; together they cover the TilingGeometry and exclude the mask (white) area.

As described in "There Goes the Neighborhood: Relational Algebra for Spatial Data Search" we represent spherical polygons as a set of half-space constraints of the form h = (hx,hy,hz,c). Point p = (px,py,pz) is inside the halfspace if hx*px+hy*py+hz*pz>c. A convex region, C ={hi} is the set of points inside each of the hi.
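The half-space test above translates directly into code. The following Python sketch is only an illustration: the helper names (unit_vector, in_halfspace, in_convex) and the example cap are hypothetical and are not part of the SkyServer region library.

```python
import math

def unit_vector(ra_deg, dec_deg):
    """Convert (ra, dec) in degrees to a unit vector (px, py, pz)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def in_halfspace(h, p):
    """True if hx*px + hy*py + hz*pz > c for h = (hx, hy, hz, c)."""
    hx, hy, hz, c = h
    return hx * p[0] + hy * p[1] + hz * p[2] > c

def in_convex(convex, p):
    """A convex is a list of half-spaces; p must be inside every one."""
    return all(in_halfspace(h, p) for h in convex)

# Hypothetical example: a circular cap of radius 1.5 degrees centered
# at ra=180, dec=0 is the single half-space (n, cos(radius)).
n = unit_vector(180.0, 0.0)
cap = (n[0], n[1], n[2], math.cos(math.radians(1.5)))

# Intersecting the cap with the northern hemisphere (z > 0) gives a convex.
half_cap = [cap, (0.0, 0.0, 1.0, 0.0)]

print(in_halfspace(cap, unit_vector(181.0, 0.0)))    # 1 degree from center
print(in_halfspace(cap, unit_vector(182.0, 0.0)))    # 2 degrees from center
print(in_convex(half_cap, unit_vector(180.5, 0.5)))
print(in_convex(half_cap, unit_vector(180.5, -0.5)))
```

A circle on the sky needs only one half-space, while a spherical rectangle needs four, one per edge.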

Given that representation we can compute the set N of nibbles covering region R = U-C as follows:

Compute R = N = U - C, where U = {ui} and C = {c1, ..., cm} are convex regions (C is the "hole" in U). The idea is:

R 	= {ui} - {ci}
= U&~c1 | U&~c2 | ... | U&~cm
= U&~c1 | U&c1&~c2 | ... | U&c1&c2&...&cm-1&~cm 
The terms in the last equation are called nibbles.  
They are disjoint (the term containing ~ci lies outside ci, while every later term lies inside ci)
and together they cover R and exclude C (each term contains some ~ci, which excludes C). 
Algorithm:
  
   R = {}                       -- the disjoint regions will be added to R
   NewU = spRegionCopy U        -- make a copy of U so we do not destroy it
   for each c in C              -- for each constraint of C that is an arc
                                --   of the hull
       Nibble = NewU & {~c}     -- intersect Not c with the current universe
       if Nibble not empty      -- if Not c intersects the universe then
          add Nibble to R       -- add this Nibble to the answer set
          NewU = NewU & {c}     -- Not c is covered, so reduce the universe
When each positive TilingGeometry is "nibbled" by its masks, the resulting
nibbles are the TileBoxes we need. 
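
The nibble construction can be exercised with a short Python sketch. It is a toy under stated assumptions, not the spCreateTileBoxes code: a region is a list of (half-space, keep_inside) pairs, emptiness is decided on a finite grid of sample points instead of an exact geometric test, and all names and the example regions are hypothetical.

```python
import math

def unit_vector(ra_deg, dec_deg):
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def inside(h, p):
    """Half-space test: hx*px + hy*py + hz*pz > c."""
    hx, hy, hz, c = h
    return hx * p[0] + hy * p[1] + hz * p[2] > c

def in_region(region, p):
    """region: list of (half-space, keep_inside); keep_inside=False means ~h."""
    return all(inside(h, p) == keep for h, keep in region)

def nibble(U, C, samples):
    """Decompose U - C into disjoint nibbles U&c1&...&c(k-1)&~ck."""
    nibbles = []
    new_u = [(h, True) for h in U]          # copy of U, left undestroyed
    for c in C:
        candidate = new_u + [(c, False)]    # NewU & {~c}
        # Assumption: "not empty" is approximated by hitting a sample point.
        if any(in_region(candidate, p) for p in samples):
            nibbles.append(candidate)
        new_u = new_u + [(c, True)]         # ~c is covered; shrink universe
    return nibbles

# Sample points on a 5-degree grid (stand-in for an exact emptiness test).
samples = [unit_vector(ra, dec)
           for ra in range(0, 360, 5) for dec in range(-90, 91, 5)]

# Hypothetical universe: the polar cap z > 0.5.
U = [(0.0, 0.0, 1.0, 0.5)]
# Hypothetical hole: |x| < 0.2 and |y| < 0.2, written as four half-spaces.
C = [(1.0, 0.0, 0.0, -0.2), (-1.0, 0.0, 0.0, -0.2),
     (0.0, 1.0, 0.0, -0.2), (0.0, -1.0, 0.0, -0.2)]

nibbles = nibble(U, C, samples)
print(len(nibbles), "nibbles")
```

Every sample point in U but outside C falls in exactly one nibble, and no point of C falls in any nibble, which is the disjoint-cover property claimed for the last equation above.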

The procedure spCreateTileBoxes creates, for each TilingGeometry, a set of TileBox regions that cover it. That procedure also records in Region2Boxes a mapping of TilingGeometry -> TileBox so that we can tell which TilingGeometry region covers a box.

SkyBoxes are the unification of all TileBoxes into a partitioning of the entire sky. Logically, SkyBoxes are the Boolean combination of all the TileBoxes - somewhat analogous to the relationship between wedges and tiles. A SkyBox may be covered by multiple TilingGeometries (and have corresponding tiling runs); Region2Boxes records this mapping of TilingGeometry -> TileBox. Figure 7 illustrates how SkyBoxes are computed and how the TilingGeometry relationship is maintained.
Figure 7. SkyBoxes are the intersection of TileBoxes. A pair can produce up to 7 SkyBoxes. The green areas are covered by the union of the tiling runs of the two TileBoxes and the other SkyBoxes are covered by the Tiling Runs of their one parent box.

spCreateSkyBoxes builds all the SkyBoxes and records the dependencies. spCreateSkyBoxes uses the logic of spRegionQuradangleFourOtherBoxes to create the SkyBoxes from the intersections of TileBoxes.

From Wedges and SkyBoxes to Sectorlets to Sectors
We really want the sectors, but it is easier to first compute wedges and SkyBoxes and then build the sectors from them. Recall that:
Wedge: a Boolean combination of tiles.
SkyBox: a convex region of the survey covered by certain TilingRuns.
So, the sectors are just
Wedge ∩ SkyBox.

This may be a fine partition - but two adjacent sectors computed in this way might have the same list of covering TilingGeometry and Tiles, in which case they should be unified into one sector. So, the pieces of this first Wedge-SkyBox partition are called sectorlets. These sectorlets need to be unified into sectors if they have the same covering tiles. This unification gives us a unique answer (remember that Figure 5 showed many different TileBox partitions; this final step eliminates any "fake" partitions introduced by that step).

Sectorlets are computed as follows: Given a wedge W and a SkyBox SB, the area is just W ∩ SB. If that area is non-empty then we need to compute the list of covering TilingGeometry and tiles. The TilingGeometries come from SB. The tiles are a bit more complex. Let T be the set of tiles covering W. Discard from T any tile not created by a tiling run covering SB. In mathematical notation:

T(sectorlet) = { T ∈ T(wedge) | ∃ TileRun TR covering SB and TR generated T }
T(sectorlet) is the tile list for the sectorlet W ∩ SB. This logic is embodied in the procedure spSectorCreateSectorlets (note that wedges have positive and negative tiles).
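
The tile-list rule can be written as a one-line filter. The Python sketch below is illustrative only: sectorlet_tiles, tile_run, and the tile and run identifiers are hypothetical names, not columns or procedures from the database.

```python
def sectorlet_tiles(wedge_tiles, skybox_runs, tile_run):
    """Keep the wedge's tiles whose generating tiling run covers the SkyBox.

    wedge_tiles: set of tiles that are parents of wedge W
    skybox_runs: set of tiling runs covering SkyBox SB
    tile_run:    dict mapping each tile to the tiling run that generated it
    """
    return {t for t in wedge_tiles if tile_run[t] in skybox_runs}

# Hypothetical data: tiles T1 and T2 come from run 10, T3 from run 12.
tile_run = {"T1": 10, "T2": 10, "T3": 12}
wedge_tiles = {"T1", "T3"}   # tiles covering wedge W
skybox_runs = {10}           # SB is covered only by run 10

print(sorted(sectorlet_tiles(wedge_tiles, skybox_runs, tile_run)))
```

Here T3 is discarded because no tiling run covering SB generated it, so the sectorlet W ∩ SB keeps only T1. The same filter would be applied to the positive and the negative tile lists of the wedge.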

But, a particular tile or set of tiles can create many sectorlets. We want the sector to be all the adjacent sectorlets with the same list of parent tiles (note that sectorlets have positive (covering) and negative (excluded) parents that make up the sector).
Figure 8. This diagram shows some SDSS data and demonstrates the concepts of Tile, Mask, TileBox, TilingGeometry, SkyBox, Wedge, Sectorlet, and Sector.

The routine spSectorCreateSectors unifies all the sectorlets with the same list of parent tiles into one region. This region may not be connected (masks or tiling geometry may break it into pieces, which are then glued back together - see the example of 5 sectorlets creating one sector in Figure 8).
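
The unification step amounts to grouping sectorlets by their parent-tile lists. The Python sketch below illustrates that grouping under simple assumptions: each sectorlet is a dictionary with hypothetical keys id, positive, and negative, and the geometric union of the grouped regions is left out.

```python
from collections import defaultdict

def create_sectors(sectorlets):
    """Group sectorlets that share the same positive and negative parents.

    Returns dict (frozenset(positive), frozenset(negative)) -> list of ids.
    """
    sectors = defaultdict(list)
    for s in sectorlets:
        key = (frozenset(s["positive"]), frozenset(s["negative"]))
        sectors[key].append(s["id"])
    return dict(sectors)

# Hypothetical sectorlets: 1, 2 and 4 have identical parents.
sectorlets = [
    {"id": 1, "positive": {"A"}, "negative": {"B"}},
    {"id": 2, "positive": {"A"}, "negative": {"B"}},
    {"id": 3, "positive": {"A", "B"}, "negative": set()},
    {"id": 4, "positive": {"A"}, "negative": {"B"}},
]
sectors = create_sectors(sectorlets)
for (pos, neg), ids in sectors.items():
    print(sorted(pos), sorted(neg), ids)
```

Because the key depends only on the parent tiles, the result does not depend on which TileBox decomposition produced the sectorlets.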

All these routines are driven by the parent spSectorCreate routine.