Algorithm Descriptions

QSO Catalog

Building the QsoCatalogAll and QsoConcordanceAll tables

Jim Gray, Sebastian Jester, Gordon Richards, Alex Szalay, Ani Thakar
March 2006

Abstract: We constructed a catalog of all quasar candidates and gathered their "vital signs" from the many different SDSS data sources into one Quasar Concordance table.

1. The Target, Best, and Spec SDSS Datasets

The SDSS Target Database is used to select the targets that will be observed with the SDSS spectrographs. Once made, these targeting decisions are never changed, although the targeting algorithm has improved over time. The SDSS pipeline software is also always improving, so the underlying pixels are re-analyzed with each data release. To keep the catalog consistent, all the mosaiced pixels, from both early and recent observations, are reprocessed with the new software in each subsequent data release. The output of each of these uniform processing steps is called a Best database. So at any instant there is the historical, cumulative Target database and the current Best database. As of early 2006 there are the Early Data Release (EDR) databases and five "real" data releases: DR1, DR2, DR3, DR4, and DR5.

The target selection is done by the various branches (galaxy, quasar, serendipity) of the TARGET selection algorithm. These targets are organized for spectroscopic follow-up by the TILING algorithm (Blanton et al. 2003) [0] as part of a tiling run that works within a tiling geometry. The tiling run places a 2.5 deg. circle over a tiling geometry and then assigns spectroscopic targets to be observed. The circle corresponds to a plate that can be mounted on the SDSS telescope to observe 640 targets at a time. The plates are "drilled" and "plugged" with optical fibers and then "observed". These spectroscopic observations are fed through a pipeline that builds the Spec dataset. Because Spec is relatively small (2% of the size of Best), it is included in the Best database. Unfortunately, only the "main" SDSS target photometry is exported to the Target database: the target photometry for Southern and Special plates is not exported, so at best the database holds the later Best photometry for these objects.

The SDSS catalogs are cross-matched with the FIRST, ROSAT, Stetson, USNO, and USNO-B catalogs and some vital signs from some of those catalogs are included in the Quasar Concordance.

2. Overview: Finding Everything That MIGHT be a Quasar

We look in the Target..PhotoObjAll, Best..SpecObjAll, and Best..PhotoObjAll tables to find any object that might be a quasar (a QSO). We build a QsoCatalogAll table that has a row for every combination of nearby TargPhoto-Spec-BestPhoto objects from these lists that are within 1.5 arcseconds of one another. If no matching object can be found in the QSO candidate list, we find a surrogate object: the nearest primary object from the corresponding catalog (Spec, BestPhoto, TargPhoto), if one can be found (again using the 1.5" radius). If an object is still unmatched, we look for a secondary object, or put a zero for that ObjectID (in general, we use zero rather than the SQL null value to represent missing data).

2.1. Overview: QSO Tables

The tables and views created by the quasar concordance algorithm on the Best, Target and Spectro datasets are part of the Best database. The following sections explain how they are computed.

QSO Table/View descriptions

Name               Type   Description
QsoCatalog         View   A view of QsoCatalogAll limited to only the best QSO from each bunch
QsoConcordance     View   A view of QsoConcordanceAll limited to only the best QSO from each bunch
QsoCatalogAll      Table  The superset of all QSO candidates identified by the algorithm described below
QsoConcordanceAll  Table  The wide table that combines the Best, Spec and Target fields for each QSO candidate
QsoBunch           Table  The QSO neighbors organized into neighborhood bunches with a head QSO associated with each bunch
QsoBest            Table  The fields from the Best PhotoObjAll table associated with each QSO candidate
QsoSpec            Table  The fields from the Best SpecObjAll table associated with each QSO candidate
QsoTarget          Table  The fields from the Target PhotoObjAll table associated with each QSO candidate

2.2. Overview: Quasar Bunches

Figure 1: A bunch of 2 targets, 2 bests, and 1 spec object, each within 1.5" of another bunch member. This bunch produces 4 (target, best, spec) triples in the concordance. The first target is the bunch head.

The algorithm uses spatial proximity (aka: "is it nearby?") to cross-correlate objects in the Target, Best, and Spec databases. The definition of nearby is fairly loose: the SDSS Photo Survey pixels are 0.4 arcsecond and positions are accurate to 0.1 arcsecond, but the spectroscopic fibers are 3 arcseconds in diameter. Therefore, the QSO concordance uses the 1.5" fiber radius to define nearby for all three datasets.

In a perfect world, one SpecObj matches one BestObj and one TargetObj, and they are all marked as QSOs. Some objects have no match in the other catalogs, so we have zeros in those slots of that object's row. But sometimes 2 SpecObjs match 3 TargetObjs and 4 BestObjs, and all 9 objects are marked as QSOs. In this case we get 2 × 3 × 4 = 24 rows. We group all the objects that are related in this way into a bunch. Each bunch has a head object ID: the first member of the bunch to be recognized as a possible QSO. The precedence is the TargetObjID first; if there is no target in the bunch, then the first SpecObjID (highest-S/N primary first); otherwise the first BestObjID. This ordering reflects the first time the object was considered for follow-up spectroscopy, and it avoids a selection bias in the dataset (e.g., Malmquist bias if we were to order on decreasing S/N).

2.3. The QSO Catalog and Concordance

Figure 2: The Qso schema.

The premise is that any Target-Spec-Best triple may be interesting, so all such triples are in the QsoCatalogAll table. The vital signs (e.g. position, flags, flux, ...) of each object are copied from the corresponding database into small tables, along with some derived measurements specific to QSOs (these are the QsoTarget, QsoSpec, and QsoBest tables). All these tables are unified by the QsoConcordanceAll view that "glues" the vital signs together. Most people just want to see the best triple of each bunch (primary only and best S/N), so the QsoConcordance view shows just the "primary" triple of each bunch.

3. Overview: A Walkthrough of the Algorithm.

Phase 1: Gather the Quasars and Quasar Candidates. As a first step, gather the Target, Spec, and Best quasar candidates and confirmed quasars into a Zones table [1] containing their object identifiers and positions. These are copied from the Best and Target PhotoObjAll tables and the Best SpecObjAll table, filtered by flags indicating that the objects are QSOs or were targeted as QSOs. For the photo objects (target and best), this means they are primary or secondary and flagged (primTarget) as TARGET_QSO_HIZ OR TARGET_QSO_CAP OR TARGET_QSO_SKIRT OR TARGET_QSO_FIRST_CAP OR TARGET_QSO_FIRST_SKIRT (= 0x0000001F). The spectroscopic objects must have one or more of the following properties:
  1. they are recognized as a QSO or are of unknown type (specClass in {UNKNOWN, QSO, HIZ_QSO}), or
  2. they have high redshift (z > 0.6), since high-redshift objects are likely QSOs, or
  3. they were targeted as QSOs ((primTarget & 0x1F) ≠ 0).
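The Phase 1 filters for photo objects and for main-plate SpecObjs can be sketched as two predicates. This is a minimal sketch, not the production code: the 0x1F mask comes from the text, the specClass codes (0 = UNKNOWN, 3 = QSO, 4 = HIZ_QSO) come from the SQL query in this section, and the function names and the string encoding of the photo mode ("primary", "secondary") are hypothetical.

```python
# Combined QSO target bits from the text:
# TARGET_QSO_HIZ | TARGET_QSO_CAP | TARGET_QSO_SKIRT
#   | TARGET_QSO_FIRST_CAP | TARGET_QSO_FIRST_SKIRT == 0x1F
QSO_TARGET_MASK = 0x1F

# specClass codes as used in the SQL query of this section.
SPEC_CLASS_UNKNOWN = 0
SPEC_CLASS_QSO = 3
SPEC_CLASS_HIZ_QSO = 4
HIGH_Z = 0.6


def is_photo_candidate(prim_target: int, mode: str) -> bool:
    """Target/Best photo object: primary or secondary with any QSO target bit set.

    The string encoding of `mode` is a hypothetical stand-in for the mode column.
    """
    return mode in ("primary", "secondary") and (prim_target & QSO_TARGET_MASK) != 0


def is_spec_candidate_main(spec_class: int, z: float, prim_target: int) -> bool:
    """Three-case test; valid only for SpecObjs on main (programType = 0) plates."""
    return (
        spec_class in (SPEC_CLASS_UNKNOWN, SPEC_CLASS_QSO, SPEC_CLASS_HIZ_QSO)
        or z > HIGH_Z
        or (prim_target & QSO_TARGET_MASK) != 0
    )
```

Special plates need the extra branches spelled out in the SQL below; this sketch deliberately leaves them out.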

That logic is fine for most spectroscopic objects, but there are "special plates" whose authors overloaded the primary target flags (yes, this made the data much harder to understand and cost many hours of discussion trying to disambiguate it). One can recognize the standard cases with the predicate PlateX.programType = 0, meaning that the plate was processed as a "Main" chunk (programType = 0), not as a "Southern" (programType = 1) or "special" (programType = 2) plate. The three-case logic above works fine for "main" targets. Once you know it is a special plate, you have to ask whether the object is a "special target" (SpecObj.primTarget & 0x80000000 ≠ 0). If it is, you have to ask whether it belongs to the "Fstar72" group. If not, you can use the standard test ((primTarget & 0x1F) ≠ 0), since those nice people did not overload the primTarget flags. But the folks who did "Fstar72" overloaded the flags, and so we get the following complex logic:

-- Select SpecObjs that are either declared QSOs from their spectra
-- or that were targeted as likely QSOs.
select so.specObjID
from BestDr5.dbo.PlateX as px
    join BestDr5.dbo.SpecObjAll as so on px.plateID = so.plateID
where so.specClass in (3, 4, 0)          -- class is QSO, HiZ_QSO, or Unknown
   or so.z > 0.6                         -- or high redshift
   or (                                  -- standard-survey plates
        px.programType = 0               -- MAIN targeting survey
        and (so.primTarget & 0x1F) != 0
      )
   or (                                  -- special quasar targets from special plates
                                         -- see http://www.sdss.org/dr4/products/spectra/special.html
        (so.primTarget & 0x80000000) != 0
        and (   (px.programName in ('merged48', 'south22')
                 and (so.primTarget & 0x1F) != 0)
             or (px.programName = 'fstar72'
                 and (so.primTarget & 4) != 0)
             or (   -- bent double-lobed FIRST source counterparts from special plates;
                    -- the "straight double" counterparts have already been snuck
                    -- into the usual FIRST counterpart quasar category 0x10
                    px.programName = 'merged48'
                    and (so.primTarget & 0x200000) != 0
                )
            )
      )
   or (                                  -- non-special quasar targets from special plates
        (so.primTarget & 0x80000000) = 0
        and px.programName in ('merged73', 'merged48', 'south22')
        and (so.primTarget & 0x1F) != 0
      )

Phase 2: Find the Neighbors. Once the zone table is assembled containing all the candidates, a zones algorithm [1] is used to build a neighbors table among all these objects. Two objects are QSO neighbors if they are within 1.5 arcseconds of one another. The relationship is made transitive so that friends of friends are all part of the same neighborhood.
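A simplified sketch of the Phase 2 neighbor search, assuming objects arrive as (id, ra, dec) tuples with unique, comparable IDs across the three catalogs. It buckets objects into declination zones one search radius tall, so only the same and adjacent zones need to be compared, and it tests proximity with the dot product of unit vectors (the role the Cx/Cy/Cz columns play). Unlike the zones algorithm in [1], it neither sorts zones by RA nor restricts an RA window, and it runs in Python rather than SQL; the function names are hypothetical.

```python
import math
from collections import defaultdict

RADIUS_DEG = 1.5 / 3600.0  # the 1.5 arcsecond "nearby" radius


def unit_vector(ra_deg, dec_deg):
    """(cx, cy, cz) on the unit sphere, like the Cx/Cy/Cz columns."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def find_neighbor_pairs(objects, radius_deg=RADIUS_DEG):
    """objects: iterable of (obj_id, ra_deg, dec_deg).

    Returns the set of (id1, id2) pairs, id1 < id2, closer than radius_deg.
    """
    zone_height = radius_deg  # zones exactly as tall as the search radius
    zones = defaultdict(list)
    for obj_id, ra, dec in objects:
        zone = math.floor((dec + 90.0) / zone_height)
        zones[zone].append((obj_id, unit_vector(ra, dec)))

    cos_radius = math.cos(math.radians(radius_deg))
    pairs = set()
    for zone, members in zones.items():
        # A match within radius_deg can only lie in this zone or an adjacent one.
        for dz in (-1, 0, 1):
            for id1, v1 in members:
                for id2, v2 in zones.get(zone + dz, ()):
                    dot = v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2]
                    if id1 < id2 and dot >= cos_radius:
                        pairs.add((id1, id2))
    return pairs
```

Three objects strung 1" apart along a line give pairs (1, 2) and (2, 3) but not (1, 3), which is 2" away; the transitive closure that joins all three into one neighborhood happens when the bunches are built.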

Phase 3: Build the Bunches. The Neighbors relationship partitions the objects into bunches. We pick a distinguished member from each bunch to represent that bunch, called the bunch head. The selection favors Target, then Spec, then Best photo objects, and within each category it favors primary, then secondary, then outside objects. If there is still a tie within one group (e.g. multiple primary target objects in a bunch), it is broken by taking the minimum object ID for PhotoObjs (again, to avoid any selection bias) and the highest S/N for SpecObjs. Given these bunch heads, we record a summary record for each bunch in the QsoBunch table:

QsoBunch table

Name             Type     Description
HeadID           bigint   Unique identifier of the head object of this bunch of objects (all nearby one another)
HeadType         char(6)  TARGET, SPEC, or BEST, depending on the type of the head object
RA               float    RA of the bunch head object
Dec              float    Dec of the bunch head object
TargetObjs       int      Count of Target objects in the bunch
SpecObjs         int      Count of spectroscopic objects in the bunch
BestObjs         int      Count of Best objects in the bunch
TargetPrimaries  int      Count of primary Target objects in the bunch
SpecPrimaries    int      Count of SciencePrimary spectroscopic objects in the bunch
BestPrimaries    int      Count of primary Best objects in the bunch

The difference between TargetObjs and TargetPrimaries (and likewise for Spec and Best) is that TargetObjs also counts multiple entries of the same object in the database (e.g. as both a primary and a secondary), whereas TargetPrimaries helps us identify objects that are either very close together or that were deblended into two objects separated by less than 1.5" (or lying within a circle of 1.5" radius). Because the object primary flags are not handy at this point of the computation, the bunch statistics are actually computed in Phase 9.
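Phase 3 can be sketched as a union-find pass over the neighbor pairs (which yields the transitive, friends-of-friends closure) followed by the head-selection precedence described above. The Candidate record, its string encodings for kind and mode, and the function names are hypothetical; the summary only mirrors the count columns of QsoBunch, and it assumes object IDs are unique across the three catalogs.

```python
from collections import defaultdict
from dataclasses import dataclass

# Precedence from the text: Target before Spec before Best,
# then primary before secondary before outside.
KIND_RANK = {"TARGET": 0, "SPEC": 1, "BEST": 2}
MODE_RANK = {"primary": 0, "secondary": 1, "outside": 2}


@dataclass(frozen=True)
class Candidate:
    obj_id: int
    kind: str        # "TARGET", "SPEC" or "BEST"
    mode: str        # "primary", "secondary" or "outside" (hypothetical encoding)
    sn: float = 0.0  # spectroscopic S/N; only used to break ties between SpecObjs


def build_bunches(ids, pairs):
    """Union-find over neighbor pairs: each connected component is one bunch."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for i in ids:
        groups[find(i)].append(i)
    return list(groups.values())


def head_key(c):
    # Final tie-break: highest S/N for SpecObjs, smallest objID for photo objects.
    tie = -c.sn if c.kind == "SPEC" else c.obj_id
    return (KIND_RANK[c.kind], MODE_RANK[c.mode], tie)


def bunch_head(members):
    return min(members, key=head_key)


def bunch_summary(members):
    """Head plus per-class counts, in the spirit of the QsoBunch columns."""
    head = bunch_head(members)
    row = {"HeadID": head.obj_id, "HeadType": head.kind}
    for kind, label in (("TARGET", "Target"), ("SPEC", "Spec"), ("BEST", "Best")):
        of_kind = [m for m in members if m.kind == kind]
        row[label + "Objs"] = len(of_kind)
        row[label + "Primaries"] = sum(m.mode == "primary" for m in of_kind)
    return row
```

For example, a bunch holding a primary Best object and two primary SpecObjs with S/N 5 and 9 gets the higher-S/N SpecObj as its head, since Spec outranks Best and the S/N breaks the tie between the two SpecObjs.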

Phase 4: Build the Catalog. Now we grow the QsoCatalogAll table, which for each bunch holds every triple drawn from the classes of the bunch (a target, a spec, and a best object). For example, the bunch of Figure 1 produces 2 × 1 × 2 = 4 triples. If there is no object in one of the classes, we fill in a non-QSO surrogate object: the primary object from that database (Target, Best, or Spec) closest to the bunch head, or, if there is no primary, a secondary (the search insists on the 1.5 arcsecond radius). If no such object can be found, we fill in that slot with a zero object. The resulting table looks like this:

QsoCatalogAll table

Name               Type    Description
HeadID             bigint  Unique identifier of this bunch of objects (all nearby one another)
TripleID           bigint  Unique identifier of this (spec, best, target) triple
QsoPrimary         bit     1 if this is the best triple of the bunch
TargetObjID        bigint  Unique ID in the Target DB, or 0 if there is no matching object
SpecObjID          bigint  Unique ID of the spectroscopic object, or 0 if there is no such object
BestObjID          bigint  Unique ID in the Best DB, or 0 if there is no such object
TargetQsoTargeted  bit     Flag: 1 means the Target PhotoObj was flagged as a QSO in the target flags
SpecQsoConfirmed   bit     Flag: 1 means SpecObj.specClass is QSO or HiZ_QSO
SpecQsoUnknown     bit     Flag: 1 means SpecObj.specClass is Unknown
SpecQsoLargeZ      bit     Flag: 1 means SpecObj z > 0.6
SpecQsoTargeted    bit     Flag: 1 means the SpecObj was picked as a QSO target
BestQsoTargeted    bit     Flag: 1 means the Best PhotoObj was flagged as a QSO in the target flags
dist_Target_Best   float   Distance (arcmin) between Target and Best
dist_Target_Spec   float   Distance (arcmin) between Target and Spec
dist_Best_Spec     float   Distance (arcmin) between Best and Spec
psfmag_i_diff      float   target.psfmag_i - best.psfmag_i
psfmag_g_i_diff    float   (target.psfmag_g - target.psfmag_i) - (best.psfmag_g - best.psfmag_i)

The last 5 "quality fields" are computed in Phase 9.
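The Phase 4 fan-out is a Cartesian product over the three classes of a bunch. A minimal sketch, assuming each class arrives as a list of object IDs and that any surrogate found in Phase 5 is passed in as a per-class ID; the function name and argument layout are hypothetical.

```python
from itertools import product


def bunch_triples(targets, specs, bests, surrogates=None):
    """All (target, spec, best) combinations for one bunch.

    targets/specs/bests: object IDs of the QSO candidates in the bunch, by class.
    surrogates: optional {"target"|"spec"|"best": obj_id} used for an empty class;
    a class with neither candidates nor a surrogate gets the zero object (ID 0).
    """
    surrogates = surrogates or {}
    slots = []
    for cls, ids in (("target", targets), ("spec", specs), ("best", bests)):
        slots.append(list(ids) if ids else [surrogates.get(cls, 0)])
    return list(product(*slots))
```

The Figure 1 bunch (2 targets, 1 spec, 2 bests) yields 4 triples, and the 2 × 3 × 4 case from Section 2.2 yields 24.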

Phase 5: Find Surrogates for missing objects. Some Catalog entries have no matching Target, Best, or Spec object. In these cases we look in the database for a surrogate object (one that was not a QSO candidate) near the bunch head object; as usual the search radius is 1.5 arcseconds, and we favor primary over secondary objects and favor low-signal-to-noise-ratio SpecObjs.
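The Phase 5 preference order (nearest primary within 1.5", else nearest secondary, else the zero object) might look like this. The sketch assumes a catalog of (id, ra, dec, mode) tuples and uses the haversine formula for separations; it does not model the S/N preference for SpecObj surrogates, and all names are hypothetical.

```python
import math

RADIUS_ARCSEC = 1.5


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula; inputs in degrees."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0


def find_surrogate(head_ra, head_dec, catalog, radius=RADIUS_ARCSEC):
    """catalog: iterable of (obj_id, ra, dec, mode), mode "primary" or "secondary".

    Returns the nearest primary within the radius, else the nearest
    secondary within the radius, else 0 (the zero object).
    """
    nearest = {}  # mode -> (separation, obj_id)
    for obj_id, ra, dec, mode in catalog:
        sep = separation_arcsec(head_ra, head_dec, ra, dec)
        if sep <= radius and (mode not in nearest or sep < nearest[mode][0]):
            nearest[mode] = (sep, obj_id)
    for mode in ("primary", "secondary"):
        if mode in nearest:
            return nearest[mode][1]
    return 0
```

Note that a primary 1.2" away wins over a secondary 1.0" away: the primary/secondary preference is applied before distance.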

Phase 6: Get the Vital Signs. We now go to the source databases and get the "vital signs" of these photo and spectro objects (both quasar candidates and surrogates), building the QsoSpec, QsoTarget, and QsoBest tables that hold these values, plus, for the photo objects, some additional values from ROSAT and FIRST if there is a match. We then define QsoConcordanceAll as a view on these base tables with the following (about 130) fields.

Phase 7: Define the QsoConcordanceAll and QsoConcordance Views. Now we are ready to "glue" the QsoCatalog together with the vital signs to make a "fat table" with all the attributes:

From QsoCatalogAll: HeadObjID, tripleID, QsoPrimary, TargetQsoTargeted, SpecQsoConfirmed, SpecQsoUnknown, SpecQsoLargeZ, SpecQsoTargeted, BestQsoTargeted, dist_Target_Best, dist_Target_Spec, dist_Best_Spec, psfmag_i_diff, psfmag_g_i_diff

From QsoTarget: targetObjID, targetRa, targetDec, targetCx, targetCy, targetCz, targetPsfMag_u, targetPsfMag_g, targetPsfMag_r, targetPsfMag_i, targetPsfMag_z, targetPsfMagErr_u, targetPsfMagErr_g, targetPsfMagErr_r, targetPsfMagErr_i, targetPsfMagErr_z, targetExtinction_u, targetExtinction_g, targetExtinction_r, targetExtinction_i, targetExtinction_z, targetType, targetMode, targetStatus, targetFlags, targetFlags_u, targetFlags_g, targetFlags_r, targetFlags_i, targetFlags_z, targetRowC_i, targetColC_i, targetInsideMask, targetPrimTarget, targetPriTargHiZ, targetPriTargLowZ, targetPriTargFirst, targetFieldID, targetFieldMjd, targetFieldQuality, targetFieldCulled, targetSectorID, targetFirstID, targetFirstPeak, targetRosatID, targetRosatCps, targetMi, targetUniform

From QsoSpec: SpecObjID, SpecRa, SpecDec, SpecCx, SpecCy, SpecCz, SpecZ, SpecZerr, SpecZConf, SpecZStatus, SpecZWarning, SpecClass, SpecPlate, SpecFiberID, SpecMjd, SpecSciencePrimary, SpecPrimTarget, SpecLineID, SpecMaxVelocity, SpecBestObjID, SpecTargetObjID, SpecTarget, SpecSn1_i, SpecSn2_i

From QsoBest: bestObjID, bestRa, bestDec, bestCx, bestCy, bestCz, bestPsfMag_u, bestPsfMag_g, bestPsfMag_r, bestPsfMag_i, bestPsfMag_z, bestPsfMagErr_u, bestPsfMagErr_g, bestPsfMagErr_r, bestPsfMagErr_i, bestPsfMagErr_z, bestExtinction_u, bestExtinction_g, bestExtinction_r, bestExtinction_i, bestExtinction_z, bestType, bestMode, bestFlags, bestFlags_u, bestFlags_g, bestFlags_r, bestFlags_i, bestFlags_z, bestRowC_i, bestColC_i, bestInsideMask, bestPrimTarget, bestPriTargHiZ, bestPriTargLowZ, bestPriTargFirst, bestFieldID, bestFieldMjd, bestFieldQuality, bestFieldCulled, bestFirstID, bestFirstPeak, bestRosatID, bestRosatCps, bestMi
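The "glue" step is a join from each triple to the three vital-sign tables. This toy SQLite version, driven from Python's sqlite3 module, keeps only one or two columns per table to show the shape of QsoConcordanceAll; the LEFT JOINs are an assumption made so that triples holding a zero object (ID 0, which has no vital-sign row here) still appear, with NULLs in that slot. It is not the SkyServer view definition.

```python
import sqlite3


def demo_concordance():
    """Toy schema: a few columns per table, enough to show the glue."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE QsoCatalogAll (HeadID INTEGER, TripleID INTEGER, QsoPrimary INTEGER,
                                    TargetObjID INTEGER, SpecObjID INTEGER, BestObjID INTEGER);
        CREATE TABLE QsoTarget (targetObjID INTEGER PRIMARY KEY, targetPsfMag_i REAL);
        CREATE TABLE QsoSpec   (SpecObjID   INTEGER PRIMARY KEY, SpecZ REAL);
        CREATE TABLE QsoBest   (bestObjID   INTEGER PRIMARY KEY, bestPsfMag_i REAL);

        -- LEFT JOINs keep triples whose slot holds the zero object (ID 0).
        CREATE VIEW QsoConcordanceAll AS
            SELECT c.HeadID, c.TripleID, c.QsoPrimary,
                   c.TargetObjID, t.targetPsfMag_i,
                   c.SpecObjID,   s.SpecZ,
                   c.BestObjID,   b.bestPsfMag_i
            FROM QsoCatalogAll AS c
            LEFT JOIN QsoTarget AS t ON t.targetObjID = c.TargetObjID
            LEFT JOIN QsoSpec   AS s ON s.SpecObjID   = c.SpecObjID
            LEFT JOIN QsoBest   AS b ON b.bestObjID   = c.BestObjID;

        -- One bunch, two triples: the second has no target (zero object).
        INSERT INTO QsoCatalogAll VALUES (100, 1, 1, 10, 20, 30), (100, 2, 0, 0, 20, 30);
        INSERT INTO QsoTarget VALUES (10, 19.5);
        INSERT INTO QsoSpec   VALUES (20, 2.25);
        INSERT INTO QsoBest   VALUES (30, 19.75);
    """)
    return con
```

Querying the view for the second triple returns NULL for targetPsfMag_i and the shared Spec and Best values for the other slots.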



Triples per bunch    Bunches
1                    238,073
2                     10,619
3                      1,397
4                     14,470
5                        202
6                        170
7                         36
8                        551
9                        115
12                        61
16                         2

Phase 9: Mark the primary triple of each bunch, compute some derived magnitude values, and clean up. With the QsoConcordanceAll view and all the vital signs in place, we compute some derived values: we pick the best triple of each bunch, compute the distances among the members of each triple, and compute some derived PSF magnitude differences.
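The two magnitude quality fields follow directly from the QsoCatalogAll column definitions. A minimal sketch, assuming each photo object is a dict with psfmag_g and psfmag_i keys; returning None when either photo slot is the zero object is an assumption, since the text does not say what the tables store in that case.

```python
from typing import Optional, Tuple


def psfmag_diffs(target: Optional[dict],
                 best: Optional[dict]) -> Tuple[Optional[float], Optional[float]]:
    """Quality fields of a triple, per the QsoCatalogAll column definitions:

    psfmag_i_diff   = target.psfmag_i - best.psfmag_i
    psfmag_g_i_diff = (target g - i) - (best g - i)
    A None argument stands for a zero object (hypothetical convention).
    """
    if target is None or best is None:
        return None, None
    i_diff = target["psfmag_i"] - best["psfmag_i"]
    g_i_diff = ((target["psfmag_g"] - target["psfmag_i"])
                - (best["psfmag_g"] - best["psfmag_i"]))
    return i_diff, g_i_diff
```

For a target with g = 20.5, i = 19.75 and a best object with g = 20.25, i = 20.0, psfmag_diffs returns (-0.25, 0.5).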

In the end, the DR5 database has 265,697 bunches, 329,871 triples in the concordance, and 114,883 confirmed quasars. Most bunches have one catalog entry, but about 10% have multiple matches (generally a primary and a secondary best or target object that are both flagged as QSO candidates, or multiple observations of a spectroscopic object). The catalog itself has some interesting cases. In DR5 there are 83,142 cases where the Target, Spec, and Best all agree that the object is a quasar. Since SDSS spectroscopy lags the imaging, it is not surprising that there are 82,011 objects where both the Target and Best indicate a likely QSO but there is no spectrum for the object (the Spec Zero case).

With the QsoCatalogAll and QsoConcordanceAll in place we define two views: QsoCatalog (the best of the bunch) and QsoConcordance (the wide version) by picking the best targetObj, spec, and bestObj of each bunch.
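If "best of the bunch" corresponds to the rows flagged QsoPrimary = 1, the QsoCatalog view reduces to a filter on QsoCatalogAll, and QsoConcordance would be the same filter applied to QsoConcordanceAll. This SQLite toy (via Python's sqlite3) illustrates that reading; the filter and the function name are assumptions, not the published view definition.

```python
import sqlite3


def demo_catalog_view(rows):
    """rows: (HeadID, TripleID, QsoPrimary) tuples for a toy QsoCatalogAll."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE QsoCatalogAll (HeadID INTEGER, TripleID INTEGER, QsoPrimary INTEGER);
        -- Assumption: "best of the bunch" is exactly the rows with QsoPrimary = 1.
        CREATE VIEW QsoCatalog AS
            SELECT * FROM QsoCatalogAll WHERE QsoPrimary = 1;
    """)
    con.executemany("INSERT INTO QsoCatalogAll VALUES (?, ?, ?)", rows)
    return con
```

With one primary triple per bunch, the view returns exactly one row per HeadID.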

DR5 QsoCatalogAll

Target     Spec       Best        Count
Surrogate  Confirmed  Surrogate  24,348
Surrogate  Confirmed  Targeted    1,080
Surrogate  Confirmed  Zero           88
Targeted   Confirmed  Surrogate   5,556
Targeted   Confirmed  Targeted   83,142
Targeted   Confirmed  Zero          102
Zero       Confirmed  Surrogate     108
Zero       Confirmed  Targeted       32
Zero       Confirmed  Zero          427
Surrogate  LargeZ     Surrogate   1,458
Surrogate  LargeZ     Targeted       31
Surrogate  LargeZ     Zero           32
Targeted   LargeZ     Surrogate     110
Targeted   LargeZ     Targeted      209
Targeted   LargeZ     Zero            1
Zero       LargeZ     Surrogate      26
Zero       LargeZ     Targeted        3
Zero       LargeZ     Zero           25
Surrogate  other      Surrogate      93
Surrogate  other      Targeted    1,627
Targeted   other      Surrogate     301
Targeted   other      Targeted      593
Zero       other      Targeted        2
Surrogate  Targeted   Surrogate   8,514
Surrogate  Targeted   Targeted      728
Surrogate  Targeted   Zero           28
Targeted   Targeted   Surrogate  24,460
Targeted   Targeted   Targeted   39,354
Targeted   Targeted   Zero          194
Zero       Targeted   Surrogate      80
Zero       Targeted   Targeted       25
Zero       Targeted   Zero           71
Surrogate  Unknown    Surrogate   6,049
Surrogate  Unknown    Targeted      122
Surrogate  Unknown    Zero          344
Targeted   Unknown    Surrogate   1,367
Targeted   Unknown    Targeted    1,772
Targeted   Unknown    Zero            9
Zero       Unknown    Surrogate     262
Zero       Unknown    Targeted       16
Zero       Unknown    Zero        2,635
Surrogate  Zero       Targeted   31,661
Targeted   Zero       Surrogate   8,659
Targeted   Zero       Targeted   82,011
Targeted   Zero       Zero          162
Zero       Zero       Targeted    1,954

References

[0] "An Efficient Targeting Strategy for Multiobject Spectrograph Surveys: The Sloan Digital Sky Survey," Blanton et al., AJ 125:2276 (2003)

[1] "There Goes the Neighborhood: Relational Algebra for Spatial Data Search," Alexander S. Szalay, Gyorgy Fekete, Wil O'Mullane, Maria A. Nieto-Santisteban, Aniruddha R. Thakar, Gerd Heber, Arnold H. Rots, MSR-TR-2004-32, April 2004

[2] "Creating Sectors," Alex Szalay, Gyorgy Fekete, Tamas Budavari, Jim Gray, Adrian Pope, Ani Thakar, August 2003, http://cas.sdss.org/dr4/en/help/docs/algorithm.asp?search=sector