Downloading Sloan Digital Sky Survey (SDSS) data
We're going to download SDSS data using SQL ("Structured Query
Language") queries, saving the data to a file for us to analyse.
To download the data for a cluster, follow these steps, but only AFTER
you have read through the documentation below:
- Go to the SDSS SkyServer at skyserver.sdss.org/dr16
and then click on "SQL Search" under "Data Access".
- Delete the sample query that shows up in the box and
copy-and-paste the query I describe in the documentation below.
Make sure the Output Format checkbox is set to HTML and hit
"submit query". This will show you a sample dataset of the first
100 objects it finds. Look at it and see if it is what you
expected (that it returned a table with the values you asked for).
- Assuming all looks good, change two things:
  - Change TOP 100 to TOP 500000 (that's "five hundred thousand"),
    which will give you all the data.
  - Change the Output Format to CSV ("comma separated values").
- Rerun the query and save the file that gets returned. Move the
file to a sensible spot on your computer and give it the
- When you do this for the other clusters, make sure to
change the ra and dec range to be centered on the correct cluster:
  - Abell 2065: RA = 230.62156, dec =
  - Abell 2063: RA = 230.77116, dec =
  - Abell 1795: RA = 207.21886, dec =
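The two edits described above (the TOP limit and the ra/dec box) can also be made programmatically. The sketch below is a minimal Python example and is not part of SkyServer: set_top and where_clause are hypothetical helper names, and the box is +/- 0.5 degrees in RA and dec coordinates, matching the one-degree box used in the SQL query given below. The dec of 27.7 in the usage line is the center implied by that query's bounds (27.2 < dec < 28.2); for the other clusters, substitute their own coordinates.

```python
import re


def set_top(query: str, n: int) -> str:
    """Replace the row limit in the first 'TOP <number>' of the query (hypothetical helper)."""
    return re.sub(r"\bTOP\s+\d+", f"TOP {n}", query, count=1, flags=re.IGNORECASE)


def where_clause(ra_center: float, dec_center: float, half_width: float = 0.5) -> str:
    """WHERE clause for a box of +/- half_width degrees in RA and dec
    coordinates around (ra_center, dec_center) (hypothetical helper)."""
    ra_min, ra_max = ra_center - half_width, ra_center + half_width
    dec_min, dec_max = dec_center - half_width, dec_center + half_width
    return (
        f"WHERE p.ra > {ra_min:.5f} AND p.ra < {ra_max:.5f}\n"
        f"AND p.dec > {dec_min:.5f} AND p.dec < {dec_max:.5f}"
    )


if __name__ == "__main__":
    # Abell 2065: RA from the cluster list; dec = 27.7 is the center implied
    # by the query's 27.2 < dec < 28.2 bounds (an assumption, not a catalog value).
    print(where_clause(230.62156, 27.7))
    # Switch from the 100-row preview to the full download.
    print(set_top("SELECT TOP 100\np.ra, p.dec", 500000))
```

Paste the printed WHERE lines in place of the query's existing ra/dec conditions.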
Here is an SQL search which will get data for the cluster Abell 2065:
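SELECT TOP 100
p.ra, p.dec, p.type,
-- ugriz model magnitudes, their errors, and r-band profile-fit
-- log-likelihoods (these column choices are assumed)
p.u, p.g, p.r, p.i, p.z, p.err_u, p.err_g, p.err_r, p.err_i, p.err_z,
p.lnLExp_r, p.lnLDeV_r,
ISNULL(s.z, -999) AS redshift, ISNULL(s.zErr, -999) AS redshiftErr
FROM PhotoObj AS p
LEFT OUTER JOIN SpecObj AS s ON s.bestobjid = p.objid
WHERE p.ra > 230.12 AND p.ra < 231.12
AND p.dec > 27.2 AND p.dec < 28.2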
Let's parse this out slowly.
Red lines (SELECT TOP 100, FROM PhotoObj AS p, and the WHERE
conditions): this says that I want to get the first 100 entries
from the SDSS "PhotoObj" table (objects with photometry), which I'm
going to label "p", and I'm only going to get them for objects
"WHERE" some criteria are true. In this case the criteria are that
I want objects that lie only in a small range of right ascension
and declination: a one-degree box centered on the position of
Abell 2065.
Blue lines (the list of columns after SELECT TOP 100): these are the
properties I'm going to pull: position (ra, dec), magnitudes, etc.
I'll explain the specific values in a second. "p." means get the
properties from the PhotoObj table (which, remember, I labelled "p").
Purple lines (the JOIN ... ON line): I also want to
check the SDSS spectroscopy to see which objects have spectra. So I
am JOINing two catalogs (PhotoObj and SpecObj, where SpecObj is
labelled "s"), and I want to make sure that the SDSS identifications
match, i.e., that when I grab data from SpecObj, I'm matching it to
the proper PhotoObj object. So I am JOINing ON the condition that the
object IDs in the two catalogs match each other (they go by slightly
different names: objid in PhotoObj and bestobjid in SpecObj).
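To see what JOINing ON matching IDs does, here is a small sketch using Python's built-in sqlite3 module with two toy tables named after the SDSS ones. The values are invented; only the objid/bestobjid pairing mirrors the real query. A plain JOIN keeps only the pairs whose IDs match, so object 2, which has no spectrum, drops out entirely.

```python
import sqlite3

# Toy stand-ins for the two SDSS tables (all values are made up).
SETUP = """
CREATE TABLE PhotoObj (objid INTEGER PRIMARY KEY, ra REAL, dec REAL);
CREATE TABLE SpecObj (specobjid INTEGER PRIMARY KEY, bestobjid INTEGER, z REAL);
INSERT INTO PhotoObj VALUES (1, 230.60, 27.70), (2, 230.65, 27.72), (3, 230.70, 27.68);
-- each SpecObj row points back at its photometric object through bestobjid
INSERT INTO SpecObj VALUES (101, 3, 0.072), (102, 1, 0.070);
"""


def inner_join_rows():
    """Pair each spectrum with its photometric object; unmatched objects are dropped."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(
        """
        SELECT p.objid, s.z
        FROM PhotoObj AS p
        JOIN SpecObj AS s ON s.bestobjid = p.objid
        ORDER BY p.objid
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(inner_join_rows())  # [(1, 0.07), (3, 0.072)]
```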
Green lines (the LEFT OUTER keyword and the ISNULL expressions): The
tricky part of the join is that most of the photometric objects
WON'T have spectroscopy, and I want objects even if they don't exist
in the spectroscopy catalog. So I do something called a "LEFT OUTER"
JOIN, which keeps every object from the first (left) catalog,
PhotoObj, even if it has no matching entry in SpecObj. Where an
object has no spectroscopy (so the s columns are NULL), I need to
assign a value to tell me there was no data. For example,
ISNULL(s.z, -999) AS redshift means that if there is no SpecObj
redshift value (what SpecObj calls z), just give it a redshift value
of -999, and I'll know to ignore those values.
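A similar toy setup (invented values, Python's sqlite3) shows the LEFT OUTER JOIN keeping every PhotoObj row and filling in -999 where there is no spectrum. One assumption to flag: SQLite has no built-in two-argument ISNULL function, so the sketch uses IFNULL, which does the same job here as SkyServer's ISNULL; the standard COALESCE would also work.

```python
import sqlite3

# Toy tables (made-up values): object 2 has no spectrum.
SETUP = """
CREATE TABLE PhotoObj (objid INTEGER PRIMARY KEY);
CREATE TABLE SpecObj (specobjid INTEGER PRIMARY KEY, bestobjid INTEGER, z REAL, zErr REAL);
INSERT INTO PhotoObj VALUES (1), (2), (3);
INSERT INTO SpecObj VALUES (101, 3, 0.072, 0.0001), (102, 1, 0.070, 0.0002);
"""

# IFNULL here plays the role of ISNULL(s.z, -999) in the SkyServer query.
QUERY = """
SELECT p.objid,
       IFNULL(s.z, -999) AS redshift,
       IFNULL(s.zErr, -999) AS redshiftErr
FROM PhotoObj AS p
LEFT OUTER JOIN SpecObj AS s ON s.bestobjid = p.objid
ORDER BY p.objid
"""


def left_join_rows():
    """Every PhotoObj row survives; missing spectroscopy becomes -999."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for objid, z, z_err in left_join_rows():
        has_spectrum = z != -999  # the -999 placeholder means "no data"
        print(objid, z, z_err, has_spectrum)
```

Compared with a plain JOIN on the same data, the row count stays at three instead of shrinking to the objects that have spectra.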
So the properties I am getting for sources are the following:
- Position: ra and dec
- Photometric type: 3 = extended/resolved source (galaxy), 6 =
unresolved source (star or unresolved galaxy)
- ugriz magnitudes and their uncertainties (called
- the log-likelihoods of fitting the object's surface brightness
profile with an exponential model (i.e., a disk) and with a
de Vaucouleurs model (i.e., an elliptical). Obviously these only
make sense for resolved sources.
- the redshift and redshiftErr (if available from the SpecObj table)
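Once the CSV is saved, a minimal Python sketch (standard library only) for a first look might separate resolved sources (type 3) from unresolved ones (type 6), count how many galaxies have a real redshift rather than the -999 placeholder, and compare the exponential and de Vaucouleurs log-likelihoods, where the larger (less negative) value marks the better-fitting profile. The column names type, redshift, lnLExp_r and lnLDeV_r are assumptions and must match the header of your file; the sample rows are made up. SkyServer's CSV output may start with a comment line such as #Table1, so lines beginning with # are skipped.

```python
import csv

TYPE_GALAXY, TYPE_POINT = 3, 6   # photometric type codes from the list above
NO_DATA = -999.0                 # placeholder written by ISNULL(..., -999)


def load_rows(text):
    """Parse CSV text into dicts, skipping blank lines and '#' comment lines."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    return list(csv.DictReader(lines))


def summarize(rows):
    """Count galaxies, point sources, galaxies with redshifts, and disk-like fits.
    Column names are assumed; rename them to match your download."""
    galaxies = [r for r in rows if int(r["type"]) == TYPE_GALAXY]
    points = [r for r in rows if int(r["type"]) == TYPE_POINT]
    return {
        "galaxies": len(galaxies),
        "point_sources": len(points),
        "galaxies_with_z": sum(float(r["redshift"]) != NO_DATA for r in galaxies),
        # higher log-likelihood = better fit; exponential winning suggests a disk
        "disk_like": sum(float(r["lnLExp_r"]) > float(r["lnLDeV_r"]) for r in galaxies),
    }


SAMPLE = """#Table1
ra,dec,type,redshift,lnLExp_r,lnLDeV_r
230.60,27.70,3,0.072,-120.5,-340.2
230.65,27.72,6,-999,-900.0,-950.0
230.70,27.68,3,-999,-410.3,-95.1
"""

if __name__ == "__main__":
    print(summarize(load_rows(SAMPLE)))
```

For a real download, read the file's text first (for example with open("abell2065.csv", newline="") as f, where the file name is hypothetical) and pass it to load_rows.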
What do these magnitudes refer to? The SDSS data pipeline takes
every photometric object detected and fits its light profile in
three ways: as a point source, as an exponential profile, and as a
de Vaucouleurs profile (a Sersic profile with n=4). It figures out
which of these profiles does the best job of fitting the source,
and then assigns a magnitude for the source based on the total
integrated magnitude of whichever profile gave it the best fit. The
error is just the uncertainty in the magnitude due to the
uncertainty in the fit. For bright sources, this uncertainty can be
quite small, but remember that it doesn't include many of the
systematic errors we've talked about.
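Here is a toy Python sketch of the selection logic just described: given a log-likelihood and a magnitude for each profile, keep the magnitude of the profile with the highest log-likelihood. The numbers are invented and best_fit_magnitude is a hypothetical helper; the actual SDSS pipeline's fitting and magnitude definitions involve more detail than this.

```python
def best_fit_magnitude(fits):
    """Toy version of 'use the magnitude of the best-fitting profile'.

    fits maps a profile name to (log_likelihood, magnitude).
    Returns (profile_name, magnitude) for the highest log-likelihood.
    """
    best = max(fits, key=lambda name: fits[name][0])
    return best, fits[best][1]


if __name__ == "__main__":
    # Made-up values for one resolved source.
    fits = {
        "point source": (-850.0, 18.40),
        "exponential": (-130.0, 17.95),
        "de Vaucouleurs": (-310.0, 17.80),
    }
    print(best_fit_magnitude(fits))  # ('exponential', 17.95)
```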
If you want, you can browse the PhotoObj and SpecObj
tables to see if there are other photometric or spectroscopic
properties you might be interested in looking at, and alter the
SQL query to add those properties to your download request as well.