Field Journal from Tom Comeau - 3/8/96
HAWKS, ALUMINUM, AND MATH: MORE WORK ON THE ARCHIVE
I had just packed away my wife's snow boots (we're moving at the end of
this month) and this morning it snowed. Two inches or so, but enough to mess
things up and close some of the schools.
We think the hawks are back! Last year we had a small family of red-tailed
hawks that lived in a nearby park. A couple times a week they would come
over onto the campus and go hunting. We'd see them perched halfway up
a tree, or up on the gutters just below the roof line of the building.
The female was pretty big, the male was about the size of a large crow,
and the juveniles (there were one or two, we were never sure which) got
to be the size of the male by the end of the summer. A couple of people
think they saw the female return in the last few days. I've been keeping
an eye out for her.
Today I'm doing two things at once: testing the changes I described
in my last journal, and doing some database work.
The optical platters we use cost about $300 each, so we try to make
sure things are working pretty well before we actually start writing data.
I've been "burning aluminum" all day, and I'm pretty confident that things
are working correctly. Suzanne (another DADS developer) and I have been
working on this project since about Halloween, and we're both relieved
to see it coming close to the end. And we're anxious to get on to the
next phase of work for SM-97.
I mentioned "burning aluminum" above. I call it that because when we
write to an optical disk, a laser in the drive blows little pits in a
very thin sheet of aluminum trapped between two layers of transparent
plastic. When we write, a high-powered laser burns out the pits. When
we read, a lower-powered laser reads back the pattern of those pits.
Once you've burned the pits into the aluminum, it's permanent. You can't
erase it, and we expect the disks to be good for at least 20 years, and
maybe as much as 100 years.
CD-ROMs (and music CDs) work basically the same way, but the pits are
"stamped" using a pressing machine, rather than blowing them out with
a laser. The low-power laser in your CD player works the same way as the
ones in our optical disk drives.
To test a new version of the programs that add data to the archive,
we run a standard set of test data through the system. There are about
700 files, for a total of 305 megabytes of data. That's about half of
a typical CD-ROM, or about one twentieth of our big disks. It takes a
couple hours to run all the data through, and that leaves me time to work
on my other problem.
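Just to sanity-check those fractions, here's a tiny sketch in Python
(assuming a typical CD-ROM holds about 650 megabytes; the capacity of our
big disks is only implied by the one-twentieth figure):

    test_set_mb = 305.0               # size of the standard test data set
    cdrom_mb = 650.0                  # assumed typical CD-ROM capacity
    print(test_set_mb / cdrom_mb)     # about 0.47 -- roughly half a CD-ROM
    print(test_set_mb * 20 / 1000.0)  # about 6.1 -- implies a big disk holds roughly 6 gigabytes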
The database work I'm doing involves figuring out how much space we
should reserve when we are making tapes to send to astronomers.
I'm trying to figure out just how big HST Datasets are. A dataset is
a collection of files that together hold all the data for an image or
spectrum. For WFPC-II (Wide-Field and Planetary Camera Two - the camera
that takes most HST pictures) this is a pretty constant number: about
25 megabytes in 10 files. It's pretty constant because the camera takes
the same sort of pictures all the time. Each picture is four "chips" in
an 800x800 array. (A typical PC screen has 1024x768 pixels -- a single
WFPC-II chip is just slightly smaller, but square.) There are a total
of about 40 bytes of information about each pixel, including calibrated
and uncalibrated values, quality information, and other stuff. Since the
size of the picture doesn't change, the size of the dataset doesn't change
either.
For the spectrographs, the size of the dataset can vary a lot. This
is because a single dataset can contain multiple spectra. In the case
of the Goddard High Resolution Spectrograph, it can vary from just 38
kilobytes to over 300 megabytes!
But what I want is a "pretty good" estimate of the size of each kind of
dataset, and I can use that to plan how much space I'll need to retrieve
a particular
set of data. To get a statistical look at the data, I have this nice complicated
query that gets the minimum, maximum, and average size of "Z-CAL" datasets.
"Z-CAL" datasets are CALibrated science data for the GHRS. (Each instrument
has a letter associated with it: U is for WFPC-II, X is for FOC, Z is
for GHRS.) Once I have all that data, I can also compute the "standard
deviation", which is a kind of average difference from the average size.
That gives me an idea of how much variation there is in size.
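To give a flavor of that computation, here's a little sketch in Python
(made-up sizes, not the real query I run against the database) that turns
a list of dataset sizes into those summary numbers:

    import math

    # Made-up Z-CAL dataset sizes, in kilobytes, just for illustration
    sizes = [38, 120, 450, 2048, 51200, 307200]

    minimum = min(sizes)
    maximum = max(sizes)
    average = sum(sizes) / len(sizes)

    # Standard deviation: the square root of the average squared
    # difference from the average
    variance = sum((s - average) ** 2 for s in sizes) / len(sizes)
    std_dev = math.sqrt(variance)

    print(minimum, maximum, average, std_dev)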
Here's another example: If ten people take a test, and they all score
between forty and sixty points, with an average of fifty points, that's
a pretty low standard deviation. If another group of ten take the test,
and half of them score about 20, while the other half score about 80,
the average would still be 50, but the standard deviation would be pretty
big.
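To put numbers on that, here's the same sort of calculation in Python with
two made-up sets of scores like the ones above; both groups average about
fifty points, but the spread is very different:

    import statistics

    group_one = [42, 45, 48, 50, 50, 51, 53, 55, 58, 60]   # everyone between 40 and 60
    group_two = [20, 20, 20, 20, 20, 80, 80, 80, 80, 80]   # half near 20, half near 80

    print(statistics.mean(group_one), statistics.pstdev(group_one))   # about 51 and 5
    print(statistics.mean(group_two), statistics.pstdev(group_two))   # exactly 50 and 30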
When you see a large standard deviation like that, you have to decide
if you're seeing different "populations". For example, if you have a test
aimed at eighth graders, and you get five people who score about 20, and
five who score about 80, the large deviation makes you wonder whether
the five who scored 20s were perhaps second graders!
In my case, I've discovered there are two types of GHRS observations:
short, small observations with one or a few spectra, and large observations
that have many spectra. The "mode" I see for those large observations is "RAPID",
and I'll have to get one of the astronomer types to explain that operating
mode to me.
That's the kind of math I do pretty regularly: Statistical analysis
of the contents of the archive. I rarely need to do any calculus, though
I know enough to understand how the mathematical "tools" I use work. But
I do a lot of algebra, and use programs that have statistical functions.
Well, my big test is finished, and while most things are working, there
are a couple of problems I need to work on. I'm going to take a break,
get something to drink, and see if I can spot that hawk before I tackle
those problems.