SETI@home: Internet Distributed Computing for SETI
D. P. Anderson, D. Werthimer, J. Cobb, E. Korpela, M. Lebofsky
Space Sciences Laboratory, University of California, Berkeley,
California
D. Gedye
Vulcan Northwest, Inc.
W. Sullivan
Department of Astronomy, University of Washington
Abstract.
SETI@home is a radio SETI project that does its primary
signal analysis using Internet-connected computers. In the first five months
of operation of SETI@home, a million people have participated, and have
contributed 100,000 years of computer time. The use of distributed computing
limits frequency coverage but allows greater sensitivity and generality in
the signal analysis.
1. Introduction
SETI@home is a radio SETI sky survey which, like SERENDIP IV (Werthimer
et al. 1997), gets data from a "piggyback" receiver at the Arecibo radio
telescope. Whereas SERENDIP analyzes this data primarily using a
special-purpose supercomputer located at the telescope, SETI@home distributes
the data through the Internet to hundreds of thousands of personal computers.
This approach provides a tremendous amount of computing power but limits the
amount of data that can be handled. Hence SETI@home covers a relatively narrow
frequency range (2.5 MHz) but searches for a wider range of signal types, and
with better sensitivity, than other SETI sky surveys to date.
SETI@home was launched on May 17, 1999. In its first five months, it
has attracted over a million participants. Together they have contributed over
100,000 years of computer time, making SETI@home the largest computation
ever performed.
2. Science Design
SETI@home is a SETI sky survey at the National Astronomy and Ionospheric
Center's 305 meter radio telescope in Arecibo, Puerto Rico. It shares the
piggyback receiver used by SERENDIP IV (Werthimer et al. 1997), but its search
space is roughly orthogonal to that of SERENDIP IV; although SETI@home has
1/40 the frequency coverage of SERENDIP IV, its sensitivity is ten times better.
The SETI@home search also covers a richer variety of signal bandwidths, drift
rates, and time scales than SERENDIP IV or any other SETI program to date.
Primary data analysis, done using distributed computing, computes power
spectra and searches for "candidate" signals such as spikes, Gaussians, and
other signal types. Secondary analysis, done on the project's own computers,
rejects RFI and searches for repeated events within the database of candidate
signals.
2.1. Receiver and Data Recording
SETI@home uses a dedicated flat feed and cryogenic receiver mounted on the
carriage house of the Arecibo telescope. The feed provides a single linear
polarization with a gain of 3 K/Jy and a 0.1 degree beam width. System
temperature is 45 K. The SETI@home survey covers 28% of the sky (declinations
ranging from +1 to +35 degrees) with a sensitivity of 3E-25 W/m^2. SETI@home
observations will span a total of two years, during which most of the sky will
be observed two or three times. Observations began in October 1998.
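The 28% figure can be checked from the declination range alone. The short sketch below is not part of the SETI@home software, and the function name is ours. It uses the standard result that the fraction of the celestial sphere between two declinations is (sin d2 - sin d1) / 2.

```python
import math

def sky_fraction(dec_min_deg, dec_max_deg):
    """Fraction of the whole sky lying between two declinations (degrees)."""
    lo = math.sin(math.radians(dec_min_deg))
    hi = math.sin(math.radians(dec_max_deg))
    return (hi - lo) / 2.0

# Declination limits quoted in the text.
fraction = sky_fraction(1.0, 35.0)
print(f"sky fraction covered: {fraction:.3f}")
```

The result rounds to the 28% quoted above.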
SETI@home covers a 2.5 MHz bandwidth centered at the 1420 MHz Hydrogen
line. The receiver output is down-converted with quadrature analog mixers
and filters, then digitized and converted to baseband by a digital quadrature
mixer and a pair of 256 tap finite impulse response low pass filters. The
resulting 2.5 MHz band is recorded continuously on 35 Gbyte DLT IV tapes with
two bit complex sampling, along with data on telescope coordinates, time and
engineering monitors. Tapes are mailed to UC Berkeley for analysis; the complete
sky survey requires 1100 tapes to record a total of 39 terabytes of data.
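The recording figures can be cross-checked with simple arithmetic. The sketch below rests on assumptions that the text does not state explicitly: "two bit complex sampling" is read as 2 bits per complex sample, the complex sample rate is taken to equal the 2.5 MHz bandwidth, a Gbyte is taken as 10^9 bytes, and the auxiliary data (coordinates, time, engineering monitors) is ignored.

```python
BANDWIDTH_HZ = 2.5e6          # recorded band, from the text
BITS_PER_COMPLEX_SAMPLE = 2   # assumption: 1 bit real + 1 bit imaginary
TAPE_BYTES = 35e9             # 35 Gbyte tape, decimal gigabytes assumed
NUM_TAPES = 1100              # tapes for the complete survey, from the text

# Assumption: one complex sample per second for every Hz of bandwidth.
bytes_per_second = BANDWIDTH_HZ * BITS_PER_COMPLEX_SAMPLE / 8
hours_per_tape = TAPE_BYTES / bytes_per_second / 3600
total_terabytes = NUM_TAPES * TAPE_BYTES / 1e12

print(f"data rate:      {bytes_per_second / 1e3:.0f} kbytes/s")
print(f"hours per tape: {hours_per_tape:.1f}")
print(f"survey volume:  {total_terabytes:.1f} terabytes")
```

Under these assumptions the survey volume comes out close to the 39 terabytes quoted above.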
We expect to record high quality data 65% of the time, observing each of
the one million beams two or three times during the two year program. It is
important to observe each beam several times because sources may scintillate
(Cordes, 1991) or have short duty cycles, and most of our robust detection
algorithms require multiple detections. SETI@home is able to collect useful
data whenever the telescope is stationary or the Gregorian feed is tracking a
source. When the Gregorian system tracks a source, the SETI@home feed is
moving at 1 to 2 times sidereal rate on the sky and a source remains in the
beam for 12 to 24 seconds. When the telescope is stationary, a source is in the
beam for 24 seconds.
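The 12 to 24 second figures follow from the 0.1 degree beam width and the sidereal rate. The sketch below is an illustrative check with names of our own choosing. It ignores the dependence of the drift rate on declination, so it reproduces the rounded values in the text rather than an exact transit time for any particular pointing.

```python
SIDEREAL_DAY_S = 86164.1   # one rotation of the Earth relative to the stars
BEAM_WIDTH_DEG = 0.1       # beam width quoted in Section 2.1

def transit_time_s(rate_multiple=1.0):
    """Seconds a source spends in the beam when the beam moves across
    the sky at rate_multiple times the sidereal rate."""
    sidereal_deg_per_s = 360.0 / SIDEREAL_DAY_S
    return BEAM_WIDTH_DEG / (rate_multiple * sidereal_deg_per_s)

stationary = transit_time_s(1.0)      # telescope parked, sky drifts past
tracking_fast = transit_time_s(2.0)   # feed moving at twice sidereal rate
print(f"{stationary:.1f} s at 1x sidereal, {tracking_fast:.1f} s at 2x")
```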
2.2. Primary Data Analysis
SETI@home data tapes from the Arecibo telescope are divided into small "work
units" as follows: the 2.5 MHz bandwidth data is first divided into 256
sub-bands by means of a 2048 point fast Fourier transform (FFT) and 256 eight
point inverse transforms. Each work unit consists of 107 seconds of data from
a given 9,765 Hz sub-band. Work units are then sent over the Internet to the
client programs for the primary data analysis.
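The sub-band numbers above fit together as follows. This is a bookkeeping sketch only, not the splitter itself. The sample count assumes one complex sample per second for every Hz of sub-band width, which the text does not state.

```python
BANDWIDTH_HZ = 2.5e6       # recorded band
FFT_POINTS = 2048          # forward transform length
INVERSE_POINTS = 8         # length of each inverse transform
WORK_UNIT_SECONDS = 107    # duration of one work unit

# Each group of 8 adjacent FFT bins is turned back into one sub-band.
num_subbands = FFT_POINTS // INVERSE_POINTS
subband_hz = BANDWIDTH_HZ / num_subbands

# Assumption: complex samples at a rate equal to the sub-band width.
samples_per_work_unit = subband_hz * WORK_UNIT_SECONDS

print(num_subbands, subband_hz, samples_per_work_unit)
```

The sub-band width comes out as 9765.625 Hz, which the text quotes truncated to 9,765 Hz.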
Because an extraterrestrial civilization's signal has unknown bandwidth and
time scale, the client software searches for signals at 15 octave spaced
bandwidths ranging from 0.075 Hz to 1220 Hz, and time scales from 0.8 ms to
13.4 seconds. The rest frame of the transmitter is also unknown (it may be on a
planet that is rotating and revolving), so extraterrestrial signals are likely
to be drifting in frequency with respect to the observatory's topocentric
reference frame. Because the reference frame is unknown, the client software
examines 6761 different Doppler acceleration frames of rest (dubbed "chirp
rates"), ranging from -10 Hz/sec to +10 Hz/sec.
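To illustrate what examining a chirp rate involves, the sketch below removes an assumed linear frequency drift from complex samples by counter-rotating their phase. It is a generic dechirping technique and is not taken from the client source. The sample rate, the sign convention and all function names are assumptions made for this example.

```python
import cmath
import math

SAMPLE_RATE_HZ = 9765.625  # assumption: one complex sample per Hz of sub-band

def make_chirped_tone(n, f0_hz, chirp_hz_per_s, fs=SAMPLE_RATE_HZ):
    """Complex tone that starts at f0_hz and drifts linearly in frequency."""
    out = []
    for k in range(n):
        t = k / fs
        phase = 2 * math.pi * (f0_hz * t + 0.5 * chirp_hz_per_s * t * t)
        out.append(cmath.exp(1j * phase))
    return out

def dechirp(samples, chirp_hz_per_s, fs=SAMPLE_RATE_HZ):
    """Undo a trial drift rate by multiplying by exp(-i * pi * r * t^2)."""
    out = []
    for k, s in enumerate(samples):
        t = k / fs
        out.append(s * cmath.exp(-1j * math.pi * chirp_hz_per_s * t * t))
    return out

def phase_steps(samples):
    """Phase advance between consecutive samples, in radians."""
    return [cmath.phase(b * a.conjugate()) for a, b in zip(samples, samples[1:])]

signal = make_chirped_tone(1024, f0_hz=100.0, chirp_hz_per_s=5.0)
before = phase_steps(signal)
after = phase_steps(dechirp(signal, 5.0))
spread_before = max(before) - min(before)   # frequency still drifting
spread_after = max(after) - min(after)      # constant frequency remains
print(spread_before, spread_after)
```

When the trial rate matches the true drift, the phase advances by a constant amount per sample, so the signal's power stays in a single frequency bin of a later FFT.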
At each chirp rate, peak searching is done by computing non-overlapping
FFTs and their resulting power spectra. FFT lengths range from 8 to 131,072
in 15 octave steps. Peaks greater than 22 times the mean power are recorded
and sent back to the SETI@home server for further analysis.
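The FFT lengths above map directly onto the bandwidth and time-scale ranges quoted earlier, and the peak search is a simple threshold test. The sketch below shows both. It uses a naive DFT to stay dependency-free and takes the mean power over each block's own spectrum. Both choices are assumptions of this example, not a description of the client code.

```python
import cmath
import math

SUBBAND_HZ = 9765.625   # 2.5 MHz / 256 sub-bands
THRESHOLD = 22.0        # peaks must exceed 22 times the mean power

FFT_LENGTHS = [8 * 2 ** k for k in range(15)]   # 8, 16, ..., 131072

def resolution(n):
    """(frequency resolution in Hz, time resolution in s) of an n-point FFT."""
    return SUBBAND_HZ / n, n / SUBBAND_HZ

def power_spectrum(block):
    """Naive O(n^2) DFT power spectrum; a real client would use an FFT."""
    n = len(block)
    spectrum = []
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(block))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def find_peaks(samples, n):
    """Split samples into non-overlapping n-point blocks and return
    (block index, bin index, power / mean power) for bins over threshold."""
    hits = []
    for b in range(len(samples) // n):
        power = power_spectrum(samples[b * n:(b + 1) * n])
        mean = sum(power) / n
        hits.extend((b, k, p / mean)
                    for k, p in enumerate(power) if p > THRESHOLD * mean)
    return hits

# A pure tone centred on bin 5 of a 64-point transform, two blocks long.
tone = [cmath.exp(2j * math.pi * 5 * i / 64) for i in range(128)]
hits = find_peaks(tone, 64)
print([(b, k) for b, k, _ in hits])
```

The shortest and longest lengths give about 1220 Hz and 0.075 Hz resolution, and about 0.8 ms and 13.4 s duration, matching the ranges in the text. With a per-block mean, no bin can exceed n times the mean, so the shortest transforms could never trigger. The actual client presumably normalizes against a broader mean.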
Besides searching for peaks in the multi-spectral-resolution data,
SETI@home also searches for signals that match the telescope's Gaussian beam
pattern. Gaussian beam fitting is computed at every frequency and every chirp
rate at spectral resolutions ranging from 0.6 to 1220 Hz (temporal resolutions
from 0.8 ms to 1.7 seconds). The beam fitting algorithm attempts to fit a
Gaussian curve at each time and frequency in the multi-resolution spectral
data, of the form:
P = B + A e^{-\left( \frac{(t - t_0)^2}{b} \right)}
where:
P = predicted power
B = baseline power
A = peak power
t = time
t_0 = time of Gaussian peak
b = half power beamwidth
B, A, and t_0 are free parameters in the fit, but the beamwidth is known,
calculated from the slew rate of the telescope beam for each work unit. Gaussian
fits whose A/B exceeds 3.2 and whose chi-squared < 10 are reported by the client
software to the server for secondary analysis.
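Because the beamwidth is known, the model is linear in B and A once a trial peak time is fixed, so one simple way to fit it is to scan the peak time and solve a two-parameter least-squares problem at each step. The sketch below takes that approach. It is an illustration, not the client's fitting routine. The scan over sample times, the reduced chi-squared normalization by a caller-supplied noise level sigma, and all function names are assumptions. The exponent follows the formula as printed, so b must be supplied in units that make it dimensionless.

```python
import math

def gaussian(t, t0, b):
    """Beam profile term of the fit formula: exp(-(t - t0)^2 / b)."""
    return math.exp(-((t - t0) ** 2) / b)

def fit_fixed_t0(times, powers, t0, b):
    """Least-squares B and A for one trial t0, plus the residual sum of squares."""
    g = [gaussian(t, t0, b) for t in times]
    n = len(times)
    sg, sgg = sum(g), sum(x * x for x in g)
    sp, spg = sum(powers), sum(p * x for p, x in zip(powers, g))
    det = n * sgg - sg * sg
    if det == 0:
        raise ValueError("profile is constant over the data; fit is degenerate")
    A = (n * spg - sg * sp) / det
    B = (sp - A * sg) / n
    rss = sum((p - (B + A * x)) ** 2 for p, x in zip(powers, g))
    return B, A, rss

def fit_gaussian(times, powers, b, sigma):
    """Try every sample time as t0; return (B, A, t0, reduced chi-squared)."""
    best = None
    for t0 in times:
        B, A, rss = fit_fixed_t0(times, powers, t0, b)
        if best is None or rss < best[3]:
            best = (B, A, t0, rss)
    B, A, t0, rss = best
    chi_sq = rss / (sigma ** 2 * (len(times) - 3))   # 3 fitted parameters
    return B, A, t0, chi_sq

def is_candidate(B, A, chi_sq, min_ratio=3.2, max_chi_sq=10.0):
    """Apply the reporting thresholds quoted in the text."""
    return B > 0 and A / B > min_ratio and chi_sq < max_chi_sq

# Synthetic, noise-free example: baseline 1, peak 5, centred at t = 20.
times = [float(i) for i in range(40)]
powers = [1.0 + 5.0 * gaussian(t, 20.0, 50.0) for t in times]
B, A, t0, chi_sq = fit_gaussian(times, powers, b=50.0, sigma=0.1)
print(B, A, t0, is_candidate(B, A, chi_sq))
```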
We plan to extend the primary analysis to search for pulsed signals using the
Fast Folding Algorithm (FFA) (Staelin 1969) and to search for regularly-spaced
triplet peaks.
2.3. Secondary Data Analysis
Most of the signals found by the client programs turn out to be terrestrial
based radio frequency interference (RFI). We employ a substantial number of
algorithms to reject the several types of RFI (see Cobb et al., these
proceedings).
After the RFI is rejected, we search the remaining data set for multiple
detections in any reference frame, giving higher weights to drifting or pulsed
signals, those that repeat in the barycentric frame, that match the antenna
beam pattern, or detections coincident with newly detected planets, nearby stars
(from the Gliese catalog) or globular clusters (again, details in Cobb et al.).
We compare candidate signals with SERENDIP IV data, and will follow up
interesting candidates with dedicated observations.
3. Software Design
The SETI@home software can be divided into two parts: the "client", the
program that runs on volunteer computers, and the "server", which runs on
computers at UC Berkeley.
3.1. Server
The indexing and searching capabilities of a relational database are critical
for the huge volumes of information handled by SETI@home. The SETI@home
server uses a database containing tables for:
Users (name, email address, work completed).
Accounting records storing the amount of work done.
Country, CPU type, Internet domain, and so on.
Tapes processed.
Work units, whether on disk or not.
Results (per work unit).
Platforms (types of client computers).
Versions of the client software.
For performance, the database is divided between two servers, each running
Informix, a commercial database server.
The functions of the server are divided among several programs (see Figure
1):
Data Server: This program communicates with clients. It sends work
units, accepts results, and handles requests to create new user accounts.
It must handle about 10-15 requests per second, some of which can take
up to a minute to complete, so several hundred copies of the program are
run concurrently.
Splitter: This program converts raw data into work units, as described in
Section 2.2. Splitting is slower than real time on a Sparc Ultra 10, so we
run the splitter on several machines.
Disk Cleaner: This program deletes work units, making room for new work
units. It deletes work units for which a result has been received, and if disk
space is low it also deletes work units that have been sent several times.
Accountant: This program scans flat files describing results returned, and
updates accounting records. It maintains a memory cache of
frequently-accessed records, minimizing database traffic.
CGI program: This program, invoked from an Apache web server, provides
database-driven features on the web site, such as Groups and Polls.
Web page generator: This program generates frequently-accessed dynamic
web pages such as Totals and Country Totals (generated every hour) and
Group pages (generated every 24 hours).
All these programs are written in C++ using ESQL (embedded SQL) for
database access.
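As an illustration of the Disk Cleaner policy described above, the following sketch selects work units for deletion. It is written in Python for brevity, whereas the real programs are C++ with ESQL. The record layout, the function name and the min_sends threshold standing in for "sent several times" are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class WorkUnit:
    name: str
    times_sent: int          # how many clients this unit has been sent to
    result_received: bool    # whether any client has returned a result

def select_for_deletion(work_units, disk_is_low, min_sends=3):
    """Return the work units the cleaner would delete on this pass.

    min_sends is a hypothetical value for "sent several times"."""
    doomed = []
    for wu in work_units:
        if wu.result_received:
            doomed.append(wu)            # always safe to delete
        elif disk_is_low and wu.times_sent >= min_sends:
            doomed.append(wu)            # sacrificed only under disk pressure
    return doomed

units = [
    WorkUnit("a", times_sent=1, result_received=True),
    WorkUnit("b", times_sent=4, result_received=False),
    WorkUnit("c", times_sent=1, result_received=False),
]
print([wu.name for wu in select_for_deletion(units, disk_is_low=False)])
print([wu.name for wu in select_for_deletion(units, disk_is_low=True)])
```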
Figure 1. The SETI@home server system. Each rectangle represents a
computer. Shaded ovals represent programs developed by
SETI@home. The flow of radio telescope data is shown with heavy
lines.
Figure 2. The client software architecture. Platform-dependent
parts are in dotted boxes. SETI-specific parts are in shaded boxes.
3.2. Client
The client program is architected so that it can be easily ported to many
platforms, and so that it can be retargeted for computations other than
SETI@home (see Figure 2).
On Windows and Macintosh the program acts as a screensaver: it runs only
when the user is not active. When the program runs, its main loop repeatedly
attempts to connect to the server, get a work unit, process it, and return the
result. While analyzing data, the program periodically writes a "checkpoint"
file to disk, so that it will resume from the same place if the computer is
stopped and restarted.
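The checkpointing behaviour described above can be sketched as follows. This is a minimal illustration in Python, not the client's actual code. The JSON file format, the file name, the work unit layout and the function names are all hypothetical.

```python
import json
import os

CHECKPOINT_PATH = "checkpoint.json"   # hypothetical file name

def load_checkpoint(path=CHECKPOINT_PATH):
    """Return the saved state, or None if there is no usable checkpoint."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None

def save_checkpoint(state, path=CHECKPOINT_PATH):
    """Write to a temporary file and rename it, so that a crash in the
    middle of a write cannot leave a truncated checkpoint behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def process_work_unit(work_unit, analyze_step, path=CHECKPOINT_PATH, every=10):
    """Apply analyze_step to each step of the work unit, resuming from a
    checkpoint if one exists for this work unit."""
    state = load_checkpoint(path)
    if state is None or state.get("name") != work_unit["name"]:
        state = {"name": work_unit["name"], "next_step": 0, "results": []}
    steps = work_unit["steps"]
    for i in range(state["next_step"], len(steps)):
        state["results"].append(analyze_step(steps[i]))
        state["next_step"] = i + 1
        if state["next_step"] % every == 0:
            save_checkpoint(state, path)     # periodic checkpoint
    if os.path.exists(path):
        os.remove(path)                      # finished; nothing to resume
    return state["results"]
```

If the process is interrupted, the next call with the same work unit picks up at the last saved step rather than starting over. At most `every` steps of work are repeated.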
The client program has been ported to about 50 platforms. Two of these
(Windows and Macintosh) required platform-specific programming. The other
platforms (Linux and other forms of UNIX, BeOS, OS/2, VMS) are POSIX
compliant and support the GNU development environment; they all compile from
the same source code.
To ensure that each version is numerically correct, we test each one on
a "reference work unit" and validate its result before making it available for
download.
4. User Involvement
The SETI@home project was announced early in 1998 and launched on May 17,
1999. During the interim our web site allowed people to sign up for
notification. We received over 400,000 such requests. Within 2 weeks of the
launch there were 200,000 users, and the number has steadily increased, to a
current total of 1,300,000. Users come from 224 countries, and about 50% of the
users are from outside the U.S.
4.1. Web site
The SETI@home web site (http://setiathome.berkeley.edu) serves to attract
SETI@home users, to educate them, and to maintain their interest and
involvement in the project. The web site has many functions:
It allows users to download the client program.
It has educational material about SETI in general and SETI@home in
particular. Separate versions are aimed at general and scientific audiences.
It has a Frequently Asked Questions (FAQ) section for common user
problems, and a bug report submission form.
It has News sections, updated every few days, for current general and
technical information.
It shows current usage statistics (work units completed, CPU time) both in
total and broken down by various criteria (Internet domains, CPU types,
countries, top 100 users, etc.).
It shows current science results: a map of the sky showing where data has
been analyzed, graphs of the distributions of spikes and Gaussians with
respect to frequency and chirp rate, and so on.
4.2. Groups
The web site allows users to form "groups", typically of users within a company
or school. These groups are divided into nine categories: companies (large,
medium, small), schools (primary, secondary, 2-year, university), government
agencies, and clubs. Users can see the top 100 groups (ordered by usage) within
a category, and can search for groups by name.
The group mechanism has been very popular. Over 25,000 groups have
been formed. Group leaders, in order to increase the standing of their group,
often actively recruit new SETI@home users; this has expanded our user base.
4.3. Polls
In an effort to learn more about our users, we conducted a poll on the web site.
This poll has questions in several areas. Some examples:
Demographics: 92.7% of users are male.
Attitudes about SETI: 95.6% of users think that life exists outside Earth;
only 9.2% think that humans will detect an extraterrestrial signal within
2 years.
Attitudes about distributed computing: 38% of users leave their computer
on 24 hours a day because of SETI@home. 34% run SETI@home on more
than one computer.
5. Conclusion
In its first five months, SETI@home has performed the largest computation in
history. While it is not clear if other research projects will have the same
mass appeal as does SETI, this clearly shows the viability of distributed
computing for other scientific problems. We are investigating the adaptation of
the SETI@home infrastructure for handling other problems.
Applied to radio SETI, distributed computing allows greater sensitivity and
generality than dedicated supercomputers. However, limitations in the rates of
recording and sending data limit the frequency range that can be handled.
References
Cobb, J., Lebofsky, M., Werthimer, D., Bowyer, S., & Lampton, M. 2000, this
volume
Cordes, J., Lazio, T., Joseph, W., & Sagan, C. 1997, ApJ, 487, 782
Staelin, D. H. 1969, Proc. IEEE, 57, 724
Sullivan, W., Werthimer, D., Bowyer, S., Cobb, J., Gedye, D., & Anderson, D.
1997, in Astronomical and Biochemical Origins and the Search for Life
in the Universe, ed: Cosmovici, Bowyer and Werthimer
Werthimer, D., Bowyer, S., Ng, D., Donnelly, C., Cobb, J., Lampton, M., &
Airieau, S. 1997, in Astronomical and Biochemical Origins and the Search
for Life in the Universe, ed: Cosmovici, Bowyer and Werthimer