
Anderson Bioast99

SETI@home: Internet Distributed Computing for SETI

D. P. Anderson, D. Werthimer, J. Cobb, E. Korpela, M. Lebofsky
Space Sciences Laboratory, University of California, Berkeley, California

D. Gedye
Vulcan Northwest, Inc.

W. Sullivan
Department of Astronomy, University of Washington

Abstract. SETI@home is a radio SETI project that does its primary signal analysis using Internet-connected computers. In the first five months of operation of SETI@home, a million people have participated, and have contributed 100,000 years of computer time. The use of distributed computing limits frequency coverage but allows greater sensitivity and generality in the signal analysis.

1. Introduction

SETI@home is a radio SETI sky survey which, like SERENDIP IV (Werthimer et al. 1997), gets data from a "piggyback" receiver at the Arecibo radio telescope. Whereas SERENDIP analyzes this data primarily using a special-purpose supercomputer located at the telescope, SETI@home distributes the data through the Internet to hundreds of thousands of personal computers. This approach provides a tremendous amount of computing power but limits the amount of data that can be handled. Hence SETI@home covers a relatively narrow frequency range (2.5 MHz) but searches for a wider range of signal types, and with better sensitivity, than other SETI sky surveys to date.

SETI@home was launched on May 17, 1999. In its first five months, it has attracted over a million participants. Together they have contributed over 100,000 years of computer time, making SETI@home the largest computation ever performed.

2. Science Design

SETI@home is a SETI sky survey at the National Astronomy and Ionospheric Center's 305 meter radio telescope in Arecibo, Puerto Rico. It shares the piggyback receiver used by SERENDIP IV (Werthimer et al. 1997), but its search space is roughly orthogonal to that of SERENDIP IV; although SETI@home has 1/40 the frequency coverage of SERENDIP IV, its sensitivity is ten times better.
The SETI@home search also covers a richer variety of signal bandwidths, drift rates, and time scales than SERENDIP IV or any other SETI program to date.

Primary data analysis, done using distributed computing, computes power spectra and searches for "candidate" signals such as spikes, Gaussians, and other signal types. Secondary analysis, done on the project's own computers, rejects RFI and searches for repeated events within the database of candidate signals.

2.1. Receiver and Data Recording

SETI@home uses a dedicated flat feed and cryogenic receiver mounted on the carriage house of the Arecibo telescope. The feed provides a single linear polarization with a gain of 3 K/Jy and a 0.1 degree beam width. System temperature is 45 K. The SETI@home survey covers 28% of the sky (declinations ranging from +1 to +35 degrees) with a sensitivity of 3E-25 W/m^2. SETI@home observations will span a total of two years, during which most of the sky will be observed two or three times. Observations began in October 1998.

SETI@home covers a 2.5 MHz bandwidth centered at the 1420 MHz Hydrogen line. The receiver output is down-converted with quadrature analog mixers and filters, then digitized and converted to baseband by a digital quadrature mixer and a pair of 256 tap finite impulse response low pass filters. The resulting 2.5 MHz band is recorded continuously on 35 Gbyte DLT IV tapes with two bit complex sampling, along with data on telescope coordinates, time and engineering monitors. Tapes are mailed to UC Berkeley for analysis; the complete sky survey requires 1100 tapes to record a total of 39 terabytes of data.

We expect to record high quality data 65% of the time, observing each of the one million beams two or three times during the two year program. It is important to observe each beam several times because sources may scintillate (Cordes, 1991) or have short duty cycles, and most of our robust detection algorithms require multiple detections.
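The tape and data-volume figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that "two bit complex sampling" means one bit each for the real and imaginary parts, that a 2.5 MHz complex band is sampled at 2.5 million complex samples per second, and that the ancillary telescope data is negligible. All variable names are hypothetical.

```python
# Back-of-envelope check of the recording figures quoted in the text.
# Assumption: "two bit complex sampling" = 1 bit real + 1 bit imaginary
# per complex sample, at 2.5e6 complex samples per second.
BANDWIDTH_HZ = 2.5e6
BITS_PER_COMPLEX_SAMPLE = 2
TAPE_BYTES = 35e9                    # one 35 Gbyte DLT IV tape
SECONDS_PER_YEAR = 365.25 * 86400

bytes_per_second = BANDWIDTH_HZ * BITS_PER_COMPLEX_SAMPLE / 8
hours_per_tape = TAPE_BYTES / bytes_per_second / 3600
survey_bytes = 2 * SECONDS_PER_YEAR * bytes_per_second   # two years, continuous
tapes_needed = survey_bytes / TAPE_BYTES

print(f"data rate     : {bytes_per_second / 1e3:.0f} kB/s")
print(f"time per tape : {hours_per_tape:.1f} hours")
print(f"survey volume : {survey_bytes / 1e12:.1f} TB")
print(f"tapes needed  : {tapes_needed:.0f}")
```

Under these assumptions the data rate is 625 kB/s, a tape fills in roughly 15.6 hours, and two years of continuous recording comes to about 39 TB on roughly 1,130 tapes, which is in line with the 39 terabytes and 1100 tapes quoted above.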
SETI@home is able to collect useful data whenever the telescope is stationary or the Gregorian feed is tracking a source. When the Gregorian system tracks a source, the SETI@home feed is moving at 1 to 2 times sidereal rate on the sky and a source remains in the beam for 12 to 24 seconds. When the telescope is stationary, a source is in the beam for 24 seconds.

2.2. Primary Data Analysis

SETI@home data tapes from the Arecibo telescope are divided into small "work units" as follows: the 2.5 MHz bandwidth data is first divided into 256 sub-bands by means of a 2048 point fast Fourier transform (FFT) and 256 eight point inverse transforms. Each work unit consists of 107 seconds of data from a given 9,765 Hz sub-band. Work units are then sent over the Internet to the client programs for the primary data analysis.

Because an extraterrestrial civilization's signal has unknown bandwidth and time scale, the client software searches for signals at 15 octave spaced bandwidths ranging from 0.075 Hz to 1220 Hz, and time scales from 0.8 msec to 13.4 seconds.

The rest frame of the transmitter is also unknown (it may be on a planet that is rotating and revolving), so extraterrestrial signals are likely to be drifting in frequency with respect to the observatory's topocentric reference frame. Because the reference frame is unknown, the client software examines 6761 different Doppler acceleration frames of rest (dubbed "chirp rates"), ranging from -10 Hz/sec to +10 Hz/sec.

At each chirp rate, peak searching is done by computing non-overlapping FFTs and their resulting power spectra. FFT lengths range from 8 to 131,072 in 15 octave steps. Peaks greater than 22 times the mean power are recorded and sent back to the SETI@home server for further analysis.

Besides searching for peaks in the multi-spectral-resolution data, SETI@home also searches for signals that match the telescope's Gaussian beam pattern.
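The peak search described above (remove a trial drift, take non-overlapping FFTs, flag bins above 22 times the mean power) can be sketched in a few dozen lines. This is a minimal illustration, not the client's actual code: the function names, the textbook radix-2 FFT, and the noiseless synthetic tone are all assumptions, and the demo drift rate is deliberately exaggerated far beyond the ±10 Hz/sec range so that a very short record shows the effect.

```python
import cmath
import math

SUBBAND_HZ = 2.5e6 / 256                       # 9765.625 Hz per work unit
FFT_LENGTHS = [2 ** k for k in range(3, 18)]   # 8 ... 131072, 15 octave steps
SPIKE_THRESHOLD = 22.0                         # peak power / mean power


def fft(x):
    """Textbook recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dechirp(samples, chirp_hz_per_s, fs=SUBBAND_HZ):
    """Remove a linear frequency drift by counter-rotating the phase."""
    return [s * cmath.exp(-1j * math.pi * chirp_hz_per_s * (i / fs) ** 2)
            for i, s in enumerate(samples)]


def find_spikes(samples, fft_len, threshold=SPIKE_THRESHOLD):
    """Return (block, bin, power/mean) for peaks in non-overlapping FFTs."""
    hits = []
    for start in range(0, len(samples) - fft_len + 1, fft_len):
        power = [abs(v) ** 2 for v in fft(samples[start:start + fft_len])]
        mean = sum(power) / fft_len
        for b, p in enumerate(power):
            if p > threshold * mean:
                hits.append((start // fft_len, b, p / mean))
    return hits


# Spectral resolutions implied by the sub-band width and the FFT lengths.
finest_hz = SUBBAND_HZ / FFT_LENGTHS[-1]       # about 0.075 Hz
coarsest_hz = SUBBAND_HZ / FFT_LENGTHS[0]      # about 1220 Hz

# Synthetic demo: a noiseless tone with a hypothetical, exaggerated drift.
DEMO_CHIRP = 1.0e5                             # Hz/sec, NOT a realistic value
N, FFT_LEN, BIN = 1024, 256, 100
f0 = BIN * SUBBAND_HZ / FFT_LEN
signal = [cmath.exp(1j * (2 * math.pi * f0 * t + math.pi * DEMO_CHIRP * t * t))
          for t in (i / SUBBAND_HZ for i in range(N))]

raw_hits = find_spikes(signal, FFT_LEN)                          # drift smears the tone
fixed_hits = find_spikes(dechirp(signal, DEMO_CHIRP), FFT_LEN)   # tone is recovered
print(len(raw_hits), len(fixed_hits))
```

Without the drift correction the tone's power is smeared across many bins and nothing crosses the threshold; at the matching chirp rate all of it lands in a single bin of each block. A real search would repeat this for every chirp rate and every FFT length, which is where the bulk of the computation goes.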
Gaussian beam fitting is computed at every frequency and every chirp rate at spectral resolutions ranging from 0.6 to 1220 Hz (temporal resolutions from 0.8 msec to 1.7 seconds). The beam fitting algorithm attempts to fit a Gaussian curve at each time and frequency in the multi-resolution spectral data, of the form:

\[ P = B + A\,e^{-(t - t_0)^2 / b^2} \]

where:

P = predicted power
B = baseline power
A = peak power
t = time
t_0 = time of Gaussian peak
b = half power beamwidth

B, A, and t_0 are free parameters in the fit, but the beamwidth is known, calculated from the slew rate of the telescope beam for each work unit. Gaussian fits whose A/B exceeds 3.2 and whose chi-squared < 10 are reported by the client software to the server for secondary analysis.

We plan to extend the primary analysis to search for pulsed signals using the Fast Folding Algorithm (FFA) (Staelin 1969) and to search for regularly-spaced triplet peaks.

2.3. Secondary Data Analysis

Most of the signals found by the client programs turn out to be terrestrial-based radio frequency interference (RFI). We employ a substantial number of algorithms to reject the several types of RFI (see Cobb et al., these proceedings). After the RFI is rejected, we search the remaining data set for multiple detections in any reference frame, giving higher weights to drifting or pulsed signals, those that repeat in the barycentric frame, those that match the antenna beam pattern, or detections coincident with newly detected planets, nearby stars (from the Gliese catalog) or globular clusters (again, details in Cobb et al.). We compare candidate signals with SERENDIP IV data, and will follow up interesting candidates with dedicated observations.

3. Software Design

The SETI@home software can be divided into two parts: the "client", the program that runs on volunteer computers, and the "server", which runs on computers at UC Berkeley.

3.1. Server

The indexing and searching capabilities of a relational database are critical for the huge volumes of information handled by SETI@home. The SETI@home server uses a database containing tables for:

• Users (name, email address, work completed).
• Accounting records storing the amount of work done.
• Country, CPU type, Internet domain, and so on.
• Tapes processed.
• Work units, whether on disk or not.
• Results (per work unit).
• Platforms (types of client computers).
• Versions of the client software.

For performance, the database is divided between two servers, each running Informix, a commercial database server.

The functions of the server are divided among several programs (see Figure 1):

• Data Server: This program communicates with clients. It sends work units, accepts results, and handles requests to create new user accounts. It must handle about 10-15 requests per second, some of which can take up to a minute to complete, so several hundred copies of the program are run concurrently.
• Splitter: This program converts raw data into work units, as described in Section 2.2. Splitting is slower than real time on a Sparc Ultra 10, so we run the splitter on several machines.
• Disk Cleaner: This program deletes work units, making room for new work units. It deletes work units for which a result has been received, and if disk space is low it also deletes work units that have been sent several times.
• Accountant: This program scans flat files describing results returned, and updates accounting records. It maintains a memory cache of frequently-accessed records, minimizing database traffic.
• CGI program: This program, invoked from an Apache web server, provides database-driven features on the web site, such as Groups and Polls.
• Web page generator: This program generates frequently-accessed dynamic web pages such as Totals and Country Totals (generated every hour) and Group pages (generated every 24 hours).

All these programs are written in C++ using ESQL (embedded SQL) for database access.

Figure 1. The SETI@home server system. Each rectangle represents a computer. Shaded ovals represent programs developed by SETI@home. The flow of radio telescope data is shown with heavy lines.

Figure 2. The client software architecture. Platform-dependent parts are in dotted boxes. SETI-specific parts are in shaded boxes.

3.2. Client

The client program is architected so that it can be easily ported to many platforms, and so that it can be retargeted for computations other than SETI@home (see Figure 2).

On Windows and Macintosh the program acts as a screensaver: it runs only when the user is not active. When the program runs, its main loop repeatedly attempts to connect to the server, get a work unit, process it, and return the result. While analyzing data, the program periodically writes a "checkpoint" file to disk, so that it will resume from the same place if the computer is stopped and restarted.

The client program has been ported to about 50 platforms. Two of these (Windows and Macintosh) required platform-specific programming. The other platforms (Linux and other forms of UNIX, BeOS, OS/2, VMS) are POSIX compliant and support the GNU development environment; they all compile from the same source code. To ensure that each version is numerically correct, we test each one on a "reference work unit" and validate its result before making it available for download.

4. User Involvement

The SETI@home project was announced early in 1998 and launched on May 17, 1999. During the interim our web site allowed people to sign up for notification. We received over 400,000 such requests. Within 2 weeks of the launch there were 200,000 users, and the number has steadily increased, to a current total of 1,300,000. Users come from 224 countries, and about 50% of the users are from outside the U.S.

4.1. Web site

The SETI@home web site (http://setiathome.berkeley.edu) serves to attract SETI@home users, to educate them, and to maintain their interest and involvement in the project.
The web site has many functions:

• It allows users to download the client program.
• It has educational material about SETI in general and SETI@home in particular. Separate versions are aimed at general and scientific audiences.
• It has a Frequently Asked Questions (FAQ) section for common user problems, and a bug report submission form.
• It has News sections, updated every few days, for current general and technical information.
• It shows current usage statistics (work units completed, CPU time) both in total and broken down by various criteria (Internet domains, CPU types, countries, top 100 users, etc.).
• It shows current science results: a map of the sky showing where data has been analyzed, graphs of the distributions of spikes and Gaussians with respect to frequency and chirp rate, and so on.

4.2. Groups

The web site allows users to form "groups", typically of users within a company or school. These groups are divided into nine categories: companies (large, medium, small), schools (primary, secondary, 2-year, university), government agencies, and clubs. Users can see the top 100 (ordered by usage) groups within a category, and can search for groups by name.

The group mechanism has been very popular. Over 25,000 groups have been formed. Group leaders, in order to increase the standing of their group, often actively recruit new SETI@home users; this has expanded our user base.

4.3. Polls

In an effort to learn more about our users, we conducted a poll on the web site. This poll has questions in several areas. Some examples:

• Demographics: 92.7% of users are male.
• Attitudes about SETI: 95.6% of users think that life exists outside Earth; only 9.2% think that humans will detect an extraterrestrial signal within 2 years.
• Attitudes about distributed computing: 38% of users leave their computer on 24 hours a day because of SETI@home. 34% run SETI@home on more than one computer.

5. Conclusion

In its first five months, SETI@home has performed the largest computation in history.
While it is not clear if other research projects will have the same mass appeal as does SETI, this clearly shows the viability of distributed computing for other scientific problems. We are investigating the adaptation of the SETI@home infrastructure for handling other problems.

Applied to radio SETI, distributed computing allows greater sensitivity and generality than dedicated supercomputers. However, limitations in the rates of recording and sending data limit the frequency range that can be handled.

References

Cobb, J., Lebofsky, M., Werthimer, D., Bowyer, S., & Lampton, M. 2000, this volume
Cordes, J., Lazio, T. J. W., & Sagan, C. 1997, ApJ, 487, 782
Staelin, D. H. 1969, Proc. IEEE, 57, 724
Sullivan, W., Werthimer, D., Bowyer, S., Cobb, J., Gedye, D., & Anderson, D. 1997, in Astronomical and Biochemical Origins and the Search for Life in the Universe, ed. Cosmovici, Bowyer, & Werthimer
Werthimer, D., Bowyer, S., Ng, D., Donnelly, C., Cobb, J., Lampton, M., & Airieau, S. 1997, in Astronomical and Biochemical Origins and the Search for Life in the Universe, ed. Cosmovici, Bowyer, & Werthimer