Demo Notebook
Demo Notebook
Demo Notebook
Let’s take a look at the training data for the lunar dataset. In addition to the data itself, we
include a catalog that will tell you which events happen when in the data. The catalog includes
the name of the file, the absolute time, the relative time in seconds (relative to the start of the
file), the event ID (evid), and the type of moonquake. The types of moonquakes include impacts,
deep moonquakes, and shallow moonquakes. You do not have to worry about predicting the type
of moonquakes, that’s just fun information for you to know!
Note: For your prediction, feel free to include either the absolute time or relative time, just make
sure to mark it using the same header in the CSV file so we can easily score it!
1
75 xa.s12.00.mhz.1975-06-26HR00_evid00198 1975-06-26T03:24:00.000000
arrival_time
[4]: # If we want the value of relative time, we don't need to use datetime
arrival_time_rel = row['time_rel(sec)']
arrival_time_rel
[4]: 72060.0
[5]: 'xa.s12.00.mhz.1970-06-26HR00_evid00009'
2
data_cat = pd.read_csv(csv_file)
data_cat
3
What if you wanted to plot in absolute time instead? The operations are very similar, just with a
little extra datetime. It takes a bit longer, so we recommend working in relative time to start with!
csv_data = np.array(data_cat['velocity(m/s)'].tolist())
4
2.2.1 Alternatively: read the miniseed file corresponding to that detection
Same procedure as above, just using the miniseed file.
[10]: # The stream file also contains some useful header information
st[0].stats
[10]: network: XA
station: S12
location: 00
channel: MHZ
starttime: 1970-06-26T00:00:00.116000Z
endtime: 1970-06-27T00:00:03.436755Z
sampling_rate: 6.625
delta: 0.1509433962264151
npts: 572423
calib: 1.0
_format: MSEED
mseed: AttribDict({'dataquality': 'D', 'number_of_records': 1136,
'encoding': 'FLOAT64', 'byteorder': '>', 'record_length': 4096, 'filesize':
4653056})
[11]: # This is how you get the data and the time, which is in seconds
tr = st.traces[0].copy()
5
tr_times = tr.times()
tr_data = tr.data
# Start time of trace (another way to get the relative arrival time using␣
↪datetime)
starttime = tr.stats.starttime.datetime
arrival = (arrival_time - starttime).total_seconds()
arrival
[11]: 72059.884
# Plot trace
ax.plot(tr_times,tr_data)
# Mark detection
ax.axvline(x = arrival, color='red',label='Rel. Arrival')
ax.legend(loc='upper left')
6
There are multiple ways that we can do the absolute time using datetime, here is a simple way
using the .timedelta method
# Plot trace
ax.plot(tr_times_dt,tr_data)
# Mark detection
arrival_line = ax.axvline(x=arrival_time, c='red', label='Abs. Arrival')
ax.legend(handles=[arrival_line])
It’s completely up to you whether to work with the CSV file or the miniseed files. We recommend
working with the miniseed file as it’s a bit faster to run.
7
[14]: # Set the minimum frequency
minfreq = 0.5
maxfreq = 1.0
[15]: # To better see the patterns, we will create a spectrogram using the scipy␣
↪function
# It requires the sampling rate, which we can get from the miniseed header as␣
↪shown a few cells above
# Mark detection
ax.axvline(x = arrival, color='red',label='Detection')
ax.legend(loc='upper left')
ax2 = plt.subplot(2, 1, 2)
vals = ax2.pcolormesh(t, f, sxx, cmap=cm.jet, vmax=5e-17)
ax2.set_xlim([min(tr_times_filt),max(tr_times_filt)])
ax2.set_xlabel(f'Time (Day Hour:Minute)', fontweight='bold')
ax2.set_ylabel('Frequency (Hz)', fontweight='bold')
ax2.axvline(x=arrival, c='red')
cbar = plt.colorbar(vals, orientation='horizontal')
cbar.set_label('Power ((m/s)^2/sqrt(Hz))', fontweight='bold')
8
3 Sample short-term average / long-term average (STA/LTA) de-
tection algorithm
A STA/LTA algorithm moves two time windows of two lengths (one short, one long) across the
seismic data. The algorithm calculates the average amplitude in both windows, and calculates the
ratio between them. If the data contains an earthquake, then the short-term window containing
the earthquake will be much larger than the long-term window – resulting in a detection.
9
df = tr.stats.sampling_rate
# How long should the short-term and long-term window be, in seconds?
sta_len = 120
lta_len = 600
Next, we define the values of the characteristic function (i.e. amplitude ratio between short-term
and long-term windows) where we flag a seismic detection. These values are called triggers. There
are two types of triggers – “on” and “off”, defined as follows:
1. “on” : If the characteristic function is above this value, then a seismic event begins.
2. “off” : If the characteristic function falls below this value (after an “on” trigger), than a
seismic event ends.
[18]: # Play around with the on and off triggers, based on values in the␣
↪characteristic function
thr_on = 4
thr_off = 1.5
on_off = np.array(trigger_onset(cft, thr_on, thr_off))
# The first column contains the indices where the trigger is turned "on".
10
# The second column contains the indices where the trigger is turned "off".
# Plot seismogram
ax.plot(tr_times,tr_data)
ax.set_xlim([min(tr_times),max(tr_times)])
ax.legend()
Note: You do not have to worry about marking the end of the seismic trace (as you can see, even
for us it’s not very accurate!). For this challenge, all we care about is the start of the seismic
waveform.
11
detection_times.append(on_time_str)
fnames.append(fname)
detect_df.head()
time_rel(sec)
0 72201.207547
12
An earthquake is composed of the following types of waves (in order): pressure (P-wave), shear
(S-wave), and surface (Rayleigh and Love). For our challenge, we are only interested in identifying
the start of the earthquake. The IRIS dataset contains P-wave arrivals (onset of the P-wave at the
seismometer) for each earthquake. In order to get noise prior to the earthquake arrival, we pick
our data traces to span 101 seconds before to 60 seconds past the P-wave arrival:
As you can see from the output list, some of the earthquakes don’t record any earthquake data (3.4
Ml 2005-08-31) and others have an incorrect P-wave arrival time (4.0 Ml 2005-08-31). Make sure
to go through the earthquakes and remove those types of events from the waveform preview prior
to download. For output file type, choose miniseed to match the planetary data (SAC is probably
fine too, but the file sizes tend to be a bit bigger).
4.1 Thank you very much for being a part of this challenge! Good luck!!!
13