Atish PHD Thesis NEW
Atish PHD Thesis NEW
Atish PHD Thesis NEW
GRADUATE COLLEGE
A DISSERTATION
Degree of
DOCTOR OF PHILOSOPHY
By
ATISH ROY
Norman, Oklahoma
2013
LATENT SPACE CLASSIFICATION OF SEISMIC FACIES
BY
_________________________________
Dr. Kurt J. Marfurt, Chair
_________________________________
Dr. Roger M. Slatt
_________________________________
Dr. G. Randy Keller
_________________________________
Dr. Deepak Devegowda
_________________________________
Dr. Vikram Jayaram
© Copyright by ATISH ROY 2013
All Rights Reserved.
To my beloved late dida
ACKNOWLEDGMENTS
To earn a PhD in Geophysics is like a dream come true for me. It has been an
eventful three and half years of my life at University of Oklahoma. I have reached this
from my supervisor Dr. Kurt Marfurt. Initially when I started under Dr. Marfurt I was
not very comfortable in computer programming. However, once he wrote the PCA
program, wrote my name as the author of the program, and made everybody
understand that I was the author of the program. This gave me tremendous
encouragement to understand how the code works and it was how I started to write
programs for my research. I think my self very lucky to be his student. He was never
strict with me and always encouraged me visit home frequently and spend time with
my family back in India. This gave me further encouragement to work hard when I
returned. Dr. Marfurt is one of the most wonderful persons I have ever met in my life.
complicated part of my research (the GTM technique), and because of this quick
understanding of the GTM theory, formulation of GTM into AASPI took me just a
couple of months. He also sat with me for long to make my research writings more
helped whenever I faced problems in the codes that I was unable to solve. He has been
very helpful to me. He gave me some most important new ideas on how to explore the
probabilistic GTM technique, which is one of the important parts of my PhD research.
Also, I want to thank Marcilio Matos who was one of my initial mentors to
understand the SOM algorithm. My PhD research is based on his initial works.
I would also like to thank Dr. Roger Slatt who is also one of my committee
members for his important comments and feedbacks on my research. I learnt a lot of
reservoir characterization from his courses. I also thank my other committee members
Dr. Deepak Devegowda for giving me the chance to work with his dataset and provide
me the opportunity to present my works to one of his industry sponsor. I would also
like to thank Dr. Randy Keller for being one my committee member. He is always nice
I would like to thank the entire staff of ConocoPhillips School of Geology and
Geophysics: Donna S. Mullins, Nancy Leonard, Adrianne Fox, Teresa Hackney and
v
I also want to thank the AASPI Consortium sponsors, all the student members
School of Geology and Geophysics for their help and support. I also like to thank my
friends: Roderick, Ben, Alfredo, Toan, Oswaldo, Brad, and Araceli at who helped me
in various ways in my research. I would also like to thank my close friends Avinash
who has been here since I came to OU, helped me in different ways and is my very
good partner for computer games. I also like to thank Sumit for his bhokali help in
various ways. I would like to acknowledge the support of some of my other friends
I will not be able to express in few words how the love, affections and support
I would also like to thank deeply my love Tora for staying beside me and have
faith in me always. We had several ups and downs in our relationship, however we
endured everything together and hopefully we will be together in the coming years.
Her sweet smile and love gives me constant encouragement for my works.
chotomoni, bondhu, puchke, bonti, mejomoni and meso for their everyday wish, love
and blessings of dadubati and thama, that made me succeed and feel that I am never
alone. Finally, I would like to thank my late dida without whose love and blessings I
vi
TABLE OF CONTENTS
ACKNOWLEDGMENTS ............................................................................................... iv
CHAPTER 1 ...................................................................................................................... 1
REFERENCES ......................................................................................................... 4
CHAPTER 2 ...................................................................................................................... 5
INTRODUCTION .................................................................................................... 5
CHAPTER 3 .................................................................................................................... 10
ABSTRACT............................................................................................................ 10
INTRODUCTION .................................................................................................. 12
APPLICATION ...................................................................................................... 19
vii
SOM on a 3D seismic dataset from the Northern Gulf of Mexico ......................... 22
DISCUSSIONS ....................................................................................................... 23
CONCLUSIONS .................................................................................................... 27
ACKOWLEDGEMENTS....................................................................................... 29
RFERENCES .......................................................................................................... 43
CHAPTER 4 .................................................................................................................... 46
ABSTRACT............................................................................................................ 46
INTRODUCTION .................................................................................................. 47
METHODOLOGY ................................................................................................. 51
APPLICATION ...................................................................................................... 56
The Mississippi “Lime” – Tight Limestone, Tight layered Chert and lime, Porous
Tripolite, and Fractured chert ................................................................................. 56
DISCUSSIONS ....................................................................................................... 63
CONCLUSIONS .................................................................................................... 66
viii
LIST OF FIGURES ................................................................................................ 68
CHAPTER 5 .................................................................................................................... 87
ABSTRACT............................................................................................................ 87
INTRODUCTION .................................................................................................. 88
APPLICATIONS .................................................................................................... 99
ABSTRACT.......................................................................................................... 133
ix
INTRODUCTION ................................................................................................ 134
Unsupervised facies model of the EOC-10 and EOC-30 reservoir units ............. 144
Correlating the production data with the seismic facies the facies model ............ 146
xi
LISTS OF FIGURES
Figure 3.3. 2D map of the PVs in latent space (A) before and (B) after the training.
Around the “Winner PV” the larger green circle represents a neighborhood at an early
iteration while the smaller green circle represents a neighborhood at a later iteration.
After the training the PVs form three clusters, thus classifying the three different types
of input data properly. .................................................................................................. 32
Figure 3.5. A synthetic data volume 30ms (16 – 2ms samples) having four different
waveforms consisting of 50 inlines by 26 crosslines. Noise level is 0.1 of the RMS
amplitude of the signal. (A) Map view and (B) Vertical slice along line QQ’. (C) The
four noise-free waveforms used to generate the synthetic volume. ............................. 33
xii
Figure 3.6. (A) Initial set of PVs. (B) Training and HSV coloring of the projected
Prototype Vectors (PVs) in latent space. Although more than 300 prototype vectors
were used, the resulting facies map clusters into four different colors corresponding to
the four different waveforms in the input dataset. Arrows indicate the location of each
of the noise-free synthetic waveforms A, B, C, and D. Almost all of the clusters fall on
top of each other. Scatter away from the clusters such as the dark blue ‘cluster’ are
associated with stretch in the latent space. After clustering, the PVs are colored with
the HSV colorscale (left) described by Equations 5 and 6........................................... 34
Figure 3.7. (A) The seismic facies map showing four different colors corresponding to
the four different waveforms that constitute the synthetic dataset. (B) The waveforms
are superposed on the 2D colormap showing the color assignment to the different
waveforms. (C) These four different waveforms are explicitly shown with
corresponding colors assigned to it. Note the transitional waveforms along the
boundaries which do not represent any data, but provide a continuum in the latent
space. Rather than choosing the final number of clusters (in this case, four) as
described by Matos et al. (2007), we simply let the interpreter visually cluster
according to their color perception (Matos et al, 2009). .............................................. 35
Figure 3.8. (A) The location of the study area within the northern salt minibasin
province of Gulf of Mexico (Okuma et al, 2000). (B) Regional sedimentary fairway
map with the survey location. (C) Some of the deepwater depositional features
highlighted from Posamentier et al, 2003. ................................................................... 36
Figure 3.9. An arbitrary seismic section across the survey showing the different zones
(identified by Ferrero, 2007) used for analysis with the SOM clustering. Zone 1 has
Mass transport complex (MTC) flow-A. A more chaotic MTC flow-B is in zone 2. A
Basin floor fan sequence is highlighted in the zone 3. Zone 4 is also a regional MTC
with some submarine channel complex visible............................................................ 37
xiii
Figure 3.10. An arbitrary unsupervised seismic facies section co-rendered with the
seismic amplitude section across AA’. The four zones are analyzed independently.
The MTC flow in the zone 1 is more continuous (blue facies). The zone 2 consists of
chaotic MTC (magenta), shale (yellowish green) and buried sand bodies (dark blue
facies). Zone 3 is a basin floor sequence formed by a complex set of distributaries.
Zone 4 is another regional MTC with some submarine channel complex visible in the
north east corner of the survey. .................................................................................... 38
Figure 3.11. Unsupervised seismic facies horizon probe showing different stratal
slices in the four different zones starting from the lowest sequence. In the below
corner the location of the stratal slice is marked on the vertical section. (A) The strata
in zone 1 showing a relatively continuous Flow-A (blue facies) from north east of
debris and MTC being encroached by a more chaotic flow-B (magenta) in the north
and south. The flow-B is more erosive in nature and is more related to slope failure to
salt tectonics. (B) The strata in zone 2 within the chaotic MTC flow-B. The chaotic
nature of the MTC can be seen in this zone. (C) The chaotic MTC in zone 2 caused
due to slope failure and salt tectonics have eroded large sand bodies within itself as
shown by the dark blue colored facies. With their confined nature and high seismic
amplitude these sand bodies may have hydrocarbon charged. (D) Higher up in the
zone there is relatively some less chaotic MTC flow. (E) In the zone 3 the basin floor
fan sequence can be properly visualized from this facies volume. The strata shows a
complex set of distributaries and basin fans in the west (blue facies) with pale green
background of pelagic shale. (F) Moving up in the zone 3 the later deposited basin
floor fan sequence and other set of distributaries are interpreted in the east. .............. 40
Figure 3.12. The zone 4 forms another MTC complex but is more regional in nature.
A younger feeder channel in the north east corner can be easily interpreted in this
strata from the unsupervised classified volume. AA’ is a vertical section of the
unsupervised volume overlayed on the seismic amplitude. The feeder distributary
channel complex is highlighted in this vertical section with a blue arrow. ................. 41
xiv
Figure 3.13. Extracting geobodies from the unsupervised classified volume. Although
coloring of the unsupervised classified volume was done in the SOM grid space using
a 2D colorbar, the coloring in most of the interpretation software is done by a 1D
colorscale. Modifying the transparency we can highlight certain seismic facies. The
blue facies from the Figure 3.11E showing mostly the BFF and other similar
depositional seismic facies. (A) The blue facies is made opaque and the rest
transparent in the 1D colorbar shown below. (B) With this single colored facies an
automatic geobody extraction is done to create different geobodies which can be
quantified as highlighted in the table. .......................................................................... 42
Figure 4.2. Workflow for the unsupervised SOM classification of the multi-attribute
dataset. .......................................................................................................................... 69
xv
Figure 4.3. (a) The general stratigraphic column with the Mississippian tripolitic chert
interval is present at the unconformity between the Pennsylvanian and Mississippian
age. (b) The location of the seismic survey from Osage County, Oklahoma, within the
Cherokee Platform province. The Cherokee Platform is bounded on the west by the
Nemaha Uplift and to the east by the Ozark Uplift. Matson (2013) subdivides the
Mississippian in this study area into the tight St. Joe limestone, and the Osage A and
Osage B levels. ............................................................................................................. 70
Figure 4.4. Two different models for the diagenesis the Mississippian Chert
developed from weathered and/or eroded limestone (after Rogers, 2001 using
nomenclature of Matson, 2013). In the reef model the chert formation at the reef
margin. In the Breccia model the chert forms as sub-aerially exposed breccia deposits.
(a) Stage-one for both the settings is the silica replacement of calcite in a submarine
environment. The siliceous limestone and the layered chert and limestone
(corresponding to Osage B formation) are formed in this environment. (b): Stage-two
digenesis is a result of erosion and uplift, infiltration by meteoric water. This results in
flushing of the rock and dissolution of the remaining calcite by low-pH fluid and
absence of no new silica precipitation. This results in moldic porosity and vuggy
porosity (yellow arrows) that is common in the low-density high-porosity tripolitic
chert (corresponding to Osage A). ............................................................................... 72
Figure 4.5. Structural attributes used in unsupervised analysis (workflow 1). (a)
Coherent energy, (b) Coherence, (c) dip magnitude and (d) reflector convergence.
Blue arrows indicate discontinuities and structural features that will form their own
clusters in Figure 4.6. ................................................................................................... 73
xvi
Figure 4.6. (a) The result of multi-attribute unsupervised seismic facies classification
(workflow 1) within the Mississippi lime using the four “structural” attribute volumes
shown in Figure 4.5 as input. (b) The projections of the 256 prototype vectors
(clusters) plotted in the 2D SOM grid space clump into four main clusters
magenta/dark pink, red/orange, yellow/green or cyan in color. The blue cluster is one
of the outliers. A posteriori analysis from the borehole image log and fracture density
diagram of Well A shows magenta or dark pink color correlate to the tight limestone
facies (St. Joe Limestone) (indicated by the magenta arrows) while orange and red
correlate to the dense layered chert and limestone facies (Osage B) (indicated by the
red arrows). Similar a posteriori analysis from Well B shows that light green and
yellow correlate to tripolitic chert facies (Osage A) (indicated by the yellow arrow).
The forth cluster blue and cyan correlates to areas of low coherence and high dip
magnitude, which we interpret to be areas of faults, fractures and karst (blue arrows as
in Figure 4.5). ............................................................................................................... 74
Figure 4.7. The input attribute volumes used in workflow 2: (a) coherent energy, (b)
spectral bandwidth, (c) GLCM energy, (d) The GLCM entropy, and (e) GLCM
homogeneity. Our implementation of GLCM measures lateral variations of reflectivity
along structure. In contrast we use bandwidth over spectral attributes to measure the
vertical variations in texture. The GLCM energy is a measure of the GLCM matrix
energy, not the square of the seismic amplitude. GLCM energy increases if amplitude
values repeat (e.g. as a homogeneous, striped, or checkerboard pattern). GLCM
entropy is high if the amplitude values are random. GLCM homogeneity is high if
amplitude values are smoothly varying........................................................................ 75
xvii
Figure 4.8. The result of multi-attribute unsupervised seismic facies classification
from workflow 2 within the Mississippian lime zone using the “texture” attribute
volumes shown in Figure 4.7. (b) The projections of the 256 prototype vectors
(clusters) plotted in the 2D SOM grid space clump into mainly into four different
clusters. The dark red/brown projection of the PV (as an outlier) forms the fifth facies
type. A posteriori analysis from the borehole image log and fracture density diagram
of Well A shows dark red and brown correlate to the tight cherty St. Joe’s limestone
facies (indicated by the brown arrows) while light pink and violet correlate to the
layered chert and lime facies (Osage B formation) (indicated by the pink arrows).
Similar a posteriori analysis from Well B shows that light green and yellow correlate
to tripolitic chert facies (Osage A formation) (indicated by the yellow arrow). .......... 76
Figure 4.9. Subvolumes used to extract three average data-vectors, which will be used
in supervised multi-attribute seismic facies classification. The three sub-volumes have
been chosen on the basis of the image log interpretation from White (2013). Hot colors
indicate high fracture densities based on the two wells. (a) Two sub-volumes are
considered around Well A. The blue-violet Facies 1 corresponds to the tight/non-
porous limestone (St Joe’s Limestone). The yellow Facies 2 corresponds to the
fractured dense chert (Osage B). (b) The sub-volume considered around Well B. The
light green facies 3 corresponds to tripolitic chert facies and the region having the
highest fracture density (Osage A). (c) Map view of the location of the two wells in
the survey. .................................................................................................................... 77
xviii
Figure 4.10. The three average waveforms extracted from the three sub-volumes
around the wells with the images of the borehole image logs for each of the facies
type. (a) The borehole image log of Well A corresponds to the tight St. Joe limestone
formation. The corresponding facies type is defined to be Facies-1 (violet color) (b)
The borehole image log within the Osage B formation shows many natural fractures
and interbeded chert and lime. The corresponding facies type is defined to be Facies-2
(yellow color). The average data-vectors selected from the Well A have similar
patterns with high coherent energy, high energy and high GLCM homogeneity, narrow
bandwidth and low GLCM entropy. However, the amplitude values of the attributes
are less in Facies-2 indicating that this facies is fractured and layered chert not as tight
and dense as the Facies 1. (c) The image log shows the presence of low density
diagenetically altered tripolitic chert (corresponding to the Osage A formation). The
average multi-attribute data-vectors selected from the Well B (Facies 3) has low
coherent energy, spectral amplitude, GLCM homogeneity and GLCM energy values
and high GLCM entropy value, which is a seismic signature of the low density, high
porous tripolitic chert. .................................................................................................. 79
Figure 4.11. The multi-attribute supervised seismic facies volume, with the facies
defined in Figure 4.10. The violet seismic facies (Facies-1) corresponds to the tight St.
Joe’s limestone, the yellow seismic facies (Facies 2) corresponds to fractured chert
with interbeded chert and lime regions and the green seismic facies (Facies 3)
corresponds to the tripolitic chert rich zones. The facies, which are not similar to these
three facies, are color-coded gray. ............................................................................... 80
xix
Figure 4.12. The results of post-stack P-impedance inversion. (a) The less dense
highly porous tripolitic chert regions have low impedance (red and yellow colors). The
green areas correspond to the fractured or layered dense chert and lime regions. The
high impedance regions (cyan-blue violet) correspond to the dense chert and cherty
limestone. This result when compared with unsupervised seismic facies analysis
workflow 2 shows that the dense limestone and chert rich zones correspond to high
impedance regions and the low impedance corresponds to the tripolitic chert rich
areas. Similarly, for the fractured layered chert and lime corresponds to the regions
with medium to low impedance. (b) The P-impedance volume is co-rendered with the
positive principal curvature k1. When compared with the multi-attribute analysis with
the structural attributes (workflow 1) it shows similar discontinuities/fractures, faults
and karst features accompanied with the different chert facies. .................................. 81
Figure 5.1. Non-linear mapping of the latent space to the data space. The prior
distribution consists of latent space variables ordered on a regular grid (blue circles)
residing in an L-dimensional latent space. In this figure L=2. consists of a regular
array of J non-linear basis functions. With the linear combination of these basis
functions the latent space (blue circles) are mapped to the data-space (blue spheres)
where they form a 2D non-Euclidean manifold, S, such that each node uk is mapped to
a corresponding vector mk in data-space ................................................................... 113
Figure 5.2. Generating Gaussian probability density functions (PDF) to represent the
data vectors. The Gaussians are centered about mk lying in the 2D non-Euclidean
manifold S. In general vectors are scattered about S. We assume the misfit represented
by a suite of PDFs. Specially we assume an isotropic Gaussian noise distribution
centered at mk and having a variance of 1/β (equation 3), defines each data-vector xn.
The final probability density function in the data space is obtained by integrating the
Gaussian PDFs for all mk where k=1, 2, ….K grid points, where P(k) is the prior
distribution at each node in the latent space............................................................... 114
xx
Figure 5.3. Workflow for data visualization in GTM. After training the new estimated
parameters Wnew and βnew, the new posterior probabilities representing the data-
vectors can be obtained using Bayes’ theorem and is given by .
These posterior distribution “responsibility” values can be plotted for all the grid
points, in the 2D latent space. The mean location will assign the value
to be the weighted average of the posterior distribution values and will in general fall
between neighboring values of . The mode will assign the value to be
the location of the greatest posterior distribution value in the 2D latent space and will
always correspond to a discrete gridded value of . ................................................ 115
Figure 5.4. The flow chart for generative topographic mapping workflow. ............. 116
Figure 5.5. The Well Locations for the unconventional shale play of an area roughly
1000 km2. Colors correlate to scaled EUR ranging between 0 and 1. 137 wells are used
to train and eight wells used to validate the GTM. In general the reservoir properties of
the shale are laterally smoothly varying. However near the center of the survey note
the proximity of high- and low-EUR wells. We attribute these variations to difference
in completion practices, including stimulations of geohazard. .................................. 116
Figure 5.6. The mean posterior distribution map of the “responsibilities” of the data in
the 2D Latent space. The mean location of a data-vector is the weighted average of the
posterior distribution values and will in general fall in between neighboring values of
. (a) Initial and (b) final distribution of the posterior mean projections of all 137
well data onto the latent space after 100 iterations. (c) The PDF of a representative
well vector projected onto the latent space. While only the mean value is plotted in (a)
and (b), the full distribution can be used to better quantify risk. The plot is color-coded
by the scaled EURs. Note the two different clusters corresponding to the good (red)
and the bad EURs (blue) in the final mean distribution map. .................................... 117
xxi
Figure 5.7. The same GTM classification shown in Figure 6, but now using the mode
of the posterior probability projections of the data. (a) Initial and (b) final distribution
of the posterior mode projections of all 137 well data onto the latent space after 100
iterations. (c) The data-vectors are projected onto the most likely grid points (grid
points with the most posterior probability value) and will always correspond to a
discrete gridded value of . Note that in this mode projection the posterior
probability values are assigned only to the grid locations. The plot is color-coded from
low to high EUR values. The latent space shows a more orderly separation between
the good, moderate and the bad EURs for the final iteration in the mode distribution
map. ............................................................................................................................ 118
Figure 5.8. (a) Some of the wells from the mean posterior probability distribution map
are analyzed as highlighted. The upper corner corresponds to wells with high EUR and
the bottom corner corresponds to wells with low EUR. (b) The 15 normalized well
parameters for each of the two set of wells are averaged and are plotted forming two
sets of averaged data-vectors. The red corresponds to the average data-vector for wells
with good EURs and blue is the average data-vector for wells with low EURs. Note
that mostly the well parameters differ radically for the two cases. The wells with good
EURs have higher proppant, sand volume, less cluster spacing, higher fracture stages,
more perforations, and higher porosity, whereas the wells with bad EURs have
opposite characters (highlighted with arrows). .......................................................... 119
Figure 5.9. EUR prediction through GTM modeling. (a) The posterior probability of
the data-vector from the nth well. (b) The EUR for the nth well, En is multiplied with
the posterior projection values onto 2D latent space in (a). The result gives an EUR
map for 1 well (Figure 5.9b). Then, we can formulate a weighted sum of the EUR at
each grid point k in the latent space for all the wells (given by Equation 9) and form
the EUR “map” over the latent space. Note the high correlation of the latent space
with the EURs. (b) Plot showing the predicted EUR from the EUR property map in (a)
vs. the true EUR for the 8 validation wells not used in training the GTM. The least
square fit shows and excellent correlation. ................................................................ 120
xxii
Figure 5.10. (a) The map of Texas highlighting the Ft. Worth basin and other major
basins and uplifts. The Ft. Worth basin is bounded by the Muenster Arch to the
northeast, the Ouachita Thrust Front to the east, the Bend Arch to the west, the Red
River Arch to the north, and the Llano Uplift to the south. (b) Stratigraphic section
including the gamma ray and resistivity logs showing the major units. The Barnett
Shale sits on an angular unconformity above the Cambrian to upper-Ordovician-age
carbonates of the Ellenburger Group and Viola Formation and is overlain by the
Pennsylvanian-age Marble Falls Limestone, and is divided into Upper and Lower units
by the Forestburg Limestone (c) Cross-section of the stratigraphy of Ft. Worth basin
from Montgomery et al. (2005). ................................................................................. 121
Figure 5.12. QC of the GTM analysis after performing 100 iterations. (a) The plot of
the inverse variance (β), stabilizing as the number of iterations increases. (b) A 2D
histogram of the mean projections of posterior probability of the dataset defined above
and below by the Lower Marble Falls and the Viola limestone respectively. ........... 123
xxiii
Figure 5.13. Well-probes for Well A generated by cross-plotting the two GTM
projection volumes. (a) The 2D histogram generated from the cross-plot of the two
GTM projection volumes. (b) Eight user-defined polygons drawn around the clusters
seen in (a). (c) Well-probe data colored by the clusters selected in (b). The Upper
Barnett, the Lower Barnett exhibit a different cluster composition and are in turn
different from the intervening Forestburg Limestone (in gray). (d) The microseismic
events from this well are plotted along with the well-probe. Note the microseismic
events are more localized in the red and light green facies and misses the brown facies,
thus the red and light green facies are interpreted as brittle and brown facies to be
ductile. The results are consistent with the 2nd order brittle-ductile couplets proposed
by Slatt et al, (2011). .................................................................................................. 124
Figure 5.14. Well-probes generated for a suite of horizontal wells B1, B2, B3, and B4,
generated cross-plotting the two GTM projection volumes. (a) The 2D histogram
generated from the cross-plot of the two GTM projection volumes. (b) Nine user-
defined polygons drawn about clusters seen in (a). (c) The well-probe data colored by
the nine clusters. The Forestburg Limestone appears as gray and divides the Upper
Barnett from the Lower Barnett Shale. (d) The microseismic events from these wells
are plotted along with the well-probe. Similar to Well A the microseismic events are
more localized in the red, pink, and the light green facies and are interpreted as brittle
facies and the brown facies has less microseismic events and are interpreted as ductile
facies. Here also the results are consistent with the 2nd order brittle-ductile couplets
(Slatt et al., 2011). ...................................................................................................... 125
Figure 5.15. (a) Four different zones within the Barnett Shale selected using a gamma
ray log from a well within the survey. The corresponding 2D histograms of the mean
posterior probability projections for (b) the Upper Barnett (zone B), (c) the top of the
Lower Barnett (zone C), (d) the middle of the Lower Barnett (zone D) and (e) the
bottom of the Lower Barnett Shale (zone E) are shown. ........................................... 126
xxiv
Figure 5.16. (a) 2d histogram of zone B defined above and below by stratal slices
corresponding to the well log picks in Figure 5.15a. (b) User defined polygons are
created and are colored consistent with the well-probes in Figures 5.13 and 5.14. (c)
The facies volume probe of the Upper Barnett Shale zone B visualized along with the
well-probes. The seismic dataset is colored accordingly to the clusters selected. Most
of the facies of the Upper Barnett Shale falls into the blue, cyan and the yellow
clusters........................................................................................................................ 127
Figure 5.17. (a) 2d histogram of zone C defined above and below by stratal slices
corresponding to the well log picks in Figure 5.15a. (b) User defined polygons are
created and are colored consistent with the well-probes in Figures 5.13 and 5.14. (c)
The facies volume probe of top section of the Lower Barnett Shale zone C visualized
along with the well-probes. The seismic dataset is colored accordingly to these clusters
selected. The yellow and blue facies is common in the Upper Barnett Shale is also
found abundantly in this zone. ................................................................................... 128
Figure 5.18. (a) 2d histogram of zone D defined above and below by stratal slices
corresponding to the well log picks in Figure 5.15a. (b) User defined polygons are
created and are colored consistent with the well-probes in Figures 5.13 and 5.14. (c)
The facies volume probe of middle section of the Lower Barnett Shale zone D
visualized along with the well-probes with the colors selected according to the clusters
in (b). Note that this zone has least similarity to the Upper Barnett Shale. With more of
the microseismic events concentrated in the pink, light green and red facies as seen in
the well-probes, and the dominance of siliceous non-calcareous shale lithofacies
(Singh, 2008), this zone 3 is interpreted as brittle with results consistent with Perez
(2013). ........................................................................................................................ 129
xxv
Figure 5.19. (a) 2d histogram of zone E defined above and below by stratal slices
corresponding to the well log picks in Figure 5.15a. (b) User defined polygons are
created and are colored consistent with the well-probes in Figures 5.13 and 5.14. (c)
The facies volume probe of bottom section of the Lower Barnett Shale zone E
visualized along with the well-probes with the colors selected according to the clusters
in (b). This zone corresponds to the hot gamma ray zone (Pollastro et al., 2007). Six
clusters are also identified from the mean posterior probability projections (in the top
inset) are polygons are drawn and are colored consistent with the well-probes. With
very few of the microseismic events in the brown colored facies we interpret from
(Singh, 2008 and Perez 2013) the brown colored rock to be ductile and high in TOC
content. The pink, light green and red facies are the regions with brittle shale. ........ 130
Figure 6.1. Non-linear mapping of the latent space to the data space. The prior
distribution consists of latent space variables (K) ordered on a regular grid (blue
circles) residing in an 2-dimensional latent space. consists of a regular array of J
non-linear basis functions. With the linear combination of these basis functions the
latent space (blue circles) are mapped to the data-space (blue spheres) where they form
a 2D non-Euclidean manifold, S, such that each node uk is mapped to a corresponding
vector mk in data-space. In general data-vectors are scattered about S. Specially we
assume an isotropic Gaussian noise distribution centered at mk and having a variance
of 1/β. The final probability density function in the data space is obtained by
integrating the Gaussian PDFs for all mk where k=1, 2, ….K grid points. ................ 154
xxvi
Figure 6.2. Workflow for data visualization in GTM. After training the new estimated
parameters Wnew and βnew, the new posterior probabilities representing the data-
vectors can be obtained using Bayes’ theorem and is given by .
These posterior distribution “responsibility” values can be plotted for all the grid
points, in the 2D latent space. The mean location will assign the value
to be the weighted average of the posterior distribution values and will in general fall
between neighboring values of . The mode will assign the value to be
the location of the greatest posterior distribution value in the 2D latent space and will
always correspond to a discrete gridded value of . ................................................ 155
Figure 6.3. The flow chart for generative topographic mapping workflow. ............. 156
Figure 6.4. (a) Detail of the Veracruz Basin showing its boundaries and its two
geological subdivisions: The Plataforma de Córdoba, which is the buried tectonic front
of the Sierra de Zongolica, and the Veracruz Tertiary Basin. The field of study
(indicated by the red arrow) is located in the western margin of the buried tectonic
front in Upper and Middle Eocene age Tertiary sediments. (Map courtesy of PEMEX
E&P based on previous work by Prost and Aranda, 2001; Romero, 2012). (b)
Structural cross section through the field. EOC–10 and EOC–20 produce in the
western portion of the structure; EOC–40 is water bearing and is distributed over the
western portion of the anticline; EOC–30 and EOC–50 produce in the eastern flank
(Hernández-Martínez, 2009). EOC-10 and EOC-30 are the reservoir units, which we
analyzed through GTM clustering. (Image is courtesy of PEMEX E & P). .............. 157
Figure 6.5. 2D histogram plot generated from cross-plotting the different volumes
taken as input to GTM of the EOC-30 zone. (a) Cross-plot from seismic volumes
Vp/Vs vs. P-impedance. (b) Cross-plot from seismic: Mu-Rho vs. Lambda –Rho. Note
that the crossplot generated forms a single cluster and does not help in distinguishing
different facies ............................................................................................................ 158
xxvii
Figure 6.6. QC of the GTM analysis after performing 50 iterations. (a) The plot of the
inverse variance (β), stabilizing as the number of iterations increases. (b) A 2D
histogram of the mean projections of posterior probability of the dataset within the
reservoir zone between EOC-50 and EOC-5 (from Figure 6.4b). ............................. 158
Figure 6.7. The 2D cross- plot of the mean posterior distribution map of the
“responsibilities” of the data onto the 2D Latent space for the reservoir units EOC-30
and EOC-10. The cross plot is generated by cross plotting to GTM projection
volumes. (a) The projection of the mean “responsibilities” of EOC-30 unit. The 2D
histogram is on the right and the scatter crossplot is on the left. Seven clusters are
visible on the latent pace corresponding to the high-density points. These clusters are
delineated by polygons with different colors and in the subsequent figure will help to
visualize the different classes in the seismic data. (b) The projection of the mean
“responsibilities” of EOC-10 unit. The mineralogy content and porosity distribution
for the EOC-10 and the EOC-30 reservoir units being similar the clusters for both
these reservoir units lie on the same location in the 2D latent space. They are also
color-coded similarly since both reservoir units have similar rock type. (c) Regional
conceptual sedimentary model (Courtesy of Petróleos Mexicanos (PEMEX) E & P.
Hernández-Martínez, 2009) ....................................................................................... 159
Figure 6.8. Generating the seismic facies volume (geobodies) from GTM clustering
within the reservoir units EOC-10 and EOC-30, considering the input seismic volumes
- lambda-rho (λρ) vs. mu-rho (μρ) and P-wave impedance (Zp) vs. Vp to Vs ratio
(Vp/Vs). Different polygons around classes signify rock types for reservoir units (a)
EOC-10 and (b) EOC-30. Seven different facies class have been identified from the
clusters in the latent space and are delineated by polygons of different colors. (c) The
horizon probe generated for the EOC -10 and the EOC-30 reservoir units after the
unsupervised GTM analysis. The white arrows highlight the faults. The most abundant
facies are the orange facies. ....................................................................................... 160
xxviii
Figure 6.9. Horizon phantom slice 10ms below the top of EOC-10 and the EOC-30
reservoir units, through (a) clay volume, (b) effective porosity predicted from
supervised probabilistic neural networks (PNN), and (c) unsupervised seismic facies
volume from GTM for the of EOC-10 reservoir unit. (d) The clay volume and (e)
effective porosity predicted from supervised probabilistic neural network (PNN), and
(f) seismic facies volume from GTM of EOC-30 reservoir unit. ............................... 161
Figure 6.10. (a) The GTM seismic facies volume with the well locations for the EOC-
10 reservoir unit. The red wells are the producing wells and the blue the injector well.
The well X is a dry well and falls within the brown facies region. These brown colored
faces are probably bad reservoir quality rocks making X a dry well. In this unit also
the wells are located at the structural highs. Most of the producing wells are along the
greyish-green and light green facies. Correlating with the clay volume the pink-purple
colored facies corresponds to the high clay content facies from. (b) Map view of the
top of the EOC-10 reservoir unit. The pie charts at the well locations show the average
production of the well for a seven-month period. In the pie chart green is for oil, red
for gas and blue for water. Structural high is in the North. The green facies in the north
are good producers (Wells J and I). Well H in orange facies is a moderate producer.
Well G in the west lies in the structural low making it a moderate producer. In the
south the well K lying in the light green facies is the most productive whereas the
wells in the greyish facies are moderate-low producers. Note the pink and purple
facies correlates with clay-rich areas. ........................................................................ 162
xxix
Figure 6.11. (a) The GTM seismic facies volume with the well locations for the EOC-
30 reservoir unit. The red wells are the producing wells and the blue the injector well.
The wells are mostly situated at the structural highs. The most abundant rock type
within this zone is the one with the orange color. Comparing with the well information
and with the clay and effective porosity volumes from probabilistic neural network
(PNN) these corresponds to the conglomerate sandstones with moderate porosity. The
brown and pink color facies are mostly along the faults and probably shows the
depositions close to normal faults along hanging walls. These regions also correspond
low porosity and relatively rich in clay. The dark green corresponds the highest
impedance regions, with moderate effective porosity, which is interpreted as the hard
conglomerate deposits having least clay content. (b) The map view of the top of the
EOC-30 reservoir unit. The pie charts at the well locations show the average
production of the well for a seven-month period. In the pie chart green is for oil, red
for gas and blue for water. Structural high is in the North. Wells A, B, C, D and E are
low-moderate producing wells. Thus the orange rock is a moderate quality reservoir
rock with moderate production. The water productivity increases as we go south (wells
E and F). The well F is having the largest production in terms of both oil and water is
situated in the mixed grey/light green and pink facies. The grey/light green facies are
better reservoir quality rocks. ..................................................................................... 164
Figure 6.13. The average of the data-vectors calculated from a sub-volume (displayed
as small cubes) around the each well location. Each input attribute has been previously
normalized using a Z-score algorithm. These three average (target) vector around the
wells are used to train the GTM, resulting in a supervised analysis. ......................... 167
xxx
Figure 6.14. The results of the most likely occurrence of the seismic facies
corresponding to dry well X within the EOC-10 and the EOC-30 reservoir units. The
regions, which are most similar to Well X facies appears as hot colors. The least
similar regions appear in cold colors. Note the occurrence of Well X Facies is
confined mostly along the faults and around Well X. ................................................ 168
Figure 6.15. A geobody within the EOC-10 and the EOC-30 reservoir units which
shows the most likely occurrence of the facies type corresponding to good producer
well K. The regions, which are most similar to the facies type at Well K, appear in hot
colors. The least similar regions appears in cold colors. Note that this facies is not so
abundant in the reservoir units and the likelihood of occurrence ranges from 40-70%.
.................................................................................................................................... 169
Figure 6.16. A geobody within the EOC-10 and the EOC-30 reservoir units which
shows the most likely occurrence of the facies type corresponding to moderate
producer well E. The regions, which are most similar to the facies type at Well E,
appear as hot colors and the least similar regions appear as cold colors. Note that this
type of facies is very abundant in the reservoir units and the likelihood of occurrence
mostly ranges from 75-100%. .................................................................................... 170
xxxi
ABSTRACT
Unsupervised clustering is data driven, unbiased by the interpreter beyond the choice
of input data and brings out the natural clusters present in the data. There are several
use the popular unsupervised Kohonen self-organizing maps (SOMs) algorithms and
depositional facies including basin floor fans, mass transport complexes and feeder
Chert reservoir from Oklahoma and map the locations of the tight/non-porous chert
and limestone vs. more prospective porous tripolitic chert and fractured chert zones.
The tight chert and dense limestone can be highly fractured, giving rise to an
additional seismic facies. In both the case studies, a large number of potential classes
are fed into the SOM algorithm. These “prototype vectors” are clustered and colors are
interpretation.
xxxii
Kohonen SOM suffers from the absence of any proper convergence criterion
and rules for parameter selection. These shortcomings are addressed by the more
to map the data about a lower dimensional “topographic” surface residing in high
dimensional attribute space. GTM predicts not only which cluster best represents the
data, but also how well it is predicted by all other clusters. For this reason, GTM
interfaces neatly with modern risk analysis workflows. I apply the GTM technique to
classify 15 sets of horizontal well parameters in one of the recent unconventional shale
plays, correlating the results with normalized estimated ultimate recovery (EURs),
do seismic facies classification for a Barnett shale survey. With the aid of
microseismic data, the clusters from GTM analysis are interpreted as brittle or ductile.
I also apply the GTM technique to the P-impedance (ZP), lambda-rho (λρ), mu-rho
(μρ) and the VP/VS volumes from a Veracruz Basin survey in Southern Mexico that
Finally, I introduce limited supervision into both the SOM and GTM
algorithms. The target vectors for both SOM and GTM are the average attribute vector
about the different facies identified from the well logs. This supervision introduced
results. For GTM, I calculate the probability of occurrence of the well facies in the
survey.
workflows will not only accelerate seismic facies identification, but also with GTM,
workflows, user interfaces and user documentation allowing others to build upon and
xxxiv
CHAPTER 1
output map of seismic facies. Seismic attributes provide quantitative estimates of these
properties including not only frequency, amplitude and phase, but also dip magnitude,
dip azimuth, similarity, curvature and reflector convergence. With the adoption of 3D
technology and increasing survey size, such manual techniques became extremely
time consuming. The number of seismic attributes has also increased dramatically,
For these reasons, considerable effort has been devoted to more efficient 3D
volume based seismic facies estimation, resulting in two types of algorithms and
target vectors around wells or interesting features in the seismic dataset and the dataset
1
Supervised interpretation is well established in the exploration industry with
public health and national security needs. This dissertation focuses on hybrid
algorithms with both unsupervised and supervised classification for seismic facies
exploration. There are several such unsupervised seismic facies analysis techniques
knowledge of the number of clusters before classification. However with the noisy
seismic data, with their continuity and low dimensionality does not hold the perquisite
for a good K-means clustering (Coleou et al., 2003). Principal component analysis is
widely used to reduce the redundancy and excess dimension of the data. The PCA is
accurate and efficient. Each of the techniques solves different objective functions to
form the classification, with each having their merits and demerits.
2
Kohonen SOM (Kohonen, 1982) originally developed for gene pattern
recognition, is one of the most popular classification techniques, and it has been
classification techniques is that the clusters in the SOM grid space are ‘topologically”
related to the input dataset through the means of an intermediate latent space. 2D and
3D SOM grid spaces can be mapped to 2D and 3D colorbars to show the data classes
classification.
the latent space (Bishop et al, 1998). There has been very little work on the application
of GTM technique to seismic data and exploration problems. Apart from general
seismic facies classification, GTM can be used to predict the probability of occurrence
of a particular seismic facies. The GTM can also be used for estimating physical
parameters such as porosity, permeability and EUR estimations from the training
parameters. For this reason, GTM not only can be used for seismic facies classification
but also applied into modern risk analysis and used in decision making.
In this dissertation, I implement and apply Kohonen SOM and GTM to real 3D
multi-attribute seismic datasets. I also introduce supervision to SOM and GTM using
well control, resulting in a hybrid algorithm that finds user-defined clusters and uses
seismic facies which can be included in the modern risk analysis workflow.
REFERENCES
Matos, M. C., P. L. Osorio, and P. Johann, 2007, Unsupervised seismic facies analysis
using wavelet transform and self-organizing maps: Geophysics, v. 72, p. P9-
P21
4
CHAPTER 2
INTRODUCTION
probability of occurrence for each category make more sense (Duda and Hart, 2001).
other physical properties of the reservoir rocks, and is an important part of seismic
facies map. However, with the ever-increasing 3D seismic data volumes and the
requirement to have fast and accurate results, we need to automate the process of
K-means and PCA were some of the earliest attempts at computer assisted
method have been proven much more effective allowing the computer to imitate the
vectors of input attributes and a suite of user defined seismic facies target vectors
(picks). Such supervised analysis in seismic add user insight and/or bias to the
classification. When the geology is unknown or when there are very few wells in a
large seismic survey, a supervised classification may reveal only a subset of the
5
natural classes present in the data. In this situation, unsupervised classification
DISSERTATION STRUCTURE
(chapter 1), introduction (chapter 2) and a conclusion (chapter 7). Chapters 3-6 are in
the form of scientific papers and have been presented at international conferences or
northern Gulf of Mexico. The objective of the paper is to show how this unsupervised
features including basin floor fans, mass transport complexes, debris flows and feeder
channels. Proper selection of the input seismic attribute volumes is the key to effective
classification differentiate the different depositional features. After the SOM training
colorscale. Those traces with similar seismic nature are assigned the same colors,
resulting in a 3D seismic facies volume. This chapter was published and presented in
the Gulf Coast Section SEPM Annual Convention and Exhibition in Houston in 2011.
6
Chapter 4 applies SOM unsupervised workflow to characterize heterogeneous
Mississippian limestone. There are two major types of chert: tight/non-porous chert
and the highly porous tripolitic chert. The dense chert and the dense limestone facies
can be highly fractured, giving rise to two additional seismic facies. I evaluate two
convergence and coherent energy) and a second with the texture, frequency and
classification was also done where the facies similarity was based upon the minimum
Euclidean distance measures (MED). Three average wavelets, based on the borehole
image logs from two horizontal wells in the survey, formed the target vectors in this
supervised analysis. Further the results are compared with the post-stack P-impedance
results within the chert reservoir. The works from this chapter has been submitted to
algorithm, which takes care of the limitations of the Kohonen SOM process. This is
more of a tutorial paper on GTM. In the first application GTM technique is applied to
correlating the results with normalized EURs, allowing an estimation of EUR based on
the most relevant parameters. In addition, I have re-coded GTM to handle a much
7
larger seismic dataset as input and generated a new workflow for unsupervised seismic
impedance, lambda-rho and mu-rho) from a Barnett shale survey as input to the GTM
software. By this method, the posterior probability values of the data are projected
onto the 2D latent space forming different clusters. These clusters are interpreted and
different user-defined lithofacies are defined. These lithofacies within the Barnett
Shale are first correlated with the microseismic events at the well locations. Based on
the concentration of the microseismic events the lithofacies are interpreted as brittle or
ductile. Four zones within the Barnett Shale formation are identified and the seismic in
each of the zones are then colored consistent with the different facies types interpreted
from the microseismic events. This study does a 3D regional mapping of the
“fracable” and ductile shale. The works from this chapter will submitted in the
“Geophysics” journal.
carbonate conglomerate oil field in the Veracruz Basin of southern Mexico. The multi-
attribute analysis is done with the different seismic inversion volumes (P-impedance,
lambda-rho (μρ), mu-rho (λρ) and Vp/Vs). After GTM modeling the dataset is
formed in the 2D latent space are color-coded by the user to identify different seismic
facies present in the dataset. The average EURs are plotted with the unsupervised
seismic facies volume to classify the good reservoir and the bad reservoir facies. In the
8
next stage the average multi-attribute waveforms are calculated from three different
wells in the survey and a PDF is created for each of the three well data vector. Then
each of these three PDF is compared with all the data vectors to quantitative estimate
probability of occurrence of a well vector within the reservoir units. This study helps
analysis can be used to study the geologic risk analysis related to this reservoir. The
woks from this chapter will be submitted to “Interpretation” jointly published by the
includes the documentation for running geometric attribute as a batch mode, 2D SOM,
3D SOM and the GTM at the, allowing the reader to replicate my unsupervised and
9
CHAPTER 3
ABSTRACT
The most popular seismic attributes fall into three broad categories – those that
are sensitive to lateral changes in waveform and structure such as coherence and
curvature, those sensitive to thin bed tuning and stratigraphy, such as spectral
components, and those sensitive to lithology and fluid properties – such as AVO and
in texture. The best input attributes are those that are mathematically independent and
Supervised and artificial neural networks form the popular clustering algorithms
in the seismic industry. Among the artificial neural network techniques Kohonen self-
data volumes are compared to these clusters; some are assigned to the predefined
clusters, while others form clusters outside the area of supervision. The self-
organizing map (SOM) is one of the most effective unsupervised pattern recognition
techniques, because it identifies natural clusters in the dataset and the SOM grid points
have over-defined the number of initial clusters through the use of a large number of
“prototype vectors” (PVs), which after subsequent iterations tend to converge to the
lesser number of “natural clusters” using a Kohonen SOM neighborhood training rule.
After the training is complete the modified PVs are then color-coded by using a 2D
gradational colorscale. Those traces with similar seismic nature are assigned the same
rendering the colored PVs with the original seismic amplitude data, input attributes,
11
INTRODUCTION
sections through seismic volumes to determine the seismic character, which is then
correlated to a seismic facies using well data. Given large 3D seismic data volumes,
there is a need for robust automated seismic facies classification algorithms which can
help segment the large data volume into seismic facies components. Through
Our work is built upon that done by many others who have worked on
volumetric seismic facies mapping. Coleou et al. (2003) showed how by treating
seismic amplitude samples within a vertical analysis window as “attributes”, one could
generate a 2D map of seismic facies having similar waveforms. Each waveform was
plotted against a 1D latent space and a corresponding 1D color bar giving rise to the
vast majority of the probability mass (Wallet et al., 2009). In an ideal situation,
12
virtually all the waveforms (or alteratively, suite of attributes at a given voxel) should
be mapped onto a 1D, 2D or 3D subspace. Strecker and Uden (2002) were perhaps
the first to use 2D latent spaces with geophysical data, using multidimensional
included envelope, bandwidth, impedance, AVO slope and intercept, dip magnitude,
and coherence. These attributes were projected onto a 2D latent space and their results
plotted against a 2D color table. Gao (2007) used GLCM texture attributes, principal
component analysis and SOM using a 1D latent space to map seismic facies offshore
classification using SOM to cluster GLCM texture attributes, Corradi et al., (2009)
mapped sand and sealing vs. non-sealing shale facies offshore west Africa. There are
works well where the data are noise free and have a relatively high dimensionality
(Coleou et al, 2003). Statistically, seismic data have high continuity, and therefore a
relatively low dimensionality. Coleou et al., (2003) found that the addition of noise
analysis.
organizing maps (SOM). Coleou et al., (2003) were among the first to apply
recently Matos et al. (2009) and Roy and Marfurt (2010) built on work by Strecker
and Uden (2002) to show the advantage of extending the SOM grid space to 2D and
13
3D, with corresponding 2D and 3D colorbars used to delineate the subsurface
signals (a “primary event space”) and tried to link these ideas with the functionality of
the human brain (Murtagh, 1995). This popular neural network method was initially
used in biology and computer science for data mining purposes, has since been used in
SOM (Kohonen, 2001) clusters data such that the statistical relationship
between multidimensional data is converted into a much lower dimensional SOM grid
space that preserves the geometrical relationship among the data points.
Mathematically each SOM unit within the SOM grid space preserves the metric
vectors (PVs) or neurons have the same dimension as the input data (e.g. 12
dimensions for 12 input attribute values). These PVs are arranged in a regular low-
dimensional grid or map (2D or 3D for our work), thereby topologically connecting it
14
to its neighbors. By construction, SOM preserves the original topological structure
within this N-dimensional attribute space, making it amenable for seismic facies
analysis.
We assume that the input are represented by J vectors in a N-dimensional vector space
Rn, xj= [xj1, xj2, xj3 …. xjN] where N is the number of input attributes (or amplitude
analyzed. The objective of the algorithm is to organize the dataset of input seismic
attributes into a geometric structure called the SOM. SOM consists of neurons or
prototype vectors (PVs) organized by a lower-dimension grid, usually 2D, which are
representative of the input data that lies in the same N-dimensional space as the input
seismic attributes. PVs are also termed as SOM units and typically arranged in 2D
among the PVs. In this manner PVs close to each other are associated with input
seismic attribute vectors that are similar to each other. The number of these PVs in the
mi2…. miN], where i=1, 2, …, P and N is the dimension of these vectors defined by the
number of input attributes (or samples for waveform classification). The 2D SOM can
be understood as a 2D sheet upon which the interconnected PVs are imbedded. After
15
assigning the initial location of the PVs to form either a hexagonal (Figure 3.1B) or
rectangular grid, we “train” them (i.e., changing their direction and moving them
around laterally in their SOM grid space) to best represent the input data. After SOM
neighborhood training these PVs become cluster centers representing the different
example from Matos et al. (2009). Figure 3.2A show some fruits that have different
properties. Let us consider two of their properties: their aspect ratio (fruit shape), peak
frequency (fruit color), and vitamin C content of each fruit. After training, the fruits
are arranged into three major groups as shown in Figure 3.2B. Now if we draw an
analogy with a 2D SOM grid map, the PVs represent the fruits, the shape and the color
of the fruits are the attributes of the PVs and these different attributes helps to arrange
them into different groups or clusters. Figure 3.1A shows 300 samples belonging to
three distinct Gaussian distributions having the same standard deviation but different
means. These 300 input vectors have three attributes or dimensions (N=3). Each
sample vector in 3D space is cross-correlated with itself and all other vectors, resulting
in a 3 by 3 covariance matrix. To begin, we choose three times the square root of each
of the first two eigenvalues (three standard deviations) of this covariance matrix to
define our initial 2D SOM grid space, which we sample with an 11x7 regular
hexagonal grid (Figure 3.1B). This definition results in the SOM grid space
representing approximately 99.7% of the input dataset. Thus the total PVs or the
During the SOM training process, an input vector is initialized and is compared
with all N-dimensional PVs on the 2D grid, or latent space. The prototype vector with
the best match (the winning PV) will be updated as a part of SOM neighborhood
training. A small neighborhood of PVs around the “winner” prototype vector are also
updated (Figure 3.3A) such that the PVs lying within a distance of “ ” is excited
and updated. With successive iteration, this neighborhood radius decreases. Thus,
in each iteration, the winning prototype vector is brought closer to the input vectors.
Given this background, Kohonen (2001) defines the SOM training algorithm
Step 1: Consider an input vector, which is randomly chosen from the set of input
vectors.
Step 2: Compute the Euclidean distance between this vector and all PVs ,i=1,
2,…p. The prototype vector , which has the minimum distance to the input
Step 3: Update the “winner” prototype vector and its neighbors. The updating rule for
the weight of the ith PV inside and outside the neighborhood radius is given by
[ ] || || (2a)
|| || , (2b)
17
where the neighborhood radius defined as is predefined for a problem and
decreases with each iteration and are the position vectors of the winner PV
and decrease with each iteration in the learning process and they are
defined as
|| ||
, and …………………………. (3)
. …………………………. (4)
The training of the PVs for the above dataset is illustrated in Figure 3.3. The
vectors are used, after training they group themselves into three separate clusters in the
2D latent space.
Step 4: Iterate through each learning step (steps 1-3) until the convergence criterion
(which depends on the predefined lowest neighborhood radius and the minimum
Step 5: Color-code the trained PVs using 2D or 3D gradational colors (Matos et al.
2009). We will use an HSV model with for 2D spaces will be defined as hue, ,
⁄
( ) …………………………. (5)
⁄
and saturation, as
18
[( ⁄ ) ( ⁄ ) ] …………………………. (6)
where and are the projected components onto the 2D SOM grid space defined by
the eigenvectors and The input data in this example visually contain three
separate classes which gives rise to three separate clusters in this 2D SOM grid space
(Figure 3.4A). These new set of PVs are colored using the 2D HSV color palette with
equations 5 and 6 (Figure 3.4B). This coloring scheme will be used for generating the
APPLICATION
location of the different waveforms A, B, C and D are highlighted both on the map
view and crossline view of the synthetic model (Figures 3.5A and B). The data are
sampled at 2ms, and consist of 50 inline by 26 crossline 30ms (16 sample) long traces.
The 2D SOM grid space containing the prototype vectors is defined as before
eigenvalues λ1 and λ2) along the two principal component directions (eigenvectors v(1)
and v(2)). Thus this SOM grid space represents approximately 99.7% of the input
19
dataset. We sample this SOM grid space using a regular grid of more than 300, 16-
dimensional prototype vectors similar to that in Figure 3.3A. The minimum Euclidean
distance between the 16-dimensional input data with the prototype vectors, defines a
winner for each data vector as part of the SOM training algorithm (Steps 2 and 3).
Although more than 300 prototype vectors were used, the resulting facies map
clusters into four different colors corresponding to the four different waveforms in the
input dataset (Figure 3.6). The waveforms are superposed on the 2D map (Figure
3.7B) to show how the different colors are assigned to the different waveforms. The
synthetic seismic is colored based on the similarity of the waveform (Figure 3.7A).
The waveforms with their respective colors are explicitly shown in Figure 3.7C.
the data. Due to the limitation of our visualization software which provides only 256
this application, we normalize our input data vectors using a Z-score algorithm. Thus
our input data has a vector assigned to each of the (x, y, z) location in our volume
(which are actually the normalized input attribute values at that location). We call this
new volume the normalized multi-attribute volume and project it onto a 2D SOM grid
20
space by Principal Component Analysis. The 2D SOM grid space is defined as
explained earlier. If there are five input attribute volumes, each of the PVs in the 2D
SOM grid space is 5-dimensional. This 2D SOM grid space is sampled uniformly by
256 PVs. The PVs are trained in the 2D SOM grid space and their positions updated
after each iteration, resulting in the new updated position of the PVs. When the
updating slows down the training process stops. With an increasing number of
iterations, the PVs move closer to each other and to the data points within the SOM
grid space. The HSV colors are assigned to the PVs according to their distance from
their center of mass and their azimuth (equations 5 and 6). Once trained, the distance
is computed between each PV, pi′, and the multiattribute data vector, x, at each voxel
using
where is the nearest trained PV to the input data sample vector . Each voxel is
then assigned the color of . In this manner, two dissimilar neighboring samples in
the seismic volume will be far apart in the SOM grid space and have different colors.
Conversely, two similar samples in the seismic volume will have nearly the same
color. Each color represents a seismic facies, most of which are geologic facies, but
21
SOM on a 3D seismic dataset from the Northern Gulf of Mexico
province (Diegel et al, 1995) of the Northern Gulf of Mexico (GOM) margin (Figure
3.8A). There are primarily four depositional environments for deep water systems:
channels, thin bed-levees, basin floor fan lobes or sheets and mass transport
complexes (MTCs) (Figure 3.8C). The sediments in the salt minibasins follow a “fill
and spill” depositional process (e.g., Winker, 1996) preserving the basin floor fan and
channel-levee sediments.
covers almost this entire salt minibasin. Sarkar et al., (2010) describes a suite of
different mass transport complexes (MTCs) preserving the basin floor fan in the
system. They co-render strata-slices of coherent energy and inline amplitude gradient
identified by Ferrero, (2007) and Sarkar et al.’s (2010), which highlights the MTCs
and Basin floor fan (BFF). They did the interpretation with the well logs and sparse
arbitrary vertical section across the survey (Figure 3.9). Zone 1 is the deepest MTC
with a less chaotic flow A. Above it zone 2 has the second MTC flow B which is more
erosive in nature and is more related to salt tectonics. This mass transport complex
contains some small slump bocks along with some debrites and undifferentiated MTC
deposits. The MTC deposits are also characterized by the presence of a buried
channel-submarine fan unit within the chaotic sediments. The depositional patterns of
22
the upper and the lower MTCs are quite random (Sarkar et al., 2010). Overlying this is
the zone 3 having strong continuous seismic amplitude, which forms the basin floor
fan sequence. Zone 4 is the uppermost zone in our analysis, which is a lower energy
MTC and is a basin wide depositional feature. Some of the submarine channel systems
which help discriminate different seismic facies distribution: the coherent energy,
eigenstructure coherence, two GLCM texture attributes (entropy and variance) and the
choice of seismic attributes. Note that these attributes are not sensitive to the direction
of the survey or rotation of the sediments with compaction. We note that attributes
such as phase and azimuth are inappropriate for this cluster analysis since they are
mathematically discontinuous. These five attribute values are first normalized using a
z-score algorithm for each sample to remove any bias prior to clustering with the
DISCUSSIONS
associated with different deep water deposits. Figure 3.10 shows an arbitrary vertical
23
section through the facies volume co-rendered with the seismic amplitude section. The
interpretation is done in a top-down sequence and the zones from 1-4 are interpreted
independently. The continuous high amplitude depositional features are colored light
to dark blue. The random chaotic MTCs are colored in magenta. The background
marine pelagic shale is colored in lighter yellowish green color. Figures 3.11A to E
show different phantom horizon slices through the data volumes and highlights
different depositional features. The location of the phantom horizon slice is marked in
Low coherence,
Moderate to high high energy.
amplitude, Rotated blocks and
Mass discontinuous, pinch out patterns Different hues of
Transport chaotic, resulting in high blue and cyan
Complexes hummocky, rotated reflector
blocks convergence.
Irregular bed
thickness gives rise
to non-uniform
tuning frequencies.
High values of
GLCM entropy and
variance.
24
High coherence,
High amplitude, high energy. Yellowish green
Basin Floor continuous, Subparallel and green
Fans isolated or bedding resulting
connected features in low reflector
within the fan. convergence.
Sub-parallel Thicker bedding
reflectors. gives rise to lower
tuning frequencies.
Moderate values of
GLCM entropy and
variance.
Moderate
Moderate to low coherence, lower Pink to cyan
Marine amplitude, energy. Subparallel
Pelagic Shale continuous, very bedding resulting
thin in low reflector
and separated from convergence. Thin
the bedding gives rise
MTC to higher tuning
frequencies.
Moderate values of
GLCM entropy and
variance.
Figure 3.11A is a phantom horizon slice through the MTC complex in zone 1.
The blue color facies highlights the relatively continuous flow A. It flows from the
northeast and the more chaotic, incoherent flow B (magenta facies) encroaches the
flow A from the north and south. Figure 3.11B is another phantom slice, which
highlights the chaotic flow B from zone 2. The flow is mostly from the northeast
marked by a yellow dotted line. The Figure 3.11C shows some confined high
amplitude sand bodies (dark blue) with strong seismic amplitude, which may be gas
25
charged. The flow B being chaotic has more characteristic of a MTC caused by slope
The basin floor fan sequence identified in Ferrero, (2008) and Sarkar et al,
2010 is properly highlighted with this facies classification volume. Figure 3.11D
shows the complex depositional features causing a fan deposit (blue color facies) in
zone 3. The lighter green color forms a thin deposit of marine pelagic background
shale. A stratal slice above this basin fan sequence in Figure 3.11E highlights number
of channels which feeds the fan deposits. As we go above the zone other basin floor
fan can interpreted across the survey. The continuous high amplitude in the west and
in the east of the survey is interpreted as fan deposits (Figure 3.11F). Some feeder
The last zone analyzed is the zone 4, which is also a MTC but is more regional
in nature. Figure 3.12 shows a stratal slice in the middle of the zone. The younger
submarine channel systems are interpreted in the north east of the survey. It forms a
distributary system as this channel (highlighted in blue arrow) moves south and forms
fan deposits at the mouth. The edges of this feeder channel system are defined by the
magenta color highlighted in the zone 4 across the vertical seismic section AA’.
Figures 3.13a and b show the geobodies formed corresponding to the basin
floor fan. Although we are using a 2D SOM grid space, plotted against hue and
interpretation packages allow only a 1D color bar. To the strata shown in Figure
3.11E, we apply transparency to the desired blue facies, which is easily achieved using
the 1D opacity table below where the blue facies are set to be opaque and all others to
26
be transparent. Figure 3.13A shows the volume probe with only the blue facies.
Automatic geobodies can be extracted in most of the commercial software. With this
opaque blue facies an automatic extraction of geobodies forms the different geo-blob
CONCLUSIONS
applied to a window of seismic amplitude data about a given horizon, giving rise to
disadvantages. Waveforms are not sensitive to the behavior of adjacent traces, unlike
attributes such as dip magnitude and coherence. Waveforms are also quite sensitive to
the source wavelet and thus cannot readily be applied to stratal slices. Our research is
in very early stages, and we do not claim to have used the ‘best attributes’ for our
27
independent attributes is critical to classification. We feel intuitively that the attributes
should be azimuthally invariant unless there are strong reasons to prefer given
directions.
based on 1D colorbars rather than 2D or 3D color tables. The use of pastel vs. dark
colors allows co-rendering the seismic facies classifications with seismic amplitude on
vertical slices and coherence and other edge-sensitive attributes on time and horizon
Supervision can be introduced by explicitly defining, and then fixing prototype vectors
slice, or of a less well-defined facies centered about a producing well or dry hole.
More sophisticated supervision will form a continuum between SOM as shown here
and traditional applied neural network technologies, blurring the differences between
By using a large number of PVs and colors, we avoid the need to know the
number of clusters in the data. Instead, the interpreter lumps PVs of similar colors into
clusters and associates them with a particular facies as part of the interpretation
process. As with multiattribute blending, 24-bit color will provide much greater depth
and detail in images than the current 8-bit color limitations on many interpretation
workstations. A limitation to this 2D color mapping is the folding of the PVs when
28
plotted in the 2D SOM grid space, such that multiple PVs of different nature can have
the same color. A partial solution is to project the data in a 3D SOM grid space and
prototype vectors formed by averaging the input data vectors within a user-defined
subvolume.
ACKOWLEDGEMENTS
We acknowledge Hess Inc. for the use of its deepwater GOM data volume for
research and Schlumberger for providing the Petrel software, in which interpretation
and visualization was done. The interpretation was done based on earlier studies by
Ferrero in her MS thesis and discussions with Dr. Roger M. Slatt of OU. We would
also like to thank the financial support of the industry members for our Attribute-
of Oklahoma.
29
LIST OF FIGURES
distributions with the same standard deviation but different means. The red plus signs
represents the original data points and the blue dots the Prototype Vectors in 3D; (B)
the initial PVs with the same dimensionality as the input is plotted in the latent space
30
Figure 3.2. Analogy of classification of different fruits illustrating unsupervised SOM
clustering analysis. (a) The unorganized fruits before training; (b) Clustering of the
fruits after training using three attributes (color, aspect ratio and Vitamin C content of
31
Figure 3.3. 2D map of the PVs in latent space (A) before and (B) after the training.
Around the “Winner PV” the larger green circle represents a neighborhood at an early
iteration while the smaller green circle represents a neighborhood at a later iteration.
After the training the PVs form three clusters, thus classifying the three different types
colorbar. (A) PVs in the 2D latent space before assigning colors, (B) colored by the
2D HSV color table using the Equations 5 and 6 for Hue and Saturation.
32
Figure 3.5. A synthetic data volume 30ms (16 – 2ms samples) having four different
waveforms consisting of 50 inlines by 26 crosslines. Noise level is 0.1 of the RMS
amplitude of the signal. (A) Map view and (B) Vertical slice along line QQ’. (C) The
four noise-free waveforms used to generate the synthetic volume.
33
Figure 3.6. (A) Initial set of PVs. (B) Training and HSV coloring of the projected
Prototype Vectors (PVs) in latent space. Although more than 300 prototype vectors
were used, the resulting facies map clusters into four different colors corresponding to
the four different waveforms in the input dataset. Arrows indicate the location of each
of the noise-free synthetic waveforms A, B, C, and D. Almost all of the clusters fall on
top of each other. Scatter away from the clusters such as the dark blue ‘cluster’ are
associated with stretch in the latent space. After clustering, the PVs are colored with
34
Figure 3.7. (A) The seismic facies map showing four different colors corresponding to
the four different waveforms that constitute the synthetic dataset. (B) The waveforms
are superposed on the 2D colormap showing the color assignment to the different
waveforms. (C) These four different waveforms are explicitly shown with
corresponding colors assigned to it. Note the transitional waveforms along the
boundaries which do not represent any data, but provide a continuum in the latent
space. Rather than choosing the final number of clusters (in this case, four) as
described by Matos et al. (2007), we simply let the interpreter visually cluster
35
Figure 3.8. (A) The location of the study area within the northern salt minibasin
province of Gulf of Mexico (Okuma et al, 2000). (B) Regional sedimentary fairway
map with the survey location. (C) Some of the deep-water depositional features
36
Figure 3.9. An arbitrary seismic section across the survey showing the different zones
(identified by Ferrero, 2007) used for analysis with the SOM clustering. Zone 1 has
Mass transport complex (MTC) flow-A. A more chaotic MTC flow-B is in zone 2. A
Basin floor fan sequence is highlighted in the zone 3. Zone 4 is also a regional MTC
37
Figure 3.10. An arbitrary unsupervised seismic facies section co-rendered with the
seismic amplitude section across AA’. The four zones are analyzed independently.
The MTC flow in the zone 1 is more continuous (blue facies). The zone 2 consists of
chaotic MTC (magenta), shale (yellowish green) and buried sand bodies (dark blue
Zone 4 is another regional MTC with some submarine channel complex visible in the
38
39
Figure 3.11. Unsupervised seismic facies horizon probe showing different stratal
slices in the four different zones starting from the lowest sequence. In the below
corner the location of the stratal slice is marked on the vertical section. (A) The strata
in zone 1 showing a relatively continuous Flow-A (blue facies) from north east of
debris and MTC being encroached by a more chaotic flow-B (magenta) in the north
and south. The flow-B is more erosive in nature and is more related to slope failure to
salt tectonics. (B) The strata in zone 2 within the chaotic MTC flow-B. The chaotic
nature of the MTC can be seen in this zone. (C) The chaotic MTC in zone 2 caused
due to slope failure and salt tectonics have eroded large sand bodies within itself as
shown by the dark blue colored facies. With their confined nature and high seismic
amplitude these sand bodies may have hydrocarbon charged. (D) Higher up in the
zone there is relatively some less chaotic MTC flow. (E) In the zone 3 the basin floor
fan sequence can be properly visualized from this facies volume. The strata shows a
complex set of distributaries and basin fans in the west (blue facies) with pale green
background of pelagic shale. (F) Moving up in the zone 3 the later deposited basin
floor fan sequence and other set of distributaries are interpreted in the east.
40
Figure 3.12. The zone 4 forms another MTC complex but is more regional in nature.
A younger feeder channel in the north east corner can be easily interpreted in this
strata from the unsupervised classified volume. AA’ is a vertical section of the
41
Figure 3.13. Extracting geobodies from the unsupervised classified volume. Although
coloring of the unsupervised classified volume was done in the SOM grid space using
colorscale. Modifying the transparency we can highlight certain seismic facies. The
blue facies from the Figure 3.11E showing mostly the BFF and other similar
depositional seismic facies. (A) The blue facies is made opaque and the rest
transparent in the 1D colorbar shown below. (B) With this single colored facies an
42
RFERENCES
495-503.
Gao, D., 2007, Application of three-dimensional seismic texture analysis with special
43
Kohonen, T. ,1982 Self-organized formation of topologically correct feature maps:
Biological Cybernetics, v. 43 p. 59-69.
Expanded Abstracts.
Matos, M. C., P. L. Osorio, and P. Johann, 2007, Unsupervised seismic facies analysis
P21.
Murtagh, F., and M. H. Pajares, 1995, The Kohonen self-organizing map method: An
Abstracts, p. 1591-1595.
Sarkar, S., K. J. Marfurt., and R. Slatt, , 2010, Generation of sea-level curves from
Strecker, U., and R. Uden, 2002, Data mining of 3D poststack attribute volumes using
44
Wallet, C. B., M. C. Matos, and J. T. Kwiatkowski, , 2009, Latent space modeling of
45
CHAPTER 4
Atish Roy *, Benjamin L. Dowell and Kurt J. Marfurt, The University of Oklahoma
ABSTRACT
surfaces, and other surfaces. In principal, a given pattern can be explicitly defined as a
concepts. Seismic attributes quantify characteristics not only of the seismic reflection
The Mississippi Lime resource play of northern Oklahoma and southern Kansas
provides a particularly challenging problem. Here the seismic facies are diagenetic
46
(tight limestone, stratified limestone and non-porous chert, and highly porous tripolitic
chert) or structural (fractured vs. not-fractured chert and limestone) rather than
stratigraphic. Skilled interpreters can recognize the different facies even though they
may not able to accurately define the exact diagenetic or tectonic history. During the
past several years, over 4700 wells have been drilled in the Mississippi Lime play.
clusters that represent the data. Using a 3D seismic survey acquired in Osage County
represent different diagenetically altered facies of the Mississippi Lime play. The 256
prototype vectors (potential clusters) reduce to only three or four distinct “natural”
clusters. We use ground truth of seismic facies seen on horizontal image logs to fix
three average attribute data-vectors near the well locations, resulting in three “known”
INTRODUCTION
year old in finding Waldo (Handford, 2012). Older, experienced interpreters are
usually more proficient at such pattern recognition than younger, perhaps more
technically savvy interpreters. Nevertheless, 3D seismic surveys are not only growing
larger, but also are being acquired by small companies and even smaller partnerships
in resource plays like the Mississippi Lime of Oklahoma and Kansas, companies that
47
often do not have seismic stratigraphers on their staff. Our workflow will help seismic
interpreters with an easy and quick way to estimate the variation of seismic facies in
Seismic attributes quantify not only reflector amplitude, frequency, and phase
but also through estimates of dip magnitude, dip azimuth, reflector convergence,
reflector rotation, and coherence, they also quantify reflector configurations, which
forms the basis of seismic stratigraphy. These physically independent (but statistically
interactive crossplot tool. The analyses of more than three attributes require a different
There are competing techniques for coping with the classification of data with
reduce dimensionality and provides an orderly suite of projections that best represents
classify our multi-attribute dataset. Before SOM training, we need to reduce the data
dimension into a low dimensional space through PCA analysis of the data. This lower
dimensional space represents the majority probability mass of the data (Wallet et al.,
2009). The vector components of the latent space are hidden and the SOM algorithm
techniques, including K-means, fuzzy clustering, and SOM. Their primary finding was
48
that the clustering algorithm used was less important than the choice of attributes used.
Among the clustering algorithms, they favored SOM since there is ordered mapping of
the clusters (topologically ordered); with similar clusters lying adjacent to each other
in the latent space. Coleou et al.’s (2003) seismic “waveform classification” algorithm
is implemented using SOM, where the “attributes” are seismic amplitudes on a suite of
16 phantom horizon slices, where the mean in 16-dimensional space when plotted one
element after the other can be envisioned to be a waveform. Plotting each cluster
against a 1D color bar results in a 2D map of seismic facies having similar waveforms.
and Uden (2002) were perhaps the first to use 2D latent spaces with geophysical data,
voxel. Typical attributes included envelope, bandwidth, impedance, AVO slope and
intercept, dip magnitude, and coherence. These attributes were projected onto a 2D
latent space and their results plotted against a 2D color table. Gao (2007) clustered
GLCM texture attributes based on their Euclidean distance in the texture attribute
space and 1D SOM to map seismic facies offshore Angola. He used 256 prototype
vectors to map the “natural” clusters. These natural clusters were then calibrated using
well control, giving rise to what is called a posteriori supervision. Roy et al. (2011)
a subset of vertical slices through the seismic amplitude. A simple but very important
49
innovation was to project the clusters onto a 2D non-linear Sammon space (Sammon,
1969). This projection was then colored using a gradational 2D colorscale like that of
significant well control or when the interpreter has significant expertise, one can
are based on artificial neural networks (ANN). Meldahl et al. (1999) used seismic
energy and coherence attributes coupled with interpreter control (picked locations) to
train a neural network to identify hydrocarbon chimneys. West et al. (2002) used a
similar workflow but where the objective was seismic facies analysis and the input
attributes were textures. Corradi et al. (2009) used GLCM textures and ANN, with
controls based on wells and skilled interpretation of some key 2D vertical slices and
mapped sand and sealing vs. non-sealing shale facies offshore west Africa.
introduce the Mississippi Lime play and how it is expressed by seismic attributes. We
then cluster our data in an unsupervised manner, finding the natural clusters. We then
extracted three average attribute data-vectors from two horizontal wells in our survey
based on the fracture density calculated by White (2013) with the borehole image logs.
those penetrated by the poor well are considered to be higher risk, and should be
avoided, while seismic facies that correspond to those penetrated by the good well
50
serve as higher priority targets. We conclude with a comparison of our clusters against
METHODOLOGY
attributes) on a 2D SOM grid. While the vectors can move to better represent the
nearby data, the spatial relationship between the prototype vectors and its neighbors on
the SOM grid is preserved, or topologically ordered. In our case, we will use multi-
over-define the number of initial clusters through the use of a large number of
a Kohonen SOM neighborhood training function results in the large number of PVs
“clumping” into a smaller number of actual clusters that represent the true variability
in the data. After the training is complete, the modified PVs are then color-coded by
colors. Each N-dimensional data vector of attributes is compared to these 256 trained
prototype vectors. The data voxel is then assigned the color associated with the
prototype vector that most closely matches the corresponding data vector, resulting in
51
Kohonen SOM Clustering Analysis
The SOM was first developed by Kohonen in the biological sciences, but is
least two commercial interpretation packages (one of which was used by Coleou et al.,
2002, and Gao, 2007). However, the commercial packages today appear to use a 1D
latent space, while the Matlab implementations are not amenable to handling large 3D
(2002) does use a 2D latent space but may no longer be commercially available. We
vectors in an N-dimensional vector space Rn, xj= [xj1, xj2, xj3 …. xjN] where N is the
number of input attributes (or amplitude samples for “waveform” classification) and
j=1,2,…,J is the number of vectors (one vector per voxel in 3D, one vector for map
location in 2D). The input attributes have different units of measurements, resulting in
Our SOM implementation defines mapping from the Rn, input data space to a
2D SOM grid space where the PVs assigned to each grid point are topologically
ordered. The 2D SOM grid space can be understood as a 2D sheet upon which the
52
interconnected imbedded PVs lie. The prototype vectors are represented by mi, mi=
[mi1, mi2…. miN], where i=1, 2, …, P where P is the number of PVs. We project the
PVs onto a rectangular structure map that preserves the neighborhood relationship
among the PVs. A projected PV in the 2D SOM grid space occupies a single grid
point. We initialize the PVs using PCA with the 2D SOM being defined by the first
deviations of the variability in the data as defined by the eigenvalues λ1 and λ2.
After training, these PVs deform and move out of the 2D plane to move closer to the
data-vector such that the 2D SOM grid better represents the natural clusters present in
the input data. After several iterations, the PVs continue to move (organize), clumping
into subsets. PVs that are close in the SOM grid space will represent attribute vectors
that are similar to each other. The number of these clumped PVs determines the
For example Figure 4.1a shows 300 samples belonging to three distinct
Gaussian distributions having the same standard deviation but different means. These
300 input vectors have three attributes or dimensions (N=3). Each sample vector in 3D
covariance matrix. To begin, we choose three times the square root of each of the first
two eigenvalues (three standard deviations) of this covariance matrix to define our
initial 2D SOM grid space, which we sample with an 11x7 regular hexagonal grid
(Figure 4.1b). This definition results in the SOM grid space representing
approximately 99.7% variance of the input dataset. There are 77 PVs, much more than
53
needed to represent the three natural clusters. Each of these individual PVs is denoted
vectors).
During the SOM training process, an input vector is initialized and is compared
with all N-dimensional PVs. The prototype vector with the best match “winner” and
its surrounding PVs will be updated, thereby “training” that neighborhood of the
SOM. A Gaussian neighborhood function is defined with about the “winner” PV as its
center and as its variance. With each subsequent iteration, the neighborhood
radius (variance) decreases. Thus, in each iteration, the winning prototype vector
is brought closer to the data vectors in the input data space while its corresponding
node organizes (or clumps) into one of the clusters formed in the 2D SOM space.
After 100 iterations, the node SOM grid space clumps into the 3 classes present in the
input data (Figure 4.1c). The trained PVs are color-coded using 2D gradational colors
(Matos et al. 2009). We will use an HSV model with for 2D spaces defined as hue, ,
⁄
( ) …………………………. (1)
⁄
and saturation, as
[( ⁄ ) ( ⁄ ) ] …………………………. (2)
where and are the projected components onto the 2D SOM grid space The input
data in this example visually contain three separate classes which gives rise to three
separate clusters and they are color-coded using the 2D HSV color palette with
equations 1 and 2 (Figure 4.1d). This coloring scheme will be used for generating the
54
seismic facies volumes discussed in the remainder of the paper. A more mathematical
explanation of all the above steps are given in Kohonen (2001) and Roy et al. (2011).
We start the training by first projecting this normalized multi-attribute data onto
a 2D SOM grid space defined by using the eigenvalues and eigenvectors obtained
from PCA. The SOM grid space is thus uniformly sampled with grid points that are
projections of PVs having the same dimensions as the number of attribute volumes
taken as input. Due to the limitation of our visualization software, which provides only
256 colors, we have limited our over-defined PVs to be 256. We apply the SOM
training rule to cluster these vectors in the SOM grid space. The PVs are updated after
each iteration and they slowly move towards the data-vector in the input space,
resulting in an updated projection of the PVs onto the SOM grid space. As the
updating slows down, the training process stops. The SOM manifold can be far from
planar and can even unfold itself, linearly projecting the multi-dimensional PVs onto a
2D SOM grid space causes some overlap in the projections. Thus after the final
dimensional vectors to a lower dimensional space such that the inherent structure of
the data is approximately preserved. Sammon mapping helps in reducing some of the
space. We use a 2D HSV color model to assign continuous color to the PVs according
to the distance from the center of mass and the azimuth of their projections. Once
trained, the Euclidean distance is computed between each trained PV, mi′ and the
55
multi-attribute input data vector, xn at each voxel using
|| || || ||
where, mb′ is the nearest trained PV to the input data sample vector xn. Each voxel in
the 3D data space is then assigned the color corresponding to mb′. In this manner, two
dissimilar neighboring samples in the seismic volume that are far apart in the SOM
grid space will have different colors (Roy et al., 2011). Conversely, two similar
samples in the seismic volume will have nearly the same color. Each color represents a
seismic facies, most of which are geologic facies, but some which may be seismic
‘noise’ facies. Figure 4.2 shows a flowchart explaining the multi-attribute SOM
process.
APPLICATION
The Mississippi “Lime” – Tight Limestone, Tight layered Chert and lime, Porous
The general stratigraphic column (Figure 4.3a) shows the Mississippian tripolitic
targets are within the Mississippian limestone. These weathered and/or detrital
intervals of highly porous rock are present in north-central Oklahoma and south-
central Kansas. The Mississippi Lime and tripolitic chert reservoirs have been
producing hydrocarbons for more than 50 years. Although the tripolite is widespread
across the region, it is not continuous and is highly heterogeneous (Rogers, 2001). Our
56
dataset is from Osage County, Oklahoma, (Figure 4.3b) which sits within the
Cherokee Platform which is bounded to the west by the Nemaha Uplift and to the east
Matson (2013) subdivides the Mississippian in this study area into the tight St.
Joe limestone, and the Osage A and Osage B levels. Osage B consists of alternating
thinly layered limestone and nonporous chert which does not undergo subsequent
diagenetic alteration (but often shattered chert). Osage A can be diagenetically altered
through meteoric processes and consists of siliceous limestone and high porosity
tripolitic chert.
from weathered or eroded Mississippian Lime that was deposited as muddy debris
flows. Figures 4.4a and b show the two schematic depositional models for the chert
In the reef model, the reef is eroded by wave action and material is deposited
downslope as debris flows. Silica then replaces some of the limestone (stage-one
diagenesis) to form high-density, low-porosity chert. Later, sea level drops and
meteoric water dissolves much of the remaining calcite (stage-two diagenesis) to form
subaerial environment are submerged and the silica replaces some of the limestone
(stage-one diagenesis) to form chert. When the sea level drops the meteoric water
throughout the Mississippian section (Rogers, 2001). Much of the limestone has also
57
been altered to a dense, non-porous chert that exhibits fractures due to shrinkage and
subsequent tectonic activities. The tripolitic chert reservoirs are heterogeneous and
have high-porosity and low-permeability forming sweet spots in the reservoir. The
limestone and non-porous chert are highly fractured. Ideally, horizontal wells are
drilled perpendicular to the fractures in the limestone and tight chert parts of the
plumbing to produce multiple sweet spots. The log responses of tripolitic chert zones
show low-resistivity, low-density and high porosity (25-30%). With the proposed
seismic facies analysis, we try to visualize and map the heterogeneous chert reservoirs
volumetric attributes where the number of input attributes determines the intrinsic
dimensionality of the data. In this application, we normalize our input data vectors
using a Z-score algorithm. Thus our input data consists of a vector assigned at each
estimate seismic facies. The first workflow applies unsupervised Kohonen SOM
analysis to structural attributes, while the second workflow applies the same
multi-attribute wavelets near the wells are compared to the multi-attribute dataset,
58
based on the MED measure. The case studies based on the three workflows provide a
input volumetric attributes to our clustering algorithm. This is done in order to find the
Structural Attributes
convergence, coherence, coherent energy, and dip magnitude (Figure 4.5) better map
the structural heterogeneity of the chert layers. Coherent energy will bring out
amplitude changes associated with tripolitic and tight limestone. Coherence will
highlight the discontinuities within the reservoir. Dip magnitude will capture
deformation, while reflector convergence will map unconformities. The shattered chert
and the Tripolitic chert will be more discontinuous compared to the tight limestone
regions. Using these attributes as input data to SOM algorithm will result in a
continuity and configuration in the data volume. Figure 4.6 show different features in
the 3D seismic facies volumes generated as output after the first analysis.
59
Texture Attributes
Our second hypothesis (workflow 2) assumes that the tripolitic chert, silica-rich
limestone and the St. Joe tight limestone have different textures (Figure 4.7). Coherent
energy, the spectral bandwidth computed from spectral decomposition (Zhang et al.,
2008), and three GLCM (Gray level co-occurrence matrix) texture attributes are used
in the analysis. A common everyday example of textures is the association of the grain
seen on furniture and the tree from which it was made. Matos et al. (2011) used
texture to identify Osage-age channel features in a different Osage Co. seismic survey.
For GLCM analysis an ensemble of traces are examined as an image and the
2002). The GLCM energy is the sum of the squared values of the pixels defined this
randomness of the seismic amplitude and the GLCM homogeneity highlights regions
having strict stationary statistics (invariant mean and variance). Dense cherty
limestone and tight chert exhibit high coherent energy, thicker continuous reflectors,
higher amplitude, more homogeneous and exhibit less entropy (Figure 4.7). Layered
chert and lime will exhibit comparatively less coherent energy, thin reflectors, less
homogeneity and more entropy. Tripolitic chert with high porosity and low-density,
will have lower amplitude, be most discontinuous, least homogeneous and exhibit high
entropy (be more disorganized) (Figure 4.7). We use these differences to classifying
the dataset on the basis of amplitude and texture variations. Figures 4.8 and 4.9 show
60
different features in the 3D seismic facies volumes generated as output after
unsupervised classification.
Supervised Classification
Waveform similarity based on distance metrics have long been used in seismic
2000) or attributes (e.g. Michelena et al., 1998) extracted from productive or non-
productive wells to every trace along the horizon. Here we perform a similar exercise
but do the comparison volumetrically. Furthermore if the MED between the vector at a
given well lies beyond a user-defined threshold distance the voxel is assigned to an
correlating wells to seismic waveforms, where the supervision was not only the actual
seismic about the well but also a suite of synthetic seismic traces generated through
based upon distance metrics similarity characteristics. These metrics compare two
multi-attribute data vectors and return a scalar value based on some notion of
similarity. We begin by training some of the dataset near the wells and turned their
average into target vectors into target or training vectors. These are then compared
with the rest of the data vectors. The MED is measured between the training vectors
and the data vectors. This measure is utilized for providing quantitative measures of
61
In our supervised classification workflow, three facies are selected based on
the borehole image log interpretation from White, (2013) (Figure 4.10). Three small
zones are taken, two along Well-A and one along Well-B to perform a supervised
classification about the Mississippian chert reservoir zone (Figure 4.9). The texture
classification. Three sets of average waveform are calculated from the three zones and
are these training data-vectors called facies-1, facies-2, and facies-3 as shown in
Figure 4.9.
Well A was drilled in the north east corner of the survey and encountered
mostly the St. Joe limestone and dense cherty limestone. However the dense chert and
the dense limestone facies can be highly fractured, giving rise to two additional
seismic facies. It is observed from the fracture density calculations along the well bore
(White, 2013) that there are some natural fractures in the dense chert of the Well-A
(Figure 4.9a). Thus two zones are identified, one in the region of dense cherty
limestone and the other in the region of dense chert with natural fractures (layered
chert and limestone). The average data-vector of the facies-1 (light blue-violet color)
is extracted from Well A around the vertical and horizontal zone, having the tight,
least fractured cherty limestone. The average data-vector of facies-2 (yellow color) is
extracted from the horizontal section of Well A, in which tight fractured chert are
observed. The average data-vector of both facies-1 and facies-2 have similar nature
with high homogeneity, high energy and low entropy (measure of disorder). However,
these two facies type have different amplitudes for each attribute value as observed in
and has been drilled mostly in the tripolitic chert formation. Facies-3 (light green
color) is the average data-vector extracted from the horizontal section of the Well B
that has more tripolitic chert facies present (Figure 4.9b). The average data-vector
from facies-3 shows less coherent energy, lower bandwidth spectrum, higher disorder
(entropy), low homogeneity and moderate energy (Figure 4.10c). These average data-
vectors are compared to the same z-score normalized dataset. We measure the
Euclidean distance between the facies vectors and each data-vector, where the most
similar data-vector to the facies vector will have the least distance. The confidence
values of the well data-vectors are fixed to 80%. The data-vector in the reservoir zone
is identified and the corresponding data voxel is color-coded according to its closest
facies type. A cutoff value is defined which controls the unclassified class facies-4
(gray color) that does not fall in either of the above three groups.
DISCUSSIONS
Although we started our analysis with an over-defined number of 256 PVs, the
unsupervised SOM clumps (clusters) them into a smaller number of “natural clusters”
present in the multi-attribute dataset. From Figures 4.6b and 4.8b we see that the
projections of these PVs form four/five clusters after the final iteration.
(workflow 1) helps to map the discontinuity and the overall distribution of rock types
63
within the Mississippian chert reservoir (Figure 4.6a). The harder higher-density chert
will have greater amplitude and a higher coherence compared to either the fractured or
the low-density tripolitic cherts. Analyzing Well A, we find the magenta, pink and
orange facies correlates to tight limestone and fractured tight chert (corresponds to St
Joes Limestone and Osage B) as shown in Figure 4.6a. From a posteriori analysis of
Well B we interpret the yellow and green colors correlates to the tripolitic chert rich
the combined effects of the dip-magnitude, coherency and the reflector convergence
attributes helps in distinguishing and the subtle variations, fractures, faults and karst
features which corresponds to the blue and cyan color. Some of the structural features
are concentrated in the fractured chert zones but most of the structural features are
The second workflow using unsupervised seismic facies analysis from the
the survey area (Figure 4.8a). After the final iteration five major clusters are formed
which are interpreted by analyzing Wells A and B. The dark red zones correspond to
the dark red cluster formed in the outliers, interpreted as tight cherty limestone (St. Joe
limestone). Figure 4.9a shows the horizontal well overlayed on the vertical section of
the seismic facies volume. Overlaying the vertical section with the fracture density
plot (White, 2013) we correlate, that the lighter red areas with high-density fractured
chert rich or the layered chert and limestone. From Figure 4.9b we infer that the lighter
colored (light green and yellow) facies are tripolitic chert rich zones (Osage A).
64
The supervised seismic facies volume (Figure 4.11) is quite similar to the
layered chert and limestone (Osage B formation) and corresponds to yellow facies
(facies-2). The tripolitic chert rich areas corresponding to Osage A formation, (facies-
3) and are in green. These green areas correspond to the diagenetically altered chert
with low-density and high-porosity values. The tripolitic chert facies is not uniform
across the survey and occur in patches. The remaining data those does not fall in either
and the results are compared with the multi-attribute unsupervised seismic facies
analysis (Figure 4.12). High impedance (purple) corresponds to the regions interpreted
in the facies volumes as the high-density chert or having St. Joe’s limestone.
chert or layered chert. Finally, low-impedance regions (yellow and red) correspond to
the rocks of low density and high porosity and correspond to the tripolitic chert-rich
zones (Figure 4.12a). The unsupervised seismic facies volume from the texture
limestone correspond to the regions having high impedance values. Similarly, the low-
facies volume from the “texture” attributes. In this case we can replace coherent
compared to the limestone-rich areas (White, 2013). The unsupervised seismic facies
volume from the structural attributes (Workflow 1) shows structural features in blue
and shows similar distribution pattern of the k1 principal curvature as in Figure 4.12b.
CONCLUSIONS
diagenetic altered rocks like chert is not a common workflow. Proper selection of the
input seismic attribute volumes is the key to effective classification to differentiate the
generates natural clusters that are formed from the overdefined classes. We have used
The three facies (St. Joe’s tight limestone, fractured and layered Osage B chert,
and high-porosity Osage A tripolitic chert) are identified from the borehole image logs
and are used as training vectors for supervised classification. This supervised
identifying porous tripolitic chert as the sweet spots, the tight layered cherts with the
natural fractures are identified as areas that may be more effectively stimulated by
hydraulic fracturing. The post stack inversion result combined with the different
66
seismic facies volumes helps understanding the prospective zones of this
With all the three workflows, we infer the “best” attributes for classifying the
like the Mississippi Lime. More conventional techniques like P-impedance inversion
requires better well control for the area and good well ties that require much effort
from geophysicists and is a time consuming job. The proposed unsupervised multi-
attribute workflow requires comparatively much less effort and can be done before the
wells are available. This analysis can be done as a preliminary workflow to understand
the reservoir. This classification is a step forward where we choose the best attributes
that can represent the reservoir and perform our clustering on the chosen set of multi-
attribute volumes. For our study, we show that SOM multi-attribute clustering
successfully maps the various chert facies within the reservoir zone. These results are
volume and the borehole image logs. Thus the proposed workflow can be confidently
and easily performed when there is not much well information available in the survey.
67
LIST OF FIGURES
HSV coloring. (a) Three distinct Gaussian distribution with same standard deviations
but different means (blue circles). To begin with we choose a representation of the
data-vectors (called the prototype vectors) having same three dimensions as the data-
vectors. A PV in the input space (black dots in (a)) is assigned to each grid point in the
2D SOM grid space. (b) The initial projections of the PVs in the 2D SOM grid space.
(c) Final projection of the trained PVs in the 2D SOM grid space forming three classes
as present in the input dataset. The trained PVs are plotted in the 3D data-space (black
dots). (d) A 2D gradational HSV color scaling of the trained PVs are done for better
68
Figure 4.2. Workflow for the unsupervised SOM classification of the multi-attribute
dataset.
69
Figure 4.3. (a) The general stratigraphic column with the Mississippian tripolitic
Mississippian age. (b) The location of the seismic survey from Osage County,
Oklahoma, within the Cherokee Platform province. The Cherokee Platform is bounded
on the west by the Nemaha Uplift and to the east by the Ozark Uplift. Matson (2013)
subdivides the Mississippian in this study area into the tight St. Joe limestone, and the
70
71
Figure 4.4. Two different models for the diagenesis the Mississippian Chert
developed from weathered and/or eroded limestone (after Rogers, 2001 using
nomenclature of Matson, 2013). In the reef model the chert formation at the reef
margin. In the Breccia model the chert forms as sub-aerially exposed breccia deposits.
(a) Stage-one for both the settings is the silica replacement of calcite in a submarine
environment. The siliceous limestone and the layered chert and limestone
digenesis is a result of erosion and uplift, infiltration by meteoric water. This results in
flushing of the rock and dissolution of the remaining calcite by low-pH fluid and
absence of no new silica precipitation. This results in moldic porosity and vuggy
72
Figure 4.5. Structural attributes used in unsupervised analysis (workflow 1). (a)
Coherent energy, (b) Coherence, (c) dip magnitude and (d) reflector convergence.
Blue arrows indicate discontinuities and structural features that will form their own
73
Figure 4.6. (a) The result of multi-attribute unsupervised seismic facies classification
(workflow 1) within the Mississippi lime using the four “structural” attribute volumes
shown in Figure 4.5 as input. (b) The projections of the 256 prototype vectors
(clusters) plotted in the 2D SOM grid space clump into four main clusters
magenta/dark pink, red/orange, yellow/green or cyan in color. The blue cluster is one
of the outliers. A posteriori analysis from the borehole image log and fracture density
diagram of Well A shows magenta or dark pink color correlate to the tight limestone
facies (St. Joe Limestone) (indicated by the magenta arrows) while orange and red
correlate to the dense layered chert and limestone facies (Osage B) (indicated by the
red arrows). Similar a posteriori analysis from Well B shows that light green and
yellow correlate to tripolitic chert facies (Osage A) (indicated by the yellow arrow).
The forth cluster blue and cyan correlates to areas of low coherence and high dip
74
magnitude, which we interpret to be areas of faults, fractures and karst (blue arrows as
in Figure 4.5).
Figure 4.7. The input attribute volumes used in workflow 2: (a) coherent energy, (b)
spectral bandwidth, (c) GLCM energy, (d) The GLCM entropy, and (e) GLCM
along structure. In contrast we use bandwidth over spectral attributes to measure the
vertical variations in texture. The GLCM energy is a measure of the GLCM matrix
energy, not the square of the seismic amplitude. GLCM energy increases if amplitude
entropy is high if the amplitude values are random. GLCM homogeneity is high if
75
Figure 4.8. The result of multi-attribute unsupervised seismic facies classification
from workflow 2 within the Mississippian lime zone using the “texture” attribute
volumes shown in Figure 4.7. (b) The projections of the 256 prototype vectors
(clusters) plotted in the 2D SOM grid space clump into mainly into four different
clusters. The dark red/brown projection of the PV (as an outlier) forms the fifth facies
type. A posteriori analysis from the borehole image log and fracture density diagram
of Well A shows dark red and brown correlate to the tight cherty St. Joe’s limestone
facies (indicated by the brown arrows) while light pink and violet correlate to the
layered chert and lime facies (Osage B formation) (indicated by the pink arrows).
Similar a posteriori analysis from Well B shows that light green and yellow correlate
76
Figure 4.9. Subvolumes used to extract three average data-vectors, which will be used
been chosen on the basis of the image log interpretation from White (2013). Hot colors
indicate high fracture densities based on the two wells. (a) Two sub-volumes are
porous limestone (St Joe’s Limestone). The yellow Facies 2 corresponds to the
fractured dense chert (Osage B). (b) The sub-volume considered around Well B. The
light green facies 3 corresponds to tripolitic chert facies and the region having the
highest fracture density (Osage A). (c) Map view of the location of the two wells in
the survey.
77
78
Figure 4.10. The three average waveforms extracted from the three sub-volumes
around the wells with the images of the borehole image logs for each of the facies
type. (a) The borehole image log of Well A corresponds to the tight St. Joe limestone
formation. The corresponding facies type is defined to be Facies-1 (violet color) (b)
The borehole image log within the Osage B formation shows many natural fractures
and interbeded chert and lime. The corresponding facies type is defined to be Facies-2
(yellow color). The average data-vectors selected from the Well A have similar
patterns with high coherent energy, high energy and high GLCM homogeneity, narrow
bandwidth and low GLCM entropy. However, the amplitude values of the attributes
are less in Facies-2 indicating that this facies is fractured and layered chert not as tight
and dense as the Facies 1. (c) The image log shows the presence of low density
average multi-attribute data-vectors selected from the Well B (Facies 3) has low
coherent energy, spectral amplitude, GLCM homogeneity and GLCM energy values
and high GLCM entropy value, which is a seismic signature of the low density, high
79
Figure 4.11. The multi-attribute supervised seismic facies volume, with the facies
defined in Figure 4.10. The violet seismic facies (Facies-1) corresponds to the tight St.
Joe’s limestone, the yellow seismic facies (Facies 2) corresponds to fractured chert
with interbeded chert and lime regions and the green seismic facies (Facies 3)
corresponds to the tripolitic chert rich zones. The facies, which are not similar to these
80
Figure 4.12. The results of post-stack P-impedance inversion. (a) The less dense
highly porous tripolitic chert regions have low impedance (red and yellow colors). The
green areas correspond to the fractured or layered dense chert and lime regions. The
high impedance regions (cyan-blue violet) correspond to the dense chert and cherty
limestone. This result when compared with unsupervised seismic facies analysis
workflow 2 shows that the dense limestone and chert rich zones correspond to high
impedance regions and the low impedance corresponds to the tripolitic chert rich
areas. Similarly, for the fractured layered chert and lime corresponds to the regions
with medium to low impedance. (b) The P-impedance volume is co-rendered with the
81
positive principal curvature k1. When compared with the multi-attribute analysis with
82
REFERENCES
seis- mic facies classification volume: Marine and Petroleum Geology , v. 26,
p. 495-503.
Gao, D., 2007, Application of three-dimensional seismic texture analysis with special
Handford, M., 2012, Where’s Waldo?: The 25th Anniversary Edition: Candlewick
Press, 32 p.
Johnson, K.S., 2008, Geologic History of Oklahoma, in K.S.Johnson and K.V. Luza,
83
Matos, M. C., K. J. Marfurt., and P. R. S. Johann, 2009, Seismic color Self-Organizing
Expanded Abstracts.
Matos, M., M. Yenugu, and K.J. Marfurt, 2011, Integrated seismic texture
Matson, S., 2013, Mississippi lime play: reservoir definition, production examples and
Oklahoma City.
Meldahl, P., R. Heggland, B. Bril, and P. de Groot, 1999, The chimney cube, an
1998.
sheet.
Poupon, M., Gil, J., D. Vannaxay, and B. Cortiula, Tracking Tertiary delta sands
84
Rogers, S.M., 2001, Deposition and diagenesis of Mississippian chat reservoirs, north-
Roy, A., M. Matos, and K. J. Marfurt, 2011, Application of 3D clustering analysis for
Conference, 410-439.
Sammon, W. J., 1969, A nonlinear mapping for data structure analysis, IEEE
Snyder, R., A case history of the East Hardy Unit, Mississippian Highway 60 trend,
Oklahoma City.
Strecker, U., and R. Uden, 2002, Data mining of 3D post- stack attribute volumes
for the Osage Indian Reservation, Oklahoma: U.S. Geological Survey and
Watney, W.L., W.J. Guy, and A.P. Byrnes, 2001, Characterization of the
facies classification using textural attributes and neural networks: The Leading
Zeller, D. E., 1968, The stratigraphic succession in Kansas: Kansas Geological Survey
86
CHAPTER 5
ABSTRACT
Classification methods fall into two major classes, the supervised and
does not require direct human intervention and subdivides the data volume into its
“natural clusters”. However classification in the actual higher dimensional data space
input data-space to a lower dimensional model space. Among these techniques neural
net and Kohonen self-organizing maps (a sub-group of artificial neural network) are
the most popular classification methods routinely done for Generally well log
prediction or analysis and seismic facies modeling. Although they have been
limitations. Here we explore one of the recent techniques known as the Generative
Topographic Maps (GTM), which takes care of the shortcomings of the Kohonen Self
organizing maps. We explore the formulation of this technique and applied it on two
technique to classify 15 sets of horizontal well parameters in a one of the new shale
87
plays, correlating the results with normalized EURs, allowing an estimation of EUR
shale dataset for a multi-attribute unsupervised clustering. The final seismic facies
volume was a result of user defined a posteriori supervision to classify the higher
probability density of the data in the 2D latent space. Different clusters formed from
the GTM analysis of the Barnett shale survey are interpreted from different horizontal
wells with micro-seismic data. The study shows the effectiveness of GTM technique
INTRODUCTION
only linear structures (hyper-planes) in the data space. The technique described in this
approximately contains the vast majority of the probability mass of the data
this latent space generate the large number of input dataset. Every latent variable is
88
the D-dimensional data-space. Therefore, a distribution defined in the latent space
relationship between the two spaces (Sevensen, 1998). After proper parameter
estimation, we can relate the points in the D-dimensional data-space to grid points in
space. Thus for data points in the high dimensional space one can find some
representation in the lower dimensional latent space. The above technique formulates
a generative model where vectors in the data space are generated by grid points in the
latent/model space. Bishop et al, 1998 proposed such a generative model called the
Generative Topographic Model (GTM). Data visualization after GTM modeling is the
exploration. Wallet et al., (2009) were the first to apply the GTM technique to a suite
dataset as input and generated a new workflow for unsupervised seismic facies
analysis.
89
paper our input data will consists of D-dimensional data vectors. In our first example
discrete wells. In our second example, the data are a suite of seismic attributes forming
functions are used to map points u = u1,u2,…,uL in the L-dimensional latent space into
corresponding points in the data space. This mapping generates points defining L-
A Gaussian noise model (PDF) for a data-vector xn is then defined centered on the
mapped grid points on the non-Euclidean manifold S. This Gaussian noise model
defines the space in which the data vector lies. In this manner, the probability
with each of the mapped grid points. This initial stage is called a constrained Gaussian
mixture because the Gaussian centers are constrained by the grid points in the latent
space. As we iterate each component of the mixture model is moved towards the data-
vector that it best represents. For each iteration the mapping parameters are
1977). Since the points on the manifold S relates to the points on the latent space by a
probability distribution, a posterior probability projection is then found for each of the
the data onto the latent space is used for data visualization. Detailed implementation of
90
GTM Theory
cross-plotting tool, which will be later used for visualization. An array of regularly
spaced latent space variable points (nodes) is arranged in this 2D latent space, labeled
k=1,2,3….,K and are denoted by uk. In addition, a set of j =1,2,3…., J non-linear basis
functions are introduced in the mapping function (Figure 5.1). Following Bishop et al.,
centers j also uniformly arranged in regular grid spacing. These Gaussian basis
( )
(1)
where, is the position vector of the center of the jth basis function, is the position
vector of the kth node, and is the common width (standard deviation) of these
In order to determine the PDF in the data-space, we need to map the pre-
defined latent space variables uk onto the D-dimensional data space to a corresponding
set of reference vectors, mk. These reference vectors mk lie on the 2D non-Euclidean
∑ (2)
91
where, k=1,2,3….,K are the indices for latent space variables, and J is the set of non-
linear Gaussian basis functions. W is a D x J matrix that denotes the weights of the
mapping function. Figure 5.1 depicts an overview of the GTM mapping from the
latent space to the data space. Specifically, nine (K=9) latent space variables are
mapped to nine reference vectors in the 2D non-Euclidean manifold with four (J=4)
It is highly improbable that the data lies only on the 2D manifold S in the data-
space. We assume these deviations are normally distributed and can be represented by
For simplicity, the variance =1/β is assumed to be same for all nodes,
however, these nodes can move, such that the noise model is defined for a data-vector
given by
|| ||
|
Since the centers of the Gaussian mixture components are dependent on points in the
constrained Gaussian mixture model (Hinton et al, 1992). The probability distribution
in the data-space for the GTM model is obtained by summing over all K Gaussian
92
|| ||
| ∑ | ∑
To make the above problem mathematically tractable, we require that the latent space
points are constrained to a grid. Thus the prior probabilities P(k) of each of these latent
space variables are given by the sum of delta functions centered on the nodes uk,
where the indices of the nodes runs from k=1,2,….., K. We initialize P(k)=1/K factor
is the normalization factor such that the total probability sums to unity.
well suited for solution using an Expectation Maximum (EM) algorithm (Dempstar et
al, 1977). The EM algorithm consists of two major steps. The first or expectation step
is to find the expected value for a data-vector using the current estimated parameters.
The second, or maximization step uses the expectation value to provide a new estimate
of the parameters. The EM algorithm is a good choice for estimation problems like
Consider for a certain iteration the weight matrix is given by Wold and the
inverse of the noise variance given by βold. In the “E-step” of the EM algorithm we
components in latent space for every data-vector xn using these GTM model
parameters Wold and βold. Using Bayes’ theorem we obtain the posterior probability as,
93
|| ||
|| ||
∑
where the dataset are represented by X = x1, x2, x3…….xN. However the value of
|| ||
becomes very small for most of the values of K causing a numerical
zero. This is avoided by subtracting a constant factor from the argument of the
|| ||
exponential, which is equivalent to multiplying the numerator and
use these responsibilities to update the model to compute a new weight matrix W new
given by equation (5). The updated value of the inverse of the noise variance is given
by βnew where
∑∑ || ||
94
To regularize the matrix inversion we add a constant and rewrite the equation (6) to
be
This constant makes a prior distribution over W and scales the 2D manifold.
After GTM training the 2D manifold S is stretched to represent regions of lower data
density and is squeezed to represent regions with greater data density. To construct a
The selection of the input data vector (e.g. which seismic attributes best
represent the desired facies) is the most critical parameter selection of GTM and to
the grid points (nodes) K in the latent space should be such that it is dense enough to
points is equal to the number of Gaussian mixtures in the data-space. Also to note,
choosing a very large number of grid points increases the computation time and
memory usage. Thus a trade-off should be maintained between selecting the number
95
We also define a set of non-linear Gaussian basis function centers J, on this 2D
latent space grid and take care to set J < K to avoid rank deficiency of the matrix.
The common width of these basis functions is also set prior to the training.
controls the smoothness of the 2D manifold. A smooth manifold in the data space
facilitates fitting the data-vectors during training. This parameter remains constant
for the whole process. The matrix consisting of the non-linear basis functions
We initialize the weight matrix W such that the initial GTM model
approximates the principal component of the dataset. The common value of the
inverse variance of the Gaussian PDFs β is initialized to be the inverse of the (L+1)th
eigenvalue from PCA where L is the dimension of the latent space. Since in our case
parameters Wnew and βnew, we are able to define a new posterior probability
|| ||
|| ||
∑
96
These posterior probabilities or the “responsibility”, values in equation 9
can be mapped to the entire grid points uk (Figure 5.3). Such an explicit projection for
all the data-vectors will in general results in too much redundant information in the
latent space. For this reason Bishop et al., (1998) proposed projecting the mean or the
obtained by projecting onto all the grid points (nodes) the responsibilities of each
Figure 5.3 illustrates such a calculation of the posterior probability for one data-vector.
The mode of the discrete K nodes occurs at the the magenta point. The mean of the
posterior probability projection of the same data-vector xn occurs at the green point
Initialization
97
Define the number of latent variables K, the basis function J, the relative width
Generate the grid for the Gaussian basis function centers rj, where j=1,2,….., J,
Training
Training continues until the model converges (i.e. the value of the inverse
98
APPLICATIONS
process and reservoir properties are correlated with high and low expected ultimate
recovery (EUR). The test dataset consists of 137 horizontal wells from shale survey
displayed in the map view in Figure 5.5. Each well is color-coded by its scaled EURs.
In contrast, the rapidly varying EUR in the center of the map is less likely to be a
reservoir property and more dependent on the specific engineering parameters. Geo-
affect the EUR of the wells. We train on 137 and validate with 8 wells. The 15
Cluster spacing,
99
Total perforations,
Contour permeability,
Formation thickness,
Porosity,
All the variables are normalized using a z-score algorithm to minimize any
bias due to units of measure. The 2D latent space is uniformly sampled using 144
points and forming a square grid of 12 x 12 points. These 144 latent space variables
are mapped into the N=15 dimensional data-space using four Gaussian basis functions
with equal variance. After GTM training the manifold in the data space will be
stretched in regions of low-data density and compressed in the region of high data
density. With the trained GTM model parameters, the posterior probabilities
9). We then use these posterior probabilities and project them onto the 2D latent space
to form either a mean and or a mode distribution map using equations 10 and 11.
Finally projected points are colored by the scaled EUR values. Once trained we
validate the clustering using the estimated EURs from the GTM property map and
compare with the true EURs for the 8 wells not used in training.
100
The mean (Figure 5.6) or the mode (Figure 5.7) of the posterior probability
before and after 100 iterations. The color bar represents scaled EURs. Note that the
data-vectors from wells with high EUR projection map onto the upper right corner of
Some of the wells from the mean posterior probability distribution map are
analyzed as highlighted in Figure 5.8a. The upper set correspond to the set of wells
having high EURs and the below set corresponds to the wells having low EURs. The
average of the normalized well parameters for these two set of wells are plotted in
Figure 5.8b. The average data-vector from wells with high EURs (in red) are having
different characteristics compared to the wells with low EURs. The wells with good
EURs have higher proppant, sand volume, less cluster spacing, higher fracture stages,
more perforations, and higher porosity, whereas the wells with bad EURs have
opposite characters.
These posterior probabilities are crucial to our application of the GTM for
but the EURs) that is associated with the nth well. We can multiply this En to the
posterior probabilities of the nth well data-vector which are projected on the 2D latent
space (Figure 5.9a). From this we get the EUR map for 1 well (Figure 5.9b). Then, we
can formulate a weighted sum of the EUR at each grid point k in the latent space for
101
∑
̃
∑
This will give us a most likely EUR map from all the wells over the latent space
(Figure 5.9c). We first should analyze this EUR map to qualitatively determine if there
is significant correlation between the latent space and the physical property. Notice for
our EUR map, the latent space highly correlates with the EUR values. In addition, we
can also use this new ̃ to predict the EUR at another location in the latent space ̂ ,
which is not present in the training data set. This is done by calculating the posterior
probability of the new mth data-vector and multiplying the posterior probability
value at each grid point k with the weighted sum of the EUR values. Thus, for
̂ ∑ ̃
Figure 5.9d is the plot of the predicted EURs from the GTM model vs. the true given
EURs for the validating sets of 8 wells with a very good linear least-square fit.
This study shows how different horizontal well parameters affect the average
EUR for horizontal wells. The horizontal well parameters like the sand used for
fracturing, the proppant volume, the number of perforations, their cluster spacing, the
porosity mostly affects which wells have the good or bad EURs. The model
parameters used in our GTM model fit the dataset properly and correlates the latent
102
space to the EURs of each well. A very good least square fit of the predicted EURs to
the actual EURs gives confidence to our model. This workflow can be conveniently
extended to estimate which of the parameters most affects the EURs of the horizontal
classification for the Mississippian Barnett Shale of the Fort Worth Basin in Texas.
103
GTM example 2: Probabilistic Multi-attribute seismic facies analysis of the
The Fort Worth Basin is a foreland basin, located in north-central Texas and is
associated with the late Paleozoic Ouachita orogeny. This basin is bounded by the
Muenster Arch to the northeast, the Ouachita Thrust Front to the east, the Bend Arch
to the west, the Red River Arch to the north, and the Llano Uplift to the south (Figure
5.10a). In this study area the Barnett sits on an angular unconformity above the
In between, the Forestburg Limestone divides the Barnett formation into Upper and
Lower Barnett zone (Figure 5.10c). The Barnett Shale is not homogeneous, but rather
can be subdivided into siliceous shale, argillaceous shale, calcareous shale, and
The Fort Worth Barnett shale gas play is traditionally more of an engineering
driven play. It requires hydraulic fracturing for gas production. Our 3D seismic survey
consists of a 200 square mile survey in the North East Fort Worth Basin. The data are
sampled at 110ft by 110ft by 2ms. However, the above survey was acquired after
numerous vertical and horizontal wells have been drilled and hydraulically fractured.
For effective well placement within the survey in future drilling, care should be taken
104
to identify the brittle zones by mapping the geomechanical rock type of the Barnett
shale.
The inputs to our GTM algorithm are different seismic inversion volumes (P-
heterogeneous nature of the Barnett shale. For the above attribute generations the
seismic data between the Marble faults horizon and the Viola limestone is considered.
The impedance volumes better reflect a heterogeneous shale reservoir based rock type.
The Lamé parameters of seismic inversion such as lambda-rho (λρ) and mu-rho (μρ)
between two such elastic properties from the wells sometimes help in segregating
different rock types. However, it is very difficult to separate between classes when we
cross-plot any of these two seismic volumes. Figure 5.11 shows the cross-plot of the
lambda-rho (λρ) vs. mu-Rho (μρ) for the Upper Barnett shale (Figure 5.11a) and
Lower Barnett shale (Figure 5.11b). Both the cross-plots form one single cluster of
data-points and we are unable differentiate this cluster. The same is true when we
cross-plot with the P-impedance volume. Thus it is not possible to identify the
105
The above three volumes: P-impedance, μρ and λρ, between the Lower Marble
Falls and the Viola limestone, are considered as input for the GTM unsupervised
clustering analysis. Thus our data become 3-dimensional. However this data with
off the mean and then dividing it by the standard deviation for each of the three
regular grid points (nodes) and square grid is defined with 40 grid points in each side.
An additional square grid of J = 144 defined the centers of the non-linear basis
functions in the same 2D space as the mapping function. These non-linear Gaussians
basis functions have the same standard deviation equals to half the separation of the
the weight matrix W and the inverse of the noise variance β of the GTM model. The
We stopped the training after 100 iterations when the value of the inverse
variance β stabilized (Figure 5.12a). After training the latent space the final GTM
model is applied to the original normalized dataset, which is then projected onto this
2D latent space (Figure 5.12b). This projected mean posterior probability distribution
106
in the 2D latent is then typically used for user defined cluster analysis. An innovative
and easy workflow to do this is to store the values of the posterior probability
projections for each of the 2D latent space axis into two separate volumes, which we
call as GTM projection volumes. Thus, the GTM output consists of two separate
volumes. Four different zones are identified in the gamma-ray log from a well within
the survey. The GTM projection volumes constrained in the four zones within the
posterior projections of these four zones form separate clusters onto the latent space
(Figure 5.15). These different clusters formed are then interpreted by qualitatively by
visualization and the micro-seismic dataset from the wells separated them into
“fracable” or ductile lithofacies. Finally, we came up with 9 clusters within the Barnett
shale.
the Barnett shale interval combining them with the microseismic dataset.
Microseismic is one of the most important fracture diagnostic techniques to map the
slippage events that accompany a hydraulic fracturing and then locates the origin of
107
The well locations are given in Figure 5.16. Microseismic events from Well A
and a suite of Wells B (B1, B2, B3, and B4) are discussed in this study (Figures 5.13
and 5.14). Well-probes are created by cross plotting the two projection volumes from
GTM. Next, we identify, draw polygons around the clusters and assign them with
different colors. Figure 5.13 shows the well-probe for well A. The 2D histogram
(Figure 5.13a) shows the occurrence of various clusters over which eight user-defined
accordingly. The Upper and the Lower Barnett Shale exhibit different cluster
composition and in turn different from the intervening Forestburg limestone in gray
color. The Upper Barnett is mostly identified in three clusters which are colored blue,
cyan and yellow. The most variability is found in the Lower Barnett Shale with 4.5
different clusters formed in the 2D latent space. Figure 5.13d shows the well-probe
overlaid with the microseismic events. Slatt et al, (2011) identified 1st order, 2nd order
and 3rd order ductile-brittle couplets within the Barnett Shale (Figure 5.13c). It is
observed that more microseismic events are confined in the light green and the red
lithofacies of the Lower Barnett Shale and less microseismic events penetrate into the
brown facies. The well-probe corresponds to the 2nd order brittle ductile couplets. The
arrow highlights the ductile and brittle zone in the well probe and in the 2D histogram
plot.
of the 2D latent space consistent with Well A. The Upper Barnett consists of the
similar three facies as before and the Forestburg limestone is colored gray. Analyzing
the 2D histogram (Figure 5.14a) we can add one more class (pink) as shown in Figure
108
5.14b. The well-probe (Figure 5.14c) is colored with these nine user-defined clusters.
The microseismic data are added for all the four wells (Figure 5.14d) and it shows the
presence of most of the microseismic events confined in the light green, red and pink
colored lithofacies. Again, it is observed the microseismic events miss the brown
lithofacies. Thus light green, red and pink lithofacies are brittle in nature compared to
the brown colored facies, which is more ductile. This is consistent with the 2nd order
brittle ductile couplets (Slatt, et al, 2011) and the studies done by Perez (2013).
Similarly the arrow highlights the ductile and brittle zone in the well probe and in the
2D histogram plot.
After analyzing the well-probes we extend our interpreted clusters to the seismic
volume. The GTM projection volumes are cross-plotted for four different zones with
their boundaries defined above and below by stratal slices corresponding to the zones
defined in the gamma-ray log from Figure 5.15a. Figure 5.15 b, c, d and e show the
mean posterior probability projections of these four zones (B, C, D, and E) within the
Barnett shale formation. Different clusters are recognized in the cross-plot and are
A horizon-probe within the upper Barnett shale (zone B) shown in Figure 5.16c.
The 2D histogram and the user defined clusters of the data from zone B are given in
Figure 5.16a and b. Three clusters are formed in the upper half of the 2D latent space.
Which are then colored consistent with the well-probes. Most of the facies of the
Upper Barnett Shale falls into the blue, green and the yellow clusters.
For the Lower Barnett Shale three different horizon-probes (Figure 5.17, 5.18
and 5.19) are created based on the different zones C, D and E identified in the gamma
109
ray signature of the lower Barnett shale (Figure 5.15 a). Figure 5.17c shows the top
section of the Lower Barnett Shale (zone C). It is observed in the mean posterior
probability projections (Figure 5.15c and Figure 5.17a) that it has some similar nature
as the Upper Barnett Shale but also have different rock type since a group of clusters
Figure 5.18c is the horizon-probe for the middle zone D of the Lower Barnett
Shale. The cluster occurs mostly in the lower half of the latent space (Figure 5.15d and
Figure 5.18a). User-defined polygons are drawn around these clusters (Figure 5.18b)
to color the seismic horizon-probe. Earlier we discussed from the well-probes that the
microseismic events are mostly concentrated in the pink, light green and the red color
facies. Also this zone mostly has siliceous non-calcareous shale lithofacies (Singh,
2008). Thus we infer that zone D with mostly pink, light green and the red color
The lowermost zone (zone E) of the Lower Barnett Shale is the zone of hot shale
(Pollastro et al, 2007) with very high gamma ray (Figure 5.15a). Figure 5.19c is a
horizon-probe from the lower most zones. The brown facies is the most abundant
facies in this zone. From the high gamma values in the well logs and the fewer
occurrences of the microseismic events we interpret the brown facies as the most
ductile shale which may be due to high TOC concentration in this zone (Singh, 2008
110
CONCLUSIONS
The two case studies, first on the horizontal well dataset in an unconventional
shale survey and the second on the 3D seismic data from Barnett shale discusses the
various workflows for clustering with the GTM algorithm. For the reservoir
engineering problem we had a high correlation of the latent space model with the
horizontal well parameters. The horizontal well parameters like the sand used for
fracturing, the proppant volume, the number of perforations, their cluster spacing, and
the porosity mostly affects EURs of the wells. Finally we got a very good prediction
model predicting EURs for our blind test wells. This unsupervised clustering
Secondly the GTM application in the Barnett shale with the P-impedance, λρ
and μρ, well characterizes the brittle and the ductile zones present in the shale. Each of
the shale lithofacies is tied to the microseismic data set from the wells of the survey.
From these microseismic events we can interpret clusters and differentiate between the
more “fracable” and ductile shale. Nine set of lithofacies have been identified within
the Barnett formation from the GTM unsupervised cluster analysis. Within the
four/five lithofacies identified in the Lower Barnett Shale we conclude that the brown
facies is more ductile in nature and probably more TOC rich. With larger
concentration of the microseismic events the red, light green and the pink lithofacies
are identified as more brittle in nature. This study will help in identifying the sweet
spots within the survey to identify zones for placing new horizontal well effectively.
111
Because the GTM theory is deep rooted in probability, other than seismic
facies classification it can also be applied into modern risk analysis and used in
projecting the mean posterior probabilities of a target data in the 2D latent space and
then calculate the probability estimates of the data falling in the category of the target
vector. We thus have a probabilistic estimate of the facies type and allows add another
classification output.
112
LIST OF FIGURES
Figure 5.1. Non-linear mapping of the latent space to the data space. The prior
distribution consists of latent space variables (K) ordered on a regular grid (blue
regular array of J non-linear basis functions. With the linear combination of these
basis functions the latent space (blue circles) are mapped to the data-space (blue
spheres) where they form a 2D non-Euclidean manifold, S, such that each node uk is
(equation 2).
113
Figure 5.2. Generating Gaussian probability density functions (PDF) to represent the
data vectors. In general data-vectors are scattered about S. We assume the misfit
defines each data-vector xn. The final probability density function in the data space is
obtained by integrating the Gaussian PDFs for all mk where k=1, 2, ….K grid points
114
Figure 5.3. Workflow for data visualization in GTM. After training the new estimated
parameters Wnew and βnew, the new posterior probabilities representing the data-vectors
posterior distribution “responsibility” values can be plotted for all the grid points,
in the 2D latent space. The mean location will assign the value to be the
weighted average of the posterior distribution values and will in general fall between
location of the greatest posterior distribution value in the 2D latent space and will
115
Figure 5.4. The flow chart for generative topographic mapping workflow.
Figure 5.5. The Well Locations for the unconventional shale play of an area roughly
1000 km2. Colors correlate to scaled EUR ranging between 0 and 1. 137 wells are used
to train and eight wells used to validate the GTM. In general, the reservoir properties
of the shale are smoothly varying laterally. However, near the center of the survey
116
note the proximity of high- and low-EUR wells. We attribute these variations to
Figure 5.6. The mean posterior distribution map of the “responsibilities” of the data in
the 2D Latent space. The mean location of a data-vector is the weighted average of the
posterior distribution values and will in general fall in between neighboring values of
. (a) Initial and (b) final distribution of the posterior mean projections of all 137
well data onto the latent space after 100 iterations. (c) The PDF of a representative
well vector projected onto the latent space. While only the mean value is plotted in (a)
and (b), the full distribution can be used to better quantify risk. The plot is color-coded
117
by the scaled EURs. Note the two different clusters corresponding to the good (red)
and the bad EURs (blue) in the final mean distribution map.
Figure 5.7. The same GTM classification shown in Figure 5.6, but now using the
mode of the posterior probability projections of the data. (a) Initial and (b) final
distribution of the posterior mode projections of all 137 well data onto the latent space
after 100 iterations. (c) The data-vectors are projected onto the most likely grid points
(grid points with the most posterior probability value) and will always correspond to a
discrete gridded value of . Note that in this mode projection the posterior
probability values are assigned only to the grid locations. The plot is color-coded from
low to high EUR values. The latent space shows a more orderly separation between
the good, moderate and the bad EURs for the final iteration in the mode distribution
map.
118
Figure 5.8. (a) Some of the wells from the mean posterior probability distribution map
are analyzed as highlighted. The upper corner corresponds to wells with high EUR and
the bottom corner corresponds to wells with low EUR. (b) The 15 normalized well
parameters for each of the two set of wells are averaged and are plotted forming two
sets of averaged data-vectors. The red corresponds to the average data-vector for wells
with good EURs and blue is the average data-vector for wells with low EURs. Note
that mostly the well parameters differ radically for the two cases. The wells with good
EURs have higher proppant, sand volume, less cluster spacing, higher fracture stages,
more perforations, and higher porosity, whereas the wells with bad EURs have
119
Figure 5.9. EUR prediction through GTM modeling. (a) The posterior probability of
the data-vector from the nth well. (b) The EUR for the nth well, En is multiplied with
the posterior projection values onto 2D latent space in (a). The result gives an EUR
map for 1 well (Figure 5.9b). Then, we can formulate a weighted sum of the EUR at
each grid point k in the latent space for all the wells (given by Equation 9) and form
the EUR “map” over the latent space. Note the high correlation of the latent space
with the EURs. (b) Plot showing the predicted EUR from the EUR property map in (a)
vs. the true EUR for the 8 validation wells not used in training the GTM. The least
120
Figure 5.10. (a) The map of Texas highlighting the Ft. Worth basin and other major
basins and uplifts. The Ft. Worth basin is bounded by the Muenster Arch to the
northeast, the Ouachita Thrust Front to the east, the Bend Arch to the west, the Red
River Arch to the north, and the Llano Uplift to the south. (b) Stratigraphic section
including the gamma ray and resistivity logs showing the major units. The Barnett
carbonates of the Ellenburger Group and Viola Formation and is overlain by the
Pennsylvanian-age Marble Falls Limestone, and is divided into Upper and Lower units
121
by the Forestburg Limestone (c) Cross-section of the stratigraphy of Ft. Worth basin
volumes. The cross-plot shows a single cluster for (a) the Upper Barnett Shale (b)
Lower Barnett Shale. A small cluster (marked with an arrow) at the edge of the two
122
Figure 5.12. QC of the GTM analysis after performing 100 iterations. (a) The plot of
the inverse variance (β), stabilizing as the number of iterations increases. (b) A 2D
histogram of the mean projections of posterior probability of the dataset defined above
and below by the Lower Marble Falls and the Viola limestone respectively
123
Figure 5.13. Well-probes for Well A generated by cross-plotting the two GTM
projection volumes. (a) The 2D histogram generated from the cross-plot of the two
GTM projection volumes. (b) Eight user-defined polygons drawn around the clusters
seen in (a). (c) Brittle-ductile couplets proposed by Slatt et al, (2011). (d) Well-probe
data colored by the clusters selected in (b). The Upper Barnett, the Lower Barnett
exhibit a different cluster composition and are in turn different from the intervening
Forestburg Limestone (in gray). (e) The microseismic events from this well are plotted
along with the well-probe. Note the microseismic events are more localized in the red
and light green facies and misses the brown facies, thus the red and light green facies
124
are interpreted as brittle and brown facies to be ductile. The results are consistent with
Figure 5.14. Well-probes generated for a suite of horizontal wells B1, B2, B3, and B4,
generated cross-plotting the two GTM projection volumes. (a) The 2D histogram
generated from the cross-plot of the two GTM projection volumes. (b) Nine user-
defined polygons drawn about clusters seen in (a). (c) The well-probe data colored by
the nine clusters. The Forestburg Limestone appears as gray and divides the Upper
Barnett from the Lower Barnett Shale. (d) The microseismic events from these wells
are plotted along with the well-probe. Similar to Well A the microseismic events are
more localized in the red, pink, and the light green facies and are interpreted as brittle
125
facies and the brown facies has less microseismic events and are interpreted as ductile
facies. Here also the results are consistent with the 2nd order brittle-ductile couplets
Figure 5.15. (a) Four different zones within the Barnett Shale selected using a gamma
ray log from a well within the survey. The corresponding 2D histograms of the mean
posterior probability projections for (b) the Upper Barnett (zone B), (c) the top of the
Lower Barnett (zone C), (d) the middle of the Lower Barnett (zone D) and (e) the
126
Figure 5.16. (a) 2d histogram of zone B defined above and below by stratal slices
corresponding to the well log picks in Figure 5.15a. (b) User defined polygons are
created and are colored consistent with the well-probes in Figures 5.13 and 5.14. (c)
The facies volume probe of the Upper Barnett Shale zone B visualized along with the
well-probes. The seismic dataset is colored accordingly to the clusters selected. Most
of the facies of the Upper Barnett Shale falls into the blue, cyan and the yellow
clusters.
127
Figure 5.17. (a) 2d histogram of zone C defined above and below by stratal slices
corresponding to the well log picks in Figure 5.15a. (b) User defined polygons are
created and are colored consistent with the well-probes in Figures 5.13 and 5.14. (c)
The facies volume probe of top section of the Lower Barnett Shale zone C visualized
along with the well-probes. The seismic dataset is colored accordingly to these clusters
selected. The yellow and blue facies is common in the Upper Barnett Shale is also
128
Figure 5.18. (a) 2d histogram of zone D defined above and below by stratal slices
corresponding to the well log picks in Figure 5.15a. (b) User defined polygons are
created and are colored consistent with the well-probes in Figures 5.13 and 5.14. (c)
The facies volume probe of middle section of the Lower Barnett Shale zone D
visualized along with the well-probes with the colors selected according to the clusters
in (b). Note that this zone has least similarity to the Upper Barnett Shale. With more of
the microseismic events concentrated in the pink, light green and red facies as seen in
(Singh, 2008), this zone 3 is interpreted as brittle with results consistent with Perez
(2013).
129
Figure 5.19. (a) 2d histogram of zone E defined above and below by stratal slices
corresponding to the well log picks in Figure 5.15a. (b) User defined polygons are
created and are colored consistent with the well-probes in Figures 5.13 and 5.14. (c)
The facies volume probe of bottom section of the Lower Barnett Shale zone E
visualized along with the well-probes with the colors selected according to the clusters
in (b). This zone corresponds to the hot gamma ray zone (Pollastro et al., 2007). Six
clusters are also identified from the mean posterior probability projections (in the top
inset) are polygons are drawn and are colored consistent with the well-probes. With
very few of the microseismic events in the brown colored facies we interpret from
(Singh, 2008 and Perez 2013) the brown colored rock to be ductile and high in TOC
content. The pink, light green and red facies are the regions with brittle shale.
130
REFERENCES
Duda, R.O., P. E. Hart, and D. G. Stork, Pattern Classification, 2001, 2nd Edition,
Wiley Publications.
Mississippian Barnett Shale, Fort Worth Basin, north central Texas: Gas-shale
play with multi-trillion cubic foot potential, AAPG Bulletin; February 2005;
89; 155-175.
131
Perez, Altamar, R., 2013, Correlation of surface seismic measurements to completion
petroleum system, Bend Arch-Fort Worth Basin, Texas: AAPG Bulletin, 91,
405-436.
Singh, P., 2008, Lithofacies and Sequence Stratigraphic Framework of the Barnett
Svensen, M., 1998, GTM: The Generative Topographic Mapping: PhD Dissertation,
Aston University.
132
CHAPTER 6
ABSTRACT
technology and increasing survey size, manual techniques of facies classification have
become increasingly time consuming. Besides, the numbers of seismic attributes have
morphology, but also greatly expanding the amount of data to be analyzed. This in
turn leads to add a “dimension” to the data. Principal component analysis, self-
dimensionality by projecting the data onto a lower order space in which clusters can
Mexico. The presence of conglomerate carbonates makes the reservoirs both laterally
133
and vertically highly heterogeneous, which is observed at well log, core slab and thin
clusters in the dataset. Finally we introduce supervision into GTM and calculate the
probability of occurrence of seismic facies seen at the wells over the reservoir units. In
this manner, we are able to assign a level of confidence (or risk) to encountering facies
INTRODUCTION
clusters are ordered, with similar clusters adjacent to each other. This similarity is
visualized by mapping the clusters against 1D (Poupon et al. 1999, Coleou et al. 2003)
or 2D (Strecker et al. 2002) color bars. However, there are several limitations to the
SOM algorithm. First, there is no definite rule for selecting the neighborhood function,
training radius, and learning rate because these parameters are data dependent.
because of the absence of any defined cost function to indicate the convergence in the
final iteration.
For these reasons, Bishop et al. (1998) developed the generative topographic
probability density function (PDF) that describes the distribution of the D-dimensional
134
data in terms of a smaller number of latent variables, or cluster nodes that approximate
the vast majority of the probability mass of the data (Sevensen, 1998). After iterative
parameter estimation, this manifold (curved surface in the data space) allows us to
relate the points in the D-dimensional data-space to grid points in the lower L-
dimensional latent space. In general, GTM estimates the probability that any given
data vector is represented by each and every cluster node, providing a direct link to
risk analysis. To visualize the PDFs of large 3D seismic data volumes for
interpretation, we can approximate each data vector by the mean or the mode of the
Wallet et al. (2009) were the first to apply the GTM technique to seismic data,
137 horizontal wells from the Haynesville Shale. In this study mapping EUR in the
latent space shows the correlation of good vs. poor wells to variability in proppant
used, fluid injected, and porosity. These original MATLAB codes were
et al, 2009). For this reason Roy et al. (2013) re-implemented GTM to handle much
larger input data volumes including P-impedance, lambda-rho (λρ), mu-rho (μρ) and
other volumes as input. Using a 2D latent space, we define polygons around the
clusters to classify different lithofacies found within the Barnett Shale. Using the
“carbonate wash” reservoir in the Veracruz Basin of Mexico and correlate them to
production. Then we incorporate supervision into the GTM algorithm by using the
wells to define attribute vectors that represent good and poor production. Finally, we
use Bhattacharya (1943) measure to derive similarity between the probability density
function to produce mapped regions corresponding to low and high risk of production.
METHODOLOGY
manifold S embedded within the D-dimensional data-space (Figure 6.1). Data vectors
xn are modelled by a suite of Gaussian PDFs centered on the mapped grid points on
the non-Euclidean manifold S, thereby defining the space in which the data vector
lies. The initial stage is called a constrained Gaussian mixture because the Gaussian
centers are constrained by the grid points in the latent space. As we iterate each
136
component of the mixture model is moved towards the data-vectors that it best
represents. For each iteration the mapping parameters are determined using an
Expectation Maximization (EM) algorithm (Dempstar et al, 1977). Relating the points
on the manifold S relates to the points on the latent space by a probability distribution
latent space. This posterior probability projection of the data onto the latent space is
used not only for data visualization but also to estimate the confidence in which each
given in Chapter 5.
The selection of the input data vector (e.g. which seismic attributes best
represent the desired facies) is the most critical parameter selection of GTM and to
forming the grid in the latent space should be dense enough to approximate a
continuous distribution of the data and differentiate the target facies. By construction,
the number of grid points is equal to the number of Gaussian mixtures in the data-
space. Choosing a very large number of grid points increases the computation time
latent space grid and take care to set J < K to avoid rank deficiency of the matrix.
137
The common width of these basis functions is also set prior to the training.
controls the smoothness of the 2D manifold. A smooth manifold in the data space
facilitates fitting the data-vectors during training. The parameter remains constant
for the whole process. The matrix consisting of the non-linear basis functions
We initialize the weight matrix W such that the initial GTM model
approximates the principal component analysis (PCA) of the dataset. The common
value of the inverse variance of the Gaussian PDFs β is initialized to be the inverse of
the (L+1)th eigenvalue from PCA where L is the dimension of the latent space. Since
parameters Wnew and βnew, we are able to define a new posterior probability
distribution of the data by the latent space grid points. Using Bayes’ theorem we
obtain,
|| ||
|| ||
∑
138
These posterior probabilities or the “responsibility”, , values in equation 1
can be mapped to the latent space grid points uk (Figure 6.2). We will use such an
explicit projection of average data vectors about good and bad wells when we
seismic attribute data vectors (one per voxel) in general results in too much
information in the latent space. We therefore follow Bishop et al. (1998) and project
the mean or the mode posterior probability projections onto the latent space.
obtained by projecting onto all the grid points (nodes) the responsibilities of each
This mean projection of the posterior probability of the data vector is used in
Initialization
Generate the grid for the Gaussian basis function centers rj, where j=1,2,….., J,
Training
Training continues until the model converges (i.e. the value of the inverse
140
APPLICATION
The Veracruz Tertiary Basin (VTB) (Figure 6.4a) is a foreland basin (Prost and
Aranda, 2001) developed at the foothills of the buried tectonic front, filled with
recent (Cruz-Helu et al., 1977). The sediments come from a variety of sources:
igneous complexes (such as the Santa Anna high), metamorphic complexes (La
Mixtequita, the Sierra de Juarez and Macizo de Chiapas), and carbonates from the
shale layers, of which the older are Middle Eocene in age (EOC–3, EOC–10, EOC–20
and EOC–30) that produce an average 22⁰ API oil. EOC–10, EOC–20 and EOC–3
produce in the top of the structure (Figure 6.4b). In contrast, EOC–30 and EOC–50 are
present only in the eastern flank of the anticline, where they produce 500 m down dip
The seismic survey consists of a 100 km2 of high quality prestack time
migrated gathers and a stacked volume cut out from the much larger mega-merge
survey over the Tertiary Veracruz Basin. The prestack impedance inversion volumes
used in the classification algorithm are derived from this seismic dataset.
141
GTM workflow for multi-attribute seismic facies classification
and vertically highly heterogeneous. This heterogeneity is observed at well log, core
slab and thin section scales (Romero-Pelaez, 2012). The facies varies from breccias to
conglomerates interbedded with poorly sorted calcareous sandstones and shale re-
deposited in slope and basin-floor environments. The conglomerates and sandstone are
providing secondary porosity, which is much more prevalent than the primary
porosity. The porosity for the pay zones ranges from 11 to 17%. However, the
reservoir may be under-sampled, such that the recovered samples may not be fully
seismic facies model of the EOC-30 and the EOC-10 reservoir units were created and
the results were then correlated with the productivity of the wells.
The inputs to our unsupervised GTM algorithm are different seismic inversion
the porosity and the rock type. They also help to determine the relationship between
the desired rock property such as lithology, porosity or clay volume. The other
142
Mu-Rho (μρ) helps in understanding the rock-type and different elastic properties of
rocks. Cross plotting between two such elastic properties from the wells sometimes
helps in segregating different rock types. However, for this reservoir, two-component
crossplots do not separate the classes. Figure 6.5 shows the crossplot of the seismic
volumes λρ vs. μρ (Figure 6.5a) and P-wave impedance (Zp) vs. VP to VS ratio (VP/VS)
(Figure 6.5b).
With the above four volumes (VP/VS, ZP, λρ and μρ) our input data dimension
becomes 4-dimensional, that is, each voxel in 3D physical space is associated with an
minimize the error between a projected space and the data. Without normalization,
small errors in fitting ZP or ZS impedances (with values on the order of 109 Pa) would
overwhelm the signal in VP/VS (with dimensionless values ranging between 1.5 and 3).
Data vectors are therefore normalized by subtracting off the mean and dividing by the
algorithm.
grid of nodes (the blue circles in Figure 6.1a). A slightly coarser square grid of J =
16x16=256 nodes define the centers of the non-linear Gaussian basis functions in the
same 2D space (the green circles in Figure 6.1a). The separation between the
neighboring Gaussian Basis function centers is set to half the standard deviation of the
Basis function centers and the grid points on the 2D latent space. A multi-attribute
PCA analysis initializes the weight matrix W and the inverse of the Gaussian (noise)
variance β of the GTM model. We randomly select 0.1% of the data vectors to train
143
the latent space, updating W and β with each iteration using an expectation-
possible local maxima onto the likelihood surface. We stop the iteration when the
value of the inverse variance, β, stabilizes (Figure 6.6a). After training the latent
space, the final GTM model parameters are applied to all 100% of the data vectors,
which are then projected onto the 2D latent space (Figure 6.6b). The interpreter then
We focused our analysis on the resulting GTM seismic facies volume of the
EOC-10 and the EOC-30 reservoir units. Both reservoir units are dated as Middle
Eocene, produce 220 API oil, are composed of similar mineralogy content, and exhibit
a similar porosity distribution. Figures 6.7a and b show the mean projection after
GTM training of the dataset for reservoir units lying only within EOC-10 and EOC-30
unit respectively. The posterior probability values of the data vectors from each of the
voxel locations of the probe are projected onto all the grid locations and their mean
locations calculated and projected onto the 2D latent space forming a 2D histogram.
(Figure 6.7c) to be a slope and basin-floor fan, with distributary channels oriented
144
described by Bouma (2000). We define clusters in this histogram by interactively
drawing polygons about the high-density areas of the plot, thereby hypothesizing
similar seismic facies for reservoir units EOC-30 and EOC-10. The mineralogy
content, the porosity and reservoir type of these two reservoir units are similar, which
appear as similar location of the clusters onto the 2D latent space (Figure 6.7). The
dataset falling within these polygons are highlighted for these two reservoir units. This
creates the 3D seismic facies volumes (or geobodies) of the two reservoir units shown
in Figure 6.8.
facies estimation of the GTM seismic facies volume and visually correlated with
(PNN). Figure 6.9 compares predicted effective porosity and clay volume to seismic
classes obtained from GTM for reservoir unit EOC-10 and the EOC-30 reservoir units.
conglomerate sandstone with minimum clay content. The pink- and purple-facies
GTM volumes approximately correlate to the low effective porosity clay-rich facies
seen in the EOC-30 reservoir unit and along the normal faults.
145
Correlating the production data with the seismic facies the facies model
month average well production. The wells in these two zones are along the structural
highs.
Figure 6.10a shows the volume probe within the EOC-10 reservoir unit with
the well locations. Figure 6.10b shows a phantom horizon 10ms below the EOC-10
top. In general the light green and the orange color facies are better reservoir rock. The
green facies in the north are good producers (Wells J and I). Well W on the left tested
brine water and falls into the orange GTM-facies, which we associate with good
reservoir quality, but lies in the structural lows. Well X is a dry well with logs and
core showing hard-cemented facies and falls in the brown-colored GTM facies making
it a poor reservoir quality rock. The pink- and purple- facies correspond to facies rich
in clay content.
A similar analysis was done for the EOC-30 reservoir unit. Figure 6.11a shows
the GTM facies volume and the well locations in this reservoir unit. Figure 6.11b
shows productivity pie charts of the wells on a phantom slice 10ms below the EOC-30
top. Wells A, B, C, D and E exhibit mediocre production and lie in the orange facies.
Well F exhibits the highest oil production, although for a short term before producing
water. Inspecting the GTM facies volume (Figure 6.11a) Well F is located adjacent to
the light green-GTM facies. In general, the orange facies, and light green facies are
146
associated with good reservoir quality rock for this reservoir unit. However since the
Now we take average seismic attribute data-vector around three of the well
locations, calculate the PDF of each well data-vector and compare the overlap of the
PDF with other seismic attribute data-vectors. This will give the most likely facies
Since GTM is based on probability measures and statistics, we can use the
PDFs. Let Rwk be the posterior probability distribution corresponding to a well data-
vector as shown in Figure 6.12a. Let Rnk be the posterior probability of any other
(1943) measure, we can find the similarities (Figure 6.12c) between the two PDFs by
∑ √ ……………………. (3)
where, k are the grid points of the 2D latent space. Thus when two distributions are
there is no overlap between the PDFs . Thus, this coefficient ranges from
.
147
In this manner, one computes the value of dn for all the data-vectors in the
The average data-vectors around three wells are calculated: poor producer Well
X, good producer Well K and moderate producer Well E. The same four inversion
“attribute” volumes (VP/VS, ZP, λρ and μρ) are used as input, with the average vectors
calculated for sub-volumes around each of the three well locations. Figure 6.13 shows
the average vector for each of the three well locations. The following three well data-
vectors have the following characteristics. The average vector for the dry Well X has a
lower μρ value, which is different from rest of the two wells. The average data-vector
for Well K shows a larger μρ value. The average data-vector for Well E shows a
higher λρ, μρ and VP/VS value compared to the rest. Finally, we calculate the overlap of
the posterior probability distribution of each of these well data-vectors and the
measure, for each of the data-vectors. The output “similarity” volumes for each well
148
Probabilistic estimation of facies for the EOC-10 and EOC-30 reservoir units
Three volumes are created after the similarities between the PDF of the well-
vector and the data-vectors are calculated with Bhattacharya measure. Each volume
highlights the likelihood that a given voxel corresponds to those about a target well.
Figure 6.14 shows a geobody within the EOC-10 and EOC-30 reservoir unit through
Bhattacharya measure showing the likelihood of facies corresponding to the dry Well
X. The most probable regions appear as hot colors and the least probable regions
appear as cold colors. The regions surrounding well X show the occurrence of bad
reservoir rock. This facies type is also present along the faults mapped using
Figure 6.15 shows the geobody after PDF similarity is calculated between the
data and the good producer Well K. Hot areas correspond to regions with high likely
that the facies is similar to Well K. Note that there is only limited zones in the two
reservoir units similar to Well K. Most of the reservoir exhibits a low probability of
Figure 6.16 is the geobody through the Bhattacharya measure between the
moderate producer Well E data-vector and all the data-vectors. Most of the regions
within the EOC-10 and EOC-30 reservoir units have high likelihood of facies being
similar to well E. Note that this facies is least likely to occur around well X and
149
DISCUSSION
provide similar facies to those generated using a supervised PNN to estimate Vclay, and
helps in identifying the different rock-types of the reservoir units. Geobodies can be
conveniently calculated for each facies, which will quantify the amount of good or bad
Analysis of the clusters was done from the regional geological information and
from the unsupervised GTM facies analysis results. Figure 6.18 summarizes the results
different clusters are identified by their separation in latent space. The meanings of
these clusters are evaluated a posteriori through the use of well logs, core, production
data, the ANN-generated Vclay volume, and the original impedance volumes. The
violet and the pink facies along the faults are clay- rich, the brown facies
corresponding to the dry well X is tight, the orange and the light green facies comprise
good reservoir rocks made up of porous limestone conglomerate “wash”. The dark-
The three PDF similarity volumes, shown in Figures 6.14 to 6.16, give the
most likely occurrence of the facies type within the reservoir units. Around Well X
and along the faults (Figure 6.14) GTM predicts bad reservoir rocks. Thus for the
arrow marked in Figure 6.14 it is 90-100% more likely that the facies is similar to that
150
of Well X. The moderate reservoir rock corresponding to Well E (Figure 6.16) is
abundant and most likely facies type to be found in the reservoir units. Thus this
probabilistic estimate of the reservoir facies can add a factor in the modern day risk
analysis workflows. However, the most of the productive wells are in the north with
structure also play an important role in the well production (Romero-Paleaz, 2012).
151
CONCLUSIONS
high dimensional data space. In this paper, where each dimension corresponds to an
manifold S which in turn is constrained by the K grid points defined in the 2D latent
space. This latent space contains the mean locations representing probability density
of the data-vectors.
We chose our lower dimensional space to be 2D since the results take the form
software, the interpreter recognizes (or hypothesizes) clusters, defines them with
colored polygons, and displays the corresponding data vectors by coloring their voxel
locations as geobodies. Thus, unlike K-means and SOM, the proposed unsupervised
GTM workflow facilitates a posteriori supervision to interactively define and color the
seismic clusters/facies.
very simple, but versatile workflow by assigning each data vector to the well data
vector that it best represents. Unlike earlier similarity workflows used by Johnson et
152
al. (2001) and Michelena et al. (1998), all well data are used simultaneously to define
the latent space distribution. Therefore, we know not only which well-vector matches
a data-vector but also a probabilistic estimate as to how well each attribute data-vector
represents a given well. This capability allows us to develop and apply a supervised
GTM workflow that generates a suite of seismic “similarity” volumes with the
probabilistic estimates of the facies types being similar to the well vectors used to train
the GTM model. This provides an estimate of how likely we should find a facies if we
drill in a certain location. We anticipate that such probabilistic estimates will fall
neatly into modern risk analysis evaluation of drill locations and reservoir valuation.
153
LIST OF FIGURES
Figure 6.1. Non-linear mapping of the latent space to the data space. The prior
distribution consists of latent space variables (K) ordered on a regular grid (blue
non-linear basis functions. With the linear combination of these basis functions the
latent space (blue circles) are mapped to the data-space (blue spheres) where they form
density function in the data space is obtained by integrating the Gaussian PDFs for all
154
Figure 6.2. Workflow for data visualization in GTM. After training the new estimated
parameters Wnew and βnew, the new posterior probabilities representing the data-vectors
posterior distribution “responsibility” values can be plotted for all the grid points,
in the 2D latent space. The mean location will assign the value to be the
weighted average of the posterior distribution values and will in general fall between
location of the greatest posterior distribution value in the 2D latent space and will
155
Figure 6.3. The flow chart for generative topographic mapping workflow.
156
Figure 6.4. (a) Detail of the Veracruz Basin showing its boundaries and its two
geological subdivisions: The Plataforma de Córdoba, which is the buried tectonic front
of the Sierra de Zongolica, and the Veracruz Tertiary Basin. The field of study
(indicated by the red arrow) is located in the western margin of the buried tectonic
front in Upper and Middle Eocene age Tertiary sediments. (Map courtesy of PEMEX
E&P based on previous work by Prost and Aranda, 2001; Romero, 2012). (b)
Structural cross section through the field. EOC–10 and EOC–20 produce in the
western portion of the structure; EOC–40 is water bearing and is distributed over the
western portion of the anticline; EOC–30 and EOC–50 produce in the eastern flank
(Hernández-Martínez, 2009). EOC-10 and EOC-30 are the reservoir units, which we
157
Figure 6.5. 2D histogram plot generated from cross-plotting the different volumes
taken as input to GTM of the EOC-30 zone. (a) Cross-plot from seismic volumes
Vp/Vs vs. P-impedance. (b) Cross-plot from seismic: Mu-Rho vs. Lambda –Rho. Note
that the crossplot generated forms a single cluster and does not help in distinguishing
different facies
Figure 6.6. QC of the GTM analysis after performing 50 iterations. (a) The plot of the
158
histogram of the mean projections of posterior probability of the dataset within the
Figure 6.7. The 2D cross- plot of the mean posterior distribution map of the
“responsibilities” of the data onto the 2D Latent space for the reservoir units EOC-30
and EOC-10. The cross plot is generated by cross plotting to GTM projection
volumes. (a) The projection of the mean “responsibilities” of EOC-30 unit. The 2D
histogram is on the right and the scatter crossplot is on the left. Seven clusters are
visible on the latent pace corresponding to the high-density points. These clusters are
delineated by polygons with different colors and in the subsequent figure will help to
visualize the different classes in the seismic data. (b) The projection of the mean
for the EOC-10 and the EOC-30 reservoir units being similar the clusters for both
159
these reservoir units lie on the same location in the 2D latent space. They are also
color-coded similarly since both reservoir units have similar rock type. (c) Regional
Hernández-Martínez, 2009)
Figure 6.8. Generating the seismic facies volume (geobodies) from GTM clustering
within the reservoir units EOC-10 and EOC-30, considering the input seismic volumes
- lambda-rho (λρ) vs. mu-rho (μρ) and P-wave impedance (Zp) vs. Vp to Vs ratio
(Vp/Vs). Different polygons around classes signify rock types for reservoir units (a)
EOC-10 and (b) EOC-30. Seven different facies class have been identified from the
160
clusters in the latent space and are delineated by polygons of different colors. (c) The
horizon probe generated for the EOC -10 and the EOC-30 reservoir units after the
unsupervised GTM analysis. The white arrows highlight the faults. The most abundant
Figure 6.9. Horizon phantom slice 10ms below the top of EOC-10 and the EOC-30
reservoir units, through (a) clay volume, (b) effective porosity predicted from
supervised probabilistic neural networks (PNN), and (c) unsupervised seismic facies
volume from GTM for the of EOC-10 reservoir unit. (d) The clay volume and (e)
effective porosity predicted from supervised probabilistic neural network (PNN), and
161
Figure 6.10. (a) The GTM seismic facies volume with the well locations for the EOC-
10 reservoir unit. The red wells are the producing wells and the blue the injector well.
The well X is a dry well and falls within the brown facies region. These brown colored
162
faces are probably bad reservoir quality rocks making X a dry well. In this unit also
the wells are located at the structural highs. Most of the producing wells are along the
greyish-green and light green facies. Correlating with the clay volume the pink-purple
colored facies corresponds to the high clay content facies from. (b) Map view of the
top of the EOC-10 reservoir unit. The pie charts at the well locations show the average
production of the well for a seven-month period. In the pie chart green is for oil, red
for gas and blue for water. Structural high is in the North. The green facies in the north
are good producers (Wells J and I). Well H in orange facies is a moderate producer.
Well G in the west lies in the structural low making it a moderate producer. In the
south the well K lying in the light green facies is the most productive whereas the
wells in the greyish facies are moderate-low producers. Note the pink and purple
163
Figure 6.11. (a) The GTM seismic facies volume with the well locations for the EOC-
30 reservoir unit. The red wells are the producing wells and the blue the injector well.
The wells are mostly situated at the structural highs. The most abundant rock type
164
within this zone is the one with the orange color. Comparing with the well information
and with the clay and effective porosity volumes from probabilistic neural network
(PNN) these corresponds to the conglomerate sandstones with moderate porosity. The
brown and pink color facies are mostly along the faults and probably shows the
depositions close to normal faults along hanging walls. These regions also correspond
low porosity and relatively rich in clay. The dark green corresponds the highest
impedance regions, with moderate effective porosity, which is interpreted as the hard
conglomerate deposits having least clay content. (b) The map view of the top of the
EOC-30 reservoir unit. The pie charts at the well locations show the average
production of the well for a seven-month period. In the pie chart green is for oil, red
for gas and blue for water. Structural high is in the North. Wells A, B, C, D and E are
low-moderate producing wells. Thus the orange rock is a moderate quality reservoir
rock with moderate production. The water productivity increases as we go south (wells
E and F). The well F is having the largest production in terms of both oil and water is
situated in the mixed grey/light green and pink facies. The grey/light green facies are
165
Figure 6.12. A schematic representation of the supervised GTM analysis workflow:
(a) The PDF representing a data vector computed from the seismic attributes about any
well. (b) The PDF of a data vector corresponding to voxel n in the seismic attribute
volume. (c) The joint PDF of the average well data vector and the data vector. The
coefficient dn from the Bhattacharya measure is the measure of the similarity (overlap)
166
Figure 6.13. The average of the data-vectors calculated from a sub-volume (displayed
as small cubes) around the each well location. Each input attribute has been previously
normalized using a Z-score algorithm. These three average (target) vector around the
167
Figure 6.14. The results of the most likely occurrence of the seismic facies
corresponding to dry well X within the EOC-10 and the EOC-30 reservoir units. The
regions, which are most similar to Well X facies appears as hot colors. The least
similar regions appear in cold colors. Note the occurrence of Well X Facies is
168
Figure 6.15. A geobody within the EOC-10 and the EOC-30 reservoir units which
shows the most likely occurrence of the facies type corresponding to good producer
well K. The regions, which are most similar to the facies type at Well K, appear in hot
colors. The least similar regions appear in cold colors. Note that this facies is not so
abundant in the reservoir units and the likelihood of occurrence ranges from 40-70%.
169
Figure 6.16. A geobody within the EOC-10 and the EOC-30 reservoir units which
shows the most likely occurrence of the facies type corresponding to moderate
producer well E. The regions, which are most similar to the facies type at Well E,
appear as hot colors and the least similar regions appear as cold colors. Note that this
type of facies is very abundant in the reservoir units and the likelihood of occurrence
170
REFERENCES
Bouma A. H., 2000, Fine Grained, mud-rich turbidite systems: model and comparison
2000, Fine- grained turbidite systems: AAPG Memoir 72, and Society for
171
Coleou, T., M. Poupon, and K. Azbel, 2003, Unsupervised seismic facies classification:
22, 942-953.
Hernández-Martínez, R., 2009, Modelo Geológico del Campo, Internal report and
http://www.aapg.org/explorer/geophysical_corner/2001/03gpc.cfm.
Michelena, R. J., Gonzales, E. and Capello de P., M., 1998, Similarity analysis: A new
tool to summarize seismic attributes information: The Leading Edge, 17, 545-
548.
172
Poupon, M., Gil, J., D. Vannaxay, and B. Cortiula, Tracking Tertiary delta sands
Prost, G., and M. Aranda, 2001, Tectonics and hydrocarbon systems of the Veracruz
Romero, Pelaez, A. S., 2012, Prediction of reservoir quality with seismic attributes in
Oklahoma.
Strecker, U., and R. Uden, 2002, Data mining of 3D post- stack attribute volumes using
173
Svensen, M., 1998, GTM: The Generative Topographic Mapping: PhD Thesis, Aston
University.
174
CHAPTER 7
CONCLUSIONS
Mapping (GTM). Implemented in LINUX, these algorithms are able to handle multiple
work shows application of SOM to either fluvial or deep water systems. I have applied
these algorithms and developed workflows not only to a deep water/mass transport
including my own, do not provide a measure of “how close” a given data vector lies to
the best matching prototype vector, or whether it can be estimated almost as well by
other prototype vectors. In contrast, GTM is based on sound probabilistic and statistical
theories. GTM predicts not only which cluster best represents the data, but how well it
is predicted by all other clusters. Thus GTM interfaces neatly with the modern risk
analysis workflows. By plotting the cluster results as 2D latent space components, I link
posteriori user control and hypothesis testing. While GTM is computationally more
175
demanding than SOM cluster analysis, it is easily parallelized, while the “sequential”
SOM implementation presented here does not. For these reasons, I prefer GTM
Chapter 3 shows how SOM can be used for seismic stratigraphy, by applying it
on the multi-attribute dataset from deep water Gulf of Mexico. After testing alternative
GLCM entropy and variance texture attributes and the magnitude of reflector
stratigraphy interpretation to the full 3D data volume. Although I do not claim this suite
channel deposits, and basin floor fans. This new workflow shows how over-defined
classes clump together after training and how 2D gradational HSV color bars aid in
visual interpretation.
diagenetically altered Mississippi Lime play of Oklahoma and Kansas. The Mississippi
Lime has undergone significant wrench faulting and erosion, suggesting the use of
“structural” attributes to differentiate the lithologies. The Mississippi Lime has also
differentiate the lithologies. In both cases, the input 256 prototype vectors clumped into
four or five “natural” clusters. A posteriori analysis using image logs acquired in two
horizontal wells shows the three major clusters correlate to the tripolitic chert, the
176
interbedded lime-chert and the tight St. Joe’s Limestone within the reservoir zone.
Given these observations, I used the borehole image logs from two wells, to define
average attribute vectors about these three facies and implemented a supervised
workflow based on minimum Euclidean distance (MED) similarity. All three workflows
gave consistent results with each other and with a P-impedance result generated by
that addresses the shortcomings of the Kohonen SOM algorithm. Other than initial work
by colleagues Wallet et al. (2010), it does not appear that GTM has been used for
seismic data analysis. My first workflow predicts EUR and a suite of completion and
reservoir properties made in a suite of horizontal well parameters in one of the recent
clusters are projected onto the 2D latent space and stored as two separate components.
These two projection volumes form a cross-plot histogram, allowing the interpreter to
interactively construct polygons about hypothesized clusters and visualize the resulting
clusters as 3D geobodies. I apply this workflow to a Barnett Shale survey, map clusters,
and then classify the clusters as brittle vs. ductile geobodies using microseismic events
as control.
(facies) for locations away from the wells within the survey, providing a preliminary
In summary, both SOM and GTM provide clusters that are ordered on a 1D, 2D,
or 3D latent space. Such ordering and mapping to 1D, 2D, or 3D color bars eliminates
the need to know a priori how many clusters exist in the data. If the number of clusters
are over-specified to be 256 or 4092, both algorithms and human color perception will
dependent on the choice of input attributes. Attributes such as P and S impedances are
reflector convergence vs. parallelism, edge detectors, and textures are direct measures
of the depositional and diagenetic history, which are in turn indirectly related to
lithology. Given the appropriate 3D attribute volumes, SOM and GTM workflows will
not only accelerate, but with GTM, quantify the identification of different lithofacies,
petrotypes or heterogeneity present in the reservoir zone. As more wells are drilled
within the survey the confidence of the unsupervised seismic facies clustered volume
will increase.
178
RECOMMENDATIONS
volumes to bring out the geology of the reservoir correctly. Based on my research I
postulate the “best” combination of attribute volumes for the different depositional
However more work needs to be done on sensitivity analysis and to find out the
most and the least contributing attributes during multi-attribute clustering. Such
(linearized) Jacobian at the last iteration shows which model parameter influences
which measurements, while the eigenvectors of the Hessian show which linear
combination of model parameters are well (or not well) resolved. Since GTM is based
GTM code runs sequentially but could be programmed to run in parallel under MPI. I
with the idea that the batch process could be parallelized, while the sequential could not.
crosslines and the start and end times as input in AASPI, and then form vectors out of
them, which is not only tedious, but allows for possible user errors.
My implementations were done on target zones, constrained by the start and end
time for the input volumes or with flattened volumes. A further software improvement
would be to modify the SOM and the GTM code to limit the multi-attribute volume
analysis to lie within a given user-defined target formation constrained by the top and
below surfaces, thereby removing redundant data affecting the model while training.
180
Table 2. Best Choice of Attributes
181
APPENDICES
Attribute computation of a very large data volumes can take considerable time.
Experienced interpreters may already have familiarity with other data volumes from the
same basin. Alternatively, they may need to have multi-attribute dataset to do some
unsupervised of probabilistic seismic facies analysis. In this situation it may be useful to
set up a workflow that will run a suite of attribute programs in the background, perhaps
overnight.
182
The AASPI Geometrical Attribute Workflow GUI can be invoked from the main
aaspi_util as shown above or by typing in aaspi_geom_attr_workflow separately in the
terminal window. The following workflow GUI will then pop up.
183
Step 1: Save the workflow environment parameters
In step 1 we need to input the seismic amplitude file and set up the project name
and the MPI parameters which will be used for all the MPI processes. The seismic
amplitude file is selected first (Arrow 1). Enter the project name and the suffix (Arrow
2). Verbose can be selected if required (Arrow 3). It is recommended to use MPI
because except euler_curvature all the other processes run on MPIs (Arrow 4). Mention
the processors per nodes and the node list. Each of our machines tripolite.ou.edu and
hematite.ou.edu have 12 processors in it. Thus in the processors per node 12 is
mentioned (Arrow 5) and in the node list tripolite and hematite is mentioned (Arrow 6).
After entering out all the parameters these parameters are saved (Green Arrow)
which will be subsequently used for all the processes. Note that initially all the attribute
buttons will be disabled. When the “Save Environment parameters” is clicked the dip3d
and the spec_cmp buttons will be highlighted as shown. These two takes in only the
seismic amplitude as inputs and are thus activated. The subsequent attribute buttons will
be activated after their input file criterions are met.
184
Step 2: Save the parameters for the volumetric attributes
In this step each of the attribute program is opened and their parameters are
saved. The buttons are activated only when their input criterion are met. For example
the imagefilt3d gets activated only after we open and save the dip3d parameters. The
next figure shows the GUIs for dip3d and imagefilt3d. The parameters are mentioned
and the Save and Exit button (green arrow) is pressed.
185
Step 3: Execute the geometric attribute workflow
After saving all the *.parms (parameter) files they show up in the terminal
window. We can do cat aaspi_env.parms to see the file contents. The other *.parms
contents the saved parameters from all saved programs. After that, press the “Execute
Geometrical Attribute Workflow” button. The reset selection button can be pressed if
one wants to reset the program selections.
186
A typical workflow for amplitude geometrical attributes will be
dip3d>imagefilt3d>similarity3d>sof3d>rerun-
similarity3d>e_curvature3d>e_euler_curvature>glcm3d>spec_cmp
At any time, the terminal window will show the progress of the workflow. The
text file aaspi_geom_attr_workflow.out can be checked to see the completion status of
the workflow or whether there is any error in the execution of the program.
187
APPENDIX 2: 2D Kohonen SOM GUI
188
This Program 2D Facies Analysis is launched from the Formation Attributes in the
main aaspi_util GUI.
This is the first step of analysis. The input attribute volume is selected on which
the waveform classification is to be done. Only the first two eigenvectors are used for
som2d, thus it has been fixed at 2 (yellow arrow 2). The start time and the end time
(arrows 3 and 4 respectively) are used to define the window of data used for the
analysis.
189
Step 1: Choose the input attribute
volume. Mention the start and
end time of the data. This module
outputs the eigenvalues,
eigenvectors, the standard
deviation, mean and the average
waveform of the input data
eigenvectors, the eigenvalues and the means in the simple graph utility as shown above.
The eigenvalues and the eigenvectors, which form the initial set or a priori training
vectors, are shown below. The horizontal axis represents the samples of the waveform
used in the analysis. The plot of the eigenvectors is the plot of the first two
eigenvectors.
190
Computing som2d module
The som2D-clustering is the second step for analysis step and appears in the
next tab (Tab 2). The input for the real_pca_waveform should remain the same for this
som2D analysis. The output from the previous real_pca_waveform serves as input for
som3D (shown in yellow arrows). The maximum number of classes should be <= 256,
because most visualization software can only display 256 colors. The eigenvalues are
used to scale the dimension of the latent space. The eigenvectors serve as the first
approximation to the latent space forming the initial set of untrained vectors. The
maximum number of iterations is required. The rate of convergence is printed in the
output file and can be use guide subsequent jobs. The start time should be same as the
real_pca_waveform.
191
Step 2: The input attribute volume should remain the same as
real_pca_waveform. The eigenvectors, the eigenvalues and mean values are
selected as input (marked with yellow arrows 5,6,7). The number of classes
has a maximum limit of 256 because of the display limitations of the most
visualization software which has a max of 256 colors that can be imported into
it. Select the number of iterations (arrow10). Keep the start and end time same
as before. This program outputs the 2D clustered data set for each iteration and
the projected Trained Vectors (Prototype vectors). It also outputs the final
trained 2D dataset, which can be exported as a SEGY file into other
visualization software.
192
Plotting: plot_clusters module
The 3rd step is to color the trained 2D seismic dataset and highlight the variation
in seismic facies. Go to Tab3 for the SOM 2D viewer. Different facies are represented
by different colors. The outputs from the SOM2D serve as input to this module (shown
in yellow arrows). This module helps in QC the facies volume after each training. This
also generates a suite of color files which can be taken as input in visualization software
like Petrel.
193
Below shows the output of a seismic facies map considering the coherent energy as the
input volume after 10 iterations.
194
APPENDIX 3: 3D Multi-attribute Kohonen SOM GUI
195
This Program 3D Facies Analysis is launched from the Formation Attributes in the
main aaspi_util GUI
196
Computing som3d module
The som3D-clustering is the first step of analysis. Use the browser on the first
eight lines to choose the input seismic data file (Arrow 1). It is not mandatory to take in
eight inputs. The number of inputs can vary from two – eight. Specify the number of
input attributes in the field labeled “Number of attributes to use” (Arrow 2). The
maximum number of classes can be any large number (Arrow 3). Most of the
commercial visualization software can only display 256 colors thus generally is <= 256.
However the more the uniform sampling of the latent space takes place we generally
have more confidence in clustering. Thus, in this case we take a 4096 classes which
later can be represented by a 64 – by – 64 colorscale. The initial neighborhood radius
can be specified as the constant factor of the initial SOM 2D grid spacing of the PVs
(Arrow 5). The number of iteration is mentioned next (Arrow 6). Select the decimation
factor for using a subset of dataset used for training (Arrow 7). For example the value
100 means every 100th data-vector is used for training. The start time and the end time
are automatically read in from the input files (Arrow 8 and 9).However it is better to
mention them to focus the analysis only within the reservoir zone. The GUI for the
SOM3D is shown in the next page.
197
198
The 2nd step is to color the trained 3D seismic dataset and highlight the variation
in seismic facies. Different facies are represented by different colors. The SOM3D
Viewer in the next tab. The outputs from the SOM3D serve as input to this module
(Arrow 12 and 13). This module helps in QC the facies volume after each training. This
also generates a suite of color files, which can be taken as input in visualization
software like Petrel.
199
Output generated from the SOM3D facies clustering module with 4096 number
of initial classes. Note that clusters starts forming for iteration 4 and the overdefined
4096 classes have actually represents 4-5 colors. In the visualization software like Petrel
only a max of 256 colors can be imported. Thus we can use a lesser number of initial
classes (256).
200
Supervised classification can be done based on either the Minimum Euclidean
Distance (MED) or Spectral Angle Mapper (SAM). When Supervised (Arrow 3B) is
pressed the well parameter window appears where we can define a cube specifying the
inline, crossline, neighboring radius and start and end time. An average attribute data-
vector is calculated from this cube, which becomes the training data-vector. A cutoff
can be given for MED (Arrow 4B) and for SAM (Arrow 5B). The MED parameter is
the similarity ratio and the SAM parameter is the cutoff angle.
201
The results of the MED and the SAM classifier can be visualized by clicking on
the checkbox (Arrow 10B). The number of facies used for the classification is equal to
the number of wells used (Arrow 11B). All the other parameters are not required until
one wants to visualize the SOM facies classification outputs too. The results of the
MED and the SAM classifier are given in the next page.
202
203
APPENDIX 4: Probabilistic 3D seismic facies classification GTM GUI
In the AASPI software, the GUI for GTM can be invoked by typing
aaspi_gtm_pf or from the main aaspi_util window by selecting the Formation
Attributes drop down menu (3D GTM Facies Analysis):
204
205
As with SOM, the input consists of a suite of (1) seismic attribute volumes that
the interpreter has chosen to differentiate different seismic facies, rock types,
lithologies, or other clusters. For example, a mass transport complex may be
characterized by relatively low coherence, strongly converging reflectors, and high
entropy (measured by the GLCM algorithm). Surrounding marine shales may be
characterized by moderate coherence, low reflector convergence (i.e. parallel reflectors)
and low GLCM entropy. Next, (2) enter the number of input volumes represents the
dimensionality of the dataset (automatically updated). Then (3) select the number of
grid points to span the 2D latent space, K, (GTM theory Chapter 5). These points are
mapped to the data-space. Then (4) select the number of non-linear basis functions, J,
that form a regular array of Gaussian functions. A linear combination of these basis
functions is used to map the points in the latent space to the data space. Both the latent
space samples and the basis functions should be squared integers value, (e.g.
(256…400…625…900 etc.….). They can be automatically selected from some pre-
defined values in dropdown menu. Care should be taken so that the number of basis
functions (J) should be less than the number of grid points in the 2D latent space (K).
Next, (5) Enter the width of the basis functions relative to the distance between two
neighboring basis function centers. This width is used to define the standard deviation
of the non-linear basis functions, which is constant for a GTM model. If ε=2 the basis
functions will have widths (std. dev) equals to two times the distance between two
neighboring basis function centers. Initially the code runs a multiattribute PCA to
initialize the starting values of W and β (see GTM theory Chapter 5). Next, (6) enter the
regularization factor, used to stabilizing the linear equation for solving the new W.
This prevents any division by zero. Next, (7), enter the number of iterations to run
GTM. To minimize run times, only a fraction of the input dataset is used for training.
Therefore (8) enter the decimation factor for using a subset of dataset used for training
(yellow arrow 8). For example the value 100 means every 100th data-vector is used for
training. Different training percentage of the data can be selected from the dropdown
menu. It is recommended to run GTM only in the prospective zone because it is a
computation intensive job. Finally, enter the (9) start time and the (10) end time of the
206
data for GTM seismic facies classification. The “Save GTM Parameters” is an optional
feature to create a parameter file, which can be used for a batch workflow (not included
in this release). After providing all these parameters click “Execute GTM” program
(green arrow).
After the GTM model building is done, click on “GTM QC Train Utility” to QC
the mean distribution of the posterior probability (responsibility) projections of the
training data on the 2D latent space (presently this utility uses gnuplot). Otherwise the
ASCII files mean_train_proj_${train} (shown below) generated can be viewed in any
other graph utility manually.
207
The output files will be created from the GTM analysis is shown below. The
gtm_proj1_${project_name}_${suffix}.H and the
gtm_proj2_${project_name}_${suffix}.H are final projection files which stores the
mean posterior probabilities along two axis. These two files should be imported into any
visualization software for interactive picking the clusters (as discussed in the theory).
The crossplot, hsplot and the hlplot utility GUIs can be invoked by clicking on the
“GTM Plot Menu” as shown in the GUI.
208
For a better analysis of the how likely one type of facies occurs we need to input
some average wavelets around the wells within the survey or areas of interest. This is
done by clicking the “supervision” button (red arrow). The well supervision GUI pops
up and the inline, crossline, start-time, end-time and the neighborhood radius
information are given (Arrow 1B).
With this supervised case the number of latent space variables (Arrow 2B) and
the number of basis functions (Arrow 3B) should be reduced. The number of wells used
for supervision is mentioned (Arrow 6B). Thus the new GTM model will be applied on
the whole dataset to find the most likely facies corresponding to the wells. The reset
button (Arrow 7B) can be used to start over unsupervised GTM analysis.
Three different facies types are given in the well information panels. Bhattacharya
measure is used to calculate the overlap value between two PDFs (see chapter 6).
209
The above two figures show the most likely occurrence of facies Type 1 and Type 2.
The magenta color highlights regions with the highest probability (90-100 %) of
occurrence of the facies similar to the input well. The blue regions have very low likely
that the facies is similar to the input well facies. And the black regions have no
similarity to the facies type of the input well.
210