A Database Reverse Engineering Case Study: DAMA Chicago Aug 17, 2016

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 44

DAMA Chicago

Aug 17, 2016

A Database Reverse
Engineering Case Study

Michael R. Blaha, DSc.


[email protected]
www.superdataguy.com
What is Database Reverse
Engineering?
Reverse engineering is the inverse to normal
development
Start with an application and work backwards to
understand the software and infer its intent
Reverse engineering can apply to a variety of
artifacts
Hardware, programming code, databases,
Our focus here is on databases
Why Would Anyone Want
to Do DBRE?
To elicit requirements
DBRE is not intended to perpetuate past flaws
DBRE is merely a source of tentative requirements
To convert legacy data
To integrate application stovepipes
To assess software
To assist maintenance
To construct documentation
Case Study 1:
Reverse Engineer
WordPress
Rationale
WordPress is an interesting DBRE case study
because...
WordPress is a well-known application
The case study has a populated database
The data is real (not synthetic)
The data is not proprietary
Illustrate DBRE techniques for a small database
Processing Details
1. Export MySQL db from www.superdataguy.com
website
The exported localhost.sql is unreadable
2. Import SQL code into a local MySQL db
3. Export schema only from local MySQL db
The exported file is readable
4. Manually edit the SQL by deleting
`, unsigned, COLLATE, KEY, UNIQUE KEY, ENGINE
5. Reverse engineer schema with ERwin
Initial ERwin Model
wp _fq ir_comments
wp _fq ir_links
comment_ID
link_id
comment_post_ID
comment_author link_url
comment_author_email link_name
wp _fq ir_commentmeta wp _fqir_options wp _fqir_postmeta
comment_author_url link_image
meta_id comment_author_IP link_target option_id meta_id
comment_date link_description
comment_id option_name post_id
comment_date_gmt link_visible
meta_key option_value meta_key
comment_content link_owner
meta_value autoload meta_value
comment_karma link_rating
comment_approved link_updated
comment_agent link_rel
wp _fqir_posts comment_type link_notes
comment_parent link_rss
ID
user_id
post_author
post_date
post_date_gmt
post_content
post_title
post_excerpt
wp _fq ir_term_taxonomy
post_status
wp _fq ir_termmeta wp _fq ir_terms
comment_status wp _fq ir_term_relationships term_taxonomy_id
ping_status meta_id term_id
object_id term_id
post_password term_taxonomy_id taxonomy term_id name
post_name
description meta_key slug
to_ping term_order
parent meta_value term_group
pinged
count
post_modified
post_modified_gmt
post_content_filtered
post_parent
guid
menu_order
post_type
post_mime_type
comment_count wp _fqir_users
ID
wp _fqir_usermeta user_login
umeta_id user_pass
user_nicename
user_id user_email
meta_key user_url
meta_value user_registered
user_activation_key
user_status
display_name
Record Counts
From querying the MySQL database
wp_fqir_commentmeta 68
wp_fqir_comments 23

wp_fqir_links 0

wp_fqir_options 359

wp_fqir_postmeta 656

wp_fqir_posts 406

wp_fqir_term_relationships 59

wp_fqir_term_taxonomy 15

wp_fqir_termmeta 0

wp_fqir_terms 15

wp_fqir_usermeta 47

wp_fqir_users 1
Manually Add FKs
Look for name similarity
Verify with data analysis
SELECT * FROM wp_fqir_commentmeta
WHERE comment_ID NOT IN
(SELECT comment_ID FROM wp_fqir_comments);
-- 0 records
SELECT * FROM wp_fqir_commentmeta
WHERE comment_ID IS NULL;
-- 0 records
DBRE ERwin Model
wp_fqir_commentmeta wp_fqir_comments wp_fqir_posts w p_fqir_postmeta
meta_id comment_ID ID meta_id
comment_ID (FK) comment_post_ID (FK) post_author post_ID (FK)
meta_key comment_author post_date meta_key
meta_value comment_author_email post_date_gmt meta_value
comment_author_url post_content
comment_author_IP post_title
comment_date post_excerpt
comment_date_gmt post_status
comment_content comment_status
comment_karma ping_status
comment_approved post_password
comment_agent post_name
comment_type to_ping
comment_parent pinged
user_id post_modified
post_modified_gmt
post_content_filtered
post_parent (FK)
guid
menu_order
post_type
post_mime_type
comment_count

wp_ fqir_term_relationships wp_fqir_term_taxonomy wp_fqir_terms w p_ fqir_termmeta


object_id term_taxonomy_id term_id meta_id
term_taxonomy_id (FK)
term_id (FK) name term_id (FK)
term_order taxonomy slug meta_key
description term_group meta_value
parent
count

wp_fqir_usermeta wp_fqir_users wp_fqir_links wp_fqir_options


umeta_id ID link_id option_id
user_ID (FK) user_login link_url option_name
meta_key user_pass link_name option_value
meta_value user_nicename link_image autoload
user_email link_target
user_url link_description
user_registered link_visible
user_activation_key link_owner
user_status link_rating
display_name link_updated
link_rel
link_notes
link_rss
Data Dictionary
By inspecting the data of each table
wp_fqir_commentmeta Key-value pairs from Akismet plugin
wp_fqir_comments Comments posted to the Website (mostly spam)
wp_fqir_links
wp_fqir_options Key-value pairs for WordPress and plugins
wp_fqir_postmeta Key-value pairs for WordPress, embedded media, and plugins
wp_fqir_posts Each record is a WordPress page or post
Appears to keep a history
wp_fqir_term_relationships
wp_fqir_term_taxonomy
wp_fqir_termmeta
wp_fqir_terms Mostly user defined categories in WordPress
wp_fqir_usermeta Key-value pairs for users
wp_fqir_users Authorized users for the Website
Commentary
WordPress has a very small database
Only 12 tables
I had expected more tables
Note the lack of connectivity among tables
Typically tables are more highly interconnected
WordPress has no dangling references
I dont think WordPress uses RI
ERwin only partially reverse engineers MySQL
It chokes on some keywords
Case Study 2:
Reverse Engineer
Adventure Works 2012
Rationale
Adventure Works 2012 is an interesting DBRE
case study because...
Adventure Works is a free database provided with MS
SQL Server
The case study has a populated database
The data is not proprietary
The database is of medium size (71 tables)
The database defines referential integrity
Only one FK is missing
Mechanical Approach
Strip down the schema to get to a core model
This is like skimming a book
We are working towards an abridgement of a model
We can quickly get a sense of a schema
We will use ER/Studio
Record Counts
Query the SQL Server database
dbo.AWBuildVersion 1 Person.BusinessEntity 20777
dbo.DatabaseLog 1597 Person.BusinessEntityAddress 19614

dbo.ErrorLog 0 Person.BusinessEntityContact 909

HumanResources.Department 16 Person.ContactType 20

HumanResources.Employee 290 Person.CountryRegion 238

HumanResources. 296 Person.EmailAddress 19972


EmployeeDepartmentHistory
Person.Password 19972
HumanResources.EmployeePayHistory 316
Person.Person 19972
HumanResources.JobCandidate 13
Person.PersonPhone 19972
HumanResources.Shift 3
Person.Address 19614 Person.PhoneNumberType 3

Person.AddressType 6 Person.StateProvince 181


Record Counts
Record counts partially indicate table purpose
Production.BillOfMaterials 2679 Production.ProductModel 128
Production.Culture 8 Production.ProductModelIllustration 7
Production.Document 13 Production.ProductModel 762
ProductDescriptionCulture
Production.Illustration 5
Production.ProductPhoto 101
Production.Location 14
Production.ProductProductPhoto 504
Production.Product 504
Production.ProductReview 4
Production.ProductCategory 4
Production.ProductSubcategory 37
Production.ProductCostHistory 395 Production.ScrapReason 16
Production.ProductDescription 762 Production.TransactionHistory 113443
Production.ProductDocument 32 Production.TransactionHistoryArchive 89253
Production.ProductInventory 1069 Production.UnitMeasure 38
Production.ProductListPriceHistory 395 Production.WorkOrder 72591
Record Counts
Note different counts for types and instances

Purchasing.WorkOrderRouting 67131 Sales.SalesOrderHeader 31465

Purchasing.ProductVendor 460 Sales.SalesOrderHeaderSalesReason 27647

Purchasing.PurchaseOrderDetail 8845 Sales.SalesPerson 17

Purchasing.PurchaseOrderHeader 4012 Sales.SalesPersonQuotaHistory 163

Purchasing.ShipMethod 5 Sales.SalesReason 10

Purchasing.Vendor 104 Sales.SalesTaxRate 29

Sales.CountryRegionCurrency 109 Sales.SalesTerritory 10

Sales.CreditCard 19118 Sales.SalesTerritoryHistory 17

Sales.Currency 105 Sales.ShoppingCartItem 3

Sales.CurrencyRate 13532 Sales.SpecialOffer 16

Sales.Customer 19820 Sales.SpecialOfferProduct 538

Sales.PersonCreditCard 19118 Sales.Store 701

Sales.SalesOrderDetail 121317
Processing Details
1. Import schema into ER/Studio
File / New / Reverse engineer AdventureWorks2012
All owners, user tables, no inferences
Add FK Production.WorkOrderRouting.ProductID ->
Production.Product.ProductID
2. Successively delete all entity types with 0,1
connections
3. Delete entity types with few (<= 3) connections
Step 1: ER/Studio Model
Data ba se L og

Da tab as e Lo gID
AW Bu ild Vers io n
Po s tT i m e
Sy s tem Info rm at io nID
Da tab as e Us er
Da ta ba s e Vers io n Ev e nt
Ve rs i o nDate Sc h em a
Shift
M od i fi ed Da te Ob j ec t
Sh i ftID
TSQL

Na m e Xm l Ev en t

Sta rtT i m e
En d Ti m e
M o di fi ed Date
Erro rL og T ran sa c tio n Histo ryArch ive

ErrorL og ID T ra n s a c ti o nID

ErrorT i m e Pro d uc tID


De p artm e nt
Us erNam e Refe re n c e Ord e rID
Em plo ye eDe pa rtm en tHis to ry De pa rtm e ntID ErrorNum be r Refe re n c e Ord e rL i ne ID

Bus i ne s s Enti ty ID (F K) Na m e ErrorSev eri ty T ra n s a c ti o nDa te Em p loy ee Pay His tory


Dep artm en tID (FK) ErrorState T ra n s a c ti o nT y pe
Gro up Na m e Bu s in es s En tity ID (F K)
Shi ftID (FK) ErrorProc ed u re Qua nti ty Em plo ye e Sa les Pe rso n Qu ota His tory
M o di fi ed Da te J o bCan did a te
Ra teCha n ge Da te
StartDa te ErrorL i ne Ac tu al Cos t Bu s i n es s En tity ID (FK) J ob Ca nd i d ate ID Bus i ne s s Enti ty ID (F K)
ErrorM es s a g e M o d ifi e dDate Ra te Quo taDate
End Da te Na tio n al IDNu m b er Bus i ne s s Enti ty ID (F K) Pa y Freq u en c y
M od ifi e dDa te L o gi n ID Sal e s Quo ta
Res um e M o di fi ed Date
Org an i z a ti o n No de rowg ui d Em a ilAd dre ss
M o d ifi ed Date
Org an i z a ti o n Le v e l M o d ifi e dDate Bu s in es s En tity ID (F K)
J o bT i tle Em ai l Ad d res s ID
Prod uc tSu b ca teg ory
Bi rthDate
Pro d uc tSu bc a te go ry ID Pro d uc tCa teg o ry Em ai l Ad d res s
M a ri ta lStatu s
Lo c ation Pro d uc tCa teg ory ID rowgu i d
Ge nd e r
Pro d uc tCa te go ry ID (FK) Prod uc tInv e ntory M o di fi ed Date
Lo c ati on ID Hi reDa te
Na m e Nam e Pro du c tID (F K)
Wo rk Ord e rRo uting Sa l ari ed Fl a g Pe rs on Pho n e
Na m e rowg ui d rowg ui d Lo c ati on ID (FK)
Va c a ti on Ho u rs
Wo rk Orde rID (F K) Co s tRate M o d ifi ed Date M o d ifi ed Date Bu s i ne s s En tit y ID (F K)
Pro d uc tID (FK) Sh e lf Si c k L ea v e Hou rs Ph o ne Nu m b er
Av a il a bi l i ty
Sa le sPe rso n Cu rre n tF l ag
Ope rati on Seq u en c e M o di fi ed Da te Bi n Ph o ne Nu m b erTy pe ID (F K)
Qu a nti ty Bus i ne s s Enti ty ID (F K) ro wgu i d
L oc ati o nID (F K) M o di fi ed Da te M o d ifi ed Date
rowgu i d
Sc h ed u le d Sta rtDate T erri to ry ID (FK)
M o di fi ed Date Prod uc tDo c um en t
Sc h ed u le d En d Da te Sal e s Quo ta
Prod uc tID (FK)
Ac tu al Sta rtDa te Bon us
Doc um en tNod e (FK)
Ac tu al End Da te Com m i s s i o nPc t

Ac tu al Res ou rc eHrs M od ifi e dDa te Sal e s YTD Pho ne Num be rTy pe

Pla n ne dCo s t Sal e s L as tYea r


Ph o ne Nu m b erT y p e ID
Ac tu al Cos t Doc um e nt ro wg ui d
Wo rkOrde r
M o d ifi e dDate Do c um e n tNo de M od ifi e dDa te Sa les Te rrito ryHistory Na m e
W ork Ord erID M o di fi ed Da te
Bu s i ne s s En ti ty I D (FK)
Do c um e n tL e v e l T e rri to ry ID (FK)
Prod uc tID (F K) Pu rch as e Ord erHea de r
Ti tl e
Orde rQty StartDa te
Own er (F K) Pu rc h as e Ord erID
Stoc k e d Qty
Fo l de rFl a g En dDa te
Sc ra pp e dQty Re v i s i on Nu m b e r
Fi l e Na m e ro wg ui d
StartDa te Sta tu s Pe rso n Pa s sword
Fi l e Ex te n s i o n M od i fi ed Da te
En dDa te Em pl o y e eID (F K) Bu s i ne s s En ti ty ID (F K)
Re v is i on Bu s i n es s En tity ID (FK)
Du eDa te Ve nd o rID (F K)
Ch a ng eNum be r Pa s s wo rdHa s h
Sc ra pRe as o nID (F K) Sh i pM eth od ID (FK) Pe rs o nT y pe
Sta tus Pa s s wo rdSa lt
M od i fi ed Da te Ord erDate Na m e Sty l e
Do c um e n tSu m m a ry Ti tl e ro wg ui d
Sh i pDate
Do c um e n t M od i fi e dDa te
Su bT o ta l Fi rs tNam e
ro wgu i d
Ta x Am t M i dd l eNa m e
Scra pRea s on M o di fi ed Da te
Fre ig h t La s tNam e
Sc rap Re a s o nID Su ffi x
To tal Due

Na m e M o di fi ed Da te Em ai l Pro m o tio n
Pu rch as e Ord erDeta il
Ad di ti on a lCo nta c tInfo Con tac tTy p e
M o di fi ed Da te
Pu rc ha s eOrde rID (FK)
De m o gra ph i c s Co n tac tT y pe ID
Pu rc ha s eOrde rDetai l ID ro wgu i d
Store Na m e
Du eDa te M o di fi ed Da te
Bu s i ne s s En tit y ID (F K) M o d ifi ed Date
Orde rQty Cu stom e r
Ve n do r
Tra ns ac tion His tory Prod uc tID (F K) Na m e Cu s tom erID
Bu s i ne s s En ti ty I D (FK)
T ran s a c ti o nID Un itPri c e Sa l e s Pers on ID (FK)
Pe rs on ID (F K)
L i ne To ta l Ac c o u ntNu m b er De m og ra p hi c s
Prod uc tID (FK) Store ID (FK)
Re c e iv e dQty Na m e rowg ui d
Refe re n c e Ord e rID T e rri tory ID (FK) Sa le sT a xRa te
Re je c t ed Qty Cred i tRa tin g M o d ifi ed Date
Refe re n c e Ord e rL i ne ID Ac c o un tNu m b er
Stoc k e d Qty Prefe rred Ve nd orStatu s Sa l e s T ax Ra te ID
T ran s a c ti o nDa te ro wg ui d
M od i fi ed Da te Ac ti v e Fl ag
T ran s a c ti o nT y pe M od i fi e dDa te Sta teProv i nc e ID (F K)
Pu rc ha s i ng We b Se rv i c e URL Sa le sT e rrito ry
Qua nti ty Ta x Ty pe
M od i fi ed Da te
Te rrito ry ID Ta x Ra te
Ac tu al Cos t
Bus ine ss EntityCon tac t Na m e
M od ifi e dDa te Na m e
Bus i ne s s Enti ty ID (F K) rowg ui d
Prod uc tVen do r Co u ntry Re gi on Cod e (F K)
Pro d uc t Pers on ID (FK) M o d ifi ed Date
Prod uc tID (F K) Gro up
Pro du c tID Con tac tT y pe ID (FK)
Bu s i ne s s En ti ty ID (F K) Sa l es YT D
Sh ipM etho d
Sa l es La s tYe ar rowg ui d
Na m e
Av erag e Le a dT im e Shi p M e tho dID
Sho pp ing CartIte m Pro du c tNum be r Co s tYTD M o d ifi ed Date
Stan da rdPri c e Co s tL as tYea r
Sh o pp in g Ca rtIte m ID M a k e Fl a g Nam e
L a s tRe c ei p tCo s t rowgu i d
Fi n i s h ed Go o ds Fl a g Shi p Ba s e Cou ntry Re g io n
Sh o pp in g Ca rtID L a s tRe c ei p tDa te Bu sine s sEn tity M o di fi ed Date
Co l or Shi p Ra te
M i nOrd e rQty Co u ntry Re gi on Cod e
Qu a nti ty Sa fety Sto c k Le v el ro wg ui d Bus i ne s s Enti ty ID
Pro du c tID (F K) M ax Ord erQty
Re o rd erPoi n t M od ifi e dDa te Na m e
On Orde rQty ro wg ui d
Da teCrea ted M o d ifi ed Date
Sta nd ard Co s t M od ifi e dDa te
M o di fi ed Da te Un itM e a s u reCod e (F K)
Li s tPri c e
M od i fi e dDa te
Si z e Sa le sOrde rHea de r
Si z eUni tM e a s u reCod e (F K) Sal e s Ord e rID
We i gh tUn itM ea s u reCo de (FK)
Rev i s i on Num be r
We i gh t
Da y s T oM an ufa c tu re Sa les Ord e rDeta il Ord e rDate
Pe rs on Cre d itCard
Pro du ctPho to Pro du c tL i ne Due Da te Co u ntryRe gion Curre nc y
Sa l es Ord erID (F K)
Shi p Da te Bus i ne s s Enti ty ID (F K)
Prod uc tPho toID Cl a s s Co un try Re g i on Co de (FK)
Sa l es Ord erDeta i lID Cre d itCard ID (F K)
Sta tus
Sty l e Cu rre nc y Co de (F K)
T h um b Nai l Pho to
Pro du c tSub c ate go ry ID (F K) Ca rri e rTrac k i ng Nu m b e r Onl i ne Ord e rF l ag
M o d ifi e dDate
T h um b n ai l Pho to F i le Nam e Ord erQty Sal e s Ord e rNu m b er M od i fi e dDa te
Pro du c tM od el ID (F K)
L a rg ePh oto Pro du c tID (F K) Purc ha s e Ord e rNum b er
Se l l StartDate
L a rg ePh oto Fi l eNa m e Sp ec i al Offe rID (FK) Ac c ou ntNu m b er
Se l l En dDate
M od i fi ed Da te
Di s c o nti n ue dDa te Un i tPri c e Cus tom e rID (F K)
Cred itCa rd
ro wgu i d Un i tPri c e Di s c o u nt Sal e s Pers on ID (FK)
Cred itCard ID
L i n eT ota l T erri to ry ID (F K)
M o di fi ed Da te
ro wgu i d Bil l T oAdd res s ID (F K) Card Ty pe Curren c y
M o di fi ed Da te Shi p To Ad d re s s ID (F K) Card Nu m b e r Cu rre n c y Co d e
Shi p M eth o dID (F K) Ex p M o nth
Pro d uc tPro du ctPho to
Pro du ctM od elIllus tra tion Cre d itCard ID (FK) Ex p Ye a r Na m e
Pro d uc tID (F K) Cre d itCard Ap p ro v a l Cod e M o di fi ed Da te
Pro d uc tM o de l ID (FK) M od ifi e dDa te
Pro d uc tPh otoI D (FK)
Sp ec ia lOffe rProd u ct Il lu s tra tio n ID (FK) Curren c y Rate ID (FK)

Pri m ary Sp ec ia l Offe rID (F K) Sub To tal


M o d ifi e dDate T ax Am t Curren c yRa te
M o d ifi ed Date Prod uc tID (F K)
F re i gh t Cu rren c y Ra teID
Illu s tra tio n Prod uc tM o d el
ro wg ui d T ota lDue
Il lu s tra ti o n ID Prod uc tM o de l ID Cu rren c y Ra teDa te
M od i fi e dDa te Com m en t
Sp ec ialOffe r Fro m Curren c y Cod e (F K)
Pro d uc tCo stHistory Dia g ra m Nam e rowg ui d
Cata lo g De s c ri pti on M o d ifi ed Date To Curren c y Cod e (F K)
Pro d uc tID (F K) Sp ec ia l Offe rID M o d ifi ed Date
Av e ra ge Rate
Sta rtDa te In s truc tio n s
De s c ri p tio n Sale sOrde rHe a de rSa les Re a so n En d OfDa y Ra te
ro wg ui d
En d Da te Di s c ou n tPc t Sal e s Orde rID (FK) M o di fi ed Da te
M od ifi e dDa te State Pro vin ce
Sta n da rd Cos t T y pe Sal e s Rea s on ID (FK)
Pro du ctRe view Ca te go ry Sta teProv i nc e ID
M o d ifi ed Date
M od ifi e dDa te
Pro du c tRev i ewID StartDa te Sta teProv i nc e Co de
Pro du ctDes c rip tio n Add res s
En dDate Co u ntry Re gi on Cod e (F K)
Pro du c tID (F K) Pro du c tDes c ri p tio nID Ad d res s ID
M i nQty Is On ly Sta tePro v i nc eF l a g
Cu lture
Re v ie we rNa m e M ax Qty
BillOfM a teria ls Un itM ea s ure De s c ri pti on Ad d res s L i ne 1 Na m e
Re v ie wDa te Cul ture ID
Bi ll OfM ate ri al s ID ro wg ui d ro wgu i d Ad d res s L i ne 2 Sa les Re a so n Te rri to ry ID (F K)
Em ai l Ad d re s s Prod uc tLis tPrice His tory Un i tM ea s u re Co de
M od i fi e dDa te M o di fi ed Da te Nam e Ci ty rowg ui d
Bus ine ss EntityAdd res s Sa l es Re a s o nID
Ra ti ng Pro d uc tID (F K) Prod uc tAs s em bl y ID (FK) Na m e M od ifi e dDa te Sta teProv i nc e ID (F K) Add res s Ty pe M o d ifi ed Date
Co m m en ts Sta rtDa te Com po ne n tID (FK) M o di fi ed Da te Bu s i n es s En tity ID (FK) Na m e
Po s tal Co d e
M o di fi ed Da te Ad d res s T y p eI D
StartDa te Sp a tia l Lo c a tio n Ad dre s s ID (F K) Re as o nT y p e
En d Da te
End Da te Ad dre s s Ty p eID (F K) Na m e M o di fi ed Da te
Li s tPri c e Pro du ctM od elProd u ctDes crip tio nCu ltu re rowg ui d
Uni tM e as u reCod e (F K) rowg ui d
M o d ifi ed Date M o d ifi ed Date ro wgu i d
Pro du c tM od el ID (F K)
BOM Le v e l M o d ifi ed Date
Pro du c tDes c ri p tio n ID (FK) M o di fi ed Da te
PerAs s em b ly Qty
M od ifi e dDa te Cu l tureID (F K)

M o di fi ed Da te
Step 2: ER/Studio Model
Person

BusinessEntityContact BusinessEntityID (FK)

BusinessEntityID (FK) PersonType


PersonID (FK)
NameStyle

rowguid Title
ModifiedDate FirstName
MiddleName
Employee LastName
BusinessEntityID (FK) Suffix
EmailPromotion
NationalIDNumber C ount ryRegionC urrency
ProductVendor AdditionalContactInfo
LoginID Demographics CountryRegionCode (FK)
Produc tID ( FK) OrganizationNode rowguid CurrencyCode (FK)
BusinessEntityID ( FK) Vendor OrganizationLevel
UnitMeasure ModifiedDate
JobTitle ModifiedDate
UnitMeasureCode Av erageLeadTime BusinessEntityID (FK)
BirthDate
Standar dPr ice
BillOfMaterials Name AccountNumber MaritalStatus
LastReceiptCost
ModifiedDate Name Gender
BillOfMaterialsID LastReceiptDate
CreditRating HireDate PersonCreditCard
MinOrderQty Currency
ProductAssembly ID ( FK) PreferredVendorStatus SalariedFlag
MaxOrderQty BusinessEntityID (FK)
ComponentID (FK) ShipMethod CurrencyCode
ActiveFlag VacationHours CreditCardID (FK)
OnOrderQty
StartDate PurchasingWebServiceURL ShipMethodID SickLeaveHours
U nitMeasureC ode ( FK) C urrencyRate Name
EndDate ModifiedDate CurrentFlag ModifiedDate
ModifiedD ate PurchaseOrderDetail Name ModifiedD ate
UnitMeasureCode ( FK) rowguid CurrencyRateID
Purchas eOr der ID (FK) ShipBase
BOMLevel ModifiedDate
Purchas eOr der DetailID ShipRate CurrencyRateDate
PerAssemblyQty
rowguid FromCurrencyCode (FK)
ModifiedDate D ueDate
ModifiedDate ToCurrencyCode (FK)
OrderQty
AverageRate
ProductID ( FK)
PurchaseOrderHeader BusinessEntity BusinessEntityAddress EndOfDayRate
U nitPrice
ProductInventory BusinessEntityID BusinessEntityID (FK) ModifiedDate
LineTotal PurchaseOrderID
ProductID (FK) AddressID (FK)
R eceivedQty rowguid CreditCard
RevisionNumber
LocationID (FK) R ejectedQty
Status ModifiedDate CreditCardID rowguid
Shelf StockedQty ModifiedDate
EmployeeID (FK)
ModifiedD ate CardType
Bin VendorID (FK)
CardNumber
Quantity ShipMethodID (FK)
ExpMonth
r owguid OrderDate
ExpYear
ModifiedDate ShipDate
ModifiedDate
SubTotal A ddress
Product TaxAmt
AddressID
ProductID Freight
TotalDue AddressLine1
Location Name
ModifiedDate AddressLine2
ProductNumber SalesPerson
LocationID City Customer
MakeFlag
BusinessEntityID (FK) StateProvinceID (FK)
Name FinishedGoodsFlag CustomerID
PostalCode
CostRate Color TerritoryID (FK)
SpatialLocation PersonID (FK)
Availability SafetyStockLevel Document SalesQuota
rowguid StoreID (FK)
ModifiedDate ReorderPoint DocumentNode Bonus
ModifiedDate Terr itoryID (FK)
StandardC ost CommissionPct
DocumentLevel AccountNumber
WorkOrderRouting ListPrice SalesYTD
Title r owguid
WorkOrderID (FK) Size SalesLastYear
Owner (FK) ModifiedDate
ProductID (FK) SizeUnitMeasur eC ode (FK) rowguid
FolderFlag
WeightUnitMeasureCode (FK) ModifiedDate
OperationSequence FileName
Weight Stat eProvince
LocationID (FK) FileExtension
DaysToManufacture StateProvinceID
ScheduledStartDate Revision Store
WorkOrder ProductLine
ScheduledEndDate ChangeNumber StateProvinceC ode
Class BusinessEntityID (FK)
ActualStartDate WorkOrderID Status CountryR egionCode (FK)
Style
ActualEndDate DocumentSummary Name IsOnlyStateProvinceFlag
ProductID (FK) SellStartDate
ActualResourceHrs Document SalesPersonID (FK) Name
OrderQty SellEndDate
PlannedCost ProductDocument rowguid Demographics Terr itoryID (FK)
StockedQty DiscontinuedDate
ActualCost ModifiedDate rowguid row guid
ScrappedQty rowguid ProductID (FK)
ModifiedDate DocumentNode (FK) ModifiedDate ModifiedD ate
StartDate ModifiedDate
EndDate ModifiedDate
DueDate SalesOrderDetail
ModifiedDate
SalesOrderID (FK)
SalesOrderDetailID SalesOrderHeader

SalesOrderID CountryRegion
CarrierTrackingNumber
OrderQty RevisionNumber CountryRegionCode
SpecialOfferProduct ProductID (FK) OrderDate
Name
UnitPrice DueDate
ProductID (FK) ModifiedDate
UnitPriceDiscount ShipDate
rowguid LineTotal Status
ModifiedDate rowguid OnlineOrderFlag
ModifiedDate SalesOrderNumber
PurchaseOrderNumber SalesTerrit ory
AccountNumber Terr itoryID
CustomerID (FK)
SalesPersonID (FK) N ame

TerritoryID (FK) C ountryRegionCode ( FK)

BillToAddressID (FK) Group

ShipToAddressID (FK) SalesYTD

ShipMethodID (FK) SalesLastYear

CreditCardID (FK) C ostYTD


SalesTerritoryHistory C ostLastYear
CreditCardApprovalCode
CurrencyRateID (FK) BusinessEntityID (FK) r owguid

SubTotal Terr itoryID (FK) ModifiedDate


TaxAmt StartDate
Freight
EndDate
TotalDue
r owguid
Comment
ModifiedDate
rowguid
ModifiedDate
Product
Step 3: ER/Studio Model
SalesOrderHeader
SalesPerson
Employee
PurchaseOrderHeader
ProductID SalesOrderID BusinessEntityID (FK)
BusinessEntityID (FK) PurchaseOrderID
Name NationalIDNumber
RevisionNumber
TerritoryID (FK) RevisionNumber
ProductNumber OrderDate LoginID
SalesQuota Status
MakeFlag DueDate OrganizationNode
FinishedGoodsFlag Bonus EmployeeID (FK)
ShipDate OrganizationLevel
CommissionPct OrderDate
Color Status JobTitle
SalesYTD ShipDate
SafetyStockLevel OnlineOrderFlag BirthDate
ReorderPoint SalesLastYear SubTotal
SalesOrderNumber MaritalStatus
rowguid TaxAmt
StandardCost PurchaseOrderNumber Gender
ListPrice ModifiedDate Freight
AccountNumber HireDate
TotalDue
Size SalariedFlag
CustomerID (FK) ModifiedDate
SizeUnitMeasureCode (FK) SalesPersonID (FK) VacationHours
WeightUnitMeasureCode (FK) TerritoryID (FK) SickLeaveHours
Weight CurrentFlag
BillToAddressID (FK)
DaysToManufacture SalesTerritory
ShipToAddressID (FK) rowguid
ProductLine CreditCardApprovalCode TerritoryID ModifiedDate
Class SubTotal
Name
Style TaxAmt
Group
SellStartDate Freight
SalesYTD
SellEndDate TotalDue
SalesLastYear
DiscontinuedDate Comment
CostYTD
rowguid rowguid
CostLastYear Person
ModifiedDate ModifiedDate rowguid
BusinessEntityID (FK) BusinessEntity
ModifiedDate
BusinessEntityID
PersonType
NameStyle rowguid
Title ModifiedDate
FirstName
Address
Customer MiddleName
UnitMeasure AddressID
LastName
UnitMeasureCode CustomerID
AddressLine1 Suffix
Name AddressLine2 PersonID (FK) EmailPromotion
ModifiedDate City TerritoryID (FK) AdditionalContactInfo
PostalCode AccountNumber Demographics
SpatialLocation rowguid rowguid
rowguid ModifiedDate ModifiedDate
ModifiedDate
Step 3: Final Tables
Remaining tables in black
dbo.AWBuildVersion 1 Person.BusinessEntity 20777
dbo.DatabaseLog 1597 Person.BusinessEntityAddress 19614

dbo.ErrorLog 0 Person.BusinessEntityContact 909

HumanResources.Department 16 Person.ContactType 20

HumanResources.Employee 290 Person.CountryRegion 238

HumanResources. 296 Person.EmailAddress 19972


EmployeeDepartmentHistory
Person.Password 19972
HumanResources.EmployeePayHistory 316
Person.Person 19972
HumanResources.JobCandidate 13
Person.PersonPhone 19972
HumanResources.Shift 3
Person.Address 19614 Person.PhoneNumberType 3

Person.AddressType 6 Person.StateProvince 181


Step 3: Final Tables
Deleted tables in red
Production.BillOfMaterials 2679 Production.ProductModel 128
Production.Culture 8 Production.ProductModelIllustration 7
Production.Document 13 Production.ProductModel 762
ProductDescriptionCulture
Production.Illustration 5
Production.ProductPhoto 101
Production.Location 14
Production.ProductProductPhoto 504
Production.Product 504
Production.ProductReview 4
Production.ProductCategory 4
Production.ProductSubcategory 37
Production.ProductCostHistory 395 Production.ScrapReason 16
Production.ProductDescription 762 Production.TransactionHistory 113443
Production.ProductDocument 32 Production.TransactionHistoryArchive 89253
Production.ProductInventory 1069 Production.UnitMeasure 38
Production.ProductListPriceHistory 395 Production.WorkOrder 72591
Step 3: Final Tables
Purchasing.WorkOrderRouting 67131 Sales.SalesOrderHeader 31465

Purchasing.ProductVendor 460 Sales.SalesOrderHeaderSalesReason 27647

Purchasing.PurchaseOrderDetail 8845 Sales.SalesPerson 17

Purchasing.PurchaseOrderHeader 4012 Sales.SalesPersonQuotaHistory 163

Purchasing.ShipMethod 5 Sales.SalesReason 10

Purchasing.Vendor 104 Sales.SalesTaxRate 29

Sales.CountryRegionCurrency 109 Sales.SalesTerritory 10

Sales.CreditCard 19118 Sales.SalesTerritoryHistory 17

Sales.Currency 105 Sales.ShoppingCartItem 3

Sales.CurrencyRate 13532 Sales.SpecialOffer 16

Sales.Customer 19820 Sales.SpecialOfferProduct 538

Sales.PersonCreditCard 19118 Sales.Store 701

Sales.SalesOrderDetail 121317
Commentary
DBRE depends on having FKs
The 0,1 connection deletions lose little info
The few connections deletions are speculative

Supertype/subtypes are troublesome


From a separate manual DBRE
BusinessEntity -> Employee, Vendor, Person, Store
Employee -> SalesPerson
Supertype / Subtype
BusinessEntity
BusinessEntityID

Vendor Employee Person Store


BusinessEntityID (FK) BusinessEntityID (FK) BusinessEntityID (FK) BusinessEntityID (FK)

Customer
CustomerID

PersonID (FK)
StoreID (FK)

SalesPerson
BusinessEntityID (FK)
Case Study 3:
Core DBRE
Rationale
A project building a very large data warehouse
100 facts
200 dimensions
The primary operational feeder application has
8500 tables
I was new to the project and there was a lot to
learn
I wanted to reverse engineer the feeder
application so that I could understand it
Available Inputs
We had the following inputs (paper printouts) for
the feeder application
A thorough data dictionary
Primary key definitions
Foreign key definitions
The DBRE Problem
Reverse engineer a database with 8500 tables
With smaller schema, we could type the database
structure into a modeling tool and then analyze it
However, 8500 tables would take too long
We decided to determine the tightly connected
tables and hope that would yield a much smaller
model
We presume that the tightly connected tables are the
most important ones
DBRE Approach
Do a graph analysis
Create a meta-table with FK to PK references
The FK in the source table points to the PK in the
target table
Using SQL, successively delete tables with 0,1
FK connections
The final result is the multiply connected tables
Example
TableReferences
sourceTable targetTable
A B TableReferences
B A
D C sourceTable targetTable
E C A B
E D B A
G B D C
H A E C
F G E D
I G G B

H A B G I A B G

C F C

E D E D
Example
TableReferences
TableReferences
sourceTable targetTable
sourceTable targetTable
A B
B A A B
D C B A
E C D C
E D E C
G B E D

A B G C
A B

C E D

E D

Repeatedly subtract tables with one reference


until there is no change
Finding Core Tables
DELETE FROM TableReferences AS T3
WHERE EXISTS (
SELECT T1.sourceTable
FROM TableReferences AS T1
WHERE NOT EXISTS (
SELECT *
FROM TableReferences AS T2
WHERE T1.sourceTable = T2.targetTable )
AND T3.sourceTable = T1.sourceTable
GROUP BY T1.sourceTable
HAVING COUNT(*)=1 );
The middle query finds tables with one source reference
The innermost query limits the one-source tables to those that
are not the target of any other sources
The outer query does the deletion
Results
Initial: 8500 tables
Several thousand FK definitions
854 tables have FK columns
254 tables are referenced by FKs

Final result: 553 core tables


Case Study 4:
Enterprise Data Model
Rationale
Construct an enterprise data model
My client a financial software vendor was a
fusion of five formerly separate companies
The applications were greatly dissimilar because
they were built by separate organizations
The purpose of the EDM was to provide a basis
for integrating the applications and help the new
company strengthen their brand
DBRE Approach
We wanted to seed the EDM with application
content
We tried full DBRE but it was not helpful because
the resulting models were so different
We tried core DBRE but the models were still
confusing because they were so different
Finally we decided to count the FK references to
each table
We included the tables with the highest counts
This worked
Example
AddressType 2
Applicant 29
ApplicantAddressHistory 7
ApplicantType 2
LenderApplicantDetails 2
Country 2
OverseasCorrespondence 2
ExistingInsuranceCover 6
Provider 3
PaymentFrequency 2
Fee 11
FeeDueType 2
FeeType 2
ProductFee 2
Results
Holding P roduct

A ccount Customer A ctivity D ocument

FinancialScenario
DBRE and Zachman FW
The What column
Populated database Technician perspective
Database schema Engineer perspective
Physical data model Architect perspective
Logical data model Business mgmt. perspective
Conceptual data model Executive perspective
Thank you for attending

Any questions???

You might also like