Building a scalable data warehouse with Data Vault 2.0:
Gespeichert in:
Hauptverfasser: | , |
---|---|
Format: | Buch |
Sprache: | English |
Veröffentlicht: |
Amsterdam [und 11 andere]
Morgan Kaufmann
[2016]
|
Schlagworte: | |
Online-Zugang: | http://digitool.hbz-nrw.de:1801/webclient/DeliveryManager?pid=6483459&custom%5Fatt%5F2=simple%5Fviewer Inhaltsverzeichnis Inhaltsverzeichnis |
Beschreibung: | xx, 661 Seiten Illustrationen |
ISBN: | 9780128025109 |
Internformat
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV043192249 | ||
003 | DE-604 | ||
005 | 20220701 | ||
007 | t | ||
008 | 151208s2016 a||| |||| 00||| eng d | ||
020 | |a 9780128025109 |c pbk |9 978-0-12-802510-9 | ||
035 | |a (OCoLC)950473421 | ||
035 | |a (DE-599)BSZ450441520 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-473 |a DE-703 |a DE-523 |a DE-355 | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
100 | 1 | |a Linstedt, Daniel |e Verfasser |0 (DE-588)1068367636 |4 aut | |
245 | 1 | 0 | |a Building a scalable data warehouse with Data Vault 2.0 |c Daniel Linstedt, Michael Olschimke |
264 | 1 | |a Amsterdam [und 11 andere] |b Morgan Kaufmann |c [2016] | |
264 | 4 | |c © 2016 | |
300 | |a xx, 661 Seiten |b Illustrationen | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a Big Data |0 (DE-588)4802620-7 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Skalierbarkeit |0 (DE-588)4520890-6 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Data-Warehouse-Konzept |0 (DE-588)4406462-7 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Big Data |0 (DE-588)4802620-7 |D s |
689 | 0 | 1 | |a Data-Warehouse-Konzept |0 (DE-588)4406462-7 |D s |
689 | 0 | 2 | |a Skalierbarkeit |0 (DE-588)4520890-6 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Olschimke, Michael |d 1981- |e Verfasser |0 (DE-588)1077889798 |4 aut | |
856 | 4 | 2 | |u http://digitool.hbz-nrw.de:1801/webclient/DeliveryManager?pid=6483459&custom%5Fatt%5F2=simple%5Fviewer |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028615848&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Bamberg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028615848&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-028615848 |
Datensatz im Suchindex
_version_ | 1804175682032369664 |
---|---|
adam_text | Titel: Building a scalable data warehouse with Data Vault 2.0
Autor: Linstedt, Daniel
Jahr: 2016
Contents
Authors Biography..............................................................................................................................xiii
Foreword...............................................................................................................................................xv
Preface.................................................................................................................................................xvii
Acknowledgments................................................................................................................................xix
CHAPTER 1 Introduction to Data Warehousing...........................................................1
1.1 History of Data Warehousing.....................................................................................2
1.1.1 Decision Support Systems................................................................................3
1.1.2 Data Warehouse Systems.................................................................................4
1.2 The Enterprise Data Warehouse Environment...........................................................5
1.2.1 Access...............................................................................................................5
1.2.2 Multiple Subject Areas.....................................................................................5
1.2.3 Single Version of Truth....................................................................................5
1.2.4 Single Version of Facts....................................................................................6
1.2.5 Mission Criticality............................................................................................6
1.2.6 Scalability.........................................................................................................6
1.2.7 Big Data............................................................................................................7
1.2.8 Performance Issues...........................................................................................7
1.2.9 Complexity.......................................................................................................8
1.2.10 Auditing and Compliance...............................................................................9
1.2.11 Costs.............................................................................................................10
1.2.12 Other Business Requirements.......................................................................11
1.3 Introduction to Data Vault 2.0..................................................................................11
1.4 Data Warehouse Architecture...................................................................................12
1.4.1 Typical Two-Layer Architecture....................................................................12
1.4.2 Typical Three-Layer Architecture..................................................................13
References........................................................................................................................14
CHAPTER 2 Scalable Data Warehouse Architecture...............................................17
2.1 Dimensions of Scalable Data Warehouse Architectures..........................................17
2.1.1 Workload........................................................................................................18
2.1.2 Data Complexity.............................................................................................18
2.1.3 Analytical Complexity....................................................................................19
2.1.4 Query Complexity..........................................................................................19
2.1.5 Availability.....................................................................................................20
2.1.6 Security...........................................................................................................20
Please refer the companion site for more details — http*//booksite.elsevier.com/9780128025109
V
vi Contents
21
2.2 Data Vault 2.0 Architecture......................................................................................
2.2.1 Business Rules Definition...............................................................................
2.2.2 Business Rules Application............................................................................
2.2.3 Staging Area Layer.........................................................................................25
2.2.4 Data Warehouse Layer...................................................................................2^
2.2.5 Information Mart Layer..................................................................................22
2.2.6 Metrics Vault..................................................................................................22
2.2.7 Business Vault................................................................................................2^
2.2.8 Operational Vault...........................................................................................^9
2.2.9 Managed Self-Service BI...............................................................................30
2.2.10 Other Features..............................................................................................31
References........................................................................................................................31
CHAPTER 3 The Data Vault 2.0 Methodology...........................................................33
3.1 Project Planning........................................................................................................33
3.1.1 Capability Maturity Model Integration..........................................................39
3.1.2 Managing the Project......................................................................................42
3.1.3 Defining the Project........................................................................................50
3.1.4 Estimation of the Project................................................................................54
3.2 Project Execution......................................................................................................62
3.2.1 Traditional Software Development Life-Cycle..............................................63
3.2.2 Applying Software Development Life-Cycle to the Data
Vault 2.0 Methodology...................................................................................67
3.2.3 Parallel Teams................................................................................................69
3.2.4 Technical Numbering.....................................................................................71
3.3 Review and Improvement.........................................................................................73
3.3.1 Six Sigma........................................................................................................74
3.3.2 Total Quality Management.............................................................................81
References........................................................................................................................86
CHAPTER 4 Data Vault 2.0 Modeling.........................................................................89
4.1 Introduction to Data Vault Modeling.......................................................................89
4.2 Data Vault Modeling Vocabulary.............................................................................90
4.2.1 Hub Entities....................................................................................................91
4.2.2 Link Entities.....................................................................................................
4.2.3 Satellite Entities.......................................................................................... 91
4.3 Hub Definition....................................................................................................................................................................93
4.3.1 Definition of a Business Key..................................................................................................95
4.3.2 Hub Entity Structure..........................................................................................................gg
4.3.3 Hub Examples............................................................................................jqq
4.4 Link Definition........................................................................................................^
4.4.1 Reasons for Many-to-Many Relationships...................................................103
Contents vii
4.4.2 Flexibility of Links.......................................................................................105
4.4.3 Granularity of Links.....................................................................................106
4.4.4 Link Unit-of-Work.......................................................................................109
4.4.5 Link Entity Structure....................................................................................110
4.4.6 Link Examples..............................................................................................Ill
4.5 Satellite Definition..................................................................................................112
4.5.1 Importance of Keeping History....................................................................114
4.5.2 Splitting Satellites.........................................................................................114
4.5.3 Satellite Entity Structure...............................................................................116
4.5.4 Satellite Examples........................................................................................118
4.5.5 Link Driving Key.........................................................................................119
References......................................................................................................................121
CHAPTER 5 Intermediate Data Vault Modeling.......................................................123
5.1 Hub Applications....................................................................................................123
5.1.1 Business Key Consolidation.........................................................................124
5.2 Link Applications...................................................................................................127
5.2.1 Link-on-Link................................................................................................127
5.2.2 Same-as Links..............................................................................................129
5.2.3 Hierarchical Links........................................................................................129
5.2.4 Nonhistorized Links.....................................................................................132
5.2.5 Nondescriptive Links....................................................................................136
5.2.6 Computed Aggregate Links..........................................................................137
5.2.7 Exploration Links.........................................................................................139
5.3 Satellite Applications..............................................................................................139
5.3.1 Overloaded Satellites....................................................................................139
5.3.2 Multi-Active Satellites..................................................................................141
5.3.3 Status Tracking Satellites.............................................................................143
5.3.4 Effectivity Satellites.....................................................................................145
5.3.5 Record Tracking Satellites...........................................................................146
5.3.6 Computed Satellites......................................................................................149
References......................................................................................................................150
CHAPTER 6 Advanced Data Vault Modeling............................................................151
6.1 Point-in-Time T ables..............................................................................................151
6.1.1 Point-in-Time Table Structure......................................................................153
6.1.2 Managed PIT Window.................................................................................156
6.2 Bridge Tables..........................................................................................................158
6.2.1 Bridge Table Structure.................................................................................159
6.2.2 Comparing PIT Tables with Bridge Tables..................................................160
6.3 Reference Tables....................................................................................................160
6.3.1 No-History Reference Tables.......................................................................161
viii Contents
6.3.2 History-Based Reference Tables..................................................................
164
6.3.3 Code and Descriptions.................................................................................
..................169
Reference..................................................................................................
CHAPTER 7 Dimensional Modeling..........................................................................
7.1 Introduction.............................................................................................................
172
7.2 Star Schemas...........................................................................................................
7.2.1 Fact Tables....................................................................................................
7.2.2 Dimension Tables.........................................................................................^
7.2.3 Querying Star Schemas................................................................................177
7.3 Multiple Stars..........................................................................................................*7^
7.3.1 Conformed Dimensions................................................................................*7^
7.4 Dimension Design..................................................................................................180
7.4.1 Slowly Changing Dimensions......................................................................181
7.4.2 Hierarchies....................................................................................................183
7.4.3 Snowflake Design.........................................................................................189
References......................................................................................................................193
CHAPTER 8 Physical Data Warehouse Design........................................................195
8.1 Database Workloads...............................................................................................195
8.1.1 Workload Characteristics.............................................................................196
8.2 Separate Environments for Development, Testing, and Production.......................197
8.2.1 Blue-Green Deployment...............................................................................198
8.3 Microsoft Azure Cloud Computing Platform.........................................................200
8.4 Physical Data Warehouse Architecture on Premise...............................................203
8.4.1 Hardware Architectures and Databases........................................................203
8.4.2 Processor Options.........................................................................................206
8.4.3 Memory Options...........................................................................................207
8.4.4 Storage Options............................................................................................207
8.4.5 Network Options..........................................................................................209
8.5 Database Options....................................................................................................210
8.5.1 Tempdb Options...........................................................................................210
8.5.2 Partitioning...................................................................................................211
8.5.3 Filegroups.....................................................................................................212
8.5.4 Data Compression........................................................................................212
8.6 Setting up the Data Warehouse...............................................................................213
8.6.1 Setting up the Stage Area.............................................................................213
8.6.2 Setting up the Data Vault.............................................................................217
8.6.3 Setting up Information Marts.......................................................................222
8.6.4 Setting up the Meta, Metrics, and Error Marts.............................................226
References...................................
Contents ix
CHAPTER 9 Master Data Management.....................................................................229
9.1 Definitions...............................................................................................................229
9.1.1 Master Data..................................................................................................229
9.1.2 Data Management.........................................................................................230
9.1.3 Master Data Management.............................................................................230
9.2 Master Data Management Goals............................................................................231
9.3 Drivers for Managing Master Data.........................................................................232
9.4 Operational vs. Analytical Master Data Management............................................235
9.5 Master Data Management as an Enabler for Managed Self-Service BI.................238
9.6 Master Data Management as an Enabler for Total Quality Management..............239
9.6.1 MDS Object Model......................................................................................241
9.6.2 Master Data Manager...................................................................................249
9.6.3 Explorer........................................................................................................250
9.6.4 Version Management....................................................................................252
9.6.5 Integration Management...............................................................................253
9.6.6 System Administration.................................................................................254
9.6.7 User and Group Permissions........................................................................255
9.7 Creating a Model....................................................................................................256
9.7.1 Creating Entities...........................................................................................258
9.7.2 Creating Business Rules...............................................................................261
9.8 Importing a Model..................................................................................................263
9.9 Integrating MDS with the Data Vault and Operational Systems............................265
9.9.1 Stage Tables..................................................................................................267
9.9.2 Subscription Views.......................................................................................278
References......................................................................................................................282
CHAPTER 10 Metadata Management........................................................................283
10.1 What is Metadata?................................................................................................283
10.1.1 Business Metadata....................................................................................284
10.1.2 Technical Metadata...................................................................................286
10.1.3 Process Execution Metadata.....................................................................287
10.2 Implementing the Meta Mart................................................................................287
10.2.1 SQL Server BI Metadata Toolkit..............................................................288
10.2.2 Naming Conventions................................................................................292
10.2.3 Capturing Source System Definitions.......................................................296
10.2.4 Capturing Hard Rules...............................................................................298
10.2.5 Capturing Metadata for the Staging Area.................................................300
10.2.6 Capturing Requirements to Source Tables...............................................301
10.2.7 Capturing Source Tables to Data Vault Tables.........................................302
10.2.8 Capturing Soft Rules.................................................................................311
10.2.9 Capturing Data Vault Tables to Information Mart Table Mappings........315
Contents
10.2.10 Capturing Requirements to Information Mart Tables.............................317
10.2.11 Capturing Access Control Lists and Other Security Measures...............318
10.3 Implementing the Metrics Vault...........................................................................3^0
10.3.1 Capturing Performance Data in SQL Server Integration Services...........323
10.4 Implementing the Metrics Mart............................................................................333
10.5 Implementing the Error Mart................................................................................333
10.5.1 Capturing Erroneous Data in SQL Server Integration Services...............336
References......................................................................................................................
CHAPTER 11 Data Extraction......................................................................................343
11.1 Purpose of Staging Area.......................................................................................343
11.2 Hashing in the Data Warehouse...........................................................................347
11.2.1 Hash Functions Revisited.........................................................................350
11.2.2 Applying Hash Functions to Data.............................................................351
11.2.3 Risks of Using Hash Functions.................................................................355
11.2.4 Hashing Business Keys.............................................................................360
11.2.5 Hashing for Change Detection..................................................................364
11.3 Purpose of the Load Date.....................................................................................370
11.4 Purpose of the Record Source...............................................................................372
11.5 Types of Data Sources..........................................................................................373
11.6 Sourcing Flat Files................................................................................................375
11.6.1 Control Flow.............................................................................................375
11.6.2 Flat File Connection Manager..................................................................380
11.6.3 Data Flow..................................................................................................383
11.7 Sourcing Historical Data.......................................................................................399
11.7.1 SSIS Example for Sourcing Historical Data.............................................401
11.8 Sourcing the Sample Airline Data........................................................................403
11.8.1 Authenticating with Google Drive............................................................404
11.8.2 Control Flow.............................................................................................406
11.8.3 GoogleSheets Connection Manager..........................................................411
11.8.4 Data Flow..................................................................................................414
11.9 Sourcing Denormalized Data Sources..................................................................422
11.10 Sourcing Master Data from MDS.......................................................................425
References.......................................................................................................................
CHAPTER 12 Loading the Data Vault.........................................................................429
12.1 Loading Raw Data Vault Entities........................................................................................................432
12.1.1 Hubs............................................................... ...434
12.1.2 Links..............................................................................................................................................................44^
12.1.3 No-History Links............................................................................................................................457
12.1.4 Satellites..........................................................................................................................................4^
12.1.5 End-Dating Satellites..................................................................................................4
Contents xi
12.1.6 Separate New from Changed Rows..........................................................491
12.1.7 No-History Satellites.................................................................................496
12.1.8 Soft-Deleting Data in Hubs and Links......................................................499
12.1.9 Dealing with Missing Data.......................................................................501
12.2 Loading Reference Tables....................................................................................505
12.2.1 No-History Reference Tables...................................................................506
12.2.2 History-Based Reference Tables...............................................................509
12.2.3 Code and Descriptions..............................................................................511
12.2.4 Code and Descriptions with History.........................................................514
12.3 Truncating the Staging Area.................................................................................517
References......................................................................................................................518
CHAPTER 13 Implementing Data Quality..................................................................519
13.1 Business Expectations Regarding Data Quality...................................................519
13.2 The Costs of Low Data Quality............................................................................520
13.3 The Value of Bad Data.........................................................................................521
13.4 Data Quality in the Architecture...........................................................................523
13.5 Correcting Errors in the Data Warehouse.............................................................524
13.6 Transform, Enhance and Calculate Derived Data................................................525
13.6.1 T-SQL Example........................................................................................526
13.7 Standardization of Data........................................................................................528
13.7.1 T-SQL Example........................................................................................528
13.8 Correct and Complete Data..................................................................................530
13.8.1 T-SQL Example........................................................................................531
13.8.2 DQS Example...........................................................................................532
13.8.3 SSIS Example...........................................................................................537
13.9 Match and Consolidate Data.................................................................................548
13.9.1 SSIS Example...........................................................................................550
13.10 Creating Dimensions from Same-as Links.........................................................560
References......................................................................................................................566
CHAPTER 14 Loading the Dimensional Information Mart......................................567
14.1 Using the Business Vault as an Intermediate to the Information Mart.................567
14.1.1 Computed Satellite....................................................................................567
14.1.2 Building an Exploration Link...................................................................569
14.2 Materializing the Information Mart......................................................................579
14.2.1 Loading Type 1 Dimensions.....................................................................580
14.2.2 Loading Type 2 Dimensions.....................................................................582
14.2.3 Loading Fact Tables..................................................................................585
14.2.4 Loading Aggregated Fact Tables..............................................................590
14.3 Leveraging PIT and Bridge Tables for Virtualization..........................................592
14.3.1 Factors that Affect Performance of Virtualized Facts..............................594
xii Contents
14.3.2 Advantages of Virtualization....................................................................695
14.3.3 Loading PIT Tables..................................................................................696
14.3.4 Creating Virtualized Dimensions..............................................................601
14.3.5 Loading Bridge Tables..............................................................................604
14.3.6 Creating Virtualized Facts........................................................................608
14.4 Implementing Temporal Dimensions...................................................................614
14.5 Implementing Data Quality Using PIT Tables.....................................................616
14.6 Dealing with Reference Data................................................................................618
14.7 About Hash Keys in the Information Mart...........................................................620
14.7.1 Advantages of Using Hash Keys in the Information Mart.......................620
14.7.2 Reduce the Number of Dimensions in Cube.............................................620
14.7.3 Use Fixed Binary Data Type for Hash Values..........................................620
14.7.4 Reduce the Size of the Hash Key..............................................................621
14.7.5 Introduce Additional Sequence Numbers.................................................621
References......................................................................................................................621
CHAPTER 15 Multidimensional Database................................................................623
15.1 Accessing the Information Mart...........................................................................624
15.1.1 Creating a Data Source.............................................................................624
15.1.2 Creating Data Source View......................................................................626
15.2 Creating Dimensions............................................................................................631
15.2.1 Date Dimension........................................................................................633
15.3 Creating Cubes......................................................................................................639
15.3.1 Processing the Cube..................................................................................645
15.4 Accessing the Cube...............................................................................................646
References.........................................................................................................................
Subject Index............................................................................................................................................................................................................................649
Contents
Authors Biography.......................................................................xiii
Foreword..................................................................................
Preface.................................................................................xvii
Acknowledgments.........................................................................x i x
CHAPTER 1 Introduction to Data Warehousing..........................................i
1.1 History of Data Warehousing...............................................2
1.1.1 Decision Support Systems............................................3
1.1.2 Data Warehouse Systems..............................................4
1.2 The Enterprise Data Warehouse Environment................................5
1.2.1 Access..............................................................5
1.2.2 Multiple Subject Areas..............................................5
1.2.3 Single Version of Truth.............................................5
1.2.4 Single Version of Facts.............................................6
1.2.5 Mission Criticality.................................................6
1.2.6 Scalability.........................................................6
1.2.7 Big Data............................................................7
1.2.8 Performance Issues..................................................7
1.2.9 Complexity..........................................................8
1.2.10 Auditing and Compliance............................................9
1.2.11 Costs.............................................................10
1.2.12 Other Business Requirements....
1.3 Introduction to Data Vault 2.0........
1.4 Data Warehouse Architecture...........
1.4.1 Typical Two-Layer Architecture..
1.4.2 Typical Three-Layer Architecture
References...................................................................14
CHAPTER 2 Scalable Data Warehouse Architecture...........................................17
2,1 Dimensions of Scalable Data Warehouse Architectures.....................17
2.1.1 Workload...........................................................1«
2.1.2 Data Complexity....................................................18
2.1.3 Analytical Complexity..............................................19
2.1.4 Query Complexity...................................................19
2.1.5 Availability......................................................֊**
2.1.6 Security..........................................................֊()
Please refer the companion site for more details — http://booksite.elsevier.com/9780128025109
V
vi Contents
2.2 Data Vault 2.0 Architecture................................................^
2.2.1 Business Rules Definition.............................................22
2.2.2 Business Rules Application............................................23
2.2.3 Staging Area Layer....................................................25
2.2.4 Data Warehouse Layer..................................................26
2.2.5 Information Mart Layer................................................22
2.2.6 Metrics Vault.........................................................22
2.2.7 Business Vault........................................................28
2.2.8 Operational Vault.....................................................29
2.2.9 Managed Self-Service BI...............................................30
2.2.10 Other Features.......................................................31
References......................................................................31
CHAPTER 3 The Data Vault 2.0 Methodology.................................................33
3.1 Project Planning...........................................................33
3.1.1 Capability Maturity Model Integration.................................39
3.1.2 Managing the Project..................................................42
3.1.3 Defining the Project..................................................50
3.1.4 Estimation of the Project.............................................54
3.2 Project Execution...........................................................62
3.2.1 Traditional Software Development Life-Cycle............................63
3.2.2 Applying Software Development Life-Cycle to the Data
Vault 2.0 Methodology.................................................67
3.2.3 Parallel Teams........................................................69
3.2.4 Technical Numbering...................................................71
3.3 Review and Improvement.....................................................73
3.3.1 Six Sigma.............................................................74
3.3.2 Total Quality Management..............................................81
References......................................................................86
CHAPTER 4 Data Vault 2.0 Modeling.....................................................89
4.1 Introduction to Data Vault Modeling.........................................89
4.2 Data Vault Modeling Vocabulary..............................................90
4.2.1 Hub Entities............................................................
4.2.2 Link Entities......................................................... 91
4.2.3 Satellite Entities.................................................... 91
4.3 Hub Definition............................................................. 92
4.3.1 Definition of a Business Key.......................................... 95
4.3.2 Hub Entity Structure................................................. 9g
4.3.3 Hub Examples........................................................ Iqq
4.4 Link Definition........................................................... jqj
4.4.1 Reasons for Many-to-Many Relationships................................103
Contents vii
4.4.2 Flexibility of Links................................................ ^
4.4.3 Granularity of Links.............................................. ¡0^
4.4.4 Link Unit-of-Work................................................ I0 ^
4.4.5 Link Entity Structure....................... I j q
4.4.6 Link Examples............................... j j j
4.5 Satellite Definition....................... I j
4.5.1 Importance of Keeping History....................................I j 4
4.5.2 Splitting Satellites.............................................. 114
4.5.3 Satellite Entity Structure................................... I ] ^
4.5.4 Satellite Examples........................................... 11 ^
4.5.5 Link Driving Key................................ I j9
References................................. j ^ j
CHAPTER 5 Intermediate Data Vault Modeling......................................123
5.1 Hub Applications....................................... 123
5.1.1 Business Key Consolidation...................................... 124
5.2 Link Applications................................... 127
5.2.1 Link-on-Link..........................................................
5.2.2 Same-as Links..................................................... 129
5.2.3 Hierarchical Links.................................................129
5.2.4 Nonhistorized Links................................................132
5.2.5 Nondescriptive Links..............................................I 36
5.2.6 Computed Aggregate Links...........................................137
5.2.7 Exploration Links..................................................139
5.3 Satellite Applications..................................................139
5.3.1 Overloaded Satellites...............................................139
5.3.2 Multi-Active Satellites.............................................141
5.3.3 Status Tracking Satellites..........................................143
5.3.4 Effectivity Satellites..............................................¡45
5.3.5 Record Tracking Satellites..........................................146
5.3.6 Computed Satellites.................................................149
References......................................................................150
CHAPTER 6 Advanced Data Vault Modeling...................................................151
6.1 Point-in-Time Tables.......................................................151
6.1.1 Point-in-Time Table Structure.......................................153
6.1.2 Managed PIT Window..................................................156
6.2 Bridge Tables.............................................................158
6.2.1 Bridge Table Structure.............................................159
6.2.2 Comparing PIT Tables with Bridge Tables.............................160
6.3 Reference Tables..........................................................160
6.3.1 No-History Reference Tables........................................161
viii Contents
6.3.2 History-Based Reference Tables
6.3.3 Code and Descriptions.........
Reference...............................
163
164
169
CHAPTER 7 Dimensional Modeling..................
7.1 Introduction....................
7.2 Star Schemas...................
7.2.1 Fact Tables...............
7.2.2 Dimension Tables..........
7.2.3 Querying Star Schemas.....
7.3 Multiple Stars.................
7.3.1 Conformed Dimensions......
7.4 Dimension Design...............
7.4.1 Slowly Changing Dimensions
7.4.2 Hierarchies...............
7.4.3 Snowflake Design..........
References..........................
171
171
,172
.174
.176
.177
.179
.179
.180
.181
,.183
..189
..193
CHAPTER 8 Physical Data Warehouse Design.....................................
8.1 Database Workloads..........................................
8.1.1 Workload Characteristics..............................
8.2 Separate Environments for Development, Testing, and Production
8.2.1 Blue-Green Deployment.................................
8.3 Microsoft Azure Cloud Computing Platform....................
8.4 Physical Data Warehouse Architecture on Premise.............
8.4.1 Hardware Architectures and Databases..................
8.4.2 Processor Options.....................................
8.4.3 Memory Options..........................................
8.4.4 Storage Options.......................................
8.4.5 Network Options.......................................
8.5 Database Options............................................
8.5.1 Tempdb Options........................................
8.5.2 Partitioning..........................................
8.5.3 Filegroups............................................
8.5.4 Data Compression......................................
8.6 Setting up the Data Warehouse.........................
8.6.1 Setting up the Stage Area.......................
8.6.2 Setting up the Data Vault.......................
8.6.3 Setting up Information Marts..........................
8.6.4 Setting up the Meta, Metrics, and Error Marts.........
References.................
195
.195
.196
.197
.198
.200
.203
.203
„206
„207
„207
„209
„210
„210
„211
,..212
...212
...213
...213
...217
...222
...226
...228
Contents
IX
CHAPTER 9 Master Data Management......................................................229
9.1 Definitions...........................................................229
9.1.1 Master Data.....................................................229
9.1.2 Data Management.................................................230
9.1.3 Master Data Management..........................................230
9.2 Master Data Management Goals..........................................231
9.3 Drivers for Managing Master Data......................................232
9.4 Operational vs. Analytical Master Data Management.....................235
9.5 Master Data Management as an Enabler for Managed Self-Service BI......238
9.6 Master Data Management as an Enabler for Total Quality Management.....239
9.6.1 MDS Object Model................................................241
9.6.2 Master Data Manager.............................................249
9.6.3 Explorer........................................................250
9.6.4 Version Management..............................................252
9.6.5 Integration Management..........................................253
9.6.6 System Administration...........................................254
9.6.7 User and Group Permissions......................................255
9.7 Creating a Model......................................................256
9.7.1 Creating Entities...............................................258
9.7.2 Creating Business Rules.........................................261
9.8 Importing a Model.....................................................263
9.9 Integrating MDS with the Data Vault and Operational Systems...........265
9.9.1 Stage Tables....................................................267
9.9.2 Subscription Views..............................................2?8
References................................................................282
CHAPTER 10 Metadata Management...................................................283
10.1 What is Metadata?....................................................283
10.1.1 Business Metadata..............................................284
10.1.2 Technical Metadata.............................................286
10.1.3 Process Execution Metadata.....................................287
10.2 Implementing the Meta Mart...........................................287
10.2.1 SQL Server BI Metadata Toolkit.................................288
10.2.2 Naming Conventions.............................................292
10.2.3 Capturing Source System Definitions............................296
10.2.4 Capturing Hard Rules...........................................298
10.2.5 Capturing Metadata for the Staging Area........................300
10.2.6 Capturing Requirements to Source Tables........................301
10.2.7 Capturing Source Tables to Data Vault Tables...................302
10.2.8 Capturing Soft Rules...........................................311
10.2.9 Capturing Data Vault Tables to Information Mart Table Mappings.315
X
Contents
10.2.10 Capturing Requirements to Information Mart Tables.................317
10.2.11 Capturing Access Control Lists and Other Security Measures........318
10.3 Implementing the Metrics Vault............................................320
10.3.1 Capturing Performance Data in SQL Server Integration Services......323
10.4 Implementing the Metrics Mart.............................................333
10.5 Implementing the Error Mart...............................................335
10.5.1 Capturing Erroneous Data in SQL Server Integration Services........336
742
References.....................................................................
CHAPTER 11 Data Extraction...........................-......................................343
11.1 Purpose of Staging Area....................................................343
11.2 Hashing in the Data Warehouse.............................................347
11.2.1 Hash Functions Revisited...........................................350
11.2.2 Applying Hash Functions to Data....................................351
11.2.3 Risks of Using Hash Functions.......................................355
11.2.4 Hashing Business Keys...............................................360
11.2.5 Hashing for Change Detection........................................364
11.3 Purpose of the Load Date...................................................370
11.4 Purpose of the Record Source...............................................372
11.5 Types of Data Sources......................................................373
11.6 Sourcing Flat Files........................................................375
11.6.1 Control Flow........................................................375
11.6.2 Flat File Connection Manager........................................380
11.6.3 Data Flow...........................................................383
11.7 Sourcing Historical Data...................................................399
11.7.1 SSIS Example for Sourcing Historical Data..........................401
11.8 Sourcing the Sample Airline Data...........................................403
11.8.1 Authenticating with Google Drive....................................404
11.8.2 Control Flow........................................................406
11.8.3 GoogleSheets Connection Manager.....................................411
11.8.4 Data Flow.............................................................
11.9 Sourcing Denormalized Data Sources........................................422
11.10 Sourcing Master Data from MDS..............................................
References................................................. 42 у
CHAPTER 12 Loading the Data Vault.....................................................429
12.1 Loading Raw Data Vault Entities......................................... 432
12.1.1 Hubs............................................................... 434
12.1.2 Links...............................................................446
12.1.3 No-History Links................................................. 45-y
12.1.4 Satellites.....
12.1.5 End-Dating Satellites...............
Contents
XI
12.1.6 Separate New from Changed Rows.................................491
12.1.7 No-History Satellites...........................................496
12.1.8 Soft-Deleting Data in Hubs and Links...........................499
12.1.9 Dealing with Missing Data......................................501
12.2 Loading Reference Tables..............................................505
12.2.1 No-History Reference Tables.....................................506
12.2.2 History-Based Reference Tables..................................509
12.2.3 Code and Descriptions...........................................511
12.2.4 Code and Descriptions with History..............................514
12.3 Truncating the Staging Area...........................................517
References..................................................................518
CHAPTER 13 Implementing Data Quality....................................................519
13.1 Business Expectations Regarding Data Quality..........................519
13.2 The Costs of Low Data Quality.........................................520
13.3 The Value of Bad Data.................................................521
13.4 Data Quality in the Architecture......................................523
13.5 Correcting Errors in the Data Warehouse...............................524
13.6 Transform, Enhance and Calculate Derived Data.........................525
13.6.1 T-SQL Example...................................................526
13.7 Standardization of Data...............................................528
13.7.1 T-SQL Example...................................................528
13.8 Correct and Complete Data.............................................530
13.8.1 T-SQL Example...................................................531
13.8.2 DQS Example.....................................................532
13.8.3 SSIS Example....................................................537
13.9 Match and Consolidate Data..........................................*548
13.9.1 SSIS Example....................................................550
13.10 Creating Dimensions from Same-as Links..............................֊560
References.................................................................֊566
CHAPTER 14 Loading the Dimensional Information Mart..........................567
14.1 Using the Business Vault as an Intermediate to the Information Mart...567
14.1.1 Computed Satellite..............................................567
14.1.2 Building an Exploration Link....................................569
14.2 Materializing the Information Mart....................................579
14.2.1 Loading Type 1 Dimensions.......................................580
14.2.2 Loading Type 2 Dimensions.......................................582
14.2.3 Loading Fact Tables.............................................585
14.2.4 Loading Aggregated Fact Tables..................................590
14.3 Leveraging PIT and Bridge Tables for Virtualization...................592
14.3.1 Factors that Affect Performance of Virtualized Facts...........594
XII
Contents
14.3.2 Advantages of Virtualization.......................................595
14.3.3 Loading PIT Tables.................................................596
14.3.4 Creating Virtualized Dimensions....................................601
14.3.5 Loading Bridge Tables..............................................604
14.3.6 Creating Virtualized Facts.........................................608
14.4 Implementing Temporal Dimensions.........................................614
14.5 Implementing Data Quality Using PIT Tables...............................616
14.6 Dealing with Reference Data..............................................618
14.7 About Hash Keys in the Information Mart.................................620
14.7.1 Advantages of Using Hash Keys in the Information Mart.............620
14.7.2 Reduce the Number of Dimensions in Cube............................620
14.7.3 Use Fixed Binary Data Type for Hash Values........................620
14.7.4 Reduce the Size of the Hash Key...................................621
14.7.5 Introduce Additional Sequence Numbers..............................621
References.....................................................................621
CHAPTER 15 Multidimensional Database............................................... 623
15.1 Accessing the Information Mart...........................................624
15.1.1 Creating a Data Source.............................................624
15.1.2 Creating Data Source View..........................................626
15.2 Creating Dimensions......................................................631
15.2.1 Date Dimension.....................................................633
15.3 Creating Cubes...........................................................639
15.3.1 Processing the Cube................................................645
15.4 Accessing the Cube.......................................................646
References.......................................................................
Subject Index
649
|
any_adam_object | 1 |
author | Linstedt, Daniel Olschimke, Michael 1981- |
author_GND | (DE-588)1068367636 (DE-588)1077889798 |
author_facet | Linstedt, Daniel Olschimke, Michael 1981- |
author_role | aut aut |
author_sort | Linstedt, Daniel |
author_variant | d l dl m o mo |
building | Verbundindex |
bvnumber | BV043192249 |
classification_rvk | ST 530 |
ctrlnum | (OCoLC)950473421 (DE-599)BSZ450441520 |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01996nam a2200409 c 4500</leader><controlfield tag="001">BV043192249</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20220701 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">151208s2016 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780128025109</subfield><subfield code="c">pbk</subfield><subfield code="9">978-0-12-802510-9</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)950473421</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BSZ450441520</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-473</subfield><subfield code="a">DE-703</subfield><subfield code="a">DE-523</subfield><subfield code="a">DE-355</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Linstedt, Daniel</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1068367636</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Building a scalable data warehouse with Data Vault 2.0</subfield><subfield code="c">Daniel Linstedt, Michael Olschimke</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Amsterdam [und 11 andere]</subfield><subfield code="b">Morgan Kaufmann</subfield><subfield code="c">[2016]</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2016</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xx, 661 Seiten</subfield><subfield code="b">Illustrationen</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Big Data</subfield><subfield code="0">(DE-588)4802620-7</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Skalierbarkeit</subfield><subfield code="0">(DE-588)4520890-6</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data-Warehouse-Konzept</subfield><subfield code="0">(DE-588)4406462-7</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Big Data</subfield><subfield code="0">(DE-588)4802620-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Data-Warehouse-Konzept</subfield><subfield code="0">(DE-588)4406462-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Skalierbarkeit</subfield><subfield code="0">(DE-588)4520890-6</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Olschimke, Michael</subfield><subfield code="d">1981-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1077889798</subfield><subfield code="4">aut</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://digitool.hbz-nrw.de:1801/webclient/DeliveryManager?pid=6483459&custom%5Fatt%5F2=simple%5Fviewer</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028615848&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bamberg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028615848&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-028615848</subfield></datafield></record></collection> |
id | DE-604.BV043192249 |
illustrated | Illustrated |
indexdate | 2024-07-10T07:20:10Z |
institution | BVB |
isbn | 9780128025109 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-028615848 |
oclc_num | 950473421 |
open_access_boolean | |
owner | DE-473 DE-BY-UBG DE-703 DE-523 DE-355 DE-BY-UBR |
owner_facet | DE-473 DE-BY-UBG DE-703 DE-523 DE-355 DE-BY-UBR |
physical | xx, 661 Seiten Illustrationen |
publishDate | 2016 |
publishDateSearch | 2016 |
publishDateSort | 2016 |
publisher | Morgan Kaufmann |
record_format | marc |
spelling | Linstedt, Daniel Verfasser (DE-588)1068367636 aut Building a scalable data warehouse with Data Vault 2.0 Daniel Linstedt, Michael Olschimke Amsterdam [und 11 andere] Morgan Kaufmann [2016] © 2016 xx, 661 Seiten Illustrationen txt rdacontent n rdamedia nc rdacarrier Big Data (DE-588)4802620-7 gnd rswk-swf Skalierbarkeit (DE-588)4520890-6 gnd rswk-swf Data-Warehouse-Konzept (DE-588)4406462-7 gnd rswk-swf Big Data (DE-588)4802620-7 s Data-Warehouse-Konzept (DE-588)4406462-7 s Skalierbarkeit (DE-588)4520890-6 s DE-604 Olschimke, Michael 1981- Verfasser (DE-588)1077889798 aut http://digitool.hbz-nrw.de:1801/webclient/DeliveryManager?pid=6483459&custom%5Fatt%5F2=simple%5Fviewer HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028615848&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis Digitalisierung UB Bamberg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028615848&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Linstedt, Daniel Olschimke, Michael 1981- Building a scalable data warehouse with Data Vault 2.0 Big Data (DE-588)4802620-7 gnd Skalierbarkeit (DE-588)4520890-6 gnd Data-Warehouse-Konzept (DE-588)4406462-7 gnd |
subject_GND | (DE-588)4802620-7 (DE-588)4520890-6 (DE-588)4406462-7 |
title | Building a scalable data warehouse with Data Vault 2.0 |
title_auth | Building a scalable data warehouse with Data Vault 2.0 |
title_exact_search | Building a scalable data warehouse with Data Vault 2.0 |
title_full | Building a scalable data warehouse with Data Vault 2.0 Daniel Linstedt, Michael Olschimke |
title_fullStr | Building a scalable data warehouse with Data Vault 2.0 Daniel Linstedt, Michael Olschimke |
title_full_unstemmed | Building a scalable data warehouse with Data Vault 2.0 Daniel Linstedt, Michael Olschimke |
title_short | Building a scalable data warehouse with Data Vault 2.0 |
title_sort | building a scalable data warehouse with data vault 2 0 |
topic | Big Data (DE-588)4802620-7 gnd Skalierbarkeit (DE-588)4520890-6 gnd Data-Warehouse-Konzept (DE-588)4406462-7 gnd |
topic_facet | Big Data Skalierbarkeit Data-Warehouse-Konzept |
url | http://digitool.hbz-nrw.de:1801/webclient/DeliveryManager?pid=6483459&custom%5Fatt%5F2=simple%5Fviewer http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028615848&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028615848&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT linstedtdaniel buildingascalabledatawarehousewithdatavault20 AT olschimkemichael buildingascalabledatawarehousewithdatavault20 |
Es ist kein Print-Exemplar vorhanden.
Inhaltsverzeichnis