Deciphering Data Architectures: Choosing Between a Modern Data Warehouse, Data Fabric, Data Lakehouse, and Data Mesh
Language: English
Publication details: SPD 2024
Description: 250
ISBN: 9789355425928
| Item type | Current library | Call number | Status | Barcode |
|---|---|---|---|---|
| Books | Cummins College of Engineering for Women Pune | 005.74 SER | Available (not for issue) | CCEP-BK-67508 |
Table of Contents
Foreword xvii
Preface xix
Part I. Foundation
1. Big Data 3
What Is Big Data, and How Can It Help You? 4
Data Maturity 7
Stage 1: Reactive 8
Stage 2: Informative 8
Stage 3: Predictive 9
Stage 4: Transformative 9
Self-Service Business Intelligence 9
Summary 10
2. Types of Data Architectures 13
Evolution of Data Architectures 14
Relational Data Warehouse 16
Data Lake 18
Modern Data Warehouse 20
Data Fabric 21
Data Lakehouse 21
Data Mesh 22
Summary 23
3. The Architecture Design Session 25
What Is an ADS? 25
Why Hold an ADS? 26
Before the ADS 27
Preparing 27
Inviting Participants 29
Conducting the ADS 31
Introductions 31
Discovery 31
Whiteboarding 36
After the ADS 37
Tips for Conducting an ADS 38
Summary 40
Part II. Common Data Architecture Concepts
4. The Relational Data Warehouse 43
What Is a Relational Data Warehouse? 43
What a Data Warehouse Is Not 46
The Top-Down Approach 47
Why Use a Relational Data Warehouse? 49
Drawbacks to Using a Relational Data Warehouse 52
Populating a Data Warehouse 53
How Often to Extract the Data 53
Extraction Methods 54
How to Determine What Data Has Changed Since the Last Extraction 54
The Death of the Relational Data Warehouse Has Been Greatly Exaggerated 56
Summary 57
5. Data Lake 59
What Is a Data Lake? 60
Why Use a Data Lake? 60
Bottom-Up Approach 62
Best Practices for Data Lake Design 63
Multiple Data Lakes 69
Advantages 69
Disadvantages 72
Summary 72
6. Data Storage Solutions and Processes 75
Data Storage Solutions 76
Data Marts 76
Operational Data Stores 77
Data Hubs 79
Data Processes 81
Master Data Management 81
Data Virtualization and Data Federation 82
Data Catalogs 87
Data Marketplaces 87
Summary 89
7. Approaches to Design 91
Online Transaction Processing Versus Online Analytical Processing 92
Operational and Analytical Data 94
Symmetric Multiprocessing and Massively Parallel Processing 94
Lambda Architecture 96
Kappa Architecture 98
Polyglot Persistence and Polyglot Data Stores 100
Summary 101
8. Approaches to Data Modeling 103
Relational Modeling 103
Keys 103
Entity–Relationship Diagrams 104
Normalization Rules and Forms 104
Tracking Changes 106
Dimensional Modeling 107
Facts, Dimensions, and Keys 107
Tracking Changes 108
Denormalization 109
Common Data Model 111
Data Vault 111
The Kimball and Inmon Data Warehousing Methodologies 113
Inmon’s Top-Down Methodology 114
Kimball’s Bottom-Up Methodology 115
Choosing a Methodology 117
Hybrid Models 118
Methodology Myths 120
Summary 123
9. Approaches to Data Ingestion 125
ETL Versus ELT 125
Reverse ETL 127
Batch Processing Versus Real-Time Processing 129
Batch Processing Pros and Cons 130
Real-Time Processing Pros and Cons 130
Data Governance 131
Summary 132
Part III. Data Architectures
10. The Modern Data Warehouse 135
The MDW Architecture 135
Pros and Cons of the MDW Architecture 140
Combining the RDW and Data Lake 142
Data Lake 142
Relational Data Warehouse 142
Stepping Stones to the MDW 143
EDW Augmentation 143
Temporary Data Lake Plus EDW 145
All-in-One 146
Case Study: Wilson & Gunkerk’s Strategic Shift to an MDW 147
Challenge 147
Solution 147
Outcome 148
Summary 148
11. Data Fabric 151
The Data Fabric Architecture 152
Data Access Policies 154
Metadata Catalog 154
Master Data Management 155
Data Virtualization 155
Real-Time Processing 155
APIs 155
Services 156
Products 156
Why Transition from an MDW to a Data Fabric Architecture? 156
Potential Drawbacks 157
Summary 157
12. Data Lakehouse 159
Delta Lake Features 160
Performance Improvements 162
The Data Lakehouse Architecture 163
What If You Skip the Relational Data Warehouse? 165
Relational Serving Layer 167
Summary 167
13. Data Mesh Foundation 169
A Decentralized Data Architecture 170
Data Mesh Hype 171
Dehghani’s Four Principles of Data Mesh 172
Principle #1: Domain Ownership 172
Principle #2: Data as a Product 173
Principle #3: Self-Serve Data Infrastructure as a Platform 175
Principle #4: Federated Computational Governance 176
The “Pure” Data Mesh 177
Data Domains 178
Data Mesh Logical Architecture 179
Different Topologies 181
Data Mesh Versus Data Fabric 182
Use Cases 183
Summary 185
14. Should You Adopt Data Mesh? Myths, Concerns, and the Future 187
Myths 187
Myth: Using Data Mesh Is a Silver Bullet That Solves All Data Challenges Quickly 187
Myth: A Data Mesh Will Replace Your Data Lake and Data Warehouse 188
Myth: Data Warehouse Projects Are All Failing, and a Data Mesh Will Solve That Problem 188
Myth: Building a Data Mesh Means Decentralizing Absolutely Everything 188
Myth: You Can Use Data Virtualization to Create a Data Mesh 189
Concerns 190
Philosophical and Conceptual Matters 190
Combining Data in a Decentralized Environment 191
Other Issues of Decentralization 192
Complexity 193
Duplication 193
Feasibility 194
People 196
Domain-Level Barriers 197
Organizational Assessment: Should You Adopt a Data Mesh? 198
Recommendations for Implementing a Successful Data Mesh 199
The Future of Data Mesh 201
Zooming Out: Understanding Data Architectures and Their Applications 202
Summary 203
Part IV. People, Processes, and Technology
15. People and Processes 207
Team Organization: Roles and Responsibilities 208
Roles for MDW, Data Fabric, or Data Lakehouse 208
Roles for Data Mesh 210
Why Projects Fail: Pitfalls and Prevention 213
Pitfall: Allowing Executives to Think That BI Is “Easy” 213
Pitfall: Using the Wrong Technologies 213
Pitfall: Gathering Too Many Business Requirements 213
Pitfall: Gathering Too Few Business Requirements 214
Pitfall: Presenting Reports Without Validating Their Contents First 214
Pitfall: Hiring an Inexperienced Consulting Company 214
Pitfall: Hiring a Consulting Company That Outsources Development to Offshore Workers 215
Pitfall: Passing Project Ownership Off to Consultants 215
Pitfall: Neglecting the Need to Transfer Knowledge Back into the Organization 215
Pitfall: Slashing the Budget Midway Through the Project 215
Pitfall: Starting with an End Date and Working Backward 216
Pitfall: Structuring the Data Warehouse to Reflect the Source Data Rather Than the Business’s Needs 216
Pitfall: Presenting End Users with a Solution with Slow Response Times or Other Performance Issues 216
Pitfall: Overdesigning (or Underdesigning) Your Data Architecture 217
Pitfall: Poor Communication Between IT and the Business Domains 217
Tips for Success 217
Don’t Skimp on Your Investment 217
Involve Users, Show Them Results, and Get Them Excited 218
Add Value to New Reports and Dashboards 219
Ask End Users to Build a Prototype 219
Find a Project Champion/Sponsor 219
Make a Project Plan That Aims for 80% Efficiency 220
Summary 220
16. Technologies 223
Choosing a Platform 223
Open Source Solutions 223
On-Premises Solutions 226
Cloud Provider Solutions 227
Cloud Service Models 230
Major Cloud Providers 232
Multi-Cloud Solutions 232
Software Frameworks 235
Hadoop 235
Databricks 238
Snowflake 240
Summary 241
Index 243