
Deciphering Data Architectures: Choosing Between a Modern Data Warehouse, Data Fabric, Data Lakehouse, and Data Mesh

Language: English
Publication details: SPD, 2024
Description: 250
ISBN: 9789355425928
Holdings
Item type: Books
Library: Cummins College of Engineering for Women Pune
Call number: 005.74 SER
Status: Available (not for issue)
Barcode: CCEP-BK-67508

Table of Contents
Foreword xvii
Preface xix
Part I. Foundation
1. Big Data 3
What Is Big Data, and How Can It Help You? 4
Data Maturity 7
Stage 1: Reactive 8
Stage 2: Informative 8
Stage 3: Predictive 9
Stage 4: Transformative 9
Self-Service Business Intelligence 9
Summary 10
2. Types of Data Architectures 13
Evolution of Data Architectures 14
Relational Data Warehouse 16
Data Lake 18
Modern Data Warehouse 20
Data Fabric 21
Data Lakehouse 21
Data Mesh 22
Summary 23
3. The Architecture Design Session 25
What Is an ADS? 25
Why Hold an ADS? 26
Before the ADS 27
Preparing 27
Inviting Participants 29
Conducting the ADS 31
Introductions 31
Discovery 31
Whiteboarding 36
After the ADS 37
Tips for Conducting an ADS 38
Summary 40
Part II. Common Data Architecture Concepts
4. The Relational Data Warehouse 43
What Is a Relational Data Warehouse? 43
What a Data Warehouse Is Not 46
The Top-Down Approach 47
Why Use a Relational Data Warehouse? 49
Drawbacks to Using a Relational Data Warehouse 52
Populating a Data Warehouse 53
How Often to Extract the Data 53
Extraction Methods 54
How to Determine What Data Has Changed Since the Last Extraction 54
The Death of the Relational Data Warehouse Has Been Greatly Exaggerated 56
Summary 57
5. Data Lake 59
What Is a Data Lake? 60
Why Use a Data Lake? 60
Bottom-Up Approach 62
Best Practices for Data Lake Design 63
Multiple Data Lakes 69
Advantages 69
Disadvantages 72
Summary 72
6. Data Storage Solutions and Processes 75
Data Storage Solutions 76
Data Marts 76
Operational Data Stores 77
Data Hubs 79
Data Processes 81
Master Data Management 81
Data Virtualization and Data Federation 82
Data Catalogs 87
Data Marketplaces 87
Summary 89
7. Approaches to Design 91
Online Transaction Processing Versus Online Analytical Processing 92
Operational and Analytical Data 94
Symmetric Multiprocessing and Massively Parallel Processing 94
Lambda Architecture 96
Kappa Architecture 98
Polyglot Persistence and Polyglot Data Stores 100
Summary 101
8. Approaches to Data Modeling 103
Relational Modeling 103
Keys 103
Entity–Relationship Diagrams 104
Normalization Rules and Forms 104
Tracking Changes 106
Dimensional Modeling 107
Facts, Dimensions, and Keys 107
Tracking Changes 108
Denormalization 109
Common Data Model 111
Data Vault 111
The Kimball and Inmon Data Warehousing Methodologies 113
Inmon’s Top-Down Methodology 114
Kimball’s Bottom-Up Methodology 115
Choosing a Methodology 117
Hybrid Models 118
Methodology Myths 120
Summary 123
9. Approaches to Data Ingestion 125
ETL Versus ELT 125
Reverse ETL 127
Batch Processing Versus Real-Time Processing 129
Batch Processing Pros and Cons 130
Real-Time Processing Pros and Cons 130
Data Governance 131
Summary 132
Part III. Data Architectures
10. The Modern Data Warehouse 135
The MDW Architecture 135
Pros and Cons of the MDW Architecture 140
Combining the RDW and Data Lake 142
Data Lake 142
Relational Data Warehouse 142
Stepping Stones to the MDW 143
EDW Augmentation 143
Temporary Data Lake Plus EDW 145
All-in-One 146
Case Study: Wilson & Gunkerk’s Strategic Shift to an MDW 147
Challenge 147
Solution 147
Outcome 148
Summary 148
11. Data Fabric 151
The Data Fabric Architecture 152
Data Access Policies 154
Metadata Catalog 154
Master Data Management 155
Data Virtualization 155
Real-Time Processing 155
APIs 155
Services 156
Products 156
Why Transition from an MDW to a Data Fabric Architecture? 156
Potential Drawbacks 157
Summary 157
12. Data Lakehouse 159
Delta Lake Features 160
Performance Improvements 162
The Data Lakehouse Architecture 163
What If You Skip the Relational Data Warehouse? 165
Relational Serving Layer 167
Summary 167
13. Data Mesh Foundation 169
A Decentralized Data Architecture 170
Data Mesh Hype 171
Dehghani’s Four Principles of Data Mesh 172
Principle #1: Domain Ownership 172
Principle #2: Data as a Product 173
Principle #3: Self-Serve Data Infrastructure as a Platform 175
Principle #4: Federated Computational Governance 176
The “Pure” Data Mesh 177
Data Domains 178
Data Mesh Logical Architecture 179
Different Topologies 181
Data Mesh Versus Data Fabric 182
Use Cases 183
Summary 185
14. Should You Adopt Data Mesh? Myths, Concerns, and the Future 187
Myths 187
Myth: Using Data Mesh Is a Silver Bullet That Solves All Data Challenges Quickly 187
Myth: A Data Mesh Will Replace Your Data Lake and Data Warehouse 188
Myth: Data Warehouse Projects Are All Failing, and a Data Mesh Will Solve That Problem 188
Myth: Building a Data Mesh Means Decentralizing Absolutely Everything 188
Myth: You Can Use Data Virtualization to Create a Data Mesh 189
Concerns 190
Philosophical and Conceptual Matters 190
Combining Data in a Decentralized Environment 191
Other Issues of Decentralization 192
Complexity 193
Duplication 193
Feasibility 194
People 196
Domain-Level Barriers 197
Organizational Assessment: Should You Adopt a Data Mesh? 198
Recommendations for Implementing a Successful Data Mesh 199
The Future of Data Mesh 201
Zooming Out: Understanding Data Architectures and Their Applications 202
Summary 203
Part IV. People, Processes, and Technology
15. People and Processes 207
Team Organization: Roles and Responsibilities 208
Roles for MDW, Data Fabric, or Data Lakehouse 208
Roles for Data Mesh 210
Why Projects Fail: Pitfalls and Prevention 213
Pitfall: Allowing Executives to Think That BI Is “Easy” 213
Pitfall: Using the Wrong Technologies 213
Pitfall: Gathering Too Many Business Requirements 213
Pitfall: Gathering Too Few Business Requirements 214
Pitfall: Presenting Reports Without Validating Their Contents First 214
Pitfall: Hiring an Inexperienced Consulting Company 214
Pitfall: Hiring a Consulting Company That Outsources Development to Offshore Workers 215
Pitfall: Passing Project Ownership Off to Consultants 215
Pitfall: Neglecting the Need to Transfer Knowledge Back into the Organization 215
Pitfall: Slashing the Budget Midway Through the Project 215
Pitfall: Starting with an End Date and Working Backward 216
Pitfall: Structuring the Data Warehouse to Reflect the Source Data Rather Than the Business’s Needs 216
Pitfall: Presenting End Users with a Solution with Slow Response Times or Other Performance Issues 216
Pitfall: Overdesigning (or Underdesigning) Your Data Architecture 217
Pitfall: Poor Communication Between IT and the Business Domains 217
Tips for Success 217
Don’t Skimp on Your Investment 217
Involve Users, Show Them Results, and Get Them Excited 218
Add Value to New Reports and Dashboards 219
Ask End Users to Build a Prototype 219
Find a Project Champion/Sponsor 219
Make a Project Plan That Aims for 80% Efficiency 220
Summary 220
16. Technologies 223
Choosing a Platform 223
Open Source Solutions 223
On-Premises Solutions 226
Cloud Provider Solutions 227
Cloud Service Models 230
Major Cloud Providers 232
Multi-Cloud Solutions 232
Software Frameworks 235
Hadoop 235
Databricks 238
Snowflake 240
Summary 241
Index 243
