
Architecting Data And Machine Learning Platforms: Enable Analytics And AI-Driven Innovation In The Cloud

Language: English
Publication details: SPD 2023
Description: 338
ISBN: 9789355428158
Holdings
Item type: Books
Current library: Cummins College of Engineering for Women Pune
Call number: 006.31 TRA
Status: Available (not for issue)
Barcode: CCEP-BK-67503

Table of Contents
Preface xi
1. Modernizing Your Data Platform: An Introductory Overview 1
The Data Lifecycle 2
The Journey to Wisdom 2
Water Pipes Analogy 3
Collect 4
Store 5
Process/Transform 7
Analyze/Visualize 8
Activate 9
Limitations of Traditional Approaches 10
Antipattern: Breaking Down Silos Through ETL 10
Antipattern: Centralization of Control 13
Antipattern: Data Marts and Hadoop 15
Creating a Unified Analytics Platform 16
Cloud Instead of On-Premises 17
Drawbacks of Data Marts and Data Lakes 18
Convergence of DWHs and Data Lakes 19
Hybrid Cloud 23
Reasons Why Hybrid Is Necessary 24
Challenges of Hybrid Cloud 25
Why Hybrid Can Work 26
Edge Computing 27
Applying AI 29
Machine Learning 29
Uses of ML 30
Why Cloud for AI? 31
Cloud Infrastructure 31
Democratization 32
Real Time 34
MLOps 35
Core Principles 36
Summary 38
2. Strategic Steps to Innovate with Data 41
Step 1: Strategy and Planning 42
Strategic Goals 43
Identify Stakeholders 45
Change Management 45
Step 2: Reduce Total Cost of Ownership by Adopting a Cloud Approach 47
Why Cloud Costs Less 47
How Much Are the Savings? 49
When Does Cloud Help? 50
Step 3: Break Down Silos 50
Unifying Data Access 51
Choosing Storage 52
Semantic Layer 53
Step 4: Make Decisions in Context Faster 55
Batch to Stream 55
Contextual Information 56
Cost Management 56
Step 5: Leapfrog with Packaged AI Solutions 57
Predictive Analytics 58
Understanding and Generating Unstructured Data 59
Personalization 60
Packaged Solutions 60
Step 6: Operationalize AI-Driven Workflows 61
Identifying the Right Balance of Automation and Assistance 61
Building a Data Culture 62
Populating Your Data Science Team 62
Step 7: Product Management for Data 64
Applying Product Management Principles to Data 64
1. Understand and Maintain a Map of Data Flows in the Enterprise 65
2. Identify Key Metrics 65
3. Agreed Criteria, Committed Roadmap, and Visionary Backlog 66
4. Build for the Customers You Have 67
5. Don’t Shift the Burden of Change Management 67
6. Interview Customers to Discover Their Data Needs 68
7. Whiteboard and Prototype Extensively 68
8. Build Only What Will Be Used Immediately 69
9. Standardize Common Entities and KPIs 69
10. Provide Self-Service Capabilities in Your Data Platform 70
Summary 70
3. Designing Your Data Team 73
Classifying Data Processing Organizations 73
Data Analysis–Driven Organization 76
The Vision 77
The Personas 78
The Technological Framework 80
Data Engineering–Driven Organization 82
The Vision 82
The Personas 84
The Technological Framework 86
Data Science–Driven Organization 89
The Vision 89
The Personas 91
The Technological Framework 92
Summary 94
4. A Migration Framework 95
Modernize Data Workflows 95
Holistic View 95
Modernize Workflows 96
Transform the Workflow Itself 98
A Four-Step Migration Framework 98
Prepare and Discover 99
Assess and Plan 100
Execute 103
Optimize 104
Estimating the Overall Cost of the Solution 105
Audit of the Existing Infrastructure 105
Request for Information/Proposal and Quotation 106
Proof of Concept/Minimum Viable Product 107
Setting Up Security and Data Governance 108
Framework 108
Artifacts 110
Governance over the Life of the Data 111
Schema, Pipeline, and Data Migration 113
Schema Migration 113
Pipeline Migration 113
Data Migration 116
Migration Stages 121
Summary 122
5. Architecting a Data Lake 125
Data Lake and the Cloud—A Perfect Marriage 125
Challenges with On-Premises Data Lakes 125
Benefits of Cloud Data Lakes 126
Design and Implementation 127
Batch and Stream 127
Data Catalog 129
Hadoop Landscape 130
Cloud Data Lake Reference Architecture 131
Integrating the Data Lake: The Real Superpower 136
APIs to Extend the Lake 136
The Evolution of Data Lake with Apache Iceberg, Apache Hudi, and Delta Lake 136
Interactive Analytics with Notebooks 138
Democratizing Data Processing and Reporting 140
Build Trust in the Data 141
Data Ingestion Is Still an IT Matter 143
ML in the Data Lake 145
Training on Raw Data 145
Predicting in the Data Lake 146
Summary 146
6. Innovating with an Enterprise Data Warehouse 149
A Modern Data Platform 149
Organizational Goals 150
Technological Challenges 151
Technology Trends and Tools 152
Hub-and-Spoke Architecture 154
Data Ingest 157
Business Intelligence 161
Transformations 164
Organizational Structure 169
DWH to Enable Data Scientists 171
Query Interface 171
Storage API 172
ML Without Moving Your Data 173
Summary 177
7. Converging to a Lakehouse 179
The Need for a Unique Architecture 179
User Personas 179
Antipattern: Disconnected Systems 180
Antipattern: Duplicated Data 180
Converged Architecture 182
Two Forms 183
Lakehouse on Cloud Storage 184
SQL-First Lakehouse 189
The Benefits of Convergence 193
Summary 195
8. Architectures for Streaming 197
The Value of Streaming 197
Industry Use Cases 198
Streaming Use Cases 199
Streaming Ingest 200
Streaming ETL 200
Streaming ELT 202
Streaming Insert 203
Streaming from Edge Devices (IoT) 204
Streaming Sinks 205
Real-Time Dashboards 205
Live Querying 206
Materialize Some Views 206
Stream Analytics 207
Time-Series Analytics 207
Clickstream Analytics 208
Anomaly Detection 210
Resilient Streaming 211
Continuous Intelligence Through ML 212
Training Model on Streaming Data 212
Streaming ML Inference 215
Automated Actions 215
Summary 216
9. Extending a Data Platform Using Hybrid and Edge 219
Why Multicloud? 219
A Single Cloud Is Simpler and Cost-Effective 220
Multicloud Is Inevitable 220
Multicloud Could Be Strategic 221
Multicloud Architectural Patterns 223
Single Pane of Glass 223
Write Once, Run Anywhere 224
Bursting from On Premises to Cloud 225
Pass-Through from On Premises to Cloud 226
Data Integration Through Streaming 227
Adopting Multicloud 229
Framework 229
Time Scale 231
Define a Target Multicloud Architecture 231
Why Edge Computing? 233
Bandwidth, Latency, and Patchy Connectivity 233
Use Cases 235
Benefits 236
Challenges 237
Edge Computing Architectural Patterns 237
Smart Devices 238
Smart Gateways 238
ML Activation 239
Adopting Edge Computing 241
The Initial Context 241
The Project 241
The Final Outcomes and Next Steps 244
Summary 245
10. AI Application Architecture 247
Is This an AI/ML Problem? 248
Subfields of AI 248
Generative AI 249
Problems Fit for ML 253
Buy, Adapt, or Build? 254
Data Considerations 254
When to Buy 255
What Can You Buy? 256
How Adapting Works 258
AI Architectures 260
Understanding Unstructured Data 261
Generating Unstructured Data 263
Predicting Outcomes 265
Forecasting Values 266
Anomaly Detection 268
Personalization 269
Automation 271
Responsible AI 272
AI Principles 273
ML Fairness 274
Explainability 275
Summary 276
11. Architecting an ML Platform 279
ML Activities 279
Developing ML Models 280
Labeling Environment 281
Development Environment 281
User Environment 282
Preparing Data 283
Training ML Models 284
Deploying ML Models 286
Deploying to an Endpoint 287
Evaluate Model 288
Hybrid and Multicloud 288
Training-Serving Skew 288
Automation 293
Automate Training and Deployment 293
Orchestration with Pipelines 294
Continuous Evaluation and Training 296
Choosing the ML Framework 298
Team Skills 298
Task Considerations 299
User-Centric 299
Summary 300
12. Data Platform Modernization: A Model Case 303
New Technology for a New Era 303
The Need for Change 304
It Is Not Only a Matter of Technology 305
The Beginning of the Journey 307
The Current Environment 307
The Target Environment 309
The PoC Use Case 311
The RFP Responses Proposed by Cloud Vendors 312
The Target Environment 312
The Approach on Migration 316
The RFP Evaluation Process 323
The Scope of the PoC 323
The Execution of the PoC 324
The Final Decision 325
Peroration 326
Summary 326
Index 329
