<?xml version="1.0" encoding="UTF-8"?>
<mods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/mods/v3" version="3.1" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
  <titleInfo>
    <title>Architecting Data and Machine Learning Platforms</title>
    <subTitle>Enable Analytics and AI-Driven Innovation in the Cloud</subTitle>
  </titleInfo>
  <name type="personal">
    <namePart>Tranquillin M.</namePart>
    <role>
      <roleTerm authority="marcrelator" type="text">creator</roleTerm>
    </role>
  </name>
  <name type="personal">
    <namePart>Lakshmanan V.</namePart>
  </name>
  <name type="personal">
    <namePart>Tekiner F.</namePart>
  </name>
  <typeOfResource/>
  <originInfo>
    <publisher>SPD</publisher>
    <dateIssued>2023</dateIssued>
    <issuance/>
  </originInfo>
  <language>
    <languageTerm authority="iso639-2b" type="code">eng</languageTerm>
  </language>
  <physicalDescription>
    <extent>338</extent>
  </physicalDescription>
  <abstract>Table of Contents
Preface xi
1. Modernizing Your Data Platform: An Introductory Overview 1
The Data Lifecycle 2
The Journey to Wisdom 2
Water Pipes Analogy 3
Collect 4
Store 5
Process/Transform 7
Analyze/Visualize 8
Activate 9
Limitations of Traditional Approaches 10
Antipattern: Breaking Down Silos Through ETL 10
Antipattern: Centralization of Control 13
Antipattern: Data Marts and Hadoop 15
Creating a Unified Analytics Platform 16
Cloud Instead of On-Premises 17
Drawbacks of Data Marts and Data Lakes 18
Convergence of DWHs and Data Lakes 19
Hybrid Cloud 23
Reasons Why Hybrid Is Necessary 24
Challenges of Hybrid Cloud 25
Why Hybrid Can Work 26
Edge Computing 27
Applying AI 29
Machine Learning 29
Uses of ML 30
Why Cloud for AI? 31
Cloud Infrastructure 31
Democratization 32
Real Time 34
MLOps 35
Core Principles 36
Summary 38
2. Strategic Steps to Innovate with Data 41
Step 1: Strategy and Planning 42
Strategic Goals 43
Identify Stakeholders 45
Change Management 45
Step 2: Reduce Total Cost of Ownership by Adopting a Cloud Approach 47
Why Cloud Costs Less 47
How Much Are the Savings? 49
When Does Cloud Help? 50
Step 3: Break Down Silos 50
Unifying Data Access 51
Choosing Storage 52
Semantic Layer 53
Step 4: Make Decisions in Context Faster 55
Batch to Stream 55
Contextual Information 56
Cost Management 56
Step 5: Leapfrog with Packaged AI Solutions 57
Predictive Analytics 58
Understanding and Generating Unstructured Data 59
Personalization 60
Packaged Solutions 60
Step 6: Operationalize AI-Driven Workflows 61
Identifying the Right Balance of Automation and Assistance 61
Building a Data Culture 62
Populating Your Data Science Team 62
Step 7: Product Management for Data 64
Applying Product Management Principles to Data 64
1. Understand and Maintain a Map of Data Flows in the Enterprise 65
2. Identify Key Metrics 65
3. Agreed Criteria, Committed Roadmap, and Visionary Backlog 66
4. Build for the Customers You Have 67
5. Don’t Shift the Burden of Change Management 67
6. Interview Customers to Discover Their Data Needs 68
7. Whiteboard and Prototype Extensively 68
8. Build Only What Will Be Used Immediately 69
9. Standardize Common Entities and KPIs 69
10. Provide Self-Service Capabilities in Your Data Platform 70
Summary 70
3. Designing Your Data Team 73
Classifying Data Processing Organizations 73
Data Analysis–Driven Organization 76
The Vision 77
The Personas 78
The Technological Framework 80
Data Engineering–Driven Organization 82
The Vision 82
The Personas 84
The Technological Framework 86
Data Science–Driven Organization 89
The Vision 89
The Personas 91
The Technological Framework 92
Summary 94
4. A Migration Framework 95
Modernize Data Workflows 95
Holistic View 95
Modernize Workflows 96
Transform the Workflow Itself 98
A Four-Step Migration Framework 98
Prepare and Discover 99
Assess and Plan 100
Execute 103
Optimize 104
Estimating the Overall Cost of the Solution 105
Audit of the Existing Infrastructure 105
Request for Information/Proposal and Quotation 106
Proof of Concept/Minimum Viable Product 107
Setting Up Security and Data Governance 108
Framework 108
Artifacts 110
Governance over the Life of the Data 111
Schema, Pipeline, and Data Migration 113
Schema Migration 113
Pipeline Migration 113
Data Migration 116
Migration Stages 121
Summary 122
5. Architecting a Data Lake 125
Data Lake and the Cloud—A Perfect Marriage 125
Challenges with On-Premises Data Lakes 125
Benefits of Cloud Data Lakes 126
Design and Implementation 127
Batch and Stream 127
Data Catalog 129
Hadoop Landscape 130
Cloud Data Lake Reference Architecture 131
Integrating the Data Lake: The Real Superpower 136
APIs to Extend the Lake 136
The Evolution of Data Lake with Apache Iceberg, Apache Hudi, and Delta Lake 136
Interactive Analytics with Notebooks 138
Democratizing Data Processing and Reporting 140
Build Trust in the Data 141
Data Ingestion Is Still an IT Matter 143
ML in the Data Lake 145
Training on Raw Data 145
Predicting in the Data Lake 146
Summary 146
6. Innovating with an Enterprise Data Warehouse 149
A Modern Data Platform 149
Organizational Goals 150
Technological Challenges 151
Technology Trends and Tools 152
Hub-and-Spoke Architecture 154
Data Ingest 157
Business Intelligence 161
Transformations 164
Organizational Structure 169
DWH to Enable Data Scientists 171
Query Interface 171
Storage API 172
ML Without Moving Your Data 173
Summary 177
7. Converging to a Lakehouse 179
The Need for a Unique Architecture 179
User Personas 179
Antipattern: Disconnected Systems 180
Antipattern: Duplicated Data 180
Converged Architecture 182
Two Forms 183
Lakehouse on Cloud Storage 184
SQL-First Lakehouse 189
The Benefits of Convergence 193
Summary 195
8. Architectures for Streaming 197
The Value of Streaming 197
Industry Use Cases 198
Streaming Use Cases 199
Streaming Ingest 200
Streaming ETL 200
Streaming ELT 202
Streaming Insert 203
Streaming from Edge Devices (IoT) 204
Streaming Sinks 205
Real-Time Dashboards 205
Live Querying 206
Materialize Some Views 206
Stream Analytics 207
Time-Series Analytics 207
Clickstream Analytics 208
Anomaly Detection 210
Resilient Streaming 211
Continuous Intelligence Through ML 212
Training Model on Streaming Data 212
Streaming ML Inference 215
Automated Actions 215
Summary 216
9. Extending a Data Platform Using Hybrid and Edge 219
Why Multicloud? 219
A Single Cloud Is Simpler and Cost-Effective 220
Multicloud Is Inevitable 220
Multicloud Could Be Strategic 221
Multicloud Architectural Patterns 223
Single Pane of Glass 223
Write Once, Run Anywhere 224
Bursting from On Premises to Cloud 225
Pass-Through from On Premises to Cloud 226
Data Integration Through Streaming 227
Adopting Multicloud 229
Framework 229
Time Scale 231
Define a Target Multicloud Architecture 231
Why Edge Computing? 233
Bandwidth, Latency, and Patchy Connectivity 233
Use Cases 235
Benefits 236
Challenges 237
Edge Computing Architectural Patterns 237
Smart Devices 238
Smart Gateways 238
ML Activation 239
Adopting Edge Computing 241
The Initial Context 241
The Project 241
The Final Outcomes and Next Steps 244
Summary 245
10. AI Application Architecture 247
Is This an AI/ML Problem? 248
Subfields of AI 248
Generative AI 249
Problems Fit for ML 253
Buy, Adapt, or Build? 254
Data Considerations 254
When to Buy 255
What Can You Buy? 256
How Adapting Works 258
AI Architectures 260
Understanding Unstructured Data 261
Generating Unstructured Data 263
Predicting Outcomes 265
Forecasting Values 266
Anomaly Detection 268
Personalization 269
Automation 271
Responsible AI 272
AI Principles 273
ML Fairness 274
Explainability 275
Summary 276
11. Architecting an ML Platform 279
ML Activities 279
Developing ML Models 280
Labeling Environment 281
Development Environment 281
User Environment 282
Preparing Data 283
Training ML Models 284
Deploying ML Models 286
Deploying to an Endpoint 287
Evaluate Model 288
Hybrid and Multicloud 288
Training-Serving Skew 288
Automation 293
Automate Training and Deployment 293
Orchestration with Pipelines 294
Continuous Evaluation and Training 296
Choosing the ML Framework 298
Team Skills 298
Task Considerations 299
User-Centric 299
Summary 300
12. Data Platform Modernization: A Model Case 303
New Technology for a New Era 303
The Need for Change 304
It Is Not Only a Matter of Technology 305
The Beginning of the Journey 307
The Current Environment 307
The Target Environment 309
The PoC Use Case 311
The RFP Responses Proposed by Cloud Vendors 312
The Target Environment 312
The Approach on Migration 316
The RFP Evaluation Process 323
The Scope of the PoC 323
The Execution of the PoC 324
The Final Decision 325
Peroration 326
Summary 326
Index 329</abstract>
  <identifier type="isbn">9789355428158</identifier>
  <recordInfo>
    <recordCreationDate encoding="marc">241018</recordCreationDate>
    <recordChangeDate encoding="iso8601">20241018155430.0</recordChangeDate>
  </recordInfo>
</mods>
