Training Data For Machine Learning (Record no. 359834)

MARC details
000 -LEADER
fixed length control field	11530 a2200157 4500
005 - DATE AND TIME OF LATEST TRANSACTION
control field	20241018125254.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	241018b |||||||| |||| 00| 0 eng d
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	9789355421920
041 ## - LANGUAGE CODE
Language code of text/sound track or separate title	English
100 ## - MAIN ENTRY--PERSONAL NAME
Author	Sarkis A.
245 ## - TITLE STATEMENT
Title	Training Data For Machine Learning
Remainder of title	Human Supervision From Annotation To Data Science
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Name of publisher, distributor, etc.	SPD
Date of publication, distribution, etc.	2023
300 ## - PHYSICAL DESCRIPTION
Extent	306
520 ## - SUMMARY, ETC.
Summary, etc.	Table of Contents
Preface
1. Training Data Introduction
  Training Data Intents
  What Can You Do With Training Data?
  What Is Training Data Most Concerned With?
  Training Data Opportunities
  Business Transformation
  Training Data Efficiency
  Tooling Proficiency
  Process Improvement Opportunities
  Why Training Data Matters
  ML Applications Are Becoming Mainstream
  The Foundation of Successful AI
  Training Data Is Here to Stay
  Training Data Controls the ML Program
  New Types of Users
  Training Data in the Wild
  What Makes Training Data Difficult?
  The Art of Supervising Machines
  A New Thing for Data Science
  ML Program Ecosystem
  Data-Centric Machine Learning
  Failures
  History of Development Affects Training Data Too
  What Training Data Is Not
  Generative AI
  Human Alignment Is Human Supervision
  Summary
2. Getting Up and Running
  Introduction
  Getting Up and Running
  Installation
  Tasks Setup
  Annotator Setup
  Data Setup
  Workflow Setup
  Data Catalog Setup
  Initial Usage
  Optimization
  Tools Overview
  Training Data for Machine Learning
  Growing Selection of Tools
  People, Process, and Data
  Embedded Supervision
  Human Computer Supervision
  Separation of End Concerns
  Standards
  Many Personas
  A Paradigm to Deliver Machine Learning Software
  Trade-Offs
  Costs
  Installed Versus Software as a Service
  Development System
  Scale
  Installation Options
  Annotation Interfaces
  Modeling Integration
  Multi-User versus Single-User Systems
  Integrations
  Scope
  Hidden Assumptions
  Security
  Open Source and Closed Source
  History
  Open Source Standards
  Realizing the Need for Dedicated Tooling
  Summary
3. Schema
  Schema Deep Dive Introduction
  Labels and Attributes: What Is It?
  What Do We Care About?
  Introduction to Labels
  Attributes Introduction
  Attribute Complexity Exceeds Spatial Complexity
  Technical Overview
  Spatial Representation: Where Is It?
  Using Spatial Types to Prevent Social Bias
  Trade-Offs with Types
  Computer Vision Spatial Type Examples
  Relationships, Sequences, Time Series: When Is It?
  Sequences and Relationships
  When
  Guides and Instructions
  Judgment Calls
  Relation of Machine Learning Tasks to Training Data
  Semantic Segmentation
  Image Classification (Tags)
  Object Detection
  Pose Estimation
  Relationship of Tasks to Training Data Types
  General Concepts
  Instance Concept Refresher
  Upgrading Data Over Time
  The Boundary Between Modeling and Training Data
  Raw Data Concepts
  Summary
4. Data Engineering
  Introduction
  Who Wants the Data?
  A Game of Telephone
  Planning a Great System
  Naive and Training Data–Centric Approaches
  Raw Data Storage
  By Reference or by Value
  Off-the-Shelf Dedicated Training Data Tooling on Your Own Hardware
  Data Storage: Where Does the Data Rest?
  External Reference Connection
  Raw Media (BLOB)–Type Specific
  Formatting and Mapping
  User-Defined Types (Compound Files)
  Defining DataMaps
  Ingest Wizards
  Organizing Data and Useful Storage
  Remote Storage
  Versioning
  Data Access
  Disambiguating Storage, Ingestion, Export, and Access
  File-Based Exports
  Streaming Data
  Queries Introduction
  Integrations with the Ecosystem
  Security
  Access Control
  Identity and Authorization
  Example of Setting Permissions
  Signed URLs
  Personally Identifiable Information
  Pre-Labeling
  Updating Data
  Summary
5. Workflow
  Introduction
  Glue Between Tech and People
  Why Are Human Tasks Needed?
  Partnering with Non-Software Users in New Ways
  Getting Started with Human Tasks
  Basics
  Schemas’ Staying Power
  User Roles
  Training
  Gold Standard Training
  Task Assignment Concepts
  Do You Need to Customize the Interface?
  How Long Will the Average Annotator Be Using It?
  Tasks and Project Structure
  Quality Assurance
  Annotator Trust
  Annotators Are Partners
  Common Causes of Training Data Errors
  Task Review Loops
  Analytics
  Annotation Metrics Examples
  Data Exploration
  Models
  Using the Model to Debug the Humans
  Distinctions Between a Dataset, Model, and Model Run
  Getting Data to Models
  Dataflow
  Overview of Streaming
  Data Organization
  Pipelines and Processes
  Direct Annotation
  Business Process Integration
  Attributes
  Depth of Labeling
  Supervising Existing Data
  Interactive Automations
  Example: Semantic Segmentation Auto Bordering
  Video
  Summary
6. Theories, Concepts, and Maintenance
  Introduction
  Theories
  A System Is Only as Useful as Its Schema
  Who Supervises the Data Matters
  Intentionally Chosen Data Is Best
  Working with Historical Data
  Training Data Is Like Code
  Surface Assumptions Around Usage of Your Training Data
  Human Supervision Is Different from Classic Datasets
  General Concepts
  Data Relevancy
  Need for Both Qualitative and Quantitative Evaluations
  Iterations
  Prioritization: What to Label
  Transfer Learning’s Relation to Datasets (Fine-Tuning)
  Per-Sample Judgment Calls
  Ethical and Privacy Considerations
  Bias
  Bias Is Hard to Escape
  Metadata
  Preventing Lost Metadata
  Train/Val/Test Is the Cherry on Top
  Sample Creation
  Simple Schema for a Strawberry Picking System
  Geometric Representations
  Binary Classification
  Let’s Manually Create Our First Set
  Upgraded Classification
  Where Is the Traffic Light?
  Maintenance
  Actions
  Net Lift
  Levels of System Maturity of Training Data Operations
  Applied Versus Research Sets
  Training Data Management
  Quality
  Completed Tasks
  Freshness
  Maintaining Set Metadata
  Task Management
  Summary
7. AI Transformation and Use Cases
  Introduction
  AI Transformation
  Seeing Your Day-to-Day Work as Annotation
  The Creative Revolution of Data-centric AI
  You Can Create New Data
  You Can Change What Data You Collect
  You Can Change the Meaning of the Data
  You Can Create!
  Think Step Function Improvement for Major Projects
  Build Your AI Data to Secure Your AI Present and Future
  Appoint a Leader: The Director of AI Data
  New Expectations People Have for the Future of AI
  Sometimes Proposals and Corrections, Sometimes Replacement
  Upstream Producers and Downstream Consumers
  Spectrum of Training Data Team Engagement
  Dedicated Producers and Other Teams
  Organizing Producers from Other Teams
  Use Case Discovery
  Rubric for Good Use Cases
  Evaluating a Use Case Against the Rubric
  Conceptual Effects of Use Cases
  The New “Crowd Sourcing”: Your Own Experts
  Key Levers on Training Data ROI
  What the Annotated Data Represents
  Trade-Offs of Controlling Your Own Training Data
  The Need for Hardware
  Common Project Mistakes
  Modern Training Data Tools
  Think Learning Curve, Not Perfection
  New Training and Knowledge Are Required
  How Companies Produce and Consume Data
  Trap to Avoid: Premature Optimization in Training Data
  No Silver Bullets
  Culture of Training Data
  New Engineering Principles
  Summary
8. Automation
  Introduction
  Getting Started
  Motivation: When to Use These Methods?
  Check What Part of the Schema a Method Is Designed to Work On
  What Do People Actually Use?
  What Kind of Results Can I Expect?
  Common Confusions
  User Interface Optimizations
  Risks
  Trade-Offs
  Nature of Automations
  Setup Costs
  How to Benchmark Well
  How to Scope the Automation Relative to the Problem
  Correction Time
  Subject Matter Experts
  Consider How the Automations Stack
  Pre-Labeling
  Standard Pre-Labeling
  Pre-Labeling a Portion of the Data Only
  Interactive Annotation Automation
  Creating Your Own
  Technical Setup Notes
  What Is a Watcher? (Observer Pattern)
  How to Use a Watcher
  Interactive Capturing of a Region of Interest
  Interactive Drawing Box to Polygon Using GrabCut
  Full Image Model Prediction Example
  Example: Person Detection for Different Attribute
  Quality Assurance Automation
  Using the Model to Debug the Humans
  Automated Checklist Example
  Domain-Specific Reasonableness Checks
  Data Discovery: What to Label
  Human Exploration
  Raw Data Exploration
  Metadata Exploration
  Adding Pre-Labeling-Based Metadata
  Augmentation
  Better Models Are Better than Better Augmentation
  To Augment or Not to Augment
  Simulation and Synthetic Data
  Simulations Still Need Human Review
  Media Specific
  What Methods Work with Which Media?
  Considerations
  Media-Specific Research
  Domain Specific
  Geometry-Based Labeling
  Heuristics-Based Labeling
  Summary
9. Case Studies and Stories
  Introduction
  Industry
  A Security Startup Adopts Training Data Tools
  Quality Assurance at a Large-Scale Self-Driving Project
  Big-Tech Challenges
  Insurance Tech Startup Lessons
  Stories
  An Academic Approach to Training Data
  Kaggle TSA Competition
  Summary
Index
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Koha item type	Books

Holdings
Withdrawn status	Lost status	Source of classification or shelving scheme	Damaged status	Not for loan	Home library	Current library	Date acquired	Vendor	Net Price	Total Checkouts	Full call number	Barcode	Date last seen	Actual Price	Bill Date	Koha item type
		Dewey Decimal Classification			Cummins College of Engineering for Women Pune	Cummins College of Engineering for Women Pune	11/10/2024	115	1237.50		006.31 SAR	CCEP-BK-67498	11/10/2024	1650.00	11/10/2024	Books