Training Data For Machine Learning (Record no. 359834)
| 000 -LEADER | |
|---|---|
| fixed length control field | 11530 a2200157 4500 |
| 005 - DATE AND TIME OF LATEST TRANSACTION | |
| control field | 20241018125254.0 |
| 008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION | |
| fixed length control field | 241018b |||||||| |||| 00| 0 eng d |
| 020 ## - INTERNATIONAL STANDARD BOOK NUMBER | |
| International Standard Book Number | 9789355421920 |
| 041 ## - LANGUAGE CODE | |
| Language code of text/sound track or separate title | English |
| 100 ## - MAIN ENTRY--PERSONAL NAME | |
| Author | Sarkis A. |
| 245 ## - TITLE STATEMENT | |
| Title | Training Data For Machine Learning |
| Remainder of title | Human Supervision From Annotation To Data Science |
| 260 ## - PUBLICATION, DISTRIBUTION, ETC. | |
| Name of publisher, distributor, etc. | SPD |
| Date of publication, distribution, etc. | 2023 |
| 300 ## - PHYSICAL DESCRIPTION | |
| Extent | 306 pages |
| 520 ## - SUMMARY, ETC. | |
| Summary, etc. | Table of Contents<br/>Preface<br/>1. Training Data Introduction<br/>Training Data Intents<br/>What Can You Do With Training Data?<br/>What Is Training Data Most Concerned With?<br/>Training Data Opportunities<br/>Business Transformation<br/>Training Data Efficiency<br/>Tooling Proficiency<br/>Process Improvement Opportunities<br/>Why Training Data Matters<br/>ML Applications Are Becoming Mainstream<br/>The Foundation of Successful AI<br/>Training Data Is Here to Stay<br/>Training Data Controls the ML Program<br/>New Types of Users<br/>Training Data in the Wild<br/>What Makes Training Data Difficult?<br/>The Art of Supervising Machines<br/>A New Thing for Data Science<br/>ML Program Ecosystem<br/>Data-Centric Machine Learning<br/>Failures<br/>History of Development Affects Training Data Too<br/>What Training Data Is Not<br/>Generative AI<br/>Human Alignment Is Human Supervision<br/>Summary<br/>2. Getting Up and Running<br/>Introduction<br/>Getting Up and Running<br/>Installation<br/>Tasks Setup<br/>Annotator Setup<br/>Data Setup<br/>Workflow Setup<br/>Data Catalog Setup<br/>Initial Usage<br/>Optimization<br/>Tools Overview<br/>Training Data for Machine Learning<br/>Growing Selection of Tools<br/>People, Process, and Data<br/>Embedded Supervision<br/>Human Computer Supervision<br/>Separation of End Concerns<br/>Standards<br/>Many Personas<br/>A Paradigm to Deliver Machine Learning Software<br/>Trade-Offs<br/>Costs<br/>Installed Versus Software as a Service<br/>Development System<br/>Scale<br/>Installation Options<br/>Annotation Interfaces<br/>Modeling Integration<br/>Multi-User versus Single-User Systems<br/>Integrations<br/>Scope<br/>Hidden Assumptions<br/>Security<br/>Open Source and Closed Source<br/>History<br/>Open Source Standards<br/>Realizing the Need for Dedicated Tooling<br/>Summary<br/>3. Schema<br/>Schema Deep Dive Introduction<br/>Labels and Attributes—What Is It?<br/>What Do We Care About?<br/>Introduction to Labels<br/>Attributes Introduction<br/>Attribute Complexity Exceeds Spatial Complexity<br/>Technical Overview<br/>Spatial Representation—Where Is It?<br/>Using Spatial Types to Prevent Social Bias<br/>Trade-Offs with Types<br/>Computer Vision Spatial Type Examples<br/>Relationships, Sequences, Time Series: When Is It?<br/>Sequences and Relationships<br/>When<br/>Guides and Instructions<br/>Judgment Calls<br/>Relation of Machine Learning Tasks to Training Data<br/>Semantic Segmentation<br/>Image Classification (Tags)<br/>Object Detection<br/>Pose Estimation<br/>Relationship of Tasks to Training Data Types<br/>General Concepts<br/>Instance Concept Refresher<br/>Upgrading Data Over Time<br/>The Boundary Between Modeling and Training Data<br/>Raw Data Concepts<br/>Summary<br/>4. Data Engineering<br/>Introduction<br/>Who Wants the Data?<br/>A Game of Telephone<br/>Planning a Great System<br/>Naive and Training Data–Centric Approaches<br/>Raw Data Storage<br/>By Reference or by Value<br/>Off-the-Shelf Dedicated Training Data Tooling on Your Own Hardware<br/>Data Storage: Where Does the Data Rest?<br/>External Reference Connection<br/>Raw Media (BLOB)–Type Specific<br/>Formatting and Mapping<br/>User-Defined Types (Compound Files)<br/>Defining DataMaps<br/>Ingest Wizards<br/>Organizing Data and Useful Storage<br/>Remote Storage<br/>Versioning<br/>Data Access<br/>Disambiguating Storage, Ingestion, Export, and Access<br/>File-Based Exports<br/>Streaming Data<br/>Queries Introduction<br/>Integrations with the Ecosystem<br/>Security<br/>Access Control<br/>Identity and Authorization<br/>Example of Setting Permissions<br/>Signed URLs<br/>Personally Identifiable Information<br/>Pre-Labeling<br/>Updating Data<br/>Summary<br/>5. Workflow<br/>Introduction<br/>Glue Between Tech and People<br/>Why Are Human Tasks Needed?<br/>Partnering with Non-Software Users in New Ways<br/>Getting Started with Human Tasks<br/>Basics<br/>Schemas’ Staying Power<br/>User Roles<br/>Training<br/>Gold Standard Training<br/>Task Assignment Concepts<br/>Do You Need to Customize the Interface?<br/>How Long Will the Average Annotator Be Using It?<br/>Tasks and Project Structure<br/>Quality Assurance<br/>Annotator Trust<br/>Annotators Are Partners<br/>Common Causes of Training Data Errors<br/>Task Review Loops<br/>Analytics<br/>Annotation Metrics Examples<br/>Data Exploration<br/>Models<br/>Using the Model to Debug the Humans<br/>Distinctions Between a Dataset, Model, and Model Run<br/>Getting Data to Models<br/>Dataflow<br/>Overview of Streaming<br/>Data Organization<br/>Pipelines and Processes<br/>Direct Annotation<br/>Business Process Integration<br/>Attributes<br/>Depth of Labeling<br/>Supervising Existing Data<br/>Interactive Automations<br/>Example: Semantic Segmentation Auto Bordering<br/>Video<br/>Summary<br/>6. Theories, Concepts, and Maintenance<br/>Introduction<br/>Theories<br/>A System Is Only as Useful as Its Schema<br/>Who Supervises the Data Matters<br/>Intentionally Chosen Data Is Best<br/>Working with Historical Data<br/>Training Data Is Like Code<br/>Surface Assumptions Around Usage of Your Training Data<br/>Human Supervision Is Different from Classic Datasets<br/>General Concepts<br/>Data Relevancy<br/>Need for Both Qualitative and Quantitative Evaluations<br/>Iterations<br/>Prioritization: What to Label<br/>Transfer Learning’s Relation to Datasets (Fine-Tuning)<br/>Per-Sample Judgment Calls<br/>Ethical and Privacy Considerations<br/>Bias<br/>Bias Is Hard to Escape<br/>Metadata<br/>Preventing Lost Metadata<br/>Train/Val/Test Is the Cherry on Top<br/>Sample Creation<br/>Simple Schema for a Strawberry Picking System<br/>Geometric Representations<br/>Binary Classification<br/>Let’s Manually Create Our First Set<br/>Upgraded Classification<br/>Where Is the Traffic Light?<br/>Maintenance<br/>Actions<br/>Net Lift<br/>Levels of System Maturity of Training Data Operations<br/>Applied Versus Research Sets<br/>Training Data Management<br/>Quality<br/>Completed Tasks<br/>Freshness<br/>Maintaining Set Metadata<br/>Task Management<br/>Summary<br/>7. AI Transformation and Use Cases<br/>Introduction<br/>AI Transformation<br/>Seeing Your Day-to-Day Work as Annotation<br/>The Creative Revolution of Data-centric AI<br/>You Can Create New Data<br/>You Can Change What Data You Collect<br/>You Can Change the Meaning of the Data<br/>You Can Create!<br/>Think Step Function Improvement for Major Projects<br/>Build Your AI Data to Secure Your AI Present and Future<br/>Appoint a Leader: The Director of AI Data<br/>New Expectations People Have for the Future of AI<br/>Sometimes Proposals and Corrections, Sometimes Replacement<br/>Upstream Producers and Downstream Consumers<br/>Spectrum of Training Data Team Engagement<br/>Dedicated Producers and Other Teams<br/>Organizing Producers from Other Teams<br/>Use Case Discovery<br/>Rubric for Good Use Cases<br/>Evaluating a Use Case Against the Rubric<br/>Conceptual Effects of Use Cases<br/>The New “Crowd Sourcing”: Your Own Experts<br/>Key Levers on Training Data ROI<br/>What the Annotated Data Represents<br/>Trade-Offs of Controlling Your Own Training Data<br/>The Need for Hardware<br/>Common Project Mistakes<br/>Modern Training Data Tools<br/>Think Learning Curve, Not Perfection<br/>New Training and Knowledge Are Required<br/>How Companies Produce and Consume Data<br/>Trap to Avoid: Premature Optimization in Training Data<br/>No Silver Bullets<br/>Culture of Training Data<br/>New Engineering Principles<br/>Summary<br/>8. Automation<br/>Introduction<br/>Getting Started<br/>Motivation: When to Use These Methods?<br/>Check What Part of the Schema a Method Is Designed to Work On<br/>What Do People Actually Use?<br/>What Kind of Results Can I Expect?<br/>Common Confusions<br/>User Interface Optimizations<br/>Risks<br/>Trade-Offs<br/>Nature of Automations<br/>Setup Costs<br/>How to Benchmark Well<br/>How to Scope the Automation Relative to the Problem<br/>Correction Time<br/>Subject Matter Experts<br/>Consider How the Automations Stack<br/>Pre-Labeling<br/>Standard Pre-Labeling<br/>Pre-Labeling a Portion of the Data Only<br/>Interactive Annotation Automation<br/>Creating Your Own<br/>Technical Setup Notes<br/>What Is a Watcher? (Observer Pattern)<br/>How to Use a Watcher<br/>Interactive Capturing of a Region of Interest<br/>Interactive Drawing Box to Polygon Using GrabCut<br/>Full Image Model Prediction Example<br/>Example: Person Detection for Different Attribute<br/>Quality Assurance Automation<br/>Using the Model to Debug the Humans<br/>Automated Checklist Example<br/>Domain-Specific Reasonableness Checks<br/>Data Discovery: What to Label<br/>Human Exploration<br/>Raw Data Exploration<br/>Metadata Exploration<br/>Adding Pre-Labeling-Based Metadata<br/>Augmentation<br/>Better Models Are Better than Better Augmentation<br/>To Augment or Not to Augment<br/>Simulation and Synthetic Data<br/>Simulations Still Need Human Review<br/>Media Specific<br/>What Methods Work with Which Media?<br/>Considerations<br/>Media-Specific Research<br/>Domain Specific<br/>Geometry-Based Labeling<br/>Heuristics-Based Labeling<br/>Summary<br/>9. Case Studies and Stories<br/>Introduction<br/>Industry<br/>A Security Startup Adopts Training Data Tools<br/>Quality Assurance at a Large-Scale Self-Driving Project<br/>Big-Tech Challenges<br/>Insurance Tech Startup Lessons<br/>Stories<br/>An Academic Approach to Training Data<br/>Kaggle TSA Competition<br/>Summary<br/>Index |
| 942 ## - ADDED ENTRY ELEMENTS (KOHA) | |
| Koha item type | Books |
| Withdrawn status | Lost status | Source of classification or shelving scheme | Damaged status | Not for loan | Home library | Current library | Date acquired | Vendor | Net Price | Total Checkouts | Full call number | Barcode | Date last seen | Actual Price | Bill Date | Koha item type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | Dewey Decimal Classification | | | Cummins College of Engineering for Women Pune | Cummins College of Engineering for Women Pune | 11/10/2024 | 115 | 1237.50 | | 006.31 SAR | CCEP-BK-67498 | 11/10/2024 | 1650.00 | 11/10/2024 | Books |