
Python Data Science Handbook: Essential Tools for Working with Data

Language: English
Publication details: SPD/O'Reilly, 2024
Edition: 2nd
Description: 563
ISBN: 9789355422552
Holdings
Item type: Books
Current library: Cummins College of Engineering for Women Pune
Call number: 005.13'3 VAN
Status: Available (not for issue)
Barcode: CCEP-BK-67406

Table of Contents
Preface xix
Part I. Jupyter: Beyond Normal Python
1. Getting Started in IPython and Jupyter 3
Launching the IPython Shell 3
Launching the Jupyter Notebook 4
Help and Documentation in IPython 4
Accessing Documentation with ? 5
Accessing Source Code with ?? 6
Exploring Modules with Tab Completion 7
Keyboard Shortcuts in the IPython Shell 9
Navigation Shortcuts 10
Text Entry Shortcuts 10
Command History Shortcuts 10
Miscellaneous Shortcuts 12
2. Enhanced Interactive Features 13
IPython Magic Commands 13
Running External Code: %run 13
Timing Code Execution: %timeit 14
Help on Magic Functions: ?, %magic, and %lsmagic 15
Input and Output History 15
IPython’s In and Out Objects 15
Underscore Shortcuts and Previous Outputs 16
Suppressing Output 17
Related Magic Commands 17
IPython and Shell Commands 18
Quick Introduction to the Shell 18
Shell Commands in IPython 19
Passing Values to and from the Shell 20
Shell-Related Magic Commands 20
3. Debugging and Profiling 22
Errors and Debugging 22
Controlling Exceptions: %xmode 22
Debugging: When Reading Tracebacks Is Not Enough 24
Profiling and Timing Code 26
Timing Code Snippets: %timeit and %time 27
Profiling Full Scripts: %prun 28
Line-by-Line Profiling with %lprun 29
Profiling Memory Use: %memit and %mprun 30
More IPython Resources 31
Web Resources 31
Books 32
Part II. Introduction to NumPy
4. Understanding Data Types in Python 35
A Python Integer Is More Than Just an Integer 36
A Python List Is More Than Just a List 37
Fixed-Type Arrays in Python 39
Creating Arrays from Python Lists 39
Creating Arrays from Scratch 40
NumPy Standard Data Types 41
5. The Basics of NumPy Arrays 43
NumPy Array Attributes 44
Array Indexing: Accessing Single Elements 44
Array Slicing: Accessing Subarrays 45
One-Dimensional Subarrays 45
Multidimensional Subarrays 46
Subarrays as No-Copy Views 47
Creating Copies of Arrays 47
Reshaping of Arrays 48
Array Concatenation and Splitting 49
Concatenation of Arrays 49
Splitting of Arrays 50
6. Computation on NumPy Arrays: Universal Functions 51
The Slowness of Loops 51
Introducing Ufuncs 52
Exploring NumPy’s Ufuncs 53
Array Arithmetic 53
Absolute Value 55
Trigonometric Functions 55
Exponents and Logarithms 56
Specialized Ufuncs 56
Advanced Ufunc Features 57
Specifying Output 57
Aggregations 58
Outer Products 59
Ufuncs: Learning More 59
7. Aggregations: min, max, and Everything in Between 60
Summing the Values in an Array 60
Minimum and Maximum 61
Multidimensional Aggregates 61
Other Aggregation Functions 62
Example: What Is the Average Height of US Presidents? 63
8. Computation on Arrays: Broadcasting 65
Introducing Broadcasting 65
Rules of Broadcasting 67
Broadcasting Example 1 68
Broadcasting Example 2 68
Broadcasting Example 3 69
Broadcasting in Practice 70
Centering an Array 70
Plotting a Two-Dimensional Function 71
9. Comparisons, Masks, and Boolean Logic 72
Example: Counting Rainy Days 72
Comparison Operators as Ufuncs 73
Working with Boolean Arrays 75
Counting Entries 75
Boolean Operators 76
Boolean Arrays as Masks 77
Using the Keywords and/or Versus the Operators &/| 78
10. Fancy Indexing 80
Exploring Fancy Indexing 80
Combined Indexing 81
Example: Selecting Random Points 82
Modifying Values with Fancy Indexing 84
Example: Binning Data 85
11. Sorting Arrays 88
Fast Sorting in NumPy: np.sort and np.argsort 89
Sorting Along Rows or Columns 89
Partial Sorts: Partitioning 90
Example: k-Nearest Neighbors 90
12. Structured Data: NumPy’s Structured Arrays 94
Exploring Structured Array Creation 96
More Advanced Compound Types 97
Record Arrays: Structured Arrays with a Twist 97
On to Pandas 98
Part III. Data Manipulation with Pandas
13. Introducing Pandas Objects 101
The Pandas Series Object 101
Series as Generalized NumPy Array 102
Series as Specialized Dictionary 103
Constructing Series Objects 104
The Pandas DataFrame Object 104
DataFrame as Generalized NumPy Array 105
DataFrame as Specialized Dictionary 106
Constructing DataFrame Objects 106
The Pandas Index Object 108
Index as Immutable Array 108
Index as Ordered Set 108
14. Data Indexing and Selection 110
Data Selection in Series 110
Series as Dictionary 110
Series as One-Dimensional Array 111
Indexers: loc and iloc 112
Data Selection in DataFrames 113
DataFrame as Dictionary 113
DataFrame as Two-Dimensional Array 115
Additional Indexing Conventions 116
15. Operating on Data in Pandas 118
Ufuncs: Index Preservation 118
Ufuncs: Index Alignment 119
Index Alignment in Series 119
Index Alignment in DataFrames 120
Ufuncs: Operations Between DataFrames and Series 121
16. Handling Missing Data 123
Trade-offs in Missing Data Conventions 123
Missing Data in Pandas 124
None as a Sentinel Value 125
NaN: Missing Numerical Data 125
NaN and None in Pandas 126
Pandas Nullable Dtypes 127
Operating on Null Values 128
Detecting Null Values 128
Dropping Null Values 129
Filling Null Values 130
17. Hierarchical Indexing 132
A Multiply Indexed Series 132
The Bad Way 133
The Better Way: The Pandas MultiIndex 133
MultiIndex as Extra Dimension 134
Methods of MultiIndex Creation 136
Explicit MultiIndex Constructors 136
MultiIndex Level Names 137
MultiIndex for Columns 138
Indexing and Slicing a MultiIndex 138
Multiply Indexed Series 139
Multiply Indexed DataFrames 140
Rearranging Multi-Indexes 141
Sorted and Unsorted Indices 141
Stacking and Unstacking Indices 143
Index Setting and Resetting 143
18. Combining Datasets: concat and append 145
Recall: Concatenation of NumPy Arrays 146
Simple Concatenation with pd.concat 147
Duplicate Indices 148
Concatenation with Joins 149
The append Method 150
19. Combining Datasets: merge and join 151
Relational Algebra 151
Categories of Joins 152
One-to-One Joins 152
Many-to-One Joins 153
Many-to-Many Joins 153
Specification of the Merge Key 154
The on Keyword 154
The left_on and right_on Keywords 155
The left_index and right_index Keywords 155
Specifying Set Arithmetic for Joins 157
Overlapping Column Names: The suffixes Keyword 158
Example: US States Data 159
20. Aggregation and Grouping 164
Planets Data 165
Simple Aggregation in Pandas 165
groupby: Split, Apply, Combine 167
Split, Apply, Combine 167
The GroupBy Object 169
Aggregate, Filter, Transform, Apply 171
Specifying the Split Key 174
Grouping Example 175
21. Pivot Tables 176
Motivating Pivot Tables 176
Pivot Tables by Hand 177
Pivot Table Syntax 178
Multilevel Pivot Tables 178
Additional Pivot Table Options 179
Example: Birthrate Data 180
22. Vectorized String Operations 185
Introducing Pandas String Operations 185
Tables of Pandas String Methods 186
Methods Similar to Python String Methods 186
Methods Using Regular Expressions 187
Miscellaneous Methods 188
Example: Recipe Database 190
A Simple Recipe Recommender 192
Going Further with Recipes 193
23. Working with Time Series 194
Dates and Times in Python 195
Native Python Dates and Times: datetime and dateutil 195
Typed Arrays of Times: NumPy’s datetime64 196
Dates and Times in Pandas: The Best of Both Worlds 197
Pandas Time Series: Indexing by Time 198
Pandas Time Series Data Structures 199
Regular Sequences: pd.date_range 200
Frequencies and Offsets 201
Resampling, Shifting, and Windowing 202
Resampling and Converting Frequencies 203
Time Shifts 205
Rolling Windows 206
Example: Visualizing Seattle Bicycle Counts 208
Visualizing the Data 209
Digging into the Data 211
24. High-Performance Pandas: eval and query 215
Motivating query and eval: Compound Expressions 215
pandas.eval for Efficient Operations 216
DataFrame.eval for Column-Wise Operations 218
Assignment in DataFrame.eval 219
Local Variables in DataFrame.eval 219
The DataFrame.query Method 220
Performance: When to Use These Functions 220
Further Resources 221
Part IV. Visualization with Matplotlib
25. General Matplotlib Tips 225
Importing Matplotlib 225
Setting Styles 225
show or No show? How to Display Your Plots 226
Plotting from a Script 226
Plotting from an IPython Shell 227
Plotting from a Jupyter Notebook 227
Saving Figures to File 228
Two Interfaces for the Price of One 230
26. Simple Line Plots 232
Adjusting the Plot: Line Colors and Styles 235
Adjusting the Plot: Axes Limits 238
Labeling Plots 240
Matplotlib Gotchas 242
27. Simple Scatter Plots 244
Scatter Plots with plt.plot 244
Scatter Plots with plt.scatter 247
plot Versus scatter: A Note on Efficiency 250
Visualizing Uncertainties 251
Basic Errorbars 251
Continuous Errors 253
28. Density and Contour Plots 255
Visualizing a Three-Dimensional Function 255
Histograms, Binnings, and Density 260
Two-Dimensional Histograms and Binnings 263
plt.hist2d: Two-Dimensional Histogram 263
plt.hexbin: Hexagonal Binnings 264
Kernel Density Estimation 264
29. Customizing Plot Legends 267
Choosing Elements for the Legend 270
Legend for Size of Points 272
Multiple Legends 274
30. Customizing Colorbars. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
Customizing Colorbars 277
Choosing the Colormap 278
Color Limits and Extensions 280
Discrete Colorbars 281
Example: Handwritten Digits 282
31. Multiple Subplots. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
plt.axes: Subplots by Hand 285
plt.subplot: Simple Grids of Subplots 287
plt.subplots: The Whole Grid in One Go 289
plt.GridSpec: More Complicated Arrangements 291
32. Text and Annotation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
Example: Effect of Holidays on US Births 294
Transforms and Text Position 296
Arrows and Annotation 298
33. Customizing Ticks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
Major and Minor Ticks 302
Hiding Ticks or Labels 304
Reducing or Increasing the Number of Ticks 306
Fancy Tick Formats 307
Summary of Formatters and Locators 310
34. Customizing Matplotlib: Configurations and Stylesheets. . . . . . . . . . . . . . . . . . . . . . . 312
Plot Customization by Hand 312
Changing the Defaults: rcParams 314
Stylesheets 316
Default Style 317
FiveThirtyEight Style 317
ggplot Style 318
Bayesian Methods for Hackers Style 318
Dark Background Style 319
Grayscale Style 319
Seaborn Style 320
35. Three-Dimensional Plotting in Matplotlib. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
Three-Dimensional Points and Lines 322
Three-Dimensional Contour Plots 323
Wireframes and Surface Plots 325
Surface Triangulations 328
Example: Visualizing a Möbius Strip 330
36. Visualization with Seaborn. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
Exploring Seaborn Plots 333
Histograms, KDE, and Densities 333
Pair Plots 335
Faceted Histograms 336
Categorical Plots 338
Joint Distributions 339
Bar Plots 340
Example: Exploring Marathon Finishing Times 342
Further Resources 350
Other Python Visualization Libraries 351
Part V. Machine Learning
37. What Is Machine Learning?. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
Categories of Machine Learning 355
Qualitative Examples of Machine Learning Applications 356
Classification: Predicting Discrete Labels 356
Regression: Predicting Continuous Labels 359
Clustering: Inferring Labels on Unlabeled Data 363
Dimensionality Reduction: Inferring Structure of Unlabeled Data 364
Summary 366
38. Introducing Scikit-Learn. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
Data Representation in Scikit-Learn 367
The Features Matrix 368
The Target Array 368
The Estimator API 370
Basics of the API 371
Supervised Learning Example: Simple Linear Regression 372
Supervised Learning Example: Iris Classification 375
Unsupervised Learning Example: Iris Dimensionality 376
Unsupervised Learning Example: Iris Clustering 377
Application: Exploring Handwritten Digits 378
Loading and Visualizing the Digits Data 378
Unsupervised Learning Example: Dimensionality Reduction 380
Classification on Digits 381
Summary 383
39. Hyperparameters and Model Validation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
Thinking About Model Validation 384
Model Validation the Wrong Way 385
Model Validation the Right Way: Holdout Sets 385
Model Validation via Cross-Validation 386
Selecting the Best Model 388
The Bias-Variance Trade-off 389
Validation Curves in Scikit-Learn 391
Learning Curves 395
Validation in Practice: Grid Search 400
Summary 401
40. Feature Engineering. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
Categorical Features 402
Text Features 404
Image Features 405
Derived Features 405
Imputation of Missing Data 408
Feature Pipelines 409
41. In Depth: Naive Bayes Classification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
Bayesian Classification 410
Gaussian Naive Bayes 411
Multinomial Naive Bayes 414
Example: Classifying Text 414
When to Use Naive Bayes 417
42. In Depth: Linear Regression. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
Simple Linear Regression 419
Basis Function Regression 422
Polynomial Basis Functions 422
Gaussian Basis Functions 424
Regularization 425
Ridge Regression (L2 Regularization) 427
Lasso Regression (L1 Regularization) 428
Example: Predicting Bicycle Traffic 429
43. In Depth: Support Vector Machines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
Motivating Support Vector Machines 435
Support Vector Machines: Maximizing the Margin 437
Fitting a Support Vector Machine 438
Beyond Linear Boundaries: Kernel SVM 441
Tuning the SVM: Softening Margins 444
Example: Face Recognition 445
Summary 450
44. In Depth: Decision Trees and Random Forests. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
Motivating Random Forests: Decision Trees 451
Creating a Decision Tree 452
Decision Trees and Overfitting 455
Ensembles of Estimators: Random Forests 456
Random Forest Regression 458
Example: Random Forest for Classifying Digits 459
Summary 462
45. In Depth: Principal Component Analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
Introducing Principal Component Analysis 463
PCA as Dimensionality Reduction 466
PCA for Visualization: Handwritten Digits 467
What Do the Components Mean? 469
Choosing the Number of Components 470
PCA as Noise Filtering 471
Example: Eigenfaces 473
Summary 476
46. In Depth: Manifold Learning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
Manifold Learning: “HELLO” 478
Multidimensional Scaling 479
MDS as Manifold Learning 482
Nonlinear Embeddings: Where MDS Fails 484
Nonlinear Manifolds: Locally Linear Embedding 486
Some Thoughts on Manifold Methods 488
Example: Isomap on Faces 489
Example: Visualizing Structure in Digits 493
47. In Depth: k-Means Clustering. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
Introducing k-Means 496
Expectation–Maximization 498
Examples 504
Example 1: k-Means on Digits 504
Example 2: k-Means for Color Compression 507
48. In Depth: Gaussian Mixture Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
Motivating Gaussian Mixtures: Weaknesses of k-Means 512
Generalizing E–M: Gaussian Mixture Models 516
Choosing the Covariance Type 520
Gaussian Mixture Models as Density Estimation 520
Example: GMMs for Generating New Data 524
49. In Depth: Kernel Density Estimation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528
Motivating Kernel Density Estimation: Histograms 528
Kernel Density Estimation in Practice 533
Selecting the Bandwidth via Cross-Validation 535
Example: Not-so-Naive Bayes 535
Anatomy of a Custom Estimator 537
Using Our Custom Estimator 539
50. Application: A Face Detection Pipeline. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
HOG Features 542
HOG in Action: A Simple Face Detector 543
1. Obtain a Set of Positive Training Samples 543
2. Obtain a Set of Negative Training Samples 543
3. Combine Sets and Extract HOG Features 545
4. Train a Support Vector Machine 546
5. Find Faces in a New Image 546
Caveats and Improvements 548
Further Machine Learning Resources 550
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
