Data Cleansing using Python Practice Exam


About the Data Cleansing using Python Exam

Data cleansing (or data cleaning) is the process of identifying and correcting errors, inconsistencies, and inaccuracies in datasets to improve data quality. In Python, this is typically done with libraries such as pandas and NumPy, along with the built-in re module for regex-based text cleaning.
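
For instance, here is a minimal sketch of how these libraries fit together; the tiny DataFrame, column names, and cleaning rules below are made up purely for illustration:

```python
import re
import numpy as np
import pandas as pd

# Illustrative data with typical quality problems (values are made up)
df = pd.DataFrame({
    "name": [" Alice ", "Bob", "Bob", None],
    "age": ["29", "thirty", "31", "40"],
    "phone": ["(555) 123-4567", "555.987.6543", None, "5551112222"],
})

df["name"] = df["name"].str.strip()                    # remove stray whitespace
df = df.drop_duplicates(subset="name")                 # drop repeated records
df["age"] = pd.to_numeric(df["age"], errors="coerce")  # bad values become NaN
df["age"] = df["age"].fillna(df["age"].median())       # impute missing ages

# Regex-based cleaning with the re module: keep digits only
df["phone"] = df["phone"].apply(
    lambda p: re.sub(r"\D", "", p) if isinstance(p, str) else np.nan
)
print(df)
```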


Skills Required

  • Basic Python Knowledge – Understanding of variables, data types, loops, and functions.
  • Familiarity with Pandas and NumPy – Knowledge of data structures like DataFrames and Series, and functions for data manipulation.
  • Understanding of Regular Expressions (Regex) – Useful for text cleaning and pattern matching.
  • Basic SQL (Optional) – Helps in handling structured data and performing cleansing tasks in databases.
  • Knowledge of Data Types and Formats – Understanding numerical, categorical, and datetime data.
  • Statistical Concepts – Basics of mean, median, standard deviation, and outlier detection methods like Z-score and IQR (see the sketch after this list).
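
For the statistics point above, a quick sketch of Z-score and IQR outlier detection with NumPy; the sample values and the Z-score cutoff of 2 are illustrative assumptions:

```python
import numpy as np

values = np.array([10, 12, 11, 13, 12, 95, 11, 10])  # 95 looks like an outlier

# Z-score method: distance from the mean in standard deviations
# (a cutoff of 3 is common for large samples; 2 is used here because the sample is tiny)
z_scores = (values - values.mean()) / values.std()
z_outliers = values[np.abs(z_scores) > 2]

# IQR method: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
iqr_outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]

print(z_outliers, iqr_outliers)  # both methods flag 95
```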


Knowledge Gained

After completing the Data Cleansing using Python course, you will gain:

  • Ability to Handle Missing Data – Learn to detect, remove, or impute missing values using Pandas.
  • Data Type Correction – Convert incorrect data types and standardize formats.
  • Text Data Cleaning – Use string operations and regular expressions for text preprocessing.
  • Outlier Detection and Handling – Apply Z-score, IQR, and visualization techniques to detect and manage outliers.
  • Data Standardization – Normalize and transform data for consistency.
  • Practical Experience with Pandas and NumPy – Master key data manipulation functions.
  • Automation of Data Cleaning Tasks – Write reusable scripts to clean and preprocess large datasets efficiently (a small example follows this list).
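
As a taste of the automation point above, a minimal reusable cleaning function built on Pandas; the steps and the example file name are hypothetical placeholders, not a prescribed recipe:

```python
import pandas as pd

def clean_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Apply a repeatable set of basic cleansing steps to a DataFrame."""
    out = df.copy()
    out = out.drop_duplicates()                         # remove exact duplicate rows
    out.columns = out.columns.str.strip().str.lower()   # standardize column names
    for col in out.select_dtypes(include="object"):
        out[col] = out[col].str.strip().str.lower()     # normalize text values
    for col in out.select_dtypes(include="number"):
        out[col] = out[col].fillna(out[col].median())   # impute numeric gaps
    return out

# Hypothetical usage on a CSV export
# cleaned = clean_dataframe(pd.read_csv("sales.csv"))
```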


Who should take the Exam?

  • Beginners in Data Science looking to build a strong foundation in data preprocessing.
  • Data Analysts who work with messy datasets and need to clean and structure data efficiently.
  • Machine Learning Enthusiasts who want to prepare high-quality data for model training.
  • Business Intelligence Professionals dealing with data-driven decision-making and reporting.
  • Database Managers & Engineers handling structured/unstructured data in databases.
  • Researchers & Academicians analyzing large datasets and ensuring data accuracy.
  • Students & Job Seekers aspiring for careers in data analytics, data science, or AI.


Course Outline

Introduction

  • Course Introduction
  • Course Structure
  • Is this Course Right for You?

Foundations

  • Introducing Data Preparation
  • The Machine Learning Process
  • Data Preparation Defined
  • Choosing a Data Preparation Technique
  • What is Data in Machine Learning?
  • Raw Data
  • Machine Learning is Mostly Data Preparation
  • Common Data Preparation Tasks - Data Cleansing
  • Common Data Preparation Tasks - Feature Selection
  • Common Data Preparation Tasks - Data Transforms
  • Common Data Preparation Tasks - Feature Engineering
  • Common Data Preparation Tasks - Dimensionality Reduction
  • Data Leakage
  • Problem with Naive Data Preparation
  • Case Study: Data Leakage: Train / Test / Split Naive Approach
  • Case Study: Data Leakage: Train / Test / Split Correct Approach
  • Case Study: Data Leakage: K-Fold Naive Approach
  • Case Study: Data Leakage: K-Fold Correct Approach
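
To make the leakage case studies concrete, here is a minimal sketch contrasting the naive and correct train/test split approaches; it assumes scikit-learn and a synthetic dataset rather than the course's exact data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=7)

# Naive approach (leaks information): the scaler is fit on ALL rows, including future test data
X_leaky = MinMaxScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_leaky, y, test_size=0.33, random_state=7)
naive_score = LogisticRegression(max_iter=1000).fit(X_train, y_train).score(X_test, y_test)

# Correct approach: split first, then fit the scaler on the training data only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=7)
pipeline = Pipeline([("scale", MinMaxScaler()), ("model", LogisticRegression(max_iter=1000))])
correct_score = pipeline.fit(X_train, y_train).score(X_test, y_test)

print(naive_score, correct_score)
```

Wrapping the preparation step in a Pipeline also keeps it leak-free under k-fold cross-validation, since the scaler is refit on each training fold.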

Data Cleansing

  • Data Cleansing Overview
  • Identify Columns That Contain a Single Value
  • Identify Columns with Few Values
  • Remove Columns with Low Variance
  • Identify and Remove Rows That Contain Duplicate Data
  • Defining Outliers
  • Remove Outliers - The Standard Deviation Approach
  • Remove Outliers - The IQR Approach
  • Automatic Outlier Detection
  • Mark Missing Values
  • Remove Rows with Missing Values
  • Statistical Imputation
  • Mean Value Imputation
  • Simple Imputer with Model Evaluation
  • Compare Different Statistical Imputation Strategies
  • K-Nearest Neighbors Imputation
  • KNNImputer and Model Evaluation
  • Iterative Imputation
  • IterativeImputer and Model Evaluation
  • IterativeImputer and Different Imputation Order
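
A minimal sketch of the statistical imputation workflow above, using scikit-learn's SimpleImputer inside a Pipeline and comparing strategies with cross-validation; the synthetic data and injected missing values are assumptions for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
rng = np.random.default_rng(1)
X[rng.random(X.shape) < 0.1] = np.nan  # randomly blank out roughly 10% of the cells

# Compare different statistical imputation strategies
for strategy in ["mean", "median", "most_frequent"]:
    pipeline = Pipeline([
        ("impute", SimpleImputer(strategy=strategy)),
        ("model", RandomForestClassifier(random_state=1)),
    ])
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
    print(strategy, round(scores.mean(), 3))
```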

Feature Selection

  • Feature Selection Introduction
  • Feature Selection Defined
  • Statistics for Feature Selection
  • Loading a Categorical Dataset
  • Encode the Dataset for Modelling
  • Chi-Squared
  • Mutual Information
  • Modeling with Selected Categorical Features
  • Feature Selection with ANOVA on Numerical Input
  • Feature Selection with Mutual Information
  • Modeling with Selected Numerical Features
  • Tuning a Number of Selected Features
  • Select Features for Numerical Output
  • Linear Correlation with Correlation Statistics
  • Linear Correlation with Mutual Information
  • Baseline and Model Built Using Correlation
  • Model Built Using Mutual Information Features
  • Tuning Number of Selected Features
  • Recursive Feature Elimination
  • RFE for Classification
  • RFE for Regression
  • RFE Hyperparameters
  • Feature Ranking for RFE
  • Feature Importance Scores Defined
  • Feature Importance Scores: Linear Regression
  • Feature Importance Scores: Logistic Regression and CART
  • Feature Importance Scores: Random Forests
  • Permutation Feature Importance
  • Feature Selection with Importance
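
A minimal sketch of the selection techniques above, showing one filter method (mutual information via SelectKBest) and one wrapper method (RFE); the synthetic dataset and parameter choices are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=3)

# Filter method: keep the k features with the highest mutual information scores
X_filtered = SelectKBest(score_func=mutual_info_classif, k=5).fit_transform(X, y)

# Wrapper method: recursive feature elimination driven by a model's coefficients
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
X_wrapped = rfe.fit_transform(X, y)

print(X_filtered.shape, X_wrapped.shape, rfe.ranking_)
```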

Data Transforms

  • Scale Numerical Data
  • Diabetes Dataset for Scaling
  • MinMaxScaler Transform
  • StandardScaler Transform
  • Robust Scaling Data
  • Robust Scaler Applied to Dataset
  • Explore Robust Scaler Range
  • Nominal and Ordinal Variables
  • Ordinal Encoding
  • One-Hot Encoding Defined
  • One-Hot Encoding
  • Dummy Variable Encoding
  • Ordinal Encoder Transform on Breast Cancer Dataset
  • Make Distributions More Gaussian
  • Power Transform on Contrived Dataset
  • Power Transform on Sonar Dataset
  • Box-Cox on Sonar Dataset
  • Yeo-Johnson on Sonar Dataset
  • Polynomial Features
  • Effect of Polynomial Degrees
  • Advanced Transforms
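
A minimal sketch of the scaling, power, and polynomial transforms above; the skewed synthetic matrix is an assumption chosen so that Box-Cox (which needs strictly positive values) can run:

```python
import numpy as np
from sklearn.preprocessing import (MinMaxScaler, StandardScaler, RobustScaler,
                                   PowerTransformer, PolynomialFeatures)

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=(100, 3)) + 0.1  # skewed, strictly positive data

X_minmax = MinMaxScaler().fit_transform(X)    # rescale each column to [0, 1]
X_std    = StandardScaler().fit_transform(X)  # zero mean, unit variance
X_robust = RobustScaler().fit_transform(X)    # scale by median and IQR (outlier-resistant)

X_boxcox = PowerTransformer(method="box-cox").fit_transform(X)      # positive values only
X_yeo    = PowerTransformer(method="yeo-johnson").fit_transform(X)  # allows zero/negative values

X_poly = PolynomialFeatures(degree=2).fit_transform(X)  # adds squared and interaction terms

print(X_yeo.mean(axis=0).round(2), X_yeo.std(axis=0).round(2))  # roughly 0 and 1 after transform
```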

Transforming Different Data Types

  • The ColumnTransformer
  • The ColumnTransformer on Abalone Dataset
  • Manually Transform Target Variable
  • Automatically Transform Target Variable
  • Challenge of Preparing New Data for a Model
  • Save Model and Data Scaler
  • Load and Apply Saved Scalers
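
A minimal sketch of a ColumnTransformer on mixed column types, plus saving and reloading the fitted transformer with pickle; the column names (loosely echoing the Abalone dataset) and the file name are hypothetical:

```python
import pandas as pd
from pickle import dump, load
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

df = pd.DataFrame({
    "length": [0.45, 0.61, 0.35],  # numerical column (made-up values)
    "sex":    ["M", "F", "I"],     # categorical column
})

transformer = ColumnTransformer([
    ("num", MinMaxScaler(), ["length"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex"]),
])
X = transformer.fit_transform(df)

# Persist the fitted transformer so new data can be prepared the same way later
with open("transformer.pkl", "wb") as f:
    dump(transformer, f)
with open("transformer.pkl", "rb") as f:
    reloaded = load(f)

X_new = reloaded.transform(df)  # apply the saved preparation to fresh data
```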

Dimensionality Reduction

  • Curse of Dimensionality
  • Techniques for Dimensionality Reduction
  • Linear Discriminant Analysis
  • Linear Discriminant Analysis Demonstrated
  • Principal Component Analysis
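
A minimal sketch of both techniques on a synthetic dataset; PCA is unsupervised, while LDA uses the class labels and is capped at n_classes - 1 components:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           n_classes=3, n_clusters_per_class=1, random_state=5)

# PCA: keep the directions of greatest variance
X_pca = PCA(n_components=5).fit_transform(X)

# LDA: project onto axes that best separate the classes (at most n_classes - 1 of them)
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)
```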
