
Mission 2: Cleanse Data with AI

Estimated time: 10 minutes


Learning Objective

Now that you've learned how to create data pipelines, it's time to apply those skills to clean and refine a dataset. In this mission, you will build a four-step pipeline that replaces placeholder values, removes redundant text, converts Yes/No flags to 1/0, and fills missing values, so that the data is accurate, consistent, and ready for use.


Dataset

Download the required dataset: Churn.xlsx


Prerequisites

Please refer to the Navigation Guide to familiarize yourself with the platform interface.

Tip

If you're unsure about any steps, review Mission 1 first to understand the basics of the Data Transformer.


Data Cleansing Goals

We want to perform the following transformations:

  • "medium_of_operation" - Replace ? with Unknown
  • "membership_category" - Remove the word Membership
  • "offer_application_preference", "used_special_discount", "past_complaint" - Convert Yes/No to 1/0
  • "points_in_wallet" - Fill empty fields with 0

Step-by-Step Instructions

1. Import the Dataset

In the "Module 1" folder, click the "Add Item" button, select Add Data, choose Excel, and import the Churn dataset.

2. Open the Data Transformer

Right-click the Churn dataset and select Transform Data to launch the AI Data Transformer.

3. Build Your Cleansing Pipeline

Create a multi-step cleansing pipeline by entering each prompt in the search bar. You can try creating the pipeline yourself, or follow the detailed steps below.
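Conceptually, a multi-step pipeline applies its steps in order, and each step works on the output of the one before it. The plain-Python sketch below illustrates only that idea. It is not how the AI Data Transformer is implemented, and `run_pipeline`, `trim_column`, `replace_value`, and the `status` column are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


def run_pipeline(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Apply each step in order; every step sees the previous step's output."""
    for step in steps:
        rows = step(rows)
    return rows


def trim_column(column: str) -> Step:
    """Hypothetical step: strip surrounding whitespace from one column."""
    def step(rows: List[Row]) -> List[Row]:
        return [{**row, column: row[column].strip()} for row in rows]
    return step


def replace_value(column: str, old: str, new: str) -> Step:
    """Hypothetical step: replace an exact cell value in one column."""
    def step(rows: List[Row]) -> List[Row]:
        return [{**row, column: new if row[column] == old else row[column]} for row in rows]
    return step


rows = [{"status": " ? "}, {"status": "Active"}]
steps = [trim_column("status"), replace_value("status", "?", "Unknown")]
print(run_pipeline(rows, steps))
# [{'status': 'Unknown'}, {'status': 'Active'}]
```

If the two steps are reversed, " ? " does not match "?" when the replacement runs, so it ends up as "?" rather than "Unknown". This is why the order in which you add steps can matter.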

Step 1: Replace Unknown Values

Replace placeholder characters with meaningful text:

replace the ? with the word Unknown in "medium_of_operation"
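For comparison, here is a rough plain-Python equivalent of this step. It assumes each row is a dictionary keyed by column name and that the placeholder fills the whole cell. The function name `replace_placeholder` and the sample values are illustrative and are not taken from the platform.

```python
from typing import Dict, List


def replace_placeholder(rows: List[Dict[str, str]], column: str,
                        placeholder: str = "?",
                        replacement: str = "Unknown") -> List[Dict[str, str]]:
    """Return new rows in which cells exactly equal to `placeholder` are replaced."""
    return [
        {**row, column: replacement if row[column] == placeholder else row[column]}
        for row in rows
    ]


# Made-up sample rows for illustration.
rows = [
    {"medium_of_operation": "Desktop"},
    {"medium_of_operation": "?"},
    {"medium_of_operation": "Smartphone"},
]
print(replace_placeholder(rows, "medium_of_operation"))
# [{'medium_of_operation': 'Desktop'}, {'medium_of_operation': 'Unknown'}, {'medium_of_operation': 'Smartphone'}]
```

This sketch matches the whole cell rather than every ? character, so text that merely contains a question mark is not altered. Check the Transformer's preview to confirm whether the generated step behaves the same way.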

Press Enter and click Add Step to create the next transformation.

Step 2: Remove Text from Category

Clean up category names by removing redundant text:

remove the word "Membership" from "membership_category"
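A plain-Python sketch of this step is shown below. It removes the word only where it stands alone (using a regex word boundary) and then collapses the leftover spaces. The function name `remove_word` and the sample rows are hypothetical, and this is not the code the Transformer generates.

```python
import re
from typing import Dict, List


def remove_word(rows: List[Dict[str, str]], column: str, word: str) -> List[Dict[str, str]]:
    """Delete a whole word from a text column and tidy the leftover spaces."""
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    cleaned = []
    for row in rows:
        text = pattern.sub("", row[column])
        # Collapse the double or trailing spaces left behind by the removal.
        cleaned.append({**row, column: " ".join(text.split())})
    return cleaned


# Made-up sample rows for illustration.
rows = [
    {"membership_category": "Gold Membership"},
    {"membership_category": "No Membership"},
]
print(remove_word(rows, "membership_category", "Membership"))
# [{'membership_category': 'Gold'}, {'membership_category': 'No'}]
```

If the column contains a value such as "No Membership", it becomes just "No" after this step. Check the preview to make sure the shortened label is still clear.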

Press Enter and click Add Step.

Step 3: Convert Yes/No to Binary

Standardize categorical data by converting Yes/No values to binary format across multiple columns:

turn Yes and No into 1 and 0 in "offer_application_preference" "used_special_discount" and "past_complaint"
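The same conversion can be sketched in plain Python as a lookup applied to each listed column. In this version, only exact "Yes"/"No" values are mapped, and any other value is left untouched so that it stays visible. That is a design choice assumed here, not documented platform behavior. `yes_no_to_binary` and the sample rows are hypothetical.

```python
from typing import Dict, List, Union

Value = Union[str, int]
YES_NO = {"Yes": 1, "No": 0}


def yes_no_to_binary(rows: List[Dict[str, Value]],
                     columns: List[str]) -> List[Dict[str, Value]]:
    """Map exact "Yes"/"No" cells to 1/0 in each listed column."""
    converted = []
    for row in rows:
        new_row = dict(row)
        for column in columns:
            value = new_row[column]
            # Anything other than exactly "Yes" or "No" is left as-is so it stays visible.
            new_row[column] = YES_NO.get(value, value)
        converted.append(new_row)
    return converted


flag_columns = ["offer_application_preference", "used_special_discount", "past_complaint"]
# Made-up sample rows for illustration.
rows = [
    {"offer_application_preference": "Yes", "used_special_discount": "No", "past_complaint": "No"},
    {"offer_application_preference": "No", "used_special_discount": "Yes", "past_complaint": "Yes"},
]
print(yes_no_to_binary(rows, flag_columns))
```

The lookup is case-sensitive, so a lowercase "yes" would pass through unchanged in this sketch.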

Press Enter and click Add Step.

Step 4: Fill Missing Values

Handle missing data by filling empty fields with a default value (here, 0):

fill empty fields with 0 in "points_in_wallet"
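How an "empty" cell looks depends on how the spreadsheet was read. For example, pandas represents blank Excel cells as NaN, while other readers may produce None or an empty string. The plain-Python sketch below treats all three as missing. `fill_empty` and the sample values are hypothetical and are not the Transformer's implementation.

```python
import math
from typing import Dict, List, Optional, Union

Cell = Optional[Union[str, float]]


def fill_empty(rows: List[Dict[str, Cell]], column: str,
               default: float = 0) -> List[Dict[str, Cell]]:
    """Replace missing cells (None, blank strings, or NaN) in one column with `default`."""
    filled = []
    for row in rows:
        value = row.get(column)
        is_empty = (
            value is None
            or (isinstance(value, str) and value.strip() == "")
            or (isinstance(value, float) and math.isnan(value))
        )
        filled.append({**row, column: default if is_empty else value})
    return filled


# Made-up sample rows showing three common ways a cell can be "empty".
rows = [
    {"points_in_wallet": 781.75},
    {"points_in_wallet": None},
    {"points_in_wallet": ""},
    {"points_in_wallet": float("nan")},
]
print([row["points_in_wallet"] for row in fill_empty(rows, "points_in_wallet")])
# [781.75, 0, 0, 0]
```

Filling with 0 makes a missing value indistinguishable from a genuine zero balance. That trade-off is acceptable for this mission, but keep it in mind for later analysis.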

Press Enter.

Continue to Next Mission

Don't exit the Data Transformer dialog yet. Keep it open and continue directly to Mission 3: Export Pipelines to learn how to export your pipeline as a reusable Flow.


Visual Guide

Step 1: Open Data Transformer


Right-click the Churn dataset and select "Transform Data"

Step 2: Replace Unknown Values


Enter the first prompt: replace the ? with the word Unknown in "medium_of_operation"

Step 3: Remove Text from Category


Enter the second prompt: remove the word "Membership" from "membership_category"

Step 4: Convert Yes/No to Binary


Enter the third prompt: turn Yes and No into 1 and 0 in "offer_application_preference" "used_special_discount" and "past_complaint"

Step 5: Fill Missing Values


Enter the fourth prompt: fill empty fields with 0 in "points_in_wallet"

Next Step

Keep the Data Transformer dialog open and proceed to Mission 3 to export this pipeline as a Flow.


Common Data Cleansing Tasks

Task               Example Command
Replace values     Replace "N/A" with "Unknown" in "Status"
Remove text        Remove the word "ID" from "Customer_ID"
Convert format     Turn Yes and No into 1 and 0 in "Active"
Fill missing       Fill empty fields with 0 in "Quantity"
Standardize text   Convert all text to uppercase in "Country"
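The "Standardize text" row is the only task in this table not covered by the mission steps. A comparable plain-Python sketch is shown below. It assumes a hypothetical "Country" column and leaves non-text cells (such as missing values) unchanged. `uppercase_column` is an illustrative name, not a platform function.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def uppercase_column(rows: List[Row], column: str) -> List[Row]:
    """Uppercase text cells in one column; non-string cells pass through unchanged."""
    return [
        {**row, column: row[column].upper() if isinstance(row[column], str) else row[column]}
        for row in rows
    ]


# Made-up sample rows for illustration.
rows = [{"Country": "germany"}, {"Country": "France"}, {"Country": None}]
print(uppercase_column(rows, "Country"))
# [{'Country': 'GERMANY'}, {'Country': 'FRANCE'}, {'Country': None}]
```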

Summary

You've successfully learned how to:

✓ Import datasets for data cleansing

✓ Build a multi-step data cleansing pipeline

✓ Replace placeholder values with meaningful data

✓ Remove unwanted text from columns

✓ Convert categorical values to binary format

✓ Fill missing values with defaults

✓ Prepare a cleansing pipeline for export as a reusable Flow


Well done! Now let's move on to the next mission!