Mission 1 - Advanced Cleansing
Estimated time for completing this mission: 20 mins
Learning Objective
Get familiar with data cleansing in LOGIBLOX. In real-life datasets, the same word often appears in different forms (for example, singular and plural), so in this mission we will transform all forms of a word into a single, consistent form.
Background Information
In each mission you use a provided dataset to eventually create predictions from it. In this section we prepare the dataset for further use.
Scenario
Imagine a dataset that contains different products and their prices, where some product names are plural and some are singular (such as "car" and "cars"). To use this data for classification, we want each product to have exactly one name.
In general, this exercise aims to transform words into a consistent form. Your task is therefore to convert the values in the PRODUCTLINE column so that each product line is written in only one form.
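To see what "transforming words into a consistent form" means in practice, here is a minimal Python sketch of stemming. It assumes a deliberately naive, hypothetical rule (lower-case the word and strip a single trailing "s", but not "ss"); this is not the algorithm used by the LOGIBLOX "Stem" BLOX, and real stemmers such as the Porter stemmer apply many more rules.

```python
def naive_stem(word: str) -> str:
    """Hypothetical rule: lower-case, then strip one trailing "s" (but not "ss")."""
    w = word.strip().lower()
    # Keep short words such as "bus" intact and skip words like "class".
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def stem_value(value: str) -> str:
    """Stem every whitespace-separated token of a cell value."""
    return " ".join(naive_stem(token) for token in value.split())


values = ["car", "cars", "Cars"]  # illustrative values, not taken from the dataset
print(sorted({stem_value(v) for v in values}))  # ['car']
```

With this rule, "car", "cars" and "Cars" all collapse to "car". Other words, such as "buses" (which becomes "buse" while "bus" stays "bus"), would need more rules, which is why a proper stemming algorithm is preferable on real data.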
BLOX used in this mission:
- Basics/Start
- MyData/Transactions
- Database/Delete Column
- Language/Stem
- Database/Table
- Database/Save
Data
Please download the dataset used in this mission: Transactions.xlsx
Steps
Please refer to the Navigation Guide to perform the steps below.
Flow Builder:
Creating consistent forms of words
- Import the dataset from the Data section into the Module3 folder
- In Module3, create a new logic named Advanced Cleansing
- Drag and drop the BLOX that will be used in this mission (see the list above)
- Connect the "Start" BLOX to the "MyData" BLOX
- Connect the output of the "MyData" BLOX to the input of the "Delete Column" BLOX
- In "Delete Column", specify the column to delete (PRODUCTLINE in our case)
- Now, also connect the output of the "MyData" BLOX to a "Stem" BLOX and specify the column to transform (PRODUCTLINE in our case)
- Next, connect the outputs of "Stem" and "Delete Column" to a "Table" BLOX
- Finally, connect the output of the "Table" BLOX to the "Save" BLOX and specify the folder (Module3) and a name (preferably "CleanedData")
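The flow built in these steps can be mirrored in plain Python to make the data movement explicit. This is a sketch under stated assumptions: the rows and the ID and PRICE column names are hypothetical stand-ins for Transactions.xlsx, the stemming rule is a naive hypothetical one (not the one used by LOGIBLOX), the "Table" BLOX is assumed to join its two inputs row by row, and "Save" is imitated by writing CSV to an in-memory buffer.

```python
import csv
import io


def naive_stem(word: str) -> str:
    """Hypothetical rule: lower-case, strip one trailing "s" (not "ss")."""
    w = word.strip().lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def stem_value(value: str) -> str:
    """Stem each whitespace-separated token of a cell value."""
    return " ".join(naive_stem(token) for token in value.split())


def advanced_cleansing(rows: list[dict], column: str = "PRODUCTLINE") -> list[dict]:
    # "Delete Column" branch: every row without the target column.
    remaining = [{k: v for k, v in row.items() if k != column} for row in rows]
    # "Stem" branch: only the target column, stemmed.
    stemmed = [stem_value(row[column]) for row in rows]
    # "Table" step (assumed row-by-row join): re-attach the stemmed column.
    return [{**rest, column: value} for rest, value in zip(remaining, stemmed)]


def save_csv(rows: list[dict], fh) -> None:
    """Stand-in for the "Save" BLOX: write the rows as CSV to a file object."""
    if not rows:
        return
    writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample rows standing in for Transactions.xlsx.
rows = [
    {"ID": "1", "PRODUCTLINE": "Cars", "PRICE": "95.70"},
    {"ID": "2", "PRODUCTLINE": "car", "PRICE": "81.35"},
]
cleaned = advanced_cleansing(rows)
buffer = io.StringIO()  # a real file could be open("CleanedData.csv", "w", newline="")
save_csv(cleaned, buffer)
print(buffer.getvalue())
```

Splitting the data into a "Delete Column" branch and a "Stem" branch and then recombining them keeps every other column untouched while the PRODUCTLINE values are replaced by their stemmed forms. In this sketch the stemmed column ends up last; the column order produced by the actual Table BLOX may differ.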


