Mission 1 - Advanced Cleansing

Estimated time for completing this mission: 20 mins

Learning Objective

Get familiar with LOGIBLOX for data cleansing. In real-life datasets the same word often appears in different forms, so in this mission we will transform all instances of a word into a single, consistent form.

Background Information

In each mission you will use a provided dataset to eventually create predictions. In this mission we prepare the dataset for further use.

Scenario

Imagine a dataset that contains different products and their prices. Some product names are in the plural form and others are not (for example, "car" and "cars"). To use this data for classification, each product should have a single name.

More generally, this exercise aims to transform words into a consistent form or group. Your task is therefore to convert the values in the PRODUCTLINE column so that they no longer vary.
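
To make the idea concrete, here is a minimal Python sketch of collapsing singular and plural product names into one form. The `naive_stem` function is a hypothetical, deliberately simple stand-in written for illustration; it is not the algorithm used by the "Stem" BLOX, whose exact output may differ.

```python
def naive_stem(word: str) -> str:
    """Rough plural stripping (illustrative assumption, not the Stem BLOX logic)."""
    w = word.strip().lower()
    # "batteries" -> "battery"
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    # "cars" -> "car", but leave words like "glass" alone
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


# Made-up product names showing the kind of variation described above
products = ["car", "cars", "Cars", "motorcycle", "motorcycles"]
stems = [naive_stem(p) for p in products]

# After stemming, each product is represented by a single name
unique_stems = sorted(set(stems))
print(unique_stems)
```

Five differently written values reduce to two distinct names, which is the effect this mission aims for in the PRODUCTLINE column.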

BLOX used in this mission:

  • Basics/Start
  • MyData/Transactions
  • Database/Delete Column
  • Language/Stem
  • Database/Table
  • Database/Save

BLOX used for this mission

Data

Please download the dataset used in this mission: Transactions.xlsx

Steps

Please refer to the Navigation Guide to perform the steps below.

Flow Builder:

Creating consistent forms of words

  1. Import the dataset from the Data section into the Module4 folder.
  2. In Module3, create a new logic named Advanced Cleansing.
  3. Drag and drop the BLOX that will be used for this mission.
  4. Connect the "Start" BLOX to the "MyData" BLOX.
  5. Connect the output of the "MyData" BLOX to the input of the "Delete Column" BLOX.
  6. In "Delete Column", specify which column needs to be deleted (PRODUCTLINE in our case).
  7. Connect the output of the "MyData" BLOX to a "Stem" BLOX as well, and specify which column should be transformed (PRODUCTLINE in our case).
  8. Connect the outputs of "Stem" and "Delete Column" to a "Table" BLOX.
  9. Finally, connect the result of the "Table" BLOX to the "Save" BLOX and specify the folder (Module3) and the name (preferably "CleanedData").
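
For readers who think in code, the flow above can be sketched roughly in plain Python. This is one reading of the data flow, not how LOGIBLOX works internally: the sample rows, the ORDERNUMBER and PRICE column names, and the `naive_stem` helper are all assumptions made up for illustration, and the result is written to an in-memory CSV rather than to a LOGIBLOX folder.

```python
import csv
import io


def naive_stem(word):
    # Hypothetical stand-in for the "Stem" BLOX: lowercase, drop one trailing "s"
    w = word.strip().lower()
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


# "MyData": made-up rows, not the real contents of Transactions.xlsx
rows = [
    {"ORDERNUMBER": "1", "PRODUCTLINE": "Cars", "PRICE": "95.70"},
    {"ORDERNUMBER": "2", "PRODUCTLINE": "car", "PRICE": "81.35"},
    {"ORDERNUMBER": "3", "PRODUCTLINE": "Motorcycles", "PRICE": "94.74"},
]

# "Delete Column": keep every column except PRODUCTLINE
without_product = [
    {k: v for k, v in row.items() if k != "PRODUCTLINE"} for row in rows
]

# "Stem": the normalized PRODUCTLINE values
stemmed = [naive_stem(row["PRODUCTLINE"]) for row in rows]

# "Table": combine the two branches again, row by row
cleaned = [dict(rest, PRODUCTLINE=s) for rest, s in zip(without_product, stemmed)]

# "Save": write the cleaned table as CSV (here to an in-memory buffer)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(cleaned[0].keys()))
writer.writeheader()
writer.writerows(cleaned)
cleaned_csv = buffer.getvalue()
print(cleaned_csv)
```

The sketch shows why the original PRODUCTLINE column is deleted in one branch: the stemmed column replaces it when the two branches are joined in the table.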

Results and Summary

Final logic composition

Final table

Well done! Now let's move on to the next mission!