Mission 7 - Feature Selection

Estimated time for completing this mission: 20 mins

Learning Objective

Create a bar chart of the feature variables to see which ones have the most predictive power for the target variable.

Scenario

Imagine that you want to predict future sales with a time series predictor, but you would also like to include other variables to make the prediction more accurate. With the following function, you can decide which feature variables are worth including in your model.

Know-How Refresh

Feature selection is central to a good Machine Learning model. Choosing the right features (the variables you use to predict the target variable) matters for both classification and time series prediction. The target variable is the variable you want to predict.
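
As a rough illustration of what scoring features against a target can look like, the sketch below ranks each feature by the absolute Pearson correlation with the target and prints the result as a text bar chart. This is an assumption made for illustration only: the scoring method used by the Feature Finder BLOX is not described in this mission, and the feature names (AMOUNT, AGE, NOISE) and all values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, target):
    """Return (feature, score) pairs sorted by |correlation| with the target, highest first."""
    target_values = [row[target] for row in rows]
    scores = {}
    for name in rows[0]:
        if name == target:
            continue
        feature_values = [row[name] for row in rows]
        scores[name] = abs(pearson(feature_values, target_values))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: STATUS is the target, the other columns are invented features.
rows = [
    {"AMOUNT": 10, "AGE": 30, "NOISE": 5, "STATUS": 0},
    {"AMOUNT": 20, "AGE": 45, "NOISE": 1, "STATUS": 0},
    {"AMOUNT": 30, "AGE": 25, "NOISE": 4, "STATUS": 0},
    {"AMOUNT": 40, "AGE": 50, "NOISE": 2, "STATUS": 1},
    {"AMOUNT": 50, "AGE": 35, "NOISE": 6, "STATUS": 1},
    {"AMOUNT": 60, "AGE": 40, "NOISE": 3, "STATUS": 1},
]

ranking = rank_features(rows, "STATUS")

# Text bar chart: one row per feature, bar length proportional to the score.
for name, score in ranking:
    print(f"{name:<8} {'#' * round(score * 20)} {score:.2f}")
```

Correlation only captures linear relationships, so a feature with a low score here may still be useful to a non-linear model. Treat the ranking as a first filter rather than a final verdict.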

BLOX used in this mission:

  • Basics/Start x2
  • MyData/FinalData
  • MyData/Transactions
  • AI/Feature Finder x2
  • Basics/Display x2

BLOX used for this mission

Data

You will use the same cleaned dataset as for the regression, FinalData. You will also need the dataset Transactions.xlsx.

Steps

Please refer to the Navigation Guide to perform the steps below

  1. In the Module 3 folder, press the green plus button to create a new logic named Feature Selection
  2. Drag and drop the BLOX that will be used for this mission, including the dataset FinalData
  3. Next, connect the "Start" BLOX to the "MyData" BLOX
  4. Then, connect the "MyData" BLOX to a "Feature Finder" BLOX. There, specify the target variable (the variable you want to predict; use STATUS in this case) and the feature variables whose importance for the target variable you want to estimate. Feel free to experiment with different feature variables, or even a different target variable, to see how the results change
  5. Finally, connect the output of the "Feature Finder" BLOX to the "Display" BLOX and click the play button to execute the logic

Results and Summary

Final logic composition

Using STATUS as the target variable and all other features from FinalData (cleaned)

Using STATUS as the target variable and all other features from the original Transactions data (uncleaned)

There is a clear difference between the two figures. The same target and feature variables were used; however, the first figure shows the result on the cleaned dataset, whereas the second shows the result on the uncleaned dataset. This is why data pre-processing is essential when it comes to prediction.
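
One way uncleaned data can distort a feature's score is sketched below. It assumes, purely for illustration, that a missing AMOUNT was stored as the placeholder -999 and that importance is measured by absolute Pearson correlation. Neither assumption comes from the Transactions dataset or from the Feature Finder BLOX, and all values are invented.

```python
import math

MISSING = -999  # hypothetical placeholder for a missing value in the raw data


def abs_correlation(pairs):
    """Absolute Pearson correlation for a list of (feature, target) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation means no measurable linear relationship
    return abs(cov / math.sqrt(var_x * var_y))


# Invented (AMOUNT, STATUS) pairs; the fifth AMOUNT is missing in the raw data.
raw_pairs = [(10, 0), (20, 0), (30, 0), (40, 1), (MISSING, 1), (60, 1)]

# Minimal cleaning step: drop the rows that contain the placeholder.
clean_pairs = [(x, y) for x, y in raw_pairs if x != MISSING]

raw_score = abs_correlation(raw_pairs)
clean_score = abs_correlation(clean_pairs)

print(f"uncleaned: {raw_score:.2f}")
print(f"cleaned:   {clean_score:.2f}")
```

A single placeholder value pulls the mean and variance of the column far from the real data, so the same feature looks much less informative before cleaning than after.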

Well done! Now let's move on to the next mission!