Module 4: Advanced Data Science
Overview
After completing this module, you will be able to apply advanced data operations and build AI models and visualizations in LOGIBLOX. The module extends your data science skills with column-level transformation, regression-based prediction, and machine learning.
This module is structured into three progressive sections:
- Advanced Data Massaging - Master advanced data transformation techniques using columns as key structural elements
- Advanced Predictions - Learn regression analysis, including linear and non-linear approaches, as your first steps into machine learning
- Advanced AI - Build, train, and deploy AI models for time series forecasting and classification
What You'll Learn
Advanced Data Massaging - Column Excellence
- Fill columns with calculated values
- Manage time elements and date operations
- Transpose columns for data restructuring
- Find and replace values in columns
- Perform mathematical operations on columns
- Concatenate columns efficiently
Advanced Predictions - Regression Excellence
- Build and enhance regression analysis
- Create linear and non-linear regression models
- Generate advanced and insightful charts
- Manage predictions and interpret results
- Visualize correlations with heatmaps
Advanced AI - Machine Learning
- Build AI models from scratch
- Train time series forecasting models
- Train classification models
- Use AI for predictions and data completion
- Perform feature selection for optimal results
- Apply AI to real business problems
Mission Structure
Missions are hands-on exercises using real sales transaction data to build advanced data science and machine learning skills.
1. Advanced Data Massaging
2. Advanced Predictions
- Mission 3: Creating Correlation Heatmap
- Mission 4: Creating Linear Regression
- Mission 5: Modifying to Non-Linear Regression
- Mission 6: Creating Advanced Charts with Multiple Regressions
3. Advanced AI
- Mission 7: Feature Selection on the Dataset
- Mission 8: Training Time Series Model
- Mission 9: Using Prediction Model to Create Charts
- Mission 10: Training Classification Model
- Mission 11: Filling Empty Cells Using Classification Model
Dataset Used
Download the required dataset for this module:
- Transactions.xlsx - Sales transaction records
Understanding the Data
The Transactions dataset contains raw sales data for motorcycles and classic cars sold by a multinational company.
Sample Data:
| Order Number | Quantity | Price | Sales | Order Date | Status | Product Line | Customer Name | City | Country | Deal Size |
|---|---|---|---|---|---|---|---|---|---|---|
| 10107 | 30 | 95.7 | 2871 | 2/24/2003 | Shipped | Motorcycles | Land of Toys Inc. | NYC | USA | Small |
| 10121 | 34 | 81.35 | 2765.9 | 5/7/2003 | Shipped | Motorcycles | Reims Collectables | Reims | France | Small |
| 10134 | 41 | 94.74 | 3884.34 | 7/1/2003 | Shipped | Motorcycles | Lyon Souveniers | Paris | France | Medium |
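To get a feel for the columns before working with them in LOGIBLOX, the three sample rows can be inspected with a few lines of plain Python. This is only an illustrative sketch, not how LOGIBLOX processes the data. It rests on two assumptions that are not stated in the dataset description: that Sales equals Quantity × Price (true for these three rows, unverified for the rest of Transactions.xlsx) and that Order Date is written in month/day/year order. The dictionary keys and function names are hypothetical.

```python
from datetime import datetime

# The three sample rows shown above (subset of columns, hypothetical key names).
rows = [
    {"order": 10107, "quantity": 30, "price": 95.70, "sales": 2871.00, "order_date": "2/24/2003"},
    {"order": 10121, "quantity": 34, "price": 81.35, "sales": 2765.90, "order_date": "5/7/2003"},
    {"order": 10134, "quantity": 41, "price": 94.74, "sales": 3884.34, "order_date": "7/1/2003"},
]


def sales_is_consistent(row):
    """Return True when Sales matches Quantity x Price to within half a cent."""
    return abs(row["quantity"] * row["price"] - row["sales"]) < 0.005


def parse_order_date(text):
    """Parse an Order Date string, ASSUMING month/day/year order."""
    return datetime.strptime(text, "%m/%d/%Y").date()


for row in rows:
    # A calculated column (consistency flag) and a proper date value per row.
    row["consistent"] = sales_is_consistent(row)
    row["parsed_date"] = parse_order_date(row["order_date"])
    print(row["order"], row["consistent"], row["parsed_date"].isoformat())
```

A row that fails the consistency check would be a candidate for the cleansing steps practiced in the Advanced Data Massaging missions.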
Key Characteristics:
- Sales transactions across multiple years (2003-2005)
- Global customer base (USA, France, Norway, etc.)
- Product information (motorcycles, classic cars)
- Order status tracking
- Price and quantity data
- Customer and location details
- Some columns require transformation and completion
Data Challenges:
- Formatting issues to resolve
- Missing information to fill
- Requires cleansing and transformation
- Well suited to regression and classification models once cleaned
Preparation
To prepare for Module 4, create a new project folder named Module4 in LOGIBLOX.
Learning Path
This module builds upon the foundational data science skills from Module 3. The missions are designed to be completed sequentially, progressively introducing more advanced techniques.
By the end of this module, you'll be proficient in:
✓ Advanced column operations and transformations
✓ Time series data manipulation
✓ Correlation analysis with heatmaps
✓ Linear and non-linear regression modeling
✓ Feature selection for machine learning
✓ Training time series forecasting models
✓ Training classification models
✓ Using AI for predictions and data completion
✓ Creating advanced visualizations
Key Concepts
| Concept | Description |
|---|---|
| Regression Analysis | Predicting continuous values based on historical patterns |
| Time Series Forecasting | Predicting future values based on temporal data |
| Classification | Categorizing data into predefined classes |
| Feature Selection | Identifying the most important variables for modeling |
| Heatmap | Visualizing correlations between variables |
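Two of these concepts, correlation and linear regression, can be sketched in plain Python to show what the numbers mean. The example below is a minimal illustration under stated assumptions, not the method LOGIBLOX uses internally. It takes Quantity and Sales from the three sample rows in the Understanding the Data section, which is far too little data for a meaningful model; the function names are hypothetical.

```python
from math import sqrt

# Quantity and Sales from the three sample rows (illustration only).
quantity = [30, 34, 41]
sales = [2871.00, 2765.90, 3884.34]


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient: one cell of a correlation heatmap."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


r = pearson_r(quantity, sales)
slope, intercept = fit_line(quantity, sales)


def predict_sales(q):
    """Predict a continuous Sales value for a given Quantity."""
    return slope * q + intercept


print(f"correlation: {r:.3f}")
print(f"predicted sales for quantity 38: {predict_sales(38):.2f}")
```

A heatmap repeats the correlation calculation for every pair of numeric columns and colors each cell by its value, while a non-linear regression replaces the straight line with a curve fitted to the same data.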
Ready to begin? Start with the Business Scenario to understand the context, then proceed to Mission 1!