Know-How

This section provides the information you need to understand and complete Module 2 - Basic Data Preparation, which covers basic data preparation using only LOGIBLOX tools. The tools are designed to make your tasks as easy and comprehensible as possible.

Guide to dataset preparation

Introduction

Data preparation is the process of gathering, combining, structuring and organizing data so it can be used in business intelligence (BI), analytics and data visualization applications.

The components of data preparation include data preprocessing, profiling, cleansing, validation and transformation; it often also involves pulling together data from different internal systems and external sources.
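
To make these components concrete, here is a minimal sketch in plain Python (standard library only). It is not a LOGIBLOX workflow; the sample records, field names and helper functions (`profile`, `cleanse`, `is_valid`, `transform`) are all hypothetical and only illustrate how profiling, cleansing, validation and transformation might follow one another.

```python
# Hypothetical sample records; the field names are illustrative only.
raw_rows = [
    {"customer": " Alice ", "amount": "120.50"},
    {"customer": "bob", "amount": ""},        # missing amount
    {"customer": "Carol", "amount": "abc"},   # amount is not a number
    {"customer": "Dave", "amount": "75"},
]


def profile(rows):
    """Profiling: count empty values per field."""
    missing = {}
    for row in rows:
        for field, value in row.items():
            if value.strip() == "":
                missing[field] = missing.get(field, 0) + 1
    return missing


def cleanse(rows):
    """Cleansing: trim whitespace and normalise name capitalisation."""
    return [
        {"customer": r["customer"].strip().title(), "amount": r["amount"].strip()}
        for r in rows
    ]


def is_valid(row):
    """Validation: the amount must parse as a number."""
    try:
        float(row["amount"])
        return True
    except ValueError:
        return False


def transform(row):
    """Transformation: convert the amount from text to a float."""
    return {"customer": row["customer"], "amount": float(row["amount"])}


missing_counts = profile(raw_rows)
cleaned = cleanse(raw_rows)
prepared = [transform(r) for r in cleaned if is_valid(r)]

print(missing_counts)
print(prepared)
```

In this sketch, rows that fail validation are simply dropped; a real project might instead route them to a review queue or fill in default values.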

Purpose

One of the primary purposes of data preparation is to ensure that raw data being readied for processing and analysis is accurate and consistent so the results of BI and analytics applications will be valid. Data is commonly created with missing values, inaccuracies or other errors, and separate data sets often have different formats that need to be reconciled when they're combined. Correcting data errors, validating data quality and consolidating data sets are big parts of data preparation projects.
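
As a rough illustration of reconciling formats when data sets are combined, the following plain-Python sketch maps two differently shaped extracts onto one common layout and flags records with missing values. The source names (`crm_rows`, `billing_rows`), the field names and the date formats are assumptions made up for this example, not part of any real system or of the LOGIBLOX tools.

```python
from datetime import datetime

# Hypothetical extracts from two systems that describe the same kind of record
# but use different field names and date formats.
crm_rows = [
    {"id": "1", "signup": "2023-01-15", "email": "a@example.com"},
]
billing_rows = [
    {"customer_id": "2", "signup_date": "15/02/2023", "email": ""},
]


def from_crm(row):
    """Map a CRM record onto the common layout (ISO-style dates)."""
    return {
        "id": int(row["id"]),
        "signup": datetime.strptime(row["signup"], "%Y-%m-%d").date(),
        "email": row["email"] or None,  # empty string becomes an explicit None
    }


def from_billing(row):
    """Map a billing record onto the common layout (day/month/year dates)."""
    return {
        "id": int(row["customer_id"]),
        "signup": datetime.strptime(row["signup_date"], "%d/%m/%Y").date(),
        "email": row["email"] or None,
    }


# Consolidate both sources into one list with a single, consistent format.
combined = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]

# Flag records whose email is missing so they can be corrected later.
needs_review = [r["id"] for r in combined if r["email"] is None]

print(combined)
print(needs_review)
```

Converting each source in its own small function keeps the format-specific logic in one place, so adding a further source only requires one more mapping function.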

Challenges

Data scientists often complain that they spend most of their time gathering, cleansing and structuring data instead of analyzing it. A big benefit of an effective data preparation process is that data scientists and other end users can focus more on data mining and data analysis, the parts of their job that generate business value.

For example, with an effective process in place, data preparation can be done more quickly, and prepared data can be fed automatically to users for recurring analytics applications.

Procedure

Done properly, data preparation also helps an organization do the following:

  • ensure the data used in analytics applications produces reliable results
  • identify and fix data issues that otherwise might not be detected
  • enable more informed decision-making by business executives and operational workers
  • reduce data management and analytics costs
  • avoid duplication of effort in preparing data for use in multiple applications
  • get a higher ROI from BI and analytics initiatives

Effective data preparation is particularly beneficial in big data environments that store a combination of structured, semi-structured and unstructured data, often in raw form until it's needed for specific analytics uses.

Those uses include predictive analytics, machine learning (ML) and other forms of advanced analytics that typically involve large amounts of data to prepare.

Source: https://www.techtarget.com/searchbusinessanalytics/definition/data-preparation