An overview of the data science project life cycle
The development cycle of a data science project differs from a traditional software development cycle. Although development methods and practices vary from organization to organization, most organizations follow similar processes. One well-known process is the Cross-Industry Standard Process for Data Mining (CRISP-DM), a summary of which is presented in this blog post.
The life cycle of data science
The life cycle of a data science project is divided into six phases.
Business understanding – Understanding the business context and the short- and long-term goals
Data understanding – Assessing the availability, quality, and quantity of data
Data preparation – Preparing the right data sets, features, and data projects for use in the model
Modeling – Selecting the right modeling techniques, algorithms, and frameworks
Evaluation – Evaluating the model through benchmarking and metrics
Deployment – Deploying the final model
The chart below shows the life cycle of a typical data science project.
Figure: Data science project life cycle
Business understanding
In this phase, the team works to understand the business requirements and goals. The phase covers assessment, planning, and the definition of governance patterns and success criteria.
Data understanding
At this stage, data is retrieved and examined. Data understanding can include exploratory data analysis, data visualization, and evaluating the quality and quantity of data.
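As a rough illustration of what checking the quality and quantity of data can look like, here is a minimal sketch using only the Python standard library. The sample records, the column names, and the `profile` helper are all hypothetical; a real project would more likely use a data-frame library on far more data.

```python
from statistics import mean, median

# Hypothetical sample records; None marks a missing value.
records = [
    {"age": 34, "income": 52000.0, "churned": 0},
    {"age": None, "income": 61000.0, "churned": 1},
    {"age": 45, "income": None, "churned": 0},
    {"age": 29, "income": 48000.0, "churned": 1},
]

def profile(rows):
    """Return the row count plus per-column missing counts and basic statistics."""
    columns = sorted({key for row in rows for key in row})
    report = {"rows": len(rows), "columns": {}}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        report["columns"][col] = {
            "missing": len(values) - len(present),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
            "mean": mean(present) if present else None,
            "median": median(present) if present else None,
        }
    return report

summary = profile(records)
for name, stats in summary["columns"].items():
    print(name, stats)
```

The missing-value counts speak to quality, while the row count speaks to quantity; both feed into the decision of whether the data can support the project at all.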
Data preparation
The data preparation phase is one of the most important stages in the life cycle of a data science project. Activities performed during this phase include identifying the right data sets, data cleaning, grading, and data and feature engineering.
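The sketch below shows three common preparation steps on a toy data set: filling in missing values, deriving a new feature, and rescaling a column. Median imputation and min-max scaling are illustrative choices rather than steps prescribed by CRISP-DM, and the column names and helper functions are hypothetical.

```python
from statistics import median

def impute_median(rows, column):
    """Fill missing values (None) in `column` with the median of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def min_max_scale(rows, column):
    """Rescale `column` to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [{**r, column: (r[column] - lo) / span} for r in rows]

# Hypothetical raw data with one missing age.
raw = [
    {"age": 30, "income": 40000.0},
    {"age": None, "income": 60000.0},
    {"age": 50, "income": 80000.0},
]

# Data cleaning: replace the missing age with the median age.
clean = impute_median(raw, "age")
# Feature engineering: derive a new feature from existing ones.
clean = [{**r, "income_per_year_of_age": r["income"] / r["age"]} for r in clean]
# Scaling: bring income into a comparable range.
prepared = min_max_scale(clean, "income")
```

Each helper returns new rows instead of modifying its input, so the raw data stays available for comparison or for a different preparation strategy.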
Modeling
This is one of the most exciting phases in the life cycle. Data sets are usually split into training, validation, and test sets, and the algorithm to be used is chosen. Models are repeatedly built and evaluated, and the results of different models are interpreted against the success and test criteria. This is an iterative phase that continues until the results reach the desired benchmark.
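To make the split-and-iterate loop concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the synthetic data, the 60/20/20 split, the one-parameter threshold "model", and the 0.95 benchmark. It is not a method the CRISP-DM process itself specifies.

```python
import random

def split(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle rows and split them into training, validation, and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def accuracy(threshold, rows):
    """Fraction of rows where the rule `x >= threshold` matches the label."""
    hits = sum((x >= threshold) == bool(label) for x, label in rows)
    return hits / len(rows)

def fit(rows, step):
    """'Train' a toy model: pick the threshold (a multiple of `step`)
    with the best accuracy on the training rows."""
    return max(range(0, 101, step), key=lambda t: accuracy(t, rows))

# Synthetic data: the label is 1 when x is at least 50.
data = [(x, int(x >= 50)) for x in range(100)]
train_set, val_set, test_set = split(data)

TARGET = 0.95  # hypothetical benchmark for validation accuracy
history = []
for step in (40, 20, 10, 5):  # progressively finer candidate models
    model = fit(train_set, step)
    val_score = accuracy(model, val_set)
    history.append((step, model, val_score))
    if val_score >= TARGET:
        break  # stop iterating once the benchmark is reached

# The test set is used once, after model selection is finished.
test_score = accuracy(model, test_set)
```

The validation set drives the iteration, while the test set is held back until the end so that the final score is not influenced by the model selection.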
Evaluation
The evaluation phase focuses on evaluating the model against business objectives. It differs from the evaluation carried out during modeling, which was a technical assessment of the model. The overall evaluation includes metrics that validate and measure the success criteria defined earlier.
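One way to picture a business-level evaluation is as a checklist of success criteria compared against measured results. The metric names, thresholds, and measurements in this sketch are hypothetical; in practice they would come from the business understanding phase and from a pilot or back-test.

```python
# Hypothetical success criteria agreed during business understanding.
# Each entry maps a metric name to (comparison, threshold).
criteria = {
    "recall": (">=", 0.80),
    "false_positive_rate": ("<=", 0.10),
    "monthly_cost_usd": ("<=", 5000),
}

# Hypothetical measurements from a pilot run.
measured = {"recall": 0.86, "false_positive_rate": 0.12, "monthly_cost_usd": 4200}

def evaluate(measured, criteria):
    """Return a dict mapping each metric to True if its criterion is met."""
    checks = {}
    for name, (op, threshold) in criteria.items():
        value = measured[name]
        checks[name] = value >= threshold if op == ">=" else value <= threshold
    return checks

results = evaluate(measured, criteria)
approved = all(results.values())
print(results, "approved:", approved)
```

In this made-up example the model meets the recall and cost criteria but misses the false-positive target, so it would go back for another iteration even though its technical accuracy might look acceptable.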
Deployment
At this stage, the model is deployed and put to use. Machine learning models are typically integrated with products and applications, which can be web, desktop, or mobile applications. Models are also increasingly deployed directly on devices as edge computing becomes more widely adopted.
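Integration with an application often comes down to two things: shipping the trained model as an artifact, and exposing a prediction call the application can use. The sketch below assumes a toy threshold model serialized as JSON and a hypothetical request format with a single field `x`; real deployments typically rely on a model-serving framework and a proper web server.

```python
import json

def export_model(threshold):
    """Serialize the (hypothetical) trained model's parameters to a JSON string."""
    return json.dumps({"model": "threshold_rule", "version": 1, "threshold": threshold})

def load_model(payload):
    """Rebuild a prediction function from serialized parameters."""
    params = json.loads(payload)
    threshold = params["threshold"]
    return lambda x: int(x >= threshold)

def handle_request(predict, body):
    """Handle a JSON request such as '{"x": 72}' and return a JSON response."""
    features = json.loads(body)
    return json.dumps({"prediction": predict(features["x"])})

# The training side exports the artifact; the application side loads and serves it.
artifact = export_model(50)
predict = load_model(artifact)
response = handle_request(predict, '{"x": 72}')
print(response)
```

Keeping the artifact separate from the serving code means the same model can be loaded by a web service, a desktop application, or an on-device runtime without retraining.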
Summary
This article is based on the CRISP-DM process. There are other well-known processes for data science and data mining projects, such as SEMMA and Knowledge Discovery in Databases (KDD). With the widespread adoption of agile and scaled agile approaches, most of these data science life cycle processes are customized to meet specific business needs, with an emphasis on iterative and incremental development and on visibility.
Reference
www.datascience-pm.com/