CRISP-DM vs. Agile - Complementary PM Frameworks

For those with a background in data science, CRISP-DM is a familiar process. It is still often cited as the most popular framework for managing data science projects. In the broader field of technology, where software engineering and artificial intelligence meet, other project management paradigms prevail, in particular Agile/Scrum. Here I will share some important details about each and highlight what they have in common. If you are well-versed in CRISP-DM, your skills apply to Agile as well.


For the purpose of this blog post, I will use the terms "predictive model," "artificial intelligence," "machine learning," and even "data mining" interchangeably. This could be confusing. In general, however, the topic is any setting where data is mined and prior observations (or even synthetic data) are used to predict future events, regardless of the algorithm(s).

Before we delve into each, one point I want to make is that the goals of CRISP-DM and Agile can differ. The goal of CRISP-DM is to deliver a predictive model into a deployment environment, while the goal of Agile is to develop a piece of software or a product (Krisolis, 2022). However, as AI is embedded into more and more software applications, these are often no longer separate endeavors.



CRISP-DM

History

CRISP-DM stands for the Cross Industry Standard Process for Data Mining. It was conceived in 1996, and the effort was originally led by five companies: ISL (which was later acquired by SPSS, itself later acquired by IBM), OHRA, Daimler AG, Teradata, and NCR Corporation. In 1999 it was published as a step-by-step data mining guide. SPSS built CRISP-DM into its software workbench, most notably Clementine, which later became SPSS Modeler and, after the IBM acquisition, IBM SPSS Modeler.

Model

There are six stages in the CRISP-DM model: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.
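
To make the ordering concrete, here is a minimal Python sketch of the six stages. The names `Stage`, `next_stage`, and `revisit` are hypothetical, made up for illustration. The sketch only encodes the idea that the stages have a nominal forward order while still allowing a step back to any earlier stage when new findings call for it.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """The six CRISP-DM stages, numbered in their nominal order (illustrative names)."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that nominally follows `current`, or None after Deployment."""
    if current is Stage.DEPLOYMENT:
        return None
    return Stage(current + 1)


def revisit(current: Stage, earlier: Stage) -> Stage:
    """Step back to an earlier stage, e.g. when the data cannot support the stated goal."""
    if earlier >= current:
        raise ValueError(f"{earlier.name} does not come before {current.name}")
    return earlier


# Example: Data Understanding shows the goal is not feasible, so reframe it and move on.
stage = revisit(Stage.DATA_UNDERSTANDING, Stage.BUSINESS_UNDERSTANDING)
print(stage.name, "->", next_stage(stage).name)  # prints: BUSINESS_UNDERSTANDING -> DATA_UNDERSTANDING
```
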

Business Understanding
Business Understanding is the first stage. This is where you meet with stakeholders and domain subject matter experts to determine deliverables. During this stage, a business question is turned into a quantitative target that can be addressed with a predictive model. For example, a company worried about a shrinking customer base might call for a customer churn model. It is also in this stage that the criteria for success (what a good model will look like) and the evaluation metrics are specified.
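
As an illustration of turning a business question into a quantitative target, the sketch below assumes a hypothetical churn definition (no purchase in the 90 days before a cutoff date) and a hypothetical success criterion (recall of at least 0.70). Both numbers are placeholders that would in practice be agreed with stakeholders, not recommendations, and the function names are my own.

```python
from datetime import date, timedelta

# Hypothetical churn definition agreed with stakeholders: a customer has churned
# if their last purchase falls more than 90 days before the cutoff date.
CHURN_WINDOW = timedelta(days=90)

# Hypothetical success criterion written into the project plan.
SUCCESS_CRITERION = {"metric": "recall", "minimum": 0.70}


def churn_label(last_purchase: date, cutoff: date, window: timedelta = CHURN_WINDOW) -> int:
    """Turn 'did we lose this customer?' into a binary target: 1 = churned, 0 = active."""
    return int(cutoff - last_purchase > window)


def meets_success_criterion(metrics: dict[str, float]) -> bool:
    """Check a model's evaluation metrics against the agreed success criterion."""
    return metrics[SUCCESS_CRITERION["metric"]] >= SUCCESS_CRITERION["minimum"]


cutoff = date(2023, 6, 1)
print(churn_label(date(2023, 1, 15), cutoff))  # 137 days without a purchase -> 1
print(churn_label(date(2023, 5, 10), cutoff))  # 22 days without a purchase -> 0
```
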

The outcome of this stage is a project plan. Often this project plan looks like a Statement of Work (SOW), in which the model, the analytics team, the software being used, the time frame to completion, constraints, checkpoints, final deliverables, and acceptable metrics are spelled out with stakeholders.
Data Understanding
This is an important foundational stage. In this stage, all data that pertains to the quantitative question defined during Business Understanding is catalogued and described, and exploratory data analysis is performed on all of it. If you specified a tool in the initial project plan, this is the phase in which you load your data into that tool. All surface and summary features of the data are provided in a report, and any missing data is noted. It is also important to note whether the data meets the requirements of the stated modeling goal. Not every customer has the data maturity to meet the goal stated in Business Understanding, and this may only become apparent in Data Understanding. The stages are iterative, not discrete: you can be in one phase, realize that something needs to be changed in a previous stage, move backwards, and then move forwards again.
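
A first pass at the summary report described above can be as simple as counting missing values and computing basic statistics per column. This stdlib-only sketch assumes the data arrives as a list of dictionaries with `None` marking a missing value; the `profile` function and the column names in the example are hypothetical. In practice a tool such as pandas would typically do this work.

```python
import statistics
from typing import Any


def profile(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarise each column: row count, missing count, and min/max/mean if numeric."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # absent keys also count as missing
        present = [v for v in values if v is not None]
        summary: dict[str, Any] = {"n": len(values), "missing": len(values) - len(present)}
        is_numeric = present and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        if is_numeric:
            summary.update(min=min(present), max=max(present), mean=statistics.fmean(present))
        report[col] = summary
    return report


customers = [
    {"tenure_months": 12, "plan": "basic"},
    {"tenure_months": None, "plan": "pro"},
    {"tenure_months": 30, "plan": None},
]
for column, summary in profile(customers).items():
    print(column, summary)
```
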
Data Preparation
In this stage, data is raised, through transformations and cleaning, to the quality required for modeling. This stage, while not glamorous, often takes 60-80% of the project effort. It can include missing data imputation, feature engineering, case weighting, and similar work, as well as reformatting data, such as dates and times, to meet merging or software requirements. The documentation for this stage must include the rationale for all of these choices, including data selection and every transformation, so that they are reproducible.
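
The sketch below applies three of the transformations mentioned above (median imputation, date reformatting, and a simple engineered feature) and records each step in a log so the choices can be documented and reproduced. The column names (`customer_id`, `signup`, `monthly_spend`), the `MM/DD/YYYY` input format, and the choice of median imputation are assumptions for illustration, not the only reasonable options.

```python
import statistics
from datetime import date, datetime


def prepare(rows: list[dict], cutoff: date, log: list[str]) -> list[dict]:
    """Clean and transform raw customer rows, appending a description of each step to `log`."""
    # Step 1: median imputation for missing monthly spend (assumed column name).
    observed = [r["monthly_spend"] for r in rows if r["monthly_spend"] is not None]
    median_spend = statistics.median(observed)
    log.append(
        f"monthly_spend: filled {len(rows) - len(observed)} missing value(s) with median {median_spend}"
    )

    prepared = []
    for r in rows:
        # Step 2: parse US-style date strings so they merge cleanly with ISO-dated tables.
        signup = datetime.strptime(r["signup"], "%m/%d/%Y").date()
        prepared.append({
            "customer_id": r["customer_id"],
            "signup": signup.isoformat(),
            "monthly_spend": median_spend if r["monthly_spend"] is None else r["monthly_spend"],
            # Step 3: engineered feature, tenure in days as of the cutoff date.
            "tenure_days": (cutoff - signup).days,
        })
    log.append("signup: parsed MM/DD/YYYY strings into ISO 8601 dates")
    log.append(f"tenure_days: days from signup to {cutoff.isoformat()}")
    return prepared


raw = [
    {"customer_id": 1, "signup": "01/15/2022", "monthly_spend": 20.0},
    {"customer_id": 2, "signup": "03/01/2022", "monthly_spend": None},
    {"customer_id": 3, "signup": "06/30/2022", "monthly_spend": 40.0},
]
steps: list[str] = []
for row in prepare(raw, date(2023, 1, 1), steps):
    print(row)
print("\n".join(steps))
```

Keeping the log alongside the output makes it straightforward to paste the exact transformations into the stage documentation.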
Modeling
This is generally one of the shortest phases of CRISP-DM. Here an algorithm, a series of algorithms, or an ensemble of algorithms is applied to the prepared data to answer the quantitative question posed in Business Understanding. Every modeling algorithm makes assumptions about the data, for example whether there can be missing observations or whether all attributes are normally distributed. So, for example, you may choose binomial logistic regression for customer churn after comparing it to other classification algorithms against the acceptable metrics. To create a test design, that is, to be able to evaluate the model, it is customary to separate the data into training and test samples, building the model on the training set and measuring its performance on the test set.
Evaluation
A model is evaluated in at least two ways: 1) Did it meet the business criteria or stated goal for which it was designed? 2) How does it perform on the chosen metrics? These depend on the type of model and could include ROC curves, confusion matrices, gain and lift charts, etc. You also review the documentation to see how well the process went. Did it meet the SOW? Was it delivered within the time frame and on budget? Did it meet the stated goals and needs? Finally, does the documentation make the work repeatable?
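
Several of the metrics mentioned above can be computed in a few lines. This sketch, with hypothetical function names, builds a confusion matrix, computes ROC AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count half), and computes lift in the top-scored fraction of cases, which is one point on a lift chart. The labels, scores, and 0.5 threshold in the example are made up.

```python
def confusion_matrix(y_true: list[int], y_pred: list[int]) -> dict[str, int]:
    """Count true/false positives and negatives for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift_at(y_true: list[int], scores: list[float], fraction: float) -> float:
    """Positive rate among the top-scored `fraction` of cases divided by the overall rate."""
    ranked = [t for _, t in sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)]
    k = max(1, round(len(ranked) * fraction))
    return (sum(ranked[:k]) / k) / (sum(y_true) / len(y_true))


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold
cm = confusion_matrix(y_true, y_pred)
print(cm)
print("precision:", cm["tp"] / (cm["tp"] + cm["fp"]), "recall:", cm["tp"] / (cm["tp"] + cm["fn"]))
print("ROC AUC:", roc_auc(y_true, scores), "lift in top 25%:", lift_at(y_true, scores, 0.25))
```
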
Deployment
In CRISP-DM the last step is to put the model into deployment. This often touches several departments. Once the model is running in the data infrastructure, its output may be delivered interactively as a dashboard, a visualization, or an app. IT and Information Security will also need to authorize these updates.

AGILE


Like CRISP-DM, Agile project management is iterative in nature, and products are developed incrementally throughout the project. For those who have used CRISP-DM, Agile is an easy leap to make. Like CRISP-DM, Agile focuses not just on the final product but also on the findings throughout the process. A key difference is that in Agile, a team is put together at the beginning of the project and works together throughout. In CRISP-DM, experts contribute throughout the project, but often the data scientist is the only consistent member and relies on others as needed, such as a data engineer or a subject matter expert. These other members are brought in only at particular stages (Krisolis, 2022).




There are two popular schools of project management: Agile and Waterfall. Waterfall moves forward in discrete stages, each starting only after the previous one ends, which makes it the least like CRISP-DM. Agile, however, integrates well with CRISP-DM (Thurber, 2020).

Agile’s stages include: 1) Requirements Gathering; 2) Designing the Requirements; 3) Construction/Iteration; 4) Testing/Quality Assurance; 5) Deployment; and 6) Feedback. As can be seen, these iterative stages have similarities to CRISP-DM. Like CRISP-DM, the Agile philosophy builds feedback, including customer feedback, into the cycle. It also values intermediate input from inductive results, such as those one gets from data mining (Thurber, 2020).


Another important point regarding the complementarity of CRISP-DM and Agile is that many software implementations either integrate AI solutions or exist for the express purpose of delivering them. The software may be driven by AI or machine learning, or it may be a dashboard displaying model results to decision makers. CRISP-DM stages can be integrated within Agile stages as part of the software delivery project (Thurber, 2020). For those familiar with CRISP-DM, Agile is an intuitive leap to make.

References:

Hughey, D. (2009). Comparing Traditional Systems Analysis and Design with Agile Methodologies. University of Missouri, St. Louis. https://www.umsl.edu/~hugheyd/is6840/agile.html Retrieved April 4, 2023.
Krisolis (2022). CRISP-DM vs Agile: The Face Off. Krisolis Blog. https://krisolis.ie/crisp-dm-vs-agile-the-face-off/ Retrieved April 1, 2023.
Thurber, M. (2020). A Holistic Framework for Managing Data Analytics Projects. Elder Research Blog. https://www.elderresearch.com/blog/a-holistic-framework-for-managing-data-analytics-projects/ Retrieved April 1, 2023.

