1. Problem Description and Objective

We aim to anticipate equipment failure in 75 medical devices using multivariate longitudinal telemetry data. We develop and evaluate a predictive model, document our approach, and implement an early warning system that forecasts failures up to 10 days in advance.

1.1. Objective

Anticipate equipment failure using multivariate panel (longitudinal) time series data from device telemetry.

1.2. Problem Statement

Predict whether a failure will occur on the next day (t+1) based on sensor readings up to day t.
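
As a minimal sketch of this problem statement, the next-day target can be built by shifting the failure indicator one day back within each machine. The column names (machine_id, day, failure) and the toy values are assumptions made for illustration; they are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy panel; column names and values are assumptions.
df = pd.DataFrame({
    "machine_id": [1, 1, 1, 1, 2, 2, 2],
    "day":        [1, 2, 3, 4, 1, 2, 3],
    "failure":    [0, 0, 1, 0, 0, 0, 1],
})

df = df.sort_values(["machine_id", "day"]).reset_index(drop=True)

# The target on day t is the failure status on day t+1 of the same machine.
# Grouping by machine keeps one device's label from leaking into another's.
df["failure_next_day"] = df.groupby("machine_id")["failure"].shift(-1)

# The last observed day of each machine has no known t+1, so it stays NaN
# and is dropped before training.
labeled = df.dropna(subset=["failure_next_day"]).astype({"failure_next_day": int})
print(labeled)
```

Rows whose next day is unobserved are dropped rather than filled with 0, since a missing outcome is not evidence of "no failure".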

2. Content of the Case Study

The case study is organized as follows:

I. Exploratory Data Analysis (EDA)

Preprocess raw data
Conduct univariate analysis
Perform bivariate analysis: sensor readings vs. failure status (e.g., violin plots)
Analyze correlations
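
The bivariate and correlation steps above might look like the following sketch, which gives the numeric counterpart of a violin plot (per-class summary statistics) and a sensor correlation matrix. The columns sensor_1, sensor_2, and failure and the toy values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions.
df = pd.DataFrame({
    "sensor_1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "sensor_2": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
    "failure":  [0, 0, 0, 0, 1, 1],
})

# Bivariate view: distribution summary of one sensor by failure status.
by_status = df.groupby("failure")["sensor_1"].agg(["mean", "median", "std"])

# Pairwise Pearson correlations between the sensor columns.
corr = df[["sensor_1", "sensor_2"]].corr()

print(by_status)
print(corr)
```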

II. Feature Engineering

Shift failure column backward to simulate early warning (proactive labeling)
Create lag features to reflect past states
Calculate rolling statistics (e.g., 7-day mean, std)
(Bonus) Engineer a binary label indicating whether a failure occurs within the next 10 days
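
The lag and rolling-statistic steps above can be sketched as follows. The helper name add_lag_and_rolling, the column names, and the toy values are hypothetical; the toy example uses a 3-day window for brevity, while the function defaults to the 7-day window mentioned in the outline.

```python
import numpy as np
import pandas as pd

def add_lag_and_rolling(df, sensor, lags=(1, 2), window=7):
    """Add per-machine lag and rolling features for one sensor column."""
    df = df.sort_values(["machine_id", "day"]).copy()
    grouped = df.groupby("machine_id")[sensor]

    # Lag features: the reading k days earlier on the same machine.
    for k in lags:
        df[f"{sensor}_lag{k}"] = grouped.shift(k)

    # Rolling statistics over the current and previous (window - 1) days.
    # The window includes day t, which is allowed when predicting t+1.
    df[f"{sensor}_roll{window}_mean"] = grouped.transform(
        lambda s: s.rolling(window, min_periods=1).mean()
    )
    df[f"{sensor}_roll{window}_std"] = grouped.transform(
        lambda s: s.rolling(window, min_periods=2).std()
    )
    return df

# Hypothetical toy panel; column names and values are assumptions.
toy = pd.DataFrame({
    "machine_id": [1, 1, 1, 1, 2, 2, 2],
    "day":        [1, 2, 3, 4, 1, 2, 3],
    "sensor_1":   [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0],
})
out = add_lag_and_rolling(toy, "sensor_1", lags=(1,), window=3)
print(out)
```

Computing every feature inside a groupby on the machine identifier keeps the first days of one machine from borrowing readings from the last days of another.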

III. Model Development

Perform time-aware, group-respecting train-test split (by machine and day)
Assess and handle class imbalance if necessary
Train supervised learning models:
- Random Forest
- XGBoost
- LSTM
Evaluate using:
- F1-score
- ROC AUC
- Precision & Recall
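
One possible reading of the split, training, and evaluation steps above is sketched below with scikit-learn's Random Forest. The data is synthetic, the column names and the cutoff day are assumptions, and class_weight="balanced" is just one option for handling class imbalance; this is not the notebook's actual pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

# Synthetic stand-in for the engineered panel; all names are assumptions.
rng = np.random.default_rng(0)
rows = []
for machine_id in range(1, 6):
    for day in range(1, 41):
        label = int(day % 7 == 0)                    # artificial failure pattern
        sensor = 2.0 * label + rng.normal(0.0, 0.5)  # noisy signal tied to label
        rows.append((machine_id, day, sensor, label))
df = pd.DataFrame(rows, columns=["machine_id", "day", "sensor_1", "failure_next_day"])

# Time-aware split: each machine contributes its early days to training and
# its later days to testing, so no test row precedes any training row.
cutoff_day = 30
train = df[df["day"] <= cutoff_day]
test = df[df["day"] > cutoff_day]

features = ["sensor_1"]
model = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
model.fit(train[features], train["failure_next_day"])

y_test = test["failure_next_day"]
pred = model.predict(test[features])
proba = model.predict_proba(test[features])[:, 1]  # probability of class 1

metrics = {
    "f1": f1_score(y_test, pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, proba),
    "precision": precision_score(y_test, pred, zero_division=0),
    "recall": recall_score(y_test, pred, zero_division=0),
}
print(metrics)
```

If the goal is instead to test generalization to unseen devices, whole machines can be held out, for example with scikit-learn's GroupShuffleSplit using the machine identifier as the group.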

IV. Bonus: Early Warning Model

Adjust the target to predict failure within the next 10 days
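
A minimal sketch of such a target, assuming one row per machine per consecutive day: the label on day t is 1 if any failure occurs on days t+1 through t+10. The function name label_within_horizon and the column names are hypothetical, and the toy example uses a 2-day horizon so the result is easy to check by hand.

```python
import pandas as pd

def label_within_horizon(df, horizon=10):
    """Flag rows where a failure occurs within the next `horizon` days."""
    df = df.sort_values(["machine_id", "day"]).copy()

    def future_max(s):
        # Reverse the series so a backward-looking rolling window covers
        # future rows; shift by one row to exclude day t itself.
        rev = s.iloc[::-1]
        fut = rev.shift(1).rolling(horizon, min_periods=1).max()
        return fut.iloc[::-1]

    col = f"failure_within_{horizon}d"
    df[col] = df.groupby("machine_id")["failure"].transform(future_max)
    return df

# Hypothetical toy panel; column names and values are assumptions.
toy = pd.DataFrame({
    "machine_id": [1] * 6,
    "day":        [1, 2, 3, 4, 5, 6],
    "failure":    [0, 0, 0, 0, 1, 0],
})
out = label_within_horizon(toy, horizon=2)
print(out)
```

The last day of each machine has no future observations, so its label is NaN and should be dropped before training. Days near the end of the record see a shorter window than the full horizon, which may also justify trimming them.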

V. Outlook

What needs to be done to obtain the most optimized and robust model possible?

3. Complete case study in the Git repository

The full case study, including code and data, is available on GitHub for easy access and replication:

Jupyter Notebook: Contains the complete analysis workflow with detailed explanations and all Python code.
Dataset: Provided for download to allow you to reproduce the results and explore further.

🔗 Download both the notebook and dataset from our GitHub repository:
https://github.com/3dStatisticalLearning/predictive_maintenance_medical_device.git

3 D Statistical Learning

We help businesses and researchers solve complex challenges by providing expert guidance in statistics, machine learning, and tailored education.

Our core services include:

– Statistical Consulting:
Comprehensive consulting tailored to your data-driven needs.

– Training and Coaching:
In-depth instruction in statistics, machine learning, and the use of statistical software such as SAS, R, and Python.

– Reproducible Data Analysis Pipelines:
Development of documented, reproducible workflows using SAS macros and customized R and Python code.

– Interactive Data Visualization and Web Applications:
Creation of dynamic visualizations and web apps with R (Shiny, Plotly), Python (Streamlit, Dash by Plotly), and SAS (SAS Viya, SAS Web Report Studio).

– Automated Reporting and Presentation:
Generation of automated reports and presentations using Markdown and Quarto.

– Scientific Data Analysis:
Advanced analytical support for scientific research projects.