Statistical Learning Dr. D. Djeudeu

Case Study: Predictive Model for Customer Churn (Kuendigungsmodell)

Overview: The Kuendigungsmodell (Customer Churn Predictive Model) aims to forecast customer attrition from historical data collected by a service provider. By learning the patterns that typically precede a cancellation, the model makes it possible to identify at-risk customers early and target them with retention measures.

Dataset: The dataset combines demographic, transactional, and behavioral attributes recorded for each customer over the years 2019 to 2021. Each row represents a unique customer, and the columns contain features describing the customer's profile, behavior, and engagement.

Variable Descriptions:

  1. Anonym: Anonymized customer identifier.
  2. BPNR: Customer’s unique identification number.
  3. Geschlecht: Gender of the customer.
  4. Alter (13.12.2021): Age of the customer as of December 13, 2021.
  5. Nationalität: Customer’s nationality.
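  6. DB I 2019: Value of the attribute DB I for 2019 (in German business usage, "DB I" commonly abbreviates Deckungsbeitrag I, i.e. contribution margin I).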
  7. Anzahl HMGs 2019: Number of HMGs recorded for the customer in 2019.
  8. Anzahl HMGs Klasse 1-7 2019: Number of HMGs in each of the classes 1 to 7 in 2019.
  9. PLZ 2019: Postal code of the customer in 2019.
  10. Ort 2019: City or locality of the customer in 2019.
  11. Bundesland 2019: Federal state of the customer in 2019.
  12. Regierungsbezirk 2019: Administrative region of the customer in 2019.
  13. Kreis 2019: District of the customer in 2019.
  14. Gemeinde 2019: Municipality of the customer in 2019.
  15. Land 2019: Country of residence of the customer in 2019.
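  16. VART Eingang Kündigung oder 31.12.2021: The customer's service type (VART) at the date the cancellation notice was received or, if no cancellation was received, on December 31, 2021.
  17. VersStatus Eingang Kündigung oder 31.12.2021: Status of the customer relationship (VersStatus) at the date the cancellation notice was received or, if no cancellation was received, on December 31, 2021.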
  18. VART-Wechsel 2021: Change of service type (VART) in 2021.
  19. VART-Wechsel 2021 zu 0101: Change of service type in 2021 to VART code 0101.
  20. Längste VART in 2020: Service type (VART) held for the longest period during 2020.
  21. Tage längste VART in 2020: Number of days in 2020 spent in that longest-held service type.
  22. Längste VART in 2021: Service type (VART) held for the longest period during 2021.
  23. Tage längste VART in 2021: Number of days in 2021 spent in that longest-held service type.
  24. VersJahre: Number of years the customer has been with the provider (membership duration).
  25. Eingang Kündigung 2019-2021: Indicators of whether a cancellation notice was received in each of the years 2019, 2020, and 2021; these are the churn indicators.
  26. AU-Tage 2020, KG-Tage 2020, AU-Tage 2021, KG-Tage 2021: Days of certified incapacity for work (AU, Arbeitsunfähigkeit) and days of sickness-benefit payments (KG, Krankengeld) in 2020 and 2021.
  27. Mind. 1 Leistung GB500 2020, Mind. 1 Leistung GB500 2021: Indicators of whether the customer received at least one service (Leistung) of type GB500 in 2020 and in 2021, respectively.
  28. Wirtschaftszweig 2021: Customer’s industry sector in 2021.
  29. Branche 2021, Branche 2021 (Text): Industry code and its text description in 2021.
  30. Anzahl Kontakte 2021 (YO48, YO49): Count of customer interactions in 2021.
  31. DMP-Teilnahme 2020, DMP-Teilnahme 2021: Participation in disease management programs for 2020 and 2021.
  32. Bonus- und Prämienprogramm 2020, Bonus- und Prämienprogramm 2021: Enrollment in bonus and reward programs for 2020 and 2021.
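The churn target has to be derived from the yearly cancellation indicators (item 25). The following is a minimal sketch, assuming hypothetical column names `Eingang Kündigung 2019`, `Eingang Kündigung 2020` and `Eingang Kündigung 2021` coded as 1 (cancellation received) or 0; the real export may name or code these columns differently. Which year or years define the target is a modeling choice, exposed here through the hypothetical `target_years` parameter.

```python
# Hypothetical column names and 0/1 coding for the yearly cancellation indicators.
CANCELLATION_COLUMNS = {
    2019: "Eingang Kündigung 2019",
    2020: "Eingang Kündigung 2020",
    2021: "Eingang Kündigung 2021",
}


def churn_label(record, target_years=(2021,)):
    """Return 1 if a cancellation was received in any of the target years, else 0.

    A missing value (None) is treated as 'no cancellation received'.
    """
    return int(any(record.get(CANCELLATION_COLUMNS[year]) == 1 for year in target_years))


def churn_rate(records, target_years=(2021,)):
    """Share of customers labelled as churned for the given target years."""
    labels = [churn_label(r, target_years) for r in records]
    return sum(labels) / len(labels)


# Invented toy records, for illustration only.
customers = [
    {"Anonym": "A1", "Eingang Kündigung 2019": 0, "Eingang Kündigung 2020": 0, "Eingang Kündigung 2021": 1},
    {"Anonym": "A2", "Eingang Kündigung 2019": 0, "Eingang Kündigung 2020": 1, "Eingang Kündigung 2021": 0},
    {"Anonym": "A3", "Eingang Kündigung 2019": 0, "Eingang Kündigung 2020": 0, "Eingang Kündigung 2021": None},
    {"Anonym": "A4", "Eingang Kündigung 2019": 0, "Eingang Kündigung 2020": 0, "Eingang Kündigung 2021": 0},
]

print([churn_label(c) for c in customers])               # [1, 0, 0, 0]
print(churn_rate(customers, target_years=(2020, 2021)))  # 0.5
```

If 2021 cancellations define the target, columns keyed to the cancellation date (items 16 and 17) are worth checking for target leakage before they are used as features.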

Preprocessing: The dataset undergoes extensive preprocessing, including data cleaning, variable transformation, and handling missing values to ensure data quality and integrity.
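Two of the steps named above, handling missing values and transforming variables, can be sketched with the Python standard library: median imputation (with a flag marking that a value was missing) for numeric columns and one-hot encoding for categorical columns. The column selection, the gender codes `w`/`m` and the toy records are assumptions for illustration; the actual preprocessing of the case study is not specified. Medians and category levels should ideally be learned from the training records only and then reused on the test records, so that no test-set information leaks into the transformation.

```python
from statistics import median

# Hypothetical feature selection; the case study's actual columns may differ.
NUMERIC = ["Alter (13.12.2021)", "VersJahre", "AU-Tage 2020"]
CATEGORICAL = ["Geschlecht", "Bundesland 2019"]


def fit_preprocessor(records):
    """Learn imputation medians and category levels from (training) records."""
    medians = {}
    for col in NUMERIC:
        observed = [r[col] for r in records if r.get(col) is not None]
        medians[col] = median(observed) if observed else 0.0
    levels = {
        col: sorted({r[col] for r in records if r.get(col) is not None})
        for col in CATEGORICAL
    }
    return medians, levels


def transform(record, medians, levels):
    """Turn one customer record into a numeric feature vector."""
    row = []
    for col in NUMERIC:
        value = record.get(col)
        # Median imputation plus a 0/1 flag recording that the value was missing.
        row.append(float(medians[col] if value is None else value))
        row.append(1.0 if value is None else 0.0)
    for col in CATEGORICAL:
        value = record.get(col)
        # One-hot encoding; missing or unseen categories map to all zeros.
        row.extend(1.0 if value == level else 0.0 for level in levels[col])
    return row


# Invented toy records; None marks a missing value.
train = [
    {"Alter (13.12.2021)": 34, "VersJahre": 5, "AU-Tage 2020": 12, "Geschlecht": "w", "Bundesland 2019": "NRW"},
    {"Alter (13.12.2021)": 51, "VersJahre": None, "AU-Tage 2020": 0, "Geschlecht": "m", "Bundesland 2019": "Bayern"},
    {"Alter (13.12.2021)": None, "VersJahre": 12, "AU-Tage 2020": 3, "Geschlecht": "w", "Bundesland 2019": "NRW"},
]

medians, levels = fit_preprocessor(train)
print(transform(train[1], medians, levels))
# [51.0, 0.0, 8.5, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0]
```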

Model Development: After preprocessing, the data is split into training and test sets. Several algorithms, such as logistic regression, decision trees, and ensemble methods, are then trained on the training set to build candidate predictive models.
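The split-and-fit step can be sketched for one of the named algorithms, logistic regression, trained here by full-batch gradient descent on the log-loss. The sketch uses only the Python standard library; in practice a library such as scikit-learn (`train_test_split` with its `stratify` argument, `LogisticRegression`) would usually be used instead. The split is stratified so that churners and non-churners appear in both sets in the same proportion, which matters when churners are a minority. The hyperparameters (`lr`, `epochs`, `test_fraction`, `seed`) and the one-feature toy data are illustrative assumptions; precision, recall and F1 for the churn class are computed on the held-out test set.

```python
import math
import random


def stratified_split(X, y, test_fraction=0.25, seed=42):
    """Split (X, y) so that each class keeps roughly the same share in train and test."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, target in enumerate(y) if target == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def subset(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return subset(train_idx), subset(test_idx)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Minimise the mean log-loss with full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Classify as churn (1) when the predicted probability reaches the threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold) for xi in X]


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (churn) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented toy data: one standardised feature, churn when the feature is positive.
X = [[k / 10] for k in range(-20, 21) if k != 0]
y = [1 if xi[0] > 0 else 0 for xi in X]

(X_train, y_train), (X_test, y_test) = stratified_split(X, y)
w, b = fit_logistic(X_train, y_train)
print(precision_recall_f1(y_test, predict(X_test, w, b)))
```

Decision trees or ensemble methods would replace `fit_logistic` and `predict`, while the stratified split and the test-set metrics stay the same, which keeps the candidate models directly comparable.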

Evaluation and Deployment: The models are evaluated using performance metrics like accuracy, precision, recall, and F1-score. Once a