Statistical Learning Dr. D. Djeudeu

This project analyzes the Workplace Survey 2022 dataset, focusing on four variables: gender, age, educational background, and employment status. Using chi-square tests, Wilcoxon rank-sum tests, and linear regression models, the study tests hypotheses about two relationships, one between educational qualifications and employment status and one between age and workforce participation, while accounting for gender differences. Supported by data preprocessing and visualization, the analysis aims to provide insights that can inform organizational decision-making.

Introduction: The Workplace Survey 2022 dataset comprises responses from 1,169 participants across 87 variables. The variables of interest here are gender, age, educational qualifications, and employment status. The study looks for patterns, trends, and associations among these variables to inform strategic decision-making in organizations.

Variables Used:

  1. Gender (Categorical): Represents the gender identity of participants (e.g., male, female).
  2. Age (Continuous): Denotes the age of participants in years.
  3. Educational Background (Categorical): Indicates the highest educational qualification attained by participants (e.g., high school, bachelor’s degree, master’s degree).
  4. Employment Status (Categorical): Describes the current employment situation of participants (e.g., employed, unemployed, student).
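
As a minimal sketch of how these four variables might be represented and cross-tabulated: the study itself was carried out in R, so the Python below is illustrative only. The `Respondent` type, the `contingency_table` helper, and the sample records are hypothetical and are not taken from the survey data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Respondent:
    """One survey response, reduced to the four variables of interest."""
    gender: str       # categorical
    age: float        # continuous, in years
    education: str    # categorical: highest qualification attained
    employment: str   # categorical: current employment situation


def contingency_table(respondents, row_attr, col_attr):
    """Cross-tabulate two categorical attributes.

    Returns (row_labels, col_labels, counts), where counts[i][j] is the
    number of respondents with row_labels[i] and col_labels[j].
    """
    counts = Counter(
        (getattr(r, row_attr), getattr(r, col_attr)) for r in respondents
    )
    row_labels = sorted({key[0] for key in counts})
    col_labels = sorted({key[1] for key in counts})
    # Counter returns 0 for combinations that never occur.
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


# Made-up records for illustration only; NOT values from the dataset.
sample = [
    Respondent("female", 34, "bachelor's degree", "employed"),
    Respondent("male", 22, "high school", "student"),
    Respondent("female", 41, "master's degree", "employed"),
    Respondent("male", 29, "bachelor's degree", "unemployed"),
]

rows, cols, table = contingency_table(sample, "education", "employment")
```

A table of this shape is the input that a mosaic plot or a chi-square test of independence would work from.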

Statistical Methods:

  1. Descriptive Statistics: Utilized to provide insights into the distribution and characteristics of the dataset. Tabular summaries, frequency distributions, and graphical representations are employed to explore demographic trends.
  2. Hypothesis Formulation: Hypotheses are formulated to investigate relationships between variables, for instance the association between educational qualifications and employment status and the relationship between age and workforce participation, while considering gender differentials.
  3. Inferential Statistics:
    • Chi-square Tests: Applied to examine the association between educational qualifications and employment status.
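    • Wilcoxon Rank-Sum Tests: Utilized to compare the age distributions of employed and unemployed individuals. The test is rank-based, so it detects a shift in location between the two groups rather than a difference in mean age.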
    • Linear Regression Models: Deployed to explore the relationship between age and employment status while controlling for gender.
  4. Visualization Techniques: Mosaic plots and box plots stratified by gender and employment status are used to represent relationships and trends within the dataset visually.
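
The chi-square and Wilcoxon rank-sum tests listed above can be sketched in a few lines. The study itself used R, where `chisq.test` and `wilcox.test` provide these tests; the Python below is only an illustrative, standard-library sketch, and the function names `chi_square_independence`, `midranks`, and `rank_sum_test` are hypothetical. It computes the Pearson chi-square statistic and its degrees of freedom, but not the p-value, which needs a chi-square distribution function. For the rank-sum test it uses a tie-corrected normal approximation without continuity correction, which is an assumption about the variant and not necessarily the one used in the study.

```python
import math
from collections import Counter
from statistics import NormalDist


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def midranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via a normal approximation.

    Returns (U, z, p), where U is the Mann-Whitney statistic of sample x.
    Assumes the pooled sample is not entirely tied.
    """
    combined = list(x) + list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = sum(midranks(combined)[:n1])
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Ties reduce the variance of U; tie_term is 0 when all values differ.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p


# Made-up inputs for illustration only; NOT values from the survey.
stat, dof = chi_square_independence([[10, 20], [20, 10]])
u, z, p = rank_sum_test([23, 31, 31, 40], [28, 35, 44, 52, 52])
```

In this sketch the statistic returned by `chi_square_independence` would be compared against a chi-square distribution with `dof` degrees of freedom. For small samples, an exact rank-sum p-value would generally be preferable to the normal approximation.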

Conclusion: The analysis yields valuable insights into workforce dynamics, demographic trends, and educational landscapes within the surveyed population. By leveraging statistical methods and visualization techniques, organizations can make informed decisions regarding resource allocation, talent management, and strategic planning initiatives.

Acknowledgments:

Special thanks to the contributors of the Workplace Survey 2022 dataset. The data analysis and documentation in this study were carried out with R and LaTeX.