Statistical Learning Dr. D. Djeudeu

Overview: This use case analyzes health data to explore how environmental exposures and demographic factors relate to health outcomes, using logistic and linear regression. The primary focus is the association between greenness exposure, measured by the Normalized Difference Vegetation Index (NDVI), and blood pressure levels, adjusted for potential confounding variables.

Variables of Interest:

  1. Outcome Variable:

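    • Blood Pressure: Systolic and diastolic blood pressure are the primary outcome variables. They are analyzed as continuous measurements in the linear regression and as a binary status (normal versus abnormal) in the logistic regression.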
  2. Exposure Variable:

    • Greenness Exposure (NDVI): The Normalized Difference Vegetation Index (NDVI) is the primary exposure variable. It ranges from -1 to 1, with higher values indicating denser green vegetation in the surrounding environment.
  3. Covariates:

    • Demographic Factors: Age, gender, and possibly other demographic variables are included as covariates to control for their influence on blood pressure.
    • Risk Factors: Other potential risk factors for high blood pressure, such as lifestyle factors or medical history, may also be included as covariates.

Key Analyses:

  1. Logistic Regression Analysis:

    • Estimation of Odds Ratios (OR) and their 95% Confidence Intervals (CI) to assess the association between greenness exposure and the odds of abnormal versus normal blood pressure.
    • Interpretation of the Odds Ratios to determine the magnitude and direction of this association: with abnormal blood pressure as the modeled event, an OR below 1 indicates lower odds of abnormal blood pressure at higher greenness exposure, and an OR above 1 indicates higher odds.
  2. Linear Regression Analysis:

    • Investigation of the linear relationship between greenness exposure and continuous blood pressure measurements (systolic and diastolic).
    • Examination of regression coefficients to quantify the change in blood pressure associated with a one-unit change in greenness exposure, holding the covariates constant.
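
The two analyses above can be written out as models. The block below is a sketch under assumed notation, not taken from the course material: Y_i = 1 denotes abnormal blood pressure for person i, BP_i stands for either systolic or diastolic pressure, the covariate set is limited to age and sex for brevity, and the interval shown is the usual Wald interval.

```latex
\begin{align*}
\operatorname{logit} P(Y_i = 1)
  &= \beta_0 + \beta_1\,\mathrm{NDVI}_i + \beta_2\,\mathrm{age}_i + \beta_3\,\mathrm{sex}_i, \\
\widehat{\mathrm{OR}}_{\mathrm{NDVI}}
  &= e^{\hat\beta_1}, \qquad
  95\%\ \mathrm{CI}:\ \exp\bigl(\hat\beta_1 \pm 1.96\,\mathrm{SE}(\hat\beta_1)\bigr), \\
\mathrm{BP}_i
  &= \gamma_0 + \gamma_1\,\mathrm{NDVI}_i + \gamma_2\,\mathrm{age}_i + \gamma_3\,\mathrm{sex}_i + \varepsilon_i .
\end{align*}
```

Here the odds ratio compares the odds of abnormal blood pressure between two people who differ by one unit of NDVI but share the same covariate values, and \gamma_1 is the corresponding adjusted difference in mean blood pressure.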

SAS Code Implementation:

  1. Part 1: Data Preparation:

    • Loading and preprocessing of the health dataset, including merging relevant variables and creating derived variables if needed.
    • Data cleaning, transformation, and variable manipulation to ensure data quality and compatibility with the regression analyses.
    • Handling missing values and outliers as necessary.
  2. Part 2: Regression Analysis:

    • Use of SAS procedures such as PROC LOGISTIC for logistic regression and PROC REG for linear regression to perform the analyses described above.
    • Interpretation of the regression output, including regression coefficients, standard errors, p-values, and confidence intervals.
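
The data preparation and regression steps above could be sketched in SAS roughly as follows. This is a minimal illustration rather than the course's actual program. The input dataset work.health_raw and the variables ndvi, sbp, dbp, age, and sex are hypothetical names. Coding sex as 'F'/'M' is an assumption, and so are the 140/90 cut-off used to define abnormal blood pressure and the rescaling of NDVI to 0.1-unit steps.

```sas
/* All dataset and variable names below are hypothetical placeholders. */

/* Part 1: data preparation */
data health;
    set work.health_raw;                 /* assumed input dataset */

    /* Complete-case handling: drop records with a missing exposure,
       outcome, or covariate value */
    if nmiss(ndvi, sbp, dbp, age) > 0 or missing(sex) then delete;

    /* Binary outcome for the logistic model (1 = abnormal).
       The 140/90 threshold is an assumed definition. */
    abnormal_bp = (sbp >= 140 or dbp >= 90);

    /* Rescale NDVI so that one unit corresponds to a 0.1 increase */
    ndvi_01 = ndvi * 10;

    /* Numeric indicator for PROC REG, which has no CLASS statement;
       assumes sex is stored as the character values 'F' and 'M' */
    female = (sex = 'F');
run;

/* Part 2a: logistic regression, odds ratios with 95% Wald limits */
proc logistic data=health;
    class sex / param=ref;
    model abnormal_bp(event='1') = ndvi_01 age sex / clodds=wald;
run;

/* Part 2b: linear regression for systolic and diastolic pressure;
   CLB requests 95% confidence limits for the coefficients */
proc reg data=health;
    systolic:  model sbp = ndvi_01 age female / clb;
    diastolic: model dbp = ndvi_01 age female / clb;
run;
quit;
```

With this rescaling, the odds ratio and the regression coefficients for ndvi_01 refer to a 0.1 increase in NDVI, which is often easier to interpret than a full unit on a scale that only spans -1 to 1. Outlier handling is left out of the sketch because it depends on the dataset.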

Document Analysis: The analysis also involves reviewing documents that contain statistical summaries, regression outputs, and interpretations of the key findings on the relationship between greenness exposure and blood pressure outcomes.

Conclusion: This use case aims to improve understanding of the relationship between environmental exposures and blood pressure levels, and thereby to inform public health strategies and interventions that promote cardiovascular health.