3 D Statistical Learning

Statistical Test Insights: Key and Common Tips for Accurate Analysis and Inference

Summary

Statistical tests are essential tools in research and data analysis, helping us draw meaningful conclusions from data. We explore some key and common knowledge and tips related to statistical tests. First, we point out the importance of developing a research question, stating hypotheses, and collecting relevant data. Understanding how the p-value differs between one-tail and two-tail tests is crucial for interpreting results accurately. We also highlight the role of the standard deviation in reflecting data spread, and discuss confidence intervals and their sensitivity to factors like sample size and bias. We delve into the assumptions underlying parametric tests, such as normal distribution of data, and introduce non-parametric alternatives like the Mann-Whitney U and Kruskal-Wallis tests. Additionally, we explore the concepts of confidence level, hypothesis testing, correlation, and different types of t-tests. Finally, we touch on Fisher’s exact test for nominal categorical data. Armed with this knowledge, researchers can more confidently select appropriate statistical tests for their analyses and obtain reliable and valid outcomes.

Key Knowledge and Tips

  • To conduct proper research, develop a research question, decide which variables are required to answer it, state the hypotheses, collect data for those variables, and analyze the data

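  • For analyses of exactly the same data point values for a variable, the p-value for a one-tail test will generally be half that of a two-tail test, provided the test statistic has a symmetric distribution and the observed effect lies in the hypothesized direction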

  • The standard deviation simply reflects the spread of the sample data point values for a numerical variable; it says nothing about the shape of their distribution, which might be quite skewed

  • Confidence intervals for a mean are typically symmetric around the point estimate. The lower the confidence level, the less of the area under the curve is included, and therefore the closer the lower and upper values move to the point estimate

  • Confidence intervals are affected by many factors, including sample size, bias in subject selection, major differences between the population from which the sample subjects were taken and the population to which the results are inferred, and cherry-picking or discarding data point values

  • Confidence intervals reflect the potential error in relating the sample point estimate to the true population parameter for that variable. This is necessarily so due to the vagaries of research: the sample does not include the whole population, bias can and does creep in, and the population from which the sample was taken does not always truly reflect the population to which the results are inferred

  • Usual assumption for the use of a parametric test: the sample data point values for the variable come from a population in which that variable is normally distributed

  • The Mann-Whitney U test is a non-parametric test, used when the data point values from a sample group of participants are taken from a population in which that variable is not normally distributed

  • ANOVA is used to compare the means of more than two groups

  • The Mann-Whitney U test is also suited to ordinal variables, where the underlying population cannot be assumed to have that variable normally distributed

  • The Kruskal-Wallis test is a non-parametric statistical test used to compare the distributions of two or more independent groups

  • The Kruskal-Wallis test is an extension of the Mann-Whitney U test, which is used to compare two groups

  • The Kruskal-Wallis test is suitable when the assumptions of normality and equal variances required by parametric tests like ANOVA cannot be met, or when dealing with ordinal or non-normally distributed data.

  • A confidence level of 95% typically means that if the study were repeated many times, about 95% of the studies would have the true population mean between the lower and upper values that each study would provide, with those values differing from study to study

  • In clinical research by hypothesis testing, the p-value is calculated by constructing the probability density of the test statistic under the assumption that the null hypothesis is true

  • Correlation testing through linear regression cannot prove causation, even if such causation does exist

  • t-Test assuming equal variances (Student’s t-test) compares the means of two independent groups for a numerical variable, where the underlying populations have that variable normally distributed with equal variances

  • t-Test assuming unequal variances (Welch’s t-test) compares the means of two independent groups for a numerical variable, where the underlying populations have that variable normally distributed but the variances may differ

  • Paired-sample t-test compares two related measurements of a numerical variable (for example, before and after values on the same subjects), where the paired differences are normally distributed in the underlying population

  • Fisher’s exact test is appropriate for comparing proportions of nominal categorical data type values, particularly when the counts are small
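The tips on one-tail versus two-tail p-values and on confidence levels can be illustrated with a small sketch. The sample values and the null mean below are made up for illustration, and a normal (z) approximation is assumed for simplicity; with a sample this small, a t distribution would normally be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample values, invented purely for illustration.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 5.0, 5.7, 5.3, 5.4]
null_mean = 5.0  # hypothesised population mean under the null hypothesis

sample_mean = mean(sample)
standard_error = stdev(sample) / sqrt(len(sample))

# Test statistic under a normal approximation (an assumption of this sketch).
z = (sample_mean - null_mean) / standard_error
std_normal = NormalDist()

p_one_tail = 1 - std_normal.cdf(z)             # alternative: mean > null_mean
p_two_tail = 2 * (1 - std_normal.cdf(abs(z)))  # alternative: mean != null_mean


def confidence_interval(level):
    """Symmetric normal-approximation interval around the sample mean."""
    z_crit = std_normal.inv_cdf(0.5 + level / 2)
    margin = z_crit * standard_error
    return sample_mean - margin, sample_mean + margin


ci_90 = confidence_interval(0.90)
ci_95 = confidence_interval(0.95)

print(f"one-tail p = {p_one_tail:.5f}, two-tail p = {p_two_tail:.5f}")
print(f"90% CI = {ci_90}, 95% CI = {ci_95}")
```

Because the observed mean lies on the hypothesised side of the null value, the one-tail p-value here is half the two-tail one, and the 90% interval sits inside the 95% interval.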
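The repeated-studies reading of a 95% confidence level can be checked with a simulation. This is a minimal sketch under stated assumptions: the population mean, standard deviation, sample size, and number of simulated studies are all hypothetical, the population is taken to be normal, and the interval uses a normal approximation.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_MEAN, TRUE_SD = 100.0, 15.0     # hypothetical population parameters
SAMPLE_SIZE, N_STUDIES = 50, 2000    # hypothetical study design
z_crit = NormalDist().inv_cdf(0.975)  # two-sided critical value for 95%

covered = 0
for _ in range(N_STUDIES):
    # Each simulated study draws its own sample and builds its own interval.
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    centre = mean(sample)
    margin = z_crit * stdev(sample) / sqrt(SAMPLE_SIZE)
    if centre - margin <= TRUE_MEAN <= centre + margin:
        covered += 1

coverage = covered / N_STUDIES
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to 0.95. Each study produces different lower and upper values, yet roughly 95% of those intervals contain the true mean.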
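The test-selection tips above can be condensed into a small lookup. The function below is a hypothetical helper written for this summary, not part of any library; its name and parameters are assumptions, and it covers only the tests named in the tips.

```python
def suggest_test(data_type, n_groups, normal=True, paired=False,
                 equal_variances=True):
    """Map a study design to one of the tests named in the tips.

    Hypothetical helper: data_type is "nominal", "ordinal" or "numerical".
    """
    if data_type == "nominal":
        return "Fisher's exact test"
    if paired:
        if data_type == "numerical" and normal and n_groups == 2:
            return "Paired-sample t-test"
        return "not covered by the tips above"
    if data_type == "numerical" and normal:
        if n_groups == 2:
            return "Student's t-test" if equal_variances else "Welch's t-test"
        if n_groups > 2 and equal_variances:
            return "ANOVA"
    # Ordinal data, non-normal data, or unequal variances across 3+ groups.
    if data_type in ("ordinal", "numerical"):
        if n_groups == 2:
            return "Mann-Whitney U test"
        if n_groups > 2:
            return "Kruskal-Wallis test"
    return "not covered by the tips above"


print(suggest_test("numerical", 2, normal=True, equal_variances=False))
print(suggest_test("ordinal", 3))
```

A lookup like this is only a starting point; the assumptions behind each test still need to be checked against the actual data.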
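To show what Fisher’s exact test computes for a 2x2 table of nominal counts, here is a minimal sketch using only the standard library. The counts are hypothetical, and the two-sided rule used (summing every table no more likely than the observed one, with the margins fixed) is one common convention, assumed here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is as likely as, or less likely than, the observed.
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)  # small tolerance for float ties
    )


# Hypothetical counts: rows are two groups, columns are outcome yes / no.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"Two-sided p-value: {p_value:.4f}")
```

For these made-up counts the p-value is 400/11440, about 0.035. No normal approximation is involved, which is why the test remains usable with small counts.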

If you have any questions or comments about the tips provided above, or if there’s anything you would like to discuss with Dany Djeudeu, please feel free to reach out.