Statistical Learning Dr. D. Djeudeu

Statistical Test Insights: Key and Common Tips for Accurate Analysis and Inference

Summary

Statistical tests are essential tools in research and data analysis, helping us draw meaningful conclusions from data. Here we explore key knowledge and common tips related to statistical tests. First, we point out the importance of developing a research question, stating hypotheses, and collecting relevant data. Understanding the distinction between one-tailed and two-tailed p-values is crucial for interpreting results accurately. We also highlight the role of the standard deviation in reflecting data spread, the interpretation of confidence intervals, and their sensitivity to factors like sample size and bias. We delve into the assumptions underlying parametric tests, such as normality of the underlying population, and introduce non-parametric alternatives like the Mann-Whitney U and Kruskal-Wallis tests. Additionally, we explore the concepts of confidence level, hypothesis testing, correlation, and the different types of t-tests. Finally, we touch on Fisher’s exact test for nominal categorical data. Armed with this knowledge, researchers can confidently select appropriate statistical tests for their analyses, ensuring reliable and valid outcomes.

Key Knowledge and Tips

  • For conducting proper research, you should develop a research question, decide which variables are required to answer it, state the hypotheses, and collect and analyze data for those variables

  • For analyses of exactly the same data point values for a variable, the p-value of a one-tailed test will generally be half that of the corresponding two-tailed test (see the one-tailed vs. two-tailed sketch after this list)

  • The standard deviation simply reflects the spread of the sample data point values for a numerical variable; those values might themselves be quite skewed

  • Confidence intervals are symmetric around the point estimate, and the lower the confidence level, the less of the area under the curve is included; the lower and upper bounds therefore creep closer to the point estimate (see the confidence-level sketch after this list)

  • Confidence intervals are subject to many factors, including sample size, bias in subject selection, major differences between the population from which the sample subjects were taken and the population to which the results are inferred, and cherry-picking data point values while discarding others

  • Confidence intervals reflect the potential error in relating the sample point estimate to the true population parameter for that variable. This is necessarily so due to the vagaries of research, where the sample does not include the whole population, where bias can and does creep in, and where the population from which the sample was taken does not always truly reflect the population to which the results are inferred

  • Usual assumption for the use of a parametric test: the sample data point values for the variable are drawn from a population in which that variable is normally distributed

  • The Mann-Whitney U test is a non-parametric test, used when the data point values from a sample group of participants are not taken from a population in which that variable is normally distributed (see the Mann-Whitney U sketch after this list)

  • ANOVA is used to compare the means of more than two groups (see the ANOVA vs. Kruskal-Wallis sketch after this list)

  • The Mann-Whitney U test is a test for ordinal variables from a sample in which the underlying population does not have that variable normally distributed

  • The Kruskal-Wallis test is a non-parametric statistical test used to compare the distributions of two or more independent groups

  • The Kruskal-Wallis test is an extension of the Mann-Whitney U test, which is used to compare two groups

  • The Kruskal-Wallis test is suitable when the assumptions of normality and equal variances required by parametric tests like ANOVA cannot be met, or when dealing with ordinal or non-normally distributed data.

  • A confidence level of 95% typically means that, if a study were repeated many times, about 95% of the studies would have the true population mean between the different lower and upper values that each study would provide (see the coverage simulation after this list)

  • In clinical research, hypothesis testing is carried out by constructing the probability density of the test statistic given a true null hypothesis, from which the p-value is calculated (see the null-distribution sketch after this list)

  • Correlation testing through linear regression cannot prove causation, even if such causation does exist (see the regression sketch after this list)

  • The t-test assuming equal variances (Student’s t-test) is a test for numerical variables from two independent samples in which the underlying populations have that variable normally distributed with equal variances

  • The t-test assuming unequal variances (Welch’s t-test) is a test for numerical variables from two independent samples in which the underlying populations have that variable normally distributed, without assuming equal variances

  • The paired-sample t-test is a test for numerical variables measured twice on the same subjects, in which the underlying population has that variable (more precisely, the paired differences) normally distributed (see the t-test variants sketch after this list)

  • Fisher’s exact test is appropriate for dealing with proportions of nominal categorical data values (see the Fisher’s exact sketch after this list)
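
Illustrative Code Sketches

The sketches below illustrate several of the tips above in Python, using NumPy and SciPy (SciPy 1.6 or later is assumed for the alternative keyword). All data, seeds, and group sizes are made up purely for demonstration; these are minimal illustrations, not definitive implementations.

One-tailed vs. two-tailed p-values. A minimal sketch of the halving relationship: when the observed effect lies in the tested direction, the one-tailed p-value is half the two-tailed one.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=0.5, scale=1.0, size=30)  # made-up data with a positive shift

    # Two-tailed: H1 is "mean != 0"; one-tailed: H1 is "mean > 0"
    t_two, p_two = stats.ttest_1samp(sample, popmean=0.0, alternative="two-sided")
    t_one, p_one = stats.ttest_1samp(sample, popmean=0.0, alternative="greater")

    print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
    # The sample mean here is positive, so p_one equals p_two / 2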
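
Confidence level and interval width. A minimal sketch, assuming a t-based interval for a mean: the interval is symmetric around the point estimate, and lowering the confidence level pulls the bounds toward it.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    sample = rng.normal(loc=10.0, scale=2.0, size=25)  # made-up numerical data

    mean, sem = sample.mean(), stats.sem(sample)
    for level in (0.99, 0.95, 0.80):
        lo, hi = stats.t.interval(level, df=len(sample) - 1, loc=mean, scale=sem)
        print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f})")
    # The lower the confidence level, the closer the bounds creep to the point estimate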
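
Mann-Whitney U test. A minimal sketch on made-up ordinal scores (e.g., Likert-style ratings) for two independent groups.

    import numpy as np
    from scipy import stats

    group_a = np.array([3, 4, 2, 5, 4, 3, 4, 5, 2, 4])  # made-up ordinal scores
    group_b = np.array([2, 3, 1, 3, 2, 4, 2, 3, 1, 2])

    u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    print(f"U = {u_stat}, p = {p_value:.4f}")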
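
ANOVA vs. Kruskal-Wallis. A minimal sketch comparing three made-up groups with the parametric one-way ANOVA and its rank-based, non-parametric counterpart.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    g1 = rng.normal(5.0, 1.0, size=20)  # made-up measurements for three groups
    g2 = rng.normal(5.5, 1.0, size=20)
    g3 = rng.normal(6.0, 1.0, size=20)

    f_stat, p_anova = stats.f_oneway(g1, g2, g3)  # parametric: compares means
    h_stat, p_kw = stats.kruskal(g1, g2, g3)      # non-parametric: compares rank distributions
    print(f"ANOVA p = {p_anova:.4f}, Kruskal-Wallis p = {p_kw:.4f}")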
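
Coverage simulation for a 95% confidence level. A minimal simulation: across many repeated "studies" drawn from the same population, roughly 95% of the computed intervals contain the true mean.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    true_mean, n, n_studies = 5.0, 30, 2000
    covered = 0

    for _ in range(n_studies):
        sample = rng.normal(loc=true_mean, scale=1.0, size=n)
        lo, hi = stats.t.interval(0.95, df=n - 1, loc=sample.mean(), scale=stats.sem(sample))
        covered += (lo <= true_mean <= hi)  # does this study's interval contain the truth?

    print(covered / n_studies)  # roughly 0.95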
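
p-values from the null distribution. A minimal sketch: under a true null hypothesis (mean = 0), the one-sample t statistic follows a t distribution with n - 1 degrees of freedom, and the two-tailed p-value is the corresponding tail area. The manual computation matches SciPy's built-in test.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    sample = rng.normal(loc=0.4, scale=1.0, size=25)  # made-up data
    n = len(sample)

    # Test statistic and its tail area under the null density
    t_stat = sample.mean() / (sample.std(ddof=1) / np.sqrt(n))
    p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 1)

    t_scipy, p_scipy = stats.ttest_1samp(sample, popmean=0.0)
    print(f"manual p = {p_manual:.6f}, scipy p = {p_scipy:.6f}")  # identical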
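
Regression sketch. A minimal sketch of correlation testing through linear regression on made-up data: a tiny p-value for the slope establishes association, not causation, since the test cannot tell whether x drives y, y drives x, or a third variable drives both.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    x = rng.normal(size=50)                       # made-up predictor
    y = 2.0 * x + rng.normal(scale=0.5, size=50)  # made-up response correlated with x

    result = stats.linregress(x, y)
    print(f"slope = {result.slope:.2f}, r = {result.rvalue:.2f}, p = {result.pvalue:.2e}")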
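
The three t-test variants. A minimal sketch with made-up samples: Student's t-test (equal variances assumed), Welch's t-test (unequal variances), and the paired-sample t-test.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    before = rng.normal(100.0, 10.0, size=15)      # made-up paired measurements
    after = before + rng.normal(2.0, 3.0, size=15)
    other = rng.normal(105.0, 20.0, size=20)       # independent group, different variance

    t_s, p_s = stats.ttest_ind(before, other, equal_var=True)   # Student's t-test
    t_w, p_w = stats.ttest_ind(before, other, equal_var=False)  # Welch's t-test
    t_p, p_p = stats.ttest_rel(before, after)                   # paired-sample t-test
    print(f"Student p = {p_s:.4f}, Welch p = {p_w:.4f}, paired p = {p_p:.4f}")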
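
Fisher's exact test. A minimal sketch on a made-up 2x2 table of nominal counts.

    from scipy import stats

    # Rows: treatment, control; columns: improved, not improved (made-up counts)
    table = [[12, 3],
             [5, 10]]

    odds_ratio, p_value = stats.fisher_exact(table, alternative="two-sided")
    print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")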

If you have any questions, comments, or specific details related to the tips provided above, or if there is anything you would like to discuss with Dany Djeudeu, please feel free to reach out.