Your Guide to Choosing the Right Test for Your Data (Part I)
Have you ever had a dataset and found yourself lost and confused about which statistical significance test is most suitable to answer your research question? Well, let me assure you, you’re not alone. I was once that person! Despite my respect for Statistics, I never had a great passion for it. In this article, I will focus on unraveling some key concepts to help you make informed decisions when choosing the right statistical significance test for your data. Since statistical significance testing essentially involves dealing with variables (independent and dependent), I find it imperative to first review the different types of those variables.
Types of data:
1- Categorical or nominal
A categorical (or nominal) variable has two or more categories without intrinsic order. For instance, eye color is a categorical variable with categories like blue, green, brown, and hazel. There is no agreed way to rank these categories. If a variable has a clear order, it is an ordinal variable, discussed below.
2- Ordinal
An ordinal variable is like a categorical variable, but with a clear order. For example, consider customer satisfaction levels: dissatisfied, neutral, and satisfied. These categories can be ordered, but the spacing between them is not consistent. Another example is pain severity: mild, moderate, and severe. Although we can rank these levels, the difference in pain between each category varies. If the categories were equally spaced, the variable would be an interval variable.
3- Interval or numerical
An interval (or numerical) variable, unlike an ordinal variable, has equally spaced intervals between values. For instance, consider temperature measured in Celsius. The difference between 20°C and 30°C is the same as between 30°C and 40°C. This equal spacing distinguishes interval variables from ordinal variables.
Are you still pondering the consequences of not correctly identifying the type of data? Let’s clarify with a simple example. Imagine computing the mean of a categorical or ordinal dataset. Does this hold any meaningful interpretation? What would the average “eye color” signify? It’s clearly nonsensical. That said, the type of data is not the only factor in determining the statistical test: the number of independent and dependent variables, and the number of groups (levels) each has, matters just as much.
I would also like to remind you that there is no need to be intimidated by the number of tests to be discussed. One good way to think about these tests is that they are simply different approaches to calculating the p-value. The p-value itself can be conceived of as a measure of the statistical compatibility of the data with the null hypothesis: the smaller the p-value, the less compatible the data are with what the null hypothesis would predict.
Now, without any further ado, let us delve into the different tests and when and how to use each of them.
Statistical tests:
1- One-sample Student’s t-test
The one-sample t-test is a statistical test used to determine whether the mean of a single sample (from a normally distributed interval variable) of data significantly differs from a known or hypothesized population mean. This test is commonly used in various fields to assess whether a sample is representative of a larger population or to test hypotheses about population means when the population standard deviation is unknown.
import pandas as pd
from scipy import stats
import numpy as np
# Sample data (scores of 20 students)
scores = [72, 78, 80, 73, 69, 71, 76, 74, 77, 79, 75, 72, 70, 73, 78, 76, 74, 75, 77, 79]
# Population mean under the null hypothesis
pop_mean = 75
# Create a pandas DataFrame
df = pd.DataFrame(scores, columns=['Scores'])
# Calculate sample mean and sample standard deviation
sample_mean = df['Scores'].mean()
sample_std = df['Scores'].std(ddof=1) # ddof=1 for sample standard deviation
# Number of observations
n = len(df)
# Perform one-sample t-test
t_statistic, p_value = stats.ttest_1samp(df['Scores'], pop_mean)
# Critical t-value for two-tailed test at alpha=0.05 (95% confidence level)
alpha = 0.05
t_critical = stats.t.ppf(1 - alpha/2, df=n-1)
# Output results
print("Sample Mean:", sample_mean)
print("Sample Standard Deviation:", sample_std)
print("Degrees of Freedom (df):", n - 1)
print("t-statistic:", t_statistic)
print("p-value:", p_value)
print("Critical t-value (two-tailed, α=0.05):", t_critical)
# Decision based on p-value
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference between the sample mean and the population mean.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference between the sample mean and the population mean.")
2- Binomial test
The test is used to determine if the proportion of successes in a sample is significantly different from a hypothesized proportion. It’s particularly useful when dealing with binary outcomes, such as success/failure or yes/no scenarios. This test is widely used in fields such as medicine, marketing, and quality control, where determining the significance of proportions is crucial.
from scipy import stats
# Define the observed number of successes and the number of trials
observed_successes = 55
n_trials = 100
hypothesized_probability = 0.5
# Perform the binomial test (binomtest replaces the older binom_test, which is deprecated/removed in newer SciPy releases)
result = stats.binomtest(observed_successes, n_trials, hypothesized_probability, alternative='two-sided')
p_value = result.pvalue
print('Results of the binomial test:')
print(f'Observed successes: {observed_successes}')
print(f'Number of trials: {n_trials}')
print(f'Hypothesized probability: {hypothesized_probability}')
print(f'P-value: {p_value}')
# Set significance level
alpha = 0.05
# Decision based on p-value
if p_value < alpha:
    print("Reject the null hypothesis: The coin is not fair.")
else:
    print("Fail to reject the null hypothesis: There is no evidence to suggest the coin is not fair.")
3- Chi-square goodness of fit
The test is used to determine if an observed frequency distribution of a categorical variable differs significantly from an expected distribution. It helps assess whether the observed data fits a specific theoretical distribution. This test is widely used in fields such as genetics, marketing, and psychology to validate hypotheses about distributions.
import numpy as np
from scipy.stats import chisquare
# Observed frequencies
observed = np.array([25, 30, 20, 25])
# Expected frequencies for a uniform distribution
expected = np.array([25, 25, 25, 25])
# Perform Chi-Square Goodness of Fit test
chi2_stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print('Results of the Chi-Square Goodness of Fit test:')
print(f'Observed frequencies: {observed}')
print(f'Expected frequencies: {expected}')
print(f'Chi-square statistic: {chi2_stat}')
print(f'P-value: {p_value}')
# Set significance level
alpha = 0.05
# Decision based on p-value
if p_value < alpha:
    print("Reject the null hypothesis: The observed distribution does not fit the expected distribution.")
else:
    print("Fail to reject the null hypothesis: The observed distribution fits the expected distribution.")
4- Two independent samples t-test
The test is used to compare the means of a normally distributed continuous dependent variable between two independent groups. For instance, imagine we’re assessing the impact of a medical intervention. We recruit 100 participants and randomly assign 50 to a treatment group and 50 to a control group. Here, we have two distinct samples, making the unpaired (independent samples) t-test appropriate for comparing their outcomes.
import numpy as np
from scipy import stats
# Generate example data (normally distributed)
np.random.seed(42) # for reproducibility
treatment_group = np.random.normal(loc=75, scale=10, size=50)
control_group = np.random.normal(loc=72, scale=10, size=50)
# Perform independent samples t-test
t_statistic, p_value = stats.ttest_ind(treatment_group, control_group)
# Decision based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference in the treatment effect between groups.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference in the treatment effect between groups.")
5- Wilcoxon-Mann-Whitney test (Mann-Whitney U test)
It is a non-parametric test, meaning it makes no assumptions about the variables’ distributions, used to compare two independent groups; it is often described as comparing their medians. It assesses whether values in one group tend to be larger than values in the other, without assuming the data follow a specific distribution. This test is particularly useful when the assumptions of the independent samples t-test (such as normality and equal variance) are not met, or when analyzing ordinal or interval data that do not satisfy parametric assumptions.
import numpy as np
from scipy.stats import mannwhitneyu
# Generate example data
np.random.seed(42) # for reproducibility
group1 = np.random.normal(loc=50, scale=10, size=30)
group2 = np.random.normal(loc=55, scale=12, size=35)
# Perform Wilcoxon-Mann-Whitney test
statistic, p_value = mannwhitneyu(group1, group2, alternative='two-sided')  # request a two-sided test explicitly (the default has changed across SciPy versions)
# Print results
print('Results of the Wilcoxon-Mann-Whitney test:')
print(f'Statistic: {statistic}')
print(f'P-value: {p_value}')
# Decision based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The distributions of the two groups are significantly different.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference in the distributions of the two groups.")
6- Chi-square test of independence
The chi-square test of independence is used to determine if there is a significant association between two categorical variables. It helps identify whether the distribution of one variable is independent of the other. This test is widely applied in fields like marketing, social sciences, and biology. To perform it, you first need to pivot the data into a contingency table, as shown in the Python code below. The chi-square test also assumes that the expected count for each cell is five or higher; the expected count of a cell is the row total multiplied by the column total, divided by the grand total. If this condition is not met, we should use the next test, Fisher’s exact test.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
# Create a contingency table
data = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Female', 'Female'],
    'Preference': ['Yes', 'No', 'Yes', 'No'],
    'Count': [20, 10, 30, 40]
})
# Pivot the data to get the contingency table
contingency_table = data.pivot(index='Gender', columns='Preference', values='Count').fillna(0).values
# Perform Chi-Square Test of Independence
chi2_stat, p_value, dof, expected = chi2_contingency(contingency_table)
# Print results
print('Results of the Chi-Square Test of Independence:')
print(f'Chi-square statistic: {chi2_stat}')
print(f'P-value: {p_value}')
print(f'Degrees of freedom: {dof}')
print('Expected frequencies:')
print(expected)
# Decision based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant association between gender and product preference.")
else:
    print("Fail to reject the null hypothesis: There is no significant association between gender and product preference.")
7- Fisher’s exact test
The test can be thought of as an alternative to the chi-square test of independence for when one or more cells of your contingency table has an expected frequency below five. This makes it particularly valuable for small sample sizes or sparse data.
import numpy as np
from scipy.stats import fisher_exact
# Create a contingency table
# Example data: treatment group vs. control group with success and failure outcomes
# Treatment group: 12 successes, 5 failures
# Control group: 8 successes, 7 failures
contingency_table = np.array([[12, 5],
                              [8, 7]])
# Perform Fisher's Exact Test
odds_ratio, p_value = fisher_exact(contingency_table)
# Print results
print("Results of Fisher's Exact Test:")
print(f'Odds ratio: {odds_ratio}')
print(f'P-value: {p_value}')
# Decision based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant association between the treatment and the outcome.")
else:
    print("Fail to reject the null hypothesis: There is no significant association between the treatment and the outcome.")
8- Paired t-test
This is the ‘dependent’ version of the Student’s t-test covered previously. The test is used to compare the means of two related groups to determine if there is a statistically significant difference between them. It is commonly applied in before-and-after studies, or when the same subjects are measured under two different conditions.
import numpy as np
from scipy.stats import ttest_rel
# Example data: test scores before and after a training program
before = np.array([70, 75, 80, 85, 90])
after = np.array([72, 78, 85, 87, 93])
# Perform paired t-test
t_statistic, p_value = ttest_rel(before, after)
# Print results
print('Results of the paired t-test:')
print(f'T-statistic: {t_statistic}')
print(f'P-value: {p_value}')
# Decision based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the before and after scores.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the before and after scores.")
Okay, were you able to guess the next test? If you are thinking that we will now relax the normality condition and thus need a non-parametric test, then congratulations. This non-parametric test is called the Wilcoxon Signed-Rank Test.
9- Wilcoxon signed-rank test
The Wilcoxon Signed-Rank Test is a non-parametric test used to compare two related samples or repeated measurements on a single sample to assess whether their population mean ranks differ. It is often used as an alternative to the paired t-test when the data does not meet the assumptions of normality.
import numpy as np
from scipy.stats import wilcoxon
# Example data: stress scores before and after a meditation program
before = np.array([10, 15, 20, 25, 30])
after = np.array([8, 14, 18, 24, 28])
# Perform Wilcoxon signed-rank test
statistic, p_value = wilcoxon(before, after)
# Print results
print('Results of the Wilcoxon Signed-Rank Test:')
print(f'Statistic: {statistic}')
print(f'P-value: {p_value}')
# Decision based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the before and after scores.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the before and after scores.")
10- McNemar test
Yes, it is exactly what you are thinking of; this is the counterpart of the paired t-test for when the dependent variable is categorical with two levels, i.e., binary outcomes recorded before and after on the same subjects.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar
# Create a paired 2x2 contingency table
# Rows = outcome before treatment, columns = outcome after treatment:
# [[success -> success, success -> failure],
#  [failure -> success, failure -> failure]]
contingency_table = np.array([[15, 5],   # 15 stayed successes, 5 went from success to failure
                              [3, 17]])  # 3 went from failure to success, 17 stayed failures
# Perform McNemar test
result = mcnemar(contingency_table, exact=True)
# Print results
print('Results of the McNemar Test:')
print(f'Statistic: {result.statistic}')
print(f'P-value: {result.pvalue}')
# Decision based on p-value
alpha = 0.05
if result.pvalue < alpha:
    print("Reject the null hypothesis: There is a significant difference between before and after proportions.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between before and after proportions.")
Conclusion
In this part, I have covered three main groups of common statistical tests. The first group applies when analyzing a single population (the one-sample Student’s t-test, the binomial test, and the chi-square goodness of fit test). The second group (the two independent samples t-test, the Mann-Whitney U test, and the chi-square test of independence, with Fisher’s exact test as its small-sample alternative) covers the relationship between one dependent variable and one independent variable with exactly two independent groups. The third group (the paired t-test, the Wilcoxon signed-rank test, and the McNemar test) is required when the two levels of the independent variable are dependent (paired) rather than independent.
In Part II, I will explore the tests required when a single independent variable has more than two levels, for both independent and dependent (repeated-measures) designs.