Reference to this article: ConductScience, Hypothesis Testing (2022). doi.org/10.55157/CS20220615
What is Hypothesis Testing?
Hypothesis testing is a statistical method for deciding whether the data in a sample provide enough evidence to reject a default claim about a population at a pre-defined significance level. Given a set of candidate probability distributions, a null hypothesis is tested against an alternative hypothesis at a significance level α chosen in advance; the corresponding confidence level is 1 − α. The aim is a statistical inference: if the observed data would be sufficiently unlikely under the null hypothesis, the null hypothesis is rejected in favor of the alternative (Rice, John A, 2007).
The process of hypothesis testing starts with a research question, which is formulated as a null hypothesis and a corresponding alternative hypothesis. Once the hypotheses are defined, the assumptions about the sample are stated, for example whether the observations are independent of one another. A test statistic (T) is then chosen and its distribution under the null hypothesis is derived; this depends on whether the null hypothesis is simple (it specifies the distribution completely) or composite (it does not). Next, a significance level (α) is chosen, commonly 1% or 5%. The significance level fixes the critical region, the set of test statistic values for which the null hypothesis is rejected. Finally, the test statistic is computed from the observed data, and the null hypothesis is rejected if the value falls in the critical region and not rejected otherwise.
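The steps above can be sketched for one concrete case. The example below is a minimal illustration, assuming a two-sided one-sample z-test with a known population standard deviation; the function name `z_test`, the measurements, the null mean of 50, and the standard deviation of 2 are all hypothetical values chosen for the illustration.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test of H0: mean == mu0 (sigma assumed known)."""
    n = len(sample)
    # Test statistic: distance of the sample mean from mu0 in standard errors.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Critical value: cuts off alpha/2 in each tail of the standard normal.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Critical region: all values with |z| > z_crit.
    reject = abs(z) > z_crit
    return z, z_crit, reject


# Hypothetical measurements; H0 states that the population mean is 50.
sample = [52.1, 49.8, 53.4, 51.2, 50.9, 52.7, 51.8, 50.3, 53.0, 51.6]
z, z_crit, reject = z_test(sample, mu0=50.0, sigma=2.0)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}")
print("reject H0" if reject else "fail to reject H0")
```

With these hypothetical numbers the test statistic exceeds the critical value of about 1.96, so the null hypothesis is rejected at the 5% level.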
At this point, it is critical to understand the difference between ‘accept the null hypothesis’ and ‘fail to reject the null hypothesis.’ Accepting the null hypothesis would mean that the initial assumption has been shown to be true, which a test cannot establish. Failing to reject the null hypothesis means only that the data did not provide sufficient evidence against it. In that case, the study may be repeated, for example with a larger sample, or the initial hypotheses may be re-phrased.
Terminologies
Before getting into any further detail for a technical understanding, we need to know the following key concepts:
Alternative hypothesis H1
It is the hypothesis, typically motivated by a literature review and previous studies, that contradicts the null hypothesis; it is the claim for which the test seeks evidence.
Critical region (Region of rejection)
It is the set of test statistic values for which the null hypothesis is rejected.
Critical value
It is the threshold value of the test statistic that separates the critical region from the region of acceptance.
Errors
There are two types of errors that can occur when deciding between the null hypothesis and the alternative hypothesis:
- Type 1 Error: The null hypothesis is rejected although it is true (probability α)
- Type 2 Error: The null hypothesis is not rejected although it is false (probability β)
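Both error types can be estimated by simulation. The sketch below is an illustration only, assuming a two-sided z-test on normally distributed data with a known standard deviation; the helper names `rejects_h0` and `rejection_rate`, the sample size, the true means, and the random seed are hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def rejects_h0(sample, mu0, sigma, alpha):
    """Two-sided z-test decision for H0: mean == mu0 (sigma assumed known)."""
    sample_mean = sum(sample) / len(sample)
    z = (sample_mean - mu0) / (sigma / sqrt(len(sample)))
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   trials=4000, seed=0):
    """Fraction of simulated samples for which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if rejects_h0(sample, mu0, sigma, alpha):
            rejections += 1
    return rejections / trials


# H0 is true (true mean equals mu0): every rejection is a Type 1 error.
type1_rate = rejection_rate(true_mean=0.0)
# H0 is false (true mean is 0.5): every non-rejection is a Type 2 error.
type2_rate = 1 - rejection_rate(true_mean=0.5)
print(f"estimated Type 1 error rate: {type1_rate:.3f}")
print(f"estimated Type 2 error rate: {type2_rate:.3f}")
```

The estimated Type 1 error rate should land close to the chosen significance level of 0.05, while the Type 2 error rate depends on how far the true mean lies from the null value and on the sample size.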
Null hypothesis H0
A null hypothesis is the default position, typically proposing that there is no relationship or no difference between the quantities being compared. It provides the benchmark against which the evidence is judged: the null hypothesis is assumed to be true throughout the test unless the data provide sufficient evidence to reject it.
P-value concept
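It is the probability, computed assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one actually observed. It is not the probability that the null hypothesis is true.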
Power of a test (1 − β)
It is the probability of correctly rejecting the null hypothesis when the alternative hypothesis is true, where ‘power’ means the sensitivity of the test and β is the probability of a type 2 error.
Region of acceptance
It is the set of test statistic values for which the null hypothesis is not rejected.
Statistical hypothesis
It is an assumption about a specific feature of a population, such as a parameter of its distribution, rather than about the particular sample drawn from it.
Core Concept when used in Machine Learning
In hypothesis testing, the decision about the null hypothesis follows from comparing the p-value with the significance threshold α (Wasserman, L. 2004), where:
- A p-value exactly at the significance threshold corresponds to a test statistic on the boundary of the critical region
- A p-value less than the significance threshold leads to rejection of the null hypothesis
- A p-value greater than the significance threshold means the null hypothesis is not rejected
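The comparison of a p-value with the significance threshold can be sketched in a few lines. This is a minimal illustration, assuming the test statistic follows a standard normal distribution under the null hypothesis; the function names `p_value` and `decide`, the tail labels, and the convention of rejecting when the p-value is at or below α are assumptions made for the example.

```python
from statistics import NormalDist


def p_value(z, tail="two-sided"):
    """p-value of a z statistic under a standard normal null distribution."""
    std_normal = NormalDist()
    if tail == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "greater":  # H1: the parameter is larger than the null value
        return 1 - std_normal.cdf(z)
    if tail == "less":     # H1: the parameter is smaller than the null value
        return std_normal.cdf(z)
    raise ValueError(f"unknown tail: {tail!r}")


def decide(p, alpha=0.05):
    """Compare a p-value with the significance level alpha."""
    return "reject H0" if p <= alpha else "fail to reject H0"


for z in (1.2, 2.5):
    p = p_value(z)
    print(f"z = {z}: p = {p:.4f} -> {decide(p)}")
```

The same z statistic gives a smaller p-value under a one-tailed alternative in the matching direction than under a two-tailed alternative, which is why the choice of tails has to be made before looking at the data.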
It must be noted that hypothesis testing is built on the principle of rejection: the null hypothesis is never proven, only rejected or not rejected, which imposes a more rigorous standard of evidence on the claim being tested. Five main factors affect the probability of rejection:
- The test being one-tailed or two-tailed
- Significance level
- Standard deviation
- Extent of deviation from the null hypothesis
- Subjective appearance of the results controlled by the experimenter
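The first four factors in this list can be made concrete with a small calculation. The sketch below assumes a one-sample z-test with a known standard deviation and computes the probability of rejecting the null hypothesis when the true mean differs from the null value by `effect`; the function name `rejection_probability`, the fixed sample size, and all numeric values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rejection_probability(effect, sigma, n, alpha=0.05, two_tailed=True):
    """Probability that a z-test rejects H0 when the true mean is mu0 + effect."""
    std_normal = NormalDist()
    shift = effect * sqrt(n) / sigma  # deviation in standard-error units
    if two_tailed:
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        upper = 1 - std_normal.cdf(z_crit - shift)  # statistic above +z_crit
        lower = std_normal.cdf(-z_crit - shift)     # statistic below -z_crit
        return upper + lower
    z_crit = std_normal.inv_cdf(1 - alpha)          # upper-tailed test
    return 1 - std_normal.cdf(z_crit - shift)


baseline = rejection_probability(effect=0.5, sigma=2.0, n=30)
one_tailed = rejection_probability(effect=0.5, sigma=2.0, n=30, two_tailed=False)
stricter_alpha = rejection_probability(effect=0.5, sigma=2.0, n=30, alpha=0.01)
smaller_sd = rejection_probability(effect=0.5, sigma=1.0, n=30)
larger_deviation = rejection_probability(effect=1.0, sigma=2.0, n=30)

print(f"baseline (two-tailed, alpha = 0.05): {baseline:.3f}")
print(f"one-tailed:                          {one_tailed:.3f}")
print(f"alpha = 0.01:                        {stricter_alpha:.3f}")
print(f"smaller standard deviation:          {smaller_sd:.3f}")
print(f"larger deviation from H0:            {larger_deviation:.3f}")
```

Under these assumptions, a one-tailed test in the direction of the effect, a smaller standard deviation, and a larger deviation from the null hypothesis each raise the probability of rejection, while a stricter significance level lowers it.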
A number of cautionary steps are advised to avoid misuse or misrepresentation of data within the hypothesis testing framework. In order to reduce type 2 errors, it is advised to consider larger sample sizes. It must be noted that the statistical significance of data in no way asserts the practical significance of the outcomes. Similarly, correlation does not equate to causation; therefore, it is not enough to simply reject the null hypothesis in order to reach a definitively correct outcome.
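The gap between statistical and practical significance can be illustrated numerically. The sketch below assumes a two-sided z-test with a known standard deviation and holds a very small observed difference fixed while the sample size grows; the function name `two_sided_p`, the difference of 0.02, and the sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(observed_diff, sigma, n):
    """Two-sided z-test p-value for an observed difference from the null value."""
    z = observed_diff / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same tiny difference of 0.02 (with sigma = 1) at growing sample sizes.
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: p = {two_sided_p(0.02, 1.0, n):.4g}")
```

With 100 observations the difference is far from significant, while with 10,000 observations the same difference falls below the 5% threshold. Larger samples make a test more sensitive, which reduces type 2 errors, but they also let differences too small to matter in practice reach statistical significance.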
Practical Application
In data science, statistical tools like hypothesis testing play a critical role in justifying inferences, especially where no established scientific theory or practice exists. The social sciences in particular have benefitted greatly from hypothesis testing, although its application there has also drawn some criticism. Some of the important practical applications of hypothesis testing are as follows:
Courtroom trials
In a courtroom trial, the presumption of innocence plays the role of the null hypothesis: the defendant is ‘innocent until proven guilty,’ so H0 is ‘not guilty’ and H1 is ‘guilty.’ The null hypothesis is rejected only if the evidence against it is strong enough.
Gender ratio
One of the earliest uses of hypothesis testing, back in the 1700s, examined the assumption that male and female births are equally likely. With the two hypotheses as simple as ‘equally likely’ and ‘not equally likely,’ the data showed that male births were more probable than female births at that time, although no satisfactory explanation was available.
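A test of this kind can be sketched as an exact one-sided binomial test. The example below is an illustration, not a reconstruction of the historical analysis: the function name `binomial_upper_tail` and the counts of 530 male births out of 1,000 are hypothetical, and the null hypothesis is assumed to be a male birth probability of exactly 0.5.

```python
from math import comb


def binomial_upper_tail(successes, n):
    """P(X >= successes) for X ~ Binomial(n, 0.5): an exact one-sided p-value."""
    # Under H0 every one of the 2**n outcome sequences is equally likely,
    # so the tail probability is a ratio of two exact integers.
    favourable = sum(comb(n, k) for k in range(successes, n + 1))
    return favourable / 2**n


# Hypothetical record: 530 male births out of 1,000.
p_val = binomial_upper_tail(530, 1000)
print(f"one-sided p-value: {p_val:.4f}")
print("reject H0" if p_val <= 0.05 else "fail to reject H0")
```

Using integer arithmetic for the tail count avoids floating-point overflow for large numbers of births, at the cost of restricting this sketch to a null probability of 0.5.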
Other areas of application include, but are not limited to:
- Handwriting analysis claims
- Best ways to quit smoking for good
- The extent of behavioral effects of a full moon on humans and animals
- Verifying the origin of manuscripts
Still, a great deal of criticism exists on the validity of hypothesis testing since the results of an experiment are only as valid as the sample selection criteria and design. Hence, caution must be taken before admitting any results from a single source.
Conclusion
In the modern world, hypothesis testing has far-reaching consequences and can be seen in a variety of fields, from opinion polls to trends in biomedicine. As a mature statistical method, its process can be summarized as follows:
- Establish hypotheses
- Determine a significance level
- Evaluate point estimate
- Compute test statistic
- Assess p-value
- Deduce outcomes
In effect, it serves as an important ‘filter’ before investing time and money into any statistical outcome of consequence. Before building on a previous result, it is prudent to read the details of the experiment and look for errors in the design or execution of the study. As mentioned earlier, more than a single source should be sought before taking the results of a single study for granted, since hypothesis testing is most commonly used to draw scientific conclusions from experimental data. Instead of repeating a faulty experiment and confirming an erroneous result, it is better to test the existing results critically to avoid misrepresentation and abuse of this sometimes misunderstood statistical method.