Quantile normalization, in the field of statistics, is a technique that makes two distributions identical in statistical properties. The two distributions in this instance, which we’ll discuss later, are the test and reference distributions. To make them identical in terms of statistical properties, the highest entry in the test and reference distributions should be aligned, followed by the next highest, and so on.
While it sounds complex, you can think of it as two lines of five students, each arranged by height (i.e., shortest to tallest). The first line could have Ross, Chandler, Joey, Gunther, and Frank, and the second could have Phoebe, Monica, Rachel, Ursula, and Janice. To quantile-normalize the lines, Ross and Phoebe (the shortest male and female, which gives them the same rank by height) will be the first test and reference subjects, respectively, followed by Chandler and Monica, and so on.
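The rank-by-rank pairing described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function name `normalize_to_reference` and the height values are hypothetical, and the sketch assumes both lines have the same number of entries and no tied values.

```python
def normalize_to_reference(test, reference):
    """Replace each test value with the reference value of the same rank."""
    if len(test) != len(reference):
        raise ValueError("this sketch assumes equal-length inputs")
    ref_sorted = sorted(reference)
    # Positions of the test values, ordered from smallest to largest value.
    order = sorted(range(len(test)), key=lambda i: test[i])
    result = [None] * len(test)
    for rank, idx in enumerate(order):
        result[idx] = ref_sorted[rank]
    return result


# Hypothetical heights in centimeters, listed in no particular order.
line_one = [178, 185, 180, 175, 183]  # test distribution
line_two = [168, 160, 170, 165, 163]  # reference distribution

aligned = normalize_to_reference(line_one, line_two)
print(aligned)  # [163, 170, 165, 160, 168]
```

After the mapping, the test line keeps its original ordering of who is taller than whom, but its values now come entirely from the reference line.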
The primary purpose of normalization is to eliminate or minimize technical variability and to reduce the influence of outliers.
How Does Quantile Normalization Work?
Quantile normalization works by ranking the values within each sample and aligning them across samples. With three gene expression samples, for instance, the most expressed genes in each sample are aligned with one another, followed by the next most expressed, and so on until the least expressed genes are aligned. The mean or average of each set of aligned values (e.g., the three highest expressed values) is then computed, and every value in the set is replaced with that mean. In the end, all the samples share the same distribution of values.
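As a rough sketch of this procedure for three samples, the Python below sorts each sample, averages the values that share a rank, and writes each average back to the positions that held that rank. The function name `quantile_normalize` and the expression values are made up for illustration, and ties are not treated specially here (production tools usually average tied ranks).

```python
def quantile_normalize(samples):
    """samples: list of equal-length lists, one list of values per sample."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes equal-length samples")
    sorted_samples = [sorted(s) for s in samples]
    # Mean of the values that share the same rank across all samples.
    rank_means = [sum(col) / len(col) for col in zip(*sorted_samples)]
    normalized = []
    for s in samples:
        # Positions of this sample's values, from smallest to largest.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical expression values for four genes in three samples.
samples = [
    [5, 2, 3, 4],
    [4, 1, 6, 2],
    [3, 4, 7, 10],
]
normalized = quantile_normalize(samples)
for row in normalized:
    print(row)
# [7.0, 2.0, 3.0, 5.0]
# [5.0, 2.0, 7.0, 3.0]
# [2.0, 3.0, 5.0, 7.0]
```

Each normalized sample now contains exactly the same set of values (2, 3, 5, and 7), arranged according to the original ranking of genes within that sample.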
Here’s an example showing how normalization in general works. You have two datasets and need to get the normalized values for each expression.
|Dataset 1||Dataset 2||Normalized Value|
So instead of having the chart on the left, you’d have the one on the right.
Popular normalization techniques in general include linear scaling, also known as “min-max scaling”; Z-normalization; and rank scaling, which is sometimes described as “linear interpolation.”
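For comparison, here is a minimal Python sketch of two of these techniques, min-max scaling and Z-normalization, using only the standard library. The function names and the data are illustrative assumptions, and the population standard deviation is used for the Z-scores (some tools use the sample standard deviation instead). Rank scaling is left out because its exact definition varies between sources.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Linearly rescale values so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the range is zero")
    return [(v - lo) / (hi - lo) for v in values]


def z_normalize(values):
    """Center values on the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are equal; the standard deviation is zero")
    return [(v - mu) / sigma for v in values]


# Hypothetical measurements with mean 5 and population standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
scaled = min_max_scale(data)
z_scores = z_normalize(data)
```

Unlike quantile normalization, both of these transformations keep the shape of the original distribution and only shift and stretch it.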
Are There Downsides to Quantile Normalization?
While quantile normalization is widely used, it is imperfect, like almost all normalization techniques. It can even generate errors, especially when the data doesn’t meet the assumptions of the technique, such as the assumption that the samples being compared share the same underlying distribution.
Take our example earlier. While Ross and Phoebe could be the shortest male and female, respectively, they may not be truly identical. Ross and Phoebe, for instance, could have massive height and weight differences. The same could be true for the rest of the subjects. So even if the datasets were ordered the same way, these physical differences could result in false positives and negatives.
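One way to see this risk is a small Python sketch in which two samples genuinely differ in scale. The values are invented for illustration, `quantile_normalize` is a hypothetical helper, and the tenfold difference between the samples is assumed to be real rather than technical.

```python
def quantile_normalize(samples):
    """Replace each value with the mean of the values sharing its rank."""
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [sum(col) / len(col) for col in zip(*sorted_samples)]
    result = []
    for s in samples:
        order = sorted(range(len(s)), key=lambda i: s[i])
        out = [0.0] * len(s)
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        result.append(out)
    return result


# Hypothetical samples: every value in "treated" is ten times higher,
# and this sketch assumes that difference is genuine, not technical.
control = [1.0, 2.0, 3.0, 4.0]
treated = [10.0, 20.0, 30.0, 40.0]

norm_control, norm_treated = quantile_normalize([control, treated])
print(norm_control)  # [5.5, 11.0, 16.5, 22.0]
print(norm_treated)  # [5.5, 11.0, 16.5, 22.0]
```

After normalization, the two samples are indistinguishable, so a real difference between them would go undetected (a false negative).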
What Are the Uses of Quantile Normalization?
Quantile normalization has several use cases, including:
- Microarray data analysis: This analysis is applied to thousands of genes from a sample (e.g., a specific tissue). A microarray is a laboratory tool that detects the expression of thousands of genes simultaneously. An example is a DNA microarray, a microscope slide printed with thousands of tiny spots in defined positions, each containing a known DNA sequence or gene. The technique is used on:
  - DNA microarrays
  - MMChips used for the surveillance of microRNA populations
  - Protein microarrays
  - Peptide microarrays
  - Tissue microarrays
  - Cellular microarrays
  - Chemical compound microarrays
  - Antibody microarrays
  - Carbohydrate arrays
  - Phenotype microarrays
  - Reverse phase protein lysate microarrays
  - Interferometric reflectance imaging sensors (IRISs)
- Removing technical variations in noisy data: This use case is only applicable when the expert using quantile normalization believes the differences among the data points are unwanted or would make the results inaccurate.
- Genomic sequencing: The technique is popular in machine learning (ML) projects involving genomic sequencing or gene editing. In these projects, the normalized measurements serve as the data from which a computer learns.
- Proteomics: This field systematically analyzes the large set of proteins produced by a specific cell or organism under a defined set of conditions.
Quantile normalization, while imperfect, remains very helpful in the fields of science and technology. Scientists who study illnesses like cancer often use the technique. In such cases, they compare normal cells with cancer cells to determine what makes them different, which helps them determine if a treatment works.
- Quantile normalization is a technique that makes two distributions identical statistically.
- While quantile normalization is widely used, it is imperfect and can produce false positives and negatives, especially when the data doesn’t meet its assumptions.
- Quantile normalization is helpful for microarray data analysis, removing technical variations in noisy data, genomic sequencing, and proteomics.