Whole Genome Log Ratio Association Tests and PCA
Below are four p-value plots depicting the results of a whole genome log ratio association test performed on publicly available Affymetrix 500K data with approximately 3,500 samples. Below left shows the initial results, with spurious associations across the entire genome, in this case caused by batch effects or other stratification issues. Below right shows the results after applying PCA correction. Some spurious associations remain, but fewer than before correction.
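The correction procedure itself is described in the Manual, not here. As a rough illustration only, the sketch below assumes one common form of PCA correction: center each marker, then subtract the top principal components from the samples-by-markers log ratio matrix before running the association test. The function name `pca_correct` and the parameter `n_components` are hypothetical and are not taken from the software.

```python
import numpy as np

def pca_correct(log_ratios, n_components):
    """Subtract the top principal components from a samples x markers matrix.

    Hypothetical helper: an assumed form of PCA correction, not the
    software's own implementation.
    """
    X = np.asarray(log_ratios, dtype=float)
    # Center each marker (column) so the components describe variation
    # between samples rather than marker-level offsets.
    centered = X - X.mean(axis=0)
    # Thin SVD: centered = U @ diag(S) @ Vt, singular values in descending order.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    k = n_components
    # Remove the rank-k reconstruction, i.e. the part of the data explained
    # by the k largest components (often batch or population structure).
    return centered - (U[:, :k] * S[:k]) @ Vt[:k, :]

# Usage: simulated data in which half the samples carry a batch shift.
rng = np.random.default_rng(0)
data = rng.normal(size=(20, 50))
batch = np.repeat([0, 1], 10)
data = data + 5.0 * batch[:, None]
residuals = pca_correct(data, n_components=1)
```

How many components to remove is a judgment call: too few leaves stratification in the data, while too many can remove real signal.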
Applying median smoothing to the resulting p-values (below left) dramatically reduces the noise in the data, leaving only a few significant regions. A closer look at one of the significant regions (below right) reveals a nine-marker segment in which every marker has a similarly low p-value, which gives more confidence in the result.
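Median smoothing can be sketched as a running median along the markers in genomic order. The example below is a minimal illustration under stated assumptions: the function name `median_smooth`, the default window of 5 markers, and the edge padding are all hypothetical choices, and the input can be whatever per-marker statistic is being plotted (for example p-values or their negative log).

```python
import numpy as np

def median_smooth(values, window=5):
    """Running median over an odd-sized window of consecutive markers.

    Hypothetical helper: the window size and edge handling are assumptions.
    """
    values = np.asarray(values, dtype=float)
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    # Repeat the end values so the output has the same length as the input.
    padded = np.pad(values, half, mode="edge")
    # One row per marker, each row holding that marker's neighborhood.
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=1)

# Usage: an isolated spike is suppressed, a run of consistent values is kept.
scores = np.array([1.0, 1.0, 9.0, 1.0, 1.0, 8.0, 8.0, 8.0, 1.0, 1.0])
smoothed = median_smooth(scores, window=3)
```

A median is used here because, unlike a mean, it ignores a single extreme marker entirely, so only stretches where most neighboring markers agree survive the smoothing.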
For more details on PCA, see Correction for Stratification in the latest Manual.