Chapter 15 Genome-wide association (GWAS) data
New tools for genetic/genomic data analysis are being created at a mind-boggling rate. It seems like a new paper comes out each week, which makes it a challenge to summarize the available tools. With this in mind, consider the tools and tips provided here as select suggestions from my own experience rather than an exhaustive list.
A tutorial for GWAS in R is available here.
15.1 Principal component analysis (PCA)
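One concept that comes up often in GWAS data is principal components, which are typically used to summarize population structure (ancestry) so that it can be adjusted for in association models. This CrossValidated post is the best explanation I have read on PCA. Another site that may be helpful is this one, which has examples of visualizing PCA in R.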
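To make the idea concrete, here is a minimal sketch of PCA on a genotype matrix. It is written in Python with NumPy purely for illustration, and it uses simulated data: the function name `genotype_pca`, the two simulated populations, and their allele frequencies are all assumptions of this example, not part of any GWAS package. Real analyses would typically also prune SNPs for LD and remove related individuals first.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """PCA on an individuals x SNPs matrix of 0/1/2 allele counts.

    Returns the PC scores and the proportion of variance explained.
    """
    G = np.asarray(genotypes, dtype=float)
    # Drop monomorphic SNPs, then center and scale each remaining SNP
    sd = G.std(axis=0)
    keep = sd > 0
    Z = (G[:, keep] - G[:, keep].mean(axis=0)) / sd[keep]
    # SVD of the standardized matrix: U * S gives the PC scores
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


# Simulated example (hypothetical data): two populations whose allele
# frequencies differ slightly at each SNP
rng = np.random.default_rng(0)
n_per_pop, n_snps = 50, 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_snps), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))
genotypes = np.vstack([pop_a, pop_b])

scores, explained = genotype_pca(genotypes)
print("Mean PC1, population A:", scores[:n_per_pop, 0].mean())
print("Mean PC1, population B:", scores[n_per_pop:, 0].mean())
```

In this simulation the first principal component should separate the two populations, which is the same reason the top PCs are commonly included as covariates in GWAS models.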
15.2 R packages
15.2.3 for summary-level GWAS data
- PLACO: this isn’t a ‘package’ per se, but this method for assessing pleiotropy between traits is implemented in R.
15.4 Command-line tools
PLINK is arguably the most established tool for managing and analyzing genetic/genomic data. It is well worth learning if you are interested in working with this kind of data, and it is probably the best place to start if you are new to the field.
LDSC is a tool for estimating heritability and genetic correlation from GWAS summary statistics.
HEELS performs high-efficiency heritability estimation using LD and GWAS summary statistics.
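As a rough illustration of how heritability can be estimated from summary statistics alone, the sketch below mimics the core idea behind LD score regression (the method LDSC is built on): under a polygenic model, the expected chi-square statistic of a SNP is approximately 1 + N·h²·ℓ/M, where N is the sample size, M the number of SNPs, and ℓ the SNP's LD score. This is not the LDSC software or its API. The function name `ldsc_h2`, the simulated LD scores, and all parameter values are assumptions of this example, and it uses plain least squares where LDSC uses a weighted regression with block-jackknife standard errors.

```python
import numpy as np


def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Unweighted LD score regression sketch.

    Regresses chi-square statistics on LD scores; the slope is
    N * h2 / M, so h2 = slope * M / N.
    """
    slope, intercept = np.polyfit(ld_scores, chi2, 1)
    h2 = slope * n_snps / n_samples
    return h2, intercept


# Simulated summary statistics (hypothetical values throughout)
rng = np.random.default_rng(1)
n_samples, n_snps, true_h2 = 50_000, 200_000, 0.4
ld_scores = rng.gamma(shape=2.0, scale=50.0, size=n_snps)
# Expected chi-square per SNP under the polygenic model, no confounding
expected = 1.0 + n_samples * true_h2 * ld_scores / n_snps
# A z-score with variance `expected` has a squared value distributed
# as expected * chi-square(1)
chi2 = expected * rng.chisquare(1, size=n_snps)

h2_hat, intercept_hat = ldsc_h2(chi2, ld_scores, n_samples, n_snps)
print("Estimated h2:", round(h2_hat, 3))
print("Estimated intercept:", round(intercept_hat, 3))
```

An intercept well above 1 would suggest confounding such as population stratification, which is one reason the intercept is reported alongside the heritability estimate.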