Genotyping summary | Australian Autism Biobank

SNP genotyping quality control (QC) and imputation

SNP genotyping was performed with the Illumina Global Screening Array (v1 and v2). GenomeStudio v2.0.4 was used to call genotypes and to filter low-quality samples before strand alignment and standard quality control procedures in PLINK 1.9 [1,2]. Allele frequencies were cross-referenced against the Haplotype Reference Consortium reference panel.
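The allele-frequency cross-check against the Haplotype Reference Consortium panel can be illustrated with a minimal Python sketch. It assumes biallelic SNPs coded A/C/G/T, with allele codes and frequencies already available for both the target sample and the reference panel. The `Variant` class, the `compare_to_reference` helper and the 0.2 frequency-difference threshold are hypothetical and are not the pipeline actually used.

```python
from dataclasses import dataclass

# Complementary bases, used to test whether a strand flip explains an allele mismatch.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


@dataclass
class Variant:
    snp_id: str
    a1: str          # allele whose frequency is reported
    a2: str          # other allele
    a1_freq: float   # frequency of a1


def compare_to_reference(target: Variant, ref: Variant, max_diff: float = 0.2) -> str:
    """Classify a target-sample SNP against the matching reference-panel SNP.

    Returns "ok", "swap", "flip", "flip+swap", "ambiguous", "mismatch" or
    "freq_diff". max_diff = 0.2 is a hypothetical example threshold.
    """
    t = (target.a1, target.a2)
    r = (ref.a1, ref.a2)
    # A/T and C/G SNPs cannot be strand-resolved from their alleles alone.
    if COMPLEMENT[t[0]] == t[1]:
        return "ambiguous"
    flipped = (COMPLEMENT[t[0]], COMPLEMENT[t[1]])
    if t == r:
        status, freq = "ok", target.a1_freq
    elif t == r[::-1]:
        status, freq = "swap", 1.0 - target.a1_freq
    elif flipped == r:
        status, freq = "flip", target.a1_freq
    elif flipped == r[::-1]:
        status, freq = "flip+swap", 1.0 - target.a1_freq
    else:
        return "mismatch"
    # Compare frequencies after recoding to the reference panel's a1 allele.
    if abs(freq - ref.a1_freq) > max_diff:
        return "freq_diff"
    return status
```

SNPs classified as "flip" or "swap" can be realigned, whereas "mismatch", "freq_diff" and often "ambiguous" SNPs would typically be excluded before imputation.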

Imputation to the Haplotype Reference Consortium panel [3] was performed using the Sanger Imputation Service, with pre-phasing performed in EAGLE2 [4].

Description of input GWAS summary statistics

We used GWAS summary statistics for height [5], ASD [6], IQ [7] and chronotype [8].

SBayesR PGS weighting

We generated polygenic scores (PGS) using SBayesR [5], a Bayesian multiple regression method that takes GWAS summary statistics as input. For height only, GWAS SNPs were first filtered with the software package DENTIST [9], which removes SNPs whose observed GWAS Z-scores are inconsistent with the Z-scores imputed for them from neighbouring SNPs using the linkage disequilibrium (LD) reference matrix. This additional step improved convergence of the SBayesR algorithm.
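The idea behind the DENTIST filter, comparing each SNP's observed Z-score with the value implied by its LD neighbours, can be sketched in Python with NumPy. This is a simplified leave-one-out illustration, not DENTIST's actual algorithm or interface: the function names, the ridge term added before matrix inversion and the 5e-8 threshold are all assumptions. Each SNP's imputed Z-score is the conditional expectation given the other SNPs' Z-scores under a multivariate normal model with covariance equal to the LD matrix, and the squared standardised difference is referred to a chi-square distribution with 1 degree of freedom.

```python
import math

import numpy as np


def loo_consistency_pvalues(z: np.ndarray, ld: np.ndarray, ridge: float = 0.1) -> np.ndarray:
    """P-value per SNP for observed vs. LD-imputed Z-score (leave-one-out).

    `ridge` is a hypothetical regularisation added to the LD block so it can be inverted.
    """
    m = len(z)
    pvals = np.empty(m)
    for i in range(m):
        others = np.arange(m) != i                     # mask excluding SNP i
        r_oo = ld[np.ix_(others, others)] + ridge * np.eye(m - 1)
        r_io = ld[i, others]                           # LD between SNP i and the others
        w = np.linalg.solve(r_oo, r_io)                # imputation weights
        z_imp = w @ z[others]                          # imputed Z-score for SNP i
        var_imp = r_io @ w                             # variance explained by the others
        t = (z[i] - z_imp) ** 2 / max(1.0 - var_imp, 1e-6)
        pvals[i] = math.erfc(math.sqrt(t / 2.0))       # chi-square(1) upper tail
    return pvals


def prune_inconsistent(z: np.ndarray, ld: np.ndarray, p_thresh: float = 5e-8,
                       ridge: float = 0.1) -> np.ndarray:
    """Iteratively drop the most inconsistent SNP until none falls below p_thresh.

    Returns the indices of the retained SNPs.
    """
    keep = np.arange(len(z))
    while len(keep) > 1:
        p = loo_consistency_pvalues(z[keep], ld[np.ix_(keep, keep)], ridge)
        worst = int(np.argmin(p))
        if p[worst] >= p_thresh:
            break
        keep = np.delete(keep, worst)
    return keep
```

Removing only the single worst SNP per iteration matters because one inconsistent SNP also distorts the imputed Z-scores of its neighbours.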

PGS calculation

To generate a PGS for each trait, we multiplied the best-guess genotypes in the target sample, which comprised Australian Autism Biobank (AAB) individuals and UK Biobank (UKB) controls, by the SBayesR-reweighted effect sizes (after the additional DENTIST filtering for height), using the PLINK --score function. PGS were then standardised by subtracting the mean, and dividing by the standard deviation, of the PGS in the UKB controls.
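The scoring and standardisation steps can be illustrated with a minimal NumPy sketch. It assumes imputed genotype probabilities are stored as an individuals × SNPs × 3 array (probabilities of 0, 1 or 2 copies of the effect allele), that SNPs are already aligned to the effect alleles of the SBayesR weights, and that a boolean mask marks the UKB controls; the function names are hypothetical. The sketch computes a plain weighted sum, whereas PLINK 1.9's `--score` reports a per-allele average by default (its `sum` modifier gives sums); with no missing genotypes the two differ only by a constant factor, which the standardisation removes. Using the sample SD (`ddof=1`) is also an assumption.

```python
import numpy as np


def best_guess(probs: np.ndarray) -> np.ndarray:
    """Most probable effect-allele count (0, 1 or 2) per individual and SNP."""
    return probs.argmax(axis=2)


def polygenic_score(genotypes: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of effect-allele counts across SNPs for each individual."""
    return genotypes @ weights


def standardise(scores: np.ndarray, is_control: np.ndarray) -> np.ndarray:
    """Centre and scale all scores using the control group's mean and SD."""
    ref = scores[is_control]
    return (scores - ref.mean()) / ref.std(ddof=1)
```

Because every score is scaled by the control distribution, a standardised PGS of 1 means one control standard deviation above the control mean.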

Between-group PGS differences

For each trait, we used Z-tests to test for a difference in mean PGS between the AAB experimental groups (ASD, SIB and UNR). To increase power to detect differences, we also included the unrelated UKB controls of European ancestry described above.
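A two-sample Z-test for a difference in mean standardised PGS can be sketched with the Python standard library. This is a generic illustration rather than the exact test implementation used: the function name is hypothetical, each group's standard error is estimated from its own sample SD, and the two groups are treated as independent, so relatedness between individuals (for example, probands and their siblings) is not modelled.

```python
import math
from statistics import mean, stdev


def two_sample_z(x: list[float], y: list[float]) -> tuple[float, float]:
    """Z statistic and two-sided p-value for a difference in group means."""
    # Standard error of the difference, from each group's sample variance.
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z = (mean(x) - mean(y)) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

Applied to the standardised PGS, each pairwise comparison (for example, ASD vs. UKB controls) yields one Z statistic and p-value per trait.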

Citations

  1. Chang, C. C., Chow, C. C., Tellier, L. C., Vattikuti, S., Purcell, S. M., & Lee, J. J. (2015). Second-generation PLINK: Rising to the challenge of larger and richer datasets. Gigascience, 4(1), 7.
  2. Purcell, S. M., & Chang, C. C. (2015). PLINK 1.9.
  3. McCarthy, S., Das, S., Kretzschmar, W., Delaneau, O., Wood, A. R., Teumer, A., et al. (2016). A reference panel of 64,976 haplotypes for genotype imputation. Nature Genetics, 48(10), 1279-1283.
  4. Loh, P. R., Danecek, P., Palamara, P. F., Fuchsberger, C., Reshef, Y. A., Finucane, H. K., & Durbin, R. (2016). Reference-based phasing using the Haplotype Reference Consortium panel. Nature Genetics, 48(11), 1443-1448.
  5. Lloyd-Jones, L. R., Zeng, J., Sidorenko, J., Yengo, L., Moser, G., Kemper, K. E., et al. (2019). Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nature Communications, 10(1), 5086.
  6. Grove, J., Ripke, S., Als, T. D., Mattheisen, M., Walters, R. K., Won, H., et al. (2019). Identification of common genetic risk variants for autism spectrum disorder. Nature Genetics, 51(3), 431-444.
  7. Savage, J. E., Jansen, P. R., Stringer, S., Watanabe, K., Bryois, J., de Leeuw, C. A., et al. (2018). Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence. Nature Genetics, 50(7), 912-919.
  8. Jones, S. E., Lane, J. M., Wood, A. R., van Hees, V. T., Tyrrell, J., Beaumont, R. N., et al. (2019). Genome-wide association analyses of chronotype in 697,828 individuals provides insights into circadian rhythms. Nature Communications, 10(1), 343.
  9. Chen, W., Wu, Y., Zheng, Z., Qi, T., Visscher, P. M., Zhu, Z., et al. (2020). Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors. bioRxiv.