Library preparation and Illumina sequencing
Genomic DNA from each sample was used for library construction. The libraries were sequenced on an Illumina HiSeq X platform, generating 150 bp paired-end reads.
Data filtering and alignment
The recently released genome of Agaricus_bisporus_var_bisporus was downloaded from Ensembl (http://fungi.ensembl.org/) and used as the reference genome. Using a custom C program, we removed low-quality reads that met any of the following criteria: (i) reads with ≥10 % unidentified nucleotides (N); (ii) reads in which more than 50 % of the read length had a Phred quality value ≤10; (iii) reads containing adapter sequence. The cleaned reads were aligned to the reference genome using BWA-MEM (0.7.10-r789) with default parameters [1], and SAMtools was used to sort and index the resulting Binary Alignment Map (BAM) files [2]. MarkDuplicates in Picard tools (v1.102) (http://broadinstitute.github.io/picard/) was used to discard duplicate reads, and the final sorted BAM files were used for downstream analysis.
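The three read-filtering criteria above can be illustrated with a minimal Python sketch. This is not the custom C program used in the study. The function names, the Phred+33 quality encoding and the exact-substring adapter match are assumptions made for illustration only.

```python
def phred33_to_quals(qual_string):
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33 encoding)."""
    return [ord(c) - 33 for c in qual_string]


def passes_read_filter(seq, quals, adapter,
                       max_n_frac=0.10, low_q=10, max_low_q_frac=0.50):
    """Return True if a read survives criteria (i)-(iii).

    seq     : read sequence
    quals   : list of Phred scores, one per base
    adapter : adapter sequence to search for (exact match; an assumption)
    """
    length = len(seq)
    if length == 0:
        return False
    seq = seq.upper()
    # (i) discard reads with >= 10 % unidentified nucleotides (N)
    if seq.count("N") / length >= max_n_frac:
        return False
    # (ii) discard reads in which more than 50 % of bases have Phred <= 10
    n_low = sum(1 for q in quals if q <= low_q)
    if n_low / length > max_low_q_frac:
        return False
    # (iii) discard reads that contain the adapter sequence
    if adapter and adapter.upper() in seq:
        return False
    return True
```

For paired-end data, a pair would typically be dropped when either mate fails; the text does not state how pairs were handled, so that choice is left open here.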
Variant calling and filtering
After alignment, we performed SNP calling on a population scale using GATK version 3.6. GATK HaplotypeCaller (HC) was used to call variants [3]. Variants were retained if they met the following criteria: (1) mapping quality filter equal to PASS; (2) Quality Depth (QD) >2; (3) Mapping Quality (MQ) >40; (4) QUAL >30; (5) minor allele frequency (MAF) >0.05. In addition, variants were removed if coverage was <4, if more than 2 SNPs clustered within a 5 bp window, or if a SNP lay within 5 bp of an indel.
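The filtering rules above can be sketched in Python as follows. This is an illustration of the thresholds only, not the GATK commands used. The record layout (a dict with the keys `filter`, `qd`, `mq`, `qual`, `maf`, `depth`) and all function names are hypothetical, and coverage is assumed here to be a single per-site value.

```python
def passes_site_filters(rec, min_qd=2.0, min_mq=40.0, min_qual=30.0,
                        min_maf=0.05, min_depth=4):
    """Apply criteria (1)-(5) and the coverage cut-off to one variant record.

    rec is a hypothetical dict, e.g.
    {"filter": "PASS", "qd": 12.3, "mq": 58.0, "qual": 410.0,
     "maf": 0.12, "depth": 9}
    """
    return (rec["filter"] == "PASS"
            and rec["qd"] > min_qd
            and rec["mq"] > min_mq
            and rec["qual"] > min_qual
            and rec["maf"] > min_maf
            and rec["depth"] >= min_depth)   # removed if coverage < 4


def clustered_snps(positions, window=5, max_snps=2):
    """Return SNP positions that sit in a cluster of more than max_snps
    SNPs within a window of `window` bp (positions on one chromosome)."""
    pos = sorted(positions)
    flagged = set()
    for i in range(len(pos) - max_snps):
        # max_snps + 1 consecutive SNPs fit in the window if their span < window
        if pos[i + max_snps] - pos[i] < window:
            flagged.update(pos[i:i + max_snps + 1])
    return flagged


def near_indel(snp_pos, indel_positions, distance=5):
    """Return True if a SNP lies within `distance` bp of any indel position."""
    return any(abs(snp_pos - p) <= distance for p in indel_positions)
```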
Population genetics analysis
A phylogenetic tree was constructed using the neighbour-joining method from a distance matrix calculated in PHYLIP 3.68 [4] and displayed using the ETE Python package [5]. Using all SNPs, we evaluated the population structure of the XXX accessions with ADMIXTURE [6]. The input parameter K in ADMIXTURE, which represents the assumed number of ancestral populations, ranged from 2 to 10. A principal component analysis (PCA) of whole-genome SNPs was performed with the smartpca program in the EIGENSOFT package [7].
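The scan over K from 2 to 10 amounts to one ADMIXTURE run per value of K. A minimal sketch that only builds the command lines is given below. The input file name `snps.bed` is hypothetical, and any additional ADMIXTURE options used in the study are not stated in the text and are therefore omitted.

```python
def admixture_commands(bed_path, k_min=2, k_max=10):
    """Build one basic 'admixture <input> <K>' command per value of K."""
    return [["admixture", bed_path, str(k)] for k in range(k_min, k_max + 1)]


# Hypothetical input file; each command could be passed to subprocess.run()
commands = admixture_commands("snps.bed")
```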
LD analysis
Linkage disequilibrium (LD) was calculated for each subpopulation using SNPs with MAF >0.05. To evaluate LD decay, the coefficient of determination (r²) between pairs of loci was calculated using Haploview [8]. Average r² was calculated for pairwise markers within a 500 kb window and averaged across the whole genome.
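To make the LD decay calculation concrete, the sketch below averages r² by physical distance for SNP pairs up to 500 kb apart. It is not Haploview's method: as a simpler stand-in, r² is taken as the squared Pearson correlation of genotype dosages (0, 1, 2). The 10 kb distance bin and all function names are assumptions.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between two dosage vectors (0/1/2).
    Missing genotypes are given as None; returns None if undefined."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return None   # monomorphic site, r2 undefined
    return cov * cov / (var_a * var_b)


def mean_r2_by_distance(snps, max_dist=500_000, bin_size=10_000):
    """Average r2 per distance bin for SNP pairs within max_dist.

    snps: list of (position, genotypes) on one chromosome, sorted by position.
    Returns {bin_start_distance: mean_r2}.
    """
    sums, counts = {}, {}
    for i in range(len(snps)):
        for j in range(i + 1, len(snps)):
            dist = snps[j][0] - snps[i][0]
            if dist > max_dist:
                break   # positions are sorted, so later SNPs are farther away
            r2 = genotype_r2(snps[i][1], snps[j][1])
            if r2 is None:
                continue
            b = dist // bin_size
            sums[b] = sums.get(b, 0.0) + r2
            counts[b] = counts.get(b, 0) + 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(sums)}
```

Plotting the returned means against distance, per subpopulation, gives the LD decay curve.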
Genome scanning for selection
A sliding-window approach (40-kb windows sliding in 5-kb steps) was applied to calculate nucleotide diversity (π), genetic differentiation (Fst) between the wild and XXX groups, and selection statistics (Tajima’s D, a measure of selection in the genome) using VCFtools [10].
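The windowing scheme and the π calculation can be sketched as follows. This is a simplified illustration for biallelic SNPs, not VCFtools' implementation. The 0-based half-open coordinates, the truncated windows at the chromosome end and the function names are assumptions.

```python
def sliding_windows(chrom_length, window=40_000, step=5_000):
    """Yield (start, end) windows, 0-based half-open, 40 kb wide, 5 kb step.
    Windows at the chromosome end are truncated."""
    start = 0
    while start < chrom_length:
        yield start, min(start + window, chrom_length)
        start += step


def site_pi(alt_count, n_chrom):
    """Per-site nucleotide diversity for a biallelic SNP:
    2 * j * (n - j) / (n * (n - 1)), with j alternate alleles among n chromosomes."""
    if n_chrom < 2:
        return 0.0
    return 2.0 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def window_pi(sites, start, end):
    """Sum of per-site pi in [start, end) divided by the window length.

    sites: list of (position, alt_allele_count, n_chromosomes)
    """
    total = sum(site_pi(a, n) for pos, a, n in sites if start <= pos < end)
    return total / (end - start)
```

Fst and Tajima’s D follow the same windowing and differ only in the per-window statistic.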
Genome-wide association study
We used high-quality SNPs (MAF >0.05) to perform a genome-wide association study (GWAS) for quality-related traits in 200 accessions. The traits included XXX XXX. Association analyses were performed in TASSEL 5.0 with the compressed mixed linear model (P + G + Q + K) [9]. Kinship was derived from all of these SNPs. The significance threshold was set at 1/n, where n is the total number of SNPs.
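The 1/n threshold can be illustrated with a short sketch. The function names and the strict "p < threshold" comparison are assumptions, and the SNP identifiers in the usage example are hypothetical.

```python
import math


def significance_threshold(n_snps):
    """Genome-wide threshold of 1/n, with n the total number of SNPs tested."""
    return 1.0 / n_snps


def significant_snps(pvalues, n_snps=None):
    """Return the SNP ids whose p-value is below 1/n.

    pvalues: dict mapping SNP id to association p-value
    n_snps : total SNP number; defaults to the number of p-values supplied
    """
    n = n_snps if n_snps is not None else len(pvalues)
    cutoff = significance_threshold(n)
    return sorted(snp for snp, p in pvalues.items() if p < cutoff)


# With one million SNPs the threshold is 1e-6, i.e. -log10(p) = 6 on a Manhattan plot
threshold = significance_threshold(1_000_000)
neg_log10 = -math.log10(threshold)
```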
References
[1] Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009; 25(14):1754–60. doi: 10.1093/bioinformatics/btp324 PMID: 19451168
[2] Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009; 25(16):2078–9.
[3] McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research. 2010; 20(9):1297–303. doi: 10.1101/gr.107524.110 PMID: 20644199
[4] Felsenstein J. PHYLIP-phylogeny inference package (version 3.2). Cladistics. 1989; 5:164–166.
[5] Huerta-Cepas J, Serra F, Bork P. ETE 3: Reconstruction, analysis and visualization of phylogenomic data. Mol Biol Evol. 2016. doi: 10.1093/molbev/msw046
[6] Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Research. 2009; 19:1655–1664.
[7] Price AL, et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006; 38:904–909.
[8] Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005; 21:263–265.
[9] Bradbury PJ, et al. TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics. 2007; 23:2633–2635.
[10] Danecek P, et al. The variant call format and VCFtools. Bioinformatics. 2011; 27:2156–2158.