Genetic Evolution Analysis: English Methods Section Reference

Library preparation and Illumina sequencing

Genomic DNA from each sample was used for library construction. The libraries were sequenced on an Illumina HiSeq X platform, generating 150-bp paired-end reads.


Data filtering and alignment
The recently released genome of Agaricus_bisporus_var_bisporus was downloaded from Ensembl (http://fungi.ensembl.org/) and used as the reference genome. Using a custom C program, we filtered out low-quality reads based on the following criteria: (i) reads with ≥10% unidentified nucleotides (N); (ii) reads for which more than 50% of the read length had a Phred quality value ≤10; (iii) reads containing adapter sequence. The cleaned reads were aligned to the reference genome using BWA-MEM (0.7.10-r789) with default parameters [1], and SAMtools was used to sort and index the resulting Binary Alignment Map (BAM) files [2]. MarkDuplicates in Picard tools (v1.102) (http://broadinstitute.github.io/picard/) was used to remove duplicate reads, and the final sorted BAM files were used for downstream analysis.
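The three read-filtering criteria above can be sketched in a few lines of Python. This is only an illustration, not the custom C program itself: the function names, the adapter string, and the reading of the thresholds as "at least 10% N" and "Phred value of 10 or lower" are assumptions, and adapter detection is reduced to an exact substring match.

```python
# Assumed adapter prefix for illustration only; the source does not name one.
ADAPTER = "AGATCGGAAGAGC"


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def keep_read(seq, quality_string, adapter=ADAPTER,
              max_n_frac=0.10, low_q=10, max_low_frac=0.50):
    """Return True if the read survives all three filters."""
    if not seq or len(seq) != len(quality_string):
        return False
    seq = seq.upper()
    # (i) discard reads with >= 10% unidentified nucleotides
    if seq.count("N") / len(seq) >= max_n_frac:
        return False
    # (ii) discard reads where more than 50% of bases have Phred <= 10
    quals = phred_scores(quality_string)
    low_quality_bases = sum(1 for q in quals if q <= low_q)
    if low_quality_bases / len(quals) > max_low_frac:
        return False
    # (iii) discard reads that contain the adapter (exact match only)
    if adapter and adapter in seq:
        return False
    return True
```

For paired-end data, a pair would typically be dropped when either mate fails, which can be expressed as `keep_read(seq1, qual1) and keep_read(seq2, qual2)`.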
 
Variant calling and filtering
After alignment, we performed SNP calling at the population scale using the Genome Analysis Toolkit (GATK) version 3.6, with HaplotypeCaller (HC) used to call variants [3]. Variants were retained if they met the following criteria: (1) filter status equal to PASS; (2) Quality by Depth (QD) > 2; (3) Mapping Quality (MQ) > 40; (4) QUAL > 30; (5) minor allele frequency (MAF) > 0.05. Variants were further removed if coverage was < 4, if more than two SNPs clustered within a 5-bp window, or if a SNP lay within 5 bp of an indel.
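As a rough sketch of how these thresholds combine, the Python below applies the per-site criteria to a dictionary of annotations and flags SNP clusters. The dictionary keys and function names are hypothetical, chosen only for this example; in practice the filtering would be done with GATK or a VCF-aware tool.

```python
def passes_site_filters(variant):
    """Apply the per-site thresholds to a dict of variant annotations.

    The keys (FILTER, QD, MQ, QUAL, MAF, DP) are an assumed layout.
    """
    return (
        variant["FILTER"] == "PASS"
        and variant["QD"] > 2
        and variant["MQ"] > 40
        and variant["QUAL"] > 30
        and variant["MAF"] > 0.05
        and variant["DP"] >= 4  # coverage below 4 is filtered out
    )


def clustered_snps(positions, window=5, max_snps=2):
    """Return positions that fall in any `window`-bp span holding
    more than `max_snps` SNPs (positions from one chromosome)."""
    pos = sorted(positions)
    flagged = set()
    right = 0
    for left in range(len(pos)):
        # extend the right edge while SNPs still fit in the window
        while right < len(pos) and pos[right] - pos[left] < window:
            right += 1
        if right - left > max_snps:
            flagged.update(pos[left:right])
    return flagged
```

A SNP near an indel could be handled the same way, by comparing each SNP position against the list of indel positions and dropping those within 5 bp.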
 
Population genetics analysis
Using the neighbour-joining method and a distance matrix calculated in PHYLIP 3.68 [4], a phylogenetic tree was constructed and displayed using the ETE Python package [5]. Using all SNPs, we evaluated the population structure of the XXX accessions with ADMIXTURE [6]. The input parameter K ranged from 2 to 10, representing the assumed number of ancestral populations. A PCA of whole-genome SNPs was performed with the smartpca program in the EIGENSOFT package [7].
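To make the distance-matrix step concrete, here is a minimal sketch of one common pairwise distance between samples, computed from SNP genotypes coded as alternate-allele counts (0, 1, 2). It is an assumption for illustration and not necessarily the distance model used in PHYLIP; the function name and genotype coding are hypothetical.

```python
def pairwise_distance(genotypes):
    """Build a symmetric distance matrix from per-sample genotype lists.

    genotypes[i][s] is the alternate-allele count (0, 1 or 2) of sample i
    at site s, or None if missing. The distance is the mean allele
    difference per site, scaled to the range 0..1.
    """
    n = len(genotypes)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diffs = [
                abs(a - b) / 2
                for a, b in zip(genotypes[i], genotypes[j])
                if a is not None and b is not None
            ]
            # NaN when the two samples share no genotyped site
            d = sum(diffs) / len(diffs) if diffs else float("nan")
            dist[i][j] = dist[j][i] = d
    return dist
```

A matrix like this is the kind of input a neighbour-joining program takes, one row per sample.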
LD analysis
LD was calculated for each subpopulation using SNPs with MAF > 0.05. To evaluate LD decay, the squared correlation coefficient (r²) between pairs of loci was calculated using Haploview [8]. Average r² was calculated for pairwise markers within a 500-kb window and averaged across the whole genome.
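The r² statistic itself is simple to state. The sketch below computes it for two biallelic loci and averages it over marker pairs within 500 kb. It assumes phased haplotypes coded 0/1, which is a simplification: Haploview estimates haplotype frequencies from genotype data. The function names are hypothetical.

```python
def r_squared(hap_a, hap_b):
    """r^2 between two biallelic loci from phased haplotypes (0/1 lists)."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # undefined when a locus is monomorphic
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def mean_r2(positions, haplotypes, max_dist=500_000):
    """Average r^2 over all marker pairs no more than max_dist bp apart."""
    values = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if abs(positions[j] - positions[i]) <= max_dist:
                r2 = r_squared(haplotypes[i], haplotypes[j])
                if r2 == r2:  # skip NaN
                    values.append(r2)
    return sum(values) / len(values) if values else float("nan")
```

Binning the pairwise values by distance instead of pooling them gives the LD decay curve.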
Genome scanning for selection
A sliding-window approach (40-kb windows sliding in 5-kb steps) was applied to calculate nucleotide diversity (π), genetic differentiation (Fst) between the wild population and XXX, and Tajima's D (a statistic used to detect selection in the genome) using VCFtools [10].
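The window layout and the π calculation can be illustrated with a short sketch. It assumes biallelic SNPs, 1-based coordinates, and windows that start at position 1, so the window boundaries may differ from those VCFtools reports; the function names are hypothetical.

```python
def sliding_windows(chrom_length, window=40_000, step=5_000):
    """Yield (start, end) 1-based inclusive windows along a chromosome."""
    start = 1
    while start <= chrom_length:
        yield start, min(start + window - 1, chrom_length)
        start += step


def site_pi(alt_count, n_chrom):
    """Per-site nucleotide diversity for a biallelic SNP:
    the fraction of chromosome pairs that differ at the site."""
    if n_chrom < 2:
        return 0.0
    return 2 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def window_pi(sites, start, end):
    """Mean pi per bp in [start, end].

    sites: iterable of (position, alt_allele_count, n_chromosomes).
    """
    total = sum(site_pi(k, n) for pos, k, n in sites if start <= pos <= end)
    return total / (end - start + 1)
```

Dividing by the window length treats every position without a SNP as invariant, which is a simplification when parts of the window are uncallable.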
Genome-wide association study
We used high-quality SNPs (MAF > 0.05) to perform GWAS for quality-related traits in 200 accessions. The traits included XXX and XXX. Association analyses were performed in TASSEL 5.0 using the compressed mixed linear model (P + G + Q + K) [9]. Kinship was derived from the same set of SNPs. The significance threshold for association was set at 1/n, where n is the total number of SNPs.
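The 1/n threshold is easy to apply after the association test has been run. The sketch below assumes the p-values are held in a dictionary keyed by SNP identifier; that layout and the function names are hypothetical and do not reflect TASSEL's output format.

```python
import math


def significance_threshold(n_snps):
    """Threshold of 1/n, where n is the total number of SNPs."""
    return 1.0 / n_snps


def significant_snps(pvalues, n_snps):
    """Return the IDs of SNPs whose p-value is below 1/n, sorted by ID.

    pvalues: dict mapping SNP ID -> association p-value.
    """
    threshold = significance_threshold(n_snps)
    return sorted(snp for snp, p in pvalues.items() if p < threshold)


def threshold_line(n_snps):
    """Height of the threshold on a -log10(p) Manhattan plot."""
    return -math.log10(significance_threshold(n_snps))
```

With one million SNPs, for example, the threshold is 1e-6, which corresponds to a line at about 6 on the -log10(p) axis.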
 
 

 
 
References
[1] Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009; 25(14):1754–60. doi: 10.1093/bioinformatics/btp324 PMID: 19451168
[2] Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009; 25(16):2078–9.
[3] McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research. 2010; 20(9):1297–303. doi: 10.1101/gr.107524.110 PMID: 20644199
[4] Felsenstein J. PHYLIP-phylogeny inference package (version 3.2). Cladistics. 1989; 5:164–166.
[5] Huerta-Cepas J, Serra F, Bork P. ETE 3: Reconstruction, analysis and visualization of phylogenomic data. Mol Biol Evol. 2016. doi: 10.1093/molbev/msw046
[6] Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Research. 2009; 19:1655–1664.
[7] Price AL, et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006; 38:904–909.
[8] Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005; 21:263–265.
[9] Bradbury PJ, et al. TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics. 2007; 23:2633–2635.
[10] Danecek P, et al. The variant call format and VCFtools. Bioinformatics. 2011; 27:2156–2158.
  • Posted on 2018-05-16 21:23
  • Category: Resequencing

omicsgene