Second- and third-generation sequencing assembly methods, and gene prediction methods
Genomes of the different species were assembled with Falcon (ver 0.5.0; Chin et al., 2016), and the
assemblies were improved using Quiver (Chin et al., 2013) and finisherSC (Lam, LaButti, Khalak, & Tse, 2015).
For assembly of individual strains of P. noxius, Illumina paired-end reads were trimmed with
Trimmomatic (ver 0.32; options LEADING:30 TRAILING:30 SLIDINGWINDOW:4:30 MINLEN:50; Bolger,
Lohse, & Usadel, 2014) and subsequently assembled using SPAdes (ver 3.7.1; Bankevich et al., 2012).
Multiple mate-pair reads were available for three strains of P. noxius (KPN91, A42 and 718-S1), and
they were assembled using the ALLPATHS-LG assembler (ver 49688; Butler et al., 2008) and improved
using Pilon (Walker et al., 2014). The P. noxius assembly was further merged with Metassembler (ver
1.5; Wences & Schatz, 2015); misassemblies were identified using REAPR (ver 1.0.18; Hunt et al.,
2013) and manually corrected.
For P. noxius, the gene predictor Augustus (ver 3.2.1; Stanke, Tzvetkova, & Morgenstern, 2006) was trained on a gene training set of complete core genes from CEGMA (ver 2.5; Parra, Bradnam, & Korf, 2007) and subsequently used for manual curation of ~1000 genes. Annotation was then run with introns from RNA-seq data provided as evidence. For P. lamaensis, P. sulphurascens and P. pini, genes were predicted using the Braker1 pipeline (ver 1.9; Hoff, Lange, Lomsadze, Borodovsky, & Stanke, 2016), which automatically uses RNA-seq mappings as evidence hints and retrains GeneMark-ES (Borodovsky & Lomsadze, 2011) and Augustus.
Gene product descriptions were assigned using Blast2GO (ver 4.0.7; Conesa et al., 2005) and GO term assignments were provided by ARGOT2.5 (Lavezzo, Falda, Fontana, Bianco, & Toppo, 2016). The web server dbCAN (HMMs 5.0, last accessed September 5 2016; Yin et al., 2012) was used to predict CAZymes from the protein sequences of all species, while antiSMASH (ver 3.0; Weber et al., 2015) was used to predict secondary metabolite gene clusters. For dbCAN results, only hits with an E-value <= 1 x 10^-5 and >= 30% HMM coverage were considered, and overlapping domains were resolved by choosing the hit with the smallest P-value. Proteome completeness was assessed with BUSCO (ver 2.0; Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015) using the Basidiomycota dataset.
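The dbCAN filtering rule described above (E-value cutoff, minimum HMM coverage, then resolving overlapping domains in favour of the best-scoring hit) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the DomainHit record and its field names are hypothetical, HMM coverage is taken as the aligned HMM span divided by HMM length, any positional overlap on the same protein counts as overlapping, and the E-value is used as a stand-in for the P-value when ranking.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DomainHit:
    """One HMM hit on a protein (hypothetical record layout)."""
    protein: str
    family: str
    evalue: float
    hmm_start: int   # 1-based, inclusive
    hmm_end: int
    hmm_length: int
    seq_start: int   # 1-based, inclusive
    seq_end: int

    @property
    def hmm_coverage(self) -> float:
        # Assumed definition: fraction of the HMM covered by the alignment.
        return (self.hmm_end - self.hmm_start + 1) / self.hmm_length


def overlaps(a: DomainHit, b: DomainHit) -> bool:
    """True if the two hits share at least one residue on the protein."""
    return a.seq_start <= b.seq_end and b.seq_start <= a.seq_end


def filter_hits(
    hits: List[DomainHit],
    max_evalue: float = 1e-5,
    min_coverage: float = 0.30,
) -> List[DomainHit]:
    """Keep hits passing both cutoffs; among overlapping hits keep the best one."""
    passing = [
        h for h in hits
        if h.evalue <= max_evalue and h.hmm_coverage >= min_coverage
    ]
    kept: Dict[str, List[DomainHit]] = {}
    # Greedy resolution: visit hits from best to worst and accept a hit only
    # if it does not overlap one already accepted on the same protein.
    for hit in sorted(passing, key=lambda h: h.evalue):
        accepted = kept.setdefault(hit.protein, [])
        if not any(overlaps(hit, other) for other in accepted):
            accepted.append(hit)
    result = [h for group in kept.values() for h in group]
    return sorted(result, key=lambda h: (h.protein, h.seq_start))


if __name__ == "__main__":
    # Made-up hits for illustration only.
    demo = [
        DomainHit("p1", "GH5", 1e-40, 1, 250, 300, 10, 260),
        DomainHit("p1", "GH12", 1e-8, 5, 100, 200, 200, 300),   # overlaps GH5
        DomainHit("p1", "CBM1", 1e-10, 1, 29, 29, 400, 430),
        DomainHit("p2", "AA9", 1e-3, 1, 200, 220, 1, 210),      # fails E-value
        DomainHit("p2", "GT2", 1e-20, 1, 30, 200, 1, 40),       # fails coverage
    ]
    for hit in filter_hits(demo):
        print(hit.protein, hit.family, hit.evalue)
```

The greedy pass is one simple way to resolve overlaps; a stricter rule (for example, requiring a minimum overlap fraction before two hits compete) would change which domains survive.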
Reference: https://onlinelibrary.wiley.com/doi/abs/10.1111/mec.14359