这个脚本如果直接从ensamble下载cds和pep还需要去提取吗，是不是可以直接转换格式

for species in Anser_brachyrhynchus Equus_caballus Homo_sapiens Mus_musculus Oryctolagus_cuniculus Ovis_aries_rambouillet Sus_scrofa.Sscrofa;do

genome=$datadir/$species.genome.fa.gz

gff=$datadir/$species.gff.gz

#保留蛋白编码基因

agat_sp_filter_feature_by_attribute_value.pl --gff $gff --attribute biotype --value protein_coding -t '!' -o $species.protein_coding.gff3

#保留最长转录本

agat_sp_keep_longest_isoform.pl --gff $species.protein_coding.gff3 -o $species.longest_isoform.gff3

#Ensembl 下载的基因组会在基因和mRNA ID前面加gene: 和 transcript:这里给去掉

sed -i 's/gene://' $species.longest_isoform.gff3

sed -i 's/transcript://' $species.longest_isoform.gff3

#提取cds序列

/share/work/biosoft/TransDecoder/latest/util/gff3_file_to_proteins.pl --gff3 $species.longest_isoform.gff3 --fasta $genome --seqType CDS >$species.cds.fa

#提取pep序列

/share/work/biosoft/TransDecoder/latest/util/gff3_file_to_proteins.pl --gff3 $species.longest_isoform.gff3 --fasta $genome --seqType prot >$species.pep.fa

#提取cds 转换成OrthoFinder 需要的序列格式

perl $scriptsdir/get_gene_longest_fa_for_OrthoFinder.pl -l 150 --gff $species.longest_isoform.gff3 --fa $species.cds.fa -p $species -o cds

#提取pep 转换成OrthoFinder 需要的序列格式

perl $scriptsdir/get_gene_longest_fa_for_OrthoFinder.pl -l 50 --gff $species.longest_isoform.gff3 --fa $species.pep.fa -p $species -o pep

done

0 条评论
分类：基因家族分析

这个脚本如果直接从ensamble下载cds和pep还需要去提取吗，是不是可以直接转换格式

0 个回答

相似问题