bioperl处理fasta和fastq序列截取反向互补翻译等等

截取,反向互补,读入,写出,筛选等
bioperl处理fasta序列最全代码:

序列的读取与输出



use Bio::SeqIO;
use Bio::Seq;
use Data::Dumper;
$in  = Bio::SeqIO->new(-file => "D:/share/scripts/Arabidopsis_thaliana.TAIR10.cds.all.fa" ,
                               -alphabet=>"dna",
                               -format => 'Fasta');
$out = Bio::SeqIO->new(-file => ">D:/share/scripts/aa.fa" ,
                               -format => 'fasta');
while ( my $seqobj = $in->next_seq() ) {
# the human read-able id of the sequence
my $id=$seqobj->id(); 
 
# string of sequence
my $seq=$seqobj->seq();
# a description of the sequence
my $desc=$seqobj->desc(); 
# one of 'dna','rna','protein'  https://www.bioinformatics.org/sms/iupac.html
my $alphabet=$seqobj->alphabet(); 
my $len=$seqobj->length();
print $id."\n";
print $seq."\n";
print $desc."\n";
print $alphabet."\n";
print $len."\n";
print $seqobj."\n";
print Dumper($seqobj);
$out->write_seq($seqobj);
last;
}

Fastq序列的处理:


$in1  = Bio::SeqIO->new(-file => "D:/share/scripts/test_1.fastq" ,
                               -format => 'fastq');
$in2  = Bio::SeqIO->new(-file => "D:/share/scripts/test_2.fastq" ,
                               -format => 'fastq');
                               
$out = Bio::SeqIO->new(-file => ">D:/share/scripts/test_1.fa" ,
                               -format => 'fasta');
while ( my $seqobj1 = $in1->next_seq() and  my $seqobj2 = $in2->next_seq()) {
# the human read-able id of the sequence
my $id=$seqobj->id(); 
# string of sequence
my $seq=$seqobj->seq(); 
# a description of the sequence
my $desc=$seqobj->desc(); 
# one of 'dna','rna','protein'  https://www.bioinformatics.org/sms/iupac.html
my $alphabet=$seqobj->alphabet(); 
my $len=$seqobj->length();
my $qual=$seqobj->qual();
print $id."\n";
print $seq."\n";
print $desc."\n";
print $alphabet."\n";
print $len."\n";
print "@{$qual}\n";
print Dumper($seqobj);
$out->write_seq($seqobj);
last;
}


序列的截取  反向互补序列:


my$seqobj = Bio::Seq->new(-seq => 'actgtggcgtcaact',-desc => 'Sample Bio::Seq object',-id =>"ID1");
# part of the sequence as a string
my$subseq=$seqobj->subseq(5,10); 
print $subseq."\n";
my $newSeqobj = Bio::Seq->new(-seq => $subseq,
       -desc => 'subseq 5-10',
       -id =>"ID2",
 );
$out->write_seq($newSeqobj);

trunc和subseq有区别,trunc返回还是原来的序列对象,只是截短了,而subseq返回的是截取的序列字符串,一般subseq使用的多一些:

$out  = Bio::SeqIO->new(-file => ">D:/share/scripts/subseq1.fa" ,
                               -format => 'Fasta');
 
my$subseqObj= $seqobj->trunc(5,10);
my$subseq=$seqobj->subseq(5,10);
print $subseqObj."\n";
print $subseq."\n";
$out->write_seq($subseqObj);
my$revcom= $seqobj->revcom;  #序列反向互补
print   $revcom."\n";
print $subseqObj."\n";
print   $seqobj."\n";


DNA序列翻译成蛋白质序列:

#参数   1  stop *
#     2   unknown amino acid ('X').

my$translate0=$seqobj->translate(undef,undef,0);
my$translate1=$seqobj->translate(undef,undef,1);
my$translate2=$seqobj->translate(undef,undef,2);

print Dumper($translate0)."\n";
print Dumper($translate1)."\n";
print Dumper($translate2)."\n";

推荐学习《perl高级编程》,有bioperl的详细讲解;

  • 发表于 2018-06-08 15:30
  • 阅读 ( 6292 )
  • 分类:perl

0 条评论

请先 登录 后评论
omicsgene
omicsgene

生物信息

700 篇文章

作家榜 »

  1. omicsgene 700 文章
  2. 安生水 348 文章
  3. Daitoue 167 文章
  4. 生物女学霸 120 文章
  5. xun 82 文章
  6. 红橙子 78 文章
  7. rzx 75 文章
  8. CORNERSTONE 72 文章