fastq和fasta 文件中序列ID都是唯一的,如果出现不唯一的情况,就需要给他去重复。这有一脚本可 实现此功能。
脚本帮助:
Usage
Forced parameter:
-fa fasta file <infile>
-fq fastq file <infile>
-fq1 fastq read1 file <infile>
-fq2 fastq read2 file <infile>
-f input file type,"fa"or"fq" <infile> must be given
-od output dir <outfile> must be given
-n output filename <outfile> must be given
Other parameter:
-h Help document
脚本既可以输入fasta格式的,也可以输入fastq格式序列文件。
-fa选项后跟输入的fasta文件,-fq选项后跟输入的fastq文件,如果fastq序列为双端测序的,则-fq1后跟read1序列,-fq2后跟read2序列。-n后跟输出文件前缀名,-od后跟输出目录。
脚本如下:
use Getopt::Long;
use strict;
use Bio::SeqIO;
use Bio::Seq;
#get opts
my %opts;
GetOptions(\%opts, "fa=s", "fq=s", "fq1=s", "fq2=s", "f=s", "od=s", "n=s", "h");
if(!defined($opts{f}) || !defined($opts{od}) || !defined($opts{n}) || defined($opts{h}))
{
print <<"Usage End.";
Usage
Forced parameter:
-fa fasta file <infile>must be given
-fq fastq file <infile>must be given
-fq1fastq read1 file <infile>must be given
-fq2fastq read2 file <infile>must be given
-f input file type,"fa"or"fq" <infile>must be given
-odoutput dir <outfile>must be given
-noutput filename <outfile>must be given
Other parameter:
-h Help document
Usage End.
exit;
}
if($opts{f} eq "fa" && defined($opts{fa})){
my$read1 = $opts{fa};
open my $FQ1 ,"zcat $read1|" or die "$!";
my$fq1=Bio::SeqIO->new(-fh=>$FQ1,-format=>'fasta');
open my $GZ1 ,"| gzip >$opts{od}/${n}.fa.gz" or die $!;
my$out1 = Bio::SeqIO->new(-fh => $GZ1 , -format => 'fasta');
my %id;
while ( my $obj1=$fq1->next_seq() ) {
my $id1=$obj1->id;
if(exists $id{$id1}){
next;
}else{
$id{$id1} = 1;
}
$out1->write_seq($obj1);
}
}
if($opts{f} eq "fq"){
if(defined($opts{fq})){
my$read1 = $opts{fq};
open my $FQ1 ,"zcat $read1|" or die "$!";
my$fq1=Bio::SeqIO->new(-fh=>$FQ1,-format=>'fastq');
open my $GZ1 ,"| gzip >$opts{od}/${n}.fq.gz" or die $!;
my$out1 = Bio::SeqIO->new(-fh => $GZ1 , -format => 'fastq');
my %id;
while ( my $obj1=$fq1->next_seq()) {
my $id1=$obj1->id;
if(exists $id{$id1}){
next;
}else{
$id{$id1} = 1;
}
$out1->write_seq($obj1);
}
}elsif(defined($opts{fq1}) && defined($opts{fq2})){
my$read1 = $opts{fq1};
my$read2 = $opts{fq2};
open my $FQ1 ,"zcat $read1|" or die "$!";
my$fq1=Bio::SeqIO->new(-fh=>$FQ1,-format=>'fastq');
open my $FQ2 ,"zcat $read2|" or die "$!";
my$fq2=Bio::SeqIO->new(-fh=>$FQ2,-format=>'fastq');
open my $GZ1 ,"| gzip >$opts{od}/${n}_R1.fq.gz" or die $!;
my$out1 = Bio::SeqIO->new(-fh => $GZ1 , -format => 'fastq');
open my $GZ2 ,"| gzip >$opts{od}/${n}_R2.fq.gz" or die $!;
my$out2 = Bio::SeqIO->new(-fh => $GZ2 , -format => 'fastq');
my %id;
while ( my $obj1=$fq1->next_seq() and my $obj2=$fq2->next_seq() ) {
my ($id1,$id2)=($obj1->id,$obj2->id);
if(exists $id{$id1}){
next;
}else{
$id{$id1} = 1;
}
$out1->write_seq($obj1);
$out2->write_seq($obj2);
}
}
}
更多perl语言知识可学习 Perl语言高级编程 !
如果觉得我的文章对您有用,请随意打赏。你的支持将鼓励我继续创作!