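Right, the GFF file from NCBI is not in a standard format, and that is what makes the code fail.
You could try downloading the genome files from a different source.
The log is below: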
[root@b3b8d8255335 13:06:20 ~]# cd $workdir
[root@b3b8d8255335 13:06:34 /work/desaturase]# cd 01.data_prepare
[root@b3b8d8255335 13:06:41 /work/desaturase/01.data_prepare]# ll
total 0
-rwxr-xr-x 1 root root 152M Jul 23 16:04 Branchiostoma_floridae.fa.gz
-rwxr-xr-x 1 root root 11K Jul 31 11:25 Branchiostoma_floridae.gff.agat.log
-rwxr-xr-x 1 root root 14M Jul 23 16:05 Branchiostoma_floridae.gff.gz
-rwxr-xr-x 1 root root 2.7K Jul 31 12:04 Danio_rerio.GRCz11.112.chr.gff3.agat.log
-rwxr-xr-x 1 root root 17M Jul 23 20:47 Danio_rerio.GRCz11.112.chr.gff3.gz
-rwxr-xr-x 1 root root 629M Jul 23 19:22 Danio_rerio.GRCz11.dna.toplevel.fa.gz
-rwxr-xr-x 1 root root 117K Jul 17 17:53 degs.hmm
-r-xr-xr-x 1 root root 615K Jul 23 16:07 FAD_gen_result.tar.gz
-rwxr-xr-x 1 root root 1.3G Jul 25 00:07 GCF_902713615.1_sScyCan1.1_genomic.fa.gz
-rwxr-xr-x 1 root root 16M Jul 24 23:32 GCF_902713615.1_sScyCan1.1_genomic.gff.gz
-rwxr-xr-x 1 root root 18K Apr 22 19:16 general.hmm
-rwxr-xr-x 1 root root 45M Jul 23 17:23 Homo_sapiens.GRCh38.112.chr.gff3.gz
-rwxr-xr-x 1 root root 894M Jul 23 18:54 Homo_sapiens.GRCh38.dna.toplevel.fa.gz
-rwxr-xr-x 1 root root 9.1M Jul 23 18:16 Latimeria_chalumnae.LatCha1.112.gff3.gz
-rwxr-xr-x 1 root root 658M Jul 23 18:15 Latimeria_chalumnae.LatCha1.dna.toplevel.fa.gz
-rwxr-xr-x 1 root root 8.8M Jul 23 17:20 Petromyzon_marinus.Pmarinus_7.0.112.gff3.gz
-rwxr-xr-x 1 root root 187M Jul 23 17:38 Petromyzon_marinus.Pmarinus_7.0.dna.toplevel.fa.gz
drwxrwxrwx 1 root root 4.0K Jul 23 15:54 result
[root@b3b8d8255335 13:06:42 /work/desaturase/01.data_prepare]# genome=Branchiostoma_floridae.fa.gz
[root@b3b8d8255335 13:07:12 /work/desaturase/01.data_prepare]# gff=Branchiostoma_floridae.gff.gz
[root@b3b8d8255335 13:07:24 /work/desaturase/01.data_prepare]# species=Branchiostoma_floridae
[root@b3b8d8255335 13:07:32 /work/desaturase/01.data_prepare]# agat_sp_filter_feature_by_attribute_value.pl --gff $gff --attribute gene_biotype --value protein_coding -t '!' -o $species.protein_coding.gff3
Using standard /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/auto/share/dist/AGAT/config.yaml file
07/31/2024 at 13h07m54s
usage: /share/work/biosoft/perl/latest/bin/agat_sp_filter_feature_by_attribute_value.pl --gff Branchiostoma_floridae.gff.gz --attribute gene_biotype --value protein_coding -t ! -o Branchiostoma_floridae.protein_coding.gff3
We will discard all features that have the attribute gene_biotype with the value ne protein_coding.
Can not open Branchiostoma_floridae.gff.agat.log for printing: No such file or directory
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/OmniscientI.pm line 151.
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/OmniscientI.pm line 152.
********************************************************************************
* - Start parsing - *
********************************************************************************
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
-------------------------- parse options and metadata --------------------------
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
=> Accessing the feature_levels YAML file
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
Using standard /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/auto/share/dist/AGAT/feature_levels.yaml file
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
=> Attribute used to group features when no Parent/ID relationship exists (i.e common tag):
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
* locus_tag
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
* gene_id
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
=> merge_loci option deactivated
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
=> Machine information:
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
This script is being run by perl v5.22.1
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
Bioperl location being used: /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/Bio/
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
Operating system being used: linux
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
=> Accessing Ontology
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
No ontology accessible from the gff file header!
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
We use the SOFA ontology distributed with AGAT:
/share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/auto/share/dist/AGAT/so.obo
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
Read ontology /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/auto/share/dist/AGAT/so.obo:
4 root terms, and 2596 total terms, and 1516 leaf terms
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
Filtering ontology:
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
We found 1861 terms that are sequence_feature or is_a child of it.
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
--------------------------------- parsing file ---------------------------------
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
=> Number of line in file: 1095208
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
=> Number of comment lines: 871
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
=> Fasta included: No
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
=> Number of features lines: 1094337
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
=> Number of feature type (3rd column): 15
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
* Level1: 5 => pseudogene region sequence_feature gene cDNA_match
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
* level2: 8 => rRNA snoRNA lnc_RNA transcript tRNA snRNA guide_RNA mRNA
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
* level3: 2 => CDS exon
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
* unknown: 0 =>
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
************** Too much WARNING message we skip the next **************
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.
=> Version of the Bioperl GFF parser selected by AGAT: 3
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
@ the feature is:
NC_049979.1 RefSeq cDNA_match 22755 22861 . - . ID "nbis-cdna_match-1" ; Target "XM_035817707.1 705 811 +" ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189" ; pct_coverage_hiqual "89.4189" ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1
original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102032.
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
@ the feature is:
NC_049979.1 RefSeq cDNA_match 22190 22297 . - . ID "nbis-cdna_match-2" ; Target "XM_035817707.1 812 919 +" ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189" ; pct_coverage_hiqual "89.4189" ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1
original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102033.
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
@ the feature is:
NC_049979.1 RefSeq cDNA_match 21601 21722 . - . ID "nbis-cdna_match-3" ; Target "XM_035817707.1 920 1041 +" ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189" ; pct_coverage_hiqual "89.4189" ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1
original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102034.
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
@ the feature is:
NC_049979.1 RefSeq cDNA_match 19974 20096 . - . ID "nbis-cdna_match-4" ; Target "XM_035817707.1 1042 1164 +" ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189" ; pct_coverage_hiqual "89.4189" ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1
original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102035.
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
@ the feature is:
NC_049979.1 RefSeq cDNA_match 19280 19452 . - . ID "nbis-cdna_match-5" ; Target "XM_035817707.1 1165 1337 +" ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189" ; pct_coverage_hiqual "89.4189" ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1
original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102036.
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
@ the feature is:
NC_049979.1 RefSeq cDNA_match 18430 18549 . - . ID "nbis-cdna_match-6" ; Target "XM_035817707.1 1338 1457 +" ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189" ; pct_coverage_hiqual "89.4189" ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1
original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102037.
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
@ the feature is:
NC_049979.1 RefSeq cDNA_match 18139 18216 . - . ID "nbis-cdna_match-7" ; Target "XM_035817707.1 1458 1535 +" ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189" ; pct_coverage_hiqual "89.4189" ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1
original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102038.
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
@ the feature is:
NC_049979.1 RefSeq cDNA_match 17524 17607 . - . ID "nbis-cdna_match-8" ; Target "XM_035817707.1 1536 1619 +" ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189" ; pct_coverage_hiqual "89.4189" ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1
original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102039.
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
@ the feature is:
NC_049979.1 RefSeq cDNA_match 16650 16679 . - . ID "nbis-cdna_match-9" ; Target "XM_035817707.1 1620 1649 +" ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189" ; pct_coverage_hiqual "89.4189" ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1
original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102040.
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
@ the feature is:
NC_049979.1 RefSeq cDNA_match 14817 14999 . - . ID "nbis-cdna_match-10" ; Target "XM_035817707.1 1650 1832 +" ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189" ; pct_coverage_hiqual "89.4189" ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1
original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102041.
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
************** Too much WARNING message we skip the next **************
print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102041.
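
Every warning in the log above concerns a cDNA_match line whose original ID (070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8) is shared with other lines. Before switching to another download source, it may help to check how widespread the shared IDs are. The following is only a minimal sketch in Python, not part of AGAT: the helper names `open_text` and `repeated_ids_by_type` are made up for illustration, and it assumes the file uses tab-separated GFF3 columns with `key=value` attributes in column 9.

```python
import gzip
from collections import Counter, defaultdict


def open_text(path):
    """Open a plain or gzip-compressed text file for reading (hypothetical helper)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def repeated_ids_by_type(lines):
    """Return {feature_type: number of ID values that occur on more than one line}.

    Assumes GFF3 layout: 9 tab-separated columns, attributes as key=value
    pairs separated by ';' in the last column. Other lines are skipped.
    """
    seen = defaultdict(Counter)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment/pragma lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line in the assumed layout
        for field in cols[8].split(";"):
            key, sep, value = field.strip().partition("=")
            if sep and key == "ID":
                seen[cols[2]][value] += 1  # count this ID under its feature type
                break
    result = {}
    for feature_type, ids in seen.items():
        repeated = sum(1 for count in ids.values() if count > 1)
        if repeated:
            result[feature_type] = repeated
    return result
```

For the file in this log, it could be called as `repeated_ids_by_type(open_text("Branchiostoma_floridae.gff.gz"))`. If only cDNA_match shows up in the result, the shared IDs are limited to alignment features, and the gene, mRNA, exon and CDS records are not affected by this particular warning.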