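When using a genome and GFF file downloaded from NCBI for gene family analysis, the command that keeps protein-coding genes does not seem to recognize the NCBI GFF file (the log is in the description; the image shows the GFF file I tried to obtain)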

[Image: attachments-2024-07-GnPce62866a9c98cb84a4.png]

The log follows:

[root@b3b8d8255335  13:06:20 ~]# cd $workdir

[root@b3b8d8255335  13:06:34 /work/desaturase]# cd 01.data_prepare

[root@b3b8d8255335  13:06:41 /work/desaturase/01.data_prepare]# ll

total 0

-rwxr-xr-x 1 root root 152M Jul 23 16:04 Branchiostoma_floridae.fa.gz

-rwxr-xr-x 1 root root  11K Jul 31 11:25 Branchiostoma_floridae.gff.agat.log

-rwxr-xr-x 1 root root  14M Jul 23 16:05 Branchiostoma_floridae.gff.gz

-rwxr-xr-x 1 root root 2.7K Jul 31 12:04 Danio_rerio.GRCz11.112.chr.gff3.agat.log

-rwxr-xr-x 1 root root  17M Jul 23 20:47 Danio_rerio.GRCz11.112.chr.gff3.gz

-rwxr-xr-x 1 root root 629M Jul 23 19:22 Danio_rerio.GRCz11.dna.toplevel.fa.gz

-rwxr-xr-x 1 root root 117K Jul 17 17:53 degs.hmm

-r-xr-xr-x 1 root root 615K Jul 23 16:07 FAD_gen_result.tar.gz

-rwxr-xr-x 1 root root 1.3G Jul 25 00:07 GCF_902713615.1_sScyCan1.1_genomic.fa.gz

-rwxr-xr-x 1 root root  16M Jul 24 23:32 GCF_902713615.1_sScyCan1.1_genomic.gff.gz

-rwxr-xr-x 1 root root  18K Apr 22 19:16 general.hmm

-rwxr-xr-x 1 root root  45M Jul 23 17:23 Homo_sapiens.GRCh38.112.chr.gff3.gz

-rwxr-xr-x 1 root root 894M Jul 23 18:54 Homo_sapiens.GRCh38.dna.toplevel.fa.gz

-rwxr-xr-x 1 root root 9.1M Jul 23 18:16 Latimeria_chalumnae.LatCha1.112.gff3.gz

-rwxr-xr-x 1 root root 658M Jul 23 18:15 Latimeria_chalumnae.LatCha1.dna.toplevel.fa.gz

-rwxr-xr-x 1 root root 8.8M Jul 23 17:20 Petromyzon_marinus.Pmarinus_7.0.112.gff3.gz

-rwxr-xr-x 1 root root 187M Jul 23 17:38 Petromyzon_marinus.Pmarinus_7.0.dna.toplevel.fa.gz

drwxrwxrwx 1 root root 4.0K Jul 23 15:54 result

[root@b3b8d8255335  13:06:42 /work/desaturase/01.data_prepare]# genome=Branchiostoma_floridae.fa.gz

[root@b3b8d8255335  13:07:12 /work/desaturase/01.data_prepare]# gff=Branchiostoma_floridae.gff.gz

[root@b3b8d8255335  13:07:24 /work/desaturase/01.data_prepare]# species=Branchiostoma_floridae

[root@b3b8d8255335  13:07:32 /work/desaturase/01.data_prepare]# agat_sp_filter_feature_by_attribute_value.pl --gff  $gff --attribute gene_biotype --value protein_coding -t '!' -o $species.protein_coding.gff3

Using standard /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/auto/share/dist/AGAT/config.yaml file

07/31/2024 at 13h07m54s

usage: /share/work/biosoft/perl/latest/bin/agat_sp_filter_feature_by_attribute_value.pl --gff Branchiostoma_floridae.gff.gz --attribute gene_biotype --value protein_coding -t ! -o Branchiostoma_floridae.protein_coding.gff3

We will discard all features that have the attribute gene_biotype with the value ne protein_coding.

Can not open Branchiostoma_floridae.gff.agat.log for printing: No such file or directoryprint() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/OmniscientI.pm line 151.

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/OmniscientI.pm line 152.

********************************************************************************

*                              - Start parsing -                               *

********************************************************************************

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

-------------------------- parse options and metadata --------------------------

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

=> Accessing the feature_levels YAML file

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

Using standard /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/auto/share/dist/AGAT/feature_levels.yaml file

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

=> Attribute used to group features when no Parent/ID relationship exists (i.e common tag):

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

        * locus_tag

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

        * gene_id

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

=> merge_loci option deactivated

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

=> Machine information:

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

        This script is being run by perl v5.22.1

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

        Bioperl location being used: /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/Bio/

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

        Operating system being used: linux

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

=> Accessing Ontology

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

        No ontology accessible from the gff file header!

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

        We use the SOFA ontology distributed with AGAT:

                /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/auto/share/dist/AGAT/so.obo

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

        Read ontology /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/auto/share/dist/AGAT/so.obo:

                4 root terms, and 2596 total terms, and 1516 leaf terms

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

        Filtering ontology:

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

                We found 1861 terms that are sequence_feature or is_a child of it.

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

--------------------------------- parsing file ---------------------------------

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

=> Number of line in file: 1095208

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

=> Number of comment lines: 871

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

=> Fasta included: No

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

=> Number of features lines: 1094337

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

=> Number of feature type (3rd column): 15

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

        * Level1: 5 => pseudogene region sequence_feature gene cDNA_match

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

        * level2: 8 => rRNA snoRNA lnc_RNA transcript tRNA snRNA guide_RNA mRNA

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

        * level3: 2 => CDS exon

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

        * unknown: 0 =>

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

 ************** Too much WARNING message we skip the next **************

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297.

=> Version of the Bioperl GFF parser selected by AGAT: 3

WARNING level1: This feature level1 is not a duplicate but has an ID already used.

/!\ AGAT might mix up the child features and create chimeric records.

Indeed we changed the ID for this L1 feature to be unique but we do not

change the Parent attribute of the child features to reflect this change.

Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.

 @ the feature is:

NC_049979.1     RefSeq  cDNA_match      22755   22861   .       -       .       ID "nbis-cdna_match-1"  ; Target "XM_035817707.1 705 811 +"  ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189"  ; pct_coverage_hiqual "89.4189"  ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1

original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102032.

WARNING level1: This feature level1 is not a duplicate but has an ID already used.

/!\ AGAT might mix up the child features and create chimeric records.

Indeed we changed the ID for this L1 feature to be unique but we do not

change the Parent attribute of the child features to reflect this change.

Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.

 @ the feature is:

NC_049979.1     RefSeq  cDNA_match      22190   22297   .       -       .       ID "nbis-cdna_match-2"  ; Target "XM_035817707.1 812 919 +"  ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189"  ; pct_coverage_hiqual "89.4189"  ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1

original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102033.

WARNING level1: This feature level1 is not a duplicate but has an ID already used.

/!\ AGAT might mix up the child features and create chimeric records.

Indeed we changed the ID for this L1 feature to be unique but we do not

change the Parent attribute of the child features to reflect this change.

Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.

 @ the feature is:

NC_049979.1     RefSeq  cDNA_match      21601   21722   .       -       .       ID "nbis-cdna_match-3"  ; Target "XM_035817707.1 920 1041 +"  ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189"  ; pct_coverage_hiqual "89.4189"  ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1

original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102034.

WARNING level1: This feature level1 is not a duplicate but has an ID already used.

/!\ AGAT might mix up the child features and create chimeric records.

Indeed we changed the ID for this L1 feature to be unique but we do not

change the Parent attribute of the child features to reflect this change.

Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.

 @ the feature is:

NC_049979.1     RefSeq  cDNA_match      19974   20096   .       -       .       ID "nbis-cdna_match-4"  ; Target "XM_035817707.1 1042 1164 +"  ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189"  ; pct_coverage_hiqual "89.4189"  ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1

original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102035.

WARNING level1: This feature level1 is not a duplicate but has an ID already used.

/!\ AGAT might mix up the child features and create chimeric records.

Indeed we changed the ID for this L1 feature to be unique but we do not

change the Parent attribute of the child features to reflect this change.

Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.

 @ the feature is:

NC_049979.1     RefSeq  cDNA_match      19280   19452   .       -       .       ID "nbis-cdna_match-5"  ; Target "XM_035817707.1 1165 1337 +"  ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189"  ; pct_coverage_hiqual "89.4189"  ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1

original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102036.

WARNING level1: This feature level1 is not a duplicate but has an ID already used.

/!\ AGAT might mix up the child features and create chimeric records.

Indeed we changed the ID for this L1 feature to be unique but we do not

change the Parent attribute of the child features to reflect this change.

Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.

 @ the feature is:

NC_049979.1     RefSeq  cDNA_match      18430   18549   .       -       .       ID "nbis-cdna_match-6"  ; Target "XM_035817707.1 1338 1457 +"  ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189"  ; pct_coverage_hiqual "89.4189"  ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1

original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102037.

WARNING level1: This feature level1 is not a duplicate but has an ID already used.

/!\ AGAT might mix up the child features and create chimeric records.

Indeed we changed the ID for this L1 feature to be unique but we do not

change the Parent attribute of the child features to reflect this change.

Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.

 @ the feature is:

NC_049979.1     RefSeq  cDNA_match      18139   18216   .       -       .       ID "nbis-cdna_match-7"  ; Target "XM_035817707.1 1458 1535 +"  ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189"  ; pct_coverage_hiqual "89.4189"  ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1

original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102038.

WARNING level1: This feature level1 is not a duplicate but has an ID already used.

/!\ AGAT might mix up the child features and create chimeric records.

Indeed we changed the ID for this L1 feature to be unique but we do not

change the Parent attribute of the child features to reflect this change.

Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.

 @ the feature is:

NC_049979.1     RefSeq  cDNA_match      17524   17607   .       -       .       ID "nbis-cdna_match-8"  ; Target "XM_035817707.1 1536 1619 +"  ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189"  ; pct_coverage_hiqual "89.4189"  ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1

original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102039.

WARNING level1: This feature level1 is not a duplicate but has an ID already used.

/!\ AGAT might mix up the child features and create chimeric records.

Indeed we changed the ID for this L1 feature to be unique but we do not

change the Parent attribute of the child features to reflect this change.

Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.

 @ the feature is:

NC_049979.1     RefSeq  cDNA_match      16650   16679   .       -       .       ID "nbis-cdna_match-9"  ; Target "XM_035817707.1 1620 1649 +"  ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189"  ; pct_coverage_hiqual "89.4189"  ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1

original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102040.

WARNING level1: This feature level1 is not a duplicate but has an ID already used.

/!\ AGAT might mix up the child features and create chimeric records.

Indeed we changed the ID for this L1 feature to be unique but we do not

change the Parent attribute of the child features to reflect this change.

Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.

 @ the feature is:

NC_049979.1     RefSeq  cDNA_match      14817   14999   .       -       .       ID "nbis-cdna_match-10"  ; Target "XM_035817707.1 1650 1832 +"  ; for_remapping 2 ; gap_count 0 ; num_ident 5324 ; num_mismatch 0 ; pct_coverage "89.4189"  ; pct_coverage_hiqual "89.4189"  ; pct_identity_gap 100 ; pct_identity_ungap 100 ; rank 1

original id: 070fe4f1-79a5-4bfd-b9fa-36a8ba9c69b8

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102041.

WARNING level1: This feature level1 is not a duplicate but has an ID already used.

/!\ AGAT might mix up the child features and create chimeric records.

Indeed we changed the ID for this L1 feature to be unique but we do not

change the Parent attribute of the child features to reflect this change.

Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.

  ************** Too much WARNING message we skip the next **************

print() on closed filehandle $log at /share/work/biosoft/perl/perl-5.22.1/lib/site_perl/5.22.1/AGAT/Utilities.pm line 297, <$fh> line 102041.


2 answers

omicsgene - Bioinformatics
Expertise: resequencing, genetics and evolution, transcriptomics, GWAS

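Right, the GFF files on NCBI are not in a standard format, which is what makes the command fail.

You could try downloading the genome files from another source.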
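
Every warning in the log above concerns `cDNA_match` rows, which are alignment records that reuse one ID across many lines. One possible workaround, sketched here on the assumption that these alignment rows are not needed for the gene family analysis, is to drop them before running the AGAT filter. The function name `drop_alignment_rows` and the output file name are hypothetical, not part of AGAT.

```shell
# Hypothetical helper: remove cDNA_match rows from a GFF3 stream read on stdin.
# Comment and directive lines (starting with '#') are passed through unchanged.
drop_alignment_rows() {
    awk -F'\t' '/^#/ { print; next } $3 != "cDNA_match" { print }'
}

# Example usage (input name taken from the log; the output name is an assumption):
# zcat Branchiostoma_floridae.gff.gz | drop_alignment_rows > Branchiostoma_floridae.no_match.gff
```

The filtered file could then be passed to the same `agat_sp_filter_feature_by_attribute_value.pl` command via `--gff`. Whether this also resolves the separate "Can not open ... .agat.log" message in the log is not something this sketch addresses.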

dhzmars

Is there a way to convert the GFF files from NCBI into the standard format?
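
Before converting anything, it may help to check whether the file actually carries the `gene_biotype` attribute that the filter command relies on. The sketch below tallies the `gene_biotype` values found on `gene` rows; it assumes a tab-separated GFF3 with `key=value` attributes in column 9, and the function name `count_gene_biotypes` is hypothetical.

```shell
# Hypothetical helper: tally gene_biotype values on 'gene' rows of a GFF3 stream.
count_gene_biotypes() {
    awk -F'\t' '$3 == "gene" {
        if (match($9, /gene_biotype=[^;]+/)) {
            # Skip the 13 characters of "gene_biotype=" to keep only the value
            n[substr($9, RSTART + 13, RLENGTH - 13)]++
        } else {
            n["(missing)"]++
        }
    }
    END { for (k in n) print k "\t" n[k] }' | LC_ALL=C sort
}

# Example usage (file name taken from the log):
# zcat Branchiostoma_floridae.gff.gz | count_gene_biotypes
```

If `protein_coding` shows up with a plausible count, the attribute is present and the problem more likely lies elsewhere, for example in the alignment rows or the log file error shown above.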

Asked by dhzmars on 2024-07-31 13:20
