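Hello, I'd like to ask: is there code to calculate a population's polymorphism information content (PIC), observed number of alleles (A), and effective number of alleles (Ae)? The analyses I have from ddRAD sequencing are limited, and I'd like to add a few more.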

For example, the values calculated in this paper:

attachments-2024-12-6f8tvfGg6750481376626.png


1 answer

omicsgene - Bioinformatics
Specialties: resequencing, genetics and evolution, transcriptomics, GWAS

The Stacks software can calculate these; take a look: https://catchenlab.life.illinois.edu/stacks/manual/



populations.sumstats.tsv: Summary statistics for each population

The populations program will calculate a standard set of population genetic statistics for each population in the set of data it processes. These values are calculated at every variant site in the metapopulation (that is, a site may be fixed in one or more populations, but is variant in at least one population, or across populations). Each variant site will be listed in the file on one line, for each population. If there are three populations in the analysis, each variant site will be listed on three lines, once per population.

  • If the analysis is de novo, then the chromosome will be listed as "un", which is short for "unknown", and the basepair will be arbitrarily ordered.
  • If smoothing is enabled, with the --smooth option or the more specific --smooth-popstats option, then the smoothed columns will have values; otherwise they will be blank. The chromosome and basepair fields will also be populated.
  • If bootstrapping is enabled, smoothed windows will be resampled to generate a p-value indicating significance for each of the smoothed statistics, otherwise these columns will be blank.

| Column | Name | Description |
|---|---|---|
| 1 | Locus ID | Catalog locus identifier. |
| 2 | Chromosome | If aligned to a reference genome. |
| 3 | Basepair | If aligned to a reference genome. This is the basepair for this particular SNP. |
| 4 | Column | The nucleotide site within the catalog locus, reported using a zero-based offset (first nucleotide is enumerated as 0). |
| 5 | Population ID | The ID supplied to the populations program, as written in the population map file. |
| 6 | P Nucleotide | The most frequent allele at this position in this population. |
| 7 | Q Nucleotide | The alternative allele. |
| 8 | Number of Individuals | Number of individuals sampled in this population at this site. |
| 9 | P | Frequency of most frequent allele. |
| 10 | Observed Heterozygosity | The proportion of individuals that are heterozygotes in this population. |
| 11 | Observed Homozygosity | The proportion of individuals that are homozygotes in this population. |
| 12 | Expected Heterozygosity | Heterozygosity expected under Hardy-Weinberg equilibrium. |
| 13 | Expected Homozygosity | Homozygosity expected under Hardy-Weinberg equilibrium. |
| 14 | π | An estimate of nucleotide diversity. |
| 15 | Smoothed π | A weighted average of π depending on the surrounding 3σ of sequence in both directions. A value of -1 indicates that a particular locus was not included in the smoothing operation (likely because it was overlapped by a separate RAD locus that was included). |
| 16 | Smoothed π P-value | If bootstrap resampling is enabled, a p-value ranking the significance of π within this population. |
| 17 | FIS | The inbreeding coefficient of an individual (I) relative to the subpopulation (S). Derived from Hartl & Clark, Principles of Population Genetics, fourth edition, equation 6.4, page 264. |
| 18 | Smoothed FIS | A weighted average of FIS depending on the surrounding 3σ of sequence in both directions. |
| 19 | Smoothed FIS P-value | If bootstrap resampling is enabled, a p-value ranking the significance of FIS within this population. |
| 20 | HWE P-value | The probability that this variant site deviates from Hardy-Weinberg equilibrium. Calculated via an exact test. |
| 21 | Private allele | True (1) or false (0), depending on whether this allele occurs only in this population. |

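
The columns listed above do not include PIC, A, or Ae, but for biallelic SNPs all three can be derived from the major allele frequency in column 9 (P). The following is a minimal sketch, not part of Stacks: it assumes every site is biallelic (q = 1 - p), uses the commonly cited formulas Ae = 1 / Σp_i² and PIC = 1 - Σp_i² - 2p²q², and assumes a tab-separated file whose columns follow the order given above. The function names `diversity_from_p` and `per_population_means` are hypothetical.

```python
import csv
from collections import defaultdict


def diversity_from_p(p):
    """Return (A, Ae, PIC) for one biallelic site with major allele frequency p."""
    q = 1.0 - p
    freqs = [f for f in (p, q) if f > 0.0]
    a = len(freqs)                          # observed number of alleles
    hom = sum(f * f for f in freqs)         # sum of squared allele frequencies
    ae = 1.0 / hom                          # effective number of alleles
    pic = 1.0 - hom - 2.0 * (p * q) ** 2    # PIC for two alleles
    return a, ae, pic


def per_population_means(lines, pop_col=4, p_col=8):
    """Average A, Ae and PIC per population over sumstats-style rows.

    pop_col and p_col are zero-based indexes for columns 5 and 9
    (assumed layout; adjust if your file differs).
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        pop = row[pop_col]
        for i, value in enumerate(diversity_from_p(float(row[p_col]))):
            totals[pop][i] += value
        counts[pop] += 1
    return {pop: tuple(t / counts[pop] for t in totals[pop]) for pop in totals}
```

To try it, open populations.sumstats.tsv and pass the file handle to `per_population_means`. Because the file only lists sites that are variable somewhere in the metapopulation, the averages describe those sites only, and a site fixed in a given population contributes A = 1, Ae = 1, PIC = 0 for that population.
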
populations.sumstats_summary.tsv: Summary of summary statistics for each population

The populations program will summarize the standard set of population genetic statistics across the dataset. These values can be replicated by summing the columns in the populations.sumstats.tsv file. For example, the mean value of Π is obtained by summing column 14 in the populations.sumstats.tsv file for one of the populations and dividing by the number of rows.
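
The calculation described above (sum column 14 for one population, divide by its number of rows) can be sketched as follows. This is an illustration only: the function name `mean_pi_by_population` is hypothetical, and the zero-based indexes 4 (Population ID) and 13 (π) assume the column order documented for populations.sumstats.tsv.

```python
from collections import defaultdict


def mean_pi_by_population(lines, pop_col=4, pi_col=13):
    """Mean of the pi column per population over sumstats-style rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        pop = fields[pop_col]
        totals[pop] += float(fields[pi_col])
        counts[pop] += 1
    return {pop: totals[pop] / counts[pop] for pop in totals}
```
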


| Column | Name | Description |
|---|---|---|
| 1 | Pop ID | Population ID as defined in the Population Map file. |
| 2 | Private | Number of private alleles in this population. |
| 3 | Number of Individuals | Mean number of individuals per locus in this population. |
| 4 | Variance | |
| 5 | Standard Error | |
| 6 | P | Mean frequency of the most frequent allele at each locus in this population. |
| 7 | Variance | |
| 8 | Standard Error | |
| 9 | Observed Heterozygosity | Mean observed heterozygosity in this population. |
| 10 | Variance | |
| 11 | Standard Error | |
| 12 | Observed Homozygosity | Mean observed homozygosity in this population. |
| 13 | Variance | |
| 14 | Standard Error | |
| 15 | Expected Heterozygosity | Mean expected heterozygosity in this population. |
| 16 | Variance | |
| 17 | Standard Error | |
| 18 | Expected Homozygosity | Mean expected homozygosity in this population. |
| 19 | Variance | |
| 20 | Standard Error | |
| 21 | Π | Mean value of π in this population. |
| 22 | Π Variance | |
| 23 | Π Standard Error | |
| 24 | FIS | Mean measure of FIS in this population. |
| 25 | FIS Variance | |
| 26 | FIS Standard Error | |

Notes: There are two tables in this file containing the same headings. The first table, labeled "Variant", calculates these values at only the variable sites in each population. The second table, labeled "All positions", calculates these values at all positions, both variable and fixed, in each population.
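
Since the file holds two tables with identical headings, it helps to separate them before reading any values. The sketch below does this under an assumed layout that the text above does not confirm: a comment line (starting with "#") without tabs names a table, a comment line containing tabs is its header, and the remaining lines are tab-separated data rows. The function name `split_summary_tables` is hypothetical; check the layout against your own file first.

```python
def split_summary_tables(lines):
    """Split a sumstats_summary-style file into {table label: [row dicts]}.

    Assumed layout: "# <label>" starts a table, "# <tab-separated header>"
    names its columns, other non-blank lines are data rows.
    """
    tables = {}
    label = None
    header = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            text = line.lstrip("#").strip()
            if "\t" in text:
                header = text.split("\t")   # column names for the current table
            else:
                label = text                # a new table begins
                tables[label] = []
            continue
        if label is None or header is None:
            continue  # data before any label/header: ignore in this sketch
        tables[label].append(dict(zip(header, line.split("\t"))))
    return tables
```
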
populations.fst_Y-Z.tsv: FST calculations for each pair of populations

If --fstats is specified to the populations program, then FST statistics will be calculated for each pair of populations, as defined in the population map.

  • FST will be calculated for each variable site between the pair of populations (different pairs of populations may have different numbers of variable sites).
  • A p-value indicating a statistically significant difference in allele frequencies (that is a p-value for the FST measure), is provided by Fisher’s Exact Test in the "FET p-value" column.
  • If a reference genome is available and --smooth is specified, these values will be smoothed across chromosomes and those smoothed values will be stored in the "Smoothed AMOVA FST" column below.
  • If bootstrapping is enabled, then it will be used to generate p-values for each smoothing window and stored in the "Smoothed AMOVA FST P-value" column.

| Column | Name | Description |
|---|---|---|
| 1 | Locus ID | Catalog locus identifier. |
| 2 | Population ID 1 | The ID supplied to the populations program, as written in the population map file. |
| 3 | Population ID 2 | The ID supplied to the populations program, as written in the population map file. |
| 4 | Chromosome | If aligned to a reference genome. |
| 5 | Basepair | If aligned to a reference genome. |
| 6 | Column | The nucleotide site within the catalog locus, reported using a zero-based offset (first nucleotide is enumerated as 0). |
| 7 | Overall π | An estimate of nucleotide diversity across the two populations. |
| 8 | AMOVA FST | Analysis of Molecular Variance FST calculation. Derived from Weir, Genetic Data Analysis II, chapter 5, "F Statistics," pp166-167. |
| 9 | FET p-value | P-value describing if the FST measure is statistically significant according to Fisher’s Exact Test. |
| 10 | Odds Ratio | Fisher’s Exact Test odds ratio. |
| 11 | CI High | Fisher’s Exact Test confidence interval. |
| 12 | CI Low | Fisher’s Exact Test confidence interval. |
| 13 | LOD Score | Logarithm of odds score. |
| 14 | Corrected AMOVA FST | AMOVA FST with either the FET p-value, or a window-size or genome size Bonferroni correction. |
| 15 | Smoothed AMOVA FST | A weighted average of AMOVA FST depending on the surrounding 3σ of sequence in both directions. |
| 16 | Smoothed AMOVA FST P-value | If bootstrap resampling is enabled, a p-value ranking the significance of FST within this pair of populations. |
| 17 | Window SNP Count | Number of SNPs found in the sliding window centered on this nucleotide position. |

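
As a rough illustration of how the per-site values in this file might be summarized for one population pair, the sketch below averages column 8 (AMOVA FST) and counts sites whose column 9 (FET p-value) passes a Bonferroni threshold of alpha divided by the number of sites. The function name `summarize_fst`, the zero-based column indexes, and the choice of correction are assumptions for illustration; the unweighted mean of per-site FST is only a simple summary and not necessarily how a genome-wide FST should be estimated.

```python
def summarize_fst(lines, fst_col=7, p_col=8, alpha=0.05):
    """Mean per-site AMOVA FST and count of Bonferroni-significant sites.

    fst_col and p_col are zero-based indexes for columns 8 and 9
    (assumed layout; adjust if your file differs).
    """
    fst_values = []
    p_values = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        fst_values.append(float(fields[fst_col]))
        p_values.append(float(fields[p_col]))
    n = len(fst_values)
    if n == 0:
        return {"sites": 0, "mean_fst": None, "significant": 0}
    threshold = alpha / n  # Bonferroni: divide alpha by the number of tests
    return {
        "sites": n,
        "mean_fst": sum(fst_values) / n,
        "significant": sum(1 for p in p_values if p < threshold),
    }
```
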
  • Asked by 啊呀。 on 2024-12-04 20:17
