Output
Pydamage generates both a tabular and a visual output.
The tabular outputs are comma-separated file (.csv
) with the following columns, for each analysed reference:
pydamage_results.csv
reference
: name of the reference genome/contigpredicted_accuracy
: Predicted accuracy of Pydamage prediction, from the GLM modellingnull_model_p0
: parameterp0
of the null modelnull_model_p0_stdev
: standard error of the null model paramaterp0
damage_model_p
: parameterp
of the damage modeldamage_model_p_stdev
: standard error of the parameterp
of the damage modeldamage_model_pmin
: paramaterp_min
of the damage model. This is the modelled damage baselinedamage_model_pmin_stdev
: standard error of the paramaterp_min
of the damage modeldamage_model_pmax
: paramaterp_max
of the damage model. This is the modelled amount of damage on the 5’ end.damage_model_pmax_stdev
: standard error of the paramaterp_max
of the damage modelpvalue
: p-value calculated from the likelihood-ratio test-statistic using a chi-squared distributionqvalue
: p-value corrected for multiple testing using Benjamini-Hochberg procedure. Only computed when multiple references are usedRMSE
: residual mean standard error of the model fit of the damage modelnb_reads_aligned
: number of aligned readscoverage
: average coverage along the reference genomeCtoT-N
: Proportion of CtoT substitutions observed at positionN
from 5’ endGtoA-N
: Proportion of GtoA substitutions observed at positionN
from 5’
pydamage_filtered_results.csv
Same file as above, but with contigs filtered with qvalue <= 0.05
and predicted_accuracy >= threshold
with a user defined filtering threshold (default = 0.5), or determined with the kneedle method.
pydamage_rescaled.bam
The input alignment file with rescaled base quality scores when running pydamage analyze
with the -r
or --rescale
flag.
The rescaled base calling scores are computed for each read containing ancient DNA damage according to the following formula, with i
the position in the read, p_err
the original base calling error probability,p_pydam
the pydamage computed ancient damage probability, and p_new
the updated base calling error probability.
p_new(i) = 1 - (1 - p_err(i)) (1 - p_pydam(i))
Plots
The visual output are PNG files, one per reference contig. They show the frequency of observed C to T, and G to A transition at the 5’ end of the sequencing data and overlay it with the fitted models for both the null and the damage model, including 95% confidence intervals. Furthermore, it provides a “residuals versus fitted” plot to help evaluate the fit of the pydamage damage model. Finally, the plot contains informtion on the average coverage along the reference and the p-value calculated from the likelihood-ratio test-statistic using a chi-squared distribution.
The visual output is only produced when using the
--plot
flag
Example
Tabular ouput
Visual output