MethCompare Part 18

Formatting individual sample information

In this issue we’ve discussed how to revise our statistical analyses of CpG methylation status and genomic location. We know we want to compare proportions to investigate whether sequencing method affects the proportion of CpGs in each methylation status or in each genomic location. I posted some suggestions, but in the meantime I thought I could obtain individual-level proportion data.

Thankfully for me, most of the pipeline was already set up! In this Jupyter notebook I counted CpGs for each methylation status and in various genomic features. The only things I needed to modify were making sure I used bedtools -u (so each CpG is counted once no matter how many features it overlaps), adding code for upstream and downstream flank overlaps, and adding the path to the explicit intragenic region tracks. I took the output files (line counts) and used them in this R Markdown script to create summary tables:

M. capitata:

P. acuta:
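
As a rough illustration of the step from line counts to individual-level proportions, here is a minimal Python sketch. It is not the code from the notebook or the R Markdown script; the sample names, category labels, and counts are all made up for the example, and `proportions_by_sample` is a hypothetical helper.

```python
def proportions_by_sample(counts):
    """Convert per-sample CpG counts into per-sample proportions.

    counts maps a sample name to a dict of {category: CpG count},
    where a category could be a methylation status or a genomic feature.
    """
    result = {}
    for sample, by_category in counts.items():
        total = sum(by_category.values())
        if total == 0:
            # Avoid dividing by zero for a sample with no CpGs counted
            result[sample] = {category: 0.0 for category in by_category}
            continue
        result[sample] = {
            category: n / total for category, n in by_category.items()
        }
    return result


# Hypothetical line counts (NOT real data from this project)
example_counts = {
    "sample_1": {"high": 100, "moderate": 50, "low": 850},
    "sample_2": {"high": 30, "moderate": 20, "low": 150},
}

props = proportions_by_sample(example_counts)
for sample, by_category in props.items():
    print(sample, {k: round(v, 3) for k, v in by_category.items()})
```

Keeping one row of proportions per sample, rather than pooling counts by sequencing method, is what makes it possible to compare methods later with sample-level replication.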

Going forward

  1. Conduct statistical analysis
  2. Locate TE tracks
  3. [Characterize intersections between data and TE, and create summary tables]
  4. Look into program for mCpG identification
Written on June 11, 2020