| Title: | Simulate DNA Methylation Dynamics on Different Genomic Structures along Genealogies |
|---|---|
| Description: | DNA methylation is an epigenetic modification involved in genomic stability, gene regulation, development and disease. DNA methylation occurs mainly through the addition of a methyl group to cytosines, for example to cytosines in a CpG dinucleotide context (CpG stands for a cytosine followed by a guanine). Tissue-specific methylation patterns lead to genomic regions with different characteristic methylation levels. E.g. in vertebrates CpG islands (regions with high CpG content) that are associated to promoter regions of expressed genes tend to be unmethylated. 'MethEvolSIM' is a model-based simulation software for the generation and modification of cytosine methylation patterns along a given tree, which can be a genealogy of cells within an organism, a coalescent tree of DNA sequences sampled from a population, or a species tree. The simulations are based on an extension of the model of Grosser & Metzler (2020) <doi:10.1186/s12859-020-3438-5> and allows for changes of the methylation states at single cytosine positions as well as simultaneous changes of methylation frequencies in genomic structures like CpG islands. |
| Authors: | Sara Castillo Vicente [aut, cre], Dirk Metzler [aut, ths] |
| Maintainer: | Sara Castillo Vicente <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.3.0 |
| Built: | 2026-05-29 18:26:55 UTC |
| Source: | https://github.com/cran/MethEvolSIM |
This function categorizes CpG islands into unmethylated, methylated, or partially methylated states based on specified thresholds.
categorize_islandGlbSt(meanMeth_islands, u_threshold, m_threshold)categorize_islandGlbSt(meanMeth_islands, u_threshold, m_threshold)
meanMeth_islands |
A numeric vector containing the mean methylation levels for CpG islands at each tip. |
u_threshold |
A numeric value (0-1) defining the threshold for categorization as unmethylated. |
m_threshold |
A numeric value (0-1) defining the threshold for categorization as methylated. |
The function assigns each island a state:
if mean methylation lower or equal than u_threshold
if mean methylation greater or equal than m_threshold
if mean methylation is in between
A character vector of length equal to meanMeth_islands, containing "u", "p", or "m" for each island.
meanMeth_islands <- c(0.1, 0.4, 0.8) categorize_islandGlbSt(meanMeth_islands, 0.2, 0.8)meanMeth_islands <- c(0.1, 0.4, 0.8) categorize_islandGlbSt(meanMeth_islands, 0.2, 0.8)
This function categorizes the values in data[[tip]][[structure]] into three categories:
0 for unmethylated sites, where values are smaller or equal to u_threshold.
0.5 for partially methylated sites, where values are between u_threshold and m_threshold.
1 for methylated sites, where values are larger or equal to m_threshold.
categorize_siteMethSt(data, u_threshold = 0.2, m_threshold = 0.8)categorize_siteMethSt(data, u_threshold = 0.2, m_threshold = 0.8)
data |
A list structured as |
u_threshold |
A numeric value representing the upper bound for values to be classified as unmethylated ( |
m_threshold |
A numeric value representing the lower bound for values to be classified as methylated ( |
If any value in data[[tip]][[structure]] is outside these categories, it is transformed based on the given thresholds.
A transformed version of data where each value is categorized as 0 (unmethylated), 0.5 (partially methylated),
or 1 (methylated).
data <- list( list(c(0.1, 0.2, 0.02), c(0.05, 0.25, 0.15)), # tip 1 list(c(0.01, 0.7, 0.85), c(0.3, 0.1, 0.98)) # tip 2 ) transformed_data <- categorize_siteMethSt(data, u_threshold = 0.2, m_threshold = 0.8)data <- list( list(c(0.1, 0.2, 0.02), c(0.05, 0.25, 0.15)), # tip 1 list(c(0.01, 0.7, 0.85), c(0.3, 0.1, 0.98)) # tip 2 ) transformed_data <- categorize_siteMethSt(data, u_threshold = 0.2, m_threshold = 0.8)
an R6 class representing the steps for sampling a sequence of methylation states from the equilibrium (SSEi and SSEc considered, IWE neglected) using the CFTP algorithm for a given combiStructureGenerator instance.
It is stored in the private attribute CFTP_info of combiStructureGenerator instances when calling the combiStructureGenerator$cftp() method and can be retrieved with the combiStructureGenerator$get_CFTP_info()
singleStr_numberPublic attribute: Number of singleStr instances
singleStr_siteNumberPublic attribute: Number of sites in singleStr instances
CFTP_highest_ratePublic attribute: CFTP highest rate
number_stepsPublic attribute: counter of steps alredy generated
CFTP_chosen_singleStrPublic attribute: list with vectors of equal size with chosen singleStr index at each CFTP step
CFTP_chosen_sitePublic attribute: list with vectors of equal size with chosen site index at each CFTP step
CFTP_eventPublic attribute: list with vectors of equal size with type of CFTP event at each CFTP step.
CFTP_randomPublic attribute: list with vectors of equal size with CFTP threshold at each CFTP step
steps_perVectorPublic attribute: size of vectors in lists CFTP_chosen_singleStr, CFTP_chosen_site, CFTP_event and CFTP_random
cftpStepGenerator$new()Create a new instance of class cftpStepGenerator with the info of the corresponding combiStrucutre instance
cftpStepGenerator$new( singleStr_number, singleStr_siteNumber, CFTP_highest_rate )
singleStr_numberNumber of singleStr instances
singleStr_siteNumberNumber of sites in singleStr instances
CFTP_highest_rateCFTP highest rate across all singleStr withing combiStr instance
A new instance of class cftpStepGenerator
cftpStepGenerator$generate_events()1: SSEi to unmethylated, 2: SSEi to partially methylated, 3: SSEi to methylated 4: SSEc copy left state, 5: SSEc copy right state
Public Method. Generates the events to apply for CFTP.
cftpStepGenerator$generate_events(steps = 10000, testing = FALSE)
stepsInteger value >=1
testingdefault FALSE. TRUE for testing output
The function add steps to the existing ones. If called several times the given steps need to be higher than the sum of steps generated before.
NULL when testing FALSE. Testing output when testing TRUE.
cftpStepGenerator$clone()The objects of this class are cloneable with this method.
cftpStepGenerator$clone(deep = FALSE)
deepWhether to make a deep clone.
an R6 class representing several genomic structures. Each genomic structure contained is an object of class singleStructureGenerator. Note that default clone(deep=TRUE) fails to clone singleStructureGenerator objects contained, use method $copy() instead.
testing_outputPublic attribute: Testing output for initialize
combiStructureGenerator$new()Create a new combiStructureGenerator object.
Note that this object can be generated within a treeMultiRegionSimulator object.
combiStructureGenerator$new(infoStr, params = NULL, testing = FALSE)
infoStrA data frame containing columns 'n' for the number of sites, and 'globalState' for the favoured global methylation state. If initial equilibrium frequencies are given the dataframe must contain 3 additional columns: 'u_eqFreq', 'p_eqFreq' and 'm_eqFreq'
paramsDefault NULL. When given: data frame containing model parameters.
testingDefault FALSE. TRUE for writing in public field of new instance $testing_output
A new combiStructureGenerator object.
combiStructureGenerator$get_singleStr()Public method: Get one singleStructureGenerator object in $singleStr
combiStructureGenerator$get_singleStr(i)
iindex of the singleStructureGenerator object in $singleStr
the singleStructureGenerator object in $singleStr with index i
combiStructureGenerator$get_singleStr_number()Public method: Get number of singleStructureGenerator objects in $singleStr
combiStructureGenerator$get_singleStr_number()
number of singleStructureGenerator object contained in $singleStr
combiStructureGenerator$get_singleStr_siteNumber()Public method: Get number of sites in all singleStructureGenerator objects
combiStructureGenerator$get_singleStr_siteNumber()
number of sites in all singleStructureGenerator objects
combiStructureGenerator$get_island_number()Public method: Get number of singleStructureGenerator objects in $singleStr with $globalState "U" (CpG islands)
combiStructureGenerator$get_island_number()
number of singleStructureGenerator in $singleStr objects with $globalState "U" (CpG islands)
combiStructureGenerator$get_island_index()Public method: Get index of singleStructureGenerator objects in $singleStr with $globalState "U" (CpG islands)
combiStructureGenerator$get_island_index()
index of singleStructureGenerator objects in $singleStr with $globalState "U" (CpG islands)
combiStructureGenerator$set_IWE_events()Public method: Set information of the IWE events sampled in a tree branch
combiStructureGenerator$set_IWE_events(a)
avalue to which IWE_events should be set
NULL
combiStructureGenerator$get_IWE_events()Public method: Get information of the IWE events sampled in a tree branch
combiStructureGenerator$get_IWE_events()
information of the IWE events sampled in a tree branch
combiStructureGenerator$set_name()Public method: Set the name of the leaf if evolutionary process (simulated from class treeMultiRegionSimulator) ends in a tree leaf.
combiStructureGenerator$set_name(a)
avalue to which name should be set
NULL
combiStructureGenerator$get_name()Public method: Get the name of the leaf if evolutionary process (simulated from class treeMultiRegionSimulator) ended in a tree leaf.
combiStructureGenerator$get_name()
Name of the leaf if evolutionary process (simulated from class treeMultiRegionSimulator) ended in a tree leaf. For iner tree nodes return NULL
combiStructureGenerator$get_own_index()Public method: Set own branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
combiStructureGenerator$get_own_index()
NULL
combiStructureGenerator$set_own_index()Public method: Get own branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
combiStructureGenerator$set_own_index(i)
iindex of focal object
Own branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
combiStructureGenerator$get_parent_index()Public method: Get parent branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
combiStructureGenerator$get_parent_index()
Parent branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
combiStructureGenerator$set_parent_index()Public method: Set parent branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
combiStructureGenerator$set_parent_index(i)
iset parent_index to this value
NULL
combiStructureGenerator$get_offspring_index()Public method: Get offspring branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
combiStructureGenerator$get_offspring_index()
Offspring branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
combiStructureGenerator$set_offspring_index()Public method: Set offspring branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
combiStructureGenerator$set_offspring_index(i)
iset offspring_index to this value
NULL
combiStructureGenerator$add_offspring_index()Public method: Add offspring branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
combiStructureGenerator$add_offspring_index(i)
iindex to be added
NULL
combiStructureGenerator$get_mu()Public method.
combiStructureGenerator$get_mu()
Model parameter for the rate of the IWE evolutionary process (per island and branch length).
combiStructureGenerator$set_params()Public method: Update model parameters across all contained singleStructureGenerator objects and recompute all dependent quantities (IWE rate, Ri_values, Rc_values, Qi, Qc, Q, ratetree).
The current methylation state sequences and equilibrium frequencies are preserved;
only stored parameter values and their derived quantities are updated.
This allows changing parameters between simulation segments — for example,
setting mu = 0 to suppress IWE events on a terminal branch while keeping
all other parameters and the current sequence state intact.
combiStructureGenerator$set_params(params)
paramsData frame with parameter values, structured as the output of
get_parameterValues().
Equilibrium frequencies (eqFreqs) are intentionally not resampled when
parameters are updated. Each contained singleStructureGenerator holds a
realized eqFreqs that was sampled at initialization (and can be updated by
IWE events during evolution). Resampling here would discard those realized states
and replace them with new random draws, which is not the intended behaviour when
changing parameters mid-simulation (e.g. to continue evolving tip sequences from
a completed tree simulation under a different parameter regime). The updated
alpha/beta parameters are stored and will take effect in any future IWE events
that resample eqFreqs within each contained object.
NULL
combiStructureGenerator$get_id()Public method. Get the unique ID of the instance
combiStructureGenerator$get_id()
A numeric value representing the unique ID of the instance.
combiStructureGenerator$set_id()Public method. Set the unique ID of the instance
combiStructureGenerator$set_id(id)
idinteger value to identificate the combiStructure instance
A numeric value representing the unique ID of the instance.
combiStructureGenerator$get_sharedCounter()Public method. Get the counter value from the shared environment between instances of combiStructureGenerator class
combiStructureGenerator$get_sharedCounter()
Numeric counter value.
combiStructureGenerator$reset_sharedCounter()Public method. Reset the counter value of the shared environment between instances of combiStructureGenerator class
combiStructureGenerator$reset_sharedCounter()
NULL
combiStructureGenerator$set_singleStr()Public method: Clone each singleStructureGenerator object in $singleStr
combiStructureGenerator$set_singleStr(singStrList)
singStrListobject to be cloned
NULL
combiStructureGenerator$copy()Public method: Clone combiStructureGenerator object and all singleStructureGenerator objects in it
combiStructureGenerator$copy()
cloned combiStructureGenerator object
combiStructureGenerator$branch_evol()Simulate CpG dinucleotide methylation state evolution along a tree branch. The function samples the IWE events on the tree branch and simulates the evolution through the SSE and IWE processes.
combiStructureGenerator$branch_evol(branch_length, dt, testing = FALSE)
branch_lengthLength of the branch.
dtLength of SSE time steps.
testingDefault FALSE. TRUE for testing purposes.
It handles both cases where IWE events are sampled or not sampled within the branch.
Default NULL. If testing = TRUE it returns information for testing purposes.
combiStructureGenerator$get_highest_rate()Public Method. Gets the highest rate among all singleStructureGenerator objects for CFTP.
combiStructureGenerator$get_highest_rate()
Highest rate value.
combiStructureGenerator$set_CFTP_info()Public Method. Sets a cftpStepGenerator instance asthe CFTP info.
combiStructureGenerator$set_CFTP_info(CFTP_instance)
CFTP_instanceCFTP info.
NULL
combiStructureGenerator$get_CFTP_info()Public Method. Gets the CFTP info.
combiStructureGenerator$get_CFTP_info()
CFTP info.
combiStructureGenerator$cftp_apply_events()Public Method. Applies the CFTP events.
combiStructureGenerator$cftp_apply_events(testing = FALSE)
testingdefault FALSE. TRUE for testing output
NULL when testing FALSE. Testing output when testing TRUE.
combiStructureGenerator$cftp()Public Method. Applies the CFTP algorithm to evolve a structure and checks for convergence by comparing methylation states.
This method generates CFTP steps until the methylation sequences of the current structure and a cloned structure become identical across all singleStr instances or a step limit is reached. If the step limit is exceeded, an approximation method is applied to finalize the sequence.
combiStructureGenerator$cftp( steps = 10000, step_limit = 327680000, testing = FALSE )
stepsminimum number of steps to apply (default 10000).
step_limitmaximum number of steps before applying an approximation method (default 327680000 corresponding to size of CFTP info of approx 6.1 GB). If this limit is reached, the algorithm stops and an approximation is applied.
testinglogical. If TRUE, returns additional testing output including the current structure, the cloned structure, the counter value, total steps, and the chosen site for the CFTP events. Default is FALSE.
NULL when testing is FALSE. If testing is TRUE, returns a list with:
self: the current object after applying the CFTP algorithm.
combi_m: a deep cloned object with applied CFTP events.
counter: the number of iterations performed.
total_steps: the number of steps applied by the CFTP algorithm.
CFTP_chosen_site: the site selected during the CFTP event application.
combiStructureGenerator$clone()The objects of this class are cloneable with this method.
combiStructureGenerator$clone(deep = FALSE)
deepWhether to make a deep clone.
Performs a chi-squared test to compare the distribution of methylation states
(unmethylated 0, partially-methylated 0.5, and methylated 1)
between two cherry tips. A cherry is a pair of leaf nodes (also called tips or terminal nodes)
in a phylogenetic tree that share a direct common ancestor.
compare_CherryFreqs(tip1, tip2, testing = FALSE)compare_CherryFreqs(tip1, tip2, testing = FALSE)
tip1 |
A numeric vector representing methylation states ( |
tip2 |
A numeric vector representing methylation states ( |
testing |
Logical; if |
The function uses simulate.p.value = TRUE in chisq.test
to compute the p-value via Monte Carlo simulation to improve reliability
regardless of whether the expected frequencies meet the assumptions of the chi-squared test
(i.e., expected counts of at least 5 in each category).
If testing = TRUE, returns a list with the contingency table and chi-squared test results.
Otherwise, returns the p-value of the test.
tip1 <- c(0, 0, 1, 0.5, 1, 0.5) tip2 <- c(0, 1, 1, 0, 0.5, 0.5) compare_CherryFreqs(tip1, tip2)tip1 <- c(0, 0, 1, 0.5, 1, 0.5) tip2 <- c(0, 1, 1, 0, 0.5, 0.5) compare_CherryFreqs(tip1, tip2)
This function applies Fitch parsimony to determine the minimum number of changes required for methylation categories at tree tips.
compute_fitch(tree, meth, input_control = TRUE)compute_fitch(tree, meth, input_control = TRUE)
tree |
A rooted binary tree in Newick format (character string) or as an |
meth |
A matrix of methylation categories at the tree tips, with rows corresponding to tips (names matching tree tip labels) and columns corresponding to sites or structures. |
input_control |
Logical; if |
A list containing:
optStateSetA list of sets of optimal states for the root at each site/structure.
minChange_numberA numeric vector indicating the minimum number of changes for each site.
tree <- "((a:1,b:1):2,(c:2,d:2):1.5);" meth <- matrix(c("u", "m", "p", "u", "p", "m", "m", "u"), nrow=4, byrow=TRUE, dimnames=list(c("a", "b", "c", "d"))) compute_fitch(tree, meth)tree <- "((a:1,b:1):2,(c:2,d:2):1.5);" meth <- matrix(c("u", "m", "p", "u", "p", "m", "m", "u"), nrow=4, byrow=TRUE, dimnames=list(c("a", "b", "c", "d"))) compute_fitch(tree, meth)
This function calculates the mean correlation of methylation states within island structures, allowing to exclude the shores.
compute_meanCor_i( index_islands, minN_CpG, shore_length, data, sample_n, categorized_data = FALSE )compute_meanCor_i( index_islands, minN_CpG, shore_length, data, sample_n, categorized_data = FALSE )
index_islands |
A vector containing the structural indices for islands. |
minN_CpG |
The minimum number of central CpGs required for computation. |
shore_length |
The number of CpGs at each side of an island to exclude (shores). |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of tips (samples) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
The function processes only islands with a minimum length equal to 2 * shore_length + minN_CpG.
If none has minimum length, returns NA
A numeric value representing the mean correlation of methylation states in the central CpGs of islands.
# Example usage: index_islands <- c(1, 2) data <- list( list(c(0, 1, 0.5, 1, 0.5, 0), c(0.5, 0.5, 1, 1, 0, 0)), # tip 1 list(c(1, 0, 1, 1, 0.5, 0), c(1, 1, 0.5, 0.5, 0, 1)) # tip 2 ) minN_CpG <- 2 shore_length <- 1 sample_n <- 2 compute_meanCor_i(index_islands, minN_CpG, shore_length, data, sample_n, categorized_data = TRUE)# Example usage: index_islands <- c(1, 2) data <- list( list(c(0, 1, 0.5, 1, 0.5, 0), c(0.5, 0.5, 1, 1, 0, 0)), # tip 1 list(c(1, 0, 1, 1, 0.5, 0), c(1, 1, 0.5, 0.5, 0, 1)) # tip 2 ) minN_CpG <- 2 shore_length <- 1 sample_n <- 2 compute_meanCor_i(index_islands, minN_CpG, shore_length, data, sample_n, categorized_data = TRUE)
This function calculates the mean correlation of methylation states within non-island structures, allowing to exclude the shores.
compute_meanCor_ni( index_nonislands, minN_CpG, shore_length, data, sample_n, categorized_data = FALSE )compute_meanCor_ni( index_nonislands, minN_CpG, shore_length, data, sample_n, categorized_data = FALSE )
index_nonislands |
A vector containing the structural indices for non-islands. |
minN_CpG |
The minimum number of central CpGs required for computation. |
shore_length |
The number of CpGs at each side of an non-island to exclude (shores). |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of tips (samples) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
The function processes only non-islands with a minimum length equal to 2 * shore_length + minN_CpG.
If none has minimum length, returns NA
A numeric value representing the mean correlation of methylation states in the central CpGs of non-islands.
# Example usage: index_nonislands <- c(1, 2) data <- list( list(c(0, 1, 0.5, 1, 0.5, 0), c(0.5, 0.5, 1, 1, 0, 0)), # tip 1 list(c(1, 0, 1, 1, 0.5, 0), c(1, 1, 0.5, 0.5, 0, 1)) # tip 2 ) minN_CpG <- 2 shore_length <- 1 sample_n <- 2 compute_meanCor_ni(index_nonislands, minN_CpG, shore_length, data, sample_n, categorized_data = TRUE)# Example usage: index_nonislands <- c(1, 2) data <- list( list(c(0, 1, 0.5, 1, 0.5, 0), c(0.5, 0.5, 1, 1, 0, 0)), # tip 1 list(c(1, 0, 1, 1, 0.5, 0), c(1, 1, 0.5, 0.5, 0, 1)) # tip 2 ) minN_CpG <- 2 shore_length <- 1 sample_n <- 2 compute_meanCor_ni(index_nonislands, minN_CpG, shore_length, data, sample_n, categorized_data = TRUE)
This function categorizes CpG islands into methylation states and applies Fitch parsimony to estimate the minimum number of state changes in a phylogenetic tree.
computeFitch_islandGlbSt( index_islands, data, tree, u_threshold, m_threshold, testing = FALSE )computeFitch_islandGlbSt( index_islands, data, tree, u_threshold, m_threshold, testing = FALSE )
index_islands |
A numeric vector specifying the indices of genomic structures corresponding to islands. |
data |
A list containing methylation states at tree tips, structured as |
tree |
A rooted binary tree in Newick format (character string) or as an |
u_threshold |
A numeric threshold value (0-1) defining the unmethylated category. |
m_threshold |
A numeric threshold value (0-1) defining the methylated category. |
testing |
Logical; if |
The function first validates the input data and categorizes CpG islands using categorize_islandGlbSt.
It then structures the data into a matrix matching tree tip labels and applies compute_fitch
to infer the minimum number of changes.
If testing = TRUE, returns a list containing the categorized data matrix; otherwise,
returns a numeric vector of minimum state changes.
tree <- "((a:1,b:1):2,(c:2,d:2):1.5);" data <- list( list(rep(1,10), rep(0,5), rep(1,8)), list(rep(1,10), rep(0.5,5), rep(0,8)) ) index_islands <- c(1,3) computeFitch_islandGlbSt(index_islands, data, tree, 0.2, 0.6)tree <- "((a:1,b:1):2,(c:2,d:2):1.5);" data <- list( list(rep(1,10), rep(0,5), rep(1,8)), list(rep(1,10), rep(0.5,5), rep(0,8)) ) index_islands <- c(1,3) computeFitch_islandGlbSt(index_islands, data, tree, 0.2, 0.6)
This internal function counts the number of sites with unmethylated, partially-methylated, and methylated states in a given vector.
count_upm(data)count_upm(data)
data |
A numeric vector with methylation values: |
An integer vector of length 3 containing counts of unmethylated, partially-methylated, and methylated sites, respectively.
This function calculates the number of methylation differences between pairs of cherry tips in a phylogenetic tree. A cherry is a pair of leaf nodes that share a direct common ancestor. The function quantifies full and half methylation differences for each genomic structure (e.g., island/non-island) across all sites.
countSites_cherryMethDiff( cherryDist, data, categorized_data = FALSE, input_control = TRUE )countSites_cherryMethDiff( cherryDist, data, categorized_data = FALSE, input_control = TRUE )
cherryDist |
A data frame containing pairwise distances between the tips of a phylogenetic tree that form cherries.
This should be as the output of
|
data |
A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island).
The data should be structured as |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
input_control |
A logical value indicating whether to validate the input data.
If |
The function first verifies that cherryDist contains the required columns and has at least one row.
It also ensures that data contains a sufficient number of tips and that all structures have the same number of sites.
The function then iterates over each cherry and genomic structure to compute the number of full and half methylation differences
between the two tips of each cherry.
A data frame with one row per cherry, containing the following columns:
A character string representing the names of the two tips in the cherry, concatenated with a hyphen.
A character string representing the indices of the two tips in the cherry, concatenated with a hyphen.
A numeric value representing the sum of the branch distances between the cherry tips.
An integer count of the sites with a full methylation difference (where one tip is methylated and the other is unmethylated) for the given structure.
An integer count of the sites with a half methylation difference (where one tip is partially methylated and the other is either fully methylated or unmethylated) for the given structure.
# Example data setup data <- list( list(c(0, 1, 0.5, 0), c(1, 1, 0, 0.5)), list(c(1, 0, 0.5, 1), c(0, 1, 0.5, 0.5)) ) tree <- "(tip1:0.25, tip2:0.25);" cherryDist <- get_cherryDist(tree) countSites_cherryMethDiff(cherryDist, data, categorized_data = TRUE)# Example data setup data <- list( list(c(0, 1, 0.5, 0), c(1, 1, 0, 0.5)), list(c(1, 0, 0.5, 1), c(0, 1, 0.5, 0.5)) ) tree <- "(tip1:0.25, tip2:0.25);" cherryDist <- get_cherryDist(tree) countSites_cherryMethDiff(cherryDist, data, categorized_data = TRUE)
This function calculates the frequency of methylation differences between pairs of cherry tips in a phylogenetic tree. A cherry is a pair of leaf nodes that share a direct common ancestor. The function quantifies full and half methylation differences for each genomic structure (e.g., island/non-island) across all sites and normalizes these counts by the number of sites per structure to obtain frequencies.
freqSites_cherryMethDiff( tree, data, categorized_data = FALSE, input_control = TRUE )freqSites_cherryMethDiff( tree, data, categorized_data = FALSE, input_control = TRUE )
tree |
A phylogenetic tree object. The function assumes it follows an appropriate format for downstream processing. |
data |
A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island).
The data should be structured as |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
input_control |
A logical value indicating whether to validate the input data.
If |
The function first validates the tree structure and extracts pairwise distances between cherry tips.
It then counts methylation differences using countSites_cherryMethDiff and normalizes these counts by the number
of sites per structure to compute frequencies. The resulting data frame provides a per-cherry frequency
of methylation differences (half or full difference) across different genomic structures.
A data frame with one row per cherry, containing the following columns:
A character string representing the names of the two tips in the cherry, concatenated with a hyphen.
A character string representing the indices of the two tips in the cherry, concatenated with a hyphen.
A numeric value representing the sum of the branch distances between the cherry tips.
A numeric value representing the frequency of sites with a full methylation difference (where one tip is methylated and the other is unmethylated) for the given structure.
A numeric value representing the frequency of sites with a half methylation difference (where one tip is partially methylated and the other is either fully methylated or unmethylated) for the given structure.
# Example data setup data <- list( list(rep(1,10), rep(0,5), rep(1,8)), list(rep(1,10), rep(0.5,5), rep(0,8)), list(rep(1,10), rep(0.5,5), rep(0,8)), list(c(rep(0,5), rep(0.5, 5)), c(0, 0, 1, 1, 1), c(0.5, 1, rep(0, 6)))) tree <- "((a:1.5,b:1.5):2,(c:2,d:2):1.5);" freqSites_cherryMethDiff(tree, data, categorized_data = TRUE)# Example data setup data <- list( list(rep(1,10), rep(0,5), rep(1,8)), list(rep(1,10), rep(0.5,5), rep(0,8)), list(rep(1,10), rep(0.5,5), rep(0,8)), list(c(rep(0,5), rep(0.5, 5)), c(0, 0, 1, 1, 1), c(0.5, 1, rep(0, 6)))) tree <- "((a:1.5,b:1.5):2,(c:2,d:2):1.5);" freqSites_cherryMethDiff(tree, data, categorized_data = TRUE)
This function computes the pairwise distances between the tips of a phylogenetic tree that are part of cherries. A cherry is a pair of leaf nodes (also called tips or terminal nodes) in a phylogenetic tree that share a direct common ancestor. In other words, if two leaves are connected to the same internal node and no other leaves are connected to that internal node, they form a cherry. The distance is calculated as the sum of the branch lengths between the two cherry tips.
get_cherryDist(tree, input_control = TRUE)get_cherryDist(tree, input_control = TRUE)
tree |
A tree in Newick format (as a character string) or an object of class |
input_control |
A logical value indicating whether to validate the input tree.
If |
The function first checks if the input is either a character string in the Newick format or an object of class phylo,
unless input_control is set to FALSE. It then computes the pairwise distances between the tips in the tree and
identifies the sister pairs (cherries). The distance between each cherry is the sum of the branch lengths leading to the sister tips.
The tips of each cherry are identified by their names and indices.
The tip indices correspond to (a) the index from left to right on the Newick string,
(b) the order of the tip label in the phylo_object$tip.label, and
(c) the index in the methylation data list (data[[tip]][[structure]]) as obtained with the function simulate_evolData() when the given tree has several tips.
If the tree is provided in Newick format, it will be parsed using the ape::read.tree function.
A data frame with five columns:
first_tip_name |
A character string representing the name of the first tip in the cherry. |
second_tip_name |
A character string representing the name of the second tip in the cherry. |
first_tip_index |
An integer representing the index of the first tip in the cherry. |
second_tip_index |
An integer representing the index of the second tip in the cherry. |
dist |
A numeric value representing the sum of the branch lengths between the two tips (i.e., the distance between the cherries). |
# Example of a tree in Newick format newick_tree <- "((a:1,b:2):5,c:6);" get_cherryDist(newick_tree) # Example of using a phylo object from ape library(ape) tree_phylo <- read.tree(text = "((a:1,b:1):5,c:6);") get_cherryDist(tree_phylo)# Example of a tree in Newick format newick_tree <- "((a:1,b:2):5,c:6);" get_cherryDist(newick_tree) # Example of using a phylo object from ape library(ape) tree_phylo <- read.tree(text = "((a:1,b:1):5,c:6);") get_cherryDist(tree_phylo)
This function computes the mean frequency of methylated sites (with methylation state 1) for a set of structures identified as islands.
get_islandMeanFreqM(index_islands, data, sample_n, categorized_data = FALSE)get_islandMeanFreqM(index_islands, data, sample_n, categorized_data = FALSE)
index_islands |
A vector containing the structural indices for islands. |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of samples (tips) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
A numeric value representing the mean frequency of methylated sites in the islands.
# Example usage: index_islands <- c(1, 3) data <- list( list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1 list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2 ) sample_n <- 2 get_islandMeanFreqM(index_islands, data, sample_n, categorized_data = TRUE)# Example usage: index_islands <- c(1, 3) data <- list( list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1 list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2 ) sample_n <- 2 get_islandMeanFreqM(index_islands, data, sample_n, categorized_data = TRUE)
This function computes the mean frequency of partially methylated sites (with methylation state 0.5) for the set of genomic structures identified as islands.
get_islandMeanFreqP(index_islands, data, sample_n, categorized_data = FALSE)get_islandMeanFreqP(index_islands, data, sample_n, categorized_data = FALSE)
index_islands |
A vector containing the structural indices for islands. |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of samples (tips) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
A numeric value representing the mean frequency of partially methylated sites in the islands.
# Example usage: index_islands <- c(1, 3) data <- list( list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1 list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2 ) sample_n <- 2 get_islandMeanFreqP(index_islands, data, sample_n, categorized_data = TRUE)# Example usage: index_islands <- c(1, 3) data <- list( list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1 list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2 ) sample_n <- 2 get_islandMeanFreqP(index_islands, data, sample_n, categorized_data = TRUE)
This function computes the mean standard deviation of methylated sites (with methylation state 1) for a set of genomic structures identified as islands.
get_islandSDFreqM(index_islands, data, sample_n, categorized_data = FALSE)get_islandSDFreqM(index_islands, data, sample_n, categorized_data = FALSE)
index_islands |
A vector containing the structural indices for islands. |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of tips (samples) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
A numeric value representing the mean standard deviation of methylated sites in the islands.
# Example usage: index_islands <- c(1, 3) data <- list( list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1 list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2 ) sample_n <- 2 get_islandSDFreqM(index_islands, data, sample_n, categorized_data = TRUE)# Example usage: index_islands <- c(1, 3) data <- list( list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1 list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2 ) sample_n <- 2 get_islandSDFreqM(index_islands, data, sample_n, categorized_data = TRUE)
This function computes the mean standard deviation of partially methylated sites (with methylation state 0.5) for a set of genomic structures identified as islands.
get_islandSDFreqP(index_islands, data, sample_n, categorized_data = FALSE)get_islandSDFreqP(index_islands, data, sample_n, categorized_data = FALSE)
index_islands |
A vector containing the structural indices for islands. |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of samples (tips) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
A numeric value representing the mean standard deviation of partially methylated sites in the islands.
# Example usage: index_islands <- c(1, 3) data <- list( list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1 list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2 ) sample_n <- 2 get_islandSDFreqP(index_islands, data, sample_n, categorized_data = TRUE)# Example usage: index_islands <- c(1, 3) data <- list( list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1 list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2 ) sample_n <- 2 get_islandSDFreqP(index_islands, data, sample_n, categorized_data = TRUE)
This function calculates the mean methylation level for CpG islands across all tree tips.
get_meanMeth_islands(index_islands, data)get_meanMeth_islands(index_islands, data)
index_islands |
A numeric vector specifying the indices of genomic structures corresponding to islands. |
data |
A list containing methylation states at tree tips for each genomic structure
(e.g., island/non-island). The data should be structured as |
A list where each element corresponds to a tree tip and contains a numeric vector representing the mean methylation levels for the indexed CpG islands.
# Example data setup data <- list( # Tip 1 list(rep(1,10), rep(0,5), rep(1,8)), # Tip 2 list(rep(1,10), rep(0.5,5), rep(0,8)) ) index_islands <- c(1,3) get_meanMeth_islands(index_islands, data)# Example data setup data <- list( # Tip 1 list(rep(1,10), rep(0,5), rep(1,8)), # Tip 2 list(rep(1,10), rep(0.5,5), rep(0,8)) ) index_islands <- c(1,3) get_meanMeth_islands(index_islands, data)
This function computes the mean frequency of methylated sites (with methylation state 1) for a set of structures identified as non-islands.
get_nonislandMeanFreqM( index_nonislands, data, sample_n, categorized_data = FALSE )get_nonislandMeanFreqM( index_nonislands, data, sample_n, categorized_data = FALSE )
index_nonislands |
A vector containing the structural indices for non-islands. |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of samples (tips) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
A numeric value representing the mean frequency of methylated sites in the non-islands.
# Example usage: index_nonislands <- c(1, 3) data <- list( list(c(1, 0, 1), c(0.5, 1, 1), c(1, 0, 0.5)), # tip 1 list(c(1, 0.5, 1), c(0.5, 1, 1), c(1, 0.5, 0.5)) # tip 2 ) sample_n <- 2 get_nonislandMeanFreqM(index_nonislands, data, sample_n, categorized_data = TRUE)# Example usage: index_nonislands <- c(1, 3) data <- list( list(c(1, 0, 1), c(0.5, 1, 1), c(1, 0, 0.5)), # tip 1 list(c(1, 0.5, 1), c(0.5, 1, 1), c(1, 0.5, 0.5)) # tip 2 ) sample_n <- 2 get_nonislandMeanFreqM(index_nonislands, data, sample_n, categorized_data = TRUE)
This function computes the mean frequency of partially methylated sites (with methylation state 0.5) for a set of genomic structures identified as non-islands.
get_nonislandMeanFreqP( index_nonislands, data, sample_n, categorized_data = FALSE )get_nonislandMeanFreqP( index_nonislands, data, sample_n, categorized_data = FALSE )
index_nonislands |
A vector containing the structural indices for non-islands. |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of samples (tips) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
A numeric value representing the mean frequency of partially methylated sites in the non-islands.
# Example usage: index_nonislands <- c(1, 3) data <- list( list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1 list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2 ) sample_n <- 2 get_nonislandMeanFreqP(index_nonislands, data, sample_n, categorized_data = TRUE)# Example usage: index_nonislands <- c(1, 3) data <- list( list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1 list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2 ) sample_n <- 2 get_nonislandMeanFreqP(index_nonislands, data, sample_n, categorized_data = TRUE)
This function computes the mean standard deviation of methylated sites (with methylation state 1) for a set of genomic structures identified as non-islands.
get_nonislandSDFreqM( index_nonislands, data, sample_n, categorized_data = FALSE )get_nonislandSDFreqM( index_nonislands, data, sample_n, categorized_data = FALSE )
index_nonislands |
A vector containing the structural indices for non-islands. |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of tips (samples) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
A numeric value representing the mean standard deviation of methylated sites in the non-islands.
# Example usage: index_nonislands <- c(1, 3) data <- list( list(c(1, 1, 1), c(0, 1, 0.5), c(1, 0, 1)), # tip 1 list(c(1, 0.5, 0), c(1, 1, 0.5), c(1, 1, 1)) # tip 2 ) sample_n <- 2 get_nonislandSDFreqM(index_nonislands, data, sample_n, categorized_data = TRUE)# Example usage: index_nonislands <- c(1, 3) data <- list( list(c(1, 1, 1), c(0, 1, 0.5), c(1, 0, 1)), # tip 1 list(c(1, 0.5, 0), c(1, 1, 0.5), c(1, 1, 1)) # tip 2 ) sample_n <- 2 get_nonislandSDFreqM(index_nonislands, data, sample_n, categorized_data = TRUE)
This function computes the mean standard deviation of partially methylated sites (with methylation state 0.5) for a set of genomic structures identified as non-islands.
get_nonislandSDFreqP( index_nonislands, data, sample_n, categorized_data = FALSE )get_nonislandSDFreqP( index_nonislands, data, sample_n, categorized_data = FALSE )
index_nonislands |
A vector containing the structural indices for non-islands. |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of samples (tips) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
A numeric value representing the mean standard deviation of partially methylated sites in the non-islands.
# Example usage: index_nonislands <- c(1, 3) data <- list( list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1 list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2 ) sample_n <- 2 get_nonislandSDFreqP(index_nonislands, data, sample_n, categorized_data = TRUE)# Example usage: index_nonislands <- c(1, 3) data <- list( list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1 list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2 ) sample_n <- 2 get_nonislandSDFreqP(index_nonislands, data, sample_n, categorized_data = TRUE)
This function retrieves parameter values for the DNA methylation simulation.
get_parameterValues(rootData = NULL)get_parameterValues(rootData = NULL)
rootData |
NULL to return default parameter values. For data parameter values, provide rootData as the output of simulate_initialData()$data. |
The function called without arguments returns default parameter values. When rootData (as $data output of simulate_initialData()) is given, it returns data parameter values.
A data frame containing default parameter values.
# Get default parameter values default_values <- get_parameterValues() # Get parameter values of simulate_initialData() output custom_params <- get_parameterValues() infoStr <- data.frame(n = c(5, 10), globalState = c("M", "U")) rootData <- simulate_initialData(infoStr = infoStr, params = custom_params)$data rootData_paramValues <- get_parameterValues(rootData = rootData)# Get default parameter values default_values <- get_parameterValues() # Get parameter values of simulate_initialData() output custom_params <- get_parameterValues() infoStr <- data.frame(n = c(5, 10), globalState = c("M", "U")) rootData <- simulate_initialData(infoStr = infoStr, params = custom_params)$data rootData_paramValues <- get_parameterValues(rootData = rootData)
This function calculates the total frequency of methylation differences (both full and half changes) for each genomic structure for each cherry in a phylogenetic tree. A cherry is a pair of leaf nodes (also called tips or terminal nodes) in a phylogenetic tree that share a direct common ancestor. In other words, if two leaves are connected to the same internal node and no other leaves are connected to that internal node, they form a cherry.
get_siteFChange_cherry(tree, data, categorized_data = FALSE)get_siteFChange_cherry(tree, data, categorized_data = FALSE)
tree |
A phylogenetic tree in Newick format or a phylo object from the ape package. The function ensures the tree has a valid structure and at least two tips. |
data |
A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island).
The data should be structured as |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
The function first verifies that tree and data have valid structures and the minimum number of tips.
It then extracts per-cherry methylation differences using freqSites_cherryMethDiff, handling potential errors.
Finally, it aggregates the full and half methylation differences for each genomic structure at each cherry.
A data frame with one row per cherry, containing the following columns:
A character string representing the names of the two tips in the cherry, concatenated with a hyphen.
A character string representing the indices of the two tips in the cherry, concatenated with a hyphen.
A numeric value representing the sum of the branch distances between the cherry tips.
A numeric value representing the total frequency of methylation changes (both full and half) for the given structure.
# Example data setup data <- list( list(rep(1,10), rep(0,5), rep(1,8)), list(rep(1,10), rep(0.5,5), rep(0,8)), list(rep(1,10), rep(0.5,5), rep(0,8)), list(c(rep(0,5), rep(0.5, 5)), c(0, 0, 1, 1, 1), c(0.5, 1, rep(0, 6)))) tree <- "((a:1.5,b:1.5):2,(c:2,d:2):1.5);" get_siteFChange_cherry(tree, data, categorized_data = TRUE)# Example data setup data <- list( list(rep(1,10), rep(0,5), rep(1,8)), list(rep(1,10), rep(0.5,5), rep(0,8)), list(rep(1,10), rep(0.5,5), rep(0,8)), list(c(rep(0,5), rep(0.5, 5)), c(0, 0, 1, 1, 1), c(0.5, 1, rep(0, 6)))) tree <- "((a:1.5,b:1.5):2,(c:2,d:2):1.5);" get_siteFChange_cherry(tree, data, categorized_data = TRUE)
Computes the mean number of significant changes per island in phylogenetic tree cherries, based on a specified p-value threshold.
mean_CherryFreqsChange_i( data, categorized_data = FALSE, index_islands, tree, pValue_threshold )mean_CherryFreqsChange_i( data, categorized_data = FALSE, index_islands, tree, pValue_threshold )
data |
A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island).
The data should be structured as |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
index_islands |
A numeric vector specifying the indices of islands to analyze. |
tree |
A rooted binary tree in Newick format (character string) or as an |
pValue_threshold |
A numeric value between 0 and 1 that serves as the threshold for statistical significance in the chi-squared test. |
The function uses simulate.p.value = TRUE in chisq.test
to compute the p-value via Monte Carlo simulation to improve reliability
regardless of whether the expected frequencies meet the assumptions of the chi-squared test
(i.e., expected counts of at least 5 in each category).
A data frame containing the same information as pValue_CherryFreqsChange_i,
but with additional columns indicating whether p-values are below the threshold (significant changes)
and the mean frequency of significant changes per island.
tree <- "((d:1,e:1):2,a:2);" data <- list( #Tip 1 list(c(rep(1,9), rep(0,1)), c(rep(0,9), 1), c(rep(0,9), rep(0.5,1))), #Tip 2 list(c(rep(0,9), rep(0.5,1)), c(rep(0.5,9), 1), c(rep(1,9), rep(0,1))), #Tip 3 list(c(rep(1,9), rep(0.5,1)), c(rep(0.5,9), 1), c(rep(0,9), rep(0.5,1)))) index_islands <- c(1,3) mean_CherryFreqsChange_i(data, categorized_data = TRUE, index_islands, tree, pValue_threshold = 0.05)tree <- "((d:1,e:1):2,a:2);" data <- list( #Tip 1 list(c(rep(1,9), rep(0,1)), c(rep(0,9), 1), c(rep(0,9), rep(0.5,1))), #Tip 2 list(c(rep(0,9), rep(0.5,1)), c(rep(0.5,9), 1), c(rep(1,9), rep(0,1))), #Tip 3 list(c(rep(1,9), rep(0.5,1)), c(rep(0.5,9), 1), c(rep(0,9), rep(0.5,1)))) index_islands <- c(1,3) mean_CherryFreqsChange_i(data, categorized_data = TRUE, index_islands, tree, pValue_threshold = 0.05)
This function analyzes the frequency changes of methylation states (unmethylated, partially methylated, methylated) across tree tips for a given set of islands. It performs a chi-squared test for each island to check for significant changes in frequencies across tips and returns the proportion of islands showing significant changes.
mean_TreeFreqsChange_i( tree, data, categorized_data = FALSE, index_islands, pValue_threshold, testing = FALSE )mean_TreeFreqsChange_i( tree, data, categorized_data = FALSE, index_islands, pValue_threshold, testing = FALSE )
tree |
A phylogenetic tree object, typically of class |
data |
A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island).
The data should be structured as |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
index_islands |
A vector of indices of genomic structures corresponding to islands in data. |
pValue_threshold |
A numeric value between 0 and 1 that serves as the threshold for statistical significance in the chi-squared test. |
testing |
Logical defaulted to FALSE. TRUE for testing output. |
The function uses simulate.p.value = TRUE in chisq.test
to compute the p-value via Monte Carlo simulation to improve reliability
regardless of whether the expected frequencies meet the assumptions of the chi-squared test
(i.e., expected counts of at least 5 in each category).
Throws errors if:
The tree is not valid.
data is not structured correctly across tips.
index_islands is empty.
pValue_threshold is not between 0 and 1.
A numeric value representing the mean proportion of islands with significant frequency changes across tips.
# Example of usage: tree <- "((d:1,e:1):2,a:2);" data <- list( #Tip 1 list(c(rep(1,9), rep(0,1)), c(rep(0,9), 1), c(rep(0,9), rep(0.5,1))), #Tip 2 list(c(rep(1,9), rep(0.5,1)), c(rep(0.5,9), 1), c(rep(1,9), rep(0,1))), #Tip 3 list(c(rep(1,9), rep(0.5,1)), c(rep(0.5,9), 1), c(rep(0,9), rep(0.5,1)))) index_islands <- c(1,3) mean_TreeFreqsChange_i(tree, data, categorized_data = TRUE, index_islands, pValue_threshold = 0.05)# Example of usage: tree <- "((d:1,e:1):2,a:2);" data <- list( #Tip 1 list(c(rep(1,9), rep(0,1)), c(rep(0,9), 1), c(rep(0,9), rep(0.5,1))), #Tip 2 list(c(rep(1,9), rep(0.5,1)), c(rep(0.5,9), 1), c(rep(1,9), rep(0,1))), #Tip 3 list(c(rep(1,9), rep(0.5,1)), c(rep(0.5,9), 1), c(rep(0,9), rep(0.5,1)))) index_islands <- c(1,3) mean_TreeFreqsChange_i(tree, data, categorized_data = TRUE, index_islands, pValue_threshold = 0.05)
This function calculates the weighted mean frequency of methylation changes at island and non-island genomic structures for each cherry in a phylogenetic tree. A cherry is a pair of leaf nodes (also called tips or terminal nodes) in a phylogenetic tree that share a direct common ancestor.
MeanSiteFChange_cherry( data, categorized_data = FALSE, tree, index_islands, index_nonislands )MeanSiteFChange_cherry( data, categorized_data = FALSE, tree, index_islands, index_nonislands )
data |
A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island).
The data should be structured as |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
tree |
A phylogenetic tree in Newick format or a |
index_islands |
A numeric vector specifying the indices of genomic structures corresponding to islands. |
index_nonislands |
A numeric vector specifying the indices of genomic structures corresponding to non-islands. |
The function first validates the tree and the input data structure. It then computes the
per-cherry frequency of sites with different methylation states using get_siteFChange_cherry.
The indices provided for islands and non-islands are checked for validity using validate_structureIndices.
Finally, the function calculates the weighted mean site frequency of methylation changes for each cherry,
separately for islands and non-islands.
A data frame with one row per cherry, containing the following columns:
A character string representing the names of the two tips in the cherry, concatenated with a hyphen.
A character string representing the indices of the two tips in the cherry, concatenated with a hyphen.
A numeric value representing the sum of the branch distances between the cherry tips.
A numeric value representing the weighted mean frequency of methylation changes in non-island structures.
A numeric value representing the weighted mean frequency of methylation changes in island structures.
# Example data setup data <- list( list(rep(1,10), rep(0,5), rep(1,8)), # Tip a list(rep(1,10), rep(0.5,5), rep(0,8)), # Tip b list(rep(1,10), rep(0.5,5), rep(0,8)), # Tip c list(c(rep(0,5), rep(0.5, 5)), c(0, 0, 1, 1, 1), c(0.5, 1, rep(0, 6)))) # Tip d tree <- "((a:1.5,b:1.5):2,(c:2,d:2):1.5);" index_islands <- c(1,3) index_nonislands <- c(2) MeanSiteFChange_cherry(data = data, categorized_data = TRUE, tree = tree, index_islands = index_islands, index_nonislands = index_nonislands)# Example data setup data <- list( list(rep(1,10), rep(0,5), rep(1,8)), # Tip a list(rep(1,10), rep(0.5,5), rep(0,8)), # Tip b list(rep(1,10), rep(0.5,5), rep(0,8)), # Tip c list(c(rep(0,5), rep(0.5, 5)), c(0, 0, 1, 1, 1), c(0.5, 1, rep(0, 6)))) # Tip d tree <- "((a:1.5,b:1.5):2,(c:2,d:2):1.5);" index_islands <- c(1,3) index_nonislands <- c(2) MeanSiteFChange_cherry(data = data, categorized_data = TRUE, tree = tree, index_islands = index_islands, index_nonislands = index_nonislands)
Calculates p-values for changes in methylation frequency between pairs of cherry tips in a phylogenetic tree. A cherry is a pair of leaf nodes (also called tips or terminal nodes) in a phylogenetic tree that share a direct common ancestor.
pValue_CherryFreqsChange_i( data, categorized_data = FALSE, index_islands, tree, input_control = TRUE )pValue_CherryFreqsChange_i( data, categorized_data = FALSE, index_islands, tree, input_control = TRUE )
data |
A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island).
The data should be structured as |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
index_islands |
A numeric vector specifying the indices of islands to analyze. |
tree |
A rooted binary tree in Newick format (character string) or as an |
input_control |
Logical; if |
The function uses simulate.p.value = TRUE in chisq.test
to compute the p-value via Monte Carlo simulation to improve reliability
regardless of whether the expected frequencies meet the assumptions of the chi-squared test
(i.e., expected counts of at least 5 in each category).
A data frame containing tip pair information (first tip name, second tip name, first tip index, second tip index, distance) and one column per island with the p-values from the chi-squared tests.
# Example with hypothetical tree and data structure tree <- "((d:1,e:1):2,a:2);" data <- list( #Tip 1 list(c(rep(1,9), rep(0,1)), c(rep(0,9), 1), c(rep(0,9), rep(0.5,1))), #Tip 2 list(c(rep(0,9), rep(0.5,1)), c(rep(0.5,9), 1), c(rep(1,9), rep(0,1))), #Tip 3 list(c(rep(1,9), rep(0.5,1)), c(rep(0.5,9), 1), c(rep(0,9), rep(0.5,1)))) index_islands <- c(1,3) pValue_CherryFreqsChange_i(data, categorized_data = TRUE, index_islands, tree)# Example with hypothetical tree and data structure tree <- "((d:1,e:1):2,a:2);" data <- list( #Tip 1 list(c(rep(1,9), rep(0,1)), c(rep(0,9), 1), c(rep(0,9), rep(0.5,1))), #Tip 2 list(c(rep(0,9), rep(0.5,1)), c(rep(0.5,9), 1), c(rep(1,9), rep(0,1))), #Tip 3 list(c(rep(1,9), rep(0.5,1)), c(rep(0.5,9), 1), c(rep(0,9), rep(0.5,1)))) index_islands <- c(1,3) pValue_CherryFreqsChange_i(data, categorized_data = TRUE, index_islands, tree)
This function simulates methylation data evolution along a tree. Either by simulating data at the root of the provided evolutionary tree (if infoStr is given) or by using pre-existing data at the root (if rootData is given) and letting it evolve along the tree.
simulate_evolData( infoStr = NULL, rootData = NULL, tree = NULL, params = NULL, dt = 0.01, CFTP = FALSE, CFTP_step_limit = 327680000, n_rep = 1, only_tip = TRUE )simulate_evolData( infoStr = NULL, rootData = NULL, tree = NULL, params = NULL, dt = 0.01, CFTP = FALSE, CFTP_step_limit = 327680000, n_rep = 1, only_tip = TRUE )
infoStr |
A data frame containing columns 'n' for the number of sites, and 'globalState' for the favoured global methylation state. If customized initial equilibrium frequencies are given, it also contains columns 'u_eqFreq', 'p_eqFreq', and 'm_eqFreq' with the equilibrium frequency values for unmethylated, partially methylated, and methylated. |
rootData |
The output of the simulate_initialData()$data function. It represents the initial data at the root of the evolutionary tree. |
tree |
A string in Newick format representing the evolutionary tree. |
params |
Optional data frame with specific parameter values. Structure as in get_parameterValues() output. If not provided, default values will be used. |
dt |
Length of time step for Gillespie's Tau-Leap Approximation (default is 0.01). |
CFTP |
Default FALSE. TRUE for calling cftp algorithm to set root state according to model equilibrium (Note that current implementation neglects IWE process). |
CFTP_step_limit |
when CFTP = TRUE, maximum number of steps before applying an approximation method (default 327680000 corresponding to size of CFTP info of approx 6.1 GB). |
n_rep |
Number of replicates to simulate (default is 1). |
only_tip |
Logical indicating whether to extract data only for tips (default is TRUE, FALSE to extract the information for all the tree branches). |
A list containing the parameters used ($params), the length of the time step used for the Gillespie's tau-leap approximation ($dt, default 0.01), the tree used ($tree).
simulated data and the simulated data ($data). In $data, each list element corresponds to a simulation replicate.
If only_tip is TRUE: In $data, each list element corresponds to a simulation replicate.
Each replicate includes one list per tree tip, each containing:
The name of each tip in the simulated tree (e.g. replicate 2, tip 1: $data[[2]][[1]]$name).
A list with the sequence of methylation states for each tip-specific structure (e.g. replicate 1, tip 2, 3rd structure: $data[[1]][[2]]$seq[[3]].
The methylation states are encoded as 0 for unmethylated, 0.5 for partially methylated, and 1 for methylated.
If only_tip is FALSE, $data contains 2 lists:
$data$branchInTree: a list in which each element contains the information of the relationship with other branches:
Index of the parent branch (e.g. branch 2): $data$branchInTree[[2]]$parent_index)
Index(es) of the offspring branch(es) (e.g. branch 1 (root)): $data$branchInTree[[1]]$offspring_index)
$data$sim_data: A list containing simulated data. Each list element corresponds to a simulation replicate.
Each replicate includes one list per tree branch, each containing:
The name of each branch in the simulated tree. It's NULL for the tree root and inner nodes, and the name of the tips for the tree tips.
(e.g. replicate 2, branch 1: $data$sim_data[[2]][[1]]$name)
Information of IWE events on that branch. It's NULL for the tree root and FALSE for the branches in which no IWE event was sampled,
and a list containing $islands with the index(ces) of the island structure(s) that went through the IWE event and $times
for the branch time point(s ) in which the IWE was sampled.
(e.g. replicate 1, branch 3: $data$sim_data[[1]][[3]]$IWE)
A list with the sequence of methylation states for each structure (the index of the list corresponds to the index
of the structures). The methylation states are encoded as 0 for unmethylated, 0.5 for partially methylated, and 1 for methylated.
(e.g. replicate 3, branch 2, structure 1: $data$sim_data[[3]][[2]]$seq[[1]])
A list with the methylation equilibrium frequencies for each structure (the index of the list corresponds to the index
of the structures). Each structure has a vector with 3 values, the first one corresponding to the frequency of unmethylated,
the second one to the frequency of partially methylated, and the third one to the frequency of methylated CpGs.
(e.g. replicate 3, branch 2, structure 1: $data$sim_data[[3]][[2]]$eqFreqs[[1]])
# Example data infoStr <- data.frame(n = c(10, 100, 10), globalState = c("M", "U", "M")) # Simulate data evolution along a tree with default parameters simulate_evolData(infoStr = infoStr, tree = "(A:0.1,B:0.1);") # Simulate data evolution along a tree with custom parameters custom_params <- get_parameterValues() custom_params$iota <- 0.5 simulate_evolData(infoStr = infoStr, tree = "(A:0.1,B:0.1);", params = custom_params)# Example data infoStr <- data.frame(n = c(10, 100, 10), globalState = c("M", "U", "M")) # Simulate data evolution along a tree with default parameters simulate_evolData(infoStr = infoStr, tree = "(A:0.1,B:0.1);") # Simulate data evolution along a tree with custom parameters custom_params <- get_parameterValues() custom_params$iota <- 0.5 simulate_evolData(infoStr = infoStr, tree = "(A:0.1,B:0.1);", params = custom_params)
This function simulates initial data based on the provided information and parameters.
simulate_initialData( infoStr, params = NULL, CFTP = FALSE, CFTP_step_limit = 327680000 )simulate_initialData( infoStr, params = NULL, CFTP = FALSE, CFTP_step_limit = 327680000 )
infoStr |
A data frame containing columns 'n' for the number of sites, and 'globalState' for the favoured global methylation state. If customized equilibrium frequencies are given, it also contains columns 'u_eqFreq', 'p_eqFreq' and 'm_eqFreq' with the equilibrium frequency values for unmethylated, partially methylated and methylated. |
params |
Optional data frame with specific parameter values. |
CFTP |
Default FALSE. TRUE for calling cftp algorithm to set root state according to model equilibrium (Note that current implementation neglects IWE process). Structure as in get_parameterValues() output. If not provided, default values will be used. |
CFTP_step_limit |
when CFTP = TRUE, maximum number of steps before applying an approximation method (default 327680000 corresponding to size of CFTP info of approx 6.1 GB). |
The function performs several checks on the input data and parameters to ensure they meet the required criteria and simulates DNA methylation data.
A list containing the simulated data ($data) and parameters ($params).
# Example data infoStr <- data.frame(n = c(10, 100, 10), globalState = c("M", "U", "M")) # Simulate initial data with default parameters simulate_initialData(infoStr = infoStr) # Simulate data evolution along a tree with custom parameters custom_params <- get_parameterValues() custom_params$iota <- 0.5 simulate_initialData(infoStr = infoStr, params = custom_params)# Example data infoStr <- data.frame(n = c(10, 100, 10), globalState = c("M", "U", "M")) # Simulate initial data with default parameters simulate_initialData(infoStr = infoStr) # Simulate data evolution along a tree with custom parameters custom_params <- get_parameterValues() custom_params$iota <- 0.5 simulate_initialData(infoStr = infoStr, params = custom_params)
an R6 class representing a single genomic structure
testing_outputPublic attribute: Testing output for initialize
singleStructureGenerator$init_neighbSt()Public method: Initialization of $neighbSt
This fuction initiates each CpG position $neighbSt as encoded in $mapNeighbSt_matrix
It uses $update_neighbSt which updates for each sequence index, the neighbSt of left and right neighbors This means that it updates position 2, then 1 and 3, then 2 and 4.. Therefore, if the combiStructure instance has several singleStr instances within and the first has length 1, the $neighbSt of that position of the first singleStr instance is initialized when the method is called from the second singleStr instance
Positions at the edge of the entire simulated sequence use their only neighbor as both neighbors.
singleStructureGenerator$init_neighbSt()
NULL
singleStructureGenerator$initialize_ratetree()Public method: Initialization of $ratetree
This function initializes $ratetree
singleStructureGenerator$initialize_ratetree()
NULL
singleStructureGenerator$new()Create a new singleStructureGenerator object.
Note that this object is typically generated withing a combiStructureGenerator object.
singleStructureGenerator$new( globalState, n, eqFreqs = NULL, combiStr = NULL, combiStr_index = NULL, params = NULL, testing = FALSE )
globalStateCharacter. Structure's favored global state: "M" for methylated (island structures) / "U" for unmethylated (non-island structures).
nNumerical Value. Number of CpG positions
eqFreqsDefault NULL. When given: numerical vector with structure's methylation state equilibrium frequencies (for unmethylated, partially methylated and methylated)
combiStrDefault NULL. When initiated from combiStructureGenerator: object of class combiStructureGenerator containing it
combiStr_indexDefault NULL. When initiated from combiStructureGenerator: index in Object of class combiStructureGenerator
paramsDefault NULL. When given: data frame containing model parameters
testingDefault FALSE. TRUE for writing in public field of new instance $testing_output
A new singleStructureGenerator object.
singleStructureGenerator$set_myCombiStructure()Public method: Set my_combiStructure. Assigns given combi instance to private field my_combiStructure
singleStructureGenerator$set_myCombiStructure(combi)
combiinstance of combiStructureGenerator
NULL
singleStructureGenerator$get_seq()Public method: Get object's methylation state sequence
Encoded with 1 for unmethylated, 2 for partially methylated and 3 for methylated
singleStructureGenerator$get_seq()
vector with equilibrium frequencies of unmethylated, partially methylated and methylated
singleStructureGenerator$get_seqFirstPos()Public method: Get first sequence position methylation state
singleStructureGenerator$get_seqFirstPos()
numerical encoding of first position's methylation state
singleStructureGenerator$get_seq2ndPos()Public method: Get second sequence position methylation state
singleStructureGenerator$get_seq2ndPos()
numerical encoding of second position's methylation state. NULL if position does not exist
singleStructureGenerator$get_seqLastPos()Public method: Get first sequence position methylation state
singleStructureGenerator$get_seqLastPos()
numerical encoding of first position's methylation state
singleStructureGenerator$get_seq2ndButLastPos()Public method: Get second but last sequence position methylation state
singleStructureGenerator$get_seq2ndButLastPos()
numerical encoding of second but last position's methylation state. NULL if position does not exist
singleStructureGenerator$get_combiStructure_index()Public method: Get index in object of class combiStructureGenerator
singleStructureGenerator$get_combiStructure_index()
index in object of class combiStructureGenerator
singleStructureGenerator$update_interStr_firstNeighbSt()Public method: Update neighbSt of next singleStructureGenerator object within combiStructureGenerator object
This function is used when the last $seq position of a singleStructureGenerator object changes methylation state to update the neighbSt position
singleStructureGenerator$update_interStr_firstNeighbSt( leftNeighb_seqSt, rightNeighb_seqSt )
leftNeighb_seqSt$seq state of left neighbor (left neighbor is in previous singleStructureGenerator object)
rightNeighb_seqSt$seq state of right neighbor
NULL
singleStructureGenerator$update_interStr_lastNeighbSt()Public method: Update neighbSt of previous singleStructureGenerator object within combiStructureGenerator object
singleStructureGenerator$update_interStr_lastNeighbSt( leftNeighb_seqSt, rightNeighb_seqSt )
leftNeighb_seqSt$seq state of right neighbor (left neighbor is in next singleStructureGenerator object)
rightNeighb_seqSt$seq state of right neighbor
NULL
singleStructureGenerator$get_eqFreqs()Public method: Get object's equilibrium Frequencies
singleStructureGenerator$get_eqFreqs()
vector with equilibrium frequencies of unmethylated, partially methylated and methylated
singleStructureGenerator$SSE_evol()Public method. Simulate how CpG dinucleotide methylation state changes due to the SSE process along a time step of length dt
singleStructureGenerator$SSE_evol(dt, testing = FALSE)
dttime step length.
testinglogical value for testing purposes. Default FALSE.
default NULL. If testing TRUE it returns a list with the number of events sampled and a dataframe with the position(s) affected, new state and old methylation state.
singleStructureGenerator$get_transMat()Public Method. Get a transition matrix
singleStructureGenerator$get_transMat( old_eqFreqs, new_eqFreqs, info, testing = FALSE )
old_eqFreqsnumeric vector with 3 frequency values (for old u, p and m)
new_eqFreqsnumeric vector with 3 frequency values (for new u, p and m)
infocharacter string to indicate where the method is being called
testinglogical value for testing purposes. Default FALSE.
Given a tripple of old equilibrium frequencies and new equilibrium frequencies, generates the corresponding transition matrix.
transMat. The transition matrix. If testing = TRUE it returns a list. If there was a change in the equilibrium frequencies the list contains the following 7 elements, if not it contains the first 3 elements:
transMattransition matrix
caseThe applied case.
singleStructureGenerator$IWE_evol()Public Method. Simulate IWE Events
Simulates how CpG Islands' methylation state frequencies change and simultaneous sites change methylation state along a branch of length t according to the SSE-IWE model.
singleStructureGenerator$IWE_evol(testing = FALSE)
testinglogical value for testing purposes. Default FALSE.
The function checks if the methylation equilibrium frequencies (eqFreqs) and sequence observed
frequencies (obsFreqs) change after the IWE event. If there is a change in either
frequencies, the corresponding change flag eqFreqsChange
in the infoIWE list will be set to TRUE.
If testing = TRUE it returns a list. If there was a change in the equilibrium frequencies the list contains the following 7 elements, if not it contains the first 3 elements:
eqFreqsChangelogical indicating if there was a change in the equilibrium frequencies.
old_eqFreqsOriginal equilibrium frequencies before the IWE event.
new_eqFreqsNew equilibrium frequencies after the IWE event.
old_obsFreqsOriginal observed frequencies before the IWE event.
new_obsFreqsNew observed frequencies after the IWE event.
IWE_caseDescription of the IWE event case.
MkTransition matrix used for the IWE event.
singleStructureGenerator$get_alpha_pI()Public Method.
singleStructureGenerator$get_alpha_pI()
Model parameter alpha_pI for sampling island equilibrium frequencies
singleStructureGenerator$get_beta_pI()Public Method.
singleStructureGenerator$get_beta_pI()
Model parameter for sampling island equilibrium frequencies
singleStructureGenerator$get_alpha_mI()Public Method.
singleStructureGenerator$get_alpha_mI()
Model parameter for sampling island equilibrium frequencies
singleStructureGenerator$get_beta_mI()Public Method.
singleStructureGenerator$get_beta_mI()
Model parameter for sampling island equilibrium frequencies
singleStructureGenerator$get_alpha_pNI()Public Method.
singleStructureGenerator$get_alpha_pNI()
Model parameter for sampling non-island equilibrium frequencies
singleStructureGenerator$get_beta_pNI()Public Method.
singleStructureGenerator$get_beta_pNI()
Model parameter for sampling non-island equilibrium frequencies
singleStructureGenerator$get_alpha_mNI()Public Method.
singleStructureGenerator$get_alpha_mNI()
Model parameter for sampling non-island equilibrium frequencies
singleStructureGenerator$get_beta_mNI()Public Method.
singleStructureGenerator$get_beta_mNI()
Model parameter for sampling non-island equilibrium frequencies
singleStructureGenerator$get_alpha_Ri()Public Method.
singleStructureGenerator$get_alpha_Ri()
Model parameter for gamma distribution shape to initialize the 3 $Ri_values
singleStructureGenerator$get_iota()Public Method.
singleStructureGenerator$get_iota()
Model parameter for gamma distribution expected value to initialize the 3 $Ri_values
singleStructureGenerator$get_Ri_values()Public Method.
singleStructureGenerator$get_Ri_values()
The 3 $Ri_values
singleStructureGenerator$set_params()Public method: Update model parameters and recompute all dependent rate quantities (Ri_values, Rc_values, Qi, Qc, Q, ratetree).
The current methylation state sequence and equilibrium frequencies are preserved; only stored parameter values and their derived rate quantities are updated. Note that alpha_Ri and iota values below 1e-2 are silently clamped to 1e-2 (consistent with initialization behaviour).
singleStructureGenerator$set_params(params)
paramsData frame with parameter values, structured as the output of
get_parameterValues().
Equilibrium frequencies (eqFreqs) are intentionally not resampled when
parameters are updated. The eqFreqs represent the realized methylation
equilibrium of this particular sequence instance — they are sampled once at
initialization (and can be updated by IWE events during evolution). Resampling
them here would discard that realized state and replace it with a new random
draw, which is not the intended behaviour when changing parameters mid-simulation
(e.g. to continue evolving a tip sequence under a different parameter regime).
The updated alpha/beta parameters are stored and will take effect in any future
IWE events that resample eqFreqs within this object.
NULL
singleStructureGenerator$get_Q()Public Method.
singleStructureGenerator$get_Q( siteR = NULL, neighbSt = NULL, oldSt = NULL, newSt = NULL )
siteRdefault NULL. Numerical value encoding for the sites rate of independent SSE (1, 2 or 3)
neighbStdefault NULL. Numerical value encoding for the sites neighbouring state (as in mapNeighbSt_matrix)
oldStdefault NULL. Numerical value encoding for the sites old methylation state (1, 2 or 3)
newStdefault NULL. Numerical value encoding for the sites new methylation state (1, 2 or 3)
With NULL arguments, the list of rate matrices. With non NULL arguments, the corresponding rate of change.
singleStructureGenerator$get_siteR()Public Method.
singleStructureGenerator$get_siteR(index = NULL)
indexdefault NULL. Numerical value for the index of the CpG position within the singleStr instance
with NULL arguments, siteR vector. non NULL arguments, the corresponding siteR
singleStructureGenerator$get_neighbSt()Public Method.
singleStructureGenerator$get_neighbSt(index = NULL)
indexdefault NULL. Numerical value for the index of the CpG position within the singleStr instance
with NULL arguments, neighbSt vector. non NULL arguments, the corresponding neighbSt
singleStructureGenerator$update_ratetree_otherStr()Public Method. Update ratetree from another singleStructure instance
singleStructureGenerator$update_ratetree_otherStr(position, rate)
positionNumerical value for the index of the CpG position within the singleStr instance
rateRate of change to asign to that position
NULL
singleStructureGenerator$get_Qi()Public Method. Get list of matrices for SSE process
singleStructureGenerator$get_Qi(siteR = NULL, oldSt = NULL, newSt = NULL)
siteRdefault NULL. Numerical value encoding for the sites rate of independent SSE (1, 2 or 3)
oldStdefault NULL. Numerical value encoding for the sites old methylation state (1, 2 or 3)
newStdefault NULL. Numerical value encoding for the sites new methylation state (1, 2 or 3)
With NULL arguments, the list of SSEi rate matrices. With non NULL arguments, the corresponding rate of change.
singleStructureGenerator$get_seqSt_leftneighb()Public Method. Decode methylation state of left neighbor form owns neighbSt
singleStructureGenerator$get_seqSt_leftneighb(index)
indexInteger index value for the CpG position within the singleStr instance
decoded methylation state ($seq) of left neighbor (1, 2 or 3 for unmethylated, partially methylated or methylated)
singleStructureGenerator$get_seqSt_rightneighb()Public Method. Decode methylation state of left neighbor form owns neighbSt
singleStructureGenerator$get_seqSt_rightneighb(index)
indexInteger index value for the CpG position within the singleStr instance
decoded methylation state ($seq) of right neighbor (1, 2 or 3 for unmethylated, partially methylated or methylated)
singleStructureGenerator$cftp_all_equal()Public Method. Make a singleStructure with the same segment lengths and parameters as the focal one but where all states are m or u
singleStructureGenerator$cftp_all_equal(state, testing = FALSE)
stateCharacter value "U" or "M"
testingdefault FALSE. TRUE for testing output
right neighbSt
singleStructureGenerator$set_seqSt_update_neighbSt()Public Method. Set the methylation state of a sequence position and update the neighbor's neighbSt. It does NOT update RATETREE
singleStructureGenerator$set_seqSt_update_neighbSt( index, newSt, testing = FALSE )
indexNumerical value for the index of the CpG position within the singleStr instance
newStNumerical value encoding for the sites new methylation state (1, 2 or 3)
testingdefault FALSE. TRUE for testing output
NULL when testing FALSE. Testing output when testing TRUE.
singleStructureGenerator$reset_seq()Public Method. Resets the sequence states by resampling according to the instance's equilibrium frequencies.
singleStructureGenerator$reset_seq()
NULL. The sequence is updated in place.
singleStructureGenerator$clone()The objects of this class are cloneable with this method.
singleStructureGenerator$clone(deep = FALSE)
deepWhether to make a deep clone.
an R6 class representing the methylation state of GpGs in different genomic structures in the nodes of a tree.
The whole CpG sequence is an object of class combiStructureGenerator. Each genomic structure in it is contained in an object of class singleStructureGenerator.
testing_outputPublic attribute: Testing output for initialize
BranchPublic attribute: List containing objects of class combiStructureGenerator
branchLengthPublic attribute: Vector with the corresponding branch lengths of each $Branch element
treeMultiRegionSimulator$treeEvol()Simulate CpG dinucleotide methylation state evolution along a tree. The function splits a given tree and simulates evolution along its branches. It recursively simulates evolution in all of the subtrees in the given tree until the tree leafs
treeMultiRegionSimulator$treeEvol( Tree, dt = 0.01, parent_index = 1, testing = FALSE )
TreeString. Tree in Newick format. When called recursivelly it is given the corresponding subtree.
dtLength of SSE time steps.
parent_indexDefault 1. When called recursivelly it is given the corresponding parent branch index.
testingDefault FALSE. TRUE for testing purposes.
NULL
treeMultiRegionSimulator$new()Create a new treeMultiRegionSimulator object. $Branch is a list for the tree branches, its first element represents the tree root.
Note that one of either infoStr or rootData needs to be given. Not both, not neither.
treeMultiRegionSimulator$new( infoStr = NULL, rootData = NULL, tree = NULL, params = NULL, dt = 0.01, CFTP = FALSE, CFTP_step_limit = 327680000, testing = FALSE )
infoStrA data frame containing columns 'n' for the number of sites, and 'globalState' for the favoured global methylation state. If initial equilibrium frequencies are given the dataframe must contain 3 additional columns: 'u_eqFreq', 'p_eqFreq' and 'm_eqFreq'
rootDatacombiStructureGenerator object. When given, the simulation uses its parameter values.
treetree
paramsDefault NULL. When given: data frame containing model parameters. Note that if rootData is not null, its parameter values are used.
dtlength of the dt time steps for the SSE evolutionary process
CFTPDefault FALSE. TRUE for calling cftp algorithm to set root state according to model equilibrium (Note that current implementation neglects IWE process).
CFTP_step_limitwhen CFTP = TRUE, maximum number of steps before applying an approximation method (default 327680000 corresponding to size of CFTP info of approx 6.1 GB).
testingDefault FALSE. TRUE for testing output.
A new treeMultiRegionSimulator object.
treeMultiRegionSimulator$clone()The objects of this class are cloneable with this method.
treeMultiRegionSimulator$clone(deep = FALSE)
deepWhether to make a deep clone.
This function checks whether the provided input data has the required structure. It ensures that the number of tips is sufficient and that the data structure is consistent across tips and structures.
validate_data_cherryDist(cherryDist, data)validate_data_cherryDist(cherryDist, data)
cherryDist |
A data frame containing cherry pair distances, including tip indices (output from |
data |
A nested list representing structured data for each tip, following the format |
The function performs several validation steps:
Ensures that the number of tips in data is at least as large as the highest tip index in cherryDist.
Checks that all tips contain at least one structure and that the number of structures is consistent across tips.
Verifies that within each structure, all tips have the same number of sites and no zero-length structures.
If any of these conditions fail, the function throws an error with a descriptive message.
This function ensures that data follows the required nested structure
data[[tip]][[structure]], where:
data is a list of at least two tip elements.
Each tip is a list of structure elements.
Each structure contains a numeric vector of equal length across all tips.
validate_dataAcrossTips(data)validate_dataAcrossTips(data)
data |
A list structured as |
Throws errors if:
data is not a list.
It has fewer than two tips.
Any tip is not a list.
The number of structures is inconsistent across tips.
Any structure has zero-length data at any tip.
Structures have different site lengths across tips.
This function checks whether the provided indices for islands and non-islands
are within the valid range of structures in the dataset. It also warns if
any indices are present in both index_islands and index_nonislands.
validate_structureIndices(data, index_islands, index_nonislands)validate_structureIndices(data, index_islands, index_nonislands)
data |
A nested list |
index_islands |
An integer vector specifying indices that correspond to island structures. |
index_nonislands |
An integer vector specifying indices that correspond to non-island structures. |
The funct@exportion performs the following checks:
Ensures that all indices in index_islands and index_nonislands are within
the range of available structures.
Throws an error if any index is out of bounds.
Issues a warning if the same index appears in both index_islands and index_nonislands.
No return value. The function stops execution if invalid indices are detected.
This function checks whether the input is a valid phylogenetic tree, either as a character string in Newick format
or as an object of class phylo from the ape package. If the input is a Newick string, it is parsed into
a phylo object. The function also ensures that the tree contains at least two tips.
validate_tree(tree)validate_tree(tree)
tree |
A phylogenetic tree in Newick format (as a character string) or an object of class
|
The function first verifies that the input is either a valid phylo object or a character string.
If the input is a Newick string, it attempts to parse it into a phylo object using ape::read.tree().
If parsing fails, an informative error message is returned.
The function also checks that the tree contains at least two tips, as a valid phylogenetic tree should have at least one split.
A phylo object representing the validated and parsed tree.