Functions
This page provides detailed documentation of the CLARINET functions defined in the runClarinet module.
creating ECLG from BioRECIPE
- runClarinet.create_eclg(interaction_filename, model_dict)
This function creates the ECLG where a node is an event (e.g., biochemical interaction) and there is an edge between two nodes (two events) if they happen to occur in the same paper. The reading output file is in BioRECIPE format.
- Parameters
interaction_filename (str) – The path of the reading output file (extracted events)
model_dict (dict) – Dictionary that holds critical information of each baseline model element
- Returns
G – Event CoLlaboration Graph
- Return type
Graph
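The co-occurrence rule described above (two events are linked when they appear in the same paper) can be illustrated with a small standard-library sketch. This is not the library's implementation: the function name `build_cooccurrence_edges`, the `(event, paper_id)` row format, and the sample events are all hypothetical stand-ins for what is parsed from the reading output file.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_edges(rows):
    """Return {(event_a, event_b): shared_paper_count} for events sharing a paper.

    `rows` is an iterable of (event, paper_id) pairs (hypothetical format).
    """
    # Group the distinct events reported by each paper.
    events_by_paper = defaultdict(set)
    for event, paper_id in rows:
        events_by_paper[paper_id].add(event)

    # Every unordered pair of distinct events in one paper becomes an edge.
    edges = defaultdict(int)
    for events in events_by_paper.values():
        for a, b in combinations(sorted(events), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hypothetical sample data: two papers, three distinct events.
rows = [
    ("A activates B", "PMC1"),
    ("B inhibits C", "PMC1"),
    ("A activates B", "PMC2"),
    ("C activates D", "PMC2"),
]
edges = build_cooccurrence_edges(rows)
```

Counting the shared papers per edge is an extra choice made here for illustration; the description above only requires that an edge exists.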
creating ECLG from others
- runClarinet.create_eclg_el(interaction_filename, model_dict)
This function creates the ECLG where a node is an event (e.g., biochemical interaction) and there is an edge between two nodes (two events) if they happen to occur in the same paper. The reading output file header must contain the following fields: Element Name, Element Type, Element Identifier, PosReg Name/Type/ID, NegReg Name/Type/ID and Paper ID.
- Parameters
interaction_filename (str) – The path of the reading output file (extracted events)
model_dict (dict) – Dictionary that holds critical information of each baseline model element
- Returns
G – Event CoLlaboration Graph
- Return type
Graph
assigning weights to nodes
- runClarinet.node_weighting(G, freqTh, path)
This function assigns weights to graph nodes using frequency class (FC) and returns a new ECLG with the less frequent nodes removed. It also saves the ECLG nodes with their frequency class levels, and the ECLG edges before and after the removal, to the specified directory.
- Parameters
G (undirected graph) – Event CoLlaboration Graph
freqTh (int) – Frequency class (FC) threshold; events (nodes) with an FC greater than this value are removed
path (str) – The output directory where the generated files will be saved
- Returns
G – a new ECLG after the removal of less frequent nodes
- Return type
undirected graph
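To make the thresholding concrete, the sketch below assumes the common definition of frequency class, FC = floor(0.5 - log2(f / f_max)), where f is a node's frequency and f_max is the frequency of the most frequent node. Whether node_weighting() uses exactly this formula is an assumption, and the helper names and sample frequencies are hypothetical.

```python
import math


def frequency_class(freq, max_freq):
    """Frequency class under the assumed definition floor(0.5 - log2(f / f_max))."""
    return math.floor(0.5 - math.log2(freq / max_freq))


def filter_by_frequency_class(node_freq, freq_th):
    """Return (classes, kept): the FC of every node and the nodes with FC <= freq_th."""
    max_freq = max(node_freq.values())
    classes = {n: frequency_class(f, max_freq) for n, f in node_freq.items()}
    # Nodes whose FC is greater than the threshold are dropped.
    kept = {n for n, fc in classes.items() if fc <= freq_th}
    return classes, kept


# Hypothetical event frequencies.
node_freq = {"e1": 16, "e2": 8, "e3": 2, "e4": 1}
classes, kept = filter_by_frequency_class(node_freq, freq_th=2)
```

Under this definition the most frequent event has class 0, and a higher class means a rarer event, which is why nodes above the threshold are the ones removed.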
assigning weights to edges
- runClarinet.edge_weighting(G, path, weightMethod)
This function assigns weights to graph edges using either the frequency class (FC) or the inverse frequency (IF) formula and returns a weighted ECLG. It also saves the ECLG edges and their weights to the specified directory.
- Parameters
G (undirected graph) – Event CoLlaboration Graph
path (str) – The output directory where the generated files will be saved
weightMethod (str) – ‘FC’ or ‘IF’
- Returns
G – ECLG after assigning weights to edges
- Return type
undirected graph
clustering ECLG
- runClarinet.clustering(G, path)
This function does three things: (1) clusters the ECLG using the community detection algorithm by Blondel et al. and generates a pickle file containing the grouped (clustered) extensions as nested lists, where each group starts with an integer followed by interactions specified as [regulator element, regulated element, interaction type: Activation (+) or Inhibition (-)]; (2) displays the clustering result; (3) saves each cluster in a separate file, in both uninterpreted form (under GeneratedClusters/) and interpreted form (under InterpretedClusters/).
- Parameters
G (undirected graph) – Event CoLlaboration Graph
path (str) – The output directory where the generated files will be saved
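The nested-list layout of the grouped extensions can be pictured with the following sketch. The element names and group contents are made up for illustration, and the pickle round trip is done in memory rather than against a file written by clustering().

```python
import pickle

# Hypothetical grouped extensions: each group is [index, interaction, ...],
# and each interaction is [regulator element, regulated element, sign].
grouped_ext = [
    [1, ["AKT", "MTOR", "+"], ["PTEN", "AKT", "-"]],
    [2, ["TP53", "MDM2", "+"]],
]

# Round-trip through pickle in memory to mimic saving and reloading.
loaded = pickle.loads(pickle.dumps(grouped_ext))

# Split each group into its leading integer and its interactions.
summary = {}
for group in loaded:
    index, interactions = group[0], group[1:]
    summary[index] = [
        f"{reg} {'activates' if sign == '+' else 'inhibits'} {target}"
        for reg, target, sign in interactions
    ]
```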
getting cluster information
- runClarinet.get_cluster_info(generated_clu_path, LSS_file, output_path)
This function returns basic information about each generated cluster as a DataFrame object and also saves it as a .csv file. The information includes Cluster_index, Nodes, Edges, Density, AvgPathLength, Coeff, LSS, NodesX, EdgesX, DensityX, AvgPathLength, CoeffX, FreqClass, node_perc.
- Parameters
generated_clu_path (str) – The directory that contains the generated clusters
LSS_file (str) – The path of LSS_file, containing ECLG edges and their weights, generated in edge_weighting()
output_path (str) – The output directory where ClusterInfoFile.csv will be saved
- Returns
cluster_df – DataFrame that contains information for each generated cluster
- Return type
pandas.DataFrame()
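As an illustration of three of the reported columns, the sketch below computes Nodes, Edges and Density for a single cluster given as an edge list. It uses the standard density of a simple undirected graph, 2E / (N(N - 1)); that get_cluster_info() computes Density this way is an assumption, and `cluster_stats` is a hypothetical helper.

```python
def cluster_stats(edges):
    """Return node count, edge count and density for an undirected edge list.

    Assumes a simple graph: no self-loops, and (a, b) is the same edge as (b, a).
    """
    nodes = set()
    unique_edges = set()
    for a, b in edges:
        nodes.update((a, b))
        unique_edges.add(frozenset((a, b)))  # order-independent edge key

    n, m = len(nodes), len(unique_edges)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return {"Nodes": n, "Edges": m, "Density": density}


# Hypothetical cluster: a triangle e1-e2-e3 plus one pendant node e4.
stats = cluster_stats([("e1", "e2"), ("e2", "e3"), ("e3", "e1"), ("e3", "e4")])
```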
merging clusters
- runClarinet.merge_clusters(regulators, path, ReturnTh)
This function records indices of clusters to be merged based on the existence of return paths. It generates the grouped_ext_Merged pickle file that contains the merged clusters.
- Parameters
regulators (dict) – Contains baseline model elements and corresponding regulator elements
path (str) – The path of the directory that contains the grouped_ext file
ReturnTh (int) – A user-defined integer threshold for the number of return paths, beyond which clusters will be merged
loading baseline model
- runClarinet.get_model(model_file: str)
This function reads the baseline model in BioRECIPE format and returns two dictionaries.
- Parameters
model_file (str) – The path of the baseline model file
- Returns
model_dict (dict) – Dictionary that holds critical information of each baseline model element
regulators (dict) – Contains baseline model elements and corresponding regulator elements
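Because get_model() supplies the dictionaries that the other functions consume, a typical call sequence might look like the sketch below. It uses only the signatures documented on this page. The wrapper `run_pipeline` and its parameter names are hypothetical, it assumes the two dictionaries are returned in the order listed under Returns, and it assumes a single output directory is used for every step.

```python
def run_pipeline(model_file, reading_file, out_dir, freq_th, return_th,
                 weight_method="FC"):
    """Chain the documented functions; every argument value is caller-supplied."""
    # Imported lazily so this sketch can be defined without the package installed.
    import runClarinet

    # Assumption: the dictionaries come back as (model_dict, regulators).
    model_dict, regulators = runClarinet.get_model(model_file)

    # Build the ECLG from a reading output file in BioRECIPE format.
    G = runClarinet.create_eclg(reading_file, model_dict)

    # Drop infrequent nodes, then weight the remaining edges ('FC' or 'IF').
    G = runClarinet.node_weighting(G, freq_th, out_dir)
    G = runClarinet.edge_weighting(G, out_dir, weight_method)

    # Cluster the graph, then merge clusters based on return paths.
    # Assumption: clustering() leaves the grouped_ext file in out_dir.
    runClarinet.clustering(G, out_dir)
    runClarinet.merge_clusters(regulators, out_dir, return_th)
    return G
```

For reading output that is not in BioRECIPE format, create_eclg_el() takes the same two arguments and could be substituted for create_eclg().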
matching element names
- runClarinet.getVariableName(model_dict, curr_map, ext_element_info)
A utility function for create_eclg() and create_eclg_el() that matches the element name from an extracted event to an element in the baseline model.
- Parameters
model_dict (dict) – Dictionary that holds critical information of each baseline model element
curr_map (dict) – Temporary dictionary that contains already matched pairs
ext_element_info (list) – List of information for a given element in the extracted event, starting with the element name
- Returns
match – The most likely matching element name in model_dict for the element represented by ext_element_info; if there is no match, the extended element name suffixed with “_ext”
- Return type
str
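The fallback behaviour (return a baseline name when one matches, otherwise the name with an “_ext” suffix) can be sketched as follows. This is a deliberately simplified stand-in, not the library's matching logic: it compares names case-insensitively only, and `match_element` and the flat `model_names` list are hypothetical substitutes for getVariableName() and model_dict.

```python
def match_element(model_names, curr_map, ext_element_info):
    """Return a matching baseline name, or the extracted name suffixed with "_ext"."""
    name = ext_element_info[0]  # the list starts with the element name

    # Reuse an earlier decision for the same extracted name.
    if name in curr_map:
        return curr_map[name]

    # Simplification: a case-insensitive comparison of names only.
    lookup = {m.lower(): m for m in model_names}
    match = lookup.get(name.lower(), name + "_ext")

    curr_map[name] = match  # remember the pairing for later events
    return match


# Hypothetical baseline element names and an empty cache.
model_names = ["AKT", "MTOR"]
curr_map = {}
matched = match_element(model_names, curr_map, ["akt", "protein"])
unmatched = match_element(model_names, curr_map, ["XYZ", "protein"])
```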
generating directed graph
- runClarinet.make_diGraph(mdldict)
A utility function for merge_clusters() that converts the baseline model into a directed graph.
- Parameters
regulators (dict) – Contains baseline model elements and corresponding regulator elements
- Returns
G – Directed graph of the baseline model
- Return type
DiGraph()
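The conversion can be pictured with a standard-library sketch that turns a regulators dictionary into a directed adjacency map. It assumes the dictionary maps each element to an iterable of its regulator names and that edges point from regulator to regulated element; both are assumptions about the data layout, and `regulators_to_adjacency` is a hypothetical helper rather than the library function.

```python
def regulators_to_adjacency(regulators):
    """Return {node: set of successors} with edges regulator -> regulated element."""
    adjacency = {}
    for element, regs in regulators.items():
        # Make sure every element appears as a node, even without out-edges.
        adjacency.setdefault(element, set())
        for reg in regs:
            adjacency.setdefault(reg, set()).add(element)
    return adjacency


# Hypothetical baseline model: PTEN and PI3K regulate AKT, AKT regulates MTOR.
regulators = {"MTOR": ["AKT"], "AKT": ["PTEN", "PI3K"], "PTEN": []}
adjacency = regulators_to_adjacency(regulators)
```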