Functions

This page provides detailed documentation of the CLARINET functions defined under runClarinet.

creating ECLG from BioRECIPE

runClarinet.create_eclg(interaction_filename, model_dict)[source]

This function creates the ECLG, in which each node is an event (e.g., a biochemical interaction) and an edge connects two nodes (two events) if they occur in the same paper. The reading output file is expected to be in BioRECIPE format.

Parameters
  • interaction_filename (str) – The path of the reading output file (extracted events)

  • model_dict (dict) – Dictionary that holds critical information of each baseline model element

Returns

G – Event CoLlaboration Graph

Return type

Graph
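
The co-occurrence rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the runClarinet implementation: the event tuples, the paper IDs, and the helper name build_cooccurrence_edges are hypothetical, and the real function also reads the BioRECIPE file and matches element names against model_dict.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_edges(events):
    """Return undirected edges between events that share a paper.

    `events` is an iterable of (event, paper_id) pairs (hypothetical layout).
    """
    # Group the distinct events reported by each paper.
    by_paper = defaultdict(set)
    for event, paper_id in events:
        by_paper[paper_id].add(event)

    # Connect every pair of events found in the same paper.
    edges = set()
    for paper_events in by_paper.values():
        for a, b in combinations(sorted(paper_events), 2):
            edges.add((a, b))
    return edges


# Hypothetical events: (regulator, regulated, sign) plus a paper ID.
events = [
    (("A", "B", "+"), "PMC1"),
    (("B", "C", "-"), "PMC1"),
    (("C", "D", "+"), "PMC2"),
]
edges = build_cooccurrence_edges(events)
```

Only the two events from PMC1 end up connected; the single event from PMC2 has no partner and therefore contributes no edge.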

creating ECLG from others

runClarinet.create_eclg_el(interaction_filename, model_dict)[source]

This function creates the ECLG, in which each node is an event (e.g., a biochemical interaction) and an edge connects two nodes (two events) if they occur in the same paper. The header of the reading output file must contain the following fields: Element Name, Element Type, Element Identifier, PosReg Name/Type/ID, NegReg Name/Type/ID, and Paper ID.

Parameters
  • interaction_filename (str) – The path of the reading output file (extracted events)

  • model_dict (dict) – Dictionary that holds critical information of each baseline model element

Returns

G – Event CoLlaboration Graph

Return type

Graph

assigning weights to nodes

runClarinet.node_weighting(G, freqTh, path)[source]

This function assigns weights to graph nodes using frequency class (FC) and returns a new ECLG after removing less frequent nodes. It also saves the ECLG nodes with their freqClass levels, and the ECLG edges before and after the removal, to the specified directory.

Parameters
  • G (undirected graph) – Event CoLlaboration Graph

  • freqTh (int) – Frequency class threshold value; events (nodes) with an FC greater than this value are removed

  • path (str) – The output directory where the generated files will be saved

Returns

G – a new ECLG after the removal of less frequent nodes

Return type

undirected graph
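
The thresholding step can be sketched as follows. The source does not state the frequency class formula, so this sketch assumes the common definition FC = floor(0.5 - log2(f / f_max)), where f is an event's frequency and f_max is the highest frequency in the graph. The helper names and the example counts are hypothetical.

```python
import math


def frequency_class(freq, max_freq):
    """Assumed definition: FC = floor(0.5 - log2(freq / max_freq))."""
    return math.floor(0.5 - math.log2(freq / max_freq))


def filter_by_frequency_class(event_counts, freq_th):
    """Return (FC per event, set of events kept).

    Events whose FC is greater than freq_th are dropped, mirroring the
    freqTh description above.
    """
    max_freq = max(event_counts.values())
    fc = {e: frequency_class(c, max_freq) for e, c in event_counts.items()}
    kept = {e for e, level in fc.items() if level <= freq_th}
    return fc, kept


# Hypothetical occurrence counts for three events.
event_counts = {"e1": 8, "e2": 4, "e3": 1}
fc, kept = filter_by_frequency_class(event_counts, freq_th=2)
```

Under this definition the most frequent event has FC 0, and rarer events receive larger values, so a smaller freqTh keeps fewer nodes.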

assigning weights to edges

runClarinet.edge_weighting(G, path, weightMethod)[source]

This function assigns weights to graph edges using either the frequency class (FC) or the inverse frequency (IF) formula, and returns a weighted ECLG. It also saves the ECLG edges and their weights to the specified directory.

Parameters
  • G (undirected graph) – Event CoLlaboration Graph

  • path (str) – The output directory where the generated files will be saved

  • weightMethod (str) – Weighting method, either ‘FC’ (frequency class) or ‘IF’ (inverse frequency)

Returns

G – ECLG after assigning weights to edges

Return type

undirected graph

clustering ECLG

runClarinet.clustering(G, path)[source]

This function does three things: (1) clusters the ECLG using the community detection algorithm by Blondel et al. and produces a pickle file containing the grouped (clustered) extensions as nested lists, where each group starts with an integer followed by interactions of the form [regulator element, regulated element, interaction type: Activation (+) or Inhibition (-)]; (2) displays the clustering result; (3) saves each cluster in a separate file, in both uninterpreted (under GeneratedClusters/) and interpreted (under InterpretedClusters/) forms.

Parameters
  • G (undirected graph) – Event CoLlaboration Graph

  • path (str) – The output directory where the generated files will be saved
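
The nested-list layout of the grouped extensions described above can be illustrated with a short round trip through pickle. This is a sketch only: the element names are made up, and an in-memory buffer stands in for the pickle file that clustering() writes, whose exact name and location are not assumed here.

```python
import io
import pickle

# Hypothetical grouped extensions: each group starts with an integer,
# followed by [regulator, regulated, sign] interactions.
grouped_ext = [
    [1, ["A", "B", "+"], ["C", "B", "-"]],
    [2, ["D", "E", "+"]],
]

# Round-trip through pickle using an in-memory buffer instead of a file.
buffer = io.BytesIO()
pickle.dump(grouped_ext, buffer)
buffer.seek(0)
loaded = pickle.load(buffer)

# Split each group into its leading integer and its interactions.
summary = {}
for group in loaded:
    index, interactions = group[0], group[1:]
    summary[index] = len(interactions)
    for regulator, regulated, sign in interactions:
        kind = "Activation" if sign == "+" else "Inhibition"
        print(f"group {index}: {regulator} -> {regulated} ({kind})")
```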

getting cluster information

runClarinet.get_cluster_info(generated_clu_path, LSS_file, output_path)[source]

This function returns basic information about each generated cluster as a DataFrame object and also saves it as a .csv file. The information includes Cluster_index, Nodes, Edges, Density, AvgPathLength, Coeff, LSS, NodesX, EdgesX, DensityX, AvgPathLength, CoeffX, FreqClass, node_perc.

Parameters
  • generated_clu_path (str) – The directory that contains the generated clusters

  • LSS_file (str) – The path of LSS_file, containing ECLG edges and their weights, generated in edge_weighting()

  • output_path (str) – The output directory where ClusterInfoFile.csv will be saved

Returns

cluster_df – DataFrame that contains information for each generated cluster

Return type

pandas.DataFrame()
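
As an illustration of how three of the reported columns relate to one another, the sketch below derives Nodes, Edges, and Density for a single cluster. It assumes that Density is the standard undirected graph density, 2E / (N(N - 1)); the source does not define the column, so this is an assumption, and the edge list and helper name are hypothetical.

```python
def undirected_density(num_nodes, num_edges):
    """Standard density of an undirected simple graph (assumed definition)."""
    if num_nodes < 2:
        return 0.0
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


# Hypothetical cluster given as an undirected edge list.
cluster_edges = [("e1", "e2"), ("e2", "e3"), ("e1", "e3"), ("e3", "e4")]
nodes = {node for edge in cluster_edges for node in edge}

row = {
    "Nodes": len(nodes),
    "Edges": len(cluster_edges),
    "Density": undirected_density(len(nodes), len(cluster_edges)),
}
```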

merging clusters

runClarinet.merge_clusters(regulators, path, ReturnTh)[source]

This function records indices of clusters to be merged based on the existence of return paths. It generates the grouped_ext_Merged pickle file that contains the merged clusters.

Parameters
  • regulators (dict) – Contains baseline model elements and corresponding regulator elements

  • path (str) – The path of the directory that contains the grouped_ext file

  • ReturnTh (int) – A user-defined integer threshold for the number of return paths, beyond which clusters will be merged

loading baseline model

runClarinet.get_model(model_file: str)[source]

This function reads the baseline model in BioRECIPE format and returns two dictionaries.

Parameters

model_file (str) – The path of the baseline model file

Returns

  • model_dict (dict) – Dictionary that holds critical information of each baseline model element

  • regulators (dict) – Contains baseline model elements and corresponding regulator elements

matching element names

runClarinet.getVariableName(model_dict, curr_map, ext_element_info)[source]

A utility function for create_eclg() and create_eclg_el() that matches the element name from an extracted event to an element in the baseline model.

Parameters
  • model_dict (dict) – Dictionary that holds critical information of each baseline model element

  • curr_map (dict) – Temporary dictionary that contains already matched pairs

  • ext_element_info (list) – List of information for a given element in the extracted event, starting with the element name

Returns

match – The most likely matching element name in model_dict for the element represented by ext_element_info; if no match is found, the extended element name with the suffix “_ext”

Return type

str
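
The fallback behaviour described above (return a baseline model name when one matches, otherwise append “_ext”) can be sketched as follows. The matching rule here, a case-insensitive comparison of names, is an assumption chosen for brevity and is likely much simpler than what getVariableName() does; the helper name match_element_name and the example dictionaries are hypothetical.

```python
def match_element_name(model_dict, curr_map, ext_element_info):
    """Simplified stand-in: match by case-insensitive name, else add '_ext'."""
    name = ext_element_info[0]

    # Reuse a pair that was already matched earlier.
    if name in curr_map:
        return curr_map[name]

    # Assumed matching rule: compare names case-insensitively.
    lookup = {key.lower(): key for key in model_dict}
    match = lookup.get(name.lower(), name + "_ext")

    # Remember the result so repeated lookups stay consistent.
    curr_map[name] = match
    return match


# Hypothetical baseline model with two elements.
model_dict = {"AKT": {}, "ERK": {}}
curr_map = {}
first = match_element_name(model_dict, curr_map, ["akt", "protein"])
second = match_element_name(model_dict, curr_map, ["MYC", "protein"])
```

Caching results in curr_map means that an element seen in several extracted events always maps to the same name.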

generating directed graph

runClarinet.make_diGraph(mdldict)[source]

A utility function for merge_clusters() that converts the baseline model into a directed graph.

Parameters

mdldict (dict) – Contains baseline model elements and corresponding regulator elements

Returns

G – Directed graph of the baseline model

Return type

DiGraph()
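
The conversion can be pictured with a plain adjacency mapping. This sketch assumes the input dictionary maps each element name to an iterable of its regulator names and that edges point from regulator to regulated element; both are assumptions, as is the helper name regulators_to_successors. The real function returns a DiGraph object rather than a dictionary.

```python
def regulators_to_successors(regulators):
    """Build {regulator: set of regulated elements} from {element: regulators}."""
    # Start with every model element so isolated nodes are kept.
    successors = {element: set() for element in regulators}
    for target, regs in regulators.items():
        for reg in regs:
            # Edge direction assumed: regulator -> regulated element.
            successors.setdefault(reg, set()).add(target)
    return successors


# Hypothetical baseline model: A regulates B and C, B regulates C.
regulators = {"A": [], "B": ["A"], "C": ["A", "B"]}
successors = regulators_to_successors(regulators)
```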