Usage ===== Installation ************ iMATpy can be installed with pip: ``pip install imatpy`` iMATpy requires an optimizer to be installed. The default optimizer is ``glpk`` which is installed by ``COBRApy``. This package can also be used with Gurobi_ or CPLEX_ which must be installed seperately. Both ``cplex`` and ``gurobi`` have free academic licenses. Once installed, the optimizer can be changed by setting the solver property of a model, and the default can be changed via ``COBRApy`` config, by changing the ``solver`` property of the ``cobra.core.configuration.Configuration()`` object (see `cobrapy documentation`_). .. _Gurobi: https://www.gurobi.com/ .. _CPLEX: https://www.ibm.com/products/ilog-cplex-optimization-studio/cplex-optimizer .. _cobrapy documentation: https://cobrapy.readthedocs.io/en/latest/ Package Structure ***************** The package is divided into four main submodules, ``imat``, ``model_creation``, ``parse_gpr``, and ``model_utils``. - ``imat``: submodule containing functions for running the iMAT algorithm and generating flux distributions. - ``model_creation``: submodule contains functions for creating models based on iMAT solutions. Doesn't require using any of the `imat` functions seperately, as they are called internally as needed for the different model creation methods. - ``parse_gpr``: submodule contains functions for using the gene protein reaction rules in a model for converting gene expression weights into reaction weights. The gene expression data should already be converted into weights before using these functions (and so should only contain -1, 0, and 1). The functions assume a single series to be used, so to use multiple observations, the data should be combined (e.g. by taking the mean or median) before using these functions. - ``model_utils``: submodule contains various utility functions for working with models. This includes helper functions for reading a writing models to various formats, as well as functions for checking if two models are equivalent. Workflow ******** 1. Normalize gene expression data for sequencing depth and gene length (TPM or RPKM both work well) 2. Quantile normalize the gene expression data (for example with the `qnorm package`_) (This is optional, but can help make the results less sensitive to parameters) 3. Combine gene expression values accross samples, for example by taking the mean or median 4. Convert the gene expression data into qualitative weights, of -1, 0, or 1, where -1 represents low expression, 1 represents high expression, and 0 represents all other genes. This can be done by setting the bottom 15% of genes to -1, the top 15% of genes to 1, and all other genes to 0. Other percentiles can be used, but 15% is a decent starting point. 5. Using the `parse_gpr` submodule, specifically the `gene_to_rxn_weights` function, convert the gene weights into reaction weights using the GPR rules in the model. 6. Either generate a flux distribution with the iMAT algorithm, or construct a new model with restrictions based on iMAT results. - To generate a flux distribution, use the `imatpy.imat` submodule, specifically the `imat` function. This function takes a model, and a series of reaction weights, and returns a flux distribution which maximizes the sum of reactions with high expression which have a flux above `epsilon`, and reactions with low expression which have a flux below `threshold`. - To generate a model with restrictions based on the iMAT results, use the `model_creation` submodule. It includes a variety of different methods for generating models based on iMAT, with a wrapper function `generate_model` for convinience. .. _qnorm package: https://pypi.org/project/qnorm/ Parse GPR ********* The ``parse_gpr`` submodule has functions for applying the gene-protien-reaction (GPR) rules in a model to gene weights in order to convert gene weights into reaction weights which can act as input to the iMAT algorithm. The input to this method is a pandas Series with the index being the gene identifiers, and the values being the gene weights. The output is a pandas Series with the index being the reaction identifiers, and the values being the reaction weights. This method only handles a single series, so the gene expression data must be processed into single weights (with -1 representing lowly expressed genes, 1 representing highly expressed genes, and 0 for all other genes). Here is an example of how to use this method: .. exec_code:: # External imports from cobra.core.configuration import Configuration import pandas as pd # iMATpy imports from imatpy.model_utils import read_model from imatpy.parse_gpr import gene_to_rxn_weights # Set the solver to glpk Configuration().solver = "glpk" # Read in the model model = read_model("./tests/data/test_model.xml") # Create a pandas Series representing gene expression weights model_weights = pd.Series({ "g_A_imp": 1, "g_B_imp": -1, "g_C_imp": -1, "g_F_exp": 0, "g_G_exp": -1, "g_H_exp": 0, "g_A_B_D_E": 0, "g_C_E_F": -1, "g_C_H": 0, "g_D_G": 1, }) # Convert the gene weights into reaction weights reaction_weights = gene_to_rxn_weights(model=model, gene_weights=model_weights) # Print the reaction weights print(reaction_weights) iMAT methods ************ The ``imat`` submodule contains functions for running the iMAT algorithm. The main function is ``imat``, which takes a model, and a series of reaction weights, and returns a flux distribution which maximizes the sum of reactions with high expression which have a flux above ``epsilon``, and reactions with low expression which have a flux below ``threshold``. Here is an example of how to use this method: .. exec_code:: # External imports from cobra.core.configuration import Configuration import pandas as pd # iMATpy imports from imatpy.model_utils import read_model from imatpy.imat import imat # Set the solver to glpk Configuration().solver = "glpk" # Read in the model model = read_model("./tests/data/test_model.xml") # Read in the reaction weights rxn_weights = pd.read_csv("./tests/data/test_model_reaction_weights.csv", index_col=0).squeeze() # Run iMAT imat_results = imat(model=model, rxn_weights=rxn_weights, epsilon=1, threshold=0.01) # Print the imat objective print(f"iMAT Objective: {imat_results.objective_value}") # Print the imat flux distribution print(f"iMAT Flux Distribution: \n{imat_results.fluxes}") Model Creation ************** The ``model_creation`` submodule contains functions for creating new models based on the results of iMAT. The main function is ``generate_model``, which takes a model, and a series of reaction weights, and returns a new model with restrictions based on the iMAT results. The available methods for creating a model based on an iMAT flux distribution is: ``imat_restrictions`` Adds the binary variables and constraints used in the iMAT algorithm, as well as an additional constraint ensuring that the flux distribution is within tolerance of the optimal iMAT objective value. This method stays closest to the iMAT objective, but the included indicator (binary) variables mean that is unsuitable for sampling. ``simple_bounds`` Adds bounds on the reactions found to be "on", and "off" in iMAT. For all the highly expressed reactions found to be "on", the flux is constrained to be at least ``epsilon``. For all the lowly expressed reactions found to be "off", the flux is constrained to be below ``threshold``. ``subset`` Removes reactions from the model which are found to be "off". For all the lowly expressed reactions found to be off, they are constrained to have a flux below ``threshold``. ``fva`` Finds bounds using an FVA like approach. A temporary model is created in a simmilar way to the ``imat_restrictions`` method above, which includes the imat variables, constraints, and which also constrains the flux distribution to be near optimal for iMAT. The maximum and minimum fluxes allowed through each reaction (while still maintaining the optimal iMAT objective) is found. These values are used as the new reaction bounds. It should be noted, that although the individual upper and lower bounds for the reaction are achievable for each reation while being consistant with the optimal iMAT objective, this doesn't guarantee that the flux distribution overall is consistant with the optimal iMAT objective. ``milp`` Uses a set of mixed integer linear programs to find whether a reaction should be forced off, forward, or reverse. Each reaction in turn is forced to be off, active in the forward direction, and active in the reverse direction, and the iMAT objective is maximized. Whether a reaction should be forced off, or active in either the forward or reverse direction is then determined by which direction maximizes the iMAT objective. Again, it should be noted that this doesn't guarantee that the iMAT objective is overall maximized by solutions to this model. Here is an example of how to use this method: .. exec_code:: # External imports from cobra.core.configuration import Configuration import pandas as pd # iMATpy imports from imatpy.model_utils import read_model from imatpy.model_creation import generate_model # Set the solver to glpk Configuration().solver = "glpk" # Read in the model model = read_model("./tests/data/test_model.xml") # Read in the reaction weights rxn_weights = pd.read_csv("./tests/data/test_model_reaction_weights.csv", index_col=0).squeeze() # Generate a new model based on the iMAT results new_model = generate_model(model=model, rxn_weights=rxn_weights, method="fva") Model Utils *********** The ``model_utils`` submodule contains several utility functions for working with COBRApy models. Specifically, it contains functions for: - Reading and writing models in various formats with a single function, specifically ``read_model``/``write_model``. - Determining if two models are equivalent, using ``model_eq``. Here is an example of how to use the model IO methods: .. code-block::python # iMATpy imports from imatpy.model_utils import read_model, write_model, model_eq # You can read in a model from a file model = read_model("./tests/data/test_model.xml") # in SBML model = read_model("./tests/data/test_model.json") # in JSON model = read_model("./tests/data/test_model.yml") # in YAML model = read_model("./tests/data/test_model.mat") # in Matlab # You can also write a model to a file write_model(model, "./tests/data/test_model.xml") # in SBML write_model(model, "./tests/data/test_model.json") # in JSON write_model(model, "./tests/data/test_model.yml") # in YAML write_model(model, "./tests/data/test_model.mat") # in Matlab Here is an example of using the model comparison method: .. exec_code:: # External imports from cobra.core.configuration import Configuration # iMATpy imports from imatpy.model_utils import read_model, model_eq # Set the solver to glpk Configuration().solver = "glpk" # Read a model model = read_model("./tests/data/test_model.xml") # Create a copy of the model model_copy = model.copy() # Check that the models are equivalent print(f"Models are equivalent: {model_eq(model, model_copy)}") # Change the copy model model_copy.reactions.get_by_id("r_A_B_D_E").lower_bound = -314 # Check that the models are no longer equivalent print(f"Models are equivalent: {model_eq(model, model_copy)}")