Welcome to CarveMe!
Installation
CarveMe can be easily installed using the pip package manager:
$ pip install carveme
Additionally, you must manually install two external dependencies: diamond and IBM CPLEX.
Note that you will need to register with IBM to obtain an academic license for CPLEX.
IMPORTANT: After installing CPLEX, do not forget to install the CPLEX Python API (see the CPLEX documentation for details).
Everything should be ready now! See the next section for instructions on how to start carving.
Usage
Building a model
CarveMe provides a very simple command line interface to build models. The most basic usage is:
$ carve genome.faa
This will build a genome-scale metabolic model from the genome file.
By default, CarveMe expects a protein FASTA file. Alternatively, you can provide DNA sequences:
$ carve --dna genome.fna
Note that raw genome files are not supported. The FASTA file must be divided into individual genes.
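As a rough sanity check before carving, you can count the records in your FASTA file. The sketch below builds a tiny illustrative file (the file name, gene ids, and sequences are all hypothetical) and counts its header lines. A per-gene file for a real genome would normally contain many records, whereas a raw genome file typically contains only one or a few.

```shell
# Create a tiny illustrative protein FASTA (hypothetical ids and sequences).
cat > example_genes.faa <<'EOF'
>gene_0001 hypothetical protein
MKTAYIAKQRQISFVKSHFSRQ
>gene_0002 hypothetical protein
MSDNKLTREEALALLQ
EOF

# Each record starts with '>', so counting header lines gives the record count.
n_records=$(grep -c '^>' example_genes.faa)
echo "records: $n_records"
```

This is only a heuristic: it confirms that the file is split into several records, not that each record is a correctly predicted gene.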
It is possible to specify a different name or directory for the output file:
$ carve genome.faa --output model.xml
Short version:
$ carve genome.faa -o model.xml
If you want to produce a compressed SBML file, just change the extension (this is automatically supported by libSBML):
$ carve genome.faa -o model.xml.gz
Rather than providing the genome data yourself, you can also provide an NCBI RefSeq accession code. This will automatically download the sequence and build the model:
$ carve --refseq GCF_000005845.2 -o ecoli_k12_mg1655.xml
If you have downloaded multiple genome sequences, you can run recursive mode to build multiple models in one call.
This will launch multiple parallel processes, which can decrease the overall computation time if you are running CarveMe on a multi-core CPU or on a computing cluster:
$ carve -r myfolder/*.faa
This can be combined with -o to change the output folder:
$ carve -r myfolder/*.faa -o mymodels/
Gap Filling
CarveMe tries to predict the uptake and secretion capabilities of an organism from genetic evidence alone, and will produce a simulation-ready model without gap-filling for any particular medium.
However, there are situations where you want to guarantee that the model is able to reproduce growth in one, or several, experimentally verified media.
For instance, you can ensure the model reproduces growth on M9 and LB media:
$ carve genome.faa --gapfill M9,LB
Short version:
$ carve genome.faa -g M9,LB
Please see the Advanced Usage section on how to provide your own media compositions.
If you already have a model, and you just want to gap-fill it, you can do it with the gapfill utility function:
$ gapfill model.xml -m M9 -o new_model.xml
Please note that the result is not the same as when you gap-fill during reconstruction. When you gap-fill during reconstruction, the gene annotation scores are used to prioritize the reactions selected for gap-filling based on genetic evidence. If you invoke gapfill alone, all potential gap-filling reactions are treated equally.
Finally, it is important to note that the models generated with CarveMe are not initialized with any medium composition.
You can define the growth environment of the organism for simulation purposes by setting the flux bounds of the exchange reactions yourself to match the respective medium composition.
Alternatively, you can tell CarveMe to initialize the model with a pre-defined medium composition:
$ carve genome.faa --init M9
Short version:
$ carve genome.faa -i M9
Note that this will not gap-fill the model, but only define the external environment for simulation purposes.
To simultaneously gap-fill and initialize the model for a desired medium, you must combine both flags:
$ carve genome.faa -g M9 -i M9
You are now a basic user. Happy carving!
Microbial Communities
CarveMe enables the generation of microbial community models from single species models.
The most basic usage is:
$ merge_community organism_1.xml organism_2.xml ... organism_N.xml -o community.xml
or more simply:
$ merge_community *.xml -o community.xml
This generates an SBML file with a community where each organism is assigned to its own compartment. A common community biomass equation is also generated.
You can import the merged model into any simulation tool, just like any normal constraint-based model, and apply different types of simulation methods (FBA, FVA, etc.).
You can initialize the community with a pre-defined medium (just like during single-species reconstruction):
$ merge_community [input files] -i M9
Advanced Usage
Advanced blasting
Gene matching (i.e. blasting) between provided genomes and our internal database is performed with diamond.
You can manually tweak the blasting options within diamond itself (at your own risk) as follows:
$ carve genome.faa --diamond-args="-e 1e-20 --top 20"
The default arguments are "--more-sensitive --top 10". Please see diamond’s documentation for more details.
eggNOG-mapper
By default, CarveMe performs gene matching by homology search using diamond. However, you can also perform orthology-based search using eggNOG-mapper.
For this you must first annotate your genome with eggNOG-mapper, and provide the output of eggNOG-mapper directly as input to CarveMe:
$ carve --egg eggnog_output.tsv
Please make sure you install eggNOG-mapper from the bigg branch.
Media database
CarveMe comes with a very small pre-built library of media compositions:
- LB (Lysogeny broth)
- LB[-O2] (Lysogeny broth, anaerobic)
- M9 (Minimal M9 medium)
- M9[-O2] (Minimal M9 medium, anaerobic)
- M9[glyc] (Minimal M9 medium, glycerol as carbon source)
Additionally, you can provide your own media library for gap-filling:
$ carve genome.faa --gapfill X,Y,Z --mediadb mylibrary.tsv
The library must be a tab-separated file with four columns:
- medium: short id to be passed in command line (example: X)
- description: description of the medium (optional, example: Our magic X formula)
- compound: compound id (example: glc)
- name: compound name (optional, example: Glucose)
Please note that, at this moment, CarveMe only supports metabolite ids from the BiGG database.
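As an illustration of the four-column layout described above, the following sketch writes a small library file with printf. It assumes one row per compound and a header row named after the four columns; both are assumptions rather than documented requirements, and the medium X and its compounds are made up for the example.

```shell
# Write a hypothetical media library; columns are separated by tabs.
# Assumption: a header row with the four column names, one row per compound.
printf 'medium\tdescription\tcompound\tname\n' > mylibrary.tsv
printf 'X\tOur magic X formula\tglc\tGlucose\n' >> mylibrary.tsv
printf 'X\tOur magic X formula\tnh4\tAmmonium\n' >> mylibrary.tsv

# Sanity check: every line should have exactly four tab-separated fields.
awk -F'\t' 'NF != 4 { bad = 1 } END { exit bad }' mylibrary.tsv \
  && echo "mylibrary.tsv looks well-formed"
```

The resulting file could then be passed with --mediadb mylibrary.tsv together with --gapfill X, as in the command shown earlier.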
Please feel free to contact us with suggestions of more media compositions to add to our default library.
SBML flavor
By default, CarveMe generates models compatible with the old COBRA Toolbox format. This format is outdated but is still compatible with most constraint-based modeling tools. The new format based on the sbml-fbc2 specification is also supported.
You can specify your desired SBML flavor with the following flags:
$ carve genome.faa --cobra -o model.xml
$ carve genome.faa --fbc2 -o model.xml
Ensemble modeling
Our model reconstruction algorithm is formulated as a mixed-integer linear programming (MILP) problem. The generated model is structured according to the solution to this problem. Often, one might want to explore how alternative solutions lead to slightly different network structures and, consequently, predict different phenotypes.
CarveMe allows the generation of model ensembles. You only need to specify how many models you want to generate:
$ carve genome.faa -n 100 -o model.xml
This example would generate an ensemble of 100 models. Note that the ensemble is stored as a single SBML file, using a compact notation (binary vectors) to represent the ensemble state of each reaction.
Some utility methods to read/write and perform simulations using ensemble models are implemented in framed.
Alternative universes
CarveMe implements a top-down reconstruction approach that requires a well-curated universal model to be used as template for the model carving process.
Currently, you can choose between the universal bacterial template, or two templates specialized for gram-positive and gram-negative bacteria:
$ carve genome.faa -u grampos
$ carve genome.faa -u gramneg
A script with some utility functions is available to help you build your own templates. For instructions please check:
$ build_universe -h
You can then provide your own customized universe model during reconstruction:
$ carve genome.faa --universe-file yeast_universe.xml
Experimental constraints
When you have experimental evidence for the presence/absence of a given set of reactions, you can provide this information to improve the reconstruction process. According to the level of evidence, you can format your data as soft or hard constraints. These can be applied to any kind of reaction present in the universe model (exchange, transport or enzymatic reactions).
Soft constraints are used to change the priority given to a set of reactions, as well as their expected direction. They can be used when there is a limited amount of evidence for some expected phenotype. For instance, if the organism you are reconstructing is closely related to other organisms that are known to secrete a given compound, you can include the respective exchange reaction as a soft constraint.
$ carve genome.faa --soft data.tsv
Here, data.tsv is a tab-separated file with two columns: the reaction identifiers and the respective values. Each value is one of the following: 1 (the reaction occurs in the forward direction), -1 (the reaction occurs in the backward direction), or 0 (the reaction does not occur).
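A minimal sketch of such a file is shown below. The file name and the reaction identifiers are placeholders chosen for illustration (use the identifiers found in your universe model), and the absence of a header row is an assumption.

```shell
# Hypothetical soft constraints: reaction id <TAB> value (1, -1, or 0).
# Assumption: no header row. The reaction ids are placeholders.
printf 'R_EX_ac_e\t1\n' > soft_constraints.tsv       # expected in the forward direction
printf 'R_EX_lac__D_e\t0\n' >> soft_constraints.tsv  # expected not to occur

# Sanity check: two fields per line, and the value is one of 1, -1, 0.
awk -F'\t' 'NF != 2 || ($2 != 1 && $2 != -1 && $2 != 0) { bad = 1 } END { exit bad }' \
  soft_constraints.tsv && echo "soft_constraints.tsv looks well-formed"
```

The file would then be passed to carve with the --soft flag in place of data.tsv.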
Hard constraints are used to force the fluxes through a given set of reactions during reconstruction. They can be used when there is absolute evidence about a given phenotype. For instance, if you are reconstructing an obligate anaerobe, you can force the oxygen uptake rate to be zero.
$ carve genome.faa --hard data.tsv
Here, data.tsv is a tab-separated file with three columns: reaction identifiers, lower bounds, and upper bounds. Please use hard constraints with care, as they can make the reconstruction problem infeasible when incorrectly formulated.
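For the obligate anaerobe example, a hard constraints file might look like the sketch below. It relies on the usual constraint-based convention that a negative flux through an exchange reaction means uptake, so a lower bound of 0 blocks uptake. The file name, the reaction identifier, the upper bound of 1000, and the absence of a header row are all assumptions made for illustration.

```shell
# Hypothetical hard constraint: reaction id <TAB> lower bound <TAB> upper bound.
# Assumption: no header row. The reaction id is a placeholder, and 1000 is an
# arbitrary large value standing in for "effectively unbounded".
printf 'R_EX_o2_e\t0\t1000\n' > hard_constraints.tsv

# Sanity check: three fields per line, and lower bound <= upper bound.
awk -F'\t' 'NF != 3 || $2 + 0 > $3 + 0 { bad = 1 } END { exit bad }' \
  hard_constraints.tsv && echo "bounds look consistent"
```

Checking that each lower bound does not exceed its upper bound catches one simple formulation mistake, although it cannot tell you whether the constraints make the reconstruction problem infeasible.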