Advanced Usage

Advanced blasting

Gene matching (i.e. blasting) between provided genomes and our internal database is performed with diamond.

You can manually tweak the blasting options within diamond itself (at your own risk) as follows:

$ carve genome.faa --diamond-args="-e 1e-20 --top 20"

The default arguments are "--more-sensitive --top 10". Please see diamond’s documentation for more details.

eggNOG-mapper

By default, CarveMe performs gene matching by homology search using diamond. However, you can also perform orthology-based search using eggNOG-mapper.

For this you must first annotate your genome with eggNOG-mapper, and provide the output of eggNOG-mapper directly as input to CarveMe:

$ carve --egg eggnog_output.tsv

Please make sure you install eggNOG-mapper from the bigg branch:

https://github.com/jhcepas/eggnog-mapper/tree/bigg

Media database

CarveMe comes with a very small pre-built library of media compositions:

  • LB (Lysogeny broth)
  • LB[-O2] (Lysogeny broth, anaerobic)
  • M9 (Minimal M9 medium)
  • M9[-O2] (Minimal M9 medium, anaerobic)
  • M9[glyc] (Minimal M9 medium, glycerol as carbon source)

Additionally, you can provide your own media library for gap-filling:

$ carve genome.faa --gapfill X,Y,Z --mediadb mylibrary.tsv

The library must be a tab-separated file with four columns:

  • medium: short id to be passed in command line (example: X)
  • description: description of the medium (optional, example: Our magic X formula)
  • compound: compound id (example: glc)
  • name: compound name (optional, example: Glucose)

Please note that, at this moment, CarveMe only supports metabolite ids from the BiGG database.

Please feel free to contact us with suggestions of more media compositions to add to our default library.

SBML flavor

By default, CarveMe generates models compatible with the old cobra toolbox format. This format is outdated but is still compatible with most constraint-based modeling tools. The new format based on the sbml-fbc2 specification is also supported.

You can specify your desired SBML flavor with the following flags:

$ carve genome.faa --cobra -o model.xml

$ carve genome.faa --fbc2 -o model.xml

Ensemble modeling

Our model reconstruction algorithm is implemented as an MILP optimization problem. The generated model is structured according to the solution to this problem. Often, one might want to explore how alternative solutions lead to slightly different network structures, and consequently, predict different phenotypes.

CarveMe allows the generation of model ensembles. You only need to specify how many models you want to generate:

$ carve genome.faa -n 100 -o model.xml

This example would generate an ensemble of 100 models. Note that the ensemble is stored as a single SBML file, using a compact notation (binary vectors) to represent the ensemble state of each reaction.

Some utility methods to read/write and perform simulations using ensemble models are implemented in framed.

Alternative universes

CarveMe implements a top-down reconstruction approach that requires a well-curated universal model to be used as template for the model carving process.

Currently, you can choose between the universal bacterial template, or two templates specialized for gram-positive and gram-negative bacteria:

$ carve genome.faa -u grampos
$ carve genome.faa -u gramneg

A script with some utility functions is available to help you build your own templates. For instructions please check:

$ build_universe -h

You can then provide your own customized universe model during reconstruction:

$ carve genome.faa --universe-file yeast_universe.xml

Experimental constraints

When you have experimental evidence for the presence/absence of a given set of reactions, you can provide this information to improve the reconstruction process. According to the level of evidence, you can format your data as soft or hard constraints. These can be applied to any kind of reaction present in the universe model (exchange, transport or enzymatic reactions).

Soft constraints are used to change the priority given to a set of reactions, as well as their expected direction. They can be used when there is limited amount of evidence for some expected phenotype. For instance, if the organism you are reconstructing is closely related to other organisms that are known to secrete a given compound, you can include the respective exchange reaction as a soft constraint.

$ carve genome.faa --soft data.tsv

Where data.tsv is a tab-separated file with two columns, the reaction identifiers and the respective values. Each value is one of the following: 1) reaction occurs in forward direction, -1) reaction occurs in backward direction, 0) reaction does not occur.

Hard constraints are used to force the fluxes through a given set of reactions during reconstruction. They can be used when there is absolute evidence about a given phenotype. For instance, if you are reconstructing an obligatory anaerobe, you can force the oxygen uptake rate to be zero.

$ carve genome.faa --hard data.tsv

Where data.tsv is a tab-separated file with three columns, reaction identifiers, lower bounds, and upper bounds. Please use hard constraints with care, as they can make the reconstruction problem infeasible when incorrectly formulated.