FunctionLab/mahi

Multi-modal tissue-aware graph neural network for in silico genetic discovery

📄 Manuscript • 🛠️ Installation • 📦 Data • 🧪 Demo • 🧬 Embedding Generation • 🔬 Perturbation Analysis


Installation

Please clone this repository into a directory with sufficient storage space. Each functional network is approximately 22 GB, so ensure you have adequate disk capacity before downloading.
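
As a quick pre-download check, the sketch below estimates whether a target directory has room for a given number of networks, using the approximate 22 GB figure quoted above. The function name and the choice of three networks in the example are illustrative assumptions, not part of the Mahi codebase.

```python
import shutil

# Approximate size of one functional network, taken from the note above.
NETWORK_SIZE_GB = 22


def has_room_for_networks(path, n_networks, size_gb=NETWORK_SIZE_GB):
    """Return True if the filesystem holding `path` can fit `n_networks` networks."""
    required_bytes = int(n_networks * size_gb * 1024**3)
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes


# Example: check the current directory before downloading three networks.
print(has_room_for_networks(".", n_networks=3))
```

The estimate ignores intermediate files such as filtered networks and embeddings, so treat it as a lower bound on the space you need.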

Recommended Installation (Conda)

# clone GitHub repository
git clone https://github.com/FunctionLab/mahi.git
cd mahi

# create Conda environment from YAML
conda env create -f environment.yaml
conda activate mahi

# install PyTorch Geometric dependencies
pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.1.0+cu121.html

Manual Installation (other package managers or custom setup)

# create new Conda environment
conda create --name mahi python=3.10 pytorch=2.1 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda activate mahi

# install dependencies
pip install "numpy<2"
pip install torch-geometric wandb pytorch-lightning ipykernel umap-learn biopython pyfaidx seaborn xgboost
conda install scikit-learn matplotlib pandas -c conda-forge

# install PyTorch Geometric dependencies
pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
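
After either installation route, it can help to confirm that the key packages are importable in the active environment. The following is a minimal sketch using only the standard library; the list of module names is an assumption based on the packages installed above, and the helper is not part of the repository.

```python
from importlib.util import find_spec

# Import names for the main packages installed above (assumed list, adjust as needed).
REQUIRED_MODULES = [
    "torch",
    "torch_geometric",
    "torch_scatter",
    "torch_sparse",
    "pytorch_lightning",
    "numpy",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be located in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
print("all modules found" if not missing else f"missing: {', '.join(missing)}")
```

This only checks that each package can be located. It does not verify versions or CUDA compatibility.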

Data

Download required datasets from:

https://drive.google.com/drive/folders/1xWfPkC8bs3aQCsI6YMqYpXnSn6f6E1-B?usp=share_link

Then unzip into the repository root:

unzip <data>.zip

Demo: gene essentiality prediction

This demo runs gene essentiality prediction on one cell line to verify your setup. It takes roughly 30 minutes, depending on your hardware:

# attach gene essentiality labels to Mahi demo embeddings for lung tissue
python scripts/gene_essentiality/add_labels.py \
  --mahi_root data/demo/mahi_embeddings_lung \
  --data_dir data/demo

# evaluate gene essentiality (5-fold CV + test eval)
python scripts/gene_essentiality/evaluate_mahi_gene_essentiality.py \
  --out_dir outputs/demo \
  --mahi_root data/demo/mahi_embeddings_lung \
  --mapping_file resources/cell_lines.txt \
  --cell_line ACH-000012 # cell line associated with lung tissue

Optional (HPC/SLURM)

For much faster runtime on CPUs, you can also submit the demo as a SLURM job:

sbatch demo.slurm

Outputs

outputs/demo/mahi_gene_essentiality_eval/
  ├── mahi.metrics_by_cellline_and_tissue.csv    # summary metrics on training set
  ├── cv_preds/                                  # per-gene out-of-fold predictions
  └── test_preds/                                # per-gene test predictions
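
To confirm that the demo finished, you can check that the entries listed above exist. The sketch below is a small helper written for illustration (it is not shipped with Mahi) and takes its file and directory names from the tree above.

```python
from pathlib import Path

# Entries expected under the evaluation directory, per the tree above.
EXPECTED = ["mahi.metrics_by_cellline_and_tissue.csv", "cv_preds", "test_preds"]


def missing_outputs(eval_dir):
    """Return the expected entries that are absent from `eval_dir`."""
    root = Path(eval_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


# An empty list means every expected output is present.
print(missing_outputs("outputs/demo/mahi_gene_essentiality_eval"))
```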

Mahi: End-to-end

Mahi can be run entirely on CPU (unless you are re-training the multigraph GNN).

Generate Mahi embeddings

Processing functional networks

Please download the MAGE tissue networks from https://humanbase.io/download or using the links from the manuscript. If you are using links from the manuscript, please convert the .dab files to .dat format using Dat2Dab from Sleipnir (https://github.com/FunctionLab/sleipnir.git). If you download directly from HumanBase, no conversion is necessary.

./sleipnir/build/tools/Dat2Dab -i data/dab_networks/<data.dab> -o data/dat_networks/<data.dat>

After conversion, filter each network to its top 3% of edges; running this step on SLURM is recommended. Make sure the downloaded .dat networks are in data/dat_networks. Processing one network takes approximately 15 minutes.

sbatch scripts/networks/process_networks.slurm

If you do not have SLURM, you can run the same script locally:

bash scripts/networks/process_networks.slurm

This generates filtered networks in:

data/dat_networks/*_filtered_top3.dat
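
For intuition, the idea of keeping the top 3% of edges by weight can be sketched on a toy edge list as below. This is an illustration only and not the logic of process_networks.slurm: it assumes each .dat line has the layout gene_a<TAB>gene_b<TAB>weight, holds all edges in memory (impractical for a 22 GB network), and uses hypothetical function names.

```python
import heapq
import math


def parse_dat_line(line):
    """Parse one 'gene_a<TAB>gene_b<TAB>weight' line (assumed layout)."""
    gene_a, gene_b, weight = line.rstrip("\n").split("\t")
    return gene_a, gene_b, float(weight)


def filter_top_edges(edges, fraction=0.03):
    """Keep the highest-weight `fraction` of (gene_a, gene_b, weight) edges."""
    edges = list(edges)
    if not edges:
        return []
    n_keep = max(1, math.ceil(len(edges) * fraction))
    # nlargest returns the kept edges sorted from heaviest to lightest.
    return heapq.nlargest(n_keep, edges, key=lambda edge: edge[2])


# Toy example: 100 edges with weights 0.00 to 0.99.
toy = [(f"g{i}", f"g{i + 1}", i / 100) for i in range(100)]
print(filter_top_edges(toy))
```

A real network of this size would call for a streaming or two-pass approach, for example finding the weight threshold first and then filtering line by line.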

Mahi embeddings for a single tissue

python wt_mahi.py \
  --dir data \
  --tissue lung \
  --checkpoint checkpoints/best-checkpoint.ckpt

Multiple tissues

python wt_mahi.py \
  --dir data \
  --tissues lung heart kidney \
  --checkpoint checkpoints/best-checkpoint.ckpt

Multiple tissues from a file

Create tissues.txt with one tissue name per line:

# tissues.txt
lung
heart
colon

Then pass the file with --tissues_txt:

python wt_mahi.py \
  --dir data \
  --tissues_txt tissues.txt \
  --checkpoint checkpoints/best-checkpoint.ckpt
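
Before starting a long run, it may be worth checking what a tissue list file resolves to. The sketch below reads tissue names and builds the equivalent --tissues command line. How wt_mahi.py itself parses the file is not shown in this README, so skipping blank lines and lines starting with # is an assumption, and both helper functions are hypothetical.

```python
def parse_tissue_lines(lines):
    """Return tissue names, skipping blank lines and '#' comment lines (assumed convention)."""
    tissues = []
    for line in lines:
        name = line.strip()
        if name and not name.startswith("#"):
            tissues.append(name)
    return tissues


def build_command(tissues, data_dir="data", checkpoint="checkpoints/best-checkpoint.ckpt"):
    """Build the argv for the multi-tissue invocation, using the flags shown above."""
    return [
        "python", "wt_mahi.py",
        "--dir", data_dir,
        "--tissues", *tissues,
        "--checkpoint", checkpoint,
    ]


# Example with in-memory lines; for a real file use open("tissues.txt").
example = ["# tissues.txt", "lung", "heart", "", "colon"]
print(" ".join(build_command(parse_tissue_lines(example))))
```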

Perturbation (gene KO) analysis

You can specify a single tissue (--tissue), multiple tissues (--tissues), or provide a tissue list file (--tissues_txt).

python perturb_mahi.py \
  --dir data \
  --gene <Entrez ID> \
  --tissue lung \
  --checkpoint checkpoints/best-checkpoint.ckpt

Rank perturbation effects

You can specify a single tissue (--tissue), multiple tissues (--tissues), or provide a tissue list file (--tissues_txt).

python get_top_genes.py \
  --dir data \
  --gene <Entrez ID> \
  --tissue lung \
  --avg resources/averaged_distances.csv \
  --top 1000
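
This README does not describe how get_top_genes.py scores genes. As a rough illustration of the general idea of ranking perturbation effects, the sketch below assumes each gene has a wild-type embedding and a post-knockout embedding and ranks genes by the cosine distance between the two. The function names, the toy data, and the choice of cosine distance are all assumptions, not the repository's implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # convention for this sketch: zero vectors count as dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def rank_by_shift(wild_type, perturbed, top=1000):
    """Rank genes present in both dicts by embedding shift, largest first."""
    shifts = {
        gene: cosine_distance(wild_type[gene], perturbed[gene])
        for gene in wild_type
        if gene in perturbed
    }
    return sorted(shifts.items(), key=lambda item: item[1], reverse=True)[:top]


# Toy embeddings keyed by made-up gene labels.
wt = {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [1.0, 0.0]}
ko = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
print(rank_by_shift(wt, ko, top=2))
```

In this toy case gene B moves the most and gene A does not move at all, so B ranks first and A is dropped by top=2.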

Citation

If you use Mahi in your research, please cite:

@article{aggarwal2026mahi,
  title   = {Multi-modal tissue-aware graph neural network for in silico genetic discovery},
  author  = {Aggarwal, Anusha and Sokolova, Ksenia and Troyanskaya, Olga G},
  journal = {bioRxiv},
  year    = {2026},
  month   = feb,
  doi     = {10.64898/2026.02.17.706433},
  url     = {https://www.biorxiv.org/content/10.64898/2026.02.17.706433v1},
}
