SCFBio, IIT Delhi Hauz Khas, New Delhi

BIMP Manual

The definitive guide to the Bioactivity of Indian Medicinal Plants database. Master the tools for cheminformatics, virtual screening, and network pharmacology.

1 Introduction to BIMP

The Bioactivity of Indian Medicinal Plants (BIMP) database is a centralized, high-performance cheminformatics platform developed by SCFBio at IIT Delhi. It serves as a digital bridge between the ancient wisdom of traditional Indian medicine (Ayurveda, Siddha, Unani) and the rigorous precision of modern computational drug discovery.

Unlike traditional static libraries, BIMP is a dynamic ecosystem. It hosts manually curated data on over 100,000 phytochemicals derived from 6,000+ plant species, all mapped to their specific biological targets and disease pathways.

Cheminformatics Core

BIMP treats every phytochemical as a computational object. Using the RDKit and Open Babel backends, we perform real-time standardization and property calculation.

  • Canonical SMILES: Ensuring unique representation for every molecule.
  • 3D Generation: Automatic conversion from 2D flat structures to energy-minimized 3D conformers (.SDF).
  • Lipinski Filtering: Automatic tagging of drug-like vs. non-drug-like compounds.
  • FP Generation: Pre-calculation of Morgan Fingerprints for similarity searching.

Botanical Taxonomy (APG IV)

To solve the confusion of vernacular names, BIMP adheres to the Angiosperm Phylogeny Group (APG IV) classification system.

Example Hierarchy:
Kingdom: Plantae → Order: Lamiales → Family: Lamiaceae → Genus: Ocimum → Species: Ocimum tenuiflorum (Tulsi)

* We explicitly map common names (e.g., "Neem", "Ginger") to their validated scientific binomials to assist non-botanist users.
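The common-name mapping described above can be pictured as a simple lookup. The sketch below is illustrative only: the table `COMMON_TO_BINOMIAL` and the function `resolve_plant_name` are hypothetical names, not part of BIMP, and the three entries are standard binomials chosen as examples.

```python
from typing import Optional

# Hypothetical lookup table (illustrative, not BIMP's actual data).
COMMON_TO_BINOMIAL = {
    "tulsi": "Ocimum tenuiflorum",
    "neem": "Azadirachta indica",
    "ginger": "Zingiber officinale",
}

def resolve_plant_name(query: str) -> Optional[str]:
    """Return the scientific binomial for a common name, or None if unknown."""
    key = query.strip().lower()  # tolerate stray spaces and capitalisation
    return COMMON_TO_BINOMIAL.get(key)

print(resolve_plant_name(" Neem "))
```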

Targets & Disease Mapping

Chemistry is useless without biology. BIMP creates a "Tri-Partite" knowledge graph connecting:

Plant Molecule → Target Protein → Disease

We use UniProt IDs for target normalization, PDB IDs for 3D structures, and ICD-10 codes for disease classification.
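One way to picture the three-layer graph is as two lookup tables that are walked in sequence. This is a minimal sketch under stated assumptions: the identifiers (`compound_A`, `TARGET_1`, `DISEASE_X`) are placeholders rather than real UniProt or ICD-10 entries, and `diseases_for_compound` is a hypothetical helper, not a BIMP function.

```python
# Placeholder edges (not BIMP records): compound -> targets, target -> diseases.
compound_to_targets = {
    "compound_A": ["TARGET_1"],
    "compound_B": ["TARGET_1", "TARGET_2"],
}
target_to_diseases = {
    "TARGET_1": ["DISEASE_X"],
    "TARGET_2": ["DISEASE_X", "DISEASE_Y"],
}

def diseases_for_compound(compound):
    """Follow compound -> target -> disease edges, keeping first-seen order."""
    found = []
    for target in compound_to_targets.get(compound, []):
        for disease in target_to_diseases.get(target, []):
            if disease not in found:  # avoid duplicates reached via two targets
                found.append(disease)
    return found

print(diseases_for_compound("compound_B"))
```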

2 System Architecture

BIMP is built on a hybrid architecture. It combines the data integrity of a relational database (MySQL) with the speed of flat-file indices (NDJSON) to answer complex cheminformatics queries with millisecond-scale latency.

Full Stack Overview

Backend & Data Layer

MySQL 8.0

Storage

Relational backbone storing complex entity maps (Plant ↔ Compound ↔ Disease). Fully ACID compliant for data integrity.

NDJSON Streams

Indexing

Pre-compiled "Newline Delimited JSON" indices allow the Wild Search engine to bypass SQL overhead, scanning 100k+ records in < 20ms.

PHP 8.2

Logic

Handles API routing, session security, and acts as the process manager to spawn asynchronous Python compute jobs.
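To illustrate why an NDJSON index can be scanned without SQL, the sketch below reads one JSON record per line and filters as it goes. It is an assumption-laden example: the field names `id` and `name` and the function `scan_ndjson` are hypothetical, and the real Wild Search engine may work differently.

```python
import io
import json

def scan_ndjson(lines, term):
    """Yield records whose 'name' field contains term (case-insensitive)."""
    needle = term.lower()
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines between records
            continue
        record = json.loads(line)  # each line is one complete JSON object
        if needle in record.get("name", "").lower():
            yield record

# In-memory stand-in for an index file on disk.
sample = io.StringIO(
    '{"id": 1, "name": "Curcumin"}\n'
    '{"id": 2, "name": "Eugenol"}\n'
)
hits = list(scan_ndjson(sample, "curc"))
print(hits)
```

Because each line parses independently, the file can be streamed without loading it all into memory.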

Compute Engine

Python 3.12

Workers

The brain of the operation. Executes background scripts for file parsing, grid generation, and simulation monitoring.

RDKit

Cheminformatics

Calculates physicochemical descriptors used for ADMET filtering (LogP, TPSA), generates Canonical SMILES, and performs Substructure Matching.

SEARCH-ML

Physics Engine

Powered by SEARCH-ML, this module performs rigid-body molecular screening to predict binding affinities (ΔG).

Frontend Visualization

3Dmol.js (WebGL)

Hardware-accelerated 3D rendering of PDB proteins and SDF ligands. Supports cartoons, sticks, and surface representations.

Cytoscape.js

Graph theory engine using Force-Directed algorithms (COSE) to visualize complex Plant-Target-Disease networks.

JSME Editor

Java-free molecular sketcher. Converts user drawings into SMILES strings instantly for substructure searching.

Data Flow Lifecycle

1. Request: User inputs a structure via JSME.
2. Processing: PHP sanitizes input -> Passes SMILES to Python (RDKit).
3. Execution: Python calculates Fingerprints/ADMET -> Queries Database.
4. Response: Results are formatted as JSON -> Rendered by Tailwind CSS frontend.
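Step 2 mentions sanitizing input before it reaches RDKit. The manual attributes this step to PHP; the sketch below uses Python only for consistency with the other examples here. The function name `looks_like_smiles`, the character whitelist, and the 2000-character limit are all assumptions. It is a cheap syntactic pre-check, not a test of chemical validity.

```python
import re

# Conservative whitelist of characters that can occur in a SMILES string.
_SMILES_CHARS = re.compile(r"[A-Za-z0-9@+\-\[\]\(\)=#$%/\\.:*]+")

def looks_like_smiles(text, max_len=2000):
    """Reject empty, oversized, or oddly formed input before any parsing."""
    text = text.strip()
    if not text or len(text) > max_len:
        return False
    if not _SMILES_CHARS.fullmatch(text):
        return False
    depth = 0
    for ch in text:  # parentheses must open before they close
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and text.count("[") == text.count("]")

print(looks_like_smiles("C1=CC=C(O)C=C1"))
```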

4 Advanced Filtering

While basic search finds "what you know," Advanced Filtering helps you find "what you need." Located in the Search Dashboard, this module allows you to mine the database based on physicochemical properties. This is critical for Lead Optimization and identifying drug-like candidates that satisfy safety profiles.

Filter Interface Live Preview

Figure 4.1: The multi-parameter slider interface.

How to Use

1. Set Ranges

Drag the sliders to define min/max values. Example: Set MW between 150 and 500 Da.

2. Combine Filters

Filters are additive (AND logic). Setting "MW < 500" AND "LogP < 5" finds compounds satisfying both.

3. Export

Once you have your filtered list, click "Download CSV" or "Download SDF" to take the data offline.
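The three steps above amount to range checks joined by AND, followed by an export. A minimal sketch, assuming hypothetical records and property keys (`mw`, `logp`); the values are illustrative, not BIMP data.

```python
import csv
import io

# Hypothetical records with pre-calculated properties.
compounds = [
    {"name": "cpd_1", "mw": 320.4, "logp": 2.1},
    {"name": "cpd_2", "mw": 612.7, "logp": 3.0},
    {"name": "cpd_3", "mw": 410.0, "logp": 6.2},
]

def apply_filters(records, ranges):
    """Keep records whose every filtered property lies within [low, high]."""
    return [
        r for r in records
        if all(low <= r[prop] <= high for prop, (low, high) in ranges.items())
    ]

# Both conditions must hold (AND logic), so only cpd_1 survives.
selected = apply_filters(compounds, {"mw": (150, 500), "logp": (-5, 5)})

# Write the filtered list as CSV text, as a download would.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "mw", "logp"])
writer.writeheader()
writer.writerows(selected)
csv_text = buffer.getvalue()
print(csv_text)
```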

Available Parameters (12 Filters)

These properties are pre-calculated using RDKit for every molecule in the database.

  • Molecular Weight (0 - 1000 Da): Total mass. Lipinski Rule: ≤ 500 Da.
  • LogP (-5 to +10): Lipophilicity. Key for membrane permeability.
  • H-Bond Donors (0 - 10): Number of NH or OH groups.
  • H-Bond Acceptors (0 - 20): Count of N/O atoms available for H-bonding.
  • TPSA (0 - 200 Ų): Topological Polar Surface Area. Predicts gut absorption.
  • Rotatable Bonds (0 - 15): Flexibility. Too many reduces oral availability.
  • Ring Count (0 - 8): Total rings (aromatic + aliphatic).
  • Aromatic Rings (0 - 5): Number of benzene-like systems.
  • Heavy Atom Count (0 - 100): Total atoms excluding Hydrogen.
  • Fraction Csp3 (0.0 - 1.0): Ratio of sp3 carbons to total carbons. Indicates 3D complexity.
  • Molar Refractivity (0 - 150): Steric bulk and polarizability.
  • Formal Charge (-5 to +5): Net electrical charge.

Scientific Best Practice: Lipinski's Rule of 5

To find orally active drug candidates, set your filters to these values:

MW ≤ 500
LogP ≤ 5
H-Donors ≤ 5
H-Acceptors ≤ 10
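The four thresholds above can be checked programmatically. The sketch below is a hypothetical helper (`lipinski_violations` is not a BIMP or RDKit function) that reports which criteria fail; the input values are illustrative. Note that Lipinski's original formulation tolerates one violation, whereas setting all four filters at once is the stricter zero-violation case.

```python
def lipinski_violations(mw, logp, h_donors, h_acceptors):
    """Return the names of the Rule-of-5 criteria that a compound fails."""
    failed = []
    if mw > 500:
        failed.append("MW")
    if logp > 5:
        failed.append("LogP")
    if h_donors > 5:
        failed.append("H-Donors")
    if h_acceptors > 10:
        failed.append("H-Acceptors")
    return failed

# Illustrative property values, not taken from the database.
print(lipinski_violations(180.2, 1.2, 1, 4))    # passes every criterion
print(lipinski_violations(650.0, 6.1, 2, 12))   # fails three criteria
```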

5 Substructure Search

Substructure search is a precise mining tool that finds all compounds containing a specific chemical scaffold (the "Substructure") embedded within them. Unlike similarity search which looks for "cousins," this looks for "parents and children."

This is critical for Structure-Activity Relationship (SAR) studies, where you want to see how adding different functional groups to a core backbone affects biological activity.

Mechanism: Subgraph Isomorphism

Intuitively, this is "Pattern Matching." Mathematically, it is a Subgraph Isomorphism problem.

BIMP translates your drawn structure into a SMARTS pattern (SMILES Arbitrary Target Specification). It then scans the database to check if your query graph exists exactly within the target molecule's graph.

Logic Example

  • Query: Benzene Ring
  • Matches: Phenol, Aspirin, Tryptophan
  • Non-Match: Cyclohexane

Search Interface (JSME Editor): Users draw a core fragment (blue) to find complex derivatives.
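The benzene example above can be reproduced with a small backtracking matcher. This is a toy sketch, not how RDKit or BIMP implement substructure search: atoms carry only a label (lowercase `c` for aromatic carbon, following the SMILES convention), bond orders and hydrogens are ignored, and `make_graph` and `has_substructure` are hypothetical helper names.

```python
def make_graph(labels, bonds):
    """Build {atom_index: (label, set_of_neighbours)} from labels and bond pairs."""
    graph = {i: (label, set()) for i, label in enumerate(labels)}
    for a, b in bonds:
        graph[a][1].add(b)
        graph[b][1].add(a)
    return graph

def has_substructure(query, target):
    """Backtracking test: can every query atom and bond be mapped into target?"""
    q_atoms = list(query)

    def extend(mapping):
        if len(mapping) == len(q_atoms):
            return True  # every query atom has been placed
        q = q_atoms[len(mapping)]
        q_label, q_nbrs = query[q]
        for t, (t_label, t_nbrs) in target.items():
            if t_label != q_label or t in mapping.values():
                continue
            # Each already-mapped query neighbour must be bonded to t in the target.
            if all(mapping[n] in t_nbrs for n in q_nbrs if n in mapping):
                mapping[q] = t
                if extend(mapping):
                    return True
                del mapping[q]  # undo and try the next candidate
        return False

    return extend({})

ring_bonds = [(i, (i + 1) % 6) for i in range(6)]
benzene = make_graph(["c"] * 6, ring_bonds)
phenol = make_graph(["c"] * 6 + ["O"], ring_bonds + [(0, 6)])
cyclohexane = make_graph(["C"] * 6, ring_bonds)

print(has_substructure(benzene, phenol))
print(has_substructure(benzene, cyclohexane))
```

The target may contain extra atoms and bonds (phenol's oxygen), but every query atom and bond must be present, which is why the aliphatic cyclohexane ring does not match.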

Workflow & Application

1. Draw Scaffold

Open the tool. Draw the chemically active part of the molecule (pharmacophore), leaving "R-groups" empty.

2. Processing

The engine converts your drawing to a SMARTS string and performs an exact subgraph match against 100,000+ entries.

3. Mining

View results. You will find complex natural products that share your drawn core, revealing potential new sources for that chemical class.

Key Difference: Substructure vs. Similarity

Substructure Search

"I need molecules that contain exactly this ring system."
(Binary Result: Yes/No Match)

Similarity Search

"I need molecules that look like this structure."
(Fuzzy Result: 0% to 100% Score)

6 Similarity Search

The Similarity Search module allows you to find existing phytochemicals in the BIMP database that are structurally related to your query molecule. This is based on the Structure-Activity Relationship (SAR) principle: "Molecules with similar structures often exhibit similar biological activities."

How it Works: Molecular Fingerprints

Computers cannot "see" chemical drawings. Instead, BIMP converts every molecule into a digital barcode called a Morgan Fingerprint (ECFP4).

It looks at the circular environment around every atom (radius 2) and hashes this information into a binary string of 0s and 1s.

The Math: Tanimoto Coefficient

Score = Intersection / Union
Tc(A,B) = c / (a + b - c)

• a = Bits ON in Molecule A
• b = Bits ON in Molecule B
• c = Bits shared by BOTH (Common features)
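The formula above translates directly into set arithmetic when each fingerprint is held as the set of its ON bit positions. A minimal sketch; the bit positions are made up for illustration, and returning 0.0 for two empty fingerprints is a convention chosen here, not something the manual specifies.

```python
def tanimoto(bits_a, bits_b):
    """Tc(A,B) = c / (a + b - c) for fingerprints stored as sets of ON bits."""
    a = len(bits_a)
    b = len(bits_b)
    c = len(bits_a & bits_b)  # bits shared by both molecules
    if a + b - c == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return c / (a + b - c)

fp_a = {1, 5, 9, 12}       # a = 4
fp_b = {1, 5, 9, 20, 31}   # b = 5, shared bits c = 3
score = tanimoto(fp_a, fp_b)
print(score)  # 3 / (4 + 5 - 3) = 0.5
```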
Figure 6.1 (Interface Preview): Drawing a query structure (left) and viewing ranked results (right).

Step-by-Step Guide

1. Draw or Paste

Use the JSME editor to draw your chemical scaffold. Alternatively, paste a SMILES string (e.g., `C1=CC=C(O)C=C1`) into the text box.

2. Set Threshold

Adjust the similarity cutoff (0.0 to 1.0). A lower threshold finds diverse structures; a higher threshold finds close analogs.

3. Analyze Results

Results are ranked by Score. Click "View Profile" to see the full bioactivity details of the found hits.

Interpreting Tanimoto Scores

Score Range | Meaning | Application
1.00 (100%) | Exact Match | Checking if a specific compound exists in BIMP.
0.85 - 0.99 | High Similarity | Finding direct analogs (e.g., Methyl vs Ethyl derivatives).
0.70 - 0.84 | Structural Similarity | Scaffold Hopping: Finding different cores with similar features.
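Thresholding, ranking, and the score bands above can be combined in a few lines. This is a sketch with hypothetical names (`label_score`, `rank_hits`) and made-up scores. Scores below 0.70 are not covered by the table, so the fallback label is an assumption.

```python
def label_score(score):
    """Map a Tanimoto score onto the bands listed in the table."""
    if score >= 1.0:
        return "Exact Match"
    if score >= 0.85:
        return "High Similarity"
    if score >= 0.70:
        return "Structural Similarity"
    return "Below tabulated ranges"  # assumed label; the table stops at 0.70

def rank_hits(scores, threshold):
    """Drop hits under the cutoff, then sort the rest from highest score down."""
    kept = [(name, s) for name, s in scores.items() if s >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [(name, s, label_score(s)) for name, s in kept]

# Made-up scores for illustration.
hits = rank_hits(
    {"cpd_1": 0.91, "cpd_2": 0.42, "cpd_3": 1.0, "cpd_4": 0.73},
    threshold=0.7,
)
for hit in hits:
    print(hit)
```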
7 Virtual Screening

Powered by SEARCH-ML Algorithm

BIMP integrates the proprietary SEARCH-ML engine to perform high-throughput molecular screening that you launch directly from the browser. It allows you to screen thousands of plant-based phytochemicals against your target protein to predict binding affinity (ΔG).

Step 1: Input Target Structure

You can provide the protein structure (Receptor) in two ways:

Option A: Upload PDB

Upload a `.pdb` file from your computer. Ensure you have removed water molecules and heteroatoms beforehand.
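Removing waters and heteroatoms can be scripted before upload. The sketch below keeps only `ATOM`, `TER`, and `END` records and drops everything else, including `HETATM` lines. The function name `strip_hetero_records` is hypothetical and the sample coordinates are made up. Be aware that this blunt approach also discards any co-crystallized ligand and any modified residues stored as `HETATM`, so it may not suit every structure.

```python
def strip_hetero_records(pdb_text):
    """Keep protein ATOM records plus TER/END; drop HETATM and other records."""
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # the record name occupies columns 1-6
        if record in {"ATOM", "TER", "END"}:
            kept.append(line)
    return "\n".join(kept) + "\n"

# Made-up four-line structure: one protein atom, one water, one zinc ion.
sample = (
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N\n"
    "HETATM    2  O   HOH A 101       5.000   6.000   7.000  1.00 30.00           O\n"
    "HETATM    3 ZN    ZN A 201       1.000   2.000   3.000  1.00 25.00          ZN\n"
    "END\n"
)
cleaned = strip_hetero_records(sample)
print(cleaned)
```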

Option B: Fetch from RCSB

Enter a 4-character PDB ID (e.g., 6LU7). The system will automatically download and parse the structure.
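A PDB ID can be checked for shape before any download is attempted. This sketch assumes the classic four-character form (a digit followed by three letters or digits). `normalize_pdb_id` is a hypothetical helper and does not confirm that the entry actually exists at RCSB.

```python
import re
from typing import Optional

# Classic PDB ID shape: one digit, then three alphanumeric characters.
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")

def normalize_pdb_id(text: str) -> Optional[str]:
    """Return the upper-cased ID if it has the expected shape, else None."""
    candidate = text.strip().upper()
    if _PDB_ID.fullmatch(candidate):
        return candidate
    return None

print(normalize_pdb_id(" 6lu7 "))
```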

Step 2: Define Binding Site

The grid box (Search Space) defines where the simulation looks for interactions.

  • Recommended
    Select Reference Ligand:
    If your PDB contains a co-crystallized ligand, select it from the dropdown. The grid box will automatically snap to center around this ligand.
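Centering a grid box on a reference ligand comes down to reading the ligand's coordinates and taking the middle of their bounding box. The sketch below assumes standard fixed-column PDB records (residue name in columns 18-20, x/y/z in columns 31-54). The ligand name `LIG`, the 5 Å padding, and the helpers `ligand_box` and `_hetatm` are hypothetical; SEARCH-ML's own procedure is not documented here.

```python
def ligand_box(pdb_text, ligand_name, padding=5.0):
    """Return (center, size) of a box enclosing the named ligand plus padding."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("HETATM") and line[17:20].strip() == ligand_name:
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    if not coords:
        raise ValueError(f"no HETATM records found for ligand {ligand_name!r}")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size

def _hetatm(serial, atom, res, x, y, z):
    """Format a made-up HETATM line with PDB fixed-width columns."""
    return f"HETATM{serial:5d} {atom:<4s} {res:>3s} A{1:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"

sample = "\n".join([
    _hetatm(1, "C1", "LIG", 10.0, 20.0, 30.0),
    _hetatm(2, "C2", "LIG", 12.0, 24.0, 30.0),
    _hetatm(3, "O1", "LIG", 14.0, 22.0, 36.0),
])
center, size = ligand_box(sample, "LIG")
print(center, size)
```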

Step 3: Select Screening Library

Choose which set of molecules to dock against your target.

Specific Plant Mode

Select a single plant (e.g., Ocimum tenuiflorum). The system will screen only the phytochemicals found in that specific species.

Entire BIMP Database Mode

Screen against 100,000+ compounds. Requires Filtering.

Pre-Screening Filters (ADMET)

To save compute time, apply filters to remove non-drug-like molecules before virtual screening starts:

MW < 500, LogP < 5, H-Donors < 5

Step 4: Submit Job

Click "Start Screening". The job is sent to the SCFBio Compute Cluster. A progress bar will indicate the status of the batch processing.

Step 5: Visualize & Analyze

Once complete, results are displayed in a ranked table sorted by Binding Affinity (kcal/mol).

  • Click a column header in the results table to sort by that column.
  • Download the results as a CSV file.
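When working with the downloaded results offline, the same ranking can be reproduced by sorting on the affinity column. The sketch assumes the table reports ΔG in kcal/mol, where a more negative value means stronger predicted binding. The field names `compound` and `dg` and the values are hypothetical.

```python
# Hypothetical result rows; values are illustrative only.
results = [
    {"compound": "cpd_1", "dg": -6.2},
    {"compound": "cpd_2", "dg": -8.9},
    {"compound": "cpd_3", "dg": -4.1},
]

# Ascending sort puts the most negative (strongest predicted) binder first.
ranked = sorted(results, key=lambda row: row["dg"])
top_hit = ranked[0]["compound"]

for row in ranked:
    print(f'{row["compound"]}: {row["dg"]:.1f} kcal/mol')
```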

8 Network Pharmacology

Traditional medicine works via Polypharmacology—where multiple phytochemicals act synergistically on multiple targets to treat a disease. The Network Viewer uses Cytoscape.js to render these complex Plant-Compound-Target-Disease relationships as an interactive force-directed graph.

Graph Topology (The Legend)

The viewer dynamically generates nodes based on your search query. Here is how to decode the shapes and colors:

  • Compound Node: Green Circle
  • Target Node: Blue Triangle
  • Disease Node: Red Rectangle

Example: Input "Cough" → Generates Disease Node + Linked Compounds

How to use the Viewer

1. Search

Enter a Disease, Target, or Compound. The engine dynamically pulls validated relationships from the graph database.

2. Interact

Physics-based nodes "float" in space. Drag to rearrange clusters to your liking, or Scroll to zoom into dense pathways.

3. Details

Click any node to reveal metadata in the sidebar. Use the deep links provided to jump to the full Profile Page.

System Performance Note

To ensure smooth performance in your browser, the graph limits Disease-to-Compound connections to the top 30 most relevant hits per query. This prevents the browser from freezing when searching for broad terms like "Cancer".
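The top-30 cap can be pictured as a sort-and-slice step applied before the graph is handed to Cytoscape.js. The sketch below builds an elements structure in the `nodes`/`edges` JSON shape that Cytoscape.js accepts. The relevance scores, the `type` and `weight` data fields, and the function `build_elements` are assumptions for illustration.

```python
def build_elements(disease, scored_compounds, limit=30):
    """Keep the highest-scoring compounds and emit Cytoscape-style elements."""
    top = sorted(scored_compounds, key=lambda item: item[1], reverse=True)[:limit]
    nodes = [{"data": {"id": disease, "type": "disease"}}]
    edges = []
    for name, score in top:
        nodes.append({"data": {"id": name, "type": "compound"}})
        edges.append({"data": {"source": name, "target": disease, "weight": score}})
    return {"nodes": nodes, "edges": edges}

# Fifty made-up compounds with made-up relevance scores.
scored = [(f"cpd_{i}", i / 100) for i in range(50)]
elements = build_elements("Cough", scored)
print(len(elements["nodes"]), len(elements["edges"]))
```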

9 BOILED-Egg Model

The Brain Or IntestinaL EstimateD permeation method (BOILED-Egg) is a graphical evaluation model used to predict two key pharmacokinetic properties simultaneously:
1. Gastrointestinal Absorption (GIA)
2. Blood-Brain Barrier (BBB) Permeation

Plot axes: Lipophilicity (WLOGP) on the vertical axis, Polarity (TPSA) on the horizontal axis.

How to Read the Plot

  • Yellow Zone (The Yolk): Molecules here are predicted to cross the Blood-Brain Barrier (BBB+). Ideal for CNS drugs (e.g., antidepressants).
  • White Zone: Molecules here show high Gastrointestinal Absorption (GIA+) but do not cross into the brain. Ideal for systemic drugs.
  • Grey Zone: Low absorption. These molecules are likely to be excreted without effect or require injection.

10 Scientific Glossary

Ligand

A substance (usually a small molecule) that forms a complex with a biomolecule to serve a biological purpose.

Receptor

A protein molecule that receives chemical signals from outside a cell.

SMILES

Simplified Molecular Input Line Entry System. A text notation for chemical structures.

In Silico

Performed on computer or via computer simulation.

Docking

A method which predicts the preferred orientation of one molecule to a second when bound to each other.

ADMET

Absorption, Distribution, Metabolism, Excretion, and Toxicity.

Tanimoto

A coefficient used to measure the similarity of two sets (chemical fingerprints).

Lipinski's Rule

A rule of thumb to evaluate drug-likeness based on molecular properties.

11 Troubleshooting & FAQ

Virtual Screening (SEARCH-ML)

Error: "Job Failed" or Immediate Crash

This is almost always caused by a "Dirty" PDB file. The engine cannot process files with:

  • Water Molecules: HETATM records with residue name HOH.
  • Heteroatoms: Non-protein ligands or ions (e.g., SO4, ZN).
  • Missing Atoms: Broken chains in the crystal structure.
Fix: Open your PDB in PyMOL -> Action -> remove waters -> File -> Export Molecule.

Simulation hangs at "Processing..."

If the progress bar does not move for > 2 minutes, check:

  1. File Size: Ensure PDB is < 5MB. Large viral complexes take too long.
  2. Grid Box Size: If the search space (X*Y*Z) is too large (> 30,000 points), calculation time increases steeply. Try narrowing the box to just the active site.

Visualization & Graphics

3D Molecule Viewer is Black / Blank

The viewer relies on WebGL technology. If the canvas is black:

  • Chrome: Go to Settings > System > Toggle "Use graphics acceleration when available" to ON.
  • Drivers: Ensure your GPU drivers are up to date.
  • Mobile: Some older phones block high-end WebGL rendering to save battery.

Network Graph nodes keep moving ("Wiggling")

This is normal! The Network Viewer uses a Force-Directed Layout (Physics Simulation). Nodes repel each other like magnets to find the optimal arrangement.

Tip: Once they settle, you can click and drag a node to "pin" it in place manually.

Search & Data

Search returns "0 Results"
  • Try Partial Terms: Instead of searching "Ocimum tenuiflorum", try just "Ocimum". Database spellings may vary slightly.
  • Check Synonyms: A compound might be listed under its common name rather than its chemical name (e.g., try "Curcumin" vs "Diferuloylmethane").
  • Clear Filters: Ensure you haven't left an Advanced Filter (e.g., MW < 100) active from a previous session.

Download button doesn't work

Downloads are generated dynamically as `.csv` or `.sdf` files. Some browsers block these as "Pop-ups".

Fix: Look for a "Pop-up blocked" icon in your URL bar and select "Always allow for this site".

Using BIMP in your research?

Please cite our publication to support database maintenance.

Chaurasia, D. K., Anjum, R., Sharma, A., Mishra, M., Shekhar, S., Patel, A. K., Mittal, A., & Jayaram, B. (2026). BIMP: Unveiling the phytochemical richness of Indian medicinal plants as potential therapeutic agents. In A. K. Saxena & A. Saxena (Eds.), Global trends in health, technology and management II (pp. 283–298). Springer Nature Switzerland.

https://doi.org/10.1007/978-3-032-12320-6_16