1 Introduction to BIMP
The Bioactivity of Indian Medicinal Plants (BIMP) database is a centralized, high-performance cheminformatics platform developed by SCFBio at IIT Delhi. It serves as a digital bridge between the ancient wisdom of traditional Indian medicine (Ayurveda, Siddha, Unani) and the rigorous precision of modern computational drug discovery.
Unlike traditional static libraries, BIMP is a dynamic ecosystem. It hosts manually curated data on over 100,000 phytochemicals derived from 6,000+ plant species, all mapped to their specific biological targets and disease pathways.
Cheminformatics Core
BIMP treats every phytochemical as a computational object. Using RDKit and Open Babel as backends, we perform real-time standardization and property calculation.
- Canonical SMILES: Ensuring unique representation for every molecule.
- 3D Generation: Automatic conversion from 2D flat structures to energy-minimized 3D conformers (.SDF).
- Lipinski Filtering: Automatic tagging of drug-like vs. non-drug-like compounds.
- FP Generation: Pre-calculation of Morgan Fingerprints for similarity searching.
Botanical Taxonomy (APG IV)
To solve the confusion of vernacular names, BIMP adheres to the Angiosperm Phylogeny Group (APG IV) classification system.
Kingdom: Plantae → Order: Lamiales → Family: Lamiaceae → Genus: Ocimum → Species: Ocimum tenuiflorum (Tulsi)
* We explicitly map common names (e.g., "Neem", "Ginger") to their validated scientific binomials to assist non-botanist users.
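The name mapping described above can be pictured as a simple lookup. The sketch below is only an illustration: the table `COMMON_TO_BINOMIAL` and the function `resolve_name` are hypothetical names, and BIMP's actual mapping is stored in its database rather than in code.

```python
from typing import Optional

# Hypothetical lookup table with three illustrative entries.
COMMON_TO_BINOMIAL = {
    "neem": "Azadirachta indica",
    "ginger": "Zingiber officinale",
    "tulsi": "Ocimum tenuiflorum",
}

def resolve_name(query: str) -> Optional[str]:
    """Return the scientific binomial for a common name, or None if unknown."""
    key = query.strip().lower()  # ignore case and surrounding whitespace
    return COMMON_TO_BINOMIAL.get(key)
```

Normalizing case and whitespace before the lookup means "Neem", "neem", and " NEEM " all resolve to the same species.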
Targets & Disease Mapping
Chemistry is useless without biology. BIMP creates a "Tri-Partite" knowledge graph connecting:
We use UniProt accession IDs to normalize protein targets, PDB IDs for 3D structures, and ICD-10 codes for disease classification.
Integrated Computational Suite
BIMP moves beyond being a static repository by offering active research tools directly in the browser:
2 System Architecture
BIMP is built on a hybrid architecture. It combines the data integrity of a relational database (MySQL) with the speed of flat-file indices (NDJSON) to answer complex cheminformatics queries with millisecond-scale latency.
Backend & Data Layer
MySQL 8.0
Storage: Relational backbone storing complex entity maps (Plant ↔ Compound ↔ Disease). Fully ACID compliant for data integrity.
NDJSON Streams
Indexing: Pre-compiled "Newline Delimited JSON" indices allow the Wild Search engine to bypass SQL overhead, scanning 100k+ records in < 20 ms.
PHP 8.2
Logic: Handles API routing and session security, and acts as the process manager that spawns asynchronous Python compute jobs.
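To illustrate why an NDJSON index is cheap to scan, here is a minimal sketch that reads one JSON record per line and keeps those whose name contains the search term. The field names (`type`, `name`) and the sample records are assumptions made for this example, not BIMP's actual index schema, and the production engine is driven from PHP rather than from this Python function.

```python
import io
import json

def scan_ndjson(stream, term):
    """Yield records whose 'name' field contains term (case-insensitive)."""
    needle = term.lower()
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)  # each line is one complete JSON object
        if needle in record.get("name", "").lower():
            yield record

# Hypothetical three-record index held in memory for the demonstration.
SAMPLE = io.StringIO(
    '{"type": "plant", "name": "Ocimum tenuiflorum"}\n'
    '{"type": "compound", "name": "Eugenol"}\n'
    '{"type": "plant", "name": "Ocimum basilicum"}\n'
)
hits = list(scan_ndjson(SAMPLE, "ocimum"))
```

Because every line parses independently, the file can be streamed without loading it whole and without any SQL joins.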
Compute Engine
Python 3.12
Workers: The brain of the operation. Executes background scripts for file parsing, grid generation, and simulation monitoring.
RDKit
Cheminformatics: Calculates physicochemical descriptors used in ADMET assessment (LogP, TPSA), generates Canonical SMILES, and performs Substructure Matching.
SEARCH-ML
Physics Engine: Powered by SEARCH-ML, this module performs rigid-body molecular screening to predict binding affinities (ΔG).
Frontend Visualization
3Dmol.js (WebGL)
Hardware-accelerated 3D rendering of PDB proteins and SDF ligands. Supports cartoons, sticks, and surface representations.
Cytoscape.js
Graph theory engine using Force-Directed algorithms (COSE) to visualize complex Plant-Target-Disease networks.
JSME Editor
Java-free molecular sketcher. Converts user drawings into SMILES strings instantly for substructure searching.
Data Flow Lifecycle
1. Request: User inputs a structure via JSME.
2. Processing: PHP sanitizes input -> Passes SMILES to Python (RDKit).
3. Execution: Python calculates Fingerprints/ADMET -> Queries Database.
4. Response: Results are formatted as JSON -> Rendered by Tailwind CSS frontend.
3 Wild Search (Global Command Palette)
The Wild Search is the central nervous system of BIMP. Unlike specific database queries that look at one table at a time, Wild Search uses pre-compiled **NDJSON Indices** to scan the entire ecosystem instantly. Think of it as the "Google" for your specific research environment.
You don't need to click the header. Access the search bar from anywhere on the site instantly.
What can you find?
The engine performs a "Fuzzy Search" across 5 distinct layers of data simultaneously:
Figure 3.1: The Command Palette showing categorized results for query "Cough".
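A fuzzy search tolerates typos by scoring how close a query is to each indexed name instead of requiring an exact match. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in scoring method; the layer names, the sample entries, and the 0.6 cutoff are assumptions for illustration, and the matching algorithm Wild Search actually uses is not specified here.

```python
from difflib import SequenceMatcher

# Hypothetical index: three illustrative layers with two entries each.
INDEX = {
    "plants": ["Ocimum tenuiflorum", "Zingiber officinale"],
    "compounds": ["Eugenol", "Curcumin"],
    "diseases": ["Cough", "Asthma"],
}

def fuzzy_search(query, index, cutoff=0.6):
    """Return {layer: [names]} for names whose similarity to query >= cutoff."""
    q = query.lower()
    results = {}
    for layer, names in index.items():
        scored = [(SequenceMatcher(None, q, n.lower()).ratio(), n) for n in names]
        # Best matches first; drop anything below the cutoff.
        matches = [n for score, n in sorted(scored, reverse=True) if score >= cutoff]
        if matches:
            results[layer] = matches
    return results

results = fuzzy_search("cogh", INDEX)  # misspelled "cough"
```

Grouping the hits by layer mirrors the categorized layout of the command palette, where each result appears under its own data type.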
4 Advanced Filtering
While basic search finds "what you know," Advanced Filtering helps you find "what you need." Located in the Search Dashboard, this module allows you to mine the database based on physicochemical properties. This is critical for Lead Optimization and identifying drug-like candidates that satisfy safety profiles.
Figure 4.1: The multi-parameter slider interface.
How to Use
Set Ranges
Drag the sliders to define min/max values. Example: Set MW between 150 and 500 Da.
Combine Filters
Filters are additive (AND logic). Setting "MW < 500" AND "LogP < 5" finds compounds satisfying both.
Export
Once you have your filtered list, click "Download CSV" or "Download SDF" to take the data offline.
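The additive (AND) behaviour of the filters, followed by a CSV export, can be sketched as follows. The records, property keys (`mw`, `logp`), and function names are hypothetical and exist only to show the logic; they do not reflect BIMP's internal schema.

```python
import csv
import io

def apply_filters(records, ranges):
    """Keep records whose every filtered property lies within [lo, hi]."""
    return [
        r for r in records
        if all(lo <= r[prop] <= hi for prop, (lo, hi) in ranges.items())
    ]

def to_csv(records, fields=("name", "mw", "logp")):
    """Serialize the selected records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

# Made-up records for illustration only.
RECORDS = [
    {"name": "cmpd_A", "mw": 164.2, "logp": 2.3},
    {"name": "cmpd_B", "mw": 620.5, "logp": 3.1},  # fails the MW range
    {"name": "cmpd_C", "mw": 310.4, "logp": 6.8},  # fails the LogP range
]
selected = apply_filters(RECORDS, {"mw": (150, 500), "logp": (-2, 5)})
csv_text = to_csv(selected)
```

A compound is kept only if it passes every active range, so each additional filter can only shrink the result list.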
Available Parameters (12 Filters)
These properties are pre-calculated using RDKit for every molecule in the database.
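| Parameter | Description |
|---|---|
| Molecular Weight (MW) | Total mass. Lipinski Rule: < 500 Da. |
| LogP | Lipophilicity. Key for membrane permeability. |
| H-Bond Donors (HBD) | Number of NH or OH groups. |
| H-Bond Acceptors (HBA) | Count of N/O atoms available for H-bonding. |
| TPSA | Polar Surface Area. Predicts gut absorption. |
| Rotatable Bonds | Flexibility. Too many reduces oral availability. |
| Ring Count | Total rings (aromatic + aliphatic). |
| Aromatic Rings | Number of benzene-like systems. |
| Heavy Atoms | Total atoms excluding Hydrogen. |
| Fraction sp3 (Fsp3) | Ratio of sp3 carbons. Indicates 3D complexity. |
| Molar Refractivity | Steric bulk and polarizability. |
| Formal Charge | Net electrical charge. |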
Scientific Best Practice: Lipinski's Rule of 5
To find orally active drug candidates, set your filters to these values: MW ≤ 500 Da, LogP ≤ 5, H-bond donors ≤ 5, and H-bond acceptors ≤ 10.
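As a rough illustration, Lipinski's Rule of 5 can be expressed as four threshold checks on pre-calculated descriptors. The `Descriptors` class and the function names are hypothetical, and the strict default of zero allowed violations is an assumption; how BIMP itself tags borderline compounds is not stated here.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    mw: float    # molecular weight in Da
    logp: float  # octanol-water partition coefficient
    hbd: int     # hydrogen-bond donors
    hba: int     # hydrogen-bond acceptors

def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule-of-5 limits are exceeded."""
    checks = [d.mw > 500, d.logp > 5, d.hbd > 5, d.hba > 10]
    return sum(checks)

def is_drug_like(d: Descriptors, max_violations: int = 0) -> bool:
    """Assumed tagging rule: drug-like if violations <= max_violations."""
    return lipinski_violations(d) <= max_violations

# Made-up descriptor values for illustration only.
small = Descriptors(mw=164.2, logp=2.3, hbd=1, hba=2)
bulky = Descriptors(mw=822.9, logp=6.1, hbd=8, hba=16)
```

Passing `max_violations=1` relaxes the check for workflows that tolerate a single violation.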
5 Substructure Search
Substructure search is a precise mining tool that finds all compounds containing a specific chemical scaffold (the "Substructure") embedded within them. Unlike similarity search which looks for "cousins," this looks for "parents and children."
This is critical for Structure-Activity Relationship (SAR) studies, where you want to see how adding different functional groups to a core backbone affects biological activity.
Mechanism: Subgraph Isomorphism
Conceptually, this is "Pattern Matching." Mathematically, it is a Subgraph Isomorphism problem.
BIMP translates your drawn structure into a SMARTS pattern (SMILES Arbitrary Target Specification). It then scans the database to check if your query graph exists exactly within the target molecule's graph.
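The idea of checking whether a query graph sits inside a target graph can be sketched with a small backtracking matcher. This is a simplified illustration, not the SMARTS engine BIMP relies on: atoms are plain element symbols, bonds are integer orders, hydrogens are omitted, and aromaticity, charges, and stereochemistry are ignored. The function name and graph encoding are assumptions made for this example.

```python
def has_substructure(q_atoms, q_bonds, t_atoms, t_bonds):
    """Return True if the query graph maps onto part of the target graph.

    atoms: list of element symbols; bonds: {(i, j): bond_order}.
    """
    def bond(bonds, i, j):
        # Bonds are undirected, so look up both key orders.
        return bonds.get((i, j)) or bonds.get((j, i))

    def extend(mapping):
        k = len(mapping)  # index of the next query atom to place
        if k == len(q_atoms):
            return True   # every query atom has a target partner
        for cand in range(len(t_atoms)):
            if cand in mapping or t_atoms[cand] != q_atoms[k]:
                continue
            # Each query bond to an already-placed atom must exist in the
            # target with the same bond order.
            ok = all(
                bond(q_bonds, k, prev) is None
                or bond(q_bonds, k, prev) == bond(t_bonds, cand, mapping[prev])
                for prev in range(k)
            )
            if ok and extend(mapping + [cand]):
                return True
        return False  # no candidate worked; backtrack

    return extend([])

# Query: carboxylic acid fragment C(=O)O
acid_atoms, acid_bonds = ["C", "O", "O"], {(0, 1): 2, (0, 2): 1}
# Targets: acetic acid CC(=O)O and ethanol CCO
acetic = (["C", "C", "O", "O"], {(0, 1): 1, (1, 2): 2, (1, 3): 1})
ethanol = (["C", "C", "O"], {(0, 1): 1, (1, 2): 1})

in_acetic = has_substructure(acid_atoms, acid_bonds, *acetic)
in_ethanol = has_substructure(acid_atoms, acid_bonds, *ethanol)
```

The target may carry extra atoms and bonds beyond the query, which is why a small scaffold can match a much larger natural product.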
Logic Example
Aspirin
Tryptophan
Workflow & Application
1. Draw Scaffold
Open the tool. Draw the chemically active part of the molecule (pharmacophore), leaving "R-groups" empty.
2. Processing
The engine converts your drawing to a SMARTS string and performs an exact subgraph match against 100,000+ entries.
3. Mining
View results. You will find complex natural products that share your drawn core, revealing potential new sources for that chemical class.
Key Difference: Substructure vs. Similarity
Substructure Search: "I need molecules that contain exactly this ring system."
(Binary Result: Yes/No Match)
Similarity Search: "I need molecules that look like this structure."
(Fuzzy Result: 0% to 100% Score)
6 Similarity Search
The Similarity Search module allows you to find existing phytochemicals in the BIMP database that are structurally related to your query molecule. This is based on the **Structure-Activity Relationship (SAR)** principle: "Molecules with similar structures often exhibit similar biological activities."
How it Works: Molecular Fingerprints
Computers cannot "see" chemical drawings. Instead, BIMP converts every molecule into a digital barcode called a Morgan Fingerprint (ECFP4).
It looks at the circular environment around every atom (radius 2) and hashes this information into a binary string of 0s and 1s.
The Math: Tanimoto Coefficient
$$T(A, B) = \frac{c}{a + b - c}$$
where:
• a = Bits ON in Molecule A
• b = Bits ON in Molecule B
• c = Bits shared by BOTH (Common features)
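Using the definitions of a, b, and c above, the Tanimoto coefficient is c / (a + b - c). The sketch below computes it on fingerprints represented as sets of ON-bit positions. The bit positions are made up for illustration, and returning 0.0 when both fingerprints are empty is an assumed convention.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient for two fingerprints given as sets of ON bits."""
    a = len(bits_a)           # bits ON in molecule A
    b = len(bits_b)           # bits ON in molecule B
    c = len(bits_a & bits_b)  # bits ON in both
    denom = a + b - c         # size of the union
    return c / denom if denom else 0.0

# Made-up ON-bit positions: a = 4, b = 5, c = 3, so T = 3 / (4 + 5 - 3) = 0.5
fp_a = {1, 5, 9, 12}
fp_b = {1, 5, 9, 20, 31}
score = tanimoto(fp_a, fp_b)
```

Identical fingerprints give 1.0 and fingerprints with no shared bits give 0.0, which matches the 0.0 to 1.0 threshold range used in the search form.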
Figure 6.1: Drawing a query structure (Left) and viewing ranked results (Right).
Step-by-Step Guide
Draw or Paste
Use the JSME editor to draw your chemical scaffold. Alternatively, paste a SMILES string (e.g., `C1=CC=C(O)C=C1`) into the text box.
Set Threshold
Adjust the similarity cutoff (0.0 to 1.0). A lower threshold finds diverse structures; a higher threshold finds close analogs.
Analyze Results
Results are ranked by Score. Click "View Profile" to see the full bioactivity details of the found hits.
Interpreting Tanimoto Scores
| Score Range | Meaning | Application |
|---|---|---|
| 1.00 (100%) | Exact Match | Checking if a specific compound exists in BIMP. |
| 0.85 - 0.99 | High Similarity | Finding direct analogs (e.g., Methyl vs Ethyl derivatives). |
| 0.70 - 0.84 | Structural Similarity | Scaffold Hopping: Finding different cores with similar features. |
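The thresholding, ranking, and interpretation steps can be combined in a few lines. In this sketch the function names and sample scores are hypothetical; scores that fall between the printed ranges (for example 0.995) are assigned to the lower band as an assumption, and scores below 0.70 get no label because the table does not define one.

```python
def interpret_score(score):
    """Map a Tanimoto score to the label used in the table above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("Tanimoto scores lie between 0.0 and 1.0")
    if score == 1.0:
        return "Exact Match"
    if score >= 0.85:
        return "High Similarity"
    if score >= 0.70:
        return "Structural Similarity"
    return None  # below the ranges listed in the table

def rank_hits(scores, threshold):
    """Keep hits at or above threshold and sort them best-first."""
    kept = [(name, s) for name, s in scores.items() if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [(name, s, interpret_score(s)) for name, s in kept]

# Made-up scores for four hypothetical hits.
hits = rank_hits(
    {"hit_1": 0.91, "hit_2": 1.0, "hit_3": 0.42, "hit_4": 0.73},
    threshold=0.7,
)
```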
7 Virtual Screening
Powered by SEARCH-ML Algorithm
BIMP integrates the proprietary SEARCH-ML engine to perform high-throughput molecular screening directly in the browser. It allows you to screen thousands of plant-based phytochemicals against your target protein to predict binding affinity (ΔG).
Step 1: Input Target Structure
You can provide the protein structure (Receptor) in two ways:
- Upload a `.pdb` file from your computer. Ensure you have removed water molecules and heteroatoms beforehand.
- Enter a 4-character PDB ID (e.g., 6LU7). The system will automatically download and parse the structure.
Step 2: Define Binding Site
The grid box (Search Space) defines where the simulation looks for interactions.
- Select Reference Ligand (Recommended): If your PDB contains a co-crystallized ligand, select it from the dropdown. The grid box will automatically center on this ligand.
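Centering a grid box on a reference ligand amounts to reading the ligand's atom coordinates and building a box around them. The sketch below parses `HETATM` records using the fixed column positions of the PDB format. The function names, the 5 Å padding, the choice of the bounding-box midpoint as the center, and the ligand name `LIG` are all assumptions for illustration; the placement rule used by SEARCH-ML is not documented here.

```python
def ligand_box(pdb_text, ligand_resname, padding=5.0):
    """Return (center, size) of a box enclosing the named ligand plus padding."""
    coords = []
    for line in pdb_text.splitlines():
        # Residue name is in columns 18-20; x, y, z are in columns 31-54.
        if line.startswith("HETATM") and line[17:20].strip() == ligand_resname:
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    if not coords:
        raise ValueError(f"no HETATM records for ligand {ligand_resname!r}")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size

def hetatm(serial, name, resname, x, y, z):
    """Build a column-aligned HETATM line for the demonstration."""
    return (f"HETATM{serial:5d} {name:<4} {resname:>3} A{1:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

# Made-up coordinates: three ligand atoms plus one water that must be ignored.
PDB_TEXT = "\n".join([
    hetatm(1, "C1", "LIG", 10.0, 20.0, 30.0),
    hetatm(2, "C2", "LIG", 12.0, 22.0, 34.0),
    hetatm(3, "O1", "LIG", 14.0, 20.0, 32.0),
    hetatm(4, "O", "HOH", 0.0, 0.0, 0.0),
])
center, size = ligand_box(PDB_TEXT, "LIG")
```

Keeping the box tight around the ligand also keeps the number of grid points, and therefore the run time, under control.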
Step 3: Select Screening Library
Choose which set of molecules to dock against your target.
Specific Plant Mode
Select a single plant (e.g., Ocimum tenuiflorum). The system will screen only the phytochemicals found in that specific species.
Entire BIMP Database Mode
Screen against 100,000+ compounds. Requires Filtering.
To save compute time, apply filters to remove non-drug-like molecules before virtual screening starts:
Step 4: Submit Job
Click "Start Screening". The job is sent to the SCFBio Compute Cluster. A progress bar will indicate the status of the batch processing.
Step 5: Visualize & Analyze
Once complete, results are displayed in a ranked table sorted by Binding Affinity (kcal/mol).
- Click a column header in the results table to sort by that column.
- Download the results as a CSV file.
8 Network Pharmacology
Traditional medicine works via Polypharmacology—where multiple phytochemicals act synergistically on multiple targets to treat a disease. The Network Viewer uses Cytoscape.js to render these complex Plant-Compound-Target-Disease relationships as an interactive force-directed graph.
Graph Topology (The Legend)
The viewer dynamically generates nodes based on your search query. Here is how to decode the shapes and colors:
Compound Node
Target Node
Disease Node
How to use the Viewer
1. Search
Enter a Disease, Target, or Compound. The engine dynamically pulls validated relationships from the graph database.
2. Interact
Physics-based nodes "float" in space. Drag to rearrange clusters to your liking, or Scroll to zoom into dense pathways.
3. Details
Click any node to reveal metadata in the sidebar. Use the deep links provided to jump to the full Profile Page.
To ensure smooth performance in your browser, the graph limits Disease-to-Compound connections to the top 30 most relevant hits per query. This prevents the browser from freezing when searching for broad terms like "Cancer".
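The capping behaviour can be sketched as a top-N selection over scored links. The `score` field and the function name are hypothetical, since the way BIMP ranks relevance is not described here; only the limit of 30 comes from the text above.

```python
import heapq

MAX_LINKS = 30  # per-query cap quoted in the text above

def top_links(links, limit=MAX_LINKS):
    """Return the highest-scoring links, best first, capped at limit."""
    return heapq.nlargest(limit, links, key=lambda link: link["score"])

# 100 made-up Disease-to-Compound links with scores 0..99.
links = [{"compound": f"cmpd_{i}", "score": i} for i in range(100)]
shown = top_links(links)
```

`heapq.nlargest` avoids sorting the full list, which helps when a broad term returns many candidate links.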
9 BOILED-Egg Model
The Brain Or IntestinaL EstimateD permeation method (BOILED-Egg) is a graphical evaluation model used to predict two key pharmacokinetic properties simultaneously:
1. Gastrointestinal Absorption (GIA)
2. Blood-Brain Barrier (BBB) Permeation
How to Read the Plot
Yellow Zone (The Yolk)
Molecules here are predicted to cross the Blood-Brain Barrier (BBB+). Ideal for CNS drugs (e.g., antidepressants).
White Zone
Molecules here show high Gastrointestinal Absorption (GIA+) but do not cross into the brain. Ideal for systemic drugs.
Grey Zone
Low absorption. These molecules are likely to be excreted without effect or require injection.
10 Scientific Glossary
Ligand
A substance (usually a small molecule) that forms a complex with a biomolecule to serve a biological purpose.
Receptor
A protein molecule that receives chemical signals from outside a cell.
SMILES
Simplified Molecular Input Line Entry System. A text notation for chemical structures.
In Silico
Performed on a computer or via computer simulation.
Docking
A method which predicts the preferred orientation of one molecule to a second when bound to each other.
ADMET
Absorption, Distribution, Metabolism, Excretion, and Toxicity.
Tanimoto
A coefficient used to measure the similarity of two sets (chemical fingerprints).
Lipinski's Rule
A rule of thumb to evaluate drug-likeness based on molecular properties.
11 Troubleshooting & FAQ
Virtual Screening (SEARCH-ML)
Error: "Job Failed" or Immediate Crash
This is almost always caused by a "Dirty" PDB file. The engine cannot process files with:
- Water Molecules: Records with the residue name `HOH`.
- Heteroatoms: Non-protein ligands or ions (e.g., `SO4`, `ZN`).
- Missing Atoms: Broken chains in the crystal structure.
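A PDB file can be cleaned before upload with a short script. The sketch below keeps only `ATOM`, `TER`, and `END` records, which drops waters, ions, and ligands stored as `HETATM`. It is a minimal illustration with made-up coordinates: the function name is hypothetical, modified residues stored as `HETATM` would also be removed, and missing atoms are not repaired.

```python
def clean_pdb(pdb_text):
    """Return PDB text with waters and heteroatoms removed."""
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # record name occupies columns 1-6
        if record == "ATOM" and line[17:20] == "HOH":
            continue               # water occasionally written as ATOM
        if record in ("ATOM", "TER", "END"):
            kept.append(line)      # HETATM and other records are dropped
    return "\n".join(kept) + "\n"

# Made-up four-atom example: two protein atoms, one water, one zinc ion.
RAW = "\n".join([
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N",
    "ATOM      2  CA  MET A   1      12.560  13.300   2.250  1.00 20.00           C",
    "HETATM    3  O   HOH A 101       5.000   5.000   5.000  1.00 30.00           O",
    "HETATM    4 ZN    ZN A 201       8.000   8.000   8.000  1.00 25.00          ZN",
    "TER",
    "END",
])
cleaned = clean_pdb(RAW)
```

Save the returned text as a new `.pdb` file and upload that file instead of the original.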
Simulation hangs at "Processing..."
If the progress bar does not move for > 2 minutes, check:
- File Size: Ensure PDB is < 5MB. Large viral complexes take too long.
- Grid Box Size: If the search space (X*Y*Z) is too large (> 30,000 points), calculation time rises steeply, because the point count grows with the product of the three box dimensions. Try narrowing the box to just the active site.
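To see how quickly the search space grows, the point count can be checked before submitting a job. This sketch assumes X, Y, and Z are the number of grid points along each axis; the function names are hypothetical, and only the 30,000-point limit comes from the text above.

```python
POINT_LIMIT = 30_000  # threshold quoted in the text above

def grid_points(nx, ny, nz):
    """Total points in a box with nx, ny, nz points per axis (assumed meaning)."""
    return nx * ny * nz

def is_too_large(nx, ny, nz, limit=POINT_LIMIT):
    """True if the box exceeds the point limit."""
    return grid_points(nx, ny, nz) > limit

small = grid_points(30, 30, 30)  # 27,000 points, within the limit
large = grid_points(60, 60, 60)  # 216,000 points, eight times as many
```

Doubling every edge multiplies the point count by eight, so trimming even one dimension of the box makes a noticeable difference.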
Visualization & Graphics
3D Molecule Viewer is Black / Blank
The viewer relies on WebGL technology. If the canvas is black:
- Chrome: Go to Settings > System > Toggle "Use graphics acceleration when available" to ON.
- Drivers: Ensure your GPU drivers are up to date.
- Mobile: Some older phones block high-end WebGL rendering to save battery.
Network Graph nodes keep moving ("Wiggling")
This is normal! The Network Viewer uses a Force-Directed Layout (Physics Simulation). Nodes repel each other like magnets to find the optimal arrangement.
Tip: Once they settle, you can click and drag a node to "pin" it in place manually.
Search & Data
Search returns "0 Results"
- Try Partial Terms: Instead of searching "Ocimum tenuiflorum", try just "Ocimum". Database spellings may vary slightly.
- Check Synonyms: A compound might be listed under its common name rather than its chemical name (e.g., try "Curcumin" vs "Diferuloylmethane").
- Clear Filters: Ensure you haven't left an Advanced Filter (e.g., MW < 100) active from a previous session.
Download button doesn't work
Downloads are generated dynamically as `.csv` or `.sdf` files. Some browsers block these as "Pop-ups".
Fix: Look for a "Pop-up blocked" icon in your URL bar and select "Always allow for this site".
Using BIMP in your research?
Please cite our publication to support database maintenance.
Chaurasia, D. K., Anjum, R., Sharma, A., Mishra, M., Shekhar, S., Patel, A. K., Mittal, A., & Jayaram, B. (2026). BIMP: Unveiling the phytochemical richness of Indian medicinal plants as potential therapeutic agents. In A. K. Saxena & A. Saxena (Eds.), Global trends in health, technology and management II (pp. 283–298). Springer Nature Switzerland.
https://doi.org/10.1007/978-3-032-12320-6_16