Cursor examples +/- MCP
This document contains Markdown content generated by Cursor when either intentionally using or intentionally ignoring the Napistu Model Context Protocol (MCP) server.
Cursor without MCP
Can you help me extend the understanding_sbml_dfs.ipynb tutorial to flesh out the “from_edgelist” workflow and to include any recent updates to the core data structure? THINK DEEPLY WITHOUT USING THE NAPISTU MCP. Since you’ll have trouble directly editing the ipynb, please suggest what I should incorporate in a separate markdown file. Edit for readability and to prioritize high value content. Limit the total content to less than 30 new sentences.
From Edgelist Workflow
The from_edgelist method creates SBML_dfs objects from interaction data without requiring SBML files. This workflow integrates data from protein-protein interaction databases, regulatory networks, and custom pathway definitions.
Required Edgelist Structure
The interaction edgelist must contain these required columns:
- upstream_name: Name of the upstream molecular species (must match s_name in species_df)
- downstream_name: Name of the downstream molecular species (must match s_name in species_df)
- r_name: Name for the interaction/reaction
- r_Identifiers: Supporting identifiers as an Identifiers object
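The required-column list above can be checked before calling from_edgelist. The following is a minimal sketch using plain pandas only. The constant REQUIRED_EDGELIST_COLUMNS and the helper missing_required_columns are hypothetical names introduced for illustration, not part of Napistu.

```python
import pandas as pd

# Required columns as listed above. This constant is illustrative,
# not a name exported by Napistu.
REQUIRED_EDGELIST_COLUMNS = [
    "upstream_name",
    "downstream_name",
    "r_name",
    "r_Identifiers",
]


def missing_required_columns(edgelist: pd.DataFrame) -> list:
    """Return the required columns absent from the edgelist, in listed order."""
    return [col for col in REQUIRED_EDGELIST_COLUMNS if col not in edgelist.columns]


# An edgelist that forgot the r_Identifiers column
edgelist = pd.DataFrame({
    "upstream_name": ["TP53"],
    "downstream_name": ["MDM2"],
    "r_name": ["TP53_binds_MDM2"],
})

missing = missing_required_columns(edgelist)
print(missing)
```

Running a check like this first gives a short, specific error message instead of a failure deeper inside model construction.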
Optional Columns with Defaults
These columns are optional and get sensible defaults if missing:
- upstream_compartment: Compartment for upstream species (default: “cellular component”)
- downstream_compartment: Compartment for downstream species (default: “cellular component”)
- upstream_sbo_term_name: SBO term defining upstream role (default: “modifier”)
- downstream_sbo_term_name: SBO term defining downstream role (default: “modified”)
- upstream_stoichiometry: Stoichiometry of upstream species (default: 0)
- downstream_stoichiometry: Stoichiometry of downstream species (default: 0)
- r_isreversible: Whether reaction is reversible (default: False)
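To make the defaults above concrete, here is a rough pandas-only sketch of what filling them in could look like. It does not reproduce how Napistu applies defaults internally. EDGELIST_DEFAULTS and with_defaults are hypothetical names, and "inhibitor" is only an illustrative value for a user-supplied column.

```python
import pandas as pd

# Defaults copied from the list above. The dict name is hypothetical.
EDGELIST_DEFAULTS = {
    "upstream_compartment": "cellular component",
    "downstream_compartment": "cellular component",
    "upstream_sbo_term_name": "modifier",
    "downstream_sbo_term_name": "modified",
    "upstream_stoichiometry": 0,
    "downstream_stoichiometry": 0,
    "r_isreversible": False,
}


def with_defaults(edgelist: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which each missing optional column holds its default."""
    filled = edgelist.copy()
    for column, default in EDGELIST_DEFAULTS.items():
        if column not in filled.columns:
            filled[column] = default
    return filled


# Only one optional column is supplied; the rest should receive defaults
edgelist = pd.DataFrame({
    "upstream_name": ["TP53"],
    "downstream_name": ["MDM2"],
    "r_name": ["TP53_binds_MDM2"],
    "upstream_sbo_term_name": ["inhibitor"],
})

filled = with_defaults(edgelist)
print(filled.iloc[0])
```

Columns the user supplies are left untouched, so explicit values always take precedence over defaults in this sketch.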
Basic Example
import pandas as pd
from napistu import sbml_dfs_core, identifiers, source

# Define species and compartments
species_df = pd.DataFrame({
    's_name': ['TP53', 'MDM2'],
    's_Identifiers': [
        identifiers.Identifiers([{'ontology': 'uniprot', 'identifier': 'P04637'}]),
        identifiers.Identifiers([{'ontology': 'uniprot', 'identifier': 'Q00987'}])
    ]
})
compartments_df = pd.DataFrame({
    'c_name': ['nucleus'],
    'c_Identifiers': [identifiers.Identifiers([{'ontology': 'go', 'identifier': 'GO:0005634'}])]
})

# Create interaction edgelist
interaction_edgelist = pd.DataFrame({
    'upstream_name': ['TP53'],
    'downstream_name': ['MDM2'],
    'r_name': ['TP53_binds_MDM2'],
    'r_Identifiers': [identifiers.Identifiers([{'ontology': 'pubmed', 'identifier': '12345678'}])]
})

# Create SBML_dfs
model_source = source.Source.single_entry(
    model='custom_pathway', pathway_id='p53_network',
    data_source='literature', organismal_species='Homo sapiens',
    name='Custom p53 network', date='20250101'
)
sbml_dfs = sbml_dfs_core.SBML_dfs.from_edgelist(
    interaction_edgelist=interaction_edgelist,
    species_df=species_df, compartments_df=compartments_df,
    model_source=model_source
)
Recent Core Data Structure Updates
Enhanced Validation
The SBML_dfs class now includes improved validation with automatic issue resolution:
# Validate and automatically fix common issues
sbml_dfs.validate_and_resolve()
# Check validation status
if sbml_dfs.validate():
    print("Model is valid")
New Features
Recent updates provide computed features for analysis:
# Get species features including inferred types
species_features = sbml_dfs.get_species_features()
# Get compartmentalized species features including network properties
cspecies_features = sbml_dfs.get_cspecies_features()
# Generate human-readable reaction formulas
reaction_formulas = sbml_dfs.reaction_formulas()
Data Integration
New capabilities for integrating external data:
# Add species-level data with validation
expression_data = pd.DataFrame({
    'expression_level': [10.5, 8.2],
    'cell_line': ['HeLa', 'HeLa']
}, index=['TP53', 'MDM2'])
sbml_dfs.add_species_data('expression', expression_data)
Best Practices
- Ensure species names match exactly between edgelist and species_df
- Use meaningful reaction names that describe the interaction type
- Provide comprehensive identifiers for better integration capabilities
- Set appropriate SBO terms to capture biological meaning
- Validate your data before creating the SBML_dfs object
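The first and last practices above can be combined into a quick pre-flight check. This is a minimal pandas-only sketch. The helper unmatched_species_names is a hypothetical name, not a Napistu function, and it assumes only the column names described earlier (upstream_name, downstream_name, s_name).

```python
import pandas as pd


def unmatched_species_names(edgelist: pd.DataFrame, species_df: pd.DataFrame) -> list:
    """Return edgelist names with no exact match in species_df['s_name']."""
    known = set(species_df["s_name"])
    used = set(edgelist["upstream_name"]) | set(edgelist["downstream_name"])
    return sorted(used - known)


species_df = pd.DataFrame({"s_name": ["TP53", "MDM2"]})

# The second row has a case mismatch and a trailing space
edgelist = pd.DataFrame({
    "upstream_name": ["TP53", "tp53"],
    "downstream_name": ["MDM2", "MDM2 "],
})

unmatched = unmatched_species_names(edgelist, species_df)
print(unmatched)
```

Because the comparison is an exact string match, differences in case or stray whitespace are reported as unmatched rather than silently reconciled.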
Common Pitfalls
- Missing compartments: Use sbml_dfs_utils.stub_compartments() for generic models
- Inconsistent species names: Ensure exact string matches across all tables
- Invalid SBO terms: Use terms from the controlled vocabulary in SBOTERM_NAMES
- Missing identifiers: Create empty Identifiers([]) objects if no IDs available
Cursor with MCP
Can you help me extend the understanding_sbml_dfs.ipynb tutorial to flesh out the “from_edgelist” workflow and to include any recent updates to the core data structure? THINK DEEPLY AND USE THE NAPISTU MCP AS NEEDED. Since you’ll have trouble directly editing the ipynb, please suggest what I should incorporate in a separate markdown file. Edit for readability and to prioritize high value content. Limit the total content to less than 30 new sentences.
From Edgelist Workflow
The from_edgelist workflow provides a powerful way to create SBML_dfs models from simple interaction tables. This approach is ideal when you have molecular interaction data in tabular format rather than SBML files.
Required Input Structure:
- interaction_edgelist: DataFrame with columns for upstream/downstream species, compartments, SBO terms, stoichiometry, and reversibility
- species_df: Table defining molecular species with names and identifiers
- compartments_df: Table defining cellular compartments with names and identifiers
Key Features:
- Automatic ID generation and validation
- SBO term inference for reaction roles
- Compartmentalized species creation
- Support for preserving additional data columns
- Built-in validation against SBML_dfs schema
Example Usage:
from napistu.sbml_dfs_core import SBML_dfs
from napistu.source import Source
# Create from edgelist
sbml_dfs = SBML_dfs.from_edgelist(
    interaction_edgelist=interactions_df,
    species_df=species_df,
    compartments_df=compartments_df,
    model_source=Source(init=True),
    keep_species_data=True,
    keep_reactions_data=True
)
Recent Core Data Structure Updates
Enhanced Validation System:
- validate_and_resolve(): Automatically attempts to fix common structural issues
- _attempt_resolve(): Internal method for intelligent problem resolution
- Schema-based validation: Comprehensive checks against defined table schemas
New Data Management Features:
- add_species_data() / add_reactions_data(): Add validated additional data tables
- get_table(): Retrieve tables with optional attribute validation
- copy(): Create deep copies for safe modification
Improved Identifier Handling:
- get_characteristic_species_ids(): Filter to biologically meaningful identifiers
- search_by_ids(): Advanced identifier-based entity search
- get_uri_urls(): Retrieve reference URLs for entities
Network Analysis Capabilities:
- get_network_summary(): Comprehensive network statistics
- get_species_features(): Automated species classification
- get_cspecies_features(): Compartmentalized species analysis
Data Export and Persistence:
- export_sbml_dfs(): Export to multiple file formats
- to_pickle() / from_pickle(): Efficient serialization
- reaction_formulas(): Human-readable reaction representations
Gap-Filling Section
Automated Problem Resolution:
The validate_and_resolve() method automatically addresses common issues:
- Missing SBO terms inferred from stoichiometry
- Uncompartmentalized species location inference
- Compartmentalized species naming conflicts
- Underspecified reaction detection and removal
Manual Gap-Filling Tools:
- infer_sbo_terms(): Fill missing SBO terms based on reaction context
- infer_uncompartmentalized_species_location(): Assign compartments from reaction context
- name_compartmentalized_species(): Resolve naming conflicts automatically
Integration Notes
These extensions should be integrated into the existing tutorial at the following locations:
- From Edgelist section: Replace the “TO DO” placeholder around line 28
- Gap-filling section: Replace the “TO DO” placeholder around line 32
- Recent updates: Add as a new section after the “Creating SBML_dfs” section
The content maintains the tutorial’s practical focus while highlighting the most valuable new features for users working with SBML_dfs objects.