This document contains Markdown content generated by Cursor in two runs: one in which it was told to ignore the Napistu Model Context Protocol (MCP) server, and one in which it was told to use it.

Cursor without MCP

Can you help me extend the understanding_sbml_dfs.ipynb tutorial to flesh out the “from_edgelist” workflow and to include any recent updates to the core data structure? THINK DEEPLY WITHOUT USING THE NAPISTU MCP. Since you’ll have trouble directly editing the ipynb, please suggest what I should incorporate in a separate markdown file. Edit for readability and to prioritize high value content. Limit the total content to less than 30 new sentences.

From Edgelist Workflow

The from_edgelist method creates SBML_dfs objects from interaction data without requiring SBML files. This makes it suited to data from protein-protein interaction databases, regulatory networks, and custom pathway definitions.

Required Edgelist Structure

The interaction edgelist must contain these columns:

  • upstream_name: Name of the upstream molecular species (must match s_name in species_df)
  • downstream_name: Name of the downstream molecular species (must match s_name in species_df)
  • r_name: Name for the interaction/reaction
  • r_Identifiers: Supporting identifiers as an Identifiers object
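A quick pre-check for the four required columns can be sketched as follows. The helper name `missing_required_columns` is hypothetical and not part of Napistu; the column names are taken from the list above.

```python
# Required column names, copied from the list above
REQUIRED_COLUMNS = ("upstream_name", "downstream_name", "r_name", "r_Identifiers")


def missing_required_columns(columns):
    """Return the required columns absent from `columns` (hypothetical helper, not a Napistu function)."""
    present = set(columns)
    return [col for col in REQUIRED_COLUMNS if col not in present]


# Column names of a draft edgelist that still lacks identifiers
draft_columns = ["upstream_name", "downstream_name", "r_name"]
missing = missing_required_columns(draft_columns)
print(missing)
```

With a pandas DataFrame, the same check would be called as `missing_required_columns(interaction_edgelist.columns)`.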

Optional Columns with Defaults

These columns are optional; any that are missing are filled with the defaults shown:

  • upstream_compartment: Compartment for upstream species (default: “cellular component”)
  • downstream_compartment: Compartment for downstream species (default: “cellular component”)
  • upstream_sbo_term_name: SBO term defining upstream role (default: “modifier”)
  • downstream_sbo_term_name: SBO term defining downstream role (default: “modified”)
  • upstream_stoichiometry: Stoichiometry of upstream species (default: 0)
  • downstream_stoichiometry: Stoichiometry of downstream species (default: 0)
  • r_isreversible: Whether reaction is reversible (default: False)
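To see what an edgelist looks like once the defaults are written out, the fill can be reproduced in plain pandas. This is a sketch built from the list above, not Napistu's internal logic; `OPTIONAL_DEFAULTS` and `with_explicit_defaults` are hypothetical names.

```python
import pandas as pd

# Defaults copied from the list above (illustrative mapping, not Napistu's own table)
OPTIONAL_DEFAULTS = {
    "upstream_compartment": "cellular component",
    "downstream_compartment": "cellular component",
    "upstream_sbo_term_name": "modifier",
    "downstream_sbo_term_name": "modified",
    "upstream_stoichiometry": 0,
    "downstream_stoichiometry": 0,
    "r_isreversible": False,
}


def with_explicit_defaults(edgelist: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which every missing optional column is added with its default."""
    filled = edgelist.copy()
    for column, default in OPTIONAL_DEFAULTS.items():
        if column not in filled.columns:
            filled[column] = default
    return filled


edgelist = pd.DataFrame({
    "upstream_name": ["TP53"],
    "downstream_name": ["MDM2"],
    "r_name": ["TP53_binds_MDM2"],
    "upstream_compartment": ["nucleus"],  # supplied explicitly, so it is kept
})
explicit = with_explicit_defaults(edgelist)
print(explicit.iloc[0])
```

Writing the defaults out this way makes it easier to spot a default compartment that is not defined in compartments_df.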

Basic Example

import pandas as pd
from napistu import sbml_dfs_core, identifiers, source

# Define species and compartments
species_df = pd.DataFrame({
    's_name': ['TP53', 'MDM2'],
    's_Identifiers': [
        identifiers.Identifiers([{'ontology': 'uniprot', 'identifier': 'P04637'}]),
        identifiers.Identifiers([{'ontology': 'uniprot', 'identifier': 'Q00987'}])
    ]
})

compartments_df = pd.DataFrame({
    'c_name': ['nucleus'],
    'c_Identifiers': [identifiers.Identifiers([{'ontology': 'go', 'identifier': 'GO:0005634'}])]
})

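# Create interaction edgelist
# Compartments are set explicitly so that they match compartments_df
# (the default, "cellular component", is not defined there)
# '12345678' is a placeholder PubMed ID
interaction_edgelist = pd.DataFrame({
    'upstream_name': ['TP53'],
    'downstream_name': ['MDM2'],
    'upstream_compartment': ['nucleus'],
    'downstream_compartment': ['nucleus'],
    'r_name': ['TP53_binds_MDM2'],
    'r_Identifiers': [identifiers.Identifiers([{'ontology': 'pubmed', 'identifier': '12345678'}])]
})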

# Create SBML_dfs
model_source = source.Source.single_entry(
    model='custom_pathway', pathway_id='p53_network',
    data_source='literature', organismal_species='Homo sapiens',
    name='Custom p53 network', date='20250101'
)

sbml_dfs = sbml_dfs_core.SBML_dfs.from_edgelist(
    interaction_edgelist=interaction_edgelist,
    species_df=species_df, compartments_df=compartments_df,
    model_source=model_source
)

Recent Core Data Structure Updates

Enhanced Validation

The SBML_dfs class now includes improved validation with automatic issue resolution:

# Validate and automatically fix common issues
sbml_dfs.validate_and_resolve()

# Check validation status
if sbml_dfs.validate():
    print("Model is valid")

New Features

Recent updates provide computed features for analysis:

# Get species features including inferred types
species_features = sbml_dfs.get_species_features()

# Get compartmentalized species features including network properties
cspecies_features = sbml_dfs.get_cspecies_features()

# Generate human-readable reaction formulas
reaction_formulas = sbml_dfs.reaction_formulas()

Data Integration

New capabilities for integrating external data:

# Add species-level data with validation
expression_data = pd.DataFrame({
    'expression_level': [10.5, 8.2],
    'cell_line': ['HeLa', 'HeLa']
}, index=['TP53', 'MDM2'])

sbml_dfs.add_species_data('expression', expression_data)

Best Practices

  1. Ensure species names match exactly between edgelist and species_df
  2. Use meaningful reaction names that describe the interaction type
  3. Provide comprehensive identifiers for better integration capabilities
  4. Set appropriate SBO terms to capture biological meaning
  5. Validate your data before creating the SBML_dfs object
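The first and last practices can be combined into a small pre-flight check. The sketch below uses only pandas; the helper `unmatched_species_names` is hypothetical, and the column names follow the edgelist and species_df structure described earlier.

```python
import pandas as pd


def unmatched_species_names(edgelist: pd.DataFrame, species_df: pd.DataFrame) -> set:
    """Names used in the edgelist that have no exact match in species_df['s_name']."""
    used = set(edgelist["upstream_name"]) | set(edgelist["downstream_name"])
    return used - set(species_df["s_name"])


species_df = pd.DataFrame({"s_name": ["TP53", "MDM2"]})
edgelist = pd.DataFrame({
    "upstream_name": ["TP53", "tp53"],  # second row differs only in case
    "downstream_name": ["MDM2", "MDM2"],
})

unmatched = unmatched_species_names(edgelist, species_df)
print(unmatched)
```

An empty set means every edgelist name has an exact counterpart; anything else is worth fixing before building the model.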

Common Pitfalls

  • Missing compartments: Use sbml_dfs_utils.stub_compartments() for generic models
  • Inconsistent species names: Ensure exact string matches across all tables
  • Invalid SBO terms: Use terms from the controlled vocabulary in SBOTERM_NAMES
  • Missing identifiers: Create empty Identifiers([]) objects if no IDs are available

Cursor with MCP

Can you help me extend the understanding_sbml_dfs.ipynb tutorial to flesh out the “from_edgelist” workflow and to include any recent updates to the core data structure? THINK DEEPLY AND USE THE NAPISTU MCP AS NEEDED. Since you’ll have trouble directly editing the ipynb, please suggest what I should incorporate in a separate markdown file. Edit for readability and to prioritize high value content. Limit the total content to less than 30 new sentences.

From Edgelist Workflow

The from_edgelist workflow creates SBML_dfs models from simple interaction tables. It is intended for molecular interaction data that is already in tabular format rather than in SBML files.

Required Input Structure:

  • interaction_edgelist: DataFrame with columns for upstream/downstream species, compartments, SBO terms, stoichiometry, and reversibility
  • species_df: Table defining molecular species with names and identifiers
  • compartments_df: Table defining cellular compartments with names and identifiers

Key Features:

  • Automatic ID generation and validation
  • SBO term inference for reaction roles
  • Compartmentalized species creation
  • Support for preserving additional data columns
  • Built-in validation against SBML_dfs schema

Example Usage:

from napistu.sbml_dfs_core import SBML_dfs
from napistu.source import Source

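# Create from edgelist; interactions_df, species_df, and compartments_df are
# assumed to be pandas DataFrames with the structure listed under Required Input Structure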
sbml_dfs = SBML_dfs.from_edgelist(
    interaction_edgelist=interactions_df,
    species_df=species_df,
    compartments_df=compartments_df,
    model_source=Source(init=True),
    keep_species_data=True,
    keep_reactions_data=True
)

Recent Core Data Structure Updates

Enhanced Validation System:

  • validate_and_resolve(): Automatically attempts to fix common structural issues
  • _attempt_resolve(): Internal method for intelligent problem resolution
  • Schema-based validation: Comprehensive checks against defined table schemas

New Data Management Features:

  • add_species_data() / add_reactions_data(): Add validated additional data tables
  • get_table(): Retrieve tables with optional attribute validation
  • copy(): Create deep copies for safe modification

Improved Identifier Handling:

  • get_characteristic_species_ids(): Filter to biologically meaningful identifiers
  • search_by_ids(): Advanced identifier-based entity search
  • get_uri_urls(): Retrieve reference URLs for entities

Network Analysis Capabilities:

  • get_network_summary(): Comprehensive network statistics
  • get_species_features(): Automated species classification
  • get_cspecies_features(): Compartmentalized species analysis

Data Export and Persistence:

  • export_sbml_dfs(): Export to multiple file formats
  • to_pickle() / from_pickle(): Efficient serialization
  • reaction_formulas(): Human-readable reaction representations

Gap-Filling Section

Automated Problem Resolution: The validate_and_resolve() method automatically addresses common issues:

  • Missing SBO terms, which are inferred from stoichiometry
  • Uncompartmentalized species, whose location is inferred
  • Naming conflicts among compartmentalized species
  • Underspecified reactions, which are detected and removed
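As a rough illustration of the first item, missing roles can be filled from the sign of the stoichiometry. Everything in this sketch is an assumption: the sign convention (negative means consumed), the fallback role for zero stoichiometry, and the column names are chosen for illustration and may differ from what Napistu actually does.

```python
import pandas as pd


def role_from_stoichiometry(stoichiometry: float) -> str:
    """Toy rule (assumed, not Napistu's): the sign of the stoichiometry decides the role."""
    if stoichiometry < 0:
        return "reactant"
    if stoichiometry > 0:
        return "product"
    return "modifier"  # placeholder role for zero stoichiometry


# Illustrative table; the column names are hypothetical
reaction_species = pd.DataFrame({
    "stoichiometry": [-1, 1, 0, -1],
    "sbo_term_name": ["reactant", None, None, None],
})

# Fill only the rows whose role is missing; existing roles are left untouched
is_missing = reaction_species["sbo_term_name"].isna()
reaction_species.loc[is_missing, "sbo_term_name"] = (
    reaction_species.loc[is_missing, "stoichiometry"].map(role_from_stoichiometry)
)
print(reaction_species)
```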

Manual Gap-Filling Tools:

  • infer_sbo_terms(): Fill missing SBO terms based on reaction context
  • infer_uncompartmentalized_species_location(): Assign compartments from reaction context
  • name_compartmentalized_species(): Resolve naming conflicts automatically

Integration Notes

These extensions should be integrated into the existing tutorial at the following locations:

  1. From Edgelist section: Replace the “TO DO” placeholder around line 28
  2. Gap-filling section: Replace the “TO DO” placeholder around line 32
  3. Recent updates: Add as a new section after the “Creating SBML_dfs” section

The content maintains the tutorial’s practical focus while highlighting the most valuable new features for users working with SBML_dfs objects.