Cursor examples +/- MCP

This document contains Markdown content generated by Cursor, both when intentionally using and when intentionally ignoring the Napistu Model Context Protocol (MCP) server.

Cursor without MCP

Can you help me extend the understanding_sbml_dfs.ipynb tutorial to flesh out the “from_edgelist” workflow and to include any recent updates to the core data structure? THINK DEEPLY WITHOUT USING THE NAPISTU MCP. Since you’ll have trouble directly editing the ipynb, please suggest what I should incorporate in a separate markdown file. Edit for readability and to prioritize high value content. Limit the total content to less than 30 new sentences.

From Edgelist Workflow

The from_edgelist method creates SBML_dfs objects from interaction data without requiring SBML files. This workflow integrates data from protein-protein interaction databases, regulatory networks, and custom pathway definitions.

Required Edgelist Structure

The interaction edgelist must contain these required columns:

upstream_name: Name of the upstream molecular species (must match s_name in species_df)
downstream_name: Name of the downstream molecular species (must match s_name in species_df)
r_name: Name for the interaction/reaction
r_Identifiers: Supporting identifiers as an Identifiers object
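
A quick pre-flight check can catch a missing required column before the edgelist is passed on. The following is a minimal sketch in plain pandas: REQUIRED_COLUMNS is copied from the list above, and missing_required_columns is a hypothetical helper written for illustration, not part of Napistu.

```python
import pandas as pd

# Column names copied from the list above (an assumption about the
# from_edgelist contract, not read from Napistu itself).
REQUIRED_COLUMNS = ["upstream_name", "downstream_name", "r_name", "r_Identifiers"]


def missing_required_columns(edgelist: pd.DataFrame) -> list:
    """Return the required columns that are absent, in listed order."""
    return [col for col in REQUIRED_COLUMNS if col not in edgelist.columns]


# An edgelist that omits r_Identifiers
incomplete_edgelist = pd.DataFrame({
    "upstream_name": ["TP53"],
    "downstream_name": ["MDM2"],
    "r_name": ["TP53_binds_MDM2"],
})

missing = missing_required_columns(incomplete_edgelist)
print(missing)
```

Returning a list instead of raising lets the caller decide whether to stop or to fill in the gap.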

Optional Columns with Defaults

These columns are optional; any that are missing are filled with the defaults listed below:

upstream_compartment: Compartment for upstream species (default: “cellular component”)
downstream_compartment: Compartment for downstream species (default: “cellular component”)
upstream_sbo_term_name: SBO term defining upstream role (default: “modifier”)
downstream_sbo_term_name: SBO term defining downstream role (default: “modified”)
upstream_stoichiometry: Stoichiometry of upstream species (default: 0)
downstream_stoichiometry: Stoichiometry of downstream species (default: 0)
r_isreversible: Whether reaction is reversible (default: False)
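
To make the defaulting behavior concrete, here is a rough sketch of how the same defaults could be applied by hand in pandas. OPTIONAL_DEFAULTS mirrors the list above, fill_optional_defaults is a hypothetical helper, and the "stimulator" value is only an illustrative SBO term name; none of this is taken from Napistu's implementation.

```python
import pandas as pd

# Defaults copied from the list above (assumed, not read from Napistu).
OPTIONAL_DEFAULTS = {
    "upstream_compartment": "cellular component",
    "downstream_compartment": "cellular component",
    "upstream_sbo_term_name": "modifier",
    "downstream_sbo_term_name": "modified",
    "upstream_stoichiometry": 0,
    "downstream_stoichiometry": 0,
    "r_isreversible": False,
}


def fill_optional_defaults(edgelist: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which every absent optional column holds its default."""
    filled = edgelist.copy()
    for column, default in OPTIONAL_DEFAULTS.items():
        if column not in filled.columns:
            filled[column] = default
    return filled


# Only one optional column is supplied; the other six are filled in.
edgelist = pd.DataFrame({
    "upstream_name": ["TP53"],
    "downstream_name": ["MDM2"],
    "r_name": ["TP53_binds_MDM2"],
    "upstream_sbo_term_name": ["stimulator"],  # illustrative value
})

filled = fill_optional_defaults(edgelist)
print(filled.iloc[0])
```

Columns that are already present are left untouched, so an explicit value always wins over a default.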

Basic Example

import pandas as pd
from napistu import sbml_dfs_core, identifiers, source

# Define species and compartments
species_df = pd.DataFrame({
    's_name': ['TP53', 'MDM2'],
    's_Identifiers': [
        identifiers.Identifiers([{'ontology': 'uniprot', 'identifier': 'P04637'}]),
        identifiers.Identifiers([{'ontology': 'uniprot', 'identifier': 'Q00987'}])
    ]
})

compartments_df = pd.DataFrame({
    'c_name': ['nucleus'],
    'c_Identifiers': [identifiers.Identifiers([{'ontology': 'go', 'identifier': 'GO:0005634'}])]
})

# Create interaction edgelist
interaction_edgelist = pd.DataFrame({
    'upstream_name': ['TP53'],
    'downstream_name': ['MDM2'],
    'r_name': ['TP53_binds_MDM2'],
    'r_Identifiers': [identifiers.Identifiers([{'ontology': 'pubmed', 'identifier': '12345678'}])]
})

# Create SBML_dfs
model_source = source.Source.single_entry(
    model='custom_pathway', pathway_id='p53_network',
    data_source='literature', organismal_species='Homo sapiens',
    name='Custom p53 network', date='20250101'
)

sbml_dfs = sbml_dfs_core.SBML_dfs.from_edgelist(
    interaction_edgelist=interaction_edgelist,
    species_df=species_df, compartments_df=compartments_df,
    model_source=model_source
)

Recent Core Data Structure Updates

Enhanced Validation

The SBML_dfs class now includes improved validation with automatic issue resolution:

# Validate and automatically fix common issues
sbml_dfs.validate_and_resolve()

# Check validation status
if sbml_dfs.validate():
    print("Model is valid")

New Features

Recent updates provide computed features for analysis:

# Get species features including inferred types
species_features = sbml_dfs.get_species_features()

# Get compartmentalized species features including network properties
cspecies_features = sbml_dfs.get_cspecies_features()

# Generate human-readable reaction formulas
reaction_formulas = sbml_dfs.reaction_formulas()

Data Integration

New capabilities for integrating external data:

# Add species-level data with validation
expression_data = pd.DataFrame({
    'expression_level': [10.5, 8.2],
    'cell_line': ['HeLa', 'HeLa']
}, index=['TP53', 'MDM2'])

sbml_dfs.add_species_data('expression', expression_data)

Best Practices

Ensure species names match exactly between edgelist and species_df
Use meaningful reaction names that describe the interaction type
Provide comprehensive identifiers for better integration capabilities
Set appropriate SBO terms to capture biological meaning
Validate your data before creating the SBML_dfs object

Common Pitfalls

Missing compartments: Use sbml_dfs_utils.stub_compartments() for generic models
Inconsistent species names: Ensure exact string matches across all tables
Invalid SBO terms: Use terms from the controlled vocabulary in SBOTERM_NAMES
Missing identifiers: Create empty Identifiers([]) objects if no IDs available
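
The name-matching pitfall is easy to check for ahead of time. The sketch below uses only pandas and the column names given earlier (upstream_name, downstream_name, s_name); unmatched_species_names is a hypothetical helper, not a Napistu function.

```python
import pandas as pd


def unmatched_species_names(edgelist: pd.DataFrame, species_df: pd.DataFrame) -> list:
    """Return edgelist species names with no exact match in species_df."""
    edgelist_names = set(edgelist["upstream_name"]) | set(edgelist["downstream_name"])
    known_names = set(species_df["s_name"])
    return sorted(edgelist_names - known_names)


species_df = pd.DataFrame({"s_name": ["TP53", "MDM2"]})

# "Tp53" differs in case and "MDM2 " has a trailing space, so neither matches.
edgelist = pd.DataFrame({
    "upstream_name": ["Tp53", "TP53"],
    "downstream_name": ["MDM2", "MDM2 "],
})

unmatched = unmatched_species_names(edgelist, species_df)
print(unmatched)
```

The comparison is deliberately strict: case differences and stray whitespace are reported instead of being silently normalized.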

Cursor with MCP

Can you help me extend the understanding_sbml_dfs.ipynb tutorial to flesh out the “from_edgelist” workflow and to include any recent updates to the core data structure? THINK DEEPLY AND USE THE NAPISTU MCP AS NEEDED. Since you’ll have trouble directly editing the ipynb, please suggest what I should incorporate in a separate markdown file. Edit for readability and to prioritize high value content. Limit the total content to less than 30 new sentences.

From Edgelist Workflow

The from_edgelist workflow provides a powerful way to create SBML_dfs models from simple interaction tables. This approach is ideal when you have molecular interaction data in tabular format rather than SBML files.

Required Input Structure:

interaction_edgelist: DataFrame with columns for upstream/downstream species, compartments, SBO terms, stoichiometry, and reversibility
species_df: Table defining molecular species with names and identifiers
compartments_df: Table defining cellular compartments with names and identifiers

Key Features:

Automatic ID generation and validation
SBO term inference for reaction roles
Compartmentalized species creation
Support for preserving additional data columns
Built-in validation against SBML_dfs schema

Example Usage:

from napistu.sbml_dfs_core import SBML_dfs
from napistu.source import Source

# Create from edgelist
sbml_dfs = SBML_dfs.from_edgelist(
    interaction_edgelist=interactions_df,
    species_df=species_df,
    compartments_df=compartments_df,
    model_source=Source(init=True),
    keep_species_data=True,
    keep_reactions_data=True
)

Recent Core Data Structure Updates

Enhanced Validation System:

validate_and_resolve(): Automatically attempts to fix common structural issues
_attempt_resolve(): Internal method for intelligent problem resolution
Schema-based validation: Comprehensive checks against defined table schemas

New Data Management Features:

add_species_data() / add_reactions_data(): Add validated additional data tables
get_table(): Retrieve tables with optional attribute validation
copy(): Create deep copies for safe modification

Improved Identifier Handling:

get_characteristic_species_ids(): Filter to biologically meaningful identifiers
search_by_ids(): Advanced identifier-based entity search
get_uri_urls(): Retrieve reference URLs for entities

Network Analysis Capabilities:

get_network_summary(): Comprehensive network statistics
get_species_features(): Automated species classification
get_cspecies_features(): Compartmentalized species analysis

Data Export and Persistence:

export_sbml_dfs(): Export to multiple file formats
to_pickle() / from_pickle(): Efficient serialization
reaction_formulas(): Human-readable reaction representations

Gap-Filling Section

Automated Problem Resolution: The validate_and_resolve() method automatically addresses common issues:

Missing SBO terms inferred from stoichiometry
Uncompartmentalized species location inference
Compartmentalized species naming conflicts
Underspecified reaction detection and removal

Manual Gap-Filling Tools:

infer_sbo_terms(): Fill missing SBO terms based on reaction context
infer_uncompartmentalized_species_location(): Assign compartments from reaction context
name_compartmentalized_species(): Resolve naming conflicts automatically

Integration Notes

These extensions should be integrated into the existing tutorial at the following locations:

From Edgelist section: Replace the “TO DO” placeholder around line 28
Gap-filling section: Replace the “TO DO” placeholder around line 32
Recent updates: Add as a new section after the “Creating SBML_dfs” section

The content maintains the tutorial’s practical focus while highlighting the most valuable new features for users working with SBML_dfs objects.

Sean Hackett
