brainsig.model

A module to fit elastic net logistic regression.

This module implements the ElasticNetClassifier and NeuralSignature classes for neural signature analysis.

Classes

ElasticNetClassifier

Elastic Net Logistic Regression classifier for neural signature analysis.

NeuralSignature

Neural Signature classifier for fMRI task condition discrimination.

Module Contents

class brainsig.model.ElasticNetClassifier(inner_folds: int = 5, outer_folds: int = 5, inner_scoring: str = 'roc_auc_ovr', outer_scoring: dict | None = None, cs: list | None = None, l1_ratios: list | None = None, max_iter: int = 1000, n_jobs: int = -1, random_state: int = 42)

Elastic Net Logistic Regression classifier for neural signature analysis.

This classifier performs nested cross-validation with elastic net regularization for binary or multi-class classification tasks.

Parameters:
  • inner_folds (int, default=5) – Number of folds for inner cross-validation (hyperparameter tuning).

  • outer_folds (int, default=5) – Number of folds for outer cross-validation (performance evaluation).

  • inner_scoring (str, default='roc_auc_ovr') – Scoring metric for inner CV hyperparameter selection.

  • outer_scoring (dict or None, default=None) – Dictionary of scoring metrics for outer CV. If None, uses default metrics.

  • cs (list or None, default=None) – Regularization parameter values to test. If None, uses [0.001, 0.01, 0.1, 1, 10]. Stored as the Cs attribute.

  • l1_ratios (list or None, default=None) – L1 penalty ratios for elastic net. If None, uses [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1.0].

  • max_iter (int, default=1000) – Maximum number of iterations for solver convergence.

  • n_jobs (int, default=-1) – Number of parallel jobs. -1 uses all processors.

  • random_state (int, default=42) – Random seed for reproducibility.

models

Fitted models for each target variable.

Type:

dict

cv_results

Cross-validation results for each target variable.

Type:

dict

target_names

Names of target variables.

Type:

list

inner_scoring = 'roc_auc_ovr'
outer_scoring
inner_folds = 5
outer_folds = 5
Cs = [0.001, 0.01, 0.1, 1, 10]
l1_ratios = [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1.0]
max_iter = 1000
n_jobs = -1
random_state = 42
inner_cv
outer_cv
dataset = None
models
cv_results
target_names = []
define_model(random_state: int = 42) -> sklearn.linear_model.LogisticRegressionCV

Create a LogisticRegressionCV model with elastic net penalty.

Parameters:

random_state (int, default=42) – Random seed for model initialization.

Returns:

Configured logistic regression model with cross-validation.

Return type:

LogisticRegressionCV
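As a rough illustration of what such a model can look like in scikit-learn, the sketch below configures LogisticRegressionCV with an elastic net penalty. The helper name make_elastic_net_cv is hypothetical, and the saga solver is an assumption (it is the scikit-learn solver that supports the elastic net penalty). The grid values mirror the Cs and l1_ratios attributes listed above. This is not necessarily how define_model is implemented.

```python
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import StratifiedKFold


def make_elastic_net_cv(random_state: int = 42) -> LogisticRegressionCV:
    """Hypothetical helper: elastic net logistic regression with built-in CV."""
    # Inner splitter used by LogisticRegressionCV to pick C and l1_ratio.
    inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state)
    return LogisticRegressionCV(
        Cs=[0.001, 0.01, 0.1, 1, 10],
        l1_ratios=[0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1.0],
        penalty="elasticnet",  # assumption: explicit penalty, needed on older scikit-learn
        solver="saga",         # assumption: solver with elastic net support
        cv=inner_cv,
        scoring="roc_auc_ovr",
        max_iter=1000,
        n_jobs=-1,
        random_state=random_state,
    )


model = make_elastic_net_cv()
print(model.solver, len(model.l1_ratios))
```

Recent scikit-learn releases may emit a deprecation warning for the penalty argument at fit time; the l1_ratios grid alone then selects the elastic net mix.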

build_cv_scheme(n_splits: int = 5, random_state: int = 42) -> sklearn.model_selection.StratifiedKFold

Build a stratified k-fold cross-validation scheme.

Parameters:
  • n_splits (int, default=5) – Number of folds for cross-validation.

  • random_state (int, default=42) – Random seed for reproducible splits.

Returns:

Configured cross-validation splitter.

Return type:

StratifiedKFold
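The following sketch shows how a seeded StratifiedKFold splitter behaves on toy data. It assumes shuffle=True, since scikit-learn only honors random_state when shuffling is enabled; whether build_cv_scheme sets it this way is not stated here. The arrays X and y are illustrative.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data: 20 samples with two balanced classes (illustrative only).
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

# shuffle=True is required for random_state to take effect.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

folds = list(cv.split(X, y))
for fold, (train_idx, test_idx) in enumerate(folds):
    # Each test fold preserves the class proportions of the full data.
    print(fold, len(train_idx), len(test_idx), int(y[test_idx].sum()))
```

Because the splitter is seeded, calling cv.split again on the same data yields the same folds, which is what makes fold-wise results reproducible.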

fit_model(dataset, *, keep_dataset: bool = True) -> None

Fit elastic net models for each target variable.

Parameters:
  • dataset (Dataset) – Dataset object containing X_train and y_train arrays.

  • keep_dataset (bool, default=True) – Whether to store the dataset as an instance attribute.

predict(dataset) -> dict

Make predictions on test set for all targets.

Parameters:

dataset (Dataset) – Dataset object containing X_test arrays.

Returns:

Dictionary with predictions for each target, containing y_pred, y_pred_proba, and y_true arrays.

Return type:

dict

cross_validate(dataset) -> dict

Perform nested cross-validation for each target variable.

Parameters:

dataset (Dataset) – Dataset object containing training data.

Returns:

Cross-validation results for each target, including scores and fitted estimators for each fold.

Return type:

dict
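Nested cross-validation can be sketched with plain scikit-learn as follows: an inner splitter tunes hyperparameters inside LogisticRegressionCV, while an outer splitter estimates performance on data the tuning never saw. The synthetic data, the reduced grids, and the outer metric names are assumptions chosen to keep the example small; the actual method may differ.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic binary problem standing in for real features and labels.
X, y = make_classification(
    n_samples=120, n_features=8, n_informative=4, random_state=0
)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

# Inner loop: hyperparameter selection over a small C / l1_ratio grid.
model = LogisticRegressionCV(
    Cs=[0.1, 1],
    l1_ratios=[0.5, 1.0],
    penalty="elasticnet",
    solver="saga",
    cv=inner_cv,
    scoring="roc_auc",
    max_iter=2000,
    random_state=42,
)

# Outer loop: performance evaluation; keep one fitted estimator per fold.
results = cross_validate(
    model,
    X,
    y,
    cv=outer_cv,
    scoring={"accuracy": "accuracy", "f1": "f1", "auc": "roc_auc"},
    return_train_score=True,
    return_estimator=True,
)

print(sorted(results))
print(results["test_auc"])
```

Keeping the fitted estimator from each outer fold is what makes per-fold coefficients and per-fold ROC curves available afterwards.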

get_cv_coefs(dataset, *, exponentiate: bool = False) -> pandas.DataFrame

Extract coefficients from cross-validated models.

Parameters:
  • dataset (Dataset) – Dataset object with feature names and target labels.

  • exponentiate (bool, default=False) – If True, exponentiate coefficients to get odds ratios.

Returns:

DataFrame with coefficients indexed by cv_fold, target_variable, and target class.

Return type:

pd.DataFrame
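To make the exponentiate option concrete: logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios. The feature names and coefficient values below are hypothetical.

```python
import math

# Hypothetical log-odds coefficients for three features (illustrative values).
coefs = {"roi_a": 0.0, "roi_b": math.log(2), "roi_c": -math.log(2)}

# exp(coef) is the multiplicative change in the odds of the positive class
# for a one-unit increase in the feature, holding the others fixed.
odds_ratios = {name: math.exp(beta) for name, beta in coefs.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio:.2f}")
```

A coefficient that the elastic net penalty shrinks to exactly zero maps to an odds ratio of 1, meaning the feature has no effect on the odds.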

get_model_scores(dataset=None) -> pandas.DataFrame

Get fit statistics for the model trained on the full dataset.

Parameters:

dataset (Dataset or None, default=None) – Unused. Kept for API consistency.

Returns:

DataFrame with accuracy, F1, and AUC scores for the fitted model, with columns: value, partition, metric, target, target_label.

Return type:

pd.DataFrame

get_cv_model_scores(dataset=None) -> pandas.DataFrame

Get fit statistics for each validation fold separately.

Parameters:

dataset (Dataset or None, default=None) – Dataset object with target labels for multi-class score interpretation.

Returns:

DataFrame with accuracy, F1, and AUC scores per CV fold, with columns: cv_fold, value, partition, metric, target, target_label.

Return type:

pd.DataFrame
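The long format described above (one row per fold, partition, and metric) can be illustrated with a small reshaping sketch. The input dictionary imitates the key naming of scikit-learn's cross_validate output; the scores, key names, and partition labels are assumptions, and the target columns are omitted for brevity.

```python
import pandas as pd

# Hypothetical per-fold scores, keyed like scikit-learn's cross_validate output.
cv_scores = {
    "train_accuracy": [0.90, 0.92],
    "test_accuracy": [0.80, 0.85],
    "train_auc": [0.95, 0.96],
    "test_auc": [0.88, 0.90],
}

rows = []
for key, values in cv_scores.items():
    # "test_auc" -> partition "test", metric "auc"
    partition, metric = key.split("_", 1)
    for fold, value in enumerate(values):
        rows.append(
            {"cv_fold": fold, "value": value, "partition": partition, "metric": metric}
        )

scores_df = pd.DataFrame(rows)

# Long format makes per-metric summaries a single groupby.
print(scores_df.groupby(["partition", "metric"])["value"].mean())
```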

get_roc_curve(dataset=None) -> pandas.DataFrame

Get ROC curve data for the model trained on the full dataset.

Parameters:

dataset (Dataset or None, default=None) – Unused. Kept for API consistency.

Returns:

Long-format DataFrame with one row per threshold point, with columns: fpr, tpr, threshold, target, target_label.

Return type:

pd.DataFrame
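A table with this shape can be built from scikit-learn's roc_curve, as in the sketch below. The labels, scores, and the values placed in the target and target_label columns are hypothetical; the method itself may assemble its output differently.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_curve

# Toy labels and predicted probabilities for the positive class.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# One (fpr, tpr, threshold) triple per decision threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

roc_df = pd.DataFrame(
    {
        "fpr": fpr,
        "tpr": tpr,
        "threshold": thresholds,
        "target": "condition",  # hypothetical target variable name
        "target_label": 1,      # hypothetical positive class label
    }
)
print(roc_df)
```

The first row corresponds to a threshold above every score (nothing predicted positive), so the curve always starts at (0, 0) and ends at (1, 1).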

get_cv_roc_curves(dataset) -> pandas.DataFrame

Get ROC curve data for each validation fold.

Parameters:

dataset (Dataset) – Dataset object providing X_train and y_train for fold reconstruction.

Returns:

Long-format DataFrame with one row per threshold point, with columns: cv_fold, partition, fpr, tpr, threshold, target, target_label.

Return type:

pd.DataFrame

class brainsig.model.NeuralSignature(inner_folds: int = 5, outer_folds: int = 5, inner_scoring: str = 'roc_auc', outer_scoring: dict | None = None, cs: list | None = None, l1_ratios: list | None = None, max_iter: int = 1000, n_jobs: int = -1, random_state: int = 42)

Neural Signature classifier for fMRI task condition discrimination.

This class fits an elastic net logistic regression model to discriminate between two fMRI task conditions (labeled 1 and 0) and computes neural signature scores as the difference in predicted probabilities between conditions for each subject.

The neural signature score for a subject is computed as:

score = P(condition=1 | fMRI_condition1) - P(condition=1 | fMRI_condition0)

Parameters:
  • inner_folds (int, default=5) – Number of folds for inner cross-validation (hyperparameter tuning).

  • outer_folds (int, default=5) – Number of folds for outer cross-validation (performance evaluation).

  • inner_scoring (str, default='roc_auc') – Scoring metric for inner CV hyperparameter selection.

  • outer_scoring (dict or None, default=None) – Dictionary of scoring metrics for outer CV. If None, uses default metrics.

  • cs (list or None, default=None) – Regularization parameter values to test. If None, uses default values.

  • l1_ratios (list or None, default=None) – L1 penalty ratios for elastic net. If None, uses default values.

  • max_iter (int, default=1000) – Maximum number of iterations for solver convergence.

  • n_jobs (int, default=-1) – Number of parallel jobs. -1 uses all processors.

  • random_state (int, default=42) – Random seed for reproducibility.

classifier

Underlying elastic net classifier for condition discrimination.

Type:

ElasticNetClassifier

signature_scores

Computed neural signature scores for each subject.

Type:

pd.DataFrame or None

Examples

>>> # Prepare data with condition labels (1 and 0)
>>> neural_sig = NeuralSignature(random_state=42)
>>> neural_sig.fit(dataset)
>>> scores = neural_sig.compute_signature_scores(condition1_data, condition0_data)
classifier
signature_scores = None
fit(dataset, *, keep_dataset: bool = True) -> None

Fit the neural signature model to discriminate between task conditions.

The dataset should contain a binary target variable where:
  • Label 1 represents the first task condition.

  • Label 0 represents the second task condition.

Parameters:
  • dataset (Dataset) – Dataset object with binary condition labels (1 and 0).

  • keep_dataset (bool, default=True) – Whether to store the dataset in the classifier.

cross_validate(dataset) -> dict

Perform nested cross-validation for the neural signature model.

Parameters:

dataset (Dataset) – Dataset object containing training data with binary condition labels.

Returns:

Cross-validation results including scores and fitted estimators.

Return type:

dict

compute_signature_scores(condition1_data: numpy.ndarray, condition0_data: numpy.ndarray, *, subject_ids: list | None = None) -> pandas.DataFrame

Compute neural signature scores for each subject.

The neural signature score is computed as the difference in predicted probabilities for condition 1 between the two task conditions:

score = P(y=1 | condition1_data) - P(y=1 | condition0_data)

Parameters:
  • condition1_data (np.ndarray) – Preprocessed fMRI data for condition 1 (shape: n_subjects x n_features).

  • condition0_data (np.ndarray) – Preprocessed fMRI data for condition 0 (shape: n_subjects x n_features).

  • subject_ids (list or None, default=None) – Optional list of subject identifiers. If None, uses sequential indices.

Returns:

DataFrame with columns: subject_id, condition1_prob, condition0_prob, signature_score.

Return type:

pd.DataFrame

Raises:

ValueError – If the model hasn’t been fitted yet or if data shapes don’t match.
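The score computation can be sketched with a plain scikit-learn LogisticRegression standing in for the fitted elastic net model. The synthetic arrays and the classifier choice are assumptions made for illustration; only the output column names follow the description above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_subjects, n_features = 10, 5

# Synthetic stand-ins for per-subject fMRI features in each condition.
condition1_data = rng.normal(loc=1.0, size=(n_subjects, n_features))
condition0_data = rng.normal(loc=-1.0, size=(n_subjects, n_features))

# Stand-in classifier trained to separate condition 1 (label 1) from 0.
X = np.vstack([condition1_data, condition0_data])
y = np.array([1] * n_subjects + [0] * n_subjects)
clf = LogisticRegression().fit(X, y)

# Column 1 of predict_proba is P(y=1) because clf.classes_ is [0, 1].
p1 = clf.predict_proba(condition1_data)[:, 1]
p0 = clf.predict_proba(condition0_data)[:, 1]

scores = pd.DataFrame(
    {
        "subject_id": range(n_subjects),
        "condition1_prob": p1,
        "condition0_prob": p0,
        "signature_score": p1 - p0,  # bounded in [-1, 1]
    }
)
print(scores.head())
```

Row i of both arrays must belong to the same subject, which is why the shapes have to match. Note that this sketch scores the same data it was trained on, so the resulting values are optimistic.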

get_cv_signature_scores(dataset, condition1_indices: numpy.ndarray, condition0_indices: numpy.ndarray, *, subject_ids: list | None = None) -> pandas.DataFrame

Compute neural signature scores using cross-validated models.

This method computes signature scores for each CV fold, which is useful for estimating the generalizability of the neural signature.

Parameters:
  • dataset (Dataset) – Dataset object with full data (both conditions for all subjects).

  • condition1_indices (np.ndarray) – Indices in the dataset corresponding to condition 1 trials.

  • condition0_indices (np.ndarray) – Indices in the dataset corresponding to condition 0 trials.

  • subject_ids (list or None, default=None) – Optional list of subject identifiers.

Returns:

DataFrame with signature scores from each CV fold, including columns: cv_fold, subject_id, condition1_prob, condition0_prob, signature_score.

Return type:

pd.DataFrame

Raises:

ValueError – If cross-validation hasn’t been performed yet.

get_coefficients(dataset, *, exponentiate: bool = False) -> pandas.DataFrame

Get model coefficients (feature weights) from cross-validated models.

Parameters:
  • dataset (Dataset) – Dataset object with feature names.

  • exponentiate (bool, default=False) – If True, exponentiate coefficients to get odds ratios.

Returns:

DataFrame with coefficients for each feature across CV folds.

Return type:

pd.DataFrame

get_model_scores(dataset=None) -> pandas.DataFrame

Get fit statistics for the model trained on the full dataset.

Parameters:

dataset (Dataset or None, default=None) – Unused. Kept for API consistency.

Returns:

DataFrame with accuracy, F1, and AUC scores for the fitted model, with columns: value, partition, metric, target, target_label.

Return type:

pd.DataFrame

get_cv_model_scores(dataset=None) -> pandas.DataFrame

Get fit statistics for each validation fold separately.

Parameters:

dataset (Dataset or None, default=None) – Dataset object with target labels for multi-class score interpretation.

Returns:

DataFrame with accuracy, F1, and AUC scores per CV fold, with columns: cv_fold, value, partition, metric, target, target_label.

Return type:

pd.DataFrame

get_roc_curve(dataset=None) -> pandas.DataFrame

Get ROC curve data for the model trained on the full dataset.

Parameters:

dataset (Dataset or None, default=None) – Unused. Kept for API consistency.

Returns:

Long-format DataFrame with one row per threshold point, with columns: fpr, tpr, threshold, target, target_label.

Return type:

pd.DataFrame

get_cv_roc_curves(dataset) -> pandas.DataFrame

Get ROC curve data for each validation fold.

Parameters:

dataset (Dataset) – Dataset object used during cross-validation.

Returns:

Long-format DataFrame with one row per threshold point, with columns: cv_fold, partition, fpr, tpr, threshold, target, target_label.

Return type:

pd.DataFrame

save(path) -> None

Save the fitted model to disk using joblib.

Parameters:

path (str or Path) – File path to save the model to (e.g. 'model.joblib').

classmethod load(path) -> NeuralSignature

Load a saved model from disk.

Parameters:

path (str or Path) – File path of the saved model.

Returns:

The loaded model instance.

Return type:

NeuralSignature
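The save and load methods above rely on joblib persistence. The round trip can be sketched with a small scikit-learn estimator standing in for a fitted NeuralSignature; the toy data and the temporary file location are assumptions.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny fitted estimator standing in for a fitted model object.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.joblib"
    joblib.dump(model, path)      # serialize the fitted object to disk
    restored = joblib.load(path)  # rebuild it from the file

# The restored object should predict exactly like the original.
same = np.allclose(model.predict_proba(X), restored.predict_proba(X))
print(same)
```

joblib files are pickle-based, so they should only be loaded from trusted sources, and ideally with the same library versions that wrote them.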