brainsig.neural_dataset

A module for creating paired-condition neural signature datasets.

This module provides the NeuralSignatureDataset class, which prepares paired-condition data for neural signature analysis and splits train and test sets at the subject level to prevent data leakage.

Attributes

Classes

NeuralSignatureDataset

A dataset for paired-condition neural signature analysis.

Module Contents

brainsig.neural_dataset.logger
class brainsig.neural_dataset.NeuralSignatureDataset(condition1_df: pandas.DataFrame, condition0_df: pandas.DataFrame, subject_id_col: str | None = None, missing_threshold: float = 0.5, preprocessor: sklearn.compose.ColumnTransformer | None = None, test_size: float = 0.2, random_state: int | None = None, *, verbose: bool = True)

A dataset for paired-condition neural signature analysis.

Accepts two DataFrames (one per condition) with matched subjects, performs subject-level train/test splitting to prevent leakage, and exposes preprocessed arrays ready for NeuralSignature.fit and compute_signature_scores.
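
The subject-level split described above can be illustrated with a minimal sketch. This is not the library's implementation, which this page does not show; the helper name split_subjects and its rounding rule are assumptions made for illustration. The point is that the split is drawn over subjects rather than rows, so a subject's condition-1 and condition-0 rows always land on the same side.

```python
import numpy as np


def split_subjects(n_subjects, test_size=0.2, random_state=None):
    """Hypothetical helper: split subject indices, not individual rows."""
    rng = np.random.default_rng(random_state)
    order = rng.permutation(n_subjects)
    # Rounding rule is an assumption; the library may round differently.
    n_test = max(1, int(round(n_subjects * test_size)))
    test_idx = np.sort(order[:n_test])
    train_idx = np.sort(order[n_test:])
    return train_idx, test_idx


train_idx, test_idx = split_subjects(10, test_size=0.2, random_state=0)
print("train subjects:", train_idx)
print("test subjects:", test_idx)
```

Because both condition arrays are indexed with the same subject indices, no subject contributes rows to both the training and the test set.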

Parameters:
  • condition1_df (pd.DataFrame) – DataFrame for condition 1 (positive class). Rows are subjects, columns are features.

  • condition0_df (pd.DataFrame) – DataFrame for condition 0 (negative class). Must have the same shape and columns as condition1_df.

  • subject_id_col (str or None, default=None) – Column name containing subject identifiers. If provided, the column is extracted and removed from features. Subject IDs must match between the two DataFrames. If None, integer indices are used.

  • missing_threshold (float, default=0.5) – Columns whose fraction of missing values exceeds this threshold are dropped.

  • preprocessor (sklearn.compose.ColumnTransformer or None, default=None) – Custom preprocessor. If None, a default preprocessor is created that standardizes numeric features and one-hot encodes categorical features.

  • test_size (float, default=0.2) – Proportion of subjects to include in the test split.

  • random_state (int or None, default=None) – Random seed for reproducibility.

  • verbose (bool, default=True) – If True, log information about dropped columns and rows.
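
As an illustration of the missing_threshold parameter, the sketch below drops columns whose missing fraction exceeds the threshold. It assumes the fraction is computed over both conditions pooled together, which this page does not confirm; the helper name drop_high_missing and the toy column names are hypothetical.

```python
import numpy as np
import pandas as pd


def drop_high_missing(cond1, cond0, missing_threshold=0.5):
    """Hypothetical helper: drop columns that are mostly missing."""
    # Assumption: missingness is pooled across both conditions.
    pooled = pd.concat([cond1, cond0], axis=0)
    frac_missing = pooled.isna().mean()
    # "Exceeding" the threshold is read as a strict comparison.
    drop = frac_missing[frac_missing > missing_threshold].index.tolist()
    return cond1.drop(columns=drop), cond0.drop(columns=drop), drop


cond1 = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                      "b": [np.nan, np.nan, np.nan, 1.0]})
cond0 = pd.DataFrame({"a": [1.5, np.nan, 2.5, 3.5],
                      "b": [np.nan, np.nan, 0.5, np.nan]})

c1, c0, dropped = drop_high_missing(cond1, cond0, missing_threshold=0.5)
print("dropped columns:", dropped)
```

Here column b is missing in 6 of 8 pooled rows (0.75) and is dropped, while column a is missing in 1 of 8 (0.125) and is kept.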

subject_ids

All subject IDs after cleaning, in the same row order as the condition arrays.

Type: np.ndarray

condition1

Preprocessed condition-1 data for all subjects, shape (N, F).

Type: np.ndarray

condition0

Preprocessed condition-0 data for all subjects, shape (N, F).

Type: np.ndarray

X_train

Combined training features (condition-1 rows then condition-0 rows), shape (2*N_train, F).

Type: np.ndarray

X_test

Combined test features (condition-1 rows then condition-0 rows), shape (2*N_test, F).

Type: np.ndarray

y_train

Binary labels for training data: 1s then 0s.

Type: np.ndarray

y_test

Binary labels for test data: 1s then 0s.

Type: np.ndarray
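
The stacked layout of X_train and y_train can be sketched with toy arrays. The values and the train_idx selection below are hypothetical, and the sketch assumes both conditions are indexed with the same training subjects in the same order, so that row i and row i + N_train belong to the same subject.

```python
import numpy as np

# Toy preprocessed arrays: 5 subjects, 3 features per condition.
condition1 = np.arange(15, dtype=float).reshape(5, 3)
condition0 = -condition1  # stand-in values so pairs are easy to spot
train_idx = np.array([0, 1, 3])  # hypothetical training subjects

# Condition-1 rows first, then condition-0 rows.
X_train = np.vstack([condition1[train_idx], condition0[train_idx]])
# Labels follow the same order: 1s then 0s.
y_train = np.concatenate([
    np.ones(len(train_idx), dtype=int),
    np.zeros(len(train_idx), dtype=int),
])

print(X_train.shape)  # (2 * N_train, F)
print(y_train)
```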

feature_names

Feature names from preprocessor.get_feature_names_out().

Type: np.ndarray

target_labels

Mapping from the target name to its class labels: {"condition": [0, 1]}.

Type: dict

preprocessor

The fitted preprocessor (fit on training data only).

Type: sklearn.compose.ColumnTransformer

dropped_summary

Summary of dropped data with keys all_missing_cols, high_missing_cols, and subjects_dropped.

Type: dict
