brainsig.neural_dataset

A module for creating paired-condition neural signature datasets.

This module provides the NeuralSignatureDataset class for preparing paired condition data for neural signature analysis, with subject-level train/test splitting to prevent data leakage.

Attributes

logger

Classes

NeuralSignatureDataset

A dataset for paired-condition neural signature analysis.

Module Contents

brainsig.neural_dataset.logger
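Whether the messages produced with verbose=True are visible depends on how Python logging is configured. The following is a minimal sketch, assuming that the module emits those messages at INFO level through a logger named after the module (the common logging.getLogger(__name__) convention). This page does not confirm either assumption.

```python
import logging

# Assumption (not confirmed here): the module's logger is created with
# logging.getLogger(__name__), so it is named after the module.
LOGGER_NAME = "brainsig.neural_dataset"

# Option 1: show INFO-level records from all loggers, including this one.
# (basicConfig is a no-op if the root logger already has handlers.)
logging.basicConfig(level=logging.INFO, format="%(name)s: %(message)s")

# Option 2: keep INFO output elsewhere but hide INFO records from this module.
logging.getLogger(LOGGER_NAME).setLevel(logging.WARNING)
```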

class brainsig.neural_dataset.NeuralSignatureDataset(condition1_df: pandas.DataFrame, condition0_df: pandas.DataFrame, subject_id_col: str | None = None, missing_threshold: float = 0.5, preprocessor: sklearn.compose.ColumnTransformer | None = None, test_size: float = 0.2, random_state: int | None = None, *, verbose: bool = True)

A dataset for paired-condition neural signature analysis.

Accepts two DataFrames (one per condition) with matched subjects, performs subject-level train/test splitting to prevent leakage, and exposes preprocessed arrays ready for NeuralSignature.fit and compute_signature_scores.
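Subject-level splitting means that the split is drawn over subjects rather than over rows. Because both conditions of a subject share one index, a subject's condition-1 and condition-0 rows always end up on the same side of the split. The sketch below illustrates the idea with NumPy. The helper name split_subjects and its rounding rule (ceil for the test count, as scikit-learn's train_test_split does) are assumptions, not this class's actual code.

```python
import numpy as np

def split_subjects(n_subjects: int, test_size: float = 0.2, random_state=None):
    """Hypothetical helper: split subject *indices* (not rows) into train/test.

    Both conditions of a subject share one index, so a subject can never
    contribute one condition to training and the other to testing.
    """
    rng = np.random.default_rng(random_state)
    perm = rng.permutation(n_subjects)
    n_test = int(np.ceil(test_size * n_subjects))  # assumed rounding rule
    return np.sort(perm[n_test:]), np.sort(perm[:n_test])

train_idx, test_idx = split_subjects(10, test_size=0.2, random_state=0)
print(len(train_idx), len(test_idx))  # 8 2
```

Splitting rows independently would let a subject's condition-1 row sit in training while its condition-0 row sits in testing. That leaks subject-specific information across the split.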

Parameters:

condition1_df (pd.DataFrame) – DataFrame for condition 1 (positive class). Rows are subjects, columns are features.
condition0_df (pd.DataFrame) – DataFrame for condition 0 (negative class). Must have the same shape and columns as condition1_df.
subject_id_col (str or None, default=None) – Column name containing subject identifiers. If provided, the column is extracted and removed from features. Subject IDs must match between the two DataFrames. If None, integer indices are used.
missing_threshold (float, default=0.5) – Columns with a fraction of missing values exceeding this threshold are dropped.
preprocessor (sklearn.compose.ColumnTransformer or None, default=None) – Custom preprocessor. If None, a default preprocessor is created that standardizes numeric features and one-hot encodes categorical features.
test_size (float, default=0.2) – Proportion of subjects to include in the test split.
random_state (int or None, default=None) – Random seed for reproducibility.
verbose (bool, default=True) – If True, log information about dropped columns and rows.
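The missing_threshold rule can be sketched with pandas as shown below. This is a sketch under stated assumptions: the missing fraction is computed over both conditions pooled together, and fully empty columns are reported separately, which mirrors the all_missing_cols and high_missing_cols keys of dropped_summary. The helper drop_sparse_columns is hypothetical.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(c1: pd.DataFrame, c0: pd.DataFrame, missing_threshold: float = 0.5):
    """Hypothetical sketch: drop columns whose missing fraction exceeds the threshold.

    Assumption: the fraction is computed over both conditions pooled together.
    """
    pooled = pd.concat([c1, c0], axis=0)
    frac_missing = pooled.isna().mean()  # fraction of NaNs per column
    all_missing = frac_missing[frac_missing == 1.0].index.tolist()
    high_mask = (frac_missing > missing_threshold) & (frac_missing < 1.0)
    high_missing = frac_missing[high_mask].index.tolist()
    to_drop = all_missing + high_missing
    summary = {"all_missing_cols": all_missing, "high_missing_cols": high_missing}
    return c1.drop(columns=to_drop), c0.drop(columns=to_drop), summary

c1 = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [np.nan, np.nan, np.nan, 1.0], "c": [np.nan] * 4})
c0 = pd.DataFrame({"a": [0.5, 1.5, 2.5, 3.5], "b": [np.nan, np.nan, 2.0, 1.0], "c": [np.nan] * 4})
c1_kept, c0_kept, summary = drop_sparse_columns(c1, c0, missing_threshold=0.5)
print(list(c1_kept.columns), summary)
# ['a'] {'all_missing_cols': ['c'], 'high_missing_cols': ['b']}
```

Column "b" is missing in 5 of 8 pooled rows (0.625 > 0.5), so it is dropped along with the fully empty column "c".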

subject_ids

All subject IDs remaining after cleaning, in the same row order as condition1 and condition0.

Type: np.ndarray
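Because subject IDs must match between the two DataFrames and subject_ids follows the row order of both condition arrays, the rows have to be paired by subject. The sketch below shows one way to validate and align the two frames with pandas. It assumes that condition 0 is reordered to follow condition 1, and the helper align_by_subject is hypothetical. The class may pair rows differently.

```python
import pandas as pd

def align_by_subject(c1: pd.DataFrame, c0: pd.DataFrame, subject_id_col: str):
    """Hypothetical helper: validate IDs, align condition 0 to condition 1's row
    order, and strip the ID column from both feature frames."""
    ids1, ids0 = c1[subject_id_col], c0[subject_id_col]
    if ids1.duplicated().any() or ids0.duplicated().any() or set(ids1) != set(ids0):
        raise ValueError("subject IDs must match one-to-one between conditions")
    # Reorder condition-0 rows so that row i refers to the same subject in both frames.
    c0_aligned = c0.set_index(subject_id_col).loc[ids1.to_numpy()].reset_index(drop=True)
    c1_features = c1.drop(columns=subject_id_col).reset_index(drop=True)
    return ids1.to_numpy(), c1_features, c0_aligned

c1 = pd.DataFrame({"sid": ["s2", "s1"], "x": [2.0, 1.0]})
c0 = pd.DataFrame({"sid": ["s1", "s2"], "x": [10.0, 20.0]})
subject_ids, c1_feat, c0_feat = align_by_subject(c1, c0, "sid")
print(subject_ids.tolist(), c0_feat["x"].tolist())  # ['s2', 's1'] [20.0, 10.0]
```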

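condition1

Preprocessed condition-1 data for all subjects, shape (N, F), where N is the number of subjects after cleaning and F is the number of features after preprocessing.

Type: np.ndarray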

condition0

Preprocessed condition-0 data for all subjects, shape (N, F), with rows in the same subject order as condition1.

Type: np.ndarray

X_train

Combined training features (condition-1 rows then condition-0 rows), shape (2*N_train, F), where N_train is the number of training subjects.

Type: np.ndarray

X_test

Combined test features (condition-1 rows then condition-0 rows), shape (2*N_test, F), where N_test is the number of test subjects.

Type: np.ndarray

y_train

Binary labels for the training data: N_train ones (condition 1) followed by N_train zeros (condition 0), aligned with the rows of X_train.

Type: np.ndarray

y_test

Binary labels for the test data: N_test ones (condition 1) followed by N_test zeros (condition 0), aligned with the rows of X_test.

Type: np.ndarray
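The layout of X_train and y_train (condition-1 rows first, then condition-0 rows, with matching 1s then 0s) can be reproduced from the per-condition arrays and a set of subject indices. The sketch below is written under stated assumptions: stack_conditions is a hypothetical helper, integer labels are an assumed dtype, and both blocks are assumed to use the same subject order.

```python
import numpy as np

def stack_conditions(condition1: np.ndarray, condition0: np.ndarray, idx: np.ndarray):
    """Hypothetical helper: stack selected subjects as [condition 1; condition 0]
    with labels [1, ..., 1, 0, ..., 0], matching the documented layout."""
    X = np.vstack([condition1[idx], condition0[idx]])
    y = np.concatenate([np.ones(len(idx), dtype=int), np.zeros(len(idx), dtype=int)])
    return X, y

rng = np.random.default_rng(0)
N, F = 5, 3
condition1 = rng.normal(size=(N, F))
condition0 = rng.normal(size=(N, F))
train_idx = np.array([0, 2, 3])
X_train, y_train = stack_conditions(condition1, condition0, train_idx)
print(X_train.shape, y_train.tolist())  # (6, 3) [1, 1, 1, 0, 0, 0]
```

With this layout, row i and row i + N_train come from the same subject. This is convenient when a paired (within-subject) comparison is needed later.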

feature_names

Feature names from preprocessor.get_feature_names_out().

Type: np.ndarray

target_labels

Mapping from the target name to its class labels: {"condition": [0, 1]}.

Type: dict

preprocessor

The fitted preprocessor (fit on training data only).

Type: sklearn.compose.ColumnTransformer

dropped_summary

Summary of dropped data with keys all_missing_cols, high_missing_cols, and subjects_dropped.

Type: dict
