brainsig.neural_dataset#
A module for creating paired-condition neural signature datasets.
This module provides the NeuralSignatureDataset class for preparing paired condition data for neural signature analysis, with subject-level train/test splitting to prevent data leakage.
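The subject-level split mentioned above can be illustrated with a minimal sketch. It does not reproduce brainsig's internal code: the helper name `split_subjects` and the use of scikit-learn's `train_test_split` on unique subject IDs are assumptions made for illustration. The point is that the split is made over subjects, so both condition rows of a subject end up on the same side.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_subjects(subject_ids, test_size=0.2, random_state=None):
    """Hypothetical helper: split unique subject IDs into train and test sets."""
    unique_ids = np.unique(subject_ids)
    train_ids, test_ids = train_test_split(
        unique_ids, test_size=test_size, random_state=random_state
    )
    return train_ids, test_ids


# Ten illustrative subjects; each has one row in each condition DataFrame.
subject_ids = np.array([f"sub-{i:02d}" for i in range(10)])
train_ids, test_ids = split_subjects(subject_ids, test_size=0.2, random_state=0)

# The same boolean masks select rows from BOTH condition arrays, so no
# subject contributes data to the training and the test split at once.
train_mask = np.isin(subject_ids, train_ids)
test_mask = np.isin(subject_ids, test_ids)
```

Because the masks are derived from subject IDs rather than from individual rows, a subject's condition-1 and condition-0 measurements can never be separated across splits.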
Attributes#
Classes#
NeuralSignatureDataset | A dataset for paired-condition neural signature analysis.
Module Contents#
- brainsig.neural_dataset.logger#
- class brainsig.neural_dataset.NeuralSignatureDataset(condition1_df: pandas.DataFrame, condition0_df: pandas.DataFrame, subject_id_col: str | None = None, missing_threshold: float = 0.5, preprocessor: sklearn.compose.ColumnTransformer | None = None, test_size: float = 0.2, random_state: int | None = None, *, verbose: bool = True)#
A dataset for paired-condition neural signature analysis.
Accepts two DataFrames (one per condition) with matched subjects, performs subject-level train/test splitting to prevent leakage, and exposes preprocessed arrays ready for NeuralSignature.fit and compute_signature_scores.
- Parameters:
condition1_df (pd.DataFrame) – DataFrame for condition 1 (positive class). Rows are subjects, columns are features.
condition0_df (pd.DataFrame) – DataFrame for condition 0 (negative class). Must have the same shape and columns as condition1_df.
subject_id_col (str or None, default=None) – Column name containing subject identifiers. If provided, the column is extracted and removed from features. Subject IDs must match between the two DataFrames. If None, integer indices are used.
missing_threshold (float, default=0.5) – Columns with a fraction of missing values exceeding this threshold are dropped.
preprocessor (sklearn.compose.ColumnTransformer or None, default=None) – Custom preprocessor. If None, a default preprocessor is created that standardizes numeric features and one-hot encodes categorical features.
test_size (float, default=0.2) – Proportion of subjects to include in the test split.
random_state (int or None, default=None) – Random seed for reproducibility.
verbose (bool, default=True) – If True, log information about dropped columns and rows.
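As a rough sketch of what missing_threshold and the default preprocessor describe, the example below drops columns whose missing fraction exceeds the threshold and then builds a ColumnTransformer that standardizes numeric columns and one-hot encodes the rest. The helper names `drop_high_missing` and `make_default_preprocessor`, the column names, and the exact transformer settings are assumptions for illustration. The library's actual defaults may differ.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def drop_high_missing(df, missing_threshold=0.5):
    """Hypothetical helper: drop columns whose missing fraction exceeds the threshold."""
    missing_fraction = df.isna().mean()
    keep = missing_fraction[missing_fraction <= missing_threshold].index
    return df[keep]


def make_default_preprocessor():
    """Hypothetical default: scale numeric columns, one-hot encode the others."""
    return ColumnTransformer(
        [
            ("num", StandardScaler(), make_column_selector(dtype_include=np.number)),
            (
                "cat",
                OneHotEncoder(handle_unknown="ignore"),
                make_column_selector(dtype_exclude=np.number),
            ),
        ],
        sparse_threshold=0,  # always return a dense array
    )


# Illustrative data: roi_b is 75% missing and exceeds the 0.5 threshold.
df = pd.DataFrame(
    {
        "roi_a": [0.1, 0.4, 0.3, 0.9],
        "roi_b": [np.nan, np.nan, np.nan, 1.0],
        "site": ["x", "y", "x", "y"],
    }
)
cleaned = drop_high_missing(df, missing_threshold=0.5)
X = make_default_preprocessor().fit_transform(cleaned)
```

Here `cleaned` keeps `roi_a` and `site`, and `X` has one standardized column plus one indicator column per `site` level.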
- subject_ids#
All subject IDs after cleaning, matching row order of condition arrays.
- Type:
np.ndarray
- condition1#
Preprocessed condition-1 data for all subjects, shape (N, F).
- Type:
np.ndarray
- condition0#
Preprocessed condition-0 data for all subjects, shape (N, F).
- Type:
np.ndarray
- X_train#
Combined training features (condition-1 rows then condition-0 rows), shape (2*N_train, F).
- Type:
np.ndarray
- X_test#
Combined test features, shape (2*N_test, F).
- Type:
np.ndarray
- y_train#
Binary labels for training data: 1s then 0s.
- Type:
np.ndarray
- y_test#
Binary labels for test data: 1s then 0s.
- Type:
np.ndarray
- feature_names#
Feature names from preprocessor.get_feature_names_out().
- Type:
np.ndarray
- preprocessor#
The fitted preprocessor (fit on training data only).
- Type:
sklearn.compose.ColumnTransformer
- dropped_summary#
Summary of dropped data with keys all_missing_cols, high_missing_cols, and subjects_dropped.
- Type:
- preprocessor = None#
- target_labels#
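The layout of X_train, X_test, y_train, and y_test described above can be sketched with plain NumPy. This is an illustration under stated assumptions, not the class's implementation: the helper `stack_pairs`, the synthetic data, the fixed index split, and the use of StandardScaler as a stand-in for the preprocessor are all hypothetical. It shows condition-1 rows stacked above condition-0 rows, labels as 1s then 0s, and a transformer fit on training rows only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_subjects, n_features = 10, 4

# Synthetic paired data: row i in both arrays belongs to subject i.
cond1 = rng.normal(loc=1.0, size=(n_subjects, n_features))
cond0 = rng.normal(loc=0.0, size=(n_subjects, n_features))

# Illustrative subject-level split: 8 training subjects, 2 test subjects.
train_idx = np.arange(8)
test_idx = np.arange(8, 10)


def stack_pairs(c1, c0):
    """Hypothetical helper: stack condition-1 rows above condition-0 rows."""
    X = np.vstack([c1, c0])
    y = np.concatenate(
        [np.ones(len(c1), dtype=int), np.zeros(len(c0), dtype=int)]
    )
    return X, y


X_train_raw, y_train = stack_pairs(cond1[train_idx], cond0[train_idx])
X_test_raw, y_test = stack_pairs(cond1[test_idx], cond0[test_idx])

# Fit on training rows only, then apply the same transform to the test rows,
# so no test-set statistics leak into preprocessing.
scaler = StandardScaler().fit(X_train_raw)
X_train = scaler.transform(X_train_raw)
X_test = scaler.transform(X_test_raw)
```

With N_train = 8 and N_test = 2, the resulting shapes are (16, 4) for X_train and (4, 4) for X_test, matching the (2*N_train, F) and (2*N_test, F) shapes documented for the attributes.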