brainsig.neural_dataset
=======================

.. py:module:: brainsig.neural_dataset

.. autoapi-nested-parse::

   A module for creating paired-condition neural signature datasets.

   This module provides the NeuralSignatureDataset class for preparing paired
   condition data for neural signature analysis, with subject-level train/test
   splitting to prevent data leakage.


Attributes
----------

.. autoapisummary::

   brainsig.neural_dataset.logger


Classes
-------

.. autoapisummary::

   brainsig.neural_dataset.NeuralSignatureDataset


Module Contents
---------------

.. py:data:: logger


.. py:class:: NeuralSignatureDataset(condition1_df: pandas.DataFrame, condition0_df: pandas.DataFrame, subject_id_col: str | None = None, missing_threshold: float = 0.5, preprocessor: sklearn.compose.ColumnTransformer | None = None, test_size: float = 0.2, random_state: int | None = None, *, verbose: bool = True)

   A dataset for paired-condition neural signature analysis.

   Accepts two DataFrames (one per condition) with matched subjects,
   performs subject-level train/test splitting to prevent leakage,
   and exposes preprocessed arrays ready for ``NeuralSignature.fit``
   and ``compute_signature_scores``.

   :param condition1_df: DataFrame for condition 1 (positive class).
      Rows are subjects, columns are features.
   :type condition1_df: pd.DataFrame
   :param condition0_df: DataFrame for condition 0 (negative class).
      Must have the same shape and columns as *condition1_df*.
   :type condition0_df: pd.DataFrame
   :param subject_id_col: Column name containing subject identifiers.
      If provided, the column is extracted and removed from features.
      Subject IDs must match between the two DataFrames.
      If None, integer indices are used.
   :type subject_id_col: str or None, default=None
   :param missing_threshold: Columns with a fraction of missing values
      exceeding this threshold are dropped.
   :type missing_threshold: float, default=0.5
   :param preprocessor: Custom preprocessor.
      If None, a default preprocessor is created that standardizes numeric
      features and one-hot encodes categorical features.
   :type preprocessor: sklearn.compose.ColumnTransformer or None, default=None
   :param test_size: Proportion of subjects to include in the test split.
   :type test_size: float, default=0.2
   :param random_state: Random seed for reproducibility.
   :type random_state: int or None, default=None
   :param verbose: If True, log information about dropped columns and rows.
   :type verbose: bool, default=True

   .. attribute:: subject_ids

      All subject IDs after cleaning, in the same row order as the condition arrays.

      :type: np.ndarray

   .. attribute:: condition1

      Preprocessed condition-1 data for all subjects, shape ``(N, F)``,
      where ``N`` is the number of subjects after cleaning and ``F`` is the
      number of features after preprocessing.

      :type: np.ndarray

   .. attribute:: condition0

      Preprocessed condition-0 data for all subjects, shape ``(N, F)``.

      :type: np.ndarray

   .. attribute:: X_train

      Combined training features (condition-1 rows, then condition-0 rows),
      shape ``(2*N_train, F)``, where ``N_train`` is the number of training subjects.

      :type: np.ndarray

   .. attribute:: X_test

      Combined test features (condition-1 rows, then condition-0 rows),
      shape ``(2*N_test, F)``, where ``N_test`` is the number of test subjects.

      :type: np.ndarray

   .. attribute:: y_train

      Binary labels for the training data: ``N_train`` ones followed by ``N_train`` zeros.

      :type: np.ndarray

   .. attribute:: y_test

      Binary labels for the test data: ``N_test`` ones followed by ``N_test`` zeros.

      :type: np.ndarray

   .. attribute:: feature_names

      Feature names from ``preprocessor.get_feature_names_out()``.

      :type: np.ndarray

   .. attribute:: target_labels

      Label values for the ``condition`` target: ``{"condition": [0, 1]}``.

      :type: dict

   .. attribute:: preprocessor

      The fitted preprocessor (fit on training data only).

      :type: sklearn.compose.ColumnTransformer

   .. attribute:: dropped_summary

      Summary of dropped data with keys ``all_missing_cols``,
      ``high_missing_cols``, and ``subjects_dropped``.

      :type: dict

   .. py:attribute:: dropped_summary

   .. py:attribute:: subject_ids

   .. py:attribute:: y_train

   .. py:attribute:: y_test

   .. py:attribute:: preprocessor
      :value: None

   .. py:attribute:: X_train

   .. py:attribute:: X_test

   .. py:attribute:: condition1

   .. py:attribute:: condition0

   .. py:attribute:: feature_names

   .. py:attribute:: target_labels
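
   The subject-level split can be sketched with plain NumPy and pandas. This is
   an illustration, not the class's implementation: ``subject_level_split`` is a
   hypothetical helper, and it assumes both DataFrames are already row-aligned by
   subject with no ID column. The point it shows is that subjects, not rows, are
   assigned to a split, so a subject's condition-1 and condition-0 rows always
   end up on the same side.

   .. code-block:: python

      import math

      import numpy as np
      import pandas as pd


      def subject_level_split(condition1_df, condition0_df, test_size=0.2, random_state=None):
          """Hypothetical helper (not part of brainsig): split subjects, not rows.

          Assumes both DataFrames hold one row per subject in the same order.
          Returns stacked arrays laid out like X_train / X_test: condition-1 rows
          first, then condition-0 rows, with labels 1s then 0s.
          """
          if condition1_df.shape != condition0_df.shape:
              raise ValueError("condition DataFrames must have the same shape")
          n_subjects = len(condition1_df)
          rng = np.random.default_rng(random_state)
          order = rng.permutation(n_subjects)  # shuffle subject positions
          n_test = math.ceil(test_size * n_subjects)
          test_idx, train_idx = order[:n_test], order[n_test:]

          c1 = condition1_df.to_numpy(dtype=float)
          c0 = condition0_df.to_numpy(dtype=float)

          def stack(idx):
              # The same subject indices select rows from both conditions.
              X = np.vstack([c1[idx], c0[idx]])
              y = np.concatenate([np.ones(len(idx), dtype=int), np.zeros(len(idx), dtype=int)])
              return X, y

          X_train, y_train = stack(train_idx)
          X_test, y_test = stack(test_idx)
          return X_train, X_test, y_train, y_test, train_idx, test_idx


      # Toy data: five subjects, two features, condition 0 shifted by -1.
      cond1 = pd.DataFrame({"f1": [1.0, 2.0, 3.0, 4.0, 5.0], "f2": [0.5, 0.1, 0.3, 0.2, 0.4]})
      cond0 = cond1 - 1.0
      X_train, X_test, y_train, y_test, train_idx, test_idx = subject_level_split(
          cond1, cond0, test_size=0.2, random_state=0
      )
      print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)

   Splitting row-wise instead could put one subject's condition-1 row in training
   and its condition-0 row in testing, which is the leakage the class is designed
   to avoid.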
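
   Because ``X_train`` and ``X_test`` stack all condition-1 rows before all
   condition-0 rows, the per-condition halves can be recovered by slicing. The
   sketch below additionally assumes, beyond what the attribute descriptions
   state, that both halves list subjects in the same order, so row ``i`` and row
   ``i + N_train`` belong to the same subject. ``split_stacked`` is a hypothetical
   helper; with a real dataset you would pass ``ds.X_train`` and ``ds.y_train``,
   where ``ds`` is a ``NeuralSignatureDataset`` instance.

   .. code-block:: python

      import numpy as np


      def split_stacked(X, y):
          """Hypothetical helper: split a (2*N, F) stacked array into its two halves.

          Assumes condition-1 rows come first, then condition-0 rows, and that both
          halves share the same subject order.
          """
          n = X.shape[0] // 2
          if X.shape[0] != 2 * n or not (np.all(y[:n] == 1) and np.all(y[n:] == 0)):
              raise ValueError("expected condition-1 rows followed by condition-0 rows")
          return X[:n], X[n:]


      # Toy stacked training data for two subjects and two features.
      X_train = np.array([[3.0, 1.0], [4.0, 2.0], [1.0, 1.0], [1.0, 0.0]])
      y_train = np.array([1, 1, 0, 0])

      cond1, cond0 = split_stacked(X_train, y_train)
      paired_diff = cond1 - cond0  # one within-subject difference row per training subject

   The label check guards against silently pairing the wrong rows if the layout
   ever differs from "1s then 0s".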
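
   The ``missing_threshold`` rule can be expressed with pandas as below. This
   sketch works on a single DataFrame; whether the class computes the missing
   fraction per condition or across both conditions is not stated here, and
   ``drop_sparse_columns`` is a hypothetical name. The comparison is strict, so a
   column whose missing fraction equals the threshold is kept, matching the word
   "exceeding" in the parameter description.

   .. code-block:: python

      import numpy as np
      import pandas as pd


      def drop_sparse_columns(df, missing_threshold=0.5):
          """Hypothetical helper: drop columns whose missing fraction exceeds the threshold."""
          missing_frac = df.isna().mean()  # fraction of NaN values per column
          to_drop = missing_frac[missing_frac > missing_threshold].index.tolist()
          return df.drop(columns=to_drop), to_drop


      df = pd.DataFrame({
          "a": [1.0, 2.0, 3.0, 4.0],              # 0% missing: kept
          "b": [1.0, np.nan, 3.0, np.nan],        # 50% missing: kept (not > 0.5)
          "c": [np.nan, np.nan, np.nan, 4.0],     # 75% missing: dropped
          "d": [np.nan, np.nan, np.nan, np.nan],  # 100% missing: dropped
      })
      cleaned, dropped = drop_sparse_columns(df, missing_threshold=0.5)

   Columns like ``d`` correspond to what ``dropped_summary`` reports separately as
   ``all_missing_cols``, while ``c`` is the kind of column listed under
   ``high_missing_cols``.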
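
   The default preprocessor is described as standardizing numeric features and
   one-hot encoding categorical ones, fit on training data only. A comparable
   ``ColumnTransformer`` can be sketched with scikit-learn as follows. The exact
   options the class uses are not documented here, so the dtype-based column
   selection and ``handle_unknown="ignore"`` are assumptions.

   .. code-block:: python

      import numpy as np
      import pandas as pd
      from sklearn.compose import ColumnTransformer, make_column_selector
      from sklearn.preprocessing import OneHotEncoder, StandardScaler

      # Assumed default: scale numeric columns, one-hot encode everything else.
      preprocessor = ColumnTransformer([
          ("num", StandardScaler(), make_column_selector(dtype_include=np.number)),
          ("cat", OneHotEncoder(handle_unknown="ignore"), make_column_selector(dtype_exclude=np.number)),
      ])

      train = pd.DataFrame({"power": [1.0, 2.0, 3.0, 4.0], "site": ["A", "B", "A", "B"]})
      test = pd.DataFrame({"power": [2.5, 10.0], "site": ["A", "C"]})

      # Fit on training rows only, then apply the same transform to test rows,
      # so test statistics never influence the scaling or the category set.
      X_train = preprocessor.fit_transform(train)
      X_test = preprocessor.transform(test)
      feature_names = preprocessor.get_feature_names_out()

   Fitting only on the training rows keeps test-subject statistics (here the mean
   and scale of ``power`` and the set of ``site`` categories) out of the
   transform; the unseen category ``"C"`` is encoded as all zeros.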