brainsig.neural_dataset
=======================

.. py:module:: brainsig.neural_dataset

.. autoapi-nested-parse::

   A module for creating paired-condition neural signature datasets.

   This module provides the ``NeuralSignatureDataset`` class, which prepares
   paired-condition data for neural signature analysis and splits subjects
   into train and test sets at the subject level to prevent data leakage.


Attributes
----------

.. autoapisummary::

   brainsig.neural_dataset.logger


Classes
-------

.. autoapisummary::

   brainsig.neural_dataset.NeuralSignatureDataset


Module Contents
---------------

.. py:data:: logger

.. py:class:: NeuralSignatureDataset(condition1_df: pandas.DataFrame, condition0_df: pandas.DataFrame, subject_id_col: str | None = None, missing_threshold: float = 0.5, preprocessor: sklearn.compose.ColumnTransformer | None = None, test_size: float = 0.2, random_state: int | None = None, *, verbose: bool = True)

   A dataset for paired-condition neural signature analysis.

   Accepts two DataFrames (one per condition) with matched subjects, performs
   subject-level train/test splitting to prevent leakage, and exposes
   preprocessed arrays ready for ``NeuralSignature.fit`` and
   ``compute_signature_scores``.
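
   The subject-level split described above can be sketched with plain NumPy.
   This is a minimal illustration, not the class's actual implementation: the
   array names, the use of ``Generator.permutation``, and the rounding of the
   test-set size are all assumptions.

   .. code-block:: python

      import numpy as np

      rng = np.random.default_rng(0)  # fixed seed, playing the role of random_state

      # Hypothetical paired data: row i of each array belongs to subject i.
      n_subjects, n_features = 10, 4
      cond1 = rng.normal(size=(n_subjects, n_features))
      cond0 = rng.normal(size=(n_subjects, n_features))

      # Split subject indices, not rows, so a subject's two rows stay together.
      test_size = 0.2
      n_test = int(round(test_size * n_subjects))  # rounding rule is an assumption
      shuffled = rng.permutation(n_subjects)
      test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]

      cond1_train, cond0_train = cond1[train_idx], cond0[train_idx]
      cond1_test, cond0_test = cond1[test_idx], cond0[test_idx]

      # No subject contributes rows to both splits.
      assert set(train_idx).isdisjoint(test_idx)

   Splitting rows independently could instead place a subject's condition-1
   row in the training set and the same subject's condition-0 row in the test
   set, which is the leakage the subject-level split avoids.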

   :param condition1_df: DataFrame for condition 1 (positive class). Rows are subjects,
                         columns are features.
   :type condition1_df: pd.DataFrame
   :param condition0_df: DataFrame for condition 0 (negative class). Must have the same shape
                         and columns as *condition1_df*.
   :type condition0_df: pd.DataFrame
   :param subject_id_col: Column name containing subject identifiers. If provided, the column
                          is extracted and removed from features. Subject IDs must match between
                          the two DataFrames. If None, integer indices are used.
   :type subject_id_col: str or None, default=None
   :param missing_threshold: Columns whose fraction of missing values exceeds this threshold
                             are dropped.
   :type missing_threshold: float, default=0.5
   :param preprocessor: Custom preprocessor. If None, a default preprocessor is created that
                        standardizes numeric features and one-hot encodes categorical features.
   :type preprocessor: sklearn.compose.ColumnTransformer or None, default=None
   :param test_size: Proportion of subjects to include in the test split.
   :type test_size: float, default=0.2
   :param random_state: Seed for the random train/test split. Pass an integer for reproducible splits.
   :type random_state: int or None, default=None
   :param verbose: If True, log information about dropped columns and rows.
   :type verbose: bool, default=True
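
   The effect of ``missing_threshold`` can be illustrated on a single toy
   DataFrame. This is a sketch under stated assumptions: the column names are
   hypothetical, and how the class combines missingness across the two
   condition DataFrames is not documented here, so only the per-column rule
   is shown.

   .. code-block:: python

      import numpy as np
      import pandas as pd

      # Hypothetical feature table: rows are subjects, columns are features.
      df = pd.DataFrame(
          {
              "roi_a": [0.1, 0.2, 0.3, 0.4],           # 0% missing
              "roi_b": [np.nan, np.nan, np.nan, 0.5],  # 75% missing
              "roi_c": [np.nan, 0.7, 0.8, 0.9],        # 25% missing
          }
      )

      missing_threshold = 0.5

      # Fraction of missing values per column.
      missing_fraction = df.isna().mean()

      # Keep only columns whose missing fraction does not exceed the threshold.
      high_missing_cols = missing_fraction[
          missing_fraction > missing_threshold
      ].index.tolist()
      cleaned = df.drop(columns=high_missing_cols)

   With the default threshold of 0.5, only ``roi_b`` is dropped in this toy
   table; ``roi_c`` stays because 25% of its values are missing.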

   .. attribute:: subject_ids

      All subject IDs remaining after cleaning, in the same row order as the
      ``condition1`` and ``condition0`` arrays.

      :type: np.ndarray

   .. attribute:: condition1

      Preprocessed condition-1 data for all subjects, shape ``(N, F)``.

      :type: np.ndarray

   .. attribute:: condition0

      Preprocessed condition-0 data for all subjects, shape ``(N, F)``.

      :type: np.ndarray

   .. attribute:: X_train

      Combined training features (condition-1 rows then condition-0 rows),
      shape ``(2*N_train, F)``.

      :type: np.ndarray

   .. attribute:: X_test

      Combined test features (condition-1 rows then condition-0 rows),
      shape ``(2*N_test, F)``.

      :type: np.ndarray

   .. attribute:: y_train

      Binary labels for the training data: ``N_train`` ones (condition 1)
      followed by ``N_train`` zeros (condition 0).

      :type: np.ndarray

   .. attribute:: y_test

      Binary labels for the test data: ``N_test`` ones (condition 1)
      followed by ``N_test`` zeros (condition 0).

      :type: np.ndarray
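
   The stacked layout of ``X_train`` and ``y_train`` can be reproduced with
   NumPy as follows. This is a hedged sketch, not the class's code: the toy
   values are invented, and the paired-difference step assumes that subjects
   appear in the same order in both halves of the stacked array.

   .. code-block:: python

      import numpy as np

      # Hypothetical preprocessed training rows for three subjects.
      cond1_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
      cond0_train = np.array([[0.5, 1.0], [1.5, 2.0], [2.5, 3.0]])
      n_train = cond1_train.shape[0]

      # Condition-1 rows first, then condition-0 rows: shape (2 * n_train, F).
      X_train = np.vstack([cond1_train, cond0_train])

      # Labels follow the same order: ones, then zeros.
      y_train = np.concatenate(
          [np.ones(n_train, dtype=int), np.zeros(n_train, dtype=int)]
      )

      # The two halves can be recovered from the labels. If subject order is
      # preserved within each half (an assumption), row i pairs with row i.
      paired_diff = X_train[y_train == 1] - X_train[y_train == 0]

   Recovering the halves from the labels, instead of slicing at a fixed
   offset, keeps the example valid even if the row order is later shuffled
   together with the labels.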

   .. attribute:: feature_names

      Feature names from ``preprocessor.get_feature_names_out()``.

      :type: np.ndarray

   .. attribute:: target_labels

      Mapping from the target name to its class labels:
      ``{"condition": [0, 1]}``.

      :type: dict

   .. attribute:: preprocessor

      The fitted preprocessor (fit on training data only).

      :type: sklearn.compose.ColumnTransformer
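
      A preprocessor of the kind described for the default (standardize
      numeric features, one-hot encode categorical features) might look like
      the sketch below, fitted on training rows only. The transformer names
      ``num`` and ``cat``, the column selectors, and the toy column names are
      assumptions for illustration; the default actually built by the class
      may differ.

      .. code-block:: python

         import pandas as pd
         from sklearn.compose import ColumnTransformer, make_column_selector
         from sklearn.preprocessing import OneHotEncoder, StandardScaler

         # Assumed layout: scale numeric columns, one-hot encode the rest.
         preprocessor = ColumnTransformer(
             [
                 ("num", StandardScaler(), make_column_selector(dtype_include="number")),
                 (
                     "cat",
                     OneHotEncoder(handle_unknown="ignore"),
                     make_column_selector(dtype_exclude="number"),
                 ),
             ],
             sparse_threshold=0,  # always return a dense array
         )

         # Hypothetical training and test rows.
         train = pd.DataFrame({"thickness": [1.0, 2.0, 3.0], "site": ["a", "b", "a"]})
         test = pd.DataFrame({"thickness": [4.0], "site": ["b"]})

         # Fit on training rows only, then reuse the fitted statistics on test rows.
         X_train = preprocessor.fit_transform(train)
         X_test = preprocessor.transform(test)
         feature_names = preprocessor.get_feature_names_out()

      Because the scaler's mean and variance come from the training rows
      alone, no information from the test subjects enters the fitted
      transformation.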

   .. attribute:: dropped_summary

      Summary of dropped data with keys ``all_missing_cols``,
      ``high_missing_cols``, and ``subjects_dropped``.

      :type: dict


   .. py:attribute:: dropped_summary


   .. py:attribute:: subject_ids


   .. py:attribute:: y_train


   .. py:attribute:: y_test


   .. py:attribute:: preprocessor
      :value: None


   .. py:attribute:: X_train


   .. py:attribute:: X_test


   .. py:attribute:: condition1


   .. py:attribute:: condition0


   .. py:attribute:: feature_names


   .. py:attribute:: target_labels