Arcana Processing
Submodules
arcana.processing.data_processing module
This module prepares data for the model. It includes the following functions:
1. get_data_for_model: get the data for the model
2. data_splits: split the data into train, validation and test data
3. standardize_data: standardize the data based on the train data
4. tensorized_and_pad: convert the data to tensors and pad them
5. pad_the_splits: pad the train, validation and test data
6. prepare_data_for_model: main function for data preparation
- class arcana.processing.data_processing.DataPreparation(general_config, data_config, procedure_config)
Bases: object
Data preparation class
- data_splits(data, ratio)
Split the data into train, validation and test data
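As an illustration of a three-way split, here is a minimal sketch in plain Python. It is not the implementation of data_splits: the helper name split_sequences, the seed argument, and the reading of ratio as the training fraction (with the remainder divided evenly between validation and test) are all assumptions made for the example.

```python
import random


def split_sequences(sequences, train_ratio, seed=0):
    """Hypothetical helper: shuffle items and split them into train/val/test.

    Assumption: train_ratio is the fraction used for training and the
    remainder is divided evenly between validation and test.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    items = list(sequences)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_ratio)
    n_val = (len(items) - n_train) // 2
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


train, val, test = split_sequences(range(10), train_ratio=0.6)
print(len(train), len(val), len(test))
```

Splitting before any scaling step keeps validation and test samples out of the statistics computed on the training set.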
- get_data_for_model()
Get the data for the model
- get_max_available_scaled_cycle()
Get the maximum available scaled cycle
- pad_the_splits(train, val, test)
Pad the train, validation and test data
- Parameters:
train (pandas DataFrame) – train data
val (pandas DataFrame) – validation data
test (pandas DataFrame) – test data
- Returns:
padded_train (list) – list of padded train data
padded_val (list) – list of padded validation data
padded_test (list) – list of padded test data
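To make the idea of padding concrete, the following sketch pads variable-length sequences to a common length and records the original lengths. It uses plain Python lists rather than DataFrames or tensors, and the helper name pad_sequences and the pad_value default are hypothetical; the actual method may pad differently.

```python
def pad_sequences(sequences, pad_value=0.0):
    """Hypothetical helper: right-pad each sequence to the longest length.

    Returns the padded sequences together with the original lengths, so a
    model can later ignore the padded positions.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths, default=0)
    padded = [
        list(seq) + [pad_value] * (max_len - len(seq))
        for seq in sequences
    ]
    return padded, lengths


padded, lengths = pad_sequences([[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]])
print(padded)   # every row now has length 3
print(lengths)  # original lengths before padding
```

Keeping the lengths alongside the padded data is what allows later steps to mask or pack the padded positions.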
- prepare_data_for_model()
Main function for data preparation
- prepare_test_data_for_pretrained_model()
Prepare the test data for the pretrained model. This is used for fine-tuning.
- standardize_data()
Standardize the data based on the train data
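Standardizing "based on the train data" usually means computing the mean and standard deviation on the training split only and reusing them for the other splits. The sketch below shows that pattern in plain Python; the helper names fit_standardizer and standardize are hypothetical, and the choice of z-score scaling is an assumption, since the source does not say which scaler the method uses.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Hypothetical helper: compute mean and std on the training data only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    # Guard against constant features to avoid division by zero.
    if sigma == 0:
        sigma = 1.0
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply the training statistics to any split."""
    return [(v - mu) / sigma for v in values]


train_values = [2.0, 4.0, 6.0, 8.0]
val_values = [5.0, 10.0]

mu, sigma = fit_standardizer(train_values)
scaled_train = standardize(train_values, mu, sigma)
scaled_val = standardize(val_values, mu, sigma)
print(scaled_train)
print(scaled_val)
```

Reusing the training statistics for validation and test data avoids leaking information from those splits into the preprocessing step.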
- tensorized_and_pad(data, padded_data, data_lengths)
Convert the data to tensors and pad them
- Parameters:
data (pandas DataFrame) – data to be converted to tensors
padded_data (list) – list of padded data
data_lengths (list) – list of data lengths
- Returns:
padded_data (list) – list of padded data
data_lengths (list) – list of data lengths