
Modeling quasar time series with neural processes in Python

Project description

Installation:

pip install QNPy

REQUIREMENTS

This package ships with a requirements.txt file listing all the requirements (mainly other packages) that must be satisfied before you can use it. To install all requirements at once:

  1. In the command line, navigate to the directory where you downloaded the package (where the requirements.txt file is).
  2. Once there, type: pip install -r requirements.txt

You are now ready to use the QNPy package.
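The examples throughout this description assume the modules have been imported from the installed package. A minimal import sketch (the aliases are our own convention, chosen to match the st. prefix used in the training example further down; the module names follow the file names in this description):

    # Assumed import style; not confirmed by the package documentation.
    from QNPy import Preprocess as pr
    from QNPy import SPLITTING_AND_TRAINING as st
    from QNPy import PREDICTION as pred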

EXAMPLE: At https://github.com/kittytheastronaut/QNPy you will find a folder named "Tutorial" containing notebooks that walk you through the entire process of using this package, with an example of using each module separately. The "Tutorial" folder also contains a "Light_curves" folder with an example of how your input light curves should look.

The folders you'll need to create before running the scripts (a sketch for creating them from Python follows the list):

  1. preproc
  2. dataset
     2.1. train
     2.2. test
     2.3. val
  3. padded_lc
  4. output
     4.1. predictions
          4.1.1. test
                 4.1.1.1. data
                 4.1.1.2. plots
          4.1.2. train
                 4.1.2.1. data
                 4.1.2.2. plots
          4.1.3. val
                 4.1.3.1. data
                 4.1.3.2. plots
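A minimal sketch for creating this folder tree from Python (the paths mirror the directory names used in the modules below; os.makedirs with exist_ok=True creates parent folders and is safe to re-run):

    import os

    folders = ["./preproc", "./padded_lc",
               "./dataset/train", "./dataset/test", "./dataset/val"]
    for split in ("train", "test", "val"):
        folders += ["./output/predictions/%s/data" % split,
                    "./output/predictions/%s/plots" % split]

    for folder in folders:
        os.makedirs(folder, exist_ok=True)  # creates parent folders as needed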

MODULES AND THEIR FUNCTIONS:

1. PREPROCESSING THE DATA

Preprocess.py

In this module, we transform the data into the range [-2,2]x[-2,2] to make training faster. The module also contains a function for padding light curves; you do not need it if your light curves already have 100 or more points. It is important to emphasize that all light curves must have the same number of points in order to train the model correctly. In testing the package, we found it best to pad backward with the last measured value, up to 100 points. You use the padded light curves to train the model; later, for prediction and plotting, the padded points are removed. Users may apply another padding method according to their needs. The data transformation function in this module also creates subsets of the data named your_curve_name_plus and your_curve_name_minus. These subsets are generated with respect to the errors in magnitudes and serve to augment the model's training set. The original curves are saved as name_of_your_curve_original.

Before running this script, you must create the following folders in the directory where your Python notebook is located:

  1. ./preproc/ -- the folder for saving the transformed data
  2. ./padded_lc/ -- the folder for saving the padded light curves

Your data must contain: mjd (MJD or time), mag (magnitude), and magerr (magnitude error).


backward_pad_curves(folder_path, output_folder, desired_observations=100) -- Backward-pads the light curves with the last observed value for mag and magerr. If your data contains 'time' values it adds +1 for each padded value; if your data contains 'MJD' values it adds +0.2.

""" Args: folder_path (str): The path to a folder containing the .csv files output_path (str): The path to a folder for saving the padded lc desired_observations: The number of points that our package is demanding is 100 but it can be more

How to use: padding= backward_pad_curves('./light_curves', './Padded_lc', desired_observations=100) """


transform(data) -- Transforms data into the [-2,2]x[-2,2] range. This function needs to be imported before use.

"""
Args:
    data: Your data must contain MJD or time, mag (magnitude), and magerr (magnitude error).
"""


transform_and_save(files, DATA_SRC, DATA_DST, transform)

-- Transforms and saves a list of CSV files. The function also saves the transformation coefficients as a pickle file named trcoeff.pickle.

"""

Args:
files (list): A list of CSV or TXT file names.
DATA_SRC (str): The path to the folder containing the CSV or TXT files.
DATA_DST (str): The path to the folder where the transformed CSV or TXT files will be saved.
transform: function for transformation defined previously

Returns:
list: A list of transformation coefficients for each file, where each element is a list containing the file name and the transformation coefficients Ax, Bx, Ay, and By.

How to use:
number_of_points, trcoeff = transform_and_save(files, DATA_SRC, DATA_DST, transform)

After that, you can plot the number of data points
"""

2. SPLITTING AND TRAINING THE DATA

SPLITTING_AND_TRAINING.py

We use this module to split the data into three subsamples: a test sample, a sample for model training, and a validation sample. The module also contains functions for training and saving models. It contains the following functions, which must be executed in order.

Before running this script, you must create the following folders in the directory where your Python notebook is located:

  1. ./dataset/train -- folder for saving the training data after splitting your original dataset
  2. ./dataset/test -- folder for test data
  3. ./dataset/val -- folder for validation data
  4. ./output -- folder where you are going to save your trained model

create_split_folders(train_folder='./dataset/train/', test_folder='./dataset/test/', val_folder='./dataset/val/'):

-- Creates the TRAIN, TEST, and VAL folders in the dataset directory.

"""
How to use:
    create_split_folders(train_folder='./dataset/train/', test_folder='./dataset/test/', val_folder='./dataset/val/')
"""


split_data(files, DATA_SRC, TRAIN_FOLDER, TEST_FOLDER, VAL_FOLDER)

-- Splits the data into TRAIN, TEST and VAL folders

""" You need to make sure that DATA_SRC, TRAIN_FOLDER, TEST_FOLDER, and VAL_FOLDER are defined before calling the function split_data.

Args: files (list): A list of CSV file names. DATA_SRC (string): Path to preproceed data TRAIN_FOLDER (string): Path for saving the train data TEST_FOLDER (string): Path for saving the test data VAL_FOLDER (string): Path for saving the validation data

How to use: split_data(files, DATA_SRC, TRAIN_FOLDER, TEST_FOLDER, VAL_FOLDER)

"""


TRAINING

Special note for macOS users:

When creating folders on macOS, hidden .DS_Store files may be created. You must delete these files from each folder before starting training. The best way is to go into each folder individually and run:

!rm -f .DS_Store

Important note: deleting files with "delete" directly in the folders does not remove hidden files.
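If you prefer to clean up from Python instead, a small sketch (our own convenience helper, not part of the package) that removes the hidden files from the whole dataset tree:

    import os

    # Recursively delete macOS .DS_Store files under ./dataset.
    for root, _dirs, names in os.walk('./dataset'):
        for name in names:
            if name == '.DS_Store':
                os.remove(os.path.join(root, name))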

Before running the training function you must define:

DATA_PATH_TRAIN = "./dataset/train"  # path to train folder
DATA_PATH_VAL = "./dataset/val"  # path to val folder

MODEL_PATH = "./output/cnp_model.pth"  # path for saving the model

BATCH_SIZE = 32  # training batch size; MUST REMAIN 32
EPOCHS = 6000  # this is optional
EARLY_STOPPING_LIMIT = 3000  # this is optional


get_data_loader(data_path_train, data_path_val, batch_size)

-- Defines the train and validation loaders for the training and validation process.

"""
Args:
    data_path_train (str): path to the train folder
    data_path_val (str): path to the val folder
    batch_size: recommended to be 32

How to use:
    trainLoader, valLoader = get_data_loader(DATA_PATH_TRAIN, DATA_PATH_VAL, BATCH_SIZE)
"""


create_model_and_optimizer()

-- Defines the model as a Deterministic Model, the optimizer as a torch optimizer, the criterion as LogProbLoss, mseMetric as MSELoss, and maeMetric as MAELoss.

"""
How to use:
    model, optimizer, criterion, mseMetric, maeMetric = create_model_and_optimizer(device)

The device has to be defined beforehand; it can be cuda or cpu.
"""


train_model(model, train_loader, val_loader, criterion, optimizer, num_runs, epochs, early_stopping_limit, mse_metric, mae_metric, device)

-- Trains the model

""" Args: model: Deterministic model train_loader: train loader val_loader: validation loader criterion: criterion optimizer: torch optimizer num_runs: The number of trainings epochs: epochs for training. This is optional, but minimum of 3000 is recomended early_stopping_limit: limits the epochs for stopping the training. This is optional but minimum of 1500 is recomended mse_metric: mse metric mae_metric: mae metric device: torch device cpu or cuda

How to use: If you want to save history_loss_train, history_loss_val, history_mse_train and history_mse_val for plotting you train your model like:

history_loss_train, history_loss_val, history_mse_train, history_mse_val, history_mae_train, history_mae_val, epoch_counter_train_loss, epoch_counter_train_mse, epoch_counter_train_mae, epoch_counter_val_loss, epoch_counter_val_mse, epoch_counter_val_mae = st.train_model(model, trainLoader, valLoader, criterion, optimizer, 1, 3000, 1500, mseMetric, maeMetric, device)

"""


save_lists_to_csv(file_names,lists)

-- Saves the training histories to CSV files.

"""
Args:
    file_names: A list of file names used for saving the data. Each file name corresponds to a specific data list that will be saved in CSV format.
    lists (list): A list of lists containing the data to be saved. Each inner list represents a set of rows to be written to a CSV file.

How to use:

Define the file names for saving the lists

file_names = ["history_loss_train.csv", "history_loss_val.csv", "history_mse_train.csv", "history_mse_val.csv","history_mae_train.csv", "history_mae_val.csv", "epoch_counter_train_loss.csv", "epoch_counter_train_mse.csv", "epoch_counter_train_mae.csv", "epoch_counter_val_loss.csv","epoch_counter_val_mse.csv", "epoch_counter_val_mae.csv"]

Define the lists

lists = [history_loss_train, history_loss_val, history_mse_train, history_mse_val, history_mae_train, history_mae_val, epoch_counter_train_loss, epoch_counter_train_mse, epoch_counter_train_mae, epoch_counter_val_loss, epoch_counter_val_mse, epoch_counter_val_mae]

save_list = save_lists_to_csv(file_names, lists)
"""


plot_loss(history_loss_train_file, history_loss_val_file, epoch_counter_train_loss_file)

-- plotting the history losses

""" args: returned data from test_model How to use:

history_loss_train_file = './history_loss_train.csv' # Replace with the path to your history_loss_train CSV file history_loss_val_file = './history_loss_val.csv' # Replace with the path to your history_loss_val CSV file epoch_counter_train_loss_file = './epoch_counter_train_loss.csv' # Replace with the path to your epoch_counter_train_loss CSV file

logprobloss=plot_loss(history_loss_train_file, history_loss_val_file, epoch_counter_train_loss_file)

"""


plot_mse_metric(history_mse_train_file, history_mse_val_file, epoch_counter_train_mse_file)

-- plotting the mse metric

""" args: returned data from test_model How to use:

history_mse_train_file = './history_mse_train.csv' # Replace with the path to your history_mse_train CSV file history_mse_val_file = './history_mse_val.csv' # Replace with the path to your history_mse_val CSV file epoch_counter_train_mse_file = './epoch_counter_train_mse.csv' # Replace with the path to your epoch_counter_train_mse CSV file

msemetric=plot_mse(history_mse_train_file, history_mse_val_file, epoch_counter_train_mse_file)

"""


plot_mae_metric(history_mae_train_file, history_mae_val_file, epoch_counter_train_mae_file)

-- plotting the mae metric

""" args: returned data from test_model How to use:

history_mae_train_file = './history_mae_train.csv' # Replace with the path to your history_mae_train CSV file history_mae_val_file = './history_mae_val.csv' # Replace with the path to your history_mae_val CSV file epoch_counter_train_mae_file = './epoch_counter_train_mae.csv' # Replace with the path to your epoch_counter_train_mae CSV file

maemetric=plot_mae(history_mae_train_file, history_mae_val_file, epoch_counter_train_mae_file) """


save_model(model, MODEL_PATH)

-- Saves the model.

"""
Args:
    model: Deterministic model
    MODEL_PATH (str): output path for saving the model

How to use:
    save_model(model, MODEL_PATH)
"""


3. PREDICTION AND PLOTTING THE TRANSFORMED DATA, EACH CURVE INDIVIDUALLY

PREDICTION.py

We use this module for prediction and plotting of the models of the transformed data. Each curve is plotted separately. The module contains the following functions, which must be executed in order.

Before running this script, you must create the following folders in the directory where your Python notebook is located:

  1. ./output/predictions/train/plots -- folder for saving training plots
  2. ./output/predictions/test/plots -- folder for saving test plots
  3. ./output/predictions/val/plots -- folder for saving validation plots
  4. ./output/predictions/train/data -- folder for saving train data
  5. ./output/predictions/test/data -- folder for saving test data
  6. ./output/predictions/val/data -- folder for saving val data

prepare_output_dir(OUTPUT_PATH)

-- Takes OUTPUT_PATH as an argument and removes all files in the output directory using the os.walk method.

"""
Args:
    OUTPUT_PATH (str): path to the output folder

How to use:
    prepare_output_dir(OUTPUT_PATH)
"""


load_trained_model(MODEL_PATH, device)

-- Loads the trained model.

"""
Args:
    MODEL_PATH (str): path to the model directory
    device: torch device, CPU or CUDA

How to use:
    model = load_trained_model(MODEL_PATH, device)
"""


get_criteria()

-- Returns the criterion and the MSE metric.

"""
How to use:
    criterion, mseMetric = get_criteria()
"""


remove_padded_values_and_filter(folder_path)

-- Prepares data for plotting. It removes the padded values from the light curves and deletes the artificially added curves with plus and minus errors. If your light curves are not padded, it only deletes the additional curves.

"""
Args:
    folder_path (str): Path to the folder containing the curves; here it will be './dataset/test', './dataset/train', or './dataset/val'.

How to use:
    if __name__ == "__main__":
        folder_path = "./dataset/test"  # Change this to your dataset folder
        remove_padded_values_and_filter(folder_path)
"""


plot_function(target_x, target_y, context_x, context_y, pred_y, var, target_test_x, lcName, save = False, flagval=0, isTrainData = None, notTrainData = None):

-- Defines the plots of the light curve data and the predicted mean and variance; it should be imported separately.

"""

Args:
context_x: Array of shape BATCH_SIZE x NUM_CONTEXT that contains the x values of the context points.
context_y: Array of shape BATCH_SIZE x NUM_CONTEXT that contains the y values of the context points.
target_x: Array of shape BATCH_SIZE x NUM_TARGET that contains the x values of the target points.
target_y: Array of shape BATCH_SIZE x NUM_TARGET that contains the ground truth y values of the target points.
target_test_x: Array of shape BATCH_SIZE x 400 that contains uniformly spread points across the [-2, 2] range.
pred_y: Array of shape BATCH_SIZE x 400 that contains predictions across the [-2, 2] range.
var: Array of shape BATCH_SIZE x 400 that contains the variance of predictions at target_test_x points.
"""

load_test_data(data_path)

-- Takes data_path as an argument, creates a LighCurvesDataset, and returns a PyTorch DataLoader for the test set. The DataLoader is used to load and preprocess the test data in batches.

"""
Args:
    data_path (str): path to the test data

How to use: testLoader=load_test_data(DATA_PATH_TEST)

"""


load_train_data(data_path)

-- Takes data_path as an argument, creates a LighCurvesDataset, and returns a PyTorch DataLoader for the train set. The DataLoader is used to load and preprocess the train data in batches.

"""
Args:
    data_path (str): path to the train data

How to use: trainLoader=load_train_data(DATA_PATH_TRAIN)

"""


load_val_data(data_path)

-- Takes data_path as an argument, creates a LighCurvesDataset, and returns a PyTorch DataLoader for the validation set. The DataLoader is used to load and preprocess the validation data in batches.

"""
Args:
    data_path (str): path to the validation data

How to use: valLoader=load_val_data(DATA_PATH_VAL)

"""


plot_light_curves_from_test_set(model, testLoader, criterion, mseMetric, plot_function, device)

-- Plots the transformed light curves from the test set.

"""
Args:
    model: Deterministic model
    testLoader: loaded test data
    criterion: criterion
    mseMetric: MSE metric
    plot_function: plot function defined above
    device: torch device, CPU or CUDA

How to use:
    testMetrics = plot_light_curves_from_test_set(model, testLoader, criterion, mseMetric, plot_function, device)

"""


save_test_metrics(OUTPUT_PATH, testMetrics)

-- Saves the test metrics as a JSON file.

"""
Args:
    OUTPUT_PATH (str): path to the output folder
    testMetrics: data returned from the plotting function

How to use:
    save_test_metrics(OUTPUT_PATH, testMetrics)

"""


plot_light_curves_from_train_set(model, trainLoader, criterion, mseMetric, plot_function, device)

-- Plots the transformed light curves from the train set.

"""
Args:
    model: Deterministic model
    trainLoader: loaded train data
    criterion: criterion
    mseMetric: MSE metric
    plot_function: plot function defined above
    device: torch device, CPU or CUDA

How to use:
    trainMetrics = plot_light_curves_from_train_set(model, trainLoader, criterion, mseMetric, plot_function, device)

"""


save_train_metrics(OUTPUT_PATH, trainMetrics)

-- Saves the train metrics as a JSON file.

"""
Args:
    OUTPUT_PATH (str): path to the output folder
    trainMetrics: data returned from the plotting function

How to use:
    save_train_metrics(OUTPUT_PATH, trainMetrics)

"""


plot_light_curves_from_val_set(model, valLoader, criterion, mseMetric, plot_function, device)

-- Plots the transformed light curves from the validation set.

"""
Args:
    model: Deterministic model
    valLoader: loaded validation data
    criterion: criterion
    mseMetric: MSE metric
    plot_function: plot function defined above
    device: torch device, CPU or CUDA

How to use:
    valMetrics = plot_light_curves_from_val_set(model, valLoader, criterion, mseMetric, plot_function, device)

"""


save_val_metrics(OUTPUT_PATH, valMetrics)

-- Saves the validation metrics as a JSON file.

"""
Args:
    OUTPUT_PATH (str): path to the output folder
    valMetrics: data returned from the plotting function

How to use:
    save_val_metrics(OUTPUT_PATH, valMetrics)

"""


4. PREDICTION AND PLOTTING THE TRANSFORMED DATA, IN ONE PDF FILE

PREDICTION_onePDF.py

We use this module for prediction and plotting of the models of the transformed data. All curves are plotted in one PDF file. The module contains the following functions, which must be executed in order.

Before running this script, you must create the following folders in the directory where your Python notebook is located:

  1. ./output/predictions/train/plots -- folder for saving training plots
  2. ./output/predictions/test/plots -- folder for saving test plots
  3. ./output/predictions/val/plots -- folder for saving validation plots
  4. ./output/predictions/train/data -- folder for saving train data
  5. ./output/predictions/test/data -- folder for saving test data
  6. ./output/predictions/val/data -- folder for saving val data

clear_output_dir(output_path)

-- Removes all files in the specified output directory.

"""
Args:
    output_path (str): The path to the output directory.

How to use:
    clear_output_dir(OUTPUT_PATH)
"""

load_model(model_path, device)

-- Loads a trained model from disk and moves it to the specified device.

"""
Args:
    model_path (str): The path to the saved model.
    device (str or torch.device): The device to load the model onto, CPU or CUDA.

How to use:
    model = load_model(MODEL_PATH, device)
"""

get_criteria()

-- Returns the criterion and the MSE metric.

"""
How to use:
    criterion, mseMetric = get_criteria()
"""


remove_padded_values_and_filter(folder_path)

-- Prepares data for plotting. It removes the padded values from the light curves and deletes the artificially added curves with plus and minus errors. If your light curves are not padded, it only deletes the additional curves.

"""
Args:
    folder_path (str): Path to the folder containing the curves; here it will be './dataset/test', './dataset/train', or './dataset/val'.

How to use:
    if __name__ == "__main__":
        folder_path = "./dataset/test"  # Change this to your dataset folder
        remove_padded_values_and_filter(folder_path)
"""


plot_function(target_x, target_y, context_x, context_y, pred_y, var, target_test_x, lcName, save = False, flagval=0, isTrainData = None, notTrainData = None):

-- Defines the plots of the light curve data and the predicted mean and variance.

"""

Args:
context_x: Array of shape BATCH_SIZE x NUM_CONTEXT that contains the x values of the context points.
context_y: Array of shape BATCH_SIZE x NUM_CONTEXT that contains the y values of the context points.
target_x: Array of shape BATCH_SIZE x NUM_TARGET that contains the x values of the target points.
target_y: Array of shape BATCH_SIZE x NUM_TARGET that contains the ground truth y values of the target points.
target_test_x: Array of shape BATCH_SIZE x 400 that contains uniformly spread points across the [-2, 2] range.
pred_y: Array of shape BATCH_SIZE x 400 that contains predictions across the [-2, 2] range.
var: Array of shape BATCH_SIZE x 400 that contains the variance of predictions at target_test_x points.
"""

load_test_data(data_path)

-- Takes data_path as an argument, creates a LighCurvesDataset, and returns a PyTorch DataLoader for the test set. The DataLoader is used to load and preprocess the test data in batches.

"""
Args:
    data_path (str): path to the test data

How to use: testLoader=load_test_data(DATA_PATH_TEST)

"""


load_train_data(data_path)

-- Takes data_path as an argument, creates a LighCurvesDataset, and returns a PyTorch DataLoader for the train set. The DataLoader is used to load and preprocess the train data in batches.

"""
Args:
    data_path (str): path to the train data

How to use: trainLoader=load_train_data(DATA_PATH_TRAIN)

"""


load_val_data(data_path)

-- Takes data_path as an argument, creates a LighCurvesDataset, and returns a PyTorch DataLoader for the validation set. The DataLoader is used to load and preprocess the validation data in batches.

"""
Args:
    data_path (str): path to the validation data

How to use: valLoader=load_val_data(DATA_PATH_VAL)

"""


plot_test_light_curves(model, testLoader, criterion, mseMetric, plot_function, out_pdf_test, device)

-- Plots the test set in the [-2,2] range into one PDF file.

"""
Args:
    model: model
    testLoader: test set
    criterion: criterion
    mseMetric: MSE metric
    plot_function: defined above
    out_pdf_test: output PDF file for the test plots
    device: torch device, CPU or CUDA

How to use:
    testMetrics = plot_test_light_curves(model, testLoader, criterion, mseMetric, plot_function, out_pdf_test, device)
"""


save_test_metrics(OUTPUT_PATH, testMetrics)

-- Saves the test metrics as a JSON file.

"""
Args:
    OUTPUT_PATH (str): path to the output folder
    testMetrics: data returned from the plotting function

How to use:
    save_test_metrics(OUTPUT_PATH, testMetrics)

"""


plot_train_light_curves(model, trainLoader, criterion, mseMetric, plot_function, device)

-- Plots the transformed light curves from the train set.

"""
Args:
    model: Deterministic model
    trainLoader: loaded train data
    criterion: criterion
    mseMetric: MSE metric
    plot_function: plot function defined above
    device: torch device, CPU or CUDA

How to use:
    trainMetrics = plot_train_light_curves(model, trainLoader, criterion, mseMetric, plot_function, device)

"""


save_train_metrics(OUTPUT_PATH, trainMetrics)

-- Saves the train metrics as a JSON file.

"""
Args:
    OUTPUT_PATH (str): path to the output folder
    trainMetrics: data returned from the plotting function

How to use:
    save_train_metrics(OUTPUT_PATH, trainMetrics)

"""


plot_val_light_curves(model, valLoader, criterion, mseMetric, plot_function, device)

-- Plots the transformed light curves from the validation set.

"""
Args:
    model: Deterministic model
    valLoader: loaded validation data
    criterion: criterion
    mseMetric: MSE metric
    plot_function: plot function defined above
    device: torch device, CPU or CUDA

How to use:
    valMetrics = plot_val_light_curves(model, valLoader, criterion, mseMetric, plot_function, device)

"""


save_val_metrics(OUTPUT_PATH, valMetrics)

-- Saves the validation metrics as a JSON file.

"""
Args:
    OUTPUT_PATH (str): path to the output folder
    valMetrics: data returned from the plotting function

How to use:
    save_val_metrics(OUTPUT_PATH, valMetrics)

"""


5. PREDICTION AND PLOTTING THE DATA IN ORIGINAL DATA RANGE, EACH CURVE INDIVIDUALLY

PREDICTION_Original_mjd.py

We use this module to predict and plot the model in the original range of data. All curves are plotted individually. This module contains the following functions that must be executed in order.

Before running this script, you must create the following folders in the directory where your Python notebook is located:

  1. ./output/predictions/train/plots -- folder for saving training plots
  2. ./output/predictions/test/plots -- folder for saving test plots
  3. ./output/predictions/val/plots -- folder for saving validation plots
  4. ./output/predictions/train/data -- folder for saving train data
  5. ./output/predictions/test/data -- folder for saving test data
  6. ./output/predictions/val/data -- folder for saving val data

prepare_output_dir(OUTPUT_PATH)

-- Takes OUTPUT_PATH as an argument and removes all files in the output directory using the os.walk method.

"""
Args:
    OUTPUT_PATH (str): path to the output folder

How to use:
    prepare_output_dir(OUTPUT_PATH)
"""


load_trained_model(MODEL_PATH, device)

-- Loads the trained model.

"""
Args:
    MODEL_PATH (str): path to the model directory
    device: torch device, CPU or CUDA

How to use:
    model = load_trained_model(MODEL_PATH, device)
"""


get_criteria()

-- Returns the criterion and the MSE metric.

"""
How to use:
    criterion, mseMetric = get_criteria()
"""


remove_padded_values_and_filter(folder_path)

-- Prepares data for plotting. It removes the padded values from the light curves and deletes the artificially added curves with plus and minus errors. If your light curves are not padded, it only deletes the additional curves.

"""
Args:
    folder_path (str): Path to the folder containing the curves; here it will be './dataset/test', './dataset/train', or './dataset/val'.

How to use:
    if __name__ == "__main__":
        folder_path = "./dataset/test"  # Change this to your dataset folder
        remove_padded_values_and_filter(folder_path)
"""


load_trcoeff()

-- Loads the transformation coefficients from the pickle file.

"""
How to use:
    tr = load_trcoeff()
"""


plot_function2(tr,target_x, target_y, context_x, context_y, yerr1, pred_y, var, target_test_x, lcName, save = False, isTrainData = None, flagval = 0, notTrainData = None)

-- Function for plotting the light curves.

"""
Args:
    context_x: Array of shape BATCH_SIZE x NUM_CONTEXT that contains the x values of the context points.
    context_y: Array of shape BATCH_SIZE x NUM_CONTEXT that contains the y values of the context points.
    target_x: Array of shape BATCH_SIZE x NUM_TARGET that contains the x values of the target points.
    target_y: Array of shape BATCH_SIZE x NUM_TARGET that contains the ground truth y values of the target points.
    target_test_x: Array of shape BATCH_SIZE x 400 that contains uniformly spread points across the [-2, 2] range.
    yerr1: Array of shape BATCH_SIZE x NUM_measurement_error that contains the measurement errors.
    pred_y: Array of shape BATCH_SIZE x 400 that contains predictions across the [-2, 2] range.
    var: Array of shape BATCH_SIZE x 400 that contains the variance of predictions at target_test_x points.
    tr: array of data in pickle format needed to back-transform the data from [-2,2] x [-2,2] to MJD x Mag.
"""


load_test_data(data_path)

-- Takes data_path as an argument, creates a LighCurvesDataset, and returns a PyTorch DataLoader for the test set. The DataLoader is used to load and preprocess the test data in batches.

"""
Args:
    data_path (str): path to the test data

How to use: testLoader=load_test_data(DATA_PATH_TEST)

"""


load_train_data(data_path)

-- Takes data_path as an argument, creates a LighCurvesDataset, and returns a PyTorch DataLoader for the train set. The DataLoader is used to load and preprocess the train data in batches.

"""
Args:
    data_path (str): path to the train data

How to use: trainLoader=load_train_data(DATA_PATH_TRAIN)

"""


load_val_data(data_path)

-- Takes data_path as an argument, creates a LighCurvesDataset, and returns a PyTorch DataLoader for the validation set. The DataLoader is used to load and preprocess the validation data in batches.

"""
Args:
    data_path (str): path to the validation data

How to use: valLoader=load_val_data(DATA_PATH_VAL)

"""


plot_test_data(model, testLoader, criterion, mseMetric, plot_function, device, tr)

-- Plots the light curves from the test set in the original MJD range.

"""
Args:
    model: Deterministic model
    testLoader: loaded test data
    criterion: criterion
    mseMetric: MSE metric
    plot_function: plot function defined above
    device: torch device, CPU or CUDA
    tr: trcoeff from the pickle file

How to use:
    testMetrics = plot_test_data(model, testLoader, criterion, mseMetric, plot_function2, device, tr)

"""


save_test_metrics(OUTPUT_PATH, testMetrics)

-- Saves the test metrics as a JSON file.

"""
Args:
    OUTPUT_PATH (str): path to the output folder
    testMetrics: data returned from the plotting function

How to use:
    save_test_metrics(OUTPUT_PATH, testMetrics)

"""


plot_train_light_curves(trainLoader, model, criterion, mseMetric, plot_function, device, tr)

-- Plots the light curves from the train set in the original MJD range.

"""
Args:
    model: Deterministic model
    trainLoader: loaded train data
    criterion: criterion
    mseMetric: MSE metric
    plot_function: plot function defined above
    device: torch device, CPU or CUDA
    tr: trcoeff from the pickle file

How to use:
    trainMetrics = plot_train_light_curves(trainLoader, model, criterion, mseMetric, plot_function2, device, tr)

"""


save_train_metrics(OUTPUT_PATH, trainMetrics)

-- Saves the train metrics as a JSON file.

"""
Args:
    OUTPUT_PATH (str): path to the output folder
    trainMetrics: data returned from the plotting function

How to use:
    save_train_metrics(OUTPUT_PATH, trainMetrics)

"""


plot_val_curves(model, valLoader, criterion, mseMetric, plot_function, device, tr)

-- Plots the light curves from the validation set in the original MJD range.

"""
Args:
    model: Deterministic model
    valLoader: loaded validation data
    criterion: criterion
    mseMetric: MSE metric
    plot_function: plot function defined above
    device: torch device, CPU or CUDA
    tr: trcoeff from the pickle file

How to use:
    valMetrics = plot_val_curves(model, valLoader, criterion, mseMetric, plot_function2, device, tr)

"""


save_val_metrics(OUTPUT_PATH, valMetrics)

-- Saves the validation metrics as a JSON file.

"""
Args:
    OUTPUT_PATH (str): path to the output folder
    valMetrics: data returned from the plotting function

How to use:
    save_val_metrics(OUTPUT_PATH, valMetrics)

"""


6. PREDICTION AND PLOTTING THE DATA IN ORIGINAL DATA RANGE, IN ONE PDF FILE

PREDICTION_onePDF_original_mjd.py

We use this module to predict and plot the model in the original range of data. All curves are plotted in one PDF file. This module contains the following functions that must be executed in order.

Before running this script, you must create the following folders in the directory where your Python notebook is located:

  1. ./output/predictions/train/plots -- folder for saving training plots
  2. ./output/predictions/test/plots -- folder for saving test plots
  3. ./output/predictions/val/plots -- folder for saving validation plots
  4. ./output/predictions/train/data -- folder for saving train data
  5. ./output/predictions/test/data -- folder for saving test data
  6. ./output/predictions/val/data -- folder for saving val data

clear_output_dir(output_path)

-- Removes all files in the specified output directory.

"""
Args:
    output_path (str): The path to the output directory.

How to use:
    clear_output_dir(OUTPUT_PATH)
"""

load_model(model_path, device)

-- Loads a trained model from disk and moves it to the specified device.

"""
Args:
    model_path (str): The path to the saved model.
    device (str or torch.device): The device to load the model onto, CPU or CUDA.

How to use:
    model = load_model(MODEL_PATH, device)
"""

get_criteria()

-- Returns the criterion and the MSE metric.

"""
How to use:
    criterion, mseMetric = get_criteria()
"""


remove_padded_values_and_filter(folder_path)

-- Prepares data for plotting. It removes the padded values from the light curves and deletes the artificially added curves with plus and minus errors. If your light curves are not padded, it only deletes the additional curves.

"""
Args:
    folder_path (str): Path to the folder containing the curves; here it will be './dataset/test', './dataset/train', or './dataset/val'.

How to use:
    if __name__ == "__main__":
        folder_path = "./dataset/test"  # Change this to your dataset folder
        remove_padded_values_and_filter(folder_path)
"""


load_trcoeff()

-- Loads the transformation coefficients from the pickle file.

"""
How to use:
    tr = load_trcoeff()
"""


plot_function2(tr,target_x, target_y, context_x, context_y, yerr1, pred_y, var, target_test_x, lcName, save = False, isTrainData = None, flagval = 0, notTrainData = None)

-- Function for plotting the light curves. It needs to be imported separately.

"""
Args:
    context_x: Array of shape BATCH_SIZE x NUM_CONTEXT that contains the x values of the context points.
    context_y: Array of shape BATCH_SIZE x NUM_CONTEXT that contains the y values of the context points.
    target_x: Array of shape BATCH_SIZE x NUM_TARGET that contains the x values of the target points.
    target_y: Array of shape BATCH_SIZE x NUM_TARGET that contains the ground truth y values of the target points.
    target_test_x: Array of shape BATCH_SIZE x 400 that contains uniformly spread points across the [-2, 2] range.
    yerr1: Array of shape BATCH_SIZE x NUM_measurement_error that contains the measurement errors.
    pred_y: Array of shape BATCH_SIZE x 400 that contains predictions across the [-2, 2] range.
    var: Array of shape BATCH_SIZE x 400 that contains the variance of predictions at target_test_x points.
    tr: array of data in pickle format needed to back-transform the data from [-2,2] x [-2,2] to MJD x Mag.
"""


load_test_data(data_path)

-- Takes data_path as an argument, creates a LighCurvesDataset, and returns a PyTorch DataLoader for the test set. The DataLoader is used to load and preprocess the test data in batches.

"""
Args:
    data_path (str): path to the test data

How to use: testLoader=load_test_data(DATA_PATH_TEST)

"""


load_train_data(data_path)

-- Takes data_path as an argument, creates a LighCurvesDataset, and returns a PyTorch DataLoader for the train set. The DataLoader is used to load and preprocess the train data in batches.

"""
Args:
    data_path (str): path to the train data

How to use: trainLoader=load_train_data(DATA_PATH_TRAIN)

"""


load_val_data(data_path)

-- Takes data_path as an argument, creates a LighCurvesDataset, and returns a PyTorch DataLoader for the validation set. The DataLoader is used to load and preprocess the validation data in batches.

"""
Args:
    data_path (str): path to the validation data

How to use: valLoader=load_val_data(DATA_PATH_VAL)

"""


plot_test_light_curves(model, testLoader, criterion, mseMetric, plot_function2, device, tr)

-- Plots the test set in the original range.

"""
Args:
    model: model
    testLoader: test set
    criterion: criterion
    mseMetric: MSE metric
    plot_function2: defined above
    device: torch device, CPU or CUDA
    tr: trcoeff from the pickle file

How to use:
    testMetrics = plot_test_light_curves(model, testLoader, criterion, mseMetric, plot_function2, device, tr)
"""


save_test_metrics(OUTPUT_PATH, testMetrics)

-- Saves the test metrics as a JSON file.

"""
Args:
    OUTPUT_PATH (str): path to the output folder
    testMetrics: data returned from the plotting function

How to use:
    save_test_metrics(OUTPUT_PATH, testMetrics)

"""


plot_train_light_curves(model, trainLoader, criterion, mseMetric, plot_function2, device, tr)

-- Plots the light curves from the train set in the original MJD range.

"""
Args:
    model: Deterministic model
    trainLoader: loaded train data
    criterion: criterion
    mseMetric: MSE metric
    plot_function: plot function defined above
    device: torch device, CPU or CUDA
    tr: trcoeff from the pickle file

How to use:
    trainMetrics = plot_train_light_curves(model, trainLoader, criterion, mseMetric, plot_function2, device, tr)

"""


save_train_metrics(OUTPUT_PATH, trainMetrics)

-- Saves the train metrics as a JSON file.

"""
Args:
    OUTPUT_PATH (str): path to the output folder
    trainMetrics: data returned from the plotting function

How to use:
    save_train_metrics(OUTPUT_PATH, trainMetrics)

"""


plot_val_light_curves(model, valLoader, criterion, mseMetric, plot_function2, device, tr)

-- Plots the light curves from the validation set in the original MJD range.

"""
Args:
    model: Deterministic model
    valLoader: loaded validation data
    criterion: criterion
    mseMetric: MSE metric
    plot_function: plot function defined above
    device: torch device, CPU or CUDA
    tr: trcoeff from the pickle file

How to use:
    valMetrics = plot_val_light_curves(model, valLoader, criterion, mseMetric, plot_function2, device, tr)

"""


save_val_metrics(OUTPUT_PATH, valMetrics)

-- Saves the validation metrics as a JSON file.

"""
Args:
    OUTPUT_PATH (str): path to the output folder
    valMetrics: data returned from the plotting function

How to use:
    save_val_metrics(OUTPUT_PATH, valMetrics)

"""


THE END
