maica.ml¶
Base Classes¶
The maica.ml.base
module includes the base classes of the machine learning algorithms in MAICA.
It provides a wrapper class for Scikit-learn models and an abstract class for PyTorch models.
- class Model(alg_type: str, alg_name: str)¶
An abstract class of the machine learning algorithms in MAICA. All machine learning algorithms in MAICA should inherit this class.
- class SKLearnModel(alg_name: str, **kwargs)¶
A wrapper class of the machine learning algorithms in the Scikit-learn library. This class provides a generic interface to the fit() and predict() functions of the Scikit-learn algorithms.
- fit(inputs: numpy.ndarray, targets: numpy.ndarray)¶
Fit model parameters for the given input and target data.
- Parameters
inputs – (numpy.ndarray) The input data of the training dataset.
targets – (numpy.ndarray) The target data of the training dataset.
- Returns
Trained model.
- predict(inputs: numpy.ndarray)¶
Predict target values of the given input data.
- Parameters
inputs – (numpy.ndarray) The input data of the dataset.
- Returns
Predicted values for the given inputs.
- save(path_model_file: str)¶
Save model parameters into a file at path_model_file.
- Parameters
path_model_file – (str) The path of the model file.
- load(path_model_file: str)¶
Load model parameters from the file at path_model_file.
- Parameters
path_model_file – (str) The path of the model file.
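A minimal usage sketch of SKLearnModel based only on the interface above; the algorithm name string 'gbtr' and the model file name are illustrative assumptions (the valid algorithm names are defined in maica.core.env):

    import numpy as np
    from maica.ml.base import SKLearnModel

    # Toy regression data: 100 samples with 4 features each.
    inputs = np.random.rand(100, 4)
    targets = np.random.rand(100)

    # 'gbtr' is an assumed algorithm name; valid names are defined in maica.core.env.
    model = SKLearnModel(alg_name='gbtr')

    model.fit(inputs, targets)              # fit model parameters on the arrays
    preds = model.predict(inputs)           # predicted values for the inputs
    model.save('sklearn_model.joblib')      # store the trained parameters
    model.load('sklearn_model.joblib')      # restore them later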
- class PyTorchModel(alg_name: str)¶
An abstract class of the machine learning algorithms implemented with the PyTorch library. In MAICA, all PyTorch-based algorithms should inherit this class.
- gpu()¶
Move model parameters from CPU to GPU.
- Returns
Model object (self) on the GPU.
- save(path_model_file: str)¶
Save model parameters into a file at path_model_file.
- Parameters
path_model_file – (str) The path of the model file.
- load(path_model_file: str)¶
Load model parameters from the file at path_model_file.
- Parameters
path_model_file – (str) The path of the model file.
- training: bool¶
Machine Learning Utilities¶
The maica.ml.util
module provides essential functions for training configuration and model reuse.
Most deep learning algorithms in MAICA are based on this module.
- get_batch_sizes(n_data: int, train_setting: int)¶
Return candidate batch sizes for the stochastic gradient descent method according to the number of data and the training setting.
- Parameters
n_data – (int) The number of data points that will be used for training.
train_setting – (int) A hyperparameter level determining the training settings.
- Returns
A list of candidate batch sizes.
- get_init_lrs(hparam_setting: int)¶
Return initial learning rates according to the given hyperparameter setting level.
- Parameters
hparam_setting – (int) A setting level that determines the hyperparameter optimization of the model.
- Returns
A list of candidate learning rates.
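A short sketch of how the two helpers above might be combined into a simple grid of training configurations; the integer levels passed as train_setting and hparam_setting are illustrative assumptions:

    from maica.ml.util import get_batch_sizes, get_init_lrs

    # Candidate batch sizes for 1,200 training data points at setting level 1 (assumed level).
    batch_sizes = get_batch_sizes(n_data=1200, train_setting=1)

    # Candidate initial learning rates at hyperparameter setting level 1 (assumed level).
    init_lrs = get_init_lrs(hparam_setting=1)

    # Enumerate the candidate (batch size, learning rate) configurations.
    configs = [(bs, lr) for bs in batch_sizes for lr in init_lrs]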
- get_data_loader(*data: object, batch_size: int = 8, shuffle: bool = False)¶
Generate a data loader object for the given dataset. If the given data is numpy.ndarray, it returns a torch.DataLoader object. If the data is maica.data.GraphDataset, it returns a torch_geometric.DataLoader object to iterate the graph-structured data.
- Parameters
data – (object) The dataset to be iterated by the data loader.
batch_size – (int, optional) The batch size of the data loader (default = 8).
shuffle – (bool, optional) An option to randomly shuffle the data during the iterations of the data loader (default = False).
- Returns
Data loader object of torch.DataLoader or torch_geometric.DataLoader.
- get_optimizer(model_params: torch._C.Generator, gd_name: str, init_lr: float = 0.001, l2_reg: float = 1e-06)¶
Return a gradient descent optimizer to fit model parameters.
- Parameters
model_params – (torch.Generator) Model parameters to be trained by the generated optimizer.
gd_name – (str) A name of the gradient descent method to fit model parameters (defined in maica.core.env).
init_lr – (float, optional) An initial learning rate of the gradient descent optimizer.
l2_reg – (float, optional) A coefficient of the L2 regularization in model parameters.
- Returns
A gradient descent optimizer to fit model parameters.
- get_loss_func(loss_func: str)¶
Return a loss function to evaluate the model performance.
- Parameters
loss_func – (str) A name of the loss function to evaluate the model performance (defined in maica.core.env).
- Returns
A loss function object to evaluate the model performance.
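A short sketch combining get_optimizer and get_loss_func with the FCNN model documented below; the name strings 'adam' and 'mae' follow the defaults used elsewhere on this page, and the full set of valid names is defined in maica.core.env:

    from maica.ml.nn import FCNN
    from maica.ml.util import get_optimizer, get_loss_func

    model = FCNN(dim_in=8, dim_out=1)

    # Gradient descent optimizer over the model parameters.
    optimizer = get_optimizer(model.parameters(), gd_name='adam', init_lr=1e-3, l2_reg=1e-6)

    # Loss function used to evaluate the model performance.
    criterion = get_loss_func('mae')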
- get_model(alg_name: str, **kwargs)¶
Get a machine learning model for the given algorithm name and model hyperparameters. The names of the algorithms are defined in maica.core.env.
- Parameters
alg_name – (str) A name of the machine learning algorithm (defined in maica.core.env).
kwargs – (optional) A dictionary containing model hyperparameters.
- Returns
A machine learning model.
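A minimal sketch of get_model; the algorithm name string 'fcnn' is an illustrative assumption, and the keyword arguments are forwarded to the model constructor as hyperparameters:

    from maica.ml.util import get_model

    # 'fcnn' is an assumed algorithm name; valid names are defined in maica.core.env.
    # dim_in and dim_out are passed through **kwargs as model hyperparameters.
    model = get_model(alg_name='fcnn', dim_in=8, dim_out=1)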
- save_eval_results(task_name: str, model: maica.ml.base.Model, dataset_test: maica.data.base.Dataset, preds: numpy.ndarray)¶
Save the model parameters and the prediction results as a model file and an Excel file.
- Parameters
task_name – (str) A name of your task.
model – (ml.base.Model) The model that will be evaluated.
dataset_test – (data.base.Dataset) A dataset used for the evaluation.
preds – (numpy.ndarray) Prediction results of the model for the dataset.
- save_interpretation(model: maica.ml.base.Model, file_name: str)¶
Save interpretable information of the machine learning algorithms. For non-interpretable algorithms, it raises AssertionError. The following algorithms support this function:
- Decision Tree Regression (ALG_DCTR)
- Symbolic Regression (ALG_SYMR)
- Gradient Boosting Tree Regression (ALG_GBTR)
- Parameters
model – (ml.base.Model) A machine learning algorithm to generate interpretation about the prediction.
file_name – (str) The path of the file to store the generated interpretation.
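A rough sketch of save_interpretation with one of the interpretable algorithms listed above; importing ALG_DCTR from maica.core.env and the output file name are assumptions:

    from maica.core.env import ALG_DCTR          # assumed import path of the constant
    from maica.ml.util import get_model, save_interpretation

    model = get_model(alg_name=ALG_DCTR)
    # ... fit the model on a training dataset ...

    # Store the interpretation generated by the decision tree regressor.
    save_interpretation(model, 'dctr_interpretation')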
Neural Networks¶
The maica.ml.nn
module provides implementations of essential feedforward neural networks.
The algorithms in this module are used to predict target values from the feature vectors and the chemical formulas.
- class FCNN(dim_in: int, dim_out: int)¶
Fully-connected neural network with three hidden layers and one output layer. For each hidden layer, batch normalization is applied to accelerate the model training.
- forward(x: torch.Tensor)¶
Predict target values for the given data x.
- Parameters
x – (torch.tensor) A tensor containing input data of the model.
- Returns
A tensor containing predicted values.
- fit(data_loader: torch.utils.data.dataloader.DataLoader, optimizer: torch.optim.optimizer.Optimizer, criterion: object)¶
Fit the model parameters for the given dataset using the data loader, the optimizer, and the loss function. It iterates the parameter optimization once for the entire dataset.
- Parameters
data_loader – (torch.utils.data.DataLoader) A data loader to sample the data from the training dataset.
optimizer – (torch.optim.Optimizer) An optimizer to fit the model parameters.
criterion – (object) A loss function to evaluate the prediction performance of the model.
- Returns
Training loss.
- predict(data_loader: object)¶
Predict target values for the given dataset in the data loader.
- Parameters
data_loader – (torch.utils.data.DataLoader) A data loader to sample the data from the dataset.
- Returns
A NumPy array containing the predicted values.
- training: bool¶
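A training sketch for FCNN that ties together the utilities from maica.ml.util; as documented above, fit() performs one optimization pass over the entire dataset, so the epoch loop is written in user code. The data, dimensions, and epoch count are illustrative assumptions:

    import numpy as np
    from maica.ml.nn import FCNN
    from maica.ml.util import get_data_loader, get_optimizer, get_loss_func

    x = np.random.rand(256, 8)
    y = np.random.rand(256, 1)

    model = FCNN(dim_in=8, dim_out=1)
    loader = get_data_loader(x, y, batch_size=32, shuffle=True)
    optimizer = get_optimizer(model.parameters(), gd_name='adam', init_lr=1e-3, l2_reg=1e-6)
    criterion = get_loss_func('mae')

    # One call to fit() optimizes the parameters once over the entire dataset.
    for epoch in range(100):
        train_loss = model.fit(loader, optimizer, criterion)

    preds = model.predict(loader)   # NumPy array of predicted values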
- class Autoencoder(dim_in: int, dim_latent: int)¶
A neural network to generate latent embeddings of input data. It is trained to minimize the difference between the input and its output rather than to be trained based on the target values. The training problem of the autoencoder can be defined by:
\[\theta^* = \operatorname*{argmin}_{\theta} ||\mathbf{x} - \mathbf{x}^{'}||_2^2,\]
where \(\mathbf{x}\) is the input data, and \(\mathbf{x}^{'}\) is the predicted value of the autoencoder.
- forward(x: torch.Tensor)¶
Perform encoding and decoding for the given data x.
- Parameters
x – (torch.tensor) The input data of the model.
- Returns
The decoded input.
- enc(x: torch.Tensor)¶
Generate the latent embedding of the input data x. This is called encoding in autoencoders.
- Parameters
x – (torch.tensor) The input data of the model.
- Returns
Latent embedding of the input data x.
- dec(z: torch.Tensor)¶
Restore the input data from its latent embedding. This is called decoding in autoencoders.
- Parameters
z – (torch.tensor) The latent embedding.
- Returns
Restored input data from the given latent embedding z.
- fit(data_loader: torch.utils.data.dataloader.DataLoader, optimizer: torch.optim.optimizer.Optimizer)¶
Fit the model parameters for the given dataset using the data loader and the optimizer. It iterates the parameter optimization once for the entire dataset.
- Parameters
data_loader – (torch.utils.data.DataLoader) A data loader to sample the data from the training dataset.
optimizer – (torch.optim.Optimizer) An optimizer to fit the model parameters.
- Returns
Training loss.
- predict(data_loader: object)¶
Predict target values for the given dataset in the data loader.
- Parameters
data_loader – (torch.utils.data.DataLoader) A data loader to sample the data from the dataset.
- Returns
A NumPy array containing the predicted values.
- training: bool¶
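A sketch of training the Autoencoder and extracting latent embeddings with enc(); note that fit() takes no target data or loss function because the autoencoder reconstructs its own input. Passing the raw data to enc() as a float tensor is an assumption about the expected input format:

    import numpy as np
    import torch
    from maica.ml.nn import Autoencoder
    from maica.ml.util import get_data_loader, get_optimizer

    x = np.random.rand(256, 16)

    model = Autoencoder(dim_in=16, dim_latent=4)
    loader = get_data_loader(x, batch_size=32, shuffle=True)
    optimizer = get_optimizer(model.parameters(), gd_name='adam', init_lr=1e-3)

    # The autoencoder is trained to minimize the reconstruction error of its input.
    for epoch in range(100):
        train_loss = model.fit(loader, optimizer)

    # Latent embeddings of the input data (assumed to accept a float tensor).
    z = model.enc(torch.tensor(x, dtype=torch.float))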
Graph Neural Networks¶
The maica.ml.gnn
module includes various implementations of graph neural networks based on the torch_geometric library.
It provides pre-defined graph neural networks for the structure-based predictions.
- class GNN(dim_out: int, alg_name: str)¶
Abstract class of the graph neural networks. It defines two functions for parameter optimization and prediction.
- fit(data_loader: torch_geometric.data.dataloader.DataLoader, optimizer: torch.optim.optimizer.Optimizer, criterion: object)¶
Fit the model parameters for the given dataset using the data loader, the optimizer, and the loss function. It iterates the parameter optimization once for the entire dataset.
- Parameters
data_loader – (torch_geometric.data.DataLoader) A data loader to sample the data from the training dataset.
optimizer – (torch.optim.Optimizer) An optimizer to fit the model parameters.
criterion – (object) A loss function to evaluate the prediction performance of the model.
- Returns
Training loss.
- predict(data_loader: torch_geometric.data.dataloader.DataLoader)¶
Predict target values for the given dataset in the data loader.
- Parameters
data_loader – (torch_geometric.data.DataLoader) A data loader to sample the data from the dataset.
- Returns
A NumPy array containing the predicted values.
- training: bool¶
- class GCN(n_node_feats: int, dim_out: int, n_graphs: int = 1, readout: str = 'mean')¶
Graph convolutional network (GCN) from the “Semi-supervised Classification with Graph Convolutional Networks” paper.
- forward(g: torch_geometric.data.batch.Batch)¶
Predict target values for the given Batch object.
- Parameters
g – (torch_geometric.data.Batch) An input Batch object of the torch_geometric.data.Data objects.
- Returns
Target values.
- training: bool¶
- class GAT(n_node_feats: int, dim_out: int, n_graphs: int = 1, readout: str = 'mean')¶
Graph attention network (GAT) from the “Graph Attention Networks” paper.
- forward(g: torch_geometric.data.batch.Batch)¶
Predict target values for the given Batch object.
- Parameters
g – (torch_geometric.data.Batch) An input Batch object of the torch_geometric.data.Data objects.
- Returns
Target values.
- training: bool¶
- class GIN(n_node_feats: int, dim_out: int, n_graphs: int = 1, readout: str = 'mean')¶
Graph isomorphism network (GIN) from the “How Powerful are Graph Neural Networks?” paper.
- forward(g: torch_geometric.data.batch.Batch)¶
Predict target values for the given Batch object.
- Parameters
g – (torch_geometric.data.Batch) An input Batch object of the torch_geometric.data.Data objects.
- Returns
Target values.
- training: bool¶
- class CGCNN(n_node_feats: int, n_edge_feats: int, dim_out: int, n_graphs: int = 1, readout: str = 'mean')¶
Crystal graph convolutional neural network (CGCNN) from the “Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties” paper.
- forward(g: torch_geometric.data.batch.Batch)¶
Predict target values for the given Batch object.
- Parameters
g – (torch_geometric.data.Batch) An input Batch object of the torch_geometric.data.Data objects.
- Returns
Target values.
- training: bool¶
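A rough training sketch shared by the graph neural networks above, shown with GCN; graph_dataset stands in for a maica.data.GraphDataset built elsewhere (see the maica.data documentation), and the node-feature size of 92 is an illustrative assumption:

    from maica.ml.gnn import GCN
    from maica.ml.util import get_data_loader, get_optimizer, get_loss_func

    # graph_dataset is a placeholder for a previously built maica.data.GraphDataset.
    model = GCN(n_node_feats=92, dim_out=1, n_graphs=1, readout='mean')

    # For GraphDataset data, a torch_geometric DataLoader is returned.
    loader = get_data_loader(graph_dataset, batch_size=32, shuffle=True)
    optimizer = get_optimizer(model.parameters(), gd_name='adam', init_lr=1e-3)
    criterion = get_loss_func('mae')

    for epoch in range(300):
        train_loss = model.fit(loader, optimizer, criterion)

    preds = model.predict(loader)   # NumPy array of predicted values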
Transfer Learning¶
This module includes several functions to perform transfer learning using neural networks.
Popular transfer learning methods called retrain head and fine tuning are provided as built-in functions.
- tl_retrain_head(path_source_model, model_target: maica.ml.base.Model, dataset_target: maica.data.base.Dataset, gd_name: str = 'adam', init_lr: float = 0.001, l2_reg: float = 1e-06, loss_func: str = 'mae', max_epoch: int = 300)¶
Perform transfer learning based on the feature extractor of the source model. The feature extraction layers of the source model are frozen during training on the target dataset. Only the prediction layer, called the head, is trained on the target dataset. This transfer learning method is called ‘retrain head’.
- Parameters
path_source_model – (str) The path of the model file of the source model.
model_target – (ml.base.Model) Target model that will be used for transfer learning.
dataset_target – (data.base.Dataset) Target dataset for transfer learning.
gd_name – (str, optional) A name of the optimizer used to train the target model (default = 'adam').
init_lr – (float, optional) An initial learning rate of the gradient descent optimizer (default = 1e-3).
l2_reg – (float, optional) A coefficient of the L2 regularization in model parameters (default = 1e-6).
loss_func – (str, optional) A name of the loss function to evaluate the model performance (default = 'mae', mean absolute error).
max_epoch – (int, optional) The maximum number of training epochs of the model parameter optimization (default = 300).
- Returns
Trained model.
- tl_fine_tuning(path_source_model, model_target: maica.ml.base.Model, dataset_target: maica.data.base.Dataset, gd_name: str = 'adam', init_lr: float = 1e-06, l2_reg: float = 1e-06, loss_func: str = 'mae', max_epoch: int = 300)¶
Perform transfer learning based on a pre-trained source model. The target model is initialized by the model parameters of the source model. After the initialization, the target model is trained on the target dataset with a small learning rate. This transfer learning method is called ‘fine tuning’.
- Parameters
path_source_model – (str) The path of the model file of the source model.
model_target – (ml.base.Model) Target model that will be used for transfer learning.
dataset_target – (data.base.Dataset) Target dataset for transfer learning.
gd_name – (str, optional) A name of the optimizer used to train the target model (default = 'adam').
init_lr – (float, optional) An initial learning rate of the gradient descent optimizer (default = 1e-6).
l2_reg – (float, optional) A coefficient of the L2 regularization in model parameters (default = 1e-6).
loss_func – (str, optional) A name of the loss function to evaluate the model performance (default = 'mae', mean absolute error).
max_epoch – (int, optional) The maximum number of training epochs of the model parameter optimization (default = 300).
- Returns
Trained model.
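A sketch of both transfer learning functions; the import path maica.ml.tl, the algorithm name 'fcnn', the source model file, and target_dataset (a maica.data dataset prepared elsewhere) are assumptions:

    from maica.ml.util import get_model
    from maica.ml.tl import tl_retrain_head, tl_fine_tuning   # assumed module path

    # target_dataset stands in for a maica.data dataset prepared elsewhere.
    model_target = get_model(alg_name='fcnn', dim_in=8, dim_out=1)

    # Retrain head: freeze the feature extractor of the source model and train only the head.
    model_rh = tl_retrain_head('source_model.pt', model_target, target_dataset)

    # Fine tuning: initialize from the source model and train all layers with a small learning rate.
    model_ft = tl_fine_tuning('source_model.pt', model_target, target_dataset, init_lr=1e-6)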