maica.core

The maica.data module provides essential utilities and classes for loading chemical data from data files. It abstracts all chemical data into numerical vectors or mathematical graphs, and it supports the following four essential data types.

  • Numerical vector.

  • Chemical formula.

  • Molecular structure.

  • Crystal structure.

In addition to these essential data types, the module can handle composite data built from them. Composite data is useful for processing complex chemical data, such as molecule-to-molecule interactions and crystal structures under specific conditions.
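The sketch below is a hedged illustration of what abstracting a chemical formula into a numerical vector can mean. It parses a flat formula and encodes it as atomic fractions over a fixed element vocabulary. It is not MAICA's featurization code. `parse_formula` and `formula_to_vector` are hypothetical names, and the parser deliberately rejects parentheses and other nested notation.

```python
import re

# An element symbol (capital letter plus optional lowercase letter) followed
# by an optional integer or decimal count, e.g. "Fe2", "O3", "Li0.5".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula: str) -> dict:
    """Hypothetical helper (not MAICA): {element: count} for a flat formula."""
    counts = {}
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:  # unparsed characters between tokens
            break
        element, amount = match.group(1), match.group(2)
        counts[element] = counts.get(element, 0.0) + (float(amount) if amount else 1.0)
        pos = match.end()
    if pos != len(formula) or not counts:
        raise ValueError(f"cannot parse formula {formula!r}")
    return counts


def formula_to_vector(formula: str, elements: list) -> list:
    """Hypothetical helper: atomic fractions over a fixed element vocabulary."""
    counts = parse_formula(formula)
    unknown = set(counts) - set(elements)
    if unknown:
        raise ValueError(f"elements outside the vocabulary: {sorted(unknown)}")
    total = sum(counts.values())
    return [counts.get(e, 0.0) / total for e in elements]


print(formula_to_vector("Fe2O3", ["Fe", "O", "Si"]))  # [0.4, 0.6, 0.0]
```

Fractions rather than raw counts make "Fe2O3" and "Fe4O6" map to the same vector. Whether that is desirable depends on the prediction task.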

Environment Variables

This is a core module of MAICA that includes environment variables defining the prediction task, the machine learning algorithm, and other algorithmic descriptors.

Data Formats

Variable Name       Value           Description
------------------  --------------  ----------------------------------------
DATA_TYPE_VEC       'vec'           Numerical vector
DATA_TYPE_FORM      'form'          Chemical formula
DATA_TYPE_SMILES    'smiles'        Simplified Molecular-Input Line-Entry System (SMILES)
DATA_TYPE_CIF       'cif'           Crystallographic Information File (CIF)

Feature Types

Variable Name       Value           Description
------------------  --------------  ----------------------------------------
FEAT_TYPE_NUM       'num'           Numerical value
FEAT_TYPE_FORM      'form'          Chemical formula
FEAT_TYPE_STRUCT    'struct'        Mathematical graph
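This sketch shows what a "mathematical graph" feature can look like for a small molecule. Atoms become nodes with one-hot element features, bonds become symmetric adjacency entries, and one GCN-style propagation step averages each node with its neighbors. It uses only NumPy. `build_graph` and `normalized_adjacency` are hypothetical helpers, not MAICA's graph builder, and the atom and bond lists are supplied by hand rather than parsed from SMILES or CIF.

```python
import numpy as np


def build_graph(atoms, bonds, vocab):
    """Hypothetical helper: (node_features, adjacency) for a small molecule.

    atoms: element symbol per node; bonds: undirected (i, j) index pairs;
    vocab: element symbols used for the one-hot node features.
    """
    x = np.zeros((len(atoms), len(vocab)))
    for i, symbol in enumerate(atoms):
        x[i, vocab.index(symbol)] = 1.0
    a = np.zeros((len(atoms), len(atoms)))
    for i, j in bonds:
        a[i, j] = a[j, i] = 1.0  # undirected edge
    return x, a


def normalized_adjacency(a):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as used by GCNs."""
    a_hat = a + np.eye(a.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


# Heavy atoms of ethanol (C-C-O); hydrogens are omitted for brevity.
x, a = build_graph(["C", "C", "O"], [(0, 1), (1, 2)], vocab=["C", "N", "O"])
h = normalized_adjacency(a) @ x  # one round of neighbor aggregation
```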

Imputation Methods

Variable Name       Value           Description
------------------  --------------  ----------------------------------------
IMPUTE_MEAN         'mean'          Fill missing values with the mean
IMPUTE_ZERO         'zero'          Fill missing values with zero
IMPUTE_KNN          'knn'           Fill missing values using k-nearest neighbor data
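The sketch below implements the three imputation strategies in NumPy, under stated assumptions. The 'mean' strategy uses the per-column mean of observed values. For 'knn', each incomplete row takes the average of its k nearest complete rows, where distance is measured only on the features that row has. This is a simplified variant and may differ from what MAICA does internally. scikit-learn's `KNNImputer` is a fuller implementation. The function `impute` is hypothetical.

```python
import numpy as np


def impute(x, method="mean", k=3):
    """Hypothetical helper: copy of 2-D array x with NaNs filled by method."""
    x = np.array(x, dtype=float)  # np.array copies, so the input is untouched
    missing = np.isnan(x)
    if method == "zero":
        x[missing] = 0.0
    elif method == "mean":
        col_means = np.nanmean(x, axis=0)  # per-feature mean of observed values
        rows, cols = np.nonzero(missing)
        x[rows, cols] = col_means[cols]
    elif method == "knn":
        donors = x[~missing.any(axis=1)]  # complete rows (boolean index -> copy)
        if len(donors) == 0:
            raise ValueError("knn imputation needs at least one complete row")
        for r in np.nonzero(missing.any(axis=1))[0]:
            observed = ~missing[r]
            # Euclidean distance measured only on the features row r has.
            diff = donors[:, observed] - x[r, observed]
            dist = np.sqrt((diff ** 2).sum(axis=1))
            nearest = donors[np.argsort(dist)[:k]]
            x[r, missing[r]] = nearest[:, missing[r]].mean(axis=0)
    else:
        raise ValueError(f"unknown imputation method: {method!r}")
    return x


X = np.array([[1.0, 2.0], [4.0, np.nan], [5.0, 6.0], [np.nan, 8.0]])
filled = impute(X, method="knn", k=1)
```

Restricting donors to complete rows keeps the sketch short. However, it fails when every row has a gap, which a production imputer would handle.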

Machine Learning Algorithms

Variable Name       Value           Description
------------------  --------------  ----------------------------------------
ALG_SKLEARN         'sklearn'       Abstract algorithm for scikit-learn models
ALG_PYTORCH         'pytorch'       Abstract algorithm for PyTorch models
ALG_LR              'lr'            Linear regression
ALG_LASSO           'lasso'         Least absolute shrinkage and selection operator (LASSO) regression
ALG_DCTR            'dctr'          Decision tree regression
ALG_SYMR            'symr'          Symbolic regression
ALG_KRR             'krr'           Kernel ridge regression
ALG_KNNR            'knnr'          K-nearest neighbor regression
ALG_SVR             'svr'           Support vector regression
ALG_GPR             'gpr'           Gaussian process regression
ALG_GBTR            'gbtr'          Gradient boosting tree regression
ALG_FCNN            'fcnn'          Fully connected neural network
ALG_ATE             'autoencoder'   Autoencoder
ALG_DOPNET          'dopnet'        DopNet
ALG_GCN             'gcn'           Graph convolutional network
ALG_GAT             'gat'           Graph attention network
ALG_GIN             'gin'           Graph isomorphism network
ALG_CGCNN           'cgcnn'         Crystal graph convolutional neural network

Manifold Learning Methods

Variable Name       Value           Description
------------------  --------------  ----------------------------------------
EMB_TSNE            'tsne'          t-distributed stochastic neighbor embedding
EMB_SPECT           'spect'         Spectral clustering
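The 'spect' method is listed as spectral clustering. Its core step is a spectral embedding, which maps the data onto eigenvectors of a normalized graph Laplacian; spectral clustering then groups points in that embedded space. The sketch below shows this Laplacian-eigenmap embedding in NumPy with a Gaussian affinity graph. It is an illustration under those assumptions, not MAICA's implementation, and `spectral_embedding` and its `sigma` parameter are hypothetical. Library implementations exist as `sklearn.manifold.SpectralEmbedding` and `sklearn.manifold.TSNE`.

```python
import numpy as np


def spectral_embedding(x, n_components=2, sigma=1.0):
    """Hypothetical helper: Laplacian-eigenmap embedding of the rows of x."""
    x = np.asarray(x, dtype=float)
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-sq_dist / (2.0 * sigma ** 2))  # Gaussian (RBF) affinity graph
    np.fill_diagonal(w, 0.0)                   # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    # Symmetric normalized Laplacian L = I - D^-1/2 W D^-1/2.
    lap = np.eye(len(x)) - w * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)           # eigenvalues in ascending order
    # Drop the trivial eigenvector (eigenvalue 0) and keep the next ones.
    return eigvecs[:, 1:n_components + 1]


# Two well-separated groups of points.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]])
emb = spectral_embedding(pts, n_components=1)
```

Points in the same group land on the same side of zero in the first non-trivial coordinate. Running k-means on these coordinates is what turns the embedding into spectral clustering.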

Pipeline Types of the Machine Learning Algorithms

Variable Name       Value
------------------  ----------------------------------------
ALGS_SKLEARN        [ALG_SKLEARN, ALG_LR, ALG_LASSO, ALG_DCTR, ALG_SYMR, ALG_KNNR, ALG_SVR, ALG_GPR, ALG_GBTR]
ALGS_PYTORCH        [ALG_PYTORCH, ALG_FCNN, ALG_DOPNET, ALG_GCN, ALG_GAT, ALG_GIN, ALG_CGCNN]
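Because the pipeline lists are plain Python lists of algorithm names, code can route an algorithm to the scikit-learn or PyTorch pipeline with a membership test. The sketch below mirrors the documented string values locally, since the module's import path is not shown here. It assumes the PyTorch list is named `ALGS_PYTORCH`, by analogy with `ALGS_SKLEARN`. The function `pipeline_of` is hypothetical. As the lists are documented, 'krr' and 'autoencoder' appear in neither, so the sketch rejects them.

```python
# Local mirror of the documented constants (assumed values from the tables).
ALG_SKLEARN, ALG_PYTORCH = "sklearn", "pytorch"
ALG_LR, ALG_LASSO, ALG_DCTR, ALG_SYMR = "lr", "lasso", "dctr", "symr"
ALG_KNNR, ALG_SVR, ALG_GPR, ALG_GBTR = "knnr", "svr", "gpr", "gbtr"
ALG_FCNN, ALG_DOPNET = "fcnn", "dopnet"
ALG_GCN, ALG_GAT, ALG_GIN, ALG_CGCNN = "gcn", "gat", "gin", "cgcnn"

ALGS_SKLEARN = [ALG_SKLEARN, ALG_LR, ALG_LASSO, ALG_DCTR, ALG_SYMR,
                ALG_KNNR, ALG_SVR, ALG_GPR, ALG_GBTR]
ALGS_PYTORCH = [ALG_PYTORCH, ALG_FCNN, ALG_DOPNET, ALG_GCN, ALG_GAT,
                ALG_GIN, ALG_CGCNN]


def pipeline_of(alg_name: str) -> str:
    """Hypothetical helper: name of the pipeline an algorithm belongs to."""
    if alg_name in ALGS_SKLEARN:
        return "sklearn"
    if alg_name in ALGS_PYTORCH:
        return "pytorch"
    raise ValueError(f"{alg_name!r} is in neither ALGS_SKLEARN nor ALGS_PYTORCH")


print(pipeline_of("lasso"), pipeline_of("gcn"))  # sklearn pytorch
```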

Optimization Types

Variable Name       Value           Description
------------------  --------------  ----------------------------------------
PROB_MIN            'min'           Minimization problem
PROB_MAX            'max'           Maximization problem

Meta-heuristic Algorithms

Variable Name       Value           Description
------------------  --------------  ----------------------------------------
OPT_EP              'ep'            Evolutionary programming
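Evolutionary programming searches by mutation and selection, without crossover. The sketch below is a minimal version that also honors the 'min'/'max' problem types: a maximization objective is negated so one selection rule serves both. Two simplifications should be noted. It uses a fixed Gaussian mutation step, and it keeps the best half of parents plus offspring ("(mu + mu)" truncation) instead of the stochastic tournament selection of classic EP. It is not MAICA's optimizer, and `evolutionary_programming` and its parameters are hypothetical.

```python
import random


def evolutionary_programming(objective, dim, bounds, problem="min",
                             pop_size=20, generations=100, sigma=0.3, seed=0):
    """Hypothetical helper: mutation-only search, (mu + mu) truncation."""
    if problem not in ("min", "max"):
        raise ValueError(f"problem must be 'min' or 'max', got {problem!r}")
    rng = random.Random(seed)
    lo, hi = bounds
    sign = 1.0 if problem == "min" else -1.0  # maximize f by minimizing -f

    def score(ind):
        return sign * objective(ind)

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Each parent yields one child by Gaussian mutation, clipped to bounds.
        children = [[min(hi, max(lo, v + rng.gauss(0.0, sigma))) for v in parent]
                    for parent in pop]
        # Survivors: the best pop_size individuals of parents and children.
        pop = sorted(pop + children, key=score)[:pop_size]
    best = pop[0]
    return best, objective(best)


def sphere(v):
    return sum(t * t for t in v)


best, value = evolutionary_programming(sphere, dim=2, bounds=(-5.0, 5.0))
```

Self-adaptive step sizes, where each individual also carries its own sigma, are the usual next refinement. Without them, precision near the optimum is limited by the fixed `sigma`.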

System Utilities

This is a core module of MAICA that provides the library's base utilities.

set_gpu_runnable(runnable: bool)

Enable or disable running machine learning algorithms on the GPU.

Parameters

runnable – A flag indicating whether machine learning algorithms should run on the GPU.

Returns

None

Example:

>>> set_gpu_runnable(True) # Enable GPU.
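The documentation does not show how the flag is consumed. One plausible pattern is a module-level flag that a device-selection helper reads. That helper falls back to the CPU when the flag is off, PyTorch is missing, or no CUDA device is available. The sketch below is a hypothetical re-implementation for illustration only, not MAICA's source, and `get_device` is an assumed helper name.

```python
_GPU_RUNNABLE = False  # hypothetical module-level flag


def set_gpu_runnable(runnable: bool) -> None:
    """Hypothetical re-implementation: record whether the GPU may be used."""
    global _GPU_RUNNABLE
    _GPU_RUNNABLE = bool(runnable)


def get_device() -> str:
    """Hypothetical helper: 'cuda' only if enabled and actually available."""
    if not _GPU_RUNNABLE:
        return "cpu"
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed: run on the CPU
    return "cuda" if torch.cuda.is_available() else "cpu"


set_gpu_runnable(True)
device = get_device()  # "cuda" on a CUDA machine with PyTorch, else "cpu"
```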