maica.core
The maica.data module includes essential utilities and classes to load various chemical data from data files.
This module abstracts all chemical data into numerical vectors or mathematical graphs.
This module supports the following four essential data types:
- Numerical vector
- Chemical formula
- Molecular structure
- Crystal structure
In addition to these essential data types, the module can handle composite data built from them. This is useful for processing complex chemical data, such as molecule-to-molecule interactions and crystal structures under specific conditions.
Environment Variables
This core module of MAICA defines the environment variables that specify the prediction task, the machine learning algorithm, and other algorithmic descriptors.
Data Formats
Variable Name | Value | Description
---|---|---
DATA_TYPE_VEC | vec | Numerical vector
DATA_TYPE_FORM | form | Chemical formula
DATA_TYPE_SMILES | smiles | Simplified Molecular-Input Line-Entry System
DATA_TYPE_CIF | cif | Crystallographic information file
Feature Types
Variable Name | Value | Description
---|---|---
FEAT_TYPE_NUM | 'num' | Numerical value
FEAT_TYPE_FORM | 'form' | Chemical formula
FEAT_TYPE_STRUCT | 'struct' | Mathematical graph
Imputation Methods
Variable Name | Value | Description
---|---|---
IMPUTE_MEAN | 'mean' | Fill empty value by mean
IMPUTE_ZERO | 'zero' | Fill empty value by zero
IMPUTE_KNN | 'knn' | Fill empty value by k-nearest neighbor data
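To make the mean and zero strategies concrete, here is a minimal pure-Python sketch. It is not MAICA's implementation: the constants are redefined locally with the values from the table above, and `impute_column` is a hypothetical helper introduced only for illustration. The k-nearest neighbor strategy is left out because it needs a distance defined over complete rows.

```python
import math

# Values copied from the table above; defined locally so the sketch runs without MAICA.
IMPUTE_MEAN = 'mean'
IMPUTE_ZERO = 'zero'


def impute_column(values, method):
    """Return a copy of `values` with NaN entries filled (hypothetical helper)."""
    observed = [v for v in values if not math.isnan(v)]
    if method == IMPUTE_MEAN:
        if not observed:
            raise ValueError('cannot take the mean of a column with no observed values')
        # Fill gaps with the mean of the observed entries.
        fill = sum(observed) / len(observed)
    elif method == IMPUTE_ZERO:
        # Fill gaps with zero regardless of the observed entries.
        fill = 0.0
    else:
        raise ValueError(f'unsupported imputation method: {method}')
    return [fill if math.isnan(v) else v for v in values]


column = [1.0, math.nan, 3.0]
filled_mean = impute_column(column, IMPUTE_MEAN)
filled_zero = impute_column(column, IMPUTE_ZERO)
```

Mean imputation keeps the column average unchanged, while zero imputation is only appropriate when zero is a meaningful default for the feature.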
Machine Learning Algorithms
Variable Name | Value | Description
---|---|---
ALG_SKLEARN | 'sklearn' | Abstract algorithm of Scikit-learn models
ALG_PYTORCH | 'pytorch' | Abstract algorithm of PyTorch models
ALG_LR | 'lr' | Linear Regression
ALG_LASSO | 'lasso' | Least absolute shrinkage and selection operator
ALG_DCTR | 'dctr' | Decision tree regression
ALG_SYMR | 'symr' | Symbolic regression
ALG_KRR | 'krr' | Kernel ridge regression
ALG_KNNR | 'knnr' | K-nearest neighbor regression
ALG_SVR | 'svr' | Support vector regression
ALG_GPR | 'gpr' | Gaussian process regression
ALG_GBTR | 'gbtr' | Gradient boosting tree regression
ALG_FCNN | 'fcnn' | Fully-connected neural network
ALG_ATE | 'autoencoder' | Autoencoder
ALG_DOPNET | 'dopnet' | DopNet
ALG_GCN | 'gcn' | Graph convolutional network
ALG_GAT | 'gat' | Graph attention network
ALG_GIN | 'gin' | Graph isomorphism network
ALG_CGCNN | 'cgcnn' | Crystal graph convolutional neural network
Manifold Learning Methods
Variable Name | Value | Description
---|---|---
EMB_TSNE | 'tsne' | t-distributed stochastic neighbor embedding
EMB_SPECT | 'spect' | Spectral clustering
Pipeline Types of the Machine Learning Algorithms
Variable Name | Value
---|---
ALGS_SKLEARN | [
ALG_PYTORCH | [
Optimization Types
Variable Name | Value | Description
---|---|---
PROB_MIN | 'min' | Minimization problem
PROB_MAX | 'max' | Maximization problem
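The following sketch illustrates how a problem-type flag of this kind can steer the choice of the best candidate in an optimizer. It is an assumption-laden illustration, not MAICA code: the constants are redefined locally with the values listed above, and `select_best` and `objective` are hypothetical names.

```python
# Values copied from the table above; defined locally so the sketch runs without MAICA.
PROB_MIN = 'min'
PROB_MAX = 'max'


def select_best(candidates, objective, problem_type):
    """Pick the candidate with the best objective value (hypothetical helper)."""
    if problem_type == PROB_MIN:
        return min(candidates, key=objective)
    if problem_type == PROB_MAX:
        return max(candidates, key=objective)
    raise ValueError(f'unknown problem type: {problem_type}')


def objective(x):
    # Toy objective with its minimum at x = 2.
    return (x - 2) ** 2


candidates = [0, 1, 2, 3, 5]
best_min = select_best(candidates, objective, PROB_MIN)
best_max = select_best(candidates, objective, PROB_MAX)
```

Keeping the direction as a single flag lets the same search loop serve both minimization and maximization without duplicating the comparison logic.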
Meta-heuristic Algorithms
Variable Name | Value | Description
---|---|---
OPT_EP | 'ep' | Evolutionary programming
System Utilities
This core module of MAICA provides the base utilities of the library.
- set_gpu_runnable(runnable: bool)
Enable or disable running machine learning algorithms on the GPU.
- Parameters
runnable – Flag indicating whether GPU execution is enabled.
- Returns
None
Example:
>>> set_gpu_runnable(True) # Enable GPU.
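
One plausible way such a switch could work is a module-level flag that later code consults when choosing a device. The sketch below is only an assumption about the pattern, not MAICA's actual implementation: `_gpu_runnable` and `select_device` are hypothetical, and the local `set_gpu_runnable` is a stand-in that shares only its name and signature with the documented function.

```python
# Hypothetical module-level state; the real library may store this differently.
_gpu_runnable = False


def set_gpu_runnable(runnable: bool) -> None:
    """Stand-in with the documented signature; records the flag and returns None."""
    global _gpu_runnable
    _gpu_runnable = bool(runnable)


def select_device() -> str:
    """Hypothetical helper: return a device name based on the current flag."""
    return 'cuda' if _gpu_runnable else 'cpu'


device_before = select_device()  # flag is still at its default
set_gpu_runnable(True)           # enable GPU
device_after = select_device()
```

In this pattern the flag only expresses intent; code that actually moves data to a GPU would still need to check that a device is available.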