Molecular
Main Import
import jaqpotpy.models
Create molecular models that integrate with libraries such as sklearn, pytorch, and pytorch-geometric.
MolecularSKLearn
CLASS: MolecularSKLearn(dataset, doa, model, eval=None, preprocess=None)
Class for creating and training a molecular model backed by an sklearn estimator (SVM, Random Forest, ...).
sklearn: https://scikit-learn.org/stable/
Parameters
dataset(MolecularDataset)
: Dataset for the training procedure. Must be created with jaqpotpy, for example a SmilesDataset (see the Molecular_dataset docs for more).
doa(DOA)
: Domain of applicability function. See the doa docs for more.
model(Any)
: Sklearn model to train.
eval(Evaluator, optional)
: Evaluator for the model. See the evaluator docs for more.
preprocess(Preprocesses, optional)
: Preprocessing function. See the preprocesses docs for more.
Defining arguments before training an SKlearn molecular model
from jaqpotpy.datasets import SmilesDataset
from jaqpotpy.doa.doa import Leverage
from jaqpotpy.models.evaluator import Evaluator
from jaqpotpy.descriptors.molecular import MACCSKeysFingerprint
from sklearn.metrics import accuracy_score, roc_auc_score
# Declare the Featurizer and the Evaluator's metrics
featurizer = MACCSKeysFingerprint()
val = Evaluator()
val.register_scoring_function('ACC', accuracy_score)
val.register_scoring_function('AUC', roc_auc_score)
# Create training dataset (train_smiles and train_y are placeholders for your own data)
train_dataset = SmilesDataset(smiles = [train_smiles], y = [train_y], featurizer = featurizer, task='classification')
train_dataset.create()
# Create validation dataset
validation_dataset = SmilesDataset(smiles = [validation_smiles], y = [validation_y], featurizer = featurizer, task='classification')
validation_dataset.create()
# Update the Evaluator's dataset
val.dataset = validation_dataset
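The registration calls above attach named metrics to the evaluator. As a rough illustration of that idea only, the sketch below keeps a dictionary of label-to-function pairs and applies each one to the same labels and predictions. ToyEvaluator, its evaluate method, and the accuracy helper are hypothetical names made up for this example; they are not jaqpotpy's Evaluator and say nothing about how it works internally.

```python
from typing import Callable, Dict, Sequence


class ToyEvaluator:
    """Hypothetical stand-in for an evaluator; NOT jaqpotpy's Evaluator."""

    def __init__(self) -> None:
        # Maps a short label such as 'ACC' to a metric function.
        self.functions: Dict[str, Callable[[Sequence[int], Sequence[int]], float]] = {}

    def register_scoring_function(self, name: str, function: Callable) -> None:
        # Store the metric under its label.
        self.functions[name] = function

    def evaluate(self, y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
        # Apply every registered metric to the same labels and predictions.
        return {name: fn(y_true, y_pred) for name, fn in self.functions.items()}


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Fraction of predictions that match the true labels.
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


toy_val = ToyEvaluator()
toy_val.register_scoring_function('ACC', accuracy)
scores = toy_val.evaluate([1, 0, 1, 1], [1, 0, 0, 1])
print(scores)  # {'ACC': 0.75}
```

Any function that takes true labels and predictions in that order, such as sklearn's accuracy_score, could be registered the same way in this sketch.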
Now you can create an sklearn model with the above inputs and perform various actions.
from jaqpotpy.models import MolecularSKLearn
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
model = MolecularSKLearn(train_dataset, doa=Leverage(), model=knn, eval=val)
# Fit the model
trained = model.fit()
# Make a prediction on a SMILES string
trained('Cc1cc(NS(=O)(=O)c2ccc(N)cc2)no1')
trained.prediction
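In the example above the fitted model is called with a SMILES string and the result is then read from its prediction attribute. The sketch below illustrates that calling pattern with a toy object. ToyCallableModel, count_chars, and toy_rule are hypothetical names invented for this example, and storing the result as a one-element list is an assumption of the sketch; none of this is jaqpotpy's implementation.

```python
from typing import Callable, List, Optional


class ToyCallableModel:
    """Hypothetical stand-in for a fitted model; NOT jaqpotpy's class."""

    def __init__(self, featurize: Callable[[str], List[int]],
                 predict: Callable[[List[int]], int]) -> None:
        self._featurize = featurize
        self._predict = predict
        self.prediction: Optional[List[int]] = None  # filled in by __call__

    def __call__(self, smiles: str) -> None:
        # Featurize the input, predict, and keep the result on the object.
        features = self._featurize(smiles)
        self.prediction = [self._predict(features)]


def count_chars(smiles: str) -> List[int]:
    # Toy featurizer: counts of the uppercase characters 'N' and 'O'.
    return [smiles.count('N'), smiles.count('O')]


def toy_rule(features: List[int]) -> int:
    # Toy classifier: class 1 when the string has at least two 'N' characters.
    return 1 if features[0] >= 2 else 0


toy_trained = ToyCallableModel(count_chars, toy_rule)
toy_trained('Cc1cc(NS(=O)(=O)c2ccc(N)cc2)no1')
print(toy_trained.prediction)  # [1]
```

Because the call stores its output on the object, prediction is None until the model has been called at least once in this sketch.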