eu.sealsproject.platform.res.tool.impl.AbstractPlugin

de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.SentenceTransformersFineTuner

All Implemented Interfaces: IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge

public class SentenceTransformersFineTuner extends TransformersBaseFineTuner

This matcher uses the Sentence Transformers library to embed each resource based on its textual representation. It therefore does not filter anything but generates matching candidates based on the text.
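
As a rough illustration of how matching candidates can be generated from an embedding space, the following sketch picks the target resource whose vector is closest to a source vector by cosine similarity. It is only a sketch under stated assumptions: the class `EmbeddingCandidateSketch`, the example URIs, and the three-dimensional toy vectors are hypothetical, and real embeddings would come from the sentence transformers model rather than from hand-written arrays.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EmbeddingCandidateSketch {

    // Cosine similarity between two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the key of the target embedding that is most similar to the query.
    static String nearest(double[] query, Map<String, double[]> targets) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : targets.entrySet()) {
            double score = cosine(query, entry.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy embeddings (assumption): a real model produces hundreds of dimensions.
        Map<String, double[]> targets = new LinkedHashMap<>();
        targets.put("http://example.org/target#Author", new double[]{0.9, 0.1, 0.0});
        targets.put("http://example.org/target#Paper", new double[]{0.0, 0.2, 0.9});

        double[] source = {0.8, 0.2, 0.1}; // toy embedding of a source resource
        System.out.println(nearest(source, targets));
        // prints http://example.org/target#Author
    }
}
```

In practice a matcher would keep the top k neighbours per resource instead of a single best hit, so that a later step can decide between several candidates.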

Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| private static final org.slf4j.Logger | LOGGER | |
| private SentenceTransformersLoss | loss | |
| private static final String | NEWLINE | |
| private int | numberOfEpochs | |
| private int | testBatchSize | |
| private float | testSize | A number between zero and one which represents the proportion of the data to include in the test split. |
| private int | trainBatchSize | |

Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBaseFineTuner
additionallySwitchSourceTarget, resultingModelLocation, trainingFile

Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
cudaVisibleDevices, extractor, modelName, multipleTextsToMultipleExamples, multiProcessing, trainingArguments, transformersCache, usingTensorflow

Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
FILE_PREFIX, FILE_SUFFIX
Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| SentenceTransformersFineTuner(TextExtractorMap extractor, String initialModelName, File resultingModelLocation) | Runs the training of an NLP sentence transformer model. |
| SentenceTransformersFineTuner(TextExtractor extractor, String initialModelName, File resultingModelLocation) | Runs the training of an NLP sentence transformer model. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| File | finetuneModel(File trainingFile) | Fine-tunes a given model with the provided text in the CSV file (three columns: first text, second text, label (0/1)). |
| float | finetuneModel(File trainingFile, File validationFile) | Runs the training on the training file, but evaluates the best model on the validationFile. |
| SentenceTransformersLoss | getLoss() | |
| int | getNumberOfEpochs() | |
| int | getTestBatchSize() | |
| float | getTestSize() | Returns a number between zero and one which represents the proportion of the data to include in the test split. |
| int | getTrainBatchSize() | |
| void | setLoss(SentenceTransformersLoss loss) | |
| void | setNumberOfEpochs(int numberOfEpochs) | |
| void | setTestBatchSize(int testBatchSize) | |
| void | setTestSize(float testSize) | Sets the number between zero and one which represents the proportion of the data to include in the test split. |
| void | setTrainBatchSize(int trainBatchSize) | |
| void | setTrainingArguments(TransformersArguments trainingArguments) | This class does not allow setting training arguments. |
| void | setUsingTensorflow(boolean usingTensorflow) | This class only allows this value to be set to false. |
| private int | writeOneTriplet(org.apache.jena.rdf.model.Resource anchor, org.apache.jena.rdf.model.Resource positive, org.apache.jena.rdf.model.Resource hardNegative, Map<org.apache.jena.rdf.model.Resource,Map<String,Set<String>>> cache, Writer writer) | |
| int | writeTrainingFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment trainingAlignment, File trainFile, boolean append) | Writes the correspondences to a file (append or not can be chosen by a parameter). |
| private int | writeTripletFormat(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment trainingAlignment, File trainFile, boolean append) | |

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBaseFineTuner
clearTrainingData, createTrainingFile, finetuneModel, getResultingModelLocation, getTrainingFile, isAdditionallySwitchSourceTarget, match, setAdditionallySwitchSourceTarget, setResultingModelLocation, writeClassificationFormat

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
addTrainingArgument, getCudaVisibleDevices, getCudaVisibleDevicesButOnlyOneGPU, getExamplesForBatchSizeOptimization, getExtractor, getExtractorMap, getModelName, getMultiProcessing, getTextualRepresentation, getTrainingArguments, getTransformersCache, isMultipleTextsToMultipleExamples, isOptimizeForMixedPrecisionTraining, isUsingTensorflow, setCudaVisibleDevices, setCudaVisibleDevices, setExtractor, setExtractorMap, setModelName, setMultipleTextsToMultipleExamples, setMultiProcessing, setOptimizeForMixedPrecisionTraining, setTransformersCache, writeExamplesToFile

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
getModelSpec, match, readOntology

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
match

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
match

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
align, align, canExecute, getType

Methods inherited from class eu.sealsproject.platform.res.tool.impl.AbstractPlugin
getId, getVersion, setId, setVersion

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface eu.sealsproject.platform.res.tool.api.IPlugin
getId, getVersion

Field Details
- LOGGER
  
  private static final org.slf4j.Logger LOGGER
- NEWLINE
  
  private static final String NEWLINE
- testSize
  
  private float testSize
  
  A number between zero and one which represents the proportion of the data to include in the test split.
- trainBatchSize
  
  private int trainBatchSize
- testBatchSize
  
  private int testBatchSize
- numberOfEpochs
  
  private int numberOfEpochs
- loss
  
  private SentenceTransformersLoss loss
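
To make the meaning of the testSize field concrete, the following sketch computes how many examples would fall into the test and training splits for a given proportion. It is a minimal illustration, not the library's code: the class `TestSplitSketch`, the rounding up of the test count, and the requirement that the value lies strictly between zero and one are all assumptions, and the actual split may round or validate differently.

```java
public class TestSplitSketch {

    // Number of examples assigned to the test split.
    // Assumption: the count is rounded up; the real implementation may differ.
    static int testCount(int total, float testSize) {
        if (testSize <= 0f || testSize >= 1f) {
            throw new IllegalArgumentException("testSize must be strictly between 0 and 1");
        }
        return (int) Math.ceil(total * (double) testSize);
    }

    // The remaining examples form the training split.
    static int trainCount(int total, float testSize) {
        return total - testCount(total, testSize);
    }

    public static void main(String[] args) {
        int total = 200;        // hypothetical number of training examples
        float testSize = 0.25f; // proportion held out for testing
        System.out.println(testCount(total, testSize));  // prints 50
        System.out.println(trainCount(total, testSize)); // prints 150
    }
}
```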
Constructor Details
- SentenceTransformersFineTuner
  
  public SentenceTransformersFineTuner(TextExtractorMap extractor, String initialModelName, File resultingModelLocation)
  
  Runs the training of an NLP sentence transformer model.
  
  Parameters:
  
  extractor - used to extract text from a given resource. This is the text which represents a resource.
  
  initialModelName - the name of the initial model for fine-tuning, either a model that can be downloaded or a path to a directory containing model weights (see the first parameter, pretrained_model_name_or_path, of the from_pretrained function in the Hugging Face library). This value can also be changed with TransformersBase.setModelName(java.lang.String).
  
  resultingModelLocation - the final location where the fine-tuned model should be stored.
- SentenceTransformersFineTuner
  
  public SentenceTransformersFineTuner(TextExtractor extractor, String initialModelName, File resultingModelLocation)
  
  Runs the training of an NLP sentence transformer model.
  
  Parameters:
  
  extractor - used to extract text from a given resource. This is the text which represents a resource.
  
  initialModelName - the name of the initial model for fine-tuning, either a model that can be downloaded or a path to a directory containing model weights (see the first parameter, pretrained_model_name_or_path, of the from_pretrained function in the Hugging Face library). This value can also be changed with TransformersBase.setModelName(java.lang.String).
  
  resultingModelLocation - the final location where the fine-tuned model should be stored.
Method Details
- finetuneModel
  
  public File finetuneModel(File trainingFile) throws PythonServerException
  
  Description copied from class: TransformersBaseFineTuner
  
  Fine-tunes a given model with the provided text in the CSV file (three columns: first text, second text, label (0/1)).
  
  Specified by:
  
  finetuneModel in class TransformersBaseFineTuner
  
  Parameters:
  
  trainingFile - CSV file with three columns: first text, second text, label (0/1) (can be generated with TransformersBaseFineTuner.createTrainingFile(OntModel, OntModel, Alignment))
  
  Returns:
  
  the final location (directory) of the finetuned model (which is also given in the constructor)
  
  Throws:
  
  PythonServerException
- finetuneModel
  
  public float finetuneModel(File trainingFile, File validationFile) throws PythonServerException
  
  Run the training on the training file, but evaluate the best model on the validationFile. The model will be stored at resultingModelLocation given in the constructor.
  
  Parameters:
  
  trainingFile - the training file to use (can be generated with TransformersBaseFineTuner.createTrainingFile(OntModel, OntModel, Alignment))
  
  validationFile - the validation file to use (can be generated with TransformersBaseFineTuner.createTrainingFile(OntModel, OntModel, Alignment))
  
  Returns:
  
  the best score of the validation (using the file or train test split)
  
  Throws:
  
  PythonServerException - in case of some error during the learning
- writeTrainingFile
  
  public int writeTrainingFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment trainingAlignment, File trainFile, boolean append) throws IOException
  
  Writes the correspondences to a file (append or not can be chosen by a parameter).
  
  Overrides:
  
  writeTrainingFile in class TransformersBaseFineTuner
  
  Parameters:
  
  source - the source model
  
  target - the target model
  
  trainingAlignment - the training alignment to be written to file.
  
  trainFile - the file to write all texts
  
  append - true if all content should be appended to the file.
  
  Returns:
  
  how many correspondences were written to the file.
  
  Throws:
  
  IOException - in case the writing fails
- writeTripletFormat
  
  private int writeTripletFormat(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment trainingAlignment, File trainFile, boolean append) throws IOException
  
  Throws:
  
  IOException
- writeOneTriplet
  
  private int writeOneTriplet(org.apache.jena.rdf.model.Resource anchor, org.apache.jena.rdf.model.Resource positive, org.apache.jena.rdf.model.Resource hardNegative, Map<org.apache.jena.rdf.model.Resource,Map<String,Set<String>>> cache, Writer writer) throws IOException
  
  Throws:
  
  IOException
- setTrainingArguments
  
  public void setTrainingArguments(TransformersArguments trainingArguments)
  
  This class does not allow setting training arguments. Everything is determined by attributes.
  
  Overrides:
  
  setTrainingArguments in class TransformersBase
  
  Parameters:
  
  trainingArguments - training arguments
- setUsingTensorflow
  
  public void setUsingTensorflow(boolean usingTensorflow)
  
  This class only allows this value to be set to false.
  
  Overrides:
  
  setUsingTensorflow in class TransformersBase
  
  Parameters:
  
  usingTensorflow - should be set to false.
- getTestSize
  
  public float getTestSize()
  
  Returns a number between zero and one which represents the proportion of the data to include in the test split.
  
  Returns:
  
  a number between zero and one which represents the proportion of the data to include in the test split
- setTestSize
  
  public void setTestSize(float testSize)
  
  Sets the number between zero and one which represents the proportion of the data to include in the test split.
  
  Parameters:
  
  testSize - number between zero and one which represents the proportion of the data to include in the test split
- getTrainBatchSize
  
  public int getTrainBatchSize()
- setTrainBatchSize
  
  public void setTrainBatchSize(int trainBatchSize)
- getTestBatchSize
  
  public int getTestBatchSize()
- setTestBatchSize
  
  public void setTestBatchSize(int testBatchSize)
- getNumberOfEpochs
  
  public int getNumberOfEpochs()
- setNumberOfEpochs
  
  public void setNumberOfEpochs(int numberOfEpochs)
- getLoss
  
  public SentenceTransformersLoss getLoss()
- setLoss
  
  public void setLoss(SentenceTransformersLoss loss)
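
The method details above describe the training file as a CSV file with three columns (first text, second text, label 0/1) and note that writeTrainingFile can either append to or overwrite the file and returns the number of written correspondences. The following sketch shows one way such a file could be produced with the standard library only. The class `TrainingCsvSketch`, the comma delimiter, the quoting rule, and the example texts are assumptions for illustration; the exact file layout that the library writes may differ.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class TrainingCsvSketch {

    // Quote a text field: wrap it in double quotes and double any embedded quote.
    static String quote(String text) {
        return "\"" + text.replace("\"", "\"\"") + "\"";
    }

    // Writes one line per example (first text, second text, label) and
    // returns how many examples were written. With append == false the
    // previous content of the file is discarded.
    static int writeExamples(List<String[]> examples, File file, boolean append) throws IOException {
        StandardOpenOption mode = append
                ? StandardOpenOption.APPEND
                : StandardOpenOption.TRUNCATE_EXISTING;
        int written = 0;
        try (BufferedWriter writer = Files.newBufferedWriter(file.toPath(), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, mode)) {
            for (String[] example : examples) {
                writer.write(quote(example[0]) + "," + quote(example[1]) + "," + example[2]);
                writer.newLine();
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("training", ".csv");
        file.deleteOnExit();

        // Label 1 marks a matching pair, label 0 a non-matching pair.
        int first = writeExamples(List.of(
                new String[]{"conference paper", "article", "1"},
                new String[]{"conference paper", "hotel room", "0"}), file, false);
        int second = writeExamples(List.of(
                new String[]{"author", "writer", "1"},
                new String[]{"author", "review \"score\"", "0"}), file, true);

        System.out.println(first + second); // prints 4
        System.out.println(Files.readAllLines(file.toPath(), StandardCharsets.UTF_8).size()); // prints 4
    }
}
```

Calling the method again with append set to false would leave only the newly written lines in the file, which is the behaviour one would want when regenerating the training data from scratch.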
