Class SentenceTransformersFineTuner

All Implemented Interfaces:
IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge

public class SentenceTransformersFineTuner extends TransformersBaseFineTuner
This matcher uses the Sentence Transformers library to build an embedding space for each resource, given a textual representation of it. It therefore does not filter anything; instead, it generates matching candidates based on the text.
  • Field Details

    • LOGGER

      private static final org.slf4j.Logger LOGGER
    • NEWLINE

      private static final String NEWLINE
    • testSize

      private float testSize
      A number between zero and one which represents the proportion of the data to include in the test split.
    • trainBatchSize

      private int trainBatchSize
    • testBatchSize

      private int testBatchSize
    • numberOfEpochs

      private int numberOfEpochs
    • loss

  • Constructor Details

  • Method Details

    • finetuneModel

      public File finetuneModel(File trainingFile) throws PythonServerException
      Description copied from class: TransformersBaseFineTuner
      Fine-tunes a given model with the text provided in the CSV file (three columns: first text, second text, label (0/1)).
      Specified by:
      finetuneModel in class TransformersBaseFineTuner
      Parameters:
      trainingFile - CSV file with three columns: first text, second text, label (0/1) (can be generated with TransformersBaseFineTuner.createTrainingFile(OntModel, OntModel, Alignment))
      Returns:
      the final location (directory) of the finetuned model (which is also given in the constructor)
      Throws:
      PythonServerException
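The three-column layout of the training file can be illustrated with a minimal, self-contained sketch. The class `TrainingCsvRowSketch` and its helpers are hypothetical and not part of this API. The quoting rule (RFC 4180 style) and the reading of label 1 as a match are assumptions, since the documentation above does not state how fields are escaped.

```java
/** Hypothetical helper for illustration only; not part of the documented API. */
final class TrainingCsvRowSketch {

    /** Wraps a field in quotes and doubles embedded quotes (assumed RFC 4180 style). */
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Builds one row: first text, second text, label (assumed: 1 = match, 0 = no match). */
    static String row(String firstText, String secondText, boolean isMatch) {
        return quote(firstText) + "," + quote(secondText) + "," + (isMatch ? "1" : "0");
    }

    public static void main(String[] args) {
        // One positive and one negative example pair.
        System.out.println(row("heart valve", "cardiac valve", true));
        System.out.println(row("heart valve", "kidney", false));
    }
}
```

Quoting every field keeps commas inside a text from being read as column separators.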
    • finetuneModel

      public float finetuneModel(File trainingFile, File validationFile) throws PythonServerException
      Run the training on the training file, but evaluate the best model on the validationFile. The model will be stored at resultingModelLocation given in the constructor.
      Parameters:
      trainingFile - the training file to use (can be generated with TransformersBaseFineTuner.createTrainingFile(OntModel, OntModel, Alignment))
      validationFile - the validation file to use (can be generated with TransformersBaseFineTuner.createTrainingFile(OntModel, OntModel, Alignment))
      Returns:
      the best score of the validation (using the file or train test split)
      Throws:
      PythonServerException - in case of some error during the learning
    • writeTrainingFile

      public int writeTrainingFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment trainingAlignment, File trainFile, boolean append) throws IOException
      Writes the correspondences to a file (append or not can be chosen by a parameter).
      Overrides:
      writeTrainingFile in class TransformersBaseFineTuner
      Parameters:
      source - the source model
      target - the target model
      trainingAlignment - the training alignment to be written to file.
      trainFile - the file to write all texts
      append - true if all content should be appended to the file.
      Returns:
      how many correspondences were written to the file.
      Throws:
      IOException - in case the writing fails
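The append flag and the returned count can be sketched with standard `java.nio` calls. This is a minimal illustration under assumptions, not the implementation of writeTrainingFile: the class `AppendingWriterSketch`, the method `writeLines`, and the choice to truncate the file when append is false are all hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the documented API. */
final class AppendingWriterSketch {

    /** Writes one line per entry and returns how many lines were written. */
    static int writeLines(Path file, List<String> lines, boolean append) throws IOException {
        // Assumption: append=false replaces any existing content of the file.
        StandardOpenOption mode = append
                ? StandardOpenOption.APPEND
                : StandardOpenOption.TRUNCATE_EXISTING;
        int written = 0;
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, mode)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("train", ".csv");
        int first = writeLines(file, List.of("\"a\",\"b\",1"), false);
        int second = writeLines(file, List.of("\"c\",\"d\",0"), true);
        System.out.println((first + second) + " rows written to " + file);
    }
}
```

Appending makes it possible to collect examples from several ontology pairs in one training file before fine-tuning.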
    • writeTripletFormat

      private int writeTripletFormat(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment trainingAlignment, File trainFile, boolean append) throws IOException
      Throws:
      IOException
    • writeOneTriplet

      private int writeOneTriplet(org.apache.jena.rdf.model.Resource anchor, org.apache.jena.rdf.model.Resource positive, org.apache.jena.rdf.model.Resource hardNegative, Map<org.apache.jena.rdf.model.Resource,Map<String,Set<String>>> cache, Writer writer) throws IOException
      Throws:
      IOException
    • setTrainingArguments

      public void setTrainingArguments(TransformersArguments trainingArguments)
      This class does not allow setting training arguments. Everything is determined by attributes.
      Overrides:
      setTrainingArguments in class TransformersBase
      Parameters:
      trainingArguments - training arguments
    • setUsingTensorflow

      public void setUsingTensorflow(boolean usingTensorflow)
      This class only allows this value to be set to false.
      Overrides:
      setUsingTensorflow in class TransformersBase
      Parameters:
      usingTensorflow - should be set to false
    • getTestSize

      public float getTestSize()
      Returns a number between zero and one which represents the proportion of the data to include in the test split.
      Returns:
      a number between zero and one which represents the proportion of the data to include in the test split
    • setTestSize

      public void setTestSize(float testSize)
      Sets the number between zero and one which represents the proportion of the data to include in the test split.
      Parameters:
      testSize - number between zero and one which represents the proportion of the data to include in the test split
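What a proportional test split means can be shown with a small, self-contained sketch. The class `TestSplitSketch`, the seed parameter, the rounding rule, and the exclusive bounds check are assumptions for illustration; the actual split is performed by the fine-tuning backend and may differ in these details.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration only; not part of the documented API. */
final class TestSplitSketch {

    /** Returns [train, test]; the test part receives round(testSize * n) shuffled rows. */
    static <T> List<List<T>> split(List<T> rows, float testSize, long seed) {
        // Assumption: both bounds are exclusive.
        if (testSize <= 0f || testSize >= 1f) {
            throw new IllegalArgumentException("testSize must be between 0 and 1");
        }
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));
        int testCount = Math.round(testSize * shuffled.size());
        List<T> test = new ArrayList<>(shuffled.subList(0, testCount));
        List<T> train = new ArrayList<>(shuffled.subList(testCount, shuffled.size()));
        return List.of(train, test);
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = split(List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 0.2f, 42L);
        System.out.println("train=" + parts.get(0).size() + ", test=" + parts.get(1).size());
    }
}
```

With ten rows and a testSize of 0.2, two rows are held out for testing and eight remain for training.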
    • getTrainBatchSize

      public int getTrainBatchSize()
    • setTrainBatchSize

      public void setTrainBatchSize(int trainBatchSize)
    • getTestBatchSize

      public int getTestBatchSize()
    • setTestBatchSize

      public void setTestBatchSize(int testBatchSize)
    • getNumberOfEpochs

      public int getNumberOfEpochs()
    • setNumberOfEpochs

      public void setNumberOfEpochs(int numberOfEpochs)
    • getLoss

      public SentenceTransformersLoss getLoss()
    • setLoss

      public void setLoss(SentenceTransformersLoss loss)