All Implemented Interfaces:
Filter, IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge

public class TransformersFilter extends TransformersBase implements Filter
For every correspondence in the input alignment, this filter extracts the text of both resources (using the specified, customizable extractor), feeds the two texts into the specified transformer model, and stores the prediction as the confidence of the correspondence. No filtering is applied in this class.
  • Field Details

    • LOGGER

      private static final org.slf4j.Logger LOGGER
    • NEWLINE

      private static final String NEWLINE
    • changeClass

      private boolean changeClass
    • batchSizeOptimization

      private BatchSizeOptimization batchSizeOptimization
  • Constructor Details

  • Method Details

    • match

      public Alignment match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties properties) throws Exception
      Description copied from class: MatcherYAAAJena
      Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment. Note: This method might be called multiple times in a row when using the evaluation framework. Make sure to return a mapping which is specific to the given inputs.
      Specified by:
      match in interface IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>
      Specified by:
      match in class MatcherYAAAJena
      source - This OntModel represents the source ontology.
      target - This OntModel represents the target ontology.
      inputAlignment - This mapping represents the input alignment.
      properties - Additional properties.
      The resulting alignment of the matching process.
      Exception - Any exception which occurs during matching.
    • createPredictionFile

      public Map<Correspondence,List<Integer>> createPredictionFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment predictionAlignment, File outputFile, boolean append) throws IOException
      Creates the prediction file, which is a CSV file with two columns. The first column is the text from the left resource and the second column is the text from the right resource.
      source - The source model
      target - The target model
      predictionAlignment - the alignment to process. All correspondences which have enough text are used.
      outputFile - the CSV file to which the output should be written.
      append - if true, the alignment is appended to the given file.
      the map which maps each correspondence to (possibly multiple) row numbers. If multipleTextsToMultipleExamples is set to true, multiple rows can correspond to one correspondence, because each text (e.g. label, comment etc.) of the two resources is used as an example.
      IOException - in case the writing fails.
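
      The file layout and the returned row map can be illustrated with a minimal, self-contained sketch. It is not the implementation of this class: TextPairSketch and PredictionFileSketch are hypothetical stand-ins, pairing every left text with every right text is an assumption about how multiple texts become multiple rows, and row numbers count only the rows written by the current call.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a correspondence: an id plus the texts extracted for each side.
final class TextPairSketch {
    final String id;
    final List<String> leftTexts;
    final List<String> rightTexts;

    TextPairSketch(String id, List<String> leftTexts, List<String> rightTexts) {
        this.id = id;
        this.leftTexts = leftTexts;
        this.rightTexts = rightTexts;
    }
}

final class PredictionFileSketch {
    // Quote a CSV field: wrap it in double quotes and double any embedded quote.
    static String csvField(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    // Writes one row per (left text, right text) combination and records which
    // rows belong to which correspondence id. Pairs without text on one side
    // produce no rows and therefore do not appear in the returned map.
    static Map<String, List<Integer>> write(List<TextPairSketch> pairs, Path out, boolean append)
            throws IOException {
        Map<String, List<Integer>> rows = new LinkedHashMap<>();
        StandardOpenOption mode = append
                ? StandardOpenOption.APPEND
                : StandardOpenOption.TRUNCATE_EXISTING;
        int row = 0;
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, mode)) {
            for (TextPairSketch pair : pairs) {
                for (String left : pair.leftTexts) {
                    for (String right : pair.rightTexts) {
                        writer.write(csvField(left) + "," + csvField(right));
                        writer.newLine();
                        rows.computeIfAbsent(pair.id, k -> new ArrayList<>()).add(row++);
                    }
                }
            }
        }
        return rows;
    }
}
```

      In this sketch a correspondence whose resources yield two left texts and one right text occupies two consecutive rows, which is why the map values are lists rather than single integers.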
    • predictConfidences

      public List<Double> predictConfidences(File predictionFilePath) throws Exception
      Runs the Hugging Face transformers library on the given prediction file.
      predictionFilePath - path to csv file with two columns (text left and text right).
      a list of confidences
      Exception - in case something goes wrong.
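
      Because one correspondence can span several rows of the prediction file, the per-row confidences have to be combined before they are written back to the correspondence. The following sketch shows one way to do that. The class name ConfidenceAggregationSketch is hypothetical, and taking the maximum over the rows is an assumption; this page does not state which aggregation the class uses.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ConfidenceAggregationSketch {
    // For each key, look up the confidences of its rows and keep the largest one.
    // Keys with an empty row list are left out of the result.
    static <K> Map<K, Double> maxPerKey(Map<K, List<Integer>> rowsPerKey, List<Double> confidences) {
        Map<K, Double> result = new LinkedHashMap<>();
        for (Map.Entry<K, List<Integer>> entry : rowsPerKey.entrySet()) {
            if (entry.getValue().isEmpty()) {
                continue;
            }
            double best = Double.NEGATIVE_INFINITY;
            for (int row : entry.getValue()) {
                best = Math.max(best, confidences.get(row));
            }
            result.put(entry.getKey(), best);
        }
        return result;
    }
}
```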
    • getMaximumPerDeviceEvalBatchSize

      protected int getMaximumPerDeviceEvalBatchSize(File trainingFile)
      This function tries to execute the prediction with a number of examples equal to the tested batch size. It starts with 2 and checks only powers of 2.
      trainingFile - the training file to use
      the maximum per_device_eval_batch_size
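
      The search described above can be sketched as a doubling loop. This is an illustration under stated assumptions, not the code of this class: BatchSizeSearchSketch is a hypothetical name, the fits predicate stands in for a trial prediction run (for example, returning false when the device runs out of memory), and both the upper bound and the fallback value of 1 are choices made for this sketch.

```java
import java.util.function.IntPredicate;

final class BatchSizeSearchSketch {
    // Tries 2, 4, 8, ... up to upperBound and returns the last size for which
    // the trial run succeeded. Stops at the first failure, assuming that a
    // larger batch cannot succeed once a smaller one has failed.
    static int largestWorkingPowerOfTwo(IntPredicate fits, int upperBound) {
        int best = 1; // fallback if even a batch size of 2 fails (assumption)
        // A long loop variable avoids int overflow when doubling near Integer.MAX_VALUE.
        for (long size = 2; size <= upperBound; size *= 2) {
            if (!fits.test((int) size)) {
                break;
            }
            best = (int) size;
        }
        return best;
    }
}
```

      With a predicate that succeeds for up to 48 examples, the sketch tests 2, 4, 8, 16, 32 and 64 and settles on 32, so only a logarithmic number of trial runs is needed.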
    • isChangeClass

      public boolean isChangeClass()
      Returns true if the class is changed in the classification. This is useful if a pretrained model predicts exactly the opposite class.
      true if the class is changed in the classification.
    • setChangeClass

      public void setChangeClass(boolean changeClass)
      If set to true, the class is changed in the classification. This is useful if a pretrained model predicts exactly the opposite class.
      changeClass - true if the class should be changed in the classification.
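
      What changing the class means can be pictured with a small sketch. It assumes a binary classifier that returns one probability per class, with index 1 normally taken as the "match" class; both the class name ChangeClassSketch and this index convention are assumptions made for illustration and are not taken from this page.

```java
final class ChangeClassSketch {
    // Picks the probability that is used as the confidence of a correspondence.
    // By default the class at index 1 is treated as "match"; with changeClass
    // set, index 0 is used instead.
    static double confidence(double[] classProbabilities, boolean changeClass) {
        if (classProbabilities.length != 2) {
            throw new IllegalArgumentException("expected exactly two class probabilities");
        }
        int matchIndex = changeClass ? 0 : 1;
        return classProbabilities[matchIndex];
    }
}
```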
    • isOptimizeBatchSize

      public boolean isOptimizeBatchSize()
      Prefer getBatchSizeOptimization() over this method.
      Returns true if batch size optimization is turned on.
      true if batch size optimization is turned on.
    • setOptimizeBatchSize

      public void setOptimizeBatchSize(boolean optimizeBatchSize)
      Prefer setBatchSizeOptimization(BatchSizeOptimization) over this method.
      Sets whether the batch size should be optimized before running the prediction. This should only be set to true if the dataset is huge; otherwise the search for the largest batch size takes too much time.
      optimizeBatchSize - if true, optimize the batch size every time the match method is called.
    • getBatchSizeOptimization

      public BatchSizeOptimization getBatchSizeOptimization()
      Returns how the batch size is optimized.
      how the batch size is optimized
    • setBatchSizeOptimization

      public void setBatchSizeOptimization(BatchSizeOptimization batchSizeOptimization)
      Sets how the batch size is optimized.
      batchSizeOptimization - how the batch size is optimized
    • setOptimizeAll

      public void setOptimizeAll(boolean optimize)
      Enables or disables all available optimizations that improve prediction speed. Currently this includes mixed precision training and batch size optimization.
      optimize - true to enable
    • isOptimizeAll

      public boolean isOptimizeAll()
      Returns whether all optimization techniques are enabled or disabled.
      true if enabled.