All Implemented Interfaces:
Filter, IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge

public class LLMBinaryFilter extends LLMBase implements Filter
This filter asks an LLM whether a given correspondence is correct. Each correspondence is presented to the LLM as a separate prediction example, without any information about the other correspondences. The resulting confidence is added to the correspondence so that the alignment can be filtered afterwards.
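The filtering step that the confidence makes possible can be sketched in plain Java. This is a minimal illustration, not MELT code: `ScoredPair`, `ConfidenceThresholdSketch`, and `keepAtLeast` are hypothetical names, and the threshold of 0.5 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a correspondence; NOT the MELT Correspondence class.
class ScoredPair {
    final String left;
    final String right;
    final double confidence;

    ScoredPair(String left, String right, double confidence) {
        this.left = left;
        this.right = right;
        this.confidence = confidence;
    }
}

class ConfidenceThresholdSketch {
    // Keeps only the pairs whose confidence is at least the given threshold.
    static List<ScoredPair> keepAtLeast(List<ScoredPair> pairs, double threshold) {
        List<ScoredPair> kept = new ArrayList<>();
        for (ScoredPair pair : pairs) {
            if (pair.confidence >= threshold) {
                kept.add(pair);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredPair> scored = Arrays.asList(
                new ScoredPair("src:Car", "tgt:Automobile", 0.93),
                new ScoredPair("src:Car", "tgt:Carpet", 0.08));
        // The threshold 0.5 is an assumption chosen for illustration only.
        for (ScoredPair pair : keepAtLeast(scored, 0.5)) {
            System.out.println(pair.left + " = " + pair.right);
        }
    }
}
```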
  • Field Details

  • Constructor Details

    • LLMBinaryFilter

      public LLMBinaryFilter(TextExtractorMap extractor, String modelName, String promt)
      Constructor with all required parameters and default values for optional parameters (which can be changed via setters). It uses the system's default temporary directory to store the files with texts generated from the knowledge graphs. PyTorch is used instead of TensorFlow, and all visible GPUs are used for prediction.
      Parameters:
      extractor - the extractor that selects which text should be used for each resource.
      modelName - the model name, which can be a model ID (a model hosted on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter, pretrained_model_name_or_path, of the from_pretrained function in the Hugging Face library). A path should be absolute; it can be generated with, e.g., FileUtil.getCanonicalPathIfPossible(java.io.File).
      promt - the prompt to use for the LLM. Use {left} and {right} to insert the text representations of the left and right concepts.
    • LLMBinaryFilter

      public LLMBinaryFilter(TextExtractor extractor, String modelName, String promt)
      Constructor with all required parameters and default values for optional parameters (which can be changed via setters). It uses the system's default temporary directory to store the files with texts generated from the knowledge graphs. PyTorch is used instead of TensorFlow, and all visible GPUs are used for prediction.
      Parameters:
      extractor - the extractor that selects which text should be used for each resource.
      modelName - the model name, which can be a model ID (a model hosted on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter, pretrained_model_name_or_path, of the from_pretrained function in the Hugging Face library). A path should be absolute; it can be generated with, e.g., FileUtil.getCanonicalPathIfPossible(java.io.File).
      promt - the prompt to use for the LLM. Use {left} and {right} to insert the text representations of the left and right concepts.
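How the {left} and {right} placeholders in the prompt might be expanded can be sketched as follows. This is an assumption about the substitution step, not the library's implementation: `PromptTemplateSketch` and `fill` are hypothetical names, and the example prompt text is made up. The sketch substitutes in a single pass so that a label which itself contains the text `{right}` is not expanded a second time.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PromptTemplateSketch {
    // Matches the two placeholders described for the prompt parameter.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(left|right)\\}");

    // Replaces {left} and {right} in one pass over the template.
    static String fill(String template, String leftText, String rightText) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = matcher.group(1).equals("left") ? leftText : rightText;
            // quoteReplacement keeps '$' and '\' in the texts literal.
            matcher.appendReplacement(result, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        // Hypothetical prompt; any wording with both placeholders works the same way.
        String prompt = "Do \"{left}\" and \"{right}\" describe the same concept? Answer yes or no.";
        System.out.println(fill(prompt, "car", "automobile"));
    }
}
```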
  • Method Details

    • match

      public Alignment match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties properties) throws Exception
      Description copied from class: MatcherYAAAJena
      Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment. Note: This method might be called multiple times in a row when using the evaluation framework. Make sure to return a mapping which is specific to the given inputs.
      Specified by:
      match in interface IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>
      Specified by:
      match in class MatcherYAAAJena
      Parameters:
      source - This OntModel represents the source ontology.
      target - This OntModel represents the target ontology.
      inputAlignment - This mapping represents the input alignment.
      properties - Additional properties.
      Returns:
      The resulting alignment of the matching process.
      Throws:
      Exception - Any exception which occurs during matching.
    • createPredictionFile

      public Map<Correspondence,List<Integer>> createPredictionFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment predictionAlignment, File outputFile, boolean append) throws IOException
      Creates the prediction file, which is a CSV file with two columns. The first column is the text from the left resource and the second column is the text from the right resource.
      Parameters:
      source - The source model
      target - The target model
      predictionAlignment - the alignment to process. All correspondences which have enough text are used.
      outputFile - the CSV file to which the output should be written.
      append - if true, then the prediction alignment is appended to the given file.
      Returns:
      the map which maps each correspondence to (possibly multiple) row numbers. If multipleTextsToMultipleExamples is set to true, multiple rows can correspond to one correspondence, because each text (e.g. label, comment, etc.) of the two resources is used as an example.
      Throws:
      IOException - in case the writing fails.
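The relationship between correspondences and row numbers can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: `TextPair` and `PredictionFileSketch` are hypothetical names, row numbers are taken to be zero-based, the multi-text case is modeled as the cross product of left and right texts, and the single-row case simply joins the texts with a space. The actual behavior of createPredictionFile may differ in any of these points.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a correspondence together with its extracted texts.
class TextPair {
    final String id;
    final List<String> leftTexts;
    final List<String> rightTexts;

    TextPair(String id, List<String> leftTexts, List<String> rightTexts) {
        this.id = id;
        this.leftTexts = leftTexts;
        this.rightTexts = rightTexts;
    }
}

class PredictionFileSketch {
    // Quotes a CSV field and doubles embedded quotes.
    static String csvField(String text) {
        return "\"" + text.replace("\"", "\"\"") + "\"";
    }

    // Writes two-column rows and returns, per pair id, the rows that belong to it.
    static Map<String, List<Integer>> write(List<TextPair> pairs, PrintWriter out,
                                            boolean multipleTextsToMultipleExamples) {
        Map<String, List<Integer>> rowsPerPair = new LinkedHashMap<>();
        int row = 0; // zero-based row counter (assumption)
        for (TextPair pair : pairs) {
            if (pair.leftTexts.isEmpty() || pair.rightTexts.isEmpty()) {
                continue; // not enough text: the pair is skipped
            }
            List<Integer> rows = new ArrayList<>();
            if (multipleTextsToMultipleExamples) {
                // One row per combination of a left text and a right text.
                for (String left : pair.leftTexts) {
                    for (String right : pair.rightTexts) {
                        out.print(csvField(left) + "," + csvField(right) + "\n");
                        rows.add(row++);
                    }
                }
            } else {
                // One row per pair; the texts are joined with a space (assumption).
                out.print(csvField(String.join(" ", pair.leftTexts)) + ","
                        + csvField(String.join(" ", pair.rightTexts)) + "\n");
                rows.add(row++);
            }
            rowsPerPair.put(pair.id, rows);
        }
        out.flush();
        return rowsPerPair;
    }

    public static void main(String[] args) {
        List<TextPair> pairs = Arrays.asList(
                new TextPair("c1", Arrays.asList("Car", "Automobile"), Arrays.asList("Auto")),
                new TextPair("c2", Collections.<String>emptyList(), Arrays.asList("Tree")),
                new TextPair("c3", Arrays.asList("Dog"), Arrays.asList("Hound")));
        StringWriter csv = new StringWriter();
        Map<String, List<Integer>> rows = write(pairs, new PrintWriter(csv), true);
        System.out.print(csv);
        System.out.println(rows);
    }
}
```

The returned map is what allows predictions, which come back per row, to be aggregated per correspondence again.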
    • getPositiveWords

      public Set<String> getPositiveWords()
    • setPositiveWords

      public void setPositiveWords(Set<String> positiveWords)
    • addPositiveWord

      public void addPositiveWord(String positiveWord)
    • getNegativeWords

      public Set<String> getNegativeWords()
    • setNegativeWords

      public void setNegativeWords(Set<String> negativeWords)
    • addNegativeWord

      public void addNegativeWord(String negativeWord)
    • getWordsToDetect

      protected List<Set<String>> getWordsToDetect()