All Implemented Interfaces:
IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge

public class SentenceTransformersMatcher extends TransformersBase
This matcher uses the Sentence Transformers library to embed each resource based on a textual representation of it. It does not filter an existing alignment; instead, it generates matching candidates based on the text alone.
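A minimal sketch of how candidates can be ranked once every resource has an embedding. It assumes cosine similarity as the score, which this page does not confirm; the class EmbeddingSimilaritySketch and its methods are hypothetical and not part of this API.

```java
// Hypothetical helper, not part of MELT: ranks corpus embeddings against a query embedding.
final class EmbeddingSimilaritySketch {

    /** Cosine similarity of two vectors of equal length (NaN if either is all zeros). */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Index of the corpus embedding most similar to the query (assumes a non-empty corpus). */
    static int closest(double[] query, double[][] corpus) {
        int best = 0;
        for (int i = 1; i < corpus.length; i++) {
            if (cosine(query, corpus[i]) > cosine(query, corpus[best])) {
                best = i;
            }
        }
        return best;
    }
}
```

In this sketch the query would be the embedding of a source resource's text and the corpus the embeddings of all target resources.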
  • Field Details

    • LOGGER

      private static final org.slf4j.Logger LOGGER
    • NEWLINE

      private static final String NEWLINE
    • resourcesExtractor

      private List<ResourcesExtractor> resourcesExtractor
    • queryChunkSize

      private int queryChunkSize
    • corpusChunkSize

      private int corpusChunkSize
    • topK

      private int topK
    • bothDirections

      private boolean bothDirections
    • topkPerResource

      private boolean topkPerResource
    • resourceFilters

      private List<Class<? extends SentenceTransformersPredicate>> resourceFilters
  • Constructor Details

    • SentenceTransformersMatcher

      public SentenceTransformersMatcher(TextExtractorMap extractor, String modelName)
    • SentenceTransformersMatcher

      public SentenceTransformersMatcher(TextExtractor extractor, String modelName)
  • Method Details

    • match

      public Alignment match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties parameters) throws Exception
      Description copied from class: MatcherYAAAJena
      Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment. Note: This method might be called multiple times in a row when using the evaluation framework. Make sure to return a mapping which is specific to the given inputs.
      Specified by:
      match in interface IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>
      Specified by:
      match in class MatcherYAAAJena
      Parameters:
      source - This OntModel represents the source ontology.
      target - This OntModel represents the target ontology.
      inputAlignment - This mapping represents the input alignment.
      parameters - Additional properties.
      Returns:
      The resulting alignment of the matching process.
      Throws:
      Exception - Any exception which occurs during matching.
    • createTextFile

      private int createTextFile(Iterator<? extends org.apache.jena.ontology.OntResource> resourceIterator, File file) throws IOException
      Throws:
      IOException
    • initialiseResourceExtractor

      public void initialiseResourceExtractor()
      Initialises the resource extractors such that classes, datatype properties, object properties, all other properties (RDF properties that are not OWL properties), and instances are matched if the properties suggest doing so.
    • initExtractors

      private void initExtractors()
    • getResourcesExtractor

      public List<ResourcesExtractor> getResourcesExtractor()
    • setResourcesExtractor

      public void setResourcesExtractor(List<ResourcesExtractor> resourcesExtractor)
    • getQueryChunkSize

      public int getQueryChunkSize()
      Returns the number of queries which are processed simultaneously.
      Returns:
      the number of queries which are processed simultaneously
    • setQueryChunkSize

      public void setQueryChunkSize(int queryChunkSize)
      Sets the number of queries which are processed simultaneously. Increasing that value increases the speed, but requires more memory. The default value is 100.
      Parameters:
      queryChunkSize - number of queries which are processed simultaneously
    • getCorpusChunkSize

      public int getCorpusChunkSize()
      Returns the number of entries which are scanned at a time. Increasing that value increases the speed, but requires more memory. The default value is 500000.
      Returns:
      the number of entries which are scanned at a time
    • setCorpusChunkSize

      public void setCorpusChunkSize(int corpusChunkSize)
      Sets the number of entries which are scanned at a time. Increasing that value increases the speed, but requires more memory. The default value is 500000.
      Parameters:
      corpusChunkSize - the number of entries which are scanned at a time
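      The speed and memory trade-off of the two chunk sizes can be sketched with a little arithmetic. The sketch assumes that the search visits every pair of (query chunk, corpus chunk) and holds one score per query and corpus entry of the current pair; that is an assumption about the underlying search, and ChunkedSearchSketch is a hypothetical class.

```java
// Hypothetical helper, not part of MELT: estimates the work and memory of a chunked search.
final class ChunkedSearchSketch {

    /** Number of (query chunk, corpus chunk) blocks that have to be scored. */
    static long blockCount(long numQueries, long corpusSize, int queryChunkSize, int corpusChunkSize) {
        long queryChunks = (numQueries + queryChunkSize - 1) / queryChunkSize;   // ceiling division
        long corpusChunks = (corpusSize + corpusChunkSize - 1) / corpusChunkSize;
        return queryChunks * corpusChunks;
    }

    /** Largest number of similarity scores held at once for a single block. */
    static long peakBlockCells(long numQueries, long corpusSize, int queryChunkSize, int corpusChunkSize) {
        return Math.min(numQueries, queryChunkSize) * Math.min(corpusSize, corpusChunkSize);
    }
}
```

      Under these assumptions, the defaults of 100 and 500000 yield at most 50,000,000 scores per block; larger chunks mean fewer blocks but a bigger block in memory.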
    • getTopK

      public int getTopK()
      Returns the number which represents how many correspondences should be created per resource.
      Returns:
      the number which represents how many correspondences should be created per resource
    • setTopK

      public void setTopK(int topK)
      Sets the number which represents how many correspondences should be created per resource. The default is 10.
      Parameters:
      topK - the number which represents how many correspondences should be created per resource
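      As an illustration of what a top-k cut does, the following sketch keeps the indices of the k highest scores for one query. TopKSketch is a hypothetical class and the bounded min-heap is just one way to implement the selection, not necessarily what the matcher does internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of MELT: selects the k best-scoring corpus indices.
final class TopKSketch {

    /** Returns the indices of the k highest scores, best first. */
    static List<Integer> topKIndices(double[] scores, int k) {
        Comparator<Integer> byScore = Comparator.comparingDouble(i -> scores[i]);
        // Min-heap bounded to k elements: the weakest kept candidate is always on top.
        PriorityQueue<Integer> heap = new PriorityQueue<>(byScore);
        for (int i = 0; i < scores.length; i++) {
            heap.add(i);
            if (heap.size() > k) {
                heap.poll(); // drop the currently weakest candidate
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(byScore.reversed());
        return result;
    }
}
```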
    • isBothDirections

      public boolean isBothDirections()
      Returns true if both directions are enabled. This means each ontology is used once as the query and once as the corpus. Thus each element from the source AND target ontologies has at least topK corresponding entities.
      Returns:
      true, if source and target ontology are both query and corpus.
    • setBothDirections

      public void setBothDirections(boolean bothDirections)
      Sets whether both directions are enabled. If true (the default), the source and target ontologies are each used once as the query and once as the corpus. Thus each element from the source AND target ontologies has at least topK corresponding entities. If false, only source elements have at least topK corresponding entities.
      Parameters:
      bothDirections - true if both directions are enabled
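      The effect of searching in both directions can be sketched on a small similarity matrix. The sketch fixes topK to 1 for brevity and uses a hypothetical class BothDirectionsSketch; the string pairs only stand in for correspondences.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of MELT: sim[s][t] is the score of source s and target t.
final class BothDirectionsSketch {

    /** Collects the best pair per query element (topK = 1); assumes a non-empty matrix. */
    static Set<String> candidates(double[][] sim, boolean bothDirections) {
        Set<String> pairs = new LinkedHashSet<>();
        // Direction 1: every source element is a query, the targets are the corpus.
        for (int s = 0; s < sim.length; s++) {
            int best = 0;
            for (int t = 1; t < sim[s].length; t++) {
                if (sim[s][t] > sim[s][best]) best = t;
            }
            pairs.add("s" + s + "-t" + best);
        }
        if (bothDirections) {
            // Direction 2: every target element is a query, the sources are the corpus.
            for (int t = 0; t < sim[0].length; t++) {
                int best = 0;
                for (int s = 1; s < sim.length; s++) {
                    if (sim[s][t] > sim[best][t]) best = s;
                }
                pairs.add("s" + best + "-t" + t);
            }
        }
        return pairs;
    }
}
```

      With one direction only, a target element that is never the best hit of any source element receives no candidate at all; the second direction closes that gap.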
    • isTopkPerResource

      public boolean isTopkPerResource()
      Returns true if the topk parameter applies per resource and not per extracted text. This only makes a difference if multipleTextsToMultipleExamples is enabled. E.g. if this value is false, a resource has 5 textual representations, and multipleTextsToMultipleExamples is set to true, the top k candidates are generated for each text and not for each resource. True is the default.
      Returns:
      true if the topk parameter applies to the number of resources, false otherwise
    • setTopkPerResource

      public void setTopkPerResource(boolean topkPerResource)
      If set to true, the topk parameter applies per resource and not per extracted text. This only makes a difference if multipleTextsToMultipleExamples is enabled. E.g. if topkPerResource is set to false, a resource has 5 textual representations, and multipleTextsToMultipleExamples is set to true, the top k candidates are generated for each text and not for each resource. True is the default.
      Parameters:
      topkPerResource - true if topk should be applied per resource and not per textual representation.
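      The difference between the two settings can be sketched for a single resource with several texts. The merging rule used here (keep the best score per candidate across all texts, then cut to topK) is an assumption for illustration, and TopkPerResourceSketch is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of MELT: hitsPerText holds one score map per text of a resource.
final class TopkPerResourceSketch {

    /** Candidate ids with the k highest scores, best first. */
    static List<String> best(Map<String, Double> scores, int k) {
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return ids.subList(0, Math.min(k, ids.size()));
    }

    static Set<String> candidates(List<Map<String, Double>> hitsPerText, int topK, boolean topkPerResource) {
        Set<String> result = new LinkedHashSet<>();
        if (topkPerResource) {
            // Assumed merging rule: best score per candidate over all texts, then one top-k cut.
            Map<String, Double> merged = new HashMap<>();
            for (Map<String, Double> hits : hitsPerText) {
                hits.forEach((id, score) -> merged.merge(id, score, Double::max));
            }
            result.addAll(best(merged, topK));
        } else {
            // One top-k cut per text, so the resource can end up with more than topK candidates.
            for (Map<String, Double> hits : hitsPerText) {
                result.addAll(best(hits, topK));
            }
        }
        return result;
    }
}
```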
    • getResourceFilters

      public List<Class<? extends SentenceTransformersPredicate>> getResourceFilters()
    • setResourceFilters

      public void setResourceFilters(List<Class<? extends SentenceTransformersPredicate>> resourceFilters)
    • addResourceFilter

      public void addResourceFilter(Class<? extends SentenceTransformersPredicate> resourceFilter)
    • setTrainingArguments

      public void setTrainingArguments(TransformersArguments trainingArguments)
      No training arguments can be used for SentenceTransformersMatcher - do NOT call this method.
      Overrides:
      setTrainingArguments in class TransformersBase
      Parameters:
      trainingArguments - training arguments
    • setUsingTensorflow

      public void setUsingTensorflow(boolean usingTensorflow)
      SentenceTransformersMatcher only supports PyTorch - thus setting usingTensorflow to true will result in an error.
      Overrides:
      setUsingTensorflow in class TransformersBase
      Parameters:
      usingTensorflow - can only be set to false