eu.sealsproject.platform.res.tool.impl.AbstractPlugin

de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.SentenceTransformersMatcher

All Implemented Interfaces: IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge

public class SentenceTransformersMatcher extends TransformersBase

This matcher uses the Sentence Transformers library to embed each resource based on a textual representation of it. It therefore does not filter an existing alignment but generates matching candidates based on the text.

Field Summary

Fields

| Modifier and Type | Field |
| --- | --- |
| private boolean | bothDirections |
| private int | corpusChunkSize |
| private static final org.slf4j.Logger | LOGGER |
| private static final String | NEWLINE |
| private int | queryChunkSize |
| private List<Class<? extends SentenceTransformersPredicate>> | resourceFilters |
| private List<ResourcesExtractor> | resourcesExtractor |
| private int | topK |
| private boolean | topkPerResource |

Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
cudaVisibleDevices, extractor, modelName, multipleTextsToMultipleExamples, multiProcessing, trainingArguments, transformersCache, usingTensorflow

Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
FILE_PREFIX, FILE_SUFFIX
Constructor Summary

Constructors

- SentenceTransformersMatcher(TextExtractorMap extractor, String modelName)
- SentenceTransformersMatcher(TextExtractor extractor, String modelName)
Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | addResourceFilter(Class<? extends SentenceTransformersPredicate> resourceFilter) | |
| private int | createTextFile(Iterator<? extends org.apache.jena.ontology.OntResource> resourceIterator, File file) | |
| int | getCorpusChunkSize() | Returns the number of entries which are scanned at a time. |
| int | getQueryChunkSize() | Returns the number of queries which are processed simultaneously. |
| List<Class<? extends SentenceTransformersPredicate>> | getResourceFilters() | |
| List<ResourcesExtractor> | getResourcesExtractor() | |
| int | getTopK() | Returns the number of correspondences which should be created per resource. |
| private void | initExtractors() | |
| void | initialiseResourceExtractor() | Initialises the resource extractors such that classes, datatype properties, object properties, all other properties (RDF properties, not OWL), and instances are matched if the properties suggest to do so. |
| boolean | isBothDirections() | Returns true if both directions are enabled. |
| boolean | isTopkPerResource() | Returns true if the topK parameter applies to the number of resources and not to the number of extracted texts. |
| Alignment | match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties parameters) | Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment. |
| void | setBothDirections(boolean bothDirections) | Sets whether both directions are enabled. |
| void | setCorpusChunkSize(int corpusChunkSize) | Sets the number of entries which are scanned at a time. |
| void | setQueryChunkSize(int queryChunkSize) | Sets the number of queries which are processed simultaneously. |
| void | setResourceFilters(List<Class<? extends SentenceTransformersPredicate>> resourceFilters) | |
| void | setResourcesExtractor(List<ResourcesExtractor> resourcesExtractor) | |
| void | setTopK(int topK) | Sets the number of correspondences which should be created per resource. |
| void | setTopkPerResource(boolean topkPerResource) | If set to true, the topK parameter applies to the number of resources and not to the number of extracted texts. |
| void | setTrainingArguments(TransformersArguments trainingArguments) | No training arguments can be used for SentenceTransformersMatcher - do NOT call this method. |
| void | setUsingTensorflow(boolean usingTensorflow) | SentenceTransformersMatcher only supports PyTorch, so setting usingTensorflow to true will result in an error. |

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
addTrainingArgument, getCudaVisibleDevices, getCudaVisibleDevicesButOnlyOneGPU, getExamplesForBatchSizeOptimization, getExtractor, getExtractorMap, getModelName, getMultiProcessing, getTextualRepresentation, getTrainingArguments, getTransformersCache, isMultipleTextsToMultipleExamples, isOptimizeForMixedPrecisionTraining, isUsingTensorflow, setCudaVisibleDevices, setCudaVisibleDevices, setExtractor, setExtractorMap, setModelName, setMultipleTextsToMultipleExamples, setMultiProcessing, setOptimizeForMixedPrecisionTraining, setTransformersCache, writeExamplesToFile

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
getModelSpec, match, readOntology

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
match

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
match

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
align, align, canExecute, getType

Methods inherited from class eu.sealsproject.platform.res.tool.impl.AbstractPlugin
getId, getVersion, setId, setVersion

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface eu.sealsproject.platform.res.tool.api.IPlugin
getId, getVersion

Field Details
- LOGGER
  
  private static final org.slf4j.Logger LOGGER
- NEWLINE
  
  private static final String NEWLINE
- resourcesExtractor
  
  private List<ResourcesExtractor> resourcesExtractor
- queryChunkSize
  
  private int queryChunkSize
- corpusChunkSize
  
  private int corpusChunkSize
- topK
  
  private int topK
- bothDirections
  
  private boolean bothDirections
- topkPerResource
  
  private boolean topkPerResource
- resourceFilters
  
  private List<Class<? extends SentenceTransformersPredicate>> resourceFilters
Constructor Details
- SentenceTransformersMatcher
  
  public SentenceTransformersMatcher(TextExtractorMap extractor, String modelName)
- SentenceTransformersMatcher
  
  public SentenceTransformersMatcher(TextExtractor extractor, String modelName)
Method Details
- match
  
  public Alignment match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties parameters) throws Exception
  
  Description copied from class: MatcherYAAAJena
  
  Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment. Note: This method might be called multiple times in a row when using the evaluation framework. Make sure to return a mapping which is specific to the given inputs.
  
  Specified by:
  
  match in interface IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>
  
  Specified by:
  
  match in class MatcherYAAAJena
  
  Parameters:
  
  source - This OntModel represents the source ontology.
  
  target - This OntModel represents the target ontology.
  
  inputAlignment - This mapping represents the input alignment.
  
  parameters - Additional properties.
  
  Returns:
  
  The resulting alignment of the matching process.
  
  Throws:
  
  Exception - Any exception which occurs during matching.
- createTextFile
  
  private int createTextFile(Iterator<? extends org.apache.jena.ontology.OntResource> resourceIterator, File file) throws IOException
  
  Throws:
  
  IOException
- initialiseResourceExtractor
  
  public void initialiseResourceExtractor()
  
  Initialises the resource extractors such that classes, datatype properties, object properties, all other properties (RDF properties, not OWL), and instances are matched if the properties suggest to do so.
- initExtractors
  
  private void initExtractors()
- getResourcesExtractor
  
  public List<ResourcesExtractor> getResourcesExtractor()
- setResourcesExtractor
  
  public void setResourcesExtractor(List<ResourcesExtractor> resourcesExtractor)
- getQueryChunkSize
  
  public int getQueryChunkSize()
  
  Returns the number of queries which are processed simultaneously.
  
  Returns:
  
  the number of queries which are processed simultaneously
- setQueryChunkSize
  
  public void setQueryChunkSize(int queryChunkSize)
  
  Sets the number of queries which are processed simultaneously. Increasing that value increases the speed, but requires more memory. The default value is 100.
  
  Parameters:
  
  queryChunkSize - number of queries which are processed simultaneously
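The trade-off between speed and memory comes from processing queries in batches. The following is a minimal standalone sketch of such batching, not the matcher's actual implementation; the class `QueryChunkSketch` and its `chunk` helper are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class QueryChunkSketch {
    // Hypothetical helper: splits items into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            queries.add("query-" + i);
        }
        // With the documented default of 100, 250 queries form three chunks.
        List<List<String>> chunks = chunk(queries, 100);
        System.out.println(chunks.size() + " chunks, last has " + chunks.get(chunks.size() - 1).size());
        // prints "3 chunks, last has 50"
    }
}
```

A larger chunk size means fewer batches, but each batch has to be held in memory at once.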
- getCorpusChunkSize
  
  public int getCorpusChunkSize()
  
  Returns the number of entries which are scanned at a time. Increasing that value increases the speed, but requires more memory. The default value is 500000.
  
  Returns:
  
  the number of entries which are scanned at a time
- setCorpusChunkSize
  
  public void setCorpusChunkSize(int corpusChunkSize)
  
  Sets the number of entries which are scanned at a time. Increasing that value increases the speed, but requires more memory. The default value is 500000.
  
  Parameters:
  
  corpusChunkSize - the number of entries which are scanned at a time
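Scanning the corpus in chunks should only affect speed and memory, not the result. The sketch below illustrates this under simplifying assumptions: the similarity scores of one query against all corpus entries are already given as a plain array, and `CorpusChunkSketch` and `topScores` are hypothetical names, not part of the library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class CorpusChunkSketch {
    // Returns the k highest scores, scanning the corpus chunkSize entries at a time.
    static List<Double> topScores(double[] corpusScores, int chunkSize, int k) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Min-heap: the weakest candidate kept so far sits on top.
        PriorityQueue<Double> best = new PriorityQueue<>();
        for (int start = 0; start < corpusScores.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, corpusScores.length);
            for (int i = start; i < end; i++) {
                best.add(corpusScores[i]);
                if (best.size() > k) {
                    best.poll(); // drop the currently weakest candidate
                }
            }
        }
        List<Double> result = new ArrayList<>(best);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        double[] scores = {0.1, 0.9, 0.4, 0.8, 0.3, 0.7};
        System.out.println(topScores(scores, 2, 3)); // prints [0.9, 0.8, 0.7]
        System.out.println(topScores(scores, 6, 3)); // prints [0.9, 0.8, 0.7]
    }
}
```

Both calls return the same candidates; only the number of entries handled per step differs.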
- getTopK
  
  public int getTopK()
  
  Returns the number which represents how many correspondences should be created per resource.
  
  Returns:
  
  the number which represents how many correspondences should be created per resource
- setTopK
  
  public void setTopK(int topK)
  
  Sets the number which represents how many correspondences should be created per resource. The default is 10.
  
  Parameters:
  
  topK - the number which represents how many correspondences should be created per resource
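To make the role of topK concrete, here is a small standalone sketch of selecting the best candidates for one query embedding. It assumes cosine similarity as the scoring function and uses toy two-dimensional vectors; `TopKSketch`, `cosine`, and `topK` are hypothetical names for illustration and do not reflect how the matcher computes embeddings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TopKSketch {
    // Cosine similarity of two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the indices of the topK corpus vectors most similar to the query, best first.
    static List<Integer> topK(double[] query, double[][] corpus, int topK) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < corpus.length; i++) {
            indices.add(i);
        }
        // Sort by descending similarity (negated score gives descending order).
        indices.sort(Comparator.comparingDouble((Integer i) -> -cosine(query, corpus[i])));
        return new ArrayList<>(indices.subList(0, Math.min(topK, indices.size())));
    }

    public static void main(String[] args) {
        double[] query = {1.0, 0.0};
        double[][] corpus = {{0.0, 1.0}, {1.0, 0.1}, {1.0, 1.0}};
        System.out.println(topK(query, corpus, 2)); // prints [1, 2]
    }
}
```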
- isBothDirections
  
  public boolean isBothDirections()
  
  Returns true if both directions are enabled. This means each ontology serves once as the query and once as the corpus. Thus each element from the source AND target ontologies has at least topK corresponding entities.
  
  Returns:
  
  true, if source and target ontology are both query and corpus.
- setBothDirections
  
  public void setBothDirections(boolean bothDirections)
  
  Sets whether both directions are enabled. If true (the default), the source and target ontologies each serve once as the query and once as the corpus. Thus each element from the source AND target ontologies has at least topK corresponding entities. If false, only source elements have at least topK corresponding entities.
  
  Parameters:
  
  bothDirections - true if both directions are enabled
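The effect of this flag can be illustrated with a toy similarity matrix. The sketch below is a simplified standalone illustration, not the matcher's code; `BothDirectionsSketch`, `candidates`, and `bestIndices` are hypothetical names, and the matrix values are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class BothDirectionsSketch {
    // Indices of the k highest scores, best first.
    static List<Integer> bestIndices(double[] scores, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            indices.add(i);
        }
        indices.sort((a, b) -> Double.compare(scores[b], scores[a]));
        return indices.subList(0, Math.min(k, indices.size()));
    }

    // Collects candidate pairs "s<i>-t<j>" from a similarity matrix sim[source][target].
    static Set<String> candidates(double[][] sim, int topK, boolean bothDirections) {
        Set<String> pairs = new LinkedHashSet<>();
        // Direction 1: source entities are the queries, target entities the corpus.
        for (int s = 0; s < sim.length; s++) {
            for (int t : bestIndices(sim[s], topK)) {
                pairs.add("s" + s + "-t" + t);
            }
        }
        if (bothDirections) {
            // Direction 2: target entities are the queries, source entities the corpus.
            int targets = sim.length == 0 ? 0 : sim[0].length;
            for (int t = 0; t < targets; t++) {
                double[] column = new double[sim.length];
                for (int s = 0; s < sim.length; s++) {
                    column[s] = sim[s][t];
                }
                for (int s : bestIndices(column, topK)) {
                    pairs.add("s" + s + "-t" + t);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        double[][] sim = {{0.9, 0.2, 0.1}, {0.8, 0.3, 0.2}};
        System.out.println(candidates(sim, 1, false)); // prints [s0-t0, s1-t0]
        System.out.println(candidates(sim, 1, true));  // prints [s0-t0, s1-t0, s1-t1, s1-t2]
    }
}
```

With one direction only, targets t1 and t2 receive no candidate at all; with both directions every target also gets its best source.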
- isTopkPerResource
  
  public boolean isTopkPerResource()
  
  Returns true if the topK parameter applies to the number of resources and not to the number of extracted texts. This only makes a difference if multipleTextsToMultipleExamples is enabled. E.g. if this value is false, a resource has 5 textual representations, and multipleTextsToMultipleExamples is set to true, the top k candidates are generated for each text and not for each resource. True is the default.
  
  Returns:
  
  true, if the topK parameter applies to the number of resources - false otherwise
- setTopkPerResource
  
  public void setTopkPerResource(boolean topkPerResource)
  
  If set to true, the topK parameter applies to the number of resources and not to the number of extracted texts. This only makes a difference if multipleTextsToMultipleExamples is enabled. E.g. if topkPerResource is set to false, a resource has 5 textual representations, and multipleTextsToMultipleExamples is set to true, the top k candidates are generated for each text and not for each resource. True is the default.
  
  Parameters:
  
  topkPerResource - true if topK should be applied per resource and not per textual representation.
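The difference between the two modes can be sketched as follows. This is one plausible reading of the description above, written as a standalone illustration: it assumes that per-resource mode keeps the best score per target resource across all texts of the query resource. `TopkPerResourceSketch`, `Hit`, `perText`, and `perResource` are hypothetical names, and the scores are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TopkPerResourceSketch {
    // A scored candidate: the target resource reached by one text of the query resource.
    static final class Hit {
        final String target;
        final double score;
        Hit(String target, double score) {
            this.target = target;
            this.score = score;
        }
    }

    // topK applied to every text separately: each text contributes its own best k targets.
    static List<String> perText(List<List<Hit>> hitsPerText, int k) {
        Set<String> targets = new LinkedHashSet<>();
        for (List<Hit> hits : hitsPerText) {
            List<Hit> sorted = new ArrayList<>(hits);
            sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            for (Hit h : sorted.subList(0, Math.min(k, sorted.size()))) {
                targets.add(h.target);
            }
        }
        return new ArrayList<>(targets);
    }

    // topK applied to the resource as a whole: best score per target, then the best k targets.
    static List<String> perResource(List<List<Hit>> hitsPerText, int k) {
        Map<String, Double> best = new LinkedHashMap<>();
        for (List<Hit> hits : hitsPerText) {
            for (Hit h : hits) {
                best.merge(h.target, h.score, Double::max);
            }
        }
        List<String> targets = new ArrayList<>(best.keySet());
        targets.sort(Comparator.comparingDouble((String t) -> best.get(t)).reversed());
        return new ArrayList<>(targets.subList(0, Math.min(k, targets.size())));
    }

    public static void main(String[] args) {
        // One query resource with two textual representations (e.g. a label and a comment).
        List<List<Hit>> hits = Arrays.asList(
                Arrays.asList(new Hit("t:A", 0.9), new Hit("t:B", 0.5)),
                Arrays.asList(new Hit("t:C", 0.8), new Hit("t:A", 0.6)));
        System.out.println(perText(hits, 1));     // prints [t:A, t:C]
        System.out.println(perResource(hits, 1)); // prints [t:A]
    }
}
```

In per-text mode the number of candidates per resource can grow with the number of texts, while per-resource mode caps it at k.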
- getResourceFilters
  
  public List<Class<? extends SentenceTransformersPredicate>> getResourceFilters()
- setResourceFilters
  
  public void setResourceFilters(List<Class<? extends SentenceTransformersPredicate>> resourceFilters)
- addResourceFilter
  
  public void addResourceFilter(Class<? extends SentenceTransformersPredicate> resourceFilter)
- setTrainingArguments
  
  public void setTrainingArguments(TransformersArguments trainingArguments)
  
  No training arguments can be used for SentenceTransformersMatcher - do NOT call this method.
  
  Overrides:
  
  setTrainingArguments in class TransformersBase
  
  Parameters:
  
  trainingArguments - training arguments
- setUsingTensorflow
  
  public void setUsingTensorflow(boolean usingTensorflow)
  
  SentenceTransformersMatcher only supports PyTorch, so setting usingTensorflow to true will result in an error.
  
  Overrides:
  
  setUsingTensorflow in class TransformersBase
  
  Parameters:
  
  usingTensorflow - can only be set to false
