Class SentenceTransformersMatcher
java.lang.Object
eu.sealsproject.platform.res.tool.impl.AbstractPlugin
de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.SentenceTransformersMatcher
- All Implemented Interfaces:
IMatcher<org.apache.jena.ontology.OntModel, Alignment, Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge
This matcher uses the Sentence Transformers library to embed every resource into a vector space, based on a textual representation of that resource.
It therefore does not filter an existing alignment; instead, it generates matching candidates from the text.
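The idea of generating candidates from embeddings can be illustrated with a minimal sketch. It is not MELT code: the class `EmbeddingCandidateSketch`, its methods and the toy vectors are hypothetical, and real embeddings would come from a Sentence Transformers model rather than being written by hand. Cosine similarity is used here as the score, which is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only, not MELT code. The embeddings are hand-written toy vectors.
class EmbeddingCandidateSketch {

    /** Cosine similarity of two vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the k corpus keys most similar to the query, best first. */
    static List<String> topK(double[] query, Map<String, double[]> corpus, int k) {
        List<String> keys = new ArrayList<>(corpus.keySet());
        // Sort by descending similarity (negated score gives descending order).
        keys.sort(Comparator.comparingDouble((String key) -> -cosine(query, corpus.get(key))));
        return new ArrayList<>(keys.subList(0, Math.min(k, keys.size())));
    }

    public static void main(String[] args) {
        Map<String, double[]> target = new LinkedHashMap<>();
        target.put("target:Author", new double[] {0.9, 0.1, 0.0});
        target.put("target:Paper", new double[] {0.1, 0.9, 0.1});
        target.put("target:Venue", new double[] {0.0, 0.2, 0.9});

        double[] sourceWriter = {1.0, 0.2, 0.0}; // toy embedding of "source:Writer"
        System.out.println(topK(sourceWriter, target, 2)); // prints [target:Author, target:Paper]
    }
}
```

Every source resource receives candidates regardless of how weak the best score is, which is why the output of such a matcher is usually passed to a later filtering step.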
-
Field Summary
Modifier and Type, Field, Description
private boolean bothDirections
private int corpusChunkSize
private static final org.slf4j.Logger LOGGER
private static final String NEWLINE
private int queryChunkSize
private List<Class<? extends SentenceTransformersPredicate>> resourceFilters
private List<ResourcesExtractor> resourcesExtractor
private int topK
private boolean topkPerResource
Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
cudaVisibleDevices, extractor, modelName, multipleTextsToMultipleExamples, multiProcessing, trainingArguments, transformersCache, usingTensorflow
Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
FILE_PREFIX, FILE_SUFFIX
-
Constructor Summary
Constructor, Description
SentenceTransformersMatcher(TextExtractorMap extractor, String modelName)
SentenceTransformersMatcher(TextExtractor extractor, String modelName)
-
Method Summary
Modifier and Type, Method, Description
void addResourceFilter(Class<? extends SentenceTransformersPredicate> resourceFilter)
private int createTextFile(Iterator<? extends org.apache.jena.ontology.OntResource> resourceIterator, File file)
int getCorpusChunkSize()
Returns the number of entities which are scanned at a time.
int getQueryChunkSize()
Returns the number of queries which are processed simultaneously.
List<Class<? extends SentenceTransformersPredicate>> getResourceFilters()
List<ResourcesExtractor> getResourcesExtractor()
int getTopK()
Returns the number which represents how many correspondences should be created per resource.
private void initExtractors()
void initialiseResourceExtractor()
Initialises the resource extractors such that classes, datatype properties, object properties, all other properties (RDF properties, not OWL), and instances are matched if the properties suggest doing so.
boolean isBothDirections()
Returns true if both directions are enabled.
boolean isTopkPerResource()
Returns true if the topK parameter applies to the number of resources and not to the number of extracted texts.
Alignment match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties parameters)
Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment.
void setBothDirections(boolean bothDirections)
Sets whether both directions are enabled.
void setCorpusChunkSize(int corpusChunkSize)
Sets the number of entities which are scanned at a time.
void setQueryChunkSize(int queryChunkSize)
Sets the number of queries which are processed simultaneously.
void setResourceFilters(List<Class<? extends SentenceTransformersPredicate>> resourceFilters)
void setResourcesExtractor(List<ResourcesExtractor> resourcesExtractor)
void setTopK(int topK)
Sets the number which represents how many correspondences should be created per resource.
void setTopkPerResource(boolean topkPerResource)
If set to true, the topK parameter applies to the number of resources and not to the number of extracted texts.
void setTrainingArguments(TransformersArguments trainingArguments)
No training arguments can be used for SentenceTransformersMatcher. Do NOT call this method.
void setUsingTensorflow(boolean usingTensorflow)
SentenceTransformersMatcher only supports PyTorch, so setting usingTensorflow to true results in an error.
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
addTrainingArgument, getCudaVisibleDevices, getCudaVisibleDevicesButOnlyOneGPU, getExamplesForBatchSizeOptimization, getExtractor, getExtractorMap, getModelName, getMultiProcessing, getTextualRepresentation, getTrainingArguments, getTransformersCache, isMultipleTextsToMultipleExamples, isOptimizeForMixedPrecisionTraining, isUsingTensorflow, setCudaVisibleDevices, setCudaVisibleDevices, setExtractor, setExtractorMap, setModelName, setMultipleTextsToMultipleExamples, setMultiProcessing, setOptimizeForMixedPrecisionTraining, setTransformersCache, writeExamplesToFile
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
getModelSpec, match, readOntology
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
match
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
match
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
align, align, canExecute, getType
Methods inherited from class eu.sealsproject.platform.res.tool.impl.AbstractPlugin
getId, getVersion, setId, setVersion
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface eu.sealsproject.platform.res.tool.api.IPlugin
getId, getVersion
-
Field Details
-
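LOGGER
private static final org.slf4j.Logger LOGGER
-
NEWLINE
private static final String NEWLINE
-
resourcesExtractor
private List<ResourcesExtractor> resourcesExtractor
-
queryChunkSize
private int queryChunkSize
-
corpusChunkSize
private int corpusChunkSize
-
topK
private int topK
-
bothDirections
private boolean bothDirections
-
topkPerResource
private boolean topkPerResource
-
resourceFilters
private List<Class<? extends SentenceTransformersPredicate>> resourceFilters
-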
-
Constructor Details
-
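SentenceTransformersMatcher
public SentenceTransformersMatcher(TextExtractorMap extractor, String modelName)
-
SentenceTransformersMatcher
public SentenceTransformersMatcher(TextExtractor extractor, String modelName)
-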
-
Method Details
-
match
public Alignment match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties parameters) throws Exception
Description copied from class: MatcherYAAAJena
Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment. Note: This method might be called multiple times in a row when using the evaluation framework. Make sure to return a mapping which is specific to the given inputs.
- Specified by:
match in interface IMatcher<org.apache.jena.ontology.OntModel, Alignment, Properties>
- Specified by:
match in class MatcherYAAAJena
- Parameters:
source - This OntModel represents the source ontology.
target - This OntModel represents the target ontology.
inputAlignment - This mapping represents the input alignment.
parameters - Additional properties.
- Returns:
The resulting alignment of the matching process.
- Throws:
Exception - Any exception which occurs during matching.
-
createTextFile
private int createTextFile(Iterator<? extends org.apache.jena.ontology.OntResource> resourceIterator, File file) throws IOException
- Throws:
IOException
-
initialiseResourceExtractor
public void initialiseResourceExtractor()
Initialises the resource extractors such that classes, datatype properties, object properties, all other properties (RDF properties, not OWL), and instances are matched if the properties suggest doing so.
-
initExtractors
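private void initExtractors()
-
getResourcesExtractor
public List<ResourcesExtractor> getResourcesExtractor()
-
setResourcesExtractor
public void setResourcesExtractor(List<ResourcesExtractor> resourcesExtractor)
-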
getQueryChunkSize
public int getQueryChunkSize()
Returns the number of queries which are processed simultaneously.
- Returns:
the number of queries which are processed simultaneously
-
setQueryChunkSize
public void setQueryChunkSize(int queryChunkSize)
Sets the number of queries which are processed simultaneously. Increasing this value increases the speed but requires more memory. The default value is 100.
- Parameters:
queryChunkSize - number of queries which are processed simultaneously
-
getCorpusChunkSize
public int getCorpusChunkSize()
Returns the number of entities which are scanned at a time. Increasing this value increases the speed but requires more memory. The default value is 500000.
- Returns:
the number of entities which are scanned at a time
-
setCorpusChunkSize
public void setCorpusChunkSize(int corpusChunkSize)
Sets the number of entities which are scanned at a time. Increasing this value increases the speed but requires more memory. The default value is 500000.
- Parameters:
corpusChunkSize - the number of entities which are scanned at a time
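The trade-off between the two chunk sizes can be shown with a minimal sketch of a chunked top-k search. It is not MELT code: `ChunkedSearchSketch`, `Hit` and the dot-product score are hypothetical, and the assumption is that the underlying search resembles the `semantic_search` utility of Sentence Transformers, whose default chunk sizes (100 and 500000) coincide with the defaults documented here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Conceptual sketch only, not MELT code. Scores are plain dot products on toy vectors.
class ChunkedSearchSketch {

    /** A scored hit: index of the corpus entry and its similarity to the query. */
    static final class Hit {
        final int corpusIndex;
        final double score;
        Hit(int corpusIndex, double score) { this.corpusIndex = corpusIndex; this.score = score; }
        double score() { return score; }
    }

    static double dot(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    /** Returns, for every query, the indices of its topK best corpus entries, best first. */
    static List<List<Integer>> search(double[][] queries, double[][] corpus,
                                      int queryChunkSize, int corpusChunkSize, int topK) {
        // One bounded min-heap per query: the weakest kept hit sits at the head.
        List<PriorityQueue<Hit>> heaps = new ArrayList<>();
        for (int q = 0; q < queries.length; q++) {
            heaps.add(new PriorityQueue<>(Comparator.comparingDouble(Hit::score)));
        }
        for (int qStart = 0; qStart < queries.length; qStart += queryChunkSize) {
            int qEnd = Math.min(qStart + queryChunkSize, queries.length);
            for (int cStart = 0; cStart < corpus.length; cStart += corpusChunkSize) {
                int cEnd = Math.min(cStart + corpusChunkSize, corpus.length);
                // A vectorised implementation would hold this whole
                // (qEnd - qStart) x (cEnd - cStart) block of scores in memory at once.
                for (int q = qStart; q < qEnd; q++) {
                    PriorityQueue<Hit> heap = heaps.get(q);
                    for (int c = cStart; c < cEnd; c++) {
                        heap.add(new Hit(c, dot(queries[q], corpus[c])));
                        if (heap.size() > topK) heap.poll(); // drop the weakest hit
                    }
                }
            }
        }
        List<List<Integer>> result = new ArrayList<>();
        for (PriorityQueue<Hit> heap : heaps) {
            List<Hit> hits = new ArrayList<>(heap);
            hits.sort(Comparator.comparingDouble(Hit::score).reversed());
            List<Integer> indices = new ArrayList<>();
            for (Hit hit : hits) indices.add(hit.corpusIndex);
            result.add(indices);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] queries = {{1, 0}, {0, 1}};
        double[][] corpus = {{0.9, 0.1}, {0.2, 0.8}, {0.5, 0.5}};
        // Chunk sizes change how the work is batched, not which hits are returned.
        System.out.println(search(queries, corpus, 1, 2, 2));
        System.out.println(search(queries, corpus, 100, 500000, 2));
    }
}
```

Larger chunks mean fewer but bigger score blocks, which is where the higher speed and the higher memory demand both come from.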
-
getTopK
public int getTopK()
Returns the number which represents how many correspondences should be created per resource.
- Returns:
the number which represents how many correspondences should be created per resource
-
setTopK
public void setTopK(int topK)
Sets the number which represents how many correspondences should be created per resource. The default is 10.
- Parameters:
topK - the number which represents how many correspondences should be created per resource
-
isBothDirections
public boolean isBothDirections()
Returns true if both directions are enabled. This means that the source ontology is used once as the query and once as the corpus, and likewise for the target ontology. Thus each element from the source AND target ontologies has at least topK corresponding entities.
- Returns:
true, if source and target ontology are both query and corpus.
-
setBothDirections
public void setBothDirections(boolean bothDirections)
Sets whether both directions are enabled. If true (the default value), the source and target ontologies are each used once as the query and once as the corpus. Thus each element from the source AND target ontologies has at least topK corresponding entities. If false, only source elements have at least topK corresponding entities.
- Parameters:
bothDirections - true if both directions are enabled
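The effect of searching in both directions can be sketched on a small similarity matrix. This is not MELT code: `BothDirectionsSketch`, the pair labels and the scores are hypothetical, and topK is fixed to 1 to keep the example short.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Conceptual sketch only, not MELT code. sim[s][t] is a toy similarity between
// source entity s and target entity t; topK is fixed to 1 for brevity.
class BothDirectionsSketch {

    /** Index of the largest value in the array. */
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) best = i;
        }
        return best;
    }

    static Set<String> candidates(double[][] sim, boolean bothDirections) {
        Set<String> pairs = new LinkedHashSet<>();
        // Direction 1: every source entity is a query, the target ontology is the corpus.
        for (int s = 0; s < sim.length; s++) {
            pairs.add("s" + s + "-t" + argMax(sim[s]));
        }
        if (bothDirections) {
            // Direction 2: every target entity is a query, the source ontology is the corpus.
            for (int t = 0; t < sim[0].length; t++) {
                double[] column = new double[sim.length];
                for (int s = 0; s < sim.length; s++) column[s] = sim[s][t];
                pairs.add("s" + argMax(column) + "-t" + t);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        double[][] sim = {
            {0.9, 0.8, 0.1},
            {0.2, 0.3, 0.7}
        };
        System.out.println(candidates(sim, false)); // prints [s0-t0, s1-t2]
        System.out.println(candidates(sim, true));  // prints [s0-t0, s1-t2, s0-t1]
    }
}
```

With one direction only, target entity t1 receives no candidate at all; the second direction adds s0-t1, so every target entity is covered as well.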
-
isTopkPerResource
public boolean isTopkPerResource()
Returns true if the topK parameter applies to the number of resources and not to the number of extracted texts. This only makes a difference if multipleTextsToMultipleExamples is enabled. E.g. if topkPerResource is false, a resource has 5 textual representations, and multipleTextsToMultipleExamples is set to true, the top k candidates are generated for each text and not for each resource. True is the default.
- Returns:
true, if the topK parameter applies to the number of resources - false otherwise
-
setTopkPerResource
public void setTopkPerResource(boolean topkPerResource)
If set to true, the topK parameter applies to the number of resources and not to the number of extracted texts. This only makes a difference if multipleTextsToMultipleExamples is enabled. E.g. if topkPerResource is set to false, a resource has 5 textual representations, and multipleTextsToMultipleExamples is set to true, the top k candidates are generated for each text and not for each resource. True is the default.
- Parameters:
topkPerResource - true if topK should be applied per resource and not per textual representation.
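The difference between the two settings can be sketched for a single source resource that has two textual representations. This is not MELT code: `TopkPerResourceSketch` and the scores are hypothetical, and the way hits of several texts are merged (best score per target) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Conceptual sketch only, not MELT code. Each map holds the hits of one text of the
// same source resource: target resource -> similarity score.
class TopkPerResourceSketch {

    /** Keys of the k highest-scoring entries, best first. */
    static Set<String> topTargets(Map<String, Double> hits, int k) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(hits.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        Set<String> top = new LinkedHashSet<>();
        for (Map.Entry<String, Double> entry : entries.subList(0, Math.min(k, entries.size()))) {
            top.add(entry.getKey());
        }
        return top;
    }

    static Set<String> candidates(List<Map<String, Double>> hitsPerText, int topK,
                                  boolean topkPerResource) {
        if (topkPerResource) {
            // Assumed merge rule: keep the best score per target over all texts,
            // then apply topK once for the whole resource.
            Map<String, Double> best = new HashMap<>();
            for (Map<String, Double> hits : hitsPerText) {
                hits.forEach((target, score) -> best.merge(target, score, Double::max));
            }
            return topTargets(best, topK);
        }
        // Otherwise topK is applied to every text separately and the results are combined.
        Set<String> result = new LinkedHashSet<>();
        for (Map<String, Double> hits : hitsPerText) {
            result.addAll(topTargets(hits, topK));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> hitsPerText = List.of(
            Map.of("A", 0.9, "B", 0.6),  // hits of the first text, e.g. a label
            Map.of("C", 0.8, "A", 0.5)); // hits of the second text, e.g. a comment
        System.out.println(candidates(hitsPerText, 1, true));  // prints [A]
        System.out.println(candidates(hitsPerText, 1, false)); // prints [A, C]
    }
}
```

With the per-text setting, a resource with many texts can end up with more than topK candidates, which is the behaviour the description above refers to.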
-
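getResourceFilters
public List<Class<? extends SentenceTransformersPredicate>> getResourceFilters()
-
setResourceFilters
public void setResourceFilters(List<Class<? extends SentenceTransformersPredicate>> resourceFilters)
-
addResourceFilter
public void addResourceFilter(Class<? extends SentenceTransformersPredicate> resourceFilter)
-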
setTrainingArguments
public void setTrainingArguments(TransformersArguments trainingArguments)
No training arguments can be used for SentenceTransformersMatcher. Do NOT call this method.
- Overrides:
setTrainingArguments in class TransformersBase
- Parameters:
trainingArguments - training arguments
-
setUsingTensorflow
public void setUsingTensorflow(boolean usingTensorflow)
SentenceTransformersMatcher only supports PyTorch, so setting usingTensorflow to true results in an error.
- Overrides:
setUsingTensorflow in class TransformersBase
- Parameters:
usingTensorflow - can only be set to false
-