Class BackgroundMatcher
java.lang.Object
eu.sealsproject.platform.res.tool.impl.AbstractPlugin
de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.matcher.BackgroundMatcher
- All Implemented Interfaces:
IMatcher<org.apache.jena.ontology.OntModel, Alignment, Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge
Template matcher where the background knowledge and the exploitation strategy (represented as ImplementedBackgroundMatchingStrategies) can be plugged in.
This matcher can be used as a matching component. It is sensible to run a simple string matcher before this matcher: filtering out simple matches first improves performance. If you want a pre-packaged stand-alone background-based matching system, you can try out BackgroundMatcherStandAlone.
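The recommended ordering can be illustrated with a minimal, self-contained sketch. It does not use the MELT API: the class PreFilterSketch, its methods, and the URI -> label maps are hypothetical stand-ins. Exact label matches are found first, and only the remaining URIs would be handed to the more expensive background-knowledge step.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names belong to MELT.
final class PreFilterSketch {

    /** Returns sourceUri -> targetUri for entities whose normalized labels are identical. */
    static Map<String, String> exactLabelMatches(Map<String, String> sourceLabels,
                                                 Map<String, String> targetLabels) {
        // Index target URIs by normalized label for constant-time lookup.
        Map<String, String> targetByLabel = new HashMap<>();
        for (Map.Entry<String, String> entry : targetLabels.entrySet()) {
            targetByLabel.put(normalize(entry.getValue()), entry.getKey());
        }
        Map<String, String> matches = new HashMap<>();
        for (Map.Entry<String, String> entry : sourceLabels.entrySet()) {
            String targetUri = targetByLabel.get(normalize(entry.getValue()));
            if (targetUri != null) {
                matches.put(entry.getKey(), targetUri);
            }
        }
        return matches;
    }

    /** Source URIs that the cheap string step left unmatched. */
    static Set<String> remainingSources(Map<String, String> sourceLabels,
                                        Map<String, String> matches) {
        Set<String> remaining = new HashSet<>(sourceLabels.keySet());
        remaining.removeAll(matches.keySet());
        return remaining;
    }

    // Normalization (trim + lower case) is an assumption made for this sketch.
    private static String normalize(String label) {
        return label.trim().toLowerCase(Locale.ROOT);
    }
}
```

Only the URIs returned by remainingSources would need to be looked up in the background knowledge source, which is where the performance gain comes from.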
This matcher relies on a similarity metric that is implemented within the background source and used in compare(String, String).
-
Field Summary
Modifier and Type, Field - Description
private Alignment alignment - Alignment
private boolean isAllowForCumulativeMatches - If something has been matched in an earlier step, allow for it to be matched again.
private final boolean isSynonymyConfidenceAvailable - If true, there is a confidence score for each synonymy relation.
private boolean isVerboseLoggingOutput - Log every match.
knowledgeSource - the knowledgeSource to be used
private final LabelToConceptLinker linker - Linker used to link labels to concepts.
private static final org.slf4j.Logger LOGGER - Logger
private int multiConceptLinkerUpperLimit - If a concept cannot be linked as full string, the longest substrings are matched.
private org.apache.jena.ontology.OntModel ontology1 - Ontologies
private org.apache.jena.ontology.OntModel ontology2
strategy - Matching strategy.
private double threshold - The minimal confidence threshold that is required for a match.
private final TextExtractor valueExtractor - The value extractor used to obtain labels for resources.
Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
FILE_PREFIX, FILE_SUFFIX
-
Constructor Summary
Constructor - Description
BackgroundMatcher(SemanticWordRelationDictionary knowledgeSourceToBeUsed) - Convenience default constructor. Threshold: 0.0 and Strategy: Synonymy are assumed.
BackgroundMatcher(SemanticWordRelationDictionary knowledgeSourceToBeUsed, ImplementedBackgroundMatchingStrategies strategy) - Convenience default constructor. Threshold: 0.0 is assumed.
BackgroundMatcher(SemanticWordRelationDictionary knowledgeSourceToBeUsed, ImplementedBackgroundMatchingStrategies strategy, double threshold) - Main constructor.
-
Method Summary
Modifier and Type, Method - Description
private void addAlignmentExtensions() - Adds extension values.
private boolean compare(String lookupTerm1, String lookupTerm2) - The compare method compares two concepts that are available in a background knowledge source.
private double compareScore(String lookupTerm1, String lookupTerm2)
private HashMap<String,Set<String>> convertToUriLinkMap(Map<String, Set<String>> uri2labels, boolean isSourceOntology) - This method transforms the uri2labels into a uri2links HashMap.
private Map<String,List<Set<String>>> convertToUriLinksMap(Map<String, Set<String>> uris2labels, boolean isSourceOntology) - This method converts a URIs -> labels HashMap to a URIs -> List<nlinks>.
private Map<String,List<Set<String>>> convertToUriTokenMap(Map<String, Set<String>> uris2labels, boolean isSourceOntology) - This method converts a URIs -> labels HashMap to a URIs -> tokens HashMap.
org.javatuples.Pair<Boolean,Double> fullMatchUsingDictionaryWithLinks(Set<String> set1, Set<String> set2) - Determines whether two sets of links match using the internal knowledgeSource.
private String getConfigurationListing - Get configuration of matcher as string output.
getKnowledgeSource
getMatcherName - Get the name of the matcher.
int getMultiConceptLinkerUpperLimit()
getStrategy
double getThreshold()
boolean isAllowForCumulativeMatches()
private org.javatuples.Pair<Boolean,Double> isLinkListSynonymous(List<Set<String>> list_1, List<Set<String>> list_2) - Given two lists of links, this method checks whether those are synonymous.
private org.javatuples.Pair<Boolean,Double> isLinkSetSynonymous(Set<String> set_1, Set<String> set_2) - All components of set_1 have to be synonymous to components in set_2.
boolean isSynonymyConfidenceAvailable()
org.javatuples.Pair<Boolean,Double> isTokenSetSynonymous(List<Set<String>> tokenList1, List<Set<String>> tokenList2) - Checks whether the two lists are synonymous, this means that: each component of one list can be found in the other list OR is synonymous to one component in the other list.
isTokenSynonymous(Set<String> set1, Set<String> set2) - Compare the two maps for synonymous terms.
boolean isVerboseLoggingOutput()
private boolean mappingExistsForSourceURI - Checks whether there exists a mapping cell where the URI is used as source.
private boolean mappingExistsForTargetURI - Checks whether there exists a mapping cell where the URI is used as target.
Alignment match(org.apache.jena.ontology.OntModel sourceOntology, org.apache.jena.ontology.OntModel targetOntology, Alignment inputAlignment, Properties p) - Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment.
private void match(org.apache.jena.util.iterator.ExtendedIterator<? extends org.apache.jena.ontology.OntResource> sourceOntologyIterator_1, org.apache.jena.util.iterator.ExtendedIterator<? extends org.apache.jena.ontology.OntResource> targetOntologyIterator_2) - Given two iterators, match the resources covered by them.
private void performFullStringSynonymyMatching(Map<String, Set<String>> uri2labelMap_1, Map<String, Set<String>> uri2labelMap_2) - Filter out token synonymy utilizing a synonymy strategy.
private void performLongestStringSynonymyMatching(Map<String, Set<String>> uri2labelMap_1, Map<String, Set<String>> uri2labelMap_2) - Match by determining multiple concepts for a label.
private void performTokenBasedSynonymyMatching(Map<String, Set<String>> uri2labelMap_1, Map<String, Set<String>> uri2labelMap_2) - Match based on token equality and synonymy.
void setAllowForCumulativeMatches(boolean allowForCumulativeMatches)
private boolean setContainsSynonym(String word, HashSet<String> set) - Check whether the specified word is synonymous to a word in the given set.
void setKnowledgeSource(ExternalResourceWithSynonymCapability knowledgeSource)
void setMultiConceptLinkerUpperLimit(int multiConceptLinkerUpperLimit)
void setStrategy
void setThreshold(double threshold)
void setVerboseLoggingOutput(boolean verboseLoggingOutput)
tokenizeAndFilter(String label) - Tokenizes a label and filters out stop words.
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
getModelSpec, match, readOntology
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
match
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
match
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
align, align, canExecute, getType
Methods inherited from class eu.sealsproject.platform.res.tool.impl.AbstractPlugin
getId, getVersion, setId, setVersion
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface eu.sealsproject.platform.res.tool.api.IPlugin
getId, getVersion
-
Field Details
-
linker
Linker used to link labels to concepts.
-
alignment
Alignment
-
ontology1
private org.apache.jena.ontology.OntModel ontology1
Ontologies
-
ontology2
private org.apache.jena.ontology.OntModel ontology2
-
knowledgeSource
the knowledgeSource to be used
-
LOGGER
private static final org.slf4j.Logger LOGGER
Logger
-
strategy
Matching strategy.
-
threshold
private double threshold
The minimal confidence threshold that is required for a match.
-
isAllowForCumulativeMatches
private boolean isAllowForCumulativeMatches
If something has been matched in an earlier step, allow for it to be matched again. Default: false.
-
isVerboseLoggingOutput
private boolean isVerboseLoggingOutput
Log every match. Do not use in performance optimized settings.
-
valueExtractor
The value extractor used to obtain labels for resources.
-
multiConceptLinkerUpperLimit
private int multiConceptLinkerUpperLimit
If a concept cannot be linked as a full string, the longest substrings are matched. This is expensive and should not be performed on lengthy description texts and the like. This variable is the number of tokens within a label up to which multi-linking is performed. The limit is inclusive: linking is not performed if |tokens in label| > multiConceptLinkerUpperLimit.
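The inclusive limit can be made concrete with a small sketch. Only the comparison itself is taken from the description above. The class MultiLinkLimitSketch, the method name, and the whitespace tokenization are assumptions made for illustration and are not part of the matcher.

```java
// Hypothetical helper; only the inclusive comparison is taken from the field description.
final class MultiLinkLimitSketch {

    /** True if multi-concept linking should be attempted for this label. */
    static boolean shouldAttemptMultiLinking(String label, int multiConceptLinkerUpperLimit) {
        String trimmed = label.trim();
        // Whitespace tokenization is an assumption made for this sketch.
        int tokenCount = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        // Inclusive limit: linking is skipped only if |tokens in label| > limit.
        return tokenCount <= multiConceptLinkerUpperLimit;
    }
}
```

With a limit of 3, a three-token label such as "heart valve disease" would still be multi-linked, while a four-token label would be skipped.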
-
isSynonymyConfidenceAvailable
private final boolean isSynonymyConfidenceAvailable
If true, there is a confidence score for each synonymy relation.
-
-
Constructor Details
-
BackgroundMatcher
public BackgroundMatcher(SemanticWordRelationDictionary knowledgeSourceToBeUsed, ImplementedBackgroundMatchingStrategies strategy, double threshold)
Main constructor.
- Parameters:
knowledgeSourceToBeUsed - Specify the knowledgeSource to be used.
strategy - The knowledgeSource strategy that shall be applied.
threshold - The minimal threshold that is required for a match.
-
BackgroundMatcher
Convenience default constructor. Threshold: 0.0 and Strategy: Synonymy are assumed.
- Parameters:
knowledgeSourceToBeUsed - The knowledge source that is to be used.
-
BackgroundMatcher
public BackgroundMatcher(SemanticWordRelationDictionary knowledgeSourceToBeUsed, ImplementedBackgroundMatchingStrategies strategy)
Convenience default constructor. Threshold: 0.0 is assumed.
- Parameters:
knowledgeSourceToBeUsed - The knowledge source that is to be used.
strategy - The strategy that shall be applied.
-
-
Method Details
-
match
public Alignment match(org.apache.jena.ontology.OntModel sourceOntology, org.apache.jena.ontology.OntModel targetOntology, Alignment inputAlignment, Properties p) throws Exception
Description copied from class: MatcherYAAAJena
Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment. Note: This method might be called multiple times in a row when using the evaluation framework. Make sure to return a mapping which is specific to the given inputs.
- Specified by:
match in interface IMatcher<org.apache.jena.ontology.OntModel, Alignment, Properties>
- Specified by:
match in class MatcherYAAAJena
- Parameters:
sourceOntology - This OntModel represents the source ontology.
targetOntology - This OntModel represents the target ontology.
inputAlignment - This mapping represents the input alignment.
p - Additional properties.
- Returns:
The resulting alignment of the matching process.
- Throws:
Exception - Any exception which occurs during matching.
-
addAlignmentExtensions
private void addAlignmentExtensions()
Adds extension values.
-
getConfigurationListing
Get configuration of matcher as string output.
- Returns:
The configuration as string.
-
match
private void match(org.apache.jena.util.iterator.ExtendedIterator<? extends org.apache.jena.ontology.OntResource> sourceOntologyIterator_1, org.apache.jena.util.iterator.ExtendedIterator<? extends org.apache.jena.ontology.OntResource> targetOntologyIterator_2)
Given two iterators, match the resources covered by them.
- Parameters:
sourceOntologyIterator_1 - iterator 1 must be that of the source ontology
targetOntologyIterator_2 - iterator 2 must be that of the target ontology
-
performFullStringSynonymyMatching
private void performFullStringSynonymyMatching(Map<String, Set<String>> uri2labelMap_1, Map<String, Set<String>> uri2labelMap_2)
Filter out token synonymy utilizing a synonymy strategy. Note that the method accepts a HashMap of Uri -> set(LINKS) rather than Uri -> set(labels).
- Parameters:
uri2labelMap_1 - URI2labels map of the source ontology.
uri2labelMap_2 - URI2labels map of the target ontology.
-
fullMatchUsingDictionaryWithLinks
public org.javatuples.Pair<Boolean,Double> fullMatchUsingDictionaryWithLinks(Set<String> set1, Set<String> set2)
Determines whether two sets of links match using the internal knowledgeSource. Note that no linking is performed; the sets are expected to already contain links.
- Parameters:
set1 - Set of links 1.
set2 - Set of links 2.
- Returns:
Pair of (1) a boolean indicating whether there is a match and (2) the match confidence.
-
performTokenBasedSynonymyMatching
private void performTokenBasedSynonymyMatching(Map<String, Set<String>> uri2labelMap_1, Map<String, Set<String>> uri2labelMap_2)
Match based on token equality and synonymy.
- Parameters:
uri2labelMap_1 - source uri2labels map
uri2labelMap_2 - target uri2labels map
-
isTokenSetSynonymous
org.javatuples.Pair<Boolean,Double> isTokenSetSynonymous(List<Set<String>> tokenList1, List<Set<String>> tokenList2)
Checks whether the two lists are synonymous, this means that: each component of one list can be found in the other list OR is synonymous to one component in the other list.
- Parameters:
tokenList1 - List of words
tokenList2 - List of words
- Returns:
true if synonymous, else false
-
isTokenSynonymous
Compare the two maps for synonymous terms.
- Parameters:
set1 - Set of tokens 1
set2 - Set of tokens 2
- Returns:
true if the term of a set has a synonymous or equal counterpart in the other set. This is tested both ways (set1 -> set2 and set2 -> set1).
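The two-way test can be sketched as follows. This is a minimal illustration under stated assumptions, not the matcher's implementation: the class TokenSynonymySketch and the toy synonym dictionary (a plain Map) stand in for the background knowledge source, and the sketch reads the description as requiring that every token of each set has an equal or synonymous counterpart in the other set.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; the Map-based dictionary stands in for the background knowledge source.
final class TokenSynonymySketch {
    private final Map<String, Set<String>> synonyms;

    TokenSynonymySketch(Map<String, Set<String>> synonyms) {
        this.synonyms = synonyms;
    }

    /** Two tokens count as a pair if they are equal or listed as synonyms in either direction. */
    boolean isEqualOrSynonymous(String a, String b) {
        return a.equals(b)
                || synonyms.getOrDefault(a, Collections.emptySet()).contains(b)
                || synonyms.getOrDefault(b, Collections.emptySet()).contains(a);
    }

    /** True if every token in 'from' has an equal or synonymous counterpart in 'to'. */
    private boolean allHaveCounterpart(Set<String> from, Set<String> to) {
        for (String token : from) {
            boolean found = false;
            for (String candidate : to) {
                if (isEqualOrSynonymous(token, candidate)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    /** The check is applied both ways: set1 -> set2 and set2 -> set1. */
    boolean isTokenSynonymous(Set<String> set1, Set<String> set2) {
        return allHaveCounterpart(set1, set2) && allHaveCounterpart(set2, set1);
    }
}
```

Testing both directions prevents a small token set from matching a larger one merely because all of its own tokens are covered.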
-
performLongestStringSynonymyMatching
private void performLongestStringSynonymyMatching(Map<String, Set<String>> uri2labelMap_1, Map<String, Set<String>> uri2labelMap_2)
Match by determining multiple concepts for a label.
- Parameters:
uri2labelMap_1 - URI2label map 1.
uri2labelMap_2 - URI2label map 2.
-
isLinkListSynonymous
private org.javatuples.Pair<Boolean,Double> isLinkListSynonymous(List<Set<String>> list_1, List<Set<String>> list_2)
Given two lists of links, this method checks whether those are synonymous.
- Parameters:
list_1 - List of links 1.
list_2 - List of links 2.
- Returns:
Returns true, if the links are synonymous.
-
isLinkSetSynonymous
private org.javatuples.Pair<Boolean,Double> isLinkSetSynonymous(Set<String> set_1, Set<String> set_2)
All components of set_1 have to be synonymous to components in set_2.
- Parameters:
set_1 - Set 1.
set_2 - Set 2.
- Returns:
True if synonymous, else false.
-
convertToUriLinksMap
private Map<String,List<Set<String>>> convertToUriLinksMap(Map<String, Set<String>> uris2labels, boolean isSourceOntology)
This method converts a URIs -> labels HashMap to a URIs -> List<nlinks>. Mapped entries are ignored.
- Parameters:
uris2labels - URIs to labels map.
isSourceOntology - True if the map refers to the source ontology.
- Returns:
Map URI -> tokens
-
setContainsSynonym
Check whether the specified word is synonymous to a word in the given set.
- Parameters:
word - Word to be checked.
set - Set containing the words.
- Returns:
true if synonymous.
-
convertToUriLinkMap
private HashMap<String,Set<String>> convertToUriLinkMap(Map<String, Set<String>> uri2labels, boolean isSourceOntology)
This method transforms the uri2labels into a uri2links HashMap. Thereby, the linking function is called only once. Furthermore, concepts that cannot be linked are not included in the resulting HashMap. Mapped entries are not linked.
- Parameters:
uri2labels - Input HashMap URI -> labels
isSourceOntology - True if the map refers to the source ontology.
- Returns:
HashMap URI -> links
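The three properties named above (link once, drop unlinkable concepts, skip mapped entries) can be sketched in a self-contained way. Everything here is hypothetical: UriLinkMapSketch is not a MELT class, a plain Function stands in for the LabelToConceptLinker and is assumed to return null when a label cannot be linked, and "called only once" is interpreted as caching the link per distinct label.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration; the Function stands in for the linker and returns null if no link exists.
final class UriLinkMapSketch {

    static Map<String, Set<String>> convert(Map<String, Set<String>> uri2labels,
                                            Function<String, String> linker,
                                            Set<String> alreadyMappedUris) {
        // Cache so that the (assumed expensive) linker runs once per distinct label.
        Map<String, String> linkCache = new HashMap<>();
        Map<String, Set<String>> uri2links = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : uri2labels.entrySet()) {
            if (alreadyMappedUris.contains(entry.getKey())) {
                continue; // mapped entries are not linked
            }
            Set<String> links = new HashSet<>();
            for (String label : entry.getValue()) {
                String link;
                if (linkCache.containsKey(label)) {
                    link = linkCache.get(label);
                } else {
                    link = linker.apply(label);
                    linkCache.put(label, link); // null results are cached as well
                }
                if (link != null) {
                    links.add(link);
                }
            }
            if (!links.isEmpty()) {
                uri2links.put(entry.getKey(), links); // unlinkable concepts are left out
            }
        }
        return uri2links;
    }
}
```

Because unlinkable concepts never enter the result, later comparison steps only iterate over entries that can actually be looked up in the knowledge source.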
-
convertToUriTokenMap
private Map<String,List<Set<String>>> convertToUriTokenMap(Map<String, Set<String>> uris2labels, boolean isSourceOntology)
This method converts a URIs -> labels HashMap to a URIs -> tokens HashMap. Mapped entries are ignored.
- Parameters:
uris2labels - URIs to labels map.
isSourceOntology - True if the map refers to the source ontology.
- Returns:
Map: URI -> tokens
-
tokenizeAndFilter
Tokenizes a label and filters out stop words.
- Parameters:
label - The label to be tokenized.
- Returns:
Tokenized label.
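A tokenize-then-filter step of this kind might look like the sketch below. The details are assumptions made for illustration: the class name, the return type (a List), the splitting rule (whitespace, underscores, hyphens), the lower-casing, and the tiny stop-word list are not taken from the matcher.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; splitting rule and stop-word list are assumptions.
final class TokenizeAndFilterSketch {

    // Tiny illustrative stop-word list; the list used by the matcher is not documented here.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "and", "or", "in"));

    static List<String> tokenizeAndFilter(String label) {
        List<String> tokens = new ArrayList<>();
        // Lower-case, then split on runs of whitespace, underscores, or hyphens.
        for (String raw : label.toLowerCase(Locale.ROOT).split("[\\s_\\-]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }
}
```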
-
mappingExistsForSourceURI
Checks whether there exists a mapping cell where the URI is used as source.
- Parameters:
uri - URI for which the check shall be performed.
- Returns:
True if at least one mapping cell exists, else false.
-
mappingExistsForTargetURI
Checks whether there exists a mapping cell where the URI is used as target.
- Parameters:
uri - URI for which the check shall be performed.
- Returns:
True if at least one mapping cell exists, else false.
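The two existence checks can be pictured with a minimal stand-in for an alignment. The nested Cell class and the list-based alignment below are hypothetical simplifications and do not reproduce the Alignment API. A linear scan is used for clarity only.

```java
import java.util.List;

// Hypothetical illustration; Cell is a simplified stand-in for one correspondence.
final class MappingLookupSketch {

    static final class Cell {
        final String source;
        final String target;

        Cell(String source, String target) {
            this.source = source;
            this.target = target;
        }
    }

    /** True if at least one cell uses the URI as source. */
    static boolean mappingExistsForSourceURI(List<Cell> alignment, String uri) {
        for (Cell cell : alignment) {
            if (cell.source.equals(uri)) {
                return true;
            }
        }
        return false;
    }

    /** True if at least one cell uses the URI as target. */
    static boolean mappingExistsForTargetURI(List<Cell> alignment, String uri) {
        for (Cell cell : alignment) {
            if (cell.target.equals(uri)) {
                return true;
            }
        }
        return false;
    }
}
```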
-
compareScore
-
compare
The compare method compares two concepts that are available in a background knowledge source. The concepts are compared using the specified strategy, and the method returns true if the determined similarity is above the specified minimal threshold.
- Parameters:
lookupTerm1 - Term 1.
lookupTerm2 - Term 2.
- Returns:
True if the similarity is larger than the minimal threshold, else false.
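The threshold decision can be sketched as follows. This is an illustration under assumptions: the class ThresholdCompareSketch is hypothetical, a ToDoubleBiFunction stands in for the similarity metric provided by the background source, and deriving the boolean result from compareScore is an assumption about how the two methods relate. The strict comparison follows the "larger than" wording above.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical illustration; the function stands in for the background source's similarity metric.
final class ThresholdCompareSketch {
    private final ToDoubleBiFunction<String, String> similarity;
    private final double threshold;

    ThresholdCompareSketch(ToDoubleBiFunction<String, String> similarity, double threshold) {
        this.similarity = similarity;
        this.threshold = threshold;
    }

    /** Raw similarity of the two lookup terms. */
    double compareScore(String lookupTerm1, String lookupTerm2) {
        return similarity.applyAsDouble(lookupTerm1, lookupTerm2);
    }

    /** True only if the similarity is strictly larger than the threshold. */
    boolean compare(String lookupTerm1, String lookupTerm2) {
        return compareScore(lookupTerm1, lookupTerm2) > threshold;
    }
}
```

Under this reading, a threshold of 0.0 accepts any pair with a positive score and rejects pairs that score exactly 0.0.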
-
getMatcherName
Get the name of the matcher.
- Returns:
A textual representation of the matcher.
-
getStrategy
-
setStrategy
-
getKnowledgeSource
-
setKnowledgeSource
-
getThreshold
public double getThreshold()
-
setThreshold
public void setThreshold(double threshold)
-
isAllowForCumulativeMatches
public boolean isAllowForCumulativeMatches()
-
setAllowForCumulativeMatches
public void setAllowForCumulativeMatches(boolean allowForCumulativeMatches)
-
isVerboseLoggingOutput
public boolean isVerboseLoggingOutput()
-
setVerboseLoggingOutput
public void setVerboseLoggingOutput(boolean verboseLoggingOutput)
-
getMultiConceptLinkerUpperLimit
public int getMultiConceptLinkerUpperLimit()
-
setMultiConceptLinkerUpperLimit
public void setMultiConceptLinkerUpperLimit(int multiConceptLinkerUpperLimit)
-
isSynonymyConfidenceAvailable
public boolean isSynonymyConfidenceAvailable()
-