Class WikidataLinker
java.lang.Object
de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.wikidata.WikidataLinker
All Implemented Interfaces:
LabelToConceptLinker, MultiConceptLinker
This linker links strings to Wikidata concepts.
Artificial links, which start with MULTI_CONCEPT_PREFIX, are introduced here. They refer to a bag of links. All methods work with both URIs and these multi-concept links. The linkToSingleConcept(String) method, for example, will return a multi-concept link. In order to obtain the actual Wikidata URIs, use method getUris(String).
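The multi-concept mechanism can be pictured as a small registry: a bag of URIs is stored under an artificial key that starts with a prefix, and resolving that key returns the bag again. The following is a minimal, self-contained sketch of the idea, not the class's actual implementation. The class name `MultiConceptRegistry`, the prefix value `#ML_`, and the "prefix + label" key scheme are hypothetical; the real value of MULTI_CONCEPT_PREFIX is not shown in this documentation.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration of multi-concept links; not the MELT implementation.
class MultiConceptRegistry {

    // Assumed prefix; the real MULTI_CONCEPT_PREFIX value is not shown in the Javadoc.
    static final String PREFIX = "#ML_";

    // Maps a multi-concept link to the bag of Wikidata URIs it stands for.
    private final ConcurrentMap<String, Set<String>> store = new ConcurrentHashMap<>();

    /** Stores the URIs under a link derived from the label (hypothetical key scheme) and returns that link. */
    String register(String label, Set<String> uris) {
        String link = PREFIX + label;
        store.put(link, Collections.unmodifiableSet(new TreeSet<>(uris)));
        return link;
    }

    /** A link is treated as a multi-concept link if it carries the prefix. */
    boolean isMultiConceptLink(String link) {
        return link != null && link.startsWith(PREFIX);
    }

    /** Resolves a multi-concept link to its URIs; returns an empty set if the link is unknown. */
    Set<String> getUris(String link) {
        return store.getOrDefault(link, Collections.emptySet());
    }
}
```

Callers would hold on to the returned link string and only resolve it to URIs when they actually need them, which mirrors the split between linkToSingleConcept(String) and getUris(String) described above.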
Field Summary
Modifier and Type | Field | Description
private static final String | ENDPOINT_URL | The public SPARQL endpoint.
private boolean | isDiskBufferEnabled | If the disk buffer is disabled, no buffers are read from or written to the disk.
private boolean | isRunAllStringModifications | If true, all string modifications are performed to gain a high concept coverage.
private String | linkerName | Linker name.
private static org.slf4j.Logger | LOGGER | Default logger.
static final String | MULTI_CONCEPT_PREFIX | Universal prefix for multi concepts.
private static ConcurrentMap<String,Set<String>> | multiLinkStore | Typically, one label refers to multiple Wikidata concepts.
(package private) PersistenceService | persistenceService | Service responsible for disk buffers.
(package private) LinkedList<StringModifier> | stringModificationSequence | The list of operations that is performed to find a concept in the dictionary.
(package private) Set<StringModifier> | stringModificationSet | A set of string operations that are all performed.
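Constructor Summary
Constructor | Description
WikidataLinker() | Constructor.
WikidataLinker(boolean isDiskBufferEnabled) | Constructor.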
Method Summary
Modifier and Type | Method | Description
private String | buildFragmentLabelAltLabel(String label, Language language) |
private void | commit() | Commit data changes if active.
String | getNameOfLinker() | Get the instance-specific name of the linker.
 | getUris(String multiConceptLink) | Given a multiConceptLink, this method will return the individual links.
 | getUris(Set<String> multipleLinks) | Given a set of links where the links can be multi-concept links or direct links, a set of only direct links is returned.
private void | initializeBuffers() | Initialization of buffers.
boolean | isDiskBufferEnabled() |
boolean | isMultiConceptLink(String link) | Determine whether the link at hand is a multi-concept link.
boolean | isRunAllStringModifications() |
 | linkLabelToTokensLeftToRight(String labelToBeLinked, Language language) | Splits labelToBeLinked into n-grams of any length and tries to link the components.
 | linkToPotentiallyMultipleConcepts(String labelToBeLinked) | This method tries to link labelToBeLinked to one concept if possible.
 | linkToPotentiallyMultipleConcepts(String labelToBeLinked, Language language) |
String | linkToSingleConcept(String labelToBeLinked) | Will link one label to a multi-link concept.
String | linkToSingleConcept(String labelToBeLinked, Language language) | Link to one concept.
private String | linkToSingleConceptByRunningAllModifications(String labelToBeLinked, Language language) | Helper method: will perform all string modifications and collect all concepts found thereby.
private String | linkToSingleConceptGreedy(String labelToBeLinked, Language language) | Helper method.
 | linkWithAltLabel(String label, Language language) | Link with alternative label.
 | linkWithLabel(String label, Language language) | Given a label, returns the Wikidata concepts (URIs as String) that carry that label.
 | linkWithMultipleLabels(Set<String> labels, Language language) | Checks the labels as well as the alternative labels in one query.
void | setDiskBufferEnabled(boolean diskBufferEnabled) |
void | setNameOfLinker(String nameOfLinker) | Set the instance-specific name of the linker.
void | setRunAllStringModifications(boolean runAllStringModifications) |
Field Details

LOGGER
private static org.slf4j.Logger LOGGER
Default logger.

stringModificationSequence
LinkedList<StringModifier> stringModificationSequence
The list of operations that is performed to find a concept in the dictionary. Only used if isRunAllStringModifications is false.

stringModificationSet
Set<StringModifier> stringModificationSet
A set of string operations that are all performed. Only used if isRunAllStringModifications is true.

ENDPOINT_URL
The public SPARQL endpoint.

linkerName
Linker name.

MULTI_CONCEPT_PREFIX
Universal prefix for multi concepts.

persistenceService
PersistenceService persistenceService
Service responsible for disk buffers.

isRunAllStringModifications
private boolean isRunAllStringModifications
If true, all string modifications are performed to gain a high concept coverage. This is true by default. If false, the results may be more precise, at the cost of lower coverage. Performance-wise, true will trigger only one query per linking operation, whereas false may trigger many queries.

isDiskBufferEnabled
private boolean isDiskBufferEnabled
If the disk buffer is disabled, no buffers are read from or written to the disk. Default: true.

multiLinkStore
Typically, one label refers to multiple Wikidata concepts. Hence, they are summarized in this data structure with the multi-concept as key. A multi-concept must start with the MULTI_CONCEPT_PREFIX. The data structure is also used as a cache.

Constructor Details

WikidataLinker
public WikidataLinker()
Constructor.

WikidataLinker
public WikidataLinker(boolean isDiskBufferEnabled)
Constructor.
Parameters:
isDiskBufferEnabled - True if the disk buffer shall be enabled.

Method Details

initializeBuffers
private void initializeBuffers()
Initialization of buffers.

getUris
Given a multiConceptLink, this method will return the individual links.
Specified by:
getUris in interface MultiConceptLinker
Parameters:
multiConceptLink - The lookup link.
Returns:
Individual links, empty set if there are none.

getUris
Given a set of links where the links can be multi-concept links or direct links, a set of only direct links is returned.
Parameters:
multipleLinks - Set with multiple links. Multi-concept links can be mixed with direct links.
Returns:
A set with only direct links.
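The set variant of getUris can be read as a flattening step: direct URIs pass through unchanged, and every multi-concept link is replaced by the URIs it stands for. A minimal sketch, assuming the store is a plain map from multi-concept links to URIs and assuming a hypothetical `#ML_` prefix; the class name `LinkFlattener` is illustrative only.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of resolving a mix of direct and multi-concept links.
class LinkFlattener {

    static final String PREFIX = "#ML_"; // assumed prefix value, not taken from the Javadoc

    static Set<String> getUris(Set<String> multipleLinks, Map<String, Set<String>> multiLinkStore) {
        Set<String> result = new HashSet<>();
        for (String link : multipleLinks) {
            if (link.startsWith(PREFIX)) {
                // Multi-concept link: expand it to its individual URIs (nothing if unknown).
                result.addAll(multiLinkStore.getOrDefault(link, Collections.emptySet()));
            } else {
                // Direct link: keep it as-is.
                result.add(link);
            }
        }
        return result;
    }
}
```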

linkToSingleConcept
Will link one label to a multi-link concept.
Specified by:
linkToSingleConcept in interface LabelToConceptLinker
Parameters:
labelToBeLinked - The label which shall be linked to a single concept.
Returns:
Link as String (not a Wikidata URI).

linkToSingleConcept
Link to one concept. Note: technically, one link will be returned, but this link may represent multiple concepts. To retrieve those concepts, call getUris(String).
Parameters:
labelToBeLinked - The label which shall be used to link to a concept.
language - Language of the label to be linked.
Returns:
One link representing one or more concepts on Wikidata as String. The link is not a URI.

linkToSingleConceptGreedy
Helper method. Multiple string operations are tried out. As soon as a Wikidata concept is found, it is returned and the process stops early.
Parameters:
labelToBeLinked - The label that shall be linked.
language - The language of the label.
Returns:
One link representing one or more concepts on Wikidata as String. The link is not a URI.

linkToSingleConceptByRunningAllModifications
private String linkToSingleConceptByRunningAllModifications(String labelToBeLinked, Language language)
Helper method: will perform all string modifications and collect all concepts found thereby.
Parameters:
labelToBeLinked - The label that shall be linked.
language - Language of the label.
Returns:
Link as String (not a Wikidata URI).
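The two helpers differ in when they stop: the greedy variant returns at the first successful lookup, while the run-all variant applies every modification and collects the union. The sketch below contrasts them under stated assumptions: the dictionary lookup is a hypothetical function from a string to a (possibly empty) set of concepts, and string modifiers are plain `UnaryOperator<String>` functions standing in for the real `StringModifier` type and the SPARQL-backed lookup.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins: 'lookup' replaces the Wikidata query, 'modifiers' replaces StringModifier.
class ModificationStrategies {

    /** Greedy: apply modifiers in order and return the concepts of the first successful lookup. */
    static Set<String> greedy(String label, List<UnaryOperator<String>> modifiers,
                              Function<String, Set<String>> lookup) {
        for (UnaryOperator<String> modifier : modifiers) {
            Set<String> found = lookup.apply(modifier.apply(label));
            if (!found.isEmpty()) {
                return found; // stop early at the first hit
            }
        }
        return new LinkedHashSet<>(); // nothing found
    }

    /** Run all: apply every modifier and collect the union of all concepts found. */
    static Set<String> runAll(String label, List<UnaryOperator<String>> modifiers,
                              Function<String, Set<String>> lookup) {
        Set<String> all = new LinkedHashSet<>();
        for (UnaryOperator<String> modifier : modifiers) {
            all.addAll(lookup.apply(modifier.apply(label)));
        }
        return all;
    }
}
```

This matches the trade-off stated for isRunAllStringModifications: running all modifications raises coverage, while the greedy sequence can be more precise because it keeps only the first match.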

linkToPotentiallyMultipleConcepts
Description copied from interface: LabelToConceptLinker
This method tries to link labelToBeLinked to one concept if possible. If it fails, it will try to link it to multiple concepts.
Specified by:
linkToPotentiallyMultipleConcepts in interface LabelToConceptLinker
Parameters:
labelToBeLinked - The label which shall be linked.
Returns:
One or multiple linked concepts in a set. Null if it could not fully link the label.
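The contract described here (one concept if possible, otherwise several, and null if the label cannot be fully linked) can be sketched as follows. This is a minimal sketch, assuming a whitespace tokenizer and a hypothetical single-link function as stand-ins; it is not the class's actual splitting logic, which is described for linkLabelToTokensLeftToRight.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of the "single concept first, then multiple concepts" contract.
class PotentiallyMultipleLinking {

    /** 'linkSingle' is an assumed stand-in that returns a link, or null if none is found. */
    static Set<String> link(String label, Function<String, String> linkSingle) {
        // 1) Try to link the whole label to one (possibly multi-concept) link.
        String single = linkSingle.apply(label);
        if (single != null) {
            Set<String> result = new HashSet<>();
            result.add(single);
            return result;
        }
        // 2) Fall back to linking each whitespace-separated token individually.
        Set<String> result = new HashSet<>();
        for (String token : label.trim().split("\\s+")) {
            String link = linkSingle.apply(token);
            if (link == null) {
                return null; // the label could not be fully linked
            }
            result.add(link);
        }
        return result;
    }
}
```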

linkToPotentiallyMultipleConcepts
linkToPotentiallyMultipleConcepts(String labelToBeLinked, Language language)

linkLabelToTokensLeftToRight
Splits labelToBeLinked into n-grams of any length and tries to link the components. This corresponds to a MAXGRAM_LEFT_TO_RIGHT_TOKENIZER or NGRAM_LEFT_TO_RIGHT_TOKENIZER OneToManyLinkingStrategy.
Parameters:
labelToBeLinked - The label that shall be linked.
language - The language of the label.
Returns:
A set of concept URIs that were found.
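A left-to-right maximal n-gram strategy can be sketched like this: starting at the leftmost token, try the longest remaining n-gram first and shrink it until a lookup succeeds, then continue right after the matched span. This is a minimal sketch, assuming whitespace tokenization and a hypothetical lookup function that returns a set of URIs (empty if none). Skipping tokens that cannot be linked at all is an assumption of this sketch, not documented behavior.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of maximal n-gram, left-to-right linking.
class LeftToRightLinker {

    static Set<String> link(String label, Function<String, Set<String>> lookup) {
        String[] tokens = label.trim().split("\\s+");
        Set<String> found = new LinkedHashSet<>();
        int start = 0;
        while (start < tokens.length) {
            int matchedEnd = -1;
            // Try the longest n-gram beginning at 'start' first, then shorter ones.
            for (int end = tokens.length; end > start; end--) {
                String ngram = String.join(" ", Arrays.copyOfRange(tokens, start, end));
                Set<String> uris = lookup.apply(ngram);
                if (!uris.isEmpty()) {
                    found.addAll(uris);
                    matchedEnd = end;
                    break;
                }
            }
            // Continue after the matched span, or skip one token that could not be linked.
            start = matchedEnd != -1 ? matchedEnd : start + 1;
        }
        return found;
    }
}
```

For "new york city", with "new york" and "city" in the dictionary, the sketch links "new york" as one unit before trying "new" alone.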

linkWithLabel
Given a label, returns the set of Wikidata concepts (URIs as String) that carry that label.
Parameters:
label - The label to be used for the lookup.
language - The language of the given label.
Returns:
A list of URIs in String form.

linkWithMultipleLabels
Checks the labels as well as the alternative labels in one query.
Parameters:
labels - A set of labels that shall be used for the linking operation.
language - The language of the labels.
Returns:
Set of URIs that have been found.
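Checking labels and alternative labels in one query can be done with a SPARQL `VALUES` block combined with a `UNION` over `rdfs:label` and `skos:altLabel`, the properties Wikidata uses for labels and aliases. The sketch below only builds the query string; the exact query the class sends (and the fragment produced by buildFragmentLabelAltLabel) is not shown in this documentation, so this shape is an assumption. The `languageCode` string (for example "en") stands in for the `Language` parameter, and `MultiLabelQueryBuilder` is a hypothetical name.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical query builder; the actual query used by WikidataLinker may differ.
class MultiLabelQueryBuilder {

    /** Escapes characters that would break a double-quoted SPARQL string literal. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    /** Builds one query that matches any of the labels as rdfs:label or skos:altLabel. */
    static String build(Set<String> labels, String languageCode) {
        StringBuilder values = new StringBuilder();
        for (String label : new TreeSet<>(labels)) { // sorted for a deterministic query string
            values.append('"').append(escape(label)).append("\"@").append(languageCode).append(' ');
        }
        return "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
                + "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
                + "SELECT DISTINCT ?c WHERE {\n"
                + "  VALUES ?label { " + values + "}\n"
                + "  { ?c rdfs:label ?label } UNION { ?c skos:altLabel ?label }\n"
                + "}";
    }
}
```

Putting all labels into one `VALUES` block keeps the number of round trips to the endpoint at one per call, which is the point of checking labels and alternative labels together.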

buildFragmentLabelAltLabel
private String buildFragmentLabelAltLabel(String label, Language language)

linkWithAltLabel
Link with alternative label.
Parameters:
label - Label.
language - Language.
Returns:
A list of URIs in String format.

getNameOfLinker
Description copied from interface: LabelToConceptLinker
Get the instance-specific name of the linker.
Specified by:
getNameOfLinker in interface LabelToConceptLinker
Returns:
Name as String.

setNameOfLinker
Description copied from interface: LabelToConceptLinker
Set the instance-specific name of the linker.
Specified by:
setNameOfLinker in interface LabelToConceptLinker
Parameters:
nameOfLinker - Name to be set.

isRunAllStringModifications
public boolean isRunAllStringModifications()

setRunAllStringModifications
public void setRunAllStringModifications(boolean runAllStringModifications)

isDiskBufferEnabled
public boolean isDiskBufferEnabled()

commit
private void commit()
Commit data changes if active.

setDiskBufferEnabled
public void setDiskBufferEnabled(boolean diskBufferEnabled)

isMultiConceptLink
Description copied from interface: MultiConceptLinker
Determine whether the link at hand is a multi-concept link.
Specified by:
isMultiConceptLink in interface MultiConceptLinker
Parameters:
link - Link to be checked.
Returns:
True if multi-concept link, else false.