java.lang.Object
de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.wikidata.WikidataLinker
All Implemented Interfaces:
LabelToConceptLinker, MultiConceptLinker

public class WikidataLinker extends Object implements LabelToConceptLinker, MultiConceptLinker
This linker links strings to Wikidata concepts. It introduces artificial links that start with MULTI_CONCEPT_PREFIX; each such link refers to a bag of links. All methods work with both URIs and these multi-concept links. The linkToSingleConcept(String) method, for example, returns a multi-concept link. To obtain the actual Wikidata URIs, use getUris(String).
  • Field Details

    • LOGGER

      private static org.slf4j.Logger LOGGER
      Default logger
    • stringModificationSequence

      LinkedList<StringModifier> stringModificationSequence
      The list of operations that is performed to find a concept in the dictionary. Only used if isRunAllStringModifications is false.
    • stringModificationSet

      Set<StringModifier> stringModificationSet
      A set of string operations that are all performed. Only used if isRunAllStringModifications is true.
    • ENDPOINT_URL

      private static final String ENDPOINT_URL
      The public SPARQL endpoint.
    • linkerName

      private String linkerName
      Linker name
    • MULTI_CONCEPT_PREFIX

      public static final String MULTI_CONCEPT_PREFIX
      Universal prefix for multi concepts.
    • persistenceService

      PersistenceService persistenceService
      Service responsible for disk buffers.
    • isRunAllStringModifications

      private boolean isRunAllStringModifications
      If true, all string modifications are performed to achieve high concept coverage. Default: true. If false, results may be more precise but coverage is lower. Performance-wise, true triggers only one query per linking operation, whereas false may trigger many queries.
    • isDiskBufferEnabled

      private boolean isDiskBufferEnabled
      If the disk buffer is disabled, no buffers are read from or written to disk. Default: true.
    • multiLinkStore

      private static ConcurrentMap<String,Set<String>> multiLinkStore
      Typically, one label refers to multiple Wikidata concepts. These are therefore collected in this data structure with the multi-concept link as key. A multi-concept link must start with MULTI_CONCEPT_PREFIX. The data structure also serves as a cache.
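
The idea behind the multi-link store can be illustrated with a minimal, self-contained sketch. It is not the MELT implementation: the class name MultiLinkStoreSketch, the put helper, and the prefix value "#MULTI_" are assumptions made for illustration (the actual value of MULTI_CONCEPT_PREFIX is not shown on this page).

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Illustrative stand-in for a multi-link store; not the MELT implementation. */
class MultiLinkStoreSketch {

    // Hypothetical prefix value; the real MULTI_CONCEPT_PREFIX is not documented here.
    static final String PREFIX = "#MULTI_";

    // Maps a multi-concept link to the URIs it stands for; also acts as a cache.
    private final ConcurrentMap<String, Set<String>> store = new ConcurrentHashMap<>();

    /** Registers the URIs found for a label and returns the artificial link. */
    String put(String label, Set<String> uris) {
        String link = PREFIX + label;
        store.put(link, Collections.unmodifiableSet(new HashSet<>(uris)));
        return link;
    }

    /** A link is a multi-concept link if it starts with the prefix. */
    boolean isMultiConceptLink(String link) {
        return link != null && link.startsWith(PREFIX);
    }

    /** Returns the URIs behind a multi-concept link, or an empty set if there are none. */
    Set<String> getUris(String multiConceptLink) {
        if (multiConceptLink == null) {
            return Collections.emptySet();
        }
        return store.getOrDefault(multiConceptLink, Collections.emptySet());
    }
}
```

Keying the map by the prefixed link means a plain URI can never collide with an artificial link, as long as no URI starts with the prefix.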
  • Constructor Details

    • WikidataLinker

      public WikidataLinker()
      Constructor
    • WikidataLinker

      public WikidataLinker(boolean isDiskBufferEnabled)
      Constructor
      Parameters:
      isDiskBufferEnabled - True if the disk buffer shall be enabled.
  • Method Details

    • initializeBuffers

      private void initializeBuffers()
      Initialization of buffers.
    • getUris

      public Set<String> getUris(String multiConceptLink)
      Given a multiConceptLink, this method will return the individual links.
      Specified by:
      getUris in interface MultiConceptLinker
      Parameters:
      multiConceptLink - The lookup link.
      Returns:
      Individual links, empty set if there are none.
    • getUris

      public Set<String> getUris(Set<String> multipleLinks)
      Given a set of links where the links can be multi concept links or direct links, a set of only direct links is returned.
      Parameters:
      multipleLinks - Set with multiple links. Multi concept links can be mixed with direct links.
      Returns:
      A set with only direct links.
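
Resolving a mixed set can be pictured as follows. This is a sketch under stated assumptions, not the library code: LinkResolverSketch, its resolve method, the plain Map used as store, and the prefix value are all hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch: flattens multi-concept links and direct links into direct links only. */
class LinkResolverSketch {

    // Hypothetical prefix value standing in for MULTI_CONCEPT_PREFIX.
    static final String PREFIX = "#MULTI_";

    static Set<String> resolve(Set<String> mixedLinks, Map<String, Set<String>> store) {
        Set<String> directLinks = new HashSet<>();
        for (String link : mixedLinks) {
            if (link.startsWith(PREFIX)) {
                // Multi-concept link: expand it to the URIs it represents.
                directLinks.addAll(store.getOrDefault(link, Collections.emptySet()));
            } else {
                // Direct link: keep it unchanged.
                directLinks.add(link);
            }
        }
        return directLinks;
    }
}
```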
    • linkToSingleConcept

      public String linkToSingleConcept(String labelToBeLinked)
      Will link one label to a multi-link concept.
      Specified by:
      linkToSingleConcept in interface LabelToConceptLinker
      Parameters:
      labelToBeLinked - The label which shall be linked to a single concept.
      Returns:
      The link as a String (not a Wikidata URI).
    • linkToSingleConcept

      public String linkToSingleConcept(String labelToBeLinked, Language language)
      Links to one concept. Note: technically, a single link is returned, but this link may represent multiple concepts. To retrieve those concepts, call getUris(String).
      Parameters:
      labelToBeLinked - The label which shall be used to link to a concept.
      language - Language of the label to be linked.
      Returns:
      One link, as a String, representing one or more concepts on Wikidata. Note that the link is not a URI.
    • linkToSingleConceptGreedy

      private String linkToSingleConceptGreedy(String labelToBeLinked, Language language)
      Helper method. Multiple string operations are tried in turn. As soon as a Wikidata concept is found, it is returned immediately and the process stops early.
      Parameters:
      labelToBeLinked - The label that shall be linked.
      language - The language of the label.
      Returns:
      One link, as a String, representing one or more concepts on Wikidata. Note that the link is not a URI.
    • linkToSingleConceptByRunningAllModifications

      private String linkToSingleConceptByRunningAllModifications(String labelToBeLinked, Language language)
      Helper method: Will perform all string modifications and collect all concepts found thereby.
      Parameters:
      labelToBeLinked - The label that shall be linked.
      language - Language of the label.
      Returns:
      The link as a String (not a Wikidata URI).
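
The difference between the greedy helper and the run-all helper can be sketched as below. All names are hypothetical and the lookups are passed in as plain functions, so this only models the control flow described above, not the actual queries: the greedy variant performs up to one lookup per modification and stops at the first hit, whereas the run-all variant collects every variant and issues a single batched lookup.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Illustrative sketch of the two linking strategies; not the MELT implementation. */
class LinkingStrategySketch {

    /** Greedy: apply the modifications in order and return the first non-empty result. */
    static Set<String> greedy(String label,
                              List<UnaryOperator<String>> sequence,
                              Function<String, Set<String>> lookup) {
        for (UnaryOperator<String> modification : sequence) {
            Set<String> hit = lookup.apply(modification.apply(label)); // one lookup per step
            if (!hit.isEmpty()) {
                return hit; // stop early
            }
        }
        return Collections.emptySet();
    }

    /** Run all: build every label variant, then resolve them in one batched lookup. */
    static Set<String> runAll(String label,
                              Collection<UnaryOperator<String>> modifications,
                              Function<Set<String>, Set<String>> batchLookup) {
        Set<String> variants = new LinkedHashSet<>();
        for (UnaryOperator<String> modification : modifications) {
            variants.add(modification.apply(label));
        }
        return batchLookup.apply(variants); // single lookup for all variants
    }
}
```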
    • linkToPotentiallyMultipleConcepts

      public Set<String> linkToPotentiallyMultipleConcepts(String labelToBeLinked)
      Description copied from interface: LabelToConceptLinker
      This method tries to link labelToBeLinked to one concept if possible. If it fails, it will try to link it to multiple concepts.
      Specified by:
      linkToPotentiallyMultipleConcepts in interface LabelToConceptLinker
      Parameters:
      labelToBeLinked - The label which shall be linked.
      Returns:
      One or multiple linked concepts in a set. Null if it could not fully link the label.
    • linkToPotentiallyMultipleConcepts

      public HashSet<String> linkToPotentiallyMultipleConcepts(String labelToBeLinked, Language language)
    • linkLabelToTokensLeftToRight

      private HashSet<String> linkLabelToTokensLeftToRight(String labelToBeLinked, Language language)
      Splits labelToBeLinked into n-grams of unbounded size and tries to link the components. This corresponds to a MAXGRAM_LEFT_TO_RIGHT_TOKENIZER or NGRAM_LEFT_TO_RIGHT_TOKENIZER OneToManyLinkingStrategy.
      Parameters:
      labelToBeLinked - The label that shall be linked.
      language - The language of the label.
      Returns:
      A set of concept URIs that were found.
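
A left-to-right max-gram strategy of this kind might look like the following sketch. It assumes whitespace tokenization and a lookup function that returns null for unknown phrases; both are simplifications, and LeftToRightLinkerSketch is a hypothetical name. Returning null when a token cannot be linked is also an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

/** Illustrative max-gram, left-to-right linking; not the MELT implementation. */
class LeftToRightLinkerSketch {

    /**
     * Tries the longest n-gram starting at the current token first and shrinks it
     * from the right until a link is found, then continues after the matched n-gram.
     * Returns null if some token cannot be linked at all.
     */
    static Set<String> link(String label, Function<String, String> lookup) {
        String[] tokens = label.trim().split("\\s+"); // assumption: whitespace tokenization
        Set<String> links = new LinkedHashSet<>();
        int start = 0;
        while (start < tokens.length) {
            int end = tokens.length;
            String found = null;
            while (end > start) {
                String candidate = String.join(" ", Arrays.copyOfRange(tokens, start, end));
                found = lookup.apply(candidate);
                if (found != null) {
                    break;
                }
                end--; // shrink the n-gram by one token
            }
            if (found == null) {
                return null; // the label could not be fully linked
            }
            links.add(found);
            start = end; // continue after the matched n-gram
        }
        return links;
    }
}
```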
    • linkWithLabel

      private List<String> linkWithLabel(String label, Language language)
      Given a label, returns the Wikidata concepts (URIs as Strings) that carry that label.
      Parameters:
      label - The label to be used for the lookup.
      language - The language of the given label.
      Returns:
      A list of URIs in String form.
    • linkWithMultipleLabels

      private Set<String> linkWithMultipleLabels(Set<String> labels, Language language)
      Checks the labels as well as the alternative labels in a single query.
      Parameters:
      labels - A set of labels that shall be used for the linking operation.
      language - The language of the labels.
      Returns:
      Set of URIs that have been found.
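
One way to check labels and alternative labels in a single query is to combine a VALUES clause with a SPARQL 1.1 property path over rdfs:label and skos:altLabel. The builder below is only a sketch of that idea: the query actually issued by this class may differ, and LabelQuerySketch, its methods, and the use of a plain language tag string (instead of the Language type) are assumptions.

```java
import java.util.Set;

/** Illustrative query builder; the query used by the library may look different. */
class LabelQuerySketch {

    /** Escapes characters that would break a double-quoted SPARQL string literal. */
    static String escape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
    }

    /** Builds one query that matches any of the labels as label or alternative label. */
    static String build(Set<String> labels, String languageTag) {
        StringBuilder sb = new StringBuilder();
        sb.append("PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n");
        sb.append("PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n");
        sb.append("SELECT DISTINCT ?concept WHERE {\n");
        sb.append("  VALUES ?label {");
        for (String label : labels) {
            sb.append(" \"").append(escape(label)).append("\"@").append(languageTag);
        }
        sb.append(" }\n");
        // Property path: match either the main label or an alternative label.
        sb.append("  ?concept rdfs:label|skos:altLabel ?label .\n");
        sb.append("}");
        return sb.toString();
    }
}
```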
    • buildFragmentLabelAltLabel

      private String buildFragmentLabelAltLabel(String label, Language language)
    • linkWithAltLabel

      private List<String> linkWithAltLabel(String label, Language language)
      Link with alternative label.
      Parameters:
      label - Label.
      language - Language.
      Returns:
      A list of URIs in String format.
    • getNameOfLinker

      public String getNameOfLinker()
      Description copied from interface: LabelToConceptLinker
      Get instance specific name of the linker.
      Specified by:
      getNameOfLinker in interface LabelToConceptLinker
      Returns:
      Name as String.
    • setNameOfLinker

      public void setNameOfLinker(String nameOfLinker)
      Description copied from interface: LabelToConceptLinker
      Set instance specific name of the linker.
      Specified by:
      setNameOfLinker in interface LabelToConceptLinker
      Parameters:
      nameOfLinker - Name to be set.
    • isRunAllStringModifications

      public boolean isRunAllStringModifications()
    • setRunAllStringModifications

      public void setRunAllStringModifications(boolean runAllStringModifications)
    • isDiskBufferEnabled

      public boolean isDiskBufferEnabled()
    • commit

      private void commit()
      Commit data changes if active.
    • setDiskBufferEnabled

      public void setDiskBufferEnabled(boolean diskBufferEnabled)
    • isMultiConceptLink

      public boolean isMultiConceptLink(String link)
      Description copied from interface: MultiConceptLinker
      Determine whether the link at hand is a multi-concept link.
      Specified by:
      isMultiConceptLink in interface MultiConceptLinker
      Parameters:
      link - Link to be checked.
      Returns:
      True if multi-concept link, else false.