java.lang.Object
de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.SemanticWordRelationDictionary
de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.wiktionary.WiktionaryKnowledgeSource
All Implemented Interfaces:
ExternalResource, ExternalResourceWithHypernymCapability, ExternalResourceWithSynonymCapability, HypernymCapability, SynonymCapability

public class WiktionaryKnowledgeSource extends SemanticWordRelationDictionary
Knowledge source backed by DBnary, an RDF version of Wiktionary that is accessible through a SPARQL endpoint. Alternatively, a local TDB1 store can be used for offline access.
  • Field Details

    • LOGGER

      private static final org.slf4j.Logger LOGGER
      Logger for this class.
    • persistenceService

      private PersistenceService persistenceService
      Service responsible for disk buffers.
    • synonymyBuffer

      private ConcurrentMap<String,HashSet<String>> synonymyBuffer
      Buffer for synonyms.
    • hypernymyBuffer

      private ConcurrentMap<String,HashSet<String>> hypernymyBuffer
      Buffer for hypernymy.
    • askBuffer

      private ConcurrentMap<String,Boolean> askBuffer
      Buffer for ask queries.
    • translationBuffer

      private ConcurrentMap<String,HashSet<String>> translationBuffer
      Buffer for translations.
    • translationOfBuffer

      private ConcurrentMap<String,HashSet<String>> translationOfBuffer
      Buffer for reverse translation lookups (see getTranslationOf).
    • tdbDataset

      private org.apache.jena.query.Dataset tdbDataset
      The TDB dataset into which the DBnary data was loaded.
    • ENDPOINT_URL

      private static final String ENDPOINT_URL
      The public SPARQL endpoint.
    • isUseTdb

      private boolean isUseTdb
      True if a TDB source shall be used rather than the online SPARQL endpoint.
    • isDiskBufferEnabled

      private boolean isDiskBufferEnabled
      True if buffers shall be written to disk.
    • linker

      private WiktionaryLinker linker
      The linker that links input strings to terms.
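The buffer fields above are typed as ConcurrentMap, which suggests a read-through cache in front of the SPARQL endpoint or TDB store. The following is a minimal sketch of that pattern, not the MELT implementation: the class BufferedLookupSketch, its backend function, and the call counter are hypothetical names introduced only for illustration, and disk persistence is left out.

```java
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical read-through buffer; illustrates the caching idea only. */
final class BufferedLookupSketch {
    // Same shape as the documented synonymyBuffer / hypernymyBuffer fields.
    private final ConcurrentMap<String, HashSet<String>> buffer = new ConcurrentHashMap<>();
    // Stand-in for the expensive query (assumption: any String -> set lookup).
    private final Function<String, HashSet<String>> backend;
    // Counts how often the backend was actually queried.
    final AtomicInteger backendCalls = new AtomicInteger();

    BufferedLookupSketch(Function<String, HashSet<String>> backend) {
        this.backend = backend;
    }

    HashSet<String> lookup(String key) {
        // The backend is consulted only when the key is not buffered yet.
        return buffer.computeIfAbsent(key, k -> {
            backendCalls.incrementAndGet();
            return backend.apply(k);
        });
    }

    public static void main(String[] args) {
        BufferedLookupSketch sketch = new BufferedLookupSketch(
                word -> new HashSet<>(List.of(word + "_synonym")));
        sketch.lookup("car");
        sketch.lookup("car"); // served from the buffer
        System.out.println(sketch.backendCalls.get()); // prints 1
    }
}
```

Using computeIfAbsent keeps the check-then-store step atomic per key, which matters when several matchers query the same resource concurrently.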
  • Constructor Details

    • WiktionaryKnowledgeSource

      public WiktionaryKnowledgeSource()
      Constructor for Wiktionary online (SPARQL endpoint) access. By default, a disk-buffer is enabled.
    • WiktionaryKnowledgeSource

      public WiktionaryKnowledgeSource(boolean isDiskBufferEnabled)
      Constructor for Wiktionary online (SPARQL endpoint) access with a configurable disk buffer.
      Parameters:
      isDiskBufferEnabled - True if buffers shall be written to disk.
    • WiktionaryKnowledgeSource

      public WiktionaryKnowledgeSource(String tdbDirectoryPath)
      Constructor for DBnary TDB access.
      Parameters:
      tdbDirectoryPath - Path to the Wiktionary TDB directory.
  • Method Details

    • initialize

      private void initialize()
      Helper function for initialization steps shared by all constructors.
    • close

      public void close()
      Releases the resources held by this instance; call before the program terminates.
      Specified by:
      close in class SemanticWordRelationDictionary
    • isInDictionary

      public boolean isInDictionary(String word)
    • isInDictionary

      public boolean isInDictionary(String word, Language language)
      Language-dependent query for existence in the DBnary dictionary. Note that the lookup is case-sensitive: (Katze, deu) can be found, whereas (katze, deu) will not return any results.
      Parameters:
      word - The word to be looked for.
      language - The language of the word.
      Returns:
      boolean indicating whether the word exists in the dictionary in the corresponding language.
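The case-sensitivity note can be illustrated with an exact-match lookup. This is a minimal sketch, assuming a small in-memory set in place of DBnary; the class name, the "language:word" key layout, and the sample entries are hypothetical.

```java
import java.util.Set;

/** Hypothetical stand-in for a case-sensitive, language-dependent lookup. */
final class CaseSensitiveLookupSketch {
    // Assumed sample content, keyed as "<language code>:<word>".
    private static final Set<String> ENTRIES = Set.of("deu:Katze", "eng:cat");

    static boolean isInDictionary(String word, String languageCode) {
        // Exact string match: no lower-casing or other normalization is applied.
        return ENTRIES.contains(languageCode + ":" + word);
    }

    public static void main(String[] args) {
        System.out.println(isInDictionary("Katze", "deu")); // true
        System.out.println(isInDictionary("katze", "deu")); // false
    }
}
```

Callers that cannot guarantee the original casing of a label may therefore want to try more than one spelling before concluding that a word is absent.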
    • isStrongFormSynonymous

      public boolean isStrongFormSynonymous(String link1, String link2)
      Checks for synonymy by determining whether link1 is contained in the set of synonymous words of link2 or vice versa.
      Specified by:
      isStrongFormSynonymous in interface SynonymCapability
      Overrides:
      isStrongFormSynonymous in class SemanticWordRelationDictionary
      Parameters:
      link1 - Word 1
      link2 - Word 2
      Returns:
      True if the given words are synonymous, else false.
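The symmetric containment check described for isStrongFormSynonymous can be sketched as follows. The synonym map is a hypothetical in-memory stand-in for the dictionary, and the class name is introduced for illustration only; how the real class treats identical or null inputs is not stated in this documentation.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a bidirectional synonymy check. */
final class StrongFormSynonymySketch {
    // Assumed data: link -> set of synonymous links.
    private final Map<String, Set<String>> synonyms;

    StrongFormSynonymySketch(Map<String, Set<String>> synonyms) {
        this.synonyms = synonyms;
    }

    boolean isStrongFormSynonymous(String link1, String link2) {
        if (link1 == null || link2 == null) {
            return false; // design choice of this sketch
        }
        Set<String> synonymsOf1 = synonyms.getOrDefault(link1, Set.of());
        Set<String> synonymsOf2 = synonyms.getOrDefault(link2, Set.of());
        // True if either word lists the other as a synonym.
        return synonymsOf1.contains(link2) || synonymsOf2.contains(link1);
    }

    public static void main(String[] args) {
        StrongFormSynonymySketch sketch = new StrongFormSynonymySketch(
                Map.of("car", Set.of("automobile")));
        // Only "car" lists "automobile", yet the check succeeds in both directions.
        System.out.println(sketch.isStrongFormSynonymous("automobile", "car")); // true
        System.out.println(sketch.isStrongFormSynonymous("car", "bicycle"));    // false
    }
}
```

Checking both directions compensates for dictionaries in which the synonym relation is recorded on only one of the two entries.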
    • getSynonymsEncoded

      public Set<String> getSynonymsEncoded(String linkedConcept)
    • getSynonymsLexical

      public Set<String> getSynonymsLexical(String linkedConcept)
      Description copied from class: SemanticWordRelationDictionary
      Retrieves a set of synonyms independently of the word sense. The assumed language is English.
      Specified by:
      getSynonymsLexical in interface SynonymCapability
      Specified by:
      getSynonymsLexical in class SemanticWordRelationDictionary
      Parameters:
      linkedConcept - The linked concept for which synonyms shall be retrieved.
      Returns:
      A set of synonyms in word form (not links).
    • getSynonyms

      public HashSet<String> getSynonyms(String word, Language language)
      Retrieves the synonyms of a particular word in a particular language.
      Parameters:
      word - Word for which the synonyms shall be retrieved.
      language - Language of the word.
      Returns:
      Set of synonyms.
    • getLemmaFromURI

      private static String getLemmaFromURI(String uri)
      Given a resource URI, this method will transform it to a lemma.
      Parameters:
      uri - Resource URI to be transformed.
      Returns:
      Lemma.
    • encodeWord

      static String encodeWord(String word)
      Encodes words so that they can be looked up in the wiktionary dictionary.
      Parameters:
      word - Word to be encoded.
      Returns:
      encoded word
    • getHypernyms

      public HashSet<String> getHypernyms(String linkedConcept)
      Obtain hypernyms for the given concept. The assumed language is English.
      Specified by:
      getHypernyms in class SemanticWordRelationDictionary
      Parameters:
      linkedConcept - The linked concept for which hypernyms shall be retrieved.
      Returns:
      A set of hypernyms.
    • getHypernyms

      public HashSet<String> getHypernyms(String linkedConcept, Language language)
      Obtain hypernyms for the given concept.
      Parameters:
      linkedConcept - The linked concept for which hypernyms shall be retrieved.
      language - The desired language of the hypernyms.
      Returns:
      A set of hypernyms.
    • getTranslation

      public HashSet<String> getTranslation(String linkedConcept, Language sourceLanguage, Language targetLanguage)
      Obtain the translations for the linked concept.
      Parameters:
      linkedConcept - The concept that was linked.
      sourceLanguage - Language of the linked concept.
      targetLanguage - Language to which the concept shall be translated.
      Returns:
      A set of translations. The results are words, not linked concepts.
    • getTranslationOf

      public HashSet<String> getTranslationOf(String translationString, Language languageOfTranslation)
      Given a translation, find concepts which state that the given translation is their translation.
      Parameters:
      translationString - The translation (textual string).
      languageOfTranslation - The language of the translationString.
      Returns:
      A set of concepts whose translation is the given string.
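getTranslationOf is the reverse direction of getTranslation: it starts from a translation string and returns the concepts that list it. A minimal sketch of that relationship, assuming an in-memory forward map and ignoring languages for brevity; the class and method names are hypothetical, and the real class answers this lookup from DBnary rather than by inverting a map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: deriving a reverse translation index. */
final class TranslationOfSketch {

    /** Inverts concept -> translations into translation -> concepts. */
    static Map<String, Set<String>> invert(Map<String, Set<String>> translations) {
        Map<String, Set<String>> inverse = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : translations.entrySet()) {
            for (String translation : entry.getValue()) {
                inverse.computeIfAbsent(translation, k -> new HashSet<>())
                       .add(entry.getKey());
            }
        }
        return inverse;
    }

    public static void main(String[] args) {
        // Assumed sample data: English concepts and their German translations.
        Map<String, Set<String>> forward = Map.of(
                "cat", Set.of("Katze"),
                "kitty", Set.of("Katze"),
                "tomcat", Set.of("Kater"));
        Map<String, Set<String>> inverse = invert(forward);
        // Both "cat" and "kitty" state that "Katze" is their translation.
        System.out.println(inverse.get("Katze").contains("cat"));   // true
        System.out.println(inverse.get("Katze").contains("kitty")); // true
    }
}
```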
    • isTranslationDerived

      public boolean isTranslationDerived(String word_1, Language language_1, String word_2, Language language_2)
      Checks whether the two words are translations of the same word (this mechanism uses another language as common denominator).
      Parameters:
      word_1 - Word 1 (does not have to be linked).
      language_1 - Language 1.
      word_2 - Word 2 (does not have to be linked).
      language_2 - Language 2.
      Returns:
      True, if a translation can be derived; else false.
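The common-denominator idea behind isTranslationDerived can be sketched as a set intersection: two words count as derived translations if some concept lists both of them as translations. This is an illustration under assumptions, not the library's code; the reverse index, its "language:word" key layout, and the class name are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a pivot-based translation check. */
final class TranslationDerivedSketch {
    // Assumed reverse index: "<language>:<word>" -> concepts listing that word as a translation.
    private final Map<String, Set<String>> translationOf;

    TranslationDerivedSketch(Map<String, Set<String>> translationOf) {
        this.translationOf = translationOf;
    }

    boolean isTranslationDerived(String word1, String language1,
                                 String word2, String language2) {
        Set<String> concepts1 = translationOf.getOrDefault(language1 + ":" + word1, Set.of());
        Set<String> concepts2 = translationOf.getOrDefault(language2 + ":" + word2, Set.of());
        // Derived if both words share at least one pivot concept.
        return !Collections.disjoint(concepts1, concepts2);
    }

    public static void main(String[] args) {
        TranslationDerivedSketch sketch = new TranslationDerivedSketch(Map.of(
                "deu:Katze", Set.of("cat"),
                "fra:chat", Set.of("cat"),
                "fra:chien", Set.of("dog")));
        System.out.println(sketch.isTranslationDerived("Katze", "deu", "chat", "fra"));  // true
        System.out.println(sketch.isTranslationDerived("Katze", "deu", "chien", "fra")); // false
    }
}
```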
    • isTranslationLinked

      public boolean isTranslationLinked(String linkedConceptToBeTranslated, Language language_1, String linkedConcept_2, Language language_2)
      Checks whether linkedConceptToBeTranslated can be translated to linkedConcept_2. Note that BOTH concepts have to be linked.
      Parameters:
      linkedConceptToBeTranslated - Linked concept
      language_1 - Language of linkedConceptToBeTranslated.
      linkedConcept_2 - Linked concept
      language_2 - Language of linkedConcept_2.
      Returns:
      True if a translation from linkedConceptToBeTranslated to linkedConcept_2 is possible, else false.
    • isTranslationNonLinked

      public boolean isTranslationNonLinked(String linkedConceptToBeTranslated, Language language_1, String nonlinkedConcept_2, Language language_2)
      Checks whether linkedConceptToBeTranslated can be translated to nonlinkedConcept_2. Note that only the first concept has to be linked.
      Parameters:
      linkedConceptToBeTranslated - The linked concept.
      language_1 - Language of linkedConceptToBeTranslated.
      nonlinkedConcept_2 - Concept not linked (just a string).
      language_2 - Language of nonlinkedConcept_2.
      Returns:
      True if a translation from linkedConceptToBeTranslated to nonlinkedConcept_2 is possible, else false.
    • getNormalizedTranslations

      public HashSet<String> getNormalizedTranslations(String linkedConcept, Language sourceLanguage, Language targetLanguage)
      Looks for translations of the given string. The translations are non-aggressively normalized (lower-case etc.) and returned.
      Parameters:
      linkedConcept - The linked concept for which translations shall be obtained.
      sourceLanguage - Source language.
      targetLanguage - Target language.
      Returns:
      A set of normalized translations. The results are words, not linked concepts.
    • isUseTdb

      public boolean isUseTdb()
    • normalizeForTranslations

      public static HashSet<String> normalizeForTranslations(HashSet<String> setToBeNormalized)
      Normalization function for translations.
      Parameters:
      setToBeNormalized - Set whose strings shall be normalized.
      Returns:
      HashSet with normalized strings.
    • normalizeForTranslations

      public static String normalizeForTranslations(String stringToBeNormalized)
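The documentation describes the translation normalization only as non-aggressive ("lower-case etc."), so the exact rules are not known from this page. The sketch below assumes three steps (trimming, lower-casing, collapsing whitespace) purely for illustration; the class and method names are hypothetical and the real normalizeForTranslations may apply different rules.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of a light-weight string normalization. */
final class TranslationNormalizationSketch {

    /** Assumed rules: trim, lower-case, collapse runs of whitespace. */
    static String normalize(String value) {
        if (value == null) {
            return null;
        }
        return value.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    /** Applies normalize to every non-null element; duplicates collapse. */
    static HashSet<String> normalizeAll(Set<String> values) {
        HashSet<String> result = new HashSet<>();
        for (String value : values) {
            if (value != null) {
                result.add(normalize(value));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(normalize("  Hot   Dog ")); // prints "hot dog"
        // "Katze" and "KATZE" normalize to the same string.
        System.out.println(normalizeAll(Set.of("Katze", "KATZE")).size()); // prints 1
    }
}
```

Normalizing both sides in the same way before comparing is what allows a linked concept's translations to be matched against a plain, non-linked string.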
    • commit

      private void commit(PersistenceService.PreconfiguredPersistences persistence)
      Commit persistence.
      Parameters:
      persistence - The persistence that is to be committed.
    • commitAll

      private void commitAll()
      Commit data changes if active.
    • getLinker

      public LabelToConceptLinker getLinker()
      Description copied from interface: ExternalResource
      Returns the linker instance for this particular resource.
      Specified by:
      getLinker in interface ExternalResource
      Specified by:
      getLinker in class SemanticWordRelationDictionary
      Returns:
      The specific linker used to link words to concepts.
    • getName

      public String getName()
      Description copied from interface: ExternalResource
      Obtain the name of the resource.
      Specified by:
      getName in interface ExternalResource
      Specified by:
      getName in class SemanticWordRelationDictionary
      Returns:
      Name of the resource.