java.lang.Object
de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.SemanticWordRelationDictionary
de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.embeddings.GensimEmbeddingModel
All Implemented Interfaces:
ExternalResource, ExternalResourceWithHypernymCapability, ExternalResourceWithSynonymCapability, HypernymCapability, SynonymCapability, SynonymConfidenceCapability

public class GensimEmbeddingModel extends SemanticWordRelationDictionary implements SynonymConfidenceCapability
This class represents a single gensim embedding model. It allows for simplified usage in matching systems.
  • Field Details

    • LOGGER

      private static final org.slf4j.Logger LOGGER
      Default logger.
    • threshold

      public double threshold
      The desired threshold to declare two concepts as synonymous / same.
    • gensim

      public PythonServer gensim
      The gensim Python server instance used to query the model.
    • entityFile

      public File entityFile
      File containing the vocabulary entries of the model.
    • linker

      public LabelToConceptLinker linker
      The label-to-concept linker for this embedding.
    • modelFilePath

      public String modelFilePath
      Required as String in order to build requests.
    • knowledgeSourceName

      public String knowledgeSourceName
      Name of the knowledge source used, such as the name of the underlying corpus (can be used to generate the matcher name).
  • Constructor Details

    • GensimEmbeddingModel

      public GensimEmbeddingModel(String pathToModelOrVectorFile, String pathToEntityFile, double threshold, LabelToConceptLinker linker, String knowledgeSourceName)
      Creates an embedding model backed by the given gensim model or vector file.
      Parameters:
      pathToModelOrVectorFile - The file path to the gensim model or gensim vector file.
      pathToEntityFile - The path to the vocabulary entries.
      threshold - The threshold that shall be used for the synonymy strategy.
      linker - The appropriate label to concept linker for the given embedding.
      knowledgeSourceName - The name of the knowledge source (will be used as matcher name)
  • Method Details

    • isInDictionary

      public boolean isInDictionary(String word)
    • getSynonymsLexical

      public Set<String> getSynonymsLexical(String linkedConcept)
      Description copied from class: SemanticWordRelationDictionary
      Retrieves a set of synonyms independently of the word sense. The assumed language is English.
      Specified by:
      getSynonymsLexical in interface SynonymCapability
      Specified by:
      getSynonymsLexical in class SemanticWordRelationDictionary
      Parameters:
      linkedConcept - The linked concept for which synonyms shall be retrieved.
      Returns:
      A set of synonyms in word form (not links).
    • getHypernyms

      public Set<String> getHypernyms(String linkedConcept)
      Description copied from class: SemanticWordRelationDictionary
      Retrieves a set of hypernyms independently of the word sense. The assumed language is English.
      Specified by:
      getHypernyms in class SemanticWordRelationDictionary
      Parameters:
      linkedConcept - The linked concept for which hypernyms shall be retrieved.
      Returns:
      A set of linked concepts.
    • close

      public void close()
      Description copied from class: SemanticWordRelationDictionary
      Closing open resources.
      Specified by:
      close in class SemanticWordRelationDictionary
    • isSynonymous

      public boolean isSynonymous(String linkedConcept1, String linkedConcept2)
      Description copied from class: SemanticWordRelationDictionary
      Checks for synonymous words in a loose-form fashion: there has to be an overlap in the two sets of synonyms of word_1 and word_2. The assumed language is English.
      Specified by:
      isSynonymous in interface SynonymCapability
      Overrides:
      isSynonymous in class SemanticWordRelationDictionary
      Parameters:
      linkedConcept1 - linked word 1
      linkedConcept2 - linked word 2
      Returns:
      True if the given words are synonymous, else false.
    • getBestCrossAverage

      public double getBestCrossAverage(Set<String> links1, Set<String> links2)
      Given two sets, determine for each concept in the first set the highest similarity that can be found by comparing it with all concepts in the second set. The result is the average of these highest similarities.

      Example:
      Set 1: A, B; Set 2: C, D;
      sim(A, C) = 0.75
      sim(A, D) = 0.10
      sim(B, C) = 0.25
      sim(B, D) = 0.05
      This method will return (0.75 + 0.25)/2 = 0.5
      Parameters:
      links1 - Set of links 1.
      links2 - Set of links 2.
      Returns:
      The average of the best similarities found for the concepts in links1.
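
      The averaging described for getBestCrossAverage can be sketched in plain Java as follows. This is a minimal illustration and not the class's actual implementation: the class BestCrossAverageSketch, the helper exampleSimilarity, and the choice to return 0.0 for empty input are assumptions made for this example. In the real class, similarities would come from the gensim model rather than from a lookup table.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper class, not part of the documented API.
class BestCrossAverageSketch {

    // Similarity values taken from the example in the method description.
    private static final Map<String, Double> EXAMPLE_SIMILARITIES = Map.of(
            "A|C", 0.75,
            "A|D", 0.10,
            "B|C", 0.25,
            "B|D", 0.05);

    // Hypothetical stand-in for a similarity lookup in an embedding model.
    static double exampleSimilarity(String a, String b) {
        return EXAMPLE_SIMILARITIES.getOrDefault(a + "|" + b, 0.0);
    }

    // For each element of links1, find its best similarity against links2,
    // then average those best values over links1.
    static double bestCrossAverage(Set<String> links1, Set<String> links2,
                                   ToDoubleBiFunction<String, String> sim) {
        if (links1.isEmpty() || links2.isEmpty()) {
            return 0.0; // assumption: empty input yields 0.0
        }
        double sum = 0.0;
        for (String a : links1) {
            double best = Double.NEGATIVE_INFINITY;
            for (String b : links2) {
                best = Math.max(best, sim.applyAsDouble(a, b));
            }
            sum += best;
        }
        return sum / links1.size();
    }

    public static void main(String[] args) {
        double result = bestCrossAverage(
                Set.of("A", "B"), Set.of("C", "D"),
                BestCrossAverageSketch::exampleSimilarity);
        System.out.println(result); // prints 0.5
    }
}
```

      Note that the measure is asymmetric: only the concepts in the first set contribute a term to the average, so swapping the arguments can change the result.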
    • isStrongFormSynonymous

      public boolean isStrongFormSynonymous(String linkedWord_1, String linkedWord_2)
      Note that the concepts have to be linked.
      Specified by:
      isStrongFormSynonymous in interface SynonymCapability
      Overrides:
      isStrongFormSynonymous in class SemanticWordRelationDictionary
      Parameters:
      linkedWord_1 - linked word 1
      linkedWord_2 - linked word 2
      Returns:
      True if synonymous, else false.
    • getLinker

      public LabelToConceptLinker getLinker()
      Description copied from interface: ExternalResource
      Returns the linker instance for this particular resource.
      Specified by:
      getLinker in interface ExternalResource
      Specified by:
      getLinker in class SemanticWordRelationDictionary
      Returns:
      The specific linker used to link words to concepts.
    • getName

      public String getName()
      Description copied from interface: ExternalResource
      Obtain the name of the resource.
      Specified by:
      getName in interface ExternalResource
      Specified by:
      getName in class SemanticWordRelationDictionary
      Returns:
      Name of the resource.
    • getThreshold

      public double getThreshold()
    • setThreshold

      public void setThreshold(double threshold)
    • getSynonymyConfidence

      public double getSynonymyConfidence(String linkedConcept1, String linkedConcept2)
      If we have two multi-concept links, the similarity of the best combination is returned.

      Example:
      Set 1: A, B; Set 2: C, D;
      sim(A, C) = 0.75
      sim(A, D) = 0.10
      sim(B, C) = 0.25
      sim(B, D) = 0.05
      This method will return 0.75.

      Specified by:
      getSynonymyConfidence in interface SynonymConfidenceCapability
      Parameters:
      linkedConcept1 - Link 1.
      linkedConcept2 - Link 2.
      Returns:
      Confidence.
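
      The "best combination" rule described for getSynonymyConfidence can be illustrated with a short, self-contained sketch. It is not the class's actual implementation: the class BestCombinationSketch and the helper exampleSimilarity are hypothetical, and the sketch assumes that each multi-concept link has already been resolved into a set of individual concepts (how the real class encodes and splits such links is not shown here).

```java
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper class, not part of the documented API.
class BestCombinationSketch {

    // Similarity values taken from the example in the method description.
    private static final Map<String, Double> EXAMPLE_SIMILARITIES = Map.of(
            "A|C", 0.75,
            "A|D", 0.10,
            "B|C", 0.25,
            "B|D", 0.05);

    // Hypothetical stand-in for a similarity lookup in an embedding model.
    static double exampleSimilarity(String a, String b) {
        return EXAMPLE_SIMILARITIES.getOrDefault(a + "|" + b, 0.0);
    }

    // Returns the highest similarity over all pairs (a, b) with a taken from
    // concepts1 and b taken from concepts2.
    static double bestCombination(Set<String> concepts1, Set<String> concepts2,
                                  ToDoubleBiFunction<String, String> sim) {
        if (concepts1.isEmpty() || concepts2.isEmpty()) {
            return 0.0; // assumption: empty input yields 0.0
        }
        double best = Double.NEGATIVE_INFINITY;
        for (String a : concepts1) {
            for (String b : concepts2) {
                best = Math.max(best, sim.applyAsDouble(a, b));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double result = bestCombination(
                Set.of("A", "B"), Set.of("C", "D"),
                BestCombinationSketch::exampleSimilarity);
        System.out.println(result); // prints 0.75
    }
}
```

      Unlike the cross average, taking the maximum over all pairs is symmetric in its two arguments.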
    • getStrongFormSynonymyConfidence

      public double getStrongFormSynonymyConfidence(String linkedConcept1, String linkedConcept2)
      Description copied from interface: SynonymConfidenceCapability
      Given two links, determine the degree of synonymy.
      Specified by:
      getStrongFormSynonymyConfidence in interface SynonymConfidenceCapability
      Parameters:
      linkedConcept1 - Linked concept 1.
      linkedConcept2 - Linked concept 2.
      Returns:
      The degree of synonymy of the two linked concepts.