Class GensimEmbeddingModel
java.lang.Object
de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.SemanticWordRelationDictionary
de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.embeddings.GensimEmbeddingModel
- All Implemented Interfaces:
ExternalResource, ExternalResourceWithHypernymCapability, ExternalResourceWithSynonymCapability, HypernymCapability, SynonymCapability, SynonymConfidenceCapability
public class GensimEmbeddingModel
extends SemanticWordRelationDictionary
implements SynonymConfidenceCapability
This class represents a single gensim embedding model.
It allows for simplified usage in matching systems.
-
Field Summary
entityFile
File to the vocabulary entries of the model.
gensim
Gensim instance
knowledgeSourceName
Name of the knowledge source used such as the name of the underlying corpus (can be used to generate matcher name).
linker
Linker
private static final org.slf4j.Logger LOGGER
Default logger.
modelFilePath
Required as String in order to build requests.
double threshold
The desired threshold to declare two concepts as synonymous / same.
-
Constructor Summary
GensimEmbeddingModel(String pathToModelOrVectorFile, String pathToEntityFile, double threshold, LabelToConceptLinker linker, String knowledgeSourceName)
Constructor
-
Method Summary
void close()
Closing open resources.
double getBestCrossAverage(Set<String> links1, Set<String> links2)
Given two sets, save for each concept in the first set the highest similarity that can be found by comparing it with all concepts in the other set.
getHypernyms(String linkedConcept)
Retrieves a set of hypernyms independently of the word sense.
getLinker()
Returns the linker instance for this particular resource.
getName()
Obtain the name of the resource.
double getStrongFormSynonymyConfidence(String linkedConcept1, String linkedConcept2)
Given two links, determine the degree of synonymy.
getSynonymsLexical(String linkedConcept)
Retrieves a list of synonyms independently of the word sense.
double getSynonymyConfidence(String linkedConcept1, String linkedConcept2)
If we have two multi-concept links, the similarity of the best combination is returned.
double getThreshold()
boolean isInDictionary(String word)
boolean isStrongFormSynonymous(String linkedWord_1, String linkedWord_2)
Note that the concepts have to be linked.
boolean isSynonymous(String linkedConcept1, String linkedConcept2)
Checks for synonymous words in a loose-form fashion: There has to be an overlap in the two sets of synonyms of word_1 and word_2.
void setThreshold(double threshold)
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.SemanticWordRelationDictionary
isHypernym, isHypernym, isHypernymous, isSynonymousOrHypernymous
-
Field Details
-
LOGGER
private static final org.slf4j.Logger LOGGER
Default logger.
-
threshold
public double threshold
The desired threshold to declare two concepts as synonymous / same.
-
gensim
Gensim instance
-
entityFile
File to the vocabulary entries of the model.
-
linker
Linker
-
modelFilePath
Required as String in order to build requests.
-
knowledgeSourceName
Name of the knowledge source used such as the name of the underlying corpus (can be used to generate matcher name).
-
-
Constructor Details
-
GensimEmbeddingModel
public GensimEmbeddingModel(String pathToModelOrVectorFile, String pathToEntityFile, double threshold, LabelToConceptLinker linker, String knowledgeSourceName)
Constructor
- Parameters:
pathToModelOrVectorFile - The file path to the gensim model or gensim vector file.
pathToEntityFile - The path to the vocabulary entries.
threshold - The threshold that shall be used for the synonymy strategy.
linker - The appropriate label to concept linker for the given embedding.
knowledgeSourceName - The name of the knowledge source (will be used as matcher name)
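How a threshold of this kind might turn a similarity score into a yes/no synonymy decision can be sketched as follows. This is a standalone illustration, not the class's actual code: the class name ThresholdSketch is hypothetical, and the strict "greater than" comparison is an assumption, since this page does not state whether the boundary value itself counts as synonymous.

```java
/** Illustrative sketch only; not the MELT implementation. */
final class ThresholdSketch {

    private double threshold;

    ThresholdSketch(double threshold) {
        this.threshold = threshold;
    }

    double getThreshold() {
        return threshold;
    }

    void setThreshold(double threshold) {
        this.threshold = threshold;
    }

    /**
     * Assumption: a confidence strictly above the threshold counts as synonymous.
     * The real class may use a different comparison.
     */
    boolean isSynonymous(double confidence) {
        return confidence > threshold;
    }

    public static void main(String[] args) {
        ThresholdSketch sketch = new ThresholdSketch(0.5);
        System.out.println(sketch.isSynonymous(0.75)); // 0.75 is above 0.5
        sketch.setThreshold(0.8);
        System.out.println(sketch.isSynonymous(0.75)); // 0.75 is not above 0.8
    }
}
```

Raising the threshold makes the decision stricter for the same confidence value, which is why the second call in main yields a different result from the first.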
-
-
Method Details
-
isInDictionary
-
getSynonymsLexical
Description copied from class: SemanticWordRelationDictionary
Retrieves a list of synonyms independently of the word sense. The assumed language is English.
- Specified by:
getSynonymsLexical in interface SynonymCapability
- Specified by:
getSynonymsLexical in class SemanticWordRelationDictionary
- Parameters:
linkedConcept - The linked concept for which synonyms shall be retrieved.
- Returns:
A set of synonyms in word form (not links).
-
getHypernyms
Description copied from class: SemanticWordRelationDictionary
Retrieves a set of hypernyms independently of the word sense. The assumed language is English.
- Specified by:
getHypernyms in class SemanticWordRelationDictionary
- Parameters:
linkedConcept - The linked concept for which hypernyms shall be retrieved.
- Returns:
A set of linked concepts.
-
close
public void close()
Description copied from class: SemanticWordRelationDictionary
Closing open resources.
- Specified by:
close in class SemanticWordRelationDictionary
-
isSynonymous
Description copied from class: SemanticWordRelationDictionary
Checks for synonymous words in a loose-form fashion: There has to be an overlap in the two sets of synonyms of word_1 and word_2. The assumed language is English.
- Specified by:
isSynonymous in interface SynonymCapability
- Overrides:
isSynonymous in class SemanticWordRelationDictionary
- Parameters:
linkedConcept1 - linked word 1
linkedConcept2 - linked word 2
- Returns:
True if the given words are synonymous, else false.
-
getBestCrossAverage
Given two sets, save for each concept in the first set the highest similarity that can be found by comparing it with all concepts in the other set. Average the highest similarities.
Example:
Set 1: A, B; Set 2: C, D;
sim(A, C) = 0.75
sim(A, D) = 0.10
sim(B, C) = 0.25
sim(B, D) = 0.05
This method will return (0.75 + 0.25)/2 = 0.5
- Parameters:
links1 - Set of links 1.
links2 - Set of links 2.
- Returns:
Best average.
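
The averaging described for getBestCrossAverage can be sketched in plain Java as follows. This is a standalone illustration under stated assumptions, not the MELT implementation: the class BestCrossAverageSketch, the similarity callback, and the lookup table are hypothetical stand-ins for queries against the embedding model, and returning 0.0 for empty input is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

/** Illustrative sketch only; not the MELT implementation. */
final class BestCrossAverageSketch {

    /** Hypothetical similarity table holding the values from the example. */
    private static final Map<String, Double> EXAMPLE = new HashMap<>();
    static {
        EXAMPLE.put("A|C", 0.75);
        EXAMPLE.put("A|D", 0.10);
        EXAMPLE.put("B|C", 0.25);
        EXAMPLE.put("B|D", 0.05);
    }

    /** Looks up a pair in the example table; unknown pairs score 0.0. */
    static double exampleSimilarity(String a, String b) {
        return EXAMPLE.getOrDefault(a + "|" + b, 0.0);
    }

    /**
     * For each link in links1, keep the highest similarity to any link in links2,
     * then average those maxima over links1.
     */
    static double bestCrossAverage(Set<String> links1, Set<String> links2,
                                   ToDoubleBiFunction<String, String> similarity) {
        if (links1.isEmpty() || links2.isEmpty()) {
            return 0.0; // assumption: nothing to compare means no similarity
        }
        double sumOfMaxima = 0.0;
        for (String a : links1) {
            double best = Double.NEGATIVE_INFINITY;
            for (String b : links2) {
                best = Math.max(best, similarity.applyAsDouble(a, b));
            }
            sumOfMaxima += best;
        }
        return sumOfMaxima / links1.size();
    }

    public static void main(String[] args) {
        Set<String> set1 = new HashSet<>(Arrays.asList("A", "B"));
        Set<String> set2 = new HashSet<>(Arrays.asList("C", "D"));
        // Maxima are 0.75 (for A) and 0.25 (for B); their average is 0.5.
        System.out.println(bestCrossAverage(set1, set2, BestCrossAverageSketch::exampleSimilarity));
    }
}
```

Note that the measure as described is asymmetric: only the concepts of the first set contribute a maximum, so swapping the arguments can change the result.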
-
isStrongFormSynonymous
Note that the concepts have to be linked.
- Specified by:
isStrongFormSynonymous in interface SynonymCapability
- Overrides:
isStrongFormSynonymous in class SemanticWordRelationDictionary
- Parameters:
linkedWord_1 - linked word 1
linkedWord_2 - linked word 2
- Returns:
True if synonymous, else false.
-
getLinker
Description copied from interface: ExternalResource
Returns the linker instance for this particular resource.
- Specified by:
getLinker in interface ExternalResource
- Specified by:
getLinker in class SemanticWordRelationDictionary
- Returns:
The specific linker used to link words to concepts.
-
getName
Description copied from interface: ExternalResource
Obtain the name of the resource.
- Specified by:
getName in interface ExternalResource
- Specified by:
getName in class SemanticWordRelationDictionary
- Returns:
Name of the resource.
-
getThreshold
public double getThreshold()
-
setThreshold
public void setThreshold(double threshold)
-
getSynonymyConfidence
If we have two multi-concept links, the similarity of the best combination is returned.
Example:
Set 1: A, B; Set 2: C, D;
sim(A, C) = 0.75
sim(A, D) = 0.10
sim(B, C) = 0.25
sim(B, D) = 0.05
This method will return 0.75.
- Specified by:
getSynonymyConfidence in interface SynonymConfidenceCapability
- Parameters:
linkedConcept1 - Link 1.
linkedConcept2 - Link 2.
- Returns:
Confidence.
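
The "best combination" rule for getSynonymyConfidence can be sketched in the same spirit. Again this is only an illustration under assumptions: the real method takes two String links, and how a multi-concept link is split into individual concepts is not described on this page, so the sketch simply takes each link as a set of concept identifiers. The class BestCombinationSketch, the similarity callback, and the lookup table are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

/** Illustrative sketch only; not the MELT implementation. */
final class BestCombinationSketch {

    /** Hypothetical similarity table holding the values from the example. */
    private static final Map<String, Double> EXAMPLE = new HashMap<>();
    static {
        EXAMPLE.put("A|C", 0.75);
        EXAMPLE.put("A|D", 0.10);
        EXAMPLE.put("B|C", 0.25);
        EXAMPLE.put("B|D", 0.05);
    }

    /** Looks up a pair in the example table; unknown pairs score 0.0. */
    static double exampleSimilarity(String a, String b) {
        return EXAMPLE.getOrDefault(a + "|" + b, 0.0);
    }

    /** Returns the highest similarity over all pairs (a, b) with a in concepts1 and b in concepts2. */
    static double bestCombination(Set<String> concepts1, Set<String> concepts2,
                                  ToDoubleBiFunction<String, String> similarity) {
        if (concepts1.isEmpty() || concepts2.isEmpty()) {
            return 0.0; // assumption: nothing to compare means no confidence
        }
        double best = Double.NEGATIVE_INFINITY;
        for (String a : concepts1) {
            for (String b : concepts2) {
                best = Math.max(best, similarity.applyAsDouble(a, b));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> link1 = new HashSet<>(Arrays.asList("A", "B"));
        Set<String> link2 = new HashSet<>(Arrays.asList("C", "D"));
        // The best pair is (A, C) with similarity 0.75.
        System.out.println(bestCombination(link1, link2, BestCombinationSketch::exampleSimilarity));
    }
}
```

Compared with getBestCrossAverage, which averages one maximum per concept of the first set, this rule keeps only the single best pair, so one strong match is enough to yield a high confidence.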
-
getStrongFormSynonymyConfidence
Description copied from interface: SynonymConfidenceCapability
Given two links, determine the degree of synonymy.
- Specified by:
getStrongFormSynonymyConfidence in interface SynonymConfidenceCapability
- Parameters:
linkedConcept1 - Linked concept 1.
linkedConcept2 - Linked concept 2.
- Returns:
The degree of synonymy between the two linked concepts, as a double.
-