Class MultiSourceDispatcherIncrementalMergeByClusterText
java.lang.Object
de.uni_mannheim.informatik.dws.melt.matching_base.multisource.MatcherMultiSourceURL
de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.multisource.dispatchers.MultiSourceDispatcherIncrementalMerge
de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.multisource.dispatchers.MultiSourceDispatcherIncrementalMergeByCluster
de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.multisource.dispatchers.MultiSourceDispatcherIncrementalMergeByClusterText
- All Implemented Interfaces:
IMatcherMultiSourceCaller, MultiSourceDispatcher
public class MultiSourceDispatcherIncrementalMergeByClusterText
extends MultiSourceDispatcherIncrementalMergeByCluster
Matches multiple ontologies / knowledge graphs with an incremental merge approach.
This means that two ontologies are merged first, and the resulting union may then be merged with another ontology, and so on.
The order in which they are merged is defined by subclasses; this class derives it from a clustering of the models based on their textual content.
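The class hierarchy (MultiSourceDispatcherIncrementalMergeByCluster) and the inherited getMergeTree and getLinkage methods indicate that the merge order comes from a hierarchical clustering. As a rough illustration only, and not MELT's implementation, the sketch below runs naive agglomerative clustering over one feature vector per ontology and records the order in which clusters are merged. The Euclidean distance, the single-linkage rule (the constructors accept a ClusterLinkage; single linkage is just one example), and the names MergeOrderSketch and mergeOrder are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: derive an incremental merge order via agglomerative clustering.
public class MergeOrderSketch {

    // Euclidean distance between two feature vectors (assumed metric).
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Single linkage (assumed): distance between the closest pair of members.
    static double singleLinkage(List<Integer> c1, List<Integer> c2, double[][] features) {
        double best = Double.POSITIVE_INFINITY;
        for (int i : c1) {
            for (int j : c2) {
                best = Math.min(best, distance(features[i], features[j]));
            }
        }
        return best;
    }

    // Returns the merges in the order they happen, e.g. "[0] + [1]".
    static List<String> mergeOrder(double[][] features) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < features.length; i++) {
            List<Integer> c = new ArrayList<>();
            c.add(i);
            clusters.add(c);
        }
        List<String> merges = new ArrayList<>();
        while (clusters.size() > 1) {
            int bestA = 0, bestB = 1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double d = singleLinkage(clusters.get(a), clusters.get(b), features);
                    if (d < bestDist) {
                        bestDist = d;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            List<Integer> merged = new ArrayList<>(clusters.get(bestA));
            merged.addAll(clusters.get(bestB));
            merges.add(clusters.get(bestA) + " + " + clusters.get(bestB));
            clusters.remove(bestB); // bestB > bestA, so bestA's index stays valid
            clusters.set(bestA, merged);
        }
        return merges;
    }

    public static void main(String[] args) {
        // One hypothetical feature vector per ontology.
        double[][] features = {
            {1.0, 0.0},
            {0.9, 0.1},
            {0.0, 1.0},
        };
        // prints "[0] + [1]" then "[0, 1] + [2]"
        mergeOrder(features).forEach(System.out::println);
    }
}
```

The two most similar ontologies are merged first, and the third is then merged into their union, which is the incremental pattern described above.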
-
Field Summary
Modifier and Type / Field
private static final Pattern CAMEL_CASE_SPLIT
private boolean debug
private static final org.slf4j.Logger LOGGER
private double maxdf
private double mindf
private static final String NEWLINE
private static smile.nlp.normalizer.SimpleNormalizer NORMALIZER
private static BreakIterator SENTENCE_SPLITTER
private static final Pattern URI_SEPARATOR
-
Constructor Summary
Constructor
MultiSourceDispatcherIncrementalMergeByClusterText(Object oneToOneMatcher)
MultiSourceDispatcherIncrementalMergeByClusterText(Object oneToOneMatcher, ClusterLinkage linkage)
MultiSourceDispatcherIncrementalMergeByClusterText(Object oneToOneMatcher, ClusterLinkage linkage, double mindf, double maxdf)
MultiSourceDispatcherIncrementalMergeByClusterText(Supplier<Object> matcherSupplier)
MultiSourceDispatcherIncrementalMergeByClusterText(Supplier<Object> matcherSupplier, ClusterLinkage linkage)
MultiSourceDispatcherIncrementalMergeByClusterText(Supplier<Object> matcherSupplier, ClusterLinkage linkage, double mindf, double maxdf)
-
Method Summary
Modifier and Type / Method / Description
private void checkDocumentFrequency(double df)
getBagOfWords(org.apache.jena.rdf.model.Model m)
double[][] getClusterFeatures(List<Set<Object>> models, Object parameters)
double getMaxdf(): Returns the maximum relative document frequency a token may have to be included as a feature.
double getMindf(): Returns the minimum relative document frequency a token needs to have to be included as a feature.
boolean isDebug(): Returns true if debug files are written.
private static boolean isLiteralAString(org.apache.jena.rdf.model.Literal lit)
void setDebug(boolean debug): If set to true, writes debug files with helpful information, e.g. a documentFrequency.json file.
void setMaxdf(double maxdf): Sets the maximum relative document frequency a token may have to be included as a feature.
void setMindf(double mindf): Sets the minimum relative document frequency a token needs to have to be included as a feature.
private static String splitFragment(String text)
private void writeTextualRepresentationOfModel(org.apache.jena.rdf.model.Model m, File f)
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.multisource.dispatchers.MultiSourceDispatcherIncrementalMergeByCluster
getClusterer, getDistance, getLinkage, getMergeTree, setClusterer, setDistance, setLinkage
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.multisource.dispatchers.MultiSourceDispatcherIncrementalMerge
getCacheFile, getCopyMode, getIntermediateAlignments, getMatcherSupplier, getNumberOfThreads, getSerializedTreeFile, isAddingInformationToUnion, isLeftModelGreater, isRemoveUnusedJenaModels, isSavingIntermediateAlignments, match, match, needsTransitiveClosureForEvaluation, setAddingInformationToUnion, setCacheFile, setCopyMode, setGoldStandard, setGoldStandard, setMatcherSupplier, setNumberOfThreads, setNumberOfThreadsToCpuCores, setRemoveUnusedJenaModels, setSavingIntermediateAlignments, setSerializedTreeFile
-
Field Details
-
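LOGGER
private static final org.slf4j.Logger LOGGER
-
mindf
private double mindf
-
maxdf
private double maxdf
-
debug
private boolean debug
-
SENTENCE_SPLITTER
private static BreakIterator SENTENCE_SPLITTER
-
NORMALIZER
private static smile.nlp.normalizer.SimpleNormalizer NORMALIZER
-
URI_SEPARATOR
private static final Pattern URI_SEPARATOR
-
CAMEL_CASE_SPLIT
private static final Pattern CAMEL_CASE_SPLIT
-
NEWLINE
private static final String NEWLINE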
-
Constructor Details
-
MultiSourceDispatcherIncrementalMergeByClusterText
public MultiSourceDispatcherIncrementalMergeByClusterText(Object oneToOneMatcher, ClusterLinkage linkage, double mindf, double maxdf)
-
MultiSourceDispatcherIncrementalMergeByClusterText
public MultiSourceDispatcherIncrementalMergeByClusterText(Object oneToOneMatcher, ClusterLinkage linkage)
-
MultiSourceDispatcherIncrementalMergeByClusterText
public MultiSourceDispatcherIncrementalMergeByClusterText(Object oneToOneMatcher)
-
MultiSourceDispatcherIncrementalMergeByClusterText
public MultiSourceDispatcherIncrementalMergeByClusterText(Supplier<Object> matcherSupplier, ClusterLinkage linkage, double mindf, double maxdf)
-
MultiSourceDispatcherIncrementalMergeByClusterText
public MultiSourceDispatcherIncrementalMergeByClusterText(Supplier<Object> matcherSupplier, ClusterLinkage linkage)
-
MultiSourceDispatcherIncrementalMergeByClusterText
public MultiSourceDispatcherIncrementalMergeByClusterText(Supplier<Object> matcherSupplier)
-
Method Details
-
getClusterFeatures
public double[][] getClusterFeatures(List<Set<Object>> models, Object parameters)
- Specified by:
getClusterFeatures in class MultiSourceDispatcherIncrementalMergeByCluster
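getClusterFeatures returns a double[][], i.e. one numeric row per entry of models, and the presence of getBagOfWords suggests those rows are bag-of-words vectors. The sketch below illustrates that idea on plain strings instead of Jena models. It is an assumption-laden illustration, not the MELT code: the lower-casing tokenizer, raw term counts, the sorted shared vocabulary, and the names BagOfWordsFeatures, bagOfWords, and features are all hypothetical, and the camel-case splitting and normalization hinted at by the CAMEL_CASE_SPLIT and NORMALIZER fields are not reproduced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: turn texts into a bag-of-words feature matrix.
public class BagOfWordsFeatures {

    // Naive tokenizer (assumption): lower-case and split on anything that is not a letter or digit.
    static Map<String, Integer> bagOfWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // One row per document, one column per token of the shared, sorted vocabulary.
    static double[][] features(List<String> documents) {
        List<Map<String, Integer>> bags = new ArrayList<>();
        TreeSet<String> vocabulary = new TreeSet<>();
        for (String doc : documents) {
            Map<String, Integer> bag = bagOfWords(doc);
            bags.add(bag);
            vocabulary.addAll(bag.keySet());
        }
        List<String> columns = new ArrayList<>(vocabulary);
        double[][] matrix = new double[documents.size()][columns.size()];
        for (int row = 0; row < bags.size(); row++) {
            for (int col = 0; col < columns.size(); col++) {
                matrix[row][col] = bags.get(row).getOrDefault(columns.get(col), 0);
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("Person has name", "person name author");
        // vocabulary: [author, has, name, person]
        // prints [[0.0, 1.0, 1.0, 1.0], [1.0, 0.0, 1.0, 1.0]]
        System.out.println(Arrays.deepToString(features(docs)));
    }
}
```

Rows like these are what a clusterer would compare when deciding which ontologies to merge first.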
-
getBagOfWords
-
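splitFragment
private static String splitFragment(String text)
-
writeTextualRepresentationOfModel
private void writeTextualRepresentationOfModel(org.apache.jena.rdf.model.Model m, File f)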
-
isLiteralAString
private static boolean isLiteralAString(org.apache.jena.rdf.model.Literal lit)
-
getMindf
public double getMindf()
Returns the minimum relative document frequency a token needs to have to be included as a feature. Default is 0.0 (include all tokens).
- Returns:
- the relative minimum document frequency.
-
setMindf
public void setMindf(double mindf)
Sets the minimum relative document frequency a token needs to have to be included as a feature. Default is 0.0 (include all tokens).
- Parameters:
mindf
- the minimum document frequency (relative). This needs to be between 0.0 and 1.0.
-
getMaxdf
public double getMaxdf()
Returns the maximum relative document frequency a token may have to be included as a feature. Default is 1.0 (include all tokens).
- Returns:
- the relative maximum document frequency.
-
setMaxdf
public void setMaxdf(double maxdf)
Sets the maximum relative document frequency a token may have to be included as a feature. Default is 1.0 (include all tokens).
- Parameters:
maxdf
- the maximum document frequency (relative). This needs to be between 0.0 and 1.0.
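The mindf and maxdf settings restrict the vocabulary by relative document frequency: the share of documents in which a token occurs. The sketch below shows one way such a filter could work, assuming inclusive bounds (so the defaults 0.0 and 1.0 keep every token) and treating each model as one set of tokens. The names DocumentFrequencyFilter and filterVocabulary are hypothetical, and this is not MELT's actual code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: keep tokens whose relative document frequency lies in [mindf, maxdf].
public class DocumentFrequencyFilter {

    static Set<String> filterVocabulary(List<Set<String>> documents, double mindf, double maxdf) {
        // Both bounds must be relative frequencies, as the setters require.
        if (mindf < 0.0 || mindf > 1.0 || maxdf < 0.0 || maxdf > 1.0) {
            throw new IllegalArgumentException("mindf and maxdf must be between 0.0 and 1.0");
        }
        // Count in how many documents each token occurs (each document counted once).
        Map<String, Integer> documentCounts = new HashMap<>();
        for (Set<String> doc : documents) {
            for (String token : doc) {
                documentCounts.merge(token, 1, Integer::sum);
            }
        }
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> e : documentCounts.entrySet()) {
            double df = (double) e.getValue() / documents.size();
            if (df >= mindf && df <= maxdf) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("person", "name", "thing"),
            Set.of("person", "author", "thing"),
            Set.of("thing", "paper"));
        // "thing" occurs in 3/3 documents, "person" in 2/3, all others in 1/3.
        System.out.println(filterVocabulary(docs, 0.0, 1.0)); // defaults keep every token
        System.out.println(filterVocabulary(docs, 0.5, 0.9)); // prints [person]
    }
}
```

Raising mindf drops rare tokens (e.g. typos or labels unique to one ontology), while lowering maxdf drops tokens that appear nearly everywhere and therefore do not help distinguish ontologies.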
-
isDebug
public boolean isDebug()
Returns true if debug files are written. Default is false.
- Returns:
- true, if debug files are written
-
setDebug
public void setDebug(boolean debug)
If set to true, writes debug files with helpful information, e.g. a documentFrequency.json file that lists every word together with its document frequency. Default is false.
- Parameters:
debug
- if true, write debug files
-
checkDocumentFrequency
private void checkDocumentFrequency(double df)
-