de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.multisource.dispatchers.MultiSourceDispatcherIncrementalMergeByClusterText

All Implemented Interfaces: IMatcherMultiSourceCaller, MultiSourceDispatcher

public class MultiSourceDispatcherIncrementalMergeByClusterText extends MultiSourceDispatcherIncrementalMergeByCluster

Matches multiple ontologies / knowledge graphs with an incremental merge approach: two ontologies are merged, the resulting union is then possibly merged with another ontology, and so on. The order in which they are merged is defined by subclasses.
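The incremental merge described above can be illustrated with a small, self-contained sketch. It does not use the MELT API: the class `IncrementalMergeSketch`, the method `merge`, and the field `matchesPerStep` are hypothetical names, each "ontology" is reduced to a set of labels, and "matching" is assumed to be plain label equality. A real one-to-one matcher would do far more; only the merge order and the growing union are shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class IncrementalMergeSketch {

    /** Number of labels shared with the running union at each merge step. */
    public static final List<Integer> matchesPerStep = new ArrayList<>();

    /**
     * Merges the sources in the given order (hypothetical helper).
     * The first source seeds the union; every further source is first
     * "matched" against the union (here: exact label equality) and then added.
     */
    public static Set<String> merge(List<Set<String>> sources, int[] order) {
        matchesPerStep.clear();
        Set<String> union = new LinkedHashSet<>(sources.get(order[0]));
        for (int i = 1; i < order.length; i++) {
            Set<String> next = sources.get(order[i]);
            // Stand-in for the one-to-one matcher: labels present in both sides.
            Set<String> matched = new LinkedHashSet<>(next);
            matched.retainAll(union);
            matchesPerStep.add(matched.size());
            // Merge step: the union grows and is reused in the next iteration.
            union.addAll(next);
        }
        return union;
    }

    public static void main(String[] args) {
        List<Set<String>> sources = Arrays.asList(
                new LinkedHashSet<String>(Arrays.asList("Person", "Author")),
                new LinkedHashSet<String>(Arrays.asList("Author", "Paper")),
                new LinkedHashSet<String>(Arrays.asList("Paper", "Review")));
        Set<String> union = merge(sources, new int[] {0, 1, 2});
        System.out.println(union);
        System.out.println(matchesPerStep);
    }
}
```

In this sketch the order is passed in explicitly; in the class documented here it would presumably come from clustering the sources, which is why the order matters for what each matching step gets to see.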

Field Summary

Fields

Modifier and Type

Field

Description

private static final Pattern

CAMEL_CASE_SPLIT

private boolean

debug

private static final org.slf4j.Logger

LOGGER

private double

maxdf

private double

mindf

private static final String

NEWLINE

private static smile.nlp.normalizer.SimpleNormalizer

NORMALIZER

private static BreakIterator

SENTENCE_SPLITTER

private static final Pattern

URI_SEPARATOR

Constructor Summary

Constructors

Constructor

Description

MultiSourceDispatcherIncrementalMergeByClusterText(Object oneToOneMatcher)

MultiSourceDispatcherIncrementalMergeByClusterText(Object oneToOneMatcher, ClusterLinkage linkage)

MultiSourceDispatcherIncrementalMergeByClusterText(Object oneToOneMatcher, ClusterLinkage linkage, double mindf, double maxdf)

MultiSourceDispatcherIncrementalMergeByClusterText(Supplier<Object> matcherSupplier)

MultiSourceDispatcherIncrementalMergeByClusterText(Supplier<Object> matcherSupplier, ClusterLinkage linkage)

MultiSourceDispatcherIncrementalMergeByClusterText(Supplier<Object> matcherSupplier, ClusterLinkage linkage, double mindf, double maxdf)

Method Summary

Modifier and Type

Method

Description

private void

checkDocumentFrequency(double df)

Counter<String>

getBagOfWords(org.apache.jena.rdf.model.Model m)

double[][]

getClusterFeatures(List<Set<Object>> models, Object parameters)

double

getMaxdf()

Returns the maximum relative document frequency a token may have to be included as a feature.

double

getMindf()

Returns the minimum relative document frequency a token must have to be included as a feature.

boolean

isDebug()

Returns true if debug files are written.

private static boolean

isLiteralAString(org.apache.jena.rdf.model.Literal lit)

void

setDebug(boolean debug)

If set to true, writes debug files containing helpful information.

void

setMaxdf(double maxdf)

Sets the maximum relative document frequency a token may have to be included as a feature.

void

setMindf(double mindf)

Sets the minimum relative document frequency a token must have to be included as a feature.

private static String

splitFragment(String text)

private void

writeTextualRepresentationOfModel(org.apache.jena.rdf.model.Model m, File f)

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.multisource.dispatchers.MultiSourceDispatcherIncrementalMergeByCluster
getClusterer, getDistance, getLinkage, getMergeTree, setClusterer, setDistance, setLinkage

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.multisource.dispatchers.MultiSourceDispatcherIncrementalMerge
getCacheFile, getCopyMode, getIntermediateAlignments, getMatcherSupplier, getNumberOfThreads, getSerializedTreeFile, isAddingInformationToUnion, isLeftModelGreater, isRemoveUnusedJenaModels, isSavingIntermediateAlignments, match, match, needsTransitiveClosureForEvaluation, setAddingInformationToUnion, setCacheFile, setCopyMode, setGoldStandard, setGoldStandard, setMatcherSupplier, setNumberOfThreads, setNumberOfThreadsToCpuCores, setRemoveUnusedJenaModels, setSavingIntermediateAlignments, setSerializedTreeFile

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- LOGGER
  
  private static final org.slf4j.Logger LOGGER
- mindf
  
  private double mindf
- maxdf
  
  private double maxdf
- debug
  
  private boolean debug
- SENTENCE_SPLITTER
  
  private static BreakIterator SENTENCE_SPLITTER
- NORMALIZER
  
  private static smile.nlp.normalizer.SimpleNormalizer NORMALIZER
- URI_SEPARATOR
  
  private static final Pattern URI_SEPARATOR
- CAMEL_CASE_SPLIT
  
  private static final Pattern CAMEL_CASE_SPLIT
- NEWLINE
  
  private static final String NEWLINE

Constructor Details
- MultiSourceDispatcherIncrementalMergeByClusterText
  
  public MultiSourceDispatcherIncrementalMergeByClusterText(Object oneToOneMatcher, ClusterLinkage linkage, double mindf, double maxdf)
- MultiSourceDispatcherIncrementalMergeByClusterText
  
  public MultiSourceDispatcherIncrementalMergeByClusterText(Object oneToOneMatcher, ClusterLinkage linkage)
- MultiSourceDispatcherIncrementalMergeByClusterText
  
  public MultiSourceDispatcherIncrementalMergeByClusterText(Object oneToOneMatcher)
- MultiSourceDispatcherIncrementalMergeByClusterText
  
  public MultiSourceDispatcherIncrementalMergeByClusterText(Supplier<Object> matcherSupplier, ClusterLinkage linkage, double mindf, double maxdf)
- MultiSourceDispatcherIncrementalMergeByClusterText
  
  public MultiSourceDispatcherIncrementalMergeByClusterText(Supplier<Object> matcherSupplier, ClusterLinkage linkage)
- MultiSourceDispatcherIncrementalMergeByClusterText
  
  public MultiSourceDispatcherIncrementalMergeByClusterText(Supplier<Object> matcherSupplier)

Method Details
- getClusterFeatures
  
  public double[][] getClusterFeatures(List<Set<Object>> models, Object parameters)
  
  Specified by:
  
  getClusterFeatures in class MultiSourceDispatcherIncrementalMergeByCluster
- getBagOfWords
  
  public Counter<String> getBagOfWords(org.apache.jena.rdf.model.Model m)
- splitFragment
  
  private static String splitFragment(String text)
- writeTextualRepresentationOfModel
  
  private void writeTextualRepresentationOfModel(org.apache.jena.rdf.model.Model m, File f)
- isLiteralAString
  
  private static boolean isLiteralAString(org.apache.jena.rdf.model.Literal lit)
- getMindf
  
  public double getMindf()
  
  Returns the minimum relative document frequency a token must have to be included as a feature. The default is 0.0, which includes all tokens.
  
  Returns:
  
  the relative minimum document frequency.
- setMindf
  
  public void setMindf(double mindf)
  
  Sets the minimum relative document frequency a token must have to be included as a feature. The default is 0.0, which includes all tokens.
  
  Parameters:
  
  mindf - the minimum relative document frequency; must be between 0.0 and 1.0.
- getMaxdf
  
  public double getMaxdf()
  
  Returns the maximum relative document frequency a token may have to be included as a feature. The default is 1.0, which includes all tokens.
  
  Returns:
  
  the relative maximum document frequency.
- setMaxdf
  
  public void setMaxdf(double maxdf)
  
  Sets the maximum relative document frequency a token may have to be included as a feature. The default is 1.0, which includes all tokens.
  
  Parameters:
  
  maxdf - the maximum relative document frequency; must be between 0.0 and 1.0.
- isDebug
  
  public boolean isDebug()
  
  Returns true if debug files are written. The default is false.
  
  Returns:
  
  true if debug files are written
- setDebug
  
  public void setDebug(boolean debug)
  
  If set to true, writes debug files containing helpful information, e.g. a documentFrequency.json file that lists every word together with its document frequency. The default is false.
  
  Parameters:
  
  debug - if true, write debug files
- checkDocumentFrequency
  
  private void checkDocumentFrequency(double df)
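
The `mindf` and `maxdf` settings documented above bound the relative document frequency of the tokens used as features. The sketch below shows one way such a filter could work; it is not the MELT implementation. The class `DocumentFrequencySketch` and its methods are hypothetical names, each source is assumed to be already tokenized, both bounds are assumed to be inclusive, and the feature values are assumed to be raw token counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DocumentFrequencySketch {

    /** Share of documents that contain each token (a value in (0.0, 1.0]). */
    public static Map<String, Double> relativeDocumentFrequency(List<List<String>> documents) {
        Map<String, Double> df = new TreeMap<>();
        for (List<String> doc : documents) {
            // A token counts once per document, however often it occurs there.
            for (String token : new HashSet<>(doc)) {
                df.merge(token, 1.0, Double::sum);
            }
        }
        df.replaceAll((token, count) -> count / documents.size());
        return df;
    }

    /** Keeps tokens with mindf <= df <= maxdf (inclusive bounds are an assumption). */
    public static List<String> selectVocabulary(Map<String, Double> df, double mindf, double maxdf) {
        if (mindf < 0.0 || mindf > 1.0 || maxdf < 0.0 || maxdf > 1.0) {
            throw new IllegalArgumentException("mindf and maxdf must be between 0.0 and 1.0");
        }
        List<String> vocabulary = new ArrayList<>();
        for (Map.Entry<String, Double> entry : df.entrySet()) {
            if (entry.getValue() >= mindf && entry.getValue() <= maxdf) {
                vocabulary.add(entry.getKey());
            }
        }
        return vocabulary;
    }

    /** One row per document, one column per vocabulary token, values are raw counts. */
    public static double[][] countFeatures(List<List<String>> documents, List<String> vocabulary) {
        double[][] features = new double[documents.size()][vocabulary.size()];
        for (int d = 0; d < documents.size(); d++) {
            for (String token : documents.get(d)) {
                int column = vocabulary.indexOf(token);
                if (column >= 0) {
                    features[d][column] += 1.0;
                }
            }
        }
        return features;
    }

    public static void main(String[] args) {
        List<List<String>> documents = Arrays.asList(
                Arrays.asList("person", "name", "paper"),
                Arrays.asList("person", "paper", "review"),
                Arrays.asList("person", "gene"));
        Map<String, Double> df = relativeDocumentFrequency(documents);
        List<String> vocabulary = selectVocabulary(df, 0.5, 0.9);
        System.out.println(vocabulary);
        System.out.println(Arrays.deepToString(countFeatures(documents, vocabulary)));
    }
}
```

With the defaults of 0.0 and 1.0 every token passes the filter. Raising `mindf` drops tokens that occur in only a few sources, and lowering `maxdf` drops tokens that occur in nearly all of them and therefore do little to separate clusters.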
