Class MultiSourceDispatcherIncrementalMergeByClusterText

All Implemented Interfaces:
IMatcherMultiSourceCaller, MultiSourceDispatcher

public class MultiSourceDispatcherIncrementalMergeByClusterText extends MultiSourceDispatcherIncrementalMergeByCluster
Matches multiple ontologies / knowledge graphs with an incremental merge approach: two ontologies are merged, the resulting union is then possibly merged with another ontology, and so on. The order in which they are merged is defined by subclasses.
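The incremental merge described above can be pictured with a minimal, library-independent sketch. It is not the implementation of this class. The class name IncrementalMergeSketch, the method mergeAll, and the encoding of the merge order as index pairs are all hypothetical, and a plain set union stands in for the real step of matching two inputs and merging them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class IncrementalMergeSketch {

    /**
     * Merges the sources step by step (hypothetical encoding, an assumption of this sketch):
     * mergeOrder[i] = {a, b} merges the clusters with ids a and b. The result of step i
     * gets the id sources.size() + i, so later steps can refer to earlier unions.
     */
    static Set<String> mergeAll(List<Set<String>> sources, int[][] mergeOrder) {
        List<Set<String>> clusters = new ArrayList<>(sources);
        for (int[] step : mergeOrder) {
            // Stand-in for "match the two inputs with a one-to-one matcher and build their union".
            Set<String> union = new TreeSet<>(clusters.get(step[0]));
            union.addAll(clusters.get(step[1]));
            clusters.add(union);
        }
        // The last cluster created is the union of everything merged so far.
        return clusters.get(clusters.size() - 1);
    }

    public static void main(String[] args) {
        List<Set<String>> sources = List.of(
            Set.of("Person", "City"),
            Set.of("Protein", "Gene"),
            Set.of("Person", "Country"));
        // First merge source 0 with source 2, then merge that union (id 3) with source 1.
        int[][] order = { {0, 2}, {3, 1} };
        System.out.println(mergeAll(sources, order));
    }
}
```

In this sketch the merge order is passed in explicitly. In the class documented here, the order comes from clustering the sources, so textually similar sources would presumably be merged first.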
  • Field Details

    • LOGGER

      private static final org.slf4j.Logger LOGGER
    • mindf

      private double mindf
    • maxdf

      private double maxdf
    • debug

      private boolean debug
    • SENTENCE_SPLITTER

      private static BreakIterator SENTENCE_SPLITTER
    • NORMALIZER

      private static smile.nlp.normalizer.SimpleNormalizer NORMALIZER
    • URI_SEPARATOR

      private static final Pattern URI_SEPARATOR
    • CAMEL_CASE_SPLIT

      private static final Pattern CAMEL_CASE_SPLIT
    • NEWLINE

      private static final String NEWLINE
  • Constructor Details

    • MultiSourceDispatcherIncrementalMergeByClusterText

      public MultiSourceDispatcherIncrementalMergeByClusterText(Object oneToOneMatcher, ClusterLinkage linkage, double mindf, double maxdf)
    • MultiSourceDispatcherIncrementalMergeByClusterText

      public MultiSourceDispatcherIncrementalMergeByClusterText(Object oneToOneMatcher, ClusterLinkage linkage)
    • MultiSourceDispatcherIncrementalMergeByClusterText

      public MultiSourceDispatcherIncrementalMergeByClusterText(Object oneToOneMatcher)
    • MultiSourceDispatcherIncrementalMergeByClusterText

      public MultiSourceDispatcherIncrementalMergeByClusterText(Supplier<Object> matcherSupplier, ClusterLinkage linkage, double mindf, double maxdf)
    • MultiSourceDispatcherIncrementalMergeByClusterText

      public MultiSourceDispatcherIncrementalMergeByClusterText(Supplier<Object> matcherSupplier, ClusterLinkage linkage)
    • MultiSourceDispatcherIncrementalMergeByClusterText

      public MultiSourceDispatcherIncrementalMergeByClusterText(Supplier<Object> matcherSupplier)
  • Method Details

    • getClusterFeatures

      public double[][] getClusterFeatures(List<Set<Object>> models, Object parameters)
      Specified by:
      getClusterFeatures in class MultiSourceDispatcherIncrementalMergeByCluster
    • getBagOfWords

      public Counter<String> getBagOfWords(org.apache.jena.rdf.model.Model m)
    • splitFragment

      private static String splitFragment(String text)
    • writeTextualRepresentationOfModel

      private void writeTextualRepresentationOfModel(org.apache.jena.rdf.model.Model m, File f)
    • isLiteralAString

      private static boolean isLiteralAString(org.apache.jena.rdf.model.Literal lit)
    • getMindf

      public double getMindf()
      Returns the minimum relative document frequency a token must have to be included as a feature. Default is 0.0 (to include all tokens).
      Returns:
      the relative minimum document frequency.
    • setMindf

      public void setMindf(double mindf)
      Sets the minimum relative document frequency a token must have to be included as a feature. Default is 0.0 (to include all tokens).
      Parameters:
      mindf - the minimum document frequency (relative). This needs to be between 0.0 and 1.0.
    • getMaxdf

      public double getMaxdf()
      Returns the maximum relative document frequency a token may have to be included as a feature. Default is 1.0 (to include all tokens).
      Returns:
      the relative maximum document frequency.
    • setMaxdf

      public void setMaxdf(double maxdf)
      Sets the maximum relative document frequency a token may have to be included as a feature. Default is 1.0 (to include all tokens).
      Parameters:
      maxdf - the maximum document frequency (relative). This needs to be between 0.0 and 1.0.
    • isDebug

      public boolean isDebug()
      Returns true if debug files are written. Default is false.
      Returns:
      true if debug files are written
    • setDebug

      public void setDebug(boolean debug)
      If set to true, writes files containing helpful debugging information, e.g. a documentFrequency.json file that lists all words together with their document frequency. Default is false.
      Parameters:
      debug - if true, write debug files
    • checkDocumentFrequency

      private void checkDocumentFrequency(double df)
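
The effect of the mindf and maxdf settings documented above can be illustrated with a small standalone sketch. It does not reproduce this class's internals. The names DocumentFrequencySketch, selectFeatures, and checkRelativeFrequency are hypothetical, each "document" is simply a list of tokens, and treating both bounds as inclusive is an assumption. The range check only mirrors the documented requirement that both values lie between 0.0 and 1.0.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class DocumentFrequencySketch {

    /** Rejects values outside [0.0, 1.0], the documented range for mindf and maxdf. */
    static void checkRelativeFrequency(double df) {
        if (Double.isNaN(df) || df < 0.0 || df > 1.0) {
            throw new IllegalArgumentException(
                "Relative document frequency must be between 0.0 and 1.0 but was " + df);
        }
    }

    /**
     * Returns the tokens whose relative document frequency lies in [mindf, maxdf].
     * Inclusive bounds are an assumption of this sketch.
     */
    static Set<String> selectFeatures(List<List<String>> documents, double mindf, double maxdf) {
        checkRelativeFrequency(mindf);
        checkRelativeFrequency(maxdf);
        Map<String, Integer> documentCount = new HashMap<>();
        for (List<String> doc : documents) {
            // Count each token at most once per document.
            for (String token : new HashSet<>(doc)) {
                documentCount.merge(token, 1, Integer::sum);
            }
        }
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Integer> e : documentCount.entrySet()) {
            // Relative document frequency: share of documents that contain the token.
            double relative = (double) e.getValue() / documents.size();
            if (relative >= mindf && relative <= maxdf) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("person", "name", "birth"),
            List.of("person", "name", "protein"),
            List.of("person", "city", "country"),
            List.of("person", "city", "river"));
        // "person" occurs in every document (1.0) and is cut off by maxdf = 0.9;
        // tokens occurring in only one document (0.25) are cut off by mindf = 0.5.
        System.out.println(selectFeatures(docs, 0.5, 0.9)); // prints [city, name]
    }
}
```

With the defaults (mindf = 0.0, maxdf = 1.0) every token passes the filter, which matches the documented "include all tokens" behaviour.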