java.lang.Object
de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.multisource.clustering.FamerClustering
All Implemented Interfaces:
Filter, IMatcherMultiSource<Object,Alignment,Object>

public class FamerClustering extends Object implements IMatcherMultiSource<Object,Alignment,Object>, Filter
A filter for multi-source matching. It filters the input alignment by analyzing the structure of the correspondences. For example, if many entities are fully connected, this indicates that all of those correspondences are correct. More information on the available algorithms, one of which is chosen in the constructor, can be found in "Scalable Matching and Clustering of Entities with FAMER". The source code can be found on GitLab.
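The "fully connected" intuition can be illustrated with a small, self-contained sketch. This is not one of the FAMER algorithms and does not use the MELT API. It only assumes that correspondences can be viewed as undirected pairs of URIs. The class name `FullyConnectedSketch`, its methods, and the example URIs are hypothetical. The sketch groups entities into connected components and checks whether every pair inside a component is linked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: NOT a FAMER algorithm and NOT part of the MELT API.
final class FullyConnectedSketch {

    /** Groups entities into connected components over undirected correspondences (URI pairs). */
    static List<Set<String>> components(List<String[]> correspondences) {
        Map<String, Set<String>> neighbours = new HashMap<>();
        for (String[] c : correspondences) {
            neighbours.computeIfAbsent(c[0], k -> new HashSet<>()).add(c[1]);
            neighbours.computeIfAbsent(c[1], k -> new HashSet<>()).add(c[0]);
        }
        List<Set<String>> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String start : neighbours.keySet()) {
            if (!visited.add(start)) {
                continue; // already assigned to an earlier component
            }
            Set<String> component = new HashSet<>();
            List<String> stack = new ArrayList<>();
            stack.add(start);
            while (!stack.isEmpty()) {
                String current = stack.remove(stack.size() - 1);
                component.add(current);
                for (String n : neighbours.get(current)) {
                    if (visited.add(n)) {
                        stack.add(n);
                    }
                }
            }
            result.add(component);
        }
        return result;
    }

    /** Returns true if every pair of entities in the component is linked by a correspondence. */
    static boolean isFullyConnected(Set<String> component, List<String[]> correspondences) {
        Set<String> edges = new HashSet<>();
        for (String[] c : correspondences) {
            if (component.contains(c[0]) && component.contains(c[1]) && !c[0].equals(c[1])) {
                // Store each undirected edge once, with its endpoints in a canonical order.
                edges.add(c[0].compareTo(c[1]) < 0 ? c[0] + " " + c[1] : c[1] + " " + c[0]);
            }
        }
        int n = component.size();
        return edges.size() == n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        // Hypothetical URIs: the first three form a triangle, the last two a chain.
        List<String[]> correspondences = Arrays.asList(
                new String[] {"a:1", "b:1"}, new String[] {"b:1", "c:1"}, new String[] {"a:1", "c:1"},
                new String[] {"a:2", "b:2"}, new String[] {"b:2", "c:2"});
        for (Set<String> component : components(correspondences)) {
            System.out.println(component + " fully connected: "
                    + isFullyConnected(component, correspondences));
        }
    }
}
```

In this sketch, a component that is not fully connected (such as the chain) is the kind of structure where a clustering-based filter might decide to add or remove correspondences.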
  • Field Details

    • LOGGER

      private static final org.slf4j.Logger LOGGER
    • datsetIdExtractor

      private DatasetIDExtractor datsetIdExtractor
    • clusteringAlgorithm

      private org.gradoop.famer.clustering.parallelClustering.AbstractParallelClustering clusteringAlgorithm
    • addCorrespondences

      private boolean addCorrespondences
    • removeCorrespondences

      private boolean removeCorrespondences
    • parallelism

      private int parallelism
    • NON_DIGIT

      private static Pattern NON_DIGIT
  • Constructor Details

    • FamerClustering

      public FamerClustering(DatasetIDExtractor datsetIdExtractor, org.gradoop.famer.clustering.parallelClustering.AbstractParallelClustering clusteringAlgorithm, boolean addCorrespondences, boolean removeCorrespondences)
    • FamerClustering

      public FamerClustering(DatasetIDExtractor datsetIdExtractor, org.gradoop.famer.clustering.parallelClustering.AbstractParallelClustering clusteringAlgorithm)
    • FamerClustering

      public FamerClustering(DatasetIDExtractor datsetIdExtractor)
  • Method Details

    • match

      public Alignment match(List<Object> models, Alignment inputAlignment, Object parameters) throws Exception
      Description copied from interface: IMatcherMultiSource
      Matches multiple ontologies / knowledge graphs together.
      Specified by:
      match in interface IMatcherMultiSource<Object,Alignment,Object>
      Parameters:
      models - a list of ontologies / knowledge graphs in the desired format.
      inputAlignment - this object represents the input alignment.
      parameters - object representing additional parameters. Only add to this object and do not create a new object (e.g. parameters = new ...()), because otherwise the parameters are lost (Java is call by value). Sensible classes are Properties, Map<String, Object>, or any similar data structure. Some predefined keys (strings) can be found in ParameterConfigKeys.
      Returns:
      the resulting alignment of the matching process.
      Throws:
      Exception - in case of any errors
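The note on the parameters object can be shown with a minimal, self-contained sketch. It does not call the MELT API. The class `ParameterPassingSketch`, its methods, and the key `exampleKey` are hypothetical. It contrasts adding to the passed-in map with reassigning the local variable, which the caller never sees.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: demonstrates Java's call-by-value semantics for object references.
final class ParameterPassingSketch {

    /** Correct: mutates the map the caller passed in, so the caller sees the new entry. */
    static void addToParameters(Map<String, Object> parameters) {
        parameters.put("exampleKey", "exampleValue"); // hypothetical key
    }

    /** Incorrect: rebinds only the local variable; the caller's map stays unchanged. */
    static void replaceParameters(Map<String, Object> parameters) {
        parameters = new HashMap<>();
        parameters.put("exampleKey", "exampleValue");
    }

    public static void main(String[] args) {
        Map<String, Object> first = new HashMap<>();
        addToParameters(first);
        System.out.println(first.containsKey("exampleKey")); // prints "true"

        Map<String, Object> second = new HashMap<>();
        replaceParameters(second);
        System.out.println(second.containsKey("exampleKey")); // prints "false"
    }
}
```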
    • processAlignment

      public Alignment processAlignment(Alignment inputAlignment)
    • getParallelism

      public int getParallelism()
    • setParallelism

      public void setParallelism(int parallelism)
    • getClusters

      public static Map<String,Set<Long>> getClusters(Alignment alignment, org.gradoop.famer.clustering.parallelClustering.AbstractParallelClustering clusteringAlgorithm, DatasetIDExtractor datsetIdExtractor)
      Computes a map from URIs to the corresponding cluster IDs.
      Parameters:
      alignment - alignment
      clusteringAlgorithm - the cluster algorithm to use. The ClusteringOutputType doesn't matter but for best performance choose ClusteringOutputType.GRAPH.
      datsetIdExtractor - the dataset ID extractor to use. It receives a URI and returns the corresponding data source ID.
      Returns:
      a map from URIs to the corresponding cluster IDs
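The returned Map<String,Set<Long>> is keyed by URI. If a view keyed by cluster ID is more convenient, the map can be inverted. The following sketch is self-contained and does not call getClusters. The class `ClusterInversionSketch`, its method, and the sample URIs are hypothetical, and the sample map only mimics the shape of the return type.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: post-processes a map shaped like the result of getClusters.
final class ClusterInversionSketch {

    /** Turns "URI -> cluster IDs" into "cluster ID -> URIs" (sorted for stable output). */
    static Map<Long, Set<String>> invert(Map<String, Set<Long>> uriToClusterIds) {
        Map<Long, Set<String>> clusterToUris = new TreeMap<>();
        for (Map.Entry<String, Set<Long>> entry : uriToClusterIds.entrySet()) {
            for (Long clusterId : entry.getValue()) {
                clusterToUris.computeIfAbsent(clusterId, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return clusterToUris;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; real values would come from getClusters.
        Map<String, Set<Long>> uriToClusterIds = new HashMap<>();
        uriToClusterIds.put("http://a.example/x", new HashSet<>(Arrays.asList(1L)));
        uriToClusterIds.put("http://b.example/x", new HashSet<>(Arrays.asList(1L, 2L)));
        uriToClusterIds.put("http://c.example/y", new HashSet<>(Arrays.asList(2L)));
        System.out.println(invert(uriToClusterIds));
    }
}
```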
    • getClusters

      public static Map<String,Set<Long>> getClusters(Alignment alignment, org.gradoop.famer.clustering.parallelClustering.AbstractParallelClustering clusteringAlgorithm, DatasetIDExtractor datsetIdExtractor, int parallelism)
      Computes a map from URIs to the corresponding cluster IDs.
      Parameters:
      alignment - alignment
      clusteringAlgorithm - the cluster algorithm to use. The ClusteringOutputType doesn't matter but for best performance choose ClusteringOutputType.GRAPH.
      datsetIdExtractor - the dataset ID extractor to use. It receives a URI and returns the corresponding data source ID.
      parallelism - the parallelism for the local Flink environment (set to -1 for the default, which is the number of processors).
      Returns:
      a map from URIs to the corresponding cluster IDs
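The convention for the parallelism argument (-1 means "use the number of processors") can be sketched as follows. This is an assumption about how such a value might be resolved, not the code used by this class or by Flink. The class `ParallelismSketch` and its method are hypothetical.

```java
// Illustrative only: one way to resolve a "-1 means default" parallelism value.
final class ParallelismSketch {

    /** Returns the number of available processors for -1, otherwise the given value. */
    static int resolve(int parallelism) {
        return parallelism == -1 ? Runtime.getRuntime().availableProcessors() : parallelism;
    }

    public static void main(String[] args) {
        System.out.println(resolve(4));  // prints "4"
        System.out.println(resolve(-1)); // machine dependent
    }
}
```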
    • instanceOfOne

      public static boolean instanceOfOne(Object o, Class<?>... classes)
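The page gives no description for instanceOfOne. Judging only by its name and signature, it plausibly checks whether the object is an instance of at least one of the given classes. The sketch below is a hypothetical reimplementation under that assumption, in a separate class `InstanceOfOneSketch`, and may differ from the actual behavior.

```java
// Illustrative only: an assumed reading of instanceOfOne, not the MELT implementation.
final class InstanceOfOneSketch {

    /** Returns true if o is an instance of at least one of the given classes. */
    static boolean instanceOfOne(Object o, Class<?>... classes) {
        for (Class<?> candidate : classes) {
            if (candidate.isInstance(o)) { // Class.isInstance returns false for null
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(instanceOfOne("text", Integer.class, CharSequence.class)); // prints "true"
        System.out.println(instanceOfOne(42, String.class));                          // prints "false"
    }
}
```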
    • getClusteringFromLogicalGraphWithString

      private static Map<String,Set<Long>> getClusteringFromLogicalGraphWithString(org.gradoop.flink.model.impl.epgm.LogicalGraph clusteredGraph) throws Exception
      Throws:
      Exception
    • getClusteringFromLogicalGraphClip

      private static Map<String,Set<Long>> getClusteringFromLogicalGraphClip(org.gradoop.flink.model.impl.epgm.LogicalGraph clusteredGraph) throws Exception
      Throws:
      Exception
    • getClusteringFromLogicalGraphWithLong

      private static Map<String,Set<Long>> getClusteringFromLogicalGraphWithLong(org.gradoop.flink.model.impl.epgm.LogicalGraph clusteredGraph) throws Exception
      Throws:
      Exception
    • getLogicalGraphFromAlignment

      private static LogicalGraphAndSourceIds getLogicalGraphFromAlignment(Alignment a, DatasetIDExtractor datsetIdExtractor, int parallelism)