de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.multisource.clustering.FamerClustering

All Implemented Interfaces: Filter, IMatcherMultiSource<Object,Alignment,Object>

public class FamerClustering extends Object implements IMatcherMultiSource<Object,Alignment,Object>, Filter

A filter for multi-source matching. It filters the input alignment by analyzing the structure of the correspondences. For example, if many entities are fully connected, this indicates that all of those correspondences are correct. More information on the available algorithms, one of which is chosen in the constructor, can be found in "Scalable Matching and Clustering of Entities with FAMER". The source code can be found on GitLab.
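
The structural signal mentioned above (a group of entities in which every pair is directly linked) can be illustrated with a small, library-free sketch. This is not one of the FAMER clustering algorithms and does not use the MELT Alignment class; correspondences are modelled as plain pairs of URI strings, and the names CliqueCheckSketch, adjacency and isFullyConnected are hypothetical.

```java
import java.util.*;

// Hypothetical helper, not part of MELT or FAMER.
final class CliqueCheckSketch {

    /** Builds an undirected adjacency map; each String[] holds the two URIs of one correspondence. */
    static Map<String, Set<String>> adjacency(List<String[]> correspondences) {
        Map<String, Set<String>> adj = new HashMap<>();
        for (String[] c : correspondences) {
            adj.computeIfAbsent(c[0], k -> new HashSet<>()).add(c[1]);
            adj.computeIfAbsent(c[1], k -> new HashSet<>()).add(c[0]);
        }
        return adj;
    }

    /** Returns true if every pair of distinct entities in the group is directly linked. */
    static boolean isFullyConnected(Set<String> group, Map<String, Set<String>> adj) {
        for (String a : group) {
            for (String b : group) {
                if (!a.equals(b) && !adj.getOrDefault(a, Collections.emptySet()).contains(b)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

In this sketch, a group that passes isFullyConnected corresponds to the "fully connected" case described above; a group that is only connected through a chain of links would fail the check.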

Field Summary

Fields

Modifier and Type

Field

Description

private boolean

addCorrespondences

private org.gradoop.famer.clustering.parallelClustering.AbstractParallelClustering

clusteringAlgorithm

private DatasetIDExtractor

datsetIdExtractor

private static final org.slf4j.Logger

LOGGER

private static Pattern

NON_DIGIT

private int

parallelism

private boolean

removeCorrespondences

Constructor Summary

Constructors

Constructor

Description

FamerClustering(DatasetIDExtractor datsetIdExtractor)

FamerClustering(DatasetIDExtractor datsetIdExtractor, org.gradoop.famer.clustering.parallelClustering.AbstractParallelClustering clusteringAlgorithm)

FamerClustering(DatasetIDExtractor datsetIdExtractor, org.gradoop.famer.clustering.parallelClustering.AbstractParallelClustering clusteringAlgorithm, boolean addCorrespondences, boolean removeCorrespondences)

Method Summary

Modifier and Type

Method

Description

private static Map<String,Set<Long>>

getClusteringFromLogicalGraphClip(org.gradoop.flink.model.impl.epgm.LogicalGraph clusteredGraph)

private static Map<String,Set<Long>>

getClusteringFromLogicalGraphWithLong(org.gradoop.flink.model.impl.epgm.LogicalGraph clusteredGraph)

private static Map<String,Set<Long>>

getClusteringFromLogicalGraphWithString(org.gradoop.flink.model.impl.epgm.LogicalGraph clusteredGraph)

static Map<String,Set<Long>>

getClusters(Alignment alignment, org.gradoop.famer.clustering.parallelClustering.AbstractParallelClustering clusteringAlgorithm, DatasetIDExtractor datsetIdExtractor)

Computes a map from URIs to their corresponding cluster IDs.

static Map<String,Set<Long>>

getClusters(Alignment alignment, org.gradoop.famer.clustering.parallelClustering.AbstractParallelClustering clusteringAlgorithm, DatasetIDExtractor datsetIdExtractor, int parallelism)

Computes a map from URIs to their corresponding cluster IDs.

private static LogicalGraphAndSourceIds

getLogicalGraphFromAlignment(Alignment a, DatasetIDExtractor datsetIdExtractor, int parallelism)

int

getParallelism()

static boolean

instanceOfOne(Object o, Class<?>... classes)

Alignment

match(List<Object> models, Alignment inputAlignment, Object parameters)

Matches multiple ontologies / knowledge graphs together.

Alignment

processAlignment(Alignment inputAlignment)

void

setParallelism(int parallelism)

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface de.uni_mannheim.informatik.dws.melt.matching_base.multisource.IMatcherMultiSource
needsTransitiveClosureForEvaluation

Field Details
- LOGGER
  
  private static final org.slf4j.Logger LOGGER
- datsetIdExtractor
  
  private DatasetIDExtractor datsetIdExtractor
- clusteringAlgorithm
  
  private org.gradoop.famer.clustering.parallelClustering.AbstractParallelClustering clusteringAlgorithm
- addCorrespondences
  
  private boolean addCorrespondences
- removeCorrespondences
  
  private boolean removeCorrespondences
- parallelism
  
  private int parallelism
- NON_DIGIT
  
  private static Pattern NON_DIGIT

Constructor Details
- FamerClustering
  
  public FamerClustering(DatasetIDExtractor datsetIdExtractor, org.gradoop.famer.clustering.parallelClustering.AbstractParallelClustering clusteringAlgorithm, boolean addCorrespondences, boolean removeCorrespondences)
- FamerClustering
  
  public FamerClustering(DatasetIDExtractor datsetIdExtractor, org.gradoop.famer.clustering.parallelClustering.AbstractParallelClustering clusteringAlgorithm)
- FamerClustering
  
  public FamerClustering(DatasetIDExtractor datsetIdExtractor)

Method Details
- match
  
  public Alignment match(List<Object> models, Alignment inputAlignment, Object parameters) throws Exception
  
  Description copied from interface: IMatcherMultiSource
  
  Matches multiple ontologies / knowledge graphs together.
  
  Specified by:
  
  match in interface IMatcherMultiSource<Object,Alignment,Object>
  
  Parameters:
  
  models - a list of ontologies / knowledge graphs in the desired format.
  
  inputAlignment - this object represents the input alignment.
  
  parameters - object representing additional parameters. Only add to this object; do not create a new object (e.g. parameters = new ...()), because otherwise the parameters are lost (Java is call by value). Sensible classes are Properties, Map<String, Object>, or any similar data structure. Some already specified keys (strings) can be found in ParameterConfigKeys.
  
  Returns:
  
  the resulting alignment of the matching process.
  
  Throws:
  
  Exception - in case of any errors
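
The warning about the parameters object follows from how Java passes references. The following is a minimal sketch in plain Java, independent of MELT; the class ParametersSketch, its methods and the key "exampleKey" are hypothetical and only illustrate why adding to the object works while reassigning the variable does not.

```java
import java.util.*;

// Hypothetical illustration, not part of MELT.
final class ParametersSketch {

    /** Mutates the map the caller passed in: the new key is visible to the caller afterwards. */
    static void addInPlace(Map<String, Object> parameters) {
        parameters.put("exampleKey", "exampleValue"); // hypothetical key
    }

    /** Rebinds only the local variable: the caller's map stays unchanged. */
    static void reassign(Map<String, Object> parameters) {
        parameters = new HashMap<>();
        parameters.put("exampleKey", "exampleValue");
    }
}
```

After reassign, the caller's map is still empty; after addInPlace, it contains the new entry.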
- processAlignment
  
  public Alignment processAlignment(Alignment inputAlignment)
- getParallelism
  
  public int getParallelism()
- setParallelism
  
  public void setParallelism(int parallelism)
- getClusters
  
  public static Map<String,Set<Long>> getClusters(Alignment alignment, org.gradoop.famer.clustering.parallelClustering.AbstractParallelClustering clusteringAlgorithm, DatasetIDExtractor datsetIdExtractor)
  
  Computes a map from URIs to their corresponding cluster IDs.
  
  Parameters:
  
  alignment - alignment
  
  clusteringAlgorithm - the clustering algorithm to use. The ClusteringOutputType doesn't matter, but for best performance choose ClusteringOutputType.GRAPH.
  
  datsetIdExtractor - the dataset ID extractor to use. It receives a URI and returns the corresponding data source ID.
  
  Returns:
  
  a map from URIs to their corresponding cluster IDs
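
The returned map is keyed by URI, which is convenient for lookups but less so for listing the members of each cluster. A minimal sketch of inverting such a map is given below. It works on any Map<String,Set<Long>> and does not call getClusters itself; the class ClusterMapSketch and the method invert are hypothetical. The value type Set<Long> suggests that a URI may carry more than one cluster ID, which the sketch allows for.

```java
import java.util.*;

// Hypothetical helper, not part of MELT.
final class ClusterMapSketch {

    /** Inverts a URI -> cluster IDs map into a cluster ID -> URIs map (both sorted for stable output). */
    static Map<Long, Set<String>> invert(Map<String, Set<Long>> uriToClusters) {
        Map<Long, Set<String>> clusterToUris = new TreeMap<>();
        for (Map.Entry<String, Set<Long>> entry : uriToClusters.entrySet()) {
            for (Long clusterId : entry.getValue()) {
                clusterToUris.computeIfAbsent(clusterId, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return clusterToUris;
    }
}
```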
- getClusters
  
  public static Map<String,Set<Long>> getClusters(Alignment alignment, org.gradoop.famer.clustering.parallelClustering.AbstractParallelClustering clusteringAlgorithm, DatasetIDExtractor datsetIdExtractor, int parallelism)
  
  Computes a map from URIs to their corresponding cluster IDs.
  
  Parameters:
  
  alignment - alignment
  
  clusteringAlgorithm - the clustering algorithm to use. The ClusteringOutputType doesn't matter, but for best performance choose ClusteringOutputType.GRAPH.
  
  datsetIdExtractor - the dataset ID extractor to use. It receives a URI and returns the corresponding data source ID.
  
  parallelism - the parallelism for the local Flink environment (can be set to -1 for the default, which is the number of processors).
  
  Returns:
  
  a map from URIs to their corresponding cluster IDs
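
The -1 convention for parallelism can be pictured as follows. This is only a sketch of the documented default (number of processors), not the code used by FamerClustering; the class ParallelismSketch and the method resolve are hypothetical, and rejecting other non-positive values is an assumption of the sketch, since the documentation does not say how they are handled.

```java
// Hypothetical illustration, not part of MELT.
final class ParallelismSketch {

    /** Maps -1 to the number of available processors and passes positive values through. */
    static int resolve(int parallelism) {
        if (parallelism == -1) {
            return Runtime.getRuntime().availableProcessors();
        }
        if (parallelism < 1) {
            // Assumption of this sketch: other non-positive values are treated as errors.
            throw new IllegalArgumentException("parallelism must be -1 or a positive number");
        }
        return parallelism;
    }
}
```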
- instanceOfOne
  
  public static boolean instanceOfOne(Object o, Class<?>... classes)
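
No description is given for instanceOfOne. Judging from the name and signature alone, it presumably reports whether o is an instance of at least one of the given classes. A minimal sketch under that assumption is shown below; the class name InstanceOfOneSketch is hypothetical and the body is not taken from the MELT source.

```java
// Hypothetical re-implementation based only on the method name and signature.
final class InstanceOfOneSketch {

    /** Returns true if o is an instance of at least one of the given classes (false for null). */
    static boolean instanceOfOne(Object o, Class<?>... classes) {
        for (Class<?> c : classes) {
            if (c.isInstance(o)) {
                return true;
            }
        }
        return false;
    }
}
```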
- getClusteringFromLogicalGraphWithString
  
  private static Map<String,Set<Long>> getClusteringFromLogicalGraphWithString(org.gradoop.flink.model.impl.epgm.LogicalGraph clusteredGraph) throws Exception
  
  Throws:
  
  Exception
- getClusteringFromLogicalGraphClip
  
  private static Map<String,Set<Long>> getClusteringFromLogicalGraphClip(org.gradoop.flink.model.impl.epgm.LogicalGraph clusteredGraph) throws Exception
  
  Throws:
  
  Exception
- getClusteringFromLogicalGraphWithLong
  
  private static Map<String,Set<Long>> getClusteringFromLogicalGraphWithLong(org.gradoop.flink.model.impl.epgm.LogicalGraph clusteredGraph) throws Exception
  
  Throws:
  
  Exception
- getLogicalGraphFromAlignment
  
  private static LogicalGraphAndSourceIds getLogicalGraphFromAlignment(Alignment a, DatasetIDExtractor datsetIdExtractor, int parallelism)
