java.lang.Object
de.uni_mannheim.informatik.dws.melt.matching_eval.paramtuning.ConfidenceFinder

public class ConfidenceFinder extends Object
Static utility methods for analyzing and optimizing matchers with respect to their confidences and confidence thresholds.
  • Field Details

    • LOGGER

      private static final org.slf4j.Logger LOGGER
  • Constructor Details

    • ConfidenceFinder

      public ConfidenceFinder()
  • Method Details

    • getSteps

      public static Set<Double> getSteps(double start, double end, double stepWidth)
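      The signature carries no description. One plausible reading is a set of evenly spaced candidate thresholds between start and end. The sketch below illustrates that idea only; the class name Steps, the inclusive end bound, and the use of BigDecimal to avoid floating-point drift are assumptions of this example, not documented behavior of getSteps.

```java
import java.math.BigDecimal;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not MELT's implementation.
class Steps {

    // Returns start, start + stepWidth, ... up to and including end
    // (the inclusive upper bound is an assumption of this sketch).
    static Set<Double> steps(double start, double end, double stepWidth) {
        if (stepWidth <= 0.0) {
            throw new IllegalArgumentException("stepWidth must be positive");
        }
        // BigDecimal.valueOf uses the decimal string of the double, so
        // repeatedly adding 0.1 does not accumulate binary rounding error.
        BigDecimal step = BigDecimal.valueOf(stepWidth);
        BigDecimal last = BigDecimal.valueOf(end);
        Set<Double> result = new TreeSet<>();
        for (BigDecimal v = BigDecimal.valueOf(start); v.compareTo(last) <= 0; v = v.add(step)) {
            result.add(v.doubleValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(steps(0.0, 1.0, 0.25)); // prints [0.0, 0.25, 0.5, 0.75, 1.0]
    }
}
```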
    • getOccurringConfidences

      public static Set<Double> getOccurringConfidences(Alignment a)
    • getOccurringConfidences

      public static Set<Double> getOccurringConfidences(Alignment a, double begin, double end)
    • getOccurringConfidences

      public static Set<Double> getOccurringConfidences(Alignment alignment, int decimalPrecision)
      If you require exact results, set decimalPrecision to a negative number.
      Parameters:
      alignment - The alignment.
      decimalPrecision - The desired decimal precision. Negative number for optimal precision.
      Returns:
      Set of confidence values.
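      To illustrate how a decimal precision reduces the number of distinct confidences (and therefore the number of candidate thresholds to evaluate), here is a minimal sketch. It works on a plain list of doubles rather than an Alignment; the class ConfidenceRounding and the HALF_UP rounding mode are assumptions of this example and may differ from what the method actually does.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not MELT's implementation.
class ConfidenceRounding {

    // Collects the distinct confidences, rounded to decimalPrecision places.
    // A negative precision keeps every value unchanged.
    static Set<Double> occurringConfidences(List<Double> confidences, int decimalPrecision) {
        Set<Double> result = new TreeSet<>();
        for (double c : confidences) {
            if (decimalPrecision < 0) {
                result.add(c);
            } else {
                // HALF_UP is an assumption; the real rounding mode is not documented here.
                result.add(BigDecimal.valueOf(c)
                        .setScale(decimalPrecision, RoundingMode.HALF_UP)
                        .doubleValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Double> confidences = List.of(0.91, 0.94, 0.52, 0.58);
        System.out.println(occurringConfidences(confidences, 1));  // prints [0.5, 0.6, 0.9]
        System.out.println(occurringConfidences(confidences, -1)); // prints [0.52, 0.58, 0.91, 0.94]
    }
}
```

      With a precision of 1, four distinct confidences collapse into three candidates, which is where the runtime saving comes from.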
    • getOccurringConfidences

      public static Set<Double> getOccurringConfidences(Alignment a, int decimalPrecision, double begin, double end)
    • getBestConfidenceForFmeasure

      public static double getBestConfidenceForFmeasure(ExecutionResult executionResult)
      Given an ExecutionResult, this method determines the best cutting point in order to optimize the F1-score.
      Parameters:
      executionResult - The execution result for which the optimal confidence threshold shall be determined.
      Returns:
      The confidence threshold that maximizes the F1 measure. All correspondences with a confidence lower than this value should be discarded; ConfidenceFilter can be used directly to cut correspondences whose confidence is below the threshold determined by this method.
    • getBestConfidenceForFmeasure

      public static double getBestConfidenceForFmeasure(Alignment reference, Alignment systemAlignment, GoldStandardCompleteness gsCompleteness)
      If this method takes too long, you can use the more efficient method getBestConfidenceForFmeasure(Alignment, Alignment, GoldStandardCompleteness, int) and set a decimal precision (e.g. 1 or 2).
      Parameters:
      reference - The reference alignment.
      systemAlignment - The system alignment.
      gsCompleteness - The gold standard completeness.
      Returns:
      The optimal confidence.
    • getBestConfidenceForFmeasure

      public static double getBestConfidenceForFmeasure(Alignment reference, Alignment systemAlignment, GoldStandardCompleteness gsCompleteness, int decimalPrecision)
      Given two alignments, this method determines the best cutting point (based on the main confidence of each correspondence) in order to optimize the F1-score.
      Parameters:
      reference - the reference alignment to use
      systemAlignment - the system alignment
      gsCompleteness - the completeness of the gold standard. If the reference alignment is a subset of the overall reference alignment and the alignment is one-to-one, use GoldStandardCompleteness.PARTIAL_SOURCE_COMPLETE_TARGET_COMPLETE.
      decimalPrecision - the decimal precision of the confidences. A low precision (such as 2) improves runtime performance but may lead to a suboptimal result. If you require an optimal solution, set the decimal precision to a negative number.
      Returns:
      The confidence threshold that maximizes the F1 measure. All correspondences with a confidence lower than this value should be discarded; ConfidenceFilter can be used directly to cut correspondences whose confidence is below the threshold determined by this method.
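      The general idea of such a threshold search can be sketched as follows. This is a simplified stand-alone illustration, not the library's code: it uses parallel arrays instead of Alignment objects, assumes a complete gold standard (every system correspondence missing from the reference counts as a false positive), and breaks ties in favor of the lowest threshold. The class and method names are hypothetical.

```java
import java.util.TreeSet;

// Hypothetical sketch of a threshold search; not MELT's implementation.
class ThresholdSearch {

    // confidences[i] is the confidence of system correspondence i,
    // inReference[i] tells whether the reference alignment contains it,
    // referenceSize is the number of correspondences in the reference.
    static double bestThresholdForF1(double[] confidences, boolean[] inReference, int referenceSize) {
        // Candidate thresholds: every confidence that occurs in the system alignment.
        TreeSet<Double> candidates = new TreeSet<>();
        for (double c : confidences) {
            candidates.add(c);
        }
        double bestThreshold = 0.0;
        double bestF1 = -1.0;
        for (double threshold : candidates) {
            int tp = 0;
            int fp = 0;
            for (int i = 0; i < confidences.length; i++) {
                if (confidences[i] >= threshold) { // kept; lower confidences are cut
                    if (inReference[i]) {
                        tp++;
                    } else {
                        fp++;
                    }
                }
            }
            double precision = (tp + fp) == 0 ? 0.0 : (double) tp / (tp + fp);
            double recall = referenceSize == 0 ? 0.0 : (double) tp / referenceSize;
            double f1 = (precision + recall) == 0.0
                    ? 0.0
                    : 2.0 * precision * recall / (precision + recall);
            if (f1 > bestF1) { // strict comparison: the lowest best threshold wins ties
                bestF1 = f1;
                bestThreshold = threshold;
            }
        }
        return bestThreshold;
    }

    public static void main(String[] args) {
        double[] confidences = {0.9, 0.8, 0.6, 0.4, 0.3};
        boolean[] inReference = {true, true, false, true, false};
        // The reference holds 4 correspondences; the system found 3 of them.
        System.out.println(bestThresholdForF1(confidences, inReference, 4)); // prints 0.4
    }
}
```

      In the example, cutting below 0.4 removes one false positive while keeping all three true positives, giving precision and recall of 0.75 each, which no other candidate threshold beats.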
    • getBestConfidenceForFmeasureBeta

      public static double getBestConfidenceForFmeasureBeta(ExecutionResult executionResult, double beta)
      Given an ExecutionResult, this method determines the best cutting point in order to optimize the F_beta-score (beta is given as a parameter).
      Parameters:
      executionResult - The execution result for which the optimal confidence threshold shall be determined.
      beta - the beta value for F-beta measure
      Returns:
      The confidence threshold that maximizes the F_beta measure. All correspondences with a confidence lower than this value should be discarded; ConfidenceFilter can be used directly to cut correspondences whose confidence is below the threshold determined by this method.
    • getBestConfidenceForFmeasureBeta

      public static double getBestConfidenceForFmeasureBeta(Alignment reference, Alignment systemAlignment, GoldStandardCompleteness gsCompleteness, double beta)
      Given two alignments, this method determines the best cutting point (based on the main confidence of each correspondence) in order to optimize the F_beta-score (beta is given as a parameter).
      Parameters:
      reference - the reference alignment to use
      systemAlignment - the system alignment
      gsCompleteness - the completeness of the gold standard. If the reference alignment is a subset of the overall reference alignment and the alignment is one-to-one, use GoldStandardCompleteness.PARTIAL_SOURCE_COMPLETE_TARGET_COMPLETE.
      beta - the beta value for F-beta measure
      Returns:
      The confidence threshold that maximizes the F_beta measure. All correspondences with a confidence lower than this value should be discarded; ConfidenceFilter can be used directly to cut correspondences whose confidence is below the threshold determined by this method.
    • getBestConfidenceForPrecision

      public static double getBestConfidenceForPrecision(ExecutionResult executionResult)
      Given an ExecutionResult, this method determines the best cutting point in order to optimize the precision.
      Parameters:
      executionResult - The execution result for which the optimal confidence threshold shall be determined.
      Returns:
      The confidence threshold that maximizes precision. All correspondences with a confidence lower than this value should be discarded; ConfidenceFilter can be used directly to cut correspondences whose confidence is below the threshold determined by this method.
    • getBestConfidenceForPrecision

      public static double getBestConfidenceForPrecision(Alignment reference, Alignment systemAlignment, GoldStandardCompleteness gsCompleteness)
      Given two alignments, this method determines the best cutting point (based on the main confidence of each correspondence) in order to optimize the precision.
      Parameters:
      reference - the reference alignment to use
      systemAlignment - the system alignment
      gsCompleteness - the completeness of the gold standard. If the reference alignment is a subset of the overall reference alignment and the alignment is one-to-one, use GoldStandardCompleteness.PARTIAL_SOURCE_COMPLETE_TARGET_COMPLETE.
      Returns:
      The confidence threshold that maximizes precision. All correspondences with a confidence lower than this value should be discarded; ConfidenceFilter can be used directly to cut correspondences whose confidence is below the threshold determined by this method.
    • getConfidenceResultSet

      public static ExecutionResultSet getConfidenceResultSet(ExecutionResult executionResult)
    • divideWithTwoDenominators

      private static double divideWithTwoDenominators(double numerator, double denominatorOne, double denominatorTwo)
      Divides the numerator by the sum of the two denominators.
      Parameters:
      numerator - Numerator of fraction
      denominatorOne - Denominator 1
      denominatorTwo - Denominator 2
      Returns:
      Result as double.
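      A division of this shape fits precision, TP / (TP + FP), and recall, TP / (TP + FN). The sketch below shows the pattern in isolation; the class name SafeDivision and the choice to return 0.0 when both denominators sum to zero are assumptions of this example, since the description above does not say how that case is handled.

```java
// Hypothetical stand-alone version for illustration; not MELT's code.
class SafeDivision {

    // Computes numerator / (denominatorOne + denominatorTwo).
    // Returning 0.0 for a zero sum is an assumption of this sketch.
    static double divide(double numerator, double denominatorOne, double denominatorTwo) {
        double denominator = denominatorOne + denominatorTwo;
        return denominator == 0.0 ? 0.0 : numerator / denominator;
    }

    public static void main(String[] args) {
        double truePositives = 3;
        double falsePositives = 1;
        // Precision = TP / (TP + FP)
        System.out.println(divide(truePositives, truePositives, falsePositives)); // prints 0.75
    }
}
```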
    • getFbetaMeasure

      private static double getFbetaMeasure(double precision, double recall, double beta)
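      This method has no description. For reference, the standard definition of the F-beta measure is (1 + beta^2) * precision * recall / (beta^2 * precision + recall), where beta > 1 weights recall more heavily and beta < 1 favors precision. A minimal sketch of that formula follows; the class name FBeta and the 0.0 result for a zero denominator are assumptions of this example, and the private method may differ in its details.

```java
// Hypothetical stand-alone helper for illustration; not MELT's code.
class FBeta {

    // Standard F-beta: (1 + beta^2) * P * R / (beta^2 * P + R).
    // Returning 0.0 for a zero denominator is an assumption of this sketch.
    static double fBeta(double precision, double recall, double beta) {
        double betaSquared = beta * beta;
        double denominator = betaSquared * precision + recall;
        if (denominator == 0.0) {
            return 0.0;
        }
        return (1.0 + betaSquared) * precision * recall / denominator;
    }

    public static void main(String[] args) {
        // With beta = 1 this reduces to the F1 measure (harmonic mean).
        System.out.println(fBeta(0.5, 0.5, 1.0)); // prints 0.5
        // With beta = 2, the low recall pulls the score down more strongly.
        System.out.println(fBeta(1.0, 0.5, 2.0));
    }
}
```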