Optimal Confidence Determination
Many matching systems assign a confidence to each correspondence, typically in the range [0, 1]. Removing low-confidence matches can significantly improve precision and F1.
In MELT, ConfidenceFinder can be used to determine the optimal threshold given any ExecutionResult.
Note that it is far more efficient to optimize over a single matcher execution result that still contains all correspondences than to run the matcher repeatedly with different cut-off points. Therefore, ConfidenceFinder works on an ExecutionResult instance rather than on a matcher instance. If you want to fine-tune the parameters of an actual matcher instance, use the GridSearch class.
Example:
import de.uni_mannheim.informatik.dws.melt.matching_data.TrackRepository;
import de.uni_mannheim.informatik.dws.melt.matching_eval.ExecutionResult;
import de.uni_mannheim.informatik.dws.melt.matching_eval.ExecutionResultSet;
import de.uni_mannheim.informatik.dws.melt.matching_eval.Executor;
import de.uni_mannheim.informatik.dws.melt.matching_eval.paramtuning.ConfidenceFinder;
import de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.matcher.SimpleStringMatcher;
public class ConfidenceFinderExample {

    public static void main(String[] args) {
        // let's run a default matcher on the OAEI anatomy track:
        ExecutionResultSet ers = Executor.run(TrackRepository.Anatomy.Default, new SimpleStringMatcher());
        for (ExecutionResult e : ers) {
            // the actual optimization:
            double bestConfidence = ConfidenceFinder.getBestConfidenceForFmeasure(e);
            // just some meaningful output:
            System.out.println("Best confidence for matcher " + e.getMatcherName() +
                    " on " + e.getTrack().getName() + " (" + e.getTestCase().getName() + "): " +
                    bestConfidence);
        }
    }
}
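Conceptually, optimizing on an existing result amounts to sweeping candidate thresholds over an alignment that has already been computed. The following is a minimal, self-contained sketch of that idea and not MELT's actual implementation: ScoredMatch, f1AtThreshold, and bestThreshold are hypothetical names introduced here for illustration, correspondences are reduced to a string id plus a confidence, and the reference alignment is assumed to be a plain set of ids.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ThresholdSweepSketch {

    /** Hypothetical stand-in for a scored correspondence; not a MELT class. */
    static final class ScoredMatch {
        final String id;
        final double confidence;

        ScoredMatch(String id, double confidence) {
            this.id = id;
            this.confidence = confidence;
        }
    }

    /** F1 of the matches that survive the cut, i.e. those with confidence >= threshold. */
    static double f1AtThreshold(List<ScoredMatch> matches, Set<String> reference, double threshold) {
        int kept = 0;
        int truePositives = 0;
        for (ScoredMatch m : matches) {
            if (m.confidence >= threshold) {
                kept++;
                if (reference.contains(m.id)) {
                    truePositives++;
                }
            }
        }
        if (truePositives == 0) {
            return 0.0; // precision and recall are both 0 (or undefined)
        }
        double precision = (double) truePositives / kept;
        double recall = (double) truePositives / reference.size();
        return 2 * precision * recall / (precision + recall);
    }

    /** Tries every confidence that occurs in the alignment and returns the one with the highest F1. */
    static double bestThreshold(List<ScoredMatch> matches, Set<String> reference) {
        // Only confidences that actually occur can change which matches are kept.
        TreeSet<Double> candidates = new TreeSet<>();
        for (ScoredMatch m : matches) {
            candidates.add(m.confidence);
        }
        double best = 0.0;
        double bestF1 = -1.0;
        for (double t : candidates) {
            double f1 = f1AtThreshold(matches, reference, t);
            if (f1 > bestF1) { // on ties, the lowest threshold wins
                bestF1 = f1;
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<ScoredMatch> matches = List.of(
                new ScoredMatch("a", 1.0), new ScoredMatch("b", 0.9), new ScoredMatch("c", 0.8),
                new ScoredMatch("d", 0.5), new ScoredMatch("e", 0.4), new ScoredMatch("f", 0.3));
        Set<String> reference = Set.of("a", "b", "c", "f", "g");
        System.out.println(bestThreshold(matches, reference)); // prints 0.8
    }
}
```

The matcher runs once; every candidate threshold is then evaluated against the same list of scored matches, which is why this is much cheaper than re-running the matcher for each cut-off.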
All correspondences with a confidence lower than the threshold returned by ConfidenceFinder should be discarded. You can do this by applying a filter in a matching pipeline. MELT provides ConfidenceFilter for exactly this case:
Example:
import de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherPipelineYAAAJenaConstructor;
import de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena;
import de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.matcher.SimpleStringMatcher;
import de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.filter.ConfidenceFilter;
public class ConfidenceFilterExample {

    public static void main(String[] args) {
        // assume that we determined an optimal confidence as outlined above
        double bestConfidence = 0.8;
        // build a matcher pipeline with the filter at the end:
        MatcherYAAAJena matcher = new MatcherPipelineYAAAJenaConstructor(
                new SimpleStringMatcher(),              // some matcher
                new ConfidenceFilter(bestConfidence));  // let's filter the result using ConfidenceFilter
        // do something with the matcher :)
    }
}
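To make the effect of such a cut-off concrete, here is a small, self-contained sketch that does not use MELT classes. The class ConfidenceCutSketch and the method keepAtOrAbove are hypothetical names, and correspondences are modeled as a simple map from an id to a confidence. It assumes that a correspondence whose confidence equals the threshold is kept, in line with the rule that only correspondences with a lower confidence are discarded; check the ConfidenceFilter documentation for the exact boundary behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ConfidenceCutSketch {

    /** Returns the entries whose confidence is at least the threshold (assumed boundary behavior). */
    static Map<String, Double> keepAtOrAbove(Map<String, Double> scored, double threshold) {
        Map<String, Double> kept = new LinkedHashMap<>(); // preserves the input order
        for (Map.Entry<String, Double> entry : scored.entrySet()) {
            if (entry.getValue() >= threshold) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Double> scored = new LinkedHashMap<>();
        scored.put("a", 1.0);
        scored.put("b", 0.8);
        scored.put("c", 0.5);
        System.out.println(keepAtOrAbove(scored, 0.8).keySet()); // prints [a, b]
    }
}
```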