Class LLMBinaryFilter
java.lang.Object
eu.sealsproject.platform.res.tool.impl.AbstractPlugin
de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.LLMBase
de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.LLMBinaryFilter
- All Implemented Interfaces:
Filter, IMatcher<org.apache.jena.ontology.OntModel, Alignment, Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge
This filter asks an LLM whether a given correspondence is correct.
The LLM has no information about the other correspondences; each correspondence becomes a separate prediction example.
The filter adds the resulting confidence to the correspondence so that filtering afterwards is possible.
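The workflow described above (score each correspondence on its own, then filter on the stored confidence) can be sketched without MELT. The `ScoredPair` class, the example URIs, and the threshold of 0.5 are hypothetical stand-ins chosen for illustration; they are not MELT types or defaults.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a correspondence that carries an LLM-assigned confidence.
// This is NOT a MELT class; it only illustrates the later filtering step.
final class ScoredPair {
    final String source;
    final String target;
    final double confidence;

    ScoredPair(String source, String target, double confidence) {
        this.source = source;
        this.target = target;
        this.confidence = confidence;
    }
}

final class ConfidenceFilterSketch {
    // Keeps only the pairs whose confidence is at least the threshold.
    static List<ScoredPair> keepAtLeast(List<ScoredPair> scored, double threshold) {
        List<ScoredPair> kept = new ArrayList<>();
        for (ScoredPair pair : scored) {
            if (pair.confidence >= threshold) {
                kept.add(pair);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredPair> scored = List.of(
            new ScoredPair("http://a.example/Person", "http://b.example/Human", 0.93),
            new ScoredPair("http://a.example/Person", "http://b.example/Paper", 0.08));
        // The threshold 0.5 is an arbitrary illustrative choice.
        for (ScoredPair pair : keepAtLeast(scored, 0.5)) {
            System.out.println(pair.source + " = " + pair.target + " (" + pair.confidence + ")");
        }
    }
}
```

Keeping the scoring and the thresholding as separate steps means the threshold can be tuned later without querying the model again.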
-
Field Summary
Modifier and Type, Field, Description
private static final org.slf4j.Logger LOGGER
negativeWords: Set of negative words to use.
private static final String NEWLINE
positiveWords: Set of positive words to use.
Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.LLMBase
debugFile, loadingArguments, promt, wordForcer, wordStopper
Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
cudaVisibleDevices, extractor, modelName, multipleTextsToMultipleExamples, multiProcessing, trainingArguments, transformersCache, usingTensorflow
Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
FILE_PREFIX, FILE_SUFFIX
-
Constructor Summary
Constructor, Description
LLMBinaryFilter(TextExtractorMap extractor, String modelName, String promt)
Constructor with all required parameters and default values for optional parameters (can be changed by setters).
LLMBinaryFilter(TextExtractor extractor, String modelName, String promt)
Constructor with all required parameters and default values for optional parameters (can be changed by setters).
-
Method Summary
Modifier and Type, Method, Description
void addNegativeWord(String negativeWord)
void addPositiveWord(String positiveWord)
Map<Correspondence,List<Integer>> createPredictionFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment predictionAlignment, File outputFile, boolean append)
Create the prediction file, which is a CSV file with two columns. The first column is the text from the left resource and the second column is the text from the right resource.
Alignment match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties properties)
Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment.
void setNegativeWords(Set<String> negativeWords)
void setPositiveWords(Set<String> positiveWords)
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.LLMBase
addGenerationArgument, addLoadingArgument, addLoadingArguments, addTrainingArgument, getDebugFile, getGenerationArguments, getLoadingArguments, getPromt, includeMoreVariations, includeMoreVariations, isWordForcer, isWordStopper, predictConfidences, setDebugFile, setGenerationArguments, setLoadingArguments, setPromt, setTrainingArguments, setWordForcer, setWordStopper
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
getCudaVisibleDevices, getCudaVisibleDevicesButOnlyOneGPU, getExamplesForBatchSizeOptimization, getExtractor, getExtractorMap, getModelName, getMultiProcessing, getTextualRepresentation, getTrainingArguments, getTransformersCache, isMultipleTextsToMultipleExamples, isOptimizeForMixedPrecisionTraining, isUsingTensorflow, setCudaVisibleDevices, setCudaVisibleDevices, setExtractor, setExtractorMap, setModelName, setMultipleTextsToMultipleExamples, setMultiProcessing, setOptimizeForMixedPrecisionTraining, setTransformersCache, setUsingTensorflow, writeExamplesToFile
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
getModelSpec, match, readOntology
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
match
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
match
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
align, align, canExecute, getType
Methods inherited from class eu.sealsproject.platform.res.tool.impl.AbstractPlugin
getId, getVersion, setId, setVersion
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface eu.sealsproject.platform.res.tool.api.IPlugin
getId, getVersion
-
Field Details
-
NEWLINE
-
LOGGER
private static final org.slf4j.Logger LOGGER
-
positiveWords
Set of positive words to use. Default is "yes" and some variations of it (generated with LLMBase.includeMoreVariations(java.util.Set)).
-
negativeWords
Set of negative words to use. Default is "no" and some variations of it (generated with LLMBase.includeMoreVariations(java.util.Set)).
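This page does not state how the positive and negative word sets are turned into a confidence. One plausible reading is to compare the probability mass a model assigns to the positive words with the mass it assigns to the negative words. The sketch below illustrates only that idea; the scoring rule, the class name `WordSetConfidenceSketch`, and the input map of word probabilities are all assumptions and are not taken from the MELT implementation.

```java
import java.util.Map;
import java.util.Set;

final class WordSetConfidenceSketch {
    // Sums the probabilities of all words in the given set (missing words count as 0).
    static double mass(Map<String, Double> wordProbabilities, Set<String> words) {
        double sum = 0.0;
        for (String word : words) {
            sum += wordProbabilities.getOrDefault(word, 0.0);
        }
        return sum;
    }

    // Assumed scoring rule: share of positive mass among positive plus negative mass.
    // Returns 0.5 when the model assigns no mass to either set.
    static double confidence(Map<String, Double> wordProbabilities,
                             Set<String> positiveWords, Set<String> negativeWords) {
        double pos = mass(wordProbabilities, positiveWords);
        double neg = mass(wordProbabilities, negativeWords);
        double total = pos + neg;
        return total == 0.0 ? 0.5 : pos / total;
    }

    public static void main(String[] args) {
        // Hypothetical word sets with one casing variation each.
        Set<String> positive = Set.of("yes", "Yes");
        Set<String> negative = Set.of("no", "No");
        // Hypothetical next-word probabilities returned by some model.
        Map<String, Double> probs = Map.of("yes", 0.5, "Yes", 0.25, "no", 0.25);
        System.out.println(confidence(probs, positive, negative));
    }
}
```

Including several spellings of each word matters under this reading, because a model may spread its probability over variants such as "yes" and "Yes".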
-
-
Constructor Details
-
LLMBinaryFilter
Constructor with all required parameters and default values for optional parameters (can be changed by setters). It uses the system's default temporary directory to store the files with texts generated from the knowledge graphs. PyTorch is used instead of TensorFlow, and all visible GPUs are used for prediction.
- Parameters:
extractor - the extractor that selects which text should be used for each resource.
modelName - the model name, which can be a model id (a model hosted on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter, pretrained_model_name_or_path, of the from_pretrained function in the Hugging Face library). A path should be absolute; it can be generated by, e.g., FileUtil.getCanonicalPathIfPossible(java.io.File).
promt - the prompt to use for the LLM. Use {left} and {right} to insert the text representation of the left and right concept.
-
LLMBinaryFilter
Constructor with all required parameters and default values for optional parameters (can be changed by setters). It uses the system's default temporary directory to store the files with texts generated from the knowledge graphs. PyTorch is used instead of TensorFlow, and all visible GPUs are used for prediction.
- Parameters:
extractor - the extractor that selects which text should be used for each resource.
modelName - the model name, which can be a model id (a model hosted on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter, pretrained_model_name_or_path, of the from_pretrained function in the Hugging Face library). A path should be absolute; it can be generated by, e.g., FileUtil.getCanonicalPathIfPossible(java.io.File).
promt - the prompt to use for the LLM. Use {left} and {right} to insert the text representation of the left and right concept.
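To make the {left} and {right} placeholders concrete, here is a minimal sketch of how such a template could be filled in. The class `PromptTemplateSketch` and the example prompt wording are hypothetical; how MELT performs the substitution internally is not stated on this page.

```java
final class PromptTemplateSketch {
    // Replaces the {left} and {right} placeholders with the given texts.
    // The replacement is sequential, so a left text that itself contains the
    // literal "{right}" would also be substituted in the second step.
    static String fill(String template, String left, String right) {
        return template.replace("{left}", left).replace("{right}", right);
    }

    public static void main(String[] args) {
        // Hypothetical prompt wording; any text containing both placeholders works.
        String template =
            "Do the concepts \"{left}\" and \"{right}\" describe the same thing? Answer yes or no: ";
        System.out.println(fill(template, "car", "automobile"));
    }
}
```

Ending the prompt so that a single word such as "yes" or "no" is the natural continuation fits the positive and negative word sets this class works with.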
-
-
Method Details
-
match
public Alignment match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties properties) throws Exception
Description copied from class: MatcherYAAAJena
Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment. Note: This method might be called multiple times in a row when using the evaluation framework. Make sure to return a mapping which is specific to the given inputs.
- Specified by:
match in interface IMatcher<org.apache.jena.ontology.OntModel, Alignment, Properties>
- Specified by:
match in class MatcherYAAAJena
- Parameters:
source - This OntModel represents the source ontology.
target - This OntModel represents the target ontology.
inputAlignment - This mapping represents the input alignment.
properties - Additional properties.
- Returns:
The resulting alignment of the matching process.
- Throws:
Exception - Any exception which occurs during matching.
-
createPredictionFile
public Map<Correspondence,List<Integer>> createPredictionFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment predictionAlignment, File outputFile, boolean append) throws IOException
Create the prediction file, which is a CSV file with two columns. The first column is the text from the left resource and the second column is the text from the right resource.
- Parameters:
source - The source model.
target - The target model.
predictionAlignment - the alignment to process. All correspondences which have enough text are used.
outputFile - the CSV file to which the output should be written.
append - if true, the alignment is appended to the given file.
- Returns:
the map which maps each correspondence to (possibly multiple) row numbers. If multipleTextsToMultipleExamples is set to true, multiple rows can correspond to one correspondence, because each text (e.g. label, comment, etc.) of the two resources is used as an example.
- Throws:
IOException - in case the writing fails.
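The returned map ties each correspondence to the rows written for it, which is what allows per-row predictions to be folded back into one confidence per correspondence. The sketch below shows that bookkeeping with plain strings as keys. The zero-based row numbering, the use of the maximum as aggregation, and the class name `RowMappingSketch` are assumptions made for illustration, not documented MELT behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowMappingSketch {
    // Assigns consecutive (assumed zero-based) row numbers to the examples of each
    // correspondence. Correspondences without any example are skipped, mirroring
    // "all correspondences which have enough text are used".
    static Map<String, List<Integer>> assignRows(Map<String, Integer> examplesPerCorrespondence) {
        Map<String, List<Integer>> rows = new LinkedHashMap<>();
        int nextRow = 0;
        for (Map.Entry<String, Integer> entry : examplesPerCorrespondence.entrySet()) {
            if (entry.getValue() == 0) {
                continue;
            }
            List<Integer> rowNumbers = new ArrayList<>();
            for (int i = 0; i < entry.getValue(); i++) {
                rowNumbers.add(nextRow++);
            }
            rows.put(entry.getKey(), rowNumbers);
        }
        return rows;
    }

    // Assumed aggregation: a correspondence gets the maximum confidence over its rows.
    static Map<String, Double> aggregateMax(Map<String, List<Integer>> rows, double[] rowConfidences) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : rows.entrySet()) {
            double best = 0.0;
            for (int row : entry.getValue()) {
                best = Math.max(best, rowConfidences[row]);
            }
            result.put(entry.getKey(), best);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical counts: c1 produced two text pairs (e.g. label and comment), c2 one.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("c1", 2);
        counts.put("c2", 1);
        Map<String, List<Integer>> rows = assignRows(counts);
        // One hypothetical confidence per CSV row, in row order.
        double[] rowConfidences = {0.2, 0.9, 0.4};
        System.out.println(rows);
        System.out.println(aggregateMax(rows, rowConfidences));
    }
}
```

Other aggregations, such as the mean over all rows, would fit the same map; the maximum is used here only because it is simple to follow.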
-
getPositiveWords
-
setPositiveWords
-
addPositiveWord
-
getNegativeWords
-
setNegativeWords
-
addNegativeWord
-
getWordsToDetect
-