Class LLMBase
java.lang.Object
eu.sealsproject.platform.res.tool.impl.AbstractPlugin
de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.LLMBase
- All Implemented Interfaces:
Filter, IMatcher<org.apache.jena.ontology.OntModel, Alignment, Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge
- Direct Known Subclasses:
LLMBinaryFilter, LLMChooseGivenEntityFilter
This filter asks an LLM which entity of the source fits best to an entity of the target.
Correspondences need to be provided so that candidates are available.
It only keeps correspondences which are stated to be useful.
The difference to LLMBinaryFilter is that all possible matches are given to the LLM.
-
Field Summary
Modifier and Type, Field, Description
- protected File debugFile: If set to an existing file, this class writes additional debug information to that file.
- protected TransformersArguments loadingArguments: Parameters that are passed to the from_pretrained method.
- protected String promt: The prompt to use for the LLM.
- protected boolean wordForcer: If set to true, constrained beam search is activated and the words yes and no are forced.
- protected boolean wordStopper: If set to true, the generation is stopped as soon as the word yes or no appears.
Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
cudaVisibleDevices, extractor, modelName, multipleTextsToMultipleExamples, multiProcessing, trainingArguments, transformersCache, usingTensorflow
Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
FILE_PREFIX, FILE_SUFFIX
-
Constructor Summary
Constructor, Description
- LLMBase(TextExtractorMap extractor, String modelName, String promt): Constructor with all required parameters and default values for optional parameters (can be changed by setters).
- LLMBase(TextExtractor extractor, String modelName, String promt): Constructor with all required parameters and default values for optional parameters (can be changed by setters).
-
Method Summary
Modifier and Type, Method, Description
- addGenerationArgument(String key, Object value): Adds a parameter that is passed to the generate function of the transformers library.
- addLoadingArgument(String key, Object value): Adds a parameter that is passed to the from_pretrained method.
- addLoadingArguments(TransformersArguments loadingArguments)
- void addTrainingArgument(String key, Object value): Adds a training argument for the transformers trainer.
- getDebugFile()
- getGenerationArguments(): Returns the arguments which can be used for the generate function of the transformers library.
- getLoadingArguments(): Returns the parameters which are passed to the from_pretrained method.
- getPromt()
- includeMoreVariations(String... words): Adds more word variations to the set of words.
- includeMoreVariations(Set<String> words): Adds more word variations to the set of words.
- boolean isWordForcer()
- boolean isWordStopper()
- protected List<List<Double>> predictConfidences(File predictionFilePath, List<Set<String>> wordsToDetect): Runs the huggingface transformers library.
- void setDebugFile(File debugFile)
- void setGenerationArguments(TransformersArguments generationArguments): Sets the arguments which can be used for the generate function of the transformers library.
- void setLoadingArguments(TransformersArguments loadingArguments): Sets the arguments which are passed to the from_pretrained method.
- void setPromt
- void setTrainingArguments(TransformersArguments configuration): Setting training arguments is not allowed; they are not used for LLMs.
- void setWordForcer(boolean wordForcer): When set to true, constrained beam search is activated and the words yes and no are forced.
- void setWordStopper(boolean wordStopper): If set to true, the text generation stops automatically as soon as the word yes or no is generated.
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
getCudaVisibleDevices, getCudaVisibleDevicesButOnlyOneGPU, getExamplesForBatchSizeOptimization, getExtractor, getExtractorMap, getModelName, getMultiProcessing, getTextualRepresentation, getTrainingArguments, getTransformersCache, isMultipleTextsToMultipleExamples, isOptimizeForMixedPrecisionTraining, isUsingTensorflow, setCudaVisibleDevices, setCudaVisibleDevices, setExtractor, setExtractorMap, setModelName, setMultipleTextsToMultipleExamples, setMultiProcessing, setOptimizeForMixedPrecisionTraining, setTransformersCache, setUsingTensorflow, writeExamplesToFile
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
getModelSpec, match, match, readOntology
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
match
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
match
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
align, align, canExecute, getType
Methods inherited from class eu.sealsproject.platform.res.tool.impl.AbstractPlugin
getId, getVersion, setId, setVersion
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface eu.sealsproject.platform.res.tool.api.IPlugin
getId, getVersion
-
Field Details
-
promt
The prompt to use for the LLM. Subclasses may interpret the prompt differently.
-
debugFile
If set to an existing file, this class writes additional debug information to that file.
-
wordStopper
protected boolean wordStopper
If set to true, the generation is stopped as soon as the word yes or no appears.
-
wordForcer
protected boolean wordForcer
If set to true, constrained beam search is activated and the words yes and no are forced.
-
loadingArguments
Parameters that are passed to the from_pretrained method.
-
-
Constructor Details
-
LLMBase
Constructor with all required parameters and default values for optional parameters (can be changed by setters). It uses the system's default tmp dir to store the files with texts generated from the knowledge graphs. PyTorch is used instead of TensorFlow, and all visible GPUs are used for prediction.
- Parameters:
extractor
- the extractor that selects which text should be used for each resource.
modelName
- the model name, which can be a model id (a model hosted on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter, pretrained_model_name_or_path, of the from_pretrained function in the huggingface library). A path should be absolute; it can be generated with, e.g., FileUtil.getCanonicalPathIfPossible(java.io.File).
promt
- The prompt to use for the LLM. Use {left} and {right} to insert the text representation of the left and right concept.
-
LLMBase
Constructor with all required parameters and default values for optional parameters (can be changed by setters). It uses the system's default tmp dir to store the files with texts generated from the knowledge graphs. PyTorch is used instead of TensorFlow, and all visible GPUs are used for prediction.
- Parameters:
extractor
- the extractor that selects which text should be used for each resource.
modelName
- the model name, which can be a model id (a model hosted on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter, pretrained_model_name_or_path, of the from_pretrained function in the huggingface library). A path should be absolute; it can be generated with, e.g., FileUtil.getCanonicalPathIfPossible(java.io.File).
promt
- The prompt to use for the LLM. Use {left} and {right} to insert the text representation of the left and right concept.
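The role of the {left} and {right} placeholders can be illustrated with a small stand-alone sketch. The class PromptTemplateSketch and its fill method are hypothetical names introduced only for this illustration; they are not part of MELT, and the library may perform the substitution differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not part of MELT: fills the {left} and {right}
// placeholders of a prompt with the text of two concepts.
final class PromptTemplateSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(left|right)\\}");

    // The parameter name mirrors the spelling used by LLMBase.
    static String fill(String promt, String left, String right) {
        Matcher matcher = PLACEHOLDER.matcher(promt);
        StringBuffer filled = new StringBuffer();
        // A single pass over the template, so placeholder-like text inside
        // the inserted values is never substituted a second time.
        while (matcher.find()) {
            String value = matcher.group(1).equals("left") ? left : right;
            matcher.appendReplacement(filled, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(filled);
        return filled.toString();
    }

    public static void main(String[] args) {
        String promt = "Is \"{left}\" the same concept as \"{right}\"? Answer yes or no.";
        System.out.println(fill(promt, "heart", "cardiac organ"));
    }
}
```

A prompt phrased as a yes/no question fits the wordStopper and wordForcer options, which both operate on the words yes and no.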
-
-
Method Details
-
predictConfidences
protected List<List<Double>> predictConfidences(File predictionFilePath, List<Set<String>> wordsToDetect) throws Exception
Runs the huggingface transformers library.
- Parameters:
predictionFilePath
- path to a CSV file with two columns (text left and text right).
wordsToDetect
- the words which should be detected
- Returns:
- a list of confidences
- Throws:
Exception
- in case something goes wrong.
-
getPromt
-
setPromt
-
getDebugFile
-
setDebugFile
-
isWordStopper
public boolean isWordStopper()
-
setWordStopper
public void setWordStopper(boolean wordStopper)
If set to true, the text generation stops automatically as soon as the word yes or no is generated.
- Parameters:
wordStopper
- if true, the generation stops automatically on yes or no.
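The effect of this option can be pictured with a conceptual sketch that operates on an already tokenized output. WordStopperSketch and generateUntilStopWord are hypothetical names; the real stopping logic runs inside the transformers generation loop and is not shown in this documentation, so the code below only illustrates the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration, not the MELT implementation: emits tokens
// one by one and stops right after the first yes/no token.
final class WordStopperSketch {

    static List<String> generateUntilStopWord(List<String> tokens, Set<String> stopWords) {
        List<String> emitted = new ArrayList<>();
        for (String token : tokens) {
            emitted.add(token);
            // Compare case-insensitively and ignore surrounding whitespace.
            if (stopWords.contains(token.trim().toLowerCase(Locale.ROOT))) {
                break;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", " answer", " is", " Yes", ",", " because");
        Set<String> stopWords = new HashSet<>(Arrays.asList("yes", "no"));
        System.out.println(generateUntilStopWord(tokens, stopWords));
    }
}
```

Stopping early avoids generating text after the decision word, which is usually not needed when only a yes/no answer is evaluated.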
-
isWordForcer
public boolean isWordForcer()
-
setWordForcer
public void setWordForcer(boolean wordForcer)
When this option is set to true, constrained beam search is activated and the words yes and no are forced. This also means that the "num_beams" attribute in the generation arguments needs to be set to a number higher than one.
- Parameters:
wordForcer
- true or false
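Because constrained beam search requires "num_beams" to be higher than one, a small pre-flight check can catch a misconfiguration before the model is loaded. The sketch below is an assumption-laden illustration: WordForcerCheckSketch and beamsAllowWordForcing are hypothetical names, and the generation arguments are modeled as a plain Map instead of the TransformersArguments class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of MELT: checks whether the generation
// arguments are compatible with an activated word forcer.
final class WordForcerCheckSketch {

    // Returns true only if num_beams is present and greater than one.
    static boolean beamsAllowWordForcing(Map<String, Object> generationArguments) {
        Object beams = generationArguments.get("num_beams");
        return beams instanceof Number && ((Number) beams).intValue() > 1;
    }

    public static void main(String[] args) {
        Map<String, Object> generationArguments = new HashMap<>();
        generationArguments.put("max_new_tokens", 10);
        System.out.println(beamsAllowWordForcing(generationArguments));

        generationArguments.put("num_beams", 4);
        System.out.println(beamsAllowWordForcing(generationArguments));
    }
}
```

With the real class, the same setting would presumably be supplied through addGenerationArgument("num_beams", ...) before setWordForcer(true) takes effect.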
-
includeMoreVariations
This function adds more word variations to the set of words. It is applied to all words in the set, so the set should only contain similar words or variations. The variations include lower, upper, and title case as well as prefixing with a space.
- Parameters:
words
- words
- Returns:
- all variations of the words.
-
includeMoreVariations
This function adds more word variations to the set of words. It is applied to all words in the set, so the set should only contain similar words or variations. The variations include lower, upper, and title case as well as prefixing with a space.
- Parameters:
words
- words
- Returns:
- all variations of the words.
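The described variations (lower, upper, and title case, each also prefixed with a space) can be sketched as follows. WordVariationsSketch and its variations method are hypothetical names for illustration; the actual includeMoreVariations implementation may produce a different set or ordering.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical approximation of the described behavior, not the MELT code.
final class WordVariationsSketch {

    static Set<String> variations(Set<String> words) {
        Set<String> result = new LinkedHashSet<>();
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // nothing to vary
            }
            String lower = word.toLowerCase(Locale.ROOT);
            String upper = word.toUpperCase(Locale.ROOT);
            String title = word.substring(0, 1).toUpperCase(Locale.ROOT)
                    + word.substring(1).toLowerCase(Locale.ROOT);
            for (String variant : new String[] {word, lower, upper, title}) {
                result.add(variant);       // plain form
                result.add(" " + variant); // form prefixed with a space
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(variations(Collections.singleton("yes")));
    }
}
```

The space-prefixed forms matter because many tokenizers encode a word differently depending on whether it follows a space.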
-
getGenerationArguments
Returns the arguments which can be used for the generate function of the transformers library.
- Returns:
- the generation arguments
-
setGenerationArguments
Sets the arguments which can be used for the generate function of the transformers library.
- Parameters:
generationArguments
- the new generation arguments
-
addGenerationArgument
Adds a parameter that is passed to the generate function of the transformers library.
- Parameters:
key
- the key to use: possible options.
value
- the corresponding value
- Returns:
- the object to allow for further addGenerationArgument calls.
-
getLoadingArguments
Returns the parameters which are passed to the from_pretrained method.
- Returns:
- the loading arguments.
-
setLoadingArguments
Sets the arguments which are passed to the from_pretrained method.
- Parameters:
loadingArguments
- new loading arguments
-
addLoadingArgument
Adds a parameter that is passed to the from_pretrained method.
- Parameters:
key
- the key to use, e.g. load_in_8bit
value
- the corresponding value
- Returns:
- the object to allow for further addLoadingArgument calls.
-
addLoadingArguments
-
setTrainingArguments
Setting training arguments is not allowed; they are not used for LLMs.
- Overrides:
setTrainingArguments
in class TransformersBase
- Parameters:
configuration
- the trainer configuration
-
addTrainingArgument
Description copied from class: TransformersBase
Adds a training argument for the transformers trainer. Any of the training arguments which are listed in the documentation can be used.
- Overrides:
addTrainingArgument
in class TransformersBase
- Parameters:
key
- The key of the training argument, like warmup_ratio
value
- the corresponding value, like 0.2
-