eu.sealsproject.platform.res.tool.impl.AbstractPlugin

de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.LLMBase

All Implemented Interfaces: Filter, IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge

Direct Known Subclasses: LLMBinaryFilter, LLMChooseGivenEntityFilter

public abstract class LLMBase extends TransformersBase implements Filter

This filter asks an LLM which entity of the source best fits an entity of the target. Correspondences need to be provided so that candidates are available. Only correspondences that the LLM states to be useful are kept. The difference from LLMBinaryFilter is that all possible matches are given to the LLM.

Field Summary

Fields

Modifier and Type

Field

Description

protected File

debugFile

If set to an existing file, this class writes additional debug information to the corresponding file.

protected TransformersArguments

loadingArguments

The parameters which are passed to the from_pretrained method.

protected String

promt

The prompt to use for the LLM.

protected boolean

wordForcer

If set to true, constrained beam search is activated and the words yes and no are forced.

protected boolean

wordStopper

If set to true, the generation will be stopped if yes or no words appear.

Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
cudaVisibleDevices, extractor, modelName, multipleTextsToMultipleExamples, multiProcessing, trainingArguments, transformersCache, usingTensorflow

Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
FILE_PREFIX, FILE_SUFFIX
Constructor Summary

Constructors

Constructor

Description

LLMBase(TextExtractorMap extractor, String modelName, String promt)

Constructor with all required parameters and default values for optional parameters (can be changed by setters).

LLMBase(TextExtractor extractor, String modelName, String promt)

Constructor with all required parameters and default values for optional parameters (can be changed by setters).
Method Summary

Modifier and Type

Method

Description

LLMBase

addGenerationArgument(String key, Object value)

Add parameters which are passed to the generate function of transformers library.

LLMBase

addLoadingArgument(String key, Object value)

Adds a parameter which is passed to the from_pretrained method.

LLMBase

addLoadingArguments(TransformersArguments loadingArguments)

void

addTrainingArgument(String key, Object value)

Adds a training argument for the transformers trainer.

File

getDebugFile()

TransformersArguments

getGenerationArguments()

Returns the arguments which can be used for the generate function of transformers library.

TransformersArguments

getLoadingArguments()

Returns parameters which are passed to the from_pretrained method.

String

getPromt()

static Set<String>

includeMoreVariations(String... words)

This function adds more word variations to the set of words.

static Set<String>

includeMoreVariations(Set<String> words)

This function adds more word variations to the set of words.

boolean

isWordForcer()

boolean

isWordStopper()

protected List<List<Double>>

predictConfidences(File predictionFilePath, List<Set<String>> wordsToDetect)

Runs the huggingface transformers library.

void

setDebugFile(File debugFile)

void

setGenerationArguments(TransformersArguments generationArguments)

Set the arguments which can be used for the generate function of transformers library.

void

setLoadingArguments(TransformersArguments loadingArguments)

Set the arguments which are passed to the from_pretrained method.

void

setPromt(String promt)

void

setTrainingArguments(TransformersArguments configuration)

Setting training arguments is not allowed because they are not used for LLMs.

void

setWordForcer(boolean wordForcer)

When setting this option to true, the constrained beam search is activated and the words yes and no will be forced.

void

setWordStopper(boolean wordStopper)

If set to true the text generation will automatically stop if the word yes or no is generated.

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
getCudaVisibleDevices, getCudaVisibleDevicesButOnlyOneGPU, getExamplesForBatchSizeOptimization, getExtractor, getExtractorMap, getModelName, getMultiProcessing, getTextualRepresentation, getTrainingArguments, getTransformersCache, isMultipleTextsToMultipleExamples, isOptimizeForMixedPrecisionTraining, isUsingTensorflow, setCudaVisibleDevices, setCudaVisibleDevices, setExtractor, setExtractorMap, setModelName, setMultipleTextsToMultipleExamples, setMultiProcessing, setOptimizeForMixedPrecisionTraining, setTransformersCache, setUsingTensorflow, writeExamplesToFile

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
getModelSpec, match, match, readOntology

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
match

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
match

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
align, align, canExecute, getType

Methods inherited from class eu.sealsproject.platform.res.tool.impl.AbstractPlugin
getId, getVersion, setId, setVersion

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface eu.sealsproject.platform.res.tool.api.IPlugin
getId, getVersion

Field Details
- promt
  
  protected String promt
  
  The prompt to use for the LLM. Subclasses may interpret the prompt differently.
- debugFile
  
  protected File debugFile
  
  If set to an existing file, this class writes additional debug information to the corresponding file.
- wordStopper
  
  protected boolean wordStopper
  
  If set to true, the generation will be stopped if yes or no words appear.
- wordForcer
  
  protected boolean wordForcer
  
  If set to true, constrained beam search is activated and the words yes and no are forced.
- loadingArguments
  
  protected TransformersArguments loadingArguments
  
  The parameters which are passed to the from_pretrained method.
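
The wordForcer field relies on constrained beam search, which this page states requires the "num_beams" generation argument to be higher than one. The following is a minimal, self-contained sketch of such a precondition check. It uses a plain Map as a stand-in for the generation arguments; the class name WordForcerCheck and the method beamsAllowWordForcer are hypothetical and not part of MELT.

```java
import java.util.HashMap;
import java.util.Map;

public class WordForcerCheck {

    /**
     * Hypothetical helper: returns true if the given generation arguments
     * contain a numeric "num_beams" entry greater than one.
     */
    public static boolean beamsAllowWordForcer(Map<String, Object> generationArguments) {
        Object beams = generationArguments.get("num_beams");
        return beams instanceof Number && ((Number) beams).intValue() > 1;
    }

    public static void main(String[] args) {
        // Plain map used as a stand-in for the generation arguments.
        Map<String, Object> generationArguments = new HashMap<>();
        generationArguments.put("max_new_tokens", 10);
        System.out.println(beamsAllowWordForcer(generationArguments)); // prints "false"

        generationArguments.put("num_beams", 4);
        System.out.println(beamsAllowWordForcer(generationArguments)); // prints "true"
    }
}
```

With the real class, the corresponding calls would presumably be setWordForcer(true) together with addGenerationArgument("num_beams", ...) using a value above one.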
Constructor Details
- LLMBase
  
  public LLMBase(TextExtractorMap extractor, String modelName, String promt)
  
  Constructor with all required parameters and default values for optional parameters (can be changed by setters). It uses the system's default tmp dir to store the files with texts generated from the knowledge graphs. PyTorch is used instead of TensorFlow and all visible GPUs are used for prediction.
  
  Parameters:
  
  extractor - the extractor to select which text for each resource should be used.
  
  modelName - the model name, which can be a model id (a hosted model on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter pretrained_model_name_or_path of the from_pretrained function in the huggingface library). In case of a path, it should be absolute. The path can be generated by e.g. FileUtil.getCanonicalPathIfPossible(java.io.File)
  
  promt - the prompt to use for the LLM. Use {left} and {right} to insert the text representations of the left and right concepts.
- LLMBase
  
  public LLMBase(TextExtractor extractor, String modelName, String promt)
  
  Constructor with all required parameters and default values for optional parameters (can be changed by setters). It uses the system's default tmp dir to store the files with texts generated from the knowledge graphs. PyTorch is used instead of TensorFlow and all visible GPUs are used for prediction.
  
  Parameters:
  
  extractor - the extractor to select which text for each resource should be used.
  
  modelName - the model name, which can be a model id (a hosted model on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter pretrained_model_name_or_path of the from_pretrained function in the huggingface library). In case of a path, it should be absolute. The path can be generated by e.g. FileUtil.getCanonicalPathIfPossible(java.io.File)
  
  promt - the prompt to use for the LLM. Use {left} and {right} to insert the text representations of the left and right concepts.
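
Both constructors take a promt containing the placeholders {left} and {right}. As an illustration only, the sketch below shows how such a template could be filled with two text representations. The actual substitution is done internally by the library and may differ; the class PromptTemplateSketch, the method fillPrompt, and the example prompt text are hypothetical.

```java
public class PromptTemplateSketch {

    /**
     * Hypothetical helper: replaces the {left} and {right} placeholders
     * with the text representations of the two concepts.
     */
    public static String fillPrompt(String promt, String left, String right) {
        return promt.replace("{left}", left).replace("{right}", right);
    }

    public static void main(String[] args) {
        // Example template; any wording works as long as both placeholders appear.
        String promt = "Do \"{left}\" and \"{right}\" describe the same concept? Answer yes or no.";
        System.out.println(fillPrompt(promt, "heart attack", "myocardial infarction"));
    }
}
```

A template that ends with an explicit yes/no instruction pairs naturally with the wordStopper and wordForcer options, which both act on the words yes and no.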
Method Details
- predictConfidences
  
  protected List<List<Double>> predictConfidences(File predictionFilePath, List<Set<String>> wordsToDetect) throws Exception
  
  Runs the huggingface transformers library.
  
  Parameters:
  
  predictionFilePath - path to csv file with two columns (text left and text right).
  
  wordsToDetect - the words which should be detected
  
  Returns:
  
  a list of confidences
  
  Throws:
  
  Exception - in case something goes wrong.
- getPromt
  
  public String getPromt()
- setPromt
  
  public void setPromt(String promt)
- getDebugFile
  
  public File getDebugFile()
- setDebugFile
  
  public void setDebugFile(File debugFile)
- isWordStopper
  
  public boolean isWordStopper()
- setWordStopper
  
  public void setWordStopper(boolean wordStopper)
  
  If set to true the text generation will automatically stop if the word yes or no is generated.
  
  Parameters:
  
  wordStopper - if true, the generation stops automatically on yes or no.
- isWordForcer
  
  public boolean isWordForcer()
- setWordForcer
  
  public void setWordForcer(boolean wordForcer)
  
  When setting this option to true, the constrained beam search is activated and the words yes and no will be forced. This also means that the "num_beams" attribute in the generation arguments needs to be set to a number higher than one.
  
  Parameters:
  
  wordForcer - true or false
- includeMoreVariations
  
  public static Set<String> includeMoreVariations(String... words)
  
  This function adds more word variations to the set of words. It is applied to all words in the set, so the set should only contain similar words or variations. The variations include lower, upper, and title case as well as prefixing with a space.
  
  Parameters:
  
  words - words
  
  Returns:
  
  all variations of the words.
- includeMoreVariations
  
  public static Set<String> includeMoreVariations(Set<String> words)
  
  This function adds more word variations to the set of words. It is applied to all words in the set, so the set should only contain similar words or variations. The variations include lower, upper, and title case as well as prefixing with a space.
  
  Parameters:
  
  words - words
  
  Returns:
  
  all variations of the words.
- getGenerationArguments
  
  public TransformersArguments getGenerationArguments()
  
  Returns the arguments which can be used for the generate function of transformers library.
  
  Returns:
  
  the generation arguments
- setGenerationArguments
  
  public void setGenerationArguments(TransformersArguments generationArguments)
  
  Set the arguments which can be used for the generate function of transformers library.
  
  Parameters:
  
  generationArguments - the new generation arguments
- addGenerationArgument
  
  public LLMBase addGenerationArgument(String key, Object value)
  
  Add parameters which are passed to the generate function of transformers library.
  
  Parameters:
  
  key - the key to use (one of the options accepted by the generate function).
  
  value - the corresponding value
  
  Returns:
  
  the object to allow for further addGenerationArgument calls.
- getLoadingArguments
  
  public TransformersArguments getLoadingArguments()
  
  Returns parameters which are passed to the from_pretrained method.
  
  Returns:
  
  the loading arguments.
- setLoadingArguments
  
  public void setLoadingArguments(TransformersArguments loadingArguments)
  
  Set the arguments which are passed to the from_pretrained method.
  
  Parameters:
  
  loadingArguments - new loading arguments
- addLoadingArgument
  
  public LLMBase addLoadingArgument(String key, Object value)
  
  Adds a parameter which is passed to the from_pretrained method.
  
  Parameters:
  
  key - the key to use e.g. load_in_8bit
  
  value - the corresponding value
  
  Returns:
  
  the object to allow for further addLoadingArgument calls.
- addLoadingArguments
  
  public LLMBase addLoadingArguments(TransformersArguments loadingArguments)
- setTrainingArguments
  
  public void setTrainingArguments(TransformersArguments configuration)
  
  Setting training arguments is not allowed because they are not used for LLMs.
  
  Overrides:
  
  setTrainingArguments in class TransformersBase
  
  Parameters:
  
  configuration - the trainer configuration
- addTrainingArgument
  
  public void addTrainingArgument(String key, Object value)
  
  Description copied from class: TransformersBase
  
  Adds a training argument for the transformers trainer. Any of the training arguments which are listed on the documentation can be used.
  
  Overrides:
  
  addTrainingArgument in class TransformersBase
  
  Parameters:
  
  key - The key of the training argument like warmup_ratio
  
  value - the corresponding value like 0.2
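
The word variations described for includeMoreVariations (lower, upper, and title case, each also prefixed with a space) can be sketched as follows. This is a standalone approximation written from the description on this page, not the library's implementation; the class WordVariationsSketch and the method variations are hypothetical, and whether the real method also keeps the original spelling is not covered here.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class WordVariationsSketch {

    /**
     * Hypothetical re-creation of the described behaviour: for each word,
     * add lower, upper, and title case, each with and without a leading space.
     */
    public static Set<String> variations(String... words) {
        Set<String> result = new LinkedHashSet<>();
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // nothing to vary
            }
            String lower = word.toLowerCase(Locale.ROOT);
            String upper = word.toUpperCase(Locale.ROOT);
            String title = upper.substring(0, 1) + lower.substring(1);
            for (String variant : new String[] {lower, upper, title}) {
                result.add(variant);
                result.add(" " + variant); // tokenizers often treat " yes" and "yes" differently
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Six variants for the single word "yes".
        System.out.println(variations("yes"));
    }
}
```

Because the variations are applied to every word passed in, each call should receive only words that count as the same answer (for example only the yes-like words), matching the note above that the set should contain similar words.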
