All Implemented Interfaces:
Filter, IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge
Direct Known Subclasses:
LLMBinaryFilter, LLMChooseGivenEntityFilter

public abstract class LLMBase extends TransformersBase implements Filter
This filter asks an LLM which entity of the source fits best to an entity of the target. Correspondences need to be provided so that candidates are available. It will only keep correspondences which are stated to be useful. The difference to LLMBinaryFilter is that all possible matches are given to the LLM.
  • Field Details

    • promt

      protected String promt
      The prompt to use for the LLM. Subclasses may interpret the prompt differently.
    • debugFile

      protected File debugFile
      If set to an existing file, this class writes additional debug information to that file.
    • wordStopper

      protected boolean wordStopper
      If set to true, the generation will be stopped if yes or no words appear.
    • wordForcer

      protected boolean wordForcer
      If set to true, constrained beam search is activated and the words yes and no are forced during generation.
    • loadingArguments

      protected TransformersArguments loadingArguments
      Additional parameters which are passed to the from_pretrained method.
  • Constructor Details

    • LLMBase

      public LLMBase(TextExtractorMap extractor, String modelName, String promt)
      Constructor with all required parameters and default values for the optional parameters (these can be changed by setters). It uses the system's default temporary directory to store the files with texts generated from the knowledge graphs. PyTorch is used instead of TensorFlow, and all visible GPUs are used for prediction.
      Parameters:
      extractor - the extractor used to select which text should be used for each resource.
      modelName - the model name, which can be a model id (a hosted model on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter pretrained_model_name_or_path of the from_pretrained function in the Hugging Face library). In case of a path, it should be absolute. The path can be generated e.g. by FileUtil.getCanonicalPathIfPossible(java.io.File).
      promt - the prompt to use for the LLM. Use {left} and {right} to insert the text representation of the left and right concept.
    • LLMBase

      public LLMBase(TextExtractor extractor, String modelName, String promt)
      Constructor with all required parameters and default values for the optional parameters (these can be changed by setters). It uses the system's default temporary directory to store the files with texts generated from the knowledge graphs. PyTorch is used instead of TensorFlow, and all visible GPUs are used for prediction.
      Parameters:
      extractor - the extractor used to select which text should be used for each resource.
      modelName - the model name, which can be a model id (a hosted model on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter pretrained_model_name_or_path of the from_pretrained function in the Hugging Face library). In case of a path, it should be absolute. The path can be generated e.g. by FileUtil.getCanonicalPathIfPossible(java.io.File).
      promt - the prompt to use for the LLM. Use {left} and {right} to insert the text representation of the left and right concept.
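      A minimal construction sketch (not part of the original documentation): it assumes that the concrete subclass LLMBinaryFilter mirrors this constructor signature and that a suitable TextExtractor implementation is available; the model id is only an example of a hosted model on huggingface.co.

        // placeholder: use any TextExtractor implementation that selects a text representation per resource
        TextExtractor extractor = null; // replace with a real extractor

        // {left} and {right} are replaced by the text representation of the two concepts
        String prompt = "Are the following two concepts the same?\n"
                + "First concept: {left}\nSecond concept: {right}\nAnswer: ";

        // assumed constructor of the subclass (see "Direct Known Subclasses" above)
        LLMBinaryFilter filter = new LLMBinaryFilter(extractor, "google/flan-t5-base", prompt);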
  • Method Details

    • predictConfidences

      protected List<List<Double>> predictConfidences(File predictionFilePath, List<Set<String>> wordsToDetect) throws Exception
      Runs the Hugging Face transformers library.
      Parameters:
      predictionFilePath - path to a CSV file with two columns (text left and text right).
      wordsToDetect - the words which should be detected
      Returns:
      a list of confidences
      Throws:
      Exception - in case something goes wrong.
    • getPromt

      public String getPromt()
    • setPromt

      public void setPromt(String promt)
    • getDebugFile

      public File getDebugFile()
    • setDebugFile

      public void setDebugFile(File debugFile)
    • isWordStopper

      public boolean isWordStopper()
    • setWordStopper

      public void setWordStopper(boolean wordStopper)
      If set to true, the text generation will automatically stop if the word yes or no is generated.
      Parameters:
      wordStopper - if true, the generation stops automatically on yes or no.
    • isWordForcer

      public boolean isWordForcer()
    • setWordForcer

      public void setWordForcer(boolean wordForcer)
      When this option is set to true, constrained beam search is activated and the words yes and no will be forced. This also means that the "num_beams" attribute in the generation arguments needs to be set to a number higher than one.
      Parameters:
      wordForcer - true or false
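      A minimal sketch of enabling the word forcer (filter is assumed to be an already constructed LLMBase subclass; the beam count is only an example):

        filter.setWordForcer(true);
        // constrained beam search requires "num_beams" to be larger than one (see above)
        filter.addGenerationArgument("num_beams", 5);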
    • includeMoreVariations

      public static Set<String> includeMoreVariations(String... words)
      This function adds more word variations to the set of words. The variations are applied to all words in the set; thus, the set should only contain similar words or variations of each other. The added variations include lower, upper, and title case as well as prefixing with a space.
      Parameters:
      words - words
      Returns:
      all variations of the words.
    • includeMoreVariations

      public static Set<String> includeMoreVariations(Set<String> words)
      This function adds more word variations to the set of words. The variations are applied to all words in the set; thus, the set should only contain similar words or variations of each other. The added variations include lower, upper, and title case as well as prefixing with a space.
      Parameters:
      words - words
      Returns:
      all variations of the words.
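      A small usage sketch; based on the description above, the returned set covers case variations and space-prefixed forms of the given words (the exact contents shown in the comment are illustrative):

        Set<String> yesWords = LLMBase.includeMoreVariations("yes", "yeah");
        // expected to contain e.g. "yes", "Yes", "YES", " yes", "yeah", "Yeah", ...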
    • getGenerationArguments

      public TransformersArguments getGenerationArguments()
      Returns the arguments which can be used for the generate function of the transformers library.
      Returns:
      the generation arguments
    • setGenerationArguments

      public void setGenerationArguments(TransformersArguments generationArguments)
      Set the arguments which can be used for the generate function of the transformers library.
      Parameters:
      generationArguments - the new generation arguments
    • addGenerationArgument

      public LLMBase addGenerationArgument(String key, Object value)
      Adds a parameter which is passed to the generate function of the transformers library.
      Parameters:
      key - the key to use; see the documentation of the generate function for the possible options.
      value - the corresponding value
      Returns:
      the object to allow for further addGenerationArgument calls.
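      Because the filter itself is returned, calls can be chained. A sketch (the keys are standard arguments of the transformers generate function and are only examples, not prescribed by this class):

        filter.addGenerationArgument("max_new_tokens", 10)
              .addGenerationArgument("num_beams", 5);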
    • getLoadingArguments

      public TransformersArguments getLoadingArguments()
      Returns parameters which are passed to the from_pretrained method.
      Returns:
      the loading arguments.
    • setLoadingArguments

      public void setLoadingArguments(TransformersArguments loadingArguments)
      Set the arguments which are passed to the from_pretrained method.
      Parameters:
      loadingArguments - new loading arguments
    • addLoadingArgument

      public LLMBase addLoadingArgument(String key, Object value)
      Adds a parameter which is passed to the from_pretrained method.
      Parameters:
      key - the key to use, e.g. load_in_8bit
      value - the corresponding value
      Returns:
      the object to allow for further addLoadingArgument calls.
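      A sketch of passing loading options (load_in_8bit is the example key mentioned above; device_map is another common from_pretrained argument and is used here only as an illustration):

        filter.addLoadingArgument("load_in_8bit", true)
              .addLoadingArgument("device_map", "auto");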
    • addLoadingArguments

      public LLMBase addLoadingArguments(TransformersArguments loadingArguments)
    • setTrainingArguments

      public void setTrainingArguments(TransformersArguments configuration)
      Does not allow setting training arguments because they are not used for LLMs.
      Overrides:
      setTrainingArguments in class TransformersBase
      Parameters:
      configuration - the trainer configuration
    • addTrainingArgument

      public void addTrainingArgument(String key, Object value)
      Description copied from class: TransformersBase
      Adds a training argument for the transformers trainer. Any of the training arguments which are listed in the documentation can be used.
      Overrides:
      addTrainingArgument in class TransformersBase
      Parameters:
      key - The key of the training argument like warmup_ratio
      value - the corresponding value like 0.2