All Implemented Interfaces:
IMatcher<org.apache.jena.ontology.OntModel, Alignment, Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge
Direct Known Subclasses:
SentenceTransformersMatcher, TransformersBaseFineTuner, TransformersFilter

public abstract class TransformersBase
extends MatcherYAAAJena
This is the base class for all transformer-based components. It only contains shared fields together with their getters and setters.
  • Field Details

    • LOGGER

      private static final org.slf4j.Logger LOGGER
    • extractor

      protected TextExtractor extractor
    • modelName

      protected String modelName
    • trainingArguments

      protected TransformersTrainerArguments trainingArguments
    • usingTensorflow

      protected boolean usingTensorflow
    • cudaVisibleDevices

      protected String cudaVisibleDevices
    • transformersCache

      protected File transformersCache
    • multiProcessing

      protected TransformersMultiProcessing multiProcessing
    • multipleTextsToMultipleExamples

      protected boolean multipleTextsToMultipleExamples
  • Constructor Details

  • Method Details

    • getExtractor

      public TextExtractor getExtractor()
      Returns the text extractor which extracts text from a given resource. This is the text which represents a resource.
      Returns:
      the text extractor
    • setExtractor

      public void setExtractor​(TextExtractor extractor)
      Sets the extractor which computes the text from a given resource. This is the text which represents a resource.
      Parameters:
      extractor - the text extractor
    • getModelName

      public String getModelName()
      Returns the model name, which can be a model id (a hosted model on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter pretrained_model_name_or_path of the from_pretrained function in the huggingface library).
      Returns:
      the model name as a string
    • setModelName

      public void setModelName​(String modelName)
      Sets the model name, which can be a model id (a hosted model on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter pretrained_model_name_or_path of the from_pretrained function in the huggingface library). In case of a path, it should be absolute. Such a path can be generated by e.g. FileUtil.getCanonicalPathIfPossible(java.io.File).
      Parameters:
      modelName - the model name as a string
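
      The requirement of an absolute path can be illustrated with a minimal sketch. ModelPathSketch and toAbsoluteModelPath are hypothetical names that are not part of this API, and the fallback behaviour is an assumption about what a helper such as FileUtil.getCanonicalPathIfPossible might do, not its confirmed implementation.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper for illustration only; not part of the documented API.
final class ModelPathSketch {

    // Resolves a local model directory to an absolute path string.
    // Assumption: if canonicalization fails, the plain absolute path is good enough.
    static String toAbsoluteModelPath(File modelDirectory) {
        try {
            // Resolves relative segments against the current working directory.
            return modelDirectory.getCanonicalPath();
        } catch (IOException e) {
            return modelDirectory.getAbsolutePath();
        }
    }
}
```

      The resulting string could then be passed to setModelName(String) instead of a relative path.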
    • getTrainingArguments

      public TransformersTrainerArguments getTrainingArguments()
      Returns the training arguments of the huggingface trainer. Any of the training arguments can be used. For further documentation, see TransformersTrainerArguments.
      Returns:
      the training arguments
    • setTrainingArguments

      public void setTrainingArguments​(TransformersTrainerArguments configuration)
      Sets the training arguments of the huggingface trainer. Any of the training arguments can be used. For further documentation, see TransformersTrainerArguments.
      Parameters:
      configuration - the trainer configuration
    • isUsingTensorflow

      public boolean isUsingTensorflow()
      Returns whether tensorflow is used to train and run the models. If true, the models are run with tensorflow. If false, pytorch is used.
      Returns:
      true, if tensorflow is used. false, if pytorch is used.
    • setUsingTensorflow

      public void setUsingTensorflow​(boolean usingTensorflow)
      Sets whether tensorflow is used. If set to true, tensorflow is used. If set to false, pytorch is used.
      Parameters:
      usingTensorflow - true to use tensorflow and false to use pytorch.
    • getCudaVisibleDevices

      public String getCudaVisibleDevices()
      Returns a string which is set to the environment variable CUDA_VISIBLE_DEVICES to select on which GPU the process should run. If null or empty, the default is used (all available GPUs).
      Returns:
      the variable CUDA_VISIBLE_DEVICES
    • setCudaVisibleDevices

      public void setCudaVisibleDevices​(String cudaVisibleDevices)
      Sets the environment variable CUDA_VISIBLE_DEVICES to select on which GPUs the process should run. If null or the string is empty, the default is used (all available GPUs). If multiple GPUs should be used, the values should be comma separated. Example: "0" to use only the first GPU, "1,3" to use the second and fourth GPU. The use of setCudaVisibleDevices(int...) is preferred because it is more type safe.
      Parameters:
      cudaVisibleDevices - the string which is set to the environment variable CUDA_VISIBLE_DEVICES
    • setCudaVisibleDevices

      public void setCudaVisibleDevices​(int... cudaVisibleDevices)
      Sets the environment variable CUDA_VISIBLE_DEVICES to select on which GPUs the process should run. If no values are provided, then all available GPUs are used. If multiple GPUs should be used, then provide the values one after the other. All indices are zero based. So call setCudaVisibleDevices(0,1) to use the first two GPUs.
      Parameters:
      cudaVisibleDevices - the zero-based indices of the GPUs which should be used.
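
      The relation between the two setCudaVisibleDevices overloads can be sketched as follows. This is a minimal illustration that assumes the int... variant simply joins the indices with commas; CudaDevicesSketch and toCudaVisibleDevices are hypothetical names, not part of this API.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of the documented API.
final class CudaDevicesSketch {

    // Joins zero-based GPU indices into the comma separated format
    // expected by the CUDA_VISIBLE_DEVICES environment variable.
    // No indices yield an empty string, which stands for the default (all GPUs).
    static String toCudaVisibleDevices(int... gpuIndices) {
        return Arrays.stream(gpuIndices)
                .mapToObj(String::valueOf)
                .collect(Collectors.joining(","));
    }
}
```

      Under this assumption, calling the sketch with (1, 3) gives the same string "1,3" that the String overload expects for the second and fourth GPU.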
    • getTransformersCache

      public File getTransformersCache()
      Returns the cache folder where the pretrained transformers models are stored. If set to null, the default location is used (which is usually ~/.cache/huggingface/transformers/).
      Returns:
      the transformers cache folder.
    • setTransformersCache

      public void setTransformersCache​(File transformersCache)
      Sets the cache folder where the pretrained transformers models are stored. If set to null, the default location is used (which is usually ~/.cache/huggingface/transformers/). This setter is useful if the default location does not have enough space available. In that case, set it to a folder which has enough free space.
      Parameters:
      transformersCache - The transformers cache folder.
    • getMultiProcessing

      public TransformersMultiProcessing getMultiProcessing()
      Returns the multiprocessing value of the transformer. The transformers library may not free all memory from GPU. Thus the prediction and training are wrapped in an external process. This enum defines how the process is started and if multiprocessing should be used at all. Default is to use the system dependent default.
      Returns:
      the enum which represents the multiprocessing start method.
    • setMultiProcessing

      public void setMultiProcessing​(TransformersMultiProcessing multiProcessing)
      Sets the multiprocessing value of the transformer. The transformers library may not free all memory from GPU. Thus the prediction and training are wrapped in an external process. This enum defines how the process is started and if multiprocessing should be used at all. Default is to use the system dependent default.
      Parameters:
      multiProcessing - the enum which represents the multiprocessing start method.
    • setOptimizeForMixedPrecisionTraining

      public void setOptimizeForMixedPrecisionTraining​(boolean mpt)
      Enables or disables mixed precision training. This optimizes the runtime of training.
      Parameters:
      mpt - true to enable mixed precision training
    • isOptimizeForMixedPrecisionTraining

      public boolean isOptimizeForMixedPrecisionTraining()
      Returns whether mixed precision training is enabled or disabled.
      Returns:
      true if mixed precision training is enabled.
    • isMultipleTextsToMultipleExamples

      public boolean isMultipleTextsToMultipleExamples()
      Returns whether all texts returned by the text extractor are used separately to generate the examples. Otherwise all texts are concatenated to form one example (the default). This should only be enabled when the extractor does not return many texts, because otherwise a lot of examples are produced.
      Returns:
      true, if generation of multiple examples is enabled
    • setMultipleTextsToMultipleExamples

      public void setMultipleTextsToMultipleExamples​(boolean multipleTextsToMultipleExamples)
      If set to true, all texts returned by the text extractor are used separately to generate the examples. Otherwise all texts are concatenated to form one example (the default). This should only be enabled when the extractor does not return many texts, because otherwise a lot of examples are produced.
      Parameters:
      multipleTextsToMultipleExamples - true, to enable the generation of multiple examples.
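
      Why this option can produce a lot of examples may be easier to see in a small sketch. It assumes that an example is a pair of texts (one per resource), that the separate mode pairs every text of one resource with every text of the other, and that concatenation uses a single space. None of these details are confirmed by this page, and ExampleGenerationSketch and buildExamples are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the documented API.
final class ExampleGenerationSketch {

    // Builds (left, right) text pairs for two resources.
    // multipleExamples == true : one example per combination of texts (assumed behaviour).
    // multipleExamples == false: all texts of a resource are joined into one example.
    static List<String[]> buildExamples(List<String> leftTexts, List<String> rightTexts,
                                        boolean multipleExamples) {
        List<String[]> examples = new ArrayList<>();
        if (multipleExamples) {
            for (String left : leftTexts) {
                for (String right : rightTexts) {
                    examples.add(new String[] {left, right});
                }
            }
        } else {
            examples.add(new String[] {
                String.join(" ", leftTexts), String.join(" ", rightTexts)
            });
        }
        return examples;
    }
}
```

      Under these assumptions, two texts on one side and three on the other already give six examples instead of one, so the number of examples grows with the product of the text counts.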
    • copyCSVLines

      protected boolean copyCSVLines​(File source, File target, int numberOfCSVLines) throws IOException
      Copies the first part of a CSV file to another file. This is used to find the best batch size.
      Parameters:
      source - the source file
      target - the target file
      numberOfCSVLines - how many lines should be copied
      Returns:
      true if enough lines are found in the input file.
      Throws:
      IOException - in case of any io exception
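
      The contract of copyCSVLines can be approximated with the following minimal sketch. It is not the actual implementation: it assumes UTF-8 files and treats every physical line as one CSV record, so quoted fields containing line breaks would need a real CSV parser. CsvCopySketch and copyLines are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper for illustration only; not part of the documented API.
final class CsvCopySketch {

    // Copies at most numberOfLines lines from source to target.
    // Returns true only if the source contained at least that many lines.
    static boolean copyLines(File source, File target, int numberOfLines) throws IOException {
        int copied = 0;
        try (BufferedReader reader = Files.newBufferedReader(source.toPath(), StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(target.toPath(), StandardCharsets.UTF_8)) {
            String line;
            while (copied < numberOfLines && (line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
                copied++;
            }
        }
        return copied >= numberOfLines;
    }
}
```

      A false return value tells the caller that the source file is smaller than the requested sample, which is useful when probing increasing batch sizes.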