java.lang.Object
de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.services.labelToConcept.nGramTokenizers.MaxGramLeftToRightTokenizer
All Implemented Interfaces:
LeftToRightTokenizer, OneToManyLinkingStrategy

public class MaxGramLeftToRightTokenizer extends Object implements LeftToRightTokenizer, OneToManyLinkingStrategy
This tokenizer assists in linking labels that consist of multiple concepts to the most specific concept possible. The accompanying unit test illustrates the full range of supported behavior.
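The call protocol suggested by the method names documented on this page (fetch an initial token, test it, then report success or failure to obtain the next candidate until the tokenizer terminates) can be sketched as follows. This is a minimal, self-contained illustration and not the MELT implementation: `StandInTokenizer`, `linkAll`, and the set of known concepts are hypothetical, and the assumed behavior (try the longest n-gram first, shrink it from the right on failure, return null once terminated) is inferred from the class name and field names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

final class TokenizerProtocolSketch {

    /** Hypothetical stand-in, NOT the MELT class; its behavior is an assumption. */
    static final class StandInTokenizer {
        private final String[] tokens;
        private final String delimiter;
        private final List<String> notLinked = new ArrayList<>();
        private int startIndex = 0;
        private int endIndexExclusive;
        private boolean terminated = false;

        StandInTokenizer(String[] tokens, String delimiter) {
            this.tokens = tokens;
            this.delimiter = delimiter;
            this.endIndexExclusive = tokens.length; // start with the longest n-gram
        }

        String getInitialToken() {
            return current();
        }

        /** Last candidate matched: continue right after it, again with the longest n-gram. */
        String getNextTokenSuccessful() {
            startIndex = endIndexExclusive;
            endIndexExclusive = tokens.length;
            return current();
        }

        /** Last candidate failed: shrink it, or give up on a single word and move on. */
        String getNextTokenNotSuccessful() {
            if (endIndexExclusive - startIndex > 1) {
                endIndexExclusive--;
            } else {
                notLinked.add(tokens[startIndex]);
                startIndex++;
                endIndexExclusive = tokens.length;
            }
            return current();
        }

        boolean isTerminated() {
            return terminated;
        }

        List<String> getNotLinked() {
            return notLinked;
        }

        /** Joins the current window, or marks termination when the input is used up. */
        private String current() {
            if (startIndex >= tokens.length) {
                terminated = true;
                return null;
            }
            return String.join(delimiter,
                    Arrays.copyOfRange(tokens, startIndex, endIndexExclusive));
        }
    }

    /** Drives the tokenizer against a set of known concept labels (hypothetical lookup). */
    static List<String> linkAll(StandInTokenizer tokenizer, Set<String> knownConcepts) {
        List<String> linked = new ArrayList<>();
        String token = tokenizer.getInitialToken();
        while (!tokenizer.isTerminated()) {
            if (knownConcepts.contains(token)) {
                linked.add(token);
                token = tokenizer.getNextTokenSuccessful();
            } else {
                token = tokenizer.getNextTokenNotSuccessful();
            }
        }
        return linked;
    }

    public static void main(String[] args) {
        StandInTokenizer tokenizer =
                new StandInTokenizer(new String[] {"european", "union", "law", "xyz"}, " ");
        System.out.println(linkAll(tokenizer, Set.of("european union", "law")));
        System.out.println(tokenizer.getNotLinked());
    }
}
```

In this sketch the candidate "european union" is preferred over a shorter match because longer n-grams are tested first, which is one way to read "the most specific concept possible".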
  • Field Details

    • LOGGER

      private static final org.slf4j.Logger LOGGER
    • endIndexExclusive

      private int endIndexExclusive
    • startIndex

      private int startIndex
    • terminated

      private boolean terminated
    • delimiter

      private String delimiter
      Used to concatenate single tokens. The Classic data set, for instance, requires a space between the words of a multi-word label.
    • notLinked

      private ArrayList<String> notLinked
      List of terms that could not be linked.
  • Constructor Details

    • MaxGramLeftToRightTokenizer

      public MaxGramLeftToRightTokenizer(String[] arrayToLink, String delimiter)
      Creates a tokenizer for the given token array.
      Parameters:
      arrayToLink - The array of tokens that shall be linked.
      delimiter - The delimiter used to concatenate single tokens.
  • Method Details

    • getNextTokenNotSuccessful

      public String getNextTokenNotSuccessful()
      Get a new token based on the information that the last string tested was not successful.
      Specified by:
      getNextTokenNotSuccessful in interface LeftToRightTokenizer
      Returns:
      String representation for the next test.
    • getNextTokenSuccessful

      public String getNextTokenSuccessful()
      Get a new token based on the information that the last string tested was successful.
      Specified by:
      getNextTokenSuccessful in interface LeftToRightTokenizer
      Returns:
      String representation for the next test.
    • getInitialToken

      public String getInitialToken()
      Gets the very first token to test. This method can only be called as long as the process has not terminated.
      Specified by:
      getInitialToken in interface LeftToRightTokenizer
      Returns:
      String representation for the next test.
    • processArrayForLookup

      String processArrayForLookup(String[] arrayToConvert, int start, int end)
      Cuts the given array as specified and concatenates the components as defined by the delimiter.
      Parameters:
      arrayToConvert - The array to be cut.
      start - Start index of cut.
      end - End index of cut.
      Returns:
      Single String of the components, joined by the delimiter.
    • getArrayToLink

      public String[] getArrayToLink()
      Get the token sequence that is to be linked.
      Returns:
      Token sequence that is to be linked.
    • setArrayToLink

      public void setArrayToLink(String[] arrayToLink)
    • isTerminated

      public boolean isTerminated()
    • getDelimiter

      public String getDelimiter()
    • setDelimiter

      public void setDelimiter(String delimiter)
    • getNotLinked

      public ArrayList<String> getNotLinked()
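The cut-and-concatenate step described for processArrayForLookup can be sketched as below. This is an illustration under stated assumptions rather than the library's code: `joinSlice` is a hypothetical helper, the end index is treated as exclusive (the page does not say so explicitly; the field name endIndexExclusive merely suggests it), and the delimiter is passed as a parameter instead of being read from the instance.

```java
import java.util.Arrays;

final class JoinSliceSketch {

    /**
     * Hypothetical helper: cuts tokens[start, endExclusive) and joins the
     * pieces with the given delimiter. The exclusive end index is an assumption.
     */
    static String joinSlice(String[] tokens, int start, int endExclusive, String delimiter) {
        return String.join(delimiter, Arrays.copyOfRange(tokens, start, endExclusive));
    }

    public static void main(String[] args) {
        String[] tokens = {"european", "union", "law"};
        System.out.println(joinSlice(tokens, 0, 2, " ")); // prints "european union"
        System.out.println(joinSlice(tokens, 1, 3, "_")); // prints "union_law"
    }
}
```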