Class MaxGramLeftToRightTokenizer
java.lang.Object
de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.services.labelToConcept.nGramTokenizers.MaxGramLeftToRightTokenizer
- All Implemented Interfaces:
LeftToRightTokenizer, OneToManyLinkingStrategy
public class MaxGramLeftToRightTokenizer
extends Object
implements LeftToRightTokenizer, OneToManyLinkingStrategy
This tokenizer assists in linking labels that consist of multiple concepts to the most specific concept
possible.
For the full range of supported behavior, see the extensive unit test for this class.
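The idea of linking a multi-concept label to the most specific concepts can be illustrated with a small, self-contained sketch. It is not the MELT implementation: the class `MaxGramLinkingSketch`, the method `link`, and the `knownConcepts` set (which stands in for whatever concept lookup is used) are hypothetical names introduced only for illustration. The sketch assumes a greedy strategy that tries the longest token run first and shrinks it from the right until a lookup succeeds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of max-gram, left-to-right linking; not part of MELT. */
final class MaxGramLinkingSketch {

    /**
     * Links the longest run of tokens that matches a known concept at each position.
     * knownConcepts is an assumed stand-in for a real concept lookup.
     */
    static List<String> link(String[] tokens, String delimiter, Set<String> knownConcepts) {
        List<String> linked = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            boolean found = false;
            // Try the longest n-gram first, then shrink from the right (end is exclusive).
            for (int end = tokens.length; end > start; end--) {
                String candidate = String.join(delimiter, Arrays.copyOfRange(tokens, start, end));
                if (knownConcepts.contains(candidate)) {
                    linked.add(candidate);
                    start = end; // continue right after the matched run
                    found = true;
                    break;
                }
            }
            if (!found) {
                start++; // no concept starts at this token; skip it
            }
        }
        return linked;
    }

    public static void main(String[] args) {
        Set<String> concepts = Set.of("hot dog", "stand", "dog");
        String[] tokens = {"hot", "dog", "stand"};
        System.out.println(link(tokens, " ", concepts)); // prints [hot dog, stand]
    }
}
```

In this example "hot dog" is preferred over the shorter "dog" because the longer candidate is tested first.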
-
Field Summary
-
Constructor Summary
Constructor    Description
MaxGramLeftToRightTokenizer(String[] arrayToLink, String delimiter)    Constructor
-
Method Summary
Modifier and Type    Method    Description
String[]    getArrayToLink()    Get the token sequence that is to be linked.
String    getInitialToken()    Getting the very first string formation.
String    getNextTokenNotSuccessful()    Get a new token based on the information that the last string tested was not successful.
String    getNextTokenSuccessful()    Get a new token based on the information that the last string tested was successful.
boolean    isTerminated()
(package private) String    processArrayForLookup(String[] arrayToConvert, int start, int end)    Cuts the given array as specified and concatenates the components as defined by the delimiter.
void    setArrayToLink(String[] arrayToLink)
void    setDelimiter(String delimiter)
-
Field Details
-
LOGGER
private static final org.slf4j.Logger LOGGER -
arrayToLink
-
endIndexExclusive
private int endIndexExclusive -
startIndex
private int startIndex -
terminated
private boolean terminated -
delimiter
Used to concatenate single tokens. The Classic data set, for instance, requires a space between the words of labels that are made up of more than one word. -
notLinked
List of terms that could not be linked.
-
-
Constructor Details
-
MaxGramLeftToRightTokenizer
Constructor
- Parameters:
arrayToLink - The array that shall be linked.
delimiter - The delimiter.
-
-
Method Details
-
getNextTokenNotSuccessful
Get a new token based on the information that the last string tested was not successful.
- Specified by:
getNextTokenNotSuccessful in interface LeftToRightTokenizer
- Returns:
- String representation for the next test.
-
getNextTokenSuccessful
Get a new token based on the information that the last string tested was successful.
- Specified by:
getNextTokenSuccessful in interface LeftToRightTokenizer
- Returns:
- String representation for next trial.
-
getInitialToken
Getting the very first string formation. This method can only be called as long as the process is not terminated.
- Specified by:
getInitialToken in interface LeftToRightTokenizer
- Returns:
- String representation for next test.
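The three token methods above form a feedback protocol: the caller tests each returned string and reports whether the test succeeded. The following sketch shows one way such a loop could be driven. It uses a hypothetical stand-in class, `TokenizerProtocolSketch`, that only mirrors the documented method names. Its internal behavior (shrinking from the right on failure, resuming after the match on success, returning null once terminated) is an assumption made for illustration and is not taken from the MELT source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in that mirrors the documented method names; not the MELT class. */
final class TokenizerProtocolSketch {
    private final String[] tokens;
    private final String delimiter;
    private int start;
    private int endExclusive;
    private boolean terminated;

    TokenizerProtocolSketch(String[] tokens, String delimiter) {
        this.tokens = tokens;
        this.delimiter = delimiter;
        this.terminated = tokens.length == 0;
    }

    String getInitialToken() {
        start = 0;
        endExclusive = tokens.length;
        return current();
    }

    String getNextTokenSuccessful() {
        start = endExclusive;          // resume right after the matched run
        endExclusive = tokens.length;
        return current();
    }

    String getNextTokenNotSuccessful() {
        endExclusive--;                // drop the right-most token
        if (endExclusive <= start) {   // even the single token failed: move on
            start++;
            endExclusive = tokens.length;
        }
        return current();
    }

    boolean isTerminated() {
        return terminated;
    }

    private String current() {
        if (start >= tokens.length) {
            terminated = true;
            return null;               // assumption: null signals that nothing is left
        }
        return String.join(delimiter, Arrays.copyOfRange(tokens, start, endExclusive));
    }

    /** Drives the protocol; the known set is an assumed stand-in for a concept lookup. */
    static List<String> linkAll(String[] tokens, String delimiter, Set<String> known) {
        TokenizerProtocolSketch tokenizer = new TokenizerProtocolSketch(tokens, delimiter);
        List<String> linked = new ArrayList<>();
        String candidate = tokenizer.getInitialToken();
        while (!tokenizer.isTerminated()) {
            if (known.contains(candidate)) {
                linked.add(candidate);
                candidate = tokenizer.getNextTokenSuccessful();
            } else {
                candidate = tokenizer.getNextTokenNotSuccessful();
            }
        }
        return linked;
    }

    public static void main(String[] args) {
        String[] tokens = {"council", "of", "the", "european", "union"};
        Set<String> known = Set.of("council", "european union");
        System.out.println(linkAll(tokens, " ", known)); // prints [council, european union]
    }
}
```

Checking `isTerminated()` before each lookup keeps the loop from testing a string after the token sequence has been exhausted.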
-
processArrayForLookup
Cuts the given array as specified and concatenates the components as defined by the delimiter.
- Parameters:
arrayToConvert - The array to be cut.
start - Start index of cut.
end - End index of cut.
- Returns:
- Single String of the selected components, joined by the delimiter.
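A cut-and-join step of this kind could look like the sketch below. The class `ArraySliceJoinSketch` and the method `sliceAndJoin` are hypothetical names, and treating `end` as exclusive is an assumption (suggested by the `endIndexExclusive` field, but not confirmed by this page).

```java
import java.util.Arrays;

/** Hypothetical illustration of cutting a token array and joining the slice. */
final class ArraySliceJoinSketch {

    /** Joins tokens[start..end) with the delimiter; an exclusive end index is assumed. */
    static String sliceAndJoin(String[] tokens, int start, int end, String delimiter) {
        return String.join(delimiter, Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        String[] tokens = {"european", "union", "law"};
        System.out.println(sliceAndJoin(tokens, 0, 2, " ")); // prints european union
        System.out.println(sliceAndJoin(tokens, 1, 3, "_")); // prints union_law
    }
}
```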
-
getArrayToLink
Get the token sequence that is to be linked.
- Returns:
- Token sequence that is to be linked.
-
setArrayToLink
-
isTerminated
public boolean isTerminated() -
getDelimiter
-
setDelimiter
-
getNotLinked
-