Class StringUtil
java.lang.Object
de.uni_mannheim.informatik.dws.melt.matching_ml.python.StringUtil
A collection of useful String operations that can be used for matcher development.
-
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
static boolean containsMostlyNumbers(String term)
static int damerauLevenshtein(String compOne, String compTwo)
static double damerauLevenshteinNormalised
static int editDistance(String a, String b, boolean cased)
static double editDistanceNormalised
static String exactLength(String in, int length)
private static double getMaxLength(String a, String b)
private static double getNormalised(double editDistance, double maxLength)
static String getProcessedString(String text)
getTokensWithoutStopword
static boolean isPrefix
static boolean isSuffix
removeStopwords(List<String> tokens)
removeStopwords(List<String> tokens, Set<String> stopwords)
tokenize - Make tokens out of a String.
static String tokenizeToString(String text)
-
Field Details
-
tokenMap
-
myFormat
-
ENGLISH_STOPWORDS
A set of English stopwords.
-
Constructor Details
-
StringUtil
public StringUtil()
-
Method Details
-
tokenize
Make tokens out of a String.
Parameters:
text - String to be tokenized.
Returns:
A list of tokens.
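The entry above does not say how the text is split. The following is a minimal sketch of one plausible tokenizer, not the MELT implementation. The class name TokenizeSketch, the camelCase splitting, the split on non-alphanumeric characters, the lowercasing, and the List<String> return type are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for StringUtil.tokenize; the real splitting rules
// are not documented on this page.
final class TokenizeSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        if (text == null) {
            return tokens;
        }
        // Assumption: break camelCase at lower-to-upper boundaries,
        // e.g. "hasName" becomes "has Name".
        String spaced = text.replaceAll("(?<=[a-z])(?=[A-Z])", " ");
        // Assumption: any run of characters that are neither letters nor
        // digits separates two tokens.
        for (String part : spaced.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

Under these assumptions, a label such as "hasFirst_name" yields the tokens has, first, name.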
-
tokenizeToString
-
containsMostlyNumbers
-
getProcessedString
-
getTokensWithoutStopword
-
removeStopwords
-
removeStopwords
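The two removeStopwords overloads are listed with their parameters only. The sketch below shows how such a pair could fit together. It is an illustration, not the library code: the class name StopwordSketch, the List<String> return type, the exact-match comparison, and the small SAMPLE_STOPWORDS set (a placeholder, not the contents of ENGLISH_STOPWORDS) are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the two removeStopwords overloads.
final class StopwordSketch {

    // Placeholder set for illustration only; NOT StringUtil.ENGLISH_STOPWORDS.
    static final Set<String> SAMPLE_STOPWORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "is"));

    // Keeps every token that is not contained in the given stopword set.
    // Assumption: tokens are compared exactly, so callers lowercase first.
    static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopwords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Assumption: the one-argument overload delegates to a default set.
    static List<String> removeStopwords(List<String> tokens) {
        return removeStopwords(tokens, SAMPLE_STOPWORDS);
    }
}
```

The input list is left unmodified and the order of the remaining tokens is preserved.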
-
editDistance
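Only the signature editDistance(String a, String b, boolean cased) is given here. As a rough illustration, the sketch below computes a classic Levenshtein distance and treats cased = true as "compare case-sensitively". Both the choice of Levenshtein and that reading of the flag are assumptions, as is the class name EditDistanceSketch.

```java
import java.util.Locale;

// Hypothetical sketch; the actual metric behind StringUtil.editDistance
// is not stated on this page.
final class EditDistanceSketch {

    static int editDistance(String a, String b, boolean cased) {
        // Assumption: cased == false means case is ignored.
        if (!cased) {
            a = a.toLowerCase(Locale.ROOT);
            b = b.toLowerCase(Locale.ROOT);
        }
        // Two-row dynamic programme: prev holds distances for the prefix of
        // a that is one character shorter, curr is the row being filled.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // transforming "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // transforming a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

With this sketch, "kitten" and "sitting" are three edits apart, while "Hello" and "hello" differ by one edit only when cased is true.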
-
editDistanceNormalised
-
isSuffix
-
isPrefix
-
damerauLevenshtein
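For damerauLevenshtein(String compOne, String compTwo), the page again shows the signature only. The sketch below implements the restricted variant of Damerau-Levenshtein (optimal string alignment), which extends Levenshtein by counting a swap of two adjacent characters as one edit. Whether the library uses this restricted variant or the unrestricted one is not stated, so treat the choice and the class name DamerauLevenshteinSketch as assumptions.

```java
// Hypothetical sketch of a Damerau-Levenshtein distance in its restricted
// (optimal string alignment) form.
final class DamerauLevenshteinSketch {

    static int damerauLevenshtein(String compOne, String compTwo) {
        int n = compOne.length();
        int m = compTwo.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i; // i deletions
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j; // j insertions
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = compOne.charAt(i - 1) == compTwo.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution
                // Transposition of two adjacent characters counts as one edit.
                if (i > 1 && j > 1
                        && compOne.charAt(i - 1) == compTwo.charAt(j - 2)
                        && compOne.charAt(i - 2) == compTwo.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[n][m];
    }
}
```

The difference from plain Levenshtein shows on swapped neighbours: "ca" and "ac" are one edit apart here, whereas Levenshtein counts two.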
-
damerauLevenshteinNormalised
-
exactLength
-
getNormalised
private static double getNormalised(double editDistance, double maxLength)
-
getMaxLength
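The two private helpers getNormalised and getMaxLength suggest that the normalised variants scale a raw edit distance by the length of the longer input. The page does not give the formula, so the sketch below is an assumption: it divides the distance by the maximum length, giving 0.0 for identical strings and 1.0 for strings with nothing in common. The library may instead return a similarity such as one minus this value. The class name NormalisedDistanceSketch is hypothetical.

```java
// Hypothetical sketch of the normalisation helpers; the formula is an
// assumption, not taken from the MELT source.
final class NormalisedDistanceSketch {

    // Length of the longer of the two strings, returned as a double to
    // match the signature shown in the method summary.
    static double getMaxLength(String a, String b) {
        return Math.max(a.length(), b.length());
    }

    // Assumption: normalised distance = editDistance / maxLength, in [0, 1].
    static double getNormalised(double editDistance, double maxLength) {
        if (maxLength == 0) {
            return 0.0; // two empty strings are treated as identical
        }
        return editDistance / maxLength;
    }
}
```

For "kitten" and "sitting", an edit distance of 3 and a maximum length of 7 give a normalised value of 3/7 under this assumption.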
-