Class StringUtil
java.lang.Object
de.uni_mannheim.informatik.dws.melt.matching_ml.python.StringUtil
public class StringUtil extends Object
A collection of useful String operations that can be used for matcher development.
-
Field Summary
-
Constructor Summary
Constructors
StringUtil()
-
Method Summary
Modifier and Type   Method   Description
static boolean containsMostlyNumbers(String term)
static int damerauLevenshtein(String compOne, String compTwo)
static double damerauLevenshteinNormalised(String a, String b)
static int editDistance(String a, String b, boolean cased)
static double editDistanceNormalised(String a, String b)
static String exactLength(String in, int length)
private static double getMaxLength(String a, String b)
private static double getNormalised(double editDistance, double maxLength)
static String getProcessedString(String text)
static List<String> getTokensWithoutStopword(String text)
static boolean isPrefix(String s1, String s2)
static boolean isSuffix(String s1, String s2)
static List<String> removeStopwords(List<String> tokens)
static List<String> removeStopwords(List<String> tokens, Set<String> stopwords)
static List<String> tokenize(String text)
    Make tokens out of a String.
static String tokenizeToString(String text)
-
Field Details
-
tokenMap
-
myFormat
-
ENGLISH_STOPWORDS
A set of English stopwords.
-
-
Constructor Details
-
StringUtil
public StringUtil()
-
-
Method Details
-
tokenize
Make tokens out of a String.
Parameters:
text - String to be tokenized.
Returns:
A list of tokens.
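The page does not say which rules tokenize applies, so the following is only a minimal sketch of what a tokenizer for ontology labels and URI fragments might look like. The class name TokenizeSketch and the splitting rules (whitespace, underscores, hyphens, lower-to-upper camelCase boundaries, then lower-casing) are assumptions made for illustration and are not taken from StringUtil.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for illustration; this is NOT the MELT StringUtil class.
final class TokenizeSketch {

    // Assumed rules: split on whitespace, '_', '-', and lower-to-upper camelCase boundaries.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        if (text == null) {
            return tokens;
        }
        // Insert a space at each lower-to-upper transition, e.g. "hasPart" becomes "has Part".
        String spaced = text.replaceAll("(?<=[a-z])(?=[A-Z])", " ");
        for (String piece : spaced.split("[\\s_\\-]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

Under these assumed rules, TokenizeSketch.tokenize("hasPart_of") returns the list [has, part, of].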
-
tokenizeToString
-
containsMostlyNumbers
-
getProcessedString
-
getTokensWithoutStopword
-
removeStopwords
-
removeStopwords
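No description is given for the two removeStopwords overloads, so the sketch below only illustrates the filtering that their signatures suggest. It assumes that the one-argument overload falls back to a default stopword set; the class StopwordSketch and the small SAMPLE_STOPWORDS set are hypothetical and do not reproduce the contents of ENGLISH_STOPWORDS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for illustration; this is NOT the MELT StringUtil class.
final class StopwordSketch {

    // Tiny sample set (assumption); the real ENGLISH_STOPWORDS contents are not listed on this page.
    static final Set<String> SAMPLE_STOPWORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "and"));

    // Returns a new list holding every token that is not in the given stopword set, in order.
    static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopwords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Assumed behaviour of the one-argument overload: delegate with a default set.
    static List<String> removeStopwords(List<String> tokens) {
        return removeStopwords(tokens, SAMPLE_STOPWORDS);
    }
}
```

With the sample set, the tokens [the, name, of, author] are reduced to [name, author]. The input list is left unmodified, which keeps the helper safe to call on shared token lists.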
-
editDistance
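The meaning of the cased flag is not documented here. The sketch below assumes that editDistance is a Levenshtein distance and that cased = false means both strings are lower-cased before comparison. Both points are assumptions, and EditDistanceSketch is a hypothetical class rather than the library code.

```java
import java.util.Locale;

// Hypothetical stand-in for illustration; this is NOT the MELT StringUtil class.
final class EditDistanceSketch {

    // Levenshtein distance using two rolling rows; case is ignored when cased is false (assumed).
    static int editDistance(String a, String b, boolean cased) {
        if (!cased) {
            a = a.toLowerCase(Locale.ROOT);
            b = b.toLowerCase(Locale.ROOT);
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Under these assumptions, comparing "Person" with "person" gives 1 when cased is true and 0 when it is false.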
-
editDistanceNormalised
-
isSuffix
-
isPrefix
-
damerauLevenshtein
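For orientation, the sketch below computes the restricted Damerau-Levenshtein distance (also called optimal string alignment), which extends Levenshtein distance with transpositions of adjacent characters. This page does not say which variant damerauLevenshtein uses, so the choice of the restricted variant is an assumption, and DamerauLevenshteinSketch is a hypothetical class.

```java
// Hypothetical stand-in for illustration; this is NOT the MELT StringUtil class.
final class DamerauLevenshteinSketch {

    // Restricted Damerau-Levenshtein (optimal string alignment) distance.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all i characters
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all j characters
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), // deletion, insertion
                        d[i - 1][j - 1] + cost);                    // match or substitution
                // Transposition of two adjacent characters.
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }
}
```

The difference from plain Levenshtein shows on swapped neighbours: "ab" and "ba" are at distance 1 here, whereas Levenshtein distance counts 2.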
-
damerauLevenshteinNormalised
-
exactLength
-
getNormalised
private static double getNormalised(double editDistance, double maxLength)
-
getMaxLength
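The two private helpers are not described on this page. A common way to normalise an edit distance is to divide it by the length of the longer string, and the signatures suggest something along those lines. The sketch below shows that convention only; it is an assumption, and the library may instead return a similarity (for example 1 minus this ratio). NormalisationSketch and its method names are hypothetical.

```java
// Hypothetical stand-in for illustration; this is NOT the MELT StringUtil class.
final class NormalisationSketch {

    // Length of the longer string, returned as a double so the later division is not integer division.
    static double maxLength(String a, String b) {
        return Math.max(a.length(), b.length());
    }

    // Assumed convention: distance divided by the longer length, giving a value in [0, 1].
    static double normalised(double editDistance, double maxLength) {
        if (maxLength == 0.0) {
            return 0.0; // two empty strings are treated as identical (assumption)
        }
        return editDistance / maxLength;
    }
}
```

For "kitten" and "sitting" (edit distance 3, longer length 7) this convention yields 3/7, roughly 0.43.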
-