Convert a Unicode string to ASCII in Java in a way that works on Unix/Linux

I have tried using Normalizer:
    String s = "口水雞 hello Ä";
    String s1 = Normalizer.normalize(s, Normalizer.Form.NFKD);
    String regex = Pattern.quote("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");
    String s2 = new String(s1.replaceAll(regex, "").getBytes("ascii"), "ascii");
    System.out.println(s2);
    System.out.println(s.length() == s2.length());
I want this to work on Unix/Linux.

There is an ASCII character class, \p{ASCII}, that matches the code points in the ASCII set:
    String s = "口水雞 hello Ä";
    String s1 = Normalizer.normalize(s, Normalizer.Form.NFKD);
    String nonAscii = "[^\\p{ASCII}]+";
    String s2 = s1.replaceAll(nonAscii, "");
    System.out.println(s2);
    System.out.println(s.length() == s2.length());
As Joop Eggan notes, the Java String and char types are UTF-16. ASCII-encoded data can exist only in byte form:
    byte[] ascii = s2.getBytes(StandardCharsets.US_ASCII);
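
For reference, the steps above can be put together into one small program. This is only a sketch: the class name AsciiFold and the helper toAscii are made up for illustration, and the sample string is written with \u escapes on the assumption that the compiler's default source encoding on the target machine may not be UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class AsciiFold {

    // Hypothetical helper: decompose the input (NFKD), then drop every
    // code point that is not in the ASCII set.
    static String toAscii(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);
        return decomposed.replaceAll("[^\\p{ASCII}]+", "");
    }

    public static void main(String[] args) {
        // Same text as "口水雞 hello Ä", escaped so the source file itself is pure ASCII.
        String s = "\u53e3\u6c34\u96de hello \u00c4";

        String folded = toAscii(s);
        // Safe to encode as US-ASCII now: no unmappable characters remain.
        byte[] ascii = folded.getBytes(StandardCharsets.US_ASCII);

        System.out.println("[" + folded + "]");               // prints "[ hello A]"
        System.out.println(ascii.length == folded.length());  // prints "true"
    }
}
```

Note that the CJK characters have no ASCII decomposition, so they are removed entirely, while Ä decomposes to A plus a combining diaeresis and only the A survives. The result is therefore shorter than the input.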