Convert Unicode string to ASCII in Java which works in Unix/Linux


I have tried using Normalizer:

    import java.text.Normalizer;
    import java.util.regex.Pattern;

    String s = "口水雞 hello Ä";

    String s1 = Normalizer.normalize(s, Normalizer.Form.NFKD);
    String regex = Pattern.quote("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

    String s2 = new String(s1.replaceAll(regex, "").getBytes("ascii"), "ascii");

    System.out.println(s2);
    System.out.println(s.length() == s2.length());

I want this to work on Unix/Linux.

There is an ASCII character class, \p{ASCII}, that matches code points in the ASCII set:

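    import java.text.Normalizer;

    String s = "口水雞 hello Ä";

    // Decompose accented characters into base letter + combining mark.
    String s1 = Normalizer.normalize(s, Normalizer.Form.NFKD);
    // Match any run of characters outside the ASCII range.
    String nonAscii = "[^\\p{ASCII}]+";
    String s2 = s1.replaceAll(nonAscii, "");

    System.out.println(s2);
    System.out.println(s.length() == s2.length());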
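
For reference, here is the same approach as a small self-contained class that can be compiled and run as is. The class name AsciiFolder and the method name toAscii are made up for this sketch. The input literal is written with \u escapes so that the result does not depend on which source-file encoding javac assumes, something that can differ between platforms.

```java
import java.text.Normalizer;

// Hypothetical helper class; the names are illustrative only.
final class AsciiFolder {

    /** Decomposes the input with NFKD, then drops every non-ASCII character. */
    static String toAscii(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);
        return decomposed.replaceAll("[^\\p{ASCII}]+", "");
    }

    public static void main(String[] args) {
        // Same text as above: three CJK characters, " hello ", and A-umlaut.
        String s = "\u53e3\u6c34\u96de hello \u00c4";
        String s2 = toAscii(s);

        System.out.println("[" + s2 + "]");                // prints "[ hello A]"
        System.out.println(s.length() == s2.length());     // prints "false"
    }
}
```

The CJK characters have no ASCII decomposition, so they are removed outright, while Ä decomposes to A plus a combining diaeresis and only the mark is stripped. The lengths therefore differ (11 versus 8).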

As Joop Eggan notes, Java's String and char types are UTF-16. You can only have ASCII-encoded data in byte form:

    byte[] ascii = s2.getBytes(StandardCharsets.US_ASCII);
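
A short sketch of what the byte conversion does, assuming you want to see the difference between a string that has already been stripped and one that has not. The class name AsciiBytesDemo and the helper isPureAscii are invented for illustration. String.getBytes(Charset) substitutes the charset's replacement byte for anything it cannot encode, which for US-ASCII is '?', so stripping first is what keeps question marks out of the output.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
final class AsciiBytesDemo {

    /** Returns true if every character of s can be encoded as US-ASCII. */
    static boolean isPureAscii(String s) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // Already stripped: one byte per character, nothing replaced.
        String stripped = " hello A";
        byte[] ascii = stripped.getBytes(StandardCharsets.US_ASCII);
        System.out.println(ascii.length);                  // prints "8"

        // Not stripped: the unmappable A-umlaut becomes '?'.
        String unstripped = "hello \u00c4";
        byte[] lossy = unstripped.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(lossy, StandardCharsets.US_ASCII)); // prints "hello ?"

        System.out.println(isPureAscii(stripped));         // prints "true"
        System.out.println(isPureAscii(unstripped));       // prints "false"
    }
}
```

Checking with canEncode before converting is one way to detect input that would otherwise be silently replaced.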
