The problem I faced:
opencsv was corrupting the Japanese characters in my data.
Myths
- CSV cannot hold all types of Unicode characters. (It can. Even Notepad can.)
- FileWriter is not good for handling all types of Unicode characters.
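To check the second myth for yourself, here is a minimal sketch that writes a CSV row containing Japanese text and reads it back. It assumes you want to name the encoding explicitly instead of relying on the platform default, which is what FileWriter falls back to when no charset is given. The class name Utf8CsvDemo and the method roundTrip are made up for this illustration.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of opencsv.
public class Utf8CsvDemo {

    // Writes the text to a temporary CSV file as UTF-8, then reads it back as UTF-8.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("utf8-demo", ".csv");
            try {
                // The charset is named explicitly, so the platform default does not matter.
                try (Writer out = new OutputStreamWriter(
                        new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
                    out.write(text);
                }
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes for three Japanese characters, so the source file encoding is irrelevant.
        String row = "1,\u65E5\u672C\u8A9E\n";
        System.out.println(row.equals(roundTrip(row)));
    }
}
```

If the text survives the round trip, the file format and the writer are not the culprits, and the corruption has to be happening earlier.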
What was failing?
- The ResultSetHelperService class of opencsv was corrupting the data at the point where it calls rs.getString().
How?
I still need to figure this out :( But of course, it must be that the data is not being converted using the correct character set.
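I have not confirmed the exact mechanism, so the following is only a sketch of the kind of mismatch I suspect: bytes that are really UTF-8 being decoded with a different character set. ISO-8859-1 is used here purely as an example of a wrong charset; I do not know which one was actually involved. The class and method names are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing how a charset mismatch garbles text.
public class MojibakeDemo {

    // Decodes the bytes with a charset that does not match how they were encoded.
    static String decodeWrong(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Decodes the bytes with the charset they were actually encoded in.
    static String decodeRight(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes for three Japanese characters.
        String original = "\u65E5\u672C\u8A9E";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        System.out.println(original.equals(decodeWrong(raw))); // mismatch: text is garbled
        System.out.println(original.equals(decodeRight(raw))); // match: text is recovered
    }
}
```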
What was the solution?
I derived a child class of ResultSetHelperService and overrode getColumnValues. I copied everything and made one small change.
Instead of
value = rs.getString(colIndex)
I replaced it with
value = new String(rs.getBytes(colIndex), "UTF-8")
and it worked!
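One thing to watch with the replacement above: rs.getBytes() returns null for a SQL NULL, and passing null to the String constructor throws a NullPointerException. Here is a minimal null-safe sketch of the same idea. It assumes the column bytes really are UTF-8 and that Java 7 or later is available for StandardCharsets. The class Utf8Column and both method names are hypothetical helpers, not part of opencsv.

```java
import java.nio.charset.StandardCharsets;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper, not part of opencsv.
public class Utf8Column {

    // Decodes raw column bytes as UTF-8, passing a SQL NULL through as null.
    static String decodeUtf8(byte[] raw) {
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    // Reads the column as bytes and decodes it, instead of calling rs.getString().
    static String getUtf8String(ResultSet rs, int colIndex) throws SQLException {
        return decodeUtf8(rs.getBytes(colIndex));
    }
}
```

In the subclass, the changed line would then read value = Utf8Column.getUtf8String(rs, colIndex). Using StandardCharsets.UTF_8 instead of the string "UTF-8" also avoids having to handle UnsupportedEncodingException.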
I also read that with newer versions of Java and Oracle it just works. But with MySQL 3.0 and JDBC 4 it didn't work.
References:
- "The classes java.io.InputStreamReader, java.io.OutputStreamWriter, java.lang.String, and classes in the java.nio.charset package can convert between Unicode and a number of other character encodings." (http://docs.oracle.com/javase/6/docs/technotes/guides/intl/encoding.doc.html)
- http://stackoverflow.com/questions/5892163/should-i-be-using-jdbc-getnstring-instead-of-getstring
- http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html
- http://stackoverflow.com/questions/496321/utf8-utf16-and-utf32