Monday, June 24, 2013

opencsv and Japanese characters


The problem I faced:

opencsv was corrupting my Japanese characters.

Myths
- CSV cannot hold all types of Unicode characters. (It can. Even Notepad can.)
- FileWriter is not good for handling all types of Unicode characters.
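
To back up the first point, here is a minimal sketch that writes one CSV line containing Japanese text and reads it back unchanged. It pins the encoding with OutputStreamWriter and InputStreamReader instead of relying on the platform default. The class name Utf8CsvDemo and its two helper methods are made up for illustration, the comma join is deliberately naive (no quoting or escaping), and StandardCharsets and UncheckedIOException need Java 7 and Java 8 respectively.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of opencsv.
public class Utf8CsvDemo {

    // Joins the fields with commas and encodes the line as UTF-8 bytes.
    // Naive on purpose: no quoting or escaping of commas inside fields.
    public static byte[] writeCsvLine(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) {
                    w.write(',');
                }
                w.write(fields[i]);
            }
            w.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decodes the bytes as UTF-8 and returns the first line.
    public static String readFirstLine(byte[] data) {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            return r.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "\u65e5\u672c\u8a9e" is the word for "Japanese language", written with
        // escapes so the source file encoding does not matter.
        byte[] csv = writeCsvLine("id", "\u65e5\u672c\u8a9e");
        System.out.println(readFirstLine(csv).equals("id,\u65e5\u672c\u8a9e"));
    }
}
```

The same pattern works with a FileOutputStream in place of the in-memory stream, as long as the writer and the reader use the same encoding.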

What was failing?

- The rs.getString() call inside opencsv's ResultSetHelperService class was corrupting the data.

How?

I still need to figure this out :( But of course it must not be decoding the data with the correct character set.

What was the solution?

I derived a child class of ResultSetHelperService and overrode getColumnValues. I copied everything and made one small change.

instead of

value =  rs.getString(colIndex)

I replaced it with

value =  new String(rs.getBytes(colIndex), "UTF-8")

and it worked !!!
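
The effect can be reproduced without a database. The sketch below is only a guess at the mechanism: it assumes the column bytes were UTF-8 and that something on the way decoded them with a different charset. ISO-8859-1 stands in for that unknown charset; I have not confirmed which one the driver actually used. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use JDBC or opencsv.
public class DecodeDemo {

    // Same idea as new String(rs.getBytes(colIndex), "UTF-8").
    public static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Assumption: stands in for a driver decoding with the wrong charset.
    public static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes so the source
        // file encoding does not matter.
        String original = "\u65e5\u672c\u8a9e";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8); // 9 bytes

        System.out.println(decodeUtf8(raw).equals(original));   // prints true
        System.out.println(decodeLatin1(raw).equals(original)); // prints false
        System.out.println(decodeLatin1(raw).length());         // prints 9
    }
}
```

Decoding with the wrong charset turns each of the three characters into three unrelated ones, which is the kind of garbage I was seeing in the CSV.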

I also read that with newer versions of Java and Oracle it just works. But with MySQL 3.0 and JDBC 4 it didn't work.

References:

- The classes java.io.InputStreamReader, java.io.OutputStreamWriter, java.lang.String, and classes in the java.nio.charset package can convert between Unicode and a number of other character encodings. (http://docs.oracle.com/javase/6/docs/technotes/guides/intl/encoding.doc.html)

- http://stackoverflow.com/questions/5892163/should-i-be-using-jdbc-getnstring-instead-of-getstring

- http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html

- http://stackoverflow.com/questions/496321/utf8-utf16-and-utf32