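I have seen many questions online about garbled Chinese text. Based on what I have recently learned about handling Chinese text, I wrote the example below; anyone interested can try it. I have tested it on Chinese Windows and on English Linux, and it seems to work.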
import java.io.*;

public class Test {
    String utf8file = "utf8.txt";
    String gbfile = "gbk.txt";
    String big5file = "big5.txt";

    public Test() {
    }

    // Encodes str to bytes using fromCharset (the platform default if blank),
    // then decodes those same bytes as toCharset.
    public String encode(String str, String fromCharset, String toCharset) {
        String newstr = "";
        try {
            if (fromCharset.trim().length() == 0) {
                newstr = new String(str.getBytes(), toCharset);
            } else {
                newstr = new String(str.getBytes(fromCharset), toCharset);
            }
        } catch (UnsupportedEncodingException uee) {
            uee.printStackTrace();
        }
        return newstr;
    }
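
    // Reads the whole file, decoding it with charset (the platform default if blank).
    // readLine() drops line terminators, so the lines are joined without separators.
    public String readfile(String filename, String charset) {
        StringBuilder str = new StringBuilder();
        String dataline;
        // try-with-resources closes the reader and the underlying stream.
        try (FileInputStream pydata = new FileInputStream(filename);
             BufferedReader in = new BufferedReader(
                     charset.trim().length() == 0
                             ? new InputStreamReader(pydata)
                             : new InputStreamReader(pydata, charset))) {
            while ((dataline = in.readLine()) != null) {
                str.append(dataline);
            }
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
        return str.toString();
    }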

    public void showMessage() {
        // lc keeps the locale that was the default when the program started.
        java.util.Locale lc = java.util.Locale.getDefault();
        // setDefault is a static method, so call it on the class rather than on lc.
        java.util.Locale.setDefault(java.util.Locale.US);
        System.out.println("Locale:" + lc.toString());

        String utf8str = readfile(utf8file, "UTF-8");
        System.out.println("Read from UTF-8 file:\n" + utf8str);
        // Turn the text into GBK bytes and reinterpret each byte as an ISO8859-1 character.
        utf8str = encode(utf8str, "GBK", "iso8859_1");
        System.out.println("UTF-8 String after encode:\n" + utf8str);

        String gbkstr = readfile(gbfile, "GBK");
        System.out.println("Read from GBK file:\n" + gbkstr);
        gbkstr = encode(gbkstr, "GBK", "iso8859_1");
        System.out.println("GBK String after encode:\n" + gbkstr);

        String big5str = readfile(big5file, "BIG5");
        System.out.println("Read from BIG5 file:\n" + big5str);
        big5str = encode(big5str, "GBK", "iso8859_1");
        System.out.println("BIG5 String after encode:\n" + big5str);
    }

    public static void main(String[] args) {
        Test t = new Test();
        t.showMessage();
    }
}
On the Chinese system, the strings read from the files already display correctly. The output is as follows:
D:\source\77>java Test
Locale:zh_CN
Read from UTF-8 file:
简体中文企业免激活版-删除
UTF-8 String after encode:
??ò???????ó?????¤??°?-????
Read from GBK file:
还能跟你留点什么的,或许能慰藉一些思念之苦吧
GBK String after encode:
?????ú???????????????ò?í?????????????????à°?
Read from BIG5 file:
香味,好像還意猶未竟的,弘史看在眼裡不由得感 嘆!
BIG5 String after encode:
???????????????q???????????·???????e???????? ?@??
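A possible reason why the "after encode" lines above are mostly question marks with a few surviving letters such as ò and ó: if the Windows console stream encodes output as GBK (an assumption, the post does not say so), every ISO8859-1 character that GBK lacks is replaced by '?', while the handful that GBK does contain (some accented pinyin letters and symbols) pass through. The sketch below checks this with a hypothetical helper class, `GbkConsoleCheck`, which is not part of the original program.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical demo class; not part of the original Test program.
class GbkConsoleCheck {
    // Encodes text as GBK and decodes it again. Characters that GBK cannot
    // represent come back as '?', the encoder's replacement byte, which is
    // also what a GBK-encoding output stream writes for them.
    static String asSeenOnGbkConsole(String text) {
        Charset gbk = Charset.forName("GBK");
        return new String(text.getBytes(gbk), gbk);
    }

    // Reports whether GBK has a code for the given character.
    static boolean gbkCanEncode(char c) {
        CharsetEncoder encoder = Charset.forName("GBK").newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        // '\u00f2' is o with grave accent, '\u00d6' is O with diaeresis.
        // Unicode escapes keep this source file pure ASCII.
        System.out.println("GBK has U+00F2: " + gbkCanEncode('\u00f2'));
        System.out.println("GBK has U+00D6: " + gbkCanEncode('\u00d6'));
        // Compare by length and content rather than printing non-ASCII text.
        String seen = asSeenOnGbkConsole("\u00d6\u00f2");
        System.out.println("first char replaced: " + (seen.charAt(0) == '?'));
    }
}
```

If this is what happens, the garbled lines on Windows are expected: the transcoded string is only useful when the output stream writes one byte per character.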
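On Linux with LANG=en_US, however, the strings read from the files display as ??????????????, and they only display correctly after transcoding. In every case the transcoding goes from GBK to ISO8859-1, which I do not fully understand: I know that it works, but not why. I hope we can discuss it.
Output: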
[root@rmi2 test]# java Test
Locale:en_US
Read from UTF-8 file:
???????????-??
count = 0, total = 788
UTF-8 String after encode:
简体中文企业免激活版-删除
Read from GBK file:
??????????????????????
GBK String after encode:
还能跟你留点什么的,或许能慰藉一些思念之苦吧
Read from BIG5 file:
?????????????????????? ??
BIG5 String after encode:
香味,好像還意猶未竟的,弘史看在眼裡不由得感 嘆!
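One plausible explanation for the Linux result, offered as a sketch and not a confirmed diagnosis: with LANG=en_US the JVM's default output encoding is probably ISO8859-1 (or ASCII), which cannot represent Chinese characters, so printing them directly yields '?'. Turning the text into GBK bytes and reinterpreting each byte as one ISO8859-1 character makes the stream write those GBK bytes unchanged, and a terminal that decodes its input as GBK then shows Chinese. Both the default encoding and the terminal's charset are assumptions here, and `Latin1PassThrough` is a hypothetical class name.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class; not part of the original Test program.
class Latin1PassThrough {
    static final Charset GBK = Charset.forName("GBK");
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    // Same operation as encode(str, "GBK", "iso8859_1") in the listing:
    // each GBK byte becomes one char in the range U+0000..U+00FF.
    static String gbkBytesAsLatin1(String text) {
        return new String(text.getBytes(GBK), LATIN1);
    }

    // Returns the bytes that a PrintStream using streamCharset writes for text.
    // The in-memory stream stands in for System.out.
    static byte[] printedBytes(String text, Charset streamCharset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, streamCharset.name());
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source stays ASCII.
        String text = "\u4e2d\u6587";

        // Printed directly through an ISO8859-1 stream: two '?' bytes.
        byte[] direct = printedBytes(text, LATIN1);
        // Printed after the GBK to ISO8859-1 step: the raw GBK bytes.
        byte[] trick = printedBytes(gbkBytesAsLatin1(text), LATIN1);
        // Printed through a stream that encodes as GBK itself.
        byte[] explicit = printedBytes(text, GBK);

        System.out.println(Arrays.toString(direct));                   // [63, 63]
        System.out.println(Arrays.equals(trick, text.getBytes(GBK)));  // true
        System.out.println(Arrays.equals(trick, explicit));            // true
    }
}
```

If the terminal really does expect GBK, wrapping the output once, for example `new PrintStream(System.out, true, "GBK")`, should produce the same bytes without transcoding every string by hand. It would also explain why the BIG5 file needs GBK and not BIG5 as the target: the charset that matters is the one the terminal decodes with, not the one the file was stored in.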
Download the three files: UTF8, GBK, BIG5