python读入中文文本编码错误
python读入中文txt文本:
#coding:utf-8
def readFile():
fp = open('emotion_dict//neg//neg_all_dict.txt','r')
list = []
for line in fp:
list.append(line)
fp.close()
print(list)
readFile()
但是有时候会出现错误提示:
**UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 10: illegal multibyte sequence
**
此时,需要对代码做一个小的调整,就可以读入中文,即以中文二进制'rb'读入txt,然后转换为'utf-8',具体代码如下:
#coding:utf-8
def readFile():
fp = open('emotion_dict//neg//neg_all_dict.txt','rb')
list = []
for line in fp.readlines():
line = line.strip()
line = line.decode('utf-8')
list.append(line)
fp.close()
print(list)
readFile()
手机扫一扫
移动阅读更方便
你可能感兴趣的文章