Encoding errors when reading Chinese text in Python
Date: 2023-07-08

Reading a Chinese txt file in Python:

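# coding: utf-8

def readFile():
    # No encoding is given, so open() falls back to the platform default
    # (for example GBK on a Chinese-locale Windows system).
    lines = []
    with open('emotion_dict//neg//neg_all_dict.txt', 'r') as fp:
        for line in fp:
            lines.append(line)
    print(lines)

readFile()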

Sometimes, however, this fails with the following error:

**UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 10: illegal multibyte sequence**
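
A likely cause, assuming the file was saved as UTF-8 while open() fell back to GBK as the platform default, is that the bytes on disk are not valid in the codec used to decode them. The sketch below reproduces that mismatch in memory. The sample string and the helper try_decode are made up for illustration and are not part of the original post.

```python
# Hypothetical sample text; it stands in for one line of the dictionary file.
text = '中文文本'
raw = text.encode('utf-8')  # the bytes as a UTF-8 file would store them


def try_decode(data, encoding):
    """Return the decoded string, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Decoding with the encoding the bytes were written in round-trips cleanly.
print(try_decode(raw, 'utf-8'))

# A lone 0x80 byte, the byte named in the error message, cannot be decoded as GBK.
print(try_decode(b'\x80', 'gbk'))
```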

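When this happens, a small change to the code is enough to read the Chinese text: open the txt file in binary mode ('rb') so that no implicit decoding takes place, then decode each line as 'utf-8'. The code is as follows: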

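# coding: utf-8

def readFile():
    lines = []
    # Open in binary mode so that each line is read as raw bytes.
    with open('emotion_dict//neg//neg_all_dict.txt', 'rb') as fp:
        for line in fp:
            line = line.strip()          # drop the newline and surrounding whitespace
            line = line.decode('utf-8')  # bytes -> str, assuming the file is UTF-8
            lines.append(line)
    print(lines)

readFile()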
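
In Python 3, the same result can probably be reached without binary mode by passing the encoding straight to open(). This is a minimal sketch, assuming the file is UTF-8. The function name read_lines and the temporary demo file are hypothetical and only there so that the example runs anywhere.

```python
import os
import tempfile


def read_lines(path, encoding='utf-8'):
    """Read a text file and return its lines without surrounding whitespace."""
    with open(path, 'r', encoding=encoding) as fp:
        return [line.strip() for line in fp]


# Demo on a throwaway UTF-8 file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo_dict.txt')
    with open(demo_path, 'w', encoding='utf-8') as fp:
        fp.write('难过\n失望\n')
    demo_lines = read_lines(demo_path)
    print(demo_lines)
```

For the file used in this post, the call would be read_lines('emotion_dict//neg//neg_all_dict.txt'). If the file starts with a byte order mark, passing encoding='utf-8-sig' strips it while decoding.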