When reading a .csv file with the pandas library, the following error occurs:
My Code:
import pandas as pd
df = pd.read_csv('C:\\Users\\登亮\\Desktop\\test.csv', encoding='utf-8')
Error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte
Reason:
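Take the byte 0xE9 as an example. In binary, 0xE9 looks like 1110 1001. If you read about UTF-8 on Wikipedia, you’ll see that such a byte must be followed by two bytes of the form 10xx xxxx. The same rule explains the 0xd3 in the error above: 1101 0011 must be followed by one byte of that form, and in this file it is not. So, for example: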
>>> b'\xe9\x80\x80'.decode('utf-8')
u'\u9000'
But that’s just the mechanical cause of the exception. In this case, you have a string that is almost certainly encoded in Latin-1. You can see how UTF-8 and Latin-1 look different:
>>> u'\xe9'.encode('utf-8')
b'\xc3\xa9'
>>> u'\xe9'.encode('latin-1')
b'\xe9'
(Note, I'm using a mix of Python 2 and 3 representation here. The input is valid in any version of Python, but your Python interpreter is unlikely to actually show both unicode and byte strings in this way.)
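The byte-pattern rule described above can be checked with a small sketch. The helper name expected_continuation_bytes is made up for illustration; it only looks at the leading bits of a single byte and does not attempt full UTF-8 validation (overlong forms and out-of-range code points are ignored).

```python
def expected_continuation_bytes(lead: int) -> int:
    """Return how many 10xx xxxx bytes UTF-8 expects after this lead byte.

    Hypothetical helper for illustration: it inspects only the high bits
    of one byte and is not a complete UTF-8 validator.
    """
    if (lead & 0b1000_0000) == 0b0000_0000:
        return 0  # 0xxx xxxx: plain ASCII, stands alone
    if (lead & 0b1110_0000) == 0b1100_0000:
        return 1  # 110x xxxx: two-byte sequence
    if (lead & 0b1111_0000) == 0b1110_0000:
        return 2  # 1110 xxxx: three-byte sequence
    if (lead & 0b1111_1000) == 0b1111_0000:
        return 3  # 1111 0xxx: four-byte sequence
    # 10xx xxxx is itself a continuation byte; 1111 1xxx is never valid.
    raise ValueError(f"0x{lead:02x} cannot start a UTF-8 sequence")


print(expected_continuation_bytes(0xE9))  # byte used in the explanation
print(expected_continuation_bytes(0xD3))  # byte reported in the error
```

A lone 0xE9 followed by ordinary ASCII text therefore fails under UTF-8, while the same byte decodes under Latin-1, where every byte maps to exactly one character.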
Solution:
Try calling read_csv with encoding='latin1', encoding='iso-8859-1', or encoding='cp1252'; these are the various encodings found on Windows.
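If you are not sure which of these applies, one possible approach is to test the raw bytes against a list of candidates before handing the file to pandas. This is only a sketch: first_working_encoding and its default candidate list are assumptions for illustration, not part of pandas. Note that latin-1 accepts any byte sequence, so it works as a last resort but can silently give the wrong characters if the file was really written in another encoding.

```python
def first_working_encoding(raw: bytes,
                           candidates=("utf-8", "cp1252", "latin-1")) -> str:
    """Return the first candidate encoding that decodes raw without error.

    Hypothetical helper: the candidate order is an assumption, so extend or
    reorder it with whatever encodings your files plausibly use.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding rejects the bytes, try the next one
        return name
    raise ValueError("none of the candidate encodings could decode the data")


# Usage with pandas (the path is a placeholder):
#   with open("test.csv", "rb") as f:
#       enc = first_working_encoding(f.read())
#   df = pd.read_csv("test.csv", encoding=enc)
print(first_working_encoding(b"caf\xe9,1\n"))
```

A successful decode only shows that the bytes are valid in that encoding, not that the text is correct, so it is worth looking at a few rows of the resulting DataFrame afterwards.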