UnicodeDecodeError, invalid continuation byte

When reading a .csv file with the pandas library, the following error occurs:
My Code:

import pandas as pd
df = pd.read_csv('C:\\Users\\登亮\\Desktop\\test.csv', encoding='utf-8')

Error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte

Reason:
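Take the byte 0xE9 as an example. In binary, 0xE9 looks like 1110 1001. If you read about UTF-8 on Wikipedia, you'll see that such a byte must be followed by two bytes of the form 10xx xxxx. (The 0xd3 in the error above is 1101 0011, which by the same rule must be followed by one such byte.) So, for example: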

>>> b'\xe9\x80\x80'.decode('utf-8')
u'\u9000'
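
The lead-byte rule described above can be sketched in a few lines. This is only an illustration: the function name `expected_continuation_bytes` is made up for this post, and it checks the bit pattern alone, not every validity rule in the UTF-8 specification.

```python
def expected_continuation_bytes(lead: int) -> int:
    """Return how many 10xxxxxx bytes must follow a UTF-8 lead byte.

    Hypothetical helper for illustration: it only inspects the high bits
    and does not reject overlong or out-of-range lead bytes.
    """
    if (lead & 0b1000_0000) == 0b0000_0000:  # 0xxxxxxx: ASCII, stands alone
        return 0
    if (lead & 0b1110_0000) == 0b1100_0000:  # 110xxxxx: starts a 2-byte sequence
        return 1
    if (lead & 0b1111_0000) == 0b1110_0000:  # 1110xxxx: starts a 3-byte sequence
        return 2
    if (lead & 0b1111_1000) == 0b1111_0000:  # 11110xxx: starts a 4-byte sequence
        return 3
    # 10xxxxxx is itself a continuation byte, so it cannot start a character
    raise ValueError(f"0x{lead:02x} cannot start a UTF-8 sequence")


print(expected_continuation_bytes(0xE9))  # the byte discussed above
print(expected_continuation_bytes(0xD3))  # the byte from the error message
```

If the bytes that actually follow do not match 10xx xxxx, the decoder gives up with "invalid continuation byte".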

But that's just the mechanical cause of the exception. In this case, you have a string that is almost certainly encoded in Latin-1. You can see how UTF-8 and Latin-1 look different:

>>> u'\xe9'.encode('utf-8')
b'\xc3\xa9'
>>> u'\xe9'.encode('latin-1')
b'\xe9'

(Note, I'm using a mix of Python 2 and 3 representation here. The input is valid in any version of Python, but your Python interpreter is unlikely to actually show both unicode and byte strings in this way.)
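
Going the other way shows the failure itself. The sketch below uses a made-up sample byte string (not the contents of test.csv) in which 0xE9 is followed by an ordinary space rather than continuation bytes, and decodes it with both codecs.

```python
# Hypothetical sample: "café au lait" as Latin-1 bytes, so é is the single byte 0xE9
raw = b'caf\xe9 au lait'

try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError as exc:
    # 0xE9 announces a 3-byte sequence, but the next byte is a space (0x20)
    utf8_ok = False
    print('utf-8 failed at position', exc.start)

# Latin-1 maps every byte to exactly one character, so this decode cannot fail
text = raw.decode('latin-1')
print(text)
```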

Solution:
Try calling read_csv with encoding='latin1', encoding='iso-8859-1' or encoding='cp1252'; these are the various encodings found on Windows. (In Python, latin1 and iso-8859-1 are two names for the same codec.)
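
If you don't know which encoding to pick, one option is to try a few in order. The following is a minimal sketch, not a pandas feature: `first_working_encoding` is a hypothetical helper, and the default candidate list is an assumption that you should adapt to the encodings plausible for your data.

```python
def first_working_encoding(raw, candidates=('utf-8', 'cp1252', 'latin-1')):
    """Return the first candidate encoding that decodes raw without error.

    Hypothetical helper for illustration. A successful decode does not
    prove the encoding is the right one: latin-1 accepts every possible
    byte, so it is kept last as a fallback.
    """
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes, try the next one
        return enc
    return None  # no candidate could decode the bytes


print(first_working_encoding(b'caf\xe9 au lait'))
```

To use it with a file, read the file in binary mode (open(path, 'rb').read()), pass the bytes to the helper, and hand the returned name to read_csv as the encoding argument. Check the resulting text by eye, since a wrong single-byte encoding decodes silently into the wrong characters.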
