>>> html = html.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 38395: invalid start byte
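The traceback says that the byte 0xb8 at offset 38395 cannot start a UTF-8 sequence, which usually means the page is not encoded (or not entirely encoded) as UTF-8. If the real encoding is known, for example from the Content-Type header or a meta charset tag, decoding with that encoding is the more reliable fix. Otherwise, one possible workaround is sketched below, assuming `html` holds raw `bytes`. The helper name `decode_bytes`, the candidate list, and the sample bytes are illustrative assumptions, not taken from the original session.

```python
from typing import Iterable, Tuple


def decode_bytes(
    raw: bytes,
    candidates: Iterable[str] = ("utf-8", "cp1252", "latin-1"),
) -> Tuple[str, str]:
    """Try each candidate encoding strictly; return (text, encoding_used).

    Hypothetical helper: the candidate order is a guess, not a rule.
    """
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError as err:
            # err.start is the offset of the first byte that failed to decode
            print(f"{name}: cannot decode byte {raw[err.start]:#04x} "
                  f"at position {err.start}")
    # Last resort: decode as UTF-8 and substitute U+FFFD for each bad byte
    return raw.decode("utf-8", errors="replace"), "utf-8 (with replacement)"


# Made-up sample containing the same offending byte as the traceback
sample = b"<p>price: 5\xb8</p>"
text, used = decode_bytes(sample)
print(used, repr(text))
```

A fallback such as `latin-1` accepts every byte value, so it never raises, but it can silently yield the wrong characters. Treat the result as a best guess and check the text around the reported position before relying on it.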