Web Scraping and "invalid multibyte string"
Since I couldn’t reproduce these errors on my machine it appeared to be something relating to their particular machine setup. Looking at their locale provided a clue:
whereas on my machine I have:
The document that they were trying to scrape is encoded in UTF-8, which I see in my locale but not in theirs. Perhaps changing locale will sort out the problem? Since the
en_ZA locale is a bit of an acquired taste (unless you’re South African, in which case it’s de rigueur!), the following should resolve the problem:
It’s possible that command might produce an error stating that it cannot be honoured by your system. Do not fear. Try the following (which seems to work almost universally!):
Try scraping again. Your issues should be resolved.