When Bit.ly links break
short URL problem—helpdesk brickwall—then debugging

The problem

This info may be useful if you have ever had mysterious letters or numbers appear in your URLs, or if you have used a link shortener and found spurious characters in your long links. First I describe the response of the helpdesk (no help at all), but if you just want the diagnosis go to “The solution” at the bottom of the page.

This is the short URL info page for one of my links on Bit.ly. (It is one of several Bit.ly links that don’t work.)

Bit.ly info page for one of the short links that doesn't work

This Bit.ly short URL (clicked right from that page above) takes you to this address. Note: this is not the right page—something has inserted %E2%80%8B into the URL to make a filename that has never existed.

Target long link reached from the Bit.ly info page

The Bit.ly helpdesk

Bit.ly claim it is impossible for any of their short URLs to be changed, ever, and one would hope their short URLs would at least take you to the right address. Here is my email conversation with the Bit.ly helpdesk (I have put the emails in sequence for readability).

Emails to Bit.ly about broken links

So Bit.ly confirmed the fault but “could not replicate” it. I followed up:

Emails to Bit.ly about broken links

Things went a bit quiet.

Emails to Bit.ly about broken links

I guess, as the first ticket, #403, had been “deemed resolved”, that was that. So I opened another one, #463. This time I used the Bit.ly helpdesk site (which requires a new login).

Bit.ly helpdesk about broken links

Agent Kristine answered promptly; you can’t fault her reaction time.

Bit.ly helpdesk about broken links

The solution

So that’s 48 hours and no answer. During that time I dug deeper and double checked. My site hasn’t changed recently and shows no errors of any kind.

A Google search for %E2%80%8B found several results. People do report this sequence turning up in URLs, and the most likely explanation seems to be that it is the percent-encoded UTF-8 byte sequence (E2 80 8B) for U+200B, the Zero Width Space. Opera, for example, encodes an invisible HTML Word Break tag WBR as a Zero Width Space.
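As a quick check of that diagnosis, here is a minimal Python sketch (Python is my choice for illustration, not a tool used in the original investigation) that decodes the sequence and asks the Unicode database what the resulting character is called.

```python
from urllib.parse import unquote
import unicodedata

# The spurious sequence that appeared in the broken long links.
encoded = "%E2%80%8B"

# unquote() turns each %nn into a byte and decodes the bytes as UTF-8.
decoded = unquote(encoded)

for ch in decoded:
    # Print the code point and its official Unicode name.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# prints: U+200B ZERO WIDTH SPACE
```

The three %nn groups collapse into a single character, which supports the Zero Width Space explanation.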

The browser I generally use is Safari and I tend to copy my Bit.ly long links out of the address bar. The editor I use is BBEdit and I might sometimes copy links from there. Although the editor will allow WBR tags or Zero Width Spaces to be entered I have never seen or used either.

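A URL may contain only a limited set of ASCII characters: 0-9a-zA-Z and $-_.+!*'(), plus reserved characters such as / : ? & that have special meaning. Any other character is encoded as %nn, where nn is the hexadecimal value of a byte (a space is %20). A coded character stands for the original character, so %20 would mean a space in a filename, for example. A non-ASCII character is first converted to its UTF-8 bytes and each byte is coded separately, which is why a single invisible character shows up as three %nn groups.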
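The encoding rule can be seen in a few lines of Python using the standard library's urllib.parse. This is only a sketch of how percent-encoding behaves in general; the filename "my file.html" is a made-up example, and browsers may differ in detail about which characters they leave alone.

```python
from urllib.parse import quote, unquote

# A space is not allowed in a URL, so it becomes %20.
print(quote("my file.html"))      # my%20file.html

# Decoding restores the original character.
print(unquote("my%20file.html"))  # my file.html

# A filename with an invisible Zero Width Space after the first letter.
# On screen it looks identical to "aboutbemuso.html".
hidden = "a\u200bboutbemuso.html"
encoded_hidden = quote(hidden)
print(encoded_hidden)             # a%E2%80%8Bboutbemuso.html
```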

Now, copying my long links straight out of the Bit.ly page and pasting them into the address bar does generate the spurious %E2%80%8B when I load them. So they look OK, but they contain a bad character which is only revealed when they are used as a URL. My UTF-8 HTML editor also shows the bad character if I paste them in there:

http://www.bemuso.com/bemuso/a¿boutbemuso.html and

http://www.bemuso.com/bemuso/a¿utobiography.html for example.
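If your editor does not reveal such characters, a short script can. The Python sketch below uses a hypothetical helper, find_hidden_chars, and assumes the bad character is a Zero Width Space sitting after the first letter of the filename, as in the examples above. It flags anything that is non-ASCII or belongs to the Unicode control or format categories.

```python
import unicodedata

def find_hidden_chars(text):
    """List (index, code point, name) for each suspicious character."""
    hits = []
    for index, ch in enumerate(text):
        # Flag non-ASCII characters plus control (Cc) and format (Cf) ones.
        if ord(ch) > 127 or unicodedata.category(ch) in ("Cc", "Cf"):
            name = unicodedata.name(ch, "UNNAMED")
            hits.append((index, f"U+{ord(ch):04X}", name))
    return hits

# Assumed reconstruction of one of the pasted long links.
pasted = "http://www.bemuso.com/bemuso/a\u200bboutbemuso.html"

for index, codepoint, name in find_hidden_chars(pasted):
    print(index, codepoint, name)

# Show the string with every non-ASCII character escaped, so it is visible.
print(pasted.encode("unicode_escape").decode("ascii"))
```

A clean link produces no hits, so an empty result is a reasonable sign that a long link is safe to paste into a shortener.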

If anyone can tell me how an invisible word break might have got into those URLs I’d be very interested.

I’m a bit stumped now since I can’t trust the links I see in Bit.ly and I don’t know where that extra bad character has come from. I have never used the Word Break tag or the Zero Width Space. The best workaround I can think of is to switch to the Goo.gl URL shortener, which does at least show any invisible characters in a long link on the URL info page.

Goo.gl info page showing invisible characters

I have mentioned this on Twitter and Facebook in the hope of getting more information. Why isn’t this problem in the Bit.ly helpdesk script, and why doesn't their URL info page reveal invisible characters?

But most of all why don’t link shorteners screen out non-ASCII characters? Browsers must insert some %nn coded characters to make valid URLs for filenames that contain spaces, for example, but invisible characters are something else.
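For what it is worth, the kind of screening I have in mind need not be complicated. Here is a rough Python sketch of a check that could run before a long link is stored; the function name screen_long_url is hypothetical, and I have no knowledge of how Bit.ly or any other shortener actually handles its input. It drops invisible format characters (Unicode category Cf, which includes the Zero Width Space) and reports what it removed.

```python
import unicodedata

def screen_long_url(url):
    """Return (cleaned_url, removed), where removed lists dropped code points."""
    kept = []
    removed = []
    for ch in url:
        if unicodedata.category(ch) == "Cf":
            # Invisible format character: drop it and record the code point.
            removed.append(f"U+{ord(ch):04X}")
        else:
            kept.append(ch)
    return "".join(kept), removed

# Assumed reconstruction of the second broken link shown earlier.
cleaned, removed = screen_long_url(
    "http://www.bemuso.com/bemuso/a\u200butobiography.html"
)
print(cleaned)
print(removed)
```

Reporting the removed characters, rather than silently stripping them, would let the person creating the short link see that something was wrong with what they pasted.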


© Rob Cumberland 2002–2013, all rights reserved • This is a UK web site • About Bemuso