900 characters were written in a tweet that normally allows only 140

Nov 16, 2011 14:58 GMT  ·  By

Twitter users were shocked to see that someone had managed to fit over 900 characters into a tweet, when everyone knows that the limit is 140.

According to a post on StackExchange, the strange message consists only of backslashes and a bunch of numbers, accompanied by a plain-text message in Russian (or a related language) that translates as “Twitty and do not limit people !!!!!! 140 no limit!”

The first explanation offered was that the message actually contains “Unicode surrogate code points that are improperly encoded as UTF-8.”

The CESU-8 encoding used in the tweet is accepted by some Twitter interfaces, but for display purposes the social network expects valid UTF-8 sequences. Each surrogate code point is encoded as 3 bytes and therefore ends up being displayed as 12 characters, since the 3 bytes of each sequence are shown as “3 C-style octal escape sequences of 4 characters each.”
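The arithmetic can be checked with a short Python sketch. It is only an illustration and not Twitter's actual display code; the helper name `octal_escapes` is hypothetical, and Python's `surrogatepass` error handler is used here to stand in for an encoder that does not reject surrogates.

```python
def octal_escapes(data: bytes) -> str:
    """Render every byte as a 4-character C-style octal escape."""
    return "".join(f"\\{b:03o}" for b in data)

# A lone high surrogate, encoded to 3 bytes the way CESU-8 would encode it.
# "surrogatepass" tells Python not to reject the surrogate code point.
high_surrogate = "\ud835".encode("utf-8", "surrogatepass")

shown = octal_escapes(high_surrogate)

print(len(high_surrogate))  # number of bytes in the encoded surrogate
print(shown)                # the escaped form a reader would see
print(len(shown))           # number of characters on screen
```

Under these assumptions, one surrogate code point takes 3 bytes but 12 visible characters, which is how a short message can appear to run far past the limit.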

“For example \355\240\265\355\263\220 when decoded as C-escaped UTF-8, without rejecting surrogates as would normally be done when decoding UTF-8, decodes to the surrogate pair U+D835 U+DCD0.

“Treating this surrogate pair as UTF-16, as would be done when decoding CESU-8, produces the Unicode character U+1D4D0 MATHEMATICAL BOLD SCRIPT CAPITAL A,” said a user nicknamed mark4o.
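The decoding steps mark4o describes can be reproduced with a minimal Python sketch. The variable names are illustrative, and the `surrogatepass` error handler is an assumption standing in for a lenient decoder that does not reject surrogates.

```python
import re
import unicodedata

escaped = r"\355\240\265\355\263\220"

# Step 1: turn each octal escape back into the byte it stands for.
raw = bytes(int(digits, 8) for digits in re.findall(r"\\([0-7]{3})", escaped))

# Step 2: decode as UTF-8 without rejecting surrogate code points.
surrogates = raw.decode("utf-8", "surrogatepass")
units = [f"U+{ord(c):04X}" for c in surrogates]

# Step 3: combine the surrogate pair the way a UTF-16 decoder would.
high, low = (ord(c) for c in surrogates)
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print(units)
print(f"U+{code_point:04X}", unicodedata.name(chr(code_point)))
```

The six escapes collapse into six bytes, then two surrogate code points, then a single character outside the Basic Multilingual Plane.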

The \355\240\265\355\263\220 sequence above accounts for only the first letter. When the whole message is decoded in the same way, it spells ALMATY, the name of the largest city and former capital of Kazakhstan.

A user from the UK called Ladadadada claims that this is possible because each group of characters that begins with a backslash represents an escape sequence, which is regarded as a single valid character constant.

That means that each of these sequences is counted as a single character, but Twitter actually displays it as four.

“Some of the escape sequences available are 'control characters'. These tell the computer to do something such as playing an alert sound or moving the cursor left or right or up or down or deleting the character to the left of the cursor. Although none of them are the last one I mentioned (deleting the previous character), he might have used that character to confuse Twitter as well,” Ladadadada says.