Emojis In Your Web App!

Emojis have found their way into our lives and are used everywhere. Today, there is an emoji for almost everything, and web apps have adapted to make room for them. Let's take a deeper dive into emojis.

While working on a project called Emoji Interpreter, which takes an emoji as input and tells you what it means, I came across something fascinating while playing with JavaScript.

Guess the output for the following code:

const emojiString = "😎";
console.log(emojiString.length);

If you guessed that the answer is 1, I have two things to tell you. First, we think the same! Second, we were both wrong. The correct answer is 2. Let's explore more.

Consider the following code:

const emojiString = "😎";
console.log(emojiString.split(""));
/*OUTPUT
["\ud83d", "\ude0e"]
*/

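In JavaScript, a string is a sequence of 16-bit code units. Most emoji, including 😎, have code points above the Basic Multilingual Plane (BMP), that is, beyond U+FFFF. A single 16-bit code unit cannot hold such a value, so each of these emoji is represented by a pair of code units, known as a surrogate pair.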
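To make the difference between code units and code points concrete, here is a small sketch. The variable names are my own; the methods used (length, the string iterator via spread, and codePointAt) are standard JavaScript.

```javascript
const emoji = "😎";

// .length counts 16-bit code units, not characters.
const codeUnits = emoji.length;

// Spreading a string uses the string iterator, which walks code points.
const codePoints = [...emoji].length;

// codePointAt(0) reads the full code point that starts at index 0.
const hex = emoji.codePointAt(0).toString(16);

console.log(codeUnits);  // 2
console.log(codePoints); // 1
console.log(hex);        // 1f60e
```

So 😎 is one code point (U+1F60E) stored as two code units.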

A surrogate pair is a representation of a single abstract character as a sequence of two 16-bit code units, where the first is a high-surrogate code unit and the second is a low-surrogate code unit.

The first unit of the pair is called the lead surrogate, and the second the tail surrogate. For 😎, \ud83d is the lead surrogate and \ude0e is the tail surrogate.
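If you are curious where those two values come from, the pair can be computed from the code point with a little bit arithmetic. The helper below is a hypothetical function of my own (toSurrogatePair is not a built-in), and it assumes the input is a code point above U+FFFF.

```javascript
// Hypothetical helper: split a code point above U+FFFF into its surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;     // leaves a 20-bit value
  const lead = 0xd800 + (offset >> 10);   // top 10 bits go into the lead surrogate
  const tail = 0xdc00 + (offset & 0x3ff); // bottom 10 bits go into the tail surrogate
  return [lead, tail];
}

const [lead, tail] = toSurrogatePair("😎".codePointAt(0));

console.log(lead.toString(16), tail.toString(16));     // d83d de0e
console.log(String.fromCharCode(lead, tail) === "😎"); // true
```

Lead surrogates always fall in the range 0xD800 to 0xDBFF and tail surrogates in 0xDC00 to 0xDFFF, which is how a decoder can tell them apart.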

Using a surrogate on its own produces nothing meaningful (�):

console.log("\ud83d");  // => �
console.log("\ude0e");  // => �

Using the surrogates together gives you a single entity, here an emoji!

console.log("\ud83d\ude0e");  // => 😎

We can't switch the order of the surrogates:

console.log("\ude0e\ud83d");  // => ��
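One way to catch these broken cases in code is to walk the string by code point and look for any surrogate that was left unpaired. This is only a sketch, and hasLoneSurrogate is a name I made up for illustration.

```javascript
// Hypothetical helper: returns true if the string contains an unpaired surrogate.
function hasLoneSurrogate(text) {
  // for...of walks code points, so a valid pair arrives as one item.
  // A surrogate that is not part of a valid pair arrives on its own.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xd800 && cp <= 0xdfff) return true;
  }
  return false;
}

console.log(hasLoneSurrogate("\ud83d\ude0e")); // false (valid pair)
console.log(hasLoneSurrogate("\ud83d"));       // true  (lead only)
console.log(hasLoneSurrogate("\ude0e\ud83d")); // true  (wrong order)
```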

To summarize, a JavaScript string is stored as a sequence of 16-bit code units. An emoji like this one has a code point that does not fit in 16 bits, so it is stored as two 16-bit code units. The first is called the lead surrogate and the second the tail surrogate. Together they make up an emoji 😎
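A practical consequence: anything that splits a string by code unit can tear an emoji in half. Here is a rough sketch comparing a naive reverse with a code-point-aware one. The variable names are mine, and this only handles emoji made of a single code point like 😎; emoji built from several code points would need grapheme-aware splitting.

```javascript
const text = "hi😎";

// split("") cuts between code units, so reversing swaps the two surrogates.
const naive = text.split("").reverse().join("");

// Spreading cuts between code points, so the pair stays together.
const safe = [...text].reverse().join("");

console.log(safe);           // 😎ih
console.log(naive === safe); // false
```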