A lone surrogate is a 16-bit code unit, such as \uD914, that is not part of a valid surrogate pair. It falls into one of the ranges below:

  • Leading surrogates: 0xD800 to 0xDBFF
  • Trailing surrogates: 0xDC00 to 0xDFFF
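
As a rough illustration of the two ranges above, the sketch below classifies a single UTF-16 code unit. The helper names isLeadingSurrogate and isTrailingSurrogate are made up for this example and are not built-in functions.

```javascript
// Hypothetical helpers (not part of the language) that classify one 16-bit code unit.
function isLeadingSurrogate(codeUnit) {
  return codeUnit >= 0xd800 && codeUnit <= 0xdbff;
}

function isTrailingSurrogate(codeUnit) {
  return codeUnit >= 0xdc00 && codeUnit <= 0xdfff;
}

// charCodeAt returns the 16-bit code unit at the given index.
const unit = "\uD914".charCodeAt(0);
console.log(isLeadingSurrogate(unit)); // true
console.log(isTrailingSurrogate(unit)); // false
console.log(isLeadingSurrogate("a".charCodeAt(0))); // false
```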

A string is not well-formed if it contains lone surrogates. String provides the two methods below to check whether a string is well-formed and to convert it into a well-formed string.

String.prototype.isWellFormed method

Checks whether the string contains any lone surrogates.

Returns true if the string contains no lone surrogates, and false otherwise.

const str1 = "hello\uD914";
const str2 = "welcome";
console.log(str1.isWellFormed()); // false
console.log(str2.isWellFormed()); // true
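
To make the check more concrete, here is a minimal sketch of roughly what isWellFormed has to verify, written as a manual loop over code units. The function name hasLoneSurrogate is an assumption for this example. It is not how any engine implements the method, and it returns the opposite of isWellFormed.

```javascript
// Hypothetical helper: returns true when the string contains a lone surrogate.
function hasLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // A leading surrogate must be followed by a trailing surrogate.
      const next = str.charCodeAt(i + 1); // NaN at the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair, skip the trailing half
      } else {
        return true;
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // A trailing surrogate with no leading surrogate before it.
      return true;
    }
  }
  return false;
}

console.log(hasLoneSurrogate("hello\uD914")); // true
console.log(hasLoneSurrogate("welcome \uD83D\uDE03")); // false (valid pair for 😃)
console.log(hasLoneSurrogate("\uDE03\uD83D")); // true (pair in the wrong order)
```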

Another example, with an emoji and a lone leading surrogate:

// Well-formed UTF-16 string containing an emoji
const str = "welcome 😃 ";

console.log(str.isWellFormed()); // true

// Not a well-formed string: it ends with a lone leading surrogate
const illFormed = "user \uD83C";

console.log(illFormed.isWellFormed()); // false

String.prototype.toWellFormed method

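This method returns a new string in which every lone surrogate is replaced with the Unicode replacement character U+FFFD. The original string is not modified.

A lone (unpaired) surrogate is a leading surrogate that is not followed by a trailing surrogate, or a trailing surrogate that is not preceded by a leading surrogate.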

const str1 = "hello\uD914";
const str2 = "welcome";
console.log(str1.toWellFormed()); // hello�
console.log(str2.toWellFormed()); // welcome
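
For environments without toWellFormed, a similar result can be approximated with a regular expression. This is only a sketch: replaceLoneSurrogates is a hypothetical name, and the approach relies on the "u" flag, which makes the regex walk the string by code point so that valid surrogate pairs are left untouched.

```javascript
// Hypothetical helper: approximates toWellFormed() with a regular expression.
// With the "u" flag, a valid surrogate pair is treated as one code point
// outside the D800-DFFF range, so only lone surrogates match.
function replaceLoneSurrogates(str) {
  return str.replace(/[\uD800-\uDFFF]/gu, "\uFFFD");
}

console.log(replaceLoneSurrogates("hello\uD914")); // hello�
console.log(replaceLoneSurrogates("welcome \uD83D\uDE03")); // welcome 😃
console.log(replaceLoneSurrogates("\uDE03\uD83D") === "\uFFFD\uFFFD"); // true
```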

Where do we use these methods?

The encodeURI method throws a URIError if the string is not well-formed.

// \uD914 is a lone leading surrogate (it falls in 0xD800 to 0xDBFF)
const str = "https://domain.com/query?q=\uD914";

try {
  encodeURI(str);
} catch (e) {
  console.log(e); // URIError (the message is "URI malformed" in V8)
}

To avoid encodeURI errors, check the string and convert it if needed:

const str = "https://domain.com/query?q=\uD914";
// Check whether the string is well-formed
if (str.isWellFormed()) {
  // Already well-formed: encode directly
  console.log(encodeURI(str));
} else {
  // Replace lone surrogates with U+FFFD, then encode
  console.log(encodeURI(str.toWellFormed())); // https://domain.com/query?q=%EF%BF%BD
}
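
The check-and-convert pattern above could be wrapped in a small helper. The sketch below is one possible shape, not a standard API: safeEncodeURI is a made-up name, and the regex fallback for runtimes without toWellFormed is an assumption of this example.

```javascript
// Hypothetical helper: encode a URI without throwing on lone surrogates.
function safeEncodeURI(str) {
  // Use the built-in method when the runtime has it; otherwise fall back
  // to a regex that replaces lone surrogates with U+FFFD.
  const wellFormed =
    typeof str.toWellFormed === "function"
      ? str.toWellFormed()
      : str.replace(/[\uD800-\uDFFF]/gu, "\uFFFD");
  return encodeURI(wellFormed);
}

console.log(safeEncodeURI("https://domain.com/query?q=\uD914"));
// https://domain.com/query?q=%EF%BF%BD
```

Note that the replacement is lossy: the lone surrogate is gone from the encoded URI, so this suits cases where a readable fallback is preferable to an exception.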

Supported Browsers

  • Chrome
  • Firefox
  • Safari
  • Edge

In summary, checking for and converting to well-formed strings helps developers avoid errors when encoding strings.