


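The proposal is implemented in V8 and available without a flag as of version 8.7 (more precisely, 8.7.38 and later), so it can be tested in Google Chrome Canary (starting with version 87.0.4252.0) or in Node.js V8 Canary (starting with v15.0.0-v8-canary202009025a2ca762b8; Windows binaries are available starting with v15.0.0-v8-canary202009173b56586162).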




The proposal is at Stage 3 and is championed by Richard Gibson.




, UAX 29. , JavaScript .

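Chrome has long shipped its own non-standard segmentation API, Intl.v8BreakIterator. For several reasons, that API does not seem suitable for standardization. This proposal describes a new API that tries to follow the conventions of modern JavaScript API design, that is, those of ES2015 and later.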

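The object returned by the segment() method of an Intl.Segmenter instance finds the boundaries and exposes the segments between them through the Iterable interface.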

// Create a locale-specific word segmenter.
let segmenter = new Intl.Segmenter("fr", {granularity: "word"});

// Use it to get an iterable over the segments of a string.
let input = "Moi?  N'est-ce pas.";
let segments = segmenter.segment(input);

// Iterate over the segments!
for (let {segment, index, isWordLike} of segments) {
  console.log("segment at code units [%d, %d): «%s»%s",
    index, index + segment.length,
    segment,
    isWordLike ? " (word-like)" : ""
  );
}

// console.log output:
// segment at code units [0, 3): «Moi» (word-like)
// segment at code units [3, 4): «?»
// segment at code units [4, 6): «  »
// segment at code units [6, 11): «N'est» (word-like)
// segment at code units [11, 12): «-»
// segment at code units [12, 14): «ce» (word-like)
// segment at code units [14, 15): « »
// segment at code units [15, 18): «pas» (word-like)
// segment at code units [18, 19): «.»
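
As a small follow-up to the loop above, here is a minimal sketch of filtering on isWordLike to keep only the words of a string. The helper name `wordsOf` and its default locale are assumptions made for illustration; they are not part of the proposal.

```javascript
// Hypothetical helper: returns only the word-like segments of `text`.
function wordsOf(text, locale = "fr") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const words = [];
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // Punctuation and whitespace segments have isWordLike === false.
    if (isWordLike) words.push(segment);
  }
  return words;
}

const words = wordsOf("Allons-y!");
console.log(words.length, words);
```

Filtering on isWordLike avoids hand-written regular expressions for punctuation, which tend to break on text outside the Latin script.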

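For more flexibility and for advanced use cases, the API also supports random access to segments by code unit index.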

// ┃0 1 2 3 4 5┃6┃7┃8┃9
// ┃A l l o n s┃-┃y┃!┃
let input = "Allons-y!";

let segmenter = new Intl.Segmenter("fr", {granularity: "word"});
let segments = segmenter.segment(input);
let current = undefined;

current = segments.containing(0)
// → { index: 0, segment: "Allons", isWordLike: true }

current = segments.containing(5)
// → { index: 0, segment: "Allons", isWordLike: true }

current = segments.containing(6)
// → { index: 6, segment: "-", isWordLike: false }

current = segments.containing(current.index + current.segment.length)
// → { index: 7, segment: "y", isWordLike: true }

current = segments.containing(current.index + current.segment.length)
// → { index: 8, segment: "!", isWordLike: false }

current = segments.containing(current.index + current.segment.length)
// → undefined
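
One possible use of containing() is finding the segment under a cursor or selection position. The sketch below wraps it in a helper; `segmentAt` and the shape of its return value are hypothetical names chosen for this example.

```javascript
// Hypothetical helper: describes the segment covering code unit `position`,
// or returns null when the position is outside the string.
function segmentAt(text, position, locale = "fr") {
  const segments = new Intl.Segmenter(locale, { granularity: "word" }).segment(text);
  const found = segments.containing(position);
  if (found === undefined) return null;
  return {
    start: found.index,                      // inclusive, in code units
    end: found.index + found.segment.length, // exclusive, in code units
    text: found.segment,
  };
}

console.log(segmentAt("Allons-y!", 3));
console.log(segmentAt("Allons-y!", 9));
```

The half-open [start, end) range matches how String.prototype.slice() takes its arguments, so the result can be passed straight to it.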



new Intl.Segmenter(locale, options)


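options is interpreted as an object that may contain a granularity property, which selects the kind of segmentation: "grapheme" (user-perceived characters), "word" (words) or "sentence" (sentences); the default is "grapheme".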
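
To make the three granularities concrete, the following sketch segments one short string in each mode and counts the results. The helper `countSegments` and the sample string are assumptions for illustration; the exact boundaries come from the engine's Unicode data.

```javascript
// Hypothetical helper: number of segments of `text` at a given granularity.
function countSegments(text, granularity, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  return [...segmenter.segment(text)].length;
}

// "e\u0301" is "e" followed by a combining acute accent: two code units,
// but a single user-perceived character.
const sample = "Ne\u0301e. Oui!";

console.log("code units:", sample.length);
console.log("graphemes: ", countSegments(sample, "grapheme"));
console.log("words:     ", countSegments(sample, "word"));
console.log("sentences: ", countSegments(sample, "sentence"));

// Omitting the option falls back to the default granularity.
const defaultGranularity = new Intl.Segmenter("en").resolvedOptions().granularity;
console.log("default:   ", defaultGranularity);
```

Note that the "word" count includes punctuation and whitespace segments, not only word-like ones.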


segment() creates a new %Segments% instance, an Iterable over the segments of the given string.


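  • segment is the string segment itself.
  • index is the code unit index in the input string at which the segment begins.
  • input is the string being segmented.
  • isWordLike is true when granularity is "word" and the segment is word-like (consists of letters/numbers/ideographs/etc.); false when granularity is "word" and the segment is not word-like (consists of spaces/punctuation/etc.); undefined when granularity is not "word".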
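
The properties listed above can be inspected directly. This is a minimal sketch with an arbitrary sample string; it takes the first segment in "word" mode and in "grapheme" mode to show how isWordLike differs.

```javascript
const text = "Hi there";

// Destructuring works because the result of segment() is iterable.
const [firstWord] = new Intl.Segmenter("en", { granularity: "word" }).segment(text);
const [firstGrapheme] = new Intl.Segmenter("en", { granularity: "grapheme" }).segment(text);

console.log(firstWord.segment, firstWord.index, firstWord.isWordLike);
console.log(firstWord.input === text);

// Outside "word" granularity, reading isWordLike yields undefined.
console.log(firstGrapheme.segment, firstGrapheme.isWordLike);
```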



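containing(index) returns a segment data object, as described above, for the segment that includes the code unit at the given index, or undefined if the index is out of bounds.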


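Iterating over a %Segments% object creates a new %SegmentIterator%, which finds segments in the input string lazily (on demand) and keeps track of its current position in the string.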


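The next() method implements the Iterator interface: it finds the next segment and returns an IteratorResult object whose value property is a segment data object, as described above.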
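
For readers who want to see the iterator protocol without for...of, here is a sketch that drives next() by hand. The variable names are illustrative only.

```javascript
const segments = new Intl.Segmenter("fr", { granularity: "word" }).segment("Allons-y!");

// Ask the Segments object for a fresh iterator.
const iterator = segments[Symbol.iterator]();

const collected = [];
let result = iterator.next();
while (!result.done) {
  // result.value is a segment data object: { segment, index, input, isWordLike }.
  collected.push(result.value.segment);
  result = iterator.next();
}
console.log(collected);

// Each iterator keeps its own position, so a new one starts from the beginning.
const restarted = segments[Symbol.iterator]().next();
console.log(restarted.value.index);
```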


? ?

— , . . . CLDR. , CLDR/ICU , .


, 3- , . TC39 . ; , , .


API, , API : , API (, ). API CSS Houdini.



  • .
  • .
  • , (.. Web API (Web Platform), ECMAScript).
  • , . CLDR ICU . CSS, . . , , , ; .


%SegmentIterator%.prototype, (, seek([inclusiveStartIndex = thisIterator.index + 1]) seekBefore([exclusiveLastIndex = thisIterator.index]), . ECMA-262 ( ). , , .

Why is this API part of Intl rather than a set of String methods?

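All of these kinds of segmentation are locale-dependent, and some allow complex options. The result of the segment() method is a SegmentIterator. For non-trivial cases like this, analogous APIs are placed on the Intl object, in ECMA-402. This lets the work done at instantiation be shared, which improves performance. A convenience method on String could be added as well, but that would be a separate, follow-on proposal.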


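An index n refers to a code unit index in the string at which a segment may begin. For example, when iterating by words over the string "Hello, world\u{1F499}" (the last character is a single emoji, a blue heart), segments begin at indices 0, 5, 6, 7 and 12. In other words, the string is split as ┃Hello┃,┃ ┃world┃\u{1F499}┃, where the last segment consists of two code units (a surrogate pair) that encode one code point.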
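
The index arithmetic for this string can be checked with a short sketch. It assumes an English word segmenter; the variable names are illustrative.

```javascript
const input = "Hello, world\u{1F499}";
const segmenter = new Intl.Segmenter("en", { granularity: "word" });

const starts = [];
let last;
for (const data of segmenter.segment(input)) {
  starts.push(data.index); // start of each segment, in code units
  last = data;
}

console.log(starts);
// The final segment is one code point stored as two code units.
console.log(last.segment.length);      // length in code units
console.log([...last.segment].length); // length in code points
console.log(input.length);             // total length in code units
```

Because the indices are code unit offsets, they can be used directly with slice() and other String methods, which count the same way.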


, next().

, ?

, - QA ;)

Number: null 0, — 0 1, , , Symbol BigInt, undefined NaN *. , ( , ).

* . "fail". Chrome Canary, Symbol BigInt TypeError, undefined NaN , 0.
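
The conversions discussed above can be probed with a small sketch. It assumes an engine that converts the argument of containing() to an integer first, as the Chrome Canary behaviour described here suggests; `probe` is a hypothetical helper.

```javascript
const segments = new Intl.Segmenter("fr", { granularity: "word" }).segment("Allons-y!");

// Hypothetical helper: reports the segment found, "undefined", or the error name.
function probe(value) {
  try {
    const found = segments.containing(value);
    return found === undefined ? "undefined" : found.segment;
  } catch (error) {
    return error.name;
  }
}

const cases = [
  ["null", null],
  ["undefined", undefined],
  ["NaN", NaN],
  ["true", true],
  ['"6"', "6"],
  ["6.9", 6.9],
  ["-1", -1],
  ["Symbol()", Symbol()],
  ["1n", 1n],
];

for (const [label, value] of cases) {
  console.log(label.padEnd(10), "->", probe(value));
}
```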


