textToWords method
- String text
override
Given unsegmented text, perform text segmentation particular to the language and return a list of parsed words.
For example, for the Japanese text '日本語は難しいです。', this should ideally return a list containing '日本語', 'は', '難しい', 'です', '。'.
For the English text 'This is a pen.', this should ideally return a list containing 'This', ' ', 'is', ' ', 'a', ' ', 'pen', '.'. In languages that separate words with delimiters such as spaces, the delimiters should be preserved in the returned list.
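As a rough illustration of the delimiter-preserving output described above, the following sketch tokenizes space-delimited ASCII text with a plain regular expression. The name `splitKeepingDelimiters` is hypothetical and not part of this API, and the approach would not segment Japanese correctly, since that requires a language-specific segmenter.

```dart
/// Hypothetical helper for illustration only; not part of the documented API.
/// Splits ASCII text into words, whitespace runs, and single punctuation marks.
List<String> splitKeepingDelimiters(String text) {
  // Alternatives, tried in order at each position:
  //   \s+      a run of whitespace (kept as its own token)
  //   \w+      a run of ASCII letters, digits, or underscores
  //   [^\s\w]  any other single character, such as punctuation
  final token = RegExp(r'\s+|\w+|[^\s\w]');
  return token.allMatches(text).map((m) => m.group(0)!).toList();
}

void main() {
  // Yields the eight tokens 'This', ' ', 'is', ' ', 'a', ' ', 'pen', '.'.
  print(splitKeepingDelimiters('This is a pen.'));
}
```

Joining the returned tokens reproduces the input exactly, which is one way to check that no delimiter was lost.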
Implementation
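@override
List<String> textToWords(String text) {
  // splitWithDelim is expected to return text and delimiter runs alternately,
  // with text at even indices. mapIndexed is assumed to come from
  // package:collection.
  List<String> splitText = text.splitWithDelim(RegExp(r'[-\n\r\s]+'));

  return splitText
      .mapIndexed((index, element) {
        if (index.isEven && index + 1 < splitText.length) {
          // Attach the following delimiter run to the word before it.
          return [splitText[index], splitText[index + 1]].join();
        } else if (index + 1 == splitText.length) {
          // The last element has nothing after it, so return it unchanged.
          return splitText[index];
        } else {
          // Delimiters at odd indices were already merged above.
          return '';
        }
      })
      .where((e) => e.isNotEmpty)
      .toList();
}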
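The behaviour of this implementation depends on `splitWithDelim`, which is not shown here. The sketch below assumes it returns text and delimiter runs alternately and always ends with a (possibly empty) text element. `splitWithDelimSketch` and `textToWordsSketch` are hypothetical stand-ins written for illustration, using only `dart:core`.

```dart
// Assumed stand-in for the splitWithDelim extension used above. The real
// extension may differ; this version keeps each delimiter run as its own
// element between the surrounding pieces of text.
extension SplitWithDelimSketch on String {
  List<String> splitWithDelimSketch(RegExp pattern) {
    final parts = <String>[];
    var start = 0;
    for (final match in pattern.allMatches(this)) {
      parts.add(substring(start, match.start)); // text before the delimiter
      parts.add(match.group(0)!); // the delimiter run itself
      start = match.end;
    }
    parts.add(substring(start)); // trailing text, possibly empty
    return parts;
  }
}

// Same idea as textToWords, written as a loop instead of mapIndexed.
List<String> textToWordsSketch(String text) {
  final parts = text.splitWithDelimSketch(RegExp(r'[-\n\r\s]+'));
  final words = <String>[];
  // Text sits at even indices, so step by two and attach the delimiter
  // run that follows each piece of text, if there is one.
  for (var i = 0; i < parts.length; i += 2) {
    final delimiter = i + 1 < parts.length ? parts[i + 1] : '';
    final word = parts[i] + delimiter;
    if (word.isNotEmpty) words.add(word);
  }
  return words;
}

void main() {
  // Under the assumption above, this yields 'This ', 'is ', 'a ', 'pen.'.
  print(textToWordsSketch('This is a pen.'));
}
```

Under that assumption, each delimiter run stays attached to the word before it rather than appearing as a separate list element, and hyphens count as delimiters, so 'well-known' would become 'well-' and 'known'.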