Luke T. Shumaker @lukeshu@fosstodon.org

#Python str.lower() converts 'İ' (U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE) to 'i̇' (U+0069 LATIN SMALL LETTER I + U+0307 COMBINING DOT ABOVE), while #GoLang strings.ToLower() and #Emacs downcase-word convert it to just 'i' (U+0069 LATIN SMALL LETTER I).

And I have a hard time arguing that either is wrong. #Unicode. I sure wish I could consult util.unicode.org, but it's down rn.
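For anyone who wants to reproduce the Python half of this, here is a minimal sketch. It only uses `str.lower()` and the standard `unicodedata` module; the helper name `describe` is made up for illustration.

```python
import unicodedata


def describe(s: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point in s (illustrative helper)."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]


# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
lowered = "\u0130".lower()

for line in describe(lowered):
    print(line)
# U+0069 LATIN SMALL LETTER I
# U+0307 COMBINING DOT ABOVE
```

So one code point goes in and two come out, which is the behavior the Go and Emacs functions don't reproduce.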

Aug 17, 2023, 08:35 PM·


**Luke T. Shumaker** @lukeshu · Aug 17, 2023

So UnicodeData.txt and SpecialCasing.txt disagree. UnicodeData.txt is supposed to leave the field blank if it wants you to defer to SpecialCasing.txt. But UnicodeData.txt says that the lowercase is just U+0069, while SpecialCasing.txt says that U+0307 should be included.
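A rough sketch of what the disagreement looks like when you read the two files side by side. The two records below are transcribed by hand, so treat them as an assumption and check them against the published files; the function names are made up for illustration. The field positions follow the documented layouts: lowercase mapping is field 13 (0-based) in UnicodeData.txt and field 1 in SpecialCasing.txt.

```python
# Hand-transcribed records for U+0130 (assumption: verify against the real files).
UNICODE_DATA_LINE = (
    "0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;"
    "LATIN CAPITAL LETTER I DOT;;;0069;"
)
SPECIAL_CASING_LINE = (
    "0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE"
)


def simple_lower(line: str) -> list[int]:
    """Simple lowercase mapping: field 13 of a UnicodeData.txt record."""
    fields = line.split(";")
    return [int(cp, 16) for cp in fields[13].split()]


def full_lower(line: str) -> list[int]:
    """Full lowercase mapping: field 1 of a SpecialCasing.txt record."""
    data = line.split("#", 1)[0]  # drop the trailing comment
    fields = [f.strip() for f in data.split(";")]
    return [int(cp, 16) for cp in fields[1].split()]


print([f"U+{cp:04X}" for cp in simple_lower(UNICODE_DATA_LINE)])
print([f"U+{cp:04X}" for cp in full_lower(SPECIAL_CASING_LINE)])
```

The first mapping is a single code point and the second is a sequence of two, which is exactly the split between the Go/Emacs result and the Python result.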

Now, Go's implementation of strings.ToLower 100% cannot handle case conversions that change the number of codepoints, so even if we accept that it's correct in this case, it's broken in general.
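To make the "changes the number of codepoints" point concrete, here is a sketch in Python. `lower_one_to_one` is a hypothetical stand-in for any converter that maps each code point to exactly one code point; it is not Go's actual code, and the one-entry `SIMPLE_LOWER` table is hand-written for this example.

```python
# Hypothetical one-entry table of simple (one-to-one) lowercase mappings.
SIMPLE_LOWER = {"\u0130": "\u0069"}


def lower_one_to_one(s: str) -> str:
    """Lowercase s while never changing the number of code points."""
    out = []
    for ch in s:
        mapped = SIMPLE_LOWER.get(ch)
        if mapped is None:
            low = ch.lower()
            # Keep the original character if lowercasing would expand it.
            mapped = low if len(low) == 1 else ch
        out.append(mapped)
    return "".join(out)


word = "\u0130STANBUL"  # 'İSTANBUL'

print(len(word), len(lower_one_to_one(word)), len(word.lower()))  # 8 8 9

# Length-changing mappings also exist in the other direction:
print("\u00df".upper())  # 'ß' uppercases to 'SS'
```

Any design that converts one code point at a time can produce the first result but never the second, whichever one you decide is right for 'İ'.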
