Fosstodon @fosstodon

Luke T. Shumaker @lukeshu@fosstodon.org

#Python _PyUnicode_IsLowerCase(ch) claims to return whether ch has General_Category=Ll (Lowercase_Letter), but it returns true for U+037A, which has General_Category=Lm (Modifier_Letter)??? It's a member of Lm for #Unicode 13.0, 14.0, and 15.0, so that's not the source of my confusion.

Jun 21, 2023, 04:49 AM·
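The mismatch described in this post can be reproduced from Python itself. This is a minimal sketch, assuming that `str.islower()` on a one-character string performs the same check as `_PyUnicode_IsLowerCase` (which appears to be the case in CPython, but treat it as an assumption). The helper name `describe` is made up for this example.

```python
import unicodedata


def describe(ch: str) -> tuple[str, bool]:
    """Return (General_Category, str.islower()) for a single character."""
    return unicodedata.category(ch), ch.islower()


# U+037A GREEK YPOGEGRAMMENI: General_Category is Lm, yet islower() is true.
print(describe("\u037a"))

# A plain lowercase letter for comparison: General_Category is Ll.
print(describe("a"))
```

If the check really were "General_Category=Ll", the two results would differ in their second element.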

**Luke T. Shumaker** @lukeshu · Jun 21, 2023

I think Python is actually checking the "isLowercase" property (defined in Unicode section 3.13), rather than checking the "General_Category" property. Which, like, good; that's better behavior. But that's not what the docs/comments say it's doing.

**Luke T. Shumaker** @lukeshu · Jun 21, 2023

No, that's not quite right either. I'm going to have to read+understand makeunicodedata.py, aren't I?

**Luke T. Shumaker** @lukeshu · Jun 21, 2023 *

Ah, it's using the "Lowercase" property.

Let that sink in.

#Unicode has a Boolean "isLowercase" property and a Boolean "Lowercase" property, AND THEY'RE NOT THE SAME.

(For example, U+00AA "ª" has isUppercase=Yes but Uppercase=No.)

**Luke T. Shumaker** @lukeshu · Jun 21, 2023 *

The "isXXX" variants are "is this as XXX as it can be?"

ª (U+00AA) is clearly lowercase, so it has Uppercase=No. But there's no uppercase equivalent of it, so it has isUppercase=Yes, because it can't get any more uppercase.

#unicode.
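The difference between the "isXXX" functions and the "XXX" properties can be sketched in Python. Assumptions, all mine rather than the thread's: the section 3.13 definitions are approximated as "case-converting the string leaves it unchanged" (the standard applies this to the NFD form, which is skipped here), and `str.islower()` / `str.isupper()` on a single character are taken as stand-ins for the Lowercase / Uppercase properties. All four helper names are hypothetical.

```python
def is_lowercase(s: str) -> bool:
    # Section 3.13 style: isLowercase(X) holds when toLowercase(X) == X.
    # Simplification: normalization to NFD is skipped.
    return s.lower() == s


def is_uppercase(s: str) -> bool:
    # Section 3.13 style: isUppercase(X) holds when toUppercase(X) == X.
    return s.upper() == s


def has_lowercase_property(ch: str) -> bool:
    # Assumption: for one character, str.islower() reflects the Lowercase property.
    return ch.islower()


def has_uppercase_property(ch: str) -> bool:
    # Assumption: for one character, str.isupper() reflects the Uppercase property.
    return ch.isupper()


# Columns: code point, isLowercase, isUppercase, Lowercase, Uppercase
for ch in ("\u00aa", "1", "a", "A"):
    print(
        f"U+{ord(ch):04X}",
        is_lowercase(ch),
        is_uppercase(ch),
        has_lowercase_property(ch),
        has_uppercase_property(ch),
    )
```

The rows for U+00AA and for the digit "1" are the interesting ones: neither has a case mapping, so both satisfy isLowercase and isUppercase at once, while the properties say something different about each.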
