fosstodon.org is one of the many independent Mastodon servers you can use to participate in the fediverse.

Luke T. Shumaker

_PyUnicode_IsLowerCase(ch) claims to return whether ch has General_Category=Ll (Lowercase_Letter), but it returns true for U+037A, which has General_Category=Lm (Modifier_Letter)??? U+037A is Lm in Unicode 13.0, 14.0, and 15.0, so a Unicode-version mismatch isn't the source of my confusion.
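A quick way to see the mismatch from Python itself. This assumes CPython 3, where `str.islower()` on a one-character string reports the same lowercase flag the C function reads, while `unicodedata.category()` reports the General_Category.

```python
import unicodedata

ch = "\u037a"  # the code point from the post

print(unicodedata.name(ch))      # GREEK YPOGEGRAMMENI
print(unicodedata.category(ch))  # Lm (Modifier_Letter), not Ll
print(ch.islower())              # True anyway
```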

I think Python is actually checking the "isLowercase" property (defined in section 3.13 of the Unicode Standard) rather than the "General_Category" property. Which, like, good; that's better behavior. But that's not what the docs/comments say it's doing.

No, that's not quite right either. I'm going to have to read+understand makeunicodedata.py, aren't I?
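One counterexample is enough to rule out the "isLowercase" theory. Section 3.13 defines isLowercase(X) as true when toLowercase(X) = X, so it also holds for characters with no case at all, like digits, while `str.islower()` says False for them. The helper name `is_lowercase_3_13` is hypothetical, and it only approximates toLowercase with Python's `str.lower()`, which applies the default full case mappings.

```python
def is_lowercase_3_13(s: str) -> bool:
    """Approximate Unicode's isLowercase(X): toLowercase(X) == X."""
    return s.lower() == s


for ch in ["\u037a", "1"]:
    # U+037A agrees with str.islower(); the uncased digit "1" does not.
    print(f"U+{ord(ch):04X}", is_lowercase_3_13(ch), ch.islower())
# U+037A True True
# U+0031 True False
```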

Ah, it's using the "Lowercase" property.
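To check the "Lowercase" reading across the whole code space, here is a rough sketch that lists every code point where `str.islower()` disagrees with General_Category=Ll. It assumes CPython, where a one-character `str.islower()` reports the same flag. Since the derived Lowercase property is defined as Ll plus Other_Lowercase, every mismatch should be a non-Ll character that `str.islower()` still calls lowercase.

```python
import sys
import unicodedata

# Code points where Python's answer differs from "General_Category is Ll".
mismatches = [
    cp
    for cp in range(sys.maxunicode + 1)
    if chr(cp).islower() != (unicodedata.category(chr(cp)) == "Ll")
]

# Every mismatch is an Other_Lowercase character (e.g. Lm or Lo) that
# str.islower() still reports as lowercase; no Ll character is rejected.
print(all(chr(cp).islower() for cp in mismatches))  # True
print([f"U+{cp:04X} {unicodedata.category(chr(cp))}" for cp in mismatches[:5]])
```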

Let that sink in.

Unicode has a Boolean "isLowercase" property and a Boolean "Lowercase" property, AND THEY'RE NOT THE SAME.

(For example, U+00AA "ª" has isUppercase=Yes but Uppercase=No.)

The "isXXX" variants ask "is this as XXX as it can be?", that is, does mapping it to XXX case leave it unchanged?

ª (U+00AA) is clearly lowercase, so it has Uppercase=No. But there's no uppercase equivalent of it, so it has isUppercase=Yes, because it can't get any more uppercase.
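The same split shows up for U+00AA in Python. This is a sketch assuming CPython 3, where `str.isupper()` on a single character reflects the Uppercase property. The hypothetical helper `is_uppercase_3_13` approximates isUppercase(X) by testing whether `str.upper()` leaves the string unchanged.

```python
import unicodedata


def is_uppercase_3_13(s: str) -> bool:
    """Approximate Unicode's isUppercase(X): toUppercase(X) == X."""
    return s.upper() == s


ch = "\u00aa"  # "ª"

print(unicodedata.name(ch))   # FEMININE ORDINAL INDICATOR
print(ch.isupper())           # False: Uppercase=No
print(is_uppercase_3_13(ch))  # True: upper() has nothing to map it to
print(ch.islower())           # True: Lowercase=Yes
```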
