
#utf8

#Unicode is one of those little things in life that I can't help but smile about.

Is it perfect? No, of course not. Is it better than the alternative? Yes, so much so that every time I'm confronted with a long list of character encodings I can choose from, I feel a sense of relief when I find #UTF8 among them.

I wouldn't have thought it possible to standardize a single character encoding for everyone, and yet, somehow, there is just such a standard.

I downloaded all my #Facebook posts, at least if #Meta is to be believed. I requested them in #JSON format, hoping that would go more smoothly. The JSON encoding caused a few problems: the strings are valid #UTF8, but JSON apparently assumes #UTF16, so a pseudo-conversion back and forth is required; I found help on #StackOverflow. At least the timestamps were standard #POSIX.

I don't know how complete the "archive" is, but at least I can save something when I take off. #some #atkjuttuja
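
A minimal sketch of the kind of round-trip conversion the post describes. It assumes the commonly reported pattern in which each UTF-8 byte of a string is written to the export as its own \u00XX escape; the post does not say exactly what its fix was, so the helper names (fix_mojibake, fix_all) and the sample record are hypothetical.

```python
import json
from datetime import datetime, timezone


def fix_mojibake(text: str) -> str:
    # Assumption: every character is really one UTF-8 byte (U+0000..U+00FF).
    # Latin-1 maps those code points back to the same byte values.
    return text.encode("latin-1").decode("utf-8")


def fix_all(value):
    # Walk the parsed JSON and repair every string, including dict keys.
    if isinstance(value, str):
        return fix_mojibake(value)
    if isinstance(value, list):
        return [fix_all(item) for item in value]
    if isinstance(value, dict):
        return {fix_all(key): fix_all(item) for key, item in value.items()}
    return value


# Hypothetical record: "Päivää" with each UTF-8 byte escaped separately.
raw = r'{"post": "P\u00c3\u00a4iv\u00c3\u00a4\u00c3\u00a4", "timestamp": 1700000000}'
data = fix_all(json.loads(raw))

# POSIX timestamps are seconds since 1970-01-01 UTC.
posted = datetime.fromtimestamp(data["timestamp"], tz=timezone.utc)
print(data["post"], posted.isoformat())
```

If a string already contains characters above U+00FF, encode("latin-1") raises UnicodeEncodeError, so this only suits files that follow the assumed pattern throughout.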

Hey everyone. I must admit, I don't believe I have ever seen someone enter #utf8 #unicode characters on a #computer in a natural way. Which seems weird, because a bunch of languages use them.

I wrote a #commonLisp #asdf package that just looks up a list of symbols in a file that has every non-surrogate Unicode code point in it, and an #emacs #elisp function that just calls the #lisp one.

codeberg.org/tfw/unicode-chars

Multilingual people, what can you tell me about doing this at all?
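
For comparison, a rough sketch of the same idea (finding a character by words in its name) using Python's standard unicodedata module. This is not the linked package's code; find_chars and its limit parameter are made up for illustration.

```python
import unicodedata


def find_chars(*words: str, limit: int = 20) -> list[tuple[str, str]]:
    """Return (character, name) pairs whose Unicode name contains every word."""
    wanted = [word.upper() for word in words]
    hits = []
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogates, as the post's data file does
        name = unicodedata.name(chr(cp), "")
        if name and all(word in name for word in wanted):
            hits.append((chr(cp), name))
            if len(hits) >= limit:
                break
    return hits


for char, name in find_chars("black", "telephone"):
    print(f"U+{ord(char):04X} {char} {name}")
```

When the exact name is known, unicodedata.lookup("BLACK TELEPHONE") returns the character directly without a scan.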

There are three #UTF8 characters (2 symbols, 1 emoji) for old-style telephones: ☎, ☏, ☎️.

1. On some systems "black telephone" renders as the "red telephone" emoji instead of the Wingdings/dingbat-style character I was expecting. This seems system- and application-specific.

2. Depending on font, the "white telephone" may not match the height or midline of other letters or even ☎/☎️. The "black flag" ⚑ is often even worse.

Not a huge deal, but it created some formatting issues today.
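
A small sketch for inspecting what is actually in such strings. As far as I know, the emoji form is U+260E followed by the variation selector U+FE0F, and U+FE0E requests the text-style glyph instead. Whether a given font or application honors either selector is an assumption to check on the system in question; describe is a hypothetical helper.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List each code point in text with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


BLACK_TELEPHONE = "\u260E"
WHITE_TELEPHONE = "\u260F"
TEXT_STYLE = BLACK_TELEPHONE + "\uFE0E"   # asks for the monochrome glyph
EMOJI_STYLE = BLACK_TELEPHONE + "\uFE0F"  # asks for the color emoji glyph

for sample in (BLACK_TELEPHONE, WHITE_TELEPHONE, TEXT_STYLE, EMOJI_STYLE):
    print(sample, describe(sample))
```

Appending U+FE0E to a bare ☎ is one thing to try when it unexpectedly renders as an emoji, though support varies.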

[Translation] Branchless UTF-8 encoding

Can UTF-8 be encoded without branches? Yes. The question: Nathan Goldbaum asked this in the Recurse chat: I know how to decode UTF-8 using bit math and lookup tables (see github.com/skeeto/branchless-u ), but if I want to convert a code point to UTF-8, can that be done without branches? To start with, is there some way to write this function in C, which returns the number of bytes needed to store a code point's UTF-8 bytes, without branching? Or would that require a huge lookup table?

habr.com/ru/companies/mkb/arti

GitHub - skeeto/branchless-utf8: Branchless UTF-8 decoder