Detecting URLs in text is great fun.

When should characters that are allowed in URLs not be considered part of the URL? Cases I've dealt with today: dot/period, question mark, closing parenthesis. Examples…
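
One common heuristic for the dot and question mark cases is to strip sentence punctuation from the end of a candidate match. The sketch below only illustrates that idea: the helper name and the exact punctuation set are assumptions, and a real detector may want different rules.

```python
# Hypothetical helper. The set of characters treated as sentence
# punctuation is an assumption, not a standard.
TRAILING_PUNCTUATION = ".?!,;:"


def trim_trailing_punctuation(candidate: str) -> str:
    """Drop sentence punctuation from the end of a URL candidate."""
    return candidate.rstrip(TRAILING_PUNCTUATION)


print(trim_trailing_punctuation("https://example.com/page."))
# -> https://example.com/page
print(trim_trailing_punctuation("https://example.com/search?q=url"))
# -> https://example.com/search?q=url (a question mark inside the URL is kept)
```

The trade-off is that a URL which really does end in one of these characters gets shortened, so this is a guess about intent rather than a rule.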

Looks like the Mastodon web UI could do better, too. Especially when URLs are wrapped in angle brackets there's no reason to end them early.

@cketti It's actually doing better than many other things I've seen though... 😕

@brandon Apologies. I didn't know replies go to the public timeline by default.

@cketti yeah, I honestly think it's a little bit of a bug, but have been too lazy to open an issue about it

@abloo Apparently people like wrapping URLs in parentheses when including them in text 😃 However, most of the time the closing parenthesis shouldn't be considered part of the URL (although it is an allowed character).
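
One way to handle the parenthesis case is to drop a trailing closing parenthesis only when the URL has no opening parenthesis to match it. This is a minimal sketch of that heuristic, with a hypothetical function name, and it still guesses wrong for some inputs.

```python
def strip_unbalanced_closing_paren(candidate: str) -> str:
    """Remove trailing ')' characters that have no matching '(' in the URL."""
    while candidate.endswith(")") and candidate.count(")") > candidate.count("("):
        candidate = candidate[:-1]
    return candidate


# URL taken from text such as "(see https://example.com/page)"
print(strip_unbalanced_closing_paren("https://example.com/page)"))
# -> https://example.com/page

# Balanced parentheses inside the URL are left alone
print(strip_unbalanced_closing_paren("https://example.com/wiki/Foo_(bar)"))
# -> https://example.com/wiki/Foo_(bar)
```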

@cketti There's the option of testing resolution and alerting on non-2XX HTTP responses.

Empiricism FTFW.

@dredmorbius @cketti
I have already seen undesired effects from software that auto-visits links (for preview generation, malware detection, whatever) combined with single-use links from registration mails... :(

Is there an HTTP method that only tests URL validity (and reliably so with all common web servers)?
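
For reference, HTTP does define a HEAD method. It asks for the same response headers as GET without the body, and the specification classes it as a "safe" method. That only describes what servers are supposed to do: a server may still treat the visit as consuming a single-use link, and any request tells the server that someone looked. A rough sketch using Python's standard library, where the helper names are hypothetical:

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Build the path-plus-query part that goes on the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def head_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Send one HEAD request; return the status code, or None on failure.

    http.client does not follow redirects, so exactly one request is sent.
    """
    parts = urlsplit(url)
    connection_types = {
        "http": http.client.HTTPConnection,
        "https": http.client.HTTPSConnection,
    }
    connection_type = connection_types.get(parts.scheme)
    if connection_type is None or parts.hostname is None:
        return None  # not an http(s) URL, nothing to test
    connection = connection_type(parts.hostname, parts.port, timeout=timeout)
    try:
        connection.request("HEAD", request_target(url))
        return connection.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        connection.close()
```

Calling `head_status("https://example.com/")` would return the status code to compare against the 2XX range. Whether that counts as a reliable validity test depends entirely on how the server behind the link handles HEAD.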

@INCO Fair enough. Though those shouldn't be getting placed in a "validate-this-URL" context.

Should state-change require POST?


@INCO Also: what kinds of undesired effects? Examples, more fully fleshed out?


@dredmorbius @INCO It very much depends on the context. I'm working on an email client. There, auto-visiting links has privacy implications (a sender could learn when you've downloaded or opened their email). Also, it could have side effects like unsubscribing from notification emails etc.

@cketti So you're parsing received URIs?

I was thinking of an authoring tool, which in general might be able to test nondestructive URIs for validity more reliably and less destructively, as part of the creation workflow.

Once someone's sent you some string ... the problem is much harder. There is no plain-text HTML standard. Even in Markdown, which accepts naked URIs, I tend to wrap them in angle brackets: <URI> That removes ambiguity. And ... the angle bracket is invalid in HTML (it's a tag delimiter).

You're punting at best. Though there's ample prior art.
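
When the author did wrap the URI in angle brackets, extraction becomes much simpler, because the delimiters say exactly where it ends. A minimal sketch, assuming a wrapped URI starts with a scheme and contains no whitespace or angle brackets (the pattern and function name are illustrative, not from any standard library):

```python
import re

# Assumption: "<scheme:rest>" with no whitespace or angle brackets inside.
# Requiring a scheme and colon keeps things like "<b>" from matching.
ANGLE_WRAPPED_URI = re.compile(r"<([A-Za-z][A-Za-z0-9+.-]*:[^\s<>]+)>")


def find_wrapped_uris(text: str) -> list:
    """Return every URI that appears between angle brackets in text."""
    return ANGLE_WRAPPED_URI.findall(text)


print(find_wrapped_uris("See <https://example.com/a_(b).> for details."))
# -> ['https://example.com/a_(b).']
```

Inside the brackets, trailing dots and parentheses are kept as written, so no guessing about sentence punctuation is needed.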


@dredmorbius @cketti on an email client? Are you nuts?

that would be all kinds of bad, considering that you need to assume hostile input