Detecting URLs in text is great fun.
When should characters that are allowed in URLs not be considered part of the URL? Cases I've dealt with today: dot/period, question mark, closing parenthesis. Examples…
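One common heuristic for the three cases above: treat a trailing `.` or `?` as sentence punctuation, and drop a trailing `)` only when it has no matching `(` inside the candidate. The sketch below assumes a naive `https?://\S+` candidate pattern; the names `trim_url_candidate` and `find_urls` are hypothetical, and this is one possible rule set, not a standard.

```python
import re

# Naive candidate pattern (an assumption): a scheme, then everything up to whitespace.
URL_CANDIDATE = re.compile(r"https?://\S+")


def trim_url_candidate(candidate: str) -> str:
    """Strip trailing characters that are legal in a URL but usually
    belong to the surrounding sentence. Heuristic, not a standard."""
    url = candidate
    while url:
        last = url[-1]
        if last in ".?":
            # Sentence punctuation, e.g. "see https://example.com/."
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # Unbalanced paren, e.g. "(see https://example.com/)"
            url = url[:-1]
        else:
            break
    return url


def find_urls(text: str) -> list[str]:
    """Find URL candidates in plain text and trim trailing punctuation."""
    return [trim_url_candidate(m.group(0)) for m in URL_CANDIDATE.finditer(text)]


print(find_urls("Docs (see https://example.com/a). Done?"))
print(find_urls("Read https://en.wikipedia.org/wiki/Python_(programming_language)."))
```

The balanced-paren check keeps Wikipedia-style URLs intact. The cost is that a URL which genuinely ends in `.` or `?` gets truncated, which is exactly the ambiguity this thread is about.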
@cketti yeah, I honestly think it's a little bit of a bug, but have been too lazy to open an issue about it
@cketti There's the option of testing resolution and alerting on non-2XX HTTP responses.
I have already seen undesired effects when software that auto-visits links (for preview generation, malware detection, whatever) meets single-use links from registration emails... :(
Is there an HTTP method that only tests URL validity (and does so reliably with all common web servers)?
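The closest standard candidate is `HEAD`, which HTTP defines as a safe method returning the same status and headers as `GET` without a body. It is not a reliable answer to the question, though: some servers answer `HEAD` with 405 or a different status than `GET`, and "safe" is a server-side convention, so a badly built single-use link could still be consumed. A minimal sketch of the "test resolution, alert on non-2XX" idea using Python's standard library; `head_status` and `is_2xx` are hypothetical names.

```python
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import urlsplit


def head_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Send a HEAD request and return the HTTP status code, or None if
    the URL is not http(s) or no response could be obtained."""
    if urlsplit(url).scheme not in ("http", "https"):
        return None  # Don't probe file:, ftp:, mailto:, ... URIs.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises 4xx/5xx responses as HTTPError; keep the code.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, ...
        return None


def is_2xx(status: Optional[int]) -> bool:
    """True only for a successful (2XX) response."""
    return status is not None and 200 <= status < 300
```

A 405 from `is_2xx` is a false alarm rather than a broken link, so a checker built this way would probably want to retry with `GET` before alerting, which reintroduces the side-effect risk mentioned above.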
@cketti So you're parsing received URIs?
I was thinking of an authoring tool, which would generally be in a better position to test URIs reliably and non-destructively, checking the ones that are safe to fetch for validity as part of the creation workflow.
Once someone's sent you some string ... the problem is much harder. There is no plain-text HTML standard. Even in Markdown, which accepts naked URIs, I tend to wrap them in angle brackets: <URI>. That removes ambiguity, since angle brackets can't occur inside a URI. And ... the angle bracket is invalid in HTML (it's a tag delimiter).
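Because `<` and `>` can't appear in a URI, extracting a bracketed one needs no trailing-punctuation guesswork. A minimal sketch, assuming a pattern modeled on CommonMark's autolink rule (a 2 to 32 character scheme, `:`, then no whitespace, control characters, or angle brackets); `BRACKETED_URI` and `bracketed_uris` are hypothetical names.

```python
import re

# '<', scheme (letter + 1-31 of letters/digits/+/./-), ':',
# then anything except control chars, space, '<' and '>', then '>'.
BRACKETED_URI = re.compile(r"<([A-Za-z][A-Za-z0-9+.\-]{1,31}:[^<>\x00-\x20\x7f]*)>")


def bracketed_uris(text: str) -> list[str]:
    """Return the URIs written as <scheme:...> in plain text."""
    return BRACKETED_URI.findall(text)


print(bracketed_uris("See <https://example.com/a_(b)>. Or <mailto:x@example.org>?"))
```

The surrounding `.` and `?` are left alone because the brackets mark exactly where the URI ends, and HTML-like tags such as `<b>` don't match because they lack a scheme and colon.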
You're punting at best. Though there's ample prior art.
Fosstodon is an English speaking Mastodon instance that is open to anyone who is interested in technology; particularly free & open source software.