
I need your help.

I wrote a lengthy and technical blog post in which I describe an idea I have for a distributed system to verify content on websites.

TL;DR: servers holding cryptographically signed hashes verify content and notify website owners if that content changes.
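The core check in the TL;DR can be sketched as follows. It hashes the fetched page, compares the result with the stored hash, and alerts the owner on a mismatch. This is a minimal sketch under stated assumptions. It assumes the stored hash has already been checked against the owner's public key, so the signature step is left out. The `notify` callback is a hypothetical stand-in for however owners would actually be alerted.

```python
import hashlib
from typing import Callable


def sha256_hex(content: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the page content."""
    return hashlib.sha256(content).hexdigest()


def check_page(url: str, content: bytes, expected_hash: str,
               notify: Callable[[str, str, str], None]) -> bool:
    """Compare freshly fetched content against the stored (already verified) hash.

    Returns True on a match. Otherwise calls the hypothetical `notify` hook
    with (url, expected, actual) and returns False.
    """
    actual = sha256_hex(content)
    if actual == expected_hash:
        return True
    notify(url, expected_hash, actual)
    return False


if __name__ == "__main__":
    alerts = []
    stored = sha256_hex(b"<p>Hello, world</p>")
    print(check_page("https://example.com/", b"<p>Hello, world</p>", stored, lambda *a: alerts.append(a)))
    print(check_page("https://example.com/", b"<p>Defaced</p>", stored, lambda *a: alerts.append(a)))
    print(len(alerts))
```

A redirect or any other change to the returned bytes fails the comparison in the same way, because only the hash of what was actually served is checked.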

yarmo.eu/blog/dcvs-proposal

Am I missing obvious flaws? Has this problem already been solved?

Please do let me know if this would be worth the effort.

Also

@yarmo

> If a "truth" node is hacked and the stored public key is modified, we have a problem. "Truth" nodes should verify each other as well to make sure no funny business like this happens.

This looks like the biggest hole in your proposal. If one node is authoritative, then how can changes to the public key be propagated?

How does a node become authoritative for site X?

Additionally, what's hashed? The main content? The HTTP response body? The entire HTTP response, headers and all?

@wizzwizz4 ideally, I guess no single node becomes authoritative. Multiple nodes should get the "full package" including public key and hashes.
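One way to act on "multiple nodes get the full package" is to accept a public key or hash only when a majority of nodes report the same value. A single hacked "truth" node is then simply outvoted. The proposal does not specify a consensus mechanism, so the majority rule and node IDs below are assumptions for illustration. The sketch also ignores Sybil attacks, where one attacker runs many nodes.

```python
from collections import Counter
from typing import Optional


def quorum_value(reports: dict[str, str], threshold: float = 0.5) -> Optional[str]:
    """Return the value reported by more than `threshold` of all nodes, else None.

    `reports` maps a hypothetical node ID to the public-key fingerprint
    (or page hash) that node holds for a given site.
    """
    if not reports:
        return None
    value, count = Counter(reports.values()).most_common(1)[0]
    # Strict majority: a tie means no value is trusted.
    return value if count > threshold * len(reports) else None


if __name__ == "__main__":
    print(quorum_value({"node-a": "key1", "node-b": "key1", "node-c": "tampered"}))
    print(quorum_value({"node-a": "key1", "node-b": "key2"}))
```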

Regarding what exactly gets hashed: good point. I initially thought of hashing a parsed version of the main content (with all dynamic bits stripped out).

The way I thought about it is that it just verifies the content that visitors see, so that the information displayed can be trusted. In that case, HTTP headers do not need hashing, IMO.

@wizzwizz4 Interesting. Could you enlighten me as to why? Is the HTML content always slightly different?

On a side note: I might build a simple peer-to-peer proof-of-concept in the near future.

@yarmo React sites are one big chunk of JavaScript containing a multi-page website, served on multiple different URLs. The JavaScript then reads the URL to figure out which page to extract from data embedded in the JavaScript program and insert into the DOM.

@wizzwizz4 Hadn't thought about the big JS frameworks. Yeah, OK, I can live with that: this system would only verify static content. If you can't cURL it, you can't check it (oversimplified, but you get my point 😅)

@yarmo See if you can make it work for pages that, e.g., add a "served by webserver at date and time", too, but that shouldn't be a priority over data integrity.

@wizzwizz4 So ideally, you would include an HTML tag to denote the little nuggets of dynamic content on an otherwise static page. The parser removes the dynamic tags and only hashes the static content.
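The "strip dynamic tags, hash the rest" idea could look roughly like the sketch below, which uses Python's standard `html.parser`. The `data-dynamic` marker attribute is a hypothetical convention, not an existing standard. The sketch assumes well-formed markup: an unclosed tag inside a dynamic element would throw off the depth counter. Static tags and their attributes are kept in the hash, so that a changed link target is detected and not only changed text.

```python
import hashlib
from html.parser import HTMLParser

# Elements that never have content or an end tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class StaticContentExtractor(HTMLParser):
    """Serialize the page, skipping every element marked with `data-dynamic`."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # > 0 while inside a dynamic element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth or any(name == "data-dynamic" for name, _ in attrs):
            if tag not in VOID_TAGS:
                self.depth += 1
            return
        # Keep static tags and attributes (e.g. href) so link changes are caught.
        attr_text = "".join(f' {k}="{v or ""}"' for k, v in sorted(attrs, key=lambda kv: kv[0]))
        self.parts.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1
        else:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.depth:
            self.parts.append(data)


def static_hash(html: str) -> str:
    """SHA-256 of the static parts of a page, with whitespace normalized."""
    parser = StaticContentExtractor()
    parser.feed(html)
    parser.close()
    canonical = " ".join("".join(parser.parts).split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this approach, a "served by webserver at date and time" footer wrapped in a `data-dynamic` element no longer changes the hash. Any edit to the surrounding static text or links still does.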

@yarmo Perhaps the IndieWeb has a way of marking that up?

@yarmo

> If a "content" node is hacked and new files are uploaded, the "truth" node will not be triggered as it won't handle these files. But at least, the content displayed to visitors remains unchanged.

Unless they manage somehow to trigger a redirect.

---

And then maybe the CI/CD pipeline (if there is one) might also be vulnerable, depending on how it is set up (e.g. SaaS).

@yarmo

> Unless they manage somehow to trigger a redirect.

Or pass a link elsewhere.

You might also have a hash of the total content on the site (or a sitemap file) to tackle this.
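A site-wide hash, as suggested here, could be built from a sorted manifest of (path, page hash) lines and then hashed once more. An uploaded phishing page or a removed page then changes the value even when every existing page is untouched, which addresses the "new files are uploaded" case. This sketch assumes the owner declares the list of pages (for instance, from a sitemap). It also assumes that paths contain no spaces or newlines, which would make the manifest ambiguous.

```python
import hashlib


def manifest_hash(pages: dict[str, bytes]) -> str:
    """Hash the whole site: one "path sha256" line per page, sorted by path."""
    lines = [f"{path} {hashlib.sha256(body).hexdigest()}\n"
             for path, body in sorted(pages.items())]
    return hashlib.sha256("".join(lines).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    site = {"/": b"home", "/about": b"about"}
    print(manifest_hash(site) == manifest_hash({**site, "/phish": b"login here"}))
```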

@humanetech but a redirect would trigger the system as the newly returned content doesn't match the hash, right?

If the CI/CD pipeline is compromised... yeah, that would cause trouble: the deploy would be perfectly "valid" from the standpoint of the "truth" nodes.

I want to say "if CI/CD is compromised, nothing can help you", but that's too dismissive. In this idea, the crypto key serves as "absolute" proof of authenticity; if hackers have that... Not sure how to mitigate that scenario.

@yarmo It sounds to me a lot like you want DNS. DNS is based on authority and trust rather than being permissionless, but it is what the internet currently relies on to know some data's provenance.

You also might want to look into distributed hash tables though? Just a thought.

@tomosaigon Hadn't considered the DNS angle. Though similar, DNS is, like you say, about "data provenance"; I'd like something additional that says that the content I am seeing is indeed what the developer/manager/owner wrote.

Distributed hash tables, on my "to research" list, thank you!

@yarmo a quick hack would be to add a DNS record for a sub-domain whose value is a hash of the page's contents (more of a PoC, it would have scaling issues)
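The DNS hack might look like the sketch below, which generates a zone-file TXT line for a page and checks a looked-up TXT value against fetched content. The `_contenthash.<page>.<domain>` naming and the `sha256=` prefix are hypothetical conventions invented for illustration. The actual lookup (e.g. with `dig` or a DNS library) is not shown. As noted later in the thread, the record only adds protection if the DNS zone is hosted separately from the web server, ideally with DNSSEC signing.

```python
import hashlib


def txt_record(page_label: str, domain: str, content: bytes, ttl: int = 300) -> str:
    """Build a zone-file line publishing a page's SHA-256 as a TXT record.

    The `_contenthash.` naming and `sha256=` prefix are hypothetical, not a standard.
    """
    digest = hashlib.sha256(content).hexdigest()
    return f'_contenthash.{page_label}.{domain}. {ttl} IN TXT "sha256={digest}"'


def txt_matches(txt_value: str, content: bytes) -> bool:
    """Compare a TXT value (as returned by a DNS lookup) with the fetched page."""
    return txt_value == "sha256=" + hashlib.sha256(content).hexdigest()


if __name__ == "__main__":
    line = txt_record("about", "example.com", b"<p>About me</p>")
    print(line)
```

As the post says, one record per page would not scale well, so this is at most a proof of concept.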

@yarmo Interesting. I think if you build this out to its logical place you end up with a "web of trust" type of thing.

Personally I think we should have such a thing, and not just for verifying people's content or identities.

The concept of web-of-trust is underutilized, IMO. We could practically eliminate spam with something like that.

I wrote about a "Web Of OK People" to highlight a different use case for the pattern, but the idea would be similar.

olivierforget.net/blog/2020/we

@teleclimber excellent read, thanks for sharing! I do like that vision but also share your concerns.

I see how my idea interplays with your vision, to build a network of people verifying each other's content, though it would be a very niche network. I'd love to see some form of "web of ok people" come to life for actual "social-like" interaction.

@yarmo @Crocmagnon Nice read, thanks! A few remarks: wouldn't "truth" nodes become a sort of CA, with all the centralization/oligopoly problems? The web-of-trust approach would improve that, but, well, Web-of-Trust never scaled socially worldwide :(. I quite like the DNS(SEC) answer: you could have (DNSSEC-signed) entries in your DNS zone, under a specific namespace, which would be checked periodically by remote friends (or visitors with a browser plugin)?

@yarmo @Crocmagnon This would match the threat model where an attacker compromises your web hosting (as long as your DNS is served elsewhere). It's hard to go further than that: as soon as the attacker can compromise your personal host (or other parts of your infrastructure, such as DNS), you lose. That's the biggest shortcoming I can see: your nice idea can only cope with a web host compromise, even though content is vulnerable to many other threats during its production, transfer, etc.

@yarmo @Crocmagnon As for your Keybase comparison, Keybase is in fact based on a trustless model (even if I neither use nor promote it): Keybase does not prove anything; it only gives pointers to proofs. This is quite nice and could probably be done in a distributed way. You have a trustless webpage giving pointers to crypto-proofs (just hashes, not sigs) published on Twitter/Fediverse/etc., linking to your other accounts (buckets, flattr, etc.). It "just" needs some standardized declaration and software...

Fosstodon

Fosstodon is an English speaking Mastodon instance that is open to anyone who is interested in technology; particularly free & open source software.