"Google Says It'll Scrape Everything You Post Online for AI"
https://gizmodo.com/google-says-itll-scrape-everything-you-post-online-for-1850601486
"An update to Google's privacy policy suggests that the entire public internet is fair game for it's AI projects."
@floppy
Off topic, but I can't resist pointing it out: the same Gizmodo is promoting an article there about “Bluesky «record» growth”.
Edit, lol:
> Still, alternative platforms (like the decentralized sites Mastodon and Nostr) have so far failed to pose any real threat to Twitter’s status as the king of microblogging.
https://gizmodo.com/bluesky-jack-dorsey-record-web-traffic-twitter-elon-mus-1850602976
20 years of Gmail
@slaeg For a moment I wondered whether some approach would make sense where the Fediverse is still federated, but only accessible via an account.
So in a way closed to the "public", but the private circle it is open to is the whole Fediverse.
The idea being that to read posts on the Fediverse, you must identify with an account (unlike anonymous reading which is possible at the moment). Which then can be blocked, especially if it's a bot or data scraper.
@floppy sooo kinda like what eoln did over on #birdsite, or was that a convenient excuse for a Whole Other Thing? And wouldn't such a solution pose risks for searchability and incentive to join the fediverse?
It's an interesting idea and perhaps ideal for specific instances for the very interested or aware folks, but maybe not for the whole of the #fediverse
Just thinking of laying responsibility for this on the scraper and not the scrapée, if that makes sense
@slaeg Well put!
@floppy yeah thanks, I'm just spitballing here. I see variants of this discussion all over, like how https://mastodon.online/@Bloonface@mastodon.social writes about it here https://blog.bloonface.com/2023/07/04/the-fediverse-is-a-privacy-nightmare/
Also interesting discussions following that write-up on their page. I mean, I disagree with a lot of it, but interesting nonetheless. I still think #meta poses a bigger threat with the whole experimenting with their algo to break democracy and all, than some rando in another country raging in my replies.
@floppy "we'll scrape everything you put online... for free!"
@floppy is there a way to block the scrapers without being delisted from Google searches?
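One partial answer, as a hedged sketch: robots.txt rules are matched per user agent, so a site can disallow one crawler while leaving search crawlers alone. This only helps if the AI crawler announces itself under its own user-agent token and chooses to honor robots.txt, which is voluntary. `ExampleAIBot` and the example.org URL below are made-up placeholders, not real crawler names or sites. The check uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. "ExampleAIBot" is an invented crawler name used
# only for illustration; substitute the token a real crawler documents.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.org/post/1"  # placeholder URL

# The search crawler falls through to the "*" rule and stays allowed.
search_ok = parser.can_fetch("Googlebot", url)
# The hypothetical AI crawler matches its own rule and is disallowed.
ai_ok = parser.can_fetch("ExampleAIBot", url)

print(f"search crawler allowed: {search_ok}")
print(f"AI crawler allowed: {ai_ok}")
```

If a company uses the same crawler for search and for AI training, this split is not possible from the site's side, and a scraper that ignores robots.txt is not stopped at all.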
@floppy Time to start pumping out sites full of "content" that is nothing but gibberish, random letters and colors, that all cross link to each other for #SEO juice... and then submit them all to Google's index.
Hmm... I wonder if someone would write a #Yunohost app to easily automate the process.
Better yet, you could then cross link with your own real websites, so if #Google started excluding groups of gibberish sites from their #AI 's, they may well remove your legitimate one, too.
@floppy
Guess I need to start creating EULAs for the sites I own that grant me whatever I want, with the verbiage "Scraping this site through automated means indicates acceptance of this EULA"
@floppy The entire public internet *is* fair game. It's public. If you can read it, you can run software on it - the fact a bad company is doing it doesn't make it a bad act.
@floppy The frustrating thing about this, to me, is that since scraping is now being abused, more desirable uses of it like the @internetarchive will undoubtedly be screwed over in the process.
Not to mention, the entire idea of putting things on the open Web for other human beings to read is now under threat.
@pteryx Very true. I'm also a bit worried about what might happen when openness and freedom get abused for misinformation, impersonation, and manipulation in general. And when the general consensus becomes that we cannot go on like this.
I don't believe that the idea of the open Web would die. But the Web as we know it now might.
At the moment I can imagine that the sacrifice we have to make is giving up anonymity. Interaction in the "open" space might require some identity.
@pteryx Giving up anonymity does not necessarily mean giving up "pseudonymity". Rather, it means we might need to provide an identity not only to write stuff on the internet, but also to read it.
@floppy That *is* the end of the open Web, because then it's no longer open.
@pteryx
Does openness necessarily imply anonymity?
Don't get me wrong, I'm not happy with developments as described above, but I think what makes the open Web open is not strictly anonymity, but things like easy access to various forms of what's called "content".
My point is that openness is relative to who you're talking to and what conditions they are in. If argued in the other direction, the open Web as it is right now is not open to people without an internet-capable device.
(1/3)
If every internet citizen had a Fediverse account and we could only read things on the internet by identifying with our account handles (online identities), we would kinda have the same situation as right now in terms of accessibility of (open) content.
However, of course then it would be easier (for better and worse) to block access to certain resources for specific people or groups of people or bad bots. But how that's being used is up to the people.
(2/3)
If 1) people of the #Fediverse had to identify with an account to _read_ posts and 2) the control of who to block is not in the hands of Big Tech but communities (which is the case), then we might sacrifice strong anonymity for safety (from e.g. data scrapers), but not necessarily openness.
We would not sacrifice "pseudonymity" or persistence of an identity (of our choice!), which is what's required when talking about blocking people or bots.
(3/3)
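To make the idea concrete, here is a minimal sketch of "identify to read, block per community". Everything in it is hypothetical: the `Instance` class, the `can_read` method, and the handles are invented for illustration, and it assumes the reader's handle has already been verified somehow (for example by a signed request), which the sketch does not do.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Instance:
    """Hypothetical server that only serves posts to identified readers.

    Block lists are kept by the community running the instance and are
    expected to hold lowercase entries without a leading "@".
    """

    blocked_accounts: Set[str] = field(default_factory=set)
    blocked_domains: Set[str] = field(default_factory=set)

    def can_read(self, reader: Optional[str]) -> bool:
        # Anonymous requests carry no handle and are refused outright.
        if reader is None:
            return False
        handle = reader.lstrip("@").lower()
        # A usable handle has the form user@domain.
        if "@" not in handle:
            return False
        domain = handle.rsplit("@", 1)[1]
        # A pseudonymous but persistent identity is all that blocking needs.
        return (
            handle not in self.blocked_accounts
            and domain not in self.blocked_domains
        )


instance = Instance(
    blocked_accounts={"scraper@bots.example"},
    blocked_domains={"datamine.example"},
)

for reader in ("@alice@friendly.example", "@scraper@bots.example",
               "crawler@datamine.example", None):
    print(reader, "->", instance.can_read(reader))
```

The handle is a pseudonym of the reader's choice, so this trades anonymous reading for blockability without requiring a real-world identity. It does nothing against a scraper that simply registers fresh accounts.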
Disclaimer: I'm not advocating for this, just trying to find a solution to current problems. I've been pondering this idea for a while and need a little longer to better see what other consequences this might bring.
(4/3)