fosstodon.org is one of the many independent Mastodon servers you can use to participate in the fediverse.
Fosstodon is an invite only Mastodon instance that is open to those who are interested in technology; particularly free & open source software. If you wish to join, contact us for an invite.

Floppy 💾

"Google Says It'll Scrape Everything You Post Online for AI"

gizmodo.com/google-says-itll-s

"An update to Google's privacy policy suggests that the entire public internet is fair game for it's AI projects."

Gizmodo: Google Says It'll Scrape Everything You Post Online for AI, by Thomas Germain

@floppy
Off topic, but I can't resist pointing it out: on the same page, Gizmodo promotes an article about Bluesky's “record” growth.

Edit, lol:
> Still, alternative platforms (like the decentralized sites Mastodon and Nostr) have so far failed to pose any real threat to Twitter’s status as the king of microblogging.

gizmodo.com/bluesky-jack-dorse

Gizmodo: Bluesky Sees "Record" Web Traffic After Elon's Latest Dumb Twitter Decision, by Lucas Ropek

@floppy this obviously poses very interesting #privacy challenges, but another interesting angle is #ip #law. Could training #AI for business purposes be considered fair use?
Are there any #patent #lawyers in the #fediverse who've commented on this?

@slaeg For a moment I wondered whether some approach would make sense where the Fediverse is still federated, but only accessible via an account.

So in a way closed to the "public", but the private circle it is open to is the whole Fediverse.

The idea is that to read posts on the Fediverse, you would have to identify with an account (unlike now, when anonymous reading is possible). That account could then be blocked, especially if it belongs to a bot or data scraper.

@floppy sooo kinda like what eoln did over on #birdsite, or was that a convenient excuse for a Whole Other Thing? And wouldn't such a solution pose risks to searchability and to the incentive to join the fediverse?

It's an interesting idea, and perhaps ideal for specific instances of very interested or aware folks, but maybe not for the whole of the #fediverse.

Just thinking of laying responsibility for this on the scraper and not the scrapée, if that makes sense

@floppy yeah thanks, I'm just spitballing here. I see variants of this discussion all over; for example, mastodon.online/@Bloonface@mas writes about it here: blog.bloonface.com/2023/07/04/

There are also interesting discussions following that write-up on their page. I disagree with a lot of it, but it's interesting nonetheless. I still think #meta, with the whole experimenting with their algo to break democracy and all, poses a bigger threat than some rando in another country raging in my replies.

Mastodon: Bloonface (@Bloonface@mastodon.social) · 795 Posts, 434 Following, 592 Followers · This place sucks. Not around much any more. Might be on BlueSky: https://bsky.app/profile/bloonface.bsky.social Owns some bots: @swearclock @JooblyCrooblins @swearchart @KamuroFriday

@floppy "we'll scrape everything you put online... for free!"

@floppy is there a way to block the scrapers without being delisted from Google searches?

@floppy Time to start pumping out sites full of "content" that is nothing but gibberish, random letters and colors, that all cross-link to each other for #SEO juice... and then submit them all to Google's index.

Hmm... I wonder if someone would write a #Yunohost app to easily automate the process.

Better yet, you could then cross-link with your own real websites, so if #Google started excluding groups of gibberish sites from their #AI training, they might well remove your legitimate one, too. 🤔

@floppy
Guess I need to start creating EULAs for the sites I own that grant me whatever I want, with the verbiage "Scraping this site through automated means indicates acceptance of this EULA"

@floppy The entire public internet *is* fair game. It's public. If you can read it, you can run software on it; the fact that a bad company is doing it doesn't make it a bad act.

@floppy The frustrating thing about this, to me, is that since scraping is now being abused, more desirable uses of it, like the @internetarchive, will undoubtedly be screwed over in the process.

Not to mention, the entire idea of putting things on the open Web for other human beings to read is now under threat.

@pteryx Very true. I'm also a bit worried about what might happen when openness and freedom get abused for misinformation, impersonation, and manipulation in general, and when the general consensus becomes that we cannot go on like this.

I don't believe that the idea of the open Web would die. But the Web as we know it now might.

At the moment, I can imagine that the sacrifice we will have to make is giving up anonymity. Interaction in the "open" space might require some form of identity.

@pteryx Giving up anonymity does not necessarily mean giving up "pseudonymity". Rather, it means we might need to provide an identity not only to write stuff on the internet, but also to read it.

@floppy That *is* the end of the open Web, because then it's no longer open.

@pteryx
Does openness necessarily imply anonymity?

Don't get me wrong, I'm not happy with developments as described above, but I think what makes the open Web open is not strictly anonymity, but things like easy access to various forms of what's called "content".

My point is that openness is relative to who you're talking to and what conditions they are in. Put the other way around, the open Web as it is right now is not open to people without an internet-capable device.

(1/3)

@pteryx

If every internet citizen had a Fediverse account and we could only read things on the internet by identifying with our account handles (online identities), we would kinda have the same situation as right now in terms of accessibility of (open) content.

Of course, it would then be easier (for better and worse) to block access to certain resources for specific people, groups of people, or bad bots. But how that is used is up to the people.

(2/3)

@pteryx

If 1) people had to identify with an account to _read_ posts, and 2) control over whom to block is in the hands of communities rather than Big Tech (which is the case), then we might sacrifice strong anonymity for safety (e.g. from data scrapers), but not necessarily openness.

We would not sacrifice "pseudonymity" or the persistence of an identity (of our choice!), which is what blocking people or bots requires.

(3/3)

@pteryx

Disclaimer: I'm not advocating for this, just trying to find a solution to current problems. I've been pondering this idea for a while and need a little longer to better see what other consequences it might bring.

(4/3)