Digital communities (like #Fediverse instances) - where discourse about new knowledge happens - need to think seriously about their stance on conversation data being used for training large language models.
A few possibilities:
a) Ban it altogether. This will require closing down our communities and implementing proof-of-humanness, reputation scores etc.
b) Exclusively license the data to #AI companies and use that to pay for server costs.
c) Change nothing. Allow anyone to use the data.
To be clear, I don't think we are in a position of strength on this. Most open-source communities are on Discord or Slack. GitHub is already owned by Microsoft and will try to become even more of a walled garden, now that conversation data has become highly valuable.
But at least administration/moderation/hosting costs could get funded this way.
B is the best option.
@nilesh i'm for C but wont be mad if B is done with consent