Last Friday was a dark day for people who are concerned about how their work is being stolen for AI training. Without warning, a couple of major platforms were found to be handing over their users’ content for AI training without explicit consent but probably with some fine-print update of their voluminous Terms and Conditions.
Four years ago, it was noted that actually reading the Terms and Conditions for about a dozen of the popular apps would take you about 250 hours. That’s assuming you can understand the legalese baked into so many of these documents, too. The percentage of people who read all of these Terms and Conditions is surely a non-zero number, but it’s also unlikely to be much above zero.
Well, if you never “reddit”, it’s time for a refresher because Reddit just decided to show off how much control they have over their users’ content by selling it all off to OpenAI for a pittance. This did a nice job bumping up their stock after an almost 50% increase in ad revenue didn’t give it the boost they’d probably expected. The deal simultaneously extracted the value of user-generated content and lowered the value of content in general by bundling it into these limited deals, even though the value extracted from that work will be useful in perpetuity.
So, what did they do? They announced the deal on Friday afternoon to bury it but still catch investors’ interest after a banner week for OpenAI. After showing off their Her-level update to ChatGPT, OpenAI was buzzing big time, and surely that buzz will be loud enough to keep Reddit’s army of largely volunteer moderators from trying another revolt. The punchline? Reddit already did a similar deal with Google so don’t worry, they’re selling your work to all of Big AI.
That said, don’t beat up Reddit’s little alien too much. Meta and Google don’t even apologize for using their users’ information for training. I mean, Facebook was a bit coy about it, suggesting that they did nothing of the sort before admitting, yes, maybe they were using their users’ content for training…but not those private messages, right? We all trust Mark Zuckerberg to do the right thing, don’t we?
Slacking Off
In other bad news for people over AIs, it was revealed that Slack has been using user data to train their AI tools as well. No disclosures were made, no asking, no discount for using the treasure trove of user content to train their AI tools to try and catch up with OpenAI, Anthropic, and the other LLM makers out there. Nope, Slack just took what they had.
Was this in the Terms and Conditions all along? Probably not. A few years back, not everyone had realized what Google figured out first, and Meta (presumably) soon thereafter: that all of this content was going to be useful as training data. But I’m confident they slipped it into one of those updates where you click-click through ‘reading’ the new Terms and Conditions.
Back in 2016, when Facebook let the monsters at Cambridge Analytica steal data from not just some users who took surveys, but all their friends, too, we saw how billions of lines of content could help identify personality details. CA helped manipulate people and turn the tide on some elections that really changed the world. Brexit and the election of a certain game show host pushed the world onto a decidedly different timeline. And what happened to the folks at CA and Facebook? Cambridge had to split up…and reform elsewhere. Facebook? They got a parking ticket from the US government since they promised to, as Sheryl Sandberg used to say ad nauseam, “Do better.” (h/t to Scott Galloway, who keeps pointing this out on the Pivot podcast so we don’t forget).
Perhaps that’s what opened the door for so much AI theft. If we were never going to hold Big Tech accountable for taking people’s data and manipulating them based on it, why in the world would Big AI think any of their theft of everyone’s content would get them in trouble?
I’m sure both Reddit and Slack have some squirrelly language in their T&C that allows them to pack up the data from their sites and sell it to the highest bidder (or all the bidders), but that doesn’t mean it’s right.
What percentage of Slack and Reddit users do you think actually read those Terms of Service, and then used these systems knowing that they were producing work that would be used to fatten up the companies that own the platform?
In the case of Reddit, at least it was a social media shackle, where the user got some value out of being a Reddit user in exchange for having their thoughts, private communications, and opinions stolen from them so AI tools can be trained to be smarter than people. But Slack? The majority of people on the collaboration app probably joined because their company decided to bring it in for improved productivity. I’ve been on dozens of Slacks, loving the app’s useful approach to internal communications for a long time.
Maybe it doesn’t bother some people because they expect it will improve Slack’s AI tools (which cost users plenty of money despite us users providing our content as training data), but I feel betrayed. Sure, I got some value from Slack when I used it on community sites and didn’t pay for the license. But all that content I fed to the work Slacks over the last decade plus? All of those documents, those deep conversations? Those personal details shared with friends? I’ve probably added hundreds of thousands of messages, posts, and more to Slack’s training data. Slack just consumed the extracted value of that information even though we paid for use of the platform.
As it happens, Credtent just bought a year of Slack for our company. Now, I’m wondering if we should migrate to another system because how can we believe in protecting people’s content if we are contributing to the AI theft that Slack is perpetrating? The problem is: What solutions are NOT trying to harvest value out of the data they collect to make money for their corporate parent rather than their users? Is Credtent the only one?
To learn more about how Credtent is helping ensure creators retain control of their work in the Age of AI while also setting standards for credible content, visit us at: credtent.org and sign up for our Waitlist.
CONTENT ORIGIN: This badge signifies that this is Human-Created Content: