Training AI With Our Data is Stalkerish
#NEWSLETTER | The savvy lot reading this will wonder whether the data used for digital marketing purposes is really so different from the data culled to train AI. Well, a million times I say: YES!
I’m going to risk repeating my “you are a data farm” message again because it’s the most important thing to consider right now as AI development accelerates. And while, yes, we are all used to being sucked dry of our data for marketing purposes, we shouldn’t necessarily accept the same to support a company’s mission 🤑 of *training* AI.
A Different Kettle of Robots…
Why? Well, training AI requires significantly more data (and more *diverse* data too). That means a far greater number of legit companies seeking to coerce info from us, bad actors working harder to steal it, and no clarity on where our data will ultimately end up (literally and figuratively).
But you don’t need to go off the grid to address this (I mean, you could)… our collective awareness of the issue alone will make it harder for companies to joyride with our bytes. And frankly, no one is hiding the truth either… you just need to know where to look…
Case-in-point, one of my favorite truth-telling tomes is from Goldman Sachs (bankers can be oddly earnest about their capitalistic designs on us all). A nugget from the December 2023 report:
As we reach the limits of publicly available data, private data will likely grow in importance. While proprietary data comes with additional concerns, including around privacy and licensing, companies will be incentivized to find solutions to increase their data pools. We anticipate this will open new commercial pathways in creating systems to buy and sell access to trusted data…
…We anticipate a new model emerging for “data rich, revenue poor” platforms.
And just like that, Reddit got its IPO wings🧚… It seems Goldman Sachs was prescient in foretelling that “data rich, revenue poor” models would find new economic life in AI (and gosh, what a coincidence that they were tapped to help manage the IPO). Reddit explains: “…in January it entered into data licensing arrangements with an aggregate contract value of $203 million and terms ranging from two to three years.”
So Here is What You Need to Know Now…
Internet-derived data used to train generative AI (chatbots, text-to-image, etc.) will run out. Soon. Researchers estimate that “language data” from the web will be depleted by 2026. Images have a longer shelf life. So the pressure is on to “generate” more.
First up… your social media content. You must have known social media platforms were never *free*, right? And, like, Elon Musk was also never “saving free speech” by purchasing X (Twitter)? In fact, X always offered perfectly bite-sized data sets converging current events, opinion, and emotion, which are excellent for training AI. And you know those *innocent* photo trends like “me in high school vs. today”? Well, they are handy for training computer vision technology on age progression (and have been for a long while).
Speaking of innocence… those cameras at the grocery store checkout? You do know that they aren’t really for preventing theft, right? They are, in fact, far more useful for facial recognition software training. In NYC retailers are required to put up a “biometric data notice” when using video content too… but as the New York Times recently reported … retailers don’t seem to have a clue.
Well, Amazon has a clue… it sells “pre-trained” facial recognition software as part of its enterprise business. Called Amazon Rekognition, the company boasts: “Amazon Rekognition offers pre-trained and customizable computer vision (CV) capabilities to extract information and insights from your images and videos.” One word for you (well, two-ish): “pre-trained.” You should ask yourself: how did they get those handy training images?
We know how China gets them… communism with a little espionage on the side. “China Is the World’s Biggest Face Recognition Dealer,” says Wired. And NPR adds: “When it comes to the dangers of AI, surveillance poses more risk than anything.” And as I highlighted in a recent reel on Instagram, any concerns we have about home-grown surveillance are compounded by China’s thirst to diversify its coffers with Western video and image data. So, yes, especially in dense cities, we are being watched.
What to Do?
We shouldn’t underestimate the simple power of awareness and critical inquiry. I’m excited about the benefits AI can bring to our collective future. But I’m far less pumped about how shamelessly all entities (private and public) will seek to extract data from us in the years ahead.
But if you know the value of your data, you can stop giving it away freely. So together, let’s be a bit tougher and savvier, and make it far harder to turn us into data farms.