Monetizing Zero-Party Data for Training AI (Using Snowflake)
TIKI's Snowflake integration is a simple, time-efficient way to create a new revenue stream
Quick summary: AI is booming. AI is expensive. Training data is especially expensive. It’s also unreliable and a major obstacle for businesses to overcome. Zero-party data makes great training data. Companies will pay more for high-quality data. We can help create data-driven reward and loyalty programs to trade perks and benefits for customer data. Data can be sold on Snowflake’s marketplace for a brand new revenue stream. Oh yeah, and the global spend on AI is on pace to pass $300 billion in 2026. Nice!
Last week, we featured the TIKI SDK’s capability to assign data licenses to music for the purpose of training generative AI models. Prior to that, we featured the creation of a data-driven reward program for fintech that offers cash compensation to users for trading data, which can then be re-listed on the Snowflake marketplace. (Major shoutout to the folks at Snowflake for partnering with us to make this possible!)
What happens when these are combined? Using TIKI’s Snowflake integration, you can:
- Resolve row-level license requirements by licensing user data, then re-listing and re-licensing it downstream on the Snowflake marketplace.
- Acquire zero-party data directly from users by offering compensation through incentive-based programs users love.
- Create, index, and search programmatic, immutable, digitally-signed data licenses at massive scale (millions, billions, trillions…) with ease.
- Start creating valuable zero-party datasets for internal use and new resale revenue streams.
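To make "row-level license requirements" a little more concrete, here’s a minimal Python sketch of the core check, assuming each user has at most one license record listing the uses they agreed to. The `License` class, the `licensed_rows` function, and use labels like `"ai_training"` are hypothetical illustrations, not TIKI SDK or Snowflake APIs. The idea: before a dataset is re-listed, only rows whose owner holds an active license for that purpose make the cut.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    """Hypothetical license record: one per user, listing the uses they agreed to."""
    user_id: str
    allowed_uses: frozenset[str]
    revoked: bool = False


def licensed_rows(rows: list[dict], licenses: list[License], use: str) -> list[dict]:
    """Keep only rows whose owner holds an active (non-revoked) license permitting `use`."""
    by_user = {lic.user_id: lic for lic in licenses}  # index licenses by user for fast lookup
    kept = []
    for row in rows:
        lic = by_user.get(row["user_id"])
        if lic is not None and not lic.revoked and use in lic.allowed_uses:
            kept.append(row)
    return kept


rows = [
    {"user_id": "u1", "lesson": "es-101", "score": 92},
    {"user_id": "u2", "lesson": "fr-201", "score": 78},
    {"user_id": "u3", "lesson": "de-101", "score": 85},
]
licenses = [
    License("u1", frozenset({"ai_training", "aggregate_analytics"})),
    License("u2", frozenset({"aggregate_analytics"})),
    License("u3", frozenset({"ai_training"}), revoked=True),  # user withdrew consent
]

print(licensed_rows(rows, licenses, "ai_training"))  # only u1's row survives
```

Rows with no license at all are excluded by default, which is the safer failure mode when you’re about to sell the data.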
In this example, we’ll look at the creation of zero-party datasets for the purpose of downstream re-licensing and monetization that AI companies can purchase via Snowflake to source training data for their AI/ML models. Quite the topical example, we think, but first, why does zero-party data matter so much?
The Growing Importance of Zero-Party Data
Zero-party data (ZPD) may be the equivalent of gold when it comes to understanding customers.
ZPD is data proactively and intentionally shared by a customer with a brand. This definition alone implies a few things.
First, the data is extremely reliable: it is sourced directly from a brand’s customers, who willingly provide it. Second, it means there is enough trust and loyalty in the B2C relationship for the consumer to want to provide the data in the first place.
For data buyers, a zero-party dataset is immensely valuable. For AI companies specifically, it could be their saving grace. Given the muddied waters around the legality of using scraped web data to train AI models, a solution that meets the requirements of national and global privacy initiatives is imperative. In 2023, you wouldn’t build a product similar to Napster. You’d build a product similar to Spotify. Because Napster got their asses sued…a lot. And then they died. RIP Napster.
So if you’re an AI company, or a company looking to launch a new AI product, and you don’t want to get Napstered, you probably don’t want your entire product built on illegally sourced copyrighted material and intellectual property.
AI Reaches New Heights
The year is 2022. Rackspace Technology has just polled 1,420 IT professionals around the globe working in financial services, manufacturing, retail, hospitality, government, and healthcare. And what did they find?
69% of respondents ranked AI/ML as a high priority within their organizations, a 15% increase from 2020. Only 5% considered it a low priority, suggesting that over those two years, the 15% increase came from organizations moving AI/ML initiatives from a moderate priority to a high one.
What’s more fascinating? Close to half (47%) of AI/ML users began their initiatives within the past two years, and 78% of respondents had moved past the ideation stage into execution, with 40% actively using AI right now.
Safe to say, AI/ML is popping off, as the youths say. But most companies are in the early stages of execution. They’re looking to properly set up their AI systems and improve them over time, and getting access to proper training data is a huge part of that equation.
It’s not always smooth sailing; in fact, it often isn’t.
57% of respondents cited cost as an adoption barrier, a 31% increase from 2020. Additionally, 28% and 24% cite data security concerns and a lack of confidence in data quality, respectively, as barriers to properly implementing AI.
Putting on our inference caps, we can conclude there’s a market for quality data. But with implementation costs already high, burning more resources on getting the right data can compound the issues businesses are facing.
“Help Us, We Need Better Training Data! We’ll Pay You!”
The above is absolutely and definitively a quote proclaimed by the majority of senior executives who are using AI to improve their business. Okay, maybe I can’t prove the quote is real, but…STATISTICS SPEED ROUND!
All these stats are from “315 senior decision-makers with verified relevant AI experience at US firms with annual revenue of over $100 million and a company size of more than 500 employees.” The survey was conducted by LXT in partnership with Reputation Leaders, a research firm.
24% have implemented a data strategy focused on data acquisition
66% expect needs for training data to increase
0% expect needs for training data to decrease
13% of investment budget in AI is dedicated to training data (tied for highest w/ product development)
33% cite data availability as the biggest challenge faced when it comes to training data for AI
30% cite lack of quality data available as the biggest challenge
87% are willing to spend more for higher quality training data
This is where zero-party data becomes a huge priority. It’s the best data around! It’s straight from the customer! They provide it purposefully and intentionally! It’s as accurate as a Greg Maddux fastball (which is very accurate, if you’re not a baseball fan!)!
“Oh! I have zero-party data! Wait…can I sell it to you?”
The above is absolutely and definitively a quote proclaimed by every business everywhere who just read those statistics.
Global spend on AI in 2022 was already at $118 billion.
Taking a look at that previous set of data…would you look at that. 13% of the invested budget in AI is dedicated to training data. 13% of $118 billion is right around $15 billion. 13% of $300 billion is…$39 billion. Now, I know this isn’t exactly how statistics work, so don’t grill me. But in short: global spend on training data for AI is a multi-billion-dollar industry. Factor in that 66% of these execs expect their need for training data to increase and 87% are willing to spend more for higher-quality data, and…well, you get the point.
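Here’s that napkin math written out as a tiny Python sketch. It just multiplies total AI spend by the 13% training-data share from the LXT survey; the function name is ours, and like the paragraph above it is a rough illustration, not a forecast.

```python
# Back-of-envelope estimate using the figures quoted above (illustration, not a forecast).
TRAINING_DATA_SHARE = 0.13  # share of AI investment budget spent on training data (LXT survey)


def training_data_spend(total_ai_spend_billions: float) -> float:
    """Estimated training-data spend, in billions of USD, for a given total AI spend."""
    return total_ai_spend_billions * TRAINING_DATA_SHARE


for label, total in [("2022", 118), ("2026 (projected)", 300)]:
    print(f"{label}: ~${training_data_spend(total):.1f}B of ${total}B")
```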
The key takeaways are these: more and more businesses are prioritizing AI initiatives at their companies. Most of them are very new to it, and most view cost as a major barrier to adoption. Because cost is already an issue, incomplete, unreliable, unverified data is a huge obstacle to overcome. It screws up the product, and even bad data can be expensive. Training data is one of the biggest expenditures in AI budgets, if not the biggest.
More than half of executives (63%: 33% citing data availability plus 30% citing data quality) view these issues as their biggest challenge when it comes to training data. So a heck of a lot of them (87%) are willing to spend more for higher-quality data. They’re relying on it. ZERO percent predict their need for training data will decrease. 66% say it will increase.
How’s that for a summary?
If you’re a business with data, TIKI is here to help get your customer data licensed and listed for sale quickly, safely, and programmatically. You can even use it to train your own AI, if that suits your fancy.
If you’re a data buyer at an AI company or company with an AI initiative, you might want to let everyone know…HEY IF YOU SET UP THIS LITTLE PINEAPPLEY SDK THING I WILL PAY YOU MONEY. LOTS OF IT! Or you know, a reasonable amount. Maybe you don’t want to start on the desperate end of the negotiation. But, I digress:
If you’re a business, you may be wondering, well…how does this work? How do I set up the pineappley SDK so I can start licensing and re-listing customer data to create a new revenue stream?
*Montell Jordan voice* THIIIIS IS HOW YOU DOOOOO IT (…this is how you do it!)
How Does it Work?
Let’s say you’re an EdTech company with an awesome social media presence and an endearing green owl as your mascot. Let’s say you’re called DuoLingo. You provide your own version of language learning in the form of gamification. You have gigantic databases of consumer behavior from their interactions with your product and also a bunch of valuable data in multiple languages, which is a “hot commod,” as you folks at DuoLingo know. And you realize there’s a gigantic opportunity to sell that data to companies in need. How would you go about doing such a thing? Allow TIKI to break it down for you, you silly owl.
Create a Data Reward Program
(or improve an existing program)
First, you’re going to want to be able to collect data directly from your users. So go ahead and initialize the TIKI SDK and create a flow for setting up a data program and licensing that coveted zero-party data. For the techy folks, here’s a diagram of a common flow:
The data license offer is recorded, signed, and indexed, resulting in a license record available to DuoLingo via callbacks and servers through TIKI’s API. Here’s how simple the code can look on the front-end, specifically utilizing the Snowflake integration:
The ingredients for creating a data reward program can be seen here, which include:
a. A short, informative description (“Did you know you can partially pay for your subscription with your data?”)
b. Basic terms for what happens to the data (“Aggregated and anonymized” and “Sold to AI [companies] to pay your bill.”)
c. Basic terms for what doesn’t happen to the data (“Used to target and follow you”)
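To make the three ingredients (a, b, c) a bit more concrete, here’s a hypothetical way to model an offer in Python. `DataOffer` and `permits` are illustrative names we made up, not part of the TIKI SDK; the point is simply that any downstream use can be checked against what the user actually agreed to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataOffer:
    """Hypothetical model of a data offer's three ingredients."""
    description: str             # (a) short, informative pitch shown to the user
    uses: tuple[str, ...]        # (b) what happens to the data
    prohibited: tuple[str, ...]  # (c) what doesn't happen to the data

    def permits(self, use: str) -> bool:
        """A use is allowed only if it was listed in the offer and not explicitly prohibited."""
        return use in self.uses and use not in self.prohibited


offer = DataOffer(
    description="Did you know you can partially pay for your subscription with your data?",
    uses=("Aggregated and anonymized", "Sold to AI companies to pay your bill"),
    prohibited=("Used to target and follow you",),
)

print(offer.permits("Sold to AI companies to pay your bill"))  # True
print(offer.permits("Used to target and follow you"))          # False
```

Anything not listed under (b) is denied by default, so a new use means a new offer rather than a quiet reinterpretation of the old one.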
Additionally, the full legal terms of the agreement are embedded so they appear on the front-end when a user accepts. The resulting data offer may look something like this:
On our end, we provide guidelines and a customizable boilerplate template for the license agreement. Check ’em out here.
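Behind the scenes, each accepted offer becomes a license record that is recorded, signed, and indexed. The sketch below illustrates that idea in plain Python and is not TIKI’s implementation: `sign_license`, `verify_license`, `record_license`, and the demo key are hypothetical, and HMAC-SHA256 stands in for a real public-key digital signature (an HMAC can only be verified by someone holding the same secret key). Because the tag covers every field, editing a stored record breaks verification.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

SIGNING_KEY = b"demo-key-not-for-production"  # hypothetical; real systems use managed keys


def _canonical(record: dict) -> bytes:
    """Stable JSON encoding so the same record always produces the same bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def sign_license(user_id: str, uses: list[str], terms_version: str) -> dict:
    """Record a license acceptance and attach an HMAC-SHA256 tag over its contents."""
    record = {
        "user_id": user_id,
        "uses": sorted(uses),
        "terms_version": terms_version,
        "accepted_at": datetime.now(timezone.utc).isoformat(),
    }
    record["signature"] = hmac.new(SIGNING_KEY, _canonical(record), hashlib.sha256).hexdigest()
    return record


def verify_license(record: dict) -> bool:
    """Recompute the tag over every field except the signature; compare in constant time."""
    body = {k: v for k, v in record.items() if k != "signature"}
    expected = hmac.new(SIGNING_KEY, _canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("signature", ""))


# Append-only index of license records, keyed by user, for per-row lookups.
index: dict[str, list[dict]] = {}


def record_license(rec: dict) -> None:
    index.setdefault(rec["user_id"], []).append(rec)


rec = sign_license("u1", ["ai_training"], "v1")
record_license(rec)
print(verify_license(rec))  # True

tampered = dict(rec, uses=["ad_targeting", "ai_training"])  # someone widens the terms
print(verify_license(tampered))  # False
```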
Connect, Collect, Configure, Deploy & Publish!
The next steps don’t include as many pretty graphics. It may seem like a lot, but we can assure you it is very simple. In fact, we’ve created an entire blog post walking you through the process of creating a data reward system with data monetization on Snowflake’s marketplace. We estimate the entire process will take under one hour.
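One step worth sketching before you publish: honoring the “Aggregated and anonymized” promise from the offer. This minimal Python example is an assumption about how you might prepare a listing, not part of TIKI’s or Snowflake’s tooling. It drops user IDs, averages scores per lesson, and suppresses any group smaller than a threshold to make it harder to single out individuals. The `aggregate` function and the threshold of 2 are for demonstration only; real suppression thresholds are typically much higher.

```python
from collections import defaultdict


def aggregate(rows: list[dict], min_group_size: int) -> dict:
    """Average score per lesson with user IDs dropped; groups below min_group_size are suppressed."""
    groups: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        groups[row["lesson"]].append(row["score"])  # user_id is never carried forward
    return {
        lesson: {"learners": len(scores), "avg_score": sum(scores) / len(scores)}
        for lesson, scores in groups.items()
        if len(scores) >= min_group_size
    }


rows = [
    {"user_id": "u1", "lesson": "es-101", "score": 92},
    {"user_id": "u4", "lesson": "es-101", "score": 88},
    {"user_id": "u2", "lesson": "fr-201", "score": 78},
]

print(aggregate(rows, min_group_size=2))
# {'es-101': {'learners': 2, 'avg_score': 90.0}}
```

The lone fr-201 learner is dropped entirely rather than published as a group of one.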
Sourcing and Buying Training Data on Snowflake?
We get it. If you’re an AI company reading this, the idea of sourcing and buying training data on Snowflake’s Marketplace (or any existing company marketplace, for that matter) is likely less than ideal for your current stack. We’re actively working on this, so keep your ear to the ground for new developments from TIKI. Or, better yet, please reach out if you’re interested in working with us to build the solution you need!