WTF did we just build?
An anonymous, contextual data pipeline.
We all know anonymity is necessary for a data marketplace. History has shown buyers cannot protect consumers' data. And even if they could, the government will always come knocking. A user data marketplace demands user trust. The only way to truly protect identifiable data is never to have it.
But, on the flip-side, Spotify nailed the sentiment with their first tagline, "better than piracy." Going to market with something more ethical and safe only works if it's also better.
That is why context is critical. No one has cracked a general method for de-identifying data at scale, let alone at what I'd call medium scale (data on a few thousand people). This is critical because the right data from the right one thousand users can produce results at a 90%+ confidence level. If you're a startup (TIKI) without billions of users (FB), this is necessary to go to market.
So how do we do it? We remove identity from data. Not de-identify; remove it. Our data coalesces in a knowledge graph connected by occurrences of events, not people. In practice, this means our knowledge graph is a bit like a giant customer journey map spanning products, services, platforms, companies, etc. Each connection (edge) between events (vertices) carries a weight: the number of people who experienced it.
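To make the event-centric shape concrete, here is a minimal sketch of how ordered events could be folded into weighted edges. The function names, the event labels, and the choice to count an edge once per journey are all assumptions for illustration, not TIKI's actual schema.

```python
from collections import Counter

def journey_edges(events):
    """Turn one ordered list of events into (from_event, to_event) edges.

    Nothing about the person is kept, only the transitions between events.
    """
    return list(zip(events, events[1:]))

def build_graph(journeys):
    """Aggregate many journeys into edge weights (assumed: weight = number of people)."""
    graph = Counter()
    for events in journeys:
        # set() counts each edge once per journey, so the weight tracks people, not repeats
        graph.update(set(journey_edges(events)))
    return graph

# Hypothetical journeys; note there is no user id anywhere in the input
journeys = [
    ["search:running shoes", "view:brand site", "purchase:shoes"],
    ["search:running shoes", "view:brand site"],
    ["view:brand site", "purchase:shoes"],
]
graph = build_graph(journeys)
```

Here `graph` maps each pair of events to how many journeys contained it, which is the "giant customer journey map" idea in miniature.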
Obviously, data doesn't come in this format from any 3rd party APIs (Google, IG, LinkedIn, etc.). Going from raw identifiable data to anonymous contextual data demands that all data fetching, enrichment, and consent happen at the edge: the phone in the user's pocket. Identity is stripped out, and based on the user's consent/decisions, edges are added to a local, individualized version of the user's knowledge graph. The edges are then pushed via HTTPS to the cloud-hosted ingestion breakers, which ensure edges meet the cohort minimum threshold (ε). Finally, cleared edges are added to the knowledge graph for business consumption.
Mobile Data SDK (fetch, enrich, anonymize)
LocalGraph SDK (local kgraph & publish)
Ingestion Service (ingestion breaker & write cache)
Knowledge Graph Service (vertex defs & storage)
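The ingestion breaker step described above can be sketched roughly like this. It is a toy, in-memory version: the class name, the `min_cohort` parameter (standing in for ε), and the two counters are assumptions, and the real service surely handles persistence and abuse that this ignores.

```python
from collections import Counter

class IngestionBreaker:
    """Hold edges back until enough anonymous occurrences have arrived."""

    def __init__(self, min_cohort):
        self.min_cohort = min_cohort  # assumed stand-in for the cohort threshold
        self.pending = Counter()      # write cache: edges still below the threshold
        self.graph = Counter()        # stand-in for the business-facing knowledge graph

    def submit(self, edge):
        """Record one occurrence of an edge; return True if it is now in the graph."""
        if edge in self.graph:
            # Already cleared, so new occurrences just add weight
            self.graph[edge] += 1
            return True
        self.pending[edge] += 1
        if self.pending[edge] >= self.min_cohort:
            # Threshold met: move the whole count from the cache into the graph
            self.graph[edge] = self.pending.pop(edge)
            return True
        return False

breaker = IngestionBreaker(min_cohort=3)
edge = ("search:running shoes", "purchase:shoes")
results = [breaker.submit(edge) for _ in range(4)]
```

With a threshold of 3, the first two submissions stay in the cache and the edge only becomes visible to buyers on the third.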
User data ownership
To get paid for something, you first must own it. Then you can license/sell it. But if you have a record of who owns the data, it's no longer anonymous.
Hence NFTs. I know, I know, the crypto market is melting down right now, but don't conflate what an NFT is with the current speculative market. An NFT is just the fancy web3/crypto version of proof of ownership. It's not the data; it's the paperwork that says you own the data. We all experience this all the time in normal non-web3 life, a receipt being one of the simplest forms. The deed to your house and the title to your car are a couple of others. The cool part about an NFT is the NF (non-fungible): each record is unique, and once it's written to a chain, it cannot change. This creates the opportunity to embed within the NFT an encrypted pointer to the data owned. Only the owner of the NFT can decrypt and see what data it corresponds to, no one else, not even TIKI.
So NFTs, what's so special about that? Well, for starters, we need a shit ton of them. Like trillions, and they have to cost like $0.000000001 to make because only in aggregate is data valuable. Second, we need a way to decouple public ownership of an NFT from the data it's for, because if all your owned data points are tied to your wallet address, that's not very anonymous...
To handle the volume, we use a new kind of blockchain (layer 0+1) that uses logical decentralization in addition to the typical architectural and political decentralization found in popular protocols. What this means is instead of one giant chain connecting everyone's NFTs, each user effectively has their own mini blockchain.
It's possible simply because your data is yours. You don't need to access someone else's data; you don't need to validate someone else's data. Your data is validated by your identity when you link your account to the TIKI app, aka Proof of Identity.
Meaning right there at the edge (phone), we mint the NFTs, adding blocks directly to the user's chain. The chain itself follows the standard pattern: each block is hashed together with the hash pointer to the previous block, forcing chain-level immutability. I.e., if you tamper with any block, the whole structure fractures.
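The standard hash-pointer pattern is easy to show in a few lines. This is a generic sketch, not TIKI's block format: the field names, the JSON serialization, the SHA-256 choice, and the all-zero genesis pointer are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder pointer for the first block

def block_hash(previous_hash, payload):
    """Hash the payload together with the pointer to the previous block."""
    body = json.dumps({"previous_hash": previous_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_block(chain, payload):
    """Add a block that points at the hash of the current last block."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "previous_hash": previous_hash,
        "payload": payload,
        "hash": block_hash(previous_hash, payload),
    })

def is_valid(chain):
    """Walk the chain and check every hash and every back-pointer."""
    previous_hash = GENESIS
    for block in chain:
        if block["previous_hash"] != previous_hash:
            return False
        if block["hash"] != block_hash(block["previous_hash"], block["payload"]):
            return False
        previous_hash = block["hash"]
    return True

chain = []
for payload in ["nft-1", "nft-2", "nft-3"]:
    append_block(chain, payload)
```

Changing the payload of any block makes `is_valid` fail, and recomputing that block's own hash does not help, because the next block still points at the old one.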
Once the blocks are added locally to the chain, they are digitally signed and backed up to immutable public storage — if you lose/break your phone, you don't lose your NFTs. Our current implementation of the public backup uses S3 Object Lock, which has been assessed for SEC Rule 17a-4(f), FINRA Rule 4511, and CFTC Regulation 1.31. There are other "less amazon-y choices," but it's battle-tested and so insanely cheap that we can provide it for free to users.
Now to double back to decoupling the public NFT ownership from the data point itself. We AES-256 encrypt the NFT's payload (transactions), which contains one or more pairs of data fingerprint and proof key. The proof key is a securely generated cryptographic key that is hashed with the binary value of the data point (kgraph edge) to generate a unique fingerprint. The fingerprint follows the data point to the kgraph, while the fingerprint and proof key are encrypted and saved in the NFT. The proof key is used to claim compensation (more on that another time).
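As a rough illustration of the fingerprint idea, the sketch below derives a fingerprint from a random proof key and the bytes of an edge. The post only says the two are "hashed" together; using HMAC-SHA256, a 32-byte key, and these function names is an assumption, and the AES-256 encryption of the payload is left out entirely.

```python
import hashlib
import hmac
import secrets

def new_proof_key():
    """Generate a cryptographically secure random proof key (32 bytes assumed)."""
    return secrets.token_bytes(32)

def fingerprint(proof_key, edge_bytes):
    """Combine the proof key with the data point's bytes (HMAC-SHA256 is an assumption)."""
    return hmac.new(proof_key, edge_bytes, hashlib.sha256).hexdigest()

def proves_ownership(proof_key, edge_bytes, claimed_fingerprint):
    """Check that a proof key reproduces the fingerprint attached to a data point."""
    expected = fingerprint(proof_key, edge_bytes)
    return hmac.compare_digest(expected, claimed_fingerprint)

# Hypothetical data point: a serialized kgraph edge
edge = b"search:running shoes->purchase:shoes"
key = new_proof_key()
fp = fingerprint(key, edge)  # this travels with the edge to the kgraph
```

The fingerprint alone reveals nothing about who holds the key, but whoever holds the proof key can later show that the fingerprint is theirs.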
So security, risk, hacking, and tampering: isn't that why we need distributed consensus? Well, no.
Distributed consensus is a by-product of a distributed network sharing a single state (a single chain). It's important for peer-to-peer payments (Bitcoin) or shared compute (Ethereum), but not for data ownership. And it's actually both a protection measure and a risk vector (51% attack).
At TIKI, our security principle is simple: everything gets hacked. See Ethereum's own very controversial hack/fix.
So instead of trying to build something impenetrable, we focus on building something that minimizes points of failure. Logical decentralization of user data is 🤌 (chef's kiss). There is no shared chain, meaning there is no single hack that impacts more than one user's chain. On top of that, an attacker would have to break the immutability of the local chain on a remote mobile phone, break individually signed blocks, and break public immutable AWS storage, and all that gets you is some blocks that point to nothing. You'd then still need to break the AES-256 encryption of each block, which according to these guys would take 13,668,946,519,203,305,597,215,004,987,461,470,161,805,533,714,878,481 years with the entire world's compute.
Oh, and the result... an anonymous data point worth roughly $0.000000001. Talk about not worth the squeeze.
So, does it work? Yup, it’s annoyingly simple (aka scalable). A typical iPhone can mint north of 25,000 NFTs/second, and the S3 bucket costs a measly $0.023 per GB/month.
Local chain (mobile native blockchain)