FAQs

Welcome to AIDC - the marketplace where data engineers can monetize and trade high-quality datasets for AI deep learning. 

What is AIDC and how is it different?

Data drives learning in our current generation of AI systems.  You have valuable datasets: from current training projects, from your businesses and projects, from work you have done combining other datasets.  We connect you with companies, big and small, hungry for your data to train and optimize their models.  You get paid.  Innovation accelerates.  Everyone benefits.

Shouldn’t data be open and free?

Much data is open.  And data that you and I contribute to is sometimes free.  Yet companies like Reddit and LinkedIn are packaging it and selling it.  Individuals who author content, whether written, painted, or sung, should get their due and get paid.  And data engineers who package datasets, organize them, tag them and get them ready for training should get paid too.  We know contracts, enterprise systems and tech.  We also support posting public datasets and all appropriate public licenses.  So, if there is a good public dataset that you need to post along with your licensed content, you can do that too.  The point is to get all the data into one package to make specific training scenarios easy.

Who can sell datasets on AIDC?

You can be an individual or a company.  You do have to set up a merchant account.  We step you through that process.  Perhaps you are a data engineer working on projects, and there are some datasets, such as augmented or synthetic data, that you use over and over and have the rights to.  You can sell those to your clients.  We allow you to do that, with the contracts and provenance that your enterprise customers trust.  If you are an enterprise or a startup, and you have some data that you have aggregated for your service, that data might be useful for others to train on.  You can turn that data into a passive income stream to supplement your revenue.

What kinds of datasets are in demand?

We see demand for fine-tuning and RAG for LLMs with unstructured data in JSON and CSV file formats.  We offer hosting services for these file types directly on our platform.  The availability of inexpensive AI models for local execution, such as Meta’s Llama models or the new DeepSeek models, allows you to efficiently run and tune models with datasets that can be acquired here.  For larger datasets like multimedia, photo, audio or video collections, we provide a linked dataset option.  This means you can keep the data in your preferred storage location while handling the marketing and sales through our platform.  Keep track of our blog posts on the most active trends in datasets and model training scenarios.

What about data privacy and confidentiality?

You must own the rights to the datasets that you want to distribute and license.  This includes any copyrights that may be included in your datasets.  Note our code of conduct regarding being a Member.  You can define what kinds of data your datasets may contain, such as HIPAA data, and how you may have processed that data.  Same goes for PII data and GDPR/CCPA compliance.  We can process datasets for you, and we have data pipelines which we use to anonymize data in datasets which can then be used for training.  See the info down in our footer for data transformation services.  Note that for unstructured training purposes, you often do not need specifics about individuals and can remove much of that data.  The remaining data can still be very valuable for model training scenarios.
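
To give a feel for what removing specifics about individuals can look like, here is a minimal Python sketch.  It is not our production pipeline.  The field names in DIRECT_IDENTIFIERS, the email pattern and the function name anonymize_record are all assumptions made up for this illustration, and real HIPAA, GDPR or CCPA compliance takes far more than this.

```python
import re

# Hypothetical identifier fields; a real dataset would define its own list.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "ssn"}

# Simple pattern for email addresses in free text (illustrative, not exhaustive).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def anonymize_record(record: dict) -> dict:
    """Return a copy of the record without identifier fields and with emails redacted."""
    cleaned = {}
    for key, value in record.items():
        # Drop fields that directly identify a person.
        if key.lower() in DIRECT_IDENTIFIERS:
            continue
        # Redact email addresses that appear inside free-text values.
        if isinstance(value, str):
            value = EMAIL_PATTERN.sub("[EMAIL]", value)
        cleaned[key] = value
    return cleaned


if __name__ == "__main__":
    sample = {
        "name": "Ada",
        "email": "ada@example.com",
        "note": "Contact ada@example.com for details",
        "score": 5,
    }
    print(anonymize_record(sample))
```

The text and the score survive, which is often all a training scenario needs.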

How do I set my licensing terms?

We help you do that with this platform.  We’ve done this before, and we have worked with many folks who have done this before.  We walk you through the process of listing your datasets and we give you a few options.  You can pick your own price, you can have us pick your price, or we have third party partners who will go out and estimate a price.  Once you agree to the price, we generate a contract.  You can review the contract and approve it, or download it and have your legal counsel review it.  You can step out of the process at any point, or you can post a dataset in 5 minutes.  When you are comfortable with it, post it and it is live.  Bottom line is we are using modern legal market standards, applied to AI training licenses, for you to monetize your datasets.

I’m a buyer, how do I know I’m getting a quality dataset?

We audit every dataset (yes, Members, you are on notice).  Every dataset that is loaded, whether it is linked or hosted, is audited through an automated process to make sure that the dataset is what it says it is and complies with our code of conduct.  If a Member does not comply with our code of conduct, they will be removed from our platform.  Members who post quality content will be promoted.  And we will listen to Purchasers.  Our goal is to help both sides prosper.  We ask that you all help us do that.

What do you, AIDC, get out of this?

We get a cut of each transaction.  It varies depending on the transaction, your level of hosting, and the scale of the transaction, but it averages to around 20%.  The rest goes straight to the Member who posted the dataset.  We also make a bit on hosting datasets, but that’s mostly a wash to make it easier for individuals who want to post up their datasets and build their marketplaces. 
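
As a rough worked example of the split, here is a short Python sketch.  It assumes a flat 20% rate, which is only the average mentioned above, and it ignores hosting fees.  Your actual rate varies by transaction, and the function name split_payment is made up for this illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def split_payment(price: Decimal, fee_rate: Decimal = Decimal("0.20")):
    """Split a sale price into (platform fee, Member payout).

    The 20% default is an assumed average, not a quoted rate.
    """
    # Round the fee to whole cents; the Member receives the remainder.
    fee = (price * fee_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    payout = price - fee
    return fee, payout


if __name__ == "__main__":
    fee, payout = split_payment(Decimal("500.00"))
    print(f"fee: {fee}, payout: {payout}")
```

Under that assumption, a $500.00 sale works out to a $100.00 fee and a $400.00 payout to the Member.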

I think AIDC used to be AI Data CO-OP, what happened there?

Yes, we started as AI Data CO-OP, a non-profit data distribution platform.  To be honest, with our background, it’s easier to raise money than to get grants.  We are still committed to giving back to the community and we have continued the CO-OP board.  Follow the blog posts and our developer evangelists on what we are doing there.  We will be actively contributing to open source focused on data transformation, making it easier to convert data from various sources to better structures, such as JSON and JSONL, for training purposes.
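
For a sense of the kind of conversion we mean, here is a minimal Python sketch that turns a JSON array of records into JSONL, one record per line.  It uses only the standard library.  The function name json_array_to_jsonl is made up for this illustration and is not one of our tools.

```python
import json


def json_array_to_jsonl(json_text: str) -> str:
    """Convert a JSON array of records into JSONL text (one JSON value per line)."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Serialize each record on its own line, keeping non-ASCII text readable.
    lines = [json.dumps(record, ensure_ascii=False) for record in records]
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    source = '[{"prompt": "Hi", "completion": "Hello"}, {"prompt": "Bye", "completion": "See you"}]'
    print(json_array_to_jsonl(source), end="")
```

JSONL is handy for training because each line can be read and processed on its own, without loading the whole file.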

OK, so how do I get started?

Go back to the top page and press that big BEGIN button.  It will walk you through building an account and setting up your bank info so that you can get paid.  You can post your first datasets; you can even post a public dataset to get started.  Feel free to give us feedback along the way with suggestions on how to improve.  We are updating the application continuously to make it better.  Once you have uploaded a link or hosted a dataset and have described it, you can select the price and approve the contract.  Once that is done, we start marketing it.

How do you market my dataset? 

We work with a variety of partners to list and display the datasets as they are posted.  You also can showcase the datasets that you have listed to your clients and partners so they can buy them.  We work with the top AI training providers and their channel partners.  We are also working with tool providers to put your datasets right into the training pipelines of many development processes.  If you have ideas on where we should do more here, we are happy to talk.

How does someone buy my dataset?

Other Members like you, or other buyers, called Purchasers, are on the platform.  They see datasets on their dashboard and select the ones they are interested in.  When they review the contract and approve of the terms, they execute the contract and then execute payment.  They can reach out to us to help facilitate the negotiations if needed to move things forward.  We use Stripe.  Once payment is executed, you, the Member, get a signed, watermarked contract, and the Purchaser gets the dataset transfer instructions (these can vary depending on the size and scale of the dataset).  The dataset is then delivered to the Purchaser’s target environment so they can start working with it.