Written by Dr. Anestis Fachantidis, Co-founder and CEO, Medoid AI
Reimagining the Creator’s Role in the AI-Driven Economy
This is not just about an idealistic view of fairness in AI economies. I strongly believe that incorporating content creators into these economies is essential for their long-term sustainability and growth. Today’s AI business models are actually hindering the development and future potential of AI. Generative AIs heavily depend on open-access, unstructured data, yet their interfaces, based on chat, prompting and indirect consumption, demotivate the production of open-access content — the very content that was necessary in building these systems and remains vital for keeping them relatable.
I believe that the traditional incentives for open-access content creation are diminishing. During the Web 2.0 era, creators were driven by direct human interaction (H2H) with their content, through mechanisms like ads, revenue sharing, subscriptions, or the intrinsic rewards of positive community impact and reputation building. However, with the shift towards a human-to-machine-to-human (H2M2H) model, these motivations are less effective since all of them are based on a crucial assumption: human interaction with the creator’s content.
AI training data crawlers neither click on ads nor subscribe to services, nor do they enhance the creator’s reputation or community presence. This lack of engagement risks pushing creators towards paywalled platforms, which in turn may ironically act as training data brokers (on terms unfavorable to the creators). This shift also poses a threat to creators’ audience reach, since content access will be restricted to those able to afford subscriptions. Additionally, it may lead to a competitive market for AI training data, disadvantaging smaller AI teams and stalling AI democratization. Early indicators of this trend are already evident, for example in the actions of X (formerly Twitter), which limited API access, effectively becoming the sole beneficiary of its user-generated data, a move that notably impacts research and non-profit initiatives relying on this data. There is an urgent need for new systems that recognize and integrate content creation into AI economies, ensuring fair recognition and compensation for creators.
Most commercial Gen AI systems today have utilized a significant amount of such open-access training data. We are already witnessing cases where training data was reproduced verbatim without any attribution to its creators (please see Further Reading). This oversight is part of a larger trend in which the training data contribution is overshadowed by the technological and hardware aspects credited to AI vendors. It’s crucial to recognize that while these technological contributions are very important, data is the most important factor in the value chain of AI-driven economies.
Before I continue, let me define two key terms of this article: “content” and “open access”. By content I refer to materials, typically digital in nature, that represent any original creation of human intellect in any form (sound, image, text, code, video) and for any purpose. By open-access content I refer to content that is made freely available to the public without any financial, legal, or technical barriers. The key principle behind open access is to promote the sharing of knowledge and information, allowing unrestricted access and reuse by anyone, anywhere, with an internet connection. Open-access content is usually provided with licensing that permits free use and redistribution, such as Creative Commons licenses.
A New Ecology of AI
Data and creators do more than just contribute their face value; they ground Generative AI systems in current reality with their knowledge and human perspective. If, for example, ChatGPT stopped retrieving or training on fresh content, a significant part of its knowledge would soon be outdated or obsolete. This includes everything from new programming frameworks to new fashion trends and scientific discoveries. Therefore, the continuous inputs needed by these systems are not just electric energy but also our own intellectual resources, which should be “clean” (authentic, non-synthetic), “renewable” (motivated and incentivized) and “accessible” (open-access).
I’d like to draw a parallel that might initially seem unexpected. In the evolving AI-driven economy, I see the emergence of a new ecology. Original human intellect, as manifested in our various creations, is actually a kind of natural resource. Just as we express concern over the overexploitation of physical natural resources or the oligopolies controlling them, similar consideration should be given to the preservation and encouragement of content creation. But isn’t human creativity inexhaustible, unlike a typical natural resource? Not exactly. Creators’ waning motivation and their sentiment towards these new realities may significantly reduce original creative output and/or restrict its distribution. This mirrors the shift towards renewable energy resources, emphasizing the need for sustainable open-access intellectual resources. We could call them ‘renewable intellectual resources’. Their open-access nature will help democratize the AI landscape and foster competitive innovation, with smaller teams being able to use these data too.
This is not about content creation competing with generative AI. Modern creators are likely to use AI outputs to extend their creativity. The issue lies in their motivation to contribute open access content within an imbalanced AI economy that leans more towards its consumer side. It’s also about how this lack of motivation can hinder AI development itself.
Just as our AIs require vast amounts of energy, they also require vast amounts of original human intellect. As we advocate for our computational resources to be powered by renewable energy, we should also advocate for ‘renewable intellectual resources’. This can help ensure motivation and fairness towards the human intellect and a continual flow of creative content to fuel the AI ecosystems.
The GiveBackGPT Experiment
What we are discussing here is the foundation of a new economy that inherently credits open access content creators. Such an economy is fair and at the same time fosters growth by supporting multiple AI initiatives towards the common good.
I am introducing GiveBackGPT as a minimal example of a concept that can credit and acknowledge creators simply by utilizing web search.
The principle behind GiveBackGPT is straightforward: any searchable content that is similar to a generative AI response should be acknowledged and rewarded. This content likely reflects aspects of the training content, irrespective of whether it is actually part of the training dataset. This is important since we aim not only to find and credit the actual training data of a Gen AI but also to credit data that could be used for training in the future. Using the Gen AI response as the primary proxy for attributing credit to creators ensures that the content most relevant to popular Gen AI usage is prioritized over general web search content.
The GiveBackGPT process is simple and intuitive (a rough code sketch of the pipeline follows the list):
- Submit a normal query to GiveBackGPT, just like you do in ChatGPT.
- Receive a full response from GiveBackGPT.
- The response is then summarized in X words and used for a web search.
- The top Y search results are analyzed to extract author, domain, and content details.
- A similarity score (0 to 1) is then calculated based on how closely the original bot response matches the full page content of each top result.
- These results, along with their authors, domains, and similarity scores, are published to the public GiveBackGPT Google sheet, where the similarity score serves directly as a form of credit. In future implementations this credit can lead to actual monetary rewards. A leaderboard aggregates the scores per author for a comprehensive view of credit assignment.
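To make the steps above concrete, here is a minimal sketch of how such a credit loop could be wired together. It is purely illustrative and not the actual GiveBackGPT implementation: the `summarize`, `search_web`, and `fetch_page_text` helpers are hypothetical stand-ins for an LLM call, a web-search API, and a page scraper, and an off-the-shelf sentence-embedding model is assumed for the similarity score.

```python
# Illustrative sketch of the GiveBackGPT credit pipeline (not the real system).
# `summarize`, `search_web`, and `fetch_page_text` are hypothetical helpers for
# an LLM call, a web-search API, and a page scraper respectively.
import csv
from sentence_transformers import SentenceTransformer, util  # assumed embedding model

model = SentenceTransformer("all-MiniLM-L6-v2")

def credit_creators(ai_response, summarize, search_web, fetch_page_text,
                    n_results=5, out_path="giveback_credits.csv"):
    # 1. Compress the full Gen AI response into a short search query (~20 words).
    query = summarize(ai_response, max_words=20)

    # 2. Retrieve the top N open-access pages matching that query.
    results = search_web(query)[:n_results]  # each: {"author", "domain", "url"}

    # 3. Score each page by semantic similarity to the original response.
    response_vec = model.encode(ai_response, convert_to_tensor=True)
    rows = []
    for r in results:
        page_vec = model.encode(fetch_page_text(r["url"]), convert_to_tensor=True)
        # Cosine similarity, clipped to the 0-1 range used as the credit score.
        score = max(0.0, float(util.cos_sim(response_vec, page_vec)))
        rows.append([r["author"], r["domain"], r["url"], round(score, 3)])

    # 4. Publish author / domain / similarity-score rows as the credit ledger
    #    (a local CSV stands in here for the public GiveBackGPT Google sheet).
    with open(out_path, "a", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows
```

In the prototype described above, these steps are orchestrated by the GPT itself and the results land in the public Google sheet rather than a local file, but the logic is the same.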
The concept is deliberately simple; the rationale for this simplicity is threefold:
- By using just the generative AI outputs, we don’t assume any access to the underlying LLM or its architecture, which makes the method universally applicable across different Gen AI products and platforms. This also invites us to imagine its extension beyond text to include images, videos and sounds.
- Focusing on open-access content is the first priority here, since content behind paywalls or social media can or should have its own acknowledgment and monetization mechanisms.
- Reducing the problem to a web-search one seems a fair compromise nowadays, as we have long accepted the typical criteria of search engine ranking and even built economies around them (such as search engine optimization).
You can access and try GiveBackGPT here and share your feedback. This first version, implemented as an OpenAI GPT, requires a Plus subscription. For those without access, you can check these sample conversations: this and this.
How will the monetization mechanism work?
The current GiveBackGPT release is, of course, a credit simulation demonstration. Major Gen AI vendors like OpenAI, Google, Microsoft and Meta should consider integrating mechanisms like GiveBackGPT into their AI product workflows to provide actual financial compensation to content creators.
Some might argue that crediting creators could be a significant cost for Gen AI vendors, hindering growth or increasing user costs. But this is an essential cost, akin to infrastructure costs. Legally, it can become a legitimate and less costly way of accessing data than either ad-hoc data licensing agreements or, of course, illegal access. Moreover, the cost is aligned with use, since crediting follows only after a user query, and multiple mechanisms to prevent abuse can be applied. A legitimate worry is whether the cost could be transferred to users. However, this may happen in any case: the emergence of closed-access data agreements would enable oligopolies, which in turn could again lead to increased consumer-side costs.
As to the monetization mechanism itself, we are currently experimenting with a system that identifies each piece of content by its embedding (a vectorized form of text, images, sound, etc.) and links it to the rightful owner’s digital wallet. Such a mechanism could facilitate instantaneous transactions for content that aligns closely with LLM responses. For further details on this novel approach, please stay tuned for the upcoming version of GiveBackGPT.
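As a rough illustration of this direction (and not the upcoming implementation itself), the sketch below registers each piece of content by a normalized embedding alongside a wallet address, and splits a micro-credit among the owners whose content most closely matches a given response embedding. The class name, the similarity threshold, and the proportional payout rule are all assumptions made for the example.

```python
# Hypothetical embedding-to-wallet registry: content is identified by its
# embedding and linked to the rightful owner's wallet; credit for a Gen AI
# response is split among owners of closely matching content.
import numpy as np

class ContentRegistry:
    def __init__(self):
        self.embeddings = []  # one unit-normalized vector per registered item
        self.wallets = []     # wallet address of each item's rightful owner

    def register(self, embedding, wallet):
        self.embeddings.append(np.asarray(embedding) / np.linalg.norm(embedding))
        self.wallets.append(wallet)

    def credit(self, response_embedding, amount, threshold=0.8):
        """Distribute `amount` among owners whose content matches the response."""
        q = np.asarray(response_embedding) / np.linalg.norm(response_embedding)
        sims = np.array([float(q @ e) for e in self.embeddings])
        matches = [(self.wallets[i], sims[i]) for i in np.where(sims >= threshold)[0]]
        total = sum(s for _, s in matches) or 1.0
        # Split the credit proportionally to similarity among matching owners.
        return {wallet: amount * s / total for wallet, s in matches}
```

In practice the list would be a vector database and the returned dictionary would drive real transactions, but the matching logic would stay essentially the same.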
Which type of content is GiveBackGPT focusing on?
GiveBackGPT focuses on unstructured data such as those used by generative AI for training or retrieval. Typical tabular machine learning data, used for example in forecasting models or recommendation systems, are not as readily available or come with explicit licensing terms. Instead, it is the unstructured data that help build generative models that demand our attention, because their form and accessibility do not imply any form of consent. These data are everywhere, from our social media chats to our artistic creations and our blogs, and they were originally meant to be consumed by humans.
The emphasis is particularly on the knowledge content within the responses of generative AI. The fundamental ability of these systems to generate syntactically correct phrases or meaningful images is attributed to their full training set. However, the specific knowledge conveyed in a response — whether it’s about a political figure’s biography or a unique artistic style — is derived from a much narrower and more specific subset of this data. Although generative models like LLMs don’t usually produce outputs that directly mirror their training data, even their most creative and novel responses are fundamentally grounded in the original content provided in the training set. Therefore, it’s essential to acknowledge and credit this original content, as it forms the basis of the knowledge and creativity exhibited by these AI models.
Epilogue: Toward a Sustainable and Democratized AI Economy
If all this resonates with you and you think that such initiatives genuinely move us towards a more sustainable and democratized AI economy, I encourage you to share this post and expand the conversation. As we progress towards the next release of the concept presented here, I invite you all to engage, discuss, and collaborate on this journey. Your thoughts and contributions are invaluable, and I am always open to dialogue.
This is not a personal initiative; I see it as an integral part of what we do at Medoid AI. We are not just builders mindlessly following the next project and task; many of us have been working on AI long before it became a mainstream focus, and we have a thesis and a vision for AI that predate its recent popularity. This is a small, practical part of that thesis on how AI can be truly effective, inclusive, safe and thus sustainable.
For AI economies to remain relevant and thriving, it is very important that we look beyond their consumer or inference aspects. The essence of these economies lies in their capacity to integrate and reflect the vast and diverse range of human input, especially as they aim to serve the broad spectrum of humanity. The transition from a knowledge economy to an intelligence economy is undeniable, and the latter’s continued growth and relevance will depend on the knowledge and contributions of people from all walks of life.
Thank you,
Anestis
Further Reading and Resources
- MIT Technology Review: This artist is dominating AI-generated art. And he’s not happy about it.
- The Washington Post: Inside the secret list of websites that make AI like ChatGPT sound smart
- VentureBeat: The copyright case against AI art generators just got stronger with more artists and evidence
- Futurism: AI Companies Desperately Hiring Authors and Poets to Fix Writing
- The Verge: Microsoft, OpenAI, and GitHub Copilot face class action lawsuit over AI copyright violation and training data
- Futurism: OpenAI Sued for Using Everybody’s Writing to Train AI
- Futurism: AI Companies Are Running Out of Training Data
Technical notes on current GiveBackGPT release (6th December 2023):
- We chose to implement this first version on OpenAI’s GPTs platform, but we expect it to transition to other platforms too.
- This first version has no mechanism to prevent manipulation of the results or to ensure their reliability, since this is just a conceptualization for now. Treat it as a very early prototype; with your feedback, more updates are coming!
- From our experiments, 20-word summaries of LLM responses act sufficiently well as proxies of the actual response while still yielding an adequate number of web search results. Until we have global vector search engines (like a Google for embeddings), we may need to stick to this!
- We avoided using GPTs’ API actions and instead chose to generate a link encapsulating a GET request, so that the GPT is also accessible on mobile (currently, API actions are not supported on mobile). We did this also for transparency regarding the data sent and to avoid hallucinations; an API definition takes up a large part of the LLM context, and such problems then become more frequent. (A small sketch of this link-based approach follows these notes.)
- The GiveBackGPT process can also run asynchronously, or in the background, without delaying the user’s interactions with the Gen AI system.
- LLMs cannot act in a highly predictable way; in some cases the process will simply fail.
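To illustrate the link-based approach from the notes above, here is a tiny sketch of how a GPT-generated link could carry the response summary as a GET parameter. The endpoint URL and parameter name are hypothetical; the actual GiveBackGPT link format may differ.

```python
# Hypothetical sketch of the "link instead of API action" approach: the GPT
# emits a clickable URL that carries the 20-word summary as a GET parameter.
# The endpoint and parameter name below are illustrative, not the real ones.
from urllib.parse import urlencode

def build_giveback_link(summary, base_url="https://example.com/giveback"):
    # Opening this link triggers the server-side search, scoring, and logging
    # steps; the visible query string keeps the data being sent transparent.
    return f"{base_url}?{urlencode({'q': summary})}"

print(build_giveback_link("AI economies should credit open-access content creators"))
```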