Constellation Network, a Web3 ecosystem validated by the US Department of Defense, today announced the launch of a customized blockchain developed in partnership with the Common Crawl Foundation. Together, they aim to create the industry’s first cryptographically secure, immutable archive of internet data for AI training and development.
This new system uses blockchain technology to validate and secure nearly 9 petabytes of internet data used to train Large Language Models (LLMs). It offers enhanced transparency, traceability, and data integrity throughout the AI development lifecycle.
This application-specific network, or Metagraph, addresses pressing concerns in AI development, namely data provenance, privacy, and ethical sourcing, while opening new use cases for blockchain technology in emerging industries.
Key Technological Innovations
- Comprehensive Data Archiving: A fully immutable copy of internet history, providing unprecedented transparency and traceability for AI training datasets
- End-to-End Encryption: Cryptographic security that ensures data integrity throughout the AI development lifecycle
- Ethical AI Framework: A robust solution for addressing concerns around data collection, storage, and usage in large language models
“This integration is a critical step forward in securing the future of AI development,” said Alex Brandes, the CTO of Constellation Network. “By ensuring cryptographic integrity and immutability of training data, we are addressing one of the most pressing challenges in the field today: trustworthiness and provenance of datasets.”
Brandes believes the platform will become a cornerstone of responsible AI development, setting new standards for data integrity and trust.
Furthermore, the network will utilize Constellation’s DAG utility asset to secure the archived internet crawls. Projects like TraceAI, supported by the National Science Foundation, are already utilizing this technology to improve AI models and develop advanced watermarking.
TraceAI will also leverage Common Crawl’s Constellation-built solution to extend its work in blockchain-encrypted AI to include tracking the origin of data.
Kevin Jackson, the Vice President of Space Domain Communications & Commercialization for Forward Edge-AI, emphasized the significance of this breakthrough: “This represents the natural evolution of AI and machine learning model development, transforming data management from a technical challenge to a trusted business tool that drives global standardization and verification.”
Over the coming months, Constellation Network and Common Crawl Foundation will work together to expand the solution sets available to AI developers and to make cryptographically validated access to the crawl part of the standard release process.
“For users of the Crawl who are concerned about the provenance of the data, especially those using it for AI models, Constellation and their hypergraph blockchain provides an elegant solution,” said Rich Skrenta, the Executive Director of the Common Crawl. “We are looking forward to adding the ability to securely validate the crawl as part of our standard distribution by partnering with Constellation.”
Evidence of this integration can be found on Constellation’s transaction viewer, the “DAG explorer,” and developers can begin using verified historical crawls for AI applications. Follow along for further solutions from Constellation Network, Forward Edge-AI, and Common Crawl Foundation.
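For developers wondering what validating a crawl could look like in practice, the following is a minimal sketch and not the announced implementation. It assumes a crawl file has been downloaded locally and that a SHA-256 digest for it has been published somewhere the developer trusts. The announcement does not specify the hash algorithm or how digests are exposed, so the function names and the choice of SHA-256 are illustrative assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    SHA-256 is an assumption for illustration; the announcement does not
    say which hash function the network records.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large crawl files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Return True if the local file's digest equals the published digest.

    `published_hex` is a hypothetical value a developer would obtain from
    a trusted record; how it is retrieved is outside this sketch.
    """
    return sha256_of_file(path) == published_hex.strip().lower()
```

A developer would call `matches_published_digest` with the path to a downloaded file and the digest taken from the trusted record. A result of `False` means the local copy differs from the version that was recorded.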