On January 10, the “2021 Open Source Knowledge Campaign”, hosted by EpiK Protocol, presented an open and connected knowledge graph to the industry. The event brought together prominent guests, including Xing Chunxiao, vice president of the Information Technology Research Institute of Tsinghua University; Wang Haofen, chair of the Knowledge Graph SIG of the China Computer Society, renowned knowledge graph expert and main initiator of OpenKG; and Wang Huizhen, deputy director of the Natural Language Processing Laboratory of Northeastern University and founder of Neosu. At the conference, EpiK’s concept and practice of building an open knowledge base on the decentralized cooperation model of blockchain was the core highlight and was highly praised by many experts and scholars.
This article analyzes the EpiK open source knowledge movement from the following four aspects:
Why build a decentralized knowledge graph collaboration platform
Challenges facing the open source knowledge movement
EpiK Protocol solution
Who can participate in this open source knowledge movement
01
Why build a decentralized knowledge graph collaboration platform
Artificial intelligence has now entered its second half. We are no longer satisfied with models whose behavior cannot be explained, and giving AI cognitive ability is the bottleneck that must be broken. As an important medium through which machines understand human knowledge, the knowledge graph is becoming key infrastructure for broadening AI cognition.
However, building large-scale knowledge graph infrastructure involves a vast amount of knowledge from many fields and demands high data quality, so a large workforce from different fields must be organized to take part. Yet the trust cost of knowledge graph construction is extremely high, and mutual distrust between enterprises and between countries leads to a great deal of duplicated work. This creates the need for a shared platform for knowledge graph construction, and how to share the benefits fairly among the contributors on that platform is a problem that must be solved.
2020 marked the maturity of blockchain-based decentralized storage technology. It is now possible to build a permissionless, tamper-proof and traceable public database, which lays a practical foundation for a collaborative knowledge graph platform that is jointly built, jointly shared and of common benefit.
02
Challenges facing the open source knowledge movement
The price of Bitcoin keeps hitting record highs, and DeFi, IPFS, DAOs and other blockchain applications are emerging one after another, showing how much more blockchain could make possible. However, building a collaborative knowledge graph platform on blockchain is not easy; it faces a series of challenges:
First, how can joint construction be achieved? People from many fields of knowledge must be organized to build a large-scale, high-quality knowledge graph, which requires both an effective incentive mechanism and a strict data-quality acceptance mechanism. Second, how can sharing be achieved? Shared knowledge graph data faces the problem of trusted storage, so a public storage platform is needed that every contributor can access without permission. Third, how can common benefit be achieved? Knowledge graph data can be copied and spread at zero cost, so finding efficient ways for contributors to monetize their work is what sustains continuous collaboration.
Based on this, EpiK proposes a complete solution built on three cutting-edge branches of blockchain technology: decentralized storage, decentralized autonomous organizations (DAOs), and token economy models.
03
EpiK Protocol solution
To address the pain points of building decentralized knowledge graphs, EpiK analyzed in depth how blockchain technology can be applied and designed a technical architecture for building decentralized knowledge graphs on top of blockchain’s underlying logic.
Knowledge storage is at the core of this architecture. Here we introduce its three key components:
Storage: provides shared, trusted storage. Data cannot be arbitrarily tampered with, and access to data cannot be denied.
Incentives: rewards the various contributors in the ecosystem so that they jointly build a high-quality knowledge graph while each party pursues its own maximum benefit.
DAO: lets the community take part in governing system parameters and adjust them dynamically at different stages of development.
1. Storage
EpiK’s Storage component is built on the IPFS protocol. IPFS is a distributed network transport protocol that joins networked computers into a single shared file system. A file submitted to the IPFS network is split into multiple blocks, each with its own hash. Using a Merkle DAG data structure, these blocks are linked under a single root node, producing a unique file root hash that serves as the file’s hash. The roots of multiple files are in turn organized into a larger Merkle structure with its own unique root hash. One advantage of this structure is that duplicate blocks are stored only once, and nodes only need to exchange root hashes to keep a consistent view of the global file set. Each node can freely choose which blocks to store and tells other nodes which blocks it holds. Each node records what it learns about other nodes’ storage in a distributed hash table (DHT), so when it receives an access request it can quickly locate the nodes that hold the data and fetch it from them. IPFS successfully links honest, altruistic nodes behind a unified file system interface. But IPFS also has practical problems: it lacks incentives and anti-cheating mechanisms, and nodes may act maliciously or go offline at any time, all of which make storage built on IPFS alone unreliable.
Incentive mechanisms are covered in the Incentives section below. Here we briefly describe how nodes might cheat. Take a document that, for high availability, should be stored in several places on the network. If two miners each broadcast that they store the document, the system pays two storage rewards; but the two miners may actually share the same physical storage and hold only one copy, in which case the system should pay only one reward. This is the Sybil attack, a common attack on distributed systems.
To defend against Sybil attacks, EpiK integrates two zero-knowledge proofs: Proof of Replication (PoRep) and Proof of Spacetime (PoSt). PoRep proves that a node really stores a new, distinct copy of the original data locally, as required. PoSt proves that the node continues to keep that copy over time. For PoRep, the node’s globally unique ID is used as a seed, the source file is sealed with a computation-intensive encoding algorithm, and a zero-knowledge proof of the sealed data is broadcast. Although sealing is expensive, other nodes can easily verify that it was done correctly. For PoSt, the node must regularly broadcast a zero-knowledge proof for randomly chosen parts of the files it stores. If the node had to regenerate the sealed data from the source file, this would take so long that it could not broadcast the proof in time, and if other nodes do not receive a node’s PoSt in time, they conclude that it has lost the file. Therefore, to keep its proofs timely, a node cannot discard its sealed data.
With the storage system and proof mechanisms in place, we also need to keep data consistent across all nodes, which requires all nodes to agree on which files were broadcast to the network and in what order. This is where blockchain ledger technology comes in. The creation of every new file and the order of creation, the act of a node storing a file, and the act of a node submitting a storage proof are all recorded in a blockchain ledger agreed on by consensus across the whole network. Every node synchronizes the complete ledger and so obtains the same view of the network’s data. EpiK can store the operation log files of its knowledge graph (KG) database in the Storage component; by synchronizing these log files and applying them in order, each node can restore the KG database locally.
EpiK has more than 9,000 registered nodes, and more than 5,000 nodes are connected to the network to provide storage. EpiK’s current settings keep 3,000 copies of each file on the network; when a file has fewer copies than that, new nodes that store it receive an extra incentive. This makes it extremely difficult for attackers to take down the entire EpiK knowledge graph database with a DDoS attack. And because the whole network synchronizes the same ledger, an attacker would have to control more than 51% of the network’s nodes to tamper with it, which makes an attack extremely costly.
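The block-splitting and root-hash idea above can be illustrated with a small Python sketch. It is a simplified stand-in, not IPFS itself: it assumes fixed 4-byte blocks, SHA-256, and a plain binary Merkle tree, whereas real IPFS uses its own chunking, content identifiers (CIDs) and Merkle DAG encoding. The class name `BlockStore` is hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # bytes per block; tiny for illustration (real IPFS chunks are much larger)


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a file into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def merkle_root(hashes: list[bytes]) -> bytes:
    """Hash pairs of nodes level by level until a single root hash remains."""
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:  # odd level: duplicate the last node
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


class BlockStore:
    """Hypothetical content-addressed store: identical blocks are kept only once."""

    def __init__(self) -> None:
        self.blocks: dict[bytes, bytes] = {}

    def add_file(self, data: bytes) -> bytes:
        leaf_hashes = []
        for block in chunk(data):
            h = sha256(block)
            self.blocks.setdefault(h, block)  # skip blocks we already hold
            leaf_hashes.append(h)
        return merkle_root(leaf_hashes)  # the file root hash


store = BlockStore()
root_a = store.add_file(b"AAAABBBBCCCC")
root_b = store.add_file(b"AAAABBBBDDDD")
global_root = merkle_root([root_a, root_b])  # root over all file roots
print(len(store.blocks))  # 4 unique blocks stored, not 6
```

Because blocks are keyed by their hash, the two blocks the files have in common are stored only once, which is the deduplication property described above.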
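To make the sealing and challenge idea concrete, here is a toy Python sketch. It is not zero-knowledge and is not EpiK’s or Filecoin’s actual PoRep/PoSt: a slow SHA-256 hash chain seeded with the node ID stands in for the computation-intensive sealing step, and a challenge is answered by revealing one sealed block, which is checked against hashes the network recorded earlier. `SEAL_ROUNDS` and all function names are hypothetical.

```python
import hashlib
import secrets

SEAL_ROUNDS = 1_000  # hypothetical cost knob; real sealing is far more expensive


def _pad(node_id: bytes, index: int, length: int) -> bytes:
    """Node-specific pseudo-random pad; the hash chain makes it slow to recompute."""
    state = hashlib.sha256(node_id + index.to_bytes(8, "big")).digest()
    for _ in range(SEAL_ROUNDS):
        state = hashlib.sha256(state).digest()
    out = b""
    counter = 0
    while len(out) < length:  # stretch the 32-byte state to the block length
        out += hashlib.sha256(state + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def seal(node_id: bytes, blocks: list[bytes]) -> list[bytes]:
    """XOR each block with this node's pad, giving a replica unique to the node."""
    return [bytes(b ^ p for b, p in zip(block, _pad(node_id, i, len(block))))
            for i, block in enumerate(blocks)]


def unseal(node_id: bytes, sealed: list[bytes]) -> list[bytes]:
    return seal(node_id, sealed)  # XOR with the same pad undoes the seal


def commit(sealed: list[bytes]) -> list[bytes]:
    """Per-block hashes the network records when the replica is first proven."""
    return [hashlib.sha256(block).digest() for block in sealed]


def challenge(n_blocks: int) -> int:
    """Verifier picks a random block index the miner must answer for."""
    return secrets.randbelow(n_blocks)


def verify(commitment: list[bytes], index: int, response: bytes) -> bool:
    return hashlib.sha256(response).digest() == commitment[index]


data = [b"triple-1", b"triple-2", b"triple-3"]
replica_a = seal(b"miner-A", data)
replica_b = seal(b"miner-B", data)
assert replica_a != replica_b                 # one physical copy cannot serve both
assert unseal(b"miner-A", replica_a) == data  # sealing is reversible
commitment_a = commit(replica_a)
i = challenge(len(data))
assert verify(commitment_a, i, replica_a[i])  # an honest miner answers at once
```

The first assertion captures the anti-Sybil point: two miners holding one physical copy cannot answer challenges for both replicas, and a miner that discarded its replica would have to redo the slow hash chain before it could respond.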
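Restoring the KG database by replaying the ordered log can be sketched as follows. The sketch assumes, hypothetically, that each log entry adds or deletes a (subject, predicate, object) triple and carries the sequence number fixed by ledger consensus; EpiK’s actual log format is not described here. It illustrates that nodes receiving entries in different network order still reach the same state once they apply them in ledger order.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class LogEntry:
    seq: int        # position fixed by ledger consensus
    op: str         # "add" or "delete" (hypothetical operation names)
    triple: Triple


def restore(log: list[LogEntry]) -> set[Triple]:
    """Rebuild the KG by applying log entries strictly in ledger order."""
    kg: set[Triple] = set()
    for entry in sorted(log, key=lambda e: e.seq):
        if entry.op == "add":
            kg.add(entry.triple)
        elif entry.op == "delete":
            kg.discard(entry.triple)
        else:
            raise ValueError(f"unknown operation: {entry.op!r}")
    return kg


log = [
    LogEntry(1, "add", ("Tsinghua University", "locatedIn", "Beijing")),
    LogEntry(2, "add", ("IPFS", "isA", "Protocol")),
    LogEntry(3, "delete", ("IPFS", "isA", "Protocol")),
]
node_1 = restore(log)
node_2 = restore(list(reversed(log)))  # same entries, different arrival order
assert node_1 == node_2 == {("Tsinghua University", "locatedIn", "Beijing")}
```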
2. Incentives
EpiK classifies contributors to the knowledge graph into three categories (data miners, domain experts and bounty hunters) plus one further user role: data gateways. Every day, the EpiK network issues a fixed number of points as rewards. The Incentives component defines how these points are allocated among the three contributor roles to motivate them to contribute to the public knowledge graph database, and the mechanism by which points are reclaimed.
Data miners are physical equipment providers who earn revenue by providing storage and bandwidth; 75% of the points produced each day go to data miners. The more data a miner stores and the more download traffic it serves, the higher its revenue. At the same time, to prevent miners from going offline at will, which would reduce the number of data backups and weaken system security, every data miner must pledge some points before it can earn rewards for providing storage and bandwidth. Rewards are paid automatically by blockchain smart contracts, without review by any intermediary.
Domain experts are contributors and validators of knowledge graph (KG) data and are the only group in the system with the right to upload KG data. They earn rewards by contributing high-quality KG data. Nine percent of the points produced each day go to domain experts as a group, and the more data an expert contributes, the higher the reward. However, because data volumes differ between fields, the size of the data contributed by each expert is log-transformed before rewards are allocated proportionally. As the only group allowed to upload data, domain experts are also subject to strict supervision. First, a domain expert must be nominated by an existing domain expert. The nominee then needs 100,000 votes from the community, where each vote is one locked point. If an expert’s votes (locked points) fall below 100,000, the expert loses the qualification. If a domain expert uploads fake or junk data, the community will punish them by removing them from the list, and whoever nominated the removed expert will also be penalized. To encourage voting, 1% of the points produced each day go to all voting users, and the more votes a user casts, the higher the reward.
Before we discuss bounty hunters, let’s look at data gateways. A data gateway is the only way for users to obtain the latest first-hand knowledge graph data. A data gateway must pledge points as collateral to obtain data access traffic; for example, pledging 1 point yields 10 MB of data access traffic per day. The greater the demand for the knowledge graph data on EpiK, the more points data gateways must pledge; this raises demand for points and makes the points held by contributors more valuable.
With the idea of gateway collateral in place, we can turn to bounty hunters. Bounty hunters annotate and validate knowledge graph data and are paid for completing tasks published by domain experts. Their rewards vary dynamically with the number of points pledged by data gateways. A high gateway pledge indicates that the quality of the knowledge graph data on EpiK is currently good, so EpiK incentivizes data miners to add bandwidth and make data access smoother: more of the remaining 15% of daily points goes to data miners. A low gateway pledge indicates that the data quality needs to improve, so more of the remaining 15% goes to bounty hunters, drawing more human effort into improving data quality.
In this ecosystem, each role maximizes its own benefit through the incentive model. Data miners want to provide more storage, and to earn more they need domain experts to improve the quality of the knowledge graph data. Domain experts keep providing updated, higher-quality data in return for higher rewards, and bounty hunters complete more tasks to earn more. The invisible hand drives the construction of the knowledge graph.
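The expert and voter pools described above can be sketched in Python. The 9% and 1% shares and the 100,000-vote threshold come from the text; the rest is assumption: the log is applied to each expert’s own contribution size (using `log1p` so that a size of zero gives zero weight), and voters are paid in proportion to the points they lock. The function names are hypothetical.

```python
import math

EXPERT_SHARE = 0.09   # share of daily points for domain experts
VOTER_SHARE = 0.01    # share of daily points for voters
MIN_VOTES = 100_000   # locked points an expert needs to stay qualified


def expert_rewards(daily_points: float, contributions: dict[str, int],
                   votes: dict[str, int]) -> dict[str, float]:
    """Split the expert pool by log-scaled contribution size among qualified experts."""
    weights = {e: math.log1p(size) for e, size in contributions.items()
               if votes.get(e, 0) >= MIN_VOTES and size > 0}
    total = sum(weights.values())
    if total == 0:
        return {}
    pool = daily_points * EXPERT_SHARE
    return {e: pool * w / total for e, w in weights.items()}


def voter_rewards(daily_points: float, ballots: dict[str, int]) -> dict[str, float]:
    """Split the voter pool in proportion to points locked as votes."""
    total = sum(ballots.values())
    if total == 0:
        return {}
    pool = daily_points * VOTER_SHARE
    return {v: pool * n / total for v, n in ballots.items()}


experts = expert_rewards(
    1_000_000,
    contributions={"A": 1_000_000, "B": 1_000, "C": 50_000},
    votes={"A": 150_000, "B": 120_000, "C": 90_000},  # C is below the threshold
)
print(round(experts["A"] / experts["B"], 2))  # about 2.0 despite a 1000x size gap
```

The log transform is what keeps experts in data-heavy fields from crowding out experts in smaller fields: a thousandfold difference in contribution size becomes roughly a twofold difference in reward.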
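Using the example rate from the text (1 pledged point buys 10 MB of access traffic per day), and assuming the rate is linear, the pledge a gateway needs can be estimated in a few lines. The constant and function names are illustrative only.

```python
import math

MB_PER_POINT_PER_DAY = 10  # example rate from the text; assumed to be linear


def daily_quota_mb(pledged_points: float) -> float:
    """Daily access traffic (MB) unlocked by a given pledge."""
    return pledged_points * MB_PER_POINT_PER_DAY


def points_needed(traffic_mb_per_day: float) -> int:
    """Smallest whole-point pledge that covers the expected daily traffic."""
    return math.ceil(traffic_mb_per_day / MB_PER_POINT_PER_DAY)


print(points_needed(5 * 1024))  # a gateway serving 5 GB/day pledges 512 points
```

Rising demand for data therefore shows up directly as more points locked in gateway pledges, which is the link to point value described above.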
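The daily split can be summarized in a short sketch. The fixed shares (75% to data miners, 9% to domain experts, 1% to voters, and a flexible 15%) come from the text; how the flexible 15% tracks the gateway pledge is not specified, so this sketch assumes a hypothetical `target_pledge` and a linear rule: the fraction going to miners rises from 0 to 1 as the pledge approaches the target, and bounty hunters receive the rest.

```python
def daily_split(daily_points: float, gateway_pledge: float,
                target_pledge: float) -> dict[str, float]:
    """Allocate one day's points among roles (linear flexible split is an assumption)."""
    # Fraction of the flexible 15% that goes to miners, clamped to [0, 1].
    to_miners = min(1.0, max(0.0, gateway_pledge / target_pledge))
    flexible = 0.15 * daily_points
    return {
        "data_miners": 0.75 * daily_points + to_miners * flexible,
        "bounty_hunters": (1.0 - to_miners) * flexible,
        "domain_experts": 0.09 * daily_points,
        "voters": 0.01 * daily_points,
    }


low = daily_split(1_000, gateway_pledge=100, target_pledge=1_000)   # weak demand
high = daily_split(1_000, gateway_pledge=900, target_pledge=1_000)  # strong demand
print(low["bounty_hunters"] > high["bounty_hunters"])  # True
```

A saturating or logarithmic curve would serve equally well; the linear clamp is used only because it is the simplest rule that moves in the direction the text describes.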
3. Decentralized community governance
Imagine a driverless car cruising around in search of passengers. After dropping them off, it uses its earnings to recharge at a charging station, deciding how to carry out its tasks without any outside help beyond its initial programming. This is an ideal use case for a decentralized autonomous organization (DAO), as described by Mike Hearn, a Bitcoin core developer: smart contracts make it possible for organizations to operate without hierarchical management. The DAO is an important step in the evolution of blockchain, and EpiK Protocol borrows this form of organization and applies it to the construction of decentralized knowledge graphs.
EpiK has multiple DAOs. The EpiK DAO manages global parameters, such as each group’s share of the daily rewards. The Experts DAO governs parameters internal to domain experts, such as the algorithm that allocates points among them, and the Miners DAO governs parameters internal to miners, such as the number of backups kept for each file. Each role in a DAO performs its function in the organization through smart contracts, giving knowledge graph construction an automated process that greatly improves its professionalism and efficiency. Once the DAOs are in operation, they will free up enormous productivity for the construction of a global, super-large-scale knowledge graph.
Relying on these three components, EpiK’s knowledge graph + blockchain model gains unprecedented vitality and builds a platform on which open source knowledge is jointly built and shared.
04
Who can participate in this open source knowledge movement
EpiK’s open source knowledge campaign has shown more and more people the future value of knowledge graphs for AI and is encouraging more people to join EpiK in building the knowledge graph and sharing its benefits. EpiK is, in fact, an underlying data platform in which people of every background can participate. So who can get involved?
First, experienced practitioners in every industry can sign up to become domain experts for their industry. Their responsibilities include ensuring the accuracy of the data and properly splitting and distributing knowledge graph annotation tasks on the platform, so that users can help maintain the knowledge graphs in these fields.
Second, EpiK introduces the bounty hunter role to help domain experts complete domain-specific tasks. EpiK bounty hunters answer simple multiple-choice questions, such as Yes or No, and each answer gradually improves the knowledge graph. On completing a task, a bounty hunter receives the reward assigned by the domain expert for that work; by current estimates, this comes to no less than 36 yuan per hour. EpiK hopes to mobilize more people to participate part-time in their spare moments, and also to create new job opportunities in third- and fourth-tier cities.
Third, you can become a data miner by providing storage space. While earning generous rewards, you also contribute to humanity’s permanent knowledge base.
Finally, there is data monetization, which has two aspects. One is running a data gateway: as on-chain data grows, participants can aggregate it and offer well-curated knowledge access services in return for compensation. The other is connecting applications to the data, which can save enterprises the high cost of building their own databases.
05
Final thoughts
This article has explained the threefold design logic of EpiK’s decentralized, open, collaborative knowledge graph platform. On this basis, the EpiK knowledge graph will become an important foundation for the future development of artificial intelligence, providing key data support for future intelligent applications and continually raising the value of data.
EpiK’s open source knowledge movement is beginning an epic, 50-year transmission of knowledge from carbon-based life to silicon-based life, lighting the road to the future of AI.
About EpiK Protocol
EpiK Protocol is dedicated to building a super-large-scale decentralized knowledge graph. Through decentralized storage (IPFS), decentralized autonomous organizations (DAOs) and a token economy model, EpiK Protocol organizes and motivates members of the global community to organize human knowledge into knowledge graphs, and to build, share and continually update this knowledge base of humanity, expanding the horizons of artificial intelligence (AI) toward a smarter future.