Introductory seminars on blockchain often start with easy-to-understand stories about peer-to-peer networks and bank ledgers, and then jump straight to writing smart contracts, which is jarring. So imagine yourself walking into the jungle and imagining that the Ethereum blockchain is a strange creature you’re about to study. Today we’re going to look at the creature, interact with it and then collect all the data about it into a centralized storage for our own use.

Set for the first time

First, you need to install Web3py. Web3py is a Python library for connecting to the Ethereum blockchain. What you need to know in advance is that there is no central management system from which to download data. Inter-connected nodes (” peers “) that share resources with each other store validated copies (or parts of them) of data. The network implements the Ethereum protocol, which defines rules for how nodes interact with each other and smart contracts on the network.

To access information about transactions, balances, blocks, or anything else written to the blockchain, the protocol requires you to connect to the node. Nodes are constantly sharing new data with each other and validating data, so that you can determine which data has not been tampered with and which data is up to date.

There are two basic types of nodes you can use in your first approach to Ethereum: local or managed. Local nodes can run on your computer, which means you first need to download a client like Geth, which syncs the blockchain to your device, taking up storage space and taking a lot of time to complete. For first time learning, a managed node is a better choice — it’s controlled by someone else, but you can easily connect to it and play with the blockchain yourself.

Go to Infura and create your own free account to access such managed nodes. When you’re done, you can see the mainnet mainnet (aka the ethereum blockchain) and a bunch of testnets testnets that basically test your smart contracts so you can make mistakes and correct them before deploying expensive code to mainnet.

This is the first time we import a Web3 object and establish an HTTP connection.

from web3 import Web3 
web3 = Web3(Web3.HTTPProvider("https://mainnet.infura.io/your-own-personal-number"))
Copy the code

Now you’re done! You can now browse query data structures using the Web3 API.

Example Query information about a specific block

#current block number
>>> web3.eth.blockNumber
5658173
#get the content of the most recently mined block
>>> web3.eth.getBlock('latest')
Copy the code

This command returns AttributeDict data structure, which is a dictionary of key-value key-value pairs, as shown below:

AttributeDict({'difficulty': 3297284143124448,
 'extraData': HexBytes('0x65746865726d696e652d6177732d61736961312d34'),
 'gasLimit': 7999992,
 'gasUsed': 7990111,
 'hash': HexBytes('0x8c09ba67123601c08ef5d292acaffd36798ca178b7d6fecd5e1144ce8e3b9e50'),
 'logsBloom': HexBytes('0x348000240b40620836308460180004415000c8ccb260021402420721c22801ca847c625c0a89030482044001523a4d100050100250d1008583812 60a186312088006c154010000491216446840888200c1812088c12b06000809a808530014160000812c2ac20008a201c83380314d02242338400c050 0c2a028005010988c44b0608a020400201032e10e16142b931115469824248066100b082224200222140a41a20aa2006224d608210f1a22811d03969 423e8c08058100388c0800402002a000802130c40d289201900c38142a1cc0380a4010f0201040d4022200022018c5801346c168502841906940485e a1d9864044060a00000a00616004d006090'),
 'miner': '0xEA674fdDe714fd979de3EdF0F56AA9716B898ec8'.'mixHash': HexBytes('0x84320fd71345778b48e437f3403e9021575520ba23aaac48dd7a352c9ce31f75'),
 'nonce': HexBytes('0x98a0b1e00bfabac6'),
 'number': 5658173,
 'parentHash': HexBytes('0x01eda8a47a0151533d1afacf9b9108606d4d89a86e269dddaac9698b6fb12930'),
 'receiptsRoot': HexBytes('0xc40f774ad10ad443457c3a5a0db96b539af3007f8d351b198ca7bf2ef196b7e0'),
 'sha3Uncles': HexBytes('0x55725ec296c6c64257ed6a88d7d8c66160abe7b672f5d89bbad5487779b1d5fe'),
 'size': 27161,
 'stateRoot': HexBytes('0x5bfc7a9a87fb9991f2760807d56319154f1dab91d3cfc9530a597b6c5d064aba'),
 'timestamp': 1527002893,
 'totalDifficulty': 4339832462578780814928,
 'transactions': [HexBytes('0x1ce6bca99701c4e8acae986b10e7649d628d70ec62b7a8314bbb13726a312225'),
  HexBytes('0x6ba5e657243aea5f95afb40090313d10bb9443db41ed1216fbf7e7e60a16749a'),
 loooooots_of_transactions_here],
 'transactionsRoot': HexBytes('0x67e1e1f2f4b1d33791a0fba2d5ebf039bd6c331c665cb8020ff41d0e0eade46e'),
 'uncles': [HexBytes('0x3df1bffa62d73b3847b434e9ea459c10cfdc3e212a2e78ebbf0db58adbef30b5'),
  HexBytes('0x74bdcd4b88427854ae18f9c7ada28d46f26411bed09af6b040cbede66fdb1853')]})
Copy the code

Not all of these variables will be immediately useful to you, as some are very technical and what they mean only makes sense once you have a deeper understanding of how blockchains actually work. You can read more about them in the so-called Yellow Book, or skip them for a while and use an easy-to-understand method.

In short, a block containing the block header information, a list of verified transactions written to it, and an unconfirmed list (block identifiers for miners whose blocks are too slow to enter the main blockchain but still get aether rewards for their computational effort). Below you can see the meaning of each variable, which I have divided into subcategories.

General General data

Block variable Meaning translation
number scalar value equal to the number of ancestor blocks (genesis block=0) Scalar value relative to the number of genesis blocks, Genesis block=0
size size of the block in bytes Block size, in bytes
timestamp Unix’s time() at this block’s inception The Unix time at which this block started
miner 160-bit address for fees collected from successful mining Successful mining collects 160 bit address of aether
gasLimit maximum gas expenditure allowed in this block Maximum allowable gas consumption for this block
gasUsed total gas used by all transactions in this block The total amount of gas used in all transactions in this block
transactions list of transaction hashes included in the block Block contains a hash list of transactions
parentHash Keccak256 hash of the parent block’s header The Keccak 256 hash of the parent block header
hash current block’s hash Hash value of the current block
extraData extra data in byte array Extra data in a byte array

Mining related

Block variable Meaning translation
difficulty scalar value corresponding to the difficulty level of the block Scalar value corresponding to the difficulty level of the block
totalDifficulty integer of the total difficulty of the chain until this block The total difficulty value of the chain up to this block
nonce hash of the generated proof-of-work; null when its a pending block Generate hash values for proof of work; Null when a block is suspended
mixHash 256-bit hash which is combined with the nonce and used to prove that sufficient amount of computation has been carried out on this block A 256-bit hash is used in conjunction with the nonce to prove that sufficient computation has been performed on the block

Uncle related

Block variable Meaning translation
uncles list of uncle hashes List of uncle hashes
sha3Uncles SHA3 of the uncles data in the block A: And the SHA3 value of the data in the block

Technology related

Block variable Meaning translation
receiptsRoot Keccak 256-bit hash of the root node of the tree structure populated with receipts of all transactions in this block The 256-bit hash at the root of the Keccak tree structure fills receipts for all transactions in this block
stateRoot Keccak256 hash of the root node if the state trie after all transactions are executed and finalisations applied The KECCAK256 hash of the root node of TRIe, for example, after all transactions are executed and the application is terminated
transactionsRoot Keccak256 hash of the root node of the trie structure populated with the receipts of each transaction in the transactions list The KECCAK256 hash of the root node of the TRIE structure populates the receipt for each transaction in the transaction list
logsBloom the Bloom filter from indexable info (logger address and log topics) contained in each log entry from the receipt of each transaction in the transaction list Bloom filters for indexable information (logger address and log subject) contained in the receive log entry for each transaction in the transaction list

Transactions and receipts

We can now also look up individual transactions in a block by their unique identifier, which is a transaction hash.

>>> web3.eth.getTransaction('0x1ce6bca99701c4e8acae986b10e7649d628d70ec62b7a8314bbb13726a312225')

AttributeDict({'blockHash': HexBytes('0x8c09ba67123601c08ef5d292acaffd36798ca178b7d6fecd5e1144ce8e3b9e50'),
 'blockNumber': 5658173,
 'from': '0x390dE26d772D2e2005C6d1d24afC902bae37a4bB'.'gas': 45000,
 'gasPrice': 123400000000,
 'hash': HexBytes('0x1ce6bca99701c4e8acae986b10e7649d628d70ec62b7a8314bbb13726a312225'),
 'input': '0x'.'nonce': 415710,
 'r': HexBytes('0x1bb901ad0a3add517504cc459fdb1545d193020ec5c63a566e440ee39dbfe131'),
 's': HexBytes('0x4b7ac95eb321b5947948ecb624e1d80b19d9cc876668c69cc2b32670f52b061a'),
 'to': '0xBbA2D99C9B3aF394B0d6417b1D58815eE495029D'.'transactionIndex': 0.'v': 37.'value': 1000000000000000000})
Copy the code

As before, web3py returns a dictionary of attributes. The following table summarizes the meaning of each key.

Transaction variable Meaning translation
blockHash hash of the block the transaction belongs to The hash value of the block to which the transaction belongs
blockNumber number of that block Number of the block
hash transaction hash (unique identifier) Transaction address hash (unique identifier)
from 160-bit address of a sender of a transaction Hash 160-bit address from the sender of the transaction
to address of the recipient or null for a contract creation transaction The address of the recipient or creating a contract transaction is null
value number of wei to be transfered to the recipient or newly created account (case of contract creation) Number of weIs to transfer to recipients or newly created accounts (in case of contract creation)
gas gas consumed by the transaction Trade in consumed natural gas
gasPrice number of Wei to be paid per unit of gas for all computatioon costs of this transaction The amount paid per unit of gas for all calculated costs of this transaction
nonce number of transactions/contract creations sent by the sender prior to this one The number of transactions and contracts created by the sender prior to this date
v/r/s used to identify the sender; the signature values of the transaction The signature value used to identify sender transactions
input the data sent along with the transaction Data sent with a transaction
transactionIndex index of the transaction in the block An index of transactions in a block

Finally, we can also look at transaction receipts:

>>> web3.eth.getTransactionReceipt('0x68c70c5ffe54a42ebf7118e7e931aeac018cee4656a816ffe6a01388da50c851')

AttributeDict({'blockHash': HexBytes('0x44338e1f80302037c7213e8f56dd35d8a473b000319bc200f76e910e62d12f98'),
 'blockNumber': 5617795,
 'contractAddress': None,
 'cumulativeGasUsed': 21004,
 'from': '0xea6e3e41ebaa09d550d3c3f0d72971b3c5ccc261'.'gasUsed': 21004,
 'logs': [].'logsBloom': HexBytes('0x000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 00000000000000000000000000000000000 '),
 'status': 1,
 'to': '0xd96a6e75d099ce529bbd257cbcb403224cceaebf'.'transactionHash': HexBytes('0x68c70c5ffe54a42ebf7118e7e931aeac018cee4656a816ffe6a01388da50c851'),
 'transactionIndex': 0})
Copy the code

Transaction receipts contain some duplicates and new entries, which are explained below.

Receipt variable Meaning translation
status boolean whether the transaction was successfull; false if the EVM (Ethereum Virtual Machine) reverted the transaction Whether the transaction was successful or false if EVM (Ethereum VIRTUAL machine) restored the transaction
contractAddress the contract address created if the transaction was a contract creation; otherwise null The contract address created if the transaction is a contract creation; Otherwise, null
gasUsed the total amount of gas used when this transaction was executed in the block The total amount of gas used to execute this transaction in the block
cumulativeGasUsed the sum of gasUsed by this transaction and all preceding transactions in the same block The gasUse used for this transaction and the sum of all previous transactions in the same block
logs array of log objects which the transaction has generated An array of log objects generated by transactions

For reference, in addition to the Yellow Book, I have included various additional resources to prepare these tables 2,3,4,5.

As you can see, with just a few simple commands, you can connect to the network and get basic information about the transaction, block, or status in its original format. This will open a new window for this data!

Database management system

When planning to write data to the appropriate database, you may be aware that there are many management system solutions for Python enthusiasts, such as serverless SQLite, or server-based MySQL, PostgreSQL, or Hadoop. Depending on your intentions, you must determine which option is best for your project. Overall, I find these points helpful:

  • What is the expected size of the database (that is, it can be processed on a single machine system)?
  • Are the entries edited frequently or do they stay the same?
  • Should the database be accessed and edited by multiple parties/applications simultaneously?

The Ethereum blockchain is growing steadily over time, approaching 1TB as of June 2018, which is small and therefore not optimal for distributed processing systems like Hadoop. The blockchain database will be written once, then only expanded with new entries, leaving the old entries unchanged. The intended use case for this database is written by one channel and accessed read-only by other channels, so we don’t actually need to run it on the server. Keeping the database locally on your machine will result in fast reads, which is desirable and achievable for serverless management systems like SQLite. Python has a built-in library called SQlite3, so we don’t even need to install new packages.

Database design

The next step is to design the database. Remember which data fields are most relevant to your analysis and are designed to optimize search and storage. For example, if you don’t plan to use stateRoot, you might want to skip it entirely or keep it in a separate table. You can search faster for tables with fewer columns, and if you realize later that you actually have a use case for stateRoot, you can still access it. You may also want to separate block information from transaction information; If this is not done, then block attributes such as TIMESTAMP will repeat all transactions in the block N times, wasting a lot of space. You can easily match a transaction with its block attributes later using the JOIN operation.

The database I designed contains three tables:

  • Quick: The most relevant trading information for Quick access and analysis.
  • TX: All remaining transaction information,
  • Block: information about a specified Block.

The naming convention for variables is slightly changed from the original Web3py to disambiguation, such as calling both block hashes and transaction hashes “hashes”, or using “from”/” to “as column names, which have different meanings in SQL and crash the program.

Transaction values, balances, and other large numbers need to be stored as strings in the database. The reason is that SQLite can only handle signed integers stored up to 8 bytes, and the maximum is 2 to the power of 63 -1, which is about 9223372036854775807. This is usually much lower than the transaction value in wei (for example, 1ETH = 10*18 WEI).

Build your mini-database

The full code can be found on GitHub. It organizes blockchain information according to the superarchitecture and outputs a blockchain.db file containing a pre-specified amount of block data. To test it, go to the database.py file and choose a reasonable number for the number of blocks to write, such as:

Nblocks = 10000
Copy the code

By default, you should point web3 objects to Infura endpoints. If you have an IPC provider (i.e. your local node), you can also switch to an IPC provider by uncommenting the line:

# or connection via node on the VM 
#web3 = Web3(Web3.IPCProvider('/path-to-geth.ipc/'))
Copy the code

Change the path, then simply run it from the command line python database.py. The code dumps the number of the lastblock written to lastblock. TXT in case you need to reboot.

How to use the database

Once the first entry is written to the database, you can start communicating with it through the Ipython shell. For example, to print the first five lines of the table “Quick”, you could run the following code.

import sqlite3 as sq3
conn = sq3.connect("blockchain.db")
cur = conn.cursor()

# some SQL code, e.g. select first five entries of the table Quick
cur.execute("SELECT * FROM Quick LIMIT 5")
a = cur.fetchall() #list of tuples containing all elements of the row
print(a)
conn.close()
Copy the code

Local node with Infura

If you are building a large database, download GETH and synchronize the nodes. Synchronization can be done in three basic modes:

If you do not need the past account state, you can synchronize node 6 in Quick mode.

The chart below shows the speed at which this code is written to the database, communicating with the local fully synchronized node (IPC) with the address on Infura (Infura). As you can see, running this code on a local node is worth it because you can increase the speed by nearly two orders of magnitude (100x)!

conclusion

Now that you have your own local database and know what’s happening on the blockchain, you can start exploring it. For example, you can count the number of transactions since its origin and look at the number of addresses generated as a function of time — the sky is what you can learn about the limits of the blockchain. We’ve laid the foundation for your data science playground. So keep exploring, or check out the next article for potential applications.

Python uses web3.py to develop ethereum easily. If you are interested, check out our Python Ethereum tutorial, which focuses on python engineers using web3.py to develop blockchain Ethereum in detail.

Here is the original text