I have been interested in the shared/distributed ledger technology (a.k.a. block chain, a.k.a. the magic behind cryptocurrencies) for more than a year already but recently I had finally put real time and effort into it.
And since I believe that the best way to understand something is to get your hands dirty, that is what I have done, after I got a grasp of the core principles (or that is what I thought back then), I decided to code my own shared ledger simulator.
Initially, I also considered to look into the main existing code bases (e.g., the source code of the main/official Bitcoin client/miner or Ethereum's) since they are open source, but seeing code like this put me off... That file is 3229 lines big!!! Plus it is written in C++.
Do not get me wrong, I truly believe Satoshi Nakamoto (i.e., whoever hides behind that name) is a genius and also a great software developer, but he/she/they for sure did not have readability as their main priority.
I also noticed that some other people had the same idea before but I did not really like how they were implemented, for example:
This one --> as pointed out in one of the comments, it does not include the hash of the previous block in the current block... trying to make things simple is not an excuse for this, if you do not have this core feature, then what you have is not a shared ledger implementation at all, period.
Or this other one is not exactly what I had in mind, I wanted my running code to demonstrate how the algorithm works by generating multiple blocks and transactions in an automated way (more on this later).
So I chose Java to code my shared ledger simulator from scratch mainly because it is my favorite programming language and also because it has better support for cryptography than many other languages out there. The complete source is available on GitHub under the MIT license.
Assumptions and compromises taken
I wanted my implementation to be compliant with the core shared ledger principles while keeping things simple (or at least not too complicated), so these are the main characteristics/assumptions/compromises I made (in no particular order):
1.- Bitcoin (and Ethereum too) use Elliptic Curve Digital Signature Algorithm but I used good old RSA instead. Conceptually there is not much difference, since both rely in the same common concepts such as private key, public key and signature
2.- The addresses are represented as arrays of bytes and each address is unequivocally associated to an RSA key pair. For simplicity, I took the address as the last 20 bytes (or 40 hexadecimal digits/characters) of the public key.
3.- There are only two type of nodes: full nodes (i.e., miners) and thin clients (which submit transactions). Each node runs on a different thread, analogously to real life where each node runs on a different computer which may be geographically very distant from other nodes.
4.- To simulate the network, I used two simple abstract data types, BlocksInTime.java and TransactionsInTime.java. Every thread/node has access to both ADTs:
- Full Node (Miners) have "read and add" access to BlocksInTime and "read only" access to TransactionsInTime
- Thin clients have "read only" access to BlocksInTime (to check when is safe to consider a submitted transaction as confirmed, e.g., after five blocks, but please note that this functionality has not been implemented yet) and "add" access to TransactionsInTime
5.- Every transaction must be signed (by a thin client) using the private key associated with the source address of the transaction --> this is one of the core principles of shared ledger
6.- Thin clients do not explicitly set a transaction fee, the difference between total inputs and total outputs of the transaction is considered to be the transaction fee --> exactly how Bitcoin works
7.- The higher the fee is, the higher chance for the transaction to be included in the next block by the lucky miner, i.e., transactions with higher fee have higher priority.
8.- Signing a transaction is implemented with the help of the standard Java serialization, once we get the array of bytes which represent an unsigned transaction, the bytes are signed using the private key of the thin client and the resulting signature is added ("stamped") to the signed transaction object, you may check this method.
9.- The number of maximum transactions than can fit in a block is set through a property in the properties file (Bitcoin limits the size of the block in bytes but I just limited the number of transactions for the sake of simplicity), so if the transaction fee is too low, it may take longer (or even forever) to be included in a mined block,
but please note that submitted transactions never really expire, more info here.
10.- Every block (except the genesis block) must contain the hash of its parent block --> this is another core principle of shared ledger
11.- Every block contains the address of the miner, and includes a transaction with no source address which sends the miner's reward amount to that address, this is how new coins are being generated, thus the origin of every single coin/amount in every transaction can be traced to a mining reward transaction.
12.- In Bitcoin, the number of coins is limited (to 21 million) and at some point in the future (the year 2140) no more bitcoins will be created from mining, this limitation was not implemented in the simulator for the sake of simplicity and because I did not considerate it essential.
13.- In the current implementation, a very basic (but reliable) Proof of Work mechanism is used when generating new blocks, I may add other mechanisms (e.g., Proof of Stake or something else which I already have in mind) in the future.
For the sake of simplicity, the number of leading zeros in the block hash required is just set through a property value, instead of being auto-adjusted by the network to maintain an approximate constant frequency of blocks generation.
14.- The most difficult part to come up with was the algorithm for a full node (miner) to decide which block is the latest valid one from the longest chain (yes, there are multiple chains of blocks, I actually think that the term block chain is somehow misleading, block tree would be a better one). I had to put into practice a little of Graph theory (more precisely Depth First Search/Traversal algorithm) to calculate the distance (a.k.a. depth in a tree node) of a given block from the Genesis block.
This may actually deserve its own post.
15.- There is no single data structure to permanently store the balances (i.e., what amount is available on what address/account). Balances are calculated on the fly by checking every single transaction present in a block chain --> this is another core principle of share ledger which my implementation honors.
16- I added several "sleep statements" across the source code in order to slow down the execution for didactic purposes.
Running the code
So let's see the code in action.
In the this video you can see the execution of the FullNodeAndThinClientTester.java which showcases two full nodes (miners) competing to generate the next block and also two thin clients submitting transactions to the network which then are included into the blocks.
You may execute the code yourself in your computer and even modify the timings of the main events (e.g., when to start a miner to join the network and start mining) or change the value of the properties to see how that impacts the outcome of the execution (i.e., the blocks generated).
Also, please let me know in case I missed anything or any of my assumptions or compromises made are not accurate or properly explained.
And since I believe that the best way to understand something is to get your hands dirty, that is what I have done, after I got a grasp of the core principles (or that is what I thought back then), I decided to code my own shared ledger simulator.
Initially, I also considered to look into the main existing code bases (e.g., the source code of the main/official Bitcoin client/miner or Ethereum's) since they are open source, but seeing code like this put me off... That file is 3229 lines big!!! Plus it is written in C++.
Do not get me wrong, I truly believe Satoshi Nakamoto (i.e., whoever hides behind that name) is a genius and also a great software developer, but he/she/they for sure did not have readability as their main priority.
I also noticed that some other people had the same idea before but I did not really like how they were implemented, for example:
This one --> as pointed out in one of the comments, it does not include the hash of the previous block in the current block... trying to make things simple is not an excuse for this, if you do not have this core feature, then what you have is not a shared ledger implementation at all, period.
Or this other one is not exactly what I had in mind, I wanted my running code to demonstrate how the algorithm works by generating multiple blocks and transactions in an automated way (more on this later).
So I chose Java to code my shared ledger simulator from scratch mainly because it is my favorite programming language and also because it has better support for cryptography than many other languages out there. The complete source is available on GitHub under the MIT license.
Assumptions and compromises taken
I wanted my implementation to be compliant with the core shared ledger principles while keeping things simple (or at least not too complicated), so these are the main characteristics/assumptions/compromises I made (in no particular order):
1.- Bitcoin (and Ethereum too) use Elliptic Curve Digital Signature Algorithm but I used good old RSA instead. Conceptually there is not much difference, since both rely in the same common concepts such as private key, public key and signature
2.- The addresses are represented as arrays of bytes and each address is unequivocally associated to an RSA key pair. For simplicity, I took the address as the last 20 bytes (or 40 hexadecimal digits/characters) of the public key.
3.- There are only two type of nodes: full nodes (i.e., miners) and thin clients (which submit transactions). Each node runs on a different thread, analogously to real life where each node runs on a different computer which may be geographically very distant from other nodes.
4.- To simulate the network, I used two simple abstract data types, BlocksInTime.java and TransactionsInTime.java. Every thread/node has access to both ADTs:
- Full Node (Miners) have "read and add" access to BlocksInTime and "read only" access to TransactionsInTime
- Thin clients have "read only" access to BlocksInTime (to check when is safe to consider a submitted transaction as confirmed, e.g., after five blocks, but please note that this functionality has not been implemented yet) and "add" access to TransactionsInTime
5.- Every transaction must be signed (by a thin client) using the private key associated with the source address of the transaction --> this is one of the core principles of shared ledger
6.- Thin clients do not explicitly set a transaction fee, the difference between total inputs and total outputs of the transaction is considered to be the transaction fee --> exactly how Bitcoin works
7.- The higher the fee is, the higher chance for the transaction to be included in the next block by the lucky miner, i.e., transactions with higher fee have higher priority.
8.- Signing a transaction is implemented with the help of the standard Java serialization, once we get the array of bytes which represent an unsigned transaction, the bytes are signed using the private key of the thin client and the resulting signature is added ("stamped") to the signed transaction object, you may check this method.
9.- The number of maximum transactions than can fit in a block is set through a property in the properties file (Bitcoin limits the size of the block in bytes but I just limited the number of transactions for the sake of simplicity), so if the transaction fee is too low, it may take longer (or even forever) to be included in a mined block,
but please note that submitted transactions never really expire, more info here.
10.- Every block (except the genesis block) must contain the hash of its parent block --> this is another core principle of shared ledger
11.- Every block contains the address of the miner, and includes a transaction with no source address which sends the miner's reward amount to that address, this is how new coins are being generated, thus the origin of every single coin/amount in every transaction can be traced to a mining reward transaction.
12.- In Bitcoin, the number of coins is limited (to 21 million) and at some point in the future (the year 2140) no more bitcoins will be created from mining, this limitation was not implemented in the simulator for the sake of simplicity and because I did not considerate it essential.
13.- In the current implementation, a very basic (but reliable) Proof of Work mechanism is used when generating new blocks, I may add other mechanisms (e.g., Proof of Stake or something else which I already have in mind) in the future.
For the sake of simplicity, the number of leading zeros in the block hash required is just set through a property value, instead of being auto-adjusted by the network to maintain an approximate constant frequency of blocks generation.
14.- The most difficult part to come up with was the algorithm for a full node (miner) to decide which block is the latest valid one from the longest chain (yes, there are multiple chains of blocks, I actually think that the term block chain is somehow misleading, block tree would be a better one). I had to put into practice a little of Graph theory (more precisely Depth First Search/Traversal algorithm) to calculate the distance (a.k.a. depth in a tree node) of a given block from the Genesis block.
This may actually deserve its own post.
15.- There is no single data structure to permanently store the balances (i.e., what amount is available on what address/account). Balances are calculated on the fly by checking every single transaction present in a block chain --> this is another core principle of share ledger which my implementation honors.
16- I added several "sleep statements" across the source code in order to slow down the execution for didactic purposes.
Running the code
So let's see the code in action.
In the this video you can see the execution of the FullNodeAndThinClientTester.java which showcases two full nodes (miners) competing to generate the next block and also two thin clients submitting transactions to the network which then are included into the blocks.
You may execute the code yourself in your computer and even modify the timings of the main events (e.g., when to start a miner to join the network and start mining) or change the value of the properties to see how that impacts the outcome of the execution (i.e., the blocks generated).
Also, please let me know in case I missed anything or any of my assumptions or compromises made are not accurate or properly explained.
Comments
Post a Comment