Blockchain Technology: A case study of Bitcoin
There has been a lot of hype around this mysterious, complex new technology for storing and exchanging money that is decentralized, distributed and public. Well, spoiler alert: it's not as complex as most of us think.
To be able to understand this technology, we'll be looking at Bitcoin as our case study: the most widely used cryptocurrency, which implements blockchain technology under the hood.
This will be a 3-part series consisting of:
- Blockchain Technology: A case study of Bitcoin
- Implementing Blockchain Technology in Code
- Understanding the Math Behind Bitcoin
Buckle up and let's get into it.
So what is Blockchain Technology?
In its simplest form, a blockchain is a chain of blocks that store digital data and are linked to each other in a unique way. Let me explain further:
1. Blocks - They store transactional data (similar to the way e-commerce sites store your transaction data, like the date, amount and your name). So a single block can be defined as a group of transactions. Each block also stores unique information about itself (known as a hash - a unique string of numbers and letters) that is used to identify the block in the blockchain.
2. Blockchain - This is a group of blocks that are interconnected with each other, pretty much like a database containing a list of blocks as its entries.
Now let's get deeper.
How does Bitcoin implement Blockchain Technology?
For starters, since you ended up here, you probably know that bitcoin is the most popular cryptocurrency currently in use - well, as of 17th October 2019. Technology is very dynamic, so this might not be the case a few years later. Either way, many newer cryptocurrencies are forks of Bitcoin's technology with a few tweaks here and there to try to overcome the limitations of Bitcoin. So at this point, I'd like to assume that it's safe for me to call Bitcoin the father of other cryptocurrencies.
So how does Bitcoin do it? Well, all this boils down to cryptography (the practice of securing communication between two points), math, technology and a pseudonymous person (or group) known as Satoshi Nakamoto.
Let's consider bitcoin as a decentralized, distributed, public ledger: decentralized in that no central bank controls it, distributed in that every peer on the network holds a copy of the ledger, and public in that every peer on the network has access to this distributed ledger.
Now, let's also consider two popular individuals, Alice and Bob, who are trying to transact using bitcoin. Both of them start off by getting a Bitcoin wallet: a program responsible for managing bitcoins. The wallet generates a private and public key pair, where the private key is responsible for authorising transactions from your bitcoin wallet and the public key is used to derive your public address for sending and receiving bitcoins. The public key is generated from the private key.
(You can read more about what a bitcoin wallet is here and [the differences between a bitcoin client and a wallet here](https://bitcoin.stackexchange.com/questions/20487/whats-the-difference-between-a-bitcoin-client-and-wallet))
So what is a private key? A private key is a uniquely generated string of letters and numbers that will be used to authorise your bitcoin transactions (basically, like the PIN for your ATM card).
What is a public key? A public key is also a uniquely generated string of letters and numbers - generated from the private key - that, when hashed, will be used as your address when sending or receiving money. So this means that for Alice to send Bob money, Alice will need to know Bob's address (his hashed public key).
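To illustrate what "generated from the private key" means, here is a minimal Python sketch. It assumes the secp256k1 elliptic curve, which is the curve Bitcoin uses for its keys; the function names `point_add` and `derive_public_key` are made up for this example. It is for illustration only and should not be used to protect real funds.

```python
import secrets

# secp256k1 parameters: the curve is y^2 = x^3 + 7 over the integers mod P.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a point plus its mirror image
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def derive_public_key(private_key):
    """Double-and-add: the public key is the point private_key * G."""
    result, addend = None, G
    while private_key:
        if private_key & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        private_key >>= 1
    return result


private_key = secrets.randbelow(N - 1) + 1  # random integer in [1, N - 1]
public_key = derive_public_key(private_key)
print("private key:", hex(private_key))
print("public key :", public_key)
```

Going from the private key to the public key is a quick calculation, while going backwards from the public key to the private key is believed to be computationally infeasible, which is why the public side can be shared freely.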
What is hashing? Well, hashing is the process of converting a string of characters of any length into a practically unique string of letters and numbers of a fixed length through a hashing function. As an example, passing the string "john" through the SHA256 hashing function gives 96D9632F363564CC3032521409CF22A852F2032EEC099ED5967C0D000CEC607A, which has a total of 64 characters, equivalent to 256 bits. If you are not a computer guru - well, most of us aren't - I'll try to explain the math briefly. In hexadecimal notation, every single character (0-9 and A-F) represents 4 bits. Bits are basically the zeros and ones that computers use to represent data. So in this case, the hash is 64 characters long and, since every character is 4 bits, the total becomes 256 bits. This length of 256 bits will always be the same regardless of what you feed into the SHA256 hashing function. So even a string longer than "john", such as "Micheal", would still yield a 256-bit output, except that it would be quite different from the hash shown above.
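If you'd like to try this yourself, here is a small Python sketch using the standard `hashlib` module. The helper name `sha256_hex` is made up for this example.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as uppercase hexadecimal."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest().upper()


for name in ("john", "Micheal"):
    digest = sha256_hex(name)
    # 64 hexadecimal characters * 4 bits per character = 256 bits
    print(name, digest, len(digest), len(digest) * 4)
```

Whatever input you pass in, the printed length stays at 64 characters (256 bits), while the digest itself changes completely.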
So Alice and Bob have successfully acquired their Bitcoin wallets. Bob then decides to gamble with his rent money by betting on racing horses. Well, long story short, the horse Bob bet on wasn't actually a horse, so Bob loses all his rent money. Bob is now on the verge of being kicked out of his home, and at the same time is considering betting on a football match that starts 4 hours later, so he reaches out to Alice for financial help. Alice, having been brought up in a sharing family, decides to send Bob 2 bitcoins. To do this, Alice needs Bob's Bitcoin address (which, as we discussed earlier, is Bob's hashed public key). So Bob sends Alice this address.
To be able to do this, another important concept about Bitcoin should be understood: bitcoin does not store people's balances. Rather, Bitcoin stores a list of all transactions that have taken place since its birth. Therefore, to prevent Alice from sending money that she doesn't have, the Bitcoin client will look at previous transactions in which Alice received bitcoins (referred to as inputs in bitcoin terminology) and check two things: whether Alice received a sum total equal to or greater than the amount she wants to send (in this case 2 bitcoins), and whether these inputs have already been spent in any other transaction. If the inputs pass these checks, it is verified that Alice has at least 2 bitcoins and is able to send them. It is also important to note that the inputs will not always add up to exactly the amount Alice wishes to send, so the surplus is sent back to the sender. If Alice's previous transactions add up to a total of 3 bitcoins, then the extra bitcoin will be sent back to Alice.
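The input check and the change calculation described above can be sketched in a few lines of Python. The `Output` type, the `select_inputs` function and the sample transaction ids are all hypothetical, and amounts are kept as whole bitcoins to keep the arithmetic simple; this is an illustration, not how a real wallet is written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    """An amount Alice received in an earlier transaction (hypothetical type)."""
    tx_id: str
    amount: int
    spent: bool = False


def select_inputs(received, amount_to_send):
    """Pick unspent outputs until they cover the amount; return (inputs, change)."""
    chosen, total = [], 0
    for output in received:
        if output.spent:
            continue  # already used in another transaction
        chosen.append(output)
        total += output.amount
        if total >= amount_to_send:
            return chosen, total - amount_to_send
    raise ValueError("insufficient funds")


alice_received = [
    Output("tx1", 1, spent=True),
    Output("tx2", 1),
    Output("tx3", 2),
]

inputs, change = select_inputs(alice_received, 2)
print([o.tx_id for o in inputs], "change back to Alice:", change)
```

Here the unspent inputs add up to 3 bitcoins, so 2 go to Bob and 1 comes back to Alice as change.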
Alice is the heir to a massive estate and therefore very wealthy. She recalls that she recently converted a couple of million to bitcoins and had this money sent to her address by the bank she was using. So Alice is quite confident that she is able to send Bob 2 bitcoins, and she authorises the transaction.
To understand how transactions are authorised and validated, we first need to understand that the bitcoin network is an interconnection of multiple nodes, all containing the public list of transactions. These transactions are grouped together in blocks and these blocks are linked together to form a block-chain. Each block has a unique hash that identifies it (generated from the transactions in that block together with the hash value of the previous block), and every next block in the block-chain stores this hash value of the previous block. This way, blocks cannot be swapped with other blocks within the block-chain, since a replacement block will always have a different hash value. At the same time, since the block's hash value is generated from the list of transactions, changing a single transaction in a block will change the hash value of that block and subsequently the hashes of all following blocks. This is an important security feature of blockchain technology that we'll look at later.
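Here is a minimal Python sketch of that linking. The dictionary layout, the function names and the sample transactions are assumptions made for illustration; real Bitcoin blocks use a binary header format rather than JSON.

```python
import hashlib
import json

GENESIS_PREVIOUS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(transactions, previous_hash):
    """Hash the block's transactions together with the previous block's hash."""
    payload = json.dumps(
        {"transactions": transactions, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def build_chain(transaction_groups):
    """Turn a list of transaction groups into a list of linked blocks."""
    chain, previous_hash = [], GENESIS_PREVIOUS_HASH
    for transactions in transaction_groups:
        current = block_hash(transactions, previous_hash)
        chain.append(
            {"transactions": transactions, "previous_hash": previous_hash, "hash": current}
        )
        previous_hash = current
    return chain


def is_valid(chain):
    """Recompute every hash and check that every link points at the block before it."""
    previous_hash = GENESIS_PREVIOUS_HASH
    for block in chain:
        if block["previous_hash"] != previous_hash:
            return False
        if block["hash"] != block_hash(block["transactions"], block["previous_hash"]):
            return False
        previous_hash = block["hash"]
    return True


chain = build_chain([["Alice pays Bob 2"], ["Bob pays Carol 1"], ["Carol pays Dave 1"]])
print(is_valid(chain))  # True

chain[0]["transactions"][0] = "Alice pays Bob 200"  # tamper with an old transaction
print(is_valid(chain))  # False
```

Editing a transaction in the first block makes its stored hash stale, and fixing that hash would in turn break the link stored in the second block, and so on down the chain.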
So Alice inputs Bob's bitcoin address and presses the send button on her wallet. This generates a transaction that is added to a list of unverified transactions on the bitcoin network. At this point the transaction hasn't been added to any block, so it is not yet on the block-chain and therefore not on the public list of transactions. For this transaction to be considered valid, the nodes on the network will first check whether the inputs (as mentioned earlier, previous transactions) have been used in any other prior transaction. At the same time, these nodes check whether the destination address is valid. So in general, the Bitcoin network performs the following tasks before considering a transaction or group of transactions valid:
Steps run in the Bitcoin network:
1) New transactions are broadcast to all nodes - Alice's transaction is broadcast, together with other transactions currently taking place, to all other nodes in the network. Nodes here are Bitcoin clients running connected to the network. Bitcoin clients are programs that you install on your machine that perform these tasks.
2) Each node collects new transactions into a block
3) Each node works on finding a difficult proof-of-work for its block - Here, proof of work defines a mathematical problem that a node needs to solve in order to mark a block of transactions as valid. I'll talk more about this in the next section
4) When a node finds a proof-of-work, it broadcasts the block to all nodes
5) Nodes accept the block only if all transactions in it are valid and not already spent
6) Nodes express their acceptance of the block by working on creating the next block in the chain, using the hash of the accepted block as the previous hash
So what is this proof-of-work jargon?
Well, the answer to this question may also be the answer to other questions you might have, like where bitcoin comes from (which can also be phrased as where the first bitcoin came from), whether Alice has a thing for Bob, or whether Bob has a gambling problem and should seek professional help. Either way, hold on to the railings 'cause this Titanic is about to hit an iceberg.
So as we mentioned earlier, every pending transaction is held in a virtual pool of unconfirmed transactions. The nodes in the bitcoin network are responsible for verifying these transactions and putting them inside a block. These nodes also attempt to solve a mathematical problem, and the correct answer to this problem is what marks a block as valid. The node that does this first gets rewarded for the work in the form of bitcoins. As we also mentioned earlier, bitcoin uses a hashing algorithm to generate addresses. Bitcoin uses this same technique to generate the mathematical problem: say, find an input whose hash value has its first 32 bits set to zero. Such problems can only be solved through brute force (continuously guessing with different values until you achieve the desired result). Since a single bit can only have two states, 0 and 1, the probability that any one guess succeeds is 1 in 2^32, which means it takes over 4 billion guesses on average to find a solution! The network as a whole solves one such problem approximately every 10 minutes, and bitcoin fine-tunes the difficulty of the problem so that this pace is maintained. This entire process of solving a mathematical problem so as to validate a single block is what is referred to as Bitcoin Mining.
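The guessing game can be sketched in Python as below. This is a simplified illustration: the names `leading_zero_bits` and `mine` are made up, the block is just a string with a counter appended, and the difficulty is set to 16 zero bits so that it finishes quickly. Real Bitcoin mining hashes a binary block header twice with SHA256 and compares the result against a target value.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count how many bits at the start of the digest are zero."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def mine(block_data: str, difficulty_bits: int):
    """Try counter values (nonces) until the hash starts with enough zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce, digest.hex()
        nonce += 1


# 16 zero bits needs about 2^16 = 65,536 guesses on average.
nonce, digest = mine("Alice pays Bob 2", difficulty_bits=16)
print("nonce:", nonce)
print("hash :", digest)
```

Finding the nonce takes many guesses, but anyone can check the answer with a single hash, which is what makes the result a convincing proof that work was done.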
After all parameters of Alice's transaction check out, the transaction is considered valid and added to a block that has been validated, and Bob is free to either pay his rent or make another betting blunder. At the same time, a certain bitcoin amount is sent to the miner - pretty much like a transaction cost - as a thank you to the miner for devoting their computer's precious time and power to solving blocks. Because the bitcoins were sent to Bob's address and only Bob holds the matching private key, he's the only individual who can spend them. In other words, whoever has the private key (stored locally by the bitcoin wallet as a wallet.dat file) controls the bitcoin, so you should keep your private key as private as possible.
This is how bitcoin greases its continually moving gears in the bitcoin network. Wasn't so hard after all, was it?
If you're a sceptical type of individual, you may have questions about whether or not this network controlled by everyone is secure, so in this next section, we're going to look at security issues in bitcoin and how Bitcoin overcomes them.
Security issues - Also contains a somewhat more technical explanation of 'proof-of-work'
1. Transaction Order:
Every new Bitcoin transaction is broadcast across the network to all other nodes. Bitcoin is decentralised, such that every Bitcoin client contains a copy of all the transactions made in bitcoins. Because of this, you cannot assume that a transaction reaches all nodes in the bitcoin network at the same time. Consider an example: Alice wants to buy a car from Bob. Alice makes the transaction, sends Bob 5 bitcoins for the car and waits for Bob to ship the car. Afterwards, Alice uses the same inputs she used to send Bob 5 bitcoins to send the money back to herself. Due to different network propagation times, some nodes will receive Alice's second transaction (sending herself back the 5 bitcoins) before they receive her first transaction (sending Bob 5 bitcoins). These nodes would then consider the payment to Bob invalid, since from their point of view it re-uses inputs that were already spent, while other nodes would reject the refund instead. The network therefore needs a way to agree on the order of transactions.
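A small Python sketch shows how two nodes can disagree. The transaction dictionaries and the `accept_transactions` function are hypothetical and only model the "has this input been used before?" check.

```python
def accept_transactions(transactions):
    """Process transactions in arrival order; reject any that re-use an input."""
    spent_inputs = set()
    accepted, rejected = [], []
    for tx in transactions:
        if any(tx_input in spent_inputs for tx_input in tx["inputs"]):
            rejected.append(tx["id"])
            continue
        spent_inputs.update(tx["inputs"])
        accepted.append(tx["id"])
    return accepted, rejected


pay_bob = {"id": "alice->bob", "inputs": ["alice-input-1"]}
pay_self = {"id": "alice->alice", "inputs": ["alice-input-1"]}

# Two nodes receive the same two transactions in a different order.
print(accept_transactions([pay_bob, pay_self]))
print(accept_transactions([pay_self, pay_bob]))
```

The first node accepts the payment to Bob and rejects the refund, while the second node does the opposite, which is exactly the ordering problem described above.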
How does bitcoin overcome this?
Bitcoin arranges all transactions in blocks (around 2400 transactions per block). These blocks are connected together in a chain, where every block references the previous block. You can therefore traverse these blocks in order, all the way back to the first transaction ever made in bitcoin. This solves the problem of order, but how are transactions added to each block, and in the correct order? Nodes take the 'un-ordered' or 'unconfirmed' transactions, put them in a new block and broadcast this block to the rest of the network. The un-ordered or unconfirmed transactions are transactions that were recently made but have not been included in any block yet. But this also creates a problem, in that multiple nodes could create their own blocks containing the same transactions and broadcast them to the rest of the network. So how does the network know which block to choose?
Bitcoin dictates that each valid block must contain an answer to a specific mathematical problem. For this, Bitcoin uses cryptographic hash functions, which are basically functions that, when given a message as input, give as output a random-looking string of letters and numbers of a specific length (called the message digest) that is practically unique to this input. The output will always be the same as long as the input does not change. Changing even a minor part of the input will cause the output to change completely. These functions are referred to as one-way functions, since you cannot use the output to work backwards and retrieve the input. The only way of discovering an input is therefore to keep guessing until an output that matches the required output is produced.
Bitcoin uses the SHA256 algorithm, which generates an output 256 bits long. The mathematical problem that the nodes have to solve is to find an input whose hash value has its first n bits all set to zero (n can be any numerical value).
To put this more into perspective, the number of guesses needed on average is 2^n, depending on the value of n. If n was 32, this would be 2^32, which is above 4 billion guesses, as we calculated earlier. The bitcoin network fine-tunes the value of n such that a correct input is found around every 10 minutes. As we stated earlier, this process is referred to as Bitcoin mining.
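For the curious, the arithmetic behind the "4 billion" figure can be written out as follows. It rests on the modelling assumption that every bit of a hash output is equally likely to be 0 or 1, independently of the others.

```latex
\begin{align*}
P(\text{first } n \text{ bits are all zero}) &= \left(\tfrac{1}{2}\right)^{n} = 2^{-n} \\
\text{average number of guesses until success} &= \frac{1}{2^{-n}} = 2^{n} \\
\text{for } n = 32:\quad 2^{32} &= 4{,}294{,}967{,}296 \approx 4.3 \times 10^{9}
\end{align*}
```

Each extra zero bit that is required doubles the average number of guesses, which is what gives the network a fine-grained way to make the problem harder or easier.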
Because each guess succeeds with such a small probability (one in over 4 billion if n was 32), it is rare for more than one node to arrive at a correct input at the same time. In the cases where nodes do beat these odds and solve the problem at the same time, both blocks will be considered valid, and different nodes will build on whichever block they received first, using that block's hash in the next block. This raises another problem: the chain temporarily splits into competing branches. What the network does is accept the longest block-chain as the one backed by the most proof-of-work. It is highly improbable that nodes will again end up at a solution at the same time for the next block (due to how the mathematical problem is constantly fine-tuned), so one branch will soon become longer. This branch will be considered the valid chain by other nodes and the competing branches will be discarded. The reasoning behind this is that as long as honest nodes make up the majority of the bitcoin network, the longest chain will be the one the majority is working on, and it will therefore win against all other competing chains in the network. The more nodes there are trying to solve the problem, the higher the probability that it will be solved quickly, and so the faster their chain grows. For example, in the case of 4 billion guesses, a group of 1 million nodes will reach the solution faster than a group of 1 thousand nodes, which makes it even harder for a single attacker with one node to win the race against the entire network.
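The "pick the longest chain" rule is simple enough to sketch in Python. The function name `choose_chain` and the block labels are made up for this example. Note that this is a simplification: Bitcoin nodes actually compare the total proof-of-work behind each chain, which matches the block count whenever all blocks have the same difficulty.

```python
def choose_chain(candidate_chains):
    """Return the longest candidate chain (the first one seen wins a tie)."""
    return max(candidate_chains, key=len)


shared_history = ["block-1", "block-2", "block-3"]

# Two nodes solved block 4 at the same time, so the chain briefly has two branches.
branch_a = shared_history + ["block-4a"]
branch_b = shared_history + ["block-4b"]
print(choose_chain([branch_a, branch_b]))  # a tie: the node keeps the branch it saw first

# Someone then finds block 5 on top of branch B, which settles the tie.
branch_b = branch_b + ["block-5"]
print(choose_chain([branch_a, branch_b]))
```

Once branch B is longer, every node switches to it, and the transactions that were only in block-4a go back to the pool of unconfirmed transactions.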
It is therefore advisable to wait for more blocks to be added to the block-chain before being sure that a transaction is actually final, since it becomes significantly harder to generate a longer competing chain the further back in the chain you go. Consider this: our chain has 4 blocks already and the majority nodes are adding an extra valid block every 10 minutes, so for an attacker to be able to control this network, they need enough computing power to generate new blocks faster than the rest of the network combined. If the block-chain grows to 8 blocks while the attacker's copy is still at 4 blocks, the attacker has to create the 4 missing blocks and then get ahead. Since the other nodes in the network are still actively creating blocks in the meantime, it gets tougher for the attacker to catch up with the network the further back in the block-chain the attacker starts.
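The original Bitcoin paper puts a number on this race: if the attacker controls a fraction q of the computing power and the honest nodes control p = 1 - q, the probability that an attacker who is z blocks behind ever catches up is (q/p)^z when q is smaller than p, and 1 otherwise. Here is a small Python sketch of that formula; the function name and the sample values are just for illustration.

```python
def catch_up_probability(attacker_share: float, blocks_behind: int) -> float:
    """Probability that an attacker ever catches up from `blocks_behind` blocks behind."""
    q = attacker_share        # attacker's fraction of the total computing power
    p = 1.0 - q               # honest nodes' fraction
    if q >= p:
        return 1.0            # an attacker with half or more of the power always catches up
    return (q / p) ** blocks_behind


# An attacker with 10% of the computing power, starting further and further behind.
for blocks_behind in (1, 2, 4, 6):
    print(blocks_behind, catch_up_probability(0.10, blocks_behind))
```

The probability shrinks exponentially with every extra block, which is why waiting for a few more blocks makes a transaction so much safer.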
Whoa! That's a whole lotta information to take in, but I'd like to assume that deep down, you feel that it was worth it. Thanks for reading, and if you have any questions or need any clarification, don't be afraid to leave a comment :-) Also, stay tuned for the next post in this series: Implementing Blockchain Technology in Code. You can also read more about Bitcoin in the original paper posted here