I took an "Intro to Blockchain" course and here's what I learned...
Background
So, a little background on why I took a blockchain course in the first place. Before this, I had no idea what blockchain technology is or how it works. For example, I'd always heard that crypto involves "mining" and that it isn't very environmentally friendly, but I had no idea what was actually being mined. The course helped me understand this and many other core concepts behind blockchains. A blockchain course was offered back when I was in college, but I was too cool (or uncool) to take it then.
Since I don't come from a traditional CS background but do know a little programming, my aim going into the course was not to build something myself but to get a general understanding of how it works, since it's a growing field. The crypto space is also getting so much traction these days that I was curious what the hype was about. The other reason was investment: I wanted to get into this space but didn't want to put my money into something I knew nothing about, so I thought it was better to first understand how it works, at least to some extent, rather than dive in blindly. Also, this post is not going to be super technical; I plan to explain things in the simple way I understood them.
About the course
I did some research on the intro-to-blockchain courses out there and came across one on Pluralsight titled "Blockchain – Principles and Practices". Since I had access to it through my employer, I went with it. The course also walks through sample code in which the author implements a sample blockchain in C#; since I wasn't planning to get into development, I skipped those parts.
What did I learn?
Blockchain, what?
The course starts with an intro section where the author introduces the concept of a 'blockchain'. From what I understood, a blockchain is basically a ledger, or record of transactions, stored in blocks that are linked together cryptographically. If that sounds complicated, just imagine a page that lists multiple transactions, where each page has a page number that lets you link it with the others. He then teases what this can be used for, a common use case being digital currency (crypto); more on this later.
Cryptography but simplified
Diving deeper into the core logic behind blockchains, the course next covers cryptography. I did take a cryptography course in my undergrad, but I didn't pay much attention then, so this was a lot for me. The three major concepts discussed here are hashing, HMAC, and digital signatures.
Hashing is a technique that uses an algorithm (a hash function) to generate a fixed-length identifier code, called a hash, for a document. A good cryptographic hash function makes it practically impossible to find two different documents that map to the same code, and even a tiny change to a document produces a completely different hash. In a blockchain, a hash is generated and added for each block you want to add to the chain.
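A quick way to see these properties is Python's built-in `hashlib`. This sketch assumes SHA-256 (the hash function Bitcoin uses), though the course may use a different one, and the messages are made-up examples.

```python
import hashlib


def sha256_hex(document: str) -> str:
    """Return the SHA-256 hash of a text document as a 64-character hex string."""
    return hashlib.sha256(document.encode("utf-8")).hexdigest()


original = sha256_hex("Alice pays Bob 10")
tampered = sha256_hex("Alice pays Bob 11")

print(original)
print(tampered)
print(len(original), len(tampered))  # 64 64: the length is fixed regardless of input size
print(original == tampered)          # False: a one-character change gives a different hash
```

Hashing the same input always gives the same output, which is what lets anyone recompute and check a block's hash later.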
HMAC takes hashing to the next level by adding a secret key, known only to the sender and receiver, to the generation of the code, called a Hash-based Message Authentication Code (HMAC). This achieves both integrity and authentication, since only someone holding the key can produce or verify a valid code for the message.
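Python's standard `hmac` module shows the idea. The sketch assumes SHA-256 as the underlying hash; the key and messages are made-up values.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag; only someone with the key can produce it."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


shared_key = b"secret-known-only-to-sender-and-receiver"  # hypothetical shared key
message = b"Alice pays Bob 10"
tag = make_tag(shared_key, message)

print(verify_tag(shared_key, message, tag))               # True
print(verify_tag(shared_key, b"Alice pays Bob 11", tag))  # False: message was altered
print(verify_tag(b"wrong-key", message, tag))             # False: wrong key
```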
Digital signatures, as the name suggests, are analogous to a physical signature at the end of a document that lets you identify the person who signed it. A digital signature is again a cryptographically generated code: only the signer can generate it, using their private key, but anyone can verify it with the signer's public key to confirm their identity. Signatures are also included in the blocks to ensure the user who signed a transaction cannot later deny having done so.
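Python's standard library has no digital-signature API, so the sketch below uses textbook RSA with tiny, hard-coded primes purely to show the idea that the private key signs and the public key verifies. It is insecure by design (real blockchains such as Bitcoin use elliptic-curve signatures through vetted libraries), and the key values are illustrative assumptions.

```python
import hashlib

# Toy textbook-RSA key pair from tiny primes. NOT secure: real systems use
# large keys and padding via a vetted cryptography library.
P, Q = 61, 53
N = P * Q                          # public modulus (3233)
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (2753); needs Python 3.8+


def digest_mod_n(message: bytes) -> int:
    """Hash the message, then reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Only the holder of the private exponent D can produce this value."""
    return pow(digest_mod_n(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone with the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest_mod_n(message)


msg = b"Alice pays Bob 10"
sig = sign(msg)
print(verify(msg, sig))            # True
print(verify(msg, (sig + 1) % N))  # False: a forged signature value fails
```

With a modulus this small, unrelated messages can end up with the same digest value, which is one more reason this is only a toy.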
The 'block' in a blockchain
The course next goes into what a block is made of and how blocks are linked together. As mentioned earlier, blockchains are mainly used to store transactions, so a block holds all the relevant data for each transaction. For example, in retail, this would be the transaction ID, item details like the SKU, payment info, the purchase amount, and so on. The other part of a block is the header, which holds info for identifying and linking blocks. The blocks are "cryptographically" linked by including the hash of the previous block. The hash for the current block is calculated as follows:
Block_hash(current-block) = Hashing_function[
Data +
Previous-block-hash +
Digital-signature +
Header-info
]
If a block is altered, its original hash can no longer be reproduced, since the Data element in the above formula has changed and so the resulting hash changes too. The link also breaks, because the next block still stores the old value as its previous-block hash.
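To make the linking concrete, here's a minimal Python sketch, assuming SHA-256 as the hashing function and plain strings for each field. The field names, the genesis convention, and the placeholder signature are all hypothetical, and this is not the course's C# implementation. It builds a small chain, then alters an old block to show the link breaking.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    data: str           # transaction details, e.g. transaction ID, SKU, amount
    previous_hash: str  # hash of the block before this one
    signature: str      # placeholder standing in for a real digital signature
    header_info: str    # hypothetical header field, e.g. the block number

    def compute_hash(self) -> str:
        # Hash data + previous hash + signature + header info, as in the post's formula.
        # json.dumps keeps field boundaries unambiguous ("ab"+"c" vs "a"+"bc").
        payload = json.dumps([self.data, self.previous_hash, self.signature, self.header_info])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


GENESIS_PREVIOUS_HASH = "0" * 64  # hypothetical stand-in "previous hash" for the first block


def build_chain(records: list[str]) -> list[Block]:
    """Link each new block to the hash of the block before it."""
    chain: list[Block] = []
    previous_hash = GENESIS_PREVIOUS_HASH
    for number, data in enumerate(records):
        block = Block(data, previous_hash, signature="sig-placeholder", header_info=f"block {number}")
        chain.append(block)
        previous_hash = block.compute_hash()
    return chain


def links_are_valid(chain: list[Block]) -> bool:
    """Check that every block stores the current hash of the block before it."""
    return all(later.previous_hash == earlier.compute_hash()
               for earlier, later in zip(chain, chain[1:]))


chain = build_chain(["txn-1: SKU 42, $19.99", "txn-2: SKU 7, $5.00", "txn-3: SKU 42, $19.99"])
print(links_are_valid(chain))  # True

chain[0].data = "txn-1: SKU 42, $0.01"  # quietly alter an old block
print(links_are_valid(chain))  # False: block 0's new hash no longer matches what block 1 stored
```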
The image below shows what a block looks like and how they're linked cryptographically:
Adding to a blockchain
The transactions to be added to these blocks first go into a pool of transactions, which holds all the transactions waiting to be added to the blockchain. For example, if a retail store has branches in multiple locations, each branch first sends its transactions to the pool. The transactions are then picked up by nodes, which create blocks in the way described above and add them to the blockchain. You can think of these nodes as computers doing the math of calculating the hashes and checking the digital signatures needed to add a block. Large blockchains like Bitcoin have many such nodes running together. In that case, the nodes all need to see the same transaction pool, and different blockchains achieve this in different ways.
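A minimal, single-process sketch of the pool idea in Python. The `TransactionPool` class, its oldest-first ordering, and the block size are hypothetical; real nodes may choose transactions differently (for example, by fee), and keeping many nodes' pools in sync is not modeled here.

```python
from collections import deque


class TransactionPool:
    """Hypothetical single-process pool of transactions waiting to go into a block."""

    def __init__(self) -> None:
        self._pending: deque[str] = deque()

    def submit(self, transaction: str) -> None:
        """A branch (or any client) drops a transaction into the pool."""
        self._pending.append(transaction)

    def take(self, max_count: int) -> list[str]:
        """A node removes up to max_count of the oldest pending transactions to build a block."""
        batch: list[str] = []
        while self._pending and len(batch) < max_count:
            batch.append(self._pending.popleft())
        return batch


pool = TransactionPool()
for branch in ("store-north", "store-south", "store-east"):
    pool.submit(f"{branch}: sale #1")

first_batch = pool.take(max_count=2)
remaining = pool.take(max_count=2)
print(first_batch)  # ['store-north: sale #1', 'store-south: sale #1']
print(remaining)    # ['store-east: sale #1']
```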
Proof of work
Now, if you're like me, you might think that altering a block would just require recalculating its hash and updating the hashes in the blocks that follow, which sounds like a lot of work but is doable. This is where the concept of "proof of work" comes into the picture: it makes adding new blocks really difficult and changing existing ones almost impossible, especially once the blockchain is large.
This is achieved in two parts. First, a number called a nonce is added to the block. Since the nonce is part of the block, it's included in the calculation of the block hash. Second, the blockchain sets a rule that every valid block hash must satisfy, and the nonce is chosen so that the hash meets it. For example, the rule might be that the only acceptable hashes are ones that start with four zeros, like 000084f...
For a block to be added to the blockchain, a node has to find a nonce value that, combined with the rest of the block's data, produces a hash starting with four zeros. There's no shortcut for this: the node has to brute-force search, trying nonce after nonce until one satisfies the condition. Finding this nonce value is what "mining" a block means. This also makes the blockchain effectively immutable, since altering a block changes its hash and would require redoing the proof of work for that block and for every block after it.
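Here's a minimal sketch of that brute-force search in Python, assuming SHA-256 and the four-leading-zeros rule from the example above. `block_data` is a hypothetical string standing in for the block's data, previous hash, and header; real networks such as Bitcoin compare the hash against a numeric target instead of counting zeros.

```python
import hashlib
from itertools import count


def block_hash(block_data: str, nonce: int) -> str:
    """Hash the block's contents together with a candidate nonce."""
    return hashlib.sha256(f"{block_data}|{nonce}".encode("utf-8")).hexdigest()


def mine(block_data: str, difficulty: int = 4) -> tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hash starts with `difficulty` zeros."""
    target_prefix = "0" * difficulty
    for nonce in count():
        candidate = block_hash(block_data, nonce)
        if candidate.startswith(target_prefix):
            return nonce, candidate


block_data = "previous-hash|txn-1: SKU 42, $19.99|block 1"  # hypothetical block contents
nonce, digest = mine(block_data, difficulty=4)
print(nonce, digest)

# Verifying someone else's work takes a single hash:
print(block_hash(block_data, nonce) == digest)  # True
```

Each extra required zero makes the expected number of attempts 16 times larger, because each hex digit has 16 possible values, while checking a claimed nonce still takes one hash. That asymmetry is what makes mining expensive but verification cheap.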
You may have heard that blockchain mining isn't environmentally friendly. That's because this search requires a lot of computing power, and therefore electricity, from the nodes, which can leave a heavy carbon footprint. Moreover, each node that successfully adds a block by solving the puzzle is rewarded with tokens, which may be of immense value depending on the blockchain itself (e.g. Bitcoin). Hence all the nodes in a blockchain network are in a race to solve the puzzle and add the next block.
Your blockchain v/s mine
In the last section, we went over how nodes have to put in some work by solving a puzzle to add blocks to the blockchain; whichever node succeeds first gets to add the block and claim the reward. The logical next question is what happens if two nodes solve the puzzle at the same time. This is rare but can happen given a large number of nodes. In this case, both nodes technically get to add their blocks, which creates a 'fork' in the blockchain.
This is not desirable because it creates confusion about which chain is the real one. However, a blockchain resolves this problem automatically. In a network of nodes, once a node adds a block, it propagates the block to all the other nodes so they can maintain the blockchain, have a fair chance at mining the next block, and keep the chain growing. In the case of a fork, both nodes send their block to all the other nodes. Putting yourself in the shoes of another node, you're getting two conflicting pieces of information about the latest block, so you trust the one that reaches you first, set the second aside, and continue building on top of the first. As soon as one branch gets another block on top of it, it becomes the longer chain, and the other nodes come to a consensus by picking the longest chain and continuing to build on it, which resolves the fork. Transactions in blocks on the abandoned branch that aren't already in the winning branch are added back to the transaction pool and become available to be mined again.
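The fork-resolution rule can be sketched as follows. This is a simplified Python model, assuming each block is just an ID plus its transactions and that the pool is a plain list; on a tie, the node keeps the chain it saw first. Real Bitcoin nodes compare total accumulated proof of work rather than literally counting blocks, so treat "longest" here as a simplification. `choose_chain` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: str
    transactions: tuple[str, ...]


def choose_chain(current: list[Block], received: list[Block], pool: list[str]) -> list[Block]:
    """Switch to the received chain only if it is strictly longer; on a tie keep the one seen first."""
    winner, loser = (received, current) if len(received) > len(current) else (current, received)
    confirmed = {tx for block in winner for tx in block.transactions}
    # Drop anything from the pool that the winning chain already contains.
    pool[:] = [tx for tx in pool if tx not in confirmed]
    # Return transactions stranded on the losing branch to the pool to be mined again.
    for block in loser:
        for tx in block.transactions:
            if tx not in confirmed and tx not in pool:
                pool.append(tx)
    return winner


genesis = Block("genesis", ())
block_a1 = Block("A1", ("tx1", "tx2"))  # mined by node A
block_b1 = Block("B1", ("tx1", "tx3"))  # mined by node B at the same time
block_b2 = Block("B2", ("tx4",))        # the next block, built on top of B1

pool: list[str] = []
my_chain = [genesis, block_a1]  # this node heard about A1 first
my_chain = choose_chain(my_chain, [genesis, block_b1], pool)  # tie: keep A1
print([b.block_id for b in my_chain])  # ['genesis', 'A1']

my_chain = choose_chain(my_chain, [genesis, block_b1, block_b2], pool)  # B's branch is now longer
print([b.block_id for b in my_chain])  # ['genesis', 'B1', 'B2']
print(pool)                            # ['tx2']: it was only in the abandoned block A1
```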
What's next?
Now that I have a good idea of how a blockchain works, I'm looking to learn more about its use cases, especially popular ones like cryptocurrency, so keep an eye out for a similar post on that.
If you found this helpful, share it. If you're into this, you can find me on Twitter @abhishek27297.