Bitcoin

This article covers some specific aspects of the Bitcoin and Blockchains in general.

Overview

Bitcoin is built upon a blockchain database. A blockchain is a ledger consisting of many blocks, with each block containing a list of transactions and links to the previous block (hence, forming a 'chain' of blocks). Each successive block that is added generates a reward, starting with 50 BTC and halving every 210,000 blocks.

In Bitcoin, adding a new block is computationally hard work. It is the difficulty in adding blocks that makes the blockchain secure and consistent. In addition, the difficulty can be adjusted automatically to keep new blocks at a rate of 12 per hour. While adding new blocks is hard work, verifying this work is easy.

Bitcoin uses Hashcash as its proof of work algorithm and was initially developed to combat email spam. The algorithm hashes some data (ie. the block headers) with a nonce (an arbitrary value that is incremented for ever step) over and over until the hash meets a certain requirement (ie. the difficulty). The original Hashcash implementation requires the first ~20 bits to be zeros. In Bitcoin, this is adjusted based on the difficulty. As more miners participate in the protocol, the difficulty is driven up and the more zeros are needed in the hashes for blocks that are added to the blockchain.

Transactions

The Bitcoin blockchain contains only transactions. The blockchain itself contains no account information, no balances, and no coins. Balances are derived from the state generated by the transactions that have taken place.

type Transaction struct {
  ID       []byte            // transaction ID is a hash
  Inputs   []TXInput         // list of transaction inputs
  Outputs  []TXOutput        // list of transaction outputs
}

Transaction outputs are what stores the actual 'coins' in the protocol. It contains the amount being transacted as well as a ScriptPubKey which defines conditions that must be met before this output can be used. In Bitcoin, the scripting language is a Forth-like scripting language. The most common script is a Pay-to-PubkeyHash where the script verifies that the public key hashes to the scriptPubKey and that the signature matches the public key.

type TXOutput struct {
  Value         int          // value being transferred in satoshi (0.00000001 BTC)
  ScriptPubKey  string       // Script used to verify who can use this output
}

Transaction inputs references the transaction that contains the transaction output, the amount being transacted, and a ScriptSig which provides data to be used in the transaction output's ScriptPubKey. If the data in ScriptSig is correct, the output can be unlocked and its value can be used to generate new outputs. This mechanism guarantees that only owners of the outputs can spend their coins.

type TXInput struct {
  PreviousTxID    []byte         // transaction ID of Output
  TxIndex          int           // index of the transaction's output
  ScriptSig        string        // signature data used to unlock the output's ScriptPubKey. RIPEMD16(SHA256(PubKey))
}

All transaction inputs must reference an output except for the Coinbase transaction. A coinbase transaction has no inputs but has an output and is generated as the first transaction in a block and contains the block reward as well as any difference between the inputs and output transactions in this block which is considered as a transaction fee.

Because transactions are linked to each other, a person's 'balance' can only be calculated by finding all unspent transaction outputs (UTXO). An unspent transaction output is unspent if it is not referenced by any transaction inputs.

Pseudocode for finding unspent transaction outputs:

FindUnspentTransactionOutputs(address A) {
  unspentTransactions = {}
  spentTransactions = {}

  for each block B from the tip to the genesis block {
    for each transaction T from block B {
      for each index idx and output O from transaction T {
        // If the output exists in the spentTransaction list, the output has been spent by some other input
        if spentTransactions[T.ID] exists and idx in spentTransaction[T.ID] {
           // output has been spent
           break
        }

        // If the output matches that of address A, and has not been used by some input, it is unspent.
        if output O CanUnlockWithAddress A {
          // index idx on transaction T is unspent for address A
          unspentTransactions[T.ID] += idx
        }
      }

      // keep track of all inputs whose output is from address A
      // outputs used by these inputs here are spent.
      for each input I from transaction T {
        if input I OutputCanUnlockWithAddress A {
          spentTransaction[I.PreviousTxID] += I.TxIndex
        }
      }
    }
  }
}

// FindUnspentTransactions returns a list of transactions, each containing a list of indexes to unspent outputs.

Hashes

Bitcoin typically uses double SHA256 for hashing and sometimes RIPEMD160 when a shorter hash is desired (such as Bitcoin Addresses). As the name suggests, a SHA256 hash is 256bits or 64 bytes long and a RIPEMD160 is 160bits or 40bytes long.

A Block ID that is generated is the double SHA256 of the block headers. Each block header contains the version, previous block hash, merkle root hash of all the transactions, the time, difficulty target, and the nonce.

The transaction ID is generated using double SHA256 of the transaction values (version, transaction inputs, transaction outputs, nlocktime).

The Merkle root hash used in the block headers is built by building a binary tree containing all the transaction hashes.

Addresses are hashes of a ECDSA public key, prefixed with the version and suffixed with a checksum. A special encoding called Base58 is used to then encode the hash into a readable string without zeros. Address = Base58Encode(Version + RIPEMD160(SHA256(Public Key)) + SHA256(SHA256(KeyHash)))

Signatures

Digital signatures are algorithms that guarantee that the data wasn't modified and that the data was created only by the sender. Signing something requires the private key. Verification requires the data, signature, and the public key.

Every transaction input in Bitcoin is signed by the sender and each transaction must be verified (ie. checking input permissions, and that transaction signatures are correct) before being added to a block. Transactions are signed using Elliptic Curve Digital Signature Algorithm (ECDSA).

Bitcoin Core Client

Bitcoin Core is the reference client and wallet.

Database

The core client uses LevelDB which is a key-value based database to store metadata, chainstate (Unspent Transaction Outputs, UTXO set), and block index information. The raw bitcoin blocks are stored inside the blk.dat files.

Block information:

b + 32byte block hash = block header, height, transactions, filename and offset
f + 4byte file number = file information, block ranges, time ranges, sizes
l + 4byte file number = last block file used
R = reindexing

UXTO Chainstate information:

c + 32byte transaction hash = block height, scriptPubKey and amount
B = block hash up to which this chainstate information represents

Importing a Private Key

Steps to add a private key to your wallet:

Open the console (Help -> Debug Window)
Unlock your wallet walletpassphrase "YourLongPassphrase" 600
Import Key: importprivkey <private key> "<label>"

Source: http://bitcoin.stackexchange.com/questions/5941/how-do-i-import-a-private-key-into-bitcoin-qt

Brain Wallets

A brain wallet allows for the generation of the public/private key pair using a passphrase. It is accomplished by:

Converting a pass phrase into a private key
- PrivateKey = SHA256(PassPhrase)
Generate a public key using the private key
- PublicKey = privateToPublic(PrivateKey)
Generate the bitcoin address
- Hash = SHA256(PublicKey)
- Hash160 = RIPEMD160(Hash) (used by transactions)
- Address = Base58Check(Hash160)

The publickey can either be uncompressed or compressed. A compressed public key is simply a truncated key where the missing parts can be recomputed.

Attack

Ryan Castellucci did a talk on Defcon 23 on cracking brain wallets using his program called Brainflayer, available at https://github.com/ryancdotorg/brainflayer.

The basic idea is to extract all unique addresses in the bitcoin system. Dump the addresses into a bloom filter for almost constant time search, then use Brainflayer to bruteforce check on a word list that matches a specific address hash.

That is:

get all Bitcoin Addresses as Hash160 > hashes.hex
hex2blf hashes.hex hashes.blf
brainflayer -b hashes.blf -i phraselist.txt
or cat password.txt | brainflayer -b hashes.blf
or john --incremental --stdout | brainflayer -b hashes.blf

Key Stretching

To make brain wallets stronger, the key generation can be made to be more expensive and thereby limiting the number of tries an attacker can made per second.

https://keybase.io/warp/warp_1.0.6_SHA256_e68d4587b0e2ec34a7b554fbd1ed2d0fedfaeacf3e47fbb6c5403e252348cbfc.html

Utilities

Blockparser

Reads blocks and retrieves information.

https://github.com/znort987/blockparser

To build this project, install these dependencies:

# yum install openssl-devel perl-Data-Dumper perl-Digest-SHA boost-devel gcc-c++ gcc

The project also depends on google dense hash map: https://github.com/sparsehash/sparsehash

Run the make script to compile.

Usage

# ./parser allBalances > allBalances.txt
# cat allBalances.txt