This article comes from our Partner Fabien Stepho from Object’ive and was previously written for medium.com

Bitcoin transactions have become very expensive since the end of the last year 2017, with fees rising to $55 as demand has increased considerably. So, I wanted to analyze what was happening in the bitcoin network.
This story describes how I proceed to analyze billions of bitcoin transactions.

The tools and frameworks I wanted to use for this, are all open source and include :

Bcoin — Javascript bitcoin library for node.js
Elasticsearch — RESTful, distributed search & analytics 
Kibana — Tool to explore, visualize, and discover data
Docker — Open platform for developers and sysadmins to build, ship, and run distributed applications
Microsoft Azure — Cloud platform

Here are the following steps I came accross to accomplish my goal :

Step 1 — Installing the cloud infrastructure
Step 2 — Running and syncing a full bitcoin node with the complete blockchain data
Step 3— Extracting bitcoin data from the local blockchain
Step 4 — Indexing the data into Elasticsearch
Step 5 — Visualizing and analysing data

Step 1 — Installing and defining the cloud infrastructure

The first step was to install the platform. A virtual machine with at least 500GB of disk space is required for the need of blockchain data and elasticsearch index.

Here is an overview of the Azure cloud portal interface :

Azure cloud portal interface for Elastic Search

Docker is a must have for packaging and running isolated containers.
My docker compose definition consists mainly of the bitcoin node container, the elasticsearch container and the kibana container :

version: '2'

services:
  bcoin:
    build: .
    container_name: bcoin
    restart: unless-stopped
    ports:
      #-- Mainnet
      - "8333:8333"
      - "8332:8332" # RPC
      #-- Testnet
      #- "18333:18333"
      #- "18332:18332" # RPC
    environment:
      BCOIN_CONFIG: /data/bcoin.conf
      VIRTUAL_HOST: bcoin.yourdomain.org
      VIRTUAL_PORT: 8332 # Mainnet
      #VIRTUAL_PORT: 18332 # Testnest
    networks:
      - "bcoin"
    volumes:
      - data:/data
      - ./secrets/bcoin.conf:/data/bcoin.conf
      - ./app:/code/app

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.0.0
    container_name: elasticsearch
    hostname: elasticsearch
    environment:
      - cluster.name=bitcoin-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms1400m -Xmx1400m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata:/usr/share/elasticsearch/data
      - ./elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - 9200:9200
    networks:
      - bcoin

  kibana:
    image: docker.elastic.co/kibana/kibana:6.0.0
    container_name: kibana
    hostname: kibana
    volumes:
      - ./kibana.yml:/usr/share/kibana/config/kibana.yml
    ports:
      - 5601:5601
    networks:
      - bcoin

  head:
    image: mobz/elasticsearch-head:5
    container_name: head
    ports:
      - 9100:9100
    networks:
      - bcoin

networks:
  bcoin:
    external:
      name: "bcoin"

volumes:
  data:
  esdata:

As you can see, the data are not hosted inside the containers, but outside on the vm host, via docker “volumes”feature. So the generated data are not lost when containers are restarted or deleted.

Docker images and containers are constructed via the commande : docker-compose up. You can see running containers on the host vm via the command : docker ps

Running docker containers

Step 2 — Running and syncing a full bitcoin node with the complete blockchain data

I choosed to run the bcoin open source library. 
The project is hosted on github : https://github.com/bcoin-org/bcoin. I tried first with Bitcore library, but I encountered performance issues while syncing.
The syncing of the blockchain data was a bit long and lasted several days, so I decided to switch with Bcoin library on wich I can develop code written in node js (step 3 and step 4).

Bitcoin node synchronizing data
Bitcoin node synchronizing data

Step 3 — Extracting bitcoin data from the local blockchain

Thanks to the bcoin API, I writed some pieces of codes to retrieve bitcoin transaction data from the full running node. 
The API function I used is client.getBlock(height). Once I extracted the data, I pushed it to Elasticsearch index.

'use strict';

const defaults = require('../lib/helpers').getDefaultClients();
const client = defaults.client;
const esIndexer = require('./es.js');

indexTxs();
async function indexTxs() {

/**
 * Get Block by Height
 */ 
  let blockHeight;
  blockHeight=560000;
  /**
     curl $url/block/$blockHeight
   * ---
     bcoin cli block $blockHeight
   */
  try {
    for (var height = 0 ; height<blockHeight; height++){
        var block = await client.getBlock(height);
       
        console.log("Processing Block : " + height + " (" + block.txs.length +" txs)");
        for (var i = 0; i< block.txs.length; i++){
          var tx = block.txs[i];
          // Queue the block heigth for processing
         
          tx.blockheight = block.height;
          tx.blockhash = block.hash;
          tx.time = block.time;
          
          var forceIndex =  (height == blockHeight -1)  && (i == block.txs.length -1);
          const response = await esIndexer.indexItem(tx, forceIndex);
        }
    }
  } catch (e) {
    console.error(e);
  }
}

Step 4 — Indexing the data into Elasticsearch

I created three indexes to handle the data. One for output transactions, one for input transactions and one for the main transactions. Each index contains fields like address, block height, block hash, fee, amount, coinbase, etc.

Elasticsearch indexes
Elasticsearch indexes

The code which push the data into Elasticsearch use the javascript elasticsearch API with his bulk indexing feature. The bulk API allows me to make multiple index requests in a single step. This is particularly useful since I need to index a lot of transactions, which can be queued up and indexed in batches of thousands (100 000 in my case).

Here is the code for indexing :

'use strict';
var exports = module.exports = {};
var elasticsearch = require('elasticsearch');
var async = require("async");
var body = [];

exports.indexItem = async function (item, forceIndex) {
    
    console.log("Adding " + item.hash + " for bulk indexing")
    // INDEXATION VOUT
    item.outputs.forEach(function (vout) {
        var toIndex = {
            type: "output",
            blockhash: item.blockhash,
            blockheight: item.blockheight,
            hash: item.hash,
            date: new Date(item.time * 1000),
            value: vout.value,
            //script: vout.script,
            address: vout.address
        }
        // action description
        body.push({ index:  { _index: 'bcoin-output', _type: 'output'} });
        body.push(toIndex);

    });
    
    // INDEXATION VIN
    item.inputs.forEach(function (vin) {
        var toIndex = {
            type: "input",
            blockhash: item.blockhash,
            blockheight: item.blockheight,
            hash: item.hash,
            date: new Date(item.time * 1000),
            prevout_hash: vin.prevout.hash,
            prevout_index: vin.prevout.index,
            //script: vin.script,
            witness: vin.witness,
            sequence: vin.sequence,
            address: vin.address
        }
        if (vin.coin) {
            toIndex["utxo_version"] = vin.coin.version;
            toIndex["utxo_height"] = vin.coin.height;
            toIndex["utxo_value"] = vin.coin.value;
            //toIndex["utxo_script"] = vin.coin.script;
            toIndex["utxo_address"] = vin.coin.address;
            toIndex["utxo_coinbase"] = vin.coin.coinbase;
        }

        // action description
        body.push({ index:  { _index: 'bcoin-input', _type: 'input'} });
        body.push(toIndex);
    });
    
    // INDEXATION TX
    var toIndex = {
        type: "tx",
        blockhash: item.blockhash,
        blockheight: item.blockheight,
        hash: item.hash,
        date: new Date(item.time * 1000),
        witnessHash: item.witnessHash,
        fee: item.fee,
        rate: item.rate,
        mtime: item.mtime,
        index: item.index,
        version: item.version,
        locktime: item.locktime,
        //hex: item.hex
    }

    // action description
    body.push({ index:  { _index: 'bcoin-tx', _type: 'tx'} });
    body.push(toIndex);
    if (body.length>100000 || forceIndex){
        var esClient = new elasticsearch.Client({
            host: 'elasticsearch:9200',
            log: 'info',
            requestTimeout: 1000 * 60 * 60,
            timeout: 1000 * 60 * 60,
            keepAlive: false,
            deadTimeout : 6000000, 
            maxRetries : 15
        });
        console.log("BULK INDEXING " + body.length + " txs" );
        const resp = await esClient.bulk({
            body
        });
        esClient.close();
        console.log("INDEXED " + body.length + " txs" );
        console.log("Took: " + resp.took);
        console.log("Errors: " + resp.errors);
        body = [];
    }

}

Once runned, I waited several hours before I got 500 000 blocks totally indexed :

Bulk indexation processing result
Bulk indexation processing result

Step 5 — Visualizing and analysing data

Once the data was indexed, I plugged Kibana of the elastic stack, on the indexes :

Discovering transaction data with Kibana
Discovering transaction data with Kibana

I defined several visualizations and created a simple dashboard with metrics like “Total transaction fees”, “Number of unspent transaction outputs”, “miners repartition”, “Most popular addresses”

Kibana dashboard for bitcoin analysis
Kibana dashboard for bitcoin analysis

Conclusion

With all these five steps done, I could now make things like :

  • Transaction monitoring,
  • Complex querying
  • Exploring anomalies with machine learning,
  • Analyzing transaction relationships with Graph,

My next story will be on these use cases, stay tuned !

Note : The source code is freely available on my github repository : https://github.com/fstepho/bcoin-es

You can follow Fabien Stepho on his github for cryptocurrency development ideas