Downsampling large data for visualization

This code is inspired by Bokeh's datashader. I have a time series of millions of data points that I would like to visualize in a browser, but there are two problems: 1. It is slow to transmit and plot so many points in JS. 2. Even if the browser is powerful enough to draw all those points, they will certainly lie on top of each other, since a computer screen has at most a few thousand pixels in one direction. The idea of datashader is to aggregate the data so that I plot at most one point per pixel. This way I make full use of the screen without visibly losing information. However, datashader is overkill for my application and not flexible enough for my situation, so I wrote the following code to do some simple downsampling for time series.

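# To handle a time series, first convert pandas timestamps to numbers
# (int64 nanoseconds since the epoch, divided by 1e6 to get milliseconds)
# df['time']=df.time.values.astype(np.int64)/1e6

import pandas as pd
import numpy as np
def sampling1d(dataframe,x,y,width,xmin=None,xmax=None):
    df=dataframe[[x,y]]
    # Restrict to the requested x-range, if given
    if xmin is not None:
        df=df[df[x]>=xmin]
    if xmax is not None:
        df=df[df[x]<=xmax]
    # Reconstructed (edges not shown in the original): width-1 interior
    # edges split the x-range into `width` bins, one per horizontal pixel
    bin_edges=np.linspace(df[x].min(),df[x].max(),width+1)[1:-1]
    bins=np.searchsorted(bin_edges, df[x])
    # Assumed aggregation (not shown in the original): mean of x and y per bin
    df2=df.groupby(bins).mean()
    return df2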
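
One aggregation choice that suits time series is to keep the minimum and maximum of y in each pixel column, so that narrow spikes survive the downsampling. The following is only a minimal sketch of that idea, not the original code: the helper name minmax_per_pixel, the column names and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def minmax_per_pixel(df, x, y, width):
    """Hypothetical helper: reduce df to at most `width` rows,
    keeping the min and max of `y` in each pixel-wide bin of `x`."""
    xs = df[x].to_numpy()
    # width - 1 interior edges split the x-range into `width` equal bins
    edges = np.linspace(xs.min(), xs.max(), width + 1)[1:-1]
    bins = np.searchsorted(edges, xs)
    out = df.groupby(bins).agg(
        x=(x, "mean"), ymin=(y, "min"), ymax=(y, "max")
    )
    return out.reset_index(drop=True)

# Synthetic series: one million points, flat except for a single spike
n = 1_000_000
demo = pd.DataFrame({
    "time": pd.to_datetime(np.arange(n), unit="s"),
    "value": np.zeros(n),
})
demo.loc[123_456, "value"] = 10.0

# pandas timestamps -> milliseconds since the epoch
ns = demo["time"].values.astype("datetime64[ns]").astype(np.int64)
demo["time"] = ns / 1e6

small = minmax_per_pixel(demo, "time", "value", width=1000)
```

Plotting ymin and ymax against x (for example as a filled band) then needs at most `width` rows, and the spike is still visible in ymax even though a mean over its bin would nearly erase it.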

Here is a version for sampling big data in 2D

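def downsample2d(x,y,logx=False,logy=False,width=500,height=500,weights=None):
    # One bin per pixel along each axis, log-spaced for log axes
    # (bin edges reconstructed here: the original lines are missing)
    if logx:
        binx=np.logspace(np.log10(np.min(x)),np.log10(np.max(x)),width+1)
    else:
        binx=np.linspace(np.min(x),np.max(x),width+1)
    if logy:
        biny=np.logspace(np.log10(np.min(y)),np.log10(np.max(y)),height+1)
    else:
        biny=np.linspace(np.min(y),np.max(y),height+1)

    # Count the points in each pixel, then use bin centers as coordinates
    z,binx2,biny2=np.histogram2d(x,y,bins=[binx, biny])
    binx2=(binx2[:-1] + binx2[1:])/2
    biny2=(biny2[:-1] + biny2[1:])/2
    xi,yi=np.nonzero(z)  # assumed: keep only the non-empty pixels
    if weights is not None:
        # Average weight per non-empty pixel
        z2,_,_=np.histogram2d(x,y,bins=[binx, biny],weights=weights)
        return binx2[xi],biny2[yi],z2[xi,yi]/z[xi,yi]
    return binx2[xi],biny2[yi]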
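
The core of the 2D approach is that np.histogram2d counts the points falling in each pixel-sized cell, and only the non-empty cells need to be sent to the browser. Here is a small self-contained illustration on synthetic data. The helper name nonempty_cells and the random input are assumptions for the example, not part of the original code.

```python
import numpy as np

def nonempty_cells(x, y, width=500, height=500):
    """Hypothetical helper: centers and counts of the non-empty cells
    of a width x height grid laid over the points."""
    counts, xe, ye = np.histogram2d(x, y, bins=[width, height])
    xc = (xe[:-1] + xe[1:]) / 2   # cell centers along x
    yc = (ye[:-1] + ye[1:]) / 2   # cell centers along y
    xi, yi = np.nonzero(counts)   # indices of cells with at least one point
    return xc[xi], yc[yi], counts[xi, yi]

# 200,000 random points reduced to at most 100 x 100 plotted points
rng = np.random.default_rng(0)
x = rng.normal(size=200_000)
y = rng.normal(size=200_000)
px, py, n = nonempty_cells(x, y, width=100, height=100)
```

However many input points there are, the output can never exceed width*height, and the counts in n can drive color or marker size.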

Finally got cairo to work with igraph

I have an anaconda distribution of python, so I tried

conda install cairo

conda install pycairo

But the latter fails with an error saying it cannot find pixman, even after I ran conda install pixman successfully. So I gave up on this route and used Homebrew instead:

brew install cairo

brew install py2cairo

This way cairo is installed in the brew directory. To use it with the anaconda python, add that directory to sys.path:

import sys


Then it works!
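
The sys.path step could be sketched as follows. The directory shown is purely hypothetical, since the actual path is not given here. Check where brew placed the cairo Python bindings on your own machine. The helper name add_to_sys_path is also just for illustration.

```python
import os
import sys

def add_to_sys_path(directory):
    """Append `directory` to sys.path if it exists and is not already listed.
    Returns True when the path was added."""
    if os.path.isdir(directory) and directory not in sys.path:
        sys.path.append(directory)
        return True
    return False

# HYPOTHETICAL location, for illustration only: replace it with the
# site-packages directory that brew actually used on your system.
brew_site_packages = "/usr/local/lib/python2.7/site-packages"
added = add_to_sys_path(brew_site_packages)
```

Appending (rather than inserting at the front) keeps anaconda's own packages first in the search order, so only modules anaconda does not provide are picked up from the brew directory.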

P.S. To compile pycairo manually, remember to add cairo to the pkg-config search path, because I had a hard time getting configure to find cairo. This is not necessary if you use brew.

export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/opt/X11/lib/pkgconfig

pkg-config --cflags-only-I cairo

error: ‘NAN’ undeclared when installing igraph

I got this strange error when installing igraph:

plfit/gss.c: In function ‘gss’:
plfit/gss.c:92: error: ‘NAN’ undeclared (first use in this function)
plfit/gss.c:92: error: (Each undeclared identifier is reported only once
plfit/gss.c:92: error: for each function it appears in.)
plfit/gss.c:93: error: ‘INFINITY’ undeclared (first use in this function)

It turns out to be a C standard problem: NAN and INFINITY are C99 macros from math.h, so a compiler defaulting to an older standard does not define them. Passing CFLAGS='-std=gnu99' to make solves the problem.

easy_install does not work after distribute upgrade

I tried to upgrade matplotlib, which asked me to upgrade distribute. I upgraded distribute, and then easy_install stopped working. The following steps fixed it:

1. Check your /usr/bin and /usr/local/bin for easy_install installations and remove any old script:

sudo rm /usr/bin/easy_install*

sudo rm /usr/local/bin/easy_install*

2. Download and run distribute:

curl -O

sudo python

sudo rm

Copy from