Downsampling large time series for visualization

This code is inspired by Bokeh’s datashader. I have a time series of millions of data points which I would like to visualize in a browser. But:

  1. It is slow to transmit and plot so many points in JS.
  2. Even if the browser is powerful enough to draw all those points, they will certainly lie on top of each other, since a computer screen has at most a few thousand pixels in one direction.

The idea of datashader is to aggregate the data so that I plot at most one point per pixel. This way I make full use of the screen without losing any visible information. However, datashader is overkill for my application and not flexible enough for my situation, so I wrote the following code to do some simple downsampling for time series.


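# To deal with time series, first convert the pandas timestamps to numbers.
# For datetime64[ns] data, the int64 nanoseconds divided by 1e6 give float milliseconds since the epoch.
# df['time']=df.time.values.astype(np.int64)/1e6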

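import numpy as np
import pandas as pd


def sampling1d(dataframe, x, y, width, xmin=None, xmax=None):
    """Downsample column y against x into at most `width` bins (one per pixel)."""
    df = dataframe[[x, y]]
    # Optionally restrict the data to the visible x range.
    if xmin is not None:
        df = df[df[x] >= xmin]
    if xmax is not None:
        df = df[df[x] <= xmax]
    # width equal-sized bins need width + 1 edges.
    bin_edges = np.linspace(df[x].min(), df[x].max(), width + 1)
    # Bin index of each point; points equal to the minimum get index 0,
    # so fold them into the first bin.
    bins = np.searchsorted(bin_edges, df[x])
    bins[bins == 0] = 1
    agg = df.groupby(bins)
    # One row per non-empty bin: the largest x plus the mean, min and max of y.
    df2 = pd.DataFrame()
    df2[x] = agg[x].max()
    df2[y + '_mean'] = agg[y].mean()
    df2[y + '_min'] = agg[y].min()
    df2[y + '_max'] = agg[y].max()
    return df2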
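
To try the idea on synthetic data, here is a self-contained sketch of the same binning scheme written with pd.cut and named aggregation instead of np.searchsorted. This is a variant for illustration, not the function above: the name downsample_cut, the random-walk data and the width of 800 are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def downsample_cut(df, x, y, width):
    """Aggregate y over `width` equal-width bins of x (a pd.cut-based variant)."""
    # labels=False makes pd.cut return the integer bin index of each row.
    bins = pd.cut(df[x], bins=width, labels=False).rename("bin")
    return df.groupby(bins).agg(**{
        x: (x, "max"),
        y + "_mean": (y, "mean"),
        y + "_min": (y, "min"),
        y + "_max": (y, "max"),
    })


# Synthetic random walk standing in for a real time series.
rng = np.random.default_rng(0)
n = 200_000
data = pd.DataFrame({"t": np.arange(n, dtype=float), "v": rng.normal(size=n).cumsum()})

small = downsample_cut(data, "t", "v", width=800)
print(len(data), "rows ->", len(small), "rows")
```

Plotting the min and max columns as a band around the mean keeps spikes visible that a plain mean would smooth away.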

Gephi streaming from python igraph

It is a nightmare to do visualization in python igraph, at least for me. After hours of tweaking cairo and pycairo and fighting distorted node labels, I found an alternative route: push graphs to Gephi from igraph. And what’s cool about it is that I can update my graph dynamically!

  1. Download the graph streaming plugin for Gephi.
  2. Start the master server in Gephi.
  3. Run the following python code.

import igraph as ig
import igraph.remote.gephi as igg

# Create graph
g = ig.Graph([(0,1), (0,2), (2,3), (3,4), (4,2), (2,5), (5,0), (6,3), (5,6)])
g.vs["name"] = ["Alice", "Bob", "Claire", "Dennis", "Esther", "Frank", "George"]
g.vs["age"] = [25, 31, 18, 47, 22, 23, 50]
g.vs["gender"] = ["f", "m", "f", "m", "f", "m", "m"]
g.es["is_formal"] = [False, False, True, True, True, False, True, False, False]

# Send to Gephi
gephi=igg.GephiConnection()
streamer=igg.GephiGraphStreamer()
streamer.post(g,gephi)

# Update graph
api = igg.GephiGraphStreamingAPIFormat()
event=api.get_add_node_event("1", dict(label="eggs"))
streamer.send_event(event,gephi)

Finally get cairo to work with igraph

I have an anaconda distribution of python, so I tried

conda install cairo

conda install pycairo

But the latter fails with a “cannot find pixman” error even after conda install pixman succeeds. So I gave up on this route and used brew instead:

brew install cairo

brew install py2cairo

This way cairo is installed in the brew directory. To use it with anaconda python, add the brew site-packages directory to sys.path:

import sys

sys.path.append("/usr/local/lib/python2.7/site-packages")

Then it works!

p.s. To compile pycairo manually, remember to add cairo to the pkg-config path; I had a hard time getting configure to find cairo. This is not necessary if you use brew.

export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/opt/X11/lib/pkgconfig

pkg-config --cflags-only-I cairo

error igraph_attributes.h: No such file or directory when installing igraph

It took me a lot of time to pip install python-igraph on a remote Ubuntu machine. The error I got was “igraph_attributes.h: No such file or directory”, but that is not the real problem.

The real problem is that pip tried to compile the C core of igraph and failed because the libxml2 development library was missing. What I really needed was the following:

sudo apt-get install libxml2-dev

ipython notebook server on a remote machine

Goal: run an ipython notebook server on a remote machine and access it from a local browser.

How to (shamelessly copied from someone’s blog):

1. On the remote machine:

ipython notebook --no-browser --port=7777

2. On the local machine: my remote machine can only be accessed via a login node, so I need a multi-hop ssh tunnel. To avoid typing the following every time, save it into a script file.

host1=username@login_node.com
host2=username@dest.ination.com
ssh -L 7777:localhost:7777 $host1 ssh -L 7777:localhost:7777 -N $host2

If you don’t need to go through a login node, it is a little easier:

ssh -N -f -L localhost:7777:localhost:7777 username@dest.ination.com
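
Before opening the browser, it can be handy to check from Python that the local end of the tunnel is listening. The sketch below assumes the port 7777 used above; port_is_open is a made-up helper name. Note that a successful connection only shows that something (normally ssh) is listening locally, not that the notebook server on the remote side is running.

```python
import socket


def port_is_open(host="localhost", port=7777, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Tunnel looks up; open http://localhost:7777 in the browser")
    else:
        print("Nothing is listening on localhost:7777; is the ssh tunnel running?")
```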

error: ‘NAN’ undeclared when installing igraph

I got this strange error when installing igraph:

plfit/gss.c: In function ‘gss’:
plfit/gss.c:92: error: ‘NAN’ undeclared (first use in this function)
plfit/gss.c:92: error: (Each undeclared identifier is reported only once
plfit/gss.c:92: error: for each function it appears in.)
plfit/gss.c:93: error: ‘INFINITY’ undeclared (first use in this function)

It turns out to be a compiler standard problem: NAN and INFINITY are C99 macros, so an older default standard does not declare them. Adding the flag CFLAGS='-std=gnu99' to make solves the problem.