Avoid partial in multiprocessing in python

I have been using pool.map in the multiprocessing package for simple parallel jobs because of its simplicity and ease of use. That simplicity comes at a cost, though: the function f(x) being parallelized can take only one argument. If f(x, D) needs auxiliary data D, there are a few workarounds:

1. combine the main argument and the auxiliary data into a tuple (x, D), and pass this tuple as a single argument, i.e., f((x, D)).
2. use functools.partial to wrap f together with the auxiliary data: g = partial(f, D=D).
3. just leave D out of the argument list and let f read it as a global variable.

It turns out that #3 is the most efficient way. I had been using #2 and did not realize the difference until one day my f needed a big auxiliary data set D. In both #1 and #2, Python pickles the arguments and sends them to the workers. When D is large, the pickling takes a lot of time and the cost of the data transfer is huge. With #3, when the worker processes are started with fork, they inherit D from the parent's memory, so nothing extra has to be pickled.
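
Here is a minimal sketch of the difference between #2 and #3, assuming the workers are started with fork (which is not available on Windows). The names f_global, f_explicit, run_global and run_partial, and the small list D, are made up for illustration; a real D would be much larger.

```python
import multiprocessing as mp
from functools import partial

# Hypothetical auxiliary data; in practice this would be large.
D = list(range(1000))

def f_global(x):
    # Workaround #3: D is read from the module-level global.
    # Only x is pickled and sent to the worker.
    return x + D[x]

def f_explicit(x, D):
    # Used for workaround #2: D arrives as an explicit argument.
    return x + D[x]

def run_global(xs, processes=2):
    ctx = mp.get_context("fork")  # assumption: a platform that supports fork
    with ctx.Pool(processes) as pool:
        return pool.map(f_global, xs)

def run_partial(xs, processes=2):
    ctx = mp.get_context("fork")
    with ctx.Pool(processes) as pool:
        # The partial object, including D, is pickled with every chunk of tasks.
        return pool.map(partial(f_explicit, D=D), xs)

if __name__ == "__main__":
    xs = [1, 2, 3]
    print(run_global(xs))
    print(run_partial(xs))
```

Both functions return the same result; the difference only shows up in running time once D gets big.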

Lesson learnt: sometimes the naive approach might be the best approach.


Gephi streaming from python igraph

It is a nightmare to do visualization in python igraph, at least for me. After hours of tweaking cairo and pycairo and fighting distorted node labels, I found an alternative route: push graphs from igraph to Gephi. And what's cool about it is that I can update my graph dynamically!

  1. Download the streaming plugin for Gephi.
  2. Start the master server in Gephi.
  3. Run the following python code.

import igraph as ig
import igraph.remote.gephi as igg

# Create graph
g = ig.Graph([(0,1), (0,2), (2,3), (3,4), (4,2), (2,5), (5,0), (6,3), (5,6)])
g.vs["name"] = ["Alice", "Bob", "Claire", "Dennis", "Esther", "Frank", "George"]
g.vs["age"] = [25, 31, 18, 47, 22, 23, 50]
g.vs["gender"] = ["f", "m", "f", "m", "f", "m", "m"]
g.es["is_formal"] = [False, False, True, True, True, False, True, False, False]

# Send to Gephi (assumes the master server uses the default host, port and workspace)
conn = igg.GephiConnection()
streamer = igg.GephiGraphStreamer()
streamer.post(g, conn)

# Update graph: add a node with id "1" and label "eggs"
api = igg.GephiGraphStreamingAPIFormat()
event = api.get_add_node_event("1", dict(label="eggs"))
streamer.send_event(event, conn)
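
For reference, the events pushed to Gephi are small JSON objects. The sketch below builds an add-node and an add-edge event by hand using only the standard library, assuming the event keys of the Gephi graph streaming format ("an" for add node, "ae" for add edge). The helper names add_node_event and add_edge_event are hypothetical and are not part of igraph.

```python
import json

def add_node_event(node_id, **attributes):
    # "an" = add node; the node id maps to its attribute dictionary.
    return {"an": {str(node_id): attributes}}

def add_edge_event(edge_id, source, target, directed=False, **attributes):
    # "ae" = add edge; source and target are node ids.
    payload = {"source": str(source), "target": str(target), "directed": directed}
    payload.update(attributes)
    return {"ae": {str(edge_id): payload}}

if __name__ == "__main__":
    print(json.dumps(add_node_event("1", label="eggs")))
    print(json.dumps(add_edge_event("e1", "1", "2")))
```

Looking at the raw events this way helps when debugging what actually arrives at the Gephi master server.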

error igraph_attributes.h: No such file or directory when installing igraph

It took me a lot of time to pip install python-igraph on a remote ubuntu machine. The error I got was "igraph_attributes.h: No such file or directory", but that is not the real problem.

The real problem is that pip was trying to compile the C core of igraph and failed because the libxml2 development library was missing! What I really needed was the following:

sudo apt-get install libxml2-dev

ipython notebook server on a remote machine

Goal: run an ipython notebook server on a remote machine, and access it from a local browser

How to (shamelessly copied from someone's blog):

1. On the remote machine:

ipython notebook --no-browser --port=7777

2. On the local machine: my remote machine can only be accessed via a login node, so I need to use a multi-hop ssh tunnel ($host1 is the login node, $host2 is the remote machine). To avoid typing the following every time, save it into a file.

ssh -L 7777:localhost:7777 $host1 ssh -L 7777:localhost:7777 -N $host2

If you don’t need to go through a login node, it is a little easier:

ssh -N -f -L localhost:7777:localhost:7777 username@dest.ination.com
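
If you would rather keep the tunnel commands in a Python script than in a shell file, here is a minimal sketch that assembles the two commands above. The function names multihop_tunnel and direct_tunnel and the example host names are hypothetical; only the ssh options come from the commands above.

```python
import shlex

def multihop_tunnel(login_host, remote_host, port=7777):
    # Forward the port to the login node, then from there to the remote machine.
    forward = f"{port}:localhost:{port}"
    return ["ssh", "-L", forward, login_host,
            "ssh", "-L", forward, "-N", remote_host]

def direct_tunnel(destination, port=7777):
    # Single hop: no login node in between.
    forward = f"localhost:{port}:localhost:{port}"
    return ["ssh", "-N", "-f", "-L", forward, destination]

if __name__ == "__main__":
    # Hypothetical host names; this only prints the commands, it does not run ssh.
    print(shlex.join(multihop_tunnel("login.example.com", "compute01")))
    print(shlex.join(direct_tunnel("username@dest.ination.com")))
```

To actually open a tunnel, pass the returned list to subprocess.run.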