Friday, March 30, 2007

Java and Python

One thing we are asked from time to time is whether there is an Eclipse plugin for bzr. At the moment, there is one project that has been started: bzr-eclipse.

It is still in the very early stages, but it seems there is enough interest, so I figured I would explore the space a bit.

One issue is trying to figure out how to communicate between bzr (written in Python) and Eclipse (written in Java).

One obvious method is to just write Java code which calls out to bzr, the command line program, and then parses the string output from stdout and stderr. This can work, but bzr isn't especially scriptable. It can be scripted, but it is more focused on being nice for a human to use than on being easy for a machine to parse.
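As a rough sketch of that first approach, the Java side could shell out with ProcessBuilder and parse whatever text comes back. The class and method names here (BzrCli, run, revno, parseRevno) are made up for illustration, and the example assumes a "bzr" executable on the PATH and that "bzr revno" prints nothing but the revision number:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class BzrCli {

    /** Runs a command in workingDir and returns everything it printed. */
    public static String run(File workingDir, List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.directory(workingDir);
        // Merge stderr into stdout so error text ends up in the exception below.
        pb.redirectErrorStream(true);
        Process process = pb.start();

        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("command failed (" + exitCode + "): " + output);
        }
        return output.toString();
    }

    /** Parses the text printed by "bzr revno"; assumes it is a single integer. */
    public static int parseRevno(String stdout) {
        return Integer.parseInt(stdout.trim());
    }

    /** Asks the bzr command line program for the revision number of a branch. */
    public static int revno(File branchDir) throws IOException, InterruptedException {
        return parseRevno(run(branchDir, Arrays.asList("bzr", "revno")));
    }
}
```

Even for a one-number answer, the Java side has to guess at the output format, and any stray warning in the output breaks the parse. That fragility is the main argument for talking to bzrlib directly.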

We have a much richer machine API in bzrlib, the Python library which is the guts of bzr. Wouldn't it be nice if we could get direct access to this rich API?

Well, there are two projects that I know of: Jython, and one I just heard about, JEPP (Java Embedded Python).

Jython has the goal of running Python code directly on the Java Virtual Machine. I'm not sure of everything that this entails, but my understanding is that it is basically a compiler that turns Python code into Java bytecode. I have high hopes for this project, but at the moment it only supports Python 2.3 syntax (if you use the current beta). Unfortunately, bzr is written with Python 2.4 syntax in mind. (We use decorators a lot, and some generator expressions.)

The other (major?) limitation is that Jython doesn't have a good way to support "os.chdir()". And while our general code doesn't actually use it, our test suite makes heavy use of "os.chdir()" to make sure that each test runs in isolation. Other limitations include not having a complete Python standard library. Again, we use subprocess in the test suite when we want to ensure a clean run of bzr. We also use logging. There is also some concern about C extensions. At the moment, bzr is written in 100% Python code, but as we finalize our data structures, we would like to implement any heavy processing loops in C/C++ (or possibly Pyrex, which compiles to C).

But we could probably work around most of the missing functionality. The biggest thing is just Python 2.4 compatibility.

But this week I was exposed to Java Embedded Python, or JEPP, which takes the other approach. Rather than implementing the Python language in Java, it just embeds a CPython interpreter in a Java process.

This means you can use whatever CPython you have available on your system (2.3, 2.4, 2.5?). And you are sure to have access to the full standard library, extensions should never be a problem, etc.

The only real limitation of this approach is figuring out how well you can expose the embedded CPython interpreter. At a basic level, it isn't much different than calling 'python -c "do something"'. But it is possible to create a richer interaction between the CPython interpreter and the JVM, which is what JEPP is trying to do.

I played with JEPP today, and I think it is a really good start. It isn't functional enough yet that I would use it for a large project. But it seems almost there. At the moment it is able to return integers, floats, longs, and strings. But it isn't able to pass back and forth Python objects.

It does let you do stuff like:

Jep jep = new Jep(false, ".");
jep.runScript("a_python_script.py");

And the script can have quite a bit of logic. The script is run as '__main__', and its variables, functions, etc. are in the running namespace. So you can do stuff like:

Object value = jep.getValue("variable");

or

Object ret = jep.invoke("a_function", "param1", 2, 3);

If "a_function" returns a "basic" type (int, long, float, str), then the returned Java Object is an Integer, Float, String, etc.

The only thing that doesn't work well is when the returned object is not a basic type. The code falls back to the catch-all, which converts everything to a string. I don't think this is the long term plan for the project, because they have a "PyObject" Java class.
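To show what this means for the calling code, here is a small sketch of how the Java side might sort out an Object that came back from jep.invoke() or jep.getValue(). It does not call JEPP itself. The class and method names are made up, and accepting Double alongside Float is my own assumption rather than something I have checked against JEPP:

```java
public class JepResult {

    /** Describes a returned value based on its Java runtime type. */
    public static String describe(Object ret) {
        if (ret == null) {
            return "null";
        }
        if (ret instanceof Integer) {
            return "int: " + ret;
        }
        if (ret instanceof Long) {
            return "long: " + ret;
        }
        // Assumption: a Python float may arrive as either Float or Double.
        if (ret instanceof Float || ret instanceof Double) {
            return "float: " + ret;
        }
        if (ret instanceof String) {
            // Ambiguous: this is either a real Python str or the catch-all
            // string conversion of some non-basic object.
            return "str or stringified object: " + ret;
        }
        return "unexpected type " + ret.getClass().getName() + ": " + ret;
    }

    public static void main(String[] args) {
        System.out.println(describe(Integer.valueOf(3)));
        System.out.println(describe("<a Python object, converted to a string>"));
    }
}
```

The String branch is the awkward one: from Java there is no way to tell a genuine Python string from the stringified fallback, which is why a real PyObject wrapper would help.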

I would expect the PyObject class to develop functions similar to Boost::Python's boost::python::object class.

I don't know if they will end up exposing as much of the API (slice is a nice convenience function, but logically maybe it shouldn't be on object), but functions like attr would certainly be useful, since they also give you a way to call member functions, etc.

I know Boost does a lot of work behind the scenes with templates, and Java doesn't have the same functionality. I don't know if Java "Generics" are up to the task of PyObject(function).
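To make the question a bit more concrete, here is a purely hypothetical sketch of what such a wrapper might look like in Java. None of this is JEPP's actual PyObject API. PyHandle, attr, call and as are names I made up, loosely following Boost.Python's obj.attr("name") and extract<T>(obj), and the implementation is a stand-in backed by ordinary Java values so that it runs without any Python at all:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class PyObjectSketch {

    /** Hypothetical wrapper interface; not part of JEPP. */
    public interface PyHandle {
        PyHandle attr(String name);     // look up an attribute by name
        PyHandle call(Object... args);  // call the wrapped object
        <T> T as(Class<T> type);        // typed extraction via a generic method
    }

    /** Stand-in implementation: a Map plays the object, a Function plays a method. */
    public static final class FakeHandle implements PyHandle {
        private final Object value;

        public FakeHandle(Object value) {
            this.value = value;
        }

        @Override
        public PyHandle attr(String name) {
            if (!(value instanceof Map)) {
                throw new IllegalStateException("object has no attributes");
            }
            Map<?, ?> attrs = (Map<?, ?>) value;
            if (!attrs.containsKey(name)) {
                throw new IllegalArgumentException("no attribute: " + name);
            }
            return new FakeHandle(attrs.get(name));
        }

        @Override
        @SuppressWarnings("unchecked")
        public PyHandle call(Object... args) {
            if (!(value instanceof Function)) {
                throw new IllegalStateException("object is not callable");
            }
            return new FakeHandle(((Function<Object[], Object>) value).apply(args));
        }

        @Override
        public <T> T as(Class<T> type) {
            // Throws ClassCastException if the wrapped value has another type.
            return type.cast(value);
        }
    }

    /** Builds a fake "branch" object with a name attribute and a revno() method. */
    public static PyHandle demoObject() {
        Map<String, Object> attrs = new HashMap<>();
        attrs.put("name", "trunk");
        Function<Object[], Object> revno = args -> 42;
        attrs.put("revno", revno);
        return new FakeHandle(attrs);
    }

    public static void main(String[] args) {
        PyHandle branch = demoObject();
        String name = branch.attr("name").as(String.class);
        Integer revno = branch.attr("revno").call().as(Integer.class);
        System.out.println(name + " " + revno);
    }
}
```

In this sketch, generics only help at the extraction step, where passing a Class<T> token gives a checked cast. Everything else stays dynamically typed, so it is much closer to reflection than to what Boost does with templates at compile time.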

Now I just have to figure out how to get commit notifications for a Sourceforge SVN project, so I can watch it evolve. :)

Comments:

Clark Updike said...

You might also want to check out other integration projects like JPype, JEPP and JPE (see this page and the "similar projects" links):
JPype

jam said...

Thanks for the references. I hadn't heard of JPype. It certainly sounds a lot like JEPP.

It is a shame that JPype doesn't explain its differences from JEPP or JPE; it would have been nice to get a feeling for how the different projects are planning to approach the problem.
