25 November 2009 @ 01:16 am

Blog has moved! Please, update your links.

Some interesting changes have been happening in my professional life, so I wanted to share it here to update friends and also for me to keep track of things over time (at some point I will be older and will certainly laugh at what I called “interesting changes” in the ol’days). Given the goal, I apologize but this may come across as more egocentric than usual, so please feel free to jump over to your next blog post at any time.

It’s been a little more than four years since I left Conectiva / Mandriva and joined Canonical, in August of 2005. Shortly after I joined, I had the luck of spending a few months working on the different projects which the company was pushing at the time, including Launchpad, then Bazaar, then a little bit on some projects which didn’t end up seeing much light. It was a great experience by itself, since all of these projects were abundant in talent. Following that, in the beginning of 2006, counting on the trust of people who knew more than I did, I was requested/allowed to lead the development of a brand new project the company wanted to attempt. After a few months of research I had the chance to sit next to Chris Armstrong and Jamu Kakar to bootstrap the development of what is now known as the Landscape distributed systems management project.

Fast forward three and a half years to mid 2009, and Landscape has become a massive project with hundreds of thousands of very well tested lines, spawning not only a client branch, but also external child projects such as the Storm Object Relational Mapper, in use also by Launchpad and Ubuntu One. On the commercial side of things it looks like Landscape’s life is just starting, with its hosted and standalone versions getting more and more attention from enterprise customers. And the three guys who started the project didn’t do it alone, for sure. The toy project of early 2006 has grown to become a well structured team, with added talent spanning areas such as development, business and QA.

While I wasn’t watching, though, something happened. Facing that great action, my attention was slowly being spread thinly among management, architecture, development, testing, code reviews, meetings, and other tasks, sometimes in areas not entirely related, but very interesting of course. The net result of increased attention sprawl isn’t actually good, though. If it persists, even when the several small tasks may be individually significant, the achievement just doesn’t feel significant given the invested effort as a whole. At least not for someone that truly enjoys being a software architect, and loves to feel that the effort invested in the growth of a significant working software is really helping people out in the same magnitude of that investment. In simpler words, it felt like my position within the team just wasn’t helping the team out the same way it did before, and thus it was time for a change.

Last July an external factor helped to catapult that change. Eucalyptus needed a feature to be released with Ubuntu 9.10, due in October, to greatly simplify the installation of some standard machine images: an Image Store. It felt like a very tight schedule, even more so considering that I hadn’t been doing Java for a while, and Eucalyptus uses some sexy (and useful) new technology called the Google Web Toolkit, something I had to get acquainted with. Two months looked like a risky bet overall, but it also felt like a great opportunity to strongly refocus on a task that needed someone’s attention urgently. Again I was blessed with trust I’m thankful for, and by now I’m relieved to look back and perceive that it went alright, certainly thanks to the help of other people like Sidnei da Silva and Mathias Gug. Meanwhile, on the Landscape side, my responsibilities were distributed within the team so that I could be fully engaged on the problem.

Moving this forward a little bit we reach the current date. Right now the Landscape project has a new organizational structure, and it actually feels like it’s moving along quite well. Besides the internal changes, a major organizational change also took place around Landscape over that period, and the planned restructuring led me to my current role. In practice, I’m now engaged in the research of a new concept which I’m hoping to publish openly quite soon, if everything goes well. It’s challenging, it’s exciting, and most importantly, it allows me to focus strongly on something which has great potential (I will stop teasing you now). In addition to this, I’ll definitely be spending some of that time on the progress of Landscape and the Image Store, but mostly from an architectural point of view, since both of these projects will have bright hands taking care of them more closely.

Sit by the fireside if you’re interested in the upcoming chapters of that story. ;-)

 
 
13 October 2009 @ 08:05 pm

This post is not about what you think it is, unfortunately. I actually do hope to go to the Easter Island at some point, but this post is about a short story which involves geohash.org, Groundspeak (from geocaching.com), and some very poor-minded behavior.

The context

So, before anything else, it’s important to understand what geohash.org is. As announced when the service was launched (also as a post on Groundspeak’s own forum), geohash.org offers short URLs which encode a latitude/longitude pair, so that referencing them in emails, forums, and websites is more convenient, and that’s pretty much it.

When people go to geohash.org, they can enter geographic coordinates that they want to encode, and they get back a nice little map with the location, some links to useful services, and most importantly the actual Geohash they can use to link to the location, so as an example they could be redirected to the URL http://geohash.org/6gkzwgjf3.

Of course, it’s pretty boring to be copy & pasting coordinates around, so shortly after the service launched, the support for geocoding addresses was also announced, which means people could type a human oriented address and get back the Geohash page for it. Phew.. much more practical.

The problem

All was going well, until a couple of months ago, when a user reported that the geocoding of addresses wasn’t working anymore. After some investigation, it turned out that geohash.org was indeed going over the free daily quota allowed by the geocoding provider used. But, that didn’t quite fit with the overall usage reports for the system, so I went on to investigate what was up in the logs.

The cause

Something was wrong indeed. The system was getting thousands of queries a day from some application, and not only that, but the queries were entirely unrelated to Geohashes. The application was purely interested in the geocoding of addresses which the site supported for the benefit of Geohash users. Alright, that wasn’t something nice to do, but I took it lightly since the interface implemented could perhaps give the impression that the site was a traditional geocoding system. So, to fix the situation, the non-Geohash API was removed at this point, and requests for the old API then started to get an error saying something like “403 Forbidden: For geocoding without geohashes, please look elsewhere.”

Unfortunately, that wasn’t the end of the issue. Last week I went on to see the logs, and the damn application was back, and this time it was using Geohashes, so I became curious about who was doing that. Could I be mistakenly screwing up some real user of Geohashes? So, based on the logs, I went on to search for who could possibly be using the system in such a way. It wasn’t too long until I found out that, to my surprise, it was Groundspeak’s iPhone application. Groundspeak’s paid iPhone application, to be more precise, because the address searching feature is only available for paying users.

Looking at the release notes for the application, there was no doubt. Version 2.3.1, sent to Apple on September 10th, shortly after the old API was blocked, fixes the Search by Address/Postal Code feature, according to the maintainer, and there’s even a thread discussing the breakage where the maintainer mentions:

The geocoding service we’ve been using just turned their service off. That’s why things are failing; it was relying on an external service for this feature. We’re fixing the issue on our end and using a service that shouldn’t fail as easily. Unfortunately we’ll have to do an update to the store to get this feature out to the users. This will take some time, but in version 2.4 this will work.

Wait, ok, so let’s see this again. First, they were indeed not using Geohashes at all, and instead using geohash.org purely as a geocoding service. Then, when the API they used was disabled with hints that the Geohash service is not a pure geocoding service, they worked around this by decoding the Geohash retrieved and grabbing the coordinates so that they could still use it as a pure geocoding service. At the same time, they told their users that they changed to “a service that shouldn’t fail as easily”. At no point did they contact someone at geohash.org to see what was going on (it shouldn’t be necessary, really, but assuming immaculate innocence, sending an email would have been pretty cool).

Redirecting users to the Easter Island

So, yeah, sorry, but I didn’t see many reasons to sustain the situation. Not only because it looks like an unfriendly behavior overall, but also because, on their way of using an unrelated free service to sustain their paid application, they were killing the free geocoding feature of geohash.org with thousands of geocoding requests a day, which impacted on the daily quota the service has by itself.

So, what to do? I could just disable the service again, or maybe contact the maintainers and ask them to please stop using the service in such a way, after all there are dozens of real geocoding services out there! But… hmmm… I figured a friendly poke could be nice at this point, before actually bringing up that whole situation.

And that’s what happened: rather than blocking their client, the service was modified so that all of their geocoding requests translated into the geographic coordinates of the Easter Island.

Of course, users quickly noticed it and started reporting the problem again.

The answer from Groundspeak

After users started complaining loudly, Bryan Roth, who signs as co-founder of Groundspeak, finally contacted me for the first time asking if there was a way to keep the service alive. Unfortunately, I really couldn’t, so I provided the whole explanation to Bryan, and even mentioned that I actually use Google as the upstream geocoding provider and that I would be breaking the terms of service doing this, but offered to redirect their requests to their own servers if necessary.

Their answer to this? Pretty bad I must say. I got nothing via email, but they posted this in the forum:

But seriously, this bug actually has nothing to do with our app and everything to do with the external service we’ve been using to convert an address into GPS coordinates. For the next app update, we’re completely dropping that provider since they’ve now failed us twice. We’ll be using only Google from that point on, so hopefully their data will be more accurate.

I can barely believe what I read. They blame the upstream service, as if they were using a first class geocoding provider somewhere rather than sucking resources from a site they felt cool to link their paid application to, take my suggestion of using Google for geocoding, and lie about the fact that the data would be more accurate (it obviously can’t be, since it was already Google that was being used).

I mentioned something about this in the forum itself, but I was moderated out immediately of course.

Way to go Groundspeak.

UPDATE

After some back and forth with Bryan and Josh, the last post got edited away to avoid the misleading details, and Bryan clarified the case in the forum. Then, we actually settled on my proposal of redirecting the iPhone Geocaching.com application requests to Groundspeak’s own servers so that users of previous versions of the application wouldn’t miss the feature while they work on the new release.

If such communication had taken place way back when the feature was being planned, or when it was “fixed” the first time, the whole situation would never have happened.

No matter what, I’m glad it ended up being sorted towards a more friendly solution.

 
 
12 August 2008 @ 04:46 am

The underlying concept is very simple: spreadsheets are a way to organize text, numbers and formulas into what might be seen as a natively numeric environment: a matrix. So what would happen if we loosened some of the bolts of the numeric-oriented organization, and tried to reuse the same concepts in a more formatting-oriented environment which is naturally collaborative: a wiki?

While I do encourage you to answer this with some fantastic new online service (please provide me with an account and the best e-book reader device available once you’re rich) I had a try at answering this question myself a while ago by writing the Calc macro for Moin.

Basically, the Calc macro allows extracting values found in a wiki page into lists (think columns or rows), and applying formulas and further formatting as wanted.
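To make the idea a bit more concrete, here is a minimal Python sketch of the concept only. It is not the Calc macro's code or syntax: the page content, the `column()` helper, and the table layout are all assumptions made up for illustration. It pulls the numeric cells of one column out of Moin-style table rows and applies a couple of formulas to the resulting list.

```python
# Hypothetical wiki page using Moin-style table rows (|| cell || cell ||).
WIKI_PAGE = """\
|| Item || Cost ||
|| Hosting || 12.50 ||
|| Domain || 9.00 ||
|| Coffee || 3.25 ||
"""


def column(page, index):
    """Collect the numeric cells found at position `index` of each table row."""
    values = []
    for line in page.splitlines():
        line = line.strip()
        if not line.startswith("||"):
            continue  # Not a table row.
        cells = [cell.strip() for cell in line.strip("|").split("||")]
        try:
            values.append(float(cells[index]))
        except (IndexError, ValueError):
            continue  # Header rows and non-numeric cells are skipped.
    return values


# "Formulas" are then just ordinary operations over the extracted list.
costs = column(WIKI_PAGE, 1)
total = sum(costs)
average = total / len(costs)
```

The point is that the values live in ordinary wiki text, and the list is derived from the page rather than from a rigid grid.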

I believe there’s a lot of potential in the basic concept, and the prototype, even though functional and useful, surely has a lot to evolve, so I’ve published the project in Launchpad to make contributions easier. I actually apologize for not publishing it earlier. There was hope that more features would be implemented before releasing, but now it’s clear that it won’t get many improvements from me anytime soon. If you do decide to improve it, please try to prepare patches which are mostly ready for integration, including full testing, since I can’t dedicate much time to it myself in the foreseeable future.

 
 
20 May 2008 @ 10:54 pm

According to Dave Troy, Google seems to be using the Geohash algorithm:

Google is employing the GeoHash algorithm I’ve been pushing to do spatial searching using BigTable. Since database schemes like BigTable don’t support traditional GIS extensions/spatial indexes, GeoHash allows for a simple bounding box search using truncated GeoHash substrings. I will post separately about this shortly, as I am working on some GeoHash tools to expand this functionality. This is of particular interest to AppEngine developers.

Nice!
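As a rough illustration of the prefix trick described in the quote, the sketch below runs a range scan over a sorted list of Geohash strings, which is roughly what a key-ordered store allows. It says nothing about how Google or BigTable actually implement it; `prefix_scan()` and the sample keys are assumptions for illustration (the keys are just plausible-looking Geohash strings).

```python
from bisect import bisect_left


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with `prefix`, using two binary searches."""
    start = bisect_left(sorted_keys, prefix)
    # "~" sorts after every character of the Geohash alphabet, so
    # prefix + "~" is an upper bound for all keys sharing the prefix.
    end = bisect_left(sorted_keys, prefix + "~")
    return sorted_keys[start:end]


# Illustrative keys only; a real index would map each key to a record.
keys = sorted([
    "6gkzwgjf3",
    "ezs42",
    "u4pruydqqvj",
    "u4pruyk3",
    "u4prv0",
    "u4x",
])

nearby = prefix_scan(keys, "u4pru")  # Smaller cell: fewer matches.
wider = prefix_scan(keys, "u4")      # Truncated prefix: larger cell.
```

One caveat worth keeping in mind: two positions sitting on opposite sides of a cell boundary can be close together and still share only a short prefix, so a real search usually needs to look at neighbouring cells as well.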

 
 
03 March 2008 @ 12:49 am

On Friday I released version 1.4 of dateutil. There are some interesting fixes there, so please upgrade if you have the chance.

 
 
01 March 2008 @ 06:27 pm

Some improvements to geohash.org were made. Some of them were motivated by a conversation with Rodrigo Stulzer.

  • Support for geocoding addresses (city names, whatever). E.g. http://geohash.org/?q=21 Millbank, London
  • Support for moving the Geohash marker in the embedded map, so that modifying the position visually is easier.
  • Support for providing a “name” to Geohashes, by appending a colon and the name, in a nice format. E.g. http://geohash.org/c216ne:Mt_Hood
  • Provided a bookmark to get a Geohash while in Google Maps.
  • Provided a Google Maps Mapplet. When enabled, it adds a Geohash marker identifying the Geohash position in Google Maps, and it may be moved around. Here is a screenshot:

Check out the Tips & Tricks page for details on these features.

 
 
26 February 2008 @ 09:11 pm

After about one year writing this service in my spare time, it’s finally out.

geohash.org offers short URLs which encode a latitude/longitude pair, so that referencing them in emails, forums, and websites is more convenient.

Geohashes offer properties like arbitrary precision, similar prefixes for nearby positions, and the possibility of gradually removing characters from the end of the code to reduce its size (and gradually lose precision). I’ve put the algorithm I created in the public domain. Some details may be seen in the Wikipedia article about it (hopefully that’ll help establish prior art, and prevent Microsoft from patenting it).
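For the curious, the properties above fall out of how the code is built. The following is a minimal sketch of the algorithm as described in the Wikipedia article, not the code running at geohash.org; the function names and the default length are my own choices for illustration. Longitude and latitude ranges are halved alternately, each halving yields one bit, and every five bits become one character.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(latitude, longitude, length=9):
    """Encode a position as a Geohash with `length` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    value = 0
    bits = 0
    use_longitude = True  # Bits alternate, starting with longitude.
    while len(chars) < length:
        if use_longitude:
            rng, coord = lon_range, longitude
        else:
            rng, coord = lat_range, latitude
        middle = (rng[0] + rng[1]) / 2
        value <<= 1
        if coord >= middle:
            value |= 1
            rng[0] = middle  # Keep the upper half.
        else:
            rng[1] = middle  # Keep the lower half.
        use_longitude = not use_longitude
        bits += 1
        if bits == 5:  # Five bits make one base-32 character.
            chars.append(BASE32[value])
            value = 0
            bits = 0
    return "".join(chars)


def decode(geohash):
    """Return the (latitude, longitude) at the center of the Geohash cell."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    use_longitude = True
    for char in geohash:
        index = BASE32.index(char)
        for shift in range(4, -1, -1):
            rng = lon_range if use_longitude else lat_range
            middle = (rng[0] + rng[1]) / 2
            if (index >> shift) & 1:
                rng[0] = middle
            else:
                rng[1] = middle
            use_longitude = not use_longitude
    return sum(lat_range) / 2, sum(lon_range) / 2


full = encode(57.64911, 10.40744, 11)   # Example position from Wikipedia.
short = encode(57.64911, 10.40744, 5)   # Same position, less precision.
```

Since every extra character only narrows the cell further, `short` is simply a prefix of `full`, which is why characters can be dropped from the end at the cost of precision.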

To obtain the Geohash, the user provides latitude and longitude coordinates in a single input box (most commonly used formats for latitude and longitude pairs are accepted), and performs the request.

Besides showing the latitude and longitude corresponding to the given Geohash, users who navigate to a Geohash at geohash.org are also presented with an embedded map, and may download a GPX file, or transfer the waypoint directly to certain GPS receivers. Links are also provided to external sites that may provide further details around the specified location.

 
 

Mocker 0.10 is out, with a number of improvements!

While we’re talking about Mocker, here is another interesting use case, exploring a pretty unique feature it offers.

Suppose we want to test that a method hello() on an object will call self.show("Hello world!") at some point. Let’s say that the code we want to test is this:

class Greeting(object):

    def show(self, sentence):
        # The parenthesized form works on both Python 2 and Python 3.
        print(sentence)

    def hello(self):
        self.show("Hello world!")

This is the entire test method:

def test_hello(self):
    # Define expectation.
    mock = self.mocker.patch(Greeting)
    mock.show("Hello world!")
    self.mocker.replay()

    # Rock on!
    Greeting().hello()

This has helped me in practice a few times already, when testing some involved situations.

Note that you can also pass the call through. In other words, the call may actually be made on the real method, and mocker will just assert that the call was really made, whatever the effect is.

One more important point: mocker ensures that the real method exists in the real object, and has a specification compatible with the call made. If it doesn’t, an assertion error is raised in the test with a nice error message.

UPDATE: The method for doing this is actually mocker.patch() rather than mocker.mock(), as documented. Apologies.
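For readers who don't have Mocker at hand, a roughly comparable check can be sketched with `unittest.mock` from the Python 3 standard library. To be clear, this is a different library with a different style (calls are asserted after the fact rather than recorded and replayed), so take it as an analogy rather than as Mocker's API. The `check_hello_calls_show()` helper is a name made up for this example.

```python
from unittest import mock


class Greeting(object):

    def show(self, sentence):
        print(sentence)

    def hello(self):
        self.show("Hello world!")


def check_hello_calls_show():
    # autospec=True makes the replacement respect the real method's
    # signature, so a call with incompatible arguments raises a TypeError.
    with mock.patch.object(Greeting, "show", autospec=True) as fake_show:
        greeting = Greeting()
        greeting.hello()
    # With autospec on a method patched at class level, the instance is
    # recorded as the first argument of the call.
    fake_show.assert_called_once_with(greeting, "Hello world!")
    return fake_show.call_count
```

As with the Mocker version, the class is only patched temporarily: once the `with` block ends, `Greeting.show` is the real method again.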

 
 
22 November 2007 @ 08:27 pm

One neat feature which Mocker offers is the ability to very easily implement custom behavior on specific functions or methods.

Take for instance the case where you want to pretend to some code that a given file exists, but you don’t want to get in the way of everything else which needs the same function:

>>> from mocker import *
>>> mocker = Mocker()
>>> isfile = mocker.replace("os.path.isfile", count=False)
>>> _ = expect(isfile("/non/existent")).result(True)
>>> _ = expect(isfile(ANY)).passthrough()

>>> mocker.replay()

>>> import os
>>> os.path.isfile("/non/existent")
True
>>> os.path.isfile("/etc/passwd")
True
>>> os.path.isfile("/other")
False

>>> mocker.restore()

>>> os.path.isfile("/non/existent")
False

Notice that the count=False parameter is available in version 0.9.2. Without it, Mocker acts in a stricter way and enforces that each given expression is executed precisely the given number of times (which defaults to one, and may be modified with the count() method).
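The same "answer one case, pass everything else through" idea can also be approximated with the Python 3 standard library, in case Mocker isn't available. This is a sketch using `unittest.mock`, not Mocker's API; `fake_isfile()` and `demo()` are hypothetical helpers, and the paths are assumed not to exist on the machine running it.

```python
import os.path
from unittest import mock

# Keep a reference to the real function before anything is patched.
real_isfile = os.path.isfile


def fake_isfile(path):
    """Pretend one specific file exists; defer to the real check otherwise."""
    if path == "/non/existent":
        return True
    return real_isfile(path)


def demo():
    with mock.patch("os.path.isfile", side_effect=fake_isfile):
        inside = (
            os.path.isfile("/non/existent"),        # Faked answer.
            os.path.isfile("/other/missing/path"),  # Passed through.
        )
    # Outside the block the real function is back in place.
    after = os.path.isfile("/non/existent")
    return inside, after
```

Unlike Mocker without count=False, nothing here checks how many times the function was called; that would need an explicit assertion on the mock.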

 
 
19 November 2007 @ 07:24 pm

A couple of additional releases tonight: dateutil 1.3, and nicefloat 1.1.

They’re both bug fixing releases.

 
 
17 November 2007 @ 06:01 pm

A few more improvements were made to Mocker.

 
 

I’ve recently seen some comments here and there about the lack of connection pooling as an argument for Storm to be faster, and that once this is supported it will be slower, or even as a reason for people not to use Storm at all.

So, let me kill this argument here, at once.

We have not developed Storm only for toy projects that take 10 connections a day. We have developed Storm for heavy duty web sites like Landscape and Launchpad, and we’re proud to see it being used not only in our systems, but also out there in the wild, like for instance in large scale sites developed by the fantastic guys at Lovely Systems.

So how does the connection reuse work in practice, you ask. Here is how:

In Storm, the database is abstracted behind a small, simple, and flexible API, offered in the Store class. You use an instance of this class to deal with objects coming from a given database, and this instance will handle several aspects of your interaction with the database, such as committing, rolling back, caching, ensuring that a given row in the database maps to a single instance in memory, control of dirty objects, flushing, and so on. Pretty much all of these aspects require a correct transactional behavior to work well, and in practice this means we’ve decided that to keep the API nice and clean, each Store is internally associated with a single Connection object. You can have as many stores as you want, connecting to the same database or to different ones, and using the same model class or entirely different code bases.

So, to summarize the above paragraph, a simple Store instance is your portal to the database. You need one of these instances around to add objects to the database (Storm won’t guess which Store you want to add things to), and to retrieve objects from it.

Considering that, if you want to reuse a connection, it’s very simple: keep your Store instance around. That’s even strange advice, since you’re already doing that if you’re using Storm in the first place. The code in trunk, which is about to be released as version 0.12, even handles reconnections for you gracefully, including correct transactional behavior.
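To illustrate the shape of that design without depending on Storm itself, here is a toy sketch built on the standard `sqlite3` module. It is not Storm's implementation or API: `MiniStore`, `Person`, and the table are hypothetical names invented for the example. What it shows is one store object owning exactly one connection, and using it to guarantee that a given row maps to a single instance in memory.

```python
import sqlite3


class Person(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name


class MiniStore(object):
    """Toy stand-in for a store: one instance owns one connection."""

    def __init__(self, path):
        self.connection = sqlite3.connect(path)  # Opened once, then reused.
        self._alive = {}  # Row id -> the single in-memory instance.

    def add(self, person):
        self.connection.execute(
            "INSERT INTO person (id, name) VALUES (?, ?)",
            (person.id, person.name))
        self._alive[person.id] = person

    def get(self, person_id):
        if person_id in self._alive:
            return self._alive[person_id]
        row = self.connection.execute(
            "SELECT id, name FROM person WHERE id = ?",
            (person_id,)).fetchone()
        if row is None:
            return None
        person = self._alive[person_id] = Person(*row)
        return person

    def commit(self):
        self.connection.commit()


# Keeping `store` around is what reuses the connection.
store = MiniStore(":memory:")
store.connection.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
store.add(Person(1, "Alice"))
store.commit()
```

Every call made through `store` goes over the same connection, so there is nothing to pool as long as the store itself is kept alive.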

We even offer a tool that deals with more advanced Store management in a very comfortable way for Zope 3. In the future, we’re likely to offer the same kind of facility in a more generic API.

So, connection reuse is there, and we have always benefited from it. Connection pooling? No, thanks. We’re doing very well without the complexity and overhead.

 
 
11 November 2007 @ 11:17 pm

After being bothered for a long time by the lack of a better infrastructure for creating test doubles in Python, I decided to give it a go.

I’m actually quite happy with what came out. It took me about four weekends (it was developed as a personal project), and I’ll dare to say that it’s the best mocking system for Python at the present time. Not only that, but it has features that I’ve not seen in any other mocking/stubbing infrastructure, independent of language.

Here’s a feature list to catch your attention:

  • Graceful platform for test doubles in Python (mocks, stubs, fakes, and dummies).
  • Inspiration from real needs, and also from pmock, jmock, pymock, easymock, etc.
  • Expectation of expressions defined by actually using mock objects.
  • Expressions may be replayed in any order by default.
  • Trivial specification of ordering between expressions when wanted.
  • Nice parameter matching for defining expectations on method calls.
  • Good error messages when expectations are broken.
  • Mocking of many kinds of expressions (getting/setting/deleting attributes, calling, iteration, containment, etc).
  • Graceful handling of nested expressions (e.g. "person.details.get_phone().get_prefix()").
  • Mock "proxies", which allow passing through to the real object on specified expressions (e.g. useful with "os.path.isfile()").
  • Mocking via temporary "patching" of existent classes and instances.
  • Trivial mocking of any external module (e.g. "time.time()") via "proxy replacement".
  • Mock objects may have method calls checked for conformance with real class/instance to prevent API divergence.
  • Type simulation for using mocks while still performing certain type-checking operations.
  • Nice (optional) integration with "unittest.TestCase", including additional assertions (e.g. "assertIs", "assertIn", etc).
  • More …

Worked? Check it out!

 
 
15 August 2007 @ 01:00 pm

Finally, a couple of projects I’ve been working on in the last year and a half have been made public, which means that I have more freedom to talk about them openly.

Landscape

Landscape is a system we’ve created to allow administrators to comfortably manage and observe a large number of computers remotely through a centralized web interface.

This description certainly won’t strike anyone as a brand new idea. There are indeed a large number of systems for remote management. Even then, Landscape does bring new ideas into that known field, such as a very flexible package management offering. Landscape, supporting only Ubuntu at the present moment, also has the advantage of being built inside the company which supports the operating system distribution itself.

There are currently 5 core developers, with many other people contributing in various areas. My role is being a Technical Lead, even though that says very little about the kind of relationship that we have within the project. The guys I work with are very smart and goal oriented, so decisions are taken through friendly discussions and consensus, and initiative is seen coming from all directions.

Storm

Storm is an ORM we have developed for Python, to be used in Landscape, Launchpad, and other projects. The project was originally started because our attempts to perform client side partitioning (sharding) of data with existent ORMs for Python failed.

It was announced as an open source project in a talk I presented last month at EuroPython, and last week the second public release (0.10) was already made.

If you are around the Boston area in the US, my coworker and friend Christopher Armstrong will be giving a Storm talk at the Cambridge Python Meetup today. I’ll also be presenting it again at PyCon Brasil at the end of the month, in Joinville, Brazil.

 
 
26 June 2007 @ 08:02 pm

python-dateutil version 1.2 has just been released.

It includes the following changes:

  • Now tzfile will round timezones to full-minutes if necessary, since Python’s datetime doesn’t support sub-minute offsets (reported by Ilpo Nyyssönen).
  • Removed bare string exceptions (reported and fixed by Wilfredo Sánchez Vega)
  • Fixed bug in leap count parsing (reported and fixed by Eugene Oden).
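As an illustration of the first item, rounding an offset expressed in seconds to the nearest whole minute can be sketched as below. This shows the idea only and is not a copy of dateutil's source; `round_offset_to_minutes()` is a name made up for the example.

```python
def round_offset_to_minutes(offset_seconds):
    """Round a UTC offset given in seconds to the nearest whole minute."""
    # Adding half a minute before the floor division rounds to the
    # nearest minute for both positive and negative offsets.
    return 60 * ((offset_seconds + 30) // 60)


# A historical local mean time offset such as +00:19:32 (1172 seconds)
# becomes +00:20:00, while offsets already on a minute are unchanged.
rounded = round_offset_to_minutes(1172)
unchanged = round_offset_to_minutes(3600)
```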
 
 
20 May 2007 @ 07:06 pm

Smart 0.51 has been released today. It includes a few bug fixes and some minor updates.

Shortly after the release, I added a couple of new hooks on Smart’s trunk as well: cache-loaded, and cache-loaded-pre-link. These should enable people to write plugins that hack the cache for specific purposes. Axel Thimm has been requesting these for a while to introduce kernel-related upgrades. Hopefully these will fulfill his needs.

This release took a while.. probably because I’ve been quite immersed in our current project at Canonical, traveling very frequently, and without much time to blog or to do some of the usual open source activities I used to. The good thing is that we’re getting very close to the public announcement, and some of the work we’ve been doing will be released as open source, so we’re all likely to get more community-oriented interactions again.

 
 
10 March 2007 @ 01:12 am

brother…

My brother Diogo is in town! Good to see him after so much time.

pycon…

PyCon 2007 was fantastic. It was great to meet everyone there, and we had two awesome sprinting weeks around it.

confluence…

I’ve recently visited a confluence with a good friend of mine. Kayaks, paddling, walking, driving, swimming, asphalt, sand, water, grass.. it was awesome.

svn2bzr…

It looks like Bazaar tags are now really coming, so I’m doing some work on svn2bzr again. Hopefully this time I’ll really migrate some projects over.

editmoin…

Version 1.9 of editmoin was released.

smart…

Some work in Smart is coming in the upcoming weeks.

projects…

Hopefully I’ll be able to speak more openly about (some of the) interesting things I’ve been working on in the near future.

 
 
20 November 2006 @ 09:09 pm

editmoin 1.8 was just released, including support for moin 1.6, submitted by both David I. Lehn and Daniele Favara, and URL aliases, as suggested by Diogo Matsubara.

One interesting thing I should mention is that Daniele has sent me a reference to a Bazaar branch containing the change. It’s the first time I’ve received a bzr (as it’s also known) branch when the upstream project is using a different RCS (Subversion).

I actually intend to port this and other projects to Bazaar, but I’m waiting for the tags support. Hopefully it’s not too far away now.

 
 
08 November 2006 @ 09:21 pm

One of the known issues I’ve been trying to address in Smart for a while is the freezing effect that happens when a very complex upgrading situation (such as a full distribution upgrade) results in a combinatorial explosion due to the number of choices to be analysed. Unfortunately, I never had time to really put in practice a reasonable solution for the problem. At this point, the beauty of open source software starts to shine.

A few weeks ago, Eran Tromer got close to the project and started researching and discussing the issue. Not only that, but he produced actual patches that change the algorithm to prune the search space and find reasonable solutions in acceptable timings. These patches were applied into the development version, and included in release 0.50rc1.

The preliminary results are quite impressive. David Farning has tested Smart 0.50rc1 with Fedora in several situations, and reported:

fc4->fc5, fc4->fc6, fc4->devel, fc5->fc6, all calculate updates and upgrades in a few minutes on a vm with 512M, using standard repos + Freshrpms and Dries. Much quicker than with 0.42.

This is really awesome. Thanks Eran!

 
 
07 August 2006 @ 07:27 pm

New versions of editmoin and patcher were just released. They fix a couple of issues found by Jan Anlauff and Olivier Thauvin, respectively.