Home
30 October 2009 @ 09:17 pm

Blog has moved! Please, update your links.

It was already dead. In some senses, anyway.

Google announced a couple of days ago that they’re advancing into the business of GPS guided navigation, rather than staying with their widely popular offering of mapping and positioning only. This announcement affected the rest of the industry immediately, and some of the industry leaders in the area have quickly taken a hit on their share value.

As usual, Slashdot caught up on the news and asked the question: Will Google and Android kill standalone GPS?

Let me point out that the way the facts were covered by Slashdot was quite misguided. Google may be giving a hand to change the industry dynamics a bit faster, but both Garmin and TomTom, the companies which reportedly had an impact in their share value, have phone-based offerings of their own, so it’s not like Google suddenly had an idea for creating a phone-based navigation software which will replace every other offering. The world is slowly converging towards a multi-purpose device for quite a while, and these multi-purpose devices are putting GPSes in the hands of people that in many cases never considered buying a GPS.

The real reason why these companies are taking a hit in their shares now is because Google announced it will offer for free something that these companies charge quality money for at the moment, being it in a standalone GPS or not.

Tags: ,
 
 
13 October 2009 @ 08:05 pm

Blog has moved! Please, update your links.

This post is not about what you think it is, unfortunately. I actually do hope to go to the Easter Island at some point, but this post is about a short story which involves geohash.org, Groundspeak (from geocaching.com), and very very poor minded behavior.

The context

So, before anything else, it’s important to understand what geohash.org is. As announced when the service was launched (also as a post on Groundspeak’s own forum), geohash.org offers short URLs which encode a latitude/longitude pair, so that referencing them in emails, forums, and websites is more convenient, and that’s pretty much it.

When people go to geohash.org, they can enter geographic coordinates that they want to encode, and they get back a nice little map with the location, some links to useful services, and most importantly the actual Geohash they can use to link to the location, so as an example they could be redirected to the URL http://geohash.org/6gkzwgjf3.

Of course, it’s pretty boring to be copy & pasting coordinates around, so shortly after the service launched, the support for geocoding addresses was also announced, which means people could type a human oriented address and get back the Geohash page for it. Phew.. much more practical.

The problem

All was going well, until a couple of months ago, when a user reported that the geocoding of addresses wasn’t working anymore. After some investigation, it turned out that geohash.org was indeed going over the free daily quota allowed by the geocoding provider used. But, that didn’t quite fit with the overall usage reports for the system, so I went on to investigate what was up in the logs.

The cause

Something was wrong indeed. The system was getting thousands of queries a day from some application, and not only that, but the queries were entirely unrelated to Geohashes. The application was purely interested in the geocoding of addresses which the site supported for the benefit of Geohash users. Alright, that wasn’t something nice to do, but I took it lightly since the interface implemented could perhaps give the impression that the site was a traditional geocoding system. So, to fix the situation, the non-Geohash API was removed at this point, and requests for the old API then started to get an error saying something like 403 Forbidden: For geocoding without geohashes, please look elsewhere..

Unfortunately, that wasn’t the end of the issue. Last week I went on to see the logs, and the damn application was back, and this time it was using Geohashes, so I became curious about who was doing that. Could I be mistakingly screwing up some real user of Geohashes? So, based on the logs, I went on to search for who could possibly be using the system in such a way. It wasn’t too long until I found out that, to my surprise, it was Groundspeak’s iPhone application. Groundspeak’s paid iPhone application, to be more precise, because the address searching feature is only available for paying users.

Looking at the release notes for the application, there was no doubt. Version 2.3.1, sent to Apple on September 10th, shortly after the old API was blocked, fixes the Search by Address/Postal Code feature says the maintainer, and there’s even a thread discussing the breakage where the maintainer mentions:

The geocoding service we’ve been using just turned their service off. That’s why things are failing; it was relying on an external service for this feature. We’re fixing the issue on our end and using a service that shouldn’t fail as easily. Unfortunately we’ll have to do an update to the store to get this feature out to the users. This will take some time, but in version 2.4 this will work.

Wait, ok, so let’s see this again. First, they were indeed not using Geohashes at all, and instead using geohash.org purely as a geocoding service. Then, when the API they used is disabled with hints that the Geohash service is not a pure geocoding service, they workaround this by decoding the Geohash retrieved and grabbing the coordinates so that they can still use it as a pure geocoding service. At the same time, they tell their users that they changed to “a service that shouldn’t fail as easily”. Under no circumstances they contact someone at geohash.org to see what was going on (shouldn’t be necessary, really, but assuming immaculate innocence, sending an email would be pretty cool).

Redirecting users to the Easter Island

So, yeah, sorry, but I didn’t see many reasons to sustain the situation. Not only because it looks like an unfriendly behavior overall, but also because, on their way of using an unrelated free service to sustain their paid application, they were killing the free geocoding feature of geohash.org with thousands of geocoding requests a day, which impacted on the daily quota the service has by itself.

So, what to do? I could just disable the service again, or maybe contact the maintainers and ask them to please stop using the service in such a way, after all there are dozens of real geocoding services out there! But… hmmm… I figured a friendly poke could be nice at this point, before actually bringing up that whole situation.

And that’s what happened: rather than blocking their client, the service was modified so that all of their geocoding requests translated into the geographic coordinates of the Easter Island.

Of course, users quickly noticed it and started reporting the problem again.

The answer from Groundspeak

After users started complaining loudly, Bryan Roth, which signs as co-founder of Groundspeak, finally contacted me for the first time asking if there was a way to keep the service alive. Unfortunately, I really can’t, and provided the whole explanation to Bryan, and even mentioned that I actually use Google as the upstream geocoding provider and that I would be breaking the terms of service doing this, but offered to redirect their requests to their own servers if necessary.

Their answer to this? Pretty bad I must say. I got nothing via email, but they posted this in the forum:

But seriously, this bug actually has nothing to do with our app and everything to do with the external service we’ve been using to convert an address into GPS coordinates. For the next app update, we’re completely dropping that provider since they’ve now failed us twice. We’ll be using only Google from that point on, so hopefully their data will be more accurate.

I can barely believe what I read. They blame the upstream service, as if they were using a first class geocoding provider somewhere rather than sucking resources from a site they felt cool to link their paid application to, take my suggestion of using Google for geocoding, and lie about the fact that the data would be more accurate (it obviously can’t, since it was already Google that was being used).

I mentioned something about this in the forum itself, but I was moderated out immediately of course.

Way to go Groundspeak.

UPDATE

After some back and forth with Bryan and Josh, the last post got edited away to avoid the misleading details, and Bryan clarified the case in the forum. Then, we actually settled on my proposal of redirecting the iPhone Geocaching.com application requests to Groundspeak’s own servers so that users of previous versions of the application wouldn’t miss the feature while they work on the new release.

If such communication had taken place way back when the feature was being planned, or when it was “fixed” the first time, the whole situation would never have happened.

No matter what, I’m glad it ended up being sorted towards a more friendly solution.

 
 

Blog has moved! Please, update your links.

More than 40 years ago, a guy named Douglas Parkhill described the concept of utility computing. He described it as containing features such as:

  • Essentially simultaneous use of the system by many remote users.
  • Concurrent running of different multiple programs.
  • Availability of at least the same range of facilities and capabilities at the remote stations as the user would expect if he where the sole operator of a private computer.
  • A system of charging based upon a flat service charge and a variable charge based on usage.
  • Capacity for indefinite growth, so that as the customer load increases, the system can expanded without limit by various means.

Fast forward 40 years, and we now name pretty much this same concept as Cloud Computing, and everyone is very excited about the possibilities that exist within this new world. Different companies are pushing this idea in different ways. One of the pioneers in that area is of course Amazon, which managed to create a quite good public cloud offering through their Amazon Web Services product.

This kind of publicly consumable infrastructure is very interesting, because it allows people to do exactly what Douglas Parkhill described 40 years ago, so individuals and organizations can rent computing resources with minimum initial investment, and pay for as much as they need, no more no less.

This is all good, but one of the details is that not every organization can afford to send data or computations to a public cloud like Amazon’s AWS. There are many potential reasons for this, from legal regulations to volume cost. Out of these issues the term Private Cloud was coined. It basically represents exactly the same ideas that Douglas Parkhill described, but rather than using third party infrastructure, some organizations opt to use the same kind of technology, such as the Eucalyptus project deployed in a private infrastructure, so that the teams within the organization can still benefit from the mentioned features.

So we have the Public Cloud and the Private Cloud. Now, what would a Virtual Private Cloud be?

Well, it turns out that this is just a marketing term, purposefully coined to blur the line between a Private and a Public cloud .

The term was used in the announcement Amazon has made yesterday:

Amazon VPC enables enterprises to connect their existing infrastructure to a set of isolated AWS compute resources via a Virtual Private Network (VPN) connection, (…)

So, what is interesting about this is that this is actually not a Private Cloud, because the resources on the other side of the VPN are actually public infrastructure, and as such it doesn’t solve any of the problems which private clouds were created for solving in the first place.

Not only that, but it creates the false impression that organizations would have their own isolated resources. What isolated resources? A physical computer? Storage? Network? Of course, isolating these is not economically viable if you are charging 10 cents an hour per computer instance:

Each month, you pay for VPN Connection-hours and the amount of data transferred via the VPN connections. VPCs, subnets, VPN gateways, customer gateways, and data transferred between subnets within the same VPC are free. Charges for other AWS services, including Amazon EC2, are billed separately at published standard rates.

That doesn’t quite fit together, does it?

To complete the plot, Werner Vogels runs to his blog and screams out loud “Private Cloud is not the Cloud”, while announcing the Virtual Private Cloud which is actually a VPN to his Public Cloud, with infrastructure shared with the world.

Sure. What can I say? Well, maybe that Virtual Private Cloud is not the Private Cloud.

 
 
23 July 2009 @ 08:36 pm

Blog has moved! Please, update your links.

Alright, so I appreciate the idea of RESTful Web Services, but I’ve got a small dilemma I’d appreciate some opinions on.

In the RESTful Web Services book, by Leonard Richardson and Sam Ruby, there’s emphasis on making the programmable web look like the human web, by following an architecture oriented to having addressable resources rather than oriented to remote procedure calls. Through the book, the RPC (or REST-RPC, when mixed with some RESTful characteristics), is clearly downplayed. In some cases, though, it’s unclear to me what’s the extent of this advice. Humans and computers are of course very different in the nature of tasks they perform, and how well they perform them. To illustrate the point clearly, let me propose a short example.

Let’s imagine the following scenario: we are building a web site with information on a large set of modern books. In this system, we want to follow RESTful principles strictly: each book is addressable at http://example.com/book/<id>, and we can get a list of book URIs by accessing http://example.com/book/list?filter=<words>.

Now, we want to allow people to easily become aware of the newest edition of a given book. To do that, we again follow RESTful characteristics and add a, let’s say, new-editions field to the data which composes a book resource. This field contains a list of URIs of books which are more recent editions of the given book. So far so good. Looks like a nice design.

Now, we want to implement a feature which allows people to access the list of all recent editions of books in their home library, given that they know the URIs for the books because a client program stored the URIs locally in their machines. How would we go about implementing this? We certainly wouldn’t want to do 200 queries to learn about updates for 200 books which a given person has, since that’s unnecessarily heavy on the client computer, on the network, and on the server. It’s also hard to encode the resource scope (as defined in the book) in the URI, since the amount of data to define the scope (the 200 books in our case) can be arbitrarily large. This actually feels like perfectly fit for an RPC: “Hey, server, here are 200 URIs in my envelope.. let me know what are the updated books and their URIs.” I can imagine some workarounds for this, like saving a temporary list of books with PUT, and then doing the query on that temporary list’s URI, but this feels like a considerably more complex design just for the sake of purity.

When I read examples of RESTful interfaces, I usually see examples about how a Google Search API can be RESTful, for instance. Of course, Google Search is actually meant to be operated by humans, with a simple search string. But computers, unlike humans, can thankfully handle a large volume of data for us, and let us know about the interesting details only. It feels a bit like once the volume of data and the complexity of operations on that data goes up, the ability for someone to do a proper RESTful design goes down, and an RPC-style interface becomes an interesting option again.

I would be happy to learn about a nice RESTful approach to solve this kind of problem, though.

 
 

Blog has moved! Please, update your links.

Backwards and forwards compatibility is an art. In the very basic and generic form, it consists in organizing the introduction of new concepts while allowing people to maintain existing assets working. In some cases, the new concepts introduced are disruptive, in the sense that they prevent the original form of the asset to be preserved completely, and then some careful consideration has to be done for creating a migration path which is technically viable, and which at the same time helps people keeping the process in mind. A great example of what not to do when introducing such disruptive changes has happened in Python recently.

Up to Python 2.5, any strings you put within normal quotes (without a leading character marker in front of it) would be considered to be of the type str, which originally was used for both binary data and textual data, but in modern times it was seen as the type to be used for binary data only. For textual information, the unicode type has been introduced in Python 2.0, and it provides easy access to all the goodness of Unicode. Besides converting to and from str, it’s also possible to use Unicode literals in the code by preceding the quotes with a leading u character.

This evolution has happened quite cleanly, but it introduced one problem: these two types were both seen as the main way to input textual data in one point in time, and the language syntax clearly makes it very easy to use either type interchangeably. Sounds good in theory, but the types are not interchangeable, and what is worse: in many cases the problem is only seen at runtime when incompatible data passes through the code. This is what gives form to the interminable UnicodeDecodeError problem you may have heard about. So what can be done about this? Enter Python 3.0.

In Python 3.0 an attempt is being made to sanitize this, by promoting the unicode type to a more prominent position, removing the original str type, and introducing a similar but incompatible bytes type which is more clearly oriented towards binary data.

So far so good. The motivation is good, the target goal is a good one too. As usual, the details may complicate things a bit. Before we go into what was actually done, let’s look at an ideal scenario for such an incompatible change.

As mentioned above, when introducing disruptive changes like this, we want a good migration path, and we want to help people keeping the procedure in mind, so that they do the right thing even though they’re not spending too many brain cycles on it. Here is a suggested schema of what might have happened to achieve the above goal: in Python 2.6, introduce the bytes type, with exactly the same semantics of what will be seen in Python 3.0. During 2.6, encourage people to migrate str references in their code to either the previously existent unicode type, when dealing with textual data, or to the new bytes type, when handling binary data. When 3.0 comes along, simply kill the old str types, and we’re done. People can easily write code in 2.6 which supports 3.0, and if they see a reference to str they know something must be done. No big deal, and apparently quite straightforward.

Now, let’s see how to do it in a bad way.

Python 2.6 introduces the bytes type, but it’s not actually a new type. It’s simply an alias to the existing str type. This means that if you write code to support bytes in 2.6, you are actually not writing code which is compatible with Python 3.0. Why on earth would someone introduce an alias on 2.6 which will generate incompatible code with 3.0 is beyond me. It must be some kind of anti-migration pattern. Then, Python 3.0 renames unicode to str, and kills the old str. So, the result is quite bad: Python 3.0 has both str and bytes, and they both mean something else than they did on 2.6, which is the first version which supposedly should help migration, and not a single one of the three types from 2.6 got their names and semantics preserved in 3.0. In fact, just unicode exists at all, and it has a different name.

There you go. I’ve heard people learn better from counter-examples. Here we have a good one to keep in mind and avoid repeating.

Tags:
 
 
30 June 2009 @ 11:50 pm

Blog has moved! Please, update your links.

Yes, you’ve heard it right. I’ll exchange a legally unlocked iPhone 3G for a recent Android phone such as the Samsung Galaxy or the HTC Hero, and will pay the difference back! (street price minus 30% of devaluation for the used iPhone 3G).

I got an iPhone some time ago to learn the concepts introduced in the platform, and get a feeling of how it works out in practice. I’m happy I did it, since the hands on experience is worthwhile. But the experience is done, and even though I have positive things to say about the platform, the omnipotent and arrogant position of Apple with developers kills any chance of any further involvement I could have with the platform. I’m upset enough with it that I don’t want to see my wife using the device either.

There are many things in Apple’s behavior which are a source of arguments, and interminable flamewars, and most of the times I can see both sides of the story. For instance, when people pay a premium to get the hardware, some feel like it’s just throwing money away, but if there is good engineering behind it, well.. I understand people may want to pay the premium to get that exclusive product they like. That said, being so incredibly arrogant in the marketplace, and with developers, which theoretically should be their most precious partners, since they sustain the platform going, is something I can’t tolerate.

I know.. who am I. Just a random guy that actually gave them some money for one of their products. But I’m also a guy that won’t be buying their upgraded phones, and will be spreading the word to make people realize what a terrible future it will be if Apple ever dominates the marketplace. Even you’re not a developer, it’s a good idea to ponder carefully about this behavior. It tells a lot about how far they go to defend their own interests, and what kind of lock in they intend to get you into.

Finally, compare that to a nice open source operating system on which multiple first class vendors are cooperating. Sheeshh.. easy choice for me.

Tags:
 
 
23 June 2009 @ 09:58 pm

Blog has moved! Please, update your links.

Are you? I’m not entirely sure I am, even though I think about this a lot.

If you’re of the tech-savvy kind, you’re certainly aware of the great capabilities that the new mobile phone generation is bringing: Internet connection, a quite decent browser, GPS, camera, etc. But, really.. did you stop to think about what’s going on? This phone generation is still relatively expensive today, but they’re here to stay, and in just a few years, they’ll be commonplace.

Now, let’s forget about ourselves for a moment, and think about what mass adoption of a quite capable generic computer with full internet connectivity 24h a day being carried with its owner means for the world? Remember, the number of mobile phone users in the world is several times superior to the number of computers, and most of the computers are in the so called first world.

This implies that not only will everyone have access to the world in their pockets, which is already quite amazing by itself, but that a large number of people will have access to the Internet at all for the first time with their mobiles. Besides the several social impacts that these changes will bring, there are also many other interesting consequences. As simple examples, the most common client to many web services will be mobile phones, and many people will learn to use a touch screen interface of the mobile to interact with the world before ever having used a desktop computer for that.

I find that amazing, and this is happening right now, in front of our eyes.

Tags: ,
 
 
16 May 2009 @ 10:02 am

Blog has moved! Please, update your links.

In my previous post I made an open statement which I’d like to clarify a bit further:

(…) when the rules don’t work for people, the rules should be changed, not the people.

This leaves a lot of room for personal interpretation of what was actually meant, and TIm Hoffman pointed that out nicely with the following questioning in a comment:

I wonder when the rule is important enough to change the people though. For instance [, if your] development process is oriented to TDD and people don’t write the tests or do the job poorly will you change them then?

This is indeed a nice scenario to explore the idea. If it happens at some point that a team claims to be using TDD, but if in practice no developer actually writes tests first, the rules are clearly not working. If everyone in the team hates doing TDD, enforcing it most probably won’t show its intended benefits, and that was the heart of my comment. You can’t simply keep the rule as is if no one follows it, unless you don’t really care about the outcome of the rule.

One interesting point, though, is that when you have a high level of influence over the environment in which people are, it may be possible to tweak the rules or the processes to adapt to reality, and tweaking the processes may change the way that people feel about the rules as a consequence (arguably, changing people as a side effect).

As a more concrete example, if I found myself in the described scenario, I’d try to understand why TDD is not working, and would try to discuss with the team to see how we should change the process so that it starts to work for us somehow. Maybe what would be needed is more discussion to show the value of TDD, and perhaps some pair programming with people that do TDD very well so that the joy of doing it becomes more visible.

In either case, I wouldn’t be simply asking people “Everyone has to do TDD from now on!“, I’d be tweaking the process so that it feels better and more natural to people. Then, if nothing similar works either, well, let’s change the rule. I’d try to use more conventional unit testing or some other system which people do follow more naturally and that presents similar benefits.

Tags: , ,
 
 

Blog has moved! Please, update your links.

For a long time I’ve been an advocate of Python’s notion of controlling access to private and protected members (attributes, methods, etc) with conventions, by simply naming them like “_name”, with an initial underline.  Even though Python does support the “__name” (with double underscore) for “private” members (this actually mangles the name rather than hiding it), you’ll notice that even this is rarely used in practice, and the largely agreed mantra is that convention should be enough and thus one underscore suffices. This always resonated quite well with me, since I generally prefer to handle situations by agreement rather than enforcement. Well, I’m now changing my opinion.that this works well for this purpose, at least in certain situations.

This methodology may work quite well in situations where the code scope is within a very controlled environment, with one or more teams which follow strictly a single development guideline, and have the power to refactor the affected code base somewhat easily when the original decisions are too limiting.

Having worked on a few major projects now, and some of them being libraries which are used by several teams within the same company or outside, I now perceive that people very often take shortcuts over these decisions for getting their job done quickly. It’s way easier to simply read the code and get to the private guts of a library than to try to get agreement over the right way to do something, or sending a patch with a suggested change which was carefully architected.

Many people by now are probably thinking: “Well, that’s their problem, isn’t it? If their code base breaks on the next upgrade they’ll get burden and won’t be able to upgrade cleanly.”, and I can honestly understand this feeling, since I shared it. But, for a number of reasons, I now understand that this isn’t just their problem, it’s very much my problem too.

Most importantly, on any serious software, these problems will usually come back to the implementors, and many times the problem will have a much larger magnitude by then than they had at the time a change could have been done “the right way” on the implementation, because code dependent on the private bits will have settled.

Most people are optimist by nature and believe that the implementation won’t change, but, of course, one of the reasons why private information is made private in the first place is exactly because the implementor believes that having the freedom to change these details in the future is important, and not rarely there’s already a plan of evolution in place for these private pieces, which may include revamping the implementation entirely for scalability or for other goals.

In the best case, the careless people will get burden on the upgrade and will ask for support or simply won’t upgrade silently, and both cases hurt implementors, because providing support for broken software takes time and energy, and amazingly can even hurt the software image. Lack of upgrades also means more ancient versions in the wild to give support for. Besides these, in the worst case scenario, the careless people have enough influence on the affected project to cause as much burden on it as if the private data was public in the first place.

As much as I’m a believer in handling situation by agreement rather than enforcement, I’m also a believer that when the rules don’t work for people, the rules should be changed, not the people. So my positioning now is that the language supported access constraints (public, protected, private), as available in languages like Java and C++, are a better alternative when compared to convention as used today in Python, since they provide an additional layer of encouragement for people to not break the rules carelessly, and that helps in the maintenance and reuse of software that has greater visibility.

Tags: , ,
 
 
14 August 2008 @ 09:24 pm

Blog has moved! Please, update your links.

After 4.5 years in development, Smart has been branded as 1.0. A big Thank You to everyone who contributed along the years.

Tags:
 
 
12 August 2008 @ 04:46 am

Blog has moved! Please, update your links.

The underlying concept is very simple: spreadsheets are a way to organize text, numbers and formulas into what might be seen as a natively numeric environment: a matrix. So what would happen if we loosed some of the bolts of the numeric-oriented organization, and tried to reuse the same concepts into a more formatting-oriented environment which is naturally collaborative: a wiki.

While I do encourage you to answer this with some fantastic new online service (please provide me with an account and the best e-book reader device available once you’re rich) I had a try at answering this question myself a while ago by writing the Calc macro for Moin.

Basically, the Calc macro allows extracting values found in a wiki page into lists (think columns or rows), and applying formulas and further formatting as wanted.

I believe there’s a lot of potential on the basic concept, and the prototype, even though functional and useful, surely has a lot to evolve, so I’ve published the project in Launchpad to make contributions easier. I actually apologize for not publishing it earlier. There was hope that more features would be implemented before releasing, but now it’s clear that it won’t get many improvements from me anytime soon. If you do decide to improve it, please try to prepare patches which are mostly ready for integration, including full testing, since I can’t dedicate much time for it myself in the foreseeable future.

 
 
04 August 2008 @ 06:48 am

Blog has moved! Please, update your links.

In his post Quantity Always Trumps Quality, Jeff Atwood made a very interesting reference to an arts-related book:

The ceramics teacher announced on opening day that he was dividing the class into two groups. All those on the left side of the studio, he said, would be graded solely on the quantity of work they produced, all those on the right solely on its quality. His procedure was simple: on the final day of class he would bring in his bathroom scales and weigh the work of the “quantity” group: fifty pound of pots rated an “A”, forty pounds a “B”, and so on. Those being graded on “quality”, however, needed to produce only one pot - albeit a perfect one - to get an “A”.

Well, came grading time and a curious fact emerged: the works of highest quality were all produced by the group being graded for quantity. It seems that while the “quantity” group was busily churning out piles of work - and learning from their mistakes - the “quality” group had sat theorizing about perfection, and in the end had little more to show for their efforts than grandiose theories and a pile of dead clay.

If I tell you that you’ll get better at doing something if you do it repeatedly you’ll probably stare at me with a look of obviousness, but even then the correlation made above still feels a bit surprising to a lot of people. Why is that so?

I have a guess. In our society we tend to believe that art and innovation is something for the gifted, rather than the product of hard work. Just think of any great famous painter or musician and you’ll likely have in your mind the concept of a uniquely gifted genius, rather than someone that worked uniquely hard after a goal.

Perhaps that’s why we tend to forget long learned lessons. Some 23 years ago Frederick Brooks already pointed out in The Mythical Man-month that we should plan to throw away the first version of the software, because it most likely will be a poorly designed prototype that provides insight into the problem for the actual production version. Even then, it’s still rare to see the practice intentionally in use nowadays.

Tags:
 
 

Blog has moved! Please, update your links.

As everyone is probably aware by now, in Python 3 dict.keys(), dict.values() and dict.items() will all return iterable views instead of lists. The standard way being suggested to overcome the difference, when the original behavior was actually intended, is to simply use list(dict.keys()). This should be usually fine, but not in all cases.

One of the reasons why someone might actually opt to perform a more expensive copying operation is because, with the pre-3.0 semantics, the keys() method is atomic, in the sense that the whole operation of converting all dictionary keys to a list is done while the global interpreter lock is held. Thus, it’s thread-safe to run dict.keys() with Python 2.X.

The suggested replacement in Python 3, list(dict.keys()), is not. There’s a chance that the interpreter will give another thread a chance to run before or during the iteration of the view, and this will cause an exception if the dictionary is modified at the same time. To fix the problem, either a lock must protect the iteration, or a more expensive operation such as dict.copy().keys() must be used.

The 2to3 tool won’t help you there, unfortunately. So, keep an eye on it!

Tags:
 
 
01 June 2008 @ 10:04 pm

Blog has moved! Please, update your links.

Avi Bryant is working on MagLev, a Ruby interpreter, based on Gemstone’s Smalltalk VM, with some very amazing features, like transactioned objects distributed across several VMs:

The integrated VMs, cache, and storage conspire to create an illusion that global state is shared across all instances: no matter how many VMs you add, over however many machines, they all see and work with the same set of Ruby objects.

My geek side finds this highly exciting, and eager to see it released to see how people will deal with it in practice.

At the same time, my let’s-build-stable-and-maintainable-software side is a bit skeptic. As Joe Armstrong puts so enthusiastically, shared state is hard to manage correctly, global transactions reduce scalability, and transparent RPC is seductive, but dangerous.

I’m also curious about the speed gains pointed out. It’s well known that the Ruby VM isn’t very fast, which means that there must be opportunities for speedups. Even then, 100x faster is impressive, and history shows that sometimes the significant improvements are harder when the semantics are precisely the same. Let’s hope Avi can manage to run the Ruby tests successfully.

Tags:
 
 
01 June 2008 @ 05:55 pm

Blog has moved! Please, update your links.

Today, Sunday, on the mailman day, I decided to change my reading habits.

You’d certainly laugh if I told you how many mailing lists, blogs, and IRC channels I try to follow (won’t include IM networks here as I don’t really read them asynchronously). What I look for is pretty obvious: I want to exchange volume for quality.

The first thing I’m doing is unsubscribing from all high-traffic lists I’m part of. The reason is clear: one hundred messages a day can’t possibly be all interesting. I’m not saying there are no interesting posts among these, of course. But with such a vibrant community of followers, a few smart readers usually bring up the most interesting discussions in more selective formats. I’ll track these instead.

For the same reason, I’m unsubscribing from most feed aggregators. Planet and similars are a great way to subscribe to many feeds quickly, but let’s face it.. how many posts in an aggregator with lots of feeds are interesting to a single individual? While getting off from them, I’m selectively peaking the feeds that interest me and subscribing to each.

Then, for the not-so-high volume sources, I’m checking the last 5 posts or so (or days, for IRC channels). Anything that hasn’t had information worth tracking will be phased out too. Interesting topics eventually will find their way to the sources I’ll still follow.

I want to read less, to read more. I want to go faster through the queue of pending books, and also follow a wider variety of topics with less pain.

Tags:
 
 
20 May 2008 @ 10:54 pm

Blog has moved! Please, update your links.

According to Dave Troy, Google seems to be using the Geohash algorithm:

Google is employing the GeoHash algorithm I’ve been pushing to do spatial searching using BigTable. Since database schemes like BigTable don’t support traditional GIS extensions/spatial indexes, GeoHash allows for a simple bounding box search using truncated GeoHash substrings. I will post separately about this shortly, as I am working on some GeoHash tools to expand this functionality. This is of particular interest to AppEngine developers.

Nice!

 
 
03 March 2008 @ 12:49 am

Blog has moved! Please, update your links.

Friday I’ve released version 1.4 of dateutil. There are some interesting fixes there, so please upgrade if you have the chance.

 
 
01 March 2008 @ 06:27 pm

Blog has moved! Please, update your links.

Some improvements to geohash.org were made. Some of them were
motivated by a conversation with Rodrigo Stulzer.

  • Support for geocoding addresses (city names, whatever). E.g. http://geohash.org/?q=21 Millbank, London
  • Support for moving the Geohash marker in the embedded map, so that modifying the position visually is easier.
  • Support for providing a “name” to Geohashes, by appending a colon and the name, in a nice format. E.g. http://geohash.org/c216ne:Mt_Hood
  • Provided a bookmark to get a Geohash while in Google Maps.
  • Provided a Google Maps Mapplet. When enabled, it adds a Geohash marker identifying the Geohash position in Google Maps, and it may be moved around. Here is a screenshot:

Check out the Tips & Tricks page for details on these features.

 
 
26 February 2008 @ 09:11 pm

Blog has moved! Please, update your links.

After about one year writing this service in my spare time, it’s finally out.

geohash.org offers short URLs which encode a latitude/longitude pair, so that referencing them in emails, forums, and websites is more convenient.

Geohashes offer properties like arbitrary precision, similar prefixes for nearby positions, and the possibility of gradually removing characters from the end of the code to reduce its size (and gradually lose precision). I’ve put the algorithm created in the public domain. Some details may be seen in the Wikipedia article about it (hopefully that’ll help establishing prior art, and prevent Microsoft from patenting it).

To obtain the Geohash, the user provides latitude and longitude coordinates in a single input box (most commonly used formats for latitude and longitude pairs are accepted), and performs the request.

Besides showing the latitude and longitude corresponding to the given Geohash, users who navigate to a Geohash at geohash.org are also presented with an embedded map, and may download a GPX file, or transfer the waypoint directly to certain GPS receivers. Links are also provided to external sites that may provide further details around the specified location.

 
 

Blog has moved! Please, update your links.

Mocker 0.10 is out, with a number of improvements!

While we’re talking about Mocker, here is another interesting use case, exploring a pretty unique feature it offers.

Suppose we want to test that a method hello() on an object will call self.show(”Hello world!”) at some point. Let’s say that the code we want to test is this:

 class Greeting(object):

     def show(self, sentence):
         print sentence

     def hello(self):
         self.show("Hello world!")

This is the entire test method:

def test_hello(self):
    # Define expectation.
    mock = self.mocker.patch(Greeting)
    mock.show("Hello world!")
    self.mocker.replay()

    # Rock on!
    Greeting().hello()

This has helped me in practice a few times already, when testing some involved situations.

Note that you can also passthrough the call. In other words, the call may actually be made on the real method, and mocker will just assert that the call was really made, whatever the effect is.

One more important point: mocker ensures that the real method exists in the real object, and has a specification compatible with the call made. If it doesn’t, and assertion error is raised in the test with a nice error message.

UPDATE: The method for doing this is actually mocker.patch() rather than mocker.mock(), as documented. Apologies.