The Exciter Home

About

This is the personal blog of Johan Sørensen. Here you’ll find various musings about technology and other semi-related stuff.

What's interesting about MagLev

June 2nd, 2008

One of the more interesting presentations at this year’s RailsConf seems (I wasn’t at the conference myself) to have been the one about MagLev. Lots of excitement, a fair bit of koolaid-drinking and a good round of healthy scepticism.

But the really interesting part of MagLev to me isn’t the claimed performance numbers (though those may be interesting in themselves, but not until we know how Ruby-conformant the final product is going to be). What’s really interesting is the Gemstone/S based persistence part of MagLev, along with the distributed VM part, and not least Gemstone (the company) itself, with its long experience of creating dynamic language VMs.

And yes, so far it might be mostly koolaid (since none of us outside Gemstone has used it), but I seem to remember a certain Ruby web framework being called something similar in 2004, and it continued to receive trash even after its release. History will probably repeat itself here with the armchair experts.

I for one find traditional relational databases lacking in a lot of problem domains, which is why I’m interested in (admittedly young) projects such as CouchDB, StrokeDB and Neo4j, and anything else that teases my mind with new possibilities. Yes, Facebook, Skype and others do scale MySQL/Postgres, but it’s interesting to note that the reasons for that are probably more related to developer culture and experience. The Gemstone Smalltalk implementation, on the other hand, seems to cater more towards wildly enterprising sectors, such as the financial and shipping markets, where the demands are quite different from sending virtual vampires to your friends. Then again, that argument works the other way as well.

I don’t think ActiveRecord is anywhere near being the perfect way to persist objects; rather I think it’s one of the better ways to map objects to a relational database. There’s a small syntactic difference there, between an enhanced class (one inheriting from ActiveRecord::Base) and an object you interact with just as you would with any other object, which in turn is persisted across machines and sessions as that very same pure object.

People playing around with a shared global object store made by a company that has been doing it for 15+ years is a good thing. Who knows, maybe someone will get inspired to write something new based on ideas we’ll get from using MagLev. Maybe Seaside (or something similar in spirit) will find its way back to its Ruby roots. Maybe even other Ruby implementation hackers will start to implement something similar to Gemstone at the bottom.

What’s also interesting is that Gemstone the company even does this to begin with. I hope it’s because they see Ruby as something they could help with their dynamic language VM experience. But more so it might be because they see real business value in Ruby, and in being able to provide a fast, scalable platform for enterprises who might feel that Smalltalk is a bit old-fashioned for today’s developers, one that is dynamic enough to give them flexibility while letting them lean on a vendor. Of course, I know little of Gemstone the company or their customers, but that seems to be the tune as of late.

From the sounds of it, MagLev is not going to be completely open source, but I’m with Giles Bowkett and Wincent Colaiuta on this one: while the majority of the Rails community says things should be F/OSS, that has never stopped them from using mostly closed-source things like Macs and TextMate. As long as it’s shiny, works great and gets the job done, the majority of Rails developers will happily use it. And that in itself is fine.

Maybe MagLev really will be fast, maybe it’ll inspire people and make their work enjoyable, maybe it’ll allow certain applications to scale more easily, maybe its persistence model will be a joy to work with. And maybe, just maybe, it’ll be friggin’ sweet. But I, for one, welcome our new Object overlords and can’t wait to see what they’re up to, because anything doing interesting things to, or with, Ruby is a good thing.

Gitorious, so far…

March 19th, 2008

It seems that Git is getting more and more mindshare by the day, which is great because I’m loving working with it. It’s a little over three months since I made Gitorious.org public and I’ve been having lots of fun with it since then.

In particular I’m happy about the wide range of projects there. Ruby based projects are in the majority, including semi-official mirrors of the Rails, RSpec and MacRuby projects, but there are also some Python things, such as gdb-python, a bit of Lisp along with some Erlang, C/C++ and two Linux kernel mirrors. Good times.

The last two are interesting because they are some of the biggest git repositories around, yet they only take up about 200 megs worth of disk space. Heck, take all the repositories combined on Gitorious, and the cache for the web frontend of gitorious.org is still bigger than those. However, neither disk nor bandwidth charges are anywhere near hurting my wallet, and it’ll stay that way for quite a while. I don’t contribute as much as I should back to most of the open-source projects I use on a day to day basis at the $dayjob (or otherwise), so I consider Gitorious my way of giving back. Long term I have some ideas that would allow Gitorious to give back even more (things like this are awfully inspiring), without resorting to cheap tricks such as ads all over the place.

So far, most of my focus on the Gitorious codebase has been on stability and speed (it’s really quite snappy now I think), but also a few new features such as merge requests and searching. But soon it’s time to add some of the bigger things on the list, things that’ll help with managing an open source project hosted on Gitorious.

But first I want to talk about “the competition”, namely github. “Competition” is in quotes because I honestly don’t see it like that (git is distributed after all); however, a lot of people seem to lay it out like that whenever the two are mentioned in the same sentence. It says a lot about the workflow that git presents that we both had the same idea and ran with it, only to release our respective things publicly a week or so apart.

But I find it slightly peculiar that a lot of open-source projects (ruby/rails projects in particular) have jumped on it, despite it being closed-source. What’s the point of being myspace for hackers (not that that’s a particularly flattering comparison to begin with) if I can’t hack on it? But that’s me being seemingly more idealistic about this stuff than most people. Launchpad is closed-source and seems to be doing well despite it being a total mess to use, and even the Apache Foundation offers their incubator projects an option to use JIRA and/or Confluence (“The Enterprise Wiki”, which cracks me up every time). Anyway, I’m not crying about this at night, just finding it interesting. What’s really important is that more people discover the advantages of a distributed SCM such as Git, even for internal (“dayjob”) projects, regardless of whether they host their code on a third-party server or on their own using Gitosis and gitweb, a custom Gitorious install (I hear there are a few already) or just a plain old git repository somewhere.

I don’t want Gitorious to end up like the mess that is Launchpad, but I do think there are a few good ideas floating around, when it comes to dealing with the practicalities of running or contributing to an open source project, that are worth looking into. In particular the notion of a distributed bug tracking system is too cool to pass up, even if distributed just means that I can track bugs across projects and different repositories. Imagine Jane having cloned Bob’s project publicly and fixing that damn bug #2353: all Bob has to do now to fix the bug is to pull Jane’s changes into the mainline repository. Boom, no need to mess around with patch files.

Having the ticket system truly distributed is of course something to strive for, but I think I’ll start with a slightly less lofty target for Gitorious and use tracer bullets from there to hit the sweetspot of a ticketing system that fits git, and humans, well.

Gitorious source pushed - and a freebie!

January 13th, 2008

I’ve pushed the source to Gitorious to… Gitorious! Yurii has already made a clone of it and I think he’s hacking on some SVN mirroring he needed for one of his projects. Very cool.

I’ve also added a project called Tumbline, which is an 80% done tumblelogging application I wrote during the summer when I was really unhappy about Tumblr (where I host Application Error), they have since shaped up a bit and I’ll probably continue using them. But I’ve open sourced the application I wrote in case someone wants to use it for something, rather than it collecting dust in my ~/Projects directory.

Enjoy!

Gitorious - open source project hosting

January 8th, 2008

Since writing this post I’ve slowly been implementing some of the ideas of my take on a way to do open source collaboration on the repository level, based on git in particular.

I love open source, from the things being created to the concept in itself. Project forges like SourceForge and Rubyforge are great ways to publish a project and handle the infrastructure around it, such as mailing lists, bugtrackers and tarball releases.

But they’re also filled with dead projects. Some of these projects have been forked, or are actively maintained elsewhere, but most of the time you aren’t so lucky. They’re also rather centralized in the sense that the project owner or maintainer has to actively accept patches, or hand out commit bits, in order for the repository to keep up with developments. As a project maintainer this can be hard in the long run, particularly if you’ve for some reason lost interest in the project, or are just plain too busy with other things. I know this far too well from my own open source projects.

Distributed source control provides one possible way around this, because every clone (or checkout in svn-speak) of a repository is a full-blown repository, you can just publish your updated repository instead and if people like your stuff better they can just pull from that instead of the “mainline” repository! Likewise, a project maintainer can just pull in these changes into the mainline repository to keep the project going forward and easily accept contributions.

DSCM tools like git are great at this: since every clone is a full repository, git has to be extremely good at merging any commits you make when pushing upstream, hence pulling in other commits from clones works just as well. This also means that forking is not really such a big issue, because any fork can easily be pulled back in upstream (because of the shared commit history). In fact, forking (in the essence of the word) is the only way to work with DSCM. Of course, the social aspects of forking, such as disagreements about project direction, are an entirely different issue that has to be dealt with on the social level.
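
To make the "every clone is a full repository" point a bit more concrete, here is a toy model in Ruby. It is only a sketch: ToyRepo and its methods are names I made up, and it ignores everything that makes git interesting (content hashing, merges, branches). It just shows that a pull only needs to copy the commits the other side is missing.

```ruby
# Toy model, NOT git's real data structures: a repository is a hash of
# commit id => parent id (nil for the root commit).
class ToyRepo
  attr_reader :commits

  def initialize(commits = {})
    @commits = commits.dup
  end

  # A clone carries the full history, not just a working copy.
  def clone_repo
    ToyRepo.new(@commits)
  end

  def commit(id, parent)
    @commits[id] = parent
  end

  # Pulling copies over only the commits we do not have yet; the shared
  # history is what lets both sides line up. Returns the new commit ids.
  def pull(other)
    missing = other.commits.keys - @commits.keys
    missing.each { |id| @commits[id] = other.commits[id] }
    missing
  end
end

mainline = ToyRepo.new("a1" => nil, "b2" => "a1")
contributor = mainline.clone_repo
contributor.commit("c3", "b2")        # work done in the clone

pulled = mainline.pull(contributor)   # the maintainer pulls it back in
puts pulled.inspect
```

Pulling in the other direction works exactly the same way, which is why neither repository is special apart from what people agree to call "mainline".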

Gitorious is a free git hosting solution I’ve built that allows anyone to create a project and, in turn, allows anyone to create a clone of that project’s mainline repository for their own contributions. The project owner, or anyone with write access to the mainline repository, can then pull these changes into the mainline repository if they like what they see. Or they can provide feedback directly on commits if they’re unsure about the approach taken, or just want to communicate something.

I’m hoping it will be useful for git users, and I’m very interested in seeing it being used and hearing people’s ideas for improvements.

I’ve got many more things I want to do with Gitorious; an improved repository browser and better ways to communicate with contributors are some of the next things on the list, but what’s there today has everything to get you started.

A quick stroll through DTrace

December 9th, 2007

DTrace has been getting a lot more press recently since Apple ported it to Leopard, and it’s also been getting a lot of mentions in the Ruby community since Apple has included DTrace providers for Ruby. Yet surprisingly few seem to actually use DTrace much (yours truly included really). So here’s a short intro to DTrace and D (not to be confused with the other D).

The essence of DTrace is probes: these are event handlers that fire whenever their particular event happens. You can then register interest in these probe events with a particular action, like printing it, aggregating usage counts or whatever other way you decide to use this information.

Since there are over 450 000 probes in Leopard, there’s a lot of information you can gather, and the trick is to start at a high level and drill down: “hmm, why are there 800 syscalls? hmm, what function caused this? what is it writing? what did it do right before it made the call to write()?” and so one question leads to the next with DTrace.

We can get a list of all the probes currently available on our system by running dtrace -l, or drill down with the -P flag:


$ sudo dtrace -l | wc -l
  454839
$ sudo dtrace -l -P syscall | head -5
   ID   PROVIDER            MODULE                          FUNCTION NAME
17590    syscall                                             syscall entry
17591    syscall                                             syscall return
17592    syscall                                                exit entry
17593    syscall                                                exit return

So let’s start with asking what syscalls are currently being made by all the applications currently running (unless otherwise told to, DTrace will listen forever so finish it with ctrl+c):


$ sudo dtrace -n 'syscall:::entry{trace(execname)}'
dtrace: description 'syscall:::entry' matched 427 probes
CPU     ID                    FUNCTION:NAME
  1  17698                      ioctl:entry   dtrace                           
  1  17698                      ioctl:entry   dtrace                                             
  1  17682                  sigaction:entry   dtrace                           
  1  17682                  sigaction:entry   dtrace                           
  1  17682                  sigaction:entry   dtrace                           
  1  18258           __semwait_signal:entry   Little Snitch U
  1  17686                sigprocmask:entry   WindowServer                     
  1  17696                sigaltstack:entry   WindowServer

Probes are specified in a provider:module:function:name format, with an empty field acting as a wildcard. So asking for all syscall function entries means asking for syscall:::entry; we could get all write syscall entries by asking for syscall::write:entry, and the returns of the write() function by asking for syscall::write:return.
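
If the four-field format feels abstract, here is a rough Ruby sketch of the matching rule. It is a simplification under my own assumptions: the Probe struct and both helper methods are made up for illustration, it insists on all four fields being present, and it uses File.fnmatch for glob patterns such as ruby*, which only approximates how dtrace itself matches.

```ruby
# A probe is identified by four fields. "mod" stands in for the module
# field, since "module" is a reserved word in Ruby.
Probe = Struct.new(:provider, :mod, :function, :name)

# Split "provider:module:function:name" into its four fields.
def parse_description(description)
  fields = description.split(":", -1)   # -1 keeps empty trailing fields
  raise ArgumentError, "expected 4 fields" unless fields.size == 4
  fields
end

# An empty field matches anything; otherwise treat it as a shell-style glob.
def matches?(description, probe)
  parse_description(description).zip(probe.to_a).all? do |pattern, value|
    pattern.empty? || File.fnmatch(pattern, value)
  end
end

# A handful of probes in the shape of the listings in this post.
probes = [
  Probe.new("syscall", "", "write", "entry"),
  Probe.new("syscall", "", "write", "return"),
  Probe.new("syscall", "", "exit", "entry"),
  Probe.new("ruby48398", "libruby.1.dylib", "rb_call0", "function-entry"),
]

%w[syscall:::entry syscall::write:return ruby*:::function-entry].each do |d|
  hits = probes.select { |probe| matches?(d, probe) }
  puts "#{d} matched #{hits.size} probe(s)"
end
```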

The raw trace output isn’t all that useful, since it’s too much information for us puny humans to parse effectively. Luckily DTrace provides a means of aggregating things with the @[key(s)] notation, where key(s) is an arbitrary comma-separated list of D expressions and the value is an aggregating function like count(), which simply counts the number of times something happens. So to aggregate the number of syscalls per application name we can use execname:


  $ sudo dtrace -n 'syscall:::entry{ @[execname] = count() }'
  dtrace: description 'syscall:::entry' matched 427 probes
  ^C

    DirectoryServic                  2
    Finder                           2
...
    WindowServer                    46
    launchd                         48
    natd                            81
    SystemUIServer                 113
    Adium                          131
    ruby                           356
    pmTool                         584
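
For the Rubyists, an aggregation is roughly a hash with a default value, keyed on a tuple. The sketch below is only an analogy: the Event struct, the aggregate_count helper and the sample events are all made up, and real aggregations live inside DTrace rather than in your process.

```ruby
# Each event stands in for one firing of a syscall entry probe.
Event = Struct.new(:execname, :probefunc)

# Count events per key, like `@[key(s)] = count()`. The block builds the
# key tuple from the event.
def aggregate_count(events)
  counts = Hash.new(0)
  events.each { |event| counts[yield(event)] += 1 }
  counts.sort_by { |_key, count| count }   # smallest count first
end

# Made-up sample data, not taken from a real trace.
events = [
  Event.new("ruby", "select"),
  Event.new("Finder", "kevent"),
  Event.new("ruby", "__semwait_signal"),
  Event.new("ruby", "__semwait_signal"),
]

by_name = aggregate_count(events) { |e| [e.execname] }
by_name.each { |key, count| puts format("%-20s %5d", key.join(" "), count) }
```

Adding more expressions to the key array corresponds to adding more keys between the brackets in D.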

We can even expand this to see what probe function is being called using the probefunc expression:


$ sudo dtrace -n 'syscall:::entry{ @[execname, probefunc] = count() }'
dtrace: description 'syscall:::entry' matched 427 probes
^C

 Finder                   kevent                    1
 Safari                   gettimeofday              1
 Terminal                 mmap                      1
...
 ruby                     select                   10
 dtrace                   ioctl                    14
 WindowServer             sigaltstack              15
 WindowServer             sigprocmask              15
 ruby                     __semwait_signal        141
 pmTool                   __sysctl                291

Ruby seems to be waiting on a semaphore/thread; let’s take a look at its current stack trace. We can do this by specifying a predicate for our probe; think of it as a conditional. So, by only registering interest in a probe if the execname == "ruby" predicate is met, we print the stack:


$ sudo dtrace -n 'syscall:::entry/execname == "ruby"/{ ustack() }'
dtrace: description 'syscall:::entry' matched 427 probes
CPU     ID                    FUNCTION:NAME
  0  18258           __semwait_signal:entry 
              libSystem.B.dylib`__semwait_signal+0xa
              libruby.1.dylib`rb_thread_group+0x29f
              libSystem.B.dylib`_pthread_start+0x141
              libSystem.B.dylib`thread_start+0x22

  0  18258           __semwait_signal:entry 
              libSystem.B.dylib`__semwait_signal+0xa
              libruby.1.dylib`rb_thread_group+0x29f
              libSystem.B.dylib`_pthread_start+0x141
              libSystem.B.dylib`thread_start+0x22

Yep, looks like an rb_thread all right. And that makes perfect sense, since I had a mongrel running there in the background.

Let’s take a look at what Ruby providers are available (you need a running ruby process to see this):


$ sudo dtrace -l -P "ruby*"          
   ID   PROVIDER            MODULE             FUNCTION NAME
19708  ruby48398   libruby.1.dylib             rb_call0 function-entry
19709  ruby48398   libruby.1.dylib             rb_call0 function-return
19710  ruby48398   libruby.1.dylib      garbage_collect gc-begin
19711  ruby48398   libruby.1.dylib      garbage_collect gc-end
19712  ruby48398   libruby.1.dylib              rb_eval line
19713  ruby48398   libruby.1.dylib         rb_obj_alloc object-create-done
19714  ruby48398   libruby.1.dylib         rb_obj_alloc object-create-start
19715  ruby48398   libruby.1.dylib      garbage_collect object-free
19716  ruby48398   libruby.1.dylib           rb_longjmp raise
19717  ruby48398   libruby.1.dylib              rb_eval rescue
19718  ruby48398   libruby.1.dylib    ruby_dtrace_probe ruby-probe

We wildcard the ruby provider name since the providers are specific to each process (the 48398 part is the PID). That’s cool if you’re running more than one ruby process, since you could poke around figuring out why one is eating CPU and the other isn’t (here’s an explanation of the Ruby probes). Let’s see what method calls are being used the most in a typical Rails request:


$ sudo dtrace -n 'ruby*:::function-entry{ @[copyinstr(arg0), copyinstr(arg1)] = count()  }'
dtrace: description 'ruby*:::function-entry' matched 1 probe
^C
...                                                                       
  Array            pop                                     24
  File::Stat       size                                    24
  Inflector        inflections                             24
  Inflector        inflections_without_route_reloading     24
...
  Hash             []                                     557
  Hash             []=                                    623
  String           to_s                                   723
  Hash             key?                                  4379

Here we make the aggregation keys out of the class and the method name, which are available as argN. args[] is an array of arguments for the probe, and argN is a shortcut for that array. In this case the arguments are whatever the probe made them out to be (class and method name; arg2 and arg3 are source file and line number), but they could also be the arguments of a function call. copyinstr() simply means “make a string out of this pointer reference”.

Back to poking around. Hash lookups and String#to_s aren’t all that interesting for us right now, but I’m kinda curious about what it is stat()’ing 24 times for a request. Let’s try and find out:


$ sudo dtrace -n 'ruby*:::function-entry/copyinstr(arg0) == "File::Stat" && copyinstr(arg1) == "size"/{ ustack()  }'
dtrace: description 'ruby*:::function-entry' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  19708          rb_call0:function-entry 
              libruby.1.dylib`rb_eval_string_wrap+0x43f9
              libruby.1.dylib`rb_eval_string_wrap+0x5173
              libruby.1.dylib`rb_eval_string_wrap+0x23ee
              libruby.1.dylib`rb_eval_string_wrap+0x34ba
              libruby.1.dylib`rb_eval_string_wrap+0x1f53
              libruby.1.dylib`rb_eval_string_wrap+0x2dbb
              libruby.1.dylib`rb_eval_string_wrap+0x4d65
              libruby.1.dylib`rb_eval_string_wrap+0x5173
              libruby.1.dylib`rb_eval_string_wrap+0x23ee
              libruby.1.dylib`rb_eval_string_wrap+0x149d
              libruby.1.dylib`rb_eval_string_wrap+0x4d65
              libruby.1.dylib`rb_eval_string_wrap+0x5173
              libruby.1.dylib`rb_eval_string_wrap+0x23ee
              libruby.1.dylib`rb_eval_string_wrap+0x4d65
              libruby.1.dylib`rb_eval_string_wrap+0x5173
              libruby.1.dylib`rb_eval_string_wrap+0x23ee
              libruby.1.dylib`rb_thread_trap_eval+0x959
              libruby.1.dylib`rb_yield+0x21
              libruby.1.dylib`rb_ary_each+0x1e
              libruby.1.dylib`rb_eval_string_wrap+0x455f

  0  19708          rb_call0:function-entry 
              libruby.1.dylib`rb_eval_string_wrap+0x43f9
              libruby.1.dylib`rb_eval_string_wrap+0x5173
              libruby.1.dylib`rb_eval_string_wrap+0x23ee
              libruby.1.dylib`rb_eval_string_wrap+0x1f53
              libruby.1.dylib`rb_eval_string_wrap+0x4d65
              libruby.1.dylib`rb_eval_string_wrap+0x5173
              libruby.1.dylib`rb_eval_string_wrap+0x23ee
              libruby.1.dylib`rb_eval_string_wrap+0x149d
              libruby.1.dylib`rb_eval_string_wrap+0x4d65
              libruby.1.dylib`rb_eval_string_wrap+0x5173
              libruby.1.dylib`rb_eval_string_wrap+0x23ee
              libruby.1.dylib`rb_eval_string_wrap+0x4d65
              libruby.1.dylib`rb_eval_string_wrap+0x5173
              libruby.1.dylib`rb_eval_string_wrap+0x23ee
              libruby.1.dylib`rb_thread_trap_eval+0x959
              libruby.1.dylib`rb_yield+0x21
              libruby.1.dylib`rb_ary_each+0x1e
              libruby.1.dylib`rb_eval_string_wrap+0x455f
              libruby.1.dylib`rb_eval_string_wrap+0x5173
              libruby.1.dylib`rb_eval_string_wrap+0x23ee
                                                         24

By adding the predicate of our target class and method we get only what we’re interested in, and print the stack using ustack(). Unfortunately, this being Ruby, it’s not all that useful to us, since it’s pretty much all rb_eval-inner-ruby-runtime-here-be-dragons-stuff that doesn’t make much sense to us (I would love a ustack helper for Ruby, like there is for Python). I wonder which file it’s doing this in though?


$ sudo dtrace -n 'ruby*:::function-entry/copyinstr(arg0) == "File::Stat" && copyinstr(arg1) =="size"/{ printf("%s in %s", copyinstr(arg0),copyinstr(arg2))}'
  dtrace: description 'ruby*:::function-entry' matched 1 probe
File::Stat in [...]gems/mongrel-1.0.1/lib/mongrel/handlers.rb
File::Stat in [...]gems/mongrel-1.0.1/lib/mongrel/handlers.rb
File::Stat in [...]gems/mongrel-1.0.1/lib/mongrel/handlers.rb
# (output slightly truncated)

OK, so that tells us where this is happening, but not what it’s stat()’ing. Luckily File::Stat sounds like something that might be doing a syscall, and we have probes for that. Here’s a script that matches up the ruby function-entry with looking at syscalls at the same time:


#!/usr/sbin/dtrace -s

#pragma D option quiet

ruby*:::function-entry
/copyinstr(arg0) == "File::Stat" && copyinstr(arg1) == "size"/
{
  self->interested = 1;
  self->rubymethod = copyinstr(arg1);
  self->rubyclass = copyinstr(arg0);
}

syscall::stat*:entry
/self->interested/
{
  printf("%s from %s#%s\n", copyinstr(arg0), self->rubyclass, self->rubymethod); 
}

By setting the thread-local variable interested whenever we’re in the function-entry we’re interested in, we can use that variable as a predicate for our syscall::stat*:entry clause (stat* is wildcarded because there are things like stat64() as well). Making it executable and running it, we see:


$ chmod +x who_be_stattin.d    
$ sudo ./who_be_stattin.d 
RAILS_ROOT/public/favicon.ico from File::Stat#size
RAILS_ROOT/public/favicon.ico from File::Stat#size
RAILS_ROOT/public/favicon.ico from File::Stat#size
RAILS_ROOT/public/favicon.ico from File::Stat#size
RAILS_ROOT/public/favicon.ico from File::Stat#size
RAILS_ROOT/public/favicon.ico from File::Stat#size
RAILS_ROOT/public/javascripts/application.js from File::Stat#size
RAILS_ROOT/public/javascripts/application.js from File::Stat#size
RAILS_ROOT/public/javascripts/application.js from File::Stat#size
RAILS_ROOT/public/javascripts/application.js from File::Stat#size

Aha! It must be the Rails mongrel handler that checks the size of the asset files, before it sends them down the wire (it’s from a local ./script/console). Not so interesting after all, but at least we learnt a bit along the way.
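
The flag trick in that script is worth remembering, so here is the same logic replayed in plain Ruby over a made-up list of events. Everything in it (the event hashes, the paths, the variable names) is invented for illustration; it only mirrors the two predicates from the D script.

```ruby
# Toy replay of the D script above: each hash stands in for one probe
# firing on a single thread. The data is made up.
events = [
  { probe: "function-entry", klass: "Hash",       meth: "key?" },
  { probe: "stat-entry",     path: "/tmp/ignored" },
  { probe: "function-entry", klass: "File::Stat", meth: "size" },
  { probe: "stat-entry",     path: "public/favicon.ico" },
]

interested = false   # plays the role of self->interested
seen = []

events.each do |event|
  case event[:probe]
  when "function-entry"
    # Same test as the predicate on ruby*:::function-entry
    if event[:klass] == "File::Stat" && event[:meth] == "size"
      interested = true
    end
  when "stat-entry"
    # Same test as /self->interested/ on syscall::stat*:entry
    seen << event[:path] if interested
  end
end

puts seen
```

Note that, just like in the D script, nothing ever clears the flag here, so every later stat on that thread gets reported as well. A clause on the function-return probe that sets it back to 0 would probably be the way to tighten that up.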

Remember the ruby providers from up there? The ruby-probe one? That one is basically a plugin that lets you fire your very own probes in your app, using the (Apple shipped) DTracer class:


>> def with_my_probe &blk
>>   DTracer.fire("my-probe-entry")
>>   yield
>>   DTracer.fire("my-probe-return")
>>   end
=> nil
>> with_my_probe{ puts "moo" }

$ cat probe_my_probe.d
#!/usr/sbin/dtrace -s

ruby*:::ruby-probe
/copyinstr(arg0) == "my-probe-entry"/
{
  self->interested = 1;
}

syscall:::
/self->interested/
{
  /* default action is to just print it */
}

ruby*:::function-entry
/self->interested/
{
  printf("%s in %s", copyinstr(arg1), copyinstr(arg0));
}

ruby*:::ruby-probe
/copyinstr(arg0) == "my-probe-return"/
{
  self->interested = 0;
}

$ sudo ./probe_my_probe.d
dtrace: script './ruby_probe_test.d' matched 860 probes
CPU     ID                    FUNCTION:NAME
  1 454800          rb_call0:function-entry puts in Object
  1 454800          rb_call0:function-entry write in IO
  1  17598                      write:entry 
  1  17599                     write:return 
  1 454800          rb_call0:function-entry write in IO
  1  17598                      write:entry 
  1  17599                     write:return 
  1 454800          rb_call0:function-entry fire in Module

# (While running 'with_my_probe{ puts "moo" }' in irb)
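
One thing I’d change in a wrapper like with_my_probe is to make sure the return probe fires even when the block raises, otherwise the D script keeps self->interested set. A minimal sketch, assuming DTracer.fire takes just the probe name as in the session above; with_probe and the fallback DTracer stub are my own additions so the snippet also runs on a Ruby without Apple’s patches.

```ruby
# Fallback so the sketch runs where Apple's DTracer class is not available.
# It only records the probe names instead of firing real probes.
unless defined?(DTracer)
  class DTracer
    FIRED = []

    def self.fire(name)
      FIRED << name
    end
  end
end

# Fire "<name>-entry" before the block and "<name>-return" after it, even
# if the block raises, so a D script never sees an entry without a return.
def with_probe(name)
  DTracer.fire("#{name}-entry")
  yield
ensure
  DTracer.fire("#{name}-return")
end

with_probe("my-probe") { puts "moo" }
```

The value of the block is still what with_probe returns, since an ensure clause does not change the return value.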

But more on that later. In the meantime do go off exploring your OS and applications with DTrace; you’d be surprised how quickly you can lose an hour or two just by asking “why is that doing this here?”...

CouchDb views in Ruby instead of Javascript

September 15th, 2007

I’ve just pushed CouchObject 0.5 out to the rubyforge mirrors, here’s the History.txt file:

== 0.5.0 2007-09-15

* 2 major enhancements:
  * Database.filter{|doc| } for filtering on the doc, in Ruby!
  * couch_ruby_view_requestor, a JsServer client for CouchDb allowing you to query in Ruby

* 1 minor enhancement:
  * Added Database#store(document), the parallel of Document#save(database)

Those two major enhancements are a result of my laziness as reported at the end of the last post, because now you can query your CouchDb views in Ruby instead of Javascript:


$ irb -rubygems
>> require 'couch_object'
=> true
>> db = CouchObject::Database.open "http://localhost:8888/foo" 
=> #<CouchObject::Database:0x142d4d8 ...>
>> pp db.post("_temp_view", "proc{ |doc| return doc if doc[\"foo\"] =~ /qux/  }")
#<CouchObject::Response:0x13f8d50
 @parsed_body=
  {"rows"=>
    [{"_rev"=>189832163,
      "_id"=>"96193CD461168BD024B64EA367C1E0BF",
      "value"=>
       {"_id"=>"96193CD461168BD024B64EA367C1E0BF",
        "_rev"=>189832163,
        "foo"=>"qux"}}],
   "offset"=>0,
   "total_rows"=>1,
   "view"=>"_temp_view:proc{ |doc| return doc if doc[\"foo\"] =~ /qux/  }"},
 @response=#<Net::HTTPOK 200 OK readbody=true>>

Boom. The rows key there is our matching document with an attribute of foo that matches /qux/. You just pass in anything that responds to a #call(the_couch_document) when you define your view request.

But passing around strings will make your eyes sore, so let’s just do this in pure Ruby:


>> pp db.filter{ |doc| return doc if doc["foo"] == "qux" }
[{"_rev"=>189832163,
  "_id"=>"96193CD461168BD024B64EA367C1E0BF",
  "value"=>
   {"_id"=>"96193CD461168BD024B64EA367C1E0BF",
    "_rev"=>189832163,
    "foo"=>"qux"}}]

Thanks to a bit of RubyToRuby we can send along the block to CouchDb just fine. But, how is this all done on the CouchDb side of things? It’s actually a whole lot easier than it looks; all CouchDb does when it receives a view query like the above is pass it on to whatever is defined as the JsServer in $COUCH_INSTALL/couch.ini, this is normally SpiderMonkey, but with the CouchObject gem installed it can be Ruby!

Here’s the relevant section from my couch.ini:
# ...
# You need full, or relative to couch install dir, paths for now
JsServer=/opt/local/bin/couch_ruby_view_requestor
# ...
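
To give an idea of what a Ruby view server has to do with the view it is handed, here is a toy model. It is not CouchObject’s code and not CouchDb’s wire protocol: run_view and the sample documents are made up, and the whole thing runs in one process. It just evals the view source into something callable and keeps whatever it returns for each document.

```ruby
# Toy model of applying a view to a set of documents.
def run_view(view_source, documents)
  view = eval(view_source)               # only ever eval input you trust
  raise ArgumentError, "view must respond to #call" unless view.respond_to?(:call)
  documents.map { |doc| view.call(doc) }.compact
end

# Made-up documents in the shape of the ones above.
docs = [
  { "_id" => "1", "foo" => "qux" },
  { "_id" => "2", "foo" => "bar" },
]

# lambda rather than proc: in current Ruby, `return` inside a plain proc
# returns from the enclosing method instead of just from the block.
source = 'lambda { |doc| return doc if doc["foo"] =~ /qux/ }'
matches = run_view(source, docs)
puts matches.inspect
```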

So have a go at it:

  $ sudo gem install couchobject

Report issues at the tracker, or check out the Git source and have a play with it:

  $ git clone git://repo.or.cz/couchobject.git

CouchObject released!

September 14th, 2007

CouchObject 0.0.1 is out, fresh from the sofa. Sit down, relax and read the RDoc.


$ sudo gem install couchobject

Since the last time I’ve taken it in a slightly different direction, focusing more on getting the basics up and running. You see, I’ve realised that CouchDb isn’t really perfect as a general OODB store (though nothing is stopping you from storing an object’s attributes in CouchDb; the Persistable module still does that). I’ll be waiting for GemStone and Rubinius for an awesome OODB. Instead CouchObject focuses specifically on documents, as it is right now:


>> CouchObject::Database.create!("http://localhost:8888", "roflcopters")
=> {"ok"=>true}
>> db = CouchObject::Database.open("http://localhost:8888/roflcopters")
=> #<CouchObject::Database:0x65b184...>
>> db.all_documents
=> []
# Creating and saving a document
>> doc = CouchObject::Document.new
=> #<CouchObject::Document:0x62708c @id=nil, @attributes={}, @revision=nil>
>> doc.engine_noise = "roflroflrofl" 
=> "roflroflrofl" 
>> doc.url = "http://www.thinkgeek.com/images/products/zoom/roflcopter.jpg" 
=> "http://www.thinkgeek.com/images/products/zoom/roflcopter.jpg" 
>> pp doc.save(db)
#<CouchObject::Response:0x4cd934
 @parsed_body=
  {"_rev"=>-1022899809, "_id"=>"4D91304BE683851F0E18871ADA6749D8", "ok"=>true},
 @response=#<Net::HTTPCreated 201 Created readbody=true>>
# Get the same document by its id, and convert the response to a document (just to illustrate it)
>> doc_we_created = db.get(doc.id).to_document
=> #<CouchObject::Document:0x14e8c38 @id="4D91304BE683851F0E18871ADA6749D8", @attributes={"url"=>"http://www.thinkgeek.com/images/products/zoom/roflcopter.jpg", "engine_noise"=>"roflroflrofl"}, @revision=-1022899809>
>> doc_we_created.engine_noise
=> "roflroflrofl" 
>> doc_we_created.engine_noise = "ROFLROFLROFL" 
=> "ROFLROFLROFL" 
>> doc_we_created.save(db)
>> db.all_documents
=> [{"_rev"=>1353035433, "_id"=>"4D91304BE683851F0E18871ADA6749D8"}]
# Sending a raw request to the db
>> response = db.post("_temp_view", <<EOJS)
function(doc){ 
  if (doc.engine_noise.match(/rofl/i)) { 
    return doc
  }  
}
EOJS
# Our temp view query returns a list of rows for the matched documents
>> pp response.to_document.rows.first
{"_rev"=>1353035433,
 "_id"=>"4D91304BE683851F0E18871ADA6749D8",
 "value"=>
  {"url"=>"http://www.thinkgeek.com/images/products/zoom/roflcopter.jpg",
   "_rev"=>1353035433,
   "_id"=>"4D91304BE683851F0E18871ADA6749D8",
   "engine_noise"=>"ROFLROFLROFL"}}

As you can see, the API still needs some ironing out by means of more real-world usage. You’ll notice that there’s no nice way of doing view “queries”. I really, really want to create a more familiar Ruby DSL for defining views and sending temporary views (like the one above). In particular it boils down to one or more of these things:

  • A Ruby to Javascript converter (Like this perhaps).
  • Ambition is awesome, CouchDb is awesome, sounds like a perfect match to me. LINQ me up.
  • Make CouchDb use ruby instead of Javascript for views.
Since I’m lazy, I’m starting from the bottom, more on that later. Because I really want to say:

db.select{|doc| doc.title =~ /foo/ }
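
As a rough illustration of the Ambition-style idea (this is not CouchObject code; ViewField, ViewDocProxy and view_source are hypothetical names made up for the sketch), a block can be run once against a recording proxy to produce the JavaScript for a temp view. It only handles a single `field =~ /regexp/` condition:

```ruby
# Hypothetical sketch: turn a Ruby block into a CouchDb view function string.
# None of these names exist in CouchObject.

# Stands in for one document field and records the comparison made on it.
class ViewField
  def initialize(name)
    @name = name
  end

  # Translate `field =~ /re/` into the equivalent JavaScript match call.
  def =~(regexp)
    flags = (regexp.options & Regexp::IGNORECASE).zero? ? "" : "i"
    "doc.#{@name}.match(/#{regexp.source}/#{flags})"
  end
end

# Any method called on the proxy is treated as a field lookup.
class ViewDocProxy < BasicObject
  def method_missing(name, *_args)
    ::ViewField.new(name.to_s)
  end
end

# Run the block against the proxy and wrap the result in a view function.
def view_source(&block)
  condition = block.call(ViewDocProxy.new)
  "function(doc){ if (#{condition}) { return doc; } }"
end

puts view_source { |doc| doc.title =~ /foo/i }
```

The resulting string could then be posted to _temp_view the same way as the hand-written function above. Anything beyond one regexp match (boolean operators, comparisons) would need a proper Ruby to JavaScript converter.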

There’s a Git repository too


$ git clone git://repo.or.cz/couchobject.git

Patches? Yes please, release your inner couch potato.

CouchDb and CouchObjects

September 7th, 2007

I’ve been watching CouchDb for a while, but it wasn’t until recently, when it changed its transport format from XML to JSON, that I got really interested in doing something with it, something I apparently wasn’t alone in.

One of the things I’m doing with it is a library called CouchObject, and one of the things it does is allow you to serialize arbitrary Ruby objects to and from CouchDb JSON documents by including a module and defining a few methods on your class:


class Bike
  include CouchObject::Persistable

  def initialize(wheels)
    @wheels = wheels
  end
  attr_accessor :wheels

  def to_couch
    {:wheels => @wheels}
  end

  def self.from_couch(attributes)
    new(attributes["wheels"])
  end
end

The #to_couch method describes how we want the instance’s attributes serialized as a document in the CouchDb database:


{ 
  "_id": "6FA2AFB684A93ECE77DEAAF52BB02565", 
  "_rev": 1745167971, 
  "attributes": {
    "wheels": 4
  }, 
  "class": "Bike" 
}

The return value of #to_couch is stored under the attributes key, and the class of the object is stored under the class key, for querying purposes (_id and _rev are CouchDb document attributes).

The from_couch class method describes how to set up a new Bike object loaded from the database; its attributes parameter is the attributes key from the CouchDb document. In this case we just instantiate a new Bike with a number of wheels:


>> bike_4wd = Bike.new(4)
=> #<Bike:0x6a0a68 @wheels=4>
>> bike_4wd.save("couchobject")
=> {"_rev"=>1745167971, "_id"=>"6FA2AFB623A93E0E77DEAAF59BB02565", "ok"=>true}
>> bike = Bike.get_by_id("couchobject", bike_4wd.id)
=> #<Bike:0x64846c @wheels=4>
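
The round trip can be sketched without a running CouchDb. This is only an illustration of the class/attributes envelope described above, assuming plain JSON in and out; EnvelopeSketch and its dump/restore methods are hypothetical and not how the Persistable module is actually implemented:

```ruby
require "json"

# Hypothetical stand-in for the envelope handling; not the real Persistable module.
module EnvelopeSketch
  # Wrap an object's #to_couch hash in the class/attributes envelope.
  def self.dump(object)
    JSON.generate("class" => object.class.name, "attributes" => object.to_couch)
  end

  # Look the class up by name and hand the attributes to its from_couch.
  def self.restore(json)
    document = JSON.parse(json)
    Object.const_get(document["class"]).from_couch(document["attributes"])
  end
end

class Bike
  attr_accessor :wheels

  def initialize(wheels)
    @wheels = wheels
  end

  def to_couch
    {:wheels => @wheels}
  end

  def self.from_couch(attributes)
    new(attributes["wheels"])
  end
end

json = EnvelopeSketch.dump(Bike.new(4))
bike = EnvelopeSketch.restore(json)
puts bike.wheels
```

Note that the symbol keys returned by #to_couch come back as strings after the JSON round trip, which is why from_couch reads attributes["wheels"].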

As I started on this last night there’s still lots of little things to add, like better server and database semantics (in the above #save call, the argument is the database name and the host is hardcoded for now; not pretty).

Another thing I’ve been thinking about doing is a more formal way to describe “models”, something along the DataMapper pattern perhaps, but we’ll see if I actually need it once I get the Persistable module some more features.

Update: I’ve uploaded the Git repository here, I want to add a few things before I do a release.

Distributed SCM == Goodness

August 10th, 2007

Ever since leaving Joyent in favour of Bengler back in February or so, I’ve pretty much switched all my development over to using a distributed SCM (a topic I’ve tumbled about a lot lately too).

While I’ve looked at distributed SCMs before, they never really stuck until I was forced to use them full-time. At first it was Darcs, since that’s what they used when I switched jobs. Coming from Subversion, discovering distributed source control is a three-step process:

  • Denial: I don’t get it, why!?
  • Acknowledgment: OK, so doing bigger features in different branches that are easily mergeable into the mainline is really nice
  • Acceptance: it’s the only way for me to work

Later on I discovered Git and now most of my own local stuff is in Git repositories.

The real clincher when it comes to distributed SCMs and the open source world is pretty much summed up by Linus Torvalds in his Git talk, where he says something along the lines of:

if you want to implement something just clone the main repository and start hacking, and if people like your stuff better they can just pull from there instead

Now, put this into the context of GForge installations such as SourceForge and RubyForge, where there are a huge number of inactive projects (for various reasons). If I wanted to hack a bit on a dead project I should be able to just register my own repository/branch (depending on SCM terminology) with the project, and anyone interested in my new awesome updates could just pull from my repository instead. Or the project owner could just take a look at it and merge it back into trunk if he was happy with it. Or, more commonly, “Bob” and I work on a big feature and just push and pull from each other without disturbing the mainline.

Most projects are made up of several smaller internal and/or external projects (frameworks, libraries etc), and most often it’s real-world usage different from the author’s that reveals bugs in someone else’s code (or maybe even your own, in the case of “internal” projects). Wouldn’t it be nice to be able to report these, but still have them in the context of your own project somehow, so you knew when they were fixed (and other people knew they were already reported, just in a different project)? Add to that the multiple repositories per project from above and you’d most definitely need a way to report, watch and fix a bug in several different places.

Launchpad seems to get a lot of this right, but it’s proprietary and fairly tightly coupled to Bazaar, which I for various reasons dislike. But I still want to use an issue tracker and source browser (and so on) that gets all of this distributed stuff, both for my open things and for internal corporate stuff, and for anyone else who may need it in those settings. Distributed means giving away some control (which you never had to begin with anyway, aka the forking non-issue), but it also means you lose that one-stop place to get an overview of what’s going on.

So that’s what I’m hacking on right now. It won’t be Collaboa that gets this functionality, mainly because I want to experiment a bit with this freely, but also because I don’t think my new requirements will fit well with Subversion at all. Which is mainly why Collaboa now has a new maintainer (can’t wait to see what he does with it).

I’m not really interested in cloning Launchpad as such, but I do think they get a lot of things right (and a good amount of things I don’t like/need). So there are some similarities of concepts in my “thing”, but it’s also a lot more geared towards my needs and workflow, and towards things learnt from building Collaboa and using other issue trackers. And most importantly: being shown how development should work by using distributed SCMs.

Dynamic page caching with Nginx & SSI

May 22nd, 2007

Zed Shaw mentioned this on the Rails podcast, but you see, Nginx has this ability to do virtual SSI includes from another url. The documentation on it is right here, but half of it is in Russian to confuse the spies. It’s pretty straightforward stuff though, so let’s play around with it for a bit. Like Zed says on the podcast, this kinda stuff would really be useful in situations where you can page cache pretty much the entire page, except this little part that needs to be dynamic (like a “Welcome <%= current_user.name -%>”).

Server Side Includes were these things we all used in the nineties for sprinkling random dynamic(-ish) stuff over our homepages. Nginx has support for virtual includes that look something like this:


  <!--# include virtual="/foo" -->

which will include whatever the url /foo returns straight into the document where the include is defined (you can also throw it into another SSI block if you like, as the docs say). Our little Rails testapp for this does about 205 req/s without any caching and using render(:partial => "foo") for the “welcome” bit (I feel really bad mentioning Zed Shaw and stupid/naive statistics like the above in the same place, but the precise performance gains aren’t that important. Think big picture stuff for now). So here’s a little helper for outputting the SSI in our template:


  def ssi_include(options={})
    #(options hash so we can pass in a SSI block target or whatever, YAGNI really).
    options.assert_valid_keys(:url)
    %Q{<!--# include virtual="#{url_for(options[:url])}" -->}
  end

  # and in our view we'd use it like this:
  <%= ssi_include :url => {:action => "greet", :name => current_user.name} -%>

Not exactly rocket science. With that and page caching turned on, it’s slightly faster (about 250 req/s), but not that much. Chances are we can cache those fragments in memcache to gain just a bit more.

By now you’ve hopefully realized that the actual greeting doesn’t even need to come from Rails to begin with. With a shared session storage, and because Nginx forwards the cookies to the virtually included url, we can just as well hook up a small Merb or Rack (or whatever) app to fetch the session_id and/or objects needed from the database (or cache storage) and display the correct text, all without the luggage from Rails, which we really don’t need just to render some tiny text fragments. Doing that in our stupid little test scenario here gives us just over 1000 req/s. That’s a bit closer to the 5K req/s that Nginx does for straight up html from disk (on my local machine) than the 205 req/s we started off with. Yet another thing to pull out of the olde bag of tricks when you really do need it. I’d be interested in hearing if others have experimented in practice with this kinda approach.
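
To make the idea concrete, here is a minimal sketch of such a tiny greeter, written as a bare Rack-style callable so it runs without any framework. The SESSIONS hash, the session_id cookie name and the greeting texts are all assumptions for illustration; a real setup would read whatever session store and cookie name the Rails app uses:

```ruby
require "cgi"

# Hypothetical shared session store; a real setup would read from the
# database or cache that the Rails app writes its sessions to.
SESSIONS = {"abc123" => "Johan"}

# Parse the Cookie header that Nginx forwards into a name => value hash.
def parse_cookies(header)
  header.to_s.split(/;\s*/).each_with_object({}) do |pair, cookies|
    key, value = pair.split("=", 2)
    cookies[key] = value
  end
end

# A bare Rack-style app: anything that responds to #call(env).
GREETER = lambda do |env|
  session_id = parse_cookies(env["HTTP_COOKIE"])["session_id"]
  name = SESSIONS[session_id]
  body = name ? "Welcome #{CGI.escapeHTML(name)}" : "Welcome, stranger"
  [200, {"Content-Type" => "text/html"}, [body]]
end

status, _headers, body = GREETER.call({"HTTP_COOKIE" => "theme=dark; session_id=abc123"})
puts "#{status} #{body.join}"
```

Mounted behind the url that the SSI include points at, something like this would only ever render the small fragment, while the rest of the page stays page cached.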

Update: passing in the “name” querystring like the example code does is about the worst example I could think of, since it’s static once it’s written to the cached template, but cookies and such are still go.

Application Error: The Tumblelog

March 7th, 2007

Since The Exciter updates are far between, I’ve been running a tumblelog for the past two weeks called Application Error. It’s powered by the fabulous tumblr (which I think is going to get huge). Expect mostly ruby-related links and other random stuff there.

Ruby has a nice new Rack

February 27th, 2007

As someone who’s used Rails and other Ruby web frameworks for quite some time, plus my own dabbling in that domain, I’ve seen how we all go and redo our own webserver interface, while those cheeky Python kids keep nagging about WSGI.

So, I’ve been playing with Rack recently, and it’s quite inspired by WSGI. At its core, all a Rack application has to do is respond to call, taking the environment hash as its argument, and return a tuple looking like [status_code, headers, body_array], like this:


require "rack" 
class Foo
  def call(env)
    [200, {"Content-Type"=>"text/plain"}, ["Hello world!"]]
  end
end
HOST_AND_PORT = {:Host => "127.0.0.1", :Port => 8080}
Rack::Handler::Mongrel.run(Foo.new, HOST_AND_PORT)

In fact, we could even replace that whole class with a lambda that just returns the array:


# The lambda takes (and ignores) the env hash that Rack passes to #call
app = lambda { |env| [200, {"Content-Type" => "text/plain"}, ["Hello lambda world!"]] }
Rack::Handler::Mongrel.run(app, {:Host => "127.0.0.1", :Port => 8080})

And we can run our marvelous application under Mongrel. Now, a Rack application is basically anything that responds to #call. The nice thing about this is that we can chain Rack applications together, forming middleware between our main app and the request coming from the browser. So if Rack::ShowExceptions#call runs before Foo#call, as in Rack::ShowExceptions.new(Foo.new), we get some nice views of our nasty little exceptions.
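
The chaining is easy to see in plain Ruby, without even loading Rack. In this sketch Shout is a hypothetical middleware invented for the example (it is not part of Rack); it wraps another app the same way Rack::ShowExceptions.new(Foo.new) does:

```ruby
# The inner application, same shape as the Foo class above.
class Foo
  def call(env)
    [200, {"Content-Type" => "text/plain"}, ["Hello world!"]]
  end
end

# Hypothetical middleware: wraps any app that responds to #call and
# upcases each chunk of the body on the way out.
class Shout
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    [status, headers, body.map { |chunk| chunk.upcase }]
  end
end

# Chain them: the request hits Shout#call first, which delegates to Foo#call.
app = Shout.new(Foo.new)
status, headers, body = app.call({})
puts body.join
```

Because the wrapper only relies on #call and the three-element return value, it would work just as well around the lambda version or around another middleware.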

Why is this good? Because as a framework author you’d be able to reuse middleware (Rack applications) from other applications, or as Chris puts it:

Compare “That upload handler you wrote for IOWA is really great, too bad I use Camping.” with “That upload handler you wrote for Rack works great for me too!”

Rack is still a bit rough around the edges, and the API is stupidly simple (“just #call it”), but that simplicity is exactly what makes it so easy to use.

Cabinet is a tiny little pseudo-framework I wrote while playing around with Rack last night. Knock yourself out. I think the slogan should be “10x less productive” or “typing boring stuff over convention”. Features Django inspired url dispatching, that’ll make you type lots of regexens for every single thing. “Ruby push-ups” or something like that.

Now, don’t go write your own framework just yet, unless it’s merely for the sake of fooling around (like “Cabinet” was). Ruby already has a bunch: Rails, Nitro, Camping, Merb, Ramaze and the oldskoolers like IOWA and Cerise.

Oslo RUG: Distributed Ruby Slides

January 4th, 2007

Last night I gave a presentation at our small (but awesome) Oslo Ruby group about distributed programming with Ruby. Topics of the talk included DRb, Rinda::TupleSpace, Rinda::RingServer, tin-can telephones, Joyent’s Bingo!, set_trace_func and a brief thumper server-porn interlude.

Here are the slides (900kb). Beware though, they’re in an abomination of Norwegian, Swedish and English.

Flashing the Nokia 770

October 23rd, 2006

I recently bought a Nokia 770 “Internet Tablet” as they call it. No GSM/3G, just WLAN, Bluetooth and USB. And it runs a Debian linux offspring out of the box so it’s very hackable.

And that was kind of the problem for me: I hacked around too much, and in a moment of clear stupidity I changed the setuid bit on `sudo` in a freak typing accident of a chmod gone wrong. Oops. So I locked myself out of a lot of fun times, including a working package manager, and without any ssh server installed (yet) and no clear way of making it boot in single-user mode, I decided to reflash the whole thing, returning it to its factory settings. Here’s a reader’s digest of my approach using OSX. It’s based on things found mostly here and here.

Flashing your device

I downloaded the flasher.macosx version of the flasher along with the it2006 OS image/Maemo 2.0 image.

First you may want to backup your settings, bookmarks and whatnots using the builtin backup software. Then we do a complete fresh install by flashing the device with the image:


$ ./flasher-2.0.macosx -F SU-18_2006SE_1.2006.26-8_PR_F5_MR0_ARM.bin -f -R
flasher v0.8.1 (Jun 22 2006)

SW version in image: SU-18_2006SE_1.2006.26-8_PR_MR0
Image '2nd', size 8704 bytes
Image 'secondary', size 87040 bytes
Image 'xloader', size 13824 bytes
Image 'initfs', size 1890304 bytes
Image 'kernel', size 1266560 bytes
Image 'rootfs', size 60030976 bytes
USB device found found at bus 003, device address 002-0421-0105-00-00
Found device SU-18, hardware revision 1802
[..lots of fun stuff..]
100% (58624 of 58624 kB, avg. 814 kB/s)
Finishing flashing... done

And then when the thing boots back up we have to go through those fun initial settings again (language, datetime etc).

Making a smaller initfs

OK, so that was easy, except now our initfs partition is completely full, so we can’t install packages such as “becomeroot” and other fun things. We have to find a stripped-down version of that image. Luckily people smarter than me have already figured that out. (There’s also a smaller image here but I couldn’t get it to work as expected.)

The smaller initfs comes in the form of a binary xdelta diff, but I had some issues with the `xdelta` dependencies, so in the end I had to install fink, just for the sake of getting xdelta in a quick way. So:

First “unpack” the image from the device:

  
  $ mkdir it2006-unpacked
  $ cd it2006-unpacked
  $ ../flasher-2.0.macosx -F ../SU-18_2006SE_1.2006.26-8_PR_F5_MR0_ARM.bin -u
  [...]
  Image 'initfs', size 1890304 bytes
  [...]
  Unpacking initfs image to file 'initfs.jffs2'...
  [...]
  
  

Then we get the smaller initfs image and apply the xdelta

  
  $ curl -O http://fanoush.webpark.cz/maemo/initfs.bootmenu.it2006.tgz
  $ tar xzf initfs.bootmenu.it2006.tgz 

  # Apply the xdelta
  $ /sw/bin/xdelta patch initfs.bootmenu.xdelta initfs.jffs2 initfs.bootmenu.jffs2
  # flash it onto the 770:
  $ ../flasher-2.0.macosx --initfs initfs.bootmenu.jffs2 -f -R
  [...]
  Sending initfs image (1537 kB)...
  100% (1537 of 1537 kB, avg. 859 kB/s)
  Flashing initfs... done.
  
  

And we’re laughing. But we’re going to laugh even more once we install Ruby 1.8.4 from here:

Bingo!

September 15th, 2006

A few days ago we launched Bingodisk, which is a 100GB WebDAV powered disk in the sky, with a public folder. Useful for storing just about anything and serving it up (or not) to the public.

Building Bingo! is a lot of fun. The Bingodisk you’ll get is sitting on a ‘Thumper’, which is just a lovely piece of monster storage hardware, and the actual frontend application is a Rails application that talks to the Thumpers via a distributed interface I wrote.

The really nice thing is that it uses a WebDAV interface, which means that it’s possible to mount it as a disk in almost every modern operating system (as usual, we had to jump through hoops to get it working properly in Windows, but it does). WebDAV also supports things such as resource locking, and easily moving and/or copying resources. Now, WebDAV is a set of HTTP extensions, which means that it’s easy to talk to from an application. Let’s see how we could do that from a Ruby script using Net::HTTP. Unfortunately Net::HTTP doesn’t support Digest authentication out of the box (Basic auth won’t work with Bingo), but we can fairly easily add that, based on a snippet I found by Eric Hodel.

First you’ll need to get this file. It adds a digest_auth method to Net::HTTP.
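
The linked file isn’t reproduced here. As background only, this is a sketch of how a Digest auth response with qop=auth is computed according to RFC 2617; it is not necessarily how the snippet does it, and digest_response is a hypothetical helper name. The sample values are the ones used in the RFC’s own example:

```ruby
require "digest/md5"

# Hypothetical helper: compute the "response" field of an RFC 2617
# Authorization header for qop=auth.
def digest_response(user:, password:, realm:, nonce:, cnonce:, http_method:, uri:,
                    nc: "00000001", qop: "auth")
  # HA1 hashes the credentials, HA2 hashes the request line parts.
  ha1 = Digest::MD5.hexdigest("#{user}:#{realm}:#{password}")
  ha2 = Digest::MD5.hexdigest("#{http_method}:#{uri}")
  # The final response ties both hashes to the server and client nonces.
  Digest::MD5.hexdigest([ha1, nonce, nc, cnonce, qop, ha2].join(":"))
end

puts digest_response(
  user: "Mufasa", password: "Circle Of Life", realm: "testrealm@host.com",
  nonce: "dcd98b7102dd2f0e8b11d0f600bfb0c093", cnonce: "0a4f113b",
  http_method: "GET", uri: "/dir/index.html"
)
```

The realm and nonce come from the WWW-Authenticate header of an earlier response from the server. For the WebDAV requests below, http_method would be PROPFIND or PUT rather than GET.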

Finding properties

To get a list of the resources available we’ll use the PROPFIND HTTP method, which will return a (rather large) chunk of XML, containing locking info, resource name and meta such as size and mtime. Here’s a script that lists the files at a given path:


  # list.rb
  require 'net_digest_auth'
  require 'rexml/document'
  include REXML

  abort("Usage #{$0} <username> <password>") unless ARGV.size==2

  ALLPROPS = <<EOS
  <?xml version="1.0" encoding="utf-8" ?>
  <D:propfind xmlns:D="DAV:">
    <D:allprop/>
  </D:propfind>
  EOS

  url = URI.parse("http://johan.bingodisk.com/bingo/")
  Net::HTTP.start(url.host) do |http|
    res = http.head(url.request_uri)
    req = Net::HTTP::Propfind.new('/bingo/tmp/', {'Depth' => '1'})
    req.digest_auth(ARGV[0], ARGV[1], res)
    response = http.request(req, ALLPROPS)

    puts "#{response.code} #{response.message}\n" 
    puts

    Document.new(response.body).elements.each("//D:response") do |r| 
      puts r.elements["D:href"].text
    end
  end

Apologies for the textile parsing error with the ARGV index

And the output:


  $ ruby list.rb johan@bingodisk.com secret
  207 Multi-Status

  /bingo/tmp/
  /bingo/tmp/mch.jpg
  /bingo/tmp/TextMateBook-beta.pdf

So what this does is first send a HEAD request to get the things needed for the digest auth; then we create a new Net::HTTP::Propfind request instance and use the digest_auth method to set the user and password from the arguments given. Then we fire off the request with a snippet of XML (the ALLPROPS constant) telling the DAV server we want to get all the props.

We’ll get back the “207 Multi-Status” HTTP status code and the XML describing the properties of the resources, on which the script does an XPath query using REXML to get the filenames (the D:href element).
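
The parsing step can be tried offline. The XML below is a hand-written, heavily trimmed stand-in for a Multi-Status body (a real allprop response carries many more properties per resource), and hrefs is a hypothetical helper; unlike list.rb it passes the DAV: namespace mapping to the XPath query explicitly:

```ruby
require "rexml/document"

# Hand-written sample; an assumption about the general shape of the
# response, not captured server output.
SAMPLE_MULTISTATUS = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <D:multistatus xmlns:D="DAV:">
    <D:response>
      <D:href>/bingo/tmp/</D:href>
    </D:response>
    <D:response>
      <D:href>/bingo/tmp/mch.jpg</D:href>
    </D:response>
  </D:multistatus>
XML

# Collect the text of every D:href element inside a D:response.
def hrefs(xml)
  document = REXML::Document.new(xml)
  REXML::XPath.match(document, "//D:response/D:href", "D" => "DAV:").map(&:text)
end

puts hrefs(SAMPLE_MULTISTATUS)
```

Mapping the prefix to the DAV: namespace in the query means it doesn’t matter which prefix a particular server happens to use in its response.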

PUTting resources on the disk

Now let’s upload something. As expected we’ll want to use the PUT method; here’s a script that takes the username, password and a path for a file to upload into /bingo/public/code/:


  # upload.rb
  require 'net_digest_auth'

  abort("Usage: #{$0} <username> <password> <path/to/file/to/upload>") unless ARGV.size==3

  if File.exist?(ARGV[2])
    url = URI.parse("http://johan.bingodisk.com/bingo/")
    Net::HTTP.start(url.host) do |http|
      res = http.head(url.request_uri)
      req = Net::HTTP::Put.new("/bingo/public/code/#{File.basename(ARGV[2])}")
      req.digest_auth(ARGV[0], ARGV[1], res)
      response = http.request(req, File.read(ARGV[2]))
      puts response.code + " " + response.message
    end
  else
    puts "No such file #{ARGV[2].inspect}" 
  end

By running the script we’ll upload the net_digest_auth.rb file:


  $ ruby upload.rb johan@bingodisk.com secret net_digest_auth.rb 
  201 Created

Nice and easy.

WebDAV might feel a bit more “bulky” than a straight up RESTful interface, but it’s really not that bad and the fact that it’s so well-supported in existing client programs is just friggin’ sweet.

© Johan Sørensen 2004-2007