Already well past 1.0! Tap has evolved a great deal. Many of the big changes are internal — the object APIs are now well-defined and consistent, the environment is MUCH simpler, more powerful, and easier to configure, and there is far less cruft.

I hope to write up more posts, but for now here is an update showcasing the ways you can build workflows. The new syntax is much cleaner and purer.

 
  % tap load abc -: dump
  vs
  % tap run -- load abc --: dump

Commands (ex run) have been dropped. Now the main ARGV is split along breaks into multiple smaller argvs, each of which gets turned into an object. For instance:

 
  ['--', 'load', 'abc']  # the '--' is implied
  ['-:', 'dump']

The leading argument identifies a constant, later args get parsed for configs, and the break itself determines what to do with the remainder. For instance the first set of arguments is equivalent to this:

 
  require 'tap/tasks/load'
  task = Tap::Tasks::Load.new
  task.enq 'abc'
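
The splitting step can be sketched in plain Ruby. This is only an illustration of the idea, not Tap's actual parser: the BREAK pattern and the split_along_breaks name are assumptions, and the pattern covers only the breaks that appear in this post.

```ruby
# Hypothetical sketch: split an argv along Tap-style breaks.
# BREAK is an assumption covering only the breaks shown in this post:
# '-', '--', '---', '-:', '-@' and signal breaks such as '-/enq'.
BREAK = %r{\A(?:-{1,3}|-:|-@|-/.*)\z}

def split_along_breaks(argv)
  # the leading '--' is implied when argv does not start with a break
  argv = ['--'] + argv unless BREAK.match?(argv.first.to_s)

  argv.each_with_object([]) do |arg, argvs|
    if BREAK.match?(arg)
      argvs << [arg]      # a break starts a new, smaller argv
    else
      argvs.last << arg   # anything else belongs to the current argv
    end
  end
end

p split_along_breaks(%w[load abc -: dump])
# prints [["--", "load", "abc"], ["-:", "dump"]]
```

Note that an option such as --reverse does not match the pattern, so it stays with the argv of the object it configures.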

Breaks all start with a dash. These are various ways of writing the same workflow, each of which will print ‘abc’:

  # as above, '-:' indicates a sequence join
  % tap load abc -: dump

  # the implied '--' means 'enque the next object'
  % tap -- load abc -: dump

  # objects can be defined without enque using '-'
  % tap -- load abc - dump - join 0 1

  # you can reorder as long as you keep the identifiers straight
  % tap - dump -- load abc - join 1 0

  # now use a signal to enque the load task
  % tap - load - dump - join 0 1 -- signal enq 0 abc

  # a little cleaner
  % tap - load - dump - join 0 1 -/enq 0 abc

  # still cleaner
  % tap - load - dump - join 0 1 -@ 0 abc

  # a signal sent to the load task directly
  % tap - load - dump - join 0 1 -/0/enq abc

  # now purely through signals
  % tap -/set 0 load -/set 1 dump -/bld join 0 1 -/0/enq abc

The last example is verbose but indicates the underlying nature of Tap — signals. Tap is now driven entirely by signals such that workflows can be created and driven interactively through a prompt, and later the syntax will translate to the web directly.

  % tap prompt
  /set 0 load
  #<Tap::Tasks::Load:0x1017d9298>
  /set 1 dump
  #<Tap::Tasks::Dump:0x1017b5528>
  /bld join 0 1
  #<Tap::Join:0x1017ac298>
  /enq 0 abc
  #<Tap::Tasks::Load:0x1017d9298>
  /run
  abc
  # future (for example)
  localhost:8080/app/set?var=0&class=load
  localhost:8080/app/set?var=1&class=dump
  ...

Signals mean that workflows can be written out to taprc files:

  [taprc]
  set 0 load
  set 1 dump
  bld join 0 1
  enq 0 abc
  % tap --- taprc
  abc

Objects you load through a taprc file are stored so you can continue to interact with them:

  % tap --- taprc -@ 0 xyz
  abc
  xyz

The latest Tap also adopts and extends the rakish syntax that was Rap (Rap is now gone, FYI).

  [tapfile]
  desc "run a workflow"
  work :example, %q{
    - load
    - dump
    - join 0 1
  }
 
  % tap example abc
  abc

As with a rakefile, tapfiles can define new tasks, with configurations and extended documentation (1.9.1 output):

  [tapfile]
  # This documentation will show up if you run:
  #   % tap sort --help
  
  desc "sort a string by word"
  task :sort, :reverse => false do |config, str|
    words = str.split.sort
    config.reverse ? words.reverse : words
  end
 
  % tap sort 'the swift brown fox' -: dump
  ["brown", "fox", "swift", "the"]
  
  % tap sort 'the swift brown fox' --reverse -: dump
  ["the", "swift", "fox", "brown"]

Tap is much more flexible than it used to be, but you’re still able to do all the things from before like define, package, and distribute tasks as ordinary ruby classes. Check out the documentation for more info.

In studying the inheritance of methods I came across what I consider a surprising behavior of modules included into classes.

First, when you include a module into a class, the module methods are available in the class and they also propagate down to subclasses. This is reflected in the fact that the module is added to the ancestors of both the including class and all subclasses (be they defined before or after the include).

 
require 'test/unit'

class LateIntoClassTest < Test::Unit::TestCase
  
  class A
  end
  
  class B < A
  end
  
  module LateInClass
  end
  
  class A
    include LateInClass
  end
  
  class C < A
  end
  
  def test_including_a_module_into_a_superclass_adds_to_ancestors
    # LateInClass is added to A
    assert_equal [A, LateInClass, Object, Kernel], A.ancestors
    
    # LateInClass is added to B
    assert_equal [B, A, LateInClass, Object, Kernel], B.ancestors
    
    # LateInClass is added to C
    assert_equal [C, A, LateInClass, Object, Kernel], C.ancestors
  end
end

What is surprising (at least to me) is that the same is not true when you include a module into an included module. I thought modules were kind of like superclasses; if you add to a module then you add to everything that uses the module. Not so.

Here you can see LateInModule is not added to classes that already include A. By contrast, classes defined after the ‘late’ include will add LateInModule to their ancestors.

 
class LateIntoModuleTest < Test::Unit::TestCase

  module A
  end
  
  class B
    include A
  end
  
  module LateInModule
  end
  
  module A
    include LateInModule
  end
  
  class C
    include A
  end
  
  def test_including_into_an_included_module_DOES_NOT_add_to_ancestors
    # LateInModule is added to A
    assert_equal [A, LateInModule], A.ancestors
    
    # LateInModule is missing from B
    assert_equal [B, A, Object, Kernel], B.ancestors
    
    # LateInModule is added to C
    assert_equal [C, A, LateInModule, Object, Kernel], C.ancestors
  end
end

You might take from this the lesson that modules are not superclasses, they are collections of methods poured into a class by include. But that isn’t the full story either. After all, you can modify a module and still have those changes propagate into an including class.

 
class LateModuleModificationTest < Test::Unit::TestCase

  module A
  end
  
  class B
    include A
  end
  
  module A
    def late_method; true; end
  end
  
  def test_included_modules_MAY_be_modified
    assert_equal true, B.new.late_method
  end
end

I find it tricky to express this behavior descriptively, even though the cause is clear: modules only add to ancestors when first included in a class.
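
To see what a particular interpreter does, the same setup can be run outside of Test::Unit. This is a minimal sketch: the names mirror the tests above, and the re-include at the end is an added probe rather than part of those tests. The "before" value depends on the Ruby version, since Ruby 3.0 changed include so that it also affects classes that already included the receiver.

```ruby
# Standalone check of the late-include behavior; names mirror the tests above.
module LateInModule; end

module A; end

class B
  include A          # B includes A before A includes LateInModule
end

module A
  include LateInModule
end

before = B.ancestors.include?(LateInModule)

# Added probe: including A again makes Ruby walk A's current ancestors.
B.send(:include, A)
after = B.ancestors.include?(LateInModule)

puts "Ruby #{RUBY_VERSION}: before re-include #{before}, after re-include #{after}"
```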

Update: for those who are interested, Redmine has a feature request regarding how modules get added to ancestors.

Just finished presenting Tap at this year’s RubyConf.  Here are the slides.

Tap – [Not] a Talk About Replacing Rake

Getting a line feed into an input on *nix is pretty easy; all you do is hit return in the middle of a quoted input:

  % ruby -e 'puts ARGV.inspect' 'line one
  > line two'
  ["line one\nline two"]

On Windows, it’s more verbose.  Use a caret, the MS-DOS escape character, to ignore the next line feed:

  % ruby -e 'puts ARGV.inspect' 'line one^
  More?
  More? line two'
  ["line one\nline two"]

Notice that pressing enter every other line is what actually puts the “\n” into the argument. Keep using carets to enter more lines.

The syntax on Windows isn’t exactly pretty, and sadly it doesn’t work with gem executables; the .bat scripts generated by rubygems re-process inputs before calling the .rb executable, and the extra lines get lost in translation. Still, it’s nice to know it’s (sort-of) possible!

I’ve been trying to speed up the command line response of tap by judiciously loading only what needs to be loaded up front. This script has proven quite helpful: it’s a profiler for require/load. Simply add the requires you want to profile at the end of the script and run it from the command line:

  % ruby profile_load_time.rb
  ================================================================================
  Require/Load Profile (time in ms)
  * Load times > 0.5 ms
  - duplicate requires
  ================================================================================
  * 21.6: yaml
  *   0.5: stringio
      0.2: yaml/error
  *   2.1: yaml/syck
  *     0.7: syck
  *     1.2: yaml/basenode
          0.3: yaml/ypath
      0.3: yaml/tag
      0.3: yaml/stream
      0.3: yaml/constants
  *   15.9: yaml/rubytypes
  *     11.8: date
  *       1.2: rational
  *       4.0: date/format
  -         0.1: rational
  *   0.9: yaml/types

The output flags requires that take longer than 0.5 ms, and requires that occur multiple times. Long requires are often good candidates for autoload… if you want to have YAML available but feel 22 ms is too long to wait up front:

  autoload(:YAML, 'yaml')

Then the file will be required the first time YAML gets used, if at all.
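
The profiler script itself is not reproduced in this post. The idea can be sketched by wrapping Kernel#require with a timer. The RequireProfile module, its log format, and the 0.5 ms default below are assumptions made for illustration. This sketch does not track require_relative or load, and it does not flag duplicate requires.

```ruby
# Sketch of a require profiler; an illustration, not the script from this post.
module RequireProfile
  @depth = 0
  @log = []   # entries of [depth, elapsed_ms, path], in call order

  class << self
    attr_accessor :depth
    attr_reader :log

    # Returns one line per require, indented by nesting depth.
    # Requires slower than threshold_ms are flagged with '*'.
    def report(threshold_ms = 0.5)
      log.map do |depth, ms, path|
        flag = ms > threshold_ms ? '*' : ' '
        format('%s %s%.1f: %s', flag, '  ' * depth, ms, path)
      end
    end
  end
end

module Kernel
  alias_method :require_without_profile, :require

  def require(path)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    entry = [RequireProfile.depth, 0.0, path]
    RequireProfile.log << entry        # logged before loading to keep call order
    RequireProfile.depth += 1
    require_without_profile(path)
  ensure
    RequireProfile.depth -= 1
    entry[1] = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000
  end
  private :require
end

# add the requires you want to profile here
require 'time'

puts RequireProfile.report
```

Timings vary from run to run and between machines, so the flagged lines are best read as hints about where to look rather than exact measurements.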

I ran into trouble when I tried turning rake into something it is not. Rake is a build program designed for build-like tasks; rake is not a platform for general-purpose task libraries. Given its design goals, rake very sensibly does not facilitate extensive documentation (who needs it to compile something?), inputs (although this has changed somewhat), configuration, testing, or distribution.

It’s also a dependency-based system; workflows constructed by rake are synthesized in reverse — it’ll be you, not your program, that gets forked when you try to make an imperative workflow. It’s simply the nature of rake! Rake is an excellent build program, but these types of things are in a different domain.

Tap (Task Application)

Tap was originally designed as a simple workflow engine, but it’s evolved into a general-purpose framework for creating configurable, distributable task libraries. Tap tasks can be defined in much the same way as a Rake task:

  # Goodnight::manifest your basic goodnight moon task
  # Prints the input with a configurable message.
  Tap.task 'goodnight', {:message=> 'goodnight'} do |task, input|
    task.log task.message, input
    "#{task.message} #{input}"
  end

Tap pulls documentation out of task declarations to generate manifests:

  % tap run -T
  sample:
    goodnight # your basic goodnight moon task
  tap:
    dump # the default dump task
    rake # run rake tasks

And help:

  % tap run -- goodnight --help
  Goodnight -- your basic goodnight moon task
  ------------------------------------------------------------------
    Says goodnight with a configurable message.
  ------------------------------------------------------------------
  usage: tap run -- goodnight NAME

  configurations:
          --message MESSAGE a goodnight message

  options:
      -h, --help Print this help
          --name NAME Specify a name
          --use FILE Loads inputs from file

Tasks are immediately available to run with inputs and configurations:

  % tap run -- goodnight moon
    I[00:09:55] goodnight moon

  % tap run -- goodnight moon --message hello
    I[00:10:01] hello moon

Task declarations define classes which naturally support namespaces, subclassing and testing.   When the shorthand declaration is not enough, task classes can be defined in the standard way:

  # Hello::manifest a hello world task
  # A more complicated hello world task illustrating
  # config blocks and a full task class definition.
  #
  class Hello < Tap::Task

    config :greeting, 'hello', &c.string    # a greeting string
    config :reverse, false, &c.flag         # maps to a flag

    def process(name)
      message = reverse ? greeting.reverse : greeting

      log message, name
      "#{message} #{name} result"
    end
  end

  task = Hello.new
  task.process('world')     # => "hello world result"
  task.reverse = true
  task.process('world')     # => "olleh world result"
  task.greeting = :symbol   # !> ValidationError

Configurations map to methods and can utilize a validation/transformation block.  Tap defines a number of common blocks (ex c.integer, c.regexp, etc.) that may also imply metadata for the command line (ex c.flag):

  % tap run -- hello world --reverse
  I[20:04:33]              olleh world

Distribution

Tap supports distribution of tasks as gems. To illustrate, say we installed the sample_tasks gem. Now our manifest looks like this:

  % tap run -T
  sample:
    goodnight   # your basic goodnight moon task
    hello       # a hello world task
  sample_tasks:
    concat      # concatenate files with formatting
    copy        # copies files
    grep        # search for lines matching a pattern
    print_tree  # print a directory tree
  tap:
    dump        # the default dump task
    rake        # run rake tasks

Tap checks the installed gems for a ‘tap.yml’ configuration file or a ‘tapfile.rb’ task file; any gem with one (or both) of these files gets pulled into the execution environment. Now tasks can be specified either by a short name when there isn’t a name conflict (ex goodnight, print_tree), or by a full name that includes the environment (ex sample:goodnight, sample_tasks:print_tree).

  % tap run -- sample_tasks:print_tree .
  .
  |- Rakefile
  |- lib
  |- sample.gemspec
  |- tapfile.rb
  `- test
      |- tap_test_helper.rb
      |- tap_test_suite.rb
      `- tapfile_test.rb

Not bad, eh?

Workflows and the Roadmap

Tap supports simple workflows in the imperative style. Tasks can be assigned an on_complete block that executes when the task completes, allowing results to be examined and new tasks to be enqueued as needed.

  app = Tap::App.instance
  t1 = Tap.task('t1') {|t| 'hellO'}
  t2 = Tap.task('t2') {|t, input| input + ' woRld' }
  t3 = Tap.task('t3') {|t, input| input.downcase }
  t4 = Tap.task('t4') {|t, input| input.upcase }
  t5 = Tap.task('t5') {|t, input| input + "!" }

  # sequence t1, t2
  app.sequence(t1, t2)

  # fork t2 results to t3 and t4
  app.fork(t2, t3, t4)

  # unsynchronized merge of t3 and t4 into t5
  app.merge(t5, t3, t4)

  app.enq(t1)
  app.run

  app.results(t5)       # => ["hello world!", "HELLO WORLD!"]
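
The data flow behind that result can be traced with plain lambdas standing in for the five tasks. This is only a sketch of what sequence, fork, and merge do to the values; it involves no Tap code and says nothing about how Tap schedules the tasks.

```ruby
# Plain lambdas standing in for the five tasks above (no Tap involved).
t1 = lambda { 'hellO' }
t2 = lambda { |input| input + ' woRld' }
t3 = lambda { |input| input.downcase }
t4 = lambda { |input| input.upcase }
t5 = lambda { |input| input + '!' }

# sequence: the result of t1 is passed to t2
sequenced = t2.call(t1.call)

# fork: the result of t2 is passed to both t3 and t4
forked = [t3, t4].map { |task| task.call(sequenced) }

# unsynchronized merge: t5 runs once for each incoming result
merged = forked.map { |result| t5.call(result) }

p merged   # prints ["hello world!", "HELLO WORLD!"]
```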

True support for workflows from the command line is lacking right now, but it will be coming soon. Here’s a short list of what else is planned:

  • Global Rake Tasks (hopefully!). I should be able to find and load rakefiles using the Tap execution environment. Tap already allows you to incorporate local rake tasks into a workflow using the ‘rake’ task.
  • Tap server. In a small-group environment where some people are computer savvy and others aren’t, it would be really useful to serve up your tasks using a web interface.
  • More test support. Tap provides several modules for testing tasks and supporting common types of tasks (ex file transformation tasks). Currently the modules are a bit incomplete, and they’re only geared towards Test::Unit. I’d like to add support for RSpec in the future.

Most of the time a web server parses HTTP requests before you get access to them in your code. From time to time, however, it’s nice to actually see what the HTTP from a web form looks like, and you may have to manipulate the message programmatically.

Firefox has a plugin called LiveHTTPHeaders that lets you capture and view HTTP requests as they get sent out. These you can save to files and then load into Ruby using the following code:

  require 'webrick'

  req = WEBrick::HTTPRequest.new(WEBrick::Config::HTTP)

  # open in binary mode so "\r\n" line endings are preserved
  File.open('path/to/http.txt', 'rb') do |socket|
    req.parse(socket)
  end

Now req can be used as any other WEBrick::HTTPRequest. The input to parse can be any IO (like File or StringIO). The request will begin parsing an HTTP header from wherever the IO is positioned, and continues parsing until it reaches an empty line. This method works with multipart/form data as well. For example:

  POST /path HTTP/1.1
  Content-Type: multipart/form-data; boundary=1234567890
  Content-Length: 158

  --1234567890
  Content-Disposition: form-data; name="one"

  value one
  --1234567890
  Content-Disposition: form-data; name="two"

  value two
  --1234567890--

Used in conjunction with the code above, this message results in the following:

  req.header   # => {"content-type" => ["multipart/form-data; boundary=1234567890"],
                     "content-length" => ["158"]}
  req.query     # => {"one" => "value one", "two" => "value two"}

A couple notes about parsing HTTP using WEBrick in the current (1.8.6) version of Ruby:

  • As mentioned, WEBrick considers an empty line as a break between the headers and body of a message. The capture for multipart/form requests from LiveHTTPHeaders lacks this break, so you’ll have to add it if you’re using that tool. You should be ok if you’re parsing a non-multipart request.
  • Header parsing is forgiving with end-line characters (ie “\r\n” and “\n” are both acceptable) but parsing of multipart/form data IS NOT. Multipart/form data requires that the end-line characters are “\r\n”. On Windows, therefore, it is absolutely ESSENTIAL to open file data in binary mode (ex ‘rb’, as above) to preserve these characters.
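
If a captured request has lost its “\r\n” endings (for example after passing through a text editor), they can be restored before parsing. A minimal sketch in plain Ruby follows; normalize_crlf is a hypothetical helper name and the sample request is made up for illustration.

```ruby
# Hypothetical helper: make every line of a captured request end in "\r\n".
# Existing "\r\n" endings are left as they are; bare "\n" endings are converted.
def normalize_crlf(text)
  text.gsub(/\r?\n/, "\r\n")
end

# A made-up capture with mixed line endings.
captured = "POST /path HTTP/1.1\nContent-Type: text/plain\n\nvalue one\r\n"
normalized = normalize_crlf(captured)

# The first empty line separates the headers from the body.
head, body = normalized.split("\r\n\r\n", 2)

puts head.lines.length   # prints 2
puts body.bytesize       # prints 11
```

The byte size of the body is what a Content-Length header has to match, so it is worth recomputing after changing line endings.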

This last summer I finished the joinfix gem, which provides a solution to the fixture join problem: mainly that it’s a pain to create fixtures by specifying entry ids across multiple fixture files. When Mike Clark and Chad Fowler opened up submissions for Advanced Rails Recipes, I sent JoinFix to them and it was accepted.

I took a look at the new Foxy Fixtures in Rails 2.0 and was shocked/pleased to find something similar has gotten incorporated directly into the core. Goes to show how problematic fixtures really were. JoinFix still is interesting, however, mainly because JoinFix lets you define entries inline.

Consider the following data model:

  class User < ActiveRecord::Base
    has_many :user_groups
    has_many :groups, :through => :user_groups
  end

  class Group < ActiveRecord::Base
    has_many :user_groups
    has_many :users, :through => :user_groups
  end

  class UserGroup < ActiveRecord::Base
    belongs_to :user
    belongs_to :group
  end

You can write your fixtures using the naming scheme you lay out in your models, referencing entries across multiple fixture files (similar to Foxy) or you can define them inline:

   [users.yml]
   john_doe:
     full_name: John Doe
     groups: admin_group   # => reference to the 'admin_group' entry

   jane_doe:
     full_name: Jane Doe
     groups:               # => you can specify an array of entries if needed
       - admin_group
       - worker_group:     # => inline definition of the 'worker_group' entry
           name: Workers

   [groups.yml]
   admin_group:            # => the referenced 'admin_group' entry
     id: 3                 # => you can (but don't have to) specify ids
     name: Administrators

Join entries implied in your definition, as in a has_and_belongs_to_many association, will be created and named by joining together the names of the parent and child, ordered by the ’<’ operator. For example, the users.yml and groups.yml fixtures produce these entries:

  [users]
  john_doe:
    id: 1                  # => primary keys are assigned to all entries
    full_name: John Doe
  jane_doe:
    id: 2
    full_name: Jane Doe

  [groups]
  admin_group:
    id: 3
    name: Administrators
  worker_group:
    id: 1
    name: Workers

  [user_groups]
  admin_group_john_doe:
    id: 1
    user_id: 1             # => references are resolved to their foreign keys
    group_id: 3            # => explicitly set primary keys are respected
  admin_group_jane_doe:
    id: 2
    user_id: 2
    group_id: 3
  jane_doe_worker_group:   # => Notice the '<' operator in action
    id: 3
    user_id: 2
    group_id: 1
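
The naming rule for join entries can be sketched in a couple of lines of Ruby. The join_entry_name helper is hypothetical and is not taken from the JoinFix source; it only reproduces the ordering described above.

```ruby
# Hypothetical helper: name a join entry from its parent and child entry names.
def join_entry_name(parent, child)
  # sort puts the name that compares as smaller first, then the two are joined
  [parent, child].sort.join('_')
end

puts join_entry_name('john_doe', 'admin_group')    # prints admin_group_john_doe
puts join_entry_name('jane_doe', 'admin_group')    # prints admin_group_jane_doe
puts join_entry_name('jane_doe', 'worker_group')   # prints jane_doe_worker_group
```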

Nesting is allowed. This will make the same entries as above:

  [users.yml]
  john_doe:
    full_name: John Doe
    groups:
      admin_group:
        id: 3
        name: Administrators
        users:
          jane_doe:
            full_name: Jane Doe
            groups:
              worker_group:
                name: Workers

In this final form, JoinFix defines a highly-involved fixture in one chunk, in one file. This can be a BIG advantage when you try to test some cross-table, complicated lookup. The full fixture is centralized and easy to manage.

calculus n. A branch of math dealing with derivatives and integrals, based on the summation of infinitesimal differences. The beauty is the origin of the word – Latin “small pebble” – as in the stones used on an abacus. I like the thought of making those summations one calculus at a time. Also a medical term for kidney/gall stones.

nervure n. Entomology: each of the hollow veins forming the framework of an insect’s wings. Botany: the principal vein on a leaf. (Note: each small space between nervures is an areola, from Latin “small open space”. I like this other, less-well-known meaning. It adds something nice to the commonly known meaning.)

soubriquet n. A person’s nickname.

I have loved words forever. Years ago I had a large vocabulary – I especially liked rare words because they so often capture life exactly.

I’ve got a book. It’s called 2000 Most Challenging and Obscure Words. I’ve had it for ages. Yes I read the dictionary, and yeah, I dig it.

So, to start, here are three words with definitions lifted from the 2000 words book:

etymon (ET uh mon) n. The original root or true origin of a word.

bahuvrihi (bah hooh VREE hee) n. A bahuvrihi is a compound noun or adjective consisting of two parts, an adjective and a noun, the combination describing someone or something characterized by what is denoted by the noun. Ex: bluebell, bonehead, hard-hearted, redcoat, redhead, and bahuvrihi itself which is a Sanskrit word meaning “with much rice”, based on bahu- (much) plus vrihi (rice).

fundament (FUN duh munt) n. We are familiar with the adjective fundamental, a synonym of basic, underlying, which describes anything that goes to the root of the matter and is an essential part of whatever may be involved. We speak of fundamental rules, fundamental principles, a fundamental change or revision, a fundamental concept or idea. A far cry from fundament itself, meaning “buttocks”, aka. the arse, behind, derriere, and nates, with the additional meaning, according to some dictionaries, of “anus”, aka. the ass-hole, bung-hole, and lots of other disagreeable nicknames. The fundament, then, refers to the lower part of the torso in general.
