You are currently browsing the monthly archive for January 2008.

Most of the time a web server parses HTTP requests before you get access to them in your code. From time to time, however, it’s nice to actually see what the HTTP from a web form looks like, and you may have to manipulate the message programmatically.

Firefox has a plugin called LiveHTTPHeaders that lets you capture and view HTTP requests as they get sent out. These you can save to files and then load into Ruby using the following code:

  require 'webrick'

  req = WEBrick::HTTPRequest.new(WEBrick::Config::HTTP)
  File.open('path/to/http.txt', 'rb') do |socket|
    req.parse(socket)
  end

Now req can be used like any other WEBrick::HTTPRequest. The input to parse can be any IO (like File or StringIO). The request begins parsing the HTTP headers from wherever the IO is positioned, and continues until it reaches an empty line. This method works with multipart/form-data as well. For example:

  POST /path HTTP/1.1
  Content-Type: multipart/form-data; boundary=1234567890
  Content-Length: 158

  --1234567890
  Content-Disposition: form-data; name="one"

  value one
  --1234567890
  Content-Disposition: form-data; name="two"

  value two
  --1234567890--

Used in conjunction with the code above, this message results in the following:

  req.header   # => {"content-type" => ["multipart/form-data; boundary=1234567890"],
                     "content-length" => ["158"]}
  req.query     # => {"one" => "value one", "two" => "value two"}

A couple of notes about parsing HTTP using WEBrick in the current (1.8.6) version of Ruby:

  • As mentioned, WEBrick considers an empty line as a break between the headers and body of a message. The capture for multipart/form-data requests from LiveHTTPHeaders lacks this break, so you’ll have to add it if you’re using that tool. You should be OK if you’re parsing a non-multipart request.
  • Header parsing is forgiving with end-line characters (i.e., “\r\n” and “\n” are both acceptable), but parsing of multipart/form-data IS NOT. Multipart/form-data requires that the end-line characters be “\r\n”. On Windows, therefore, it is absolutely ESSENTIAL to open file data in binary mode (e.g., ‘rb’, as above) to preserve these characters.
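If a capture has already lost its “\r\n” line endings (for example, after being saved by a text editor), one possible repair is to rewrite the line endings before parsing. The helper below is a hypothetical sketch, not part of WEBrick, and the sample text is made up.

```ruby
# Hypothetical helper: rewrite every line ending ("\n" or "\r\n") as "\r\n".
def normalize_crlf(text)
  text.gsub(/\r?\n/, "\r\n")
end

lf_only = "--1234567890\n" \
          "Content-Disposition: form-data; name=\"one\"\n" \
          "\n" \
          "value one\n" \
          "--1234567890--\n"

fixed = normalize_crlf(lf_only)

fixed.include?("\r\n\r\n")      # => true
normalize_crlf(fixed) == fixed  # => true (already-correct text is unchanged)
```

Two caveats apply. Changing line endings changes the byte count of the body, so the Content-Length header has to match the normalized text. A blanket rewrite like this is also only appropriate for text-only form data, since it would corrupt a binary file upload.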