Random Artifacts

Friday, October 24, 2014

A TDD story about a test that started "green"

This is a story about the value of making sure you start with a test that's actually failing.

First a little context. I've been sharpening my TDD-fu and design-fu with a pet project: implementing the rules for Monopoly, the venerable and venerated game from Parker Brothers. (For some interesting background reading, check out the original rules and the Wiki page describing Monopoly's history.) The game's domain model and rules are fairly rich, making this exercise reasonably challenging and thought-provoking; and the rules are well defined, so the stories are easy to find and are clearly constrained. (There's little chance of me going off the rails chasing an interesting subset of the problem domain.)

Someday, ambition and time permitting, I might write a whole series of posts on this. Today I'm going to describe a particular situation and how it highlights for me the value of starting with a "red bar" when a new test is written.

This story begins with a new test named shouldNotAllowTurnsWhenGameIsOver. Its intent is to ensure that the winning player can't continue to take turns when the game is over. There is a winner when all but one player is bankrupt.

I already had a test that ensured bankrupt players couldn't take a turn, and another that verified a winner was identified when all other players were bankrupt. But I didn't have any specific logic in the game implementation to ensure that the winning player couldn't take a turn after the game was over. So I was a little surprised when I ran my new test and it passed.

When I write a test I'm saying "I want this to happen when I do that". When I run that new test and it tells me I'm already getting this to happen before I've written any production code, there's something wrong.

A test that starts out green is a bad sign: first, you don't know that it can ever be red, and thus it can't be trusted to do anything useful. But equally importantly, it's a signal that either something in the production code is accidentally resulting in the test passing or the test is not valuable and ought to be scrapped or rewritten.

My initial thought was to rewrite the test to make it go red, but I couldn't figure out a way to change it that didn't rely on implementation details. So I dug into the code.

A little investigation revealed the answer. In the GameState class I keep track of which player will take a turn next in an integer field named nextPlayer. The GameState#getNextState() method figures out what the next state is when a turn is taken, and constructs an instance of the correct type:

       GameState getNextState() {
           if (game.hasWinner()) {
               return new GameOverGameState();
           }
           return new TakingTurnsGameState(nextPlayer);
       }

Notice the difference in the constructor calls to GameOverGameState and TakingTurnsGameState. The GameOverGameState constructor doesn't have a nextPlayer parameter, so its nextPlayer field is never assigned and is left at Java's default int value of 0.

I fixed this by refactoring to always call the GameState constructor that takes a nextPlayer parameter. Once I did this, the test turned red and I could continue normally, implementing the code to make it green.
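Here's a rough sketch of the before and after, written as a JavaScript analogue rather than the real Java code. The class bodies and the getNextStateBefore/getNextStateAfter names are made up for illustration, and the default parameter value stands in for Java's default int value of 0.

```javascript
// JavaScript analogue of the Java classes described above; all bodies are hypothetical.
class GameState {
  // A skipped argument silently becomes 0, mimicking Java's default for an int field.
  constructor(nextPlayer = 0) {
    this.nextPlayer = nextPlayer;
  }
}

class GameOverGameState extends GameState {}
class TakingTurnsGameState extends GameState {}

// Before: the game-over branch never passes nextPlayer along.
function getNextStateBefore(hasWinner, nextPlayer) {
  if (hasWinner) {
    return new GameOverGameState();
  }
  return new TakingTurnsGameState(nextPlayer);
}

// After: both branches go through the constructor that takes nextPlayer.
function getNextStateAfter(hasWinner, nextPlayer) {
  if (hasWinner) {
    return new GameOverGameState(nextPlayer);
  }
  return new TakingTurnsGameState(nextPlayer);
}
```

With a winner and nextPlayer equal to 2, the "before" version produces a state whose nextPlayer is 0, while the "after" version keeps the 2.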

I also wanted to dig a little deeper. After a little root cause analysis (walking backwards through my commits) I decided I had made two mistakes earlier. The first was implementing code in a way that required me to know about (or assume) a future requirement. (I knew that once the game was over no more turns would be taken, so I assumed the value of nextPlayer didn't matter when creating GameOverGameState.) The second error was not ensuring that all fields of the GameState class were properly initialized on construction. (Don't leave your fields uninitialized, kids!)

Following a good TDD process helps find defects early and leads to a better design by making it safe and easy to refactor code. But it is very much not an automatic process. You have to pay attention and be diligent and conscientious. I was able to find and fix a bit of troublesome code today by paying attention when a new test was passing.

tl;dr, if you get a green bar first when writing a test, pay attention: there might be something smelly in your code.

Tuesday, September 16, 2014

Install jrnl on Cygwin

I ran across jrnl the other day and thought "ooh, a command line journal, how cool is that??" Because ... well, command line. (If I could move my mouse and click from the command line, I probably would. Maybe a vim plugin....)

So this is how to install jrnl under Cygwin, because it was (slightly) non-obvious:

First, you need the Cygwin python package. If you don't have it, get it. I installed under python3.2 which is (currently) the latest version of python3; you're on your own if you install under python 2. (Aside: the Windows version of python and the Cygwin shell do not get along very well; don't even bother trying this with Windows python.)

Next you'll need pip, the Python package installer. Download the installation script at http://pip.readthedocs.org/en/latest/installing.html and then run e.g. "~/Downloads/get-pip.py". (My download directory is ~/Downloads.)

Once pip is working, run "pip install jrnl". You'll see pip do its thing, and a few moments later jrnl will be installed.

Last, you'll need to edit /usr/lib/python3.2/site-packages/jrnl/util.py. Go to line 130 and remove the leading 'u' from the string. Until you do this, running jrnl will result in a traceback and abend. For reference in case it moves, that line is:

return u"\033[36m{}\033[39m".format(string)

(I'm not a Pythonista, but apparently Python 3.0 through 3.2 don't accept the leading 'u' on Unicode strings; it was allowed again starting with Python 3.3.)

The default journal will be created as ~/journal.txt and the jrnl configuration file is in ~/.jrnl-config. Happy journaling!

Friday, April 04, 2014

Node.js Introduction from a Java Developer's Perspective

Over the past several years the Web 2.0 world has grown increasingly interesting and exciting. Client-side JavaScript has grown up, and it's now possible to develop full-featured browser applications using just JavaScript/HTML5.

And thanks to Node.js we can now develop entire applications in JavaScript on the server side as well.

I've spent a lot of time in the Java universe over the past fifteen years, and my involvement in the Web 2.0 universe has been fairly minimal. Over the past few days I've had some extra time on my hands, so I've taken the opportunity to dive into Node.js. And since writing things down helps me learn and organize my thoughts, here's a brain dump of my brief (so far) exploration into Node.js.

Describing and Comparing Node.js to Java/JEE

First, the tl;dr: Node.js is a platform (or container) for running network-based server applications written in JavaScript. It is hosted on the Google V8 engine (with a few other C++ libraries to provide I/O bindings etc.). It provides JavaScript APIs for building server applications.

The sweet spot for Node.js is dynamic, responsive, real-time data-oriented applications. Think social apps, browser-based games, etc. built for both desktop and mobile browsers. Pretty much, modern web applications that aren't compute-intensive.

Coming from the Java perspective I found it useful to compare Java/JEE and Node.js fundamentals:

Comparator	Java/JEE	Node.js
Language	Java	JavaScript
Runtime	JVM	Google V8
API	Servlet API	Node.js API
Container	Tomcat/Jetty/...	Node.js¹
DI/Lifecycle	Spring	Architect²
Package Management	Ivy/Maven	NPM
Build Tools	Ant/Maven/Gradle	Grunt/Jake/Mojito
IDE	Eclipse/Netbeans/JetBrains	Cloud9/Visual Studio/JetBrains/Eclipse

The Java Servlet Specification is 200+ pages of text that describes a standard which any conforming implementation must adhere to. There is no official Node.js standard, and "compatible" and "conforming" aren't things in the Node.js world. Node.js is an open source project, and is a collaborative effort supported and backed by multiple individuals, projects, and organizations.

Java and JEE are mature, stable technologies with a slow rate of change. Node.js is young and rapidly evolving.

The Java ecosystem (libraries, tools, containers, and applications for building, testing, deploying, and monitoring applications) is rich and variegated. The Node.js ecosystem is full-featured but not as rich or mature -- although this is rapidly changing. Node.js is being successfully used in a variety of production systems.

There are several Java IDEs with great support for Java/JEE. There are also several IDEs with support for Node.js/JavaScript.

Hello World 1.0

Let's dive into Node.js and write a Hello World application. I'm running Windows and Cygwin; I've kept commands platform-neutral but specific details may vary based on your environment.

Download and Install

First, you need to get Node.js.

Go to http://nodejs.org/download/ and download the appropriate installer/binaries.
Run the installer or unpack the binary. Node won't care where it's installed.
Add the Node.js root directory to your PATH.

Yeah, it's that easy.

Your First Hello World

Make a directory, let's call it hello-node. In that directory create hello-server.js with the following code:

var http = require('http'); // use built-in http module

// declare a function to handle a request
// this function is called by the engine when a request arrives
function fn(req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello Node!');
}

// create a server and tell it to start listening
var server = http.createServer(fn);
server.listen(3000, '127.0.0.1');

To start the Node.js server, run node hello-server from the command line. Yep, it's that easy.

Now go to http://127.0.0.1:3000 in your browser, and then sell your app to Google for $6 billion.

Single-Threaded Event Loop

The function called by the engine when a request arrives is a callback. You'll be using a lot of them in Node.js, so let's take a few minutes to understand the threading model Node.js uses.

In Node.js the main event loop is single-threaded. This means the application code you write all runs in the same thread. This is in complete contrast to the Java servlet model, where every request comes in on a separate thread. To get a feel for this single-threaded model, let's rewrite our (already very profitable) Hello World to introduce a long-running computation. Edit hello-server.js and update the fn callback function:

function fn(req, res) {
    setTimeout(function() {
        console.log("Waking up");
        res.writeHead(200, {'Content-Type': 'text/plain'});
        res.end('Hello Node!'); 
    }, 10000);
}

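(setTimeout() schedules the anonymous function, which sends the response to the client, to run after 10 seconds.) Restart the server. Now open two tabs in your browser. Go to http://127.0.0.1:3000 in both tabs, loading them at the same time. Watch the console output and your browser tabs; you may see that even though you started loading both tabs at the same time, the second tab finishes loading ten seconds after the first tab finishes. One caveat: the timer itself doesn't block the event loop (Node.js is free to handle other events while it waits), and browsers often hold back a second request for the same URL until the first one finishes, so some of that delay can come from the browser. A handler that spends ten seconds on real computation would hold up every other request for that long, though. This is a really critical point to understand: each client request is being handled by the same thread. The obvious implication is that you want the event thread to do as little work as possible, otherwise the response time for clients will quickly become unacceptable.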
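For comparison, here's a minimal sketch of a handler that really does tie up the event thread, by spinning in a loop instead of setting a timer. The busyWait and makeSlowHandler names are mine, not part of Node.js, and a real application should never do this; it's only there to make the blocking visible.

```javascript
// Hypothetical helper: keep the event thread busy for roughly `ms` milliseconds.
function busyWait(ms) {
  var end = Date.now() + ms;
  while (Date.now() < end) {
    // spin; nothing else can run on this thread in the meantime
  }
}

// Builds a request handler that blocks for `ms` milliseconds before responding.
function makeSlowHandler(ms) {
  return function (req, res) {
    busyWait(ms);
    console.log('Done computing');
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello Node!');
  };
}
```

Passing makeSlowHandler(10000) to http.createServer() in place of fn should make the effect hard to miss: while one request is spinning, the server can't even start on the next one, no matter which client or URL it comes from.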

(As with all server applications, if you do care about scalability it's important to do some benchmarking with your favorite load testing tool on a regular basis during development.)

Callbacks, Callbacks Everywhere

We just learned the main event loop -- which executes the application code -- is single-threaded, which means we want to handle requests quickly. So how does this scale? And how do we keep from tying up the main event loop when we're reading a file or making queries to a database?

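In Node.js, I/O is event-based and happens outside the main event loop: network I/O goes through the operating system's non-blocking facilities, and some operations (file access, for example) are handed to a small pool of background threads. The basic programming model in Node.js uses asynchronous event handlers. We give Node.js a callback function, which Node.js calls when the I/O operation completes.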

Let's revisit the first Hello World. The call to server.listen() is executed asynchronously. (Prove this by adding a call to console.log() at the end of Hello World.) Internally, Node.js accepts requests that arrive on this server's port in the background and pushes each one onto the event queue. The event loop thread pulls these events from the queue and executes the code in the callback we provided in the function fn.
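To see the callback model in isolation, here's a minimal sketch using the built-in fs module. The readWithCallback name and the order array are made up for illustration; the point is only that fs.readFile() returns right away and the callback runs later, from the event loop.

```javascript
var fs = require('fs');

// Records the order in which things happen (illustrative only).
var order = [];

// Hypothetical helper: start an asynchronous read and report back via `done`.
function readWithCallback(path, done) {
  order.push('read requested');
  fs.readFile(path, 'utf8', function (err, text) {
    // Runs later, from the event loop, once the read has finished or failed.
    order.push('callback ran');
    done(err, text);
  });
  // Reached before the callback above has had a chance to run.
  order.push('readFile returned');
}
```

Calling readWithCallback('hello-server.js', function (err, text) { ... }) records 'read requested', then 'readFile returned', and only afterwards 'callback ran'. The event thread was never parked waiting on the disk.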

Hello World 2.0

Let's rewrite Hello World. We'll introduce the Express framework, the Node.js Package Manager (npm), and how to debug Node.js applications.

Describing Dependencies with NPM

If you're familiar with any kind of package management system (Ivy, Maven, RPM, yum, etc.) you know what NPM is. You can learn more, and browse the npm database, at https://www.npmjs.org/.

We'll use npm to download our dependencies. To start, create hello-node/package.json with the following contents:

{
  "name": "hello-world",
  "description": "hello world test app",
  "version": "0.0.1",
  "private": true,
  "dependencies": {
    "express": "3.5.1"
  }
}

Here, we're telling npm that our application depends on Express version 3.5.1. To download and install Express (and its dependencies), make sure your working directory is hello-node and run npm install from your shell. (Afterwards, you can run npm ls to see all of the packages it downloaded.)

Now let's create a new Hello World application, more powerful than the last. Create a new file, server.js, with the following contents:

// declare our requirements
var express = require('express');
var http = require('http');

// create a new Express app
var app = express();

// handle GET requests to /hello.txt with a callback function
app.get('/hello.txt', function(req, res) {
  res.send('Hello World 2!');
});

// use a static mapping to convert requests with /static into /public
app.use('/static', express.static(__dirname + '/public'));

// intercept and handle request param :id
// put the value of the param into a variable in the request
// then call the next handler for the request
app.param('id', function(req, res, next, id) {
  req.username = 'user ' + id;
  next();
});

// intercept and handle :postid param
app.param('postid', function(req, res, next, id2) { 
  req.postid = id2;
  next();
});

// handle a GET for e.g. /user/2
// use the request variable added by the :id handler
app.get('/user/:id', function(req, res) {
  res.send('Hello ' + req.username);
});

// handle a GET for e.g. /user/3/4
// uses two parameters
app.get('/user/:id/:postid', function(req, res) {
  res.send('Hello user ' + req.username + ', post ' + req.postid);
});

function fn() {
  var addr = server.address();
  console.log('Listening on port %s:%d', addr.address, addr.port);
}

var server = app.listen(3000, fn);
console.log('Server created...');

Also, make a directory called public/ and create a public/foo.txt with anything you like in it.

Run node server and fire up your browser. Some URLs to try:

http://localhost:3000/hello.txt
http://localhost:3000/static/foo.txt
http://localhost:3000/user/2
http://localhost:3000/user/3/4

For the most part the source code is self-explanatory. We define some handlers for different routes (paths) and parameters and provide a callback for each handler. A few notes about Express:

The general syntax for defining a request handler is app.VERB(regex, handler1 [, handler2...]). VERB is a (lowercase) HTTP verb e.g. get, post, etc. The regex may be a simple path as above or any regular expression. The handler is a callback function. Multiple handlers may be provided, either as separate parameters or as an array.
Use app.use(path, handler) to define a handler that will be called for every request whose path starts with the given path (or for all requests, if the path is omitted). If multiple app.use() handlers are defined they are called sequentially in the order they were defined.
Order is important. If two handlers are defined and match the same URI, only the first will be called. (Subsequent handlers may be called by explicitly invoking next().)
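The ordering rule is easier to see in a toy version. The sketch below is not Express and doesn't use its API; createDispatcher is a hypothetical stand-in that only mimics the "first match wins unless it calls next()" behavior described above.

```javascript
// Not Express: a tiny hypothetical dispatcher that mimics the ordering rule.
function createDispatcher() {
  var routes = [];
  return {
    // Register a handler for every URL that starts with `prefix`.
    use: function (prefix, handler) {
      routes.push({ prefix: prefix, handler: handler });
    },
    // Run the first matching handler; later matches run only via next().
    handle: function (req, res) {
      var i = 0;
      function next() {
        while (i < routes.length) {
          var route = routes[i++];
          if (req.url.indexOf(route.prefix) === 0) {
            return route.handler(req, res, next);
          }
        }
      }
      next();
    }
  };
}
```

If two handlers are registered for '/user' and the first one never calls next(), a request for '/user/2' only ever reaches the first. Add a next() call to the first handler and both run, in the order they were registered.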

Express is a fairly small MVC framework for Node.js. There are several others, along with full-stack frameworks, and of course many other libraries.

Debugging with Node.js

For any "serious" development and debugging, you'll want to use an IDE. However, basic debugging for Node.js is easy to set up:

Run npm install -g node-inspector to install the debugger package (the -g puts it into your global Node.js installation instead of the current directory).
When starting your application, use either node --debug myapp or node --debug-brk myapp if you want to pause on startup.
In a separate shell, run node-inspector (I had to use node node_modules/node-inspector/bin/inspector.js from the Node.js installation directory, probably because Windows or Cygwin).
In Chrome, go to http://127.0.0.1:8080/debug?port=5858. (Make sure that port is not blocked.)

Other Thoughts and Opinions

Developing code in a strongly statically typed language feels very different from coding in a dynamic untyped language. In Java, you have no choice: you conform to the API presented, there's no ambiguity or question about the types and methods. In JavaScript, there is no type information available, and method resolution at runtime is pretty simple: if the object has a method with the name being asked for, it's called -- regardless of whether the right number of arguments are passed. And there's no "compile time" checking at all. So for a Java developer it can be uncomfortable.
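A small illustration of that last point, using a made-up greeter object: JavaScript happily calls a method with too few or too many arguments, and a misspelled or missing method only fails once the line actually runs.

```javascript
// Hypothetical object, just to show how calls are resolved at runtime.
var greeter = {
  greet: function (name, punctuation) {
    return 'Hello ' + name + punctuation;
  }
};

// Too few arguments: the missing one is simply undefined.
var tooFew = greeter.greet('Node');                 // 'Hello Nodeundefined'

// Too many arguments: the extra one is silently ignored.
var tooMany = greeter.greet('Node', '!', 'extra');  // 'Hello Node!'

// A method that doesn't exist is only discovered when the call executes.
var failedAtRuntime = false;
try {
  greeter.shout('Node');
} catch (e) {
  failedAtRuntime = e instanceof TypeError;         // true
}
```

None of these would get past a Java compiler, which is a big part of why unit tests carry more weight in JavaScript.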

Node.js is a lightweight platform, and JavaScript is ill-suited for large applications. But where there is a need, a solution will be created. There are projects, languages, tools, and libraries to make large-scale development feasible.

A summary of the main points:

Node.js is a platform for developing server applications, consisting of JavaScript running in the Google V8 engine (with some C++ libraries).
The main event loop is single-threaded, but I/O operations are asynchronous and there are techniques to offload CPU or time-intensive tasks to worker threads. Or you can use something like WebWorker to offload tasks to the client.
Callbacks are commonly used to handle the results of I/O operations.
Use NPM to manage dependencies. There are a lot of packages available...choose wisely.
Node.js and its ecosystem are rapidly evolving. Keep on top of the changes.
Since JavaScript is a dynamic language, unit testing is important. Refactoring will be painful without good unit tests, although some tooling support exists.

My initial impression of Node.js is a qualified thumbs-up. Things I like:

Low barrier to entry and supports super-quick iteration and development.
There are some really nice benefits that come from using the same language for both the server and the client.
There's a lot of excitement and activity in the Node.js universe; with the low barrier and rapid development this translates to a lot of new packages/tools being developed and rapid evolution of features in existing ones.
Node.js plays
Perhaps most important, Node.js solves some real problems well, and has been battle tested in some fairly serious environments.

In the "minuses" column:

It's JavaScript, with all of its quirks and idiosyncrasies (not that it's unique in that regard). Plus, since it's an untyped prototype-based language with no support for large codebases, developing a largish application requires serious developer discipline to prevent the codebase from becoming too complex to work on.
Everything is a callback, and your code will be littered with them, and if you're not careful you'll end up with pyramids of death all over. (And a few things aren't callbacks and will kill scalability if you don't profile for them.)
Not really a minus but something to be aware of: because the Node.js ecosystem is churning so rapidly it's harder to find the "best" practices/patterns/idioms and picking the "right" framework/package can be a bit tricky. Do your homework and choose wisely.
Package support for third-party things is still coming along.
IDE and tool support is still maturing.
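To make the "pyramid of death" complaint concrete, here's a minimal sketch. The three async steps (findUser, findPosts, render) are hypothetical stand-ins for database or file calls; the first version nests them, and the second does the same work with named functions to keep the code flat. This is one common way to tame the nesting, not the only one.

```javascript
// Hypothetical async steps; each calls back with (err, result) on a later tick.
function findUser(id, cb) { setImmediate(cb, null, { id: id, name: 'user ' + id }); }
function findPosts(user, cb) { setImmediate(cb, null, [user.name + ' post']); }
function render(posts, cb) { setImmediate(cb, null, posts.join(', ')); }

// Nested style: every step adds another level of indentation.
function pagePyramid(id, done) {
  findUser(id, function (err, user) {
    if (err) { return done(err); }
    findPosts(user, function (err, posts) {
      if (err) { return done(err); }
      render(posts, function (err, html) {
        if (err) { return done(err); }
        done(null, html);
      });
    });
  });
}

// Same flow with named callbacks: the steps read top to bottom.
function pageFlat(id, done) {
  findUser(id, onUser);

  function onUser(err, user) {
    if (err) { return done(err); }
    findPosts(user, onPosts);
  }

  function onPosts(err, posts) {
    if (err) { return done(err); }
    render(posts, done);
  }
}
```

Both versions hand the same result to done; the difference is purely in how readable the code stays as more steps get added.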

Frankly though, I think a lot of the complaints and issues are already addressable (for example using TypeScript instead of JavaScript) or are being addressed in the near term. And on a larger scale, the universe of JavaScript -- languages (ClojureScript, CoffeeScript, TypeScript, etc.), interpreters/VMs (Google V8, Mozilla SpiderMonkey, Oracle's Nashorn, etc.), and browser frameworks/libraries (Angular, Backbone, Ember, etc.) -- has a huge amount of momentum and synergy and life in it.

Summary and Further Reading

I've provided a brief introduction to Node.js -- what it is, what it's good for, and how to build, run and debug a Hello World application (and sell it to Google). Since I come from the Java world, I also compared it to the Java/JEE world.

Last, here are a bunch of articles and resources to help you journey further into the world of Node.js. Essentially this is a snapshot of the more interesting and useful things I've found over the past few days of exploration into Node.js. Your mileage may vary, caveat emptor, and all that.

Introductory and Informational

http://www.toptal.com/nodejs/why-the-hell-would-i-use-node-js - Introduction to Node.js

http://www.slideshare.net/chris.e.richardson/nodejs-the-good-parts-a-skeptics-view-jax-jax2013 - Presentation introducing Node.js (also from someone coming from a Java perspective)

http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/ - Describes the Node.js event loop

http://programmers.stackexchange.com/questions/221615/why-do-dynamic-languages-make-it-more-difficult-to-maintain-large-codebases - A discussion around the difficulty of maintaining large codebases in a dynamic language like JavaScript

Case Studies and Success Stories

http://www.nearform.com/nodecrunch/how-node-js-has-revolutionized-the-mailonline - Using Node.js and Clojure on the Daily Mail Online website.

http://strongloop.com/strongblog/mobile-app-development-with-full-stack-javascript-part-1-of-4-loopback/ - Developing a mobile app using Node.js (part 1 of 4)

Plugins, Libraries and Tools

https://www.npmjs.org/ - the Node Package Manager library, get your plugins here

http://nodeframework.com/ - A list of 30+ handpicked Node.js frameworks

http://gruntjs.com/ - Automated build system (aka Maven) for Node.js

http://expressjs.com/ - MVC web application framework

https://www.npmjs.org/package/webworker-threads - WebWorker threads, for moving long-running tasks off the main event loop

How-To and Tutorial

http://www.bearfruit.org/2013/06/21/start-a-new-node-js-express-app-the-right-way/ - One approach to getting started with Node.js applications

http://greenido.wordpress.com/2013/08/27/debug-nodejs-like-a-pro/ - Setup for debugging with Node.js

http://pettergraff.blogspot.com/2013/01/java-vs-node.html - A Java programmer's experience with programming for Node.js

http://www.codeproject.com/Articles/523451/Node-Js-And-Stuff - Longish walkthrough for developing a Node.js application

Monday, November 04, 2013

No Source, No Docs, No Joy

I recently worked on a couple of stories for a project which uses a certain commercial C++ library in its implementation. This library is truly a magical black box; it has a fairly simple API but behavior that is unintuitive and complex. The documentation is minimal, and the product is closed-source (and decompiling or stepping through a disassembly is a rather painful and time-consuming way to understand how something works). The company's support was okay, but less-than-responsive. And there is basically no community around the product. Because of this, working with the product was a slow, frustrating grind. Stories that "should have" taken about two weeks to finish ended up taking over three weeks entirely because of how difficult this library was to use.

The next story I worked on involved our persistence layer, which uses MongoDB. In contrast to the (nameless-to-protect-the-guilty) C++ library, MongoDB was for the most part a pleasure to use. It's a much larger product, and much more complex; we're a good ways from really understanding best practices. But the documentation is good and there is a healthy community around the product, so it's easy to get help.

Good products make it easy to overcome the initial learning curve and to answer "how do I..." questions with at least two of: good documentation, source code availability, good support, and a healthy community. These products are enjoyable to use, and any member of the team can almost immediately be productive when using them. In contrast, when some hard-to-use product is involved, one person often gets "stuck" with the story because they have the most experience with it. "Yeah I know we should spread out the knowledge, but Joe's the only one who knows how to use the FizzCrap API...". Poor Joe ends up spending half his time working with FizzCrap, and eventually he either goes postal and deletes the entire git repository or doesn't come in one Monday morning because he's found a job elsewhere as a night watchman.

Don't be the team that has a "Joe" on it. Usability should be one of your primary considerations when evaluating a product/framework/library, not an afterthought.

Friday, November 01, 2013

Quick Change Directory for Bash under Cygwin

I think about half the commands I type in a Bash shell must start with "cd ....". Since the Windows filesystem is inherently broken by design (http://secretgeek.net/ex_ms.asp) and pathnames can get pretty long anyway, this involves a lot of repetitive over-and-over recurring again and again typing of the same identical equal paths.

Doing the same thing more than once often presents an opportunity for automation. (See if automation will save you time first. Also consider whether your effort to automate something will help others. Then go ahead and automate it anyway, because you know you want to.)

Thinking I must not be the first person to have this thought, I did a bit of Googling and quickly found autojump.

Basically, autojump just remembers every directory you "cd" to, and you can then quickly jump to it. E.g. "j aut" instead of "cd /c/Users/Thomas/Projects/autojump". Five characters beats the heck out of 36.

Pluses: It just works, with very little cognitive overhead. After a few days of use, my fingers and brain are already starting to build a "muscle memory" of a few directories. I don't have to think about typing "j ..."; it just happens.

Minuses: It doesn't handle paths with spaces. There'll be no jumping to /c/Program Files/.... There's also a tiny-but-noticeable delay when you "cd" to a directory; it takes a while to spin up the Python interpreter. (Yes, it requires Python, so you'll want to make sure you have that installed.)

Installation is simple; I cloned and installed according to the instructions (I used the "--global" option to install under /usr).

Recommended.