Thursday, August 28, 2008

Reuters, Milliseconds, and Fail

Let's say you work for Reuters and produce something called Reuters Market Data System (RMDS) to send live market data to your customers, who pay you a lot of money for this. Let's say that for historical reasons, in general, RMDS won't give your customers any timestamp precision on ticks finer than the second, and even that's transmitted as a string (no kidding here; you want to know the actual time that tick was for? Better get ready to parse a string. For every single tick. Seriously.).

That's just not good enough given that your tick delivery system isn't guaranteed to give clients ticks in order, and on the US exchanges you get updates on certain rapidly ticking instruments (like SPY and QQQQ) many times a second. So you decide to add an extra field carrying the tick's time to millisecond precision. This seems like a great idea, so you do it.

Only, unlike every rational human being who would give you an absolute time from Unix epoch as an integer, you don't, because you're special. Like an olympian.

Rather, you provide a double precision floating point number of milliseconds since midnight, and don't actually document that in any way.

Let's cover why the double precision bit is so retarded, shall we?
  • I need milliseconds because I want more precision. That means that the least significant digits are actually the most important to me, not the least, because I already have other representations for the most significant digits. You're putting them in a data structure that strips away that precision.
  • It's not an inexact value. It's an exact value. Floating point representation implies a loss of precision by its very nature.
  • Nobody expects it. Milliseconds are integers. Always.
So why in the world do you do this? Simple. You're working with 25-year-old technology that can't store 64-bit integers. Therefore, if your data type might be a 64-bit integer, you have to store it as a double precision floating point number (because your data structure actually can support 64-bit values, just not integer ones).

But here's the kicker: because they're actually milliseconds since midnight, there are only 86,400,000 possible values. Therefore, you can cleanly put them into a 32-bit integer, which your data structure supports.
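To make that concrete, here is a minimal Java sketch of what every client ends up doing with one of these fields. It assumes the value has already been pulled out of the RMDS message as a Java `double` (how you extract it from the RMDS API isn't shown), and the class and method names are made up for illustration. Since every integer below 2^53 is exactly representable as a double, a whole-millisecond value survives the trip; the NaN check, rounding, and range check are purely defensive.

```java
import java.util.Locale;

// Hypothetical helper for a milliseconds-since-midnight value delivered as a double.
final class MillisSinceMidnight {
    // 24 h * 60 min * 60 s * 1000 ms = 86,400,000, far below Integer.MAX_VALUE (2,147,483,647).
    static final int MILLIS_PER_DAY = 24 * 60 * 60 * 1000;

    // Convert the double-encoded field to the int it always should have been.
    static int toIntMillis(double rawMillis) {
        if (Double.isNaN(rawMillis)) {
            throw new IllegalArgumentException("NaN is not a time of day");
        }
        long rounded = Math.round(rawMillis);
        if (rounded < 0 || rounded >= MILLIS_PER_DAY) {
            throw new IllegalArgumentException("not a millisecond-of-day value: " + rawMillis);
        }
        return (int) rounded;
    }

    // Render as HH:mm:ss.SSS using plain integer arithmetic.
    static String format(int millisOfDay) {
        int hours = millisOfDay / 3_600_000;
        int minutes = (millisOfDay / 60_000) % 60;
        int seconds = (millisOfDay / 1_000) % 60;
        int millis = millisOfDay % 1_000;
        return String.format(Locale.ROOT, "%02d:%02d:%02d.%03d", hours, minutes, seconds, millis);
    }
}
```

For example, a raw value of `34_200_123.0` converts to the int `34200123` and formats as `09:30:00.123`, and anything at or beyond 86,400,000 is rejected, since no valid time of day can reach it.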

Meaning, you're doing this because you suck and you want to make my life miserable.

By the way, so that this is more googleable, the actual fields are SALTIM_MS, QUOTIM_MS, TRDTIM_MS, and TIMCOR_MS.

Friday, August 22, 2008

Tiny Java Containers: None Of Them Do What I Want

Dear Lazyweb,

Although I find it difficult to focus on programming while the BBC is broadcasting Olympic Table Tennis live, focus I must. But I have a conundrum: I want a Java code container, but I don't like any of the current approaches that I'm familiar with. Please help.

Yes, I've looked into Spring and Pico/NanoContainer and all that, but their focus appears to be entirely below the level I'm looking at. They appear to be designed to run inside some type of application, rather than to launch it (so you configure Spring to start within your Web Application: you don't start Spring at the command line). Some of my problems have to do with packaging and deployment of multi-jar projects, and as far as I can see, the IoC containers don't work at that level.

Plus, while I totally buy IoC as a general principle and use it on a regular basis, I don't need a generic configuration file to do it, particularly not in test cases. If I want a configuration file for a particular context, I bloody write one. It exactly represents my configuration options, is typesafe, has a schema, and can be understood by not-me for later configuration. A configuration file that is capable of representing the entire universe of Java types and construction options is essentially programming in XML. For people who seem to hate J2EE, the IoC-container-crowd seem to have adopted its worst tenet: replacing simple, clear, short snippets of source code with a monstrous amount of generic XML.
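For comparison, here is the kind of configuration-in-code I mean, as a minimal sketch with hypothetical names (`MessageFeed`, `TaggingProcessor`, `TestWiring`): the processor receives its collaborators through its constructor, and a test wires it up in a couple of typesafe lines with no container and no XML.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator: anything that can hand the processor a batch of messages.
interface MessageFeed {
    List<String> poll();
}

// Hypothetical processor: its dependencies arrive through the constructor (plain IoC).
final class TaggingProcessor {
    private final MessageFeed feed;
    private final List<String> sink;
    private final String tag;

    TaggingProcessor(MessageFeed feed, List<String> sink, String tag) {
        this.feed = feed;
        this.sink = sink;
        this.tag = tag;
    }

    // Process one batch; returns how many messages were handled.
    int runOnce() {
        List<String> batch = feed.poll();
        for (String message : batch) {
            sink.add(tag + ":" + message);
        }
        return batch.size();
    }
}

// A context-specific "configuration": ordinary code the compiler checks for you.
final class TestWiring {
    static TaggingProcessor processor(List<String> canned, List<String> sink) {
        return new TaggingProcessor(() -> new ArrayList<>(canned), sink, "test");
    }
}
```

A production wiring class would look the same, just passing in a real feed instead of a lambda over canned data.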

But I digress.

Anyway, here's what I want:
  • Packages up a set of jar files as a single classloading context. This is pretty important because I think a big problem with modern composed Java applications is assembling all your various jar files and dealing with diamond dependencies. But anyway, I want something that I can pass around and say "this is the code for my application."
    Yes, I know OSGi is a better solution for this, but I'm not there yet, and I won't necessarily be there for quite some time in terms of OSGi-ifying everything I've got. Bundle of JAR files I can handle.
  • I want to be able to load several of those jar bundles at once in the same JVM.
  • I want to be able to run static methods (maybe even main itself) with arguments. Heck, it's my code, I'll even adhere to some type of contract (whether explicit interface or implicit through annotations) to give me better lifecycle support.
  • I want to be able to run lots of instances of those static methods (think message processors, each instance on a different JMS Topic). I want that to be dynamic (so I have one service kicking off others).
  • I want to scale those instances on multiple machines. I'd like that to be automatic.
  • I want to automatically handle failures of machines by distributing the load on that machine to other machines.
  • I want lots and lots of monitoring and administration.
  • I don't want to write a single line of code that I cannot adequately unit/functional/integration test outside of the container.
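The first few bullets can be roughly approximated with a plain `URLClassLoader` per bundle. This is a hypothetical sketch (`JarBundle` and `HelloEntry` are illustrative names, not an existing library): each bundle gets its own loader, so several can coexist in one JVM, and static methods taking a `String[]` can be invoked by reflection. Note that `URLClassLoader` delegates to its parent first, so this does nothing about diamond dependencies on jars that are also on the parent classpath; that is exactly the part OSGi actually solves.

```java
import java.io.File;
import java.io.IOException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

// Hypothetical "bundle of jars": one class loader per bundle, so several bundles can share a JVM.
final class JarBundle implements AutoCloseable {
    private final URLClassLoader loader;

    JarBundle(List<File> jars, ClassLoader parent) throws MalformedURLException {
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < urls.length; i++) {
            urls[i] = jars.get(i).toURI().toURL();
        }
        // Parent-first delegation: classes visible to `parent` win over copies inside the jars.
        this.loader = new URLClassLoader(urls, parent);
    }

    // Invoke a public static method taking String[] (main, for instance) on a class in this bundle.
    Object invokeStatic(String className, String methodName, String... args)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className, true, loader);
        Method method = cls.getMethod(methodName, String[].class);
        if (!Modifier.isStatic(method.getModifiers())) {
            throw new NoSuchMethodException(className + "." + methodName + " is not static");
        }
        // Cast to Object so the array is passed as the single argument, not spread as varargs.
        return method.invoke(null, (Object) args);
    }

    @Override
    public void close() throws IOException {
        loader.close();
    }
}

// Stand-in entry point for the example; in real use it would live inside one of the jars.
final class HelloEntry {
    public static String greet(String[] args) {
        return "hello " + String.join(",", args);
    }
}
```

Everything past the second bullet (scaling, failover, monitoring) is the part no class loader trick gives you.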

What I've been tempted to do in this situation in the past is to just use Tomcat as my generic container, and model everything as a Servlet with no Servlet mapping, so I'm only getting the lifecycle calls that I need and use of web.xml initialization parameters as my configuration. But that just, well, it feels dirty. Plus, it would take longer than I think would be reasonable to launch one of these processor instances, and if I'm running thousands of discrete processor instances, I need control to be quick.

My guess is that there isn't something like this out there.

My guess is that I'm about to write one. I'd rather not, but I don't think that this type of thing exists. If it does, it might be the SpringSource Application Platform, but I can't actually tell what that does to be honest. I mean, it kinda just looks like OSGi++, but I can't tell what the ++ is. (Is it really just Equinox with some extra bits for central repositories and remote configuration? Is that all?)

My guess is that the actual eventual solution here is largely just going to be my leveraging Terracotta for all my distribution and failover requirements, but writing the whole actual lifecycle management work myself.

Oh, and it has to do Java. Telling me Erlang/OTP gives you every single thing here for absolutely free doesn't help me when I'm under a strict Erlang embargo at work.

Help me Lazyweb, Help me!

Monday, August 18, 2008

AMQP: Involve End-User Developers

I had a chance to sit down at a pub (one of the fabulous network effects of having tech clusters, even if most of us in London are working for financial firms rather than software firms) with Alexis Richardson of CohesiveFT, one of the firms responsible for the RabbitMQ AMQP broker, and we thrashed out some of the issues and concerns that I've had with the current AMQP 0.10 draft and what they might mean for the eventual 1.0 release. This is the first post on some of the subjects that we discussed.

The first thing that the two of us were completely in agreement on was that the AMQP working group should have some mechanism for people like me to get involved. What do I mean by "people like me?" Well, I don't work for a vendor, but I am conversant with MOM products and standards, and I write applications that make use of MOM technology as part of my job. I don't have time to act as a full working group participant, but I have some bandwidth that I could use to provide feedback on certain areas of the spec. This, to me, would be a pretty useful thing, because vendor-heavy standards bodies tend to produce specs that most strongly correlate with what is in the interests of the vendors, rather than the developers.

Furthermore, the non-vendor group (the User SIG, which disbanded after the Business Requirements document was produced and is now part of the Management SIG, which appears to be almost nothing but vendors at this point) seems to be really fixated on some problems that I don't believe the vast majority of MOM-based applications face. True, there are quite a few B2B MOM systems, where MOM infrastructure from one firm communicates directly with that of another firm through an automatic mechanism. But that's not most applications that I've seen, and the Business Requirements document's fixation on inter-firm B2B MOM is, at best, superfluous to most MOM applications. Most MOM applications never leave the firm.

So how do I get involved with AMQP? I can't see an easy way other than ranting on a blog that nobody reads (except when Google Alerts happens to flag a post to someone) and hoping that people find me and ask me or challenge me. That doesn't seem right to me. I don't know what the solution is, and maybe blogging is the solution, but it seems like there's got to be a better way for me to get more involved without going as far as fully joining a SIG, which I don't really have the time for.

After this, Alexis alerted me to a blog post by Guy Crets explicitly asking whether AMQP would be worthwhile even if all it does is help individual internal-software developers write better applications and it never gets as far as the B2B federation issue. Yes! Yes, yes, yes, a thousand times yes! Moreover, it needs to do that in order to get the type of momentum to even reach the federation stage.

Standards like this realistically have two ways of getting adoption: top-down or bottom-up. The top-down approach is that a bunch of vendors get together, write a spec, develop products that adhere to the spec, and then ram it down the throat of my boss, who tells me to use it. Then you make sure that every vendor is on board, and you have the combination of management pressure and vendor support to drive adoption. And you had better make sure that my manager has the ability to force technology on me, which usually means I would be working for a rubbish organization using Visual Basic or something.

The alternative is to get the people actually writing messaging applications based on the spec from an early stage. That means that you have to engage the early adopters, and not only the standard "my first app using X" blog post just to hit the front page of Proggit or something, but the people who are doing real development. Get so much code out there that works with the standard that it becomes de facto instead of de jure, and then everything flows from there.

I think you can tell which I prefer, but to get there, you have to be quite open that you want developers, even those who aren't running the NYSE or FedEx or Safeway or something like that, on board early. Given some of the comments on some of my blog posts, I don't think the AMQP working groups necessarily have that mind-set. But it's critical. I don't support a monstrous B2B automatic logistics tracking system. I work on asynchronous messaging in a financial services context. But there are a heck of a lot more of me than there are Walmart-level procurement applications, and a heck of a lot more broker instances at my scale.

Find a way to get application developers absolutely thrilled, and everything else will follow. Make me happy.

Wednesday, August 13, 2008

Next Open Protocol Failure: Perforce

I've been using Perforce for years (professionally since early 1998), and I like it. It's a great product. Not perfect, but the combination of the server, the community, and the tools is pretty hard to beat from the perspective of a centralized SCM system. But it's not the end-all, be-all that it once was (when the only real alternative was CVS, which really just needs to be put out of its misery for good).

Subversion, for example (originally started as a Perforce replacement according to the Collab.net people that I spoke with at the time, as I was working on my Open Source Database startup), has come along quite a bit. It's a pretty good system, and you could in theory replace Perforce with Subversion for a lot of uses. I use it for non-commercial work, and it's a fine system.

Perforce realizes that part of the real benefit of an SCM system is that you can build a development ecosystem around it, with alternative GUIs and Continuous Integration systems and bug tracking and automated scripts and all kinds of interesting tidbits designed to save engineers time and encourage proper development methodology. That's great.

But there is a whole category of people building systems that Perforce absolutely, positively, supremely refuses to support: Java developers.

There are a lot of people making tools that attempt to integrate with Perforce but using Java: Hudson (Continuous Integration), Ant (build tools), Maven (build tools), JIRA (bug tracking), FishEye (SCM introspection), Crucible (Code Reviews), Bamboo (Continuous Integration), Eclipse/NetBeans/IntelliJ (IDE). All of them are Java. All of them have very valid reasons to integrate with Perforce. And Perforce Software categorically refuses to help them in any way. Well, no, that's not entirely true, Perforce finally realized that they couldn't refuse to support Eclipse anymore and the existing p4eclipse plugin wasn't that great, so they released P4WSAD specifically for that.

Perforce supports a C++ client library, which they build for you on every platform they support (which, to be fair, is pretty much any platform you're likely to want to develop on these days), and they've built wrappers on top of it for the scripting languages that usually just wrap C/C++ libraries: Python, Perl, Ruby. But there's no Java there, and there never has been. And while in theory you could JNI/SWIG-wrap the C++ API, nobody who is realistic about supporting Java applications does that without a pure-Java fallback if they can possibly avoid it. It's just a world of fail down that path.

So what all of these tools do is just wrap the p4 executable and run that. Heck, that's what P4WSAD appears to do. And to be honest, that's fine for occasional use, but it's not really ideal for long-running processes that have to kick off a massive number of p4 interactions. Is that really the best they can do? And the existing Java libraries for wrapping the p4 binary really aren't that great and have some serious performance implications. Sure, you could make them better, but the generic problem is that shelling out to a subprocess, executing a simple command, and then parsing a bunch of textual results just isn't ideal. What you want is something that natively speaks the protocol in your programming language of choice. And for that, you need the protocol.
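For the record, this is roughly what every one of these wrappers boils down to. A minimal sketch, assuming `p4` is on the `PATH` and relying on the `-ztag` tagged output format, where each field is printed on its own line as `... name value` and records are separated by blank lines. The class name is hypothetical, and the parser deliberately ignores continuation lines of multi-line values (such as changelist descriptions) and nested `... ...` fields, which real output can contain.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper: fork `p4 -ztag ...` and scrape its text output. One process per command.
final class P4Shell {
    static List<Map<String, String>> run(String... args) throws IOException, InterruptedException {
        List<String> cmd = new ArrayList<>(Arrays.asList("p4", "-ztag"));
        cmd.addAll(Arrays.asList(args));
        Process proc = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        List<Map<String, String>> records;
        try (Reader out = new InputStreamReader(proc.getInputStream(), StandardCharsets.UTF_8)) {
            records = parseTagged(out);
        }
        int exit = proc.waitFor();
        if (exit != 0) {
            throw new IOException("p4 exited with status " + exit);
        }
        return records;
    }

    // Turn "... key value" lines into one map per blank-line-separated record.
    static List<Map<String, String>> parseTagged(Reader input) throws IOException {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(input);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                if (!current.isEmpty()) {
                    records.add(current);
                    current = new LinkedHashMap<>();
                }
                continue;
            }
            if (!line.startsWith("... ")) {
                continue; // not a tagged field (error text, continuation line, ...)
            }
            String rest = line.substring(4);
            int space = rest.indexOf(' ');
            String key = space < 0 ? rest : rest.substring(0, space);
            String value = space < 0 ? "" : rest.substring(space + 1);
            current.put(key, value);
        }
        if (!current.isEmpty()) {
            records.add(current);
        }
        return records;
    }
}
```

Every call pays for a process launch plus a text parse, which is exactly the overhead a documented wire protocol would remove.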

The core of the problem here is that Perforce refuses to document their binary protocol in any form. This means that if you want to write a purely native client for a Perforce server in any language that isn't C++, you're stuck. The only supported integration path is to link against their C++ client library, and if you don't want to have to deal with native dependencies in all your various programming languages, you're completely out of luck: you have to wrap the p4 binary. You want Erlang, Haskell, Jython, IronPython, JRuby, F#, OCaml, Lisp, Scheme, whatever, you're out of luck. You have to wrap the p4 binary.

From the API, it appears that their protocol is almost stupidly simplistic: send over a list of strings (the command and its arguments); server blasts back commands you should execute. Is your critical intellectual property wrapped up in such a simple protocol? One that I could packet-sniff and reverse engineer in an afternoon (while of course violating my license agreement). Really? That's how you're going to stop the advance of open source alternatives?

If I worked at Perforce, I would give up the whole thing and document my wire protocol. I might not make it completely open. I might open it up to key people initially under NDA (short-term fix), but I sure wouldn't force everybody in the world who wants to help my ecosystem grow, but doesn't use my favorite programming languages, to go through such serious efforts to wrap a stupid platform-specific application. I'd be bending over backwards to make people want to pay me money for what I make money on, which is the p4d server application. Anything that keeps people paying me money is great. Anything that keeps people paying me money while actually reducing the number of things I have to provide them with to keep them paying me money is even better.

It's just like my SonicMQ problems. Closed binary protocols are bad if you ever want to code in a programming language your vendor doesn't really like.

Tuesday, August 12, 2008

Yegge Minimalism

I'm verbose, but not this verbose: Don't gather business requirements: hire domain experts.

Yep, whole post comes down to one sentence.

Oops, sorry, insert mental Spoiler Alert above.

Friday, August 08, 2008

Google Protocol Buffers and Streaming

Okay, I'm not going to give some huge rundown on RPC or services or network size vs. CPU efficiency. Just a little observation which gave me a little bit of a "huh?" moment the first time I used GPB for something.

Essentially, the Google Protocol Buffers encoding can be seen as a series of commands for a small stack-based state machine that runs inside a Builder. The Builder holds the essential state of a particular message representation (such as an Address or AddressBook or something), and runs through the bytes in your wire representation, modifying its current state for the desired fields. When you think you've consumed everything, you then extract your Address or AddressBook or whatever from the Builder.

The commands of this state machine are pretty simple:
  • Set the value of Field Number X to value Y (encoded with type Z)
  • Push a new context onto the stack for a sub-Message
  • Pop the context off the stack to go back to the parent message
It's quite clever, but it can lead to some interesting situations. For example, it's entirely valid for a non-repeated field to be set multiple times, so if you're interpreting the network commands, you can get "Set Name to Kirk Wylie" followed by "Set Name to Wylie, Kirk", and (at least in the implementations that I worked with) you get the final value set to "Wylie, Kirk".
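Here is a hand-rolled sketch of that last-value-wins behavior, using only the documented wire format rather than the protobuf library: each field starts with a varint tag of `(fieldNumber << 3) | wireType`, and wire type 2 means a varint length followed by that many bytes. The `LastWinsDemo` class is hypothetical and only understands wire type 2; the point is that writing field 1 twice and decoding the concatenation yields only the second value.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical mini-codec for length-delimited (wire type 2) fields only.
final class LastWinsDemo {
    static void writeStringField(ByteArrayOutputStream out, int fieldNumber, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (fieldNumber << 3) | 2); // tag: field number plus wire type 2
        writeVarint(out, bytes.length);           // length prefix for this field only
        out.write(bytes, 0, bytes.length);
    }

    // Base-128 varint: 7 bits per byte, high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static int readVarint(byte[] buf, int[] pos) {
        int result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos[0]++];
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Apply every "set field" command in order; a repeated field number simply overwrites.
    static Map<Integer, String> decodeStrings(byte[] buf) {
        Map<Integer, String> fields = new HashMap<>();
        int[] pos = {0};
        while (pos[0] < buf.length) {
            int tag = readVarint(buf, pos);
            if ((tag & 0x7) != 2) {
                throw new IllegalArgumentException("sketch only handles wire type 2");
            }
            int len = readVarint(buf, pos);
            fields.put(tag >>> 3, new String(buf, pos[0], len, StandardCharsets.UTF_8));
            pos[0] += len;
        }
        return fields;
    }
}
```

Writing "Kirk Wylie" and then "Wylie, Kirk" to field 1 and decoding the bytes leaves field 1 holding "Wylie, Kirk".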

That seems like a little piece of trivia, until you realize that Google Protocol Buffers, unlike every other network message representation that I've ever worked with, lack both a sizing prefix and a terminator. Remember, it's a state machine, so it's just going to keep processing.

Again, trivia.

Until you try to store a sequence of discrete messages into a file or over a socket. In which case, what will end up happening is that if you don't explicitly do your own termination or size prefixing, the Builder will just keep processing commands and you'll end up consuming the entire stream and get one message output with only the final values for each field. So if I'm trying to save two Address messages, the first having a name of "Kirk Wylie" and the second having a name of "Wylie, Kirk", I'll only get one output, with "Wylie, Kirk".
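The usual fix is to frame each message yourself. A minimal sketch using a varint length prefix (the same framing that current protobuf-java releases use in `writeDelimitedTo`/`parseDelimitedFrom`); `DelimitedFraming` is a hypothetical helper that deals in raw byte arrays, which with generated classes would come from `toByteArray()` and be handed to `parseFrom(byte[])`.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical framing helper: [varint length][payload] repeated.
final class DelimitedFraming {
    static void writeDelimited(OutputStream out, byte[] payload) throws IOException {
        int n = payload.length;
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        out.write(n);
        out.write(payload);
    }

    // Returns the next payload, or null at a clean end of stream.
    static byte[] readDelimited(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null;
        }
        int len = 0;
        int shift = 0;
        while (true) {
            len |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
            if (shift > 28) {
                throw new IOException("malformed length prefix");
            }
            b = in.read();
            if (b == -1) {
                throw new EOFException("truncated length prefix");
            }
        }
        // This is the extra copy: the message bytes land in a scratch array before parsing.
        byte[] payload = new byte[len];
        int off = 0;
        while (off < len) {
            int r = in.read(payload, off, len - off);
            if (r == -1) {
                throw new EOFException("truncated message");
            }
            off += r;
        }
        return payload;
    }
}
```

Reading returns each payload in turn and `null` once the stream is exhausted, so two saved Address messages come back as two messages.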

This also has the side effect, in Java, of forcing you to do an unnecessary byte copy. You have to read the byte-count prefix for the following message (and computing that count before you serialize costs you CPU time in the first place), copy the next N bytes from the stream into a byte array, and then have your Builder parse the byte array.
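One way to avoid the copy is to hand the parser a bounded view of the stream instead of a byte array. Generated messages' `parseFrom(InputStream)` reads until end of stream (and may buffer ahead), so the view has to report end-of-stream exactly at the message boundary. `BoundedInputStream` is a hypothetical name for this sketch: it caps reads at the length taken from the prefix and never closes the shared underlying stream.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: a view that reports end-of-stream after `limit` bytes, without copying.
final class BoundedInputStream extends FilterInputStream {
    private long remaining;

    BoundedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        int b = in.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = in.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(in.available(), remaining);
    }

    // Leave the shared stream open: the next message still has to be read from it.
    @Override
    public void close() {}

    // Mark/reset would desynchronize the byte count, so it is not supported.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void mark(int readLimit) {}

    @Override
    public void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }
}
```

With generated classes, the read loop would then be: read the length prefix, wrap the stream in a view of that length, and call `parseFrom` on the view rather than on a copied byte array.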

All annoyances more than anything else. But probably useful for other people to know.