Tuesday, September 30, 2008

Interop News on AMQP: Huh?

Sorry, I don't usually call out other writers, but this article makes no sense whatsoever. I mean, m4d pr0p2 for giving a shout-out to AMQP, but WTF?

Some random stuff in there:
  • Right off the bat, the author (Jeff Gould) calls out Rendezvous as a guaranteed message delivery system. But that's actually the opposite of Rendezvous's raison d'etre. Rendezvous is best as a super-low-latency tick distribution system where its default best-effort semantics are good enough, not a guaranteed message delivery system. That's because it's an inherently distributed system. Yes, there are advanced features of Rendezvous that make it nearly guaranteed, but they're pretty ugly to be fair.
  • He conflates bridging with heterogeneous client connectivity. In general, the JMS API-driven approach most vendors take to interoperability actually is good enough, because there's going to be something that has to bridge the two systems no matter what, and many JMS-based systems allow for automatic bridging from a broker to another system quite easily. The fact that the code wasn't all written by one vendor (the broker by Vendor A, the client library that talks to the other broker by Vendor B) isn't a massive problem.
    Furthermore, most of the time when you have such an interop problem, someone has to write some bloody code to actually implement all the tiny little differences between the two systems (such as ID issues: if both companies are using the same software, there's probably a conflicting ID namespace somewhere that requires translation). That involves a bridge.
    Imagining that there will be a world where two companies running two ever-so-slightly different versions of the same software package will be able to just exchange messages willy-nilly and all will be well and the programmers can all just go home is rubbish.
  • He assumes that AMQP is a replacement for the two major products he directly calls out: IBM MQ (just calling it that since I will always think of it as MQSeries) and Rendezvous. And then he refers to Sonic. But here's the thing: none of the three has much of anything to do with the others, and each is used in a pretty different space.
    • IBM MQ is largely used in older systems (think flat-out legacy). It's big, it's slow, it runs on mainframes, and I don't know of a single person doing work similar to mine who's actually deployed it by choice in ages. And you know what? AMQP will do absolutely 0 to that market. It's very legacy and very entrenched, and IBM has absolutely no incentive to change that and never will.
    • Rendezvous is a completely decentralized MOM system ideally suited to low-latency best-effort systems (tick distribution, enterprise system monitoring). It does something completely different to IBM MQ, and you would never replace it with MQ. And it has competitors (29 West is my favorite to give a call-out to).
    • SonicMQ and other pure-play JMS vendors (Fiorano, Tibco EMS, ActiveMQ, ...) have a broker-based model with both guaranteed delivery and best-effort built in. They're a superset of both the IBM MQ and Rendezvous models, with some special sauce worked in from having at least a marginally intelligent approach to broker-based middleware.
The point of my third bullet up there is that JMS attempted to define a semantic layer inside the application that you can use to develop broker-based messaging. You couldn't actually use Rendezvous underneath the JMS layer, because you'd end up having to throw a broker in the middle (Rendezvous can't support the JMS semantic model), and then you'd just end up with Tibco EMS at the end of the day. But you can do best-effort delivery in JMS if you're willing to put a broker in the middle.
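
For what it's worth, here's a minimal sketch of what I mean by best-effort delivery through a broker in JMS. This is vendor-neutral JMS 1.1; the connection-factory lookup is stubbed out because it's vendor- or JNDI-specific, and the topic name is made up:

// Minimal sketch: best-effort delivery through a JMS broker.
// The connection factory would come from JNDI or your vendor's API
// (ActiveMQ, EMS, SonicMQ, ...); it's stubbed out here.
import javax.jms.*;

public class BestEffortPublisher {
    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory = lookupConnectionFactory(); // vendor-specific
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("ticks.VOD.L");      // made-up topic name
        MessageProducer producer = session.createProducer(topic);
        // NON_PERSISTENT is the best-effort knob: the broker doesn't write the
        // message to disk, so a crash or an overloaded consumer means it's gone.
        producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT);
        producer.send(session.createTextMessage("VOD.L 142.35"));
        connection.close();
    }

    private static ConnectionFactory lookupConnectionFactory() {
        throw new UnsupportedOperationException("vendor/JNDI specific");
    }
}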

AMQP does a similar thing at the network protocol layer: it defines a semantic layer that assumes a centralized broker-based middleware system and wraps a network protocol around that. It really competes more with the JMS pure-plays than with either IBM MQ or Rendezvous. But I don't think that, as a protocol, it provides a compelling reason to shift for people who actually know and want to be on a distributed system like Rendezvous, and people on IBM MQ are there because they have to be.

None of this is to say that the core of his message isn't correct: AMQP is a great concept for a variety of reasons. But his introduction sets him up for an argument he didn't need to make.

I still stand by my original point:
  • Vendors cannot support every possible programming language, OS, chip, or compiler with a closed/proprietary client driver.
  • Experts in each programming language are better placed to do that, and can provide open source versions.
  • This naturally leads to potential interoperability gains.
That's the eventual victory of AMQP.

RabbitMQ at Google UK

Turns out the RabbitMQ guys have been busy, and Alexis and company went over to Victoria and gave a talk to the Google UK guys. Can't wait for the video to show up!

Monday, September 29, 2008

My Wireless Network SuX0R2

Dear Lazyweb,

Please help me get my wireless network at home working better. I am absolutely at my wits' end.

The basic setup is:
  • We have a NetGear 834G acting just as a router (no wireless on the box, it's just a DSL modem/router and DHCP server)
  • That connects to a Belkin gigabit switch
  • Connected to that are all the wired hosts (e.g. a desktop machine, my NetGear/Infrant NAS box)
  • As well as the TimeCapsule which is acting as a wireless base station
  • Wirelessly connecting to the TimeCapsule in a WDS configuration (with no local connections) is an AirPort Express, which has a wire coming out of it to bridge (I kid you not) the 10 feet over which I'm not allowed to run a cable, so that the
  • Sonos receiver can participate in the network.
Here's the deal: I'm dropping packets like mad, and have super-delayed packets. This makes a lot of connections pretty much unusable, and is really starting to get on my nerves, because it doesn't happen with the same computer setup in other locations (like our zoo, which has a cheap-ass BT-provided broadband modem/router/wireless access point, but which works perfectly).

And the packet sequence also involves, quite strangely, some dropped packets and then some packets that are delayed for an excessive period (so I'll drop like 5 ping packets, then it'll just stall, and then all of a sudden I'll get a massive flow, some delayed by up to 10 seconds). To give you an example of what the ping traffic looks like (192.168.0.15 is the NAS box, hooked up directly to the gigabit switch), take a look at this very representative ping sequence (no drops, just the strange delay):

Macintosh-4:~ kirkwylie$ ping 192.168.0.15
PING 192.168.0.15 (192.168.0.15): 56 data bytes
64 bytes from 192.168.0.15: icmp_seq=0 ttl=64 time=1.545 ms
64 bytes from 192.168.0.15: icmp_seq=1 ttl=64 time=0.792 ms
64 bytes from 192.168.0.15: icmp_seq=2 ttl=64 time=7481.797 ms
64 bytes from 192.168.0.15: icmp_seq=3 ttl=64 time=6484.930 ms
64 bytes from 192.168.0.15: icmp_seq=4 ttl=64 time=5485.776 ms
64 bytes from 192.168.0.15: icmp_seq=5 ttl=64 time=4486.154 ms
64 bytes from 192.168.0.15: icmp_seq=6 ttl=64 time=3483.138 ms
64 bytes from 192.168.0.15: icmp_seq=7 ttl=64 time=2483.769 ms
64 bytes from 192.168.0.15: icmp_seq=8 ttl=64 time=1483.877 ms
64 bytes from 192.168.0.15: icmp_seq=9 ttl=64 time=484.237 ms
64 bytes from 192.168.0.15: icmp_seq=10 ttl=64 time=0.657 ms
64 bytes from 192.168.0.15: icmp_seq=11 ttl=64 time=0.674 ms

During the period between seq 1 and seq 2 arriving, nothing seems to be coming in over the wireless connection at all, for any application. But note that all the packets are blocking somewhere and then flowing through within a few milliseconds of each other. This is quite common.

All these dropped packets and packet resends are leading to a pretty horrible internet experience, because Web 2.0 sites with lots of background AJAX especially don't like having half of their AJAX connections fail or time out or whatever.

Running Wireshark doesn't show anything out of the ordinary.

I get the same results if I ping the TimeCapsule itself.

And on average, I'm dropping like 20% of packets. That's pretty poor all around.

During all this, I have pretty much perfect signal/noise, and that doesn't really fluctuate or have any correlation that I can tell with the packet drop or the packet delay.

Things I've tried:
  • I had an old AirPort base station (pre-draftN), and used that. No love.
  • I've tried 6 different uncongested channels (including low ones like 3). No love.
  • I've tried plugging the TimeCapsule into the router's built-in switch. No love.
  • I've tried turning off the AirPort Express, thinking it might be the WDS doing it. No love.
  • I've tried upgrading the firmware of every single thing that I can find. No love.
  • I've tried the same client systems (laptops and my iPhone) on other wireless networks to see if it's them. They work perfectly. No love.
  • I've tried putting my laptop literally right next to the base station. No love.
Dear Lazyweb, what in the world should I be trying next? I can't retrofit my entire flat with cables everywhere we want the InterTubes, so how in the world do I get past this current impasse? I mean, the obvious thing would be to assume that there's a fault in Apple's products, but my love of all things Jobsian means that I find it very difficult to accept that. I mean, if it is, it is, but I really don't want to have to go out and buy another access point just to prove that an Apple product isn't perfect.

If nothing else, the whole thing is getting embarrassing, because my significant other is starting to believe that my technical prowess might be limited after all. This makes me look pretty darn bad.

Kirk

P.S. Yes, with a cable, it all works just perfectly. It's only wireless that's a problem.

Friday, September 26, 2008

Technical Recruiters Need To Wake Up

In case any of you missed it, the financial markets are kinda imploding these days. And there are a lot of companies failing. This means that there's a fair number of people looking for jobs, and not that many places on offer.

This has recruiters going positively apeshit as they try to find some way to earn money fast so that they don't also lose their jobs. This is troublesome for them, as most recruiters aren't actually very smart at all. So many of them are cold-calling people who are still employed, with no actual job to pitch, just trying to find out whether they're looking. Which is pretty horrible. And the big firms that are looking to hire people from the likes of Lehman are using their own internal channels and doing it in bulk at the moment, and they don't want CVs from you.

So if you're a financial technology recruiter, you're pretty hosed at the moment, and trying to do anything just to survive.

But if you're a recruiter who's actually smart enough to google me before picking up the phone, lemme give you a few little secrets about what it's like to actually work in technology in Financial Services in London, so that hopefully you won't come across quite as uninformed as you actually are.

First of all, we all work in great big open plan spaces. Techies usually have a little more room than traders (unless you're (un)fortunate enough to actually work on the trading floor itself), but we're packed in pretty tight as these offices are pretty expensive and you want to maximize your utilization of the space. That means that every single person around you can hear every single word you say. There is no privacy in a financial services company unless you plan for it in advance.

Even worse, if you're calling me on my desk phone and I actually work on the trading floor itself, I have a recorded line. Do you people even understand that? I've told recruiters I'm on a trading floor with a recorded line, and the idiots won't shut the hell up. You people have never worked in such an environment, but on these systems virtually anyone can listen in to any line they want without anybody knowing. There is no privacy, expected or actual, on any of these lines. If I'm on one, all I want to do is get you to shut up as fast as I possibly can, in the politest possible way, so that if someone picks up the line they don't hear me talking to a recruiter.

I was once in New York on business, on a trading floor, and a recruiter called me from London. Our New York office is really strict: no cell phones on the trading floor. That means that if you call me on a cell phone, I either have to get you off the phone right away, or I have to leave the floor, which means everybody knows it's a personal call, and since it won't sound to others (remember: no privacy) like I'm talking with a friend or family member, it sounds dodgy. I said "I'm actually in New York at the moment and on a trading floor, so I can't speak. Can you send me an email?" and the idiot just kept blathering on and on about something positively stupid, and after a trader actually pointed at the cell phone in my hand, I just hung up on him.

Moreover, many of us don't like talking on the phone at all. You do. I get that. I fully understand that you spend all day on the phone and like it. We don't. Most techies do a lot of work over email and IM and other non-spoken communications mechanisms, in part because of the interruption effects (you can control when you actually are focusing on IM and email; you can't control face to face spoken communications). I am supposed to pick up my phone when it rings, because it might be someone actually important. It's you. I'm busy. I don't want to talk. Just email.

I can respond to emails at my leisure when I'm not busy. Phone calls I can't. And if you call and I'm in flow, I have about 10 seconds to get you off the line without coming across like a complete asshole, or else I lose flow. And if you ever make me lose flow, I will hate you to the end of my days, and I will point you at the most braindead people I've ever worked with so that trying to recruit them makes you look like a right moron and gets you sacked. And don't think I'm bluffing. I've done it.

What all of this means is that you really shouldn't ever try to pick up the phone to me. Pretty much ever in fact, but never ever ever should you cold-call a financial techie during work hours. [Don't want to talk to people out of hours because you'd rather be down the pub with your mates? Not my problem. I'm not the one trying to convince gainfully employed people to go somewhere else for employment. You are. Suck it up or quit being a recruiter.]

What's more, if someone indicates to you in the opening gambit that they work on a trading floor (as I've done), here's what you should do:
  1. Shut up. Immediately. That's your sign to just shut your bloody trap. It means that the person you're calling is telling you in a polite way that they can't talk, at all.
  2. Ask for an email address (though ideally you've already got one).
  3. Send your contact details to their email address.
  4. Iff they want to talk to you, they'll call you. If not, ideally they'll politely email you "thanks but no thanks."
Deviation from this indicates you really don't get it, and it's not worth my time to speak with you at all.

Mind you, a recruiter who actually gets it would be quite surprising. I've never met a single recruiter in London who's actually worked as a technologist. At best, I've found some who actually understand what we do and how we do it. They're few and far between. Prove you're one of them before you try to talk to a techie. Because when we sense you're just another CV hunter trying to justify your existence in a super-tough job market, we're just going to tune you out.

Note that in all of this I try to be polite to recruiters in every interaction. That's because if I ever actually am looking for a job, I don't want a note in my record in your database saying "is a total asshole" to screw things up for me.

But no matter who I am, right now, if I've got a job and I'm not working at an affected institution (which for the record I'm not, although things are changing every day), chances are pretty darn good I'm not looking for another one, so your cold calling has a pretty low chance of working. So just accept that I'm going to be polite while declining your opportunities, and leave it at that, okay?

Friday, September 19, 2008

Outlook Cannot Exit; Exit Outlook?


Wanna know the funniest thing?

After hitting "End Now" Outlook was running just fine.

(via my manager, who actually runs Outlook)

Solaris 10: A Terrible Choice For Java Continuous Integration

As some of you might know, I'm not permitted to run Linux at work. Rather, I have my choice of Windows or Solaris 10 x86. (And no, I'm not allowed to run OpenSolaris; it's boring old Sol 10.) I do a lot of Java. I love Continuous Integration. I started pushing Bamboo here in-house. And lo and behold, I converted the masses, and we ended up merging projects from CruiseControl.NET, CruiseControl, and Hudson onto one lovely Bamboo instance. We ended up with 110 build plans in Bamboo, all but about 10 of which (dependent and overnight plans) were hitting Perforce all the time to determine whether they should build.

And then things went truly, truly, horribly wrong, to the point of my almost rescinding my Atlassian Shill status. But it turns out that it's only partially their fault. It's actually Sun's fault. Follow me here down a joyous path of process-forking details.

Because Chris won't allow them to open up the protocol or provide a usable Java interface, what a Java-based Continuous Integration server does when dealing with Perforce is:
  1. Check to see if there have been any changes in Perforce
  2. If there have, sync code and run a build
  3. Go to 1
For step #1, you've got two choices. You can either say "give me all changes, and I'll figure out whether each change applies to this build plan" or you can say "give me any changes that apply to each of my build plans." The former involves far fewer Perforce interactions, but the latter is directly supported by Perforce itself, which makes it easier to implement, and, I would argue, more correct, since Perforce has a lot of logic around exactly this type of operation (I want a CI system, not another SCM system).
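
For the curious, here's roughly what the per-build-plan variant looks like. This is my illustration, not Bamboo's actual code; the depot path, the change counter, and the exact p4 arguments are just plausible examples of the idea.

// Rough sketch of "ask Perforce per build plan" change detection by
// shelling out to the p4 binary. Not Bamboo's real implementation.
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class PerforcePoller {
    // Returns true if the depot path has any changes after lastChange.
    // Every call here is a fork+exec of p4 from inside the JVM, which is
    // exactly the operation that turns out to be so expensive on Solaris.
    static boolean hasNewChanges(String depotPath, int lastChange) throws Exception {
        Process p = new ProcessBuilder(
                "p4", "changes", "-m", "1",
                depotPath + "@" + (lastChange + 1) + ",#head")
                .redirectErrorStream(true)
                .start();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line = reader.readLine();   // e.g. "Change 12345 on ..." or null
        p.waitFor();
        reader.close();
        return line != null && line.startsWith("Change");
    }

    public static void main(String[] args) throws Exception {
        // With ~100 plans polling like this, that's hundreds of forks per pass.
        System.out.println(hasNewChanges("//depot/someproject/...", 12344));
    }
}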

Here's where the whole thing turns to Fail.

When we started adding more and more projects to our Bamboo installation, it started running more and more slowly, to the point where on a 4-core dual-dual-Opteron box it was sitting at a load average of 7, with up to 80% of CPU time spent in the kernel. That load was so extreme that all sorts of things started going wrong in completely mysterious ways.

Because of the whole Java+P4 issue, you have to shell out to the p4 binary to interact with the Perforce server. Under Unix implementations of the JVM, that essentially involves a fork + exec (under Windows it does not, which is arguably superior behavior here). Here's where things get sticky.

Working with Atlassian, we figured out that this was partly because Bamboo was being over-aggressive in hitting Perforce, and under-aggressive in caching things that seldom or never change. Moreover, it was the cost of actually performing the fork, far more than the cost of the subprocess itself, that was killing you. 2.1.2 (to be released on Tuesday) resolves this, so checking whether there are changes goes from 3 p4 invocations to 1. That's a factor of 3, which in their artificial test suite results in a 76% performance improvement.

Now, the one thing that ties everything together is that the parent process here is a JVM. And not just any JVM, but a JVM tuned for running big web applications with a lot of users (the same Tomcat instance hosted our JIRA instance as well). And what's the generic rule of thumb for running a JVM for a servlet container? More heap. More heap, more heap, more heap.
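
To make that concrete, the kind of setting I mean looks like this (numbers illustrative, not our actual config): a shared Tomcat running JIRA and Bamboo gets started with a nice fat heap, and that -Xmx is the number that turns out to matter below.

# Illustrative only -- not our actual settings
CATALINA_OPTS="-server -Xms2048m -Xmx2048m -XX:MaxPermSize=256m"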

More heap, more betta, right?

Wrong.

Turns out that JVMs on Solaris 10 blow (funny that, given that they're both from the same company; you might think Sun would have an interest in making sure Java ran best on Solaris, but whatever).

To do a traditional fork, Solaris has to reserve enough swap space to back the forked subprocess in order to comply with the POSIX standard for fork, even though a copy-on-write optimization means the memory is never actually duplicated (it's all going to be abandoned at the exec anyway). You won't see it in top, because that swap space isn't actually in use, just reserved: you can't have it, but it's not being used. Great.

This is terrible, and is precisely why things like clone() were invented in Linux and used to great success. My best recollection (can someone clue me in here, Lazyweb?) is that this is what Runtime.exec() and ProcessBuilder.start() do on Linux-based JVMs. Solaris traditionally didn't have such a beast, so you're stuck with old POSIX fork() behavior. Which sucks for this.

So what Bamboo is doing is forking a 2GB-heap JVM 3 times for every Perforce interaction, which is really, really, really bad.

One of my colleagues wrote a fork test in C to see how fast he could do a fork+exec with a 2GB memory allocation (exec'ing a really small subprocess) on the same hardware and OS configuration. It turns out you're limited to about 3 per second, and doing it consumes about 50% of the machine's total CPU time in the kernel. Given that Bamboo was attempting to do the same thing from multiple threads, the fact that it was hitting 80% kernel utilization makes complete sense.
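
To put some very rough numbers on that (back-of-the-envelope, using the figures above rather than anything measured):

~100 active plans x 3 p4 invocations per check  =  ~300 forks per polling pass
sustainable rate with a 2GB parent process      =  ~3 forks per second
time spent doing nothing but forking            =  300 / 3  =  ~100 seconds per pass

In other words, before it has synced or built a single thing, the box is burning the best part of two minutes per polling pass, and about half its CPU, just spawning p4 processes.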

Hence, when I reduced the heap to a mere 512MB, Bamboo actually ran faster. Much faster. Back into usable territory, even with my 110 plans. Once 2.1.2 comes out we should be really rolling, and we might be able to achieve our ultimate goal, which is to have every single build in our group managed from one metadata server with build agents scattered to the winds. And that would be sweet.

Oh, and don't think Sun doesn't know about this. Turns out in Solaris 10 they added a system call specifically to solve this problem. It's called posix_spawn, and it does precisely what you'd want. And Sun hasn't changed their JVM on Solaris to use it, probably because Sun, like everybody else in Solaris land, targets Solaris 8 for all those people who refuse to upgrade.

And I think that says a lot about the Solaris community.

Friday, September 12, 2008

More Reuters Fail: Does Nobody Use This Stuff?

To follow on from the last Reuters-related voyage of fail, I'm now looking at another one of their products, OpenDACS, and it's got yet more fail. First, some (not particularly) brief background.

When you buy financial data, the stock/commodity/whatever exchanges that are selling the price information want to make sure that you are controlling that data, so that they can get as much money as they possibly can. Therefore, there are all kinds of tight controls on what you can do with it once you get your hands on it, so that the stock exchanges can know exactly how much to charge you for realtime price data. The Reuters/RMDS form of that is called DACS (Data Access Control System), which allows you to locally administer who in your organization is supposed to have access to what types of data, and then every month you run a report and send it to Reuters and they tell you how much you have to pay based on who was permissioned to see what types of data during the course of that month.

In contrast, the last time I worked with Bloomberg data, to change anything like that you had to go through your Bloomberg Representative, which took longer. So the fact that DACS is a local system is actually not a bad part of their design, because it means you don't have to talk to your sales guy for every small change. I mean, in theory you could hack your local database and screw those nasty exchanges, but it doesn't really seem worth it, so the exchanges trust the reports.

Now let's say you're doing a server-side transformation like calculating implied volatilities from listed equity option prices. In general, you want to share transformations like that amongst multiple end users, because they can be expensive to perform, but you need to make sure that the user who wants the implied volatility has the rights to the underlying instruments. (In non-financial terms: if you have a server-side process calculating a function F(A, B, C), then in order to get the result of F applied to A, B, and C, you need the rights to see A, B, and C in their raw form.) This is because the end users aren't actually seeing A, B, or C; they're seeing F(A, B, C), so a server is going to listen to A, B, and C on behalf of all users and then compute F. But this isn't a use case that exchanges are comfortable with, so you have to prove to them that you really are checking permissions all the way through: you have to prove that you're checking against DACS.
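
In code terms, the check the exchanges want to see is something like the sketch below. To be clear, EntitlementChecker is a hypothetical stand-in for whatever your OpenDACS integration exposes; I'm illustrating the logic, not the actual Reuters API.

// Hypothetical sketch: a user may receive F(A, B, C) only if they are
// entitled to every raw input. EntitlementChecker is a stand-in for your
// OpenDACS wrapper, not a real Reuters class.
import java.util.List;

public class DerivedDataGate {
    interface EntitlementChecker {
        boolean canAccessItem(String userId, String item); // one remote call per (user, item)
    }

    static boolean mayReceiveDerivedValue(EntitlementChecker dacs,
                                          String userId,
                                          List<String> underlyingItems) {
        for (String item : underlyingItems) {
            if (!dacs.canAccessItem(userId, item)) {
                return false; // missing rights to any raw input blocks the derived value
            }
        }
        return true;
    }
}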

DACS works in terms of Permissionable Entities (PEs for short) and Items. An Item is an individual market instrument that you want to get data on. Cisco is an Item. AT&T is an Item. An individual option on Cisco or AT&T is also an Item. There are lots and lots of items. But most of them are paid for in the same bundle (you never call up the London Stock Exchange and say "hey, I'm only interested in real time updates on Vodafone and BT, so just sell me those"). Those bundles are Permissionable Entities, and Reuters sells packages of Permissionable Entities to you.

Yes, that's all just background.

Here's the juicy fail.

OpenDACS allows you to ask "what are the PEs that user X has rights to?"

OpenDACS also allows you to ask "does user X have access to Item Y?"

(stop me if you know where this is going).

Apparently nobody at Reuters thought to hook up the two. That's right, there's no documented way to either:
  • Get the PE applicable for a particular Item; or
  • Get all the Items that a PE covers.
And in point of fact this data must be available somewhere, or else Reuters couldn't work at all. Somewhere, in some database, they have this data. They just can't or won't give it to you.

Even better, after contacting Reuters, there's no officially supported way to do this at all, documented or undocumented. Nobody ever thought that you might want to do this.

This means that what should be an M+N problem (one call per user to get his PE list; one call per Item to get the PE) has turned into an M*N problem. Given that these are remote calls, that's not great, and traders aren't going to be happy waiting 15 minutes at start-up in the morning while every item they need is checked against the database. Which will actually happen for all users at once. What joy!
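
Rough illustrative numbers (made up, but the right shape for a desk watching a few thousand instruments):

M = 50 users, N = 5,000 Items
M+N: 50 calls (user -> PE list) + 5,000 calls (Item -> PE)  =    5,050 remote calls
M*N: 50 users x 5,000 per-Item permission checks            =  250,000 remote calls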

Because apparently I'm the only person in the history of the world to ever actually use any Reuters API and realize that it fundamentally Fails at its most obvious use case.

Monday, September 01, 2008

More on Open Source Cookie Delivery

Matthew Aslett did a write-up where he's starting to call what I called Split Licensing "Open Core Licensing" (which, by the way, I think is a little silly, because this whole "core" thing in enterprise software makes me think of multi-core per-processor discounts, so it's yet another suitably overloaded word, but that's neither here nor there), and gave a shout-out to my previous article on open source business strategies.

One thing that he mentions is that it's difficult to figure out what the cookie should be. I don't think it's particularly hard if you come from a traditional marketing background: it's all about market segmentation. (A non-Joel-specific write-up from the Borgmind is here.) The only distinction here is that many (if not most) of your customers in an Open Source context aren't actually paying you anything, so essentially you have "customers" who pay you nothing, and you're trying to convert them to give you some money. Any money. For anything.

So what should the cookie be?

I recommend starting from the position that there are two user bases:
  • Users who will never give you a single penny, no matter what you do for them, under any circumstances. This is the vast majority of all open source users, and you just have to accept that they are who they are, and that they provide community and network benefits that are extremely valuable to the project over time.
  • Users who might give you some money iff you had the right cookie.
It may look like we're no closer to figuring out what the cookie is, but we're closer than you might think. What does the second group look like?

I would posit that they're people who have money to spend on IT, and aren't running a shoestring budget (the shoestring group aren't going to be paying you for anything if they can get away with Open Source + Google for support).

I would also posit that they are usually either dumb enough that they're going to pay for support and "insurance" [sic] (gag me) no matter what, just because That's What They Do, or intelligent enough that they won't. If they won't, I would further posit that they're more technically advanced than the first group, and thus are working with more advanced technologies, either because they face problems that require them to, or because they're just smart people who like working with that type of stuff.

I would finally posit that anybody who claims that they have an ideological reason why they would never run non-Free software is a complete red herring: they don't exist in the Real World, and by the Real World, I mean the world of people who might ever pay you for anything, and they usually smell. (Seriously. You ever met RMS? Dude: Body wash may not be Libre, but instructions to make Soap are public domain, and no matter what, both are bloody cheap. Buy some. And then use it. Particularly if you want to have passionate awareness of a woman.)

There's a reason why Financial Services firms were always the holy grail of early adopters in Silicon Valley: they tend to fit this profile. But there are other, nimble, advanced firms across the world who also fit it that aren't in financial services.

So how might you work all this together? I'd say that these firms usually are:
  • Dealing with issues of scale. Big budget usually means big problems. That usually also implies some type of standardization, which usually means that if they go with your product, they're going to go for a single point POC, and then roll it out for a lot of stuff.
  • Dealing with complex support and availability scenarios. Things Go Wrong == Job Loss. That implies a lot of work is going to go into management and support of anything they're doing, that the "heck, it's down, just bounce it" people aren't going to get involved in. Think JMX/SNMP, Active/Active, visualization.
  • Dealing with advanced/expensive technologies. An example might be Infiniband for an enterprise software product. I'm sorry, but it's fringe and expensive and complex enough that the pure-shoestring peeps aren't going to have it (unless they're doing supercomputer stuff, but then the only real reason they have it is for their MPI implementation). GPGPUs are on the advanced side, in that they're still pretty fringe outside certain areas (although that's starting to change over time). RAMSANs and other esoteric high-performance storage systems are in there as well (as, bizarrely, is anything having to do with a proper FC fabric).
  • Dealing with fringe platforms. You're not running on Linux on Intel/AMD? You're now in the realm of Stuff People Might Pay You For. AIX? HP-UX? Itanium? Solaris x86? Power Architecture? All stuff you're going to find in a lot of enterprises, and if you actually know what you're doing there (and just getting your code to compile doesn't really qualify you as knowing what you're actually doing), there's scope there.
  • Dealing with compliance. You're in financial services or health care or government? You have compliance issues. Those compliance issues you will gladly pay someone to take off your hands. Zantaz integration? HIPAA certification? All scope for money.
So let's put it all together. Let's say you're a new AMQP company that wants to sell an AMQP broker. You produce a core product that's your basic AMQP engine. You then produce an Enterprise version that:
  • Allows RDMA publication of messages in zero-copy mode over Infiniband;
  • Works on Power architecture machines running AIX;
  • Runs on blades with Cell processors;
  • Optionally logs all messages to Zantaz;
  • Has built-in configuration for dozens of similarly configured brokers;
  • Publishes alerts to my whacked-out network management software;
  • Has built-in/native support for RAMSANs (as opposed to any arbitrarily fast FC-based storage pool);
  • Auto-configures itself into an N+1 redundancy configuration; and
  • Provides feedback on application connections and use cases that are relatively slow, with implementation-specific alternatives and suggestions.
(Note: feature list only partially pulled right out of my ass.) You get the drift. For any type of software you can similarly come up with a list of features that is only ever really going to appeal to the group that might pay you money. Focus on the problems that only affect them (hint, if you're in the business-app rather than infrastructure space: look at the compliance, retention and data protection stuff; the people who aren't going to pay you won't do it properly, but those of us who have to will gladly pay to make the problems disappear). There are a lot of them, and if you see someone complaining that a Free As In Beer product won't integrate automatically with their 7-Figure-Commercial-Software-Package, you can tell them to shove it. Seriously, that's just crass.

Now you've got the idea going on. Cookies galore.

Are there things in there that the non-fee-paying users might find useful? Sure. But you pack enough of them into one package and you've got something that starts to make logical sense between the Open Core version and the Pay Me Money version.

P.S. I take royalty payments in actual cookies. Seriously. Millie me up, yo.