Roku Firmware Download

So, over on my account, you'll find a link to download the Roku's firmware image. I thought I should mention that in a venue more people will see. I'm keen to share notes with anyone who's also poking at this. To prove you're serious, please include the significance of in any introductory emails on Roku stuff. Feel free to pass this on to others of a curious bent, as I'm sure there's plenty for people to poke at.

The NED (Netflix Device) is quite a slick little box, and I highly recommend it if you're a Netflix Watch-it-now addict like me. Alfred Hitchcock Presents, The Outer Limits, Little Britain... really, this is a great way to watch classic television. It'd be great if Roku would fulfill their obligations under the GPL, as I've got one of their boxes in my grubby little paws as we speak. Whoops, spoke too soon! They have released source: Roku Netflix Player GPL sources. Thank you, Roku!

In other news, I'm between jobs at the moment. Since I'm no longer there, I can mention that I was at Paglo Labs, which is built around a cool idea. Someday I'll likely regret leaving the place. Unfortunately, the role became a poorer fit as time went on (but I stuck it out through the crunch of the Public Beta launch). There are about four people who seem to be warming to the idea of my employ, but I'm always keen to find more potential positions. There are likely two more going into the top of the hopper in the next day or two, but if you know anyone looking for an information person (that's stuff like data mining, search, or machine learning), let me know. Mo' options, mo' better.

Heathen Children Get Presents Too Day

I'm currently in Chapel Hill, staying with Kristina. We've done Heathen Children Get Presents Too Day, which is like Christmas, but for those of us who don't fit into the "Christian" category. (I used to have Atheist Children Get Presents Too Day for just myself, but broadened it this year).

Between my folks and K, it's been a very good year for presents. The real wins were a pocket hole jig and the charcoal Lamy Safari, in Fine. I also got the appropriate converter, which is already loaded with Noodler's Legal Lapis.

Reawakening the blog

If I may quote myself from the first post here:

So, this is the first post using my new "blog engine," which is just a messy set of perl scripts, make, and the C pre-processor from GCC. At some point, I'll move over to something a bit more appropriate, but this works wonderfully for now. I think.

I never quite got around to fixing it up and settling it in. Then I stopped caring. Now, however, I feel like I should have a more active presence here. I've got a few little projects going, and feel like a personal outlet could be beneficial.

Therefore, I've dusted off the perl scripts, the macros, and the Makefile, and am bringing the blog back.

Dryer Hacks

(Every so often, I have "clever monkey" moments. These are little creative flashes, insight or non-linear solutions to problems. They make me think I might, in fact, be half as clever as people seem to think I am. Which would make me twice as clever as I usually feel.)

For instance, the other evening, I wanted to wash the pillow slips for my feather pillows. You know those little zipper thingies you put around a feather pillow, to contain the feathers that sneak out? Those things.

This process is akin to cracking open a nuclear containment vessel, but instead of blue light that gives you cancer, you get a cloud of loose little feathers that stick to everything and look silly. And, if you simply wash the cover with feathers in it, they stick to the inside of your washing machine and show up on your clothes for weeks to come.

In the past, my solution to this has been to put on a nice linen shirt, go outside, and flail the feathery pillow slip around. Then, I carry out the pillow and whack it around for a bit. There were little white drifts in my back yard last time, in California, in August. And I still had the damned things in my hair for days.

What I really wanted was a second container for this. I considered doing all that inside a trash bag, or a grocery bag. Maybe I could tie the bag up, all puffy, and then beat it around for a bit. All good ideas, but there's still the problem of opening the bag up. What you need is a filter on the bag, so the air can escape as you open it.

Or, you can just stick the whole assembly in the dryer, open it up, and do the extraction in there. Then, close the dryer, and run air through it for a little while. Et voila! All the feathers are handily sequestered in your standard filter, the pillows are delightfully fluffed, and the slips are wonderfully, well, de-fluffed.

A Revival of Sorts

Saturday was the unveiling of the Computer History Museum's Difference Engine. Unfortunately, I missed it because I was half-zonked on the couch all day with something like strep throat. That was a bummer, but it's okay, I had a bit of a religious revival.

Trolling around Google Video for something interesting, I came across Google's Tech Talks again. A total gold mine (and it'd be even better if people knew how to use microphones).

Supporting Scalable Online Statistical Processing was really interesting: it's about using statistical mechanisms and randomized algorithms to numerically optimize really hard SQL queries. I recommend it as a different way of thinking about working with large datasets.

However, the really significant talk for me was The Next Generation of Neural Networks, by Geoffrey Hinton. In it, he presents a new twist on good ole connectionism. This is one hell of a demo, despite the Chomsky-esque annoying political asides.

"I am interested in what you say and would like to subscribe to your newsletter." It reaffirms my belief in the plausibility of neural networks. There's a long way to go until they're practical, but this demo makes it clear that the promise is still there. I'm back in the fold, I believe again. No pulpit-pounding required, no fire and brimstone. Just the promise of heaven in a clean, intuitive, fundamentally simple model.

The best part, though, is that Hinton publishes all the code for this on his website. Unfortunately, it's in Matlab; fortunately, it runs in octave. When I've been up to it (i.e., the few hours I was less feverish Sunday), I've started porting it to C++ with GSL, mostly to fully understand the underlying structure.
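The core training move in that code — one contrastive-divergence (CD-1) update for a restricted Boltzmann machine — is surprisingly small. Here's an illustrative sketch in Python/NumPy (not a port of Hinton's actual Matlab, just the standard update it implements); all names and shapes are my own:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1):
    # One contrastive-divergence (CD-1) update for a binary RBM.
    # v0: (n_visible,) data vector; W: (n_visible, n_hidden) weights;
    # b: (n_visible,) visible bias; c: (n_hidden,) hidden bias.
    h0 = sigmoid(v0 @ W + c)                      # hidden probabilities given data
    h_sample = (rng.random(h0.shape) < h0) * 1.0  # stochastic hidden states
    v1 = sigmoid(h_sample @ W.T + b)              # "reconstruction" of the input
    h1 = sigmoid(v1 @ W + c)                      # hidden probabilities given it
    # Nudge weights toward the data statistics, away from the model's own.
    W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
    b += lr * (v0 - v1)
    c += lr * (h0 - h1)
    return W, b, c
```

Stacking layers of these, each trained on the previous layer's hidden activities, is the "new twist" the talk is about.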

For now, though, I should get back to sleep. Tomorrow morning: to the doctor's, then hopefully to work. I've run out of watchable movies on Netflix on-demand, and, quite frankly, can't stand the idea of sleeping away another day when there's cool stuff to be done.

Huffman Trees in Haskell

I've recently been learning Haskell. As part of that, I'm implementing Huffman Coding. This is my first real project in the language. It's been overall quite pleasant, and has taught me a lot.

The biggest lesson has been thoroughly meta: compression tasks are a great way to learn a language/environment. For this project, I had to learn how to use modules, do I/O, mangle arrays, and define tree structures. It might not look like too much, but that's actually a huge amount of stuff to shove into a couple weekends of hacking.
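For anyone who hasn't written a Huffman coder: the greedy core is tiny. Here's a sketch in Python for illustration (the Haskell version follows the same shape — repeatedly merge the two lightest subtrees, then read codes off the root-to-leaf paths):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    # Seed a min-heap with (weight, tiebreak, leaf) entries, one per symbol.
    heap = [(w, i, ch) for i, (ch, w) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    i = len(heap)
    # Greedily merge the two lightest subtrees until one tree remains.
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, i, (left, right)))
        i += 1
    # Read off codes: left edges are "0", right edges are "1".
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # degenerate one-symbol alphabet
    walk(heap[0][2], "")
    return codes
```

Common symbols land near the root and get short codes; rare ones sink and get long ones.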

Other things I've learned (or relearned), in no particular order:

  • You'll get a reasonable answer if you hop on IRC and ask for help in #haskell.
  • "You remember the mental leap from imperative languages to OCaml? I had about the same level of change going from OCaml to Haskell." -- My friend Evan, talking about Haskell (he convinced me to try it, at least).
  • The language-of-the-month club is incredibly time-consuming. It took forever to get really basic stuff down in a new language (and I even had the benefit of having already coded some in OCaml).
  • A good book makes all the difference. I highly recommend The Craft of Functional Programming. It's a great book, even if you've already been programming in other functional languages.

Anti-social Networking

This week, I did something tantamount to virtual suicide. I went through my social networking profiles and disabled, deactivated, or deleted most of them. After much consideration, it became clear that social networking sites were bad for my relationships, bad for my social life, and generally bad for me.

Social networking sites are the high-fructose corn syrup of social interaction. High-fructose corn syrup in food is trouble for two reasons: it doesn't nourish you, nor does it fill you up. Similarly, social networking makes you feel like you're involved in another person's life, without providing the nourishing fulfillment meaningful interaction gives. You get a brief glimmer of fulfillment when you read a snippet of your friend's life, but it wears off quickly, leaving you needing more interaction. This means that you'll visit the site again, which is perfect for the provider: they get another set of ad views.

Don't believe me when I say it's less fulfilling? Let's try a thought experiment. You might get a brief burst of warm fuzzies when someone posts pictures of their newborn baby on facebook, but it's short-lived. Contrast this with someone walking around the office with cameraphone pictures of the same baby. There's a real difference in the quality of these two interactions: the in-person interaction is more affecting than the online one. There are myriad reasons for this, but the important thing is that the in-person, in-depth, in-excitement experience is much richer and more fulfilling than the mediated, short, distant experience of facebook.

Of course, high-fructose corn syrup is okay in moderation, when balanced out with something healthy. Enjoy a Coke when you go see a movie or are stuck in the airport. It's a nice thing, in balance. Similarly, social networking is fine if it's balanced out with more healthy interactions. That comes down to a sort of self-control, which is where I have a hard time, and what finally pushed me over the edge.

I'm a bit of a work-a-holic, having a hard time with work-life balance. I have an intense day job at a startup, where I'm a third of the engineering team. When I get home in the evening, I tinker on personal projects: more software, writing a book or two, and studying new things. I find a lot of satisfaction in these things, and therefore overdo it. In fact, I overdo it so much that my friends won't see me (in person) for weeks at a time. But they'll see my facebook status update every other day or so, "Josh looks forward to taking a day off," or "Josh is finally shipping his pet project!"

Social networking sites make it too easy to "work friends in" around your schedule. They're an enabler for this sort of thing, both in scheduling and perception. If I had to go to dinner to see my friends, I would make sure the few hours were well-spent, and connect with the other people. I might even stop thinking about work for a while. Social networking sites, however, reduce the cost of socializing, which is a great thing for keeping in vague touch with people. Unfortunately, this reduces the perceived value of relationships: there's less social capital invested in these short bursts of activity. This, in turn, makes them seem less meaningful to the participants.

The perception of value is a funny thing. In dating, there's a reason people play hard to get. It's the same reason that food you cooked yourself tastes better. Somewhere in the back of our brains, there's a tiny beancounter, keeping track of how much time, money, and emotion we've put into things. This little accountant isn't always rational or consistent, but generally, the more you put into a thing, the more valuable you find it. By reducing the cost of relationships, social networking sites accidentally trick us into thinking our relationships are less valuable.

Of course, often the relationships are less valuable. It is possible to hold truly deep and meaningful discussions with people online. It's a great medium for this, just as television is a great medium for teaching people. In television, you can show animated graphs, moving diagrams, and demonstrate experiments, all with expository notes (think Mythbusters). What's television actually used for? Fear Factor, Maury Povich, and so on. Similarly, social networking sites don't typically make good use of the medium. They encourage lots of short interactions, which are really great for ad revenue, but are terrible for meaningful connections.

Some sites are better than others for this, and allow you to grow a group of really great friends. This, though, also poses a problem. There's always someone out there to listen and offer advice. Therefore, you never have to think for yourself. Which means you never make your own decisions/mistakes in a vacuum. And, correspondingly, you're never forced into full independence. Collaboration is a great tool for developing new ideas, but it might not be the best thing for one's internal life, as it tends to encourage this sort of promiscuous codependency.

Now that the accounts are closed and the bookmarks are deleted, what do I do next? First, I set up a public Google Calendar. Josh's Google Calendar is a good first approximation of whether or not I'm busy at a given time. It's also a good motivator, reminding myself of the fact that I haven't seen people in X days, and maybe I should get out more.

Next, it's time to really clean up my house, so I feel confident in having people around more often. The war on slightly embarrassing dustbunnies is nigh. After that, it's time to start collecting people's phone numbers. Along with this, I need to get better about calling people to hang out more often.

Maybe I should just create a new event on facebook and invite everyone.

First Post

So, this is the first post using my new "blog engine," which is just a messy set of perl scripts, make, and the C pre-processor from GCC. At some point, I'll move over to something a bit more appropriate, but this works wonderfully for now. I think.

Introducing TDIK

(obdisclaimer: I don't speak for my employer here, and this is my idea, not theirs.)

Quick summary: Applying machine learning to uncover bottlenecks, predict system capacity, user growth, hardware purchases, and, generally, everything you need to know when running a service-based business. You give it your monitor data, and it gives you a diagnostic and predictive model of your system.

Time Data Into Knowledge (TDIK) is an idea I had a little over a year ago. Since I've talked with several people about it in that time, it's no longer patentable in the US. And, since I haven't actually done more than a proof of concept, I wanted to make a full public disclosure of the idea, in the hopes that it would inspire someone.

Imagine you have a multi-tier application stack: a frontend server, message queues, databases, and a few backend processes (bulk and on-line). Additionally, assume you've got fairly complete monitors on this stack: the normal machine telemetry (CPU usage, disk capacity, network utilization, etc), as well as application-specific stuff, like number of users, hits per second, message queue depth, etc.

Traditional monitoring systems give you graphs of all this; you do the analysis. The best you can hope for is a big display of all your graphs together, then eyeball them for correlations. You can shuffle them around to make it easier, but it's still human work. This kind of correlation is great for fires, where you have a sudden large shift in two variables and don't care about the precise magnitude of the relations. It's no good for a more valuable, big-picture task: capacity planning, where the relationships are less pronounced and more complicated.

That's where TDIK comes in. There are ways to find out how correlated two datasets are and then extract models of their relationship. You can expand these out to any number of combinations, though it gets much more computationally expensive. Once you have these models, though, they're invaluable.
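Since TDIK is still vapor, here's just the idea in a few lines of illustrative Python: Pearson correlation to spot which pairs of monitor series move together, and a least-squares fit as the simplest extracted model of the relationship.

```python
import numpy as np

def pearson(x, y):
    # Correlation between two monitor time series: near +/-1 means the
    # metrics move together and are worth modelling jointly.
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def linear_model(x, y):
    # The simplest extracted "model": a least-squares fit y ~ a*x + b.
    a, b = np.polyfit(np.asarray(x, float), np.asarray(y, float), 1)
    return a, b
```

A real system would sweep this over every pair (and larger combinations) of monitored variables, keeping the relationships that score well.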

You can make a model of your system yourself, using your knowledge of it. Your model will probably be darned good, since you built the thing. I've done these, and not only are they fun, they're handy. But there are always dark corners lurking. What's the performance interaction of running MySQL and Squid on the same host, for instance? They're both memory-intensive, especially when big requests are getting bandied about. Ideally, I'd have separate hardware for them, but, well, you know how that goes.

TDIK, since it's learning the model from scratch every time, will find out how things interact on your particular system. It discovers correlations that don't seem straightforward, but make sense after the fact. Things like "Webserver load is highly correlated with the number of user profile views in full mode" (you eventually discover that someone accidentally left in the debug code that disables template caching there).

TDIK's models are useful for more than troubleshooting, though. You can also use them for planning. For instance, let's say your website has N concurrent users. Would you like to know how many users the current system can support? The model can tell you how many, and which component will be your bottleneck. Or, perhaps you know that you want to be able to support some number of users. The model could tell you how much you'd need to scale each component in your current system to get there.
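As a toy illustration of that planning step (all numbers invented): fit CPU against concurrent users, then invert the fit to ask how many users push CPU past a budget.

```python
import numpy as np

# Hypothetical monitor samples: concurrent users vs. webserver CPU (%).
users = np.array([100.0, 200.0, 400.0, 800.0])
cpu = np.array([12.0, 21.0, 39.0, 75.0])

a, b = np.polyfit(users, cpu, 1)  # model: cpu ~ a * users + b
max_users = (90.0 - b) / a        # load at which CPU hits a 90% budget
```

Do this per component and the one with the smallest `max_users` is your bottleneck; the same inversion tells you how much each component must scale to reach a target user count.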

What about firefighting? Your model reflects the steady-state performance of the overall system. If you have the last few minutes of monitor data, you can quickly re-correlate and see which components don't fit the model. In fact, you can see exactly how much they don't fit the model, and prioritize the order in which your team checks things out.

But why firefight in the first place? Using that same correlation, you can get alerts when the current state deviates appreciably from the model. One thing I've always said about nagios alerts is that "They're only as good as your experience and creativity." If you don't know that a failure mode is waiting, you're not going to have a nagios alert prepared for it. TDIK's model obviates that, since it knows your system intimately. It will notice the increase in CPU time versus page hits even if the raw number of hits is low (say, at midnight), letting you identify and avert the morning meltdown.
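That alerting idea is easy to sketch (again in illustrative Python, not a real implementation): learn the steady-state relationship between two metrics plus the typical residual spread, then alert on deviations from the model rather than on fixed thresholds.

```python
import numpy as np

def fit(hits, cpu):
    # Learn the steady-state relationship cpu ~ a*hits + b, plus the
    # typical spread of the residuals around that model.
    hits, cpu = np.asarray(hits, float), np.asarray(cpu, float)
    a, b = np.polyfit(hits, cpu, 1)
    sigma = float(np.std(cpu - (a * hits + b)))
    return a, b, sigma

def alert(a, b, sigma, hits_now, cpu_now, k=3.0):
    # Fire when the reading is more than k residual-sigmas off the model,
    # even if the absolute numbers look unremarkable (say, at midnight).
    return abs(cpu_now - (a * hits_now + b)) > k * sigma
```

The same residual score, computed per component, gives you the firefighting priority list from the previous paragraph: check the biggest deviators first.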

"So where do I download or buy this product, or pay for the service?" you ask. Well, it's mostly still vapor. I did a small proof-of-concept, and found that modelling this many variables is noisy. And, honestly, I haven't had time to make this happen. It could probably be a startup, but I'm not certain of it yet. Feel free to drop me an email at if you feel otherwise.