Archive for the ‘Best Practices’ Category

HBase Backup/Export/Import Tool

We (Mahalo) have just released a backup and restore tool for HBase.

From the docs:

Attached is a simple import, export, and backup utility. Mahalo.com has been using this in production for several months to back up our HBase clusters as well as to migrate data from production to development clusters, etc.

The tool is a simple MapReduce job for exporting data from an HBase table. The exported data is in a simple, flat format that can then be imported using another MapReduce job. This gives you both a backup capability and a simple way to import and export data from tables.

The output of a backup job is a flat text file or a series of flat text files. Each row is represented by a single line, with each item tab-delimited. Column names are plain text, while column values are Base64-encoded, which keeps tabs and line breaks in the data from breaking the one-row-per-line layout. Generally, you should not have to worry about this at all.
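
To make the format description above concrete, here is a minimal Python sketch of how one row could be written to and read back from such a line. This is not the tool's actual code (the tool itself runs as MapReduce jobs), and the exact field order is an assumption: the docs only say that items are tab-delimited, column names are plain text, and values are Base64-encoded. The sketch assumes the row key comes first, followed by alternating column names and encoded values. The names encode_row and decode_row are hypothetical.

```python
import base64


def encode_row(row_key, columns):
    """Serialize one row as a single tab-delimited line.

    ASSUMED layout (not confirmed by the tool's docs):
    row key, then alternating column name / Base64-encoded value.
    """
    items = [row_key]
    for name, value in columns.items():
        items.append(name)  # column names stay plain text
        # Base64 output contains no tabs or newlines, so raw values
        # with those bytes cannot break the line-oriented format.
        items.append(base64.b64encode(value).decode("ascii"))
    return "\t".join(items)


def decode_row(line):
    """Parse a line produced by encode_row back into (row_key, columns)."""
    items = line.rstrip("\n").split("\t")
    row_key, rest = items[0], items[1:]
    columns = {}
    # Walk the remaining items in (name, encoded value) pairs.
    for name, encoded in zip(rest[0::2], rest[1::2]):
        columns[name] = base64.b64decode(encoded)
    return row_key, columns


# A value containing both a tab and a line break survives the round trip.
line = encode_row("row1", {"info:title": b"first\tsecond\nthird"})
row_key, columns = decode_row(line)
```

Because only the values are encoded, this layout works as long as row keys and column names contain no tabs or line breaks themselves.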

Speaking of Hadoop

We’ve recently switched the backend of Mahalo to use Hadoop for all of our text archiving needs. What’s Hadoop? Glad you asked…

Hadoop: When grownups do open source | The Register

Hadoop is a library for writing distributed data processing programs using the MapReduce framework. It’s got all the makings of a blogosphere hit: cluster computing, large datasets, parallelism, algorithms published by Google, and open source. Every four days or so, a nerd will discover Hadoop, write a “Basic MapReduce Tutorial with Hadoop” tutorial on his blog with some trivial examples, and feel satisfied with himself for educating the world about a yet-undiscovered gem. Comparatively, very few people actually use Hadoop in practice, and those who do don’t write about it. Why? Because they’re adults who don’t care about getting on the front page of Digg.

Read on. It’s great stuff, and you’ll definitely learn something useful if your site needs to…well…scale.

Cache Your WordPress Blog

(Originally published on RefreshCleveland)

The power of microsoft
Creative Commons License photo credit: doyoukekko

In the past few weeks, I’ve helped some of my friends move their WordPress blogs to new servers. One of them had a consistent problem with their host because WordPress was hogging cycles on the shared server. We implemented the WP-Cache plugin, and things got better in minutes.

Jeff Atwood has written a terrific article about the perils of using WordPress without caching.

I’ve been thoroughly impressed with the community around WordPress, and the software itself is remarkably polished. That’s not to say that I haven’t run into a few egregious bugs in the 2.5 release, but on the whole, the experience has been good bordering on pleasant.

Or at least it was, until I noticed how much CPU time the PHP FastCGI process was using for modest little old blog.stackoverflow.com.
