The Making of MarkMail

Welcome to the MarkMail team blog. Here we discuss MarkMail enhancements, new mailing list archives, and (perhaps most important) discuss the challenges of building an Internet service for searching large, million-message mailing list archives using XML, XQuery, and MarkLogic Server.


MarkMail at One Year: Looking Back
December 9 2008
It's now been a little over a year since we launched MarkMail. We've sure come a long way!

We're now seeing well over a million unique visitors every month and more than 5 million page views.

The Googlebot crawler (whose activity isn't included in the above statistics) has also been active. It now crawls between 1.0 and 1.3 million pages every day to keep its index fresh. At the upper end, that's about 15 page hits every second -- or 15 Hertz, enough to make a nice low background rumble. It's really enjoyable to get so much Google attention because it wasn't that long ago that we were just trying to get Google to index more than a million of our pages, never mind crawl that many in a day.
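For anyone who wants to check the crawl arithmetic, here is a minimal sketch in Python. The function name and constant are ours for illustration only; the inputs are the 1.0 and 1.3 million pages-per-day figures quoted above, spread evenly over a 24-hour day (an assumption, since real crawl traffic is burstier than that).

```python
# Convert a daily crawl volume into an average per-second rate.
# Assumes the crawl is spread evenly across the day, which is a simplification.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pages_per_second(pages_per_day):
    """Average page hits per second for a given daily total."""
    return pages_per_day / SECONDS_PER_DAY


low = pages_per_second(1_000_000)
high = pages_per_second(1_300_000)
print(f"{low:.1f} to {high:.1f} pages per second")
# prints "11.6 to 15.0 pages per second"
```

So the 15-per-second figure corresponds to the busiest days; quieter days average closer to 12.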

Our content size has also grown, from 4 million messages at launch, covering just the Apache Software Foundation archives, to 34 million messages today, spanning all sorts of communities. For us to grow so big so fast has been possible only because of the community support we've received. There's a long list of community members who have worked with us to accumulate and load their list archives. We'd like to thank all those folks, as well as the people who placed a MarkMail search box or other MarkMail link on their site or helped spread the word in blogs and emails and twe





Google Code Adds Gadgets: MarkMail Helps
October 9 2008
Google today announced new support for embeddable "gadgets" on Google Code project pages. Particularly exciting to us, they introduced MarkMail as the recommended gadget for viewing and searching Google Code project list archives.

For those who haven't encountered one in the wild, a Google Gadget is an embeddable web object that puts a bit of third-party dynamic content into the middle of a web page. Gadgets are the things you place on your iGoogle home page or your Google Desktop, but you can also add them to your own web page with one line of JavaScript, or anyone else's page if it supports the OpenSocial APIs.

We've coordinated with the Google Code team over the last several months to load about 500 GoogleGroups lists (3.8 million emails) and build a new MarkMail Gadget (launching today!) to let Google Code developers search and analyze their lists using MarkMail.

The new MarkMail gadget lets you view messages, threads, attachments, and senders, and a traffic chart (wouldn't be MarkMail without it!) for any set of messages you want to follow. The messages you choose to track with the gadget can be those from a single list, set of lists, a person, containing a term or phrase, or any





1.4% of Emails Mention Google
October 2 2008
As Google celebrates its 10-year anniversary, we thought it'd be fun to use our archive of 30 million mailing list messages to see how Google's popularity has grown over time across the list-o-sphere. Boy, has it grown!

In 2008 (so far) the word "Google" appears in 1.4% of emails in our archive, up from 1.15% last year and 0.75% five years ago.

While shockingly high, that 1.4% number is actually calculated with some conservative restrictions. We're excluding all mentions that occur inside quote blocks (where someone replies to another who said the word). It'd be 2% if we didn't have that rule. We're also excluding from our calculations all the Google Groups lists we follow, where Google is often the topic of discussion. With those lists added in? It's 13%.

You can explore this yourself with our public interface. You'll want to query for "google", use the "opt:noquote" flag, and set "-list:googlegroups" to exclude those lists. Then you can add date constraints either by typing "date:2008" in the search or dragging on the chart. Track the numbers as a result of your searches, do a little division, and you get your percen
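The recipe above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, the script just assembles the search strings described in the post (it does not call MarkMail), and the two counts at the bottom are made-up placeholders standing in for numbers you would read off the search results yourself. Using the same list and date constraints without the search term as the denominator is our assumption about how to do the division.

```python
# Build MarkMail-style search strings and turn two result counts into a percentage.
# Nothing here talks to MarkMail; the counts must be read from the site by hand.


def build_query(term=None, year=None, exclude_quotes=True, exclude_lists=()):
    """Assemble a search string from the constraints described in the post."""
    parts = []
    if term:
        parts.append(term)
        if exclude_quotes:
            parts.append("opt:noquote")   # ignore mentions inside quote blocks
    for name in exclude_lists:
        parts.append(f"-list:{name}")     # leave out the named lists
    if year is not None:
        parts.append(f"date:{year}")      # restrict to one year
    return " ".join(parts)


def percentage(matching, total):
    """Share of messages that match, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * matching / total


numerator_query = build_query("google", year=2008, exclude_lists=["googlegroups"])
denominator_query = build_query(year=2008, exclude_lists=["googlegroups"])
print(numerator_query)    # google opt:noquote -list:googlegroups date:2008
print(denominator_query)  # -list:googlegroups date:2008

# Placeholder counts, NOT real MarkMail numbers.
matching_messages = 30
total_messages = 2000
print(f"{percentage(matching_messages, total_messages):.2f}%")
```

Swap in a different year, run both searches, and plug the two counts in to get the figure for that year.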





Ruby on Rails on MarkMail: 200,000 Emails
September 22 2008
Interested in Ruby on Rails? If so, you'll be happy to learn we've loaded the full RoR mailing list archive. It holds about 200,000 emails and includes both the original Mailman lists from 2004-2006 and the GoogleGroups lists from 2006 onward.


Fun facts:
Don't forget, we have the regular Ruby lists too.




FreeBSD, the Unknown Giant
September 11 2008
My last entry about NetBeans and OpenOffice.org and their million messages reminded me that I've never announced here our load of the FreeBSD archives, an even larger and older community. They have more than 2.5 million messages, stretching back to 1994.

FreeBSD doesn't get as much attention as Linux but is a great operating system. Here's a description from an IBM developerWorks article:

"The FreeBSD operating system is the unknown giant among free operating systems. Starting out from the 386BSD project, it is an extremely fast UNIX®-like operating system mostly for the Intel® chip and its clones. In many ways, FreeBSD has always been the operating system that GNU/Linux®-based operating systems should have been. It runs on out-of-date Intel machines and 64-bit AMD chips, and it serves terabytes of files a day on some of the largest file servers on earth."

Here's the historic traffic chart (excluding automated bug and check-in messages):