
Neoformix

Discovering and Illustrating Patterns in Data


MacWorld Clustered Word Cloud
Today

Macworld has been attracting a lot of attention the last few days. I've taken the last 10,000 tweets that mention it and created a Clustered Word Cloud. The primary themes of the conference do seem to emerge from the cloud.

cwc_MacWorld2009.png
World News Clustered Word Cloud
Today

The graphic below shows a Clustered Word Cloud for the world news headlines from 2008. As in my last post, the data comes from the Toronto Star, so it reflects a Canadian perspective. Several groups of keywords bear this out, including the second largest (in red), which shows there was a lot of coverage about Canadian soldiers killed or injured in southern Afghanistan. The largest cluster by far (light blue) shows that the US presidential campaign received a lot of coverage. The automated clustering did produce the unusual grouping of 'Korea' with 'Carolina', 'primary', and 'victory'. They were linked through frequent use of 'North' and 'South', as in 'North Korea' and 'North Carolina'.
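
The 'Korea' and 'Carolina' grouping can be illustrated with a small co-occurrence count. This is only a sketch of the general idea, not the clustering method used for the graphic, which isn't described here. The headlines are made up for illustration and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(headlines):
    """Count how often each unordered pair of words shares a headline."""
    pairs = Counter()
    for headline in headlines:
        # Sort the distinct words so each pair is always stored in one order.
        words = sorted(set(headline.split()))
        pairs.update(combinations(words, 2))
    return pairs

def neighbours(pairs, word):
    """Return the set of words that co-occur with `word` at least once."""
    linked = set()
    for (a, b) in pairs:
        if a == word:
            linked.add(b)
        elif b == word:
            linked.add(a)
    return linked

# Made-up headlines, for illustration only (not the Toronto Star data).
headlines = [
    "North Korea halts talks",
    "North Korea tests missile",
    "Obama wins North Carolina primary",
    "Clinton eyes North Carolina victory",
]

pairs = cooccurrence_counts(headlines)
# Words that co-occur with both 'Korea' and 'Carolina'.
shared = neighbours(pairs, "Korea") & neighbours(pairs, "Carolina")
print(shared)
```

In this toy data 'Korea' and 'Carolina' never appear in the same headline, yet both co-occur with 'North'. Any clustering driven by shared neighbours could pull them together through that one word.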

By grouping related words, this technique does a much better job of summarizing the most-covered international events than the Streamgraph representation. However, to do so it sacrifices any attempt at showing the distribution of events over time. Perhaps some combination of these two ideas would be fruitful.

cwc_WorldNews2008.png
World News Streamgraph
Yesterday

Now that 2008 is over I've been thinking about looking at some datasets for the year. One that I have started to explore is a set of world news headlines from my local paper, the Toronto Star. I used some great information I found here that shows how to use Google Reader to get the latest RSS entries from any feed. The dataset includes 1311 stories, and I looked at both the title and summary text for this analysis.

The image shows two StreamGraphs. The top one in red shows the most common capitalized words and when they appeared during the year. The blue StreamGraph shows the popular non-capitalized words over the same time period. The graphic seems to do a reasonable job showing the primary international news events of the year:

  • Obama throughout most of the year with coverage peaking at election time
  • Wall between Gaza and Egypt in early 2008
  • Tibet in March
  • NATO, Mugabe in March/April
  • China, Burma, cyclone, quake, aid around May
  • Georgia, Russia, Hurricane Gustav in August
  • India, Mumbai, and Pakistan in late November
  • Gaza and Israel again at the end of the year
Click on the image to see a larger version
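
The counts behind two such graphs could be prepared roughly as follows. This is a minimal sketch under stated assumptions: the `(month, text)` input format, the sample stories, and the function name are all hypothetical, and the actual processing used for the image isn't described in the post.

```python
from collections import Counter, defaultdict

def monthly_word_counts(stories):
    """Split per-month word counts into capitalized and lower-case groups.

    `stories` is an iterable of (month, text) pairs; this input format is
    an assumption made for the sketch.
    """
    capitalized = defaultdict(Counter)
    lowercase = defaultdict(Counter)
    for month, text in stories:
        for raw in text.split():
            word = raw.strip(".,;:!?'\"()")
            if not word:
                continue
            if word[0].isupper():
                # Note: this also catches ordinary words that start a sentence.
                capitalized[month][word] += 1
            elif word.isalpha():
                lowercase[month][word] += 1
    return capitalized, lowercase

# Made-up stories, for illustration only.
stories = [
    ("2008-03", "Protests spread in Tibet"),
    ("2008-05", "Cyclone aid reaches Burma"),
    ("2008-05", "China quake toll rises"),
    ("2008-11", "Obama wins election"),
]

caps, lower = monthly_word_counts(stories)
print(caps["2008-05"].most_common())
print(lower["2008-05"].most_common())
```

Each month's counter would become one time slice of a graph, with the most frequent words across the year chosen as the layers.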

Thank You and Happy New Year!
December 31 2008

Thank you all for your attention to Neoformix during 2008. This weblog primarily showcases my own work, and it is gratifying to see how many people are interested. I am excited about the possibilities of the coming year. Best wishes to all of you in 2009!

Sincerely,

Jeff Clark

Jeff8.png
Neoformix Review 2008
December 31 2008

I think it's natural at the end of the year to look back over the previous 12 months and assess what was accomplished. This post is my attempt to summarize what I think my key contributions were this past year on Neoformix. They aren't necessarily the most popular posts and are ordered chronologically rather than by any notion of importance. I hope this proves useful to those of you who are new to Neoformix or just want a quick review of the key ideas presented during 2008.

I would also like to mention here that many of these ideas were inspired by or build upon the work of other people. I have tried my best in the original posts to give credit where it was due. Feel free to contact me at any time if you think I have forgotten someone.

DiggTrends.png
Digg Trends is an interactive tool that shows the trends in word usage over time and word associations for stories that reached popular status on Digg.