Thursday, November 03, 2005

Blog data for research

One of the issues for researchers mining blogs is finding blog data on which to test their algorithms. Such data is currently hard to come by: either it's not publicly available, or you have to ask a particular researcher whether you can have access to it. I just read a post today about making blog data available and wrote a comment back on Anjo's blog (Anjo also worked with Lilia Efimova, who's prominent in the academic blogging community) proposing that there should be a forum, group, or web site where people could submit their blog data traces for other researchers to use.

Doing this would foster a greater sense of community, help those mining blog data, and advance research on blogs. We could perhaps build an open-source community for sharing blog data (maybe under a Creative Commons license?). Something like this already exists in the wireless community: CRAWDAD, a group at Dartmouth College, maintains an archive of Wi-Fi traces. What do people think about this?

