Tuesday, October 30, 2007

Jon Kleinberg CS lecture at U of T

Right now I'm at the lecture by Jon Kleinberg of the Department of Computer Science at Cornell University, titled Challenges in Mining Social Network Data: Processes, Privacy and Paradoxes. He has produced seminal results in social networks and document retrieval. I've read his research on the HITS algorithm, which uses hubs and authorities to rank search results, and to which Google's PageRank is somewhat related. I've never heard Jon speak before, so I'm very glad to be here.

He will also speak tonight to kick off the Social Networking Week at U of T. What can computer science contribute to social networks? Today there is a convergence of social and technological networks: computing and information systems with intrinsic social structure. Social network data is a very active area in sociology, social psychology and anthropology. So what can these fields and computer science learn from each other? This is the research area that I am part of as well, and in my opinion it's an exciting one given the emergence of social networking sites like Facebook and MySpace. Mining social networks has a long history in the social sciences, e.g. Wayne Zachary's PhD work on the university karate club, observing social ties and rivalries. The split in the network could be explained by the minimum cut in the social network.
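To make the minimum cut idea concrete, here is a minimal sketch in Python. It is not Zachary's data or his exact procedure: the tie list TIES and the node names ("I" for an instructor, "A" for an administrator, "a" to "d" for members) are made up for illustration, and every tie is assumed to have the same strength. It uses a standard augmenting-path max-flow computation, since the size of the maximum flow between two nodes equals the size of the minimum cut separating them.

```python
from collections import defaultdict, deque

# Hypothetical toy network: two tight groups joined by a single tie (b, c).
TIES = [("I", "a"), ("I", "b"), ("a", "b"),
        ("A", "c"), ("A", "d"), ("c", "d"),
        ("b", "c")]


def min_cut(edges, source, sink):
    """Return (cut_size, source_side) for an undirected graph with unit-strength ties."""
    # Residual capacities: each undirected tie gets capacity 1 in both directions.
    cap = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        cap[u][v] += 1
        cap[v][u] += 1

    def reachable_parents():
        # Breadth-first search that only follows edges with leftover capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable_parents()
        if sink not in parent:
            # No augmenting path is left: the nodes still reachable from the
            # source form its side of a minimum cut.
            return flow, set(parent)
        # Push one unit of flow back along the path that was found.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1


if __name__ == "__main__":
    size, side = min_cut(TIES, "I", "A")
    print("ties to cut:", size, "instructor's side:", sorted(side))
```

In this toy network the two groups are held together by one tie, so cutting it splits the network into the instructor's side and the administrator's side.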

Social network data spans many orders of magnitude. For example, there were 240 million nodes in all IM communication over one month on Microsoft Instant Messenger (Leskovec-Horvitz '07), and 4.4 million nodes of declared friendships on the blogging community LiveJournal (Liben-Nowell et al., 2005). How can we find the point where the lines of research on large-scale and small-scale networks converge? In social networks, we can find diffusion behaviours that cascade from node to node like an epidemic, which show up as radial structures in the graph. There have been empirical studies of diffusion in the social sciences, like the spread of new agricultural and medical practices (Coleman et al., 1966). The diffusion curves are based on the probability of adopting a new behaviour, which depends on the number of friends who have already adopted it (Bass 1969, Granovetter 1978 and Schelling 1978). All of the diffusion curves seem to have a diminishing-returns property, for example in editing a Wikipedia article (Cosley et al., 2007) and joining a LiveJournal community (Backstrom et al., 2006).
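One simple way to see what a diminishing-returns diffusion curve looks like is the sketch below. It rests on an assumption that is mine and not from the talk: each friend who has already adopted independently convinces you with the same probability p, so the chance of adopting with k such friends is 1 - (1 - p)^k. The function names and the value p = 0.1 are made up for illustration.

```python
def adoption_probability(k, p=0.1):
    """Chance of adopting when k friends have adopted (assumed independent influence)."""
    return 1.0 - (1.0 - p) ** k


def marginal_gains(max_k, p=0.1):
    """Extra adoption probability contributed by the 1st, 2nd, ..., max_k-th friend."""
    return [adoption_probability(k, p) - adoption_probability(k - 1, p)
            for k in range(1, max_k + 1)]


if __name__ == "__main__":
    for k, gain in enumerate(marginal_gains(5), start=1):
        print(f"friend #{k}: P(adopt) = {adoption_probability(k):.3f}, gain = {gain:.3f}")
```

Under this assumption each additional friend adds less than the previous one, which is the diminishing-returns shape. The empirical curves in the cited studies need not follow this exact formula.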

These results can then be used for general prediction: given a network and v's position in it at t1, estimate the probability that v will join a given group by t2. Kleinberg has formulated this as a probability estimation problem (Backstrom-Huttenlocher-Kleinberg-Lan 2006). Do disconnected friends or connected friends make joining more likely? Disconnected friends provide an informational advantage, but connected friends provide safety/trust advantages. For example, in LiveJournal the joining probability increases significantly with more connections among friends in the group (in other words, friends who are within a clique rather than not).

If connectedness among friends promotes joining, do highly "clustered" groups grow more quickly? Kleinberg defines clustering to be # of triangles / # of open triads, and you can examine a community's growth from t1 to t2 as a function of clustering. Leskovec, McGlohon, Faloutsos, Glance and Hurst (2007) have looked into the diffusion of topics in networks of news media and bloggers, which shows cascading behaviour. Leskovec, Adamic and Huberman (2006) describe how incentives can be used to propagate interesting recommendations along social network links. How can questions be pushed to people within the social network? (Kleinberg, Raghavan, 2005)
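The clustering ratio can be computed directly from a list of friendship ties. The sketch below is my literal reading of "# of triangles / # of open triads", where an open triad is taken to be a node with two friends who are not friends with each other; the function name and the example ties are made up, and ties are assumed to be undirected.

```python
from itertools import combinations


def triangle_open_triad_ratio(edges):
    """Return (triangles, open_triads, ratio); ratio is None if there are no open triads."""
    # Build an undirected adjacency map, ignoring self-ties.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    triangle_corners = 0
    open_triads = 0
    for centre, friends in adj.items():
        # Look at every pair of friends of the centre node.
        for a, b in combinations(friends, 2):
            if b in adj[a]:
                triangle_corners += 1  # closed: all three know each other
            else:
                open_triads += 1       # open: a and b only share the centre

    # Every triangle is seen once from each of its three corners.
    triangles = triangle_corners // 3
    ratio = triangles / open_triads if open_triads else None
    return triangles, open_triads, ratio


if __name__ == "__main__":
    # Hypothetical group: a, b, c all know each other; d only knows c.
    ties = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
    print(triangle_open_triad_ratio(ties))
```

In the example group there is one triangle (a, b, c) and two open triads centred on c (a-c-d and b-c-d), so the ratio is 1/2.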

One of the most important questions in mining social network data is how to protect privacy in the dataset. Some research has shown that anonymized data can still cause problems, for example with on-line pseudonyms and search engine query logs. If you are part of a small network, you may be able to find yourself based on connectivity alone, so anonymization doesn't help. An attacker can attack an anonymized network by being part of the system. Kleinberg has done some work on this by creating a template (Backstrom, Dwork, Kleinberg, 2007). The idea is that an attacker creates a small subgraph H by registering new accounts and attaches its nodes to targeted nodes in the original network. From Ramsey theory, in a random n-node graph, H is unique.

Take home message: how do we build deeper models of the processes at work inside large-scale social networks? How do we make data available without compromising privacy?

It was great to finally meet and talk with Jon Kleinberg!

Monday, October 29, 2007

Busy busy week this week

I've got a busy week ahead of me. I'm giving an Ignite presentation about finding subgroups in TorCamp at DemoCampToronto15 tonight at Hart House, then I have to finish marking assignments and finish writing a paper, and then I'm giving a talk at the Social Networking Week at U of T on Friday.

But I enjoy doing this kind of stuff, so I don't mind it. So don't expect too many blog entries this week!

Wednesday, October 24, 2007

Pervasive 2007 Conference trip report in IEEE Pervasive Computing magazine

I just found out that the Pervasive 2007 conference trip report which I co-authored is now out in IEEE Pervasive Computing magazine, Vol. 6 No. 4 (October - December 2007). You can read the article here.

Finished my CASCON short paper talk

I just finished my CASCON short paper talk on "Identifying Active Subgroups in Online Communities" about an hour ago. It went well, and I had some great feedback. I'll post the recording of the talk and the slides soon. Now I can concentrate on the rest of my PhD work. No rest for a PhD student! But I enjoy giving talks, meeting people and discussing my research; it's exciting and engaging. If you have any comments or feedback on my paper or talk, leave a comment on this blog!

Tuesday, October 23, 2007

CASCON 2007 Conference, Day 1



I just finished co-chairing a session on Tagging as a Social Contract along with Mark Chignell and Sara Darvish, where we had 4 talks about issues surrounding tagging in a business environment. This was the second session of the Second Working Conference on Social Computing and Business at the CASCON 2007 conference. It was a great workshop and a great session, with great discussion. I talked about how community can be inferred from tagging, using the YouTube vaccination videos as an example. Podcasts and slides from the workshop should be available, so check the CASCON blog.

I also showed a demo of a community-based web portal that our lab created to support vaccination groups in our exhibit at the Technology Showcase called "Video Web 2.0: Collaborative Tagging in Web Video". If you're at CASCON, check it out!

Photos from CASCON are on my Flickr account.

Friday, October 19, 2007

Google hints at social network plan

I've been wondering when Google would start to think about social networking and how it could be used. So far, Yahoo has been the leader in social networking, with Flickr, Upcoming and Yahoo My Web Beta. Google's Orkut is still not of the same calibre as Facebook or MySpace, which are huge social networking web sites. However, Google is now beginning to hint at how it will use social networking data in its own web search and share the social data with others. According to Eric Schmidt, Google's CEO, in this article from the NY Times, Google has something up its sleeve.

Will Google be able to top Yahoo and other social networking sites like Facebook? Only time will tell.

Wednesday, October 17, 2007

Passed the thesis proposal!

I just did my thesis proposal today and passed! I just need to make changes and do some more analysis, and hopefully I should be done before April of next year.

Saturday, October 13, 2007

Thesis proposal and busy rest of October!

I just finished writing the thesis proposal, which I will send to my committee because I have a meeting with them on Wednesday. Hopefully everything goes well, and if everything goes according to plan, I can finish the dissertation and defence by the end of this year!

It's going to be a busy rest of October. I'm co-chairing a workshop at CASCON called Tagging as a Social Contract on Monday, October 22. If you're going to be at CASCON, sign up for this workshop; it promises to be an interesting one. For more information, check the CASCON blog. After that, I'm going to be presenting my paper, "Identifying Active Subgroups in Online Communities", at the CASCON conference on Wednesday, October 24. And then I will be giving a talk called "Structural Analysis of Social Hypertext for Finding Sense of Community" at the Social Networking Symposium at U of T on Friday, November 2nd from 9:50 to 10:15 am, right after my supervisor's talk.

Monday, October 08, 2007

Married!

Yes, I just got married about a week ago. The wedding was great and the weather was just perfect. Couldn't have asked for a better day. Thanks to everyone who helped out with the wedding and to those who attended. My wife and I were so happy to see you there, and even though it was a tiring day, we thoroughly enjoyed it and will treasure it for the rest of our lives.

I'm very thankful this Thanksgiving for such a beautiful, amazing and considerate wife. And married life feels so great, I wouldn't trade it for anything else.

For those that are interested, I'll post wedding photos online soon.