Multimedia Data Analysis on a Massively Distributed Parallelization Network of Anonymous Web Clients

By Jeremy Hall – 2nd Year Master Student – Keio University

Professor Kiyoki's Laboratory

February 25th 2010

Overview

This semester I focused on forming a cloud video analysis resource from the combined computing power of clients browsing a website. This work falls under both “distributed and parallel computing” and “video analysis”. I submitted my research to two conferences. The first was the IEEE MAW10 (Mining and Web 2010) conference:

http://www.ece.uvic.ca/~kinli/MAW10/

Titled: "Multimedia Data Analysis on a Massively Distributed Parallelization Network of Anonymous Web Clients"

and the ACM SOCC (Symposium on Cloud Computing) conference:

http://research.microsoft.com/en-us/um/redmond/events/socc2010/

Titled: "A Reverse-Cloud Distributed Video Analysis System with Autonomous Client-Driven Task Allocation"

My first paper was accepted by the MAW10 conference. The second paper was not accepted; however, I plan to revise it and resubmit it to another international conference.

Research Abstract

The following is the abstract I used for the ACM SOCC conference paper; it summarizes the research I engaged in:

“This paper proposes a distributed content-based video analysis system that utilizes the combined computing power of anonymous web clients – those users anonymously browsing a website – as a cloud computing resource on which to perform content-based video analysis. Our goal is to enable a new, scalable video-analysis resource on which organizations can perform more complex video analysis in order to improve video retrieval accuracy and relevance for their users without significantly impacting the costs of performing video analysis. The unique feature of this system, called a “reverse-cloud”, uses autonomous client-driven organization mechanisms that enable us to exploit large numbers of anonymous web clients, seamlessly harnessing their power as they connect while being able to ignore the complexity of actively managing such a volatile resource. The following are the two key technologies of this architecture: 1) a dual-channel communications protocol, which defines a logical communications model to use between servers and anonymous web clients, and 2) a task allocation algorithm featuring client-driven organization mechanisms based on random selection for enabling stateless, implicit communications. We have created a prototype implementation by mapping a computationally and IO intensive video analysis algorithm onto our architecture so that we can evaluate the effectiveness and utility of our proposed architecture.”
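The client-driven, random-selection idea in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the paper's algorithm. It assumes that the server publishes its list of pending tasks, that clients arriving in the same round all see the same list and each pick one task uniformly at random, and that the server removes a task only once a result comes back. The names `simulate` and `clients_per_round` are hypothetical.

```python
import random


def simulate(num_tasks, clients_per_round, seed=0):
    """Simulate stateless, client-driven task selection (illustrative only).

    Returns (rounds, duplicate_results): how many rounds of client arrivals
    were needed to finish every task, and how many results were redundant
    because two clients in the same round picked the same task.
    """
    if num_tasks < 0 or clients_per_round < 1:
        raise ValueError("need num_tasks >= 0 and clients_per_round >= 1")

    rng = random.Random(seed)
    pending = set(range(num_tasks))
    rounds = 0
    duplicate_results = 0

    while pending:
        # Every client in this round sees the same published task list.
        snapshot = sorted(pending)
        # Each client chooses independently; the server assigns nothing.
        picks = [rng.choice(snapshot) for _ in range(clients_per_round)]
        # Two clients choosing the same task produce one redundant result.
        duplicate_results += len(picks) - len(set(picks))
        # The server only learns about a task when its result arrives.
        pending -= set(picks)
        rounds += 1

    return rounds, duplicate_results


if __name__ == "__main__":
    rounds, duplicates = simulate(num_tasks=100, clients_per_round=20)
    print("rounds:", rounds, "redundant results:", duplicates)
```

In this model the server keeps no per-client state, because no client is ever assigned a task. The cost is some duplicated work whenever two clients happen to pick the same task.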

The research focused on how well anonymous web clients could build the dendrogram structure shown in the accompanying figure. Experimental results gathered from a prototype implementation indicate that the combined power of YouTube's clients would be more than enough to build this structure, which is important for content-based video retrieval, for all videos uploaded to YouTube. In addition, offloading such work to anonymous web clients would reduce the processing power required to create such a structure by a factor of 900.
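For readers unfamiliar with dendrograms, the following minimal sketch builds one by single-linkage agglomerative clustering. It is an illustration under assumptions and not the algorithm used in the prototype: the feature vectors, the Euclidean distance, the single-linkage rule, and the function name `build_dendrogram` are all hypothetical choices.

```python
import math
from itertools import combinations


def build_dendrogram(points):
    """Build a dendrogram as nested tuples of point indices.

    Repeatedly merges the two closest clusters, where the distance between
    clusters is the smallest distance between any pair of their members
    (single linkage).
    """
    if not points:
        raise ValueError("need at least one point")

    # Each cluster is (member indices, subtree); leaves are bare indices.
    clusters = [((i,), i) for i in range(len(points))]

    def cluster_distance(a, b):
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters that are closest to each other.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]][0], clusters[p[1]][0]),
        )
        (members_a, tree_a), (members_b, tree_b) = clusters[i], clusters[j]
        merged = (members_a + members_b, (tree_a, tree_b))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return clusters[0][1]


if __name__ == "__main__":
    # Hypothetical one-dimensional features for four video segments.
    features = [(0.0,), (0.1,), (5.0,), (5.2,)]
    print(build_dendrogram(features))
```

The pairwise distance computations dominate the cost of this kind of clustering, which is why it is a natural candidate for splitting into many small tasks.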

Concept of a Reverse-Cloud & Motivating Examples

Our research goal is to assemble anonymous clients browsing a website into a usable video analysis resource. More specifically, an existing user-submitted video service should be able to use this analysis resource to offset its in-house or rented server costs. A common cloud computing model is a black box of server resources that are accessed as a service through an API. Those server resources are dedicated to performing server-side functions. To emphasize how our system differs from traditional cloud computing, we refer to our architecture as a reverse-cloud architecture: the clients using an online service are themselves harnessed as a cloud resource.

A reverse-cloud architecture has the following three advantages: 1) Scalability: the system uses clients for processing video data, so as more clients connect to use an online service, the system automatically gains processing capacity. 2) Continuous Performance Improvement: as client hardware and browser software improve on users’ machines [6-7], the system’s overall processing capacity increases. 3) Compatibility: the architecture works over traditional HTTP infrastructure by reusing traditional resources in new ways.

In contrast, with traditional server farms or rented virtual machines, increased demand results in increased costs as additional servers are purchased or virtual machines are rented. Offloading work onto anonymous web clients reduces in-house server requirements, which in turn reduces maintenance, space, and utilities costs. Experiment 2 (see 4.1.3) demonstrates how offloading a dendrogram-based clustering algorithm to a reverse-cloud reduces processing requirements by a factor of 900.
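To give a feel for what a factor of 900 implies, the arithmetic below relates a reduction factor to the share of work done by clients. It is a sketch under one assumption, which is an interpretation and not a definition taken from the paper: the factor is read as the ratio of total work to the work that remains on the servers. Both function names are hypothetical.

```python
def server_reduction_factor(client_fraction):
    """Reduction in server-side work when clients do `client_fraction` of it.

    Assumes the factor is total work divided by the work left on the servers.
    """
    if not 0.0 <= client_fraction < 1.0:
        raise ValueError("client_fraction must be in [0, 1)")
    return 1.0 / (1.0 - client_fraction)


def client_fraction_for(factor):
    """Share of the work clients must perform to reach a given factor."""
    if factor < 1.0:
        raise ValueError("factor must be at least 1")
    return 1.0 - 1.0 / factor


if __name__ == "__main__":
    share = client_fraction_for(900)
    print(f"client share of the work: {share:.4%}")
```

Under this reading, a factor of 900 means the servers retain 1/900 of the original work, with clients performing the rest.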

Future Research & Paper Submissions

I have many plans for future research. The above research opens up several opportunities: demonstrating the mapping of additional algorithms to this architecture, investigating methods of increasing client-contribution efficiency, mixing web clients and servers as computational nodes, and looking into additional scaling opportunities through alternative architecture implementations.

As for paper submissions, I plan to submit a journal paper soon, using the combined work of the previous two paper submissions as a starting point.