

1. Introduction

The Lecture Browser is a tool developed as part of research into indexing and browsing visual lecture content. In this research, we establish content classification and clustering approaches for key frame images captured from lecture videos. These approaches are based on the implicit structure observed in lectures, which we divide into two classes: 1. a limited number of interaction types between instructor and students/camera (called "media types"), and 2. slowly progressing content that can be grouped into a small number of units (called "topics"). In developing a proof-of-concept tool, we perform the classification and clustering tasks manually. In separate research (see Lecture Indexer), we introduce and test automatic classification and clustering methods.

1.1 CVN (Columbia Video Network)

CVN records audio and video for selected courses taught at Columbia University. The recordings are sent as VHS videotapes to students participating in distance learning, and are also made available as streaming media for students taking the course on campus. While recording video, a separate program processes the video data and selects key frames that best summarize the video content over a given period of time. For a 75-minute class, this process generates approximately 210 key frames on average. These key frames serve as a useful linear visual index into the linear video stream. CVN currently makes this information available through a web interface that enumerates the periodically captured key frames as text links.

Figure 1.1: CVN Interface to Class Videos and Key Frames
The top left portion of the interface allows the user to view the streaming video and audio data. The upper right portion displays the key frame closest to the position in the streaming video. The lower portion of the interface displays the temporally ordered collection of key frames.

1.2 Problem Background

The present video player interface presents a linear collection of slides that can be viewed one at a time through mouse clicks. To locate specific video content, the user must either review the entire video through playback or click through the available slides. For a complete university course taught twice a week for 13 weeks, the user is faced with 26 * 75 minutes of video (1950 minutes = 32.5 hours) or 26 * 210 slides (5460 slides).
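The search-space figures above follow directly from the per-session averages; a quick sketch of the arithmetic (constants taken from the text):

```python
# Back-of-envelope search-space sizes for one 13-week course,
# recorded twice a week (per-session figures are averages from the text).
SESSIONS = 2 * 13             # 26 class sessions
MINUTES_PER_SESSION = 75      # length of one recorded class
KEYFRAMES_PER_SESSION = 210   # key frames generated per class

total_minutes = SESSIONS * MINUTES_PER_SESSION      # 1950 minutes
total_hours = total_minutes / 60                    # 32.5 hours
total_keyframes = SESSIONS * KEYFRAMES_PER_SESSION  # 5460 slides
```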

Even though an improved interface can eliminate the need to click through every key frame, the collection of temporally ordered images remains linear, which makes search cumbersome. However, most well-organized courses follow known trends in their content: 1. content is structured around themes or topics, and 2. content develops slowly, meaning that a topic or sections of a topic tend to span several minutes. Over the course of a topic, several key frames are captured from the video, on average at a rate of three per minute. The collection of key frames for a topic therefore tends to be highly redundant, as it contains slight variants of the same content. We take advantage of this implicit course structure and the redundancy of key frames over time to cluster key frames containing similar content into virtual topics. Compressing the collection of key frames into topics yields a significantly smaller search space for the user.
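Because key frames within a topic are slight variants of one another, the clustering described above can be sketched as a greedy walk over the time-ordered frames, starting a new cluster whenever consecutive frames stop being similar. This is only an illustration: the `similarity` function and the threshold value are placeholders, not the measures used in the study.

```python
def cluster_keyframes(frames, similarity, threshold=0.8):
    """Greedily cluster time-ordered key frames into virtual topics.

    `frames` is any time-ordered sequence of key frames; `similarity`
    is a pairwise score in [0, 1] (e.g. a histogram-based image
    similarity); `threshold` is an illustrative cutoff, not a value
    from the original research.
    """
    clusters = []
    for frame in frames:
        # Redundant frames within a slowly developing topic stay similar,
        # so extend the current cluster while similarity holds.
        if clusters and similarity(clusters[-1][-1], frame) >= threshold:
            clusters[-1].append(frame)
        else:
            # Similarity dropped: the content moved on to a new topic.
            clusters.append([frame])
    return clusters


# Toy usage: frames stand in for image features as plain numbers,
# and similarity is one minus their absolute difference.
toy_frames = [0.00, 0.05, 0.10, 0.90, 0.92]
toy_topics = cluster_keyframes(toy_frames, lambda a, b: 1 - abs(a - b))
```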