This thesis discusses three major issues that arise in the context of non-sequential usage of multimedia content, i.e., usage in which users access only the content that is of interest to them. These issues are (1) semantically meaningful segmentation of videos, (2) composition of new video streams from content originating from different sources, and (3) non-sequential presentation of multimedia content.

A semantically meaningful segmentation of videos can be achieved by partitioning a video into scenes. This thesis gives a comprehensive survey of the scene segmentation approaches published in the last decade. The presented approaches are categorized according to the underlying mechanisms used for segmentation. For each category, the common characteristics are described, along with the strengths and weaknesses of the individual algorithms. Additionally, a novel scene segmentation approach for sports videos with special properties is introduced. It extracts scenes based on recurring patterns in the motion information of the video stream.

Furthermore, several approaches are presented for composing new video streams of real-life events from content captured by multiple sources. Community-contributed photos and videos are used to generate video summaries of social events. The evaluation shows that content provided by a crowd of people can be used to create a new and richer view of an event. This thesis introduces a new concept for this emerging view, called ``The Vision of Crowds''.

The presentation of such newly composed video streams is described with a simple but powerful formalism. It provides great flexibility in defining the temporal and spatial arrangement of content. Additionally, a video browsing application for the hierarchical, non-sequential exploration of video content is introduced. It interprets the formal description of compositions and can be adapted to different purposes by means of plug-ins.

