For S&T demo presenters:
S&T demo presenters advertise their demos in the teaser session on 30 September with a one-minute presentation.
S&T demo presenters are asked to show their demos in the demo sessions every day, although they may take breaks during sessions as needed.
Please prepare your demonstration-style presentation by following the Guidelines.
For MASH'D demo presenters:
MASH'D demo presenters advertise their demos in the MASH'D teaser session on 1 October with a one-minute presentation.
MASH'D demo presenters must show their demos in the demo session on 1 October, and they may also show them in the demo sessions on other days.
Please prepare your demonstration-style presentation by following the Guidelines.
The demo deployment plan for Rooms 404-406, 413-414, 4A, and 4B is available here.
Coloring books capture the imagination of children and provide them with one of their earliest opportunities for creative expression. However, given the proliferation and popularity of digital devices, real-world activities like coloring can seem unexciting, and children become less engaged in them. Augmented reality holds unique potential to impact this situation by providing a bridge between real-world activities and digital enhancements. In this paper, we present an augmented reality coloring book App in which children color characters in a printed coloring book and inspect their work using a mobile device. The drawing is detected and tracked, and the video stream is augmented with an animated 3-D version of the character that is textured according to the child’s coloring. This is possible thanks to several novel technical contributions. We present a texturing process that applies the captured texture from a 2-D colored drawing to both the visible and occluded regions of a 3-D character in real time. We develop a deformable surface tracking method designed for colored drawings that uses a new outlier rejection algorithm for real-time tracking and surface deformation recovery. We present a content creation pipeline to efficiently create the 2-D and 3-D content. And, finally, we validate our work with two user studies that examine the quality of our texturing algorithm and the overall App experience.
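To illustrate the idea of texturing a 3-D character from the child's colored drawing, here is a minimal sketch, not the authors' implementation: it assumes a rectified image of the drawing and a per-vertex lookup map (UV coordinates into the drawing) authored offline as part of the content-creation pipeline.

```python
# Minimal sketch (not the paper's implementation): sample the child's colors
# from a rectified drawing image into per-vertex colors of the 3-D character.
# The lookup map is an assumed offline-authored asset covering both visible
# and occluded regions of the character.
import numpy as np

def texture_from_drawing(drawing_rgb, lookup_uv):
    """drawing_rgb: HxWx3 rectified drawing image (uint8).
    lookup_uv: Nx2 array of normalized (u, v) coordinates, one per mesh vertex.
    Returns an Nx3 array of per-vertex colors."""
    h, w = drawing_rgb.shape[:2]
    # Convert normalized uv to pixel indices, clamping to the image bounds.
    px = np.clip((lookup_uv[:, 0] * (w - 1)).round().astype(int), 0, w - 1)
    py = np.clip((lookup_uv[:, 1] * (h - 1)).round().astype(int), 0, h - 1)
    return drawing_rgb[py, px]

# Toy example: a 4x4 "drawing" and three vertices.
drawing = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
uv = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
print(texture_from_drawing(drawing, uv))
```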
In this work, we present a new automatic system for scene reconstruction, which delivers high-level structural models. We start by identifying planar regions in depth images obtained with a SLAM system. Our main contribution is an approach which identifies constraints such as incidence and orthogonality of planar surfaces and uses them in an incremental optimization framework to extract high-level structural models. The result is a manifold mesh with a low number of polygons, immediately useful in many Augmented Reality applications.
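The constraint-detection step can be illustrated with a small sketch; this is an illustration under assumed angle and distance thresholds, not the paper's optimization, and a full system would feed the detected constraints into the incremental optimizer described above.

```python
# Minimal sketch (assumed thresholds, not the paper's method): detect candidate
# orthogonality and incidence constraints between fitted planes (n, d), where
# n is a unit normal and points x on the plane satisfy n.x + d = 0.
import numpy as np
from itertools import combinations

def find_constraints(planes, ortho_tol_deg=5.0, coplanar_tol=0.02):
    constraints = []
    for (i, (ni, di)), (j, (nj, dj)) in combinations(enumerate(planes), 2):
        cos_angle = abs(np.dot(ni, nj))
        if cos_angle < np.sin(np.radians(ortho_tol_deg)):
            constraints.append(("orthogonal", i, j))
        elif cos_angle > np.cos(np.radians(ortho_tol_deg)) and abs(di - dj) < coplanar_tol:
            constraints.append(("coplanar", i, j))
    return constraints

planes = [(np.array([1.0, 0.0, 0.0]), 0.0),
          (np.array([0.0, 1.0, 0.0]), 0.0),
          (np.array([1.0, 0.0, 0.0]), 0.01)]
print(find_constraints(planes))
```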
Volumetric methods provide efficient, flexible and simple ways of integrating multiple depth images into a full 3D model. They provide dense and photorealistic 3D reconstructions, and parallelised implementations on GPUs achieve real-time performance on modern graphics hardware. Running such methods on mobile devices, providing users with freedom of movement and instantaneous reconstruction feedback, remains challenging, however. In this paper we present a range of modifications to existing volumetric integration methods based on voxel block hashing, considerably improving their performance and making them applicable to tablet computer applications. We present (i) optimisations for the basic data structure, and its allocation and integration; (ii) a highly optimised raycasting pipeline; and (iii) extensions to the camera tracker to incorporate IMU data. In total, our system thus achieves frame rates of up to 43 Hz on an Nvidia Shield Tablet and 820 Hz on an Nvidia GTX Titan X GPU, or even beyond 1 kHz without visualisation.
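For readers unfamiliar with voxel block hashing, the following sketch shows the basic data structure the abstract refers to; it is an assumed, unoptimised illustration (voxel size, block size, and the running-average integration rule are assumptions), not the paper's GPU code.

```python
# Minimal sketch (assumed structure, not the paper's optimised implementation)
# of voxel block hashing: the TSDF volume is stored sparsely as 8x8x8 blocks,
# allocated on demand and addressed through a hash of their block coordinates.
import numpy as np

BLOCK_SIZE = 8
VOXEL_SIZE = 0.005  # metres; an assumed value

class VoxelBlockHash:
    def __init__(self):
        # (bx, by, bz) -> BLOCK_SIZE^3 grid of (tsdf, weight) pairs
        self.blocks = {}

    def block_coords(self, point):
        return tuple(np.floor(point / (VOXEL_SIZE * BLOCK_SIZE)).astype(int))

    def allocate(self, point):
        key = self.block_coords(point)
        if key not in self.blocks:
            self.blocks[key] = np.zeros((BLOCK_SIZE,) * 3 + (2,), dtype=np.float32)
        return self.blocks[key]

    def integrate(self, point, sdf, max_weight=100.0):
        """Fuse one truncated signed-distance observation by running average."""
        block = self.allocate(point)
        local = tuple(np.floor(point / VOXEL_SIZE).astype(int) % BLOCK_SIZE)
        tsdf, w = block[local]
        block[local] = ((tsdf * w + sdf) / (w + 1.0), min(w + 1.0, max_weight))

vh = VoxelBlockHash()
vh.integrate(np.array([0.10, 0.02, 0.30]), sdf=0.004)
print(len(vh.blocks))  # one block allocated on demand
```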
We present the first pipeline for real-time volumetric surface reconstruction and dense 6DoF camera tracking running purely on standard, off-the-shelf mobile phones. Using only the embedded RGB camera, our system allows users to scan objects of varying shape, size, and appearance in seconds, with real-time feedback during the capture process. Unlike existing state-of-the-art methods, which produce only point-based 3D models on the phone, or require cloud-based processing, our hybrid GPU/CPU pipeline is unique in that it creates a connected 3D surface model directly on the device at 25 Hz. In each frame, we perform dense 6DoF tracking, which continuously registers the RGB input to the incrementally built 3D model, minimizing a noise-aware photoconsistency error metric. This is followed by efficient key-frame selection, and dense per-frame stereo matching. These depth maps are fused volumetrically using a method akin to KinectFusion, producing compelling surface models. For each frame, the implicit surface is extracted for live user feedback and pose estimation. We demonstrate scans of a variety of objects, and compare to a Kinect-based baseline, showing on average ~1.5 cm error. We qualitatively compare to a state-of-the-art point-based mobile phone method, demonstrating an order of magnitude faster scanning times, and fully connected surface models.
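A rough sketch of what a photoconsistency cost for dense tracking can look like is given below; this uses a generic Huber-weighted intensity residual under assumed projection conventions and is not the paper's noise-aware metric.

```python
# Minimal sketch (assumptions, not the paper's exact metric): project model
# points with stored intensities into the current grey image under a candidate
# pose and accumulate a robust (Huber-weighted) intensity residual.
import numpy as np

def project(K, T, points):
    """Pinhole projection of Nx3 points with 3x3 intrinsics K and 4x4 pose T."""
    p_cam = (T[:3, :3] @ points.T).T + T[:3, 3]
    uvw = (K @ p_cam.T).T
    return uvw[:, :2] / uvw[:, 2:3], p_cam[:, 2]

def photoconsistency_cost(gray, K, T, points, intensities, huber_k=0.1):
    """gray: HxW float image in [0, 1]; intensities: per-point model greys."""
    uv, depth = project(K, T, points)
    h, w = gray.shape
    u, v = uv[:, 0].round().astype(int), uv[:, 1].round().astype(int)
    valid = (depth > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    r = gray[v[valid], u[valid]] - intensities[valid]
    # Huber weighting down-weights residuals caused by noise and occlusion.
    absr = np.abs(r)
    cost = np.where(absr <= huber_k, 0.5 * r**2, huber_k * (absr - 0.5 * huber_k))
    return cost.sum()
```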
In the last few years, the advancement of head mounted display technology and optics has opened up many new possibilities for the field of Augmented Reality. However, many commercial and prototype systems often have a single display modality, fixed field of view, or inflexible form factor. In this paper, we introduce Modular Augmented Reality (ModulAR), a hardware and software framework designed to improve flexibility and hands-free control of video see-through augmented reality displays and augmentative functionality. To accomplish this goal, we introduce the use of integrated eye tracking for on-demand control of vision augmentations such as optical zoom or field of view expansion. Physical modification of the device’s configuration can be accomplished on the fly using interchangeable camera-lens modules that provide different types of vision enhancements. We implement and test functionality for several primary configurations using telescopic and fisheye camera-lens systems, though many other customizations are possible. We also implement a number of eye-based interactions in order to engage and control the vision augmentations in real time, and explore different methods for merging streams of augmented vision into the user’s normal field of view. In a series of experiments, we conduct an in-depth analysis of visual acuity and head and eye movement during search and recognition tasks. Results show that methods with a larger field of view that utilize binary on/off and gradual zoom mechanisms outperform snapshot and sub-windowed methods and that the type of eye engagement has little effect on performance.
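One way to picture a binary on/off eye engagement is a gaze-dwell toggle; the sketch below is an illustration under assumed thresholds and screen regions, not ModulAR's implementation.

```python
# Minimal sketch (assumed dwell time and trigger region, not ModulAR's code):
# dwelling the gaze inside a screen-corner trigger region toggles the zoom
# augmentation on or off.
import time

class DwellToggle:
    def __init__(self, region, dwell_s=0.8):
        self.region = region          # (x0, y0, x1, y1) in normalized coords
        self.dwell_s = dwell_s
        self._enter_time = None
        self.active = False

    def update(self, gaze_xy, now=None):
        now = time.monotonic() if now is None else now
        x, y = gaze_xy
        x0, y0, x1, y1 = self.region
        if x0 <= x <= x1 and y0 <= y <= y1:
            if self._enter_time is None:
                self._enter_time = now
            elif now - self._enter_time >= self.dwell_s:
                self.active = not self.active   # toggle the augmentation
                self._enter_time = None
        else:
            self._enter_time = None
        return self.active
```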
We present a method which can quickly and robustly match 2D and 3D point patterns based solely on their spatial distribution, though it can also exploit other cues when available. The method can be easily adapted to many transformations such as similarity transformations in 2D/3D, and affine and perspective transformations in 2D. It is based on local geometric consensus among several local matchings and a refinement scheme. We provide two implementations of this general scheme, one for the 2D homography case (which can be used for marker or image tracking) and one for the 3D similarity case. We demonstrate the robustness and speed performance of our proposal on both synthetic and real images and show that our method can be used to augment not only (textured or textureless) planar objects but also 3D objects.
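As background for the consensus idea, here is a generic RANSAC-style consensus for a 2D similarity transform; this is an illustration of hypothesis-and-verify matching, not the authors' local geometric consensus algorithm.

```python
# Minimal sketch (generic RANSAC-style consensus, not the paper's algorithm):
# estimate a 2D similarity transform from candidate correspondences and keep
# the hypothesis with the largest inlier support.
import numpy as np

def similarity_from_two(p, q):
    """Similarity (scale, rotation, translation) mapping p[0],p[1] -> q[0],q[1]."""
    dp, dq = p[1] - p[0], q[1] - q[0]
    s = np.linalg.norm(dq) / (np.linalg.norm(dp) + 1e-12)
    a = np.arctan2(dq[1], dq[0]) - np.arctan2(dp[1], dp[0])
    R = s * np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    t = q[0] - R @ p[0]
    return R, t

def ransac_similarity(src, dst, iters=200, thresh=3.0, seed=0):
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 2, replace=False)
        R, t = similarity_from_two(src[idx], dst[idx])
        err = np.linalg.norm((src @ R.T + t) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```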
We present a natural gesture interface for ambient objects using a wearable RGB-D sensor. The aim of this work is to propose a methodology that accurately determines where a user is pointing when gesturing with their finger. First, the wearable RGB-D sensor is affixed to the user's forehead. A calibration between the user's eyes and the RGB-D camera is performed by having the user move their fingers along their line of sight. We detect the fingertip in the depth camera and then find the direction of the line of sight. Finally, we estimate where the user is pointing in the RGB image in different scenarios with a depth map, a detected object, and a controlled virtual element. To validate our methods, we perform a point-to-screen experiment. Results demonstrate that when a user is interacting with a display up to 1.5 meters away, our natural gesture interface has an average error of 2.1 cm. In conclusion, the presented technique is a viable option for reliable user interaction.
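The geometric core of such pointing estimation can be sketched as intersecting the eye-to-fingertip ray with the display plane; the example below assumes the eye position has already been calibrated into the depth-camera frame and that the display plane is known (a point on it plus its normal), and it is not the authors' calibration code.

```python
# Minimal sketch (assumed geometry, not the paper's implementation): the
# pointing target is the intersection of the eye-to-fingertip ray with the
# display plane.
import numpy as np

def pointing_target(eye, fingertip, plane_point, plane_normal):
    d = fingertip - eye                       # ray along the line of sight
    denom = np.dot(plane_normal, d)
    if abs(denom) < 1e-9:
        return None                           # ray parallel to the display
    t = np.dot(plane_normal, plane_point - eye) / denom
    return eye + t * d if t > 0 else None

eye = np.array([0.0, 0.0, 0.0])
fingertip = np.array([0.05, 0.02, 0.4])
print(pointing_target(eye, fingertip, np.array([0.0, 0.0, 1.5]), np.array([0.0, 0.0, 1.0])))
```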
In this extended abstract, we present a model that aims to provide developers with an extensive and extensible set of context-aware interaction techniques, greatly facilitating the creation of meaningful AR-based user experiences. To provide a complete view of the model, we detail the different aspects that form its theoretical foundations, while also discussing several considerations for its correct implementation.
An important yet unsolved problem in computer vision and Augmented Reality (AR) is to compute the 3D shape of nonrigid objects from live RGB videos. When the object's shape is provided in a rest pose, this is the Shape-from-Template (SfT) problem. We present a general framework for real-time SfT. It handles generic objects, complex deformations and most of the difficulties present in real imaging conditions. Achieving this has required new solutions to two core sub-problems in SfT: robust registration and fast 3D shape inference. For registration we propose Deformable Render-based Block Matching (DRBM), which is a tracking-based solution that combines the advantages of feature-based and direct approaches without their main disadvantages. Shape inference is achieved by solving a single sparse linear least squares system for each frame, which is done quickly with a Geometric Multi-Grid method. On a standard desktop PC we achieve up to 21 fps depending on the object. Code will be released to the community.
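To make the per-frame shape inference concrete, the sketch below solves one sparse linear least squares system with a generic sparse solver; the paper uses a Geometric Multi-Grid solver instead, and the stacking of registration and deformation constraints into A and b is assumed.

```python
# Minimal sketch (generic sparse least-squares solve, not the paper's
# Geometric Multi-Grid solver): per frame, shape inference reduces to solving
# A x ~= b, where A stacks registration and deformation-model constraints on
# the template vertices.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

def infer_shape(A, b):
    """A: sparse (m x n) constraint matrix, b: length-m right-hand side."""
    return lsqr(A, b)[0]

# Toy example: two unknowns, three constraints.
A = sp.csr_matrix(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
b = np.array([1.0, 2.0, 3.1])
print(infer_shape(A, b))
```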
In this work, we propose a multi-user system for tracking and mapping, which accommodates mobile clients with different capabilities, mediated by a server capable of providing real-time structure from motion. Clients share their observations of the scene according to their individual capabilities. This may involve only keyframe tracking, but also mapping and map densification when more computational resources are available. Our contribution is a system architecture that lets heterogeneous clients contribute to a collaborative mapping effort, without prescribing fixed capabilities for the client devices. We investigate the implications that the clients' capabilities have on the collaborative reconstruction effort and its use for AR applications.
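One way such a heterogeneous-client architecture could be expressed is with capability flags attached to each client's contribution; the layout below is an assumed illustration, not the paper's protocol.

```python
# Minimal sketch (assumed message layout, not the paper's protocol): clients
# advertise their capabilities to the reconstruction server and stream
# whatever products they can compute, from keyframes up to densified maps.
from dataclasses import dataclass, field
from enum import Flag, auto

class Capability(Flag):
    TRACKING = auto()        # keyframe tracking only
    MAPPING = auto()         # local map building
    DENSIFICATION = auto()   # map densification

@dataclass
class ClientContribution:
    client_id: str
    capabilities: Capability
    keyframes: list = field(default_factory=list)
    map_points: list = field(default_factory=list)

# A phone that only tracks, and a tablet that also maps:
phone = ClientContribution("phone-1", Capability.TRACKING)
tablet = ClientContribution("tablet-1", Capability.TRACKING | Capability.MAPPING)
print(Capability.MAPPING in tablet.capabilities)
```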
In this paper we propose an adaptive Augmented Reality interface for general hand gestures based on a probabilistic model. The proposed interface provides multiple interfaces and the corresponding gesture inputs by recognizing the context of the hand shape, which requires accurate recognition of static and dynamic hand states. For accuracy, we present a hand representation that is robust to hand shape variation, and extract hand features based on fingertip posteriors from a GMM. Experimental results show that both context sensitivity and accurate hand gesture recognition are achieved, through quantitative evaluation and an implementation as a three-in-one virtual interface.
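The fingertip-posterior idea can be sketched with class-conditional GMMs and Bayes' rule; the features, component counts, and prior below are assumptions for illustration, not the paper's model.

```python
# Minimal sketch (assumed features and priors, not the paper's model): fit
# class-conditional GMMs to labelled fingertip and non-fingertip feature
# vectors, then compute per-pixel fingertip posteriors via Bayes' rule.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
fingertip_feats = rng.normal(loc=1.0, scale=0.2, size=(200, 3))  # toy data
other_feats = rng.normal(loc=0.0, scale=0.5, size=(800, 3))

gmm_tip = GaussianMixture(n_components=2, random_state=0).fit(fingertip_feats)
gmm_bg = GaussianMixture(n_components=4, random_state=0).fit(other_feats)

def fingertip_posterior(feats, prior_tip=0.2):
    """P(fingertip | feature) from the two class-conditional GMMs."""
    log_tip = gmm_tip.score_samples(feats) + np.log(prior_tip)
    log_bg = gmm_bg.score_samples(feats) + np.log(1.0 - prior_tip)
    m = np.maximum(log_tip, log_bg)  # log-sum-exp stabilisation
    return np.exp(log_tip - m) / (np.exp(log_tip - m) + np.exp(log_bg - m))

print(fingertip_posterior(rng.normal(1.0, 0.2, size=(5, 3))))
```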
Securing one’s personal space is quite important in leading a comfortable social life. However, it is difficult to maintain an appropriate interpersonal distance all the time. Therefore, we propose an interpersonal distance control system with a video see-through system, consisting of a head-mounted display (HMD), depth sensor, and RGB camera. The proposed system controls the interpersonal distance by changing the size of the person in the HMD view. In this paper, we describe the proposed system and conduct an experiment to confirm the capability of the proposed system. Finally, we show and discuss the results of the experiment.
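The size manipulation can be motivated by a simple pinhole-camera argument: to make a person at an actual distance appear as if they stood at a desired interpersonal distance, the segmented person region is rescaled by the ratio of the two distances. The one-liner below is that argument only, not the paper's implementation.

```python
# Minimal sketch (pinhole-camera argument, not the paper's implementation):
# scale factor applied to the person region in the video see-through view.
def person_scale_factor(d_actual, d_target):
    """> 1 enlarges (appears closer), < 1 shrinks (appears farther)."""
    return d_actual / d_target

print(person_scale_factor(d_actual=1.0, d_target=1.5))  # ~0.67, appears farther
```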
The ideal AR x-ray vision should enable users to clearly observe and grasp not only occludees, but also occluders. We propose a novel selective visualization method of both occludee and occluder layers with dynamic opacity depending on the user's gaze depth. Using the gaze depth as a trigger to select the layers has an essential advantage over using other gestures or spoken commands, in that it avoids conflicts between the user's intentional commands and unintentional actions. Our experiment using a visual paired-comparison task shows that our method achieves a 20% higher success rate and reduces the average task completion time by 30% compared with a non-selective method using a constant 50% transparency.
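The gaze-depth-driven opacity can be pictured as a falloff of each layer's opacity with the distance between the gaze depth and that layer's depth; the Gaussian profile below is an assumed mapping for illustration, not the authors' exact function.

```python
# Minimal sketch (assumed opacity profile, not the authors' mapping): opacity
# of occluder and occludee layers as a function of the user's gaze depth, so
# fixating on the far layer fades the near one and vice versa.
import numpy as np

def layer_opacities(gaze_depth, occluder_depth, occludee_depth, sigma=0.3):
    """Gaussian falloff of opacity with the gaze-to-layer depth difference."""
    a_near = np.exp(-((gaze_depth - occluder_depth) ** 2) / (2 * sigma ** 2))
    a_far = np.exp(-((gaze_depth - occludee_depth) ** 2) / (2 * sigma ** 2))
    return a_near, a_far

print(layer_opacities(gaze_depth=2.0, occluder_depth=1.0, occludee_depth=2.0))
```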
For digital clay modeling, AR techniques are applied with low-cost devices: a web camera, printed markers, and a pair of chopsticks, or "hashi" in Japanese. A user can build up a particle-based model from scratch with HASHI.
Historical inquiry involves investigating compelling questions by analyzing historical sources to construct evidence-based accounts of the past. However, teaching students to do history is challenging. This paper discusses the design of CI-Spy, a mobile augmented reality system that explicitly teaches inquiry strategies and engages students to practice the doing of history in an augmented real-world context. As a case study for the design of the application, we designed and embedded multiple augmented reality activities within an instructional unit using a local historic site (the Christiansburg Institute, or CI). We conducted a pilot study with elementary students to learn how and to what extent AR technologies can support learning inquiry strategies and processes. After using our system, students demonstrated a greater understanding of inquiry and gained significant insight into the hidden history of CI.