Using CLANG/LLVM Vectorization to Generate Mixed Precision Source Code

At Supercomputing 2015, NVIDIA announced the Jetson TX1, a mobile supercomputer offering up to 1 TFLOPS of compute power within a power envelope typical of embedded devices. Targeting image processing and deep learning, this platform is the first available to natively expose mixed precision instructions. However, the new mixed precision unit requires that operations on 16-bit floating-point values be done in pairs. Hence, approaching peak performance requires the half2 type, which packs two values in a single register.
In this work, we present an approach that uses an existing vectorization tool, developed for CPU code optimization, to generate CUDA source code based on half2 intrinsic functions, enabling use of the mixed precision hardware with little effort. Using this approach, we are able to generate efficient CUDA code from a single scalar version of the code.
This source-to-source translation may be used in many application fields and for different numeric types. Moreover, the approach has beneficial side effects, such as better memory access patterns and increased instruction-level parallelism.
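
To make the pairing idea concrete, here is a minimal sketch in plain C++ rather than CUDA, so that it runs on any CPU. It is not the generated code described above. The `Pair2` type and the `fma2`, `saxpy_scalar` and `saxpy_paired` functions are hypothetical names introduced for illustration; on the GPU, `Pair2` would correspond to `half2` and `fma2` to a paired intrinsic such as `__hfma2`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's half2: two values packed side by side.
// Plain floats are used here so the sketch runs on any CPU.
struct Pair2 {
    float lo;
    float hi;
};

// Paired multiply-add: computes a * x + y on both lanes at once.
// On the GPU this role would be played by a half2 intrinsic.
inline Pair2 fma2(Pair2 a, Pair2 x, Pair2 y) {
    return Pair2{a.lo * x.lo + y.lo, a.hi * x.hi + y.hi};
}

// Scalar reference: the single version a developer would write.
std::vector<float> saxpy_scalar(float a, const std::vector<float>& x,
                                const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = a * x[i] + y[i];
    }
    return out;
}

// Paired version: what a width-2 vectorizer would conceptually produce.
std::vector<float> saxpy_paired(float a, const std::vector<float>& x,
                                const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> out(x.size());
    const Pair2 a2{a, a};  // broadcast the scalar to both lanes
    std::size_t i = 0;
    for (; i + 1 < x.size(); i += 2) {
        const Pair2 r = fma2(a2, Pair2{x[i], x[i + 1]}, Pair2{y[i], y[i + 1]});
        out[i] = r.lo;
        out[i + 1] = r.hi;
    }
    // Scalar epilogue for an odd trailing element.
    if (i < x.size()) {
        out[i] = a * x[i] + y[i];
    }
    return out;
}
```

Both functions return the same values for the same inputs; the paired version simply issues half as many arithmetic operations in its main loop, which is the property the mixed precision unit rewards.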

Ray Casting in an Octree: Using Connectivity via Plücker Coordinates (original French title: Lancer de rayon dans un octree. Utilisation de la connectivité via les coordonnées de Plücker)

Together with Régis Portalez, we proposed an optimized algorithm for octree ray traversal. The algorithm, based on expressing rays in Plücker coordinates, reduces the number of arithmetic operations required to compute ray-box intersections.
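
As background on why Plücker coordinates suit this kind of test, the sketch below shows the standard "side" product between two lines and uses it to check whether a line passes through one convex quad, such as a single face of a box. It is a generic illustration in C++, not the traversal algorithm of the paper. All names (`Vec3`, `Plucker`, `from_ray`, `from_points`, `side`, `line_crosses_quad`) are hypothetical, and the ray is treated as an infinite line.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Vec3 {
    double x, y, z;
};

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Plücker coordinates of a line: direction u and moment v.
struct Plucker {
    Vec3 u;
    Vec3 v;
};

// Line through points p and q, oriented from p to q.
inline Plucker from_points(Vec3 p, Vec3 q) { return {sub(q, p), cross(p, q)}; }

// Line supporting a ray given by its origin and direction.
inline Plucker from_ray(Vec3 origin, Vec3 dir) {
    assert(dot(dir, dir) > 0.0);  // a zero direction does not define a line
    return {dir, cross(origin, dir)};
}

// Permuted inner product: its sign tells on which side line a passes line b,
// and it is zero when the two lines meet or are parallel.
inline double side(const Plucker& a, const Plucker& b) {
    return dot(a.u, b.v) + dot(b.u, a.v);
}

// A line crosses a convex quad when it passes all four consistently
// oriented edges on the same side. Zero products (grazing) count as hits.
bool line_crosses_quad(const Plucker& line, const std::array<Vec3, 4>& quad) {
    bool any_positive = false;
    bool any_negative = false;
    for (std::size_t i = 0; i < 4; ++i) {
        const Plucker edge = from_points(quad[i], quad[(i + 1) % 4]);
        const double s = side(line, edge);
        if (s > 0.0) any_positive = true;
        if (s < 0.0) any_negative = true;
    }
    return !(any_positive && any_negative);
}
```

The test needs only multiplications, additions and sign comparisons, with no division and no computation of the intersection point, which is the kind of saving this representation makes possible.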

This work was presented at Journées Reims Image in November 2014: http://reimsimage2014.univ-reims.fr/afig-2014/

This work has been archived on HAL: https://hal.archives-ouvertes.fr/hal-01281450

Altimesh Hybridizer

GPU computing performance and capabilities have improved at an unprecedented pace. CUDA dramatically reduced the learning curve of GPU usage for general-purpose computing. The Hybridizer goes a step further by enabling GPUs in other development ecosystems (C#, Java, .NET) and execution platforms (Linux, Windows, Excel). Transforming .NET binaries into CUDA source code, the Hybridizer is your in-house GPU guru. With a growing number of features including virtual functions, generics and more, the Hybridizer also offers a number of coding features for multi- and many-core architectures, while making use of advanced optimization features such as AVX and ILP.

The solution and its features have been presented several times. Here are some events and presentations.

Rapid Visualization of Large Point-Based Surfaces

In this work, we propose a rendering technique with high visual quality for large point data sets. From a large point data set, we extract a coarse mesh, build patches, and project the input samples onto those patches. The input samples are then used to generate a normal map for these patches, enabling rendering at a high frame rate with high visual quality.

This work has been published at the 6th International Symposium on Virtual Reality, Archaeology and Cultural Heritage (VAST) 2005. http://dx.doi.org/10.2312/VAST/VAST05/075-082

A Point-Based Approach for Capture, Display and Illustration of Very Complex Archeological Artefacts

During this project, cultural heritage artifacts were scanned using range scanners. So that archeologists can document their findings, the 3D artifacts are re-projected in 2D by unfolding them along a cylinder. In this work, we proposed and implemented a sky-dome-based approach to compute lighting that highlights details in these unfolded images.

Unfolding of a column virtually rebuilt from fragments carefully placed by archeologists.
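
The cylindrical unfolding mentioned above can be sketched as a simple change of coordinates. The C++ snippet below is an illustration under stated assumptions, not the project's implementation: it assumes the cylinder axis is the z axis, and the names `Unfolded` and `unfold_on_cylinder` as well as the `radius` parameter are hypothetical. The sky-dome lighting itself is not covered.

```cpp
#include <cassert>
#include <cmath>

// Position of a 3D sample in the unfolded 2D image, plus its relief.
struct Unfolded {
    double u;       // arc length around the cylinder (horizontal axis)
    double v;       // height along the cylinder axis (vertical axis)
    double relief;  // signed distance from the cylinder surface
};

// Unfolds a point around a cylinder whose axis is assumed to be the z axis.
Unfolded unfold_on_cylinder(double x, double y, double z, double radius) {
    assert(radius > 0.0);
    const double theta = std::atan2(y, x);  // angle around the axis, in (-pi, pi]
    return Unfolded{radius * theta, z, std::hypot(x, y) - radius};
}
```

In such a mapping, the relief value is what carries the carved detail of the artifact, so it is the natural input for a lighting computation on the unfolded image.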

INRIA Research Report: https://hal.inria.fr/inria-00606750

French Publication: https://hal.inria.fr/inria-00606751

Eurographics DL: http://dx.doi.org/10.2312/VAST/VAST04/105-114

Flexible point-based rendering on mobile devices

In this work, we developed data structures and rendering techniques for point data sets under the constraints of mobile devices. We studied the cost of a data structure with attributes attached to point samples, as well as a one-pass shadow algorithm enabled by this data structure.

This work has been archived as a technical report and published in a special issue of IEEE Computer Graphics and Applications.

HAL deposit of tech report – https://hal.inria.fr/inria-00071753/ – May 2003

IEEE Xplore – https://ieeexplore.ieee.org/document/1310212

Published in IEEE Computer Graphics and Applications, volume 24, issue 4, July-Aug. 2004.

Robust Higher-Order Filtering of Points

In this work, we present the benefit of using second-order surface approximations for filtering point data sets with normals. Point data from multiple range scanner acquisitions suffer from registration noise due to approximations made during registration. This approach aims to reduce these position errors by using normal information together with second-order polynomial surfaces. Pushed further, this filtering technique can concentrate the curvature of the surface on edges, resulting in a sharp-edged surface.

INRIA Research Report: https://hal.inria.fr/inria-00071424