Google Tech Talks
January 12, 2015 (more info below)

ABSTRACT

The optimization of short, loop-free sequences of fixed-point x86_64 code is an important problem in high-performance computing. Unfortunately, the competing constraints of transformation correctness and performance improvement often force even special-purpose compilers to produce sub-optimal code. We show that by encoding these constraints as terms in a cost function, and by using a Markov Chain Monte Carlo sampler to rapidly explore the space of all possible programs, we are able to generate aggressively optimized versions of a given target program. Beginning from binaries compiled by gcc -O0, we are able to produce provably correct code sequences that either match or outperform the code produced by gcc -O3, and in some cases expert hand-written assembly. Because most high-performance applications contain floating-point computations, we extend our technique to this domain and show a novel approach to trading full floating-point precision for further increases in performance. We demonstrate the ability to generate reduced-precision implementations of Intel's hand-written C numerics library that are up to six times faster than the original code, and achieve end-to-end speedups of over 30% on a direct numeric simulation and a ray tracer. Because optimizations that contain floating-point computations are not amenable to formal verification using the current state of the art, we present a technique for characterizing maximum error and providing strong evidence for correctness.

Publication list: http://cs.stanford.edu/people/eschkufz/
Github: https://github.com/eschkufz/stoke-release

About the speaker

Eric Schkufza is a postdoctoral scholar at Stanford University working with Professor Alex Aiken. He graduated from Stanford University with a PhD in computer science in June 2014. He is interested in applying stochastic search techniques to the design of optimizing compilers.
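To give a flavor of the idea described in the abstract (a cost function combining correctness and performance terms, explored with an MCMC sampler), the following is a minimal, self-contained sketch in Python. It is not STOKE or the x86_64 tool discussed in the talk: the toy instruction set, the test cases, and helper names such as cost, propose, and mcmc_search are illustrative assumptions, and correctness is estimated from test cases rather than formally verified.

```python
import math
import random

# Toy "instruction set": each instruction transforms an integer accumulator.
# (Illustrative only; the real system rewrites x86_64 instructions.)
INSTRUCTIONS = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "dbl": lambda x: x * 2,
    "neg": lambda x: -x,
    "nop": lambda x: x,
}

# (input, expected output) test cases; the hidden target here is 2 * (x + 3).
TESTS = [(0, 6), (1, 8), (2, 10)]

def run(program, x):
    for op in program:
        x = INSTRUCTIONS[op](x)
    return x

def cost(program):
    # Correctness term: how far the outputs are from the expected outputs.
    wrong = sum(abs(run(program, i) - o) for i, o in TESTS)
    # Performance term: prefer shorter effective programs (fewer non-nops).
    perf = sum(1 for op in program if op != "nop")
    return wrong * 100 + perf

def propose(program):
    # Proposal move: rewrite one randomly chosen instruction.
    new = list(program)
    new[random.randrange(len(new))] = random.choice(list(INSTRUCTIONS))
    return new

def mcmc_search(length=6, iters=20000, beta=1.0):
    current = ["nop"] * length
    current_cost = cost(current)
    best, best_cost = current, current_cost
    for _ in range(iters):
        candidate = propose(current)
        candidate_cost = cost(candidate)
        # Metropolis-Hastings acceptance: always accept improvements,
        # occasionally accept regressions to escape local minima.
        if random.random() < math.exp(-beta * max(0, candidate_cost - current_cost)):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost

if __name__ == "__main__":
    program, c = mcmc_search()
    print(program, c)  # e.g. a program equivalent to inc, inc, inc, dbl
```

Weighting the correctness term far more heavily than the performance term mirrors the shape of the search: the sampler first drifts toward programs that pass the tests, then keeps trading among correct candidates for cheaper ones.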