Matrix and Vector Transformations with SSE - Results and Conclusions

Our SSE library appears to run approximately 4 times as fast as the equivalent code written in C (a 300% speedup); at 1,000,000 operations, for example, it took 639 ms against 2,542 ms. This speedup comes at a slight cost to precision. Since SSE works on floats, not doubles, our functions cast the incoming doubles down to floats and cast the results back up to doubles when the computations are done. The precision cost is small (after 50,000 operations, the difference is approximately 0.544%) and, we believe, worth it given the speedup. In addition, it is unlikely that any rendering application would require so many operations on the same matrices.
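The cast-down, compute, cast-up pattern described above can be sketched as follows. This is a minimal illustration and not the library's actual code: the function name mat4_mul_vec4_sse, the column-major layout, and the choice of a matrix-vector product are all assumptions made for the example.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_mul_ps, ... */

/* Hypothetical helper (not from the original library): multiply a 4x4
   column-major matrix by a 4-vector. Inputs and outputs are doubles,
   but all arithmetic is done in single precision with SSE. */
void mat4_mul_vec4_sse(const double m[16], const double v[4], double out[4])
{
    float mf[16];
    float result[4];
    __m128 acc;
    int i;

    /* Cast down: narrow the matrix to floats so that each column
       fits in one 128-bit SSE register. */
    for (i = 0; i < 16; ++i)
        mf[i] = (float)m[i];

    /* out = col0*v[0] + col1*v[1] + col2*v[2] + col3*v[3],
       with four lanes computed per instruction. */
    acc = _mm_mul_ps(_mm_loadu_ps(mf + 0), _mm_set1_ps((float)v[0]));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(mf + 4),  _mm_set1_ps((float)v[1])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(mf + 8),  _mm_set1_ps((float)v[2])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(mf + 12), _mm_set1_ps((float)v[3])));

    /* Cast up: widen the single-precision result back to doubles. */
    _mm_storeu_ps(result, acc);
    for (i = 0; i < 4; ++i)
        out[i] = (double)result[i];
}
```

Every value passes through a float on the way in, so any input that is not exactly representable in single precision is rounded before the first multiply. That rounding is where the precision cost comes from.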

Operations performed    Time to perform in C (ms)    Time to perform in SSE (ms)
2,000                   5                            1
3,000                   7                            2
5,000                   13                           3
8,000                   20                           5
20,000                  50                           12
1,000,000               2,542                        639

It should be noted, however, that the SSE output of the last trial (1,000,000 operations) was significantly inaccurate, presumably due to accumulated rounding and casting errors.
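How such error accumulates can be illustrated by applying the same transformation repeatedly in single and in double precision and comparing the results. The sketch below is an assumption-laden stand-in and not our benchmark: drift_after is a hypothetical name, the transformation is a plain 2D rotation with c = 0.8 and s = 0.6, and scalar float arithmetic takes the place of the SSE path.

```c
/* Hypothetical experiment (not the original benchmark): rotate the point
   (1, 0) n times in float and in double, then return the largest absolute
   difference between the two results. */
double drift_after(int n)
{
    const double c = 0.8, s = 0.6;            /* rotation coefficients */
    const float cf = (float)c, sf = (float)s; /* rounded once on the way down */
    double xd = 1.0, yd = 0.0;
    float xf = 1.0f, yf = 0.0f;
    double dx, dy;
    int i;

    for (i = 0; i < n; ++i) {
        /* Double-precision reference. */
        double nxd = c * xd - s * yd;
        double nyd = s * xd + c * yd;
        /* Same step in single precision; each product and sum is rounded. */
        float nxf = cf * xf - sf * yf;
        float nyf = sf * xf + cf * yf;
        xd = nxd; yd = nyd;
        xf = nxf; yf = nyf;
    }

    dx = xd - (double)xf;
    if (dx < 0.0) dx = -dx;
    dy = yd - (double)yf;
    if (dy < 0.0) dy = -dy;
    return dx > dy ? dx : dy;
}
```

Because the rounded coefficients are reused on every step, their error compounds with the per-operation rounding, so the gap widens as n grows. The figures this produces will not match the 0.544% reported above, since the operations and matrices differ.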

Another cost of the SSE speedup is scalability to other matrix sizes. Because the library is optimized for 4x4 matrices, it could not easily be modified to handle matrices of other sizes. Luckily, most graphical rendering applications seem to use only 3x3 or 4x4 matrices, which are well-supported.
