Our SSE libraries run approximately four times as fast as equivalent functions written in C, a speedup of about 300%. This speedup comes at a slight cost in precision. Because SSE operates on single-precision floats rather than doubles, our functions cast the incoming doubles down to floats and then cast the results back up to doubles once the computations are done. The precision cost is small (after 50,000 operations, the results differ by approximately 0.544%) and, we believe, worth the speedup. In addition, it is unlikely that any rendering application would perform that many operations on the same matrices.
Operations performed | Time in C (ms) | Time in SSE (ms)
---|---|---
2,000 | 5 | 1
3,000 | 7 | 2
5,000 | 13 | 3
8,000 | 20 | 5
20,000 | 50 | 12
1,000,000 | 2,542 | 639
Another cost of the SSE speedup is limited scalability. Because the code is optimized specifically for 4x4 matrices, the library cannot easily be adapted to matrices of other sizes. Fortunately, most graphics rendering applications appear to use only 3x3 or 4x4 matrices, both of which the library supports.