## Efficient 4×4 matrix-vector multiplication with SSE: horizontal add and dot product – what’s the point?

I am trying to find the most efficient way to multiply a 4×4 matrix (M) by a vector (u) using SSE, that is, to compute v = Mu.
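To pin down the operation, here is a minimal scalar sketch of v = Mu that any SSE version would need to match. The row-major storage layout and the function name `mat4_mul_vec4_ref` are assumptions made for illustration only; nothing above fixes either.

```c
#include <stddef.h>

/* Scalar reference for v = M u.
 * Assumption (not stated above): M is stored row-major,
 * so M[4*i + j] is the element in row i, column j.
 * The name mat4_mul_vec4_ref is hypothetical. */
void mat4_mul_vec4_ref(const float M[16], const float u[4], float v[4])
{
    for (size_t i = 0; i < 4; ++i) {
        float acc = 0.0f;
        /* v[i] is the dot product of row i of M with u. */
        for (size_t j = 0; j < 4; ++j)
            acc += M[4 * i + j] * u[j];
        v[i] = acc;
    }
}
```

Each output element is one 4-element dot product, so the whole product costs 16 multiplies and 12 adds in scalar form. With column-major storage the indexing would be `M[4*j + i]` instead.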

As far as I understand, there are two primary ways to go about …