+ /*
+ * The carry for each operation fits on 32 bits:
+ * d[v+1] <= 2^31-1
+ * xu*y[v+1] <= (2^31-1)*(2^31-1)
+ * f*m[v+1] <= (2^31-1)*(2^31-1)
+ * r <= 2^32-1
+ * (2^31-1) + 2*(2^31-1)*(2^31-1) + (2^32-1) = 2^63 - 2^31
+ * After division by 2^31, the new r is then at most 2^32-1
+ *
+ * Using a 32-bit carry has performance benefits on 32-bit
+ * systems; however, on 64-bit architectures, we prefer to
+ * keep the carry (r) in a 64-bit register, thus avoiding some
+ * "clear high bits" operations.
+ */