I seem to have gotten roughly a 2x speedup.  However, the timings
varied substantially from test to test (runs were generally about 3 seconds
apart), most likely because I didn't have sole use of Pollux
during this time...   Working on this last problem!
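To get a speedup figure that is less sensitive to other users on Pollux, one option is to time each mode several times and compare the fastest runs, since the fastest run is the one least disturbed by outside load. Below is a minimal sketch in C, assuming a POSIX system with clock_gettime. now_seconds, min_time and speedup are hypothetical helpers, not part of the program described here:

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Current time in seconds from a monotonic clock, which does not jump
   if the system clock is reset, so it is suitable for timing runs. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Fastest of n recorded run times (n >= 1): the run least disturbed
   by other users on a shared machine. */
double min_time(const double *times, int n)
{
    double best = times[0];
    for (int i = 1; i < n; i++)
        if (times[i] < best)
            best = times[i];
    return best;
}

/* Speedup of the threaded version over the serial one; 2.0 means the
   threaded run took half as long. */
double speedup(double serial_secs, double threaded_secs)
{
    return serial_secs / threaded_secs;
}
```

For example, record five now_seconds() differences around the multiply for each mode, then compute speedup(min_time(serial, 5), min_time(threaded, 5)). Taking the minimum instead of the mean discards runs slowed by other jobs, at the cost of hiding genuine run-to-run variation.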

The tst file has some sample runs.  If you wish to run it
yourself:

To compile:  _make        // which will create the a.out binary.

To run the prog:  a.out   // runs w/out threads, and w/out printing the
			  // resulting matrix

You can add these command line args:  DT   //do threads
				      DD   //do display (of result matrix)
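Here is a sketch of how the DT and DD keywords might be picked out of argv, assuming they are matched as exact, case-sensitive strings. The struct options type and the parse_args function are hypothetical names, and the real program may handle its arguments differently:

```c
#include <string.h>

/* Hypothetical option flags; both default to off (the plain a.out case). */
struct options {
    int do_threads;  /* set by DT */
    int do_display;  /* set by DD */
};

/* Scan argv for the DT and DD keywords. In this sketch they may appear
   in either order, and any other argument is ignored. */
struct options parse_args(int argc, char **argv)
{
    struct options opt = {0, 0};
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "DT") == 0)
            opt.do_threads = 1;
        else if (strcmp(argv[i], "DD") == 0)
            opt.do_display = 1;
    }
    return opt;
}
```

With this sketch, a.out DT DD turns on both threading and display, while plain a.out leaves both off.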


The matrices are 100x100, and the values of a and b range widely, so don't
expect the resulting matrix to make any sense.
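For reference, one common way to thread a 100x100 product c = a * b is to give each thread a contiguous block of rows of the result. The sketch below assumes POSIX threads, a thread count of 4, and that the result is stored in c. NTHREADS, multiply_serial and multiply_threaded are hypothetical names, and this is not necessarily how the program described here divides the work:

```c
#include <pthread.h>

#define N 100          /* matrix size from the notes above */
#define NTHREADS 4     /* assumed thread count */

static double a[N][N], b[N][N], c[N][N];

/* Compute rows [lo, hi) of c = a * b. */
static void multiply_rows(int lo, int hi)
{
    for (int i = lo; i < hi; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

/* Serial version: one call covers every row. */
void multiply_serial(void)
{
    multiply_rows(0, N);
}

struct row_range { int lo, hi; };

/* Thread entry point: multiply the rows assigned to this thread. */
static void *worker(void *arg)
{
    struct row_range *r = arg;
    multiply_rows(r->lo, r->hi);
    return NULL;
}

/* Threaded version: each thread gets a contiguous block of rows.
   Returns 0 on success, -1 if a thread could not be created (any
   threads already started are still joined before returning). */
int multiply_threaded(void)
{
    pthread_t tid[NTHREADS];
    struct row_range ranges[NTHREADS];
    int created = 0, rc = 0;

    for (int t = 0; t < NTHREADS; t++) {
        ranges[t].lo = t * N / NTHREADS;
        ranges[t].hi = (t + 1) * N / NTHREADS;
        if (pthread_create(&tid[t], NULL, worker, &ranges[t]) != 0) {
            rc = -1;
            break;
        }
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tid[t], NULL);
    return rc;
}
```

Because each thread writes only its own rows of c and only reads a and b, no locking is needed. Compile with -pthread. With more threads than cores, or with other jobs competing for the CPU as on a shared machine, the speedup can fall well short of the thread count.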

Ben
