On Sun, Mar 28, 2010 at 09:07:05AM -0700, BGB / cr88192 wrote:
Yes, it is. That's why I posted it. I am sure the results I got aren't
directly comparable to yours, and the different CPU is one of the reasons.
Oh, I see. I misunderstood you there. I thought you would be measuring
bytes of output, because your input likely wouldn't be the same size for
textual input vs. binary input.
Of course, that makes the MB/s figures we got completely incomparable.
I can't produce MB/s of input assembly code for my measurements, because,
in my case, there is no assembly code being used as input.
Right. I could, of course, come up with some assembly code corresponding to
the instructions that I'm generating, but I don't see much point to that.
First of all, the size would vary based on how you wrote the assembly code,
and, secondly, I'm not actually processing the assembly code at all, so
I don't think the numbers would be meaningful even as an approximation.
Clearly, with the numbers being so different. :-) The point of posting these
numbers wasn't so much to show that the same thing you are doing can be
done in fewer instructions, but rather to give an idea of how much time
the generation of executable code costs using Alchemist. This is basically
the transition from "I know which instruction I want and which operands
I want to pass to it" to "I have the instruction at this address in memory".
In particular, Alchemist does _not_ parse assembly code, perform I/O,
have a concept of labels, or decide what kind of jump instruction you need.
Right. The one I am talking about is at http://libalchemist.sourceforge.net/
Right, I forgot to mention my compiler settings. The results I posted
are using gcc 4.4.1-4ubuntu9, with -march=native -pipe -Wall -s -O3
-fPIC. So that's with quite a lot of optimization, although the code for
Alchemist hasn't been optimized for performance at all.
I expect that this may be costly, especially with debug settings enabled.
Alchemist doesn't make a function call for each byte emitted and doesn't
automatically expand the buffer, but it does perform a range check.
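
To illustrate that emission style, here is a minimal sketch of a range-checked emitter that writes into a fixed, caller-supplied buffer. This is not Alchemist's source: the name ex_emit_bytes, its parameters, and the choice to return 0 when the bytes do not fit are all assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Alchemist): copy one already-encoded
 * instruction into the caller's buffer.
 *
 * dst   - where to write
 * avail - bytes still free at dst
 * src   - encoded instruction bytes
 * len   - number of bytes in src
 *
 * Returns the number of bytes written, or 0 if the instruction does not
 * fit. The buffer is never grown; the caller decides what to do on failure.
 */
size_t ex_emit_bytes(unsigned char *dst, size_t avail,
                     const unsigned char *src, size_t len)
{
    if (len > avail)        /* the range check: one test per instruction */
        return 0;
    memcpy(dst, src, len);  /* one copy per instruction, not one call per byte */
    return len;
}
```

A caller would advance its own offset with the return value, along the lines of n += ex_emit_bytes(code + n, sizeof(code) - n, insn, 3); a return of 0 leaves n unchanged.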
Right. Alchemist doesn't know anything about object file formats. It just
gives you the raw machine code.
That may be a major difference, too. Alchemist has different functions for
emitting different kinds of instruction. For reference, the code that
emits the "or ebx,byte +0x2a" instruction above looks like this:
/* or ebx, 42 */
n += cg_x86_emit_reg32_imm8_instr(code + n,
sizeof(code) - n,
There are other functions for emitting code, with names like
cg_x86_emit_reg32_reg32_instr, cg_x86_emit_imm8_instr, etc.
Each of these functions contains a switch statement that looks at the
operation (an int) and then calls an instruction-format-specific function,
substituting the actual x86 opcode for the symbolic constant. A similar
scheme is used to translate the symbolic constant for a register name to
an actual x86 register code.
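
The dispatch scheme described above can be sketched roughly as follows. This is an illustration only, not Alchemist's code: every name prefixed with ex_ or EX_ is hypothetical, the symbolic constants are given arbitrary values, only four registers are covered, and returning 0 on any failure is an assumption. The encoding itself is the standard x86 "83 /digit ib" form used for 32-bit register with sign-extended 8-bit immediate operations.

```c
#include <stddef.h>

/* Hypothetical symbolic constants; deliberately NOT the hardware values. */
enum ex_op  { EX_OP_ADD = 100, EX_OP_OR, EX_OP_AND, EX_OP_SUB, EX_OP_XOR, EX_OP_CMP };
enum ex_reg { EX_REG_EAX = 200, EX_REG_EBX, EX_REG_ECX, EX_REG_EDX };

/* Translate a symbolic register to its 3-bit x86 register code, or -1. */
static int ex_reg_code(int reg)
{
    switch (reg) {
    case EX_REG_EAX: return 0;
    case EX_REG_ECX: return 1;
    case EX_REG_EDX: return 2;
    case EX_REG_EBX: return 3;
    default:         return -1;
    }
}

/* Format-specific emitter for "83 /digit ib" with a register operand. */
static size_t ex_emit_83_ib(unsigned char *dst, size_t avail,
                            int digit, int rm, signed char imm)
{
    if (avail < 3)                                       /* range check */
        return 0;
    dst[0] = 0x83;                                       /* opcode */
    dst[1] = (unsigned char)(0xC0 | (digit << 3) | rm);  /* ModRM, mod=11 */
    dst[2] = (unsigned char)imm;                         /* imm8 */
    return 3;
}

/* Dispatcher: map the symbolic operation to the x86 opcode extension. */
size_t ex_emit_reg32_imm8(unsigned char *dst, size_t avail,
                          int op, int reg, signed char imm)
{
    int rm = ex_reg_code(reg);
    if (rm < 0)
        return 0;
    switch (op) {
    case EX_OP_ADD: return ex_emit_83_ib(dst, avail, 0, rm, imm);
    case EX_OP_OR:  return ex_emit_83_ib(dst, avail, 1, rm, imm);
    case EX_OP_AND: return ex_emit_83_ib(dst, avail, 4, rm, imm);
    case EX_OP_SUB: return ex_emit_83_ib(dst, avail, 5, rm, imm);
    case EX_OP_XOR: return ex_emit_83_ib(dst, avail, 6, rm, imm);
    case EX_OP_CMP: return ex_emit_83_ib(dst, avail, 7, rm, imm);
    default:        return 0;  /* unknown operation */
    }
}
```

With these definitions, ex_emit_reg32_imm8(buf, sizeof buf, EX_OP_OR, EX_REG_EBX, 42) writes the three bytes 83 CB 2A, which is the encoding of "or ebx,byte +0x2a".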
You can take a look at
for all the gory details, if you like.