introspection in SML

Post by michele.si » Mon, 03 Dec 2007 15:57:46


I am not sure about what to do with introspection in SML. It seems
that introspection capabilities are there, since the interactive
prompt is able to print the signature of any object, but they are
somewhat not exposed to the programmer. Case in point: I was thinking
about writing a test runner (*). My idea was to collect tests in
structures; each test would have a name starting with "test" and a
signature unit->bool, returning true if the test passed and false
otherwise. The runner should be able to execute all the functions in
the structure matching the signature and with a name starting with
"test". This would be trivial to do in a language with introspection;
but in SML I don't know how to proceed. I can think of various
solutions, but I am not happy with them. One possibility is to parse
the source code with a regular expression, to find the tests and to
generate the source code for the runner, but it certainly does not
look clean; another solution is to register the test names in a list,
but that requires adding a registration call for each test and
duplicating the names; in that case I would be better off just calling
the tests directly in the runner. This is disturbing, since each time
I add a test I must change the runner, and if I change the name of a
test I have to change it twice. I could go with an association
list like the following:

val testList = [
("testOnePlusOne", fn () => 1+1=2),
("testTwoPlusTwo", fn () => 2+2=4)
]

but it just looks ugly compared to

fun testOnePlusOne () = 1+1=2
fun testTwoPlusTwo () = 2+2=4
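
For reference, a runner over such an association list could be sketched as follows. This is only a minimal sketch: the names runTest, runAll and failures are hypothetical, and treating an exception as a failed test is an assumption, not something stated in the post.

```sml
(* Hypothetical test list, in the association-list shape discussed here. *)
val testList : (string * (unit -> bool)) list = [
  ("testOnePlusOne", fn () => 1 + 1 = 2),
  ("testTwoPlusTwo", fn () => 2 + 2 = 4)
]

(* Run one test and report it; an exception counts as a failure (assumption). *)
fun runTest (name, test) =
  let
    val passed = test () handle _ => false
  in
    print (name ^ ": " ^ (if passed then "ok" else "FAILED") ^ "\n");
    passed
  end

(* Run every test and return the names of those that failed. *)
fun runAll tests =
  List.map #1 (List.filter (not o runTest) tests)

val failures = runAll testList
```

The runner never needs to change when a test is added; only the list does, which is the duplication the post objects to.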

The most disturbing thing is that the compiler already knows all the
names in a structure: why can't I extract them from the signature? It
looks like a serious restriction to me (for instance, how do you write
documentation tools without introspection, especially if you have only
the compiled form of a library?) but maybe there is some trick I am not
aware of. Please enlighten me!


(*) I am not really going to write it. It is just an example of
a program where I would use introspection.
 
 
 

Post by Vesa Karvo » Mon, 03 Dec 2007 21:29:39

XXXX@XXXXX.COM < XXXX@XXXXX.COM > wrote:


SML doesn't directly support introspection. In cases where something like
introspection is really needed, it is usually encoded with combinators.
For example, my generic programming library

http://mlton.org/cgi-bin/viewsvn.cgi/mltonlib/trunk/com/ssh/generic/unstable/

can be seen as a form of introspection. Basically, using a set of
combinators, the programmer encodes the shapes (or structure) of types.
Generic functions, such as serialization, are then implemented based on
that encoding.
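
To illustrate the general idea only (this is not the API of the library linked above), here is a minimal sketch in which a value of a hypothetical type 'a Show.t encodes the shape of a type, and a generic "show" function falls out of that encoding. All names in the Show structure are assumptions made for this example.

```sml
(* A value of type 'a Show.t describes how to print values of type 'a. *)
structure Show = struct
  type 'a t = 'a -> string

  (* Representations of base types. *)
  val int : int t = Int.toString
  val bool : bool t = Bool.toString
  val string : string t = fn s => "\"" ^ s ^ "\""

  (* Combinators build representations of compound types. *)
  fun pair (a : 'a t, b : 'b t) : ('a * 'b) t =
    fn (x, y) => "(" ^ a x ^ ", " ^ b y ^ ")"

  fun list (a : 'a t) : 'a list t =
    fn xs => "[" ^ String.concatWith ", " (List.map a xs) ^ "]"
end

(* The shape of (int * string) list, encoded once by hand. *)
val showEntries = Show.list (Show.pair (Show.int, Show.string))

val rendered = showEntries [(1, "one"), (2, "two")]
```

The programmer writes the type's shape once with combinators; nothing is discovered at run time, so no introspection support is needed from the compiler.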


Yes, what you see is a capability of the interactive REPL. Some SML
implementations, e.g. MLton and MLKit, do not offer such a REPL.


I've gone through the same exercise.

I think that my first idea was to just have tests as side-effecting
computations at the top-level:

testThat "testOnePlusOne" (fn () => 1+1=2) ;

The side effects are in the testThat function.
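
A minimal sketch of what such a testThat could look like. The counters and the reporting format are assumptions made for illustration, not the actual implementation referred to in this post.

```sml
(* Counters updated as a side effect of each testThat call (hypothetical). *)
val passed = ref 0
val failed = ref 0

(* Run the test immediately and record the outcome;
   an exception counts as a failure (assumption). *)
fun testThat name test =
  if (test () handle _ => false)
  then passed := !passed + 1
  else (failed := !failed + 1; print ("FAILED: " ^ name ^ "\n"))

(* Each test is specified in exactly one place, at the top level. *)
val () = testThat "testOnePlusOne" (fn () => 1 + 1 = 2)
val () = testThat "testTwoPlusTwo" (fn () => 2 + 2 = 4)

val () = print (Int.toString (!passed) ^ " passed, " ^
                Int.toString (!failed) ^ " failed\n")
```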

As a generalization of this approach, you can collect tests into functors,
so you can define test cases next to the artifacts you are testing

fun foo ... = ...
fun bar ... = ...

functor TestEm () = struct
val () = test "foo" (fn () => ...) ;
val () = test "bar" (fn () => ...) ;
end

and then invoke the tests later

structure ? = TestEm ()

I think I noticed this idea, using functors to collect tests, later from
MLton's source code, which uses it in some places.
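
A runnable sketch of the functor approach, under the assumption of a hypothetical helper named test that counts and reports each test. The functions double and square stand in for the elided foo and bar; none of these names come from the original code.

```sml
(* Hypothetical helper: run a test, count it, and print its outcome. *)
val ran = ref 0

fun test name body =
  ( ran := !ran + 1
  ; print (name ^ ": " ^
           (if (body () handle _ => false) then "ok" else "FAILED") ^ "\n") )

(* Artifacts under test (stand-ins for foo and bar). *)
fun double x = 2 * x
fun square x = x * x

(* Nothing in the body runs until the functor is applied. *)
functor TestEm () = struct
  val () = test "double" (fn () => double 3 = 6)
  val () = test "square" (fn () => square 3 = 9)
end

val ranBefore = !ran

(* Applying the functor runs the tests. *)
structure Tests = TestEm ()

val ranAfter = !ran
```

Here ranBefore is 0 and ranAfter is 2, which shows that the tests are deferred until the functor is applied.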


I think that my second idea was also to collect tests into a structure.
However, as you have noticed, just collecting tests into a structure
doesn't get you very far.


I personally find that approach an ugly hack. It is only slightly
better than grepping through the source code for functions starting
with the same prefix. Aside from the fragility of grepping function
names, it makes tests second-class entities and makes it harder to
provide new abstractions for specifying tests.


Instead of grepping the source code, one could parse the signatures
reported by the compiler. Parsing only the signature language is much
simpler than parsing the full grammar and you would get more robust
identification of the test functions. This doesn't mean that I would
recommend this approach.
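
To make the idea concrete, here is a minimal sketch that picks test names out of signature text. It assumes the compiler reports one specification per line in the form "val name : type"; the exact output format differs between implementations, so the input below is hypothetical. Since the names cannot be called dynamically, a tool built this way would still have to generate runner source code from them.

```sml
(* Assumed input: one specification per line, as a compiler might print it. *)
val reported = [
  "val testOnePlusOne : unit -> bool",
  "val testTwoPlusTwo : unit -> bool",
  "val helper : int -> int",
  "val testCount : int"
]

(* Return SOME name if the line declares a test of type unit -> bool. *)
fun testName line =
  case String.tokens Char.isSpace line of
    ["val", name, ":", "unit", "->", "bool"] =>
      if String.isPrefix "test" name then SOME name else NONE
  | _ => NONE

val names = List.mapPartial testName reported
```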


Yes, I think that any approach based on identifying tests by looking
for functions with a particular kind of name, whether by grepping
source code or grepping function names through introspection, is an
ugly hack.


Now you are finally getting to the actual issue, which has little to do
with reflection per se. What you really want is to be able to specify a
test in one place. Adding or removing an individual test should not
require you to make a corresponding change elsewhere. This is a
fundamental principle of good program organization. For example, I've
been paid to maintain a largish code base (not SML code) where one of the
earlier (lead) programmers (I think his title was "Software Architect")
had this programming pattern of adding comments with "ADDNOTE" in places
where you needed to change things when you made a particular kind of
addition (several "ADDNOTE"s per kind of change). This is, of course,
silly. It is better to organize the code so that all the logic related to
a particular kind of thing is given as a unit so you don't have to go
through the source code grepping for other places you might need to
change. So, at the office, we referred to him as 'John "AddNote"
 
 
 

Post by michele.si » Mon, 03 Dec 2007 22:53:58


I don't buy your argument. A few bytes spent in docstrings are a
non-issue when we all have hundreds of gigabytes of unused
disk space. And even if some dead code was not removed by
the compiler, would it matter much, in practice? I am of the
opinion that if a user wants an optimization, he should say so
explicitly with a compiler flag: by default he should not pay
the price of the optimization.


Yes, I understand that I can define something like

datatype ('a, 'b) function_with_docstring = FUN of ('a->'b) * string

val double = FUN (fn x => 2*x, "A function doubling its argument")

and even implement a mechanism to register all the docstrings
and possibly other information such as names and types, but
frankly this sounds to me like a job for the language, not
for the user. As I have said before, it looks absurd to me
that information which is available to the compiler (even
in the absence of a REPL the compiler knows the signatures, which
are the most important thing for a documentation
tool, even in the absence of docstrings) is hidden from the user.
And what if I wanted to code an IDE for SML? It looks
impossible without resorting to help from the underlying
implementation.
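
For completeness, a minimal sketch of how such a wrapped function could be used. The accessors apply and doc are hypothetical names introduced only for this example.

```sml
datatype ('a, 'b) function_with_docstring = FUN of ('a -> 'b) * string

(* Hypothetical accessors: apply the wrapped function, or read its docstring. *)
fun apply (FUN (f, _)) x = f x
fun doc (FUN (_, d)) = d

val double = FUN (fn x => 2 * x, "A function doubling its argument")

val six = apply double 3
val help = doc double
```

The cost is that every call goes through apply rather than plain application, which is part of why this feels like a job for the language.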


I will look at the example, but since I am lazy I will ask
a question first. How are registers implemented in SML?
I would go with something like this

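(* The annotation avoids the value restriction on a bare ref []. *)
val registerList : (string * (unit -> bool)) list ref = ref []

fun register (name, function) =
  registerList := (name, function) :: !registerList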

or with a StringMap if I wanted to be able to retrieve functions
by name. However, a register works with side effects and I am told
that functional languages should avoid side effects as much as
possible; nevertheless I don't see how to avoid mutating the
reference here.
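
One way to keep the mutation contained is to hide the reference behind a signature, so that the rest of the program only sees register and lookup. This is a minimal sketch under that assumption; the structure name Registry and its operations are hypothetical, and an association list stands in for a string map to keep the example self-contained.

```sml
(* A small registry hidden behind a signature; the ref is not visible outside. *)
structure Registry :> sig
  val register : string * (unit -> bool) -> unit
  val lookup : string -> (unit -> bool) option
  val names : unit -> string list
end = struct
  val entries : (string * (unit -> bool)) list ref = ref []

  fun register entry = entries := entry :: !entries

  (* Retrieve a function by name, if it was registered. *)
  fun lookup name =
    Option.map #2 (List.find (fn (n, _) => n = name) (!entries))

  (* Names in registration order. *)
  fun names () = List.rev (List.map #1 (!entries))
end

val () = Registry.register ("testOnePlusOne", fn () => 1 + 1 = 2)
val () = Registry.register ("testTwoPlusTwo", fn () => 2 + 2 = 4)

val result = Option.map (fn test => test ()) (Registry.lookup "testOnePlusOne")
```

The side effect is still there, but it is confined to one module. The fully pure alternative is the plain association list shown earlier in the thread, passed around as an ordinary value.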

Michele Simionato
 
 
 

Post by Vesa Karvo » Tue, 04 Dec 2007 04:21:18

XXXX@XXXXX.COM < XXXX@XXXXX.COM > wrote:
[...]


I just noted that introspection *alone* is not sufficient. Most of my
comment is concerned with the disadvantages of run-time introspection, not
docstrings. I'm certainly not suggesting that introspection or docstrings
wouldn't have advantages. I'm just noting that introspection also has
costs and that many uses of introspection really aren't that difficult to
encode or are even counter productive. In particular, I think that the
approach I'm using for test specification is more expressive (users can
write their own shorthands) and less fragile and ad hoc (not dependent on
an ad hoc naming convention) than the use of introspection to grep for
functions matching a particular kind of name --- while being roughly
equally syntactically lightweight for simple uses and more concise with
user defined abbreviations.

I've never really felt the need for run-time introspection in SML. In my
experience, introspection (and stuff like aspect oriented programming)
tends to lead to fragile, non composable, hard to understand and debug, ad
hoc hacks. Introspection also conflicts (well, requires careful
consideration) with type checking, security, and abstraction, but I'm not
going to go into that (I know that there are a couple of guys here who
have written theses on related subjects).


It is not just about some dead code. As I hinted earlier, run-time
introspection makes it much harder to perform effective data
representation optimizations. Introspection also interferes with
control flow optimizations. Such optimizations are very important for
making SML programs run fast.


That is an interesting opinion. If you want to use a slow implementation,
you can choose from multiple SML implementations that do not optimize
aggressively. Some of those implementations do offer some limited forms
of introspection. Also, I think that Common Lisp would probably be a good
fit to your ideology --- you might want to take a look at it.

Really, you can't have everything. I can easily imagine, and have seen
several times in other contexts, the opposite of your argument. I've seen
people wondering why it takes forever to start an application or why
binaries are so large. Heck, I've personally had the "pleasure" of using
a few (Java) applications that were simply ridiculously slow, both at
starting up and during normal operation, even with a rather powerful
machine (this was actually quite recently).

Let me elaborate on why my preference is the opposite of yours. I love
abstraction. I do not hesitate to create even small utility functions to
eliminate repetition from my code. Often these functions are higher-order
functions. When I write applications, I continuously look for stuff that
could be moved into libraries. When designing a library, I try to make
the interface of the library as convenient to the user as possible. Often
this means that I look for ways to make the library into essentially a
domain specific language for expressing specifications for program
components. SML is quite good at this kind of programming (but I'll omit
the arguments).

The downside of programming abstractly is efficiency. Layers of
abstraction tend to have costs that accumulate (each layer adds to the
costs). This is called abstraction penalty and is a very real impediment
in some application domains and languages (or their implementations). I
u
 
 
 

Post by michele.si » Tue, 04 Dec 2007 15:33:29

On Dec 2, 8:21 pm, Vesa Karvonen < XXXX@XXXXX.COM > wrote:

Let me concede you this point,
I don't feel strongly about the test runner example.


There are two usages for introspection: the "passive" usage, in which
you use introspection just for documentation/reporting/debugging
purposes, and the "active" usage, in which you use introspection to
operate on your code, in the AOP sense. I think the "passive" usage is
good in all circumstances, and so useful that I am willing to pay a
small performance penalty for it.
OTOH, the "active" usage can often be abused, leading, as you say, to
"fragile, non composable, hard to understand and debug, ad hoc hacks".
I actually even wrote a paper about metaclass abuse in Python
http://www.phyast.pitt.edu/~micheles/python/classinitializer.html
and I tend to avoid these tricks in production code. Nevertheless,
they have their uses *as debugging and testing tools*.
For instance, just last week at work I was testing our logging
framework. We essentially have a class with many methods doing many
things, and a runner method that calls the other methods by adding
logging capabilities (each method has its own logger instance). Since
I wanted to test the framework and not the methods, I just wrote 5-6
lines of code to replace (at runtime) all methods with dummy methods,
and I was done. In particular, I did not need to subclass the original
class and to override all methods, and I avoided the need to change
the test every time I add a new method to the original class.
So, I think the "active" usage of introspection is good too, but it
should be used with care; it would be acceptable to me if it was
disabled by default and if it required a compiler switch.


This is interesting, if you have a reference, please let me know.


Uhm ... examples?


Yeah, you hit the nail on the head. Even if I have no practical
experience with Common Lisp (I think I played with Slime for a weekend
or so a few years ago) I am very used to the Lisp way of incremental
development, having programmed both in Scheme and in Python. This is
why I am not considering MLton: I simply cannot live without a REPL.
The ability to define new functions in a running program and to see
their effect immediately is a big productivity boost, especially when
you are developing user interfaces. For instance, at work, among other
things, I am developing web applications using Paste (a Python
library). Paste monitors all the modules imported by the program (and
by "all" I mean *all* modules, even the ones imported indirectly; in a
Web application this can easily mean hundreds of different modules)
and when I change any of them it reloads it, so that I can immediately
see the changes on my browser. Of course this has a performance
penalty and it is disabled in production, but when developing I must
say that I am not perceiving any penalty: the reload works
instantaneously and seamlessly (actually Python reload has its own
shortcomings,
but in practice it seems to work pretty well with Paste). I am sure
Lisp frameworks can do the same, as Smalltalk can do the same,
and I think even Ruby. Probably Alice can do the same in the ML
world, but I have yet to try.
Notice the big difference between a live system, in which I can even
introspect just entered functions which are not yet saved in the
file system, and a static documentation tool grepping the source
code. There is no comparison
 
 
 

Post by Jon Harro » Wed, 05 Dec 2007 17:44:08


Introspection can be very useful and is the right tool for many jobs and, as
you have discovered, is basically completely absent from SML.

I would say that your options are basically OCaml or F#.

OCaml has a macro system that makes it easy to extend the syntax of the
language (e.g. with test augmentations) and manipulate its ASTs _prior_ to
batch compilation. You might like to look at some of the unit testing
frameworks for OCaml.

F# is far more advanced and more useful because it carries relevant
information at run-time, so you can dynamically determine the members of an
object and invoke them.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.yqcomputer.com/
 
 
 

Post by Jon Harro » Wed, 05 Dec 2007 17:47:26


Common Lisp is often nothing like as fast as ML and, therefore, does not show
that. However, I agree that there is no reason why an ML couldn't provide
all of these features as well as the best performance. F# is best
positioned to do this, hopefully its performance will improve a lot in the
near future now that it has been productized.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.yqcomputer.com/
 
 
 

Post by Jon Harro » Wed, 05 Dec 2007 18:13:44


With no evidence that these "disadvantages of run-time introspection" exist,
that is just FUD. In reality, encoding them manually quickly gets
prohibitively difficult and this is precisely why industrial-strength
language implementations like OCaml and F# provide tools to do it for you.


With no evidence that SML programs run faster than equivalents without
benefit of those optimizations (e.g. OCaml, F#), that is also FUD. In
reality, good performance requires a good code generator and no SML
compiler has one for current generation CPUs.


That is clearly FUD: the problems you cite are with Java and don't even have
anything to do with introspection. Both F# and Lisp are counterexamples.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.yqcomputer.com/
 
 
 

Post by Vesa Karvo » Wed, 05 Dec 2007 19:04:33


[...]

Here is my assessment of the OCaml test frameworks I found easily:

OUnit ( http://www.yqcomputer.com/ ~mmzeeman/ocaml/ounit-doc/OUnit.html):

Only supports xUnit style testing. Rather simplistic interface.
Doesn't match the features and convenience of my testing framework.

TestSimple ( http://www.yqcomputer.com/ ):

Seems to be a very simplistic assertion based (like XUnit) testing
framework (refers to some Perl testing framework that I'm not
familiar with). Output is TAP conformant, which might be a minor
advantage (e.g. if you wish to put test results on a web page, for
example). Otherwise doesn't match the functionality and convenience of
my testing framework.

Conclusion: Inferior to my SML UnitTest framework.

Fort ( http://www.yqcomputer.com/ ):

A simple assertion based testing framework. Supports some test
outcomes (notably expected failures and unexpected passes), which have
been on my TODO list (easy to do). Otherwise doesn't match the
functionality and convenience of my testing framework.

OCaml-Reins / QuickCheck (https://sourceforge.net/projects/ocaml-reins/):

Supports QuickCheck style testing. A brief look would suggest that it
uses modules (functors) to build generators. This is very verbose.
Also seems to require separate registration (listing) of test cases.
Generally seems rather verbose. Doesn't match the features and
convenience of my testing framework.

Here are some features of my testing framework:

- Supports xUnit style assertion based testing:
  - Framework provides basic assertion functions.
  - Assertions can produce pretty printed output. For example, the
    equality assertion prints out the values that failed the equality test.
  - setUp/tearDown like functionality is supported through my Extended
    Basis library (no need to have it in the test framework).

- Supports QuickCheck style testing:
  - Generic Arbitrary random value generators can be used to implement
    default test data generators concisely.
  - RandomGen combinators can be used to implement customized data
    generators when needed.
  - Generic Pretty printing is used to print out counter examples.
  - Generic Shrinking is used to minimize counter examples.
  - Supports collection of data during testing.

- Designed for non-interactive testing, but can also be used from a REPL.

- Interface is designed for concise specification of tests:
  - The user is not required to provide a title for each test.
  - Tests are identified by an implicit index number.
  - Failed test output includes a pretty printed stack trace, which
    usually further helps to identify the test.
  - There is no need to separately list or register tests.
  - The user can easily specify new shorthand procedures for specifying
    tests.

-Vesa Karvonen
 
 
 

Post by michele.si » Wed, 05 Dec 2007 19:35:04


A macro facility has nothing to do with introspection: does OCaml
have introspection or not?


That's interesting, but I do not develop on Windows.

Michele Simionato
 
 
 

Post by Vesa Karvo » Wed, 05 Dec 2007 19:50:40


[...]

No, it does not.

-Vesa Karvonen
 
 
 

Post by Vesa Karvo » Wed, 05 Dec 2007 22:27:12

XXXX@XXXXX.COM < XXXX@XXXXX.COM > wrote:
[...]


Sure, *completely* passive usage, for stuff like debugging, seems
basically fine. But introspection isn't the only way to do such things.
The usual objection I have with excessive need of debugging is that it is
a symptom of problems with writing working code. I would rather put more effort
into static checking (typing) and testing than debugging, because they are
investments that have lasting value, while a debug session is a one-time
effort for hunting a particular bug and doesn't help to reduce the number
of bugs introduced into the code in the first place.


It is difficult to say without actually seeing your code, but if you are
just testing a generic logging framework, then I would think that you
could just write a simple test case for it separately. Shouldn't be much
more involved than the 5-6 lines of introspection code.



Here is one http://lambda-the-ultimate.org/node/219 .



Here is an example. Suppose the introspection facility provides the
ability to call any (accessible) function at run-time. Now, suppose then
that there is a function that is only called from a single place in the
program. Large programs typically have many such functions. Sometimes
even in performance critical inner loops. There is an optimization called
contification (http://mlton.org/References#FluetWeeks01) that replaces a
function called from a single place (or, more generally, one that always
returns to the same place) with a continuation. You could think that
instead of using "call + ret" the compiler uses "jmp + jmp" to call that
function and return to the caller (this is a simplistic explanation of the
optimization). That optimization becomes invalid. IOW, you can't just
transform the function to jmp to the single return point. Instead of just
applying the optimization, the compiler would have to do something to
cater for the introspection facility to make it possible to call the
function from multiple places. Of course, this is just a single example.
It doesn't take a lot of common sense to see that introspection really
does interfere with optimizations.
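
A source-level illustration of the situation described above. This is only a conceptual sketch: the real optimization works on the compiler's intermediate representation, and at the source level the result looks much like inlining. The functions g, f and f' are hypothetical.

```sml
(* g is called from exactly one place, so it always returns
   to the same point inside f. *)
fun g y = y * 2 + 1

fun f x =
  let
    val r = g (x + 1)   (* the only call site of g *)
  in
    r - 3
  end

(* Conceptually, after the transformation the body of g is a local
   continuation of f: it is entered by a jump and falls through to
   the rest of f, with no call/return pair. *)
fun f' x =
  let
    val y = x + 1
    val r = y * 2 + 1   (* former body of g *)
  in
    r - 3
  end
```

If a run-time facility could call g from anywhere by name, g would no longer have a single known return point, and the compiler would have to keep a general callable version of it.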



BTW, I have programmed in Scheme quite a bit (both professionally and
on my own time). I know enough about Python to say that it isn't the
kind of language I would particularly like to program in. (The same
goes for Perl.)


Yes, I know. In some circumstances it can be quite useful. Most SML
implementations provide you with a REPL that does let you enter new code
interactively. A REPL alone can boost productivity a lot, because it
gives fast feedback. I use SML implementations with a REPL on a daily
basis to develop code. Aside from Alice ML, I think that with SML/NJ +
(slightly modified) CML, you might even be able to productively modify a
running system. Poly/ML 5.1, which supports OS threads, would probably
also make it possible. Compared to something like Common Lisp, the main
difference is that you need to explicitly specify which functions you want
to be able to replace at run-time, because bindings in SML are immutable.
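
A minimal sketch of what "explicitly specifying" a replaceable function can look like, assuming the usual technique of routing calls through a ref cell. The names greet, greetImpl and greet' are hypothetical.

```sml
(* An ordinary binding: code compiled against it never sees a later redefinition. *)
fun greet name = "Hello, " ^ name

(* A replaceable function: the current implementation lives in a ref cell. *)
val greetImpl : (string -> string) ref = ref greet

(* Callers go through greet', which looks up the implementation on each call. *)
fun greet' name = (!greetImpl) name

val first = greet' "world"

(* Swap the implementation at run time; existing callers of greet' pick it up. *)
val () = greetImpl := (fn name => "Hi, " ^ name ^ "!")

val second = greet' "world"
```

Only functions routed through such a cell can be replaced, which keeps the cost of the indirection limited to the places where it was asked for.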



From what I've seen, like OCaml, some Common Lisp implementations can also
be used as assemblers with named variables, which means that if you need
good performance, you can get it by writing manually tweaked, imperative,
low-level code. The same applies to some Scheme implementations and then
there is also Stalin, which gives
 
 
 

Post by Rainer Jos » Wed, 05 Dec 2007 23:42:47

In article <fj3kjg$ccv$ XXXX@XXXXX.COM >,


...


...

That's not completely true. You might need to check out the
chapters about compiling code in the ANSI CL standard.

See here for semantic constraints for the compiler:

http://www.yqcomputer.com/

The standard for CL for example says

* any Common Lisp built-in construct is not changeable.
The compiler does not need to make them changeable.

Thus (CL:+ 1 2) always has a fixed semantics and is using
the CL function +

You are not allowed to redefine CL:+ .

Many implementations have the idea of protected packages
and will (warn or give an error) when you attempt
to redefine a built-in function or macro.

* the compiler can assume inside a function that all references
to the function itself do not change

* the compiler can assume that inside a file the
function defined and used does NOT change.

Several compilers have the idea of larger compilation units
and make assumptions based on those.
See for example the chapter on block compilation
in the manual of CMUCL :

http://www.yqcomputer.com/ #toc176

So, in Common Lisp calls do not necessarily go through some
dynamic lookup mechanism - there are also situations where
static assumptions can be made.

...

--
http://www.yqcomputer.com/
 
 
 

Post by Jon Harro » Thu, 06 Dec 2007 00:00:59


The problems you described are easily solved using macros in OCaml.


Then use Mono.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.yqcomputer.com/
 
 
 

Post by Pascal Cos » Thu, 06 Dec 2007 00:06:56

Rainer Joswig wrote:

You can also fine-tune things, including across compilation units, with
inline and notinline declarations, and compiler macros.

Pascal

--
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/