The whole thing comes down to whether it is better to first transmit
"N", the number of symbols, and then the data, or to include an EOF
signal in-band. There are costs to doing both.
Imagine there is a probability model for the data itself
p(s|past) available at any point. The data
First case: transmitting N ahead of time.
The cost for the data is L_data = sum(i=1..N) -log p(s_i | past), plus
the cost to transmit N. The Elias delta code can encode a natural
integer N in about L(N) = log N + 2 log log (2N) bits, so the total is
L = L_data + log N + 2 log log (2N)
Second case: transmitting an EOF signal in-band.
Here, we will have to model the probability of an EOF signal.
The common choice is p(EOF) = 1/(i+1) with i the number of symbols
processed. Thus at any step the codelength for the actual symbol
is -log [ (1-p(EOF))*p(s_i | past) ] (since an EOF didn't occur), and then
at the end, one EOF costs -log [ p(EOF) ] = log (N+1).
In total this is L = L_data + sum(i=1..N) -log( i/(i+1) ) + log (N+1)
= L_data + 2*log(N+1)
since the sum telescopes to log (N+1).
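
To make the telescoping step concrete, here is a small sketch in Python
(the function name is mine, not from any library). It adds up the
per-symbol "no EOF yet" penalties and the final EOF cost under the
p(EOF) = 1/(i+1) model, so the result can be compared with 2*log(N+1).

```python
import math

def eof_overhead_bits(n):
    """Extra bits spent on the in-band EOF model for a message of n symbols.

    Assumes p(EOF) = 1/(i+1) at symbol i, as in the text above.
    """
    # Each of the n symbols pays -log2(1 - p(EOF)) = -log2(i / (i + 1)).
    not_eof = sum(-math.log2(i / (i + 1)) for i in range(1, n + 1))
    # The final EOF pays -log2(1 / (n + 1)) = log2(n + 1).
    eof = math.log2(n + 1)
    return not_eof + eof

for n in (1, 10, 1000):
    print(n, eof_overhead_bits(n), 2 * math.log2(n + 1))
```

The two printed columns should agree up to floating-point rounding.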
Conclusion: the length information costs
Lelias = log N + 2 log log (2N) for pre-transmitting the length.
Leof = 2 log(N+1) for an in-band EOF.
(all logarithms are base 2)
For N >= 37, the Elias code wins, providing a shorter codelength.
So, at least with this EOF model, there is an advantage to
pre-transmitting the length for files of 37 symbols or more.
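
The crossover point can be checked numerically. The sketch below simply
evaluates the two formulas from the conclusion (the helper names
l_elias, l_eof and crossover are made up for this example) and looks
for the point from which the Elias formula stays shorter.

```python
import math

def l_elias(n):
    # Lelias = log N + 2 log log (2N), base-2 logs, as given in the text.
    return math.log2(n) + 2 * math.log2(math.log2(2 * n))

def l_eof(n):
    # Leof = 2 log (N+1), base-2 logs.
    return 2 * math.log2(n + 1)

def crossover(limit=1000):
    """Smallest N such that Lelias < Leof for every N from there up to limit."""
    # The smooth formula also comes out below Leof for N = 1 and 2, so we
    # look for the last N where it loses, not the first N where it wins.
    losing = [n for n in range(1, limit + 1) if l_elias(n) >= l_eof(n)]
    return max(losing) + 1 if losing else 1

print(crossover())
```

With these formulas the search should land on N = 37. The margin
around the crossover is tiny (a few hundredths of a bit).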
Note that you can get a length-information cost of
1 + min( Lelias, Leof )
by pre-transmitting one bit saying which scheme you will be using.
The difference between Lelias and Leof is small, being less than 6
bits for N=10000.
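
As a rough illustration of the one-bit selector idea and of the size of
the gap, the following sketch tabulates both costs and the flagged
cost for a few N. The function names are hypothetical and it uses the
smooth formulas from the conclusion, not the exact integer bit counts.

```python
import math

def l_elias(n):
    # Lelias = log N + 2 log log (2N), base-2 logs.
    return math.log2(n) + 2 * math.log2(math.log2(2 * n))

def l_eof(n):
    # Leof = 2 log (N+1), base-2 logs.
    return 2 * math.log2(n + 1)

def l_flagged(n):
    """One selector bit, then whichever scheme is cheaper for this n."""
    return 1 + min(l_elias(n), l_eof(n))

for n in (10, 100, 10000):
    gap = l_eof(n) - l_elias(n)
    print(n, round(l_elias(n), 2), round(l_eof(n), 2),
          round(l_flagged(n), 2), round(gap, 2))
```

The selector bit only pays off when the gap between the two schemes
exceeds one bit. Otherwise it costs more than it saves.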
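PS: That Elias codelength is a smooth estimate rather than an exact
count. More specifically,
L(elias) = 1 + floor(log(N)) + 2*floor(log(1+floor(log(N))))
counts the integral concrete bits, which can come out up to one bit
above the estimate (for N=2: 4 bits against 3).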
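
For reference, a minimal sketch of an Elias delta encoder in Python,
written as a bit string for readability rather than speed. The function
names are my own. It lets you check the integer bit count formula
against actual codewords.

```python
def elias_delta(n):
    """Return the Elias delta codeword of n >= 1 as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias delta codes positive integers only")
    nbits = n.bit_length()        # 1 + floor(log2 n)
    lbits = nbits.bit_length()    # 1 + floor(log2 nbits)
    # Elias gamma code of nbits: (lbits - 1) zeros, then nbits in binary.
    prefix = "0" * (lbits - 1) + format(nbits, "b")
    # Then n in binary without its leading 1 bit.
    tail = format(n, "b")[1:]
    return prefix + tail

def elias_delta_length(n):
    """1 + floor(log N) + 2*floor(log(1 + floor(log N))), in integer arithmetic."""
    k = n.bit_length() - 1        # floor(log2 n)
    return 1 + k + 2 * ((k + 1).bit_length() - 1)

for n in (1, 2, 10, 37, 10000):
    code = elias_delta(n)
    print(n, code, len(code), elias_delta_length(n))
```

Using int.bit_length() avoids floating-point logs, so the floor
operations are exact for any N.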