division of variable integer into best fit integer parts

division of variable integer into best fit integer parts

Post by racheledd » Sun, 07 Sep 2003 22:14:44


Hello

I have a variable integer which must be split into 8 integers based on
fixed percentages (33%, 18%, 15%, 13%, 11%, 4%, 3%, 3%). These 8
integers must add up to the original number (ie I cannot just multiply
integer by % and then use the roundup/rounddown functions as they
could add up to more/less than the original number).

Is there a function in excel which would work out a best fit solution?

Thanks
 
 
 

division of variable integer into best fit integer parts

Post by Tushar Meh » Mon, 08 Sep 2003 00:52:37

Following up on Jerry Lewis's suggestion, I would calculate the 2nd
through 8th percentages as percentage values, and calculate the first
(i.e., 33% value) as the 'left over.' By putting the correction into
the largest number, it minimizes the % error -- if any.

--
Regards,

Tushar Mehta, MS MVP -- Excel
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions

In article < XXXX@XXXXX.COM >,
XXXX@XXXXX.COM says...

 
 
 

division of variable integer into best fit integer parts

Post by racheledd » Mon, 08 Sep 2003 20:49:57

Good scheme - thanks.
R
 
 
 

division of variable integer into best fit integer parts

Post by Dana DeLou » Mon, 08 Sep 2003 22:54:48

Would using Solver be an acceptable solution? If I understand the question,
you have a number like 90.
You multiply 90 by .33 to get 29.7. You have to decide to use integer 29,
or rounded up 30. You do this for each percentage given.
These rounded numbers must total the original 90.
I hope I understand this correctly. This is hard to explain, so here goes.
Have two columns that round the percentages down, and up.( to get the 29,
and 30 from above). In the next 3 columns, put 0,1, and for the third, use
"SUM" and sum the 0 & 1 cells. Since you can not use IF functions, this is
a way to have Solver decide on the two cells. The 0/1 cells will be a
constraint that are Binary. The Sum cell is another constraint that says it
should total 1. Solver will alternate the two 0/1 cells so that one of them
will be 1, and the other will be zero. Now, use Sumproduct on the two
rounded values (29/30, and the two cells 0/1). This will be the value that
Solver picks. To determine the "Error" I used for example (29.7- Solver's
Pick)^2.
Basically, the difference squared. (Note that using ABS(difference) will
not work in Solver, so that is why ^2).
Copy these down for the other percentages.
Add a Sum formulas that sum the differences. Add another formula that does
this: =90-Sum(Solver's pick). Solver needs to make this equal zero.

For Solver, try to minimize the total of the "differences."
For constraints:
0/1 cells are binary
Each of Sum(x,y) cells that sum the 0/1 cells should total 1.
Set the cell that has "90-Sum(Solver's pick). " equal to zero.

In Excel XP, you often have to run Solver twice to get the "better"
solution.
For some numbers, Solver suggested a solution that was not similar to
rounding the 7 smallest percentages, and then adjusting the 33%.

I'm not a Stat's expert, so I'm not sure if (difference)^2 is the best
measure. It appears that if you round the 7 smallest percentages, some
numbers will make the larger .33 value have to move a larger integer amount
to get back to the original value.
For example, if using (difference)^2, Solver suggests taking you number (90
* 4% = 3.6) and rounding down to 3.
This allowed your 90*33% = 29.7 to use 30 instead of a value 28!
The penalty of (29.7 - 28)^2 was too high doing it this way.
Anyway, HTH. :>)

--
Dana DeLouis
Using Windows XP & Office XP
= = = = = = = = = = = = = = = = =
 
 
 

division of variable integer into best fit integer parts

Post by Tushar Meh » Tue, 09 Sep 2003 10:31:07

i Dana,

Inline
In article < XXXX@XXXXX.COM >, XXXX@XXXXX.COM
says...

The above can be done with just one binary variable. Suppose the
ROUNDUP values are in C16:C23 and the ROUNDDOWN values in D16:D23.
Then, suppose the binary variables are in E16:E23. In F16, use the
formula =C16*E16+D16*(1-E16). This ensures that one of C16 or D16 is
picked. Copy F16 down to F17:F23.

Yes, ABS will work. It just makes the problem non-linear, but in this
case, it already is non-linear.

You can also just designate F16:F23 as non-negative integer decision
variables. That too works just fine. ;-)


From what I remember of my PhD introductory stats class, sum-of-ABS is
statistically a better measure that sum-of-squares. However, sum-of-
square has become the preferred method because it was easier to program
in days-gone-by. There is a similar issue with how one calculates
variance. SUM((Xi-X-bar)^2) is the numerically superior method, but
SUM(Xi^2)-n*X-bar^2 is easier to calculate with 'pen and paper.'


This results from the use of the difference rather than the % of the
difference. When using just the difference, consider the choices for
adjusting 29.7 and 3.6. It is preferable to drop 3.6 to 3 rather than
29.7 to 29.0! However, one could use ((Adjusted-Original)/Original)^2
and now the changing 3.6 to 3 is more 'expensive' than changing it to
4!


A more sophisticated method would be to allocate the difference to not
just the largest value but among some of the largest values. I believe
Harlan Grove has posted some UDF solutions along those lines (though I
am not sure).


Yeah, depending on how one defines the penalty function, one can get
different results. Here are a few I ran:

A B C D E F
1.42 1.02 0.043 0.042 0.043 0.44
30 30 29 30 29 29
16 16 16 16 16 16
14 13 14 13 14 13
12 12 11 11 11 12
10 10 10 10 10 10
3 3 4 4 4 4
2 3 3 3 3 3
3 3 3 3 3 3
90 90 90 90 90 90

(A): Two variables as you documented with sum-of-difference-squared
(B): One variable as illustrated above with sum-of-difference-squared
(C): One variable with sum-of-percent-difference-squared
(D): Just INT specification with sum-of-percent-difference-squared
(E): One variable as above with sum-of-percent-difference-squared
(F): Just INT specification with sum-of-abs-percent-difference

The top row is the minimum value achieved by Solver. The numbers are
*not* comparable across different methods.

My guess from looking at the results of the sum-of-percent-difference-
squared is that the objective function is reasonably flat near the
optimum and Solver finds (and indeed cycles through) one of a few
'reasonably' close choices.

--
Regards,

Tushar Mehta, MS MVP -- Excel
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions

In article < XXXX@XXXXX.COM >, XXXX@XXXXX.COM
says...
 
 
 

division of variable integer into best fit integer parts

Post by Dana DeLou » Tue, 09 Sep 2003 21:23:05

Thank you Tushar. You are absolutely right. With a choice between two
items, it requires only 1 binary variable. Thank you. I feel dumb.
Thank you for the discussion on possible ways to measure "Best fit." I
was thinking that it should give more weight to the higher percentages, but
wasn't sure exactly how to proceed. I like your ideas here.

Have you had much luck using ABS in your Solver models? I usually do not
have much luck. I thought functions like Abs confuse Excel's solver, and
was not allowed. I looked it up at Frontline, and you are correct.
Frontline does not call Abs a discontinuous function. They call it a
"non-smooth" function. They just "recommend" not using functions like Abs
in Excel. I gather from some articles that the "Premium" Solver will handle
Abs a little better.
I guess that is why I was more incline to square the value to get a positive
value instead. I don't know.

I found it interesting that when I used my "two" binary solution, using ABS
did not work well. I had to run it multiple times to get closer and closer
to an optimal solution. However, when I changed it to your "improved" way
with just 1 binary variable, Abs worked much better.

Anyway, here is a quick copy of a frontline article if interested.
<Short copy...>
If the graph of the function's derivative also contains no breaks, then the
original function is called a smooth function. An example of a continuous
function that is not smooth is ABS(C1) -- its graph is an unbroken "V"
shape, but the graph of its derivative contains a break, jumping from -1 to
+1 at C1=0. The nonlinear GRG Solver also makes use of second derivatives
(in multiple dimensions, the Hessian) of the problem functions to make
faster progress, and to test whether it has found the optimal solution; it
sometimes has trouble with functions such as ABS(C1).

Here is a short list of common non-smooth functions in Excel.
ABS, MIN, MAX, INT, ROUND, CEILING, FLOOR

<end of copy>

I have never understood why they list "Min" and "Max" as being non-smooth
vs. discontinuous. I have never gotten these to work well in Excel. I
find that the "jump" when using these functions causes confusion for Excel's
Solver.
Anyway, this is very interesting. Thank you.


Very nice! Congrats.
--
Dana DeLouis
Using Windows XP & Office XP
= = = = = = = = = = = = = = = = =

<snip>
 
 
 

division of variable integer into best fit integer parts

Post by Tushar Meh » Mon, 15 Sep 2003 03:59:09

In article < XXXX@XXXXX.COM >, XXXX@XXXXX.COM
says...


Well, ABS is a 'non-smooth' function. And, I guess Frontline has to
make money some way. I imagine it decided that the basic Solver would
not do non-linear optimizations as well as the priced versions.

However, it is possible to linearize some/many seemingly non-linear
functions. For two examples (IF and ABS) see
http://www.yqcomputer.com/ %
40msnews.microsoft.com

{snip}

Plot MIN(10,x) for 0 <= x <= 20 and you'll see why Frontline defines it
as it does. :)

--
Regards,

Tushar Mehta, MS MVP -- Excel
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions