Finding differences between records with same key in a big dataset (lots of fields)

Post by Ajay » Sun, 22 Nov 2009 03:56:31


Hi,

I am looking for an easy way to find the differences between two records
that have the same key. The dataset contains several hundred variables,
so I don't want to name the fields individually.

The data looks something like this:


Key_Field  Field_1  Field_2  Field_3  ...

Key1       a        b        c
Key1       a        b        not-c
Key2       x        y        z
Key2       x        not-y    z
...

From the above, I would like to produce something like the following
(basically, for each key, the fields that have mismatches):

Key1 and Field_3
Key2 and Field_2, and so on.

Any input would be highly appreciated!

Thanks,
Ajay
 
 
 

Post by Tom Aberna » Sun, 22 Nov 2009 04:06:05

Split the data (or make views that split it) and use proc compare.
Example:

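/* Assumes BIG is sorted by key_field (required by the BY statement) */
data one two;
  set big;
  by key_field;
  /* First record of each key goes to ONE, the rest go to TWO */
  if first.key_field then output one;
  else output two;
run;

/* Compare the paired records, matched on key_field */
proc compare data=one compare=two;
  id key_field;
run;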
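
If you want the "key and field" list as a dataset rather than a printed report, one possible follow-up is to have PROC COMPARE write its difference rows to an OUT= dataset and then scan them. This is only a sketch under some assumptions: each key has exactly two records, ONE and TWO come from the split above, and the names DIFFS, MISMATCHES and FIELD_NAME are made up for illustration. It relies on the documented layout of the DIF rows, where an equal numeric value shows up as the special missing value .E and a character difference is marked with an X.

```sas
/* Keep only the pairs that differ; OUTDIF writes one DIF row per pair */
proc compare data=one compare=two
             out=diffs outnoequal outdif noprint;
  id key_field;
run;

/* Emit one row per (key, mismatched field) */
data mismatches (keep=key_field field_name);
  set diffs;
  array nums  {*} _numeric_;    /* numeric variables read from DIFFS   */
  array chars {*} _character_;  /* character variables read from DIFFS */
  length field_name $32;        /* declared after the arrays on purpose */

  do i = 1 to dim(nums);
    field_name = vname(nums{i});
    /* Skip the ID and bookkeeping variables; .E means "no difference" */
    if upcase(field_name) not in ('KEY_FIELD', '_OBS_')
       and nums{i} ne .E then output;
  end;

  do i = 1 to dim(chars);
    field_name = vname(chars{i});
    /* An X in the DIF string marks a character position that differs */
    if upcase(field_name) not in ('KEY_FIELD', '_TYPE_')
       and index(chars{i}, 'X') > 0 then output;
  end;
run;
```

With the sample data in the question, MISMATCHES should end up with rows like Key1 / Field_3 and Key2 / Field_2. If every field is character (or every field is numeric), SAS will probably warn about one of the arrays having zero elements; that should be harmless here.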


