Matching records from data set one with records from data set

Matching records from data set one with records from data set

Post by art29 » Sat, 27 Jan 2007 13:21:54


Erik,

I'm not really sure what you are asking. Does the following come close?

data one;
input id;
cards;
1
2
4
;
run;

data two;
input id y;
cards;
1 1
1 2
2 1
3 1
4 1
4 2
4 3
;
run;

data matched extra_matches others;
merge one (in=a) two (in=b);
by id;
if a and b and first.id then output matched;
else if a and b then output extra_matches;
else output others;
run;

Art
--------
On Mon, 22 Jan 2007 12:37:28 +0100, erik christiansen


I
one.
 
 
 

Matching records from data set one with records from data set

Post by nospa » Mon, 29 Jan 2007 12:42:08

On Mon, 22 Jan 2007 12:37:28 +0100, erik christiansen



MERGE is usually the tool for this type of problem.

If I understand, ONE and TWO are something like

data one;
input var1 track1;
cards;
1 111
1 112
2 121
2 122
2 123
3 131
;

data two;
input var2 track2;
cards;
1 211
1 212
1 213
2 221
2 222
4 241
;

The straightforward merge is then like this

data simple_merge;
merge one(rename = (var1=byvar) in=inone)
two(rename = (var2=byvar) in=intwo);
by byvar;
match = inone and intwo;
run;

The result is

byvar track1 track2 match

1 111 211 1
1 112 212 1
1 112 213 1
2 121 221 1
2 122 222 1
2 123 222 1
3 131 . 0
4 . 241 0

Notice that the 222 observation is flagged as a match twice. That's wrong.

So we need to add a bit to the code

data merge_with_a_twist;
intwo = 0;
merge one(rename = (var1=byvar) in=inone)
two(rename = (var2=byvar) in=intwo);
by byvar;
match = inone and intwo;
run;

By managing the indicator INTWO, something which is ordinarily left to SAS,
we prevent an "inherited" match. The result is

byvar track1 track2 match

1 111 211 1
1 112 212 1
1 112 213 1
2 121 221 1
2 122 222 1
2 123 222 0
3 131 . 0
4 . 241 0

The 222 observation is now flagged just once.

 
 
 

Matching records from data set one with records from data set

Post by davidlcass » Tue, 30 Jan 2007 09:00:08


If you have multiple records that match, how do you select the one
you want to use?

Why do you need to do this?

How big are the two data sets? How many records, how much disk
space do they take up, ...

Are both data sets SAS files, or are they external files that have to
be read in?

Are there any limitations that would keep us from sorting or seeking
through the data sets?

HTH,
David
_________________________________________________________________
Check out all that glitters with the MSN Entertainment Guide to the Academy
Awards http://www.yqcomputer.com/