Referencing data set B in data step A's DO loop

Post by Andrew Z » Thu, 07 May 2009 04:22:53


How do I reference the data set 'correct_domains' in the do loop, so I
can look for close Levenshtein distances (to find misspelled domains)?

data correct_domains;
infile datalines truncover;
input domain $200.;
datalines;
yahoo.com
gmail.com
hotmail.com
aol.com
comcast.net
msn.com
sbcglobal.net
verizon.net
bellsouth.net
cox.net
att.net
;;;;
run;

data check_these_domains;
infile datalines truncover;
input domain $200.;
datalines;
yahoo.cm
gmail.co
hotmial.com
aol.com
comcast.net
;;;;
run;

data checked;
set check_these_domains;
do _i_ = 1 to 11;
r = COMPLEV(???, domain);
if r in (1,2) then leave /* do something useful */;
end;
run;

Andrew

Post by pchoat » Thu, 07 May 2009 05:37:33

Andrew -

Use a nested SET statement with the POINT= option, renaming DOMAIN in the correct-domains data set so it does not overwrite the DOMAIN you are checking.

data checked;
set check_these_domains;
do _i_ = 1 to nobs;
set correct_domains(rename=(domain=correctdomain))
point=_i_ nobs=nobs;
output;
put (_all_)(=);
end;
run;

Alternatively, you can use an SQL join.
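Applied to the original question, the COMPLEV test goes inside the POINT= loop, with OUTPUT keeping only the close matches. A minimal sketch, assuming the question's CORRECT_DOMAINS and CHECK_THESE_DOMAINS data sets already exist; CLOSE_MATCHES is a hypothetical output name:

```sas
/* Sketch: POINT= loop plus the COMPLEV test from the question.
   Assumes CORRECT_DOMAINS and CHECK_THESE_DOMAINS already exist;
   CLOSE_MATCHES is a hypothetical output data set name. */
data close_matches;
  set check_these_domains;
  /* NOBS= is filled in at compile time, so it can bound the loop
     before the inner SET has executed */
  do _i_ = 1 to nobs;
    /* Random-access read of observation _i_ of the lookup table */
    set correct_domains(rename=(domain=correctdomain))
        point=_i_ nobs=nobs;
    r = complev(correctdomain, domain);
    /* Distance 0 is an exact match; 1 or 2 suggests a misspelling */
    if r in (1,2) then output;
  end;
run;
```

Unlike the loop above, which outputs every pair, this writes one row per pair within distance 1 or 2; exact matches are excluded. The step still ends when the outer SET runs out of observations, so the POINT= read cannot loop forever.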



Paul Choate
DDS Data Extraction
(916) 654-2160
-----Original Message-----
From: SAS(r) Discussion [mailto: XXXX@XXXXX.COM ] On Behalf Of
Andrew Z.
Sent: Tuesday, May 05, 2009 12:23 PM
To: XXXX@XXXXX.COM
Subject: Referencing data set B in data step A's DO loop


Post by Andrew Z » Thu, 07 May 2009 05:38:04


proc sql;
create table matches as
select
a.domain as correct,
b.domain as found
from
correct_domains a,
check_these_domains b
where
complev(a.domain, b.domain) in (1,2)
;
quit;

I found this approach here:
http://www.yqcomputer.com/ #4efa2b32e287eee6



Andrew

Post by NordlD » Thu, 07 May 2009 05:39:50

Here is one option; there are probably more elegant solutions.

data _null_;
if 0 then set correct_domains nobs=nobs;
call symput('ndomains', put(nobs,best.));
stop; /* NOBS= is filled in at compile time; STOP ends the step after one pass */
run;

data checked(drop=_:);
**-- read correct domains into an array --**;
if _n_ EQ 1 then do _i=1 by 1 until(eof);
set correct_domains end=eof;
array temp[&ndomains] $200 _temporary_ ;
temp[_i] = domain ;
end;

set check_these_domains;
do _i_ = 1 to &ndomains;
r = COMPLEV(temp[_i_], domain);
if r in (1,2) then leave /* do something useful */;
end;
put "done" _i_= r=; /* doing something semi-useful */
run;
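One possibly more elegant alternative, not from the thread: load CORRECT_DOMAINS into a SAS hash object once and walk it with a hash iterator for every domain being checked, which removes the separate step that counts observations. This is a sketch assuming SAS 9 or later and the question's two data sets; MATCHES_HASH is a hypothetical output name:

```sas
/* Sketch (swapped-in technique): hash object + hash iterator lookup.
   Assumes CORRECT_DOMAINS and CHECK_THESE_DOMAINS already exist;
   MATCHES_HASH is a hypothetical output data set name. */
data matches_hash(keep=domain correctdomain r);
  /* Compile-time only: puts CORRECTDOMAIN in the PDV with its length */
  if 0 then set correct_domains(rename=(domain=correctdomain));
  if _n_ = 1 then do;
    /* Load the lookup table once, on the first iteration */
    declare hash h(dataset: "correct_domains(rename=(domain=correctdomain))");
    declare hiter hi("h");
    h.defineKey("correctdomain");
    h.defineData("correctdomain");
    h.defineDone();
  end;
  set check_these_domains;
  /* Visit every correct domain; FIRST/NEXT copy it into CORRECTDOMAIN */
  rc = hi.first();
  do while (rc = 0);
    r = complev(correctdomain, domain);
    if r in (1,2) then output;
    rc = hi.next();
  end;
run;
```

Because the hash is filled on the first iteration, no observation count or macro variable is needed; the trade-off is that the whole lookup table is held in memory.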

Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204