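Starting with version 9, SAS supports Perl regular expressions (Perl 5.6.1 syntax), which makes data validation much simpler and more reliable. Before regular expressions (RE) were available, strings could only be manipulated with functions such as index, substr, and tranwrd, which are inflexible and inefficient when the text varies from record to record. SAS 9 therefore introduced RE support to make validating, replacing, and extracting strings convenient.

A regexp combines ordinary characters with a set of special characters called metacharacters, each of which stands for a particular matching rule. For details, see http://www.perldoc.com/perl5.6.1/pod/perlre.html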
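As a minimal illustration of how metacharacters drive a match, the sketch below compiles a pattern with prxparse and searches a string with prxmatch. The sample text and variable names are made up for illustration and are not from the SAS documentation.

```sas
data _null_;
   /* Hypothetical sample text, used only for this illustration */
   text = "Order 66 shipped on day 12";

   /* \d is the metacharacter for one digit; + means one or more */
   re = prxparse("/\d+/");
   if missing(re) then do;
      putlog "ERROR: Invalid regexp";
      stop;
   end;

   /* prxmatch returns the position of the first match, or 0 if none */
   pos = prxmatch(re, text);
   put pos=;
run;
```

Here the first run of digits starts at the seventh character, so the log should show pos=7.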
Several use cases follow.

1. Validating phone numbers in customer data

data _null_;
   retain re;
   length first last home business $ 16;

   if _N_ = 1 then do;
      /* Phone pattern 1: (XXX) XXX-XXXX */
      paren = "\([2-9]\d\d\) ?[2-9]\d\d-\d\d\d\d";

      /* Phone pattern 2: XXX-XXX-XXXX */
      dash = "[2-9]\d\d-[2-9]\d\d-\d\d\d\d";

      /* Combine the two patterns with the | metacharacter */
      regexp = "/(" || paren || ")|(" || dash || ")/";

      /* Check that the regexp compiled; prxparse returns missing on failure */
      re = prxparse(regexp);
      if missing(re) then do;
         putlog "ERROR: Invalid regexp " regexp;
         stop;
      end;
   end;

   input first last home business;

   /* Run the match; prxmatch returns 0 when nothing matches */
   if ^prxmatch(re, home) then
      putlog "NOTE: Invalid home phone number for " first last home;
   if ^prxmatch(re, business) then
      putlog "NOTE: Invalid business phone number for " first last business;
datalines;
Jerome Johnson (919)319-1677 (919)846-2198
Romeo Montague 800-899-2164 360-973-6201
Imani Rashid (508)852-2146 (508)366-9821
Palinor Kent . 919-782-3199
Ruby Archuleta . .
Takei Ito 7042982145 .
Tom Joad 209/963/2764 2099-66-8474
;

The output is:

NOTE: Invalid home phone number for Palinor Kent
NOTE: Invalid home phone number for Ruby Archuleta
NOTE: Invalid business phone number for Ruby Archuleta
NOTE: Invalid home phone number for Takei Ito 7042982145
NOTE: Invalid business phone number for Takei Ito
NOTE: Invalid home phone number for Tom Joad 209/963/2764
NOTE: Invalid business phone number for Tom Joad 2099-66-8474

2. Replacing text: change < to &lt; and > to &gt;

data _null_;
   retain lt_re gt_re;
   if _N_ = 1 then do;
      /* Substitution pattern format: s/regular expression/replacement text/ */
      lt_re = prxparse('s/</&lt;/');
      gt_re = prxparse('s/>/&gt;/');
      if missing(lt_re) or missing(gt_re) then do;
         putlog "ERROR: Invalid regexp.";
         stop;
      end;
   end;

   input;
   /* Apply the substitutions; -1 replaces every occurrence in the line */
   call prxchange(lt_re, -1, _infile_);
   call prxchange(gt_re, -1, _infile_);

   put _infile_;
datalines4;
The bracketing construct ( ... ) creates capture buffers. To refer to
the digit'th buffer use \<digit> within the match. Outside the match
use "$" instead of "\". (The \<digit> notation works in certain
circumstances outside the match. See the warning below about \1 vs $1
for details.) Referring back to another part of the match is called a
backreference.
;;;;

The output is:

The bracketing construct ( ... ) creates capture buffers. To refer to
the digit'th buffer use \&lt;digit&gt; within the match. Outside the match
use "$" instead of "\". (The \&lt;digit&gt; notation works in certain
circumstances outside the match. See the warning below about \1 vs $1
for details.) Referring back to another part of the match is called a
backreference.

3. Extracting text from the business phone number in customer records

data _null_;
   retain re areacode_re;
   length first last home business $ 16;
   length areacode $ 3;

   if _N_ = 1 then do;
      /* (XXX) XXX-XXXX, with the area code in its own capture buffer */
      paren = "\(([2-9]\d\d)\) ?[2-9]\d\d-\d\d\d\d";

      /* XXX-XXX-XXXX, with the area code in its own capture buffer */
      dash = "([2-9]\d\d)-[2-9]\d\d-\d\d\d\d";

      /* Combine two phone patterns into one with a | */
      regexp = "/(" || paren || ")|(" || dash || ")/";

      re = prxparse(regexp);
      if missing(re) then do;
         putlog "ERROR: Invalid regexp " regexp;
         stop;
      end;

      areacode_re = prxparse("/828|336|704|910|919|252/");
      if missing(areacode_re) then do;
         putlog "ERROR: Invalid area code regexp";
         stop;
      end;
   end;

   input first last home business;

   if ^prxmatch(re, home) then
      putlog "NOTE: Invalid home phone number for " first last home;

   if prxmatch(re, business) then do;
      /* Number of the last capture buffer that took part in the match */
      which_format = prxparen(re);
      /* Position and length of that buffer, which holds the area code */
      call prxposn(re, which_format, pos, len);
      areacode = substr(business, pos, len);
      /* Print the record if the extracted area code is in the list */
      if prxmatch(areacode_re, areacode) then
         put "In North Carolina: " first last business;
   end;
   else
      putlog "NOTE: Invalid business phone number for " first last business;
datalines;
Jerome Johnson (919)319-1677 (919)846-2198
Romeo Montague 800-899-2164 360-973-6201
Imani Rashid (508)852-2146 (508)366-9821
Palinor Kent 704-782-4673 704-782-3199
Ruby Archuleta 905-384-2839 905-328-3892
Takei Ito 704-298-2145 704-298-4738
Tom Joad 515-372-4829 515-389-2838
;

The output is:

In North Carolina: Jerome Johnson (919)846-2198
In North Carolina: Palinor Kent 704-782-3199
In North Carolina: Takei Ito 704-298-4738

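When only the captured text is needed, the position-and-length step can be skipped. The sketch below is an alternative to the code above, not the SAS website's version: it assumes a SAS release that provides the function form of PRXPOSN (which returns the capture buffer's text directly), handles only the parenthesized phone format, and uses a made-up sample value.

```sas
data _null_;
   length areacode $ 3;

   /* Hypothetical sample value, used only for this illustration */
   business = "(919)846-2198";

   /* Capture buffer 1 holds the three-digit area code */
   re = prxparse("/\(([2-9]\d\d)\) ?[2-9]\d\d-\d\d\d\d/");
   if missing(re) then do;
      putlog "ERROR: Invalid regexp";
      stop;
   end;

   if prxmatch(re, business) then do;
      /* Function form of PRXPOSN: returns the text of buffer 1 */
      areacode = prxposn(re, 1, business);
      put areacode=;
   end;
run;
```

For this sample value the log should show areacode=919. The CALL PRXPOSN form used in the example is still the better fit when the position of the match matters, for instance to overwrite part of the string.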
The source code above comes from the SAS website; I have only added a few comments to help first-time readers. See the SAS website for details.