Boost 1.33.0 is almost out, and its regex library has been rewritten, adding:
* Unicode support
* support for the ATL/MFC CString
***********
I couldn't wait, so I grabbed a copy to take a look.

Getting the source:
=========
Boost:
cvs -d:pserver:[email protected]:/cvsroot/boost login
cvs -z9 -d:pserver:[email protected]:/cvsroot/boost co -P boost

ICU (the Unicode support in the Boost 1.33.0 regex is built on ICU, IBM's Unicode library):
http://www.ibm.com/software/globalization/icu/

Building the source:
=============
The build environment is vc7.1 with the C++ STL that ships with vc7.1. Change into BOOST_ROOT\libs\regex\build and run:
bjam -sICU_PATH=d:\icu32 -sTOOLS=vc-7_1 stage

Unicode support test:
================
I looked at the ICU DLLs: the three DLLs that Boost regex links to dynamically add up to a full 10 MB. That put me off, so I gave up on this test.

ATL/MFC support:
===============
In vc7.1, create a new Win32 console project and add the following code:

/*
 *
 * Copyright (c) 2004
 * John Maddock
 *
 * Use, modification and distribution are subject to the
 * Boost Software License, Version 1.0. (See accompanying file
 * LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
 *
 */

/*
 *   LOCATION:    see http://www.boost.org for most recent version.
 *   FILE         mfc_example.cpp
 *   VERSION      see <boost/version.hpp>
 *   DESCRIPTION: examples of using Boost.Regex with MFC and ATL string types.
 */

#define TEST_MFC

#ifdef TEST_MFC

#include <boost/regex/mfc.hpp>
#include <cstringt.h>
#include <atlstr.h>
#include <assert.h>
#include <tchar.h>
#include <iostream>
#include <stdexcept>   // std::runtime_error

#ifdef _UNICODE
#define cout wcout
#endif

//
// Find out if *password* meets our password requirements,
// as defined by the regular expression *requirements*.
//
bool is_valid_password(const CString& password, const CString& requirements)
{
   return boost::regex_match(password, boost::make_regex(requirements));
}

//
// Extract filename part of a path from a CString and return the result
// as another CString:
//
CString get_filename(const CString& path)
{
   boost::tregex r(__T("(?:\\A|.*\\\\)([^\\\\]+)"));
   boost::tmatch what;
   if(boost::regex_match(path, what, r))
   {
      // extract $1 as a CString:
      return CString(what[1].first, what.length(1));
   }
   else
   {
      throw std::runtime_error("Invalid pathname");
   }
}

CString extract_postcode(const CString& address)
{
   // searches through address for a UK postcode and returns the result,
   // the expression used is by Phil A. on www.regxlib.com:
   boost::tregex r(__T("^(([A-Z]{1,2}[0-9]{1,2})|([A-Z]{1,2}[0-9][A-Z]))\\s?([0-9][A-Z]{2})$"));
   boost::tmatch what;
   if(boost::regex_search(address, what, r))
   {
      // extract $0 as a CString:
      return CString(what[0].first, what.length());
   }
   else
   {
      throw std::runtime_error("No postcode found");
   }
}

void enumerate_links(const CString& html)
{
   // enumerate and print all the <a> links in some HTML text,
   // the expression used is by Andew Lee on www.regxlib.com:
   boost::tregex r(__T("href=[\"\']((http:\\/\\/|\\.\\/|\\/)?\\w+(\\.\\w+)*(\\/\\w+(\\.\\w+)?)*(\\/|\\?\\w*=\\w*(&\\w*=\\w*)*)?)[\"\']"));
   boost::tregex_iterator i(boost::make_regex_iterator(html, r)), j;
   while(i != j)
   {
      std::cout << (*i)[1] << std::endl;
      ++i;
   }
}

void enumerate_links2(const CString& html)
{
   // enumerate and print all the <a> links in some HTML text,
   // the expression used is by Andew Lee on www.regxlib.com:
   boost::tregex r(__T("href=[\"\']((http:\\/\\/|\\.\\/|\\/)?\\w+(\\.\\w+)*(\\/\\w+(\\.\\w+)?)*(\\/|\\?\\w*=\\w*(&\\w*=\\w*)*)?)[\"\']"));
   boost::tregex_token_iterator i(boost::make_regex_token_iterator(html, r, 1)), j;
   while(i != j)
   {
      std::cout << *i << std::endl;
      ++i;
   }
}

//
// Take a credit card number as a string of digits,
// and reformat it as a human readable string with "-"
// separating each group of four digits:
//
const boost::tregex e(__T("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z"));
const CString human_format = __T("$1-$2-$3-$4");

CString human_readable_card_number(const CString& s)
{
   return boost::regex_replace(s, e, human_format);
}

int main()
{
   // password checks using regex_match:
   CString pwd = "abcDEF---";
   CString pwd_check = "(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}";
   bool b = is_valid_password(pwd, pwd_check);
   assert(b);
   pwd = "abcD-";
   b = is_valid_password(pwd, pwd_check);
   assert(!b);

   // filename extraction with regex_match:
   CString file = "abc.hpp";
   file = get_filename(file);
   assert(file == "abc.hpp");
   file = "c:\\a\\b\\c\\d.h";
   file = get_filename(file);
   assert(file == "d.h");

   // postcode extraction with regex_search:
   CString address = "Joe Bloke, 001 Somestreet, Somewhere,\nPL2 8AB";
   CString postcode = extract_postcode(address);
   // compare (==) against the postcode that actually appears in the address:
   assert(postcode == "PL2 8AB");

   // html link extraction with regex_iterator:
   CString text = "<dt><a href=\"syntax_perl.html\">Perl Regular Expressions</a></dt><dt><a href=\"syntax_extended.html\">POSIX-Extended Regular Expressions</a></dt><dt><a href=\"syntax_basic.html\">POSIX-Basic Regular Expressions</a></dt>";
   enumerate_links(text);
   enumerate_links2(text);

   CString credit_card_number = "1234567887654321";
   credit_card_number = human_readable_card_number(credit_card_number);
   assert(credit_card_number == "1234-5678-8765-4321");
   return 0;
}

#else

#include <iostream>

int main()
{
   std::cout << "<NOTE>MFC support not enabled, feature unavailable</NOTE>";
   return 0;
}

#endif

Setting up the build environment:
=============
* The include path contains $(BOOST_ROOT);$(ICU_PATH)\include, both placed after the vc7.1 include directories.

Project compile settings:
============
* Use the Unicode character set.
* Use /Zc:wchar_t (note: when Boost is built with vc7.1 in its default configuration, wchar_t is treated as a native built-in type, so if you want Unicode rather than MBCS support, build your project with this option).
* Use the multithreaded debug DLL runtime, /MDd (do not pick anything else unless you know what that means).
* Define the macro BOOST_REGEX_DYN_LINK (by default regex is linked statically; define this macro if you want dynamic linking).

Compiling and linking went through "smoothly".

The compiler command line is:
/Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "BOOST_REGEX_DYN_LINK" /D "_UNICODE" /D "UNICODE" /Gm /EHsc /RTC1 /MDd /Zc:wchar_t /Yu"stdafx.h" /Fp"Debug/capture.pch" /Fo"Debug/" /Fd"Debug/vc70.pdb" /W3 /nologo /c /Wp64 /ZI /TP

The linker command line is:
/OUT:"Debug/capture.exe" /INCREMENTAL /NOLOGO /DEBUG /PDB:"Debug/capture.pdb" /SUBSYSTEM:CONSOLE /MACHINE:X86 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib

Boost 1.33.0 regex changelog
=====================
Boost 1.33.0.
Boost 1.32.1.
- Fixed bug in partial matches of bounded repeats of '.'.
Boost 1.31.0.
- Completely rewritten pattern matching code; it is now up to 10 times faster than before.
- Reorganized documentation.
- Deprecated all interfaces that are not part of the regular expression standardization proposal.
- Added regex_iterator and regex_token_iterator.
- Added support for Perl style independent sub-expressions.
- Added non-member operators to the sub_match class, so that you can compare sub_match objects with strings, or add them to a string to produce a new string.
- Added experimental support for extended capture information.
- Changed the match flags so that they are a distinct type (not an integer); if you try to pass the match flags as an integer rather than as match_flag_type to the regex algorithms, you will now get a compiler error.
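
Several of the changelog entries above (regex_iterator, regex_token_iterator, comparing a sub_match with a string, and the typed match flags) can be tried out in a few lines. The sketch below is only an illustration, and it makes one loud assumption: it uses std::regex from the C++ standard library as a stand-in for Boost.Regex, since the standard interface grew out of the same standardization proposal and this lets the example compile without Boost or ICU. The helper names (links_by_iterator, links_by_token_iterator, describe_extension, anchored_search) are hypothetical and appear neither in Boost nor in the example above. Note also that std::sub_match has no operator+ for strings, so the sketch calls str() where Boost would allow direct concatenation.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: walk every match with regex_iterator and keep
// capture group 1 (the href value) of each match_results object.
std::vector<std::string> links_by_iterator(const std::string& html)
{
   static const std::regex href("href=\"([^\"]+)\"");
   std::vector<std::string> out;
   for (std::sregex_iterator it(html.begin(), html.end(), href), end; it != end; ++it)
   {
      out.push_back((*it)[1].str());
   }
   return out;
}

// Hypothetical helper: same result with regex_token_iterator, which is
// asked for sub-expression 1 only and yields sub_match objects directly.
std::vector<std::string> links_by_token_iterator(const std::string& html)
{
   static const std::regex href("href=\"([^\"]+)\"");
   std::vector<std::string> out;
   for (std::sregex_token_iterator it(html.begin(), html.end(), href, 1), end; it != end; ++it)
   {
      out.push_back(it->str());
   }
   return out;
}

// Hypothetical helper: a sub_match can be compared with a string literal
// through the non-member operator==.
std::string describe_extension(const std::string& filename)
{
   static const std::regex ext("\\.([A-Za-z0-9]+)$");
   std::smatch what;
   if (!std::regex_search(filename, what, ext))
   {
      return "none";
   }
   if (what[1] == "hpp")
   {
      return "C++ header";
   }
   return "other: " + what[1].str();
}

// Hypothetical helper: the flags parameter has the dedicated type
// match_flag_type rather than a plain integer.
bool anchored_search(const std::string& text, std::regex_constants::match_flag_type flags)
{
   static const std::regex at_start("^abc");
   return std::regex_search(text, at_start, flags);
}
```

Calling anchored_search("abc", std::regex_constants::match_not_bol) should report no match, because that flag tells the matcher not to treat the start of the text as the beginning of a line, while the same call with match_default should succeed. With Boost.Regex and CString, the equivalent iterators are the boost::tregex_iterator and boost::tregex_token_iterator used in the example above.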