Foxhound is the better* Database Monitor for SQL Anywhere.
*better: More thorough, more relevant, more effective.
...more Alerts, more All Clears, more details, more control in your hands.
|
Breck Carter
Last modified: June 14, 1996
mail to: bcarter@bcarter.com
...continued from The Dreaded Meta-Tilde (I)
"I can't tell you how frustrating it has been working with Match(). Please tell me how to let the user type car and have it match Breck Carter, carrot, car lot, etc."
Yowza! The roller coaster just went ballistic and I'm gonna lose my lunch. SetFilter gets really interesting when you include calls to the Match() function to do powerful searches. Listing 3 shows a simple SetFilter call that matches all rows that have the string "car" embedded anywhere within string_column.
li_RC = dw_groups.SetFilter & ( 'match ( string_column, ' & + '".*[Cc][Aa][Rr].*" )' ) if li_RC <> 1 then MessageBox ( 'Error', 'SetFilter failed' ) return end if
The periods, asterisks and square brackets are called "metacharacters" in PowerBuilder-speak, meaning they have special meanings other than simply being characters that match themselves. The letters CcAaRr are non-metacharacters because they do match themselves. The whole bizarre string is sometimes called a "regular expression", one of those oxymorons we're so fond of using like "transparent interface" and "portable software".
The code in Listing 3 can be expressed in English as follows: Show only the rows where string_column contains one of the letters C or c, followed by A or a, then R or r, preceded by zero or more occurrences (represented by the first asterisk) of any character (the period), and followed by zero or more occurrences of any character.
Whew! In other words, it matches the substring "car" in upper or lower case embedded anywhere in string_column. I once coded Hewlett Packard Printer Control Language (PCL) by hand, and it wasn't much harder than getting regular expressions right. Both are tiny little programming languages where single characters are combined to form powerful operators.
"But I want the user to just type 'car', not [Cc][Aa][Rr]. How is that accomplished?"
Well, you could break the value into single characters and build the match string, but it's a lot simpler to convert both the user input and the DataWindow column to lower case. Listing 4 shows how the user can type a word in a SingleLineEdit and have it used in a case insensitive match. Note that leading and trailing spaces are removed from the user's input but not the DataWindow column.
ls_match = trim ( lower ( sle_match.text ) ) ls_match & = 'match( lower( string_column ), ".*' & + ls_match & + '.*" )' li_RC = dw_groups.SetFilter ( ls_match ) if li_RC <> 1 then MessageBox ( 'Error', 'SetFilter failed' ) return end if
"What about matching 'car' at the beginning of words like 'carrot' but not embedded within words like 'incarnate'?"
Incarnate indeed. Match can do what you want but first you must figure out exactly what that is; i.e., how is "the beginning of a word" defined? The answer might be "a prefix is a substring that appears at the beginning of the string, or anywhere in the string if it's preceded by a blank." Listing 5 shows how the caret metacharacter can be used to match the beginning of the string, and how the "or" operator may be used to combine multiple match calls in one SetFilter call. These two match calls find 'car' at the beginning of the string, or one blank plus 'car' anywhere.
ls_match & = 'match( lower( string_column ), "^' & + ls_match & + '.*" ) or ' & + 'match( lower( string_column ), ".* ' & + ls_match & + '.*" )'
"That works on 'Breck Carter' but what about 'Jones,Carmen'?"
Yes, well, I guess that calls into doubt our definition of prefix. Let's refine it to include the words "following a separator character" instead of just a blank. And we'll include a few other possibilities besides the comma. Listing 6 shows how the square bracket metacharacters can be used to match one blank, comma, colon, semicolon or underscore so it will find 'car' in 'Jones;Carmen' as well as 'Jones,Carmen'.
ls_match & = 'match( lower( string_column ), "^' & + ls_match & + '.*" ) or ' & + 'match( lower( string_column ), ' & + '".*[ ,_:;]' & + ls_match & + '.*" )'
"What about finding 'ted' or 'day' in Edward "Ted" O'Day? How about 'com' in fpostle@powersoft.com?"
Listing 2 dealt with searching for quotes. Now we're talking about treating quotes as separator characters. In the first case, the program handled quotes embedded in the search string, while in this case the program must embed quotes in the match string. Eventually we'll want to do both, but first it's time to talk seriously about tildes. Here are some hints that might help:
Does your head hurt now? I once knew a Physics professor who used post-graduate thesis topics as questions on first-year mid-terms. The class would average about 10% but every once in a while some front-row genius would come up with a useful idea. If he was teaching PowerBuilder, here is what one of those questions might be: One, two or four tildes are all you ever need, never three. Two tildes are often superfluous, and the need for four is rare. The key is to carefully choose double quotes inside singles, or vice versa. True or false? Please explain.
I think I know the answer, but my head hurts too. Send your answer to bcarter@bcarter.com. For extra points, explain why the four tildes in Listing 2 are an example of point 12 above. And have a look at Listing 7, and explain why single and double-tildes are used.
ls_match & = 'match( lower( string_column ), "^' & + ls_match & + '.*" ) or ' & + 'match( lower( string_column ), ' & + '".*[ ,_:;~'~~"]' & + ls_match & + '.*" )'
"What about finding 'com' in fpostle@powersoft.com? Or 'car' in Smythe-Carter?"
The problem with treating the period as a separator is that it's a metacharacter itself. Putting it inside a match string has the effect of matching any character, not just the period. To embed a metacharacter as an ordinary character within a regular expression you must precede it with a backslash. This applies to several characters like the dollar sign, asterisk and question mark, all listed in the PowerBuilder Help.
But wait! There's one missing from the list! The hyphen in Smythe-Carter is also a metacharacter when used inside square brackets. For example [a-z] means "match any single character in the range from a to z" but not the hyphen itself. And that's exactly where we want to put a hyphen, inside the list of separators. It too must also be preceded by a backslash as shown in Listing 8.
ls_match & = 'match( lower( string_column ), "^' & + ls_match & + '.*" ) or ' & + 'match( lower( string_column ), ' & + '".*[ ,_:;~'~~"\.\-]' & + ls_match & + '.*" )'
This business of the hyphen not being explicitly documented as a metacharacter is troublesome. What if some other character becomes a metacharacter in a future release of PowerBuilder? What if the tilde became a metacharacter?
Here's another hint to add to the list: Backslashes are processed after all the tilde stripping is finished. And although the documentation says to use backslash before a metacharacter, it looks like backslashes are stripped from in front of any character. So today's solution to future metacharacters might be to put a backslash in front of every single character you want embedded in the match string. But not inserted inside tilde pairs that are going to be reduced by the earlier stripping processes, just ahead each tilde sequence that's going to end up as a single character. The mind boggles at the monstrosity of Listing 9, but Figure 2 is proof that it works.
ls_match & = 'match( lower( string_column ), "^' & + ls_match & + '.*" ) or ' & + 'match( lower( string_column ), ' & + '".*[\ \,\_\:\;\~'\~~"\.\-\~~~~]' & + ls_match & + '.*" )'
"What if the user includes metacharacters and quotation marks in the search string?"
Enough! The ride is over, time to get off. As that lazy professor might have done, let's assign this problem as homework. The user wants to search for the prefix "test." so that it matches "test.dta" and "run test.com" but not "xtest.com" or "thorough testing". The search string might include quotes and tildes as well as metacharacters. In fact, the user might want to search this web page for some very strange strings indeed. Hint: combine the code from Listings 4 and 9 and add some logic to preprocess the user's input like the code in Listing 2.
Breck Carter can be reached by phone at (416) 763-5200 or via email at bcarter@bcarter.com.