List Parsing Regular Expression

Post by Zach » Thu, 08 Jun 2006 09:56:38


Say I have a string in the format

{A, B, C, D}

for a variable number of items. I want a regular expression that
will put each of A, B, C and D into its own capture group, without
the commas. If the string is

{}

then the only capture should be the zeroth one, i.e. the whole
match.

I've tried various things, but I keep coming up short and it's
capturing things I don't expect.

Any gurus out there who can offer a suggestion?

Thanks
Zach

Post by Mark Harri » Thu, 08 Jun 2006 10:11:34

\{((?<Item>\w+?),?\s*)*\}

results in a named group Item, with multiple captures A B C D.

- Mark
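A note on the pattern above, as a minimal C# sketch assuming .NET's System.Text.RegularExpressions: the lazy \w+? makes each repetition of the outer group consume as few characters as possible. That is fine for single-letter items like A B C D, but on longer words every character becomes its own Item capture. Swapping in a greedy \w+ keeps whole words. The class and method names (LazyItemDemo, Items) are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LazyItemDemo
{
    // Pattern as posted: lazy \w+? inside a repeated group.
    public const string LazyPattern = @"\{((?<Item>\w+?),?\s*)*\}";
    // Same pattern with a greedy \w+ (illustrative variant, not from the post).
    public const string GreedyPattern = @"\{((?<Item>\w+),?\s*)*\}";

    // Returns every capture recorded for the named group "Item".
    public static string[] Items(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Groups["Item"].Captures
                .Cast<Capture>()
                .Select(c => c.Value)
                .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Items(LazyPattern, "{A, B, C, D}")));       // A B C D
        Console.WriteLine(string.Join(" ", Items(LazyPattern, "{Hello, Goodbye}")));   // H e l l o G o o d b y e
        Console.WriteLine(string.Join(" ", Items(GreedyPattern, "{Hello, Goodbye}"))); // Hello Goodbye
    }
}
```

Because the lazy version still reaches the closing } one character at a time, the overall match succeeds, which is why it can look like it "almost" works.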

Post by Zach » Thu, 08 Jun 2006 11:32:19

That didn't quite work for some reason, but I used it as a starting
point and got a little bit closer: I'm using

\{([^,]+(?:,\s)?)*\}

at the moment.

This works when the list has a single item, but if there are
multiple items it only captures the last one into a group.
Example:

{Hello, Goodbye}

My set of captures after calling the Match function is:

"{Hello, Goodbye}"
"Goodbye"

I'm a little confused about why this happens, since the only text inside
the {} that should ever fall outside a capture is the two-character
sequence ", " matched by (?:,\s).

Post by Mark Harri » Thu, 08 Jun 2006 15:40:42

Zach,

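Both items are getting matched, but they end up as separate captures of
the same group: Groups[1].Value only reports the last one, and the full
set is in Groups[1].Captures. Also, "Hello" comes out as "Hello, "
because the ,\s is inside your capturing group; (?:...) only stops the
separator from getting a group of its own, it doesn't keep its text out
of the enclosing group. Hence my use of a named group around just the item.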
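To illustrate the point about separate captures, here is a minimal C# sketch (class and method names are hypothetical) that runs Zach's pattern on {Hello, Goodbye}. In .NET, Group.Value holds only the last capture of a repeated group, while Group.Captures keeps one entry per iteration; and because (?:,\s) sits inside group 1, the separator lands in the first capture.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureHistoryDemo
{
    // Zach's pattern: group 1 wraps both the item and the ", " separator.
    public const string Pattern = @"\{([^,]+(?:,\s)?)*\}";

    // Group.Value reports only the last capture made by group 1.
    public static string LastValue(string input) =>
        Regex.Match(input, Pattern).Groups[1].Value;

    // Group.Captures keeps one capture per iteration of the outer *.
    public static string[] AllCaptures(string input) =>
        Regex.Match(input, Pattern).Groups[1].Captures
             .Cast<Capture>()
             .Select(c => c.Value)
             .ToArray();

    public static void Main()
    {
        string input = "{Hello, Goodbye}";
        Console.WriteLine(LastValue(input));   // Goodbye
        foreach (string s in AllCaptures(input))
            Console.WriteLine("[" + s + "]");  // [Hello, ] then [Goodbye]
    }
}
```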

Try this:
\{((?<Items>[^,]+)(?:,\s+)?)*\}

The code you need to use will be something like this:
using System;
using System.Text.RegularExpressions;

string input = "{Hello, Goodbye}";   // the text to parse
Regex rex = new Regex(@"\{((?<Items>[^,]+)(?:,\s+)?)*\}", RegexOptions.Compiled);

// Matches() returns an empty collection when nothing matches,
// so a separate IsMatch() check isn't needed.
foreach (Match mtc in rex.Matches(input))
{
    // Each repetition of the outer group adds one Capture to "Items".
    foreach (Capture cap in mtc.Groups["Items"].Captures)
    {
        string itemValue = cap.Value;
        Console.WriteLine(itemValue);
    }
}
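Wrapping the final pattern into a reusable helper might look like the following minimal sketch. TryParseItems and ListParser are hypothetical names, not part of any library; the sketch assumes the ", " separator format from the original question and returns false when no {...} list is found, so an empty list {} (zero Items captures) can be told apart from a non-match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ListParser
{
    // Final pattern from the thread: the named group Items wraps only the
    // item text, so the ", " separator is matched but never captured.
    private static readonly Regex ListRegex =
        new Regex(@"\{((?<Items>[^,]+)(?:,\s+)?)*\}");

    // Hypothetical helper: fills 'items' with one entry per list element.
    // "{}" succeeds with an empty list; input with no {...} list returns false.
    public static bool TryParseItems(string input, out List<string> items)
    {
        items = new List<string>();
        Match m = ListRegex.Match(input);
        if (!m.Success)
            return false;

        foreach (Capture cap in m.Groups["Items"].Captures)
            items.Add(cap.Value);
        return true;
    }

    public static void Main()
    {
        if (TryParseItems("{Hello, Goodbye}", out List<string> items))
            Console.WriteLine(string.Join("|", items));  // Hello|Goodbye

        TryParseItems("{}", out List<string> empty);
        Console.WriteLine(empty.Count);                  // 0
    }
}
```

The Try... shape follows the usual .NET convention (as in int.TryParse); returning null on failure would also work but pushes a null check onto every caller.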