Getting a list of results from one regular expression

Post by tiety » Fri, 24 Jun 2005 11:28:30


Hello, I'm new to Ruby. I've read most of the Pragmatic Programmer's
Guide but couldn't find anything that explains how to do this.

To summarize my whole question: how do I get EVERY match of a regular
expression (instead of just the first)?

Here's my situation: I've got a long string that contains XML, and I
would like to parse it. Specifically, I want to search this string for
all instances of a pattern like /stringAlias="(.*)"/

I'm no pro with regex, but I think that will find a match for a string
that looks like this: stringAlias="BLAH"

And because of the (.*), the result will be BLAH

Now this is all fine and good. But what I can't figure out is how to
get every match in an array (instead of just the first match).

If I have stringAlias="BLAH" ... stringAlias="BLEH", how do I get an
array that is ["BLAH", "BLEH"]?

Keep in mind that there are a dynamic number of matches for
stringAlias="(.*)"


This is the code I wrote to try to do it:

def ...
  @aliases = []
  matchedData = /stringAlias="(.*?)"/.match(@data)
  @aliases = matchedData.to_a
  puts @aliases
end

The length of the array is 2 and the result is this:
stringAlias="OP"
OP

Even though the data is this:
<string RSLDefined="false" active="false" languageId="1"
sortOrder="0" stringAlias="OP">
<stringValue><![CDATA[Open or Pending]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
sortOrder="1" stringAlias="1">
<stringValue><![CDATA[Open]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
sortOrder="2" stringAlias="2">
<stringValue><![CDATA[Pend]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
sortOrder="3" stringAlias="3">
<stringValue><![CDATA[Decline]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
sortOrder="4" stringAlias="4">
<stringValue><![CDATA[Complete]]></stringValue>
</string>
 
 
 

Post by Devin Mull » Fri, 24 Jun 2005 11:44:28


String#scan

I'm sure there are other ways, though. I just learned about String#scan
today. (Yes, Dave, my copy of the Pickaxe is on its way.)

Devin

 
 
 

Post by C Erle » Fri, 24 Jun 2005 11:46:11

I usually use String#scan.

"testwoohootestkaboomtestyutyut".scan(/test../)
=> ["testwo", "testka", "testyu"]
 
 
 

Post by Mark Hubba » Fri, 24 Jun 2005 11:47:19


Regexp#match returns only the first match; the MatchData object behaves
somewhat like an array holding the entire match, followed by the
subexpression (capture group) matches. What you want is String#scan:
(warning, untested)

regexp = /stringAlias="(.*?)"/
matches = @data.scan(regexp)

Since the regexp contains a subexpression (capture group), scan puts the
captured text, not the whole match, into the array "matches". You'll
get an array something like this:

[["OP"],["1"],["2"], ... ]

(each match has its own subarray, since it's a subexpression match)
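
To make the difference concrete, here is a minimal sketch contrasting the two calls. The sample string is a shortened stand-in for the XML in the original post, and the variable names are made up for illustration. Calling flatten is one way (not the only one) to collapse the nested result into the flat array that was asked for.

```ruby
# Shortened stand-in for the XML in the original question (illustrative only).
data = <<XML
<string sortOrder="0" stringAlias="OP">
<string sortOrder="1" stringAlias="1">
<string sortOrder="2" stringAlias="2">
XML

regexp = /stringAlias="(.*?)"/

# Regexp#match stops at the first match; to_a gives the whole match
# followed by each capture group.
first_only = regexp.match(data).to_a

# String#scan collects every match; with one capture group each
# element is a one-element array, so flatten yields a flat list.
aliases = data.scan(regexp).flatten

p first_only
p aliases
```

If the pattern had more than one capture group, flatten would mix the groups together, so in that case it is probably better to keep the nested arrays.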

Check out the docs for String#scan for more info...

cheers,
Mark
 
 
 

Post by Gavin Kist » Fri, 24 Jun 2005 13:36:06


In addition to the correct response given by others (String#scan),
you might also want to look at the StringScanner class. It gives you
the ability to crawl through a string with successive regexp calls,
where each new call starts at the new 'current' position.

story = <<ENDSTORY
Hello World! There are 3 cats in my house, with 4 feet each.

6 of those 12 feet have 5 claws each; the other 6 feet have 4 claws
each.

Ow, my back. 54 claws need clipping.
ENDSTORY

require 'strscan'
scanner = StringScanner.new( story )

info = []
count_nouns = /(\d+) (\w+)/

until scanner.eos?
  break unless scanner.scan_until( count_nouns )
  tidbit = {
    :full_match => scanner[0],
    :count => scanner[1].to_i,
    :noun => scanner[2]
  }
  info << tidbit
end

require 'pp'
pp info
info.each{ |tidbit|
  puts "Of %7s, I saw %02d" % [ tidbit[:noun], tidbit[:count] ]
}



[{:noun=>"cats", :count=>3, :full_match=>"3 cats"},
 {:noun=>"feet", :count=>4, :full_match=>"4 feet"},
 {:noun=>"of", :count=>6, :full_match=>"6 of"},
 {:noun=>"feet", :count=>12, :full_match=>"12 feet"},
 {:noun=>"claws", :count=>5, :full_match=>"5 claws"},
 {:noun=>"feet", :count=>6, :full_match=>"6 feet"},
 {:noun=>"claws", :count=>4, :full_match=>"4 claws"},
 {:noun=>"claws", :count=>54, :full_match=>"54 claws"}]
Of    cats, I saw 03
Of    feet, I saw 04
Of      of, I saw 06
Of    feet, I saw 12
Of   claws, I saw 05
Of    feet, I saw 06
Of   claws, I saw 04
Of   claws, I saw 54
 
 
 

Post by Pit Capita » Fri, 24 Jun 2005 15:32:02


One additional remark: if the input can contain multiple stringAlias
expressions on one line, the pattern should be /stringAlias="(.*?)"/
(note the question mark). You can see the difference if you match a
string like

str = "stringAlias=\"one\" bla stringAlias=\"two\""

p str.scan( /stringAlias="(.*)"/ )
# => [["one\" bla stringAlias=\"two"]]

p str.scan( /stringAlias="(.*?)"/ )
# => [["one"], ["two"]]
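
Another way to get the same result is a negated character class. Nobody in the thread suggested this, but it is a common idiom: [^"]* can never run past the closing quote, so greediness stops being an issue. A minimal sketch, reusing the same sample string as above (the name "values" is just for illustration):

```ruby
str = "stringAlias=\"one\" bla stringAlias=\"two\""

# [^"]* matches any run of characters except a double quote,
# so each capture ends at the first closing quote.
values = str.scan( /stringAlias="([^"]*)"/ ).flatten
p values
```

This assumes the attribute values never contain a double quote themselves.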

Regards,
Pit
 
 
 

Post by tiety » Fri, 24 Jun 2005 16:15:06

First of all, thanks for all that super fast help. I've never asked a
technical question anywhere and gotten such a fast response.

Specifically to Pit Capitain:
Thanks for that tip. I just googled it and learned what .*? does.