Need help creating a regular expression for HTML lists...


Post by bharathit » Thu, 22 Mar 2007 18:25:37


I'm working on regular expressions to convert between HTML tags and wiki
syntax. For example, if I encounter text like "some <b> more </b> text",
my regular expression should convert it to: some 'more' text. I was able
to write simple conversions like that one, but the real problem comes with
lists and tables.
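For the simple bold case, a minimal JavaScript sketch might look like the following. It assumes the target is MediaWiki-style bold ('''text'''); the post itself shows single quotes, so the marker is a parameter. The function name htmlBoldToWiki is hypothetical.

```javascript
// Minimal sketch: convert HTML bold tags to wiki bold markup.
// ASSUMPTION: MediaWiki-style bold ('''text''') by default; pass a different
// marker if your wiki uses something else.
function htmlBoldToWiki(html, marker = "'''") {
  // [\s\S]*? is lazy, so each <b> pairs with the NEAREST </b> and two bold
  // runs on one line stay separate. The \s* on each side trims spaces such
  // as "<b> more </b>". The "i" flag also accepts <B>...</B>.
  const boldTag = /<b\s*>\s*([\s\S]*?)\s*<\/\s*b\s*>/gi;
  // A replacer function avoids "$" in the marker being read as a group reference.
  return html.replace(boldTag, (whole, inner) => marker + inner + marker);
}

console.log(htmlBoldToWiki("some <b> more </b> text")); // some '''more''' text
```

The lazy quantifier is the important part: a greedy ([\s\S]*) would swallow everything from the first <b> to the last </b> on the page.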

For example, I write the following text:

# number one
# number two
# number three

I want to convert that into this HTML:

<ol>
<li>number one</li>
<li>number two</li>
<li>number three</li>
</ol>
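The per-item step can be done with one multiline regex. A minimal JavaScript sketch, assuming the items sit on their own lines and that the whole input is a single list (the name singleWikiListToHtml is hypothetical). Wrapping everything in one <ol> is only correct under that single-list assumption.

```javascript
// Minimal sketch for ONE list only: each "#" line becomes an <li>, and the
// whole text is wrapped in a single <ol>.
// ASSUMPTION: the input contains nothing but that one list.
function singleWikiListToHtml(text) {
  // "m" makes ^ and $ match at every line, "g" applies it to every line.
  // [ \t]* skips the space after "#"; (.*) captures the rest of the line.
  const items = text.replace(/^#[ \t]*(.*)$/gm, "<li>$1</li>");
  return "<ol>\n" + items + "\n</ol>";
}

console.log(singleWikiListToHtml("# number one\n# number two\n# number three"));
```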

I was able to find the first and last occurrences of #, put <ol> and </ol>
around them, and wrap number one/two/three in <li></li>. So far so good...
But the problem occurs when I have multiple lists on the same page.
Searching for the first and last occurrence of # obviously no longer gives
the right result, because there are two lists on the page, e.g.:

# number one
# number two

some text here

# number one
# number two

Unfortunately, my conversion to HTML then produces:

<ol>
<li>number one</li>
<li>number two</li>

some text here

<li>number one</li>
<li>number two</li>
</ol>

instead of:

<ol>
<li>number one</li>
<li>number two</li>
</ol>

some text here

<ol>
<li>number one</li>
<li>number two</li>
</ol>
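One way around the first/last-occurrence problem is to stop treating the page as one list: let a single global, multiline regex match each run of consecutive "#" lines, and wrap every match in its own <ol>...</ol>. A minimal JavaScript sketch, assuming items sit on adjacent lines and that any other line (blank or text) ends the list; nested "##" items and "*" bullets are not handled, and the name wikiListsToHtml is hypothetical.

```javascript
// Minimal sketch: turn EACH run of consecutive "#" lines into its own <ol>.
// ASSUMPTIONS: list items sit on adjacent lines; any other line (blank or
// text) ends the list; nested "##" items and "*" bullets are not handled.
function wikiListsToHtml(text) {
  // Normalize Windows/old-Mac line endings so "\n" is the only separator.
  const normalized = text.replace(/\r\n?/g, "\n");
  // One match = one whole list: a "#" line followed by any number of further
  // "\n#..." lines. ".*" never crosses a newline, so a line that does not
  // start with "#" stops the match, and the next list gets its own match.
  const listBlock = /^#.*(?:\n#.*)*/gm;
  return normalized.replace(listBlock, (block) => {
    const items = block
      .split("\n")
      .map((line) => "<li>" + line.replace(/^#[ \t]*/, "") + "</li>");
    return "<ol>\n" + items.join("\n") + "\n</ol>";
  });
}

const page = "# number one\n# number two\n\nsome text here\n\n# number one\n# number two";
console.log(wikiListsToHtml(page));
```

Because each list is matched and converted separately, "some text here" stays outside both <ol> blocks, which is the desired output shown above.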

Can anybody help?

I'm trying this in both VB.NET and JavaScript, so help in either language
is most welcome...
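Since VB.NET is also in play: if a block-level regex feels fragile, a different technique (a plain line-by-line scanner instead of a regex that matches whole lists) is easy to port. The sketch below is JavaScript with the same assumptions (adjacent "#" lines form one list, no nesting); the name wikiListsToHtmlByLines is hypothetical.

```javascript
// Alternative sketch without a block-level regex: walk the text line by line
// and remember whether we are currently inside a list. The structure (a loop
// plus a Boolean flag) maps directly onto a VB.NET For Each loop.
// ASSUMPTION: a non-"#" line (blank or text) closes the open list.
function wikiListsToHtmlByLines(text) {
  const out = [];
  let inList = false;
  for (const line of text.split(/\r?\n/)) {
    const m = /^#[ \t]*(.*)$/.exec(line); // is this line a list item?
    if (m) {
      if (!inList) {
        out.push("<ol>"); // first item of a new list
        inList = true;
      }
      out.push("<li>" + m[1] + "</li>");
    } else {
      if (inList) {
        out.push("</ol>"); // the previous line was the last item
        inList = false;
      }
      out.push(line);
    }
  }
  if (inList) out.push("</ol>"); // text ended while a list was still open
  return out.join("\n");
}

console.log(wikiListsToHtmlByLines("# number one\n# number two\n\nsome text here\n\n# number one\n# number two"));
```

The trade-off is a little more code in exchange for explicit control over where each <ol> opens and closes, which also makes it easier to extend later (for example, to tables).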


Post by anand110 » Thu, 22 Mar 2007 18:33:36
