Ticket #49 (new defect)

Opened 11 months ago

Last modified 7 months ago

stray closing tags (was: lists broken)

Reported by: volker
Owned by: ralf
Priority: major
Component: mwlib
Keywords:
Cc:

Description

Broken MediaWiki markup, such as a stray closing tag, also breaks the parsing of lists. Can we handle input like this?

>>> uparser.simpleparse('*blub</b>blub\n*bla')
parser.info >> Parsing "'unknown'"
parser.info >> in parseArticle: skipping (<EndTag:b>, '</b>')
 Article 'unknown': 2 children
     Node '': 1 children
         Node '': 1 children
             ItemList '': 1 children
                 Item '': 1 children
                     'blub'
     Paragraph '': 3 children
         'blub'
         '\n'
         Node '': 1 children
             ItemList '': 1 children
                 Item '': 1 children
                     'bla'

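Expected result: a single list with two items. Instead, the stray </b> ends the first list, and the second item lands in a separate list inside a paragraph.

Seen at http://de.wikipedia.org/wiki/Akutes_Nierenversagen: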

***[[Freie Leichtketten|Leichtketten]]</b>ablagerungen in den Nierenkanälchen bei [[Multiples Myelom|multiplem Myelom]] ([[Myelomniere]]) oder
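
One possible way to handle this, sketched under assumptions: drop closing tags that have no matching opening tag before the text reaches the list parser. The helper name strip_stray_closing_tags and the regular expression are hypothetical and not part of mwlib. The sketch also ignores nowiki sections, comments, and void tags such as <br> written without a slash.

```python
import re

# Hypothetical helper, NOT part of mwlib. Matches simple open, close and
# self-closing tags; an unterminated tag such as '<span class="geo-' is
# deliberately not matched and is passed through as plain text.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^<>]*?(/?)>")


def strip_stray_closing_tags(text):
    """Return text with closing tags removed when no matching tag is open."""
    open_tags = []  # stack of currently open tag names
    out = []
    pos = 0
    for m in TAG_RE.finditer(text):
        closing, name, selfclosing = m.group(1), m.group(2).lower(), m.group(3)
        out.append(text[pos:m.start()])
        pos = m.end()
        if closing:
            if name in open_tags:
                # Pop everything up to and including the matching open tag.
                while open_tags.pop() != name:
                    pass
                out.append(m.group(0))
            # Otherwise the closing tag is stray: drop it.
        else:
            if not selfclosing:
                open_tags.append(name)
            out.append(m.group(0))
    out.append(text[pos:])
    return "".join(out)


print(repr(strip_stray_closing_tags('*blub</b>blub\n*bla')))
```

With the stray </b> removed, both lines start with a plain list marker again, so a list parser would see two adjacent items. Whether this belongs in a preprocessing step or in the parser's own tag handling is a design choice this sketch does not settle.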

Change History

Changed 10 months ago by ralf

related to #63

Changed 10 months ago by ralf

  • summary changed from "lists broken" to "stray closing tags (was: lists broken)"

Changed 7 months ago by ralf

see also #124:

uparser.simpleparse(u'Wurzeln,<ref><span class="cite">J. Jen\xedk, D. H. Sen:&#32;\'\'<span lang="en" xml:lang="en" class="lang">Morphology of root systems in trees: a proposal for terminology</span>.\'\'&#32;In:&#32;\'\'<span lang="en" xml:lang="en" class="lang">Tenth International Botanical Congress, Edinburgh. Abstracts</span>.\'\'&#32;1964,&#32;S.&#160;393\u2013394</span></span>.</ref> ist bei den Ameisenb\xe4umen nicht m\xf6glich.\n\n\n')

Changed 7 months ago by ralf

from #169:

In [2]: uparser.simpleparse(u'[http://web.de <span class="geo-</span>]')
parser.info >> Parsing "'unknown'"
parser.info >> in parseArticle: skipping (<EndTag:span>, u'</span>')
 Article 'unknown': 2 children
     Node '': 1 children
         Node '': 3 children
             '['
             URL u'http://web.de': 0 children
             u' <span class="geo-'
     Paragraph '': 1 children
         u']'
Out[2]: Article 'unknown': 2 children