Ticket #97 (closed defect: fixed)

Opened 9 months ago

Last modified 7 weeks ago

parse all allowed html tags

Reported by: volker
Owned by: volker, ralf
Priority: blocker
Component: mwlib
Keywords:
Cc:

Description (last modified by ralf)

even if building an HTML parser should not be the goal, we need to deal with HTML tags. we should find out which HTML tags are allowed and try to parse all of them.

one example is the <em> tag

http://en.wikipedia.org/wiki/Help:HTML_in_wikitext#Permitted_HTML

Change History

  Changed 9 months ago by ralf

  • description modified

  Changed 8 months ago by heiko

  • priority changed from major to critical

http://meta.wikimedia.org/wiki/Help:HTML_in_wikitext

currently missing:

<ruby>
<rb>
<rp>
<rt>
<dl>
<dt>
<dd>
<tbody>
<caption>
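
A minimal sketch of how a page could be scanned for the tags listed above. It uses only Python's standard `html.parser` and does not call mwlib; the names `MISSING_IN_TICKET`, `TagCollector` and `unsupported_tags` are hypothetical, and the tag set is just the list from this comment, not the full set of permitted tags.

```python
from html.parser import HTMLParser

# Tags reported as missing in this ticket (not the full permitted list).
MISSING_IN_TICKET = {"ruby", "rb", "rp", "rt", "dl", "dt", "dd", "tbody", "caption"}


class TagCollector(HTMLParser):
    """Collect the (lowercased) names of all start tags in the input."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        self.seen.add(tag)


def unsupported_tags(text, unsupported=MISSING_IN_TICKET):
    """Return the sorted tags in `text` that are in the unsupported set."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return sorted(collector.seen & set(unsupported))
```

Running this over the raw text of an article would give a rough idea of which pages are affected by the missing tags.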

  Changed 8 months ago by heiko

also broken:

element          -> becomes

<var>            -> <p>
<thead>          -> <td>
<b>              -> <strong>
<i>              -> <em>
<!-- comment --> -> nothing
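
A rough sketch of a check that would surface substitutions like the ones listed above, assuming the "becomes" column refers to the tag found in rendered HTML output. It compares start tags position by position using Python's standard `html.parser`; `start_tags` and `renamed_tags` are hypothetical helpers, not part of mwlib, and the rendered string has to be produced separately.

```python
from html.parser import HTMLParser


class StartTags(HTMLParser):
    """Record start tag names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags(markup):
    """Return the list of start tag names found in `markup`."""
    parser = StartTags()
    parser.feed(markup)
    parser.close()
    return parser.tags


def renamed_tags(source, rendered):
    """Pair up start tags by position and return the pairs that differ.

    zip() stops at the shorter list, so dropped or extra tags are not
    reported; this only catches one-for-one substitutions.
    """
    pairs = zip(start_tags(source), start_tags(rendered))
    return [(src, out) for src, out in pairs if src != out]
```

For example, `renamed_tags("<var>x</var>", "<p>x</p>")` reports the `var`/`p` substitution as a single pair.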

  Changed 8 months ago by ralf

comment tags cannot be parsed: they are removed during template expansion, before the parser sees them.
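
To illustrate why the parser never gets a chance to see comments, here is a small sketch of a pre-parse step that strips them. This is an assumption about the general shape of such a step, not mwlib's actual expansion code; `COMMENT_RE` and `strip_comments` are hypothetical names.

```python
import re

# Non-greedy match so that each comment is removed separately;
# DOTALL lets a comment span several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(wikitext):
    """Return `wikitext` with all HTML comments removed."""
    return COMMENT_RE.sub("", wikitext)
```

Once a step like this has run, the text handed to the parser contains no trace of the comment, so there is nothing left to turn into a node.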

  Changed 8 months ago by heiko

regarding comments: can this be fixed when we add extra handling for templates?

and: please parse vlist (the tag attributes) for all tags

  Changed 7 weeks ago by heiko

  • priority changed from critical to blocker

for unsupported ruby tags see: http://pl.wikibooks.org/wiki/Japo%C5%84ski

  Changed 7 weeks ago by ralf

Replying to heiko:

regarding comments: can this be fixed when we add extra handling for templates?

no

  Changed 7 weeks ago by ralf

  • status changed from new to closed
  • resolution set to fixed

the tags are now parsed, but some/most of the writers do not handle them (tests already fail)

upped version to 0.9.1.dev
