Ticket #175 (closed defect: invalid)

Opened 7 months ago

Last modified 7 weeks ago

handling of unknown tags

Reported by: volker Owned by:
Priority: critical Component: mwlib
Keywords: Cc:

Description

unknown tag nodes should be handled in a better way. the current parsing result is problematic:

In [3]: p('<idl>blub</idl>')
parser.info >> Parsing "'unknown'"
parser.info >> in parseArticle: skipping (<EndTag:idl>, '</idl>')
 Article 'unknown': 1 children
     Node '': 2 children
         '<idl>'
         'blub'

Change History

  Changed 7 months ago by ralf

  Changed 6 months ago by heiko

  • priority changed from major to critical

http://wikitravel.org/en/Alexandria

e.g.

<listing name="Poison Center Main University Hospital" alt="" address="" directions="" phone="+20-3-4862244" url="" hours="" price="" lat="" long="" email="" fax=""></listing>

Proposal:

  • Introduce an UnknownTagNode class, so exports can decide how to handle those tags.
  • Compile a list of popular mw-extensions with the elements they add
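A minimal sketch of what the first proposal could look like. Everything here is hypothetical: the class name UnknownTagNode comes from the proposal, but its fields and the render_text writer are assumptions for illustration, not mwlib's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class UnknownTagNode:
    """Hypothetical node for a tag the parser does not recognise."""
    tagname: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List[Union["UnknownTagNode", str]] = field(default_factory=list)


def render_text(node, on_unknown="keep"):
    """Toy writer: the export decides what to do with unknown tags.

    on_unknown="keep" emits only the tag's inner content,
    on_unknown="drop" omits the tag together with its content.
    """
    if isinstance(node, str):
        return node
    if on_unknown == "drop":
        return ""
    return "".join(render_text(child, on_unknown) for child in node.children)


# The <idl>blub</idl> example from the description, as a node.
node = UnknownTagNode("idl", children=["blub"])
```

With such a node in the tree, a plain-text export could call render_text(node) while a stricter export could pass on_unknown="drop".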

follow-up: ↓ 4   Changed 6 months ago by ralf

this bug is about unknown closing tags, which are not returned as text (this is how I interpret this). They are skipped. Why should this be critical?

in reply to: ↑ 3 ; follow-up: ↓ 5   Changed 6 months ago by heiko

Replying to ralf:

this bug is about unknown closing tags, which are not returned as text (this is how I interpret this). They are skipped. Why should this be critical?

the title of this ticket suggests that it is about "handling of unknown tags" in general.

have a look at MediaWiki Tag Extensions. treating unknown tags in the parser as text is a failure. we should rather parse them into a special node class and

  • let the writers decide how to treat them
  • give users the opportunity to write plugins that inject a correct interpretation of those tags into the parse tree
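The plugin idea in the second bullet could be sketched as a small registry of tag handlers. The names TAG_HANDLERS, register_tag and handle_tag are made up for this illustration and do not exist in mwlib; the dict results stand in for whatever node type the parse tree would really use.

```python
# Hypothetical registry mapping lower-cased tag names to handler callables.
TAG_HANDLERS = {}


def register_tag(tagname):
    """Decorator that registers a handler for one extension tag."""
    def decorator(func):
        TAG_HANDLERS[tagname.lower()] = func
        return func
    return decorator


def handle_tag(tagname, attributes, content):
    """Dispatch to a registered handler, else return a generic placeholder."""
    handler = TAG_HANDLERS.get(tagname.lower())
    if handler is None:
        return {"type": "unknown", "tag": tagname,
                "attributes": attributes, "content": content}
    return handler(attributes, content)


@register_tag("listing")
def listing_handler(attributes, content):
    # Keep only the attributes that were actually filled in.
    filled = {key: value for key, value in attributes.items() if value}
    return {"type": "listing", "fields": filled, "content": content}
```

A tag without a registered handler still ends up in the result as a generic "unknown" entry, so a writer can decide how to treat it.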

btw, are we aware of this markup:

    "{{#tag:code | {{{1}}} }}" 

http://www.mediawiki.org/wiki/Manual:Tag_extensions#Extensions_and_Templates

in reply to: ↑ 4   Changed 6 months ago by ralf

  • owner ralf deleted
  • status changed from new to assigned

Replying to heiko:

Replying to ralf:

this bug is about unknown closing tags, which are not returned as text (this is how I interpret this). They are skipped. Why should this be critical?

the title of this ticket suggests that it is about "handling of unknown tags" in general.

then the description sucks.

have a look at MediaWiki Tag Extensions. treating unknown tags in the parser as text is a failure.

no.

we should rather parse them into a special node class and

no. writing <nosuchtag> in mediawiki results in <nosuchtag> as text in the output.
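A rough sketch of the "unknown tags come out as text" behaviour described here, assuming an HTML writer. KNOWN_TAGS and escape_unknown_tags are illustrative names only, and the regular expression is a simplification rather than MediaWiki's or mwlib's real tokenizer.

```python
import html
import re

# Assumption: a tiny illustrative whitelist, not the real set of known tags.
KNOWN_TAGS = {"b", "i", "code", "pre"}

# Matches an opening or closing tag and captures its name.
TAG_RE = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9]*)[^<>]*>")


def escape_unknown_tags(text):
    """Leave known tags alone and HTML-escape every other tag as literal text."""
    def repl(match):
        if match.group(1).lower() in KNOWN_TAGS:
            return match.group(0)
        return html.escape(match.group(0))
    return TAG_RE.sub(repl, text)
```

Under these assumptions both <idl> and </idl> would survive as visible text instead of the closing tag being skipped.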

* let the writers decide how to treat them

the parser should know how to treat them

* give users the opportunity to write plugins that inject a correct interpretation of those tags into the parse tree

plugins probably make sense. but who writes them? I'd rather not spend my time on writing a plugin system as long as we are the only ones using it.

btw, are we aware of this markup:

    "{{#tag:code | {{{1}}} }}"

http://www.mediawiki.org/wiki/Manual:Tag_extensions#Extensions_and_Templates

In [2]: expander.expandstr("{{#tag:code | 1 }}" )
EXPAND: '{{#tag:code | 1 }}' -> u'<code></code>'

can someone please fix the description of this issue or open new ones? Is the problem that we don't handle <idl>, that </idl> is skipped, or something else? I'm unassigning it...

  Changed 7 weeks ago by ralf

  • status changed from assigned to closed
  • resolution set to invalid

unknown tag nodes just don't make sense. #212 is about the unknown nodes being dropped from the parse tree.
