A simple, purely python, WikiText parsing tool.

The purpose is to allow users to easily extract and/or manipulate templates, template parameters, parser functions, tables, external links, wikilinks, etc. in wikitext.

Installation

Use pip install wikitextparser

Usage

Here is a short demo of some of its functionality:

>>> import wikitextparser as wtp

WikiTextParser can detect sections, parser functions, templates, wikilinks, external links, arguments, tables, and HTML comments in your wikitext:

>>> wt = wtp.parse("""
== h2 ==
t2

=== h3 ===
t3

== h22 ==
t22

{{text|value1{{text|value2}}}}

[[A|B]]""")
>>>
>>> wt.templates
[Template('{{text|value2}}'), Template('{{text|value1{{text|value2}}}}')]
>>> wt.templates[1].arguments
[Argument('|value1{{text|value2}}')]
>>> wt.templates[1].arguments[0].value = 'value3'
>>> print(wt)

== h2 ==
t2

=== h3 ===
t3

== h22 ==
t22

{{text|value3}}

[[A|B]]
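
The order of wt.templates above (the inner template before the one that encloses it) is what a simple stack-based scan would produce. The following is a minimal sketch of that idea using only the standard library. find_templates is a hypothetical helper, not part of wikitextparser and not necessarily how the library works internally; it also ignores things a real parser has to handle, such as {{{parameters}}}, HTML comments, and nowiki tags.

```python
def find_templates(text):
    """Return every balanced {{...}} span in text, innermost first.

    Hypothetical illustration only: a real wikitext parser must also
    deal with {{{parameters}}}, comments, nowiki tags, and so on.
    """
    stack = []   # start offsets of the currently open "{{"
    found = []   # completed spans, in the order they were closed
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            stack.append(i)
            i += 2
        elif pair == "}}" and stack:
            start = stack.pop()
            found.append(text[start:i + 2])
            i += 2
        else:
            i += 1
    return found


print(find_templates("{{text|value1{{text|value2}}}}"))
# ['{{text|value2}}', '{{text|value1{{text|value2}}}}']
```

Because a span is recorded when its closing braces are reached, nested templates are always listed before the templates that contain them.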

It provides easy-to-use properties so you can get or set names or values of templates, arguments, wikilinks, etc.:

>>> wt.wikilinks
[WikiLink('[[A|B]]')]
>>> wt.wikilinks[0].target = 'Z'
>>> wt.wikilinks[0].text = 'X'
>>> wt.wikilinks[0]
WikiLink('[[Z|X]]')
>>>
>>> from pprint import pprint
>>> pprint(wt.sections)
[Section('\n'),
 Section('== h2 ==\nt2\n\n=== h3 ===\nt3\n\n'),
 Section('=== h3 ===\nt3\n\n'),
 Section('== h22 ==\nt22\n\n{{text|value3}}\n\n[[Z|X]]')]
>>>
>>> wt.sections[1].title = 'newtitle'
>>> print(wt)

==newtitle==
t2

=== h3 ===
t3

== h22 ==
t22

{{text|value3}}

[[Z|X]]
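
Note that in the wt.sections output above, the h2 section includes its h3 subsection, and the h3 subsection is also listed on its own. One way to picture this is that each section runs from its heading to the next heading of the same or a higher level. The sketch below reproduces that rule with the standard library only. split_sections and the HEADING pattern are assumptions made for illustration; they are not the library's implementation and cover only simple headings.

```python
import re

# Simplified heading pattern (an assumption): a line that starts and ends
# with the same run of 1 to 6 "=" characters.
HEADING = re.compile(r"^(={1,6})[^\n]*?\1[ \t]*$", re.MULTILINE)


def split_sections(text):
    """Split text into a lead section plus one section per heading.

    Each section extends to the next heading whose level is the same
    or higher (that is, with the same number of "=" or fewer).
    """
    heads = [(m.start(), len(m.group(1))) for m in HEADING.finditer(text)]
    first = heads[0][0] if heads else len(text)
    sections = [text[:first]]  # the lead section, possibly just "\n"
    for idx, (start, level) in enumerate(heads):
        end = len(text)
        for next_start, next_level in heads[idx + 1:]:
            if next_level <= level:
                end = next_start
                break
        sections.append(text[start:end])
    return sections


sample = "\n== h2 ==\nt2\n\n=== h3 ===\nt3\n\n== h22 ==\nt22\n"
for section in split_sections(sample):
    print(repr(section))
```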

Templates have a pprint method that pretty-prints them:

>>> p = wtp.parse('{{t1 |b=b|c=c| d={{t2|e=e|f=f}} }}')
>>> t2, t1 = p.templates
>>> print(t2.pprint())
{{t2
    |e=e
    |f=f
}}
>>> print(t1.pprint())
{{t1
    |b=b
    |c=c
    |d={{t2
        |e=e
        |f=f
    }}
}}
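
The layout shown above (one argument per line, with nested templates indented one level deeper) can be reproduced by a short recursive formatter. This is only a sketch of the layout rule, working on plain tuples rather than parsed wikitext. pformat_template and its (name, args) data shape are hypothetical and are not part of wikitextparser.

```python
def pformat_template(name, args, indent="    ", level=0):
    """Format a template as one argument per line.

    args is a list of (key, value) pairs; a value is either a string
    or a nested (name, args) tuple. Hypothetical helper for illustration.
    """
    pad = indent * (level + 1)
    lines = ["{{" + name]
    for key, value in args:
        if isinstance(value, tuple):
            # Nested template: format it one level deeper.
            value = pformat_template(*value, indent=indent, level=level + 1)
        lines.append(f"{pad}|{key}={value}")
    lines.append(indent * level + "}}")
    return "\n".join(lines)


t2 = ("t2", [("e", "e"), ("f", "f")])
t1 = ("t1", [("b", "b"), ("c", "c"), ("d", t2)])
print(pformat_template(*t1))
```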

If you are dealing with [[Category:Pages using duplicate arguments in template calls]], there are two methods that may be helpful:

>>> t = wtp.Template('{{t|a=a|a=b|a=a}}')
>>> t.rm_dup_args_safe()
>>> t
Template('{{t|a=b|a=a}}')
>>> t = wtp.Template('{{t|a=a|a=b|a=a}}')
>>> t.rm_first_of_dup_args()
>>> t
Template('{{t|a=a}}')
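
Judging only from the two outputs above, rm_dup_args_safe appears to drop an argument when an identical name and value occurs again later, while rm_first_of_dup_args keeps only the last argument for each name. The sketch below applies those two inferred rules to plain (name, value) pairs. The function names are hypothetical, and the library's real methods may apply additional rules (for example for empty values) that are not shown in this document.

```python
def remove_dups_safe(args):
    """Drop an argument only if the same (name, value) appears later."""
    kept = []
    for i, arg in enumerate(args):
        if arg in args[i + 1:]:
            continue  # an identical later argument makes this one redundant
        kept.append(arg)
    return kept


def keep_last_of_dups(args):
    """For each argument name, keep only its last occurrence."""
    last_index = {name: i for i, (name, _) in enumerate(args)}
    return [arg for i, arg in enumerate(args) if last_index[arg[0]] == i]


args = [("a", "a"), ("a", "b"), ("a", "a")]
print(remove_dups_safe(args))   # [('a', 'b'), ('a', 'a')]
print(keep_last_of_dups(args))  # [('a', 'a')]
```

The first rule never changes which value is finally in effect for a name, which is presumably why the corresponding method is called safe; the second rule may discard values that differ from the one kept.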

Extracting cell values of a table is easy:

>>> p = wtp.parse("""{|
|  Orange    ||   Apple   ||   more
|-
|   Bread    ||   Pie     ||   more
|-
|   Butter   || Ice cream ||  and more
|}""")
>>> pprint(p.tables[0].getdata())
[['Orange', 'Apple', 'more'],
 ['Bread', 'Pie', 'more'],
 ['Butter', 'Ice cream', 'and more']]

And values are rearranged according to colspan and rowspan attributes (by default):

>>> t = wtp.Table("""{| class="wikitable sortable"
|-
! a !! b !! c
|-
!colspan = "2" | d || e
|-
|}""")
>>> t.getdata(span=True)
[['a', 'b', 'c'], ['d', 'd', 'e']]
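
To illustrate what rearranging by colspan and rowspan means, here is a minimal sketch that expands spanning cells into a rectangular grid. It uses the standard library only and takes already-extracted (value, rowspan, colspan) tuples as input. expand_spans is a hypothetical helper, not the library's implementation, and it does not clip spans that run past the last row.

```python
def expand_spans(rows):
    """Expand cells with spans into a rectangular list of lists.

    rows is a list of rows, each a list of (value, rowspan, colspan).
    A spanning cell has its value repeated in every slot it covers.
    """
    grid = {}  # (row, col) -> value
    for r, row in enumerate(rows):
        c = 0
        for value, rowspan, colspan in row:
            # Skip slots already claimed by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = value
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    # Slots that no cell covers are filled with None.
    return [[grid.get((r, c)) for c in range(n_cols)] for r in range(n_rows)]


rows = [
    [("a", 1, 1), ("b", 1, 1), ("c", 1, 1)],
    [("d", 1, 2), ("e", 1, 1)],  # "d" has colspan=2, as in the table above
]
print(expand_spans(rows))  # [['a', 'b', 'c'], ['d', 'd', 'e']]
```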

Have a look at the test modules for more details and possible pitfalls.
