A simple, pure-Python WikiText parsing tool.
The purpose is to let users easily extract and/or manipulate templates, template parameters, parser functions, tables, external links, wikilinks, etc. in wikitext.
Installation
Use pip install wikitextparser
Usage
Here is a short demo of some of its functionality:
>>> import wikitextparser as wtp
WikiTextParser can detect sections, parser functions, templates, wikilinks, external links, arguments, tables, and HTML comments in your wikitext:
>>> wt = wtp.parse("""
== h2 ==
t2
=== h3 ===
t3
== h22 ==
t22
{{text|value1{{text|value2}}}}
[[A|B]]""")
>>>
>>> wt.templates
[Template('{{text|value2}}'), Template('{{text|value1{{text|value2}}}}')]
>>> wt.templates[1].arguments
[Argument("|value1{{text|value2}}")]
>>> wt.templates[1].arguments[0].value = 'value3'
>>> print(wt)
== h2 ==
t2
=== h3 ===
t3
== h22 ==
t22
{{text|value3}}
[[A|B]]
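Note that the inner template is listed before the outer one. A minimal stack-based sketch shows why that order falls out naturally: a `{{ ... }}` span is only complete once its closing braces are seen, and inner templates close first. `find_templates` is a hypothetical helper, not wikitextparser's implementation; it ignores `{{{parameters}}}`, parser functions, and unbalanced braces.

```python
def find_templates(text: str) -> list:
    """Hypothetical helper: return every {{...}} span, innermost first.

    Spans are recorded in the order their closing "}}" is found, which is
    why nested templates come out before the templates that contain them.
    """
    stack = []  # start offsets of "{{" pairs that are still open
    spans = []
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            stack.append(i)
            i += 2
        elif pair == "}}" and stack:
            start = stack.pop()
            spans.append(text[start:i + 2])
            i += 2
        else:
            i += 1
    return spans


print(find_templates("{{text|value1{{text|value2}}}}"))
# ['{{text|value2}}', '{{text|value1{{text|value2}}}}']
```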
It provides easy-to-use properties so you can get or set names or values of templates, arguments, wikilinks, etc.:
>>> wt.wikilinks
[WikiLink("[[A|B]]")]
>>> wt.wikilinks[0].target = 'Z'
>>> wt.wikilinks[0].text = 'X'
>>> wt.wikilinks[0]
WikiLink('[[Z|X]]')
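The getter/setter style above can be sketched with ordinary Python properties that read from, and write back to, the link's markup. `ToyWikiLink` is a hypothetical class written for illustration only; it assumes a plain `[[target|text]]` link and is not the library's `WikiLink`.

```python
from typing import Optional, Tuple


class ToyWikiLink:
    """Hypothetical illustration (not wikitextparser's WikiLink): a
    [[target|text]] link whose properties rewrite its markup string."""

    def __init__(self, markup: str) -> None:
        self.markup = markup

    def _parts(self) -> Tuple[str, Optional[str]]:
        inner = self.markup[2:-2]  # strip the surrounding "[[" and "]]"
        target, sep, text = inner.partition("|")
        return target, (text if sep else None)

    def _rebuild(self, target: str, text: Optional[str]) -> None:
        # Regenerate the markup so the string always reflects the properties.
        self.markup = f"[[{target}|{text}]]" if text is not None else f"[[{target}]]"

    @property
    def target(self) -> str:
        return self._parts()[0]

    @target.setter
    def target(self, value: str) -> None:
        self._rebuild(value, self._parts()[1])

    @property
    def text(self) -> Optional[str]:
        return self._parts()[1]

    @text.setter
    def text(self, value: Optional[str]) -> None:
        self._rebuild(self._parts()[0], value)


link = ToyWikiLink("[[A|B]]")
link.target = "Z"
link.text = "X"
print(link.markup)  # [[Z|X]]
```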
>>>
>>> from pprint import pprint
>>> pprint(wt.sections)
[Section('\n'),
Section('== h2 ==\nt2\n\n=== h3 ===\nt3\n\n'),
Section('=== h3 ===\nt3\n\n'),
Section('== h22 ==\nt22\n\n{{text|value3}}\n\n[[Z|X]]')]
>>>
>>> wt.sections[1].title = 'newtitle'
>>> print(wt)

==newtitle==
t2

=== h3 ===
t3

== h22 ==
t22

{{text|value3}}

[[Z|X]]
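In the `wt.sections` output, the `== h2 ==` section also contains its `=== h3 ===` subsection. A simplified sketch of that rule: a section runs from its heading to the next heading of the same or a higher rank (fewer or equal `=` signs). `section_spans` and its heading regex are assumptions for illustration; MediaWiki's real heading rules have more edge cases.

```python
import re

# Simplified heading line: 1-6 "=" signs, a title, then the same number of "=".
HEADING = re.compile(r"^(={1,6})(.+?)\1[ \t]*$", re.MULTILINE)


def section_spans(text):
    """Hypothetical helper: return (level, text) pairs.

    Level 0 is the lead section before the first heading.
    """
    headings = [(m.start(), len(m.group(1))) for m in HEADING.finditer(text)]
    lead_end = headings[0][0] if headings else len(text)
    sections = [(0, text[:lead_end])]
    for i, (start, level) in enumerate(headings):
        end = len(text)
        # Stop at the next heading of equal or higher rank; deeper
        # subsections are kept inside the current section.
        for next_start, next_level in headings[i + 1:]:
            if next_level <= level:
                end = next_start
                break
        sections.append((level, text[start:end]))
    return sections


wikitext = "\n== h2 ==\nt2\n\n=== h3 ===\nt3\n\n== h22 ==\nt22\n\n{{text|value3}}\n\n[[Z|X]]"
for level, body in section_spans(wikitext):
    print(level, repr(body))
```

On this input the sketch yields the same four section strings as the `pprint(wt.sections)` output shown earlier.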
Template objects have a pprint method that returns a pretty-printed version of the template:
>>> p = wtp.parse('{{t1 |b=b|c=c| d={{t2|e=e|f=f}} }}')
>>> t2, t1 = p.templates
>>> print(t2.pprint())
{{t2
    |e=e
    |f=f
}}
>>> print(t1.pprint())
{{t1
    |b=b
    |c=c
    |d={{t2
        |e=e
        |f=f
    }}
}}
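The pretty-printing idea is a simple recursion: one argument per line, with nested templates rendered one indentation level deeper. This sketch assumes the template has already been parsed into a hypothetical `(name, [(param, value), ...])` tuple, where a value may itself be such a tuple; it handles named arguments only and is not wikitextparser's own code.

```python
def pformat_template(name, args, indent="    ", level=0):
    """Hypothetical helper: render a pre-parsed template one argument per line.

    `args` is a list of (param, value) pairs; a value that is a tuple is
    treated as a nested (name, args) template and indented one level deeper.
    """
    pad = indent * (level + 1)
    lines = ["{{" + name]
    for key, value in args:
        if isinstance(value, tuple):  # nested template
            value = pformat_template(*value, indent=indent, level=level + 1)
        lines.append(f"{pad}|{key}={value}")
    lines.append(indent * level + "}}")
    return "\n".join(lines)


t2 = ("t2", [("e", "e"), ("f", "f")])
t1 = ("t1", [("b", "b"), ("c", "c"), ("d", t2)])
print(pformat_template(*t1))
```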
If you are dealing with [[Category:Pages using duplicate arguments in template calls]], there are two methods that may help:
>>> t = wtp.Template('{{t|a=a|a=b|a=a}}')
>>> t.rm_dup_args_safe()
>>> t
Template('{{t|a=b|a=a}}')
>>> t = wtp.Template('{{t|a=a|a=b|a=a}}')
>>> t.rm_first_of_dup_args()
>>> t
Template('{{t|a=a}}')
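The behaviour visible in these two examples can be modelled on a plain list of `(name, value)` pairs. This is a hypothetical re-implementation of only what the examples show: the "safe" variant drops an argument only when a later argument repeats both its name and its value, while the other keeps just the last occurrence of each name. The library's methods may handle further cases (whitespace, positional arguments) that this sketch ignores.

```python
def rm_first_of_dups(args):
    """Hypothetical helper: keep only the last occurrence of each name."""
    last_index = {name: i for i, (name, _) in enumerate(args)}
    return [arg for i, arg in enumerate(args) if last_index[arg[0]] == i]


def rm_dups_safe(args):
    """Hypothetical helper: drop an argument only if a later one has the
    same name AND value, leaving conflicting values for a human to resolve."""
    return [arg for i, arg in enumerate(args) if arg not in args[i + 1:]]


args = [("a", "a"), ("a", "b"), ("a", "a")]  # {{t|a=a|a=b|a=a}}
print(rm_dups_safe(args))      # [('a', 'b'), ('a', 'a')]
print(rm_first_of_dups(args))  # [('a', 'a')]
```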
Extracting cell values of a table is easy:
>>> p = wtp.parse("""{|
| Orange || Apple || more
|-
| Bread || Pie || more
|-
| Butter || Ice cream || and more
|}""")
>>> pprint(p.tables[0].getdata)
[['Orange', 'Apple', 'more'],
['Bread', 'Pie', 'more'],
['Butter', 'Ice cream', 'and more']]
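For a simple table like this one, the cell extraction above amounts to splitting rows on `|-` and cells on `||`. `table_cells` is a hypothetical sketch under that assumption: it handles only `|` data cells without attributes, ignores `!` header cells, and is not how wikitextparser parses tables.

```python
def table_cells(wikitext):
    """Hypothetical helper: split a simple {| ... |} table into rows of cell texts."""
    body = wikitext.strip()
    if not (body.startswith("{|") and body.endswith("|}")):
        raise ValueError("expected a table delimited by {| and |}")
    # Drop the opening line (it may carry table attributes) and the closing "|}".
    lines = body[2:-2].split("\n")[1:]
    rows, current = [], []
    for line in lines:
        line = line.strip()
        if line.startswith("|-"):  # row separator
            if current:
                rows.append(current)
            current = []
        elif line.startswith("|"):  # one or more data cells separated by "||"
            current.extend(cell.strip() for cell in line[1:].split("||"))
    if current:
        rows.append(current)
    return rows


table = """{|
| Orange || Apple || more
|-
| Bread || Pie || more
|-
| Butter || Ice cream || and more
|}"""
print(table_cells(table))
```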
By default, values are also rearranged according to colspan and rowspan attributes (span=True):
>>> t = wtp.Table("""{| class="wikitable sortable"
... |-
... ! a !! b !! c
... |-
... !colspan = "2" | d || e
... |-
... |}""")
>>> t.getdata(span=True)
[['a', 'b', 'c'], ['d', 'd', 'e']]
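Span handling can be thought of as filling a grid: each cell occupies `colspan` columns and `rowspan` rows, and later cells skip slots already claimed by a rowspan from above. `expand_spans` is a hypothetical sketch that assumes cells were already parsed into `(text, colspan, rowspan)` tuples; it reproduces the repeated `'d'` shown above but is not the library's algorithm.

```python
def expand_spans(rows):
    """Hypothetical helper: rows is a list of rows of (text, colspan, rowspan)
    cells; return a rectangular grid where spanned slots repeat the text."""
    grid = {}  # (row, col) -> text
    for r, row in enumerate(rows):
        c = 0
        for text, colspan, rowspan in row:
            while (r, c) in grid:  # skip slots filled by a rowspan from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    # Slots never covered by any cell are filled with empty strings.
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


rows = [
    [("a", 1, 1), ("b", 1, 1), ("c", 1, 1)],
    [("d", 2, 1), ("e", 1, 1)],  # colspan="2" on "d"
]
print(expand_spans(rows))  # [['a', 'b', 'c'], ['d', 'd', 'e']]
```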
Have a look at the test modules for more details and possible pitfalls.
Download files
File details
Details for the file wikitextparser-0.6.3.zip.
File metadata
- Download URL: wikitextparser-0.6.3.zip
- Upload date:
- Size: 40.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 777082cf6604f5634766d85fcc63abe7d5974f933a7d22171c43412839ee1f26 |
| MD5 | 60c5a766ee719409133b595d47cc2886 |
| BLAKE2b-256 | f49efbcd172b1311be417c1cc3ae11013b96d2bf663f19a8dd808104f05982e8 |
File details
Details for the file wikitextparser-0.6.3.win32.exe.
File metadata
- Download URL: wikitextparser-0.6.3.win32.exe
- Upload date:
- Size: 177.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4e6b911f1520e86574cb5dd0d958764f970f80fe9dc3613eab08433c8993df56 |
| MD5 | 71b52a27d0a6ca22c1dbebf8dfd5cbd1 |
| BLAKE2b-256 | 61ad54c003b407f844e4f9c94cd4ffe792a16966ad0f55fc40e4f9b095693ca0 |