A simple, pure-Python WikiText parsing tool.
Project description
Its purpose is to let users easily extract and/or manipulate templates, template parameters, parser functions, tables, external links, wikilinks, etc. in wikitext.
Installation
Use pip install wikitextparser
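To confirm that the package is importable after installation, a quick check along these lines may help. This is only a sketch using the standard library; the helper name `is_installed` is hypothetical and is not part of wikitextparser.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return find_spec(module_name) is not None


# Expected to print True once `pip install wikitextparser` has succeeded
# in the same environment that runs this script.
print(is_installed("wikitextparser"))
```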
Usage
Here is a short demo of some of its features:
>>> import wikitextparser as wtp
>>> # wikitextparser can detect sections, parser functions, templates,
>>> # wikilinks, external links, arguments, and HTML comments in
>>> # your wikitext:
>>> wt = wtp.parse("""
== h2 ==
t2
=== h3 ===
t3
== h22 ==
t22
{{text|value1{{text|value2}}}}
[[A|B]]""")
>>>
>>> wt.templates
[Template('{{text|value2}}'), Template('{{text|value1{{text|value2}}}}')]
>>> wt.templates[1].arguments
[Argument('|value1{{text|value2}}')]
>>> wt.templates[1].arguments[0].value = 'value3'
>>> print(wt)
== h2 ==
t2
=== h3 ===
t3
== h22 ==
t22
{{text|value3}}
[[A|B]]
>>> # It provides easy-to-use properties so you can get or set
>>> # the name or value of templates, arguments, wikilinks, etc.
>>> wt.wikilinks
[WikiLink('[[A|B]]')]
>>> wt.wikilinks[0].target = 'Z'
>>> wt.wikilinks[0].text = 'X'
>>> wt.wikilinks[0]
WikiLink('[[Z|X]]')
>>>
>>> from pprint import pprint
>>> pprint(wt.sections)
[Section('\n'),
 Section('== h2 ==\nt2\n=== h3 ===\nt3\n'),
 Section('=== h3 ===\nt3\n'),
 Section('== h22 ==\nt22\n{{text|value3}}\n[[Z|X]]')]
>>>
>>> wt.sections[1].title = 'newtitle'
>>> print(wt)
==newtitle==
t2
=== h3 ===
t3
== h22 ==
t22
{{text|value3}}
[[Z|X]]
>>> # There is a pprint method that pretty-prints templates.
>>> p = wtp.parse('{{t1 |b=b|c=c| d={{t2|e=e|f=f}} }}')
>>> t2, t1 = p.templates
>>> print(t2.pprint())
{{t2
    |e=e
    |f=f
}}
>>> print(t1.pprint())
{{t1
    |b=b
    |c=c
    |d={{t2
        |e=e
        |f=f
    }}
}}
>>> # If you are dealing with
>>> # [[Category:Pages using duplicate arguments in template calls]],
>>> # there are two methods that may be helpful:
>>> t = wtp.Template('{{t|a=a|a=b|a=a}}')
>>> t.rm_dup_args_safe()
>>> t
Template('{{t|a=b|a=a}}')
>>> t = wtp.Template('{{t|a=a|a=b|a=a}}')
>>> t.rm_first_of_dup_args()
>>> t
Template('{{t|a=a}}')
>>> # Extract cell values of a table
>>> p = wtp.parse("""{|
| Orange || Apple || more
|-
| Bread || Pie || more
|-
| Butter || Ice cream || and more
|}""")
>>> pprint(p.tables[0].getdata())
[['Orange', 'Apple', 'more'],
 ['Bread', 'Pie', 'more'],
 ['Butter', 'Ice cream', 'and more']]
>>> # It can even rearrange cells according to rowspan and colspan values.
>>> t = wtp.Table("""{| class="wikitable sortable"
|-
! a !! b !! c
|-
!colspan = "2" | d || e
|-
|}""")
>>> t.getdata(span=True)
[['a', 'b', 'c'], ['d', 'd', 'e']]
>>> # Have a look at the test modules for more details and possible pitfalls.
>>>
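The two duplicate-argument methods in the demo differ in how aggressive they are: `rm_dup_args_safe()` turned `{{t|a=a|a=b|a=a}}` into `{{t|a=b|a=a}}`, while `rm_first_of_dup_args()` left only `{{t|a=a}}`. The sketch below models just that observed behaviour on plain `(name, value)` tuples. It is not the library's implementation, both function names are hypothetical, and the real methods may handle cases this sketch ignores.

```python
from typing import List, Tuple

# (name, value): a hypothetical stand-in for one template argument.
Arg = Tuple[str, str]


def drop_safe_duplicates(args: List[Arg]) -> List[Arg]:
    """Drop an argument only when a later one repeats the same name AND value."""
    kept = []
    for i, arg in enumerate(args):
        if arg not in args[i + 1:]:
            kept.append(arg)
    return kept


def keep_last_of_each_name(args: List[Arg]) -> List[Arg]:
    """Drop every argument whose name appears again later in the list."""
    kept = []
    for i, (name, value) in enumerate(args):
        later_names = {n for n, _ in args[i + 1:]}
        if name not in later_names:
            kept.append((name, value))
    return kept


args = [("a", "a"), ("a", "b"), ("a", "a")]
print(drop_safe_duplicates(args))    # [('a', 'b'), ('a', 'a')]
print(keep_last_of_each_name(args))  # [('a', 'a')]
```

The "safe" variant never discards a value that is not repeated later, whereas the second variant keeps only the last argument for each name, so it can drop differing values such as `a=b`.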
File details
Details for the file wikitextparser-0.5.8.zip.
File metadata
- Download URL: wikitextparser-0.5.8.zip
- Upload date:
- Size: 38.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dff8e60ff016458214bfd97cf0a6dad5b00593f31ca5d06090a929e88dc7f6af |
| MD5 | e68128e6020548603f154ff3dbdff19e |
| BLAKE2b-256 | d3484b34d6259572f8d2dc313e06c19886810cd147026a9c7975a579d7b1ca91 |
File details
Details for the file wikitextparser-0.5.8.win32.exe.
File metadata
- Download URL: wikitextparser-0.5.8.win32.exe
- Upload date:
- Size: 175.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7ace3559a290adbe52300fba270ca97456c704005a6c23c1cf2271982cee7c37 |
| MD5 | 681c74590aac94ae5f1edc82271a402c |
| BLAKE2b-256 | eec8747fe951d941f71e3b2083a84c3f2b4e90c9d8ecac3381cb73ead0e68ae2 |
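The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only the standard library; the helper names `sha256_hex` and `matches_published_digest` are hypothetical, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path


def sha256_hex(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex string."""
    return sha256_hex(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest(Path("wikitextparser-0.5.8.zip"), "dff8e60ff016458214bfd97cf0a6dad5b00593f31ca5d06090a929e88dc7f6af")` should return `True` for an intact copy of the source distribution saved in the current directory.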