Simple mapping view to docx (Word Doc) elements
Project description
msword
Simple mapping view to docx (Word Doc) elements
To install: pip install msword
Examples
LocalDocxTextStore
Local files store returning, as values, text extracted from the documents. Use this when you just want the text contents of the document. If you want more, you'll need to user LocalDocxStore with the appropriate content extractor (i.e. the obj_of_data function in a py2store.wrap_kvs wrapper).
Note: Filters for valid msword extensions (.doc and .docx).
To NOT filter for valid extensions, use AllLocalFilesDocxTextStore instead.
>>> from msword import LocalDocxTextStore, test_data_dir
>>> import docx
>>> s = LocalDocxTextStore(test_data_dir)
>>> assert {'more_involved.docx', 'simple.docx'}.issubset(s)
>>> v = s['simple.docx']
>>> assert isinstance(v, str)
>>> print(v)
Just a bit of text to show that is works. Another sentence.
This is after a newline.
<BLANKLINE>
This is after two newlines.
LocalDocxStore
Local files store returning, as values, docx objects.
Note: Filters for valid msword extensions (.doc and .docx).
To Note filter for valid extensions, use AllLocalFilesDocxStore instead.
>>> from msword import LocalDocxStore, test_data_dir
>>> import docx
>>> s = LocalDocxStore(test_data_dir)
>>> assert {'more_involved.docx', 'simple.docx'}.issubset(s)
>>> v = s['more_involved.docx']
>>> assert isinstance(v, docx.document.Document)
What does a docx.document.Document have to offer?
If you really want to get into it, see here: https://python-docx.readthedocs.io/en/latest/
Meanwhile, we'll give a few examples here as an amuse-bouche.
>>> ddir = lambda x: set([xx for xx in dir(x) if not xx.startswith('_')]) # to see what an object has
>>> assert ddir(v).issuperset({
... 'add_heading', 'add_page_break', 'add_paragraph', 'add_picture', 'add_section', 'add_table',
... 'core_properties', 'element', 'inline_shapes', 'paragraphs', 'part',
... 'save', 'sections', 'settings', 'styles', 'tables'
... })
paragraphs is where the main content is, so let's have a look at what it has.
>>> len(v.paragraphs)
21
>>> paragraph = v.paragraphs[0]
>>> assert ddir(paragraph).issuperset({
... 'add_run', 'alignment', 'clear', 'insert_paragraph_before',
... 'paragraph_format', 'part', 'runs', 'style', 'text'
... })
>>> paragraph.text
'Section 1'
>>> assert ddir(paragraph.style).issuperset({
... 'base_style', 'builtin', 'delete', 'element', 'font', 'hidden', 'locked', 'name', 'next_paragraph_style',
... 'paragraph_format', 'part', 'priority', 'quick_style', 'style_id', 'type', 'unhide_when_used'
... })
>>> paragraph.style.style_id
'Heading1'
>>> paragraph.style.font.color.rgb
RGBColor(0x2f, 0x54, 0x96)
You get the point...
If you're only interested in one particular aspect of the documents, you should your favorite py2store wrappers to get the store you really want. For example:
>>> from py2store import wrap_kvs
>>> ss = wrap_kvs(s, obj_of_data=lambda doc: [paragraph.style.style_id for paragraph in doc.paragraphs])
>>> assert ss['more_involved.docx'] == [
... 'Heading1', 'Normal', 'Normal', 'Heading2', 'Normal', 'Normal',
... 'Heading1', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal',
... 'ListParagraph', 'ListParagraph', 'Normal', 'Normal', 'ListParagraph', 'ListParagraph', 'Normal'
... ]
The most common use case is probably getting text, not styles, out of a document. It's so common, that we've done the wrapping for you: Just use the already wrapped LocalDocxTextStore store for that purpose.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file msword-0.0.3.tar.gz.
File metadata
- Download URL: msword-0.0.3.tar.gz
- Upload date:
- Size: 4.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d5e3649f53909627c04ddf2488983e5f067fb988d5b61afac817e805bff40653
|
|
| MD5 |
bac5ab2932947777d34de841eeb864a0
|
|
| BLAKE2b-256 |
cafe12795701d4ec92b1b4aecd44cdec33f877e10442e393cd0227520d0d67e3
|
File details
Details for the file msword-0.0.3-py3-none-any.whl.
File metadata
- Download URL: msword-0.0.3-py3-none-any.whl
- Upload date:
- Size: 9.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1a07d3b56320e6b3e4365f592a5664ba92c818368ca48ec578e15120665908f9
|
|
| MD5 |
f7899eab6777e62faccf5d7db304da99
|
|
| BLAKE2b-256 |
8c1b804123096df87af496d6e8c30553252a229104b7e6f8e565be318c4611ac
|