Simple mapping view to docx (Word Doc) elements

Project description

msword

To install: pip install msword

Examples

LocalDocxTextStore

A local file store whose values are the text extracted from the documents. Use this when you just want the text contents of a document. If you want more, use LocalDocxStore with the appropriate content extractor (i.e. the obj_of_data function in a py2store.wrap_kvs wrapper).

Note: Filters for valid msword extensions (.doc and .docx). To NOT filter for valid extensions, use AllLocalFilesDocxTextStore instead.

>>> from msword import LocalDocxTextStore, test_data_dir
>>> import docx
>>> s = LocalDocxTextStore(test_data_dir)
>>> assert {'more_involved.docx', 'simple.docx'}.issubset(s)
>>> v = s['simple.docx']
>>> assert isinstance(v, str)
>>> print(v)
Just a bit of text to show that is works. Another sentence.
This is after a newline.
<BLANKLINE>
This is after two newlines.
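
To picture what the extension filter in the note above does, here is a minimal standard-library sketch of a read-only mapping over one folder. It is not msword's implementation: the class name ExtensionFilteredFiles and its extensions parameter are hypothetical, and the values here are raw bytes rather than extracted text.

```python
import os
import tempfile
from collections.abc import Mapping


class ExtensionFilteredFiles(Mapping):
    """Hypothetical read-only mapping: file name -> raw bytes, for one folder."""

    def __init__(self, rootdir, extensions=('.doc', '.docx')):
        self.rootdir = rootdir
        # extensions=None means "do not filter at all"
        self.extensions = (
            None if extensions is None else tuple(e.lower() for e in extensions)
        )

    def _is_valid(self, name):
        # Only regular files count as keys
        if not os.path.isfile(os.path.join(self.rootdir, name)):
            return False
        return self.extensions is None or name.lower().endswith(self.extensions)

    def __iter__(self):
        return (n for n in sorted(os.listdir(self.rootdir)) if self._is_valid(n))

    def __len__(self):
        return sum(1 for _ in self)

    def __getitem__(self, name):
        if not self._is_valid(name):
            raise KeyError(name)
        with open(os.path.join(self.rootdir, name), 'rb') as f:
            return f.read()


# Small demo in a temporary folder holding one .docx name and one .txt name
with tempfile.TemporaryDirectory() as d:
    for name in ('simple.docx', 'notes.txt'):
        with open(os.path.join(d, name), 'wb') as f:
            f.write(b'placeholder bytes')
    filtered_keys = list(ExtensionFilteredFiles(d))
    unfiltered_keys = list(ExtensionFilteredFiles(d, extensions=None))
```

With the default extensions only simple.docx is listed as a key; passing extensions=None lists both files, which is roughly the difference between the filtered stores and their AllLocalFiles counterparts.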

LocalDocxStore

A local file store whose values are docx objects. Note: Filters for valid msword extensions (.doc and .docx). To NOT filter for valid extensions, use AllLocalFilesDocxStore instead.

>>> from msword import LocalDocxStore, test_data_dir
>>> import docx
>>> s = LocalDocxStore(test_data_dir)
>>> assert {'more_involved.docx', 'simple.docx'}.issubset(s)
>>> v = s['more_involved.docx']
>>> assert isinstance(v, docx.document.Document)

What does a docx.document.Document have to offer? If you really want to get into it, see here: https://python-docx.readthedocs.io/en/latest/

Meanwhile, we'll give a few examples here as an amuse-bouche.

>>> ddir = lambda x: set([xx for xx in dir(x) if not xx.startswith('_')])  # to see what an object has
>>> assert ddir(v).issuperset({
...     'add_heading', 'add_page_break', 'add_paragraph', 'add_picture', 'add_section', 'add_table',
...     'core_properties', 'element', 'inline_shapes', 'paragraphs', 'part',
...     'save', 'sections', 'settings', 'styles', 'tables'
... })

paragraphs is where the main content is, so let's have a look at what it has.

>>> len(v.paragraphs)
21
>>> paragraph = v.paragraphs[0]
>>> assert ddir(paragraph).issuperset({
...     'add_run', 'alignment', 'clear', 'insert_paragraph_before',
...     'paragraph_format', 'part', 'runs', 'style', 'text'
... })
>>> paragraph.text
'Section 1'
>>> assert ddir(paragraph.style).issuperset({
...     'base_style', 'builtin', 'delete', 'element', 'font', 'hidden', 'locked', 'name', 'next_paragraph_style',
...     'paragraph_format', 'part', 'priority', 'quick_style', 'style_id', 'type', 'unhide_when_used'
... })
>>> paragraph.style.style_id
'Heading1'
>>> paragraph.style.font.color.rgb
RGBColor(0x2f, 0x54, 0x96)

You get the point...
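
For a look under the hood, the sketch below reads paragraph text and style ids straight from a .docx file's XML using only the standard library. It is an illustration under assumptions, not how msword or python-docx do it: the function name paragraphs_from_docx_bytes is hypothetical, the in-memory file contains only the single part the sketch reads (so Word itself would not open it), and tabs, line breaks, and default-style resolution are ignored.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
NS = {'w': W}


def paragraphs_from_docx_bytes(data):
    """Yield (style_id, text) for each top-level body paragraph of a .docx."""
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        root = ET.fromstring(z.read('word/document.xml'))
    for p in root.iterfind('w:body/w:p', NS):
        style = p.find('w:pPr/w:pStyle', NS)
        # No explicit w:pStyle: this sketch reports None instead of
        # resolving the document's default paragraph style
        style_id = style.get(f'{{{W}}}val') if style is not None else None
        # A paragraph's text is spread over the w:t elements of its runs
        text = ''.join(t.text or '' for t in p.iterfind('.//w:t', NS))
        yield style_id, text


# Build a tiny stand-in "docx" in memory (only the part read above)
document_xml = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>'
    '<w:r><w:t>Section 1</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>Some </w:t></w:r><w:r><w:t>body text.</w:t></w:r></w:p>'
    '</w:body></w:document>'
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as z:
    z.writestr('word/document.xml', document_xml)

pairs = list(paragraphs_from_docx_bytes(buffer.getvalue()))
```

Here pairs holds ('Heading1', 'Section 1') followed by (None, 'Some body text.'), which mirrors the paragraph.style.style_id and paragraph.text attributes shown above, minus everything python-docx handles for you.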

If you're only interested in one particular aspect of the documents, you should use your favorite py2store wrappers to get the store you really want. For example:

>>> from py2store import wrap_kvs
>>> ss = wrap_kvs(s, obj_of_data=lambda doc: [paragraph.style.style_id for paragraph in doc.paragraphs])
>>> assert ss['more_involved.docx'] == [
...     'Heading1', 'Normal', 'Normal', 'Heading2', 'Normal', 'Normal',
...     'Heading1', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal',
...     'ListParagraph', 'ListParagraph', 'Normal', 'Normal', 'ListParagraph', 'ListParagraph', 'Normal'
... ]
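
Conceptually, the obj_of_data argument used above is a function applied to each value as it is read. The following pure-Python sketch shows that idea only; the class ValueTransformedStore and the stand-in data are hypothetical, and py2store's actual wrap_kvs is more general than this.

```python
from collections.abc import Mapping


class ValueTransformedStore(Mapping):
    """Hypothetical wrapper: same keys as `store`, values mapped on read."""

    def __init__(self, store, obj_of_data):
        self.store = store
        self.obj_of_data = obj_of_data

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)

    def __getitem__(self, key):
        # Fetch the underlying value, then transform it before returning
        return self.obj_of_data(self.store[key])


# Stand-in data: each "document" is a list of (style_id, text) paragraphs
docs = {
    'report.docx': [
        ('Heading1', 'Section 1'),
        ('Normal', 'Intro.'),
        ('Heading2', 'Details'),
    ],
}


def heading_texts(paragraphs):
    """Keep only the text of paragraphs whose style id starts with 'Heading'."""
    return [text for style_id, text in paragraphs if style_id.startswith('Heading')]


headings = ValueTransformedStore(docs, obj_of_data=heading_texts)
outline = headings['report.docx']
```

The keys are untouched and only the values change, so outline is ['Section 1', 'Details'] while list(headings) is still ['report.docx'].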

The most common use case is probably getting text, not styles, out of a document. It's so common that we've done the wrapping for you: just use the already-wrapped LocalDocxTextStore for that purpose.
