Extending PyYAML with a custom constructor for including YAML files within YAML files
Project description
pyyaml-include
An extension constructor for PyYAML: it includes other YAML files in a YAML document.
Install
pip install --pre "pyyaml-include>=2.0"
Since v2.0 uses fsspec to open included files, install the matching fsspec extras if you want to open remote files:
- for files on a website:
    pip install --pre "pyyaml-include>=2.0" fsspec[http]
- for files on S3:
    pip install --pre "pyyaml-include>=2.0" fsspec[s3]
- see fsspec's documentation for more
🔖 Tip
“pyyaml-include” itself depends on fsspec, so fsspec will be installed whether you include local or remote files.
Basic usages
Suppose we have the following YAML files:
├── 0.yml
└── include.d
    ├── 1.yml
    └── 2.yml
- 1.yml's content:
    name: "1"
- 2.yml's content:
    name: "2"
To include 1.yml and 2.yml in 0.yml, we shall:
- Register a YamlIncludeCtor to PyYAML's loader class, with !inc as its tag:
    import yaml
    from yamlinclude import YamlIncludeCtor
    # add the tag
    yaml.add_constructor(
        tag="!inc",
        constructor=YamlIncludeCtor(base_dir='/your/conf/dir'),
        Loader=yaml.Loader
    )
- Write !inc tags in 0.yml:
    file1: !inc include.d/1.yml
    file2: !inc include.d/2.yml
- Load it:
    with open('0.yml') as f:
        data = yaml.load(f, Loader=yaml.Loader)
    print(data)
  we'll get:
    {'file1':{'name':'1'},'file2':{'name':'2'}}
- (optional) the constructor can be unregistered:
    del yaml.Loader.yaml_constructors["!inc"]
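To make the mechanism behind these steps concrete, here is a minimal sketch of a hand-written include constructor that uses only PyYAML's add_constructor hook. It is not pyyaml-include's implementation: SketchLoader, make_include_ctor and demo are hypothetical names, and the sketch handles only plain local paths (no fsspec, no wildcards, no extra parameters).

```python
import os
import tempfile

import yaml


class SketchLoader(yaml.SafeLoader):
    """Private loader subclass, so the global PyYAML loaders stay untouched."""


def make_include_ctor(base_dir):
    """Return a constructor that loads the file named by the tagged scalar."""

    def include(loader, node):
        # The scalar after the tag is treated as a path relative to base_dir.
        path = os.path.join(base_dir, loader.construct_scalar(node))
        with open(path, encoding="utf-8") as f:
            # Parse the included file with the same loader class,
            # so nested include tags keep working.
            return yaml.load(f, Loader=type(loader))

    return include


def demo():
    """Build the example tree in a temp dir and load a document with two includes."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "include.d"))
        for n in ("1", "2"):
            target = os.path.join(root, "include.d", f"{n}.yml")
            with open(target, "w", encoding="utf-8") as f:
                f.write(f'name: "{n}"\n')
        SketchLoader.add_constructor("!inc", make_include_ctor(root))
        doc = "file1: !inc include.d/1.yml\nfile2: !inc include.d/2.yml\n"
        return yaml.load(doc, Loader=SketchLoader)


if __name__ == "__main__":
    print(demo())
```

Registering the tag on a private loader subclass, rather than on yaml.Loader, keeps the experiment from affecting other code that loads YAML in the same process.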
Include in Mapping
If 0.yml was:
file1: !inc include.d/1.yml
file2: !inc include.d/2.yml
We'll get:
file1:
  name: "1"
file2:
  name: "2"
Include in Sequence
If 0.yml was:
files:
- !inc include.d/1.yml
- !inc include.d/2.yml
We'll get:
files:
- name: "1"
- name: "2"
Advanced usages
Wildcards
File names can contain shell-style wildcards. Data loaded from the file(s) matched by the wildcards will be put in a sequence.
That is, a list is returned when the included file name contains wildcards. The length of the returned list equals the number of matched files.
If 0.yml was:
files: !inc include.d/*.yml
We'll get:
files:
- name: "1"
- name: "2"
- when only 1 file is matched, the length of the list will be 1
- when no files are matched, an empty list will be returned
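The list-returning behaviour can be sketched with the standard library alone. This is only an illustration under assumptions: Python's glob module stands in for the fsspec backend, raw file text is returned instead of parsed YAML to keep the sketch dependency-free, the sorted ordering is a choice made here, and include_text and demo are hypothetical names.

```python
import glob
import os
import tempfile


def include_text(base_dir, pattern):
    """Return one file's text for a plain path, or a list of texts for a wildcard path."""
    full = os.path.join(base_dir, pattern)
    if not any(ch in pattern for ch in "*?["):
        # No wildcard: a single value is returned, not a list.
        with open(full, encoding="utf-8") as f:
            return f.read()
    results = []
    # sorted() gives a stable order; recursive=True enables the ** pattern.
    for path in sorted(glob.glob(full, recursive=True)):
        with open(path, encoding="utf-8") as f:
            results.append(f.read())
    return results


def demo():
    """Create include.d/1.yml and include.d/2.yml, then try three patterns."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "include.d"))
        for n in ("1", "2"):
            target = os.path.join(root, "include.d", f"{n}.yml")
            with open(target, "w", encoding="utf-8") as f:
                f.write(f'name: "{n}"\n')
        many = include_text(root, "include.d/*.yml")   # two matches
        one = include_text(root, "include.d/1*.yml")   # one match, still a list
        none = include_text(root, "include.d/*.json")  # no match, empty list
        return many, one, none
```

Because every matched file is read inside the loop before anything is returned, the sketch also shows why many or large matches translate directly into memory use.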
We support **, ? and [..]. We do not support ^ for pattern negation.
The maxdepth option is applied on the first ** found in the path.
❗ Important
- Using the ** pattern in large directory trees or remote file systems (S3, HTTP ...) may consume an inordinate amount of time.
- There is no lazy loading or iteration: all data from the matched files is fully loaded into memory and returned to the YAML doc-tree, so a large amount of memory may be needed if there are many or big files.
Work with fsspec
In v2.0, we use fsspec to open included files, which makes it possible to include files from many different sources, such as the local file system, S3, HTTP, SFTP ...
For example, we can include a file from a website in YAML:
conf:
  logging: !inc http://domain/etc/app/conf.d/logging.yml
In such situations, when creating a YamlIncludeCtor constructor, an fsspec filesystem object shall be passed as the fs argument.
For example, to include files from a website, we shall:
- create a YamlIncludeCtor with an fsspec HTTP filesystem object as its fs:
    import yaml
    import fsspec
    from yamlinclude import YamlIncludeCtor
    http_fs = fsspec.filesystem("http", client_kwargs={"base_url": f"http://{HOST}:{PORT}"})
    ctor = YamlIncludeCtor(http_fs, base_dir="/foo/baz")
    yaml.add_constructor("!inc", ctor, yaml.Loader)
- then, write a YAML document to include files from http://${HOST}:${PORT}:
    key1: !inc doc1.yml    # relative path to "base_dir"
    key2: !inc ./doc2.yml  # relative path to "base_dir" also
    key3: !inc /doc3.yml   # absolute path, "base_dir" does not affect it
    key4: !inc ../doc4.yml # relative path one level above "base_dir"
- load it with PyYAML:
    yaml.load(yaml_string, yaml.Loader)
The above YAML snippet will be loaded like:
- key1: parsed YAML of http://${HOST}:${PORT}/foo/baz/doc1.yml
- key2: parsed YAML of http://${HOST}:${PORT}/foo/baz/doc2.yml
- key3: parsed YAML of http://${HOST}:${PORT}/doc3.yml
- key4: parsed YAML of http://${HOST}:${PORT}/foo/doc4.yml
🔖 Tip
Check fsspec's documentation for more
ℹ️ Note
If the fs argument is omitted or None, a "file"/"local" fsspec filesystem object will be used automatically. That is to say:
    data: !inc foo/baz.yaml
is equivalent to (if no base_dir was set in YamlIncludeCtor()):
    data: !inc file://foo/baz.yaml
and
    yaml.add_constructor("!inc", YamlIncludeCtor())
is equivalent to:
    yaml.add_constructor("!inc", YamlIncludeCtor(fs=fsspec.filesystem("file")))
Parameters in YAML
As a callable object, YamlIncludeCtor passes YAML tag parameters to fsspec for more detailed operations.
The first argument is urlpath. It is fixed and required, and can be either positional or named.
Normally, we put it as a string after the tag (e.g. !inc), just like the examples above.
However, there are more parameters.
- in a sequence way, parameters will be passed to Python as positional arguments, like *args in a Python function. e.g.:
    files: !inc [include.d/**/*.yaml, {maxdepth: 1}, {encoding: utf16}]
- in a mapping way, parameters will be passed to Python as named arguments, like **kwargs in a Python function. e.g.:
    files: !inc {urlpath: /foo/baz.yaml, encoding: utf16}
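The correspondence between the YAML shape and the Python call can be sketched as follows. This is a plain-Python illustration, not the library's code: call_with_tag_params and fake_open are hypothetical names, and fake_open only records its arguments instead of opening anything.

```python
def call_with_tag_params(func, params):
    """Call func with parameters shaped like the value parsed after an include tag."""
    if isinstance(params, str):
        # Scalar form: `!inc foo.yml` -> func("foo.yml")
        return func(params)
    if isinstance(params, (list, tuple)):
        # Sequence form: `!inc [foo.yml, r]` -> func("foo.yml", "r")
        return func(*params)
    if isinstance(params, dict):
        # Mapping form: `!inc {urlpath: foo.yml, encoding: utf16}`
        # -> func(urlpath="foo.yml", encoding="utf16")
        return func(**params)
    raise TypeError(f"unsupported tag parameters: {params!r}")


def fake_open(urlpath, mode="r", encoding=None):
    """Stand-in for an open call; it only records what it received."""
    return {"urlpath": urlpath, "mode": mode, "encoding": encoding}


if __name__ == "__main__":
    print(call_with_tag_params(fake_open, "foo.yml"))
    print(call_with_tag_params(fake_open, ["foo.yml", "rb"]))
    print(call_with_tag_params(fake_open, {"urlpath": "foo.yml", "encoding": "utf16"}))
```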
However, the format of these parameters varies from case to case, and differs between fsspec implementation backends.
- If a scheme/protocol ("http://", "sftp://", "file://", etc.) is defined in urlpath, YamlIncludeCtor will invoke fsspec.open directly to open it. This means YamlIncludeCtor's fs will be ignored, and a new standalone fs will be created implicitly.
  In this situation, urlpath will be passed as the first argument of fsspec.open, and all other parameters will also be passed to the function.
  For example,
  - the YAML snippet
      files: !inc [file:///foo/baz.yaml, r]
    will cause Python code like
      with fsspec.open("file:///foo/baz.yaml", "r") as f:
          yaml.load(f, Loader)
  - and the YAML snippet
      files: !inc {urlpath: file:///foo/baz.yaml, encoding: utf16}
    will cause Python code like
      with fsspec.open("file:///foo/baz.yaml", encoding="utf16") as f:
          yaml.load(f, Loader)
  🔖 Tip
  A urlpath with a scheme/protocol SHOULD NOT include wildcard characters; a urlpath like "file:///etc/foo/*.yml" is illegal.
- If urlpath has wildcards in it, YamlIncludeCtor will:
  - invoke the corresponding fsspec implementation backend's glob method to search for files,
  - then call the open method to open the found file(s).
  urlpath will be passed as the first argument to both the glob and open methods of the corresponding fsspec implementation backend, and the other parameters will be passed to glob and open as their following arguments.
  In the case of wildcards, pay special attention to the fact that there are two separate parameters after urlpath: the first is for the glob method, and the second is for the open method. Each of them can be a scalar, a sequence or a mapping, corresponding to a single, positional or named argument(s) in Python. For example:
  - If we want to include every .yml file in the directory etc/app recursively with a max depth of 2, and open them with the utf-16 codec, we shall write the YAML as below:
      files: !inc ["etc/app/**/*.yml", {maxdepth: !!int "2"}, {encoding: utf16}]
    it will cause Python code like:
      for file in local_fs.glob("etc/app/**/*.yml", maxdepth=2):
          with local_fs.open(file, encoding="utf16") as f:
              yaml.load(f, Loader)
  - Since maxdepth is the second argument after path in the glob method, we can also write the YAML like this:
      files: !inc ["etc/app/**/*.yml", [!!int "2"]]
    The parameters for open are omitted, which means no arguments other than urlpath are passed. It will cause Python code like:
      for file in local_fs.glob("etc/app/**/*.yml", 2):
          with local_fs.open(file) as f:
              yaml.load(f, Loader)
  - The two parameters can be in a mapping form, where the key names are "glob" and "open". For example:
      files: !inc {urlpath: "etc/app/**/*.yml", glob: [!!int "2"], open: {encoding: utf16}}
  ❗ Important
  PyYAML sometimes takes a scalar parameter of a custom constructor as a string; we can use a ‘Standard YAML tag’ to ensure a non-string data type in that situation.
  For example, the following YAML snippet may cause an error:
      files: !inc ["etc/app/**/*.yml", open: {intParam: 1}]
  Because PyYAML treats {"intParam": 1} as {"intParam": "1"}, which makes Python code like fs.open(path, intParam="1"). To prevent this, we shall write the YAML like:
      files: !inc ["etc/app/**/*.yml", open: {intParam: !!int 1}]
  where !!int is a ‘Standard YAML tag’ that forces the integer type of the intParam argument.
  ℹ️ Note
  BaseLoader, SafeLoader, CBaseLoader, CSafeLoader do NOT support ‘Standard YAML tag’.
  🔖 Tip
  The maxdepth argument of the fsspec glob method is already force-converted by YamlIncludeCtor, so there is no need to write a !!int tag on it.
- Else, YamlIncludeCtor will invoke the corresponding fsspec implementation backend's open method to open the file; parameters besides urlpath will be passed to the method.
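The three cases above, and the way the two parameters after urlpath are split between glob and open, can be sketched in plain Python. This is an illustration under assumptions rather than the library's code: classify, to_call_args and split_wildcard_params are hypothetical names, the scheme check is a simple regular expression chosen here, and only the sequence form of the parameters is handled.

```python
import re

# Assumed scheme test for this sketch: a URL-like prefix such as "http://" or "file://".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def classify(urlpath):
    """Pick one of the three strategies described above for a given urlpath."""
    if _SCHEME.match(urlpath):
        return "fsspec.open"      # scheme present: the constructor's fs is bypassed
    if any(ch in urlpath for ch in "*?["):
        return "fs.glob+fs.open"  # wildcards: search first, then open each match
    return "fs.open"              # plain path: open through the configured fs


def to_call_args(param):
    """Normalise one tag parameter into (args, kwargs) for a Python call."""
    if param is None:
        return (), {}             # parameter omitted
    if isinstance(param, dict):
        return (), dict(param)    # mapping -> named arguments
    if isinstance(param, (list, tuple)):
        return tuple(param), {}   # sequence -> positional arguments
    return (param,), {}           # scalar -> single positional argument


def split_wildcard_params(params):
    """Split sequence-form parameters into urlpath, glob arguments and open arguments."""
    urlpath, *rest = params
    glob_param = rest[0] if len(rest) > 0 else None
    open_param = rest[1] if len(rest) > 1 else None
    return urlpath, to_call_args(glob_param), to_call_args(open_param)
```

For instance, split_wildcard_params(["etc/app/**/*.yml", [2]]) yields positional glob arguments (2,) and empty open arguments, which mirrors the second wildcard example above.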
Absolute and Relative URL/Path
When the path after the include tag (e.g. !inc) is not a full protocol/scheme URL and does not start with "/", YamlIncludeCtor tries to join the path with base_dir, which is an argument of YamlIncludeCtor.__init__().
If base_dir is omitted or None, the actual included file path is the path defined in YAML without any change, and different fsspec filesystems will treat it differently. In a local filesystem, it will be relative to cwd.
For a remote filesystem, HTTP for example, base_dir cannot be None and is usually set to "/".
Relative paths do not support the full protocol/scheme URL format; base_dir has no effect on such URLs.
For example, if we register such a YamlIncludeCtor to PyYAML:
    import yaml
    import fsspec
    from yamlinclude import YamlIncludeCtor
    yaml.add_constructor(
        "!http-include",
        YamlIncludeCtor(
            fsspec.filesystem("http", client_kwargs={"base_url": f"http://{HOST}:{PORT}"}),
            base_dir="/sub_1/sub_1_1"
        )
    )
then load the following YAML:
    xyz: !http-include xyz.yml
the actual URL to access is http://$HOST:$PORT/sub_1/sub_1_1/xyz.yml
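The joining rule described in this section can be sketched with the standard library. This is only an illustration: resolve is a hypothetical helper, the scheme test is a regular expression chosen here, and normalising the joined path with posixpath.normpath is an assumption, although it reproduces the results documented for doc1.yml to doc4.yml and for xyz.yml.

```python
import posixpath
import re

# Assumed scheme test for this sketch: a URL-like prefix such as "http://".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def resolve(urlpath, base_dir=None):
    """Return the path that would be opened for a tag value and a base_dir."""
    if _SCHEME.match(urlpath) or urlpath.startswith("/") or base_dir is None:
        # Full URLs and absolute paths are untouched; so is everything
        # when no base_dir was given.
        return urlpath
    # Relative path: join with base_dir, then collapse "." and ".." segments.
    return posixpath.normpath(posixpath.join(base_dir, urlpath))


if __name__ == "__main__":
    for value in ("doc1.yml", "./doc2.yml", "/doc3.yml", "../doc4.yml"):
        print(value, "->", resolve(value, "/foo/baz"))
    print(resolve("xyz.yml", "/sub_1/sub_1_1"))
```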
Hashes for pyyaml_include-2.0a1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 4a84c531270b79221c2348e1d0f49a0dd625db64f86ae5bcf21202ea03a2a314
MD5 | 04b43a28ff02f3e32fec5841913d1766
BLAKE2b-256 | 83172b18933c3357671ae879f8b6f50287f20417e85b1ef6fbaa8c970d7a1855