Scrape data from SEC's EDGAR

EDGAR

A small library for accessing files from the SEC's EDGAR system.

Installation

pip install edgar

Example

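To get a company's five most recent 10-K filings, run

from edgar import Company
company = Company("Oracle Corp", "0001341439")
# Fetch the filings listing, restricted to 10-K filings, as an lxml.html element
tree = company.get_all_filings(filing_type="10-K")
# Retrieve the bodies of the first five documents in that listing
docs = Company.get_documents(tree, no_of_documents=5)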

or, to fetch a single 10-K and parse it with TXTML,

from edgar import Company, TXTML

company = Company("INTERNATIONAL BUSINESS MACHINES CORP", "0000051143")
doc = company.get_10K()
text = TXTML.parse_full_10K(doc)

To get all companies and find a specific one, run

from edgar import Edgar
edgar = Edgar()
possible_companies = edgar.find_company_name("Cisco System")

To get XBRL data, run

from edgar import Company, XBRL, XBRLElement

company = Company("Oracle Corp", "0001341439")
results = company.get_data_files_from_10K("EX-101.INS", isxml=True)
xbrl = XBRL(results[0])
XBRLElement(xbrl.relevant_children_parsed[15]).to_dict()  # returns a dictionary of name, value, and schemaRef

API

Company

The Company class has three fields:

  • name (company name)
  • cik (company CIK number)
  • timeout (optional) (default: 10)

Methods

get_filings_url

get_filings_url(self, filing_type="", prior_to="", ownership="include", no_of_entries=100) -> str

Returns a URL for fetching filings data.

  • filing_type: The type of document you want, e.g. 10-K, S-8, 8-K. If not specified, all document types are returned.
  • prior_to: The date before which documents are to be retrieved. If not specified, all documents are returned.
  • ownership: Options are include, exclude, only. Defaults to include.
  • no_of_entries: The number of entries to return. Defaults to 100, which is also the maximum.
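To make the four parameters concrete, here is a minimal sketch of how such a URL could be assembled. It is not the library's code: build_filings_url is a hypothetical stand-in, and the endpoint and query parameter names (action, CIK, type, dateb, owner, count) are assumptions based on EDGAR's public company-browse page.

```python
from urllib.parse import urlencode

# Assumed endpoint: EDGAR's public company-browse page.
# The library may build its URL differently.
BASE_URL = "https://www.sec.gov/cgi-bin/browse-edgar"


def build_filings_url(cik, filing_type="", prior_to="", ownership="include",
                      no_of_entries=100):
    """Hypothetical stand-in for Company.get_filings_url."""
    if ownership not in ("include", "exclude", "only"):
        raise ValueError("ownership must be include, exclude, or only")
    params = {
        "action": "getcompany",
        "CIK": cik,
        "type": filing_type,               # empty string means all types
        "dateb": prior_to,                 # empty string means no date cutoff
        "owner": ownership,
        "count": min(no_of_entries, 100),  # cap at the documented maximum
    }
    return BASE_URL + "?" + urlencode(params)


url = build_filings_url("0001341439", filing_type="10-K", no_of_entries=5)
print(url)
```

Capping the count inside the helper mirrors the documented maximum of 100, so a caller who asks for more still gets a valid request.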

get_all_filings

get_all_filings(self, filing_type="", prior_to="", ownership="include", no_of_entries=100) -> lxml.html.HtmlElement

Returns the HTML as an lxml.html element.

  • filing_type: The type of document you want, e.g. 10-K, S-8, 8-K. If not specified, all document types are returned.
  • prior_to: The date before which documents are to be retrieved. If not specified, all documents are returned.
  • ownership: Options are include, exclude, only. Defaults to include.
  • no_of_entries: The number of entries to return. Defaults to 100, which is also the maximum.

get_10Ks

get_10Ks(self, no_of_documents=1) -> List[lxml.html.HtmlElement]

Returns the HTML, as lxml.html elements, of the concatenation of all the documents in each 10-K.

  • no_of_documents (default: 1): number of documents to be retrieved

get_document_type_from_10K

get_document_type_from_10K(self, document_type, no_of_documents=1) -> List[lxml.html.HtmlElement]

Returns the HTML, as lxml.html elements, of the requested document within the 10-K.

  • document_type: The type of document you want, e.g. 10-K, EX-3.2
  • no_of_documents (default: 1): number of documents to be retrieved

get_data_files_from_10K

get_data_files_from_10K(self, document_type, no_of_documents=1, isxml=False) -> List[lxml.html.HtmlElement]

Returns the HTML, as lxml.html elements, of the requested data file within the 10-K.

  • document_type: The type of document you want, e.g. EX-101.INS
  • no_of_documents (default: 1): number of documents to be retrieved
  • isxml (default: False): by default, the file is parsed with lxml's html parser, which is not case sensitive. If this is True, it is parsed with etree, which is case sensitive
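The isxml distinction matters because XBRL element names are mixed case. The sketch below illustrates the difference using only the standard library as a stand-in: html.parser plays the role of the HTML parser and xml.etree.ElementTree plays the role of etree. The SNIPPET document and the TagCollector class are made up for illustration, and lxml itself is not used here.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Made-up document with mixed-case element names.
SNIPPET = "<Filing><NetIncome>42</NetIncome></Filing>"


class TagCollector(HTMLParser):
    """Records every start tag exactly as the HTML parser reports it."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# HTML parsing: tag names are lowercased, so the original case is lost.
collector = TagCollector()
collector.feed(SNIPPET)
html_tags = collector.tags

# XML parsing: tag names keep their case, and lookups must match it.
root = ET.fromstring(SNIPPET)
xml_tags = [element.tag for element in root.iter()]

print(html_tags)               # ['filing', 'netincome']
print(xml_tags)                # ['Filing', 'NetIncome']
print(root.find("netincome"))  # None: the lowercase name does not match
```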

Class Method

  • get_documents(cls, tree, no_of_documents=1, debug=False) -> List: Returns a list of strings, each containing the body of one of the documents listed in the input
    • tree: the lxml.html element returned by Company.get_all_filings
    • no_of_documents: number of documents to return. If it is 1, the result is a single string instead of a list of strings. Defaults to 1.
    • debug (default: False): if True, displays the URL and form
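Because the return type changes with no_of_documents (one string for 1, a list otherwise), calling code may want to normalize it before looping. A minimal sketch, where as_document_list is a hypothetical helper and not part of the library:

```python
from typing import List, Union


def as_document_list(result: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: wrap a lone string so callers always get a list."""
    if isinstance(result, str):
        return [result]
    return list(result)


# Stand-ins for what get_documents is documented to return.
single = as_document_list("<html>one filing</html>")
several = as_document_list(["<html>first</html>", "<html>second</html>"])
print(len(single), len(several))  # 1 2
```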

Edgar

Gets all companies from EDGAR

get_cik_by_company_name(company_name: str) -> str: Returns the CIK, given the exact name of the company

get_company_name_by_cik(cik: str) -> str: Returns the company name, given the CIK (including the leading zeros)
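Since the lookup expects the CIK with its leading zeros, a CIK held as an integer or a short string needs padding first. A minimal sketch, assuming the 10-digit form used in the examples above; normalize_cik is a hypothetical helper, not part of the library:

```python
def normalize_cik(cik) -> str:
    """Hypothetical helper: left-pad a CIK with zeros to 10 digits."""
    digits = str(cik).strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {cik!r}")
    return digits.zfill(10)


print(normalize_cik(51143))      # 0000051143
print(normalize_cik("1341439"))  # 0001341439
```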

find_company_name(words: str) -> List[str]: Returns a list of company names by exact word matching

match_company_by_company_name(self, name, top=5) -> List[Dict[str, Any]]: Returns a list of dictionaries, each with a company name, its CIK, and its fuzzy match score

  • top (default: 5): the number of top fuzzy matches to return. If set to None, the whole list is returned (which is a lot)
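To show what fuzzy matching with a top cutoff looks like, here is a rough sketch using difflib from the standard library. It is an illustration only: the library's actual scoring method is not described here, and the match_company function, the dictionary keys, and the sample company table are assumptions (the two real CIKs are taken from the examples above; the third row is a placeholder).

```python
from difflib import SequenceMatcher

# Sample data for illustration; the real list comes from EDGAR.
COMPANIES = {
    "Oracle Corp": "0001341439",
    "INTERNATIONAL BUSINESS MACHINES CORP": "0000051143",
    "EXAMPLE HOLDINGS INC": "0000000000",  # placeholder, not a real company
}


def match_company(name, companies, top=5):
    """Score every company name against `name` and return the best matches."""
    scored = []
    for candidate, cik in companies.items():
        ratio = SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
        scored.append({
            "company_name": candidate,
            "cik": cik,
            "score": round(100 * ratio),  # 0 to 100, higher is closer
        })
    scored.sort(key=lambda row: row["score"], reverse=True)
    # top=None returns every candidate, mirroring the documented behavior.
    return scored if top is None else scored[:top]


matches = match_company("oracle", COMPANIES, top=2)
for row in matches:
    print(row["company_name"], row["cik"], row["score"])
```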

XBRL

Parses data from XBRL

  • relevant_children
    • get children that are not context
  • relevant_children_parsed
    • get children that are not context, unit, schemaRef
    • cleans tags
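The two filters can be pictured with a small standard-library sketch. This is not the library's implementation: the sample instance document is made up, and "cleans tags" is assumed here to mean stripping the namespace from each tag name.

```python
import xml.etree.ElementTree as ET

# Made-up XBRL instance, for illustration only.
SAMPLE = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:link="http://www.xbrl.org/2003/linkbase"
      xmlns:us-gaap="http://fasb.org/us-gaap/2019-01-31">
  <link:schemaRef/>
  <context id="FY2019"/>
  <unit id="USD"/>
  <us-gaap:Assets contextRef="FY2019" unitRef="USD">1000</us-gaap:Assets>
</xbrl>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(SAMPLE)

# Children that are not context elements.
relevant_children = [
    child for child in root if local_name(child.tag) != "context"
]

# Children that are not context, unit, or schemaRef, with cleaned tag names.
relevant_children_parsed = [
    {"name": local_name(child.tag), "value": (child.text or "").strip()}
    for child in root
    if local_name(child.tag) not in {"context", "unit", "schemaRef"}
]

print([local_name(child.tag) for child in relevant_children])
print(relevant_children_parsed)
```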


Download files

Source Distribution

edgar-5.3.8.tar.gz (8.1 kB)

Built Distribution

edgar-5.3.8-py3-none-any.whl (20.7 kB, Python 3)
