axa-group / Parsr
- Thursday, 16 January 2020, 00:18:57
TypeScript
Transforms PDF, Documents and Images into Enriched Structured Data
Parsr is a minimal-footprint document (image, PDF) cleaning, parsing and extraction toolchain that generates readily available, organized and usable data for data scientists and developers.
It provides users with a clean, structured and label-enriched information set for ready-to-use applications, ranging from data entry and document analysis automation to archival and many others.
Currently, Parsr can perform:
Parsr takes as input an image (.JPG, .PNG, .TIFF, ...) or a PDF and generates the following output formats:
-- The advanced installation guide is available here --
The quickest way to install and run the Parsr API is through the docker image:
    docker pull axarev/parsr
If you also wish to install the GUI for sending documents and visualising results:
    docker pull axarev/parsr-ui-localhost
Note: Parsr can also be installed bare-metal (not via Docker containers); the procedure is documented in the installation guide.
-- The advanced usage guide is available here --
To run the API, issue:
    docker run -p 3001:3001 axarev/parsr
which will launch it on http://localhost:3001.
Consult the documentation on the usage of the API.
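As a rough illustration of talking to the running API, here is a minimal TypeScript sketch. It rests on assumptions that should be checked against the API documentation: that documents are uploaded with a multipart `POST` to `/api/v1/document` using the form fields `file` and `config`, that the response body is a queue ID, and that results are served under `/api/v1/<format>/<id>`. The helper names (`endpoint`, `resultUrl`, `submitDocument`) are hypothetical and not part of Parsr. It also assumes Node 18 or later, where `fetch`, `FormData` and `Blob` are global.

```typescript
// Base URL matching the port published by `docker run -p 3001:3001 axarev/parsr`.
const DEFAULT_BASE_URL = "http://localhost:3001";

// Hypothetical helper: joins a base URL and a path without doubling slashes.
function endpoint(path: string, baseUrl: string = DEFAULT_BASE_URL): string {
  return `${baseUrl.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
}

// Hypothetical helper: builds the URL of a finished result.
// ASSUMPTION: results live under /api/v1/<format>/<queueId>.
function resultUrl(
  queueId: string,
  format: "json" | "markdown" | "text",
  baseUrl: string = DEFAULT_BASE_URL
): string {
  return endpoint(`/api/v1/${format}/${encodeURIComponent(queueId)}`, baseUrl);
}

// Hypothetical helper: uploads a document together with a configuration file.
// ASSUMPTION: the endpoint path, the form field names, and the plain-text
// queue ID in the response body are not confirmed by this README.
async function submitDocument(
  file: Blob,
  fileName: string,
  config: Blob,
  baseUrl: string = DEFAULT_BASE_URL
): Promise<string> {
  const form = new FormData();
  form.append("file", file, fileName);
  form.append("config", config, "config.json");

  const response = await fetch(endpoint("/api/v1/document", baseUrl), {
    method: "POST",
    body: form,
  });
  if (!response.ok) {
    throw new Error(`Upload failed with HTTP ${response.status}`);
  }
  // The response body is treated as the queue ID (assumption).
  return (await response.text()).trim();
}
```

The URL builders are kept separate from the network call so they can be checked without a running container. If the real endpoints differ, only the path strings need to change.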
To launch the GUI, issue:
    docker run -t -p 8080:80 axarev/parsr-ui-localhost:latest
which will serve it on http://localhost:8080.
Refer to the Configuration documentation to interpret the configurable options in the GUI viewer.
The API-based usage and the command-line usage are documented in the advanced usage guide.
All documentation files can be found here.
Please refer to the contribution guidelines.
Third-party library licenses for Parsr's dependencies:
Copyright 2019 AXA Group Operations S.A.
Licensed under the Apache 2.0 license (see the LICENSE file).