Count words in PDF and DOCX files and convert to a price - jsPDF

The website has a feature that, when someone uploads a PDF document, counts how many words it has and gives a price. I want this feature to also support DOCX files and to allow multiple documents. How do I set this up?
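One possible approach, assuming the text of each upload has already been extracted by whatever PDF and DOCX text-extraction step the site uses (extraction itself is not shown), is to run every document through the same word counter and sum the results. The names `UploadedDoc`, `countWords`, and `quote`, and the rate of 0.10 per word, are hypothetical and chosen only for illustration.

```typescript
// Hypothetical shape: text is assumed to be already extracted from the PDF or DOCX.
interface UploadedDoc {
  name: string;
  text: string;
}

interface Quote {
  perFile: { name: string; words: number }[];
  totalWords: number;
  price: number;
}

// Count whitespace-separated tokens; empty or blank text counts as zero words.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Sum word counts across all uploads and apply a flat per-word rate (an assumption).
function quote(docs: UploadedDoc[], ratePerWord: number): Quote {
  const perFile = docs.map((d) => ({ name: d.name, words: countWords(d.text) }));
  const totalWords = perFile.reduce((sum, f) => sum + f.words, 0);
  // Round to two decimals to avoid floating-point noise in the displayed price.
  const price = Math.round(totalWords * ratePerWord * 100) / 100;
  return { perFile, totalWords, price };
}

// Made-up sample input: two uploads, one PDF and one DOCX.
const example = quote(
  [
    { name: "a.pdf", text: "The quick brown fox" },
    { name: "b.docx", text: "  jumps over\nthe lazy dog  " },
  ],
  0.1
);
console.log(example.totalWords, example.price);
```

Because the counter only sees plain text, adding another file type later would only require another extraction step, not a change to the pricing logic.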

Related

Can I combine documents while both keep their own format?

For my MSc, I'm writing up a manuscript using the publisher's templates. This document also has to be included in the thesis report. I'd like to insert the manuscript in the report without the format of the manuscript changing. How can I do this?
I've seen some examples of the standalone package, but they all talk about including document B in document A while both keep the formatting of A. What I'm looking for is essentially a workaround for using two document classes.

What is the best way to Parse a scanned PDF file using PHP or JS?

I have a translation website and I would like to parse PDF files so that I can count words and set the price for translation.
I have tried Poppler JS before, but it can't handle scanned files. How should I handle them?
For example, this PDF is a scanned article. It is a PDF file, but each page is a picture and I need to extract the text:
What you are looking for is an OCR library. There are several options for this; here are some Software Recommendations Stack Exchange links:
Scan Text Document To PDF With OCR
JavaScript library for OCR
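A scanned page usually yields little or no text from an ordinary text extractor, so one possible way to decide which pages to send to OCR is to check how many words extraction returned for each page. This is only a sketch: the function `pagesNeedingOcr`, the `minWords` threshold of 5, and the assumption that per-page text is already available as strings are all hypothetical, and the code does not call Poppler or any OCR library.

```typescript
// pageTexts: text already extracted per page by an ordinary PDF text extractor (assumed input).
// Returns the 1-based numbers of pages with fewer than minWords words, i.e. likely image-only pages.
function pagesNeedingOcr(pageTexts: string[], minWords = 5): number[] {
  const needOcr: number[] = [];
  pageTexts.forEach((text, index) => {
    const trimmed = text.trim();
    const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
    if (words < minWords) needOcr.push(index + 1);
  });
  return needOcr;
}

// Made-up sample: one page with a text layer, two pages without.
const scannedPages = pagesNeedingOcr([
  "This page has a normal embedded text layer with many words.",
  "",
  "   ",
]);
console.log(scannedPages);
```

Pages flagged this way would then go through whichever OCR library is chosen, and the recognized text can be word-counted like any other page.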

Open standard for magazine?

Is there a standard for weekly digital magazines that understands the metadata of a magazine (e.g. issue number, date, author, editorial columns, series, etc.), so that content is searchable and can be presented in a more intuitive GUI?
First of all, you need to clarify your question. The first part seems to refer to a file format for digital magazines, the second part to a file format for the metadata associated with a digital publication.
With respect to the metadata: there are several standards used in the industry. For example:
ONIX: http://www.editeur.org/8/ONIX/
Dublin Core: http://dublincore.org/
MARC: http://www.loc.gov/marc/
With respect to the file format for the publication itself, some formats that might be suitable for digital magazines:
PDF
EPUB (3) supports Fixed Layout, and the IDPF has a working group focussing on comics and digital magazine issues (Advanced/Hybrid Layout, or EPUB 3 AHL WG)
Amazon Kindle has its own KF8 format
HTML5 allows you to deliver pre-paginated contents by setting the viewport dimensions

How do I get genre metadata from video files of arbitrary formats?

I want to categorise the video files that a user loads based on the genre stored in the file metadata. I know this is possible for MP3 files, where the format of this data and its location at the end of the file are well documented.
I'm looking for information on how video file metadata is formatted and where it is stored in the file (e.g. how many bytes at the end of the file are dedicated to metadata). While I appreciate that different file formats use different methods to store this information, I'm trying to figure out if there is a known format for certain video file formats, or a basic model that can be applied to most file formats.
You would have to go through each video format and read its header individually, for example: http://www.fastgraph.com/help/avi_header_format.html
An easier way is to use an existing library such as MediaInfo: http://mediaarea.net/en/MediaInfo
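To illustrate what reading one container format by hand involves, here is a minimal sketch for AVI only. It relies on the fact that an AVI file is a RIFF container: bytes 0-3 are the ASCII tag `RIFF`, bytes 4-7 hold a little-endian 32-bit size, and bytes 8-11 hold the form type `AVI ` (with a trailing space). The function names `fourCC` and `readRiffHeader` and the sample bytes are hypothetical, and the sketch stops at the header; it does not locate genre information.

```typescript
interface RiffHeader {
  formType: string; // "AVI " for AVI files
  size: number; // size field from the header, little-endian 32-bit
}

// Decode four bytes starting at offset as an ASCII tag.
function fourCC(bytes: Uint8Array, offset: number): string {
  let tag = "";
  for (let i = 0; i < 4; i++) {
    tag += String.fromCharCode(bytes[offset + i]);
  }
  return tag;
}

// Returns the RIFF header fields, or null if the data is not a RIFF container.
function readRiffHeader(bytes: Uint8Array): RiffHeader | null {
  if (bytes.length < 12 || fourCC(bytes, 0) !== "RIFF") return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { size: view.getUint32(4, true), formType: fourCC(bytes, 8) };
}

// Made-up 12-byte sample: "RIFF", size 36, "AVI ".
const sample = new Uint8Array([0x52, 0x49, 0x46, 0x46, 36, 0, 0, 0, 0x41, 0x56, 0x49, 0x20]);
const header = readRiffHeader(sample);
console.log(header);
```

Every other container format would need its own parser of this kind, which is the main argument for using an existing library instead.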

What file types does MarkLogic support?

I'm a student and I want to find a search engine for big data. I found MarkLogic Server but I don't know what file types it supports. Does it support doc, docx, pdf, xml, ppt, pptx, etc.? What other types are supported?
At a low level, MarkLogic supports storage of XML, plain text, and binary. XML is fully searchable, including range indexes for faceted search. Text is only full-text searchable. Binary is not searchable as is, but there are facilities to extract metadata and text from many binary formats. You can find more details about the latter in the online documentation:
http://docs.marklogic.com/guide/search-dev/binary-document-metadata#chapter
There is a sample application that shows this functionality:
http://developer.marklogic.com/code/document-discovery
HTH!
