Library for rendering OOXML (.docx) file - preview

I am investigating the effort it would take to render a Microsoft Office XML file (.docx, in this case) to an image programmatically. For illustration, I want to achieve something similar to Apple's QuickLook preview for said file. Requirements:
Must be portable (specifically, it will not run on Windows, nor any other platform with Microsoft Office)
Needs to be headless and reasonably resource constrained (think VPS!).
Preferably a self-contained, well-maintained open-source solution :)
Text extraction would be nice (although another library could be used for that - I already have this)
A good online service could do it as a last resort, if I fail to find a good offline solution
Accuracy is good, but not the primary goal here.
My attempts to locate such a library have not been entirely successful. There are a few Java-based projects that seem to have sprung from OpenOffice, but they all seem either a bit heavyweight or to have the wrong focus (i.e. text extraction, search, document generation).
To reiterate, I am looking to render the document (e.g. to a PNG). Speed and memory use are more important than features such as OLE images, equations, advanced formatting and whatnot.
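For the text-extraction requirement, here is a minimal Python sketch that assumes only the standard library. A .docx is a ZIP package whose main body is stored in word/document.xml, so paragraph text can be read without any Office dependency. The helper names (docx_paragraphs, make_sample_docx) are illustrative, and the sample builder creates a stripped-down package for demonstration rather than a file Word would open. It does not address rendering.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the text of each paragraph in a .docx (path or file-like object)."""
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's visible text is split across <w:t> elements inside runs.
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs


def make_sample_docx():
    """Build an in-memory package holding only word/document.xml (demo only)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for line in docx_paragraphs(make_sample_docx()):
        print(line)
```

Nested content such as text boxes is not handled specially here; the sketch only shows how little machinery plain text needs compared with layout and rendering.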

What is the difference between Sublime text and Github's Atom [closed]

GitHub announced Atom, which is very similar to Sublime. Even some keyboard shortcuts, like ⌘ + P and ⌘ + Shift + P, are the same.
How is Atom different from Sublime?
Does it include IDE features like build tools, function definition jumps, documentations, etc.?
Has anyone using Sublime got a Beta invitation to point out the differences?
Can I use the themes, schemes and packages from Sublime as is, like Sublime could do with TextMate?
In addition to the points from prior answers, it's worth clarifying the differences between these two products from the perspective of choices made in their development.
Sublime is compiled to a native binary for each platform. Its core is written in C/C++ and a number of its features are implemented in Python, which is also the language used for extending it. Atom is written in Node.js/CoffeeScript and runs under WebKit, with CoffeeScript being the extension language. Though similar in UI and UX, Sublime performs significantly better than Atom, especially in "heavy lifting" like working with large files, complex search and replace, or plugins that do heavy processing on files/buffers. Though I expect improvements in Atom as it matures, design and platform choices limit performance.
The "closed" part of Sublime includes the API and UI. Apart from skins/themes and colourisers, the API currently makes it difficult to modify other aspects of the UI. For example, Sublime plugins can't interact with the sidebar, control or draw on the editing area (except in some limited ways, e.g. in the gutter) or manipulate the status bar beyond basic text. Atom's "closed" part is unknown at the moment, but I get the sense it's smaller. Atom has a richer API (though poorly documented at present) with the design goal of allowing greater control of its UI. Being closely coupled with WebKit offers numerous capabilities for UI feature enhancements not presently possible with Sublime. However, Sublime's extensions perform closer to native, so those that perform compute-intensive, highly repetitive or complex text manipulations in large buffers are feasible in Sublime.
More of Atom is open: GitHub open-sourced Atom on May 6th. As a result, it's likely that support and the pace of development will be rapid. By contrast, Sublime's development has slowed significantly of late, but it's not dead. In particular, there are a number of bugs, many quite trivial, that haven't been fixed by the developer. None are showstopping imo, but if you want something in rapid development with regular bugfixing and enhancements, Sublime will frustrate. That said, installable Atom packages for Windows and Linux are yet to be released, and activity on the codebase seems to have cooled in the weeks before and since the announcement, according to GitHub's stats.
In terms of IDE functions, from a webdev perspective Atom will allow extensions to the point of approaching products like WebStorm, though none have appeared yet. It remains to be seen how Atom will perform with such "heavy" extensions, since the editor natively feels sluggish. Due to restrictions in the API and the lack of an underlying WebKit, Sublime won't allow this level of UI customisation, although the developer may extend the API to support such features in future. Again, Sublime's underlying performance allows for things that involve computational grunt; ST3's symbol indexing is an example that performs well even with big projects. And though Atom's UI is certainly modelled upon Sublime, some refinements are noticeably missing, such as Sublime's learning panels and tab-complete popups, which weight the defaults in accordance with those you use most.
I see these products as complementary. The fact that they share similar visuals and keystrokes just adds to the fact. There will be situations where the use of either has advantages. Presently, Sublime is a mature product with feature parity across all three platforms, and a rich set of plugins. Atom is the new kid whose features will rapidly grow; it doesn't feel production ready just yet and there are concerns in the area of performance.
[Update/Edit: May 18, 2015]
A note about improvements to these two editors since the time of writing the above.
In addition to bugfixes and improvements to its core, Atom has experienced rapid growth in third-party extensions, with autocomplete-plus becoming part of the standard Atom distribution. Extension quality varies widely, and a particular irritation is the frequency with which unstable third-party packages can crash the editor. Within the last year, Atom has moved to using React as a way of shifting reflow/repaint activity to the GPU for performance reasons, significantly improving the responsiveness of the UI for typical editing actions (scrolling, cursor movement etc.). While this has markedly improved the feel of the editor, it still feels cumbersome for CPU-intensive tasks as described above, and is still slow to start up. Apart from performance improvements, Atom feels significantly more stable across the board.
Development of Sublime has picked up again since Jan 2015, with bugfixes, some minor new features (tooltip API, build system improvements) and a major development in the form of a new YAML-based .sublime-syntax definition (to eventually replace the old XML .tmLanguage). Together with a custom regex engine which replaces Oniguruma, the new system offers more potential for precise regex matching, is significantly faster (up to 4x) and can perform multiple matches in parallel. Apart from colouring syntax, Sublime uses these components for symbol indexing (goto definition etc.) and other language-aware features. In addition to further speeding up Sublime, particularly for large files, this feature should open up the potential for performant language-specific features such as code refactoring. Further 'big developments' are promised, though the author remains, as ever, tight-lipped about them.
Atom is written using Node.js, CoffeeScript and LESS. It's then wrapped in a WebKit wrapper, which was originally only available for OSX, although there is now also a Windows version available. (Linux version has to be built from source, but there is a PPA for Ubuntu users.)
A lot of the architecture and features have been duplicated from Sublime Text because they're tried and tested. The plugin system works almost the same, but opens up a lot of new features and potential by exposing new APIs too.
I believe that the shortcuts remain mostly the same due to muscle memory – people will remember them and be able to instantly click with Atom.
The preferences can be controlled with a GUI rather than by editing JSON directly, which might lower the entry barrier towards getting people started with Atom. I myself find it difficult to navigate them all since there is no search feature in Preferences.
You can sign up for an invite on the ##atom-invites IRC channel, or sign up on their website and add your email. The first round of invites came quickly.
How is Atom different from Sublime?
Atom is an open source text editor/IDE, built on JavaScript/HTML/CSS.
Sublime Text is a commercial product, built on C/C++ and Python.
Comparable to Atom is Adobe Brackets, another open source text editor/IDE built on JavaScript/HTML/CSS. Bear in mind that this makes Brackets more oriented towards web development, especially on the front end.
The advantages of open source projects are a faster rate of development and, of course, price.
Does it include IDE features like build tools, function definition jumps, documentations, etc.?
The short answer is yes, yes, and yes. The app is completely modular. Open source will give people the freedom to fill the gaps on several of these features.
Has anyone using Sublime got a Beta invitation to point out the differences?
An advantage of Atom is entry-level hackability, since it's built on the same technologies that power websites.
The advantages of Sublime Text are performance, as it doesn't need to run on top of Node.js, and maturity: it is about to reach a stable version 3.
There is a long list of minor differences that could be included in the comments (I wish this markdown could draw a table for comparisons, but that's another issue).
Because of Atom's rapid pace of development, I am afraid some of the differences I list here will become outdated over time. For example, at the time of this writing, Atom is only available on the Macintosh, while Sublime Text is already multiplatform.
Can I use the themes, schemes and packages from Sublime as is, like Sublime could do with TextMate?
The short answer is no, but because of Atom's hackability, it will be easy to retool packages from other editors to Atom.
Atom is open source (has been for a few hours by now), whereas Sublime Text is not.
Here are some differences between the two:
Atom is open source (MIT License)
A single user license for Sublime Text costs $70.
Atom is written in Node.js, CoffeeScript, HTML and LESS.
Sublime Text is written in C++, Python for plugins, and Objective-C for Cocoa integration
Atom has a built-in package manager*
Sublime Text depends on a third-party solution for package management (wbond's Package Control)
At the time of writing this (05/20/2014), there are Atom binaries only for Mac OS X (10.8 or later). If you want to use it under Windows or Linux, you'll have to build it. Update: Nowadays, there are Atom binaries for Mac OS X (10.8 or later), Windows and Linux.
Sublime Text binaries are available for Mac OS X, Windows (installable or portable) and Linux (as a .deb or tarball)
Atom settings can be configured either through a user-friendly interface or directly by editing configuration files.
Sublime Text only allows you to change settings through configuration files.
*Though APM is a separate tool, it's bundled and installed automatically with Atom
Atom was created by GitHub and includes "git awareness", a feature I like quite a lot.
It also highlights the files in the git tree that have changed, with different colours depending on their commit status.
I just got my beta invitation today and tried Atom right away. The GUI feels like Sublime, and yes, there are some shortcuts adopted from Sublime.
Besides everything mentioned above, here are some differences I have noticed so far:
Vim mode is not as good as the Vintage mode on Sublime (which is not a fully featured vim either) because the vim package is in an early stage of development. See https://atom.io/packages/vim-mode for detail.
As James mentioned, Atom is written using web tools, so you have access to the stylesheet of the text editor (styles.less) to make whatever appearance changes you want using CSS. There is also an option to change the startup CoffeeScript.
Again, because Atom is still in the beta stage, Sublime has many more native plugin packages. However, since Atom is written in Node.js, the official Atom site says you can "choose from over 50 thousand in Node's package repository." (Because I am not a Node.js pro, I haven't looked into this feature, though.)
Atom has better GitHub support out of the box, but Sublime has several Git packages.
Sublime is a paid application with an unlimited evaluation period. Atom is free at the beta stage, but we don't know whether GitHub will charge for it or not.
So the bottom line is that Atom is a text editor built with web technology, at the beta stage. By contrast, Sublime has evolved through many different iterations. Atom is still missing a lot of packages that Sublime supports, so the question is: will Atom catch up with Sublime or become something better? GitHub seems to be confident about the future of this text editor because of its popular underlying technologies, and Atom is probably going to be a good alternative to Sublime in the long run.
Another difference is that Sublime text is a closed source project, while Atom source code is/will be publicly available --although Github does not plan to release it as a real open source project. They want to give access to the code, without opening it to contributions.
Github made the code public: http://blog.atom.io/2014/05/06/atom-is-now-open-source.html
Atom is still in beta (v0.123 as I'm writing this) but it's moving fast. Way faster than Sublime. New builds are released on a weekly basis, sometimes even few of them in the same week. In its short life span, it had more releases than Sublime which takes months to release a new feature or a bug fix. Here's an updated take on things looking back on the path Atom has taken since the launch of the beta:
Sublime has better performance than Atom, simply because it's written in C++. Atom, on the other hand, is a web-based desktop app built on top of Chromium, and while its developers take performance to heart, it will be really hard or even impossible to reach the same speed and responsiveness. Last July Atom began using React, which gave it a nice performance boost, but you can still feel the difference. Apart from that, if Atom's performance issues do not push users away, Sublime had better speed up its release cycle, brush up its small UX tweaks, and consider letting in more contributors, because this is where Atom is winning.
Atom's package ecosystem is also growing really fast. It might not be as big as Sublime's at the moment, but I have a feeling that with GitHub at its back it will keep growing even faster. It probably has the majority of IDE-like plugins you can think of. A major difference right now is that it can't handle files bigger than 2MB, so that's something to keep in mind.
The one thing you'll notice first is that the Sublime minimap is gone! Other than that, the first impression is that Atom looks almost the same as Sublime. I wrote a more in depth comparison about it in this blog post.
No easy straightforward way to port your Sublime configurations, packages and such as far as I know.
I tried Atom and it looks really nice BUT there is one major problem (at least in v 0.84):
It doesn't support vertical select Alt+Drag - this is a must for every modern code editor.
One major difference is the support for "Indic Fonts", aka South Asian scripts (including Southeast Asian languages such as Khmer, Lao, Myanmar and Thai). Also, there is much better support for East Asian languages (Chinese, Japanese, Korean). These are known Sublime Text bugs (actually the most highly rated bugs) that have been going on for years (though it appears East Asian language support used to work better but has now become difficult to use):
http://sublimetext.userecho.com/topic/117587-thai-language-issue/
http://sublimetext.userecho.com/topic/99013-can-not-show-or-type-chinese-charactor-on-ubuntu-system/
I'm working in a somewhat extreme environment: I edit files on a remote filesystem (on an external network, of course) that is mounted on my laptop through ssh (aka sshfs). Regardless of why I work like this, and despite its sluggish responsiveness, it's fairly usable when I'm using Sublime Text 2.
I tried Atom after reading this post, but it turned out to be somewhat painful for me; Atom doesn't seem to cache the directory structure very efficiently. Every time I expand a folder in Tree View, the UI freezes for a short time, 2~3 seconds, maybe while fetching file system info. Yes, it's because I'm using a remote filesystem. But Sublime handles this more efficiently; at least it doesn't freeze every time I expand a folder, so it's less painful.
I think Atom is very nice for a free editor, and my issue is a minor one that might be fixed someday, but this may be helpful to someone for now.
--
added on 8/26/2014
Recently, I changed my laptop from a late-2010 MacBook Air to a late-2013 13" MacBook Pro. It probably has a CPU about 4 times faster and many other performance improvements. I want to stress that my opinion applies to the case WHEN YOU MOUNT A REMOTE FILE SYSTEM (using OS X Mavericks, the most recent version of Atom, FUSE 2.7.3 / OSXFUSE 2.6.4 / sshfs 2.5.0, with an Ubuntu server as the remote system). With the new laptop, the UI freeze got much shorter, but it is still there. Specifically, opening a folder with many folders/files in it and indexing it takes a certain amount of time. Also, if you expand a folder full of files, it just falters (when collapsing the folder, it doesn't).
According to #EliDuenisch, this does not seem to happen on Linux Mint. I'm not sure, but it might come from a difference between OSes. Of course, if you work on a local file system, you don't have to care about this issue at all.
One major difference that no one has pointed out so far and that might be important to some people is that (at least on Windows) Atom doesn't fully support keyboard layouts other than US. There is a bug report on that with a few hundred posts that has been open for more than a year now (https://github.com/atom/atom-keymap/issues/35).
Might be relevant when choosing an editor.
ATTENTION:
Because of a poorly made caching system, data loss often occurs in Atom when working with big files.
It has been proven numerous times.

CMS, or pre-baked solutions for community file sharing

I want to create a community around a current iPhone app I've built. It will allow registered users to upload and download small configuration or settings files, which are used in my app to customize functionality. These files are serialized plists (binary files around 500 bytes), but can be converted to a JSON or XML format if necessary.
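As a rough illustration of the plist-to-JSON conversion mentioned above, here is a Python sketch using the standard plistlib and json modules. The settings keys are made up for the example, and the handling of raw bytes and dates (base64 and ISO 8601 strings) is an assumed convention, since JSON has no native type for either.

```python
import base64
import datetime
import json
import plistlib


def _encode_special(value):
    """Map plist-only types onto JSON-friendly strings (an assumed convention)."""
    if isinstance(value, bytes):
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, datetime.datetime):
        return value.isoformat()
    raise TypeError(f"cannot serialize {type(value).__name__}")


def plist_to_json(plist_bytes):
    """Decode a binary or XML plist and re-encode it as JSON text."""
    data = plistlib.loads(plist_bytes)  # the format is detected automatically
    return json.dumps(data, default=_encode_special, sort_keys=True)


# Hypothetical settings file, serialized the way the app would store it.
settings = {"theme": "dark", "volume": 7, "enabled": True, "blob": b"\x00\x01"}
binary_plist = plistlib.dumps(settings, fmt=plistlib.FMT_BINARY)
as_json = plist_to_json(binary_plist)

if __name__ == "__main__":
    print(len(binary_plist), "bytes of binary plist")
    print(as_json)
```

A text form like this is also what makes diff-based storage or server-side integrity checks practical, at the cost of a somewhat larger payload.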
I do not need an HTML front-end; I plan for it to be accessed only via my app. Files do not need to be private or secure. I do not plan to store or ask for any user private data--just a login and password.
I'm looking for tips that might get me close to my goals with the least amount of effort - I want to focus on the core functionality of the app, and have this as a stable feature that I can add to in the future if it is useful. I would of course prefer FOSS, but a commercial solution is not out of the question. Things like file sharing sites with APIs, login ideas, and so on.
So, what software solutions are out there that I may not be aware of? I know that Drupal has modules to allow user logins. Is there something that would work not as a web app, but as a service only? Dropbox has file sharing and an API, but I'm not sure I could use it the way I'm intending.
In short, I could code this, but would prefer a pre-baked solution that would deal with things I may not have thought of. I am sure there must be something out there which I can use.
More Details, and what I plan on the service offering:
Registration of users via the iPhone, and all that entails (will code the UI myself--I just want an API to connect to)
Viewing of these files quickly and efficiently (the files were built with performance in mind, and this is a free app, so I would like to keep server costs down)
Uploading their own files, with a few integrity checks
Rating the files
Gathering statistics on usage (which files were downloaded most often), etc., to provide a way for the files to be ranked by rating, popularity, etc.
Optional - submitting revised versions of the files (a tree).
Optional but preferred - statistics on users (no. files uploaded, perhaps rewards system for sharing)
I'm just not up to date with current technologies and open source solutions. I have experience in SQL, relational database design, and have built backends in Java, so a custom solution is not out of the question. However, it's been a while, I'm not a security expert, and would prefer to not reinvent the wheel for what is a fairly simple project, so an off-the-shelf solution would be preferred.
Check out www.parse.com!
It is absolutely brilliant for stuff like this.
You may want to look at source versioning systems like SVN or distributed systems like Mercurial or Git. Either kind would work much better if the data were serialized to a text format, like JSON or XML as you mentioned.
Registration would need to be done by you of course
Viewing of files (including changes, of course) is quick and efficient. The interface can be done in a number of ways, even simulating command-line.
Uploading files will of course work, and changes made will be stored as diffs. Integrity checks can be done, for example, by Mercurial plugins
Rating the files probably can't be done directly unless you wanted an awkward hack involving parsing change entries or writing a plugin.
Submitting revised versions of files would work as that is the raison d'être of versioning systems.
Some statistics are made available in VCSs.
This is honestly a bit of a strange use for version control systems and not altogether elegant, but sometimes that's what innovation is about.
I suggest TikiWiki.
Pros:
Out-of-the-box all you need to build a community. (See reference below for list of features)
It's FOSS
It has 200 active developers - so it really has a lot of momentum.
Cons:
So many out-of-the-box features that it suffers from feature bloat. Configuration and initial set-up may be complicated.
Not really oriented to mobile platforms.

Can I set up database connections in Qt without writing code (like in Delphi)?

Although it is harder to write in C++ than in Pascal, I'm really attracted by the multi-platform support of Qt. I can connect to an MSSQL server running on a Win2003 server from Linux, or I can connect to a PostgreSQL server running on Linux. That is a plus when comparing with Delphi.
I'm trying to write sample programs to get used to Qt and C++. So far I'm comfortable with the layouts and the signals-and-slots mechanism (still double-clicking the buttons to write event code, though :) ). I wish I could use SQL data in my programs as easily as in Delphi.
Is there any way that I can put some connection object, a DataSource, a DBGrid and a DBNavigator on to a form and go on without writing code? (For some forms it is really a time saver, a project with 300+ forms can be made faster)
I would like to hear from people using Qt with data from SQL servers.
You would have to write your own designer plugins to achieve that and make your widgets invisible, as there is no direct support for non-GUI components in Qt Designer.
However, writing explicit code in Qt (which is really a lot less work than in most other programming environments) helps the program to stay readable. Delphi projects with a lot of forms and components tend to become readable to the author alone, because dependencies jump across files a lot. If you store your forms in binary format, you are lost anyway, because you then cannot search your project textually to find dependencies.
Good design, which makes your code small and easily readable, is necessary in any programming environment and makes aspects like invisible components in forms less important (though you will miss them for a while to come, like I do).
So, unfortunately, you are on your own for the moment.

What exactly does the Open XML SDK v2 take care of that you would have to do manually when coding by hand with an XML library?

This is closely related to another question I asked: Is there functionality that is NOT exposed in the Open XML SDK v2?
I am currently working with Open XML files manually. I recently had a look at the SDK and was surprised to find that it looked pretty low level, quite similar in fact to the helper classes I have created myself. My question is what exactly does the SDK v2 take care of that you would have to do manually when coding by hand with an XML library?
For example, would it automatically patch the _rels files when deleting a PowerPoint slide?
In addition to Otaku's links, this shows an example (near the bottom) of navigating an OpenXML document using the IO.Packaging namespace versus the SDK.
Just like Microsoft states on the download page for the SDK:
The Open XML SDK 2.0 for Microsoft Office is built on top of the System.IO.Packaging API and provides strongly typed part classes to manipulate Open XML documents. The SDK also uses the .NET Framework Language-Integrated Query (LINQ) technology to provide strongly typed object access to the XML content inside the parts of Open XML documents.
The Open XML SDK 2.0 simplifies the task of manipulating Open XML packages and the underlying Open XML schema elements within a package. The Open XML Application Programming Interface (API) encapsulates many common tasks that developers perform on Open XML packages, so you can perform complex operations with just a few lines of code.
I've worked pretty much only with the SDK, but for example, it's nice to be able to grab a table out of a Word document by just using:
Table table = wordprocessingDocument.MainDocumentPart.Document.Body.Elements<Table>().First();
(I mean, assuming it's the first table)
I'd say the SDK does exactly what it seeks to do by providing a sort of intuitive object-based way to work with documents.
As far as automatically patching the relationships -- no, it doesn't do that. And looking back at how you actually state the question, I guess I might even say that (and I'm fairly new to Open XML, so this isn't gospel by any means) the SDK 2.0 doesn't necessarily offer any extra functionality, so much as it offers a more convenient way to achieve the same functionality. For example, you still need to know about those relationships when you delete an element, but it's a lot easier to deal with them.
Also, there have been some efforts on top of the SDK to add even more abstraction -- see, for example, ExtremeML (Excel library only. I've never used it, but I think it does get into things like patching relationships).
So I'm sorry if I've rambled a bit too much here. But I guess my short answer is: there's probably not extra functionality, but there's a nice level of abstraction that makes achieving certain functionality a lot easier to handle -- and if you've been doing it by hand up until now, you'll certainly have the understanding of the OPC to understand what exactly is being abstracted.
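To make the relationship bookkeeping concrete, here is a minimal sketch of what patching by hand involves when a PowerPoint slide is deleted. It uses Python's standard XML library rather than the SDK, and the function name drop_slide and the two sample XML strings are illustrative stand-ins for ppt/presentation.xml and ppt/_rels/presentation.xml.rels. A complete deletion would also have to remove the slide part itself, its own .rels part and its entry in [Content_Types].xml, which are left out here.

```python
import xml.etree.ElementTree as ET

PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
DOC_REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PML_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"

# Illustrative, heavily trimmed stand-ins for the two package parts.
SAMPLE_RELS = (
    f'<Relationships xmlns="{PKG_REL_NS}">'
    f'<Relationship Id="rId2" Type="{DOC_REL_NS}/slide" Target="slides/slide1.xml"/>'
    f'<Relationship Id="rId3" Type="{DOC_REL_NS}/slide" Target="slides/slide2.xml"/>'
    "</Relationships>"
)
SAMPLE_PRES = (
    f'<p:presentation xmlns:p="{PML_NS}" xmlns:r="{DOC_REL_NS}">'
    '<p:sldIdLst><p:sldId id="256" r:id="rId2"/><p:sldId id="257" r:id="rId3"/></p:sldIdLst>'
    "</p:presentation>"
)


def drop_slide(presentation_xml, rels_xml, slide_target):
    """Remove one slide's relationship and its sldId entry; return both XML strings."""
    rels = ET.fromstring(rels_xml)
    pres = ET.fromstring(presentation_xml)

    # 1. Find the relationship that points at the slide part and remove it.
    rel = next(
        (r for r in rels.findall(f"{{{PKG_REL_NS}}}Relationship")
         if r.get("Target") == slide_target),
        None,
    )
    if rel is None:
        raise KeyError(slide_target)
    rel_id = rel.get("Id")
    rels.remove(rel)

    # 2. Remove the slide-list entry that referenced that relationship id.
    slide_list = pres.find(f"{{{PML_NS}}}sldIdLst")
    for entry in list(slide_list):
        if entry.get(f"{{{DOC_REL_NS}}}id") == rel_id:
            slide_list.remove(entry)

    # ElementTree may rename namespace prefixes on output; the XML stays equivalent.
    return ET.tostring(pres, encoding="unicode"), ET.tostring(rels, encoding="unicode")


if __name__ == "__main__":
    new_pres, new_rels = drop_slide(SAMPLE_PRES, SAMPLE_RELS, "slides/slide1.xml")
    print(new_pres)
    print(new_rels)
```

The point of the sketch is the coupling: the relationship id is the only link between the two parts, so both have to be edited together, whichever library does the editing.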
As a starting point, read this from the Brian Jones & Zeyad Rajabi blog.
I don't know of a side-by-side comparison, but the following articles/videos do discuss the two:
Using the Open XML SDK 2.0 Classes Versus Using .NET XML Services is a good place to start comparing the two.
Open XML and the Open XML SDK is a deep dive video which discusses both.
Finally, this is a What's New for 2.0 - it can be assumed that neither 1.0 nor hand-coding has these benefits.

Tools to manage semantic webs

I've seen a lot of frameworks to create a semantic web (or rather the model below it). What tools are there to create a small semantic web or repository on the desktop, for example for personal information management?
Please include information on how easy these are to use for a casual user (in contrast to someone who has worked in this area for years). So I'd like to hear which tools can create a repository without a lot of types and where you can type the nodes later, as you learn about your problem domain.
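To illustrate what "typing the nodes later" can mean, here is a toy Python sketch of a schema-less triple store. It is not taken from any of the tools discussed; the class name TripleStore, the node names and the use of "rdf:type" as a plain string predicate are all assumptions made for the example.

```python
class TripleStore:
    """A tiny schema-less store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        """Record one statement; no types or schema are required up front."""
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the given fields; None is a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple for triple in self._triples
            if all(want is None or want == got for want, got in zip(pattern, triple))
        )


store = TripleStore()
# Start with untyped nodes: just record what is known.
store.add("note:42", "mentions", "alice")
store.add("alice", "worksOn", "thesis")
# Later, once the domain is clearer, attach types to the existing nodes.
store.add("alice", "rdf:type", "Person")
store.add("thesis", "rdf:type", "Document")

if __name__ == "__main__":
    for triple in store.match(predicate="rdf:type"):
        print(triple)
```

Because a type is just another statement, nothing already stored has to be migrated when types are introduced.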
For personal semantic information management on the desktop there is NEPOMUK. There are two versions. One is embedded in KDE 4; it lets you tag, rate and comment on things such as files, folders, pictures, mp3s, etc. on the desktop across all applications.
The other version is written in Java and is OS independent; it is more of a research prototype. It has more features, but is overall less stable.
For KDE-Nepomuk see http://nepomuk.kde.org/
For Java-Nepomuk see http://dev.nepomuk.semanticdesktop.org/ and http://dev.nepomuk.semanticdesktop.org/download/ for downloads (the DFKI version is better)
Extensive list of semantic web tools
Also check out Protege
If you need to create a small model, then I suggest that you use TopBraid. I have used it for creating much larger models, and I know people who have used it to create humongous models. It comes packaged with a set of reasoners and provides the ability to plug in a custom reasoner, and if you decide to make your model larger, you can even integrate TopBraid with a triple store like AllegroGraph.
And since it's based on Eclipse, getting started with it is relatively easy.
For developers who are spoiled by working in more mature programming environments like Java (IDEA, anyone?), TopBraid is the closest tool to an actual IDE.
Chandler is "a notebook you can organize, back up and share!" It seems to be pretty simple to use.
OS: Windows, Mac, Linux
