What is this SWX data format that I keep hearing about?
The Wikipedia article says: "Data is stored as SWF bytecode, that is interpreted by Adobe Flash Player."
Their official site says: "SWX is the native data format for the Flash Platform," and there are many examples of sites that allow users to modify/update data.
Does this mean that:
1. Data is stored following the open Adobe SWF specification, meaning that data (arrays/objects) can be loaded directly into the Flash SWF as SWF movies; or
2. Data is stored in XML/SQL, and when Flash requests the "SWF" file, server-side code generates an "SWF" file on the fly and passes it on to the Flash SWF?
"SWX is the native data format for the Flash Platform" is a very confusing statement. The short answer to your question is (from Wikipedia):
SWX data files can be loaded into Flash movies with:
ActionScript 2, using the internal Flash function loadMovie().
ActionScript 3, using an SWX API function; when data is received, SWX dispatches custom events.
This means SWX is not a "data format", but rather a specification for something written in normal SWF bytecode. Otherwise it would not be loadable using internal Flash functions. (The reason AS3 needs an SWX API function is that AS3 is less forgiving than AS2.) So your first alternative ("following the open Adobe SWF specification") is correct. A good analogy from the official web page is:
just like JSON is a subset of
JavaScript
In your terms, JSON is JavaScript, and correspondingly, SWX is SWF.
Have you seen the website for the SWX format? Hopefully that (the linked page in particular) should answer your questions.
I have been using the Microsoft Graph API to download files from OneDrive successfully.
I was looking for a way to read only the text content of different types of files (pdf, xls, zip, images, etc.) using the Graph API, for indexing purposes in my application, instead of the conventional approach of downloading the complete file, extracting the text with some text-extraction API, and then indexing it, which would be time-consuming. I am aware the Graph API has its own search features, but it lacks the ability to do complicated searches such as regular-expression search (please correct me if I am wrong). I am sure OneDrive does its own indexing of each file, which helps a user do basic searches.
So, is there any way I can get the text content of the documents using the Graph API?
I don't believe getting a 'preview' of text-based documents is currently available through the API. You will need to make a GET request to fetch the content. If you don't want the full document, you can request a partial range of bytes that you believe would be enough for the document. In addition, to make it easier to handle different file types, we currently support converting common file formats to PDF (to possibly standardize your file parsing logic).
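For illustration, here is a minimal C# sketch of such a partial fetch (assuming you already hold an OAuth access token; the token, item ID, and byte range below are placeholders):

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

class PartialDownload
{
    static async Task Main()
    {
        string token = "ACCESS-TOKEN"; // placeholder: obtained via OAuth elsewhere
        string itemId = "0123ABC";     // placeholder: drive item ID

        using var client = new HttpClient();
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", token);

        // /content returns the raw file bytes; the Range header asks for only the first 64 KB.
        var request = new HttpRequestMessage(HttpMethod.Get,
            $"https://graph.microsoft.com/v1.0/me/drive/items/{itemId}/content");
        request.Headers.Range = new RangeHeaderValue(0, 65535);

        using var response = await client.SendAsync(request);
        byte[] firstChunk = await response.Content.ReadAsByteArrayAsync();
        Console.WriteLine($"Fetched {firstChunk.Length} bytes");

        // To standardize parsing across file types, Graph can also convert
        // many common formats: GET .../items/{itemId}/content?format=pdf
    }
}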
Let's say I have a 400MB PDF on a web server, and I want an iPad user to be able to open it and start looking at it as soon as possible, without downloading the file entirely first. What are the options?
Is Safari able to stream a large PDF like this? Can it start showing the first pages while the file is still downloading?
Is there a way to build a native app to achieve this? If so, should the PDF be "split" first on the server? How?
Any tips on how to open a large PDF, in a friendly way and on an iPad, would be appreciated. Bonus points if the index of the PDF is accessible!
To clear things up:
A linearized PDF has its objects reordered so that all the data required to display a page comes before the rest. You can read more details in the PDF specification, Annex F (page 683).
If you can't find an application that supports linearized PDF, the best solution from my perspective would be to render pages on the server side and create a custom protocol that transfers only a single page at a time, along with the table of contents.
Rather than building a full custom PDF viewer around linearized PDF, you just create a small portable "page viewer" that can quickly show the table of contents and then ask the server for an exact page number.
Example server interface:
class Page
{
    string title;   // entry in the table of contents
    int number;     // page number within the document
}
// Returns the rendered content of a single page.
Stream GetDocumentPage(int pageNumber);
// Returns the table of contents, one Page entry per page.
List<Page> GetDocumentPages();
For this, you can use open-source tools (such as Ghostscript) to parse the PDF file and render the required page, then send the binary data over your custom protocol and display the page in your small portable "page viewer".
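As a rough illustration, the server-side rendering step could shell out to Ghostscript like this (a C# sketch; it assumes the gs executable is on the PATH, and the output device and resolution are illustrative choices):

using System.Diagnostics;
using System.IO;

class PageRenderer
{
    // Renders a single PDF page to PNG and returns the image bytes,
    // suitable for streaming back from GetDocumentPage().
    public static byte[] RenderPage(string pdfPath, int pageNumber)
    {
        string pngPath = Path.GetTempFileName();
        var gs = Process.Start(new ProcessStartInfo
        {
            FileName = "gs",
            Arguments = $"-dBATCH -dNOPAUSE -sDEVICE=png16m -r150 " +
                        $"-dFirstPage={pageNumber} -dLastPage={pageNumber} " +
                        $"-sOutputFile=\"{pngPath}\" \"{pdfPath}\"",
            UseShellExecute = false
        });
        gs.WaitForExit();
        return File.ReadAllBytes(pngPath);
    }
}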
This solution does not require downloading the whole document, which could otherwise cause significant network traffic.
Hope this helps.
I'm trying to write an iOS application that will get data from a web server and display it the way I want. I want to use JSON for this purpose. But as I'm absolutely new to web apps, I've got no idea how I'm going to get the URL to a certain feed. Now here are the two big questions:
How do I find the URL to a feed provided by a web service? Is there a standard way, or is it handed out publicly or exclusively to the web service's subscribers?
Is the format they provide data in up to their preference (like XML or JSON)? I mean, do I choose my data-parsing method according to the format the web service gives data in, so that if the feed is in XML format, using the NSJSONSerialization class makes no sense?
The URL to use depends on the web service and is usually well described in its documentation.
The type of data returned, and its structure, are also usually well described in the documentation.
The common bits you'll need to know are how to talk to the web service (NSURLRequest/NSURLConnection, or any of the many open-source asynchronous wrappers that are available with a bit of searching), and how to deal with the returned data, whether it's JSON (NSJSONSerialization, JSONKit) or XML (NSXMLParser, libxml, or any of the many open-source implementations that are available and described with a bit of searching).
I want to know if there is a better way of extracting info from a web page than parsing the HTML for what I'm searching for, e.g. extracting the movie rating from 'imdb.com'.
I'm currently using the Indy HTTP components to get the page, and StrUtils to parse the text, but that approach is limited.
I've found plain, simple regexes to be highly intuitive when dealing with well-built web sites, and IMDB is a well-built web site.
For example, the movie rating on an IMDB movie page is in a <DIV> with class="star-box-giga-star". That's VERY easy to extract using a regular expression. The following regular expression will extract the movie rating from the raw HTML into capture group 1:
star-box-giga-star[^>]*>([^<]*)<
It's not pretty, but it does the job. The regex looks for the "star-box-giga-star" class name, then for the > that terminates the DIV, and then captures everything up to the following <.
To create a new regex like this, use a web browser that allows inspecting elements (for example Chrome or Opera). With Chrome you can simply look at the web page, right-click the element you want to capture, and choose Inspect element, then look around for easily identifiable elements that can be used to build a good regex. In this case the "star-box-giga-star" class is obviously easy to identify! You'll usually have no problem finding such identifiable elements on good web sites, because good web sites use CSS, and CSS requires IDs or classes to style elements properly.
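Purely to illustrate the capture group, here is the pattern applied in a small C# sketch (in Delphi, TRegEx from the System.RegularExpressions unit works the same way; the sample HTML below is made up):

using System;
using System.Text.RegularExpressions;

class RatingScraper
{
    static void Main()
    {
        string html = "<div class=\"star-box-giga-star\"> 8.3 </div>"; // sample snippet
        Match m = Regex.Match(html, "star-box-giga-star[^>]*>([^<]*)<");
        if (m.Success)
            Console.WriteLine(m.Groups[1].Value.Trim()); // prints 8.3
    }
}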
Processing an RSS feed is more convenient.
As of the time of posting, the only RSS feeds available on the site are:
Born on this Date
Died on this Date
Daily Poll
Still, you can ask for a new one to be added by getting in touch with the help desk.
Resources on RSS feed processing:
Relevant post here on SO.
Super Object
Wikipedia.
When scraping websites, you cannot rely on the availability of the information. IMDB may detect your scraping and attempt to block you, or they may frequently change the format to make it more difficult.
Therefore, you should always try to use a supported API or RSS feed, or at least get permission from the web site to aggregate their data, and ensure that you're abiding by their terms. Often, you will have to pay for this type of access. Scraping a website without permission may open you up to liability on a couple of legal fronts (denial of service and intellectual property).
Here's IMDB's statement:
You may not use data mining, robots, screen scraping, or similar
online data gathering and extraction tools on our website.
To answer your question, the better way is to use a method the website itself provides. For non-commercial use, and if you abide by their terms, you can download the IMDB database directly and use the data from there instead of scraping their site. Simply refresh your copy of the database frequently, and you have a better solution than scraping. You could even wrap your own web API around it. Ratings are available as a standalone table.
Use HTML Tidy to convert any HTML to valid XML, and then use an XML parser, perhaps with XPath, or develop your own code (which is what I do).
All the answers posted cover your general question well. I usually follow a strategy similar to the one detailed by Cosmin: I use WinInet and regexes for most of my web-extraction needs.
But let me add my two cents on the specific subquestion of extracting the IMDB rating. IMDBAPI.COM provides a query interface that returns JSON, which is very handy for this type of search.
So a very simple command-line program for getting an IMDB rating would be:
program imdbrating;
{$APPTYPE CONSOLE}
uses
  htmlutils; // helper unit supplying HttpGet() and UrlEncode()

// Extracts the value of a "Parm":"value" pair from a flat JSON string.
function ExtractJsonParm(Parm, h: string): string;
var
  r: integer;
begin
  r := Pos('"' + Parm + '":', h);
  if r <> 0 then
    // the value starts after '"Parm":"' and ends just before the closing '",'
    Result := Copy(h, r + Length(Parm) + 4,
                   Pos(',', Copy(h, r + Length(Parm) + 4, Length(h))) - 2)
  else
    Result := 'N/A';
end;

var
  h: string;
begin
  // usage: imdbrating "True Grit"
  h := HttpGet('http://www.imdbapi.com/?t=' + UrlEncode(ParamStr(1)));
  Writeln(ExtractJsonParm('Rating', h));
end.
If the page you are crawling is valid XML, I use SimpleXML to extract info. It works pretty well.
Resource:
Download link.
I'm currently developing a corporate intranet that serves large PDF files. Users get frustrated when they have to wait for entire PDF files to download before they can view them. I have used the embedded Google documents viewer ( http://googlesystem.blogspot.com/2009/09/embeddable-google-document-viewer.html ) on other public-facing websites for lazy loading and ease of document navigation, but this is not feasible for an intranet. Is it possible to achieve lazy loading of a PDF natively within a browser and, if so, what are the requirements for this to happen? I am using ASP.NET MVC 3.
Thanks
You should make sure that the PDF documents you serve are 'linearized' (optimized for the web). Linearization allows the browser to download the PDF document partially, typically to display the first page fast. When the user navigates to another page, again only part of the PDF document is downloaded. Here is a good article on the topic:
http://www.jpedal.org/PDFblog/2010/02/linearized-pdf-files/
In this scenario you would not write directly to the Response stream.
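If your documents aren't linearized yet, existing tools can rewrite them: for example, the open-source qpdf utility can linearize a file from the command line (qpdf --linearize input.pdf output.pdf), and Adobe Acrobat's "Fast Web View" save option does the same.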
First, this question has nothing to do with ASP.NET MVC.
Second, this question has nothing to do with lazy loading. Lazy loading is a pattern from object-relational mapping; it is not a synonym for streaming.
Finally, it depends on the PDF viewer you use. The browser does not display PDF files itself; a plugin in the browser does, typically Adobe Reader. So your question in fact is:
Can I stream a PDF file so that it can be opened and read before the whole file has reached the client?
As far as I know, yes, you can. But you must use .NET streams: for example, "plug" the HttpContext Response stream in as the output of your PDF generator.
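A rough sketch of that idea in an MVC 3 action (MyPdfGenerator is a hypothetical generator that can write to any Stream; substitute whatever PDF library you actually use):

using System.Web.Mvc;

public class DocumentsController : Controller
{
    public void LargePdf()
    {
        Response.ContentType = "application/pdf";
        Response.BufferOutput = false; // push bytes to the client as they are produced

        // Pointing the generator at the response's output stream means pages
        // go over the wire as soon as they are rendered, instead of after
        // the whole document has been generated.
        var generator = new MyPdfGenerator(); // hypothetical
        generator.Save(Response.OutputStream);
    }
}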