What is the correct use of schema.org ImageGallery? - google-schemas

I tried to use schema.org markup for an image gallery, but the images are not indexed by Google.
The page is here: link to azie4y!
Are there any examples on the web apart from the schema.org page itself, for example for ImageObject?

Having structured data on your website helps the indexing process, but it is not the only thing needed. Here ( https://support.google.com/webmasters/answer/70897?hl=en) you can find more information about how the process works, good practices, and tools like https://support.google.com/webmasters/answer/35769 that help get your website indexed.
Hope this information helps.
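Since the question asks for examples, here is a sketch of ImageGallery markup in JSON-LD, generated with Python. The property choices are an assumption, not the only valid form: each image goes into associatedMedia (a CreativeWork property whose values are MediaObjects, and ImageObject is one) with contentUrl and an optional caption. The function names are hypothetical. Valid markup only describes the page to Google; as the answer says, it does not by itself guarantee that the images get indexed.

```python
import json


def image_gallery_jsonld(page_url, name, images):
    """Build a schema.org ImageGallery; images is a list of {'url': ..., 'caption': ...} dicts."""
    return {
        "@context": "https://schema.org",
        "@type": "ImageGallery",
        "url": page_url,
        "name": name,
        # Assumed modelling: one ImageObject per picture in associatedMedia.
        "associatedMedia": [
            {
                "@type": "ImageObject",
                "contentUrl": img["url"],
                **({"caption": img["caption"]} if img.get("caption") else {}),
            }
            for img in images
        ],
    }


def jsonld_script_tag(data):
    """Wrap the data in a <script type="application/ld+json"> element for the page <head>."""
    # Escape "</" so a caption containing "</script>" cannot close the element early;
    # "\/" is a valid JSON escape for "/".
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"
```

For example, jsonld_script_tag(image_gallery_jsonld("http://example.com/gallery", "My gallery", [{"url": "http://example.com/1.jpg", "caption": "Sunset"}])) produces a block you can paste into the gallery page and check with Google's structured data testing tools.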

Related

How to read the title and description of news websites

Some websites, like DIGG or NGOOR, let users enter an article or website URL; they then fetch that website's title, description, and images and display them on their own page. I searched the internet and Stack Overflow but could not find any solution that works the same way as these websites. I plan to build this feature using Java and jQuery, so could someone give me some hints?
Also, in discussions related to this issue, people debated whether we need to load the whole web page to read the meta tags or can load only the necessary part. What are your ideas?
Sorry if this is an old question, but I could not find anything suitable.
Thank you very much.
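One way to approach both questions is sketched below in Python (the same idea carries over to a Java server-side component). It assumes the page exposes a <title>, a meta description and/or the Open Graph og:description and og:image tags, which is what many link-preview features read; the class and function names are hypothetical. On the "whole page or just the metadata" question, fetch_preview stops downloading once </head> (or <body>) has been seen, or after max_bytes, because those tags normally live in the head.

```python
import codecs
from html.parser import HTMLParser
from urllib.request import urlopen


class HeadMetaParser(HTMLParser):
    """Collects <title>, the meta description and og:image; records when the head ends."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.image = None
        self.head_done = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Plain <meta name="..."> or Open Graph <meta property="...">.
            key = (a.get("name") or a.get("property") or "").lower()
            content = a.get("content")
            if key in ("description", "og:description") and self.description is None:
                self.description = content
            elif key == "og:image" and self.image is None:
                self.image = content
        elif tag == "body":
            self.head_done = True  # some pages never close </head>

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "head":
            self.head_done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def result(self):
        return {"title": self.title.strip(), "description": self.description, "image": self.image}


def extract_preview(html):
    """Parse an HTML string that is already in memory."""
    parser = HeadMetaParser()
    parser.feed(html)
    parser.close()
    return parser.result()


def fetch_preview(url, chunk_size=4096, max_bytes=256 * 1024):
    """Download in chunks and stop as soon as the head has been parsed."""
    parser = HeadMetaParser()
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        # Incremental decoder: a multi-byte character may be split across chunks.
        decoder = codecs.getincrementaldecoder(charset)(errors="replace")
        received = 0
        while not parser.head_done and received < max_bytes:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            received += len(chunk)
            parser.feed(decoder.decode(chunk))
    parser.close()
    return parser.result()
```

The jQuery side would then only call your server with the URL and render the returned title, description, and image. Fetching on the server also avoids the browser's cross-origin restrictions.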

PDF Parsing Challenge

I have the following problem: I have a lot of papers in PDF format, and I have to extract information from the first page of each one and then save it into a database.
I only need to extract the title, abstract, keywords, author list, university list, and emails. I want to write a script that gets a string for each of those fields, for each paper.
How can I do that? Has anyone already done this? What languages and tools would you recommend?
And is there a paper repository that already does this kind of database feeding?
Since the PDFs may use different encodings, I have to deal with that problem too. Any help with this would be great.
An example of a paper is here.
Greetings!
http://pdfbox.apache.org/
You have to check the security settings of the PDF, and check that it really contains text and not a scanned image. First see whether PDFBox's command-line application manages to extract the text; if it does, you can use the jar and http://pdfbox.apache.org/apidocs/org/apache/pdfbox/examples/util/ExtractTextByArea.html
Hope it helps.
By the way, it's Java.
Edit:
I have not used http://www.qoppa.com/pdftext/ as a jar library, but I tried its example application and it works. I decided to go with PDFBox, though.
You need an API to read your PDF.
This one seems fine (I have never tried it, though).
You can probably find others with this link :-)
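Once the first page has been turned into plain text (for example with PDFBox's command-line text extractor), the field extraction is mostly heuristics. Below is a rough Python sketch that assumes a typical layout: the title is the first non-empty line, the abstract follows an "Abstract" heading and ends at "Keywords", "Index Terms" or "1 Introduction", and keywords are separated by commas or semicolons. Authors and universities vary too much between publishers to guess reliably, so they are left out. The NFKC normalization addresses one common encoding artifact of PDF text extraction: ligature characters such as U+FB01 instead of "fi". The function name is hypothetical.

```python
import re
import unicodedata

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Text after an "Abstract" heading, up to the next typical section marker or end of text.
ABSTRACT_RE = re.compile(
    r"\babstract\b[\s.:-]*(.*?)(?=\n\s*(?:keywords?|index terms|1\.?\s+introduction)\b|\Z)",
    re.IGNORECASE | re.DOTALL,
)
# A line starting with "Keywords" or "Index Terms", optionally followed by ":" or ".".
KEYWORDS_RE = re.compile(
    r"^\s*(?:keywords?|index terms)\s*[:.-]?\s*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_first_page(text):
    """Heuristically pull title, emails, abstract and keywords from first-page text."""
    # NFKC folds compatibility characters, e.g. the "fi" ligature U+FB01 into "fi".
    text = unicodedata.normalize("NFKC", text)
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    fields = {
        "title": lines[0] if lines else "",
        "emails": EMAIL_RE.findall(text),
        "abstract": "",
        "keywords": [],
    }
    m = ABSTRACT_RE.search(text)
    if m:
        # Re-join lines broken by the PDF layout into one paragraph.
        fields["abstract"] = " ".join(m.group(1).split())
    m = KEYWORDS_RE.search(text)
    if m:
        fields["keywords"] = [k.strip() for k in re.split(r"[;,]", m.group(1)) if k.strip()]
    return fields
```

Expect to tune the patterns per publisher and to review the results by hand before feeding the database.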

How to recreate an image preview from outside websites?

Similar to Facebook's UI, I am trying to generate a preview image from an externally linked website, so that when a user types in the URL they are linking to, the UI will by default scan that site for an img and scrape a preview thumbnail.
Is there a specific name for this technique? Or can anyone point me in the direction of learning this?
Thanks so much!
It's called scraping. There is a library called scrAPI.
Here is a code example http://crunchlife.com/articles/2007/08/13/code-snippet-ruby-image-scraper
There are a couple of different options when it comes to page scraping. Another one to check out is nokogiri, http://nokogiri.org/. You can find tutorials on how to use it at http://nokogiri.org/tutorials.
Instead of grabbing an image from the site, why not grab a screenshot of the entire page? You could use a free screenshot service like http://www.websnapr.com/ or http://www.thumbshots.com/, among others. In one application, I use that for my preview image and use nokogiri to scrape the page title and description. Just an idea.
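For the "scan that site for an img" part, a scraper typically needs two extra decisions: which image to prefer, and how to turn relative src values into absolute URLs. The Python sketch below (rather than scrAPI or nokogiri, which are Ruby) assumes a simple policy: use the page's og:image if it declares one, otherwise take the first <img> whose declared width/height is not tiny, which skips 1x1 spacer images. The names and the 50-pixel threshold are hypothetical choices.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCandidateParser(HTMLParser):
    """Records the og:image meta tag and every <img> with a src, in document order."""

    def __init__(self):
        super().__init__()
        self.og_image = None
        self.images = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("property") or "").lower() == "og:image":
            if self.og_image is None and a.get("content"):
                self.og_image = a["content"]
        elif tag == "img" and a.get("src"):
            self.images.append(a)


def _too_small(img, min_side):
    # Only trusts explicit numeric width/height attributes; CSS sizes are ignored.
    for key in ("width", "height"):
        value = (img.get(key) or "").strip()
        if value.isdigit() and int(value) < min_side:
            return True
    return False


def pick_preview_image(html, base_url, min_side=50):
    """Return an absolute URL for a preview image, or None if nothing suitable is found."""
    parser = ImageCandidateParser()
    parser.feed(html)
    parser.close()
    if parser.og_image:
        return urljoin(base_url, parser.og_image)
    for img in parser.images:
        if not _too_small(img, min_side):
            return urljoin(base_url, img["src"])
    return None
```

A more careful version would download the candidates and check their real dimensions, since many pages do not declare width and height.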

Trackback implementation: rel="trackback" vs RDF

I want my Rails app to parse external websites for a trackback URL, but I'm not really sure whether I should just look for a rel="trackback" link or follow the RDF specification described by Six Apart. Or both. WordPress and TechCrunch both only offer a rel="trackback" link, and they should know. On the other hand, maybe some blogs only provide RDF and I'd miss the link.
What do you think?
And is there a ready-made gem/plugin out there? (It's really hard to google for "trackback"...)
Thanks.
UPDATE
I'm now first trying to find the RDF information. If I do not find anything, I look for the link tag. I was referring to the Six Apart specification. Thanks for your help.
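The "RDF first, then the link tag" order from the update can be sketched like this, in Python rather than Ruby. It assumes the RDF follows the Six Apart autodiscovery convention: an rdf:Description element embedded inside an HTML comment, with the ping URL in a trackback:ping attribute and the post's URL in dc:identifier. Because HTML parsers treat comments as opaque text, the RDF part uses a regular expression and assumes the conventional trackback: and dc: prefixes; the fallback scans <link> and <a> tags whose rel includes "trackback". Function names are hypothetical.

```python
import re
from html.parser import HTMLParser

# <rdf:Description ... /> blocks; DOTALL because attributes usually span several lines.
RDF_DESC_RE = re.compile(r"<rdf:Description\b(.*?)/?>", re.IGNORECASE | re.DOTALL)
ATTR_RE = re.compile(r'([\w:]+)\s*=\s*"([^"]*)"')


def rdf_trackback(html, page_url=None):
    """Return trackback:ping from embedded RDF, preferring the block whose dc:identifier is page_url."""
    first = None
    for m in RDF_DESC_RE.finditer(html):
        attrs = dict(ATTR_RE.findall(m.group(1)))
        ping = attrs.get("trackback:ping")
        if not ping:
            continue
        if page_url and attrs.get("dc:identifier") == page_url:
            return ping
        if first is None:
            first = ping
    return first


class RelTrackbackParser(HTMLParser):
    """Finds the first <link> or <a> whose rel attribute contains "trackback"."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if self.href is None and tag in ("link", "a") and "trackback" in rels and a.get("href"):
            self.href = a["href"]


def find_trackback_url(html, page_url=None):
    """RDF autodiscovery first, then rel="trackback"; None if neither is present."""
    ping = rdf_trackback(html, page_url)
    if ping:
        return ping
    parser = RelTrackbackParser()
    parser.feed(html)
    parser.close()
    return parser.href
```

Passing page_url matters on index pages that embed one RDF block per post; without it, the first ping URL found is returned.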

How to get rid of stupid "pad" labels produced by RTML functions?

I have the misfortune of being in charge of maintaining an old Yahoo! Store built on their RTML-based platform.
Recently I've noticed that the HTML generated by some RTML functions is sprinkled all over with "padding images" (or whatever the conventional name is for those 1x1 pixel images used to enforce layout). I have nothing against such images, but... all of them come with an ALT attribute like this:
<img src="http://.../image1x1.gif" alt="pad">
With all due respect to the original authors of RTML, they must have been smoking something when they came up with this "accessibility enhancement"... :-(
Anyway, here are my questions:
Does anybody know of a list of all RTML functions that generate HTML with these "pad" images?
Is there any way to get rid of all those alt="pad" attributes without rewriting a lot of RTML code?
NB: This may sound a little cynical, but improved accessibility is not the main goal here. The main goal is to stop exposing those moronic alt="pad" attributes to Google and other smart search engines. So client-side scripting is not going to help, as far as I know.
Thank you!
P.S. Most of you are probably lucky enough never to have heard of RTML. If somebody established a prize for software products based on the (commercial success) / (usability) ratio, this RTML-based "platform" would probably win first place.
P.P.S. Apparently someone from Yahoo! finally listened, because I can no longer find those silly "pad" tags in the RTML-generated pages of our store. Nevertheless, one of the ideas offered in response to my original question does provide a very practical solution, not just to the original problem but to any similar problem with the RTML platform. See the winning answer; it's really good.
The only way I see is to put your own website front-end in front of the RTML site and have it filter out whatever you want.
For example, if your RTML site is at http://rtmlusglysite.yahoo.com/store/XYZ01134, you could host a simple PHP front-end at http://www.example.com that acts as a "filtering" HTTP proxy, so that http://rtmlusglysite.yahoo.com/store/XYZ01134/item1234.rtml would be accessed as http://www.example.com/item1234.html.
It's not an ideal solution, but it should work, and you could do some more fancy stuff.
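A minimal sketch of that filtering front-end, using Python's standard library instead of PHP. The upstream URL is the hypothetical one from the example, the .html to .rtml path mapping mirrors that example, and there is no error handling, caching, or header passthrough. As a design choice it rewrites alt="pad" to an empty alt="" (the usual marker for a purely decorative image) instead of deleting the attribute, so search engines no longer see "pad" and the images still validate.

```python
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical upstream store, taken from the example above.
UPSTREAM = "http://rtmlusglysite.yahoo.com/store/XYZ01134"

# alt="pad" or alt='pad' in any case, only when "alt" is a standalone attribute name.
PAD_ALT_RE = re.compile(r"""(?<=\s)alt\s*=\s*(["'])pad\1""", re.IGNORECASE)


def strip_pad_alts(html):
    """Replace every alt="pad" with an empty alt attribute."""
    return PAD_ALT_RE.sub('alt=""', html)


def upstream_url(path):
    """Map /item1234.html on the front-end to UPSTREAM/item1234.rtml."""
    if path.endswith(".html"):
        path = path[: -len(".html")] + ".rtml"
    return UPSTREAM + path


class FilteringProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Sketch only: upstream errors (e.g. 404) raise instead of being forwarded.
        with urlopen(upstream_url(self.path), timeout=10) as resp:
            body = resp.read()
            ctype = resp.headers.get("Content-Type", "application/octet-stream")
            charset = resp.headers.get_content_charset() or "utf-8"
        if ctype.startswith("text/html"):
            text = body.decode(charset, errors="replace")
            body = strip_pad_alts(text).encode(charset, errors="replace")
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the proxy until interrupted."""
    HTTPServer(("", port), FilteringProxy).serve_forever()
```

Calling serve(8080) would start the proxy locally; in production it would sit behind the web server that answers for www.example.com.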
Nice try from the other posters, but there is a very simple RTML command that will do it:
TEXT PAT-SUBST s GRAB
MULTI
HEAD
BODY
TEXT #var-with-alt-tag-equals-pad-in-it
frompat "alt=\"pad\""
topat ""
The above RTML will find all instances of alt="pad" and replace them with nothing.
Well, you're right that RTML is relatively untraveled territory :)
Do you have a way to add your own attributes to these image tags? If so, would it be possible to override the alt attribute? If you specify alt="", I would think that would override Yahoo's... Otherwise, consider putting a useful alt attribute in there for blind users and people on dial-up.
This is the first time I've heard of this platform, but here is an idea: if you can add JavaScript to the pages, you could write a function that runs after the page has loaded and removes all the alt="pad" attributes from the page.
Unfortunately, this solution only works in browsers that support scripting, so Lynx or other text-based browsers might not benefit from it.
I have shared a link to the official RTML guide from Yahoo!. Hope it helps. Thanks!
List of available RTML books and resources

Resources