nokogiri and video tags - ruby-on-rails

Some pieces of HTML are stored on the server. Before they are saved, they are preprocessed.
Preprocessing inserts HTML5 video tags in certain places.
I am trying to do this, but every time I deal with video tags I get the following:
Tag video invalid
I think this is because of the HTML 4.0 DOCTYPE that I saw in the debugger:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
I also tried using XML as the parser, but I cannot figure out how to obtain clean HTML from a Nokogiri::XML object.
Any ideas?

First, you can use #to_html (or #to_xhtml) on an XML document. However, I'm not sure that's necessary here. I don't get any 'Tag video invalid' errors when creating elements. Here's a sample program showing how to parse existing HTML4, inject a video element, and get HTML out again:
require 'nokogiri'
html = Nokogiri::HTML <<ENDHTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html><head><title>Sauceome</title></head>
<body><p class="video" id="foo"><!-- put vid here--></p></body></html>
ENDHTML
wrap = html.at('.video')
wrap.inner_html="<video src='#{wrap['id']}.mov'></video>"
puts html.to_html
#=> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
#=> <html>
#=> <head>
#=> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
#=> <title>Sauceome</title>
#=> </head>
#=> <body><p class="video" id="foo"><video src="foo.mov"></video></p></body>
#=> </html>

Related

schema.org Failing to Validate Dublin Core Meta in 'meta' and 'link' elements

I have Dublin Core (DC) metadata in <meta ...> and <link ...> elements. Testing my HTML document with the validator fails to identify the Dublin Core metadata in my document. But when I use DC tags in elements like <td rel="dc:date" content="2017-02-10">10 February 2017 </td>, the validator identifies those metadata elements.
This validator also fails to identify DC tags in meta and link elements.
Example that does not validate but should:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://dublincore.org/specifications/dublin-core/dc-html/2008-08-04/">
<title>Services to Government</title>
<link rel="schema.DC" href="http://example.org/terms/" />
<meta name="DC.date" content="2007-05-05" />
</head>
<body>
</body>
</html>
Is the metadata invalid, or are the validators wrong? Is there a validator that supports <meta> and <link>?
It seems that the prefix:
#prefix dc: http://purl.org/dc/elements/1.1/ .
is not appearing in the validator results for some reason.
I have tried adding additional vocabularies like:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://dublincore.org/specifications/dublin-core/dc-html/2008-08-04/">
<title>Services to Government</title>
<link rel="schema.DC" href="http://example.org/terms/" />
<link rel="schema.DC" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:gml="http://www.opengis.net/gml" xmlns:v="http://rdf.data-vocabulary.org/#"/>
<meta name="DC.date" content="2007-05-05" />
</head>
<body>
<td rel="dc:date" content="2017-02-10">10 February 2017</td>
</body>
</html>
Without success.
To recreate, just paste the example html into one of the validators linked above.

Those examples are written in a syntax that these validators do not support.
So the validators are not supposed to detect it; they support the common syntaxes, such as RDFa, JSON-LD, and Microdata.
Here's a quote that might be relevant:
The major search engines now extract and index metadata embedded with
one of several syntaxes: HTML Microdata, of limited expressivity but
the easiest for webmasters to deploy; RDFa, a richer syntax with
better support for internationalization and multiple RDF namespaces;
and JSON-LD, an RDF-compatible variant of the popular Javascript
Object Notation (JSON). These broadly supported syntaxes effectively
obsolete a series of IETF and DCMI syntax specifications developed
prior to 2008 specifically for expressing Dublin Core™ metadata.
https://www.dublincore.org/resources/metadata-basics/
Parsing those examples would require a parser for that specific syntax (there don't seem to be many out there).
So the solution might be to use one of the common serializations (JSON-LD, Microdata, RDFa).
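As a rough illustration of that suggestion, the two facts from the question (a title and a DC date) could be emitted as JSON-LD instead of <meta> and <link> elements. This is only a sketch: the helper name dublin_core_jsonld is made up for this example, and mapping DC.date to the dc:date term under the http://purl.org/dc/elements/1.1/ namespace is an assumption about what the page author intends.

```ruby
require 'json'

# Hypothetical helper (not part of any library): builds a JSON-LD
# <script> element carrying Dublin Core terms. The "dc" prefix is bound
# in @context, which is the job the schema.DC <link> did in the old syntax.
def dublin_core_jsonld(title:, date:)
  data = {
    '@context' => { 'dc' => 'http://purl.org/dc/elements/1.1/' },
    'dc:title' => title,
    'dc:date'  => date
  }
  # Assumption: the values are trusted. A real page should also guard
  # against "</script>" appearing inside a value.
  %(<script type="application/ld+json">\n#{JSON.pretty_generate(data)}\n</script>)
end

snippet = dublin_core_jsonld(title: 'Services to Government', date: '2007-05-05')
puts snippet
```

The resulting element goes in the page's head or body. Whether a given validator reports the dc: terms still depends on that validator, so it is worth pasting the output into the same tools used in the question.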

Doctype in JSF Mojarra

What doctype should I use in JSF pages? The other day I was trying to migrate from Mojarra 2.1.13 to 2.1.18, and it seems that the way doctypes are interpreted has changed. In the root template I have the following DOCTYPE:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Do I also have to include this?
<?xml version="1.0"?>
In composites (that use this template) I used to have following doctype
<!DOCTYPE composite PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
But it seems that Mojarra 2.1.18 doesn't really support that. I also couldn't find this in any JSF 2.0 reference; it is something we used in JSF 1.2. If I have this doctype in the composite page, the composite doctype is rendered instead of the html one from the template. As a result, the CSS styles are messed up.
So what is the correct usage of doctypes in JSF 2.0? Or is this an issue with Mojarra? I didn't find any reference regarding this.

I created a JIRA issue for this: http://java.net/jira/browse/JAVASERVERFACES-2820
and it has been closed as this is the expected behavior.
"The composite page is where you actually use the template. So it is the outer most file where you specified a doc type. As such it defines the doc type that will be rendered."
Just specify the doctype in a template and nowhere else.

I also migrated from JBoss 7.1 to JBoss EAP 6.1.
I found a not very nice workaround: insert the doctype on each page (not just the template):
<!DOCTYPE html>
e.g.:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<ui:composition xmlns="http://www.w3.org/1999/xhtml"
xmlns:ui="http://java.sun.com/jsf/facelets" xmlns:f="http://java.sun.com/jsf/core"
xmlns:h="http://java.sun.com/jsf/html" template="template.xhtml">
Is there any other way to have the doctype read from the master template?

Avoid multiple DOCTYPE and html tags when using ui:include

We are using several ui:include tags in the "main" page. The page that is to be included looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns=".." xmlns:ui="..." ...>
<ui:fragment rendered="${foo}">
some html code
</ui:fragment>
<ui:fragment rendered="${!foo || bar}">
some more html code
</ui:fragment>
</html>
Using ui:include for templating results in the DOCTYPE and html tag being repeated several times in the source code, which is pretty ugly. (Sure, the user doesn't see it, but I'm a fan of tidy HTML.)
However, if I remove the DOCTYPE and html tag from the XHTML file that is to be included, the Faces Servlet throws an exception stating that the prefix ui for ui:fragment is not bound.
Does anybody know how I can include another XHTML page without the multiple DOCTYPEs and html tags?

You should take a look at the ui:composition tag.
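We also use ui:include to include JSF 2 pages. To solve the problem you have, I believe you could alter your included page by adding the ui:composition tag as follows. Facelets ignores everything outside ui:composition, so the DOCTYPE and html wrapper still bind the ui prefix but are not rendered: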
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns=".." xmlns:ui="..." ...>
<ui:composition>
<ui:fragment rendered="${foo}">
some html code
</ui:fragment>
<ui:fragment rendered="${!foo || bar}">
some more html code
</ui:fragment>
</ui:composition>
</html>

Why does nginx + memcache corrupt my response body?

I'm caching some web pages in memcache. When I read the page directly from the cache, the page is well formed like this ...
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
but when I use a browser or curl to read it from nginx (version 0.8.50), it looks like response headers are ending up in the body of the response like this ...
�{
" ETag"'"16bb9f51667d334aa4e7663ca28d308a""X-Runtime177"Content-Type"text/html; charset=utf-8"Content-Length"5428"Set-Cookie""Cache-Control"(private, max-age=0, must-revalidate"4<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
My nginx config is pretty simple ...
set $memcached_key $cookie__app_session$uri;
memcached_pass localhost:11211;
default_type text/html;
error_page 404 502 /fallback$uri;
Does anyone have an idea why the response is corrupt?

D'oh! Stupid developer problem!
There were two mistakes:
(a) I was storing the response headers and body in memcache, then adding headers in an nginx rule. Storing only the response body in memcache removed the bulk of the problems.
(b) I was storing the response in Ruby's Marshal format (the default setting in memcache-client). Reading the contents of memcache with a simple Ruby client hid the fact that the format was not directly usable by nginx.
Hope that helps someone sometime!
Chris
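Point (b) is easy to reproduce in plain Ruby. The sketch below uses only the standard library and does not talk to memcached at all. It compares the bytes a marshalling client would store with the raw body that nginx would pass to the browser unchanged. The body string is a stand-in for a real cached page.

```ruby
# Stand-in for a cached page; a real body would be the rendered HTML.
body = "<!DOCTYPE html>\n<html><body>cached page</body></html>"

# What a client stores when it marshals values: Ruby's binary
# serialization, which wraps the string in extra header and trailer bytes.
marshalled = Marshal.dump(body)

puts marshalled.bytesize > body.bytesize  # extra bytes were added
puts marshalled.start_with?(body)         # the stored value no longer begins with the page
puts Marshal.load(marshalled) == body     # a Ruby reader undoes this, hiding the problem

# nginx knows nothing about Marshal, so the value in memcached has to be
# exactly the bytes of the response body.
raw_value = body
puts raw_value.inspect
```

Many Ruby memcached clients offer some form of raw mode for storing a value without marshalling. The exact option name depends on the client, so check its documentation rather than relying on this sketch.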

Grails interprets and closes HTML meta tag

In my Grails GSP file I'm using the HTML meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
The problem is that Grails closes this tag and renders it as:
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
This fails W3C's HTML validation (since my doctype is HTML and not XHTML).
Is there a fix for this? How can I get Grails to not interpret the meta tag?
I'm using grails-1.2-M4.
Follow up:
I created Grails bug GRAILS-5696 for this issue.

Not sure that this is the most beautiful solution, but at least it will work for your case:
<%= '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' %>
Well... this does not work either, since Grails preprocesses it before it is output.
So the only solution I see is to create a TagLib and output the content like this:
class MetaTagLib {
static namespace = 'my'
def meta = {
out << "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\"/>"
}
}
and use it like:
<my:meta />
It works. Tested.

You could validate as HTML5 instead of HTML 4.01, by using <!DOCTYPE html> (that's it, really!). HTML5 allows trailing slashes even in the HTML syntax, in order to allow for systems like this that produce pseudo-XHTML.
Of course, HTML5 is not yet a finished standard; it may change. I think that this aspect of it is unlikely to be changed, but there is still some fairly contentious debate about a lot of the new HTML5 features, so keep in mind that it's not yet finalized.
