Open Search Server: Connect custom html meta tags to schema fields - search-engine

I've set up a new OSS to handle search on a forum. The basic setup was rather straightforward, but while tweaking it I've gotten stuck. The issue is that the pages have a custom meta tag like this:
<meta name="searchtype" content="construction_collection" />
I have set up a field in my Schema with the same name and then added it to the returned fields in the query. However that tag in the result xml is always empty:
<result name="response" numFound="173" collapsedDocCount="0" start="0" rows="10" maxScore="2357,006" time="6">
<doc score="2357,006" pos="0" docId="4008">
<field name="searchtype"/>
and I can't work out how to set up the Parser and Crawler so that the meta tag is connected to the field. Some threads here suggest that it should work automatically, but it doesn't. Surely I need to configure something more. What have I missed?
/Simon

By default, the HTML parser of OpenSearchServer tries to extract only the visible information of the web page.
It can retrieve information stored in meta tags only if they use a specific syntax. Your meta tag should be in the form:
<meta name="opensearchserver.field.searchtype" content="construction_collection" />
You can also populate several fields at once by appending more field names:
<meta name="opensearchserver.field.searchtype.anotherfield" content="construction_collection" />
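To make the naming convention concrete, here is a minimal Python sketch of how opensearchserver.field.* meta tags map onto schema fields. It is an illustration under stated assumptions, not OpenSearchServer's own parser: the class and function names are hypothetical, and it assumes that each dot-separated segment after the prefix names one field, as in the multi-field example above. A plain <meta name="searchtype"> tag is ignored, which matches the empty field in the question.

```python
from html.parser import HTMLParser

# Assumption: mirrors the meta-naming convention described in the answer;
# this is not OpenSearchServer's actual parsing code.
PREFIX = "opensearchserver.field."


class OssMetaFieldParser(HTMLParser):
    """Collect schema-field values from opensearchserver.field.* meta tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here via the default
        # handle_startendtag implementation.
        if tag != "meta":
            return
        a = dict(attrs)
        name = a.get("name") or ""
        content = a.get("content")
        if not name.startswith(PREFIX) or content is None:
            return  # ordinary meta tags are skipped, like name="searchtype"
        # Every dot-separated segment after the prefix names one field
        for field in name[len(PREFIX):].split("."):
            if field:
                self.fields.setdefault(field, []).append(content)


def extract_oss_fields(html):
    parser = OssMetaFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```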

Related

Twitter API - Is there anyway to retrieve the scraped text & image from a tweet?

When someone posts a tweet that only contains a URL, Twitter does a bit of scraping where it grabs some text and an image from the webpage.
Is there any way of retrieving this data from the Twitter API? I've not been able to find this data in anything that's returned. Do I need to provide some special parameter maybe? Or is this just something that's not possible?
No. You cannot get that data from the Twitter API.
The data that you're seeing is a Twitter Card.
Here's how it works.
The web developer puts some meta tags in their web page - take a look at the source for https://www.nytimes.com/2017/10/04/well/move/for-your-brains-sake-keep-moving.html and you'll see:
<meta name="twitter:site" value="#nytimes" />
<meta property="twitter:url" content="https://www.nytimes.com/2017/10/04/well/move/for-your-brains-sake-keep-moving.html" />
<meta property="twitter:title" content="For Your Brain’s Sake, Keep Moving" />
<meta property="twitter:description" content="Exercise changes the workings of new brain cells in ways that may protect against dementia, a study in mice suggests." />
<meta property="twitter:image" content="https://static01.nyt.com/images/2017/10/10/well/04physed-brain-photo/04physed-brain-photo-videoSixteenByNineJumbo1600.jpg" />
<meta name="twitter:card" value="summary_large_image" />
When Twitter sees a URL, it fetches it and looks for those tags. If it finds them, it will display a photo and headline on the Twitter website.
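If you want to retrieve that data, you need to fetch the URL yourself and look for the Twitter Card tags (twitter:*) and, where those are missing, the Open Graph tags (og:*) that Twitter falls back to.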
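Since the API doesn't return card data, a client has to fetch and parse the page itself. Below is a minimal Python sketch using only the standard library, assuming markup like the NYT example: it accepts the key in either name or property, the value in either content or value, and prefers twitter:* tags over their og:* equivalents. The helper names are hypothetical, and real pages may need more robust fetching than a bare urlopen call.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class CardMetaParser(HTMLParser):
    """Collect twitter:* and og:* meta values, keyed by tag name."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        # Pages use either property= or name= for the key ...
        key = a.get("property") or a.get("name")
        # ... and some (like the NYT example) use value= instead of content=
        val = a.get("content") or a.get("value")
        if key and val and key.startswith(("twitter:", "og:")):
            self.meta.setdefault(key, val)  # keep the first occurrence


def card_from_html(html):
    parser = CardMetaParser()
    parser.feed(html)
    parser.close()
    m = parser.meta
    # Prefer the Twitter-specific tag, fall back to its Open Graph equivalent
    return {
        field: m.get("twitter:" + field) or m.get("og:" + field)
        for field in ("title", "description", "image")
    }


def card_from_url(url):
    # Hypothetical helper: fetch the page yourself, as Twitter's crawler does
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return card_from_html(resp.read().decode(charset, errors="replace"))
```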

Google SDTT appending "#__sid=md3" to URL for mainEntityOfPage

Why is this happening?
HTML shows:
<meta content='http://www.costumingdiary.com/2015/05/freddie-mercury-robe-francaise.html' itemprop='mainEntityOfPage' itemscope='itemscope'/>
Structured Data Testing Tool output shows:
http://www.costumingdiary.com/2015/05/freddie-mercury-robe-francaise.html#__sid=md3
Update: It looks like it has to do with my breadcrumb list. But still, why is it happening, and is it wrong?
If the URL you want to provide is unique, you can use the itemid attribute.
After the latest update the tool started asking me for mainEntityOfPage, so, following Google's example, I used the following code:
<meta itemscope itemprop="mainEntityOfPage" itemType="https://schema.org/WebPage" itemid="https://blog.hompus.nl/2015/12/04/json-on-a-diet-how-to-shrink-your-dtos-part-2-skip-empty-collections/" />
This shows up correctly in the Structured Data Testing Tool results for my blog.
I don’t know where the fragment #__sid=md3 is coming from, but as the SDTT had some quirks with BreadcrumbList in the past, it might also be a side effect of this.
But note that if you want to provide a URL as value for the mainEntityOfPage property, you must use a link element instead of a meta element:
<link itemprop="mainEntityOfPage" href="http://www.costumingdiary.com/2015/05/freddie-mercury-robe-francaise.html" />
(See examples for Microdata markup that creates an item value, instead of a URL value, for mainEntityOfPage.)
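The difference between the meta, link and itemscope variants above comes down to how Microdata derives a property's value. Here is a simplified Python sketch of those rules. The function name is hypothetical, and it covers only the element types discussed here, ignoring the others (img, time, data and so on) that the Microdata spec also defines. Note that under these rules the itemscope on the meta element in the question makes the value an item, so its content attribute is not used as a URL.

```python
def microdata_property_value(tag, attrs):
    """Classify the value an itemprop element contributes (simplified).

    tag: element name; attrs: dict of the element's attributes.
    Returns a (kind, value) tuple.
    """
    tag = tag.lower()
    if "itemscope" in attrs:
        # The value is a nested item; itemid (if present) identifies it
        return ("item", attrs.get("itemid"))
    if tag == "meta":
        # content is plain text, even when it looks like a URL
        return ("text", attrs.get("content", ""))
    if tag in ("a", "area", "link"):
        # These elements contribute a URL value taken from href
        return ("url", attrs.get("href", ""))
    return ("other", None)  # element types not modelled in this sketch
```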

W3C validator shows new error: "Meta requires 'name' attribute"

The W3C validator was fine with this code:
<meta property="og:site_name" content="--Sitename--" />
If I replace the property attribute with name, the validator says og:site_name is not registered.
All of a sudden today it displayed this error:
Error Line 7, Column 66: Element meta is missing required attribute name.
Nothing has changed, but this error popped up.
Does anyone know why, and how to fix it?
For HTML5
If a meta element has the property attribute (from RDFa), the name attribute is not required.
See the section "Extensions to the HTML5 Syntax" from the W3C Recommendation HTML+RDFa 1.1 - Second Edition:
If the RDFa #property attribute is present on the meta element, neither the #name, #http-equiv, nor #charset attributes are required and the #content attribute MUST be specified.
So your markup is fine:
<meta property="og:site_name" content="--Sitename--" />
But it’s (now) even valid if you use the name attribute instead of RDFa’s property, because the OGP values are registered. So this is fine, too:
<meta name="og:site_name" content="--Sitename--" />
And you could even combine both ways:
<meta name="og:site_name" property="og:site_name" content="--Sitename--" />
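The rule quoted from HTML+RDFa can be expressed as a small check. This is a simplified Python sketch, not how validator.w3.org or validator.nu actually work: the function name and the rdfa flag are hypothetical, and the flag only illustrates how a checker that ignores RDFa would report the error from the question, while an RDFa-aware one accepts all three variants above.

```python
def meta_errors(attrs, rdfa=True):
    """Simplified check of the meta-element rules for one <meta> element.

    attrs: dict of attribute names to values.
    rdfa: hypothetical switch; True applies the HTML+RDFa exception.
    """
    if rdfa and "property" in attrs:
        # HTML+RDFa: with property, name/http-equiv/charset are not required,
        # but content MUST be specified
        return [] if "content" in attrs else ["content is required with property"]
    # Plain HTML5 (simplified): something must identify what the meta is for
    keys = ("name", "http-equiv", "charset", "itemprop")
    if not any(k in attrs for k in keys):
        return ["Element meta is missing required attribute name."]
    return []
```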
It's hard to tell which validator, and in which mode, you're using. Suppose it's validator.w3.org: note that HTML5 support there is "experimental". And the "property" attribute comes from RDFa, which extends HTML5 through HTML+RDFa. To dig into further details, one would need your code snippet or page URL...
I had the same problem, which I find really annoying.
This might not be the answer you were waiting for but I recommend using http://validator.nu/ instead of W3C validator.

widgetVar name collision in Primefaces in multiple cc:renderFacet

I have a composite component containing a toolbar and a datatable. I also defined a facet that contains a form for manipulating data from the datatable; users define that facet for different kinds of data. Now I have a problem: I render that facet multiple times, so the widgetVar names of the PrimeFaces components collide. It is not possible to use insertChildren multiple times, so I think this is the only possible solution. Also, I wouldn't like to force users of the component to define 10 facets and write ui:include 10 times. Is there any other way to insert some Facelets code into a composite component, or is there any way to pass a parameter to a facet and use that parameter to create the widgetVar dynamically?
OK, after some time I still didn't manage to do what I wanted. First, I had a composite component like this:
<cc:interface>
<!-- Attributes definition -->
<cc:facet name="form"/>
</cc:interface>
<cc:implementation>
<p:dialog><f:subview id="detailSubview1"><cc:renderFacet name="form"/></f:subview></p:dialog>
<p:dialog><f:subview id="detailSubview2"><cc:renderFacet name="form"/></f:subview></p:dialog>
<!-- There is some more renderFacets but this is enough -->
</cc:implementation>
If I have, for example, a p:selectOneMenu inside the form without any widgetVar definition, every copy gets the same widgetVar name, and this is the problem.
So I changed this completely: I will transform this composite component into a ui:composition and apply it with ui:decorate in my page. In that case the widget vars are generated as I want, with different names, because they are in different naming containers.
A widgetVar is in fact used in JavaScript to identify the component. Therefore a widgetVar must be unique in a page. You'll have to declare it yourself.
If you want to create a custom component, which I think might suit you better than ui:define/ui:include, you could do something like this:
Say we want to create a component that renders a p:commandButton and an h:outputText with the same value (for whatever reason). Create an XHTML page named customComponent.xhtml in the directory [deployed-root]/resources/example:
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:h="http://java.sun.com/jsf/html"
xmlns:p="http://primefaces.org/ui"
xmlns:c="http://java.sun.com/jsf/composite">
<c:interface>
<c:attribute name="text" required="true" />
</c:interface>
<c:implementation>
<h:outputText value="#{cc.attrs.text}" />
<p:commandButton value="#{cc.attrs.text}" />
</c:implementation>
</html>
Then to use this in another page you'll have to define the namespace xmlns:e="http://java.sun.com/jsf/composite/example", and then you can refer to the custom component like this: <e:customComponent text="some text here"/>.
It should also be noted that it is bad practice to declare forms in custom components. This affects flexibility of use drastically since forms cannot be nested.
PrimeFaces can generate widgetVars so you don't have to.
From the 3.4 User's Guide:
<p:dialog id="dlg">
<!-- contents -->
</p:dialog>
<p:commandButton type="button" value="Show" onclick="#{p:widgetVar('dlg')}.show();"/>
This is designed to work in naming containers, so it should work just fine in composite components, <ui:repeat/>, <h:dataTable/>, etc.

Href not working in GSP pages

I am using the modalbox plugin with Grails. The problem is that the link it creates does not always call the server-side code.
Here is the link on the page:
<modalbox:createLink
controller="company"
action="setChangeCompanyAdmin"
absolute="true"
mapping="changeAdmin"
id="${companyInstance.id}"
title="Change Primary Admin"
width="600"
linkname="Change Primary Admin" />
The controller action prepares a list in a particular way to be displayed in the popup that the modal box opens. The problem is that the server-side action is not called every time, and this happens only in IE.
I have tried both absolute and specifying a mapping, but to no avail.
I have also set meta tags in the GSP page so that the data is not cached at all:
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
<META HTTP-EQUIV="Expires" CONTENT="-1">
But even this does not seem to work.
Any help is much appreciated.
Adhir
The browser is still caching your request. You can add the current timestamp as a parameter so that the request URL changes:
<modalbox:createLink
controller="company"
action="setChangeCompanyAdmin"
absolute="true"
params="${[cacheKiller: new Date().time]}"
mapping="changeAdmin"
id="${companyInstance.id}"
title="Change Primary Admin"
width="600"
linkname="Change Primary Admin" />
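The same cache-busting idea, shown outside Grails: a minimal Python sketch that appends the current time in milliseconds (what new Date().time returns in Groovy) as a query parameter, so each generated request URL differs from the last. The function name is hypothetical; cacheKiller is the parameter name from the answer above, and now_ms exists only to make the output deterministic in tests.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_killer(url, param="cacheKiller", now_ms=None):
    """Return url with a timestamp query parameter appended."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # milliseconds, like new Date().time
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Replace any earlier timestamp instead of appending a duplicate
    query = [(k, v) for k, v in query if k != param]
    query.append((param, str(now_ms)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Regenerating the URL for each request is what defeats the cache; a timestamp baked into the page once only changes when the page itself is reloaded.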
It is probably IE caching the response. If you want to disable caching via the controller's response object, the following code should work:
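// In the controller action, before rendering the popup content
response.setHeader("Pragma", "no-cache")
response.setHeader("Cache-Control", "no-cache, no-store")
response.setDateHeader("Expires", 0)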
