WebHarvest can't find response headers - webharvest

I'm working with WebHarvest to fetch data from a site that requires logging in.
It's set up like this:
Page 1 = Login page
Page 2 = Login validation page
Page 3 = Statistics page
A cookie is set on Page 2. When I monitor the request for Page 2 with Firebug, I get these response headers:
Connection Keep-Alive
Content-Type text/html; charset=UTF-8
Date Tue, 23 Oct 2012 18:25:12 GMT
Keep-Alive timeout=15, max=100
Server Apache/2.0.64 (Win32) JRun/4.0 SVN/1.3.2 DAV/2
Set-Cookie SESSION=hej123;expires=Thu, 16-Oct-2042 18:25:12 GMT;path=/
Transfer-Encoding chunked
When calling the same page with WebHarvest I only get these headers:
Date=Tue, 23 Oct 2012 18:31:51 GMT
Server=Apache/2.0.64 (Win32) JRun/4.0 SVN/1.3.2 DAV/2
Transfer-Encoding=chunked
Content-Type=text/html; charset=UTF-8
It seems that three headers (Set-Cookie, Connection and Keep-Alive) are not found by WebHarvest. Pages 1, 2 and 3 are dummies, so no actual validation is done. The cookie is always set on the server side for Page 2.
Here is the WebHarvest code I am currently using:
<var-def name="content2">
    <html-to-xml>
        <http method="post" url="http://myurl.com/page2.cfm">
            <http-param name="Login">sigge</http-param>
            <http-param name="Password">hej123</http-param>
            <http-param name="doLogin">Logga in</http-param>
            <loop item="currField">
                <list>
                    <var name="ctxtNewInputs" />
                </list>
                <body>
                    <script><![CDATA[
                        item = (NvPair) currField.getWrappedObject();
                        SetContextVar("itemName", item.name);
                        SetContextVar("itemValue", item.value);
                    ]]></script>
                    <http-param name="${item.name}"><var name="itemValue" /></http-param>
                </body>
            </loop>
            <script><![CDATA[
                String keys="";
                for(int i=0;i<http.headers.length;i++) {
                    keys+=(http.headers[i].key + "=" + http.headers[i].value +"\n---\n");
                }
                SetContextVar("myCookie", keys);
            ]]></script>
            <file action="write" path="c:/kaka.txt">
                <var name="myCookie"/>
            </file>
        </http>
    </html-to-xml>
</var-def>
Edit:
When checking, I noticed that the cookie is set in WebHarvest even though the HTTP header can't be found programmatically. Is it possible that some response headers are hidden from scripts?
Does anyone know a workaround for this problem?
Thank you and best regards,
SiggeLund

You can get an HTTP header value into a user-defined variable that is scoped to the whole config like this:
<http url="your.url.here" method="GET">
    <!-- Any settings you apply to the POST/GET call -->
</http>
<!-- Now you have the http object to read header values from -->
<!-- At its simplest, reading a value looks like this -->
<var-def name="fifth_header_val">
    <script return="http.headers[5].value"/>
</var-def>
The above is only meant as a starting point; the index 5 is just an example. You can iterate over the http.headers indexes and collect the keys and values you need for your particular task.
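To illustrate the "iterate and collect" idea outside WebHarvest, here is a minimal Python sketch that looks headers up by name instead of by position. The function name header_values and the list-of-pairs layout are assumptions made for this example, not part of the WebHarvest API; the sample data is the header list WebHarvest reported in the question.

```python
def header_values(headers, name):
    """Return every value whose header name matches `name`, ignoring case.

    A list is returned because some headers (Set-Cookie, for example)
    may legitimately appear more than once in a response.
    """
    wanted = name.lower()
    return [value for key, value in headers if key.lower() == wanted]


# Headers as WebHarvest reported them in the question.
headers = [
    ("Date", "Tue, 23 Oct 2012 18:31:51 GMT"),
    ("Server", "Apache/2.0.64 (Win32) JRun/4.0 SVN/1.3.2 DAV/2"),
    ("Transfer-Encoding", "chunked"),
    ("Content-Type", "text/html; charset=UTF-8"),
]

content_type = header_values(headers, "content-type")
cookies = header_values(headers, "Set-Cookie")
```

Looking up by name avoids depending on the order in which the server happens to send its headers, and an empty result makes a missing header such as Set-Cookie easy to detect.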

Related

Make POST request from Mule ESB

I'm stuck trying to make a correct HTTP request to a web server (running PHP).
I need to send a POST request with a json parameter whose value is, for example, { "employee_id":191, "date":"2015-08-11", "time":"14:26:00" }.
It works if I make the request from Postman or cURL. From Postman, the request looks something like this:
POST /DeliveryDetails/ HTTP/1.1
Host: 192.168.0.100:80
Cache-Control: no-cache
Content-Type: application/x-www-form-urlencoded
json=%7B+%22employee_id%22%3A191%2C+%22date%22%3A%222015-08-11%22%2C+%22time%22%3A%2214%3A26%3A00%22+%7D
I can also send it with content type multipart/form-data:
POST /DeliveryDetails/ HTTP/1.1
Host: 192.168.0.100:80
Cache-Control: no-cache
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW
----WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="json"
{ "employee_id":191, "date":"2015-08-11", "time":"14:26:00" }
----WebKitFormBoundary7MA4YWxkTrZu0gW
or with cURL
curl -d "json={ \"employee_id\":191, \"date\":\"2015-08-11\", \"time\":\"14:26:00\" }" http://192.168.0.100:80/DeliveryDetails/
But when I try to make the request from Mule ESB, it doesn't work because the request is malformed.
The flow looks like this:
<sub-flow name="my-flow">
    <logger message="Request: #[payload]" level="INFO" doc:name="Log request"/>
    <http:request config-ref="request-HTTP" path="/DeliveryDetails/" method="POST" doc:name="HTTP call" />
    <object-to-string-transformer doc:name="Object to String"/>
    <logger message="Response: #[payload]" level="INFO" doc:name="Log response"/>
</sub-flow>
#[payload] contains the value { "employee_id":191, "date":"2015-08-11", "time":"14:26:00" }
If I send it like this, the body simply contains that value, without additional information such as Content-Type. I think that's the problem.
I have tried adding a query-param:
<http:request-builder>
    <http:query-param paramName="json" value="#[payload]" />
</http:request-builder>
or using a message-properties-transformer:
<message-properties-transformer doc:name="Message Properties">
    <add-message-property key="json" value="#[payload]"/>
</message-properties-transformer>
but the result is still the same.
EDIT
The HTTP configuration looks like this:
<http:request-config name="request-HTTP"
host="192.168.0.100"
port="80"
doc:name="HTTP Request Configuration" />
I also tried to set the Content-Type with
<set-property propertyName="Content-Type" value="application/x-www-form-urlencoded" doc:name="Property"/>
and
<http:request-builder>
    <http:query-param paramName="json" value="#[payload]"/>
    <http:header headerName="Content-Type" value="application/x-www-form-urlencoded"/>
</http:request-builder>
However, the body I receive is still just the payload, without the other parts such as json= or Content-Disposition: form-data; name="json".
Since the payload is just the JSON data, that is what gets sent as the body in most scenarios. Your curl example puts the "json=" prefix into the body itself. So there are a few options here:
Modify the payload so that it contains the same body you send with curl, and set the Content-Type to application/x-www-form-urlencoded.
Send multipart content by adding the data as an attachment. In your case try:
<sub-flow name="my-flow">
    <logger message="Request: #[payload]" level="INFO" doc:name="Log request"/>
    <set-attachment attachmentName="json" value="#[payload]" contentType="application/json"/>
    <http:request config-ref="request-HTTP" path="/DeliveryDetails" method="POST" doc:name="HTTP call" />
    <object-to-string-transformer doc:name="Object to String"/>
    <logger message="Response: #[payload]" level="INFO" doc:name="Log response"/>
</sub-flow>
Set the payload to a map with a single key "json" whose value is the original payload. This should make Mule send a form-encoded request without you having to set the Content-Type to application/x-www-form-urlencoded explicitly.
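To make clear what the form-encoded body needs to look like, here is a small Python sketch (outside Mule, standard library only) that builds the body from the JSON string. The helper name build_form_body is made up for this illustration; the result should be the same body that Postman shows in the question.

```python
from urllib.parse import urlencode, parse_qs


def build_form_body(json_payload):
    """Wrap the raw JSON text in a single form field called "json".

    urlencode percent-encodes the value and turns spaces into "+",
    which is the application/x-www-form-urlencoded convention.
    """
    return urlencode({"json": json_payload})


payload = '{ "employee_id":191, "date":"2015-08-11", "time":"14:26:00" }'
body = build_form_body(payload)

# Decoding the body again gives back the original JSON text,
# which is what the PHP side reads from its "json" POST field.
decoded = parse_qs(body)["json"][0]
```

Whichever Mule option you pick, the bytes that reach the server have to look like this body (or its multipart equivalent); a bare JSON payload has no "json" field for the server to read.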
HTH.
You need to set path="/DeliveryDetails".
You can use the following config:
<http:request-config name="HTTP_Request_Configuration" host="192.168.0.100" port="80" doc:name="HTTP Request Configuration"/>
and in the Mule flow or sub-flow, set the Content-Type as follows:
<set-property propertyName="Content-Type" value="application/json" doc:name="Property"/>
<http:request config-ref="HTTP_Request_Configuration" path="/DeliveryDetails" method="POST" doc:name="HTTP call" />
<logger message="Input JSON message ****** #['\n'+ message.payloadAs(java.lang.String)]" level="INFO" doc:name="Logger"/>
You can set the Content-Type here as your requirements dictate.
See also: How do I force the HTTP Request Connector to use a certain Content-Type?

Inserting data with OData in atom format

OData is new to me and I'm trying to get into it in depth. I'm trying to insert data using the OData protocol in Atom format from a REST client, so I created the following HTTP POST request:
POST /HelloOdata/library.xsodata/books HTTP/1.1
Host: coe-he-55:8010
Authorization: Basic xxxxxxxxxxxxxxxxxxxxx
DataServiceVersion: 1.0
MaxDataServiceVersion: 2.0
accept: application/atom+xml
Content-Type: application/atom+xml
Cache-Control: no-cache
Postman-Token: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
<?xml version="1.0" encoding="utf-8"?>
<Entry xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
xmlns="http://www.w3.org/2005/Atom">
<title type="text">books</title>
<author>
<name />
</author>
<link href="books('Test_post')/Author" rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/Author" title="Author" type="application/atom+xml;type=entry"/>
<category term="HelloOdata.library.booksType"
scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
<content type="application/xml">
<m:properties>
<d:title>Test_post</d:title>
<d:ISBN>ISBN_POST</d:ISBN>
<d:editions>2</d:editions>
</m:properties>
</content>
</Entry>
The response I got was: The serialized resource has an missing value for member 'title'.
My books table has only three properties (title, ISBN and editions), which are precisely the ones I'm trying to insert with this request. Do you have any idea what is wrong with it?
Thank you
Pablo
I've found where the error was.
Unbelievably, the correct XML request is:
POST /HelloOdata/library.xsodata/books HTTP/1.1
Host: coe-he-55:8010
Authorization: Basic xxxxxxxxxxxxxxxxxxxxx
DataServiceVersion: 1.0
MaxDataServiceVersion: 2.0
accept: application/atom+xml
Content-Type: application/atom+xml
Cache-Control: no-cache
Postman-Token: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
xmlns="http://www.w3.org/2005/Atom">
<title type="text">books</title>
<author>
<name />
</author>
<category term="HelloOdata.library.booksType"
scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
<content type="application/xml">
<m:properties>
<d:title>Test_post</d:title>
<d:ISBN>ISBN_POST</d:ISBN>
<d:editions>2</d:editions>
</m:properties>
</content>
</entry>
I also had to remove this part:
<link href="books('Test_post')/Author" rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/Author" title="Author" type="application/atom+xml;type=entry"/>
but that was a later change. The real problem was that I had written the root tag as
<Entry>
with a capital E instead of
<entry>
XML element names are case-sensitive. Once I changed it, the HTTP request worked.
I saw an example of inserting data with OData in the official guidelines at
http://www.odata.org/documentation/odata-version-2-0/operations and there the entry tag was written with a capital letter.
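To show why the capital letter matters, here is a short Python sketch using the standard library XML parser. It only checks the root element name and is not how the OData service itself validates requests; the helper name is_atom_entry and the trimmed-down sample documents are made up for this illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def is_atom_entry(xml_text):
    """Return True only when the root element is the Atom 'entry' element.

    ElementTree reports tags as '{namespace}localname', and the
    comparison is case-sensitive, just like XML element names.
    """
    root = ET.fromstring(xml_text)
    return root.tag == "{%s}entry" % ATOM_NS


wrong = '<Entry xmlns="http://www.w3.org/2005/Atom"><title type="text">books</title></Entry>'
right = '<entry xmlns="http://www.w3.org/2005/Atom"><title type="text">books</title></entry>'

wrong_is_entry = is_atom_entry(wrong)
right_is_entry = is_atom_entry(right)
```

Both documents are well-formed XML, so a parser accepts either one; only the lowercase version is recognised as an Atom entry.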
Thank you!
Pablo

.NET MVC What is the best way to disable browser caching?

As far as my research goes, there are several steps in order to make sure that browser caching is disabled. These HTTP headers must be set:
Cache-Control: no-cache, no-store, must-revalidate, proxy-revalidate
Pragma: no-cache
Expires: -1
Last-Modified: -1
I have found out that this can be done in two ways:
Way One: use the web.config file
<add name="Cache-Control" value="no-store, no-cache, must-revalidate, proxy-revalidate"/>
<add name="Pragma" value="no-cache" />
<add name="Expires" value="-1" />
<add name="Last-Modified" value="-1" />
Way Two: use the meta tags in _Layout.cshtml
<meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate, proxy-revalidate" />
<meta http-equiv="Pragma" content="no-cache" />
<meta http-equiv="Expires" content="-1" />
<meta http-equiv="Expires" content="-1" />
My Question: which is the better approach? Or, alternatively, are they equally acceptable? How do these all relate to different platforms? Which browsers would honor what headers?
In addition, please feel free to add anything I've missed, if any.
Okay folks, it seems I made a blunt mistake. There is a best way, and it is not meta tags. The only correct way is to use headers.
Why not use meta tags? Because they are guaranteed not to work with proxies, which do not read (and are not supposed to read) the HTML body; they rely on the headers.
When both Cache-Control and Expires are present, Cache-Control takes precedence. Source here.
Cache-Control general-header field is used to specify directives that MUST be obeyed by all caching mechanisms along the request/response chain. Source here.
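As a language-neutral illustration of "use headers, not meta tags", here is a minimal Python WSGI sketch that sends the headers with the response itself. It is not ASP.NET MVC code; the names NO_CACHE_HEADERS, app and call_app are made up for this example, and Last-Modified is left out as a simplification.

```python
from wsgiref.util import setup_testing_defaults

# Header values taken from the list in the question.
NO_CACHE_HEADERS = [
    ("Cache-Control", "no-cache, no-store, must-revalidate, proxy-revalidate"),
    ("Pragma", "no-cache"),  # understood by HTTP/1.0 caches
    ("Expires", "-1"),       # not a valid HTTP date, so caches treat it as already expired
]


def app(environ, start_response):
    """A tiny WSGI app that attaches the no-cache headers to its response."""
    start_response("200 OK", [("Content-Type", "text/plain")] + NO_CACHE_HEADERS)
    return [b"fresh content\n"]


def call_app(application):
    """Invoke a WSGI app in-process and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(application(environ, start_response))
    return captured["status"], captured["headers"], body


status, response_headers, body = call_app(app)
```

Because the directives travel as real response headers, every proxy and browser on the path sees them without having to parse the HTML body.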

how to set up both httpexpires and cachecontrol headers web.config

Earlier I asked how to set up both the httpExpires and cacheControl headers.
I think I have more or less found the answer:
<clientCache cacheControlCustom="public" httpExpires="Tue, 19 Jan 2038 03:14:07 GMT" cacheControlMaxAge="12:00:00" cacheControlMode="UseExpires" />
Now I no longer receive a 500 internal error for image requests.
But now I have a new question.
It looks like setting cacheControlMode="UseExpires" uses httpExpires for content expiration, while cacheControlMode="UseMaxAge" uses cacheControlMaxAge. So it is still not clear how to set both cacheControlMaxAge and httpExpires. Is it possible?

Trouble getting a YQL table working

I'm trying to set up a YQL table using the API at http://www.teamliquid.net/video/streams/?filter=live&xml=1 but I'm having some issues.
Here's my table definition:
<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
    <meta>
        <author>TL.net</author>
        <description>TL.net's streams</description>
        <documentationURL>none</documentationURL>
        <sampleQuery>select * from {table}</sampleQuery>
    </meta>
    <bindings>
        <select itemPath="streamlist" produces="XML">
            <urls>
                <url>http://www.teamliquid.net/video/streams/?xml=1</url>
            </urls>
            <inputs>
                <key id="filter" type="xs:string" paramType="query" />
            </inputs>
        </select>
    </bindings>
</table>
Running use "store://q5awkFLmEqteFVOTUJbQ6h" as tl; select * from tl where filter="live" yields the following error:
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
yahoo:count="0" yahoo:created="2012-02-13T22:14:48Z" yahoo:lang="en-US">
<diagnostics>
<publiclyCallable>true</publiclyCallable>
<url execution-start-time="1" execution-stop-time="33"
execution-time="32" proxy="DEFAULT"><![CDATA[store://q5awkFLmEqteFVOTUJbQ6h]]></url>
<url execution-start-time="35" execution-stop-time="232"
execution-time="197" http-status-code="406"
http-status-message="Not Acceptable" proxy="DEFAULT"><![CDATA[http://www.teamliquid.net/video/streams/?xml=1&filter=live]]></url>
<user-time>232</user-time>
<service-time>258</service-time>
<build-version>25247</build-version>
</diagnostics>
<results/>
</query>
I really can't figure out why it's not working.
In the diagnostics output, you can see that YQL is reading from your source URL, http://www.teamliquid.net/video/streams/?xml=1&filter=live, but is getting back an HTTP 406 Not Acceptable response.
HTTP 406 is meant to cover cases where the server cannot respond in any of the requested (Accept header) formats. I don't know how that applies in this case, but the teamliquid.net source mentions the following:
gzip encoding is required, please also send a valid User-Agent with the name of your application / site and contact info. This page and the XML are updated every five minutes, please do not poll more frequently than every five minutes or you may risk being IP banned. If you have any questions, please PM R1CH.
I suspect it's one of two things:
The YQL servers are not requesting data in gzip or compressed format
The teamliquid.net servers are blocking YQL
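One way to narrow this down is to test the endpoint outside YQL with the headers that the teamliquid.net notice asks for. Below is a minimal Python sketch that only prepares such a request and decodes a gzip body; it does not send anything. The helper names build_request and decode_body and the example User-Agent string are placeholders made up for this illustration.

```python
import gzip
from urllib.request import Request

STREAMS_URL = "http://www.teamliquid.net/video/streams/?xml=1&filter=live"


def build_request(url, user_agent):
    """Prepare a request that asks for gzip and identifies the client."""
    return Request(url, headers={
        "Accept-Encoding": "gzip",
        "User-Agent": user_agent,
    })


def decode_body(raw, content_encoding):
    """Decompress the body when the server says it is gzip-encoded."""
    if content_encoding and content_encoding.lower() == "gzip":
        return gzip.decompress(raw)
    return raw


# Placeholder User-Agent; replace with your application name and contact info.
request = build_request(STREAMS_URL, "example-app/0.1 (contact: you@example.com)")
```

Passing the prepared request to urllib.request.urlopen and the response body to decode_body would show whether the server answers normally when these headers are present. If it does, the 406 is more likely caused by what YQL sends than by the table definition.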

Resources