I have implemented a URL preview feature, for which I need the meta information of the URL. I'm using jsoup to read the HTML meta information. All of a sudden I'm facing issues with YouTube URLs, e.g.:
https://www.youtube.com/watch?v=qszGzNoopTc. When I try to pull the meta information for this URL, the head tag comes back empty.
Here is the sample code that fetches the HTML of the URL:
Document doc = Jsoup.connect((String)url)
.header("Accept-Encoding", "gzip, deflate")
.userAgent("Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.69 Safari/537.36")
.maxBodySize(0)
.timeout(600000)
.ignoreContentType(true).get();
Note: I'm running this in Java on App Engine. It worked fine until a couple of days ago, and I'm not sure what is causing the problem now.
When I try the same URL at https://try.jsoup.org/ it works fine.
I'm a noob here, be gentle with me :)
I'm using puppeteer to extract data from a supplier's website (they've given me permission to do this) and import it into WordPress / WooCommerce. I can get the product data no problem, but I'm hitting a wall with images.
I can extract the images fine. The problem I'm facing is that the website is serving some images in webp format. From what I understand, the server would/should have both .jpg and .webp images and if the browser supports it, it serves the webp image.
So the URL that I get the image from is something like "https://example.com/images/myimage.jpg" but it's actually giving me the webp image. I need to know at the point of getting the image from the site if I'm being given the jpg or webp version so I can save it appropriately and then work out what to do with it.
I'm planning to convert these images using sharp once I know what format I've actually got.
So I guess a few questions are:
Is it possible to force puppeteer to NOT serve me webp format and give me just jpg?
OR
Is it possible when extracting the image to see what type it actually is before I save it so I know what extension to save it as?
Is it possible for sharp to identify the image type before I try to convert?
Thanks, Dan
It looks like puppeteer lets you set a user agent. If I set it to a browser that does not support WebP images, I'm given the JPG images by default:
await page.setUserAgent('Mozilla/5.0 CK={} (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko');
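For the second question above (finding out which format was actually delivered before saving it), one option is to look at the first bytes of the downloaded data instead of trusting the extension in the URL. This is a minimal sketch, assuming the image body is already in a Node.js Buffer (for instance from puppeteer's response.buffer()). The helper name sniffImageType is made up for illustration.

```javascript
// Hypothetical helper: guess the real image format from its leading "magic" bytes.
// Returns 'jpg', 'png', 'webp', or null when the signature is not recognised.
function sniffImageType(buf) {
  // JPEG files start with FF D8 FF.
  if (buf.length >= 3 && buf[0] === 0xff && buf[1] === 0xd8 && buf[2] === 0xff) {
    return 'jpg';
  }
  // PNG files start with the fixed 8-byte signature 89 50 4E 47 0D 0A 1A 0A.
  const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  if (buf.length >= 8 && buf.subarray(0, 8).equals(pngSignature)) {
    return 'png';
  }
  // WebP is a RIFF container: "RIFF" at bytes 0-3 and "WEBP" at bytes 8-11.
  if (
    buf.length >= 12 &&
    buf.toString('ascii', 0, 4) === 'RIFF' &&
    buf.toString('ascii', 8, 12) === 'WEBP'
  ) {
    return 'webp';
  }
  return null;
}

// Example with hand-built headers (these are not complete image files).
const fakeWebp = Buffer.concat([Buffer.from('RIFF'), Buffer.alloc(4), Buffer.from('WEBP')]);
const fakeJpeg = Buffer.from([0xff, 0xd8, 0xff, 0xe0]);
console.log(sniffImageType(fakeWebp)); // webp
console.log(sniffImageType(fakeJpeg)); // jpg
```

On the sharp question: as far as I know, sharp(buffer).metadata() resolves to an object with a format field, so the same check could be done there before converting.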
I am developing an HTML5 web app for offline use on an iPad2, using mobile Safari and the "Add to home screen" feature. I am able to achieve offline caching using a cache.manifest file in desktop Chrome but cannot make it work in iOS mobile Safari.
The app runs smooth on the iPad while online, but once I go offline I get these error messages: "MyApp could not be opened because it is not connected to the internet" (in "added to home screen" view on an iPad) and "Safari cannot open the page because it is not connected to the internet" (in safari-view on that same iPad).
I have read hundreds of troubleshooting / question pages and manifest tutorials on the Net trying to resolve this issue and none of the suggestions work. After reading so much about this capability it should be very easy to implement and yet here I am.
Here is a summary of what I have done / tried / used so far without success. I have tried all of the below using both cache.manifest and manifest.appcache variations without success but for simplicity I will only document the cache.manifest case:
I am developing and testing using latest Xampp Apache for Windows server locally installed on Win10 x64
The target device is an iPad2 running iOS version 8.4 and mobile safari version 8. My full user agent string is:
Mozilla/5.0 (iPad; CPU OS 8_4 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H143 Safari/600.1.4
In Xampp I have updated the httpd.conf file to include the correct MIME types for .manifest
AddType text/cache-manifest .manifest
In Xampp I have updated the mime.types file under xampp\apache\conf\ to include the correct MIME types for .manifest
text/cache-manifest manifest
In Xampp, as my web app uses ttf, woff, ico, png, jpg, js, mp3 and css files, I have verified the mime.types file under xampp\apache\conf\ to ensure it includes the MIME types for:
application/x-font-ttf ttf ttc
application/x-font-woff woff
image/x-icon ico
image/png png
image/jpeg jpeg jpg jpe
application/javascript js
audio/mpeg mpga mp2 mp2a mp3 m2a m3a
I have placed a .htaccess file in the web app's root public HTML directory to set the correct MIME type for .manifest
AddType text/cache-manifest .manifest*
I have included the manifest attribute in the HTML element of the index page:
<!DOCTYPE html>
<html lang="en" manifest="cache.manifest">
<head>
I've tried removing this line from index.html, but it made no difference:
<meta name="apple-mobile-web-app-capable" content="yes">
I've allowed plenty of time for the app to cache in Safari before switching to Airplane mode and refreshing. I am using a Windows machine so cannot use Web Inspector to debug. I used Jonathan Stark's Debugging Script and JSConsole to try and debug but it doesn't really give much useful information except that it is uncached which I know because it isn't working.
I have created a cache.manifest file and placed it in the web app's root public HTML directory. I have followed the advice of other solutions, many of which were derived from other Stack Overflow questions, including:
Primarily I've stuck with the cache.manifest name, as multiple sources have advised that mobile Safari will ignore everything else
Not including the index.html file which references the .manifest
Listing all resources under the CACHE section
Including the * after NETWORK:
Including all section headers even if not used
Used only relative URIs
The manifest file contents are relative to the manifest file (it is in the web apps root directory with index.html)
The manifest file is being served from the same origin as the host
Ensured all files are available to avoid errors and dropping the .manifest. As I mentioned offline caching is working in desktop Chrome which validates the manifest's contents
The manifest file does not list the manifest file
The content of the manifest is:
CACHE MANIFEST
# ver 0.0.8
CACHE:
data/apple-touch-icon.png
data/favicon.ico
data/fnt0.ttf
data/fnt0.woff
data/fnt1.ttf
data/fnt1.woff
data/fnt2.ttf
data/fnt2.woff
data/fnt3.ttf
data/fnt3.woff
data/html5.png
data/html5-unsupported.html
data/img0.jpg
data/img1.png
data/img10.jpg
data/img11.jpg
data/img12.png
data/img13.png
data/img14.png
data/img15.png
data/img16.jpg
data/img17.png
data/img18.png
data/img19.png
data/img2.png
data/img20.png
data/img21.png
data/img22.png
data/img23.png
data/img24.png
data/img25.png
data/img26.png
data/img27.png
data/img28.png
data/img29.png
data/img3.png
data/img30.png
data/img31.png
data/img4.png
data/img5.png
data/img6.png
data/img7.png
data/img8.png
data/img9.png
data/player.js
data/slide1.css
data/slide1.js
data/slide10.css
data/slide10.js
data/slide11.css
data/slide11.js
data/slide12.css
data/slide12.js
data/slide13.css
data/slide13.js
data/slide14.css
data/slide14.js
data/slide15.css
data/slide15.js
data/slide16.css
data/slide16.js
data/slide17.css
data/slide17.js
data/slide18.css
data/slide18.js
data/slide2.css
data/slide2.js
data/slide3.css
data/slide3.js
data/slide4.css
data/slide4.js
data/slide5.css
data/slide5.js
data/slide6.css
data/slide6.js
data/slide7.css
data/slide7.js
data/slide8.css
data/slide8.js
data/slide9.css
data/slide9.js
data/sound1.mp3
NETWORK:
*
FALLBACK:
I would really appreciate some fresh eyes on this issue, I just can't see where the problem could be.
Can you try to decrease the size of the cached files? In my case it helped, although it did not solve all the problems. The cached files were at least 30 MB in total; after slimming them down to under 1 MB, AppCache finally started working.
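To check how much data a manifest actually asks the browser to cache, the CACHE entries can be pulled out and their file sizes summed. This is a rough Node.js sketch, not something from the original posts. The helper names cacheEntries and totalCacheBytes are made up, and it assumes the listed paths exist on disk relative to a root directory you pass in.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: return the entries listed in the CACHE section(s) of an
// AppCache manifest. Entries that appear before any section header also count as CACHE.
function cacheEntries(manifestText) {
  const headers = ['CACHE:', 'NETWORK:', 'FALLBACK:', 'SETTINGS:'];
  let section = 'CACHE:';
  const entries = [];
  const lines = manifestText.split(/\r?\n/).map((line) => line.trim());
  // Skip the first line, which must be "CACHE MANIFEST".
  for (const line of lines.slice(1)) {
    if (line === '' || line.startsWith('#')) continue; // blank line or comment
    if (headers.includes(line)) {
      section = line;
      continue;
    }
    if (section === 'CACHE:') entries.push(line);
  }
  return entries;
}

// Hypothetical helper: total size in bytes of the CACHE entries, resolved against rootDir.
function totalCacheBytes(manifestText, rootDir) {
  return cacheEntries(manifestText)
    .map((entry) => fs.statSync(path.join(rootDir, entry)).size)
    .reduce((sum, size) => sum + size, 0);
}

// A shortened version of the manifest from the question.
const sampleManifest = [
  'CACHE MANIFEST',
  '# ver 0.0.8',
  'CACHE:',
  'data/player.js',
  'data/sound1.mp3',
  'NETWORK:',
  '*',
  'FALLBACK:',
].join('\n');
console.log(cacheEntries(sampleManifest)); // [ 'data/player.js', 'data/sound1.mp3' ]
```

Calling totalCacheBytes(manifestText, webRootDir) from the web app's root directory would then give a byte count to compare against the roughly 30 MB versus 1 MB figures mentioned above.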
I have a React Native WebView that runs a small HTML document. The document shows a few images.
My hope is to show images located in the app's Documents folder, i.e. the images are not static assets, but are downloaded by the app at runtime and stored on disk. These images are then referenced from HTML running inside a React Native WebView.
This is what I have tried so far:
Sourcing the file directly
I have tried sourcing the file from within the WebView which does not work (404 Not Found):
1. Simulator
::1 - - [25/Nov/2016:09:55:52 +0000] "GET /Users/me/Library/Developer/CoreSimulator/Devices/43707753-69A2-4EC7-B990-F7910A853F42/data/Containers/Data/Application/E4F4A368-02B0-4BAE-BEB3-BDF0FF7ADDDF/Documents/1657.jpeg HTTP/1.1" 404 193 "http://localhost:8081/assets/src/index.html?platform=ios&hash=8cb6d49177b95c46ed6654eb038a9a8d" "Mozilla/5.0 (iPhone; CPU iPhone OS 10_1 like Mac OS X) AppleWebKit/602.2.14 (KHTML, like Gecko) Mobile/14B72"
2. Phone
::ffff:192.168.100.143 - - [25/Nov/2016:11:58:31 +0000] "GET /var/mobile/Containers/Data/Application/970B3033-4AB1-48CF-AFC9-D30534D30BCE/Documents/1657.jpeg HTTP/1.1" 404 108 "http://192.168.100.114.xip.io:8081/assets/src/index.html?platform=ios&hash=8cb6d49177b95c46ed6654eb038a9a8d" "Mozilla/5.0 (iPhone; CPU iPhone OS 10_1_1 like Mac OS X) AppleWebKit/602.2.14 (KHTML, like Gecko) Mobile/14B100"
Seeing as I think this is the correct path (for iOS at least), I think it might be a permissions problem, although unsure.
Looking at the docs for the React Native WebView component, the source property has three modes:
1. Load a URI
2. Load a static HTML string
3. The result of a call to require(some.html)
In the second case it is possible to specify a baseUrl. If you set the baseUrl to the directory into which images are downloaded, you should be able to reference them by using <img src="./your-image.png" />.
Post-answer edit: Clarification regarding external HTML
Sadly there is no way to specify a base URL in cases 1 and 3, so you first have to convert the HTML to a string that can be passed to the source property. If your HTML refers to some external JavaScript, you have to get a reference to the bundle directory path, which is where those files must live to be readable by your app, and reference them relative to that path.
EDIT: This refers to React Native 0.38
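As a rough illustration of the HTML-string case, the object passed to source can be assembled in plain JavaScript. This is only a sketch: buildWebViewSource and the '/path/to/Documents' placeholder are invented for the example, and whether the base URL needs a file:// prefix on a given platform is something I have not verified.

```javascript
// Hypothetical helper: build a value for the WebView `source` prop from an HTML
// string, using the download directory as the base URL for relative image paths.
function buildWebViewSource(imageFileNames, imagesDir) {
  const imgTags = imageFileNames
    .map((name) => `<img src="./${encodeURIComponent(name)}" />`)
    .join('\n');
  const html = `<!DOCTYPE html>\n<html><body>\n${imgTags}\n</body></html>`;
  // Make sure the base URL ends with a slash so "./x" resolves inside the directory.
  const baseUrl = imagesDir.endsWith('/') ? imagesDir : `${imagesDir}/`;
  return { html, baseUrl };
}

// '/path/to/Documents' is a placeholder, not a real device path.
const webViewSource = buildWebViewSource(['1657.jpeg'], '/path/to/Documents');
console.log(webViewSource.baseUrl); // /path/to/Documents/
// In a component this would be passed along as: <WebView source={webViewSource} />
```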
I am guessing your project structure follows the assets/src... structure that you are trying to retrieve. The RN packager does not expose your project at all; it simply packs the transpiled bundle (several, actually, varying by platform and debug/release mode) and offers it for download. Even if this worked, it wouldn't help you much once your application goes live. I think this answer might cover your use case.
We have a rails application that serves up PDFs to users via send_file
We are getting complaints that when a user opens multiple PDFs in a given day and clicks on our link, Adobe opens the PDF they read last time.
We have looked at our logs / audits and everything appears that the correct data was sent to the user's browser.
We are unable to reproduce this problem, and we are only getting 1 or 2 out of thousands of users that are running into this issue.
The only workaround right now is for the user to close all instances of Firefox.
Anyone ever seen anything like this before?
It sounds like a caching issue to me.
I add these headers to the PDFs my web applications serve:
format.pdf do
  response.headers['Accept-Ranges'] = 'none'
  response.headers['Cache-Control'] = 'private, max-age=0, must-revalidate'
  response.headers['Pragma'] = 'public'
  response.headers['Expires'] = '0'
  render
end
I added these headers to solve issues serving Internet Explorer clients over SSL and there may be more in there than you need, but it looks like it could solve your issue, too.
We have a web application (ASP.NET) in which some pages display links to miscellaneous Office documents.
The links do not point to files on the web server but to a web page that dynamically loads the content from a network folder and then sends it to the browser with the appropriate MIME type set.
Everything works fine when testing with desktop browsers (i.e. the browser proposes the right application to open the file and the file is opened successfully).
When testing with a Blackberry (Bold 9000) the built-in browser raises an error message stating the selected item (an Excel document) cannot be displayed.
The odd thing is Excel files attached to mail messages can be opened on the same device (via Documents To Go I think).
Does anybody have an idea why the Excel file can be opened as an email attachment but not when downloaded from the web? Could this be caused by an incorrect MIME type setting?
Please note that the Blackberry testing was done only by a (remote) user as the BB used for development has an older OS (4.3) which doesn't support Office files anyway. I am not able to actually test with a 4.5+ BB.
Here's the code (excerpts, f is a FileInfo):
Response.Clear();
Response.ClearHeaders();
Response.ClearContent();
Response.AddHeader("Content-Length", f.Length.ToString());
Response.ContentType = "application/excel"; // for xls files
Response.AddHeader("Content-Disposition", "inline; filename=" + f.Name);
Response.WriteFile(f.FullName);
Response.Flush();
Response.Close();
Response.End();
I am going to try different mime types as documented on filext.com but, as this is going to take a while because of physical device unavailability, if anybody has a clue I'd be glad to hear about it. I'll keep this posted if I find a solution.
Thanks.
RIM says you should use BES to view PDF, DOC, etc.
I've tested it: an ASP.NET site with a simple <a href="..."></a> link to a .doc file, plus the MDS simulator and the 9350 device emulator. The result is the same as in the forum topic:
Socket Channel not able to connect;
address 127.0.0.1:1900
Unfortunately I can't test it on BES, but you should try it.
My opinion is that BES allows Office files opened through simple links to be viewed with DocsToGo.
Actually I was a bit misguided by user's feedback: after further investigating this issue, it came out that only Excel files failed to open, not all Office files.
After changing the MIME type from "application/excel" to "application/vnd.ms-excel" it worked fine.
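Since the fix came down to using the registered MIME type, a small lookup table keyed by file extension keeps such values in one place. The sketch below is in JavaScript only to illustrate the idea (the code in the question is C#), and the names OFFICE_MIME_TYPES and officeMimeType are invented for the example.

```javascript
// Hypothetical lookup table: file extension -> registered MIME type for common Office files.
const OFFICE_MIME_TYPES = {
  '.xls': 'application/vnd.ms-excel',
  '.xlsx': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
  '.doc': 'application/msword',
  '.docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  '.ppt': 'application/vnd.ms-powerpoint',
  '.pptx': 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
};

// Returns the MIME type for a file name, falling back to a generic binary type.
function officeMimeType(fileName) {
  const dot = fileName.lastIndexOf('.');
  const ext = dot === -1 ? '' : fileName.slice(dot).toLowerCase();
  return OFFICE_MIME_TYPES[ext] || 'application/octet-stream';
}

console.log(officeMimeType('Report.XLS')); // application/vnd.ms-excel
```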