Finding out if a specific external website page exists (Dart)

I'm trying to find out if a specific webpage exists. I'm using the dart:html library in the dart file where I'm interested in this information.
For example, I want to find out if a page like this exists: https://developer.mozilla.org/en-US/docs/Web/API/HTMLTextAreaElement.autofocus
How could I do this in Dart?

Just fetch the page; if you get an error response, the site doesn't exist (or is currently down). Note that HttpRequest.getString completes with an error for failed requests (e.g. a 404), so wrap it in try/catch:
import 'dart:html';

main() async {
  var url = 'http://www.google.com/somepage.html';
  try {
    await HttpRequest.getString(url);
    // Success: the page exists
  } catch (e) {
    // getString fails for error responses (e.g. 404) and network errors
    print('Page does not exist or is unreachable');
  }
}
Try it at DartPad

Related

How to catch the redirect with a webapp using playwright

When you go to this link, the page will run some javascript and then automatically redirect to a pdf. I have a hard time getting that final url from Playwright.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://scnv.io/760y", wait_until="networkidle")
    print(page.url)
    page.close()
Is there a way to get that final url?
There are multiple ways to do it. One way is using page.expect_response:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    # Catch any response with '.pdf' at the end of the url
    with page.expect_response('**/*.pdf') as response:
        page.goto("https://scnv.io/760y")
    print(response.value.url)
    page.close()
Output
https://qcg-media.s3.amazonaws.com/media/uploads/72778/2022/06/20220622_663043_221.pdf
Check out this section of the documentation that details handling network traffic in playwright.
Also note that I did not include wait_until='networkidle', because it was not appropriate for this use case: for that event to trigger, the network must remain idle for at least 500 ms, which never happens on this site while it is requesting the pdf. If you were to include it, the code would be inconsistent at best at catching the request whose url we want.

How to get link to third party site in 'about channel' section via python

I want to display the links from a YouTube channel's 'about' section in a text document. I tried to do it through the requests library, but Google only returned links to privacy and security pages, and I did not find anything about this in the YouTube API documentation. Can anyone help with this?
This isn't possible with the YouTube API. I actually found myself needing to do the same thing and could not, because the YouTube API lacks the necessary functionality (hopefully it will be added soon!)
I see you mentioned Python. My only solution is in Node, but I will explain it in detail so you can base your code on it. To get the banner links without the YouTube API we need to scrape them, and since YouTube uses client-side rendering we need to scrape the JSON configuration from the page source.
There's a variable defined inside a script tag called ytInitialData, which is a big JSON string with a massive amount of information about the channel, viewer, and YouTube configuration. We can find the banner links by parsing this JSON object.
const request = require("request-promise").defaults({
  simple: false,
  resolveWithFullResponse: true
})

const getBannerLinks = async () => {
  return request("https://www.youtube.com/user/pewdiepie").then(res => {
    if (res.statusCode === 200) {
      // Extract the ytInitialData JSON blob from the page source
      const parsed = res.body.split("var ytInitialData = ")[1].split(";</script>")[0]
      const data = JSON.parse(parsed)
      const links = data.header.c4TabbedHeaderRenderer.headerLinks.channelHeaderLinksRenderer
      const allLinks = links.primaryLinks.concat(links.secondaryLinks || [])
      const parsedLinks = allLinks.map(l => {
        // The real target sits in the `q` query parameter of YouTube's redirect URL
        const url = new URLSearchParams(l.navigationEndpoint.commandMetadata.webCommandMetadata.url)
        return {
          link: url.get("q"),
          name: l.title.simpleText,
          icon: l.icon.thumbnails[0].url
        }
      })
      return parsedLinks
    } else {
      // Error/rate limit - handle here
    }
  })
}
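For example, a quick hypothetical usage of the function above:
// Log the scraped banner links for the channel above
getBannerLinks().then(links => console.log(links))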
The way the links are scraped is as follows:
We make an HTTP request to the channel's URL
We parse the body to extract the JSON string that the banner links are inside, using split
We parse the JSON string into a JSON object
We extract the links from their JSON section (it's a big JSON object: data.header.c4TabbedHeaderRenderer.headerLinks.channelHeaderLinksRenderer)
Because there are two types of links (primary, which shows the text, and secondary, which doesn't), we concatenate them so we can map over them together
We then map over the links and use URLSearchParams to extract the q query parameter, since YouTube routes its outgoing links through a redirect (most likely for security reasons), and extract the name and icon from their corresponding objects
This isn't a perfect solution: should YouTube update or change anything on their front end, this could easily break your program. YouTube also applies rate limits, so if you try to mass scrape you'll run into 429/403 errors.

Retrying a failed dynamic import? (SPA/PWA)

Something I'm running into on a project I'm working on: I have a single-page application, and I'm handling browser navigation routing on the client side, which lets me dynamically import some modules whenever a route is matched. My routing setup looks a bit like this:
router.setRoutes([
  {
    path: '/',
    component: 'app-main', // statically loaded
  },
  {
    path: '/posts',
    component: 'app-posts',
    // dynamically loaded, falling back to the offline page if the import fails
    action: () => { import('./app-posts.js').catch(() => Router.go('/offline')); }
  },
  {
    path: '/offline',
    component: 'app-offline', // also statically loaded
  }
]);
I'm caching the app shell in my service worker, which means that the main page and the offline page get precached, and the posts page should get cached at runtime (once requested, i.e. when the user clicks the posts link)
So my precache manifest caches: main.js, offline.js, and my index.html.
Where I'm hitting a bump is:
The user loses network connection
The user tries to go to the posts page
The dynamic import for this may fail if it hasn't been requested and cached before (and the user will be redirected to the offline page)
But when my user regains network connectivity and clicks the posts link, the dynamic import will still fail; I'm guessing because the browser dedupes dynamic imports.
Which is a huge shame, because my user has a network connection; this request should succeed! The only way I can figure out how to deal with this is to have the user reload the page, and request the posts page again.
So my question is, how should I go about this?
Solved it by checking if the user has network connection before trying to do the import like this:
function handleRoute(url) {
  if ('onLine' in navigator) {
    if (navigator.onLine) {
      // user is online, safe to import
      import(url);
    } else {
      // user is offline, don't even try to import -> straight to `/offline`
      Router.go('/offline');
    }
  } else {
    // in case the browser doesn't support `navigator.onLine`, try to import anyway
    import(url);
  }
}
It feels like a slightly hacky way to do it, but then again browser support for navigator.onLine is pretty good.
For more information on retrying failed dynamic imports, I found this issue in the tc39/proposal-dynamic-import github repo.
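One workaround that comes up in that discussion is to retry the import with a cache-busting query string, so the loader treats it as a brand-new module specifier. A rough sketch (not from the original answer; the retry parameter is purely illustrative):
async function importWithRetry(url) {
  try {
    return await import(url);
  } catch (err) {
    // The failed specifier may be remembered by the loader; append a
    // throwaway query string so the browser fetches the module again
    return import(`${url}?retry=${Date.now()}`);
  }
}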
The browser is not doing that sort of caching/deduping/whatever. You can verify this easily by calling import from the console and toggling your network online/offline.
So the problem is most likely with the framework you're using. The caching of the first call to the route action happens somewhere in the framework. Maybe consult the documentation?
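For instance, a quick check you can paste into the devtools console (the module path is a placeholder):
// With the network offline this rejects; once back online the same call
// succeeds, showing the failure isn't cached by the browser
import('./app-posts.js')
  .then(() => console.log('import succeeded'))
  .catch(err => console.log('import failed', err));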

How to retrieve Medium stories for a user from the API?

I'm trying to integrate Medium blogging into an app by showing some cards with posts images and links to the original Medium publication.
From Medium API docs I can see how to retrieve publications and create posts, but it doesn't mention retrieving posts. Is retrieving posts/stories for a user currently possible using the Medium's API?
The API is write-only and is not intended to retrieve posts (Medium staff told me)
You can simply use the RSS feed as such:
https://medium.com/feed/@your_profile
You can simply get the RSS feed via GET, then if you need it in JSON format just use an npm module like rss-to-json and you're good to go.
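For instance, a minimal sketch using the public rss2json.com converter that later answers also mention (the handle is a placeholder):
const feedToJson = async (handle) => {
  const res = await fetch(
    `https://api.rss2json.com/v1/api.json?rss_url=https://medium.com/feed/@${handle}`
  );
  if (!res.ok) throw new Error(`Feed request failed: ${res.status}`);
  return res.json();
};

// Log title and link for each post in the feed
feedToJson("yourhandle").then(feed =>
  feed.items.forEach(item => console.log(item.title, item.link))
);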
Edit:
It is possible to make a request to the following URL and you will get the response. Unfortunately, the response is in RSS format which would require some parsing to JSON if needed.
https://medium.com/feed/@yourhandle
⚠️ The following approach is not applicable anymore as it is behind Cloudflare's DDoS protection.
If you're planning to get it from the client side using JavaScript, jQuery, Angular, etc., then you need to build an API gateway or web service that serves your feed. With PHP, RoR, or any other server-side stack, that should not be necessary.
You can get it directly in JSON format as given beneath:
https://medium.com/@yourhandle/latest?format=json
In my case, I made a simple web service with Express and hosted it on Heroku; a React app hits the API exposed on Heroku and gets the data.
const request = require("request"); // assuming the `request` package
const MEDIUM_URL = "https://medium.com/@yourhandle/latest?format=json";

router.get("/posts", (req, res, next) => {
  request.get(MEDIUM_URL, (err, apiRes, body) => {
    if (!err && apiRes.statusCode === 200) {
      // Medium prefixes its JSON responses with `])}while(1);</x>`;
      // strip everything before the first `{`
      let i = body.indexOf("{");
      const data = body.substr(i);
      res.send(data);
    } else {
      res.status(500).json(err);
    }
  });
});
Nowadays this URL:
https://medium.com/@username/latest?format=json
sits behind Cloudflare's DDoS protection service, so instead of consistently being served your feed in JSON format you will usually receive an HTML page that asks you to complete a reCAPTCHA, leaving you with no data from the API request.
And the following:
https://medium.com/feed/@username
has a limit of the latest 10 posts.
I'd suggest this free Cloudflare Worker that I made for this purpose. It works as a facade, so you don't have to worry about how the posts are obtained from the source, reCAPTCHAs, or pagination.
Full article about it.
Live example. To fetch further items, add the query param ?next= with the value of the JSON field next that the API provides.
const MdFetch = async (name) => {
  const res = await fetch(
    `https://api.rss2json.com/v1/api.json?rss_url=https://medium.com/feed/${name}`
  );
  return await res.json();
};

const data = await MdFetch('@chawki726');
To get your posts as JSON objects, replace @USERNAME with your username:
https://api.rss2json.com/v1/api.json?rss_url=https://medium.com/feed/@USERNAME
With that REST method you would do this: GET https://api.medium.com/v1/users/{{userId}}/publications and this would return the title, image, and the item's URL.
Further details: https://github.com/Medium/medium-api-docs#32-publications .
You can also add "?format=json" to the end of any URL on Medium and get useful data back.
Use this URL; it will return your posts in JSON format. Replace studytact with your feed name:
https://api.rss2json.com/v1/api.json?rss_url=https://medium.com/feed/studytact
I have built a basic function using AWS Lambda and AWS API Gateway, if anyone is interested. A detailed explanation is found in this blog post here, and the repository for the Lambda function built with Node.js is found here on GitHub. Hopefully someone here finds it useful.
(Updating the JS Fiddle and the Clay function that explains it as we updated the function syntax to be cleaner)
I wrapped the GitHub package @mark-fasel was mentioning below into a Clay microservice that enables you to do exactly this:
Simplified Return Format: https://www.clay.run/services/nicoslepicos/medium-get-user-posts-new/code
I put together a little fiddle, since a user was asking how to use the endpoint in HTML to get the titles for their last 3 posts:
https://jsfiddle.net/h405m3ma/3/
You can call the API as:
curl -i -H "Content-Type: application/json" -X POST -d '{"username":"nicolaerusan"}' https://clay.run/services/nicoslepicos/medium-get-users-posts-simple
You can also use it easily in your node code using the clay-client npm package and just write:
Clay.run('nicoslepicos/medium-get-user-posts-new', { "profile": "profileValue" })
  .then((result) => {
    // Do what you want with the returned result
    console.log(result);
  })
  .catch((error) => {
    console.log(error);
  });
Hope that's helpful!
Check this one; you will get all the info about your own posts.
mediumController.getBlogs = (req, res) => {
  parser('https://medium.com/feed/@profileName', function (err, rss) {
    if (err) {
      console.log(err);
    }
    var stories = [];
    for (var i = rss.length - 1; i >= 0; i--) {
      var new_story = {};
      new_story.title = rss[i].title;
      new_story.description = rss[i].description;
      new_story.date = rss[i].date;
      new_story.link = rss[i].link;
      new_story.author = rss[i].author;
      new_story.comments = rss[i].comments;
      stories.push(new_story);
    }
    console.log('stories:');
    console.dir(stories);
    res.status(200).json({
      Data: stories
    });
  });
};
I have created a custom REST API to retrieve the stats of a given post on Medium. All you need is to send a GET request to my custom API, and you will receive the stats as a JSON object, as follows:
Request :
curl https://endpoint/api/stats?story_url=THE_URL_OF_THE_MEDIUM_STORY
Response:
{
  "claps": 78,
  "comments": 1
}
The API responds within a reasonable time (< 2 sec); you can find more about it in the following Medium article.

Modify URL before loading page in firefox

I want to prefix URLs which match my patterns. When I open a new tab in Firefox and enter a matching URL the page should not be loaded normally, the URL should first be modified and then loading the page should start.
Is it possible to modify a URL through a Mozilla Firefox add-on before the page starts loading?
Browsing the HTTPS Everywhere add-on suggests the following steps:
Register an observer for the "http-on-modify-request" observer topic with nsIObserverService
Proceed if the subject of your observer notification is an instance of nsIHttpChannel and subject.URI.spec (the URL) matches your criteria
Create a new nsIStandardURL
Create a new nsIHttpChannel
Replace the old channel with the new. The code for doing this in HTTPS Everywhere is quite dense and probably much more than you need. I'd suggest starting with chrome/content/IOUtils.js.
Note that you should register a single "http-on-modify-request" observer for your entire application, which means you should put it in an XPCOM component (see HTTPS Everywhere for an example).
The following articles do not solve your problem directly, but they do contain a lot of sample code that you might find helpful:
https://developer.mozilla.org/en/Setting_HTTP_request_headers
https://developer.mozilla.org/en/XUL_School/Intercepting_Page_Loads
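A minimal sketch of steps 1 and 2 (legacy XPCOM add-on code; the URL pattern is a placeholder and steps 3-5 are left as a stub, since the next answer has a fuller example):
var observerService = Components.classes["@mozilla.org/observer-service;1"]
  .getService(Components.interfaces.nsIObserverService);

var urlRewriteObserver = {
  observe: function (subject, topic, data) {
    if (topic !== "http-on-modify-request") return;
    var channel = subject.QueryInterface(Components.interfaces.nsIHttpChannel);
    if (/^https?:\/\/example\.com\//.test(channel.URI.spec)) { // placeholder criteria
      // Steps 3-5: create the new nsIStandardURL/nsIHttpChannel and
      // replace the old channel here
    }
  }
};

observerService.addObserver(urlRewriteObserver, "http-on-modify-request", false);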
Thanks to Iwburk, I have been able to do this.
We can do this by overriding the nsIHttpChannel with a new one. Doing this is slightly complicated, but luckily the https-everywhere add-on implements it to force an https connection.
https-everywhere's source code is available here
Most of the code needed for this is in the files
IOUtil.js
ChannelReplacement.js
We can work with the above files alone, provided we have the basic variables like Cc, Ci set up and the function xpcom_generateQI defined.
var httpRequestObserver = {
  observe: function (subject, topic, data) {
    if (topic == "http-on-modify-request") {
      var httpChannel = subject.QueryInterface(Components.interfaces.nsIHttpChannel);
      var requestURL = subject.URI.spec;
      if (isToBeReplaced(requestURL)) {
        var newURL = getURL(requestURL);
        ChannelReplacement.runWhenPending(subject, function () {
          // Replace the pending channel with one for the rewritten URL
          var cr = new ChannelReplacement(subject, newURL);
          cr.replace(true, null);
          cr.open();
        });
      }
    }
  },

  get observerService() {
    return Components.classes["@mozilla.org/observer-service;1"]
      .getService(Components.interfaces.nsIObserverService);
  },

  register: function () {
    this.observerService.addObserver(this, "http-on-modify-request", false);
  },

  unregister: function () {
    this.observerService.removeObserver(this, "http-on-modify-request");
  }
};

httpRequestObserver.register();
The code replaces the request rather than redirecting it.
While I have tested the above code well enough, I am not sure about its inner workings. As far as I can make out, it copies all the attributes of the requested channel and sets them on the replacement channel, after which the output requested by the original request is supplied through the new channel.
P.S. I had seen a SO post in which this approach was suggested.
You could listen for the page load event, or maybe the DOMContentLoaded event, instead. Or you could make an nsIURIContentListener, but that's probably more complicated.
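A sketch of that simpler approach (note the original page has already started loading by the time this fires, unlike the observer approach above; the pattern and rewrite rule are placeholders):
document.addEventListener("DOMContentLoaded", function () {
  if (/^https?:\/\/example\.com\//.test(location.href)) { // placeholder pattern
    // Placeholder rewrite: navigate to the prefixed URL instead
    location.replace("https://example.org/prefix?u=" + encodeURIComponent(location.href));
  }
});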
Is it possible to modify a URL through a Mozilla Firefox add-on before the page starts loading?
YES, it is possible.
Use page-mod of the Addon-SDK by setting contentScriptWhen: "start"
Then, after completely preventing the document from getting parsed, you can either:
1. fetch a different document from the same domain and inject it in the page, or
2. after some document.URL processing, do a location.replace() call
Here is an example of doing 1: https://stackoverflow.com/a/36097573/6085033
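And a minimal sketch of option 2 with the Addon-SDK (the include pattern and the rewrite rule are placeholders):
var pageMod = require("sdk/page-mod");

pageMod.PageMod({
  include: "*.example.com",   // placeholder match pattern
  contentScriptWhen: "start", // inject before the document is parsed
  contentScript: [
    '// placeholder rewrite: prefix the URL and replace the location',
    'var target = "https://example.org/prefix?u=" + encodeURIComponent(document.URL);',
    'window.stop();',
    'location.replace(target);'
  ].join('\n')
});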
