Receive notification when site server adds page - url

I've been doing some programming off and on for my brother, who is a stock trader. I'm wondering if it is possible to receive a push notification when a site server adds a page. For example, the site smallcapfortunes.com frequently adds pages that are simple extensions off the main URL. For example, the site regularly adds pages under URLs such as /neca/, /stev/, etc.
Are there existing methods to execute this? Or is this something I need to write myself? Has anyone here written anything like that?
I know there are existing sites to track basic updates to a single page. In my research, though, I haven't found anything like this.
Please let me know if there are any other details I need to provide.

Generally you can only get a push notification if a specific website offers that service.
Some websites publish a structured (XML) site map. If the one you're interested in does that, you could pull that sitemap on a regular basis and look for differences.
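A minimal sketch of the sitemap-diff idea, assuming the site publishes a standard sitemaps.org XML sitemap. The function names and the example.com URLs are made up for illustration; fetching the file (for example with urllib.request.urlopen) and storing the previous copy are left out, so the sample works on in-memory strings.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text}


def new_urls(previous_xml, current_xml):
    """Return URLs that are in the current sitemap but not the previous one."""
    return sorted(sitemap_urls(current_xml) - sitemap_urls(previous_xml))


# Hypothetical sitemap snapshots from two consecutive polls.
OLD = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/neca/</loc></url>
</urlset>"""

NEW = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/neca/</loc></url>
  <url><loc>http://example.com/stev/</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in new_urls(OLD, NEW):
        print("new page:", url)
```

Run on a schedule (cron or similar), anything returned by new_urls could then be sent out as an email or push message.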

You'll most likely want to use http://scrapy.org/ to crawl the site and find new URLs such as /neca/ and /stev/, then run the script on a schedule.
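As a rough illustration of the crawl-and-compare step, here is a sketch that uses only the Python standard library rather than Scrapy. It assumes the new pages are linked from some page you already fetch and that they look like a single lowercase path segment such as /neca/; the regular expression, class name, and function name are assumptions for this example.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


# Assumed shape of the pages of interest: one lowercase segment, e.g. /neca/
TICKER_PATH = re.compile(r"^/[a-z]+/$")


def find_new_pages(html, known_paths):
    """Return matching link paths in the page that are not in known_paths."""
    collector = LinkCollector()
    collector.feed(html)
    found = {href for href in collector.hrefs if TICKER_PATH.match(href)}
    return sorted(found - set(known_paths))


if __name__ == "__main__":
    page = '<a href="/neca/">NECA</a> <a href="/stev/">STEV</a>'
    print(find_new_pages(page, {"/neca/"}))
```

This only finds pages that are linked from somewhere; a page that is published without any link to it would need a different approach, such as the sitemap.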

Related

Should I be using clean URLs or URL parameter in my Web App?

Which URL structure should I use for my Web-app?
Clean URLs like this
http://dashboard.company.com/sales/john-doe/2017/32
or with URL parameters?
http://dashboard.company.com/sales?person=john.doe&year=2017&week=32
Are there any guidelines for this?
Edit to explain my question better: From the user's perspective, the two approaches are identical when it comes to sharing the URL. For the programming part they are not; I use Flask. I want to know if there's a standard way of handling it. Which is the better way?
Background
I am developing a Sales Dashboard for internal use at my company. It displays the sales of every salesperson. I want to make the reports shareable so that my colleagues can send each other their own page for a certain week number, or the boss can easily pull up the page for a meeting with the salesperson.
No SEO
Just to stress this point. I don't need clean URLs for SEO.
It doesn't really matter. Parameters added to the query string will be visible in the URL either way, but if you use a framework for your app, you should keep the URLs as clean as possible, because the parameters passed to the controllers should be specific rather than arbitrary data. If it is not a big project you can use query parameters, but make sure you won't soon end up with something like lang?en as a main parameter. It's up to you; read up on the differences between GET and POST and you'll get a better picture.
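To make the comparison concrete, here is a small sketch using only the Python standard library (not Flask) that reads the same three values from either URL style in the question. The function name report_params is made up for this example. In Flask itself, the first form would typically be declared as path variables in the route and the second read from request.args.

```python
from urllib.parse import urlsplit, parse_qs


def report_params(url):
    """Extract person, year, and week from either URL style."""
    parts = urlsplit(url)
    if parts.query:
        # Query-string style: /sales?person=...&year=...&week=...
        query = parse_qs(parts.query)
        return {
            "person": query["person"][0],
            "year": int(query["year"][0]),
            "week": int(query["week"][0]),
        }
    # Clean style: /sales/<person>/<year>/<week>
    _, person, year, week = parts.path.strip("/").split("/")
    return {"person": person, "year": int(year), "week": int(week)}


if __name__ == "__main__":
    print(report_params("http://dashboard.company.com/sales/john-doe/2017/32"))
    print(report_params(
        "http://dashboard.company.com/sales?person=john.doe&year=2017&week=32"
    ))
```

Both forms carry the same information. The clean form fixes the order and makes every part mandatory, while the query form makes it easier to leave a value out or add an optional filter later.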

Avoid robots from going into a www.domain.com/thishash when link posted to twitter, facebook

I'm building a service where people get notified (by email) when someone follows a link with the format www.domain.com/this_is_a_hash. The people who use this service can share this link in different places like Twitter, Tumblr, Facebook and more...
The main problem I'm having is that as soon as the link is shared on any of these platforms, a lot of requests to www.domain.com/this_is_a_hash come in to my server. Each time one of these requests hits my server, a notification is sent to the owner of this_is_a_hash, and of course this is not what I want. I only want to send notifications when real people visit this resource.
I found a very interesting article here that talks about the huge number of requests a server receives when a link is posted to Twitter...
So what I need is to keep search engines from hitting the "resource" URL, www.mydomain.com/this_is_a_hash
Any ideas? I'm using Rails 3.
Thanks!
If you don’t want these pages to be indexed by search engines, you could use a robots.txt to block these URLs.
User-agent: *
Disallow: /
(That would block all URLs for all user-agents. You may want to add a folder to block only those URLs inside of it. Or you could add the forbidden URLs dynamically as they get created, however, some bots might cache the robots.txt for some time so they might not recognize that a new URL should be blocked, too.)
It would, of course, only hold back those bots that are polite enough to follow the rules of your robots.txt.
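As a sketch of the folder variant, the rules below block only URLs under a hypothetical /r/ folder, and Python's standard urllib.robotparser shows how a polite bot would interpret them. The folder name, bot name, and helper function are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# robots.txt that blocks only the (hypothetical) /r/ folder for all bots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /r/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the rules permit this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed(ROBOTS_TXT, "ExampleBot", "http://www.domain.com/r/this_is_a_hash"))
    print(allowed(ROBOTS_TXT, "ExampleBot", "http://www.domain.com/about"))
```

With this layout, the shared links would need to live under the blocked folder, for example www.domain.com/r/this_is_a_hash, while the rest of the site stays crawlable.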
If your users would copy&paste HTML, you could make use of the nofollow link relationship type:
<a href="…" rel="nofollow">cute cat</a>
However, this would not be very effective, as even some of those search engines that support this link type still visit the pages.
Alternatively, you could require JavaScript to be able to click the link, but that’s not very elegant, of course.
But I assume they only copy&paste the plain URL, so this wouldn’t work anyway.
So the only chance you have is to decide if it’s a bot or a human after the link got clicked.
You could check for user-agents. You could analyze the behaviour on the page (e.g. how long it takes for the first click). Or, if it’s really important to you, you could force the users to enter a CAPTCHA to be able to see the page content at all. Of course you can never catch all bots with such methods.
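A minimal sketch of the user-agent check, written in Python for illustration (the question uses Rails, where the same test could run in the controller before sending the mail). The marker list is an assumption and far from complete, and since user agents can be spoofed, this only filters out bots that identify themselves.

```python
# Substrings that commonly appear in crawler user agents (assumed, incomplete).
BOT_MARKERS = ("bot", "crawler", "spider", "facebookexternalhit", "slurp")


def looks_like_bot(user_agent):
    """Guess whether a User-Agent header belongs to an automated client."""
    if not user_agent:
        # A missing User-Agent header is treated as a bot here.
        return True
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def should_notify(user_agent):
    """Send the owner a notification only for visits that look human."""
    return not looks_like_bot(user_agent)


if __name__ == "__main__":
    print(should_notify("Twitterbot/1.0"))
    print(should_notify("Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"))
```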
You could use analytics on the pages, like Piwik. They try to differentiate users from bots, so that only users show up in the statistics. I’m sure most analytics tools provide an API that would allow sending out mails for each registered visit.

SignalR dynamic endpoints

I'm working on a collaborative document editing application where clients can open up a document, post edits via a webservice, and subscribe to updates made to the document using SignalR. I'm experimenting with my SignalR setup and can't quite get what I want.
My gut tells me that I should shoot for a setup where each document has an endpoint with a name like "subscribe", so the full path would be "/documents/1/subscribe" for document 1 and "/documents/2/subscribe" for document 2. However, as far as I can tell, SignalR wants me to have a single endpoint, and then manage which clients get updates either by using Groups or by managing the list of subscribers for a document in code myself and send out individual messages.
As a result I have two questions.
Is there a way to do what I want to do with SignalR?
Is there a reason what I want to do is totally wrongheaded and silly?
Aside from "dedicated", friendly looking URLs I don't really see any value to this vs. just using groups. In fact, the only thing I could see it doing is adding more overhead because of the way the message bus internals of SignalR work with respect to scale.
If you did want to try this, the basic thing you'd need to figure out is registering routes on the fly per document. Phil Haack's RouteMagic does this for MVC, so I suppose it might be possible for SignalR route configurations as well.
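To illustrate the groups approach in the abstract, here is a small Python model of a single endpoint that routes updates by document. This is not SignalR's API; the class, its methods, and the outbox used to stand in for real connections are all made up for this sketch.

```python
from collections import defaultdict


class DocumentHub:
    """One shared endpoint; each document ID acts as a group of subscribers."""

    def __init__(self):
        self._groups = defaultdict(set)   # document_id -> set of client IDs
        self.outbox = defaultdict(list)   # client_id -> messages delivered

    def subscribe(self, client_id, document_id):
        self._groups[document_id].add(client_id)

    def unsubscribe(self, client_id, document_id):
        self._groups[document_id].discard(client_id)

    def publish(self, document_id, edit):
        """Deliver an edit only to clients subscribed to that document."""
        for client_id in self._groups.get(document_id, ()):
            self.outbox[client_id].append((document_id, edit))


if __name__ == "__main__":
    hub = DocumentHub()
    hub.subscribe("alice", 1)
    hub.subscribe("bob", 2)
    hub.publish(1, "insert 'hello' at 0")
    print(dict(hub.outbox))
```

The document ID travels as data in the subscribe call instead of in the URL, which is why per-document endpoints add little beyond nicer-looking paths.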

Tracking users' clicks and page visits in Rails

I would like to monitor users' page visits and clicks in my Rails app to make recommendations. My questions are:
Is there a Rails gem for this, or is Google Analytics the standard? If the latter is true, then how should I link a page visit to a particular user profile?
It is typical in Rails to have a <head> section in application.html.erb, which is shared by all pages. If I add the Google Analytics pageview tracking code to <head> in application.html.erb, will it be able to track all individual pages?
There are other ways, but the vast majority probably use Google Analytics. Several gems exist that help you integrate with GA to get at the data. See here: https://www.ruby-toolbox.com/categories/Web_Analytics.
Based on your first question, it seems you may want more insight than GA can provide. I've used ClickTale (http://www.clicktale.com) and Woopra (http://www.woopra.com) before, to good effect. This article lists several other alternatives, too - notice the high marks for Clicky: http://imimpact.com/web-stats-alternatives-to-google-analytics/.
Google Analytics (and almost all of these others) will take care of your second question automatically whenever the user loads a new page, since tracking is keyed by URL. That means that, although you put the GA script code in a single place, each unique page is tracked individually.
If you have AJAX requests that change that page without changing the URL, you'll need to dig in to the GA script API. Essentially you'll need to push a new url (possibly with a # in it) whenever you want to track an AJAX-driven link/button click. See here: http://davidwalsh.name/ajax-analytics
I am biased, but I would recommend checking out impressionist, if you need to integrate the page views into the app in real-time. With analytics you will always have some lag time and you are also relying on an external dependency. Impressionist is good if you need this kind of control, but if you are just looking for simple metrics and don't need to pull them into the app, then analytics is probably the way to go.
Check out Ahoy, at https://github.com/ankane/ahoy. With just a few lines of code in your app, you can track page views and tie them to user accounts.
You can further customize Ahoy to track custom events, on both the client (with JavaScript) and the server.
Ahoy does not depend on any third-party services.

Making a firefox/chrome extension from 0

I have a website for exchanging links, files, etc. To put it briefly, it's my 'version' of Twitter + Megaupload.
Users add links all the time, but I would like a user to be able to sync the bookmarks from his browser with the ones in his profile on my website.
Where should I look?
Basically I need to be able to:
- Access the bookmarks file (1)
- Send the URLs to my service (2)
- Maybe add a login feature (in the future)
I was googling this for ages a few weeks ago and I kind of gave up, because I'm OK with PHP and JS, but with these plugin languages I'm very lost. So I decided to post here, which always brings positive answers.
(1) -> I don't even know where to start
(2) -> I was thinking of having website.com/auto_import_no_confirm.php?url=[URL] and calling it in a for-each loop.
How many different languages and extension files do I have to work with? I really need any kind of tip on point (1).
feel like?
-edit-
Just found this -> https://developer.mozilla.org/En/Code_snippets/Bookmarks
which really looks like what I need, but where do I place this code?
thanks!
Might not be a bad question, but there are too many subtopics raised to answer that. (And there is too much tagspam as well. Break up your question into PHP- and Javascript-specific tasks, when you have devised the general application scheme.)
But to get started, download similar Firefox extensions (.xpi) and unzip them to inspect the general structure. You'll find example code for bookmark handling and invoking remote APIs pretty quickly. And basically you only need JavaScript for the extension itself. (It sounds like your extension does not need much UI.)
And there are many tutorials on designing Firefox addons: http://roachfiend.com/archives/2004/12/08/how-to-create-firefox-extensions/ or http://www.google.com/search?q=firefox+develop+an+xpi
The good news first: you won't need much more than JavaScript if you just want to access bookmarks and send them to a server, on either Firefox or Chrome.
But you'll still have to familiarize yourself with the browsers' APIs and learn how to develop extensions.
However, both Mozilla and Google provide all necessary information on their developer sites.
For Chrome, this is a good place to start; you'll find the API for bookmark access here.
The corresponding site for Firefox can be found here, with information on bookmark access here.

Resources