Get invisible web page info with BeautifulSoup - parsing

I am trying to get some information from the site "https://www.estimize.com/jpm/fq3-2016#chart=table", more precisely all the individual estimates listed at the bottom of the page. The page shows only the first 30, and you have to press the "Show All" button manually to load another 30, and so on.
Here is my code so far:
from urllib import urlopen
from bs4 import BeautifulSoup
html = urlopen("https://www.estimize.com/jpm/fq3-2016#chart=table")
soup = BeautifulSoup(html.read(), "html.parser")
print(soup)
The printed output contains this fragment:
"totalCount":142,"total_estimates_showing":30,"
Is it possible to change this so that all the estimates are printed?

If you look at the AJAX request the site makes when you click the "Show All" button, you will see that it fetches this URL:
"https://www.estimize.com/jpm/fq3-2016?sort=rank&direction=asc&estimates_per_page=142&show_confirm=false&selected_user=&_=1490697888459"
Request and parse that URL directly (note estimates_per_page=142, which matches totalCount) to get all the results in one response.
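A minimal sketch of that idea, assuming the page source still contains the "totalCount" fragment quoted in the question. The helper names extract_total_count and build_all_estimates_url are hypothetical, and the trailing "_" parameter from the captured URL is left out on the assumption that it is only a cache-busting timestamp:

```python
import re
from urllib.parse import urlencode

BASE_URL = "https://www.estimize.com/jpm/fq3-2016"


def extract_total_count(page_source):
    """Return the integer that follows "totalCount": in the page source, or None."""
    match = re.search(r'"totalCount":\s*(\d+)', page_source)
    return int(match.group(1)) if match else None


def build_all_estimates_url(total_count, base_url=BASE_URL):
    """Build the "Show All" URL with estimates_per_page set to total_count."""
    params = {
        "sort": "rank",
        "direction": "asc",
        "estimates_per_page": total_count,
        "show_confirm": "false",
        "selected_user": "",
    }
    return base_url + "?" + urlencode(params)
```

The resulting URL can then be passed to urlopen and BeautifulSoup exactly as in the question's code.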

Could anyone scrape an element with Jsoup?

I'm trying to scrape this link using Jsoup with Kotlin/Java, and I have a problem scraping the players section (under Current Squad). Could anyone parse it?
You cannot access the information directly using only the response from that link.
You can build a JSON object from the HTTP responses of https://stats.fn.sportradar.com/betsgi/en/America:Argentina:Buenos_Aires/gismo/stats_team_squad/2817 and https://stats.fn.sportradar.com/betsgi/en/America:Argentina:Buenos_Aires/gismo/stats_teamplayer_facts/2817/42556.
As an example, in Python you can get the minutes played by each player as follows:
import json
from urllib.request import urlopen

# Fetch the squad list and the player facts as JSON
f = urlopen('https://stats.fn.sportradar.com/betsgi/en/America:Argentina:Buenos_Aires/gismo/stats_team_squad/2817')
f2 = urlopen('https://stats.fn.sportradar.com/betsgi/en/America:Argentina:Buenos_Aires/gismo/stats_teamplayer_facts/2817/42556')
j = json.loads(f.read())
j2 = json.loads(f2.read())
plrs = j['doc'][0]['data']['players']
for plr in plrs:
    print('=========================')
    print(plr['name'])
    try:
        # The facts are keyed by the player's _id converted to a string
        print('minutes played:' + str(j2['doc'][0]['data'][str(plr['_id'])]['stats']['total']['minutes_played']))
    except KeyError:
        # Skip players that have no minutes_played entry
        pass
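The same lookup can be sketched as a function that works on already-parsed JSON and uses dict.get instead of try/except. The function name minutes_by_player and the sample dictionaries are made up for illustration; they only mirror the nesting used in the snippet above and are not real responses:

```python
def minutes_by_player(squad_doc, facts_doc):
    """Map player name to minutes played, skipping players without that stat."""
    players = squad_doc["doc"][0]["data"]["players"]
    facts = facts_doc["doc"][0]["data"]
    result = {}
    for player in players:
        # Facts are keyed by the player's _id as a string
        entry = facts.get(str(player["_id"]), {})
        minutes = entry.get("stats", {}).get("total", {}).get("minutes_played")
        if minutes is not None:
            result[player["name"]] = minutes
    return result


# Hypothetical sample data with the same shape as the snippet above expects
sample_squad = {"doc": [{"data": {"players": [
    {"_id": 1, "name": "Player A"},
    {"_id": 2, "name": "Player B"},
]}}]}
sample_facts = {"doc": [{"data": {
    "1": {"stats": {"total": {"minutes_played": 90}}},
}}]}

print(minutes_by_player(sample_squad, sample_facts))  # {'Player A': 90}
```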

Input Control Parameters not passing in Jaspersoft Reference Hyperlink to Dashboard

I have a jaspersoft report (line chart built in studio), and I want the data series in the chart to be hyperlinks that drilldown to open a dashboard.
Based on this wiki page I was able to create Reference hyperlinks so that clicking on any data series in the chart opens the correct dashboard. But I cannot get the Input Control parameters to pass correctly.
The URL when I load my dashboard directly from the repository (not by clicking hyperlinks in my line chart report) is
http://ddevrpt:8080/jasperserver-pro/dashboard/viewer.html#%2Fpublic%2FP2%2FMidcap%2FFinancial%2FDashboards%2FWell_Profile
The URL generated when I do not include input controls in my hyperlink reference expression is the same:
http://ddevrpt:8080/jasperserver-pro/dashboard/viewer.html#%2Fpublic%2FP2%2FMidcap%2FFinancial%2FDashboards%2FWell_Profile
JRXML:
<itemHyperlink hyperlinkType="Reference">
<hyperlinkReferenceExpression><![CDATA["./dashboard/viewer.html#%2Fpublic%2FP2%2FMidcap%2FFinancial%2FDashboards%2FWell_Profile"]]></hyperlinkReferenceExpression>
</itemHyperlink>
The URL generated when I do include Input Control parameter values is different, but the dashboard still loads empty (the parameter values are not passed):
http://ddevrpt:8080/jasperserver-pro/dashboard/viewer.html?hidden_WellConcatenated_0=49005478.1:%20DILTS%2044-15%20TH&hidden_OccurrenceDate_1=2015-09-28%2000:00:00.0&hidden_OccurrenceDate_2=2015-10-05%2000:00:00.0#%2Fpublic%2FP2%2FMidcap%2FFinancial%2FDashboards%2FWell_Profile
JRXML:
<itemHyperlink hyperlinkType="Reference">
<hyperlinkReferenceExpression><![CDATA["./dashboard/viewer.html#%2Fpublic%2FP2%2FMidcap%2FFinancial%2FDashboards%2FWell_Profile"+"&hidden_WellConcatenated_0=" + $V{WellConcatenated_0} + "&hidden_OccurrenceDate_1=" + $P{RecordDate_0_1} + "&hidden_OccurrenceDate_2=" + $P{TimeStampMinusOneWeek}]]></hyperlinkReferenceExpression>
</itemHyperlink>
I know I am naming the input controls correctly, because if I change my link type to ReportExecution and link to a simple report that uses those input controls, the proper report opens and the input control values are passed correctly.
I would also appreciate any other references on drilling down to a dashboard from a report.
I'm working with 6.3 and was able to resolve the issue with a small modification to the HyperlinkReferenceExpression syntax.
Specifically, I removed the "hidden_" prefix from the input control resource IDs:
HyperlinkReferenceExpression:
original syntax:
"./dashboard/viewer.html#%2Fpublic%2FP2%2FMidcap%2FFinancial%2FDashboards%2FWell_Profile"
+"&hidden_WellConcatenated_0=" + $V{WellConcatenated_0}
+"&hidden_OccurrenceDate_1=" + $P{RecordDate_0_1}
+"&hidden_OccurrenceDate_2=" + $P{TimeStampMinusOneWeek}
modified syntax:
"./dashboard/viewer.html#%2Fpublic%2FP2%2FMidcap%2FFinancial%2FDashboards%2FWell_Profile"
+"&WellConcatenated_0=" + $V{WellConcatenated_0}
+"&OccurrenceDate_1=" +$P{RecordDate_0_1}
+"&OccurrenceDate_2=" + $P{TimeStampMinusOneWeek}
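To see what string the modified expression produces, here is a rough Python sketch that assembles the same reference. The function name dashboard_reference is hypothetical, and percent-encoding the parameter values is an assumption on my part; the expression above concatenates the raw values, and I have not verified which form the server expects:

```python
from urllib.parse import quote


def dashboard_reference(dashboard_uri, controls):
    """Build ./dashboard/viewer.html#<encoded uri>&<name>=<value>... as in the modified syntax."""
    # Encode the repository path completely, so "/" becomes %2F
    fragment = quote(dashboard_uri, safe="")
    # Assumption: values are percent-encoded; names are used without the "hidden_" prefix
    pairs = "".join(
        "&{}={}".format(name, quote(str(value), safe=""))
        for name, value in controls.items()
    )
    return "./dashboard/viewer.html#" + fragment + pairs


link = dashboard_reference(
    "/public/P2/Midcap/Financial/Dashboards/Well_Profile",
    {"WellConcatenated_0": "49005478.1: DILTS 44-15 TH"},
)
print(link)
```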
I'm assuming you're running 6.4.0. I'm not sure that approach is still valid.
You will likely need to register a custom hyperlink handler in your report in order to drill-down to a Dashboard. See here for more details: http://community.jaspersoft.com/wiki/how-use-custom-hyperlink-handler-dashboard-jasperreports-server
And here: http://www.helicaltech.com/use-custom-hyperlink-handler-with-a-dashboard-in-jasperreports-server/
Let me know if that works for you on 6.4.0!
My solution is not a good one, but it worked for me.
In my dataset query I used the following:
Select p.printer_name, p.display_name, $P{start_date_1} as start_date_param, ....
Then use start_date_param as a field in the hyperlink.

Python code to display default page name for url

Is there a way in Python to display what the default initial page is for www.xyz.com when there isn't an index.html or default.html page?
For example I am trying to use Beautiful Soup to do the following:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.example.com/index.html")
bsObj = BeautifulSoup(html.read())
print(bsObj.h1)
But index.html and default.html do not exist.
How do I find out what the default page is when I go to www.example.com?
You can just use urlopen("http://www.example.com/"), which will open the default page served by the webserver.
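A small sketch of that suggestion, using only the standard library. The helper names site_root and fetch_default_page are hypothetical. Note that the server may not reveal which file backs "/"; the sketch only reports the final URL after any redirects, together with the page body:

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def site_root(url):
    """Reduce any page URL to the root of its site, e.g. .../index.html -> .../"""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))


def fetch_default_page(url):
    """Open the site root and return (final URL after redirects, raw body bytes)."""
    with urlopen(site_root(url)) as response:
        return response.url, response.read()
```

The body returned by fetch_default_page can be handed to BeautifulSoup in the same way as html.read() in the question.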

Why copy & paste dart2js' output to console doesn't work?

1. Compile the following code (test.dart, shown below) with dart2js -o test.js test.dart
2. Open test.js and copy its content
3. Open a browser and go to stackoverflow.com
4. Open the dev tools and go to the console tab
5. Paste the content of test.js into the console and hit Enter
I expect it to click the "Ask Question" button, but it doesn't. Why?
(The reason I want to do this is that I need some JS, but I don't want to touch JS.)
// test.dart
import 'dart:html';
void main() {
  document.querySelector('#nav-askquestion').click();
}
I didn't dig very deep, but I had the impression that the generated code registers itself for a script-loaded event and then executes "main" as the event handler. I don't know JavaScript and browser behavior well enough to understand how this works.
I got it working by running this code in the dev console
(function runTest() {
  var s = document.createElement("script");
  s.type = "text/javascript";
  s.src = "test.js";
  document.body.appendChild(s);
})();
where test.js (the output generated by dart2js) is in the same directory as index.html.
The code dynamically adds a script tag that references the dart2js output, and the code in test.js is then executed.

Clicking on a button in a file download popup Window using watir-webdriver

Folks,
I am having an interesting problem. I have some JavaScript on the web page which opens a popup window when clicked. I am trying to find the title of the window so that I can click on it; the window has two buttons, "Cancel" and "Save File". Here is what I am doing in my Ruby code:
@windows = @browser.windows # this should return an array, so @windows is an array
p @windows[1] # output of this is #<Watir::Window:0x115c796cc located=true>
puts "This is the title of the second window---->" + @windows[1].title # this puts blank
What I don't understand is why my window object shows no attributes when I print it with p @windows[1], and why the title is blank when I call @windows[1].title. My goal is to click the "Download the file" button in the popup window.
This is the piece of HTML that I have:
<td>
<a onclick="window.open(this.href);return false;" href="/search/searches/1563/exports/1017">6175-1017-20120418181521-karnire.eml.zip</a>
</td>
The other thing that I tried is doing something like this in my code:
@windows = @browser.windows
@browser.window(:title => @windows[1].title).use do
  @browser.button(:value => "Save File").click
end
For the above I get this error:
Unable to locate window "{639686d9-4641-aa41-bf6f-3ba89659d921}" (Selenium::WebDriver::Error::NoSuchWindowError)
I'd start with the information provided here on the watir-webdriver blog.
If that does not work, try looking at the watir-wiki page on file downloads.
It's a little dated (it has not been updated in a year, and it uses AutoIt rather than RAutomation), but it might be enough to get you going.
It might be that Watir isn't waiting for the window to load. Try putting a sleep(10) after the click.
