I'm trying to find a workaround in Google Sheets. I'm pulling data from finviz.com to build out custom stock screeners, but the site uses pagination, so only the first 20 rows are returned. I've checked that if I click the 2nd page in the pagination section of the table, only the URL changes, indicating the first row of the new page: if the first results page has 20 rows, the second page's URL gains a parameter like "r=21", the row number of the first result on that page. How would I go about pulling all of the data once the table is paginated? Also, checking the page source, these new parameters are stored in hrefs, so if the pagination had 3 result pages, then within the <table/> elements we can see the new URLs in hrefs, for example:
<table>
<a href="screener.ashx?v=111&f=targetprice_a5&r=21"/>
<a href="screener.ashx?v=111&f=targetprice_a5&r=41"/>
<a href="screener.ashx?v=111&f=targetprice_a5&r=61"/>
</table>
Take note that only one new parameter, "r=21", is added to the URL; the rest are consistent across the different result pages.
Is this even possible with google sheets?
Here's what I have. The goal is to build out stock market screeners that update every 3 minutes, which allows an integration/view from Notion.
=QUERY(IMPORTHTML("https://finviz.com/screener.ashx?v=111&f=cap_smallover,earningsdate_thismonth,fa_epsqoq_o15,fa_grossmargin_o20,sh_avgvol_o750,sh_curvol_o1000,ta_perf_52w10o,ta_rsi_nob50&ft=4&o=perfytd&ar=180","Table","19"),"SELECT Col1,Col2,Col7,Col8,Col9,Col10,Col11")
try:
=QUERY({
IMPORTHTML("https://finviz.com/screener.ashx?v=111&f=cap_smallover,earningsdate_thismonth,fa_epsqoq_o15,fa_grossmargin_o20,sh_avgvol_o750,sh_curvol_o1000,ta_perf_52w10o,ta_rsi_nob50&ft=4&o=perfytd&ar=180","Table","19");
IMPORTHTML("https://finviz.com/screener.ashx?v=111&f=cap_smallover,earningsdate_thismonth,fa_epsqoq_o15,fa_grossmargin_o20,sh_avgvol_o750,sh_curvol_o1000,ta_perf_52w10o,ta_rsi_nob50&ft=4&o=perfytd&r=21&ar=180","Table","19");
IMPORTHTML("https://finviz.com/screener.ashx?v=111&f=cap_smallover,earningsdate_thismonth,fa_epsqoq_o15,fa_grossmargin_o20,sh_avgvol_o750,sh_curvol_o1000,ta_perf_52w10o,ta_rsi_nob50&ft=4&o=perfytd&r=41&ar=180","Table","19");
IMPORTHTML("https://finviz.com/screener.ashx?v=111&f=cap_smallover,earningsdate_thismonth,fa_epsqoq_o15,fa_grossmargin_o20,sh_avgvol_o750,sh_curvol_o1000,ta_perf_52w10o,ta_rsi_nob50&ft=4&o=perfytd&r=61&ar=180","Table","19")},
"select Col1,Col2,Col7,Col8,Col9,Col10,Col11 where Col1 matches '\d+'", 1)
Related
In a google sheet linked with a google form, I am putting
=ARRAYFORMULA(Responses!$A$2:R500)
in a blank sheet(namely dataList) to copy raw data from the response sheet so it is more readable and manageable.
After submitting some test data, I need to clear it and publish the form for production use. If I simply select the rows and hit "delete" on my keyboard, then when a new submission comes in it will not appear on the first row (or row 2); instead the form remembers how many rows there were and puts the new data on the next row, leaving the first rows blank on both sheets, which is unacceptable. So instead I select the rows with test data in the Responses sheet and delete the rows entirely:
Now when a new submission comes in, it does appear on row 2 in the Responses sheet; however, when I go to my "dataList" sheet, it looks like this:
The A1 notation, which is supposed to be absolute, has been altered, so my dataList sheet doesn't receive the new submission data from the Responses sheet.
How do I deal with this unwanted behavior?
you can freeze it like:
=INDIRECT("Responses!A2:R500")
instead of your:
=ARRAYFORMULA(Responses!$A$2:R500)
If you want to avoid string ranges or INDIRECT, you could use INDEX:
=INDEX(Responses!A:A,2):INDEX(Responses!R:R,500)
This always takes the second row of A:A and the 500th row of R:R, regardless of deleted rows.
Advantage:
This can be drag filled. It can change based only on certain conditions.
I have a Google Sheet with a main master sheet, including a column where users fill in their name to show they are "working" on that row. That row then gets populated to their own tab based on a =QUERY(Master!A3:AA,"select * Where L='Name'") for each of the users' tabs; there are 8 tabs in total where users are updating information. This is already quite a bit of processing on Google's part, so I am trying to generate a separate Google Sheet that pulls in the information the users enter on each of their tabs, so that management can monitor that sheet for updates and both sheets will run a lot faster/smoother.
I have tried using a VLOOKUP with this syntax: =vlookup(A3,importrange("sheetID",{"Name1!$A$3:$N";"Name2!$A$3:$N";"Name3!$A$3:$N";"Name4!$A$3:$N";"Name5!$A$3:$N";"Name6!$A$3:$N";"Name7!$A$3:$N";"Name8!$A$3:$N"}),12,FALSE), which gives me an #N/A error: "cannot find Value '1' in VLOOKUP evaluation".
I have also tried using a =QUERY({importrange("sheetID"x8 with the ranges)}, "Select Col12,Col13,Col14 where Col2 matches '^.\*($" &B3 & ").\*$'")
That only returns headers. I am trying to get the query to find the unique key in column A and then return what is in columns 12-14, but that doesn't seem to work either. Columns 1-11 are static; columns 12-14 are what I am trying to populate for management, i.e., the work the staff is inputting on each of their tabs.
I can get the query working if I keep it on the same worksheet the staff is working on, but then it bogs down the whole sheet, so I would like to keep it separate if possible. Any ideas? I can't provide a sample sheet at this time since it contains financial info, but I can add more details if I know what to look for.
your formula should be:
=VLOOKUP(A3, {
IMPORTRANGE("sheetID1", "Name1!A3:N");
IMPORTRANGE("sheetID2", "Name2!A3:N");
IMPORTRANGE("sheetID3", "Name3!A3:N");
IMPORTRANGE("sheetID4", "Name4!A3:N");
IMPORTRANGE("sheetID5", "Name5!A3:N");
IMPORTRANGE("sheetID6", "Name6!A3:N");
IMPORTRANGE("sheetID7", "Name7!A3:N");
IMPORTRANGE("sheetID8", "Name8!A3:N")}, 12, 0)
keep in mind that every IMPORTRANGE first needs to be run as a standalone formula, where you connect your sheets by allowing access. only then can you use the formula above
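For intuition, the stacking-plus-lookup the formula performs can be sketched outside Sheets. A rough Python analogue with made-up data standing in for the eight Name tabs; VLOOKUP returns the Nth column of the first stacked row whose first cell matches the key:

```python
# Rough analogue of {range1; range2; ...} stacking plus VLOOKUP:
# concatenate the rows from every imported range, then return the
# Nth column of the first row whose first cell equals the search key.
def vlookup(key, stacked_rows, index):
    """index is 1-based, matching VLOOKUP's column argument."""
    for row in stacked_rows:
        if row[0] == key:
            return row[index - 1]
    raise KeyError(key)  # Sheets would show #N/A instead

# Hypothetical data standing in for two of the Name tabs:
name1 = [["key1", "a", "b"], ["key2", "c", "d"]]
name2 = [["key3", "e", "f"]]
stacked = name1 + name2  # the {...; ...} vertical stack

print(vlookup("key3", stacked, 2))  # -> "e"
```

This is also why the ranges must start at the same column: the vertical stack only lines up if every imported range has the same width and puts the lookup key in its first column.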
I am trying to make a spreadsheet for my teachers that will assign them particular students to call each day based on how many periods they are absent during the day. I currently have a hyperlink set up on the Dashboard page so teachers can click their names and see a filter view of which calls they need to make for 8/24. Is it possible to make a single filter view that would simultaneously show where their name is assigned for 8/24 in Column H AND where their name is assigned for 8/25 in Column O? My goal is for them to click a single hyperlink on the Dashboard which takes them to all the calls they need to make for the week WITHOUT me having to make five separate links for each day of the week.
Solution
You should extend the Filter View range to cover the other columns you need to filter.
In this case, for the "Amber" Filter View, you should have Range: A1:P116. Then set the proper filter on columns H and O: Filter by values: Amber.
With the filter view still open, copy the URL in your browser and paste it in the HYPERLINK formula you have in the "Dashboard" Sheet.
I'm at my wit's end trying to figure out why filtering/querying in Google Sheets is so broken. I have a sheet with some data about practice exams I'm taking, and I'm attempting to pull some of that data into another sheet for calculating statistics. I've made a shareable document with the pertinent data so you can see what I mean.
My raw data is in the TestScores sheet and I made a TESTSTATS sheet to test different methods of pulling data from TestScores. In my example, I'm only trying to pull unique dates from range TestScores!B2:B and I've added a few different methods to do so in TESTSTATS (removed the equal sign from each one so each can be tested on its own by putting in the equal sign).
The methods I've tried:
=UNIQUE(TestScores!B2:B)
=UNIQUE(FILTER(TestScores!B2:B, TestScores!B2:B<>""))
=UNIQUE(FILTER(TestScores!B2:B, TestScores!B2:B<>0))
=UNIQUE(FILTER(TestScores!B2:B, NOT(ISBLANK(TestScores!B2:B))))
=UNIQUE(QUERY(TestScores!B2:B, "select B"))
=ARRAY_CONSTRAIN(UNIQUE(QUERY(TestScores!B2:B, "select B")), ROWS(UNIQUE(TestScores!B2:B))+1,5)
You'll see that each one, when activated by adding the = in front of the formula, returns the proper data but also appends 500 empty-looking rows that are in fact blank strings (""). This makes the output difficult to work with because many calculations in my sheet depend on one another. I also do not want to specify an explicit end for my ranges and would prefer to keep them open-ended (B2:B instead of B2:B17) so everything updates automatically as new records are added.
What am I doing wrong? Why is the returned data appended with a bunch of empty cells, and why 500 specifically? It seems arbitrary considering my source data is 29 or 30 rows, depending on whether you include headers.
Starting with only two rows in TESTSTATS, more rows have to be added somewhere to place the output. It seems Google chose to add them 500 at a time (from the last required cell). "Why?" would have to be a matter for Google.
If you know 14 rows are required for the output and increase TESTSTATS to 16 rows, no more rows will be added. Since you want room for expansion, extending to exactly 16 won't avoid further issues, but you could allow some slack, say 30 rows, and delete the few extras; then, if 30 becomes insufficient (when the sheet shoots up to, say, 540 rows), delete the rows not required and set the sheet size to, say, 60 rows, and so on.
What is the purpose of paging + next_page in the Twitter search API? They don't pivot around the data as one would expect.
I'm experimenting with the search API and noticed the following query changes over time.
This url was returned from search api "next_page".
http://search.twitter.com/search.json?page=3&max_id=192123600919216128&q=IndieFilmLove&rpp=100&include_entities=1
Hit refresh on a trending topic and you will notice that the paging is not constant.
When iterating through all 15 pages of a trending topic, you run into duplicates among the first few items on each page.
It seems the page parameter and next_page are useless if you are aggregating data: page 1 will be page 3 within a few minutes on a trending topic, so you end up with duplicates in the first 1-3 items of each page as new data pushes the pages down.
The only way to avoid this is NOT to use next_page and/or the page parameter, as discussed here:
https://dev.twitter.com/discussions/3809
I pass the oldest id from my existing result set as the max_id. I do
not pass a page.
Which approach is better for aggregating data?
I could use next_page but skip statuses already processed in this run of 15 pages,
or
use max_id only and skip the already-processed statuses.
==============
In their Working with Timelines document at http://dev.twitter.com/docs/working-with-timelines, Twitter recommends cursoring with the max_id parameter in preference to attempting to step through a timeline page by page.
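The max_id approach boils down to a loop: request a page, record the lowest id seen, and ask for the next page with max_id set just below it. Because you cursor by id rather than by page number, tweets arriving mid-scan can't shift your pages, so no duplicates appear. A minimal Python simulation; the fetch function is a stand-in for the real search call, not the actual API (the old search.twitter.com endpoint is long gone):

```python
# Simulate max_id cursoring over a stream that grows while we iterate.
def fetch(tweets, max_id, rpp):
    """Stand-in for the search API: newest-first tweets with id <= max_id."""
    return [t for t in sorted(tweets, reverse=True) if t <= max_id][:rpp]

def collect_all(tweets, rpp=3):
    results, max_id = [], float("inf")
    while True:
        page = fetch(tweets, max_id, rpp)
        if not page:
            break
        results.extend(page)
        max_id = page[-1] - 1  # next request: strictly older tweets only
        tweets.append(max(tweets) + 1)  # a new tweet arrives mid-scan
    return results

got = collect_all(list(range(1, 11)))  # tweet ids 1..10
assert len(got) == len(set(got))  # no duplicates despite new arrivals
```

With page-based iteration the same simulation would re-fetch the items that new arrivals pushed onto the next page, which is exactly the 1-3 duplicates per page described in the question.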