Google Sheets import multiple HTML table images - google-sheets

Summary
I'm looking to import a data table from a website that does not appear to have an API. The table is broken down into various images and text. The goal is to have all of the content available in a table that other sheets can then reference.
Issue
When I pull in the data, I get some of the text, none of the images, and a reference to another table. I looked up some options, but none of them yielded anything but blank cells.
I also tried to use the =IMAGE() formula with a direct link to the image URLs, but a portion of each URL is specific to the unit's release date, which makes it too dynamic to account for.
Formula
=IMPORTHTML("https://gamepress.gg/pokemonmasters/database/sync-pair-list","table",3)

Unfortunately, without an API it is going to be difficult to achieve what you are aiming for here. These are the main reasons why:
PROBLEMS AND WORKAROUNDS
This table has nested tables, which need to be accessed separately. If you take a look at: =IMPORTHTML("https://gamepress.gg/pokemonmasters/database/sync-pair-list","table",4)
you will see that table 4 of this HTML page holds the stats of one character from the main table. If you try 5 or 6, you will realise that the nested tables are not even numerically ordered and that you cannot reach them through the main table (i.e. something like mainTable[0].nestedTable). A laborious approach is to go through them one by one, finding each character's stat table and placing it next to that character. For this I recommend extracting only the name column of the main table so that you can align each stat table with its character. You can do this using: =INDEX(IMPORTHTML("https://gamepress.gg/pokemonmasters/database/sync-pair-list","table",3),0,1). See the INDEX documentation for more details.
IMPORTHTML cannot retrieve images or links, so it will be very difficult to get the images in the last columns. One way around this is, as you mentioned, to use IMAGE with the image's URL, like this: =IMAGE("https://gamepress.gg/pokemonmasters/sites/pokemonmasters/files/styles/30x30/public/2019-07/Electric.png?itok=fkRfkrFX"). See the IMAGE documentation for more on inserting images.
CONCLUSION
To sum up, there is no easy way to solve this problem. The closest you can get is by:
Importing the name column.
Figuring out which tables belong to which character and placing them next to the character's name.
Getting the image URL of each weakness and type and adding it to each character.
I am sorry this site does not have an API to make things smooth. Good luck with your project, and let me know if you need anything else or if anything was unclear.
Here you can find more information about IMPORTHTML
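To see why a flat table index ends up pointing at nested stat tables, here is a minimal Python sketch that numbers every table in a page in the order its opening tag appears. It assumes that is roughly how a flat index over tables behaves; the exact ordering IMPORTHTML uses on this page may differ. The sample HTML, TableIndexer and index_tables are made up for illustration.

```python
from html.parser import HTMLParser


class TableIndexer(HTMLParser):
    """Number every <table> in the order its opening tag appears."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <table> elements are currently open
        self.tables = []  # (1-based index, nesting depth) per table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            self.tables.append((len(self.tables) + 1, self.depth))

    def handle_endtag(self, tag):
        if tag == "table":
            self.depth -= 1


def index_tables(html):
    parser = TableIndexer()
    parser.feed(html)
    return parser.tables


# Hypothetical page: one main table whose rows each contain a stat table.
sample = """
<table><tr><td>Pair A<table><tr><td>HP 100</td></tr></table></td></tr>
<tr><td>Pair B<table><tr><td>HP 120</td></tr></table></td></tr></table>
"""

for number, depth in index_tables(sample):
    print(number, "nested" if depth > 1 else "top-level")
```

In this sample the main table gets index 1 and the two stat tables get 2 and 3, so each nested table has to be requested by its own number rather than through the main table.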

Related

Google Docs API and Dreaded Table Row Inserts

I've been playing around with Google Docs API and am stuck on being able to add a row to an existing table in a doc and fill that row (3 columns) with data.
Below is Pastebin file of Google Get which returns a huge JSON of pretty much everything in the doc (formatting, content etc.)
(Stack Overflow has an issue with me including the Pastebin file, so be ready for a huge file underneath here which probably won't fit)
This is a sample doc. If you check it out in a tool like https://jsoneditoronline.org/ (which I just used) to see the document structure, you'll note that it has 3 tables in total.
I've written some code that puts the start indexes of all the tables in the document into an array but I can't for the life of me figure out a clear explanation of how I can:
a) Insert a row (at the bottom of the first table for example)
b) Insert data into the first, second and 3rd column of that new row
I have read the guides but it is all very confusing - because after I insert a row the document changes and the startIndexes and all that adjust - is that correct?
If anyone has any input on code that would insert a new row AND populate the columns in that row in one easy-to-use solution, I would really appreciate the help (hopefully without having to query the whole JSON again after inserting the row).
Thank you
P.S. I tried to insert the Pastebin link but it wouldn't let me, and when I tried to paste the JSON directly it was too big, so I'll have to leave the question with as much info as I can for now. I will ask Google directly and include the JSON.
Just updating that I've solved this by using the FPDF PHP library instead: I copy the Google Docs text into this Google converter (it converts to HTML), then pass all the HTML to the FPDF library.
So... question is no longer relevant.
For interested parties:
use DocumentService.BatchUpdateDocumentRequest()
the request should be InsertTableRowRequest (InsertTableRequest creates a whole new table rather than adding a row)
for more information, see the Google Docs API documentation on batchUpdate requests.
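As a rough illustration of what such a batch update could look like, the Python sketch below only builds the request body as a plain dictionary; it does not call the API. It uses the row-level insertTableRow request (as opposed to a request that creates a whole table) followed by insertText requests. The function name build_row_requests and all index values in the example are hypothetical, and the document indexes for the new cells are supplied by the caller rather than computed here.

```python
def build_row_requests(table_start_index, last_row_index, cell_texts):
    """Build a batchUpdate body: insert a row below last_row_index, then fill it.

    cell_texts is a list of (document_index, text) pairs. The indexes must
    already describe positions inside the NEW row; this sketch does not
    compute them.
    """
    requests = [{
        "insertTableRow": {
            "tableCellLocation": {
                "tableStartLocation": {"index": table_start_index},
                "rowIndex": last_row_index,
                "columnIndex": 0,
            },
            "insertBelow": True,
        }
    }]
    # Write text from the highest index down, so that an earlier insert
    # never shifts the position of a later one.
    for index, text in sorted(cell_texts, key=lambda pair: pair[0], reverse=True):
        requests.append({"insertText": {"location": {"index": index}, "text": text}})
    return {"requests": requests}


# Hypothetical indexes, for illustration only.
body = build_row_requests(2, 1, [(20, "first"), (22, "second"), (24, "third")])
print(len(body["requests"]))
```

Because requests in one batch are applied in order, sending the row insert and the text inserts together avoids re-reading the document in between, provided the cell indexes passed in are correct for the document after the row has been added.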

Creating Dynamic Sheet Cell Reference List for pulling numbers to SUM

I've been working on building a data analysis sheet, which is quite verbose at the moment and a bit more complicated than it should be as I've been trying to figure this out. Please note, I work doing student data in a school.
Basically, I have two sets of input data:
Data imported from a CSV file that includes test data and codes for Common Core Standards and the questions tied to those standards as a whole class summary
Data imported from a CSV file that includes individual scores by question
I am looking to construct 2 views:
A view that collates and displays data of individual standards per student that includes a dropdown to change the standard allowing a teacher to see class performance by standard in a broad view. The drop-down is populated dynamically from the input data (so staff could eventually dump data and go directly to reports)
A view that collates and displays data of individual students broken down by performance on each standard, allowing a teacher to see the broader spectrum for each student. The student drop-down is populated from Source list 2.
I have been able to build the first view, but am struggling with the second. I've been able to separate the question codes and develop strings of cell references to the scoring data, including a dynamic reference to the row the selected student's score data appears on in the second source set from above.
I tried to pass through an indirect() formula into a sum() so as to process for a mean evaluation, and have encountered errors. I think SUM() doesn't process comma-separated cell reference lists from Indirect() [or in general] or there is something that I am missing to help parse it. Here is the formula I have tried:
=Sum(vlookup(D7,CCCodeManip!$A:$C,3,false))
CCCodeManip!C:C includes the created text (based on the dynamic standards and question codes, etc), here's an example of what would be found there:
'M-ADI'!M17, 'M-ADI'!N17, 'M-ADI'!O17, 'M-ADI'!P17, 'M-ADI'!Q17, 'M-ADI'!R17, 'M-ADI'!J17
I need these to be dynamic so that teachers can input different sets of standards, question, and student data and the sheet automatically collates and reports it in uniform ways (with an upward bound of 20 standards as I currently have it built)
Here is a link to the sheet I built, with names and ID anonymized. There's a CRAP TON of sub-tabs, and that's really just being able to split apart and re-combine data neatly without things error-ing out due to data overlapping, aside from a few different attempts and different approaches to parse the cell reference strings.
The first two tabs are the current status of the data views. I plan to hide a bunch of the functional stuff that is there to help pull data accurately.
The 3rd and 4th tab are the source data sets. 5th is a modified version of source data that allows me to reference things better, and I've tried to arrange the sheets most relevant towards the front of the set.
https://docs.google.com/spreadsheets/d/1fR_2n60lenxkvjZSzp2VDGyTUO6l-3wzwaV4P-IQ_5Y/edit?usp=sharing
Does anyone have a different approach? I am aware that I might be as far as I can go with this and perhaps should consider scripts. My coding experience is a bit out of date and my strength is more with formulas, but I can dig into things with some direction, if anyone can help.
Ok so I noticed something.
It seems the failure is in the indirect reference:
=indirect(CCCodeManip!C3)
The string I am trying to parse via indirect is going to be generated into something like this, dynamic from reference to other data:
'M-ADI'!M17, 'M-ADI'!N17, 'M-ADI'!O17, 'M-ADI'!P17, 'M-ADI'!Q17, 'M-ADI'!R17, 'M-ADI'!J17
The indirect returns the error that the above string is not a cell reference with the #REF code.
Can someone give me a clue as to what is causing this? I am going to dig into the docs on Indirect() from google and will post anything that I find.
Perhaps it is that indirect() can't handle lists, but only specific references and arrays, which may require me to build a sheet to do the SUM formula on for each question set (?)
So I think I figured it out, but I ended up parsing the data differently, basically doing the sum based on individual cell references and a separate sum formula, bypassing the need to do it all at once. It just makes my sheets a lot dirtier! I am eventually going to see if code could do it better if I need to, but this is closed for now.
Basically, I used individual cell references to recall the scores in a row, then used a separate SUM formula, and created references / structures to be able to pull those sum() results. It achieves the same end, but with extra crap on the sheet.
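The underlying issue is that a reference lookup resolves one reference at a time, so a comma-separated list has to be split and each entry resolved on its own before summing. The Python sketch below models that logic outside of Sheets; the workbook dictionary and both function names are made up for illustration, and a script-based version would need to read the real cells instead.

```python
def parse_references(ref_list):
    """Split "'Sheet'!A1, 'Sheet'!B2" into (sheet, cell) pairs."""
    pairs = []
    for ref in ref_list.split(","):
        sheet, cell = ref.strip().rsplit("!", 1)
        pairs.append((sheet.strip("'"), cell))
    return pairs


def sum_references(ref_list, workbook):
    """Resolve each reference individually, then add the values up."""
    return sum(workbook[sheet][cell] for sheet, cell in parse_references(ref_list))


# Hypothetical score data standing in for the 'M-ADI' tab.
workbook = {"M-ADI": {"M17": 1, "N17": 0, "O17": 1}}
refs = "'M-ADI'!M17, 'M-ADI'!N17, 'M-ADI'!O17"
print(sum_references(refs, workbook))
```

This mirrors the workaround described above: one lookup per cell reference, followed by a separate sum over the results.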

Empty Row Labels in PivotTable after adding descriptions as a measure

I'm trying to use Powerpivot to help me summarize Actual and Committed Costs from a project. After some manipulations in Power query I managed to get the right structure, but getting it in a powerpivottable is still a disaster.
As I also want to include Descriptions but keep the nice indented Power Pivot layout, I used a DAX formula to convert the description to a measure. I found a formula online and tweaked it a bit so it would also work when there was more than 1 hit:
=IF(COUNTROWS(VALUES('Table1'[Description]))>1,FIRSTNONBLANK('Table1'[Description],0),VALUES('Table1'[Description]))
This gave a more satisfying result
However there were two set-backs:
There are a lot of copies of the same descriptions creating empty Row labels (I don't want any Empty Row Labels)
When taking the description from the source table it doesn't take the result from the first match for the row label in my table.
How can I solve this problem?

Getting inconsistent tab delimiter width when pasting from Google docs spreadsheet

I am trying to create a gadget for some people, where all they need to do is really copy the contents of a spreadsheet, then paste it in a textbox, which will in turn create a nice table for them to embed in their articles.
I managed to do everything. However, when copying and pasting data into a text editor, Google Docs seems to get the size (width) of the tab delimiter between values wrong. Instead of getting the default 4 spaces, I am getting 2 in some cases, and so far I have managed to find out that the reason is that some of the cells contain strings with spaces. For some reason, this seems to confuse Google Docs, which then supplies the wrong spacing and in turn ruins my script.
I know I can use comma-separated values here, but the issue is that we are trying to give people the ability to simply copy and paste. Look at the example output below:
School Name Location Type No. eligible pupils
In this example, School Name is one cell, Location is another, Type is another and No. eligible pupils is the last one. It is clear that the first cell does not have the necessary space on the right.
Any ideas? I thought about converting every run of more than 1 space to a comma, but users might actually type 2 spaces inside a cell, in which case this would not work either.
For some reason, it was the code editor that was actually not showing the tabs right. Using a regexp and another code editor (vim) showed that all of them were actual tabs. :)
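A quick way to confirm that the delimiters are real tabs, whatever width an editor renders them at, is to split on the tab character itself and look at the raw string. Here is a minimal Python sketch of that check; the second data row is invented for illustration, and parse_pasted_rows is a hypothetical helper name.

```python
def parse_pasted_rows(pasted):
    """Split pasted spreadsheet text into rows of cells, using tabs only."""
    return [line.split("\t") for line in pasted.splitlines() if line]


# Header row from the example above, plus one made-up data row.
pasted = (
    "School Name\tLocation\tType\tNo. eligible pupils\n"
    "Hill Primary\tLeeds\tState\t42\n"
)

rows = parse_pasted_rows(pasted)
print(repr(pasted.splitlines()[0]))  # repr shows each tab as \t
print(rows[0])
```

Splitting on the tab character keeps spaces inside a cell (such as "School Name") intact, so the rendered width of a tab never matters.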

Problems creating hyperlinks using Apache POI 3.8-beta4 in a SXSSF workbook

It appears that hyperlink cells are not created correctly when using the POI SXSSF implementation. I have taken an exact copy of the example code from the HOW-TO guide for creating hyperlinks and changed the workbook to be SXSSF instead of XSSF, and the hyperlinks no longer function.
Has anyone else seen this problem or discovered a workaround?
Thanks,
Mark.
SXSSF is quite new, and currently aimed at only certain tasks. If you can, I'd advise you to look at how XSSF does it, and submit a patch!
In the meantime, you can probably get away with using the HYPERLINK function instead. Set your cell to be a formula cell, and set the formula to be something like HYPERLINK("http://stackoverflow.com/","Stack Overflow") and it'll show as a link in Excel
Update: Support was added to SXSSF to support hyperlinks in r1145629
I know this is an old post, but it came up repeatedly while I was doing searches on the same subject.
I'm using POI 3.9X and it does work with hyperlinks, however there is a big downside if you are using really large amounts of rows with a hyperlink.
There is a limit of 65K hyperlinks per sheet in Excel.
If you decide to break your workbook into multiple sheets after the 65K mark, all of the hyperlink objects still stay in memory (say, if using 1 per row). This can cause a huge memory spike when iterating quickly, and Out of Memory errors if there is not enough heap. By huge, I mean gigabytes for 200,000 rows.
The use of the formula method DOES work, and I switched to it as it does not have the limitations of creating a hyperlink object that stays in memory when using SXSSF. This is assuming dealing with a URL and not a relation.
For those that see a "0" based on the previous example, make sure to include the "=" before the Hyperlink Excel function
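When generating many of these formulas, the quoting is the part that tends to go wrong. The Python sketch below only builds the formula text, assuming that string arguments take double quotes and that an embedded double quote is escaped by doubling it. The helper name hyperlink_formula is made up, and whether a leading "=" is needed is left to the caller, since that depends on how the formula is written to the cell.

```python
def hyperlink_formula(url, label):
    """Return the text of a HYPERLINK formula with both arguments quoted."""
    def quote(text):
        # Double any embedded double quote, then wrap in double quotes.
        return '"' + text.replace('"', '""') + '"'

    return f"HYPERLINK({quote(url)},{quote(label)})"


print(hyperlink_formula("http://stackoverflow.com/", "Stack Overflow"))
```

Building the text in one place keeps the escaping consistent across every row that needs a link.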

Resources