I have an iOS app that uses sqlite3 databases extensively. I need to add a column to a good portion of those tables. None of the tables are what I'd really consider large (I'm used to dealing with many millions of rows in MySQL tables), but given the hardware constraints of iOS devices I want to make sure it won't be a problem. The largest tables would be a few hundred thousand rows. Most of them would be a few hundred to a few thousand or tens of thousands.
I noticed that sqlite3 can only add columns to the end of a table. I'm assuming that's for some type of speed optimization, though possibly it's just a constraint of the database file format.
What is the time cost of adding a column to an sqlite3 table?
Does it simply update the schema and not change the table data?
Does the time increase with number of rows or number of columns already in the table?
I know the obvious answer to this is "just test" and I'll be doing that soon, but I couldn't find an answer on Stack Overflow after a few minutes of searching, so I figured I'd ask so others can find this information more easily in the future.
From the SQLite ALTER TABLE documentation:
The execution time of the ALTER TABLE command is independent of the
amount of data in the table. The ALTER TABLE command runs as quickly
on a table with 10 million rows as it does on a table with 1 row.
The documentation implies the operation is O(1). It should run in negligible time.
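For reference, the operation under discussion is a pure schema change; a minimal sketch (the table and column names are made up for illustration):

-- Hypothetical table/column; only the schema is rewritten, existing rows are left untouched.
ALTER TABLE play_history ADD COLUMN rating INTEGER DEFAULT NULL;

The main restriction to keep in mind is that the added column must use a constant default, and a NOT NULL column must come with a non-NULL default.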
Good day to everyone! Hope all is well!
I am looking to run an update query, or a group of queries, that looks at my Date_Start and Date_End fields to determine whether the Units (the quantity on the respective record) fall in my defined current quarter (1/2/3/4), based on another table (a master table I'm using to provide the dates I need to consider for defining the quarters).
I've been able to create queries that do this and then join them together to basically display the units by quarter based on their respective start/end dates. The problem I am running into is that this process takes a decent amount of time for the queries to populate, which will drastically affect other processes down the line.
That brings me to what I'm after. I am trying, to no avail, to create an update query that will update the quarter fields in my table based on the queries I built to determine whether a record's start/end date falls in the respective quarter. I figure that running this update when records change will be an acceptable run time, versus running it when I'm running reports or an email script for the reports.
I have tried pulling in the table and the query, joining them as equal on ID (the query pulls in the table's IDs), selecting my field "CQ1" from the table, and setting the Update To either the respective field from the table or the one from the query (which is the same as the field in the table).
All I get are the current values of the field in datasheet view and an error of "Operation must use an updateable query."
I have even tried placing a zero to see if that would do it with no luck. I have verified that all the fields are the same data type.
What am I doing wrong? Thanks!
Apologies to everyone... I think my conscious brain was trying to overly complicate the process, and while talking to a buddy about my issue I distractedly created a new update query that worked. It all came down to the fact that I forgot to put a criterion of Is Not Null on my quarter field, I believe. Thanks to anyone who has read this and was responding while I was typing, or to those of you formulating a response.
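For anyone who finds this later, the shape of the working update was roughly the following (the table and saved-query names here are placeholders, not my actual objects):

UPDATE Tbl_Units
INNER JOIN Qry_CQ1 ON Tbl_Units.ID = Qry_CQ1.ID
SET Tbl_Units.CQ1 = Qry_CQ1.CQ1
WHERE Qry_CQ1.CQ1 Is Not Null;

Also worth noting: if the saved query being joined in uses aggregation (GROUP BY, totals), Access will treat the whole join as non-updateable, which is another common cause of the "Operation must use an updateable query" error.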
I have a big sheet with quite a lot of info that keeps growing over time. Based on that, I have created several pivot tables that do some calculations and rankings.
Every ranking keeps growing, so pivot tables may contain 10 rows now, but can grow up to 20 or 30 rows.
I managed to insert several pivot tables in the same sheet and now it looks good, with each ranking after the next one. However, if I add more rows, the pivot tables grow and start overlapping, so after a while the ones below start disappearing in favour of the first ones.
Is there a way to have multiple pivot tables in the same sheet with a fixed number of rows among them, preventing them from overlapping?
If you want to 'play' with the data, I created a sample in https://docs.google.com/spreadsheets/d/1MVX3tp6GIqVX6hTyk6TsCxV7YngiMpi7E8oSxa7a9ck/edit?usp=sharing. It is just a ranking on races, where I want to track the best times on different legs. Then it has a second sheet 'rankings' in which I have different pivot tables, one for each leg.
How's this for a single-formula solution that will scale infinitely in users or legs:
=ARRAYFORMULA(QUERY(SPLIT(FLATTEN(results!A2:A&"|Best of "&results!D1:1&"|"&OFFSET(results!D2,,,ROWS(results!D2:D),COLUMNS(results!D1:1))),"|",0,0),"Select Col2,Col1,MIN(Col3) where Col1<>'' and Col2<>'Best of ' group by Col1,Col2 order by Col2, MIN(Col3) label MIN(Col3)'Best', Col1'User',Col2''"))
You'll find it in cell B1 on the new tab in your sample called MK.Idea
I should mention that FLATTEN() is an undocumented function that I only recently discovered. I believe it is intended to remain "hidden" in the back end of the Sheets programming, but if what I did is what you're after, there really isn't a more efficient way to do it. I've spoken with an engineer at Google who was surprised it existed as well and told me there were no plans to deprecate it, so here's hoping! For a demo of what it does generally, you can see my sample here:
https://docs.google.com/spreadsheets/d/196NDPUZ-p2sPiiiYlYsJeHD6F_eJq7CWO_hP7rFqGpc/edit?usp=sharing
Spreadsheets and Pivot Tables are marvelous tools for data analysis, but they aren't too friendly for creating reports and dashboards. If you are open to recommendations, try Google Data Studio (it includes pivot tables too: https://support.google.com/datastudio/answer/7516660?hl=en).
Let's say you don't have time to learn another app, or you just prefer to keep using spreadsheets; in that case you will need to implement a workaround.
First, bear in mind that Pivot Tables don't actually overlap; instead, an error message is shown.
Solution: insert some rows/columns to give the Pivot Table enough room to expand.
NOTE: You could do this in advance by including a "safe zone" (meaning blank rows/columns) around the pivot tables. You could hide/unhide this "safe zone" as needed.
If you don't want to do the above manually, I think it should be possible to use an on-edit trigger in Google Apps Script to detect that the error is shown and insert new rows/columns to avoid it. If the Pivot Table's top-left cell returns #REF!, your script could use a do...while or while loop to insert the required rows or columns. A smarter algorithm would read the Pivot Table settings to calculate the required rows and columns and then insert them in one pass.
So I'm doing a really basic left join, basically joining different identifiers of my database, described below:
SELECT
main_id,
DT.table_1.mid_id AS mid_id,
final_id
FROM DT.table_1
LEFT JOIN DT.table_2 ON DT.table_1.mid_id = DT.table_2.mid_id
Table 1 is composed of four columns: main_id, mid_id, firstSeen and lastSeen.
There are 17,014,676 rows, for 519 MB of data. Each row is composed of a unique main_id - mid_id couple, but a main_id/mid_id can appear multiple times in the table.
Table 2 is composed of four columns: mid_id, final_id, firstSeen and lastSeen.
There are 66,779,079 rows, for 3.86 GB of data. In the same way, each row is composed of a unique mid_id - final_id couple, but a mid_id/final_id can appear multiple times in the table.
BigQuery is using only 3.11 GB for the query itself.
first_id and mid_id are integers, final_id is a string.
The query result was too big for BigQuery to return directly, so I had to create a "result" table containing the first, mid and final ids with the exact types I wrote above. The "Allow Large Results" option had to be selected, or an error was thrown.
My problem is that this simple query has already taken an hour and isn't even finished yet! I read that the good practice would have been to do a RIGHT JOIN so that the first table in the join is the biggest, but still, an hour is awfully long, even for that case!
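For clarity, the reordered version that practice suggests (if I understood it correctly) would be something like this; it's the same query, just with the bigger table listed first:

SELECT
  main_id,
  DT.table_1.mid_id AS mid_id,
  final_id
FROM DT.table_2
RIGHT JOIN DT.table_1 ON DT.table_1.mid_id = DT.table_2.mid_id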
Do you, kind people of Stack Overflow, have an explanation ?
Thank you in advance!
I have been asked to model a star schema.
I have 3 dimensions:
Date (day, month, year, week, quarter, ...)
Place (500 distinct values)
Product (80k different products)
The main question is how many items (products) are stored at the end of a day in every place.
After some study time with regard to dimensional modeling, I think I should implement a periodic snapshot table. However, reading through the Kimball docs, I noticed that a periodic snapshot demands an entry for every combination of the dimensions. This means I would add 40M rows every day (80k * 500).
Knowing that the products are (really) slow movers and that many places store zero products for long periods, this sounds like extreme overkill.
FYI the transactions in the source DB are 150k rows after three years.
So should I really add 40M rows every day, or could I just add the non-empty stores with their products specified? Also if for whatever reason one day all stores are empty, should I make an entry for that day (with dimensions N/A for store and product)?
You modeled correctly. It depends on the specifications, but normally you store only the products that are present in a location (you do not store zeroes), which could yield a number substantially lower than the maximum of 80k.
If you want to reduce your numbers further, you could store the last N days and then start moving older data into a "cold" table. You keep (say) the last 10 daily snapshots, then only monthly snapshots, in the main "hot" fact table.
Do not exclude the possibility of calculating the snapshot on the fly in the reporting system; depending on your environment it can be easy (in MDX or DAX, for example, it is). Mixed solutions are also possible (e.g. only the last month calculated on the fly).
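As a sketch of the "store only non-empty combinations" approach, a daily load could look roughly like this (the table and column names are illustrative, not taken from your model):

-- Hypothetical names: fact_stock_snapshot is the periodic snapshot fact table,
-- stock_levels holds the end-of-day quantity per place/product.
INSERT INTO fact_stock_snapshot (date_key, place_key, product_key, units_on_hand)
SELECT
    20240131 AS date_key,   -- surrogate key of the snapshot day
    s.place_key,
    s.product_key,
    s.units_on_hand
FROM stock_levels AS s
WHERE s.units_on_hand > 0;  -- empty place/product combinations are simply not stored

The reporting layer then treats a missing row as zero, so the fact table grows with what is actually in stock rather than with 80k * 500 per day.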
After committing a row with a SERIAL (autonumber) column, the row is deleted, but when another row is added, the deleted row's sequence ID is not reused.
The only way I have found for reusing the deleted row's sequence ID is to ALTER the SERIAL column to an INTEGER, then change it back to a SERIAL.
Is there an easier, quicker way to reset the next sequence ID so that there are no gaps in the sequence?
NOTE: This is a single-user application, so no worries about multiple users simultaneously inserting rows.
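For reference, the workaround I mentioned above looks roughly like this (table and column names simplified):

-- Convert the SERIAL column to a plain INTEGER, then back to SERIAL;
-- as far as I can tell this restarts the numbering.
ALTER TABLE my_table MODIFY (id INTEGER);
ALTER TABLE my_table MODIFY (id SERIAL);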
There isn't a particularly easy way to do that. You can reset the number by inserting … hmmm, once upon a long time ago, there were bugs in this, and you're using ancient enough versions of the software that on occasions the bug might still be relevant, though all current versions of Informix products do not have the bug.
The safe technique is to insert 2^31 - 2 (note the minus 2; that's +2,147,483,646), then insert a row with 0 (to generate +2,147,483,647), then insert another row with 0 to trip the next sequence number back to 1. That insert operation will fail if there's already a row with 1 in the system and you have a unique constraint on the SERIAL column. You then need to insert the maximum value, or the value before the first gap that you want to fill (another failing insert). However, note that after filling the gap, the inserted values will increase by one, bumping into any still existing rows and causing insertion failures (because you do have a unique constraint/index on each SERIAL column, don't you — and don't 'fess up if you do not have such indexes; just go and add them!).
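Spelled out as SQL, that sequence of inserts is roughly the following (assuming, for illustration only, a table t with a SERIAL column id carrying a unique index, plus a filler column v):

INSERT INTO t (id, v) VALUES (2147483646, 'x');  -- jump the counter to 2^31 - 2
INSERT INTO t (id, v) VALUES (0, 'x');           -- generates +2,147,483,647
INSERT INTO t (id, v) VALUES (0, 'x');           -- the next generated value wraps to 1
DELETE FROM t WHERE id >= 2147483646;            -- presumably you then remove the scaffolding rows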
If you have a more recent version of Informix, you can insert +2,147,483,647 and then a single row to wrap the value without running into trouble. If you have an old version of Informix with the bug, then inserting +2,147,483,647 directly caused problems. IIRC, the trouble was that you ended up with NULLs being generated, but it was long enough ago now (another millennium) that I'm not absolutely sure of that any more.
None of this is really easy, in case you hadn't noticed.
Generally speaking, it is unwise to fill the gaps; you're better off leaving them and not worrying about them, or inserting some sort of dummy record that says "this isn't really a record — but the serial value is otherwise missing so I'm here to show that we know about it and it isn't missing but isn't really used either".