delete specific rows from a flat file - ssis-2012

How can I delete specific rows from a flat file based on the first 4 characters of each row? For example, delete the rows starting with '1000'. (After deleting these rows, the file should remain in place with the other rows, which start with '2000', '3000', and so on.)
Or, how can I write the rows starting with '1000', '2000', and '3000' into different flat files? (Each row has a varying number of columns, and the columns have different widths.)
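The routing logic itself, independent of SSIS, can be sketched in a few lines of Python. This is only an illustration of the idea (treat each line as one string and look at its first 4 characters), not an SSIS package; the function names and the sample rows are made up for the example.

```python
from collections import defaultdict


def split_by_prefix(lines, prefix_len=4):
    """Group lines by their first `prefix_len` characters, keeping input order."""
    groups = defaultdict(list)
    for line in lines:
        groups[line[:prefix_len]].append(line)
    return dict(groups)


def drop_prefix(lines, prefix):
    """Return only the lines that do NOT start with `prefix`."""
    return [line for line in lines if not line.startswith(prefix)]


# Hypothetical sample rows with varying column counts.
sample = [
    "1000,a,b",
    "2000,c",
    "1000,d,e,f",
    "3000,g,h",
]

groups = split_by_prefix(sample)   # one list of rows per 4-character prefix
kept = drop_prefix(sample, "1000")  # rows that survive the "delete 1000" case
```

Because each row is handled as a single string, the varying column counts and widths do not matter; each entry in `groups` could then be written to its own output file.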

Related

How to specify a range in a Google Sheets formula, where the end row number is a reference to another cell?

I have several Google spreadsheets with different numbers of records (rows) - let's say
file 1: 200.000 records (rows)
file 2: 350.000 records (rows)
file 3: 246.000 records (rows)
etc.
I use a lot of formulas (20-30) that reference entire columns from file 1:
sumif(a$2:a$200000,">3")
countif(b$2:b$200000, "=n")
etc.
I want to reuse the already created formulas for the other files, but since the number of records there is different, I would have to replace the 200.000 with 350.000 for file 2 in 20-30 cells, with 246.000 for file 3 in 20-30 cells etc.
That would be too much work.
Is there a way to specify the end point of the range not with a constant but by pointing to a cell that contains the number of rows?
e.g.
I would add in cell z1 the number of rows: 200000
The other formulas would contain something like
sumif(a$2:a$ (something that tells sheets to use as row number the number from z1) )
This way I would need to only replace the number in z1, and all formulas would be updated correctly. Any ideas?
I tried using indirect:
="a"&indirect("z1")
where z1 contains 200000
This pastes
a200000
But if I try using it in a range, it's not recognized as a range
=sum(a1:"a"&indirect("z1"))
Any ideas how to do that correctly?
Why not just skip the end row? Instead of:
=sumif(a$2:a$200000,">3")
use:
=sumif(a$2:a,">3")
To answer your INDIRECT question, the correct syntax would be:
=sum(INDIRECT("a1:a"&z1))
You don't need a row-number limit in this case.
Just use sumif(A$2:A,">3") and it will read the whole of column A starting from row 2.
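To see why the INDIRECT version works while a1:"a"&... does not, it may help to think of INDIRECT as taking the whole range as one piece of text and only then resolving it. A rough Python model of that idea follows; `indirect_sum` and the dictionary standing in for column A are hypothetical, and the sketch only handles a bounded single-column range.

```python
import re


def indirect_sum(column, ref):
    """Sum the cells of one column named by an A1-style range string.

    `column` maps 1-based row numbers to numbers; `ref` is text such as
    "a1:a5", i.e. the complete range built as a string first.
    """
    match = re.fullmatch(r"[A-Za-z]+(\d+):[A-Za-z]+(\d+)", ref)
    if match is None:
        raise ValueError(f"not a bounded single-column range: {ref!r}")
    first, last = int(match.group(1)), int(match.group(2))
    # Missing rows count as 0, like empty cells in a SUM.
    return sum(column.get(row, 0) for row in range(first, last + 1))


column_a = {1: 10, 2: 20, 3: 30, 4: 40}  # stand-in for column A
z1 = 3                                   # stand-in for the row count in Z1

total = indirect_sum(column_a, "a1:a" + str(z1))
```

The point is that the end row is concatenated into the text before the text is turned into a range, which mirrors INDIRECT("a1:a"&z1).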

Can you use ARRAYFORMULA to SUM multiple columns changing dynamically

edit: must work with blank rows
I have a list of users in Column F and in Row 1 a list of dates.
I want to use ARRAYFORMULA to sum the values from the relevant columns for each user. As an example, this sums 4 columns (G, H, I, J) per user:
=ARRAYFORMULA(IF(LEN(F1:F),G1:G+H1:H+I1:I+J1:J,""))
My question is whether it is possible to sum a dynamic number of columns. For example, I'll choose a number (e.g. 7, 30...) and it will sum that number of columns.
Can this be done?
Here's a spreadsheet with the above data:
https://docs.google.com/spreadsheets/d/17hyBEF1va4GMYZUFkDxxjJ0pXH2oCccgIaBT79GIsGc/edit#gid=0
In A2 I choose how many columns, and it will sum the relevant number of columns. In C1 I use such a formula to sum 4 columns using ARRAYFORMULA as an example (which is static, not dynamic).
Note that a nice solution was suggested, but because there is a blank row (#3), it leaves the sum for the final row (#7) empty. I'm looking for a solution that will work with blank rows.
There are 3 parameters:
A2: number of columns to sum
G2: top-left cell of the values
F:F: column holding the row field (used to count the number of rows)
=ArrayFormula(MMULT(N(INDIRECT(CELL("address",G2)&":"&ADDRESS(COUNTA(F:F),COLUMN(G2)+A2-1,4))),N(TRANSPOSE(COLUMN(INDIRECT(CELL("address",G2)&":"&ADDRESS(COUNTA(F:F),COLUMN(G2)+A2-1,4)))^0))))
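The core trick in that formula is that multiplying a grid by a column of ones gives one sum per row, and N() turns blanks into 0 so empty cells do not break the product. A small Python sketch of the same arithmetic is below; `to_number`, `row_sums`, and the sample grid are invented for illustration and only approximate what N() and MMULT do.

```python
def to_number(cell):
    """Roughly mimic N(): numbers pass through, blanks and text become 0."""
    return cell if isinstance(cell, (int, float)) else 0


def row_sums(grid, n_cols):
    """Sum the first n_cols cells of each row as a product with a vector of ones."""
    ones = [1] * n_cols  # plays the role of TRANSPOSE(COLUMN(range)^0)
    return [
        sum(to_number(cell) * one for cell, one in zip(row[:n_cols], ones))
        for row in grid
    ]


grid = [
    [1, 2, 3, 4],
    [None, None, None, None],  # a completely blank row
    [5, None, 7, 8],           # a row with one blank cell
]

sums = row_sums(grid, 3)  # n_cols stands in for the value chosen in A2
```

Changing the second argument changes how many columns are summed, which is what the A2 parameter does in the formula.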

Automatically Number/ Count All Columns or Rows in Google Sheets

I just wanted a simple way to number columns or rows in a Google Sheet, and most answers I've found offer many options that are far more complicated than I needed them to be.
Example: I want to number every column in the active sheet, starting with 1 for Column A and counting up by 1, regardless of the content of any other cells on the sheet. If I add columns to the sheet later, I want the numbering to update automatically with the correct column numbers.
Another way is to use SEQUENCE.
So putting =SEQUENCE(99) in A1 would number the first 99 rows, from 1 to 99.
To number columns, just rotate that array, with TRANSPOSE.
So if A1 held =TRANSPOSE(SEQUENCE(26))
that would number columns A to Z with the numbers 1 to 26.
If you want to number both columns and rows, try:
in A1: =SEQUENCE(999), and
in B1: =TRANSPOSE(SEQUENCE(25,1,2))
I realise that this is numbering a specific number of rows, or columns, but I often find that very useful. You could modify this to number all columns or rows by adding some count to determine the total number of rows or columns, and using that in place of the first parameter for the SEQUENCE function.
The simplest way I've found to do this is by putting either of the following formulas in A1:
For numbering rows: =ArrayFormula(ROW(A:A))
And for columns: =ArrayFormula(COLUMN(1:1))
After putting the formula in A1, I'll usually hide the column or row the formula is in so I don't accidentally change or delete it.
If I want the counting to start at 1 on the 2nd, 3rd, or 4th row or column, then adding a -1,-2, or -3 respectively to the end of the formula gets that done.
For example: To number columns starting with 1 in Column C, the formula I put in A1 is =ArrayFormula(COLUMN(1:1)-2).
This may be way more basic than most people on this site are generally looking for, but for some reason it took me an unexpectedly long time to find it or figure it out, so I thought maybe someone else would find it useful in the future.

Google Sheets - Highlight cell based on another cell

I have a database with hundreds (will be thousands) of entries related to utility assets. These assets are ranked and inspected on various conditions. There are multiple inspections done periodically and the old inspection data is accessible alongside the new data. I would like to use conditional formatting to highlight a cell in column Q, based on duplicate rows in column G. For example: I have one asset with an ID of 1234 in column G with 3 different inspections, and thus three entries on different rows. I want to highlight column Q if that value (in column Q) is not the same among all three inspections in the various rows. Is this something that is possible? I have tried various combinations using the =IF, =COUNTIF(S) functions. The end goal here is to recognize that column Q is not equal on all three inspections so that it can be updated to be the same value.
In the example sheet the value in column Q on row 3 does not match rows 4 and 5. The value in column Q on row 7 does not match rows 6, 8, and 9. The information in every column besides G is subject to change, so the rule must be based on the value in column G.
https://docs.google.com/spreadsheets/d/1xAvRaxMii3Xijbuw3ITKo0CBPhXkW9-Bgdg_LRxv1qA/edit?usp=sharing
Logically, if there are at least as many cells with the same ID but different Q value as there are with the same ID and the same Q value, then the current cell should be highlighted:
=countifs(G:G,G3,Q:Q,"<>"&Q3)>=countifs(G:G,G3,Q:Q,Q3)
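The rule in that formula can be checked by hand with a small Python sketch. The function name and the sample IDs and values are made up; the first three entries stand in for rows 3-5 of the example sheet and the last four for rows 6-9.

```python
def should_highlight(ids, values, row):
    """True when rows with the same ID but a different value are at least as
    numerous as rows with the same ID and the same value."""
    same_id = [v for i, v in zip(ids, values) if i == ids[row]]
    matching = sum(1 for v in same_id if v == values[row])  # includes the row itself
    differing = len(same_id) - matching
    return differing >= matching


ids = [1234, 1234, 1234, 5678, 5678, 5678, 5678]  # column G (hypothetical)
values = ["A", "B", "B", "C", "D", "C", "C"]      # column Q (hypothetical)

flags = [should_highlight(ids, values, r) for r in range(len(ids))]
```

Note that in a tie, such as two inspections with two different values, both cells satisfy the rule and both would be highlighted.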

Insert duplicate rows (based on one cell-value) in google sheets

So I have this formula that copies rows (with data in col A) into a new range. Column A contains a number indicating how many duplicates the row should yield in the result. The output rows also get sorted based on the value in column A.
=sort(arrayformula(vlookup(
transpose(split(query(rept(row(D2:D)&" ",A2:A),,9^9)," ")),
arrayformula({row(D2:D),{A2:A,D2:F}}),
{2,3,4,5},
0)),
1,
TRUE)
This is not exactly what I need, though. Instead of having a single value in each cell of column A that indicates how many times the row should be duplicated, I need a text string like “2,3,5” in every cell in that column, where the individual numbers in the string indicate the position of the row in the output (rather than the number of times the row should be copied).
For example, in the output I want the row with the string “2,3,5” to be copied three times. One copy should be 2nd from the top, another 3rd from the top, and the last one 5th from the top.
If I could have the A2:A part of this range {A2:A,J2:N} instead contain the matching values for split(A2:A) I think it would do what I want.
This is a copy of my google sheet. Hopefully it's possible to understand what it is I'm trying to achieve.
https://docs.google.com/spreadsheets/d/1sp5DRBwFP0-aG-FvjUPKBmyylz0WoPB63ASOOdUGdnI/edit?usp=sharing
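To make the desired output concrete, here is one reading of the requirement as a short Python sketch. It is not a Sheets formula, and the function name, the sample rows, and the assumption that every output position is used exactly once are all illustrative.

```python
def expand_by_positions(rows):
    """rows: list of (positions_text, payload) pairs, e.g. ("2,3,5", "alpha").

    Emit one copy of the payload per listed position, ordered by position.
    """
    placed = []
    for positions_text, payload in rows:
        for part in positions_text.split(","):
            part = part.strip()
            if part:  # skip empty pieces such as a trailing comma
                placed.append((int(part), payload))
    placed.sort(key=lambda item: item[0])
    return [payload for _, payload in placed]


# Hypothetical input: column A text paired with the rest of the row.
rows = [("2,3,5", "alpha"), ("1,4", "beta")]

result = expand_by_positions(rows)
```

In this sketch the row with "2,3,5" appears three times, at the 2nd, 3rd, and 5th places of the output, which is the behaviour described above.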
