awk: print columns based on values of another column

I have a file with six columns, and I only want to print the first two columns of the lines that have a value >3 in the sixth column.
This statement prints all lines where the sixth column is > 3:
awk '$6 > 3' file > out
This statement prints the first two columns:
awk '{print $1,$2}' file > out
Does anyone know how to combine these two commands into a one-liner?

You are almost there. Just as you said, "combine them". Try this:
awk '$6>3{print $1,$2}' file >out
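As a quick check of the combined pattern and action, here is a small sketch. The sample rows and the variable name `result` are made up for illustration; only the awk program itself comes from the answer above.

```shell
# Hypothetical six-column sample data; only the first and third rows have $6 > 3.
result=$(printf '%s\n' \
  'a b 1 1 1 5' \
  'c d 1 1 1 2' \
  'e f 1 1 1 10' |
  awk '$6 > 3 { print $1, $2 }')

# Show the first two columns of the rows that passed the filter.
printf '%s\n' "$result"
```

The condition before the braces selects the lines, and the action inside the braces decides what gets printed for each selected line.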

Related

Grep getting numbers in a range

I'm trying to display rows which include numbers in range 20-200 in column 12 from a csv file. However, I don't get the right input.
I tried this:
grep -E "^[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,([2-9][0-9]|1[0-9][0-9]|200)" file.csv > sc1_d.csv
What am I doing wrong?
Any thoughts?
Keep it simple, just use awk:
awk -F, '(20 <= $12) && ($12 <= 200)' file
If that doesn't do exactly what you want then edit your question to explain in what way this and your own attempt "don't get the right input" and to show concise, testable sample input and expected output.
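A minimal sketch of that awk filter on made-up data, assuming a 13-column CSV with no embedded commas. The rows and the variable name `in_range` are hypothetical; the action `{ print $12 }` is added here only to make the result easy to read.

```shell
# printf reuses its format for each argument, producing one CSV row per value.
# Only column 12 differs between the hypothetical rows.
in_range=$(printf 'x,x,x,x,x,x,x,x,x,x,x,%s,x\n' 19 20 150 200 201 |
  awk -F, '(20 <= $12) && ($12 <= 200) { print $12 }' |
  paste -sd ' ' -)

# Print the column-12 values that fell inside the 20-200 range.
printf '%s\n' "$in_range"
```

Because awk compares numeric-looking fields as numbers, no digit-by-digit pattern is needed to express the range.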
There are two issues here.
(1) Your way of finding column 12 won't work in any rows that have cells containing commas. That's difficult to resolve unless you know who wrote the CSV file. Since there's no single CSV spec, there are multiple ways of escaping commas and quotes in CSV files. For example, in one spec, a comma within a cell is escaped with a backslash (so my doctor's name might be written as Dr. Bob\, MD). In another, any cell value that contains a comma needs to be put in double quotes, and double quotes themselves need to be written as two double quotes (so, "Dr. Bob, MD").
But if you happen to know that embedded commas in cell values are not an issue in your CSV file, you can ignore that.
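To illustrate issue (1), here is a small sketch of how a quoted cell containing a comma throws off naive comma splitting. The sample row and the variable names `row`, `field_count`, and `third` are made up for illustration.

```shell
# A hypothetical row with three logical cells; the second one contains a comma.
row='1,"Dr. Bob, MD",42'

# Splitting on every comma sees four fields instead of three.
field_count=$(printf '%s\n' "$row" | awk -F, '{ print NF }')

# The third "field" is the tail of the quoted cell, not the number 42.
third=$(printf '%s\n' "$row" | awk -F, '{ print $3 }')

printf 'fields: %s, third field: [%s]\n' "$field_count" "$third"
```

The same shift happens with a `[^,]*,` pattern in grep, so every column after the quoted cell ends up one position too far to the right.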
(2) That expression would also allow some other values, such as 201 or 20000B, that you don't want, because nothing requires the number to end there. So if you know that this is not the last column, you can just add a comma after the choices:
([2-9][0-9]|1[0-9][0-9]|200),
And if you can't make that assumption, then you can just look for a comma OR end of line:
([2-9][0-9]|1[0-9][0-9]|200)(,|$)
And finally you can employ a "repeat" to specify exactly 11 instances of the [^,]*, pattern. So now your grep command looks like this:
grep -E "^([^,]*,){11}([2-9][0-9]|1[0-9][0-9]|200)(,|$)" file.csv > sc1_d.csv
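The final pattern can be tried out on made-up data. This is only a sketch: the helper `make_row`, the 13-column layout, and the variable name `matches` are assumptions for illustration, and the rows contain no embedded commas.

```shell
# Build a hypothetical 13-column CSV row in which only column 12 varies.
make_row() { printf 'x,x,x,x,x,x,x,x,x,x,x,%s,x\n' "$1"; }

matches=$(
  for value in 19 20 99 150 200 201 20000B; do
    make_row "$value"
  done |
  grep -E '^([^,]*,){11}([2-9][0-9]|1[0-9][0-9]|200)(,|$)' |
  cut -d, -f12 |
  paste -sd ' ' -
)

# Print the column-12 values from the rows that grep kept.
printf '%s\n' "$matches"
```

With the trailing `(,|$)` in place, values such as 201 and 20000B no longer slip through on a partial match.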

Join character to columnar array

I want to add a character ("~ ") to the front of each value of a columnar array, but every formula I've tried concatenates the values into a single cell rather than back to the column array. Do I need to add SPLIT? What am I doing wrong?
This is what I've tried most recently
=JOIN("~ ",FILTER(Categories!A2:A,LEN(Categories!A2:A)))
=ArrayFormula(TEXTJOIN("~ ",TRUE,Categories!A2:A))
=ArrayFormula(JOIN("~ ",{Categories!A2:A}))
Ultimately, what I would like to see in a single column is:
~ Category 1
~ Category 2
etc.
=ARRAYFORMULA(IFERROR(SPLIT(IF(Categories!A2:A<>"", "~ ♦"&Categories!A2:A, ), "♦")))
=ARRAYFORMULA(IF(Categories!A2:A<>"", "~ "&Categories!A2:A, ))

TextPad Replace Character and Line Feed with Nothing

How do I replace a line containing only ' with nothing in TextPad (i.e., delete lines with just that one character)?
I have an Excel Spreadsheet containing three columns:
Column A - single quote
Column B - some number
Column C - single quote plus a comma
There are over 90,000 rows on this spreadsheet with data in column B. There are over one million rows with just a single quote in column A because I did a "Ctrl+D" on that column to copy the value in that column (a single quote) down to all rows.
When I copy and paste these three columns into TextPad, I end up with over one million lines. I replaced the tabs with nothing using the F8/Replace dialog.
(Replace: tab with: empty string)
The majority of what is left are lines that contain only a single quote. I want to delete these 900,000 extra lines.
How do I specify a Replace (delete) of single quote + line feed? I do not want to delete any of the single quotes from the lines that include a number that came from column B.
I just figured it out. The backslash n is the line feed.
If I check Regular Expression and enter this Find what:
'\n
(Keeping empty string for Replace with) and Replace All, I have deleted those extra lines.
I experienced the same problem. It did not work for me until I did this:
Uncheck Regular Expression before entering \n in the Find box, then replace with whatever you choose (in my case, it was ',').
Your result might be that the entire list becomes transposed (that's what happened to my data).

How to ignore or replace "carriage return" in informix 4GL

Say I have the following query:
insert into myTable values ("TEST 1
TEST 2")
Then I'm selecting the description to output to an excel sheet:
select description from myTable
Result:
description TEST 1TEST 2
As a result, the description column, which should be a single line, is split across 2 separate lines in the .xls output.
How can I resolve this so I get the entire string on 1 line?
Can we loop through it, find the carriage return, and replace it? Or is there another way?
Thank you.
Using the REPLACE function, you can do the following:
select replace(replace(description,chr(10),' '),chr(13),' ') from myTable
chr(10) is ASCII 10, which refers to LF (Line Feed).
chr(13) is ASCII 13, which refers to CR (Carriage Return).

Google Sheets: Split data and delete first part of each new cell

I'm feeding data from a SAAS into a Google Sheet, and would need to format it a bit to be able to work with it.
Most columns are ok, but one column has multiple parameters in one. Each cell looks like (data anonymized):
affiliate_fees: None
affiliate_percent: 0.X
amount_refunded: 0
author_fees: 0
author_id: xxxx
author_percent: 0.5
coupon_id: xxxx
created_at: 2016-xxxxx
currency: USD
custom_gateway?: None
earnings_usd: None
meta: {u'url': None, u'class': u'transaction', u'image_url': None, u'description': None, u'name': u'xxxx'}
net_charge: xxx
net_charge_usd: xxx
paypal_payment_id: PAY-XXXXXXX
purchased_at: 2016-xxxx
refundable: True
sale_id: xxxx
status: None
stripe_charge_token: None
stripe_invoice_id: None
total_fedora_fee: None
total_processor_fee: None
user_id: xxxx
vat_fees: None
I've already found out how to SPLIT the data into different columns - I'm doing it via =SPLIT(CC2,CHAR(10))
Now what I'd like to do, ideally in the same operation, is to remove the part before the first colon :
So the goal is to end up with only the values (the part after the :) spread into different columns. I can manually enter the column names. For example:
--------------------------------------------------
| affiliate_fees | affiliate_percent |
--------------------------------------------------
| None | 0.X |
--------------------------------------------------
| ... | ... |
--------------------------------------------------
Any hints? Thanks for your time!
Note: I don't really need the meta: line, it can be discarded. I just left it in there because it might (or might not?) make things extra tricky
Alternative 1
Google Sheets introduced "Split text to columns" as a menu command a few months ago. See Separate cell text into columns for further details.
Once you separate the text, you could use copy & paste > transpose
Alternative 2
A single formula alternative is to use
=ArrayFormula(transpose(REGEXEXTRACT(A1:A25,{"(.*[\w\?])+\:","\: (.*)+"})))
This will return a 25 x 2 array, and you will not have to manually add the column headers.
Alternative 3
If you still want to use SPLIT, you could use ": " as the separator and FALSE as the third argument so that the two characters are treated as a single separator, but this will also split the meta: ... line into several columns.
Assume that your data start at A1, then the formula to use is:
=SPLIT(A1,": ",FALSE)
To include all the rows with data, you will have to fill down this formula. Then do copy & paste > transpose.
In this spreadsheet I used this formula in cell E2
=ArrayFormula({regexreplace(split(A3, char(10)), "\:(.+)",""); regexreplace(split(A3, char(10)), "(.+)\: ","")})
This will create a row with headers and the values in row 2. If you don't want the headers, just use
=ArrayFormula(regexreplace(split(A3, char(10)), "(.+)\: ",""))
See if that works for you.
