Pandas: grep-like function

Is there a grep-like built-in function in pandas to drop a row if it contains a given string or value?
Thanks in advance.

Have a look at the string methods under df['column_label'].str.
The example below drops all rows where column A contains the character 'a' and column B equals 20.
In [46]: df
Out[46]:
A B
0 foo 10
1 bar 20
2 baz 30
In [47]: cond = df['A'].str.contains('a') & (df['B'] == 20)
In [48]: df.drop(df[cond].index.values)
Out[48]:
A B
0 foo 10
2 baz 30
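The session above can be reproduced as a self-contained script. This is a minimal sketch: it also shows that keeping `df[~cond]` gives the same rows as `df.drop(...)`. The helper `drop_rows_matching` is hypothetical and assumes "a row has some string" means any column, converted to text, matches a regex pattern.

```python
import pandas as pd

# Same toy frame as in the session above
df = pd.DataFrame({"A": ["foo", "bar", "baz"], "B": [10, 20, 30]})

# Row-wise condition: column A contains 'a' AND column B equals 20
cond = df["A"].str.contains("a") & (df["B"] == 20)

# Boolean indexing with the negated mask keeps every row where cond is False,
# which selects the same rows as df.drop(df[cond].index)
kept = df[~cond]


def drop_rows_matching(frame, pattern):
    """Hypothetical grep-like helper: drop a row if ANY column, rendered as text,
    contains the regex `pattern`."""
    as_text = frame.astype(str)
    # apply() runs once per column; each call returns a boolean Series
    hit = as_text.apply(lambda col: col.str.contains(pattern, regex=True)).any(axis=1)
    return frame[~hit]


print(kept)
print(drop_rows_matching(df, "z"))
```

Masking with `~cond` avoids the round trip through `.index.values`, and neither form modifies `df` in place.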

Related

How to count 2 columns with a range

A       B       C
Val 1   2
Val 2   1
Val 3   1
Item 1  Val 1   1
Item 2  Val 2   1
Item 3  Val 3   0
Item 4  Val 1   0
Consider the sheet above. In the first three rows I count how many times the corresponding Val # appears in the item rows, using: =COUNTIF($B$5:$B, A1) However, I can't figure out how to count only the rows where the value matches and column C on the same row doesn't contain a 1. Is this possible?
Try COUNTIFS, which counts a row only when every range/criterion pair matches:
=COUNTIFS(B$5:B, A1, C$5:C, "<>"&1)
Make sure column C is formatted as Number.
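To make the COUNTIFS logic concrete, here is a small Python sketch of the same counting rule applied to the sample item rows. The function name `countifs_not_flagged` is hypothetical, and treating a missing C value (`None`) as "not equal to 1" is an assumption meant to mirror a blank C cell.

```python
# Item rows from the example sheet: (item in A, value in B, flag in C)
rows = [
    ("Item 1", "Val 1", 1),
    ("Item 2", "Val 2", 1),
    ("Item 3", "Val 3", 0),
    ("Item 4", "Val 1", 0),
]


def countifs_not_flagged(rows, target):
    """Mimic =COUNTIFS(B$5:B, target, C$5:C, "<>"&1): count rows whose B equals
    target AND whose C is anything other than 1 (None is assumed to stand in for
    a blank cell, and it also passes the <> 1 test)."""
    return sum(1 for _, b, c in rows if b == target and c != 1)


for val in ("Val 1", "Val 2", "Val 3"):
    print(val, countifs_not_flagged(rows, val))
```

Both conditions must hold on the same row, which is the "AND across criteria" behaviour the question asks for.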

How to return a count of fields with a given value in a record?

I have a database table with the following fields:
---------------------
FIELDS : | H1 | H2 | H3 | H4
---------------------
VALUES : | A | B | A | C
---------------------
For a given record (row), I would like to count the number of fields with a value of A. In the example above there are two such fields, so the expected result would be: 2
How can I achieve this?
I am trying to answer the question from a database point of view.
You have a table with one or more rows, and each of the four columns in every row holds either an 'A' or something else. For a given row (or for many rows) you want the number of columns that contain an 'A'.
As one commenter pointed out, you can't sum letters, but you can check whether a value is the one you are looking for and count that occurrence as 1 or 0. Then sum those values and return the sum.
SELECT (CASE H1 WHEN 'A' THEN 1 ELSE 0 END) +
(CASE H2 WHEN 'A' THEN 1 ELSE 0 END) +
(CASE H3 WHEN 'A' THEN 1 ELSE 0 END) +
(CASE H4 WHEN 'A' THEN 1 ELSE 0 END) AS number_of_a
FROM name_of_your_table;
For your example row this will return:
NUMBER_OF_A
===========
2
If you have more than one row you'll get the number of As for every row.
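The CASE query above can be tried without a database server using Python's built-in sqlite3 module. This sketch reuses the placeholder table name `name_of_your_table`; the second data row is invented for illustration, and `ORDER BY rowid` is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name_of_your_table (H1 TEXT, H2 TEXT, H3 TEXT, H4 TEXT)")
conn.executemany(
    "INSERT INTO name_of_your_table VALUES (?, ?, ?, ?)",
    [("A", "B", "A", "C"),   # the example record from the question
     ("C", "C", "C", "A")],  # hypothetical second record
)

# Each CASE turns one column into 1 (it is 'A') or 0 (anything else); the + sums them per row
query = """
SELECT (CASE H1 WHEN 'A' THEN 1 ELSE 0 END) +
       (CASE H2 WHEN 'A' THEN 1 ELSE 0 END) +
       (CASE H3 WHEN 'A' THEN 1 ELSE 0 END) +
       (CASE H4 WHEN 'A' THEN 1 ELSE 0 END) AS number_of_a
FROM name_of_your_table
ORDER BY rowid
"""
counts = [row[0] for row in conn.execute(query)]
print(counts)  # [2, 1]
```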
I tested this and it works. Thanks for the help.
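-- count(H1) alone counts every non-NULL value; wrapping each column in CASE counts only the 'A's
SELECT count(CASE WHEN T.H1 = 'A' THEN 1 END) + count(CASE WHEN T.H2 = 'A' THEN 1 END) +
count(CASE WHEN T.H3 = 'A' THEN 1 END) + count(CASE WHEN T.H4 = 'A' THEN 1 END) +
count(CASE WHEN T.H5 = 'A' THEN 1 END) + count(CASE WHEN T.H6 = 'A' THEN 1 END) +
count(CASE WHEN T.H7 = 'A' THEN 1 END) + count(CASE WHEN T.H8 = 'A' THEN 1 END) as TOT
from Table T
where T.H1 = 'A' or T.H2 = 'A' or T.H3 = 'A' or T.H4 = 'A'
or T.H5 = 'A' or T.H6 = 'A' or T.H7 = 'A' or T.H8 = 'A'
group by T.ID
order by 1 DESC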
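COUNT(column) counts non-NULL values, not matching values, which matters for a count(H1) + count(H2) + ... style query. A quick check with Python's built-in sqlite3 module on the sample record shows the difference; the table `t` and its ID column are hypothetical stand-ins for the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, H1 TEXT, H2 TEXT, H3 TEXT, H4 TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'A', 'B', 'A', 'C')")

# Bare COUNT(column): every non-NULL column is counted, so the result is 4
naive = conn.execute(
    "SELECT count(H1) + count(H2) + count(H3) + count(H4) FROM t GROUP BY ID"
).fetchone()[0]

# CASE without ELSE yields NULL for non-matches, so COUNT only sees the 'A's
matches = conn.execute(
    "SELECT count(CASE WHEN H1 = 'A' THEN 1 END) + count(CASE WHEN H2 = 'A' THEN 1 END)"
    " + count(CASE WHEN H3 = 'A' THEN 1 END) + count(CASE WHEN H4 = 'A' THEN 1 END)"
    " FROM t GROUP BY ID"
).fetchone()[0]

print(naive, matches)  # 4 2
```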
Another solution ...

How to apply a function across each row when one of the parameters passed in is a table

I want to create a column that checks whether each row of a table can be found in another table, matching on three ID columns. x, y and z are the columns of the first table and transferrable is the second table.
I tried this:
elligibleCrossMarginTransfers:{[x;y;z;transferrable]
potentialTransfers: select from transferrable where marginPctPost>collateralUpperLimitPct,not crossMargin;
if[1<count select from potentialTransfers where client=x, primeBroker=y,parentPortfolioId=z;
:1b]; //determine if parentPortfolio of crossMargin exists as possible transfer from other non-cross Margin counts
:0b
};
crossMarginNegExcess:update elligibleToTransfer:elligibleCrossMarginTransfers'[client;primeBroker;parentPortfolioId;transferrable] from crossMarginNegExcess
Are you looking for something like this?
q)0N!t:flip `a`b`c!(`a`b`c;1 2 3;10 20 30)
+`a`b`c!(`a`b`c;1 2 3;10 20 30)
a b c
------
a 1 10
b 2 20
c 3 30
q)0N!t2:flip `a`b`c!(`a`B`c;1 -2 3;10 -20 30)
+`a`b`c!(`a`B`c;1 -2 3;10 -20 30)
a b c
--------
a 1 10
B -2 -20
c 3 30
q)t[`elligibleToTransfer]:(`a`b#t) in `a`b#t2
q)t
a b c elligibleToTransfer
--------------------------
a 1 10 1
b 2 20 0
c 3 30 1
q)
Update: here are two versions you can try on your data (post some sample data for a more complete answer):
crossMarginNegExcess[`elligibleToTransfer]:(`client`primeBroker`parentPortfolioId#crossMarginNegExcess) in select client,primeBroker,parentPortfolioId from transferrable where marginPctPost>collateralUpperLimitPct,not crossMargin
//all qsql
update elligibleToTransfer:1b from `crossMarginNegExcess where ([]client;primeBroker;parentPortfolioId) in select client,primeBroker,parentPortfolioId from transferrable where marginPctPost>collateralUpperLimitPct,not crossMargin
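The q expression (`a`b#t) in `a`b#t2 asks, for each row of t, whether its a and b key columns appear together as a row of t2. Here is a plain-Python sketch of the same membership test, offered only as an illustration of the idea: the data mirrors t and t2 from the session above, and the function name `flag_rows_in` is hypothetical.

```python
# Rows mirroring t and t2 from the q session
t = [{"a": "a", "b": 1, "c": 10}, {"a": "b", "b": 2, "c": 20}, {"a": "c", "b": 3, "c": 30}]
t2 = [{"a": "a", "b": 1, "c": 10}, {"a": "B", "b": -2, "c": -20}, {"a": "c", "b": 3, "c": 30}]


def flag_rows_in(left, right, keys):
    """Return copies of the left rows with an added 'elligibleToTransfer' flag that is
    True when the row's key tuple also occurs in right."""
    # Build the set of key tuples once, so each left row costs one hash lookup
    seen = {tuple(r[k] for k in keys) for r in right}
    return [dict(row, elligibleToTransfer=tuple(row[k] for k in keys) in seen) for row in left]


for row in flag_rows_in(t, t2, ["a", "b"]):
    print(row)
```

For the real tables the key list would be client, primeBroker and parentPortfolioId, and `right` would hold only the rows that pass the marginPctPost/crossMargin filter.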

Merging files based on common fields

I'm trying to join three text files in similar formats based on common fields, while keeping the uncommon fields. Here's an example:
File1:
X
A 1
B 3
C 2
D 1
File2:
Y
A 3
C 2
E 3
File3:
Z
A 2
E 1
D 1
F 3
Merged:
X Y Z
A 1 3 2
B 3 - -
C 2 2 -
D 1 - 1
E - 3 1
F - - 3
It doesn't have to be a - where there's no corresponding value. The join command in this question https://unix.stackexchange.com/questions/43417/join-two-files-with-matching-columns works well except that it doesn't keep the uncommon fields.
Thank you.
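join works on two files at a time and by default drops lines that have no partner, so rather than chaining several join calls, here's a Python program that does it in one pass: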
#!/usr/bin/env python3
import sys

files = [open(name) for name in sys.argv[1:]]    # list of input files
headers = [f.readline().strip() for f in files]  # first line of each file is its column name
blanks = ['-'] * len(headers)                    # placeholder for a missing value
data = {}                                        # { rowname : [datum...] }
for ii, infile in enumerate(files):              # read each file
    for line in infile:
        if not line.strip():
            continue                             # skip blank lines
        key, value = line.split()
        if key not in data:
            data[key] = blanks[:]                # fresh copy so rows don't share one list
        data[key][ii] = value
print('\t' + '\t'.join(headers))
for key, values in sorted(data.items()):
    print(key + '\t' + '\t'.join(values))
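To check the merge logic against the sample "Merged" table without creating files, the same approach can be wrapped in a function that takes each file's contents as a string. This is a sketch; the name `merge_columns` is hypothetical, and it assumes the first line of each input is the column header, as in the example files.

```python
def merge_columns(contents, missing="-"):
    """Outer-merge whitespace-separated 'key value' texts; each text's first line is its header."""
    headers, data = [], {}
    for ii, text in enumerate(contents):
        lines = [ln for ln in text.splitlines() if ln.strip()]
        headers.append(lines[0].strip())
        for line in lines[1:]:
            key, value = line.split()
            # setdefault creates a fresh placeholder row the first time a key is seen
            data.setdefault(key, [missing] * len(contents))[ii] = value
    rows = [[key] + values for key, values in sorted(data.items())]
    return headers, rows


file1 = "X\nA 1\nB 3\nC 2\nD 1\n"
file2 = "Y\nA 3\nC 2\nE 3\n"
file3 = "Z\nA 2\nE 1\nD 1\nF 3\n"
headers, rows = merge_columns([file1, file2, file3])
print("\t" + "\t".join(headers))
for row in rows:
    print("\t".join(row))
```

Sorting the keys reproduces the A to F order of the expected output, and every key that appears in any file gets a row.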

Problem with DISTINCT and Linq2SQL

I have a table like this:
idinterpretation | iddictionary | idword | meaning
1 1 1115 hello
2 1 1115 hi
3 1 1115 hi, bro
5 1 1118 good bye
6 1 1118 bye-bye
7 2 1119 yes
8 2 1119 yeah
9 2 1119 all right
I'm trying to get distinct rows (DISTINCT idword). At first I tried:
return dc.interpretations.Where(i => i.iddictionary == iddict).
ToList<interpretation>().Distinct(new WordsInDictionaryDistinct()).
OrderBy(w => w.word.word1).Skip(iSkip).Take(iTake);
But I have about 300,000 rows in my table, so this is the wrong solution: ToList() pulls every matching row into memory before Distinct runs.
Then I tried:
IEnumerable<interpretation> res = (from interp in dc.interpretations
group interp by interp.idword into groupedres
select new interpretation
{
idword = groupedres.Key,
idinterpretation = groupedres.SingleOrDefault(i => i.idword == groupedres.Key).idinterpretation,
interpretation1 = groupedres.SingleOrDefault(i => i.idword == groupedres.Key).interpretation1,
iddictionary = groupedres.SingleOrDefault(i => i.idword == groupedres.Key).iddictionary
}).Skip(iSkip).Take(iTake);
and I got this error at #foreach (interpretation interp in ViewBag.Interps): System.NotSupportedException: Explicit construction of entity type 'vslovare.Models.interpretation' in query is not allowed.
Is there a way to get distinct rows so that the final result looks like this?
idinterpretation | iddictionary | idword | meaning
1 1 1115 hello
5 1 1118 good bye
7 2 1119 yes
Tables:
dictionaries (dictionary table): iddictionary | dictionary_name
words (word table): idword | word_name
interpretations (interpretation table): idinterpretation | iddictionary | idword | meaning
I think your second attempt is almost there; you probably need a GroupBy clause that picks one row per group, so the whole query can be translated to SQL.
Something like:
var query = from row in dc.interpretations
where row.iddictionary == iddict
group row by row.idword into grouped   // one group per idword
select grouped.FirstOrDefault();       // keep a single row from each group
return query.OrderBy(w => w.word.word1).Skip(iSkip).Take(iTake);
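The "group by idword, keep one row per group" idea can be sketched in plain Python against the sample rows from the question. This sketch assumes "first" means the row with the smallest idinterpretation, which matches the desired output; unless each group is explicitly ordered, the database is generally free to return any row from the group. The function name `first_per_word` is hypothetical.

```python
# Sample rows from the question: (idinterpretation, iddictionary, idword, meaning)
rows = [
    (1, 1, 1115, "hello"), (2, 1, 1115, "hi"), (3, 1, 1115, "hi, bro"),
    (5, 1, 1118, "good bye"), (6, 1, 1118, "bye-bye"),
    (7, 2, 1119, "yes"), (8, 2, 1119, "yeah"), (9, 2, 1119, "all right"),
]


def first_per_word(rows):
    """Keep one row per idword: the one with the smallest idinterpretation."""
    best = {}
    for row in rows:
        idword = row[2]
        if idword not in best or row[0] < best[idword][0]:
            best[idword] = row
    # Tuples sort by idinterpretation first, giving a stable output order
    return sorted(best.values())


for row in first_per_word(rows):
    print(row)
```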
As for why your query is slow: in general, a slow query usually means either that the data you are searching and returning is very large, or that the table is poorly indexed at the database level. To find out which, analyse or profile your query; see this article on MSDN: http://msdn.microsoft.com/en-us/magazine/cc163749.aspx
