Updating a query for false positives - google-sheets

I work in a compliance role at a very small start-up and review a lot of information every day, for example bank transfers, direct deposits, and ACHs. A report is pulled from BigQuery and exported to Google Sheets.
My problem is that there are a lot of false positives (basically, "posting data" that repeats often), and I'm trying to eliminate them.
One idea was just to update the query with keywords:
WHERE postingdata LIKE 'PersonName%'
But that's tiring and time-consuming, and I feel like there's a better way, perhaps 'filtering' the results and then feeding them back into the query. Any ideas, tips, or general thoughts?
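If the aim is simply to stop re-reviewing known recurring posting data, one option is to keep the keyword list in a single place rather than stacking LIKE clauses. A minimal sketch in BigQuery standard SQL, reusing the tblBankTransaction table name that appears in the answer below; the sender names are purely illustrative placeholders:

SELECT *
FROM `tblBankTransaction`
-- Drop rows whose posting data starts with any known, pre-cleared sender.
-- Maintain this alternation list (or move it into its own lookup table) as new false positives appear.
WHERE NOT REGEXP_CONTAINS(postingdata, r'^(PersonName|AcmePayroll|KnownVendor)');

Keeping the patterns in one expression (or one lookup table you join against) means a new false positive is a one-line change instead of yet another LIKE condition.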

In this case you can use GROUP BY in your query. Here is how to use the clause.
Start with this query:
SELECT account, TypeTransaction, amount, currency
FROM `tblBankTransaction`
The query returns data in which some rows are repeated; for example, rows 1 and 7 both show account 894526972455 as a deposit.
To collapse those duplicates, add the GROUP BY clause:
SELECT account, TypeTransaction, amount, currency
FROM `tblBankTransaction`
GROUP BY account, TypeTransaction, amount, currency
And it returns this data:
You can see in this example that account 894526972455 with a deposit now returns only one row. The same account returns a second row, but that one is a transfer; it's a different type of transaction. Which columns you group by depends on the information you have and how you want the rows collapsed.
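Since every selected column is in the GROUP BY and there are no aggregates, SELECT DISTINCT gives the same result; adding a COUNT(*) is also a quick way to see how often each combination repeats. A sketch against the same table:

SELECT DISTINCT account, TypeTransaction, amount, currency
FROM `tblBankTransaction`;

-- Or keep the GROUP BY and count how many times each combination appears:
SELECT account, TypeTransaction, amount, currency, COUNT(*) AS occurrences
FROM `tblBankTransaction`
GROUP BY account, TypeTransaction, amount, currency;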

Within Google Sheets you can try UNIQUE, QUERY with a group by aggregation, or SORTN with mode 2 as the 3rd parameter.
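For example, assuming the exported report sits in A2:D (adjust the range to match your export), either of these returns the de-duplicated rows:

=UNIQUE(A2:D)

=SORTN(A2:D, 9^9, 2, 1, TRUE)

In SORTN, the third argument (display_ties_mode) set to 2 is what removes duplicate rows, and 9^9 is just an arbitrarily large row limit; the result comes back sorted by the first column. With QUERY, group by needs at least one aggregate in the select, so you would add something like count(A), which also tells you how often each row repeated:

=QUERY(A2:D, "select A, B, C, D, count(A) where A is not null group by A, B, C, D", 0)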

Related

Formula to sort rows of bank statement

I export my bank transactions to a PDF, which I then paste into a Google spreadsheet.
Problem is: I may need to sort the transactions on my spreadsheet, and after reordering by date the amounts and balance may "shift" when there are several transactions on the same day:
It's not a big problem to me, but my accountant is all lost.
I would like to find a way to identify the order of the transactions on a given date, by comparing the amounts/balance to the final balance of the previous date.
I managed to create a formula using a MATCH that identifies the first transaction of a specific date, but if I were to make it work for 10-20 potential transactions within the same date, it would get stupidly long and complex. I may eventually do that, but first I'd like to know if there is an easier solution.
I can add as many columns as I want, and I don't mind using scripts.
What I cannot do is create a column that would recalculate the balance according to the order the transactions are in. That would be the easiest solution, but if my accountant were to compare with what is on the real bank account, he would find discrepancies and be just as lost.
Thank you!
As @gries said:
Since your PDF contains the transactions already ordered the way you want, you can assign an incremental ID to each of them.
That way, you will be able to restore the initial order by sorting on the transaction ID instead of the date, which can repeat.
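As a concrete sketch (assuming the pasted transactions start in row 2, column A is always filled, and a helper column, say F, is free): right after each paste, fill the helper column with sequential IDs and then copy it and paste it back as values only, so the numbers don't move when you sort.

=SEQUENCE(COUNTA(A2:A))

Entered in F2, this numbers the rows 1, 2, 3, ... in the order they came out of the PDF. Sorting by date and then by this ID keeps same-day transactions in statement order, and sorting by the ID alone restores the original order at any time.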

using sum and countifs to get a percentage of 'yes'es across multiple columns by month and team - is there a simpler way?

I've been asked to create a summary for some google form responses, and though I have a working solution, I can't help but feel there must be a more elegant one.
The form collects data related to case checking: every month each team (there are 100+ teams) has to check a certain number of cases based on how many staff are in the team, and enter the results for each case they've checked into the Google form. The team that set this up wants me to summarise the data by team, month, and section of the form (preliminary questions, case recording, outcomes, etc.). There are 8 sections on the live form, ranging from 1 to 13 questions each, all with Yes/No/NA/blank answers.
(honestly, it's not how I'd have approached setting all this up, but that is out of my hands!)
So they're essentially looking for a live monthly summary with team names down the side, section names along the top, and a percentage completed that keeps up with entries as they come in (where we can also use IMPORTRANGE and QUERY to pull the relevant bits into other Google Sheets summaries as and when needed).
What I've currently got is this:
=iferror(
  sum(
    countifs('Form Responses'!$B:$B,$A3,'Form Responses'!$F:$F,"Yes",'Form Responses'!$E:$E,">="&$B$1,'Form Responses'!$E:$E,"<"&edate($B$1,1)),
    countifs('Form Responses'!$B:$B,$A3,'Form Responses'!$G:$G,"Yes",'Form Responses'!$E:$E,">="&$B$1,'Form Responses'!$E:$E,"<"&edate($B$1,1)),
    countifs('Form Responses'!$B:$B,$A3,'Form Responses'!$H:$H,"Yes",'Form Responses'!$E:$E,">="&$B$1,'Form Responses'!$E:$E,"<"&edate($B$1,1)),
    countifs('Form Responses'!$B:$B,$A3,'Form Responses'!$I:$I,"Yes",'Form Responses'!$E:$E,">="&$B$1,'Form Responses'!$E:$E,"<"&edate($B$1,1)),
    countifs('Form Responses'!$B:$B,$A3,'Form Responses'!$J:$J,"Yes",'Form Responses'!$E:$E,">="&$B$1,'Form Responses'!$E:$E,"<"&edate($B$1,1)),
    countifs('Form Responses'!$B:$B,$A3,'Form Responses'!$K:$K,"Yes",'Form Responses'!$E:$E,">="&$B$1,'Form Responses'!$E:$E,"<"&edate($B$1,1))
  ) / (countifs('Form Responses'!$B:$B,$A3,'Form Responses'!$E:$E,">="&$B$1,'Form Responses'!$E:$E,"<"&edate($B$1,1)) * 6),
0)
It works, but it feels like a bit of a brute-force-and-ignorance solution. I've tried COUNTIFS with arrays, I've looked at a pivot table but I can't get the section groups, and I've had a play with QUERY but I can't figure out how to ask it to count all the Yeses in multiple columns at once.
Is there a more elegant solution, or do I have to resign myself to setting up the next financial year's summaries like this?
You can use plain array boolean multiplication to achieve the count, as TRUEs are converted to 1s and FALSEs are converted to 0s:
=TO_PERCENT(ARRAYFORMULA(
SUM((f!F1:K="Yes")*(f!E1:E>=B1)*(f!E1:E<EDATE(B1,1))*(f!B:B=A3))/
SUM(6*(f!E1:E>=B1)*(f!E1:E<EDATE(B1,1))*(f!B:B=A3))
)
)
Renamed Form Responses to f
Numerator: SUM of
Question filter (f!F:K = "Yes") and
Month filter (f!E:E is within the month of B1) and
Team filter (f!B:B = A3)
Denominator: 6 times the SUM of
Month filter (f!E:E is within the month of B1) and
Team filter (f!B:B = A3)
On the sample sheet that you provided, you'll notice two new tabs: MK.Retab and MK.Summary.
On MK.Retab there is a single formula in A2 that "re-tabulates" all of your survey data into a format that is much easier to analyze going forward. That tab can be hidden in your real project. It will continue to build the six-column dataset forever, acting as a sort of "back end" sheet, used only to supply data to any further downstream analysis.
On MK.Summary there is a single formula in cell A1 that queries that dataset from MK.Retab and shows the percentage of Yeses by month, by section, and by team in a format similar to what you proposed. I coded it to display the most recent month at the left, immediately to the right of the team names, and to push historical data off to the right. Even though people are used to seeing time run from left to right, I find the opposite layout nice because it keeps you from having to scroll sideways to see the most recent data. It is very simple to change should you want to, by removing the "desc" in the "order by" clause of the query string.
I find this kind of two-step solution useful for problems like yours, because while the summary might not be exactly what you want, it's always easier to build further formulas and analyses off data laid out the way it is in the MK.Retab sheet.
As for the formula in MK.Retab, it is based on a method that I came up with a while back that constructs a large vlookup where the [search key] is actually a sequence of decimal numbers that is built by counting the number of rows in your real data set and multiplying by the number of columns of data that need to be repeated for each row. I built a demo some time ago that I'm happy to share with folks if you want to understand better how it works.
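For what it's worth, a much simpler way to build a similar long-format table is FLATTEN plus SPLIT. This is not the decimal-sequence VLOOKUP that MK.Retab actually uses, just a sketch of the same idea, and it assumes team names in f!B2:B, dates in f!E2:E, question headers in f!F1:K1 and answers in f!F2:K:

=ARRAYFORMULA(SPLIT(FLATTEN(f!B2:B & "¦" & TEXT(f!E2:E, "yyyy-mm-dd") & "¦" & f!F1:K1 & "¦" & f!F2:K), "¦", TRUE, FALSE))

Each answer cell becomes its own row of team | date | question | answer (the ¦ character is only a delimiter chosen because it is unlikely to appear in the data); rows from blank responses come through with placeholder values and can be filtered out afterwards.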
You said that your goal was to understand the formulas so that you could modify them going forward as needed. I'm not sure how easy that will be to do, but I can try my best to answer any questions you might have about the method or the solution generally.
What I can tell you is that some of the formulas are more complicated than they need to be because you used Q1, Q2, Q3, etc. instead of the actual questions. If you had a list somewhere (on some other tab, say) of the questions asked and of what you wanted to call their corresponding "sections", it would make the formula significantly less complicated. As it stands, I had to use the appearance of the word "Comments" in row 1 to distinguish where one section ends and another begins. The upside to that decision, though, is that the formula I wrote is infinitely expandable to the right: if you were to add another 100 columns' worth of questions and answers to the sample set, the formula would handle that and break it out, so long as there is the word "Comments" between each section.
Hope all this helps.

PowerBI counts rows in non related table including filtering and non-matches

I have two tables in PowerBI and a slicer, presented below in an abstracted way.
I want to know the number of orders placed for a customer in a given date range. This data is a sample for illustration - there are actually around 10,000 Customers and 500,000 Orders and both tables have many other fields, Ids etc.
My challenge -
Whilst this is easy enough to do by relating the tables and doing a count, the difficulty comes in when I still want to see customers with 0 orders, and on top of that I want this to work within a date range. In other words, instead of the customers with no orders disappearing from the list, I want them to appear in the list with a 0 value, depending on the date range. It would also be good if this could act as a measure, so I can see the total number of customers that have not ordered on a month-by-month basis. I have tried outer joins, merge queries, cross joins and lookups and can't seem to crack it.
Example 1: If I set the order date slicer to be: 02/01/2017 to 01/01/2018 I want the following results
Example 2: If I set the order date slicer to be: 03/01/2017 to 06/01/2017 I want the following results
Any help appreciated!
Thanks
This is entirely possible with a measure. When you're using the Order field to count the rows for each customer, you're essentially doing a COUNTROWS().
With your relationship still active, we can wrap this in a measure that checks for blanks and, in those cases, returns 0. Something like this would work:
Measure = IF(ISBLANK(COUNTROWS(Orders)),0,COUNTROWS(Orders))
In this case, 'Orders' is the table containing the Order and Order Date fields
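Two equivalent, slightly shorter ways to write the same thing (COALESCE needs a reasonably recent Power BI release; the + 0 trick works everywhere):

Measure = COALESCE(COUNTROWS(Orders), 0)

Measure = COUNTROWS(Orders) + 0

Both turn the blank that COUNTROWS() returns for customers with no matching orders into an explicit 0, so those customers stay visible for the selected date range.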

division not working with continuous query in influxdb

I am trying to generate a continuous query in InfluxDB. The query should fetch the hits per second by taking (1/response time) of the values I am already getting for another series (say series1).
Here is the query:
select (1000/value) as value from series1 group by time(1s) into api.HPS;
My problem is that the query "select (1000/value) as value from series1 group by time(1s)" works fine and gives me results, but as soon as I store the result through a continuous query, it starts to give me a parse error.
Please help.
Hard to give any concrete advice without the actual parse error returned and perhaps the relevant log lines. Try providing those to the mailing list at influxdb@googlegroups.com or email them to support@influxdb.com.
There's an email on the Google Group that might be relevant, too. https://groups.google.com/d/msgid/influxdb/c99217b3-fdab-4684-b656-a5f5509ed070%40googlegroups.com
Have you tried using whitespace between the values and the operator? E.g. select (1000 / value) AS value....
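If the parser is indeed tripping over 1000/value, the continuous query would read exactly as before with only the whitespace added, e.g.:

select (1000 / value) as value from series1 group by time(1s) into api.HPS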

How to build rails analytics dashboard

I'm looking to build an analytics dashboard for my data in a rails application.
Let's say I have a list of request types "Fizz", "Buzz", "Bang", "Bar".
I want to display a count for each day based on type.
How should I do this?
Here is what I plan on doing:
Add get_bazz_by_day, get_fizz_by_day, etc to the appropriate models.
In each model get all records of type Fizz, then create an array that stores date and count.
Format it in the view so a JS library can render it as a pretty graph.
Does this sound reasonable?
Depending on the number of records, your dashboard can soon run into performance problems.
Step 1 is misleading: don't fetch the data for each day individually, try to get it all at once.
In Step 2 you can have the database do the aggregation over days, with the group method.
See http://guides.rubyonrails.org/active_record_querying.html#group
Fizz.select("date(created_at) as fizzed_day, count(*) as day_count").
group("date(created_at)")
In Step 3 you need to take care that days without any fizzbuzz are still displayed, as they are not returned in the query.
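Backfilling those missing days is straightforward once the grouped query has come back. A rough sketch, assuming from and to hold the date bounds you are charting (the key class returned by group can differ by database adapter, hence the normalisation step):

# One query for the counts, then fill in the days that had no records.
counts = Fizz.where(created_at: from..to)
             .group("date(created_at)")
             .count
counts = counts.transform_keys { |k| k.to_date } # keys may be Strings or Dates depending on the adapter

(from.to_date..to.to_date).map { |day| [day, counts.fetch(day, 0)] }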
