Nest 2 Calculated fields - tableau-desktop

I have the following two calculated fields and I was wondering how I could NEST the first formula into the second:
Quality Calc:
IF STR([Custom5 (Quality Metric)]) = "A" THEN "True"
ELSEIF STR([Custom5 (Quality Metric)]) = "B" THEN "True"
ELSE "False"
END
Quality Apps - Previous
SUM(IIF([Date Calc - Previous] = 1, [Quality App Count], NULL))
Is it possible to nest Quality Calc into Quality Apps - Previous? I need to remove [Quality App Count] and substitute the nested Quality Calc in its place. Thank you!
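One way to write it (a sketch; it assumes you simply want to count the qualifying rows once [Quality App Count] is removed) is to inline the IF block as the value the IIF returns:
SUM(IIF([Date Calc - Previous] = 1,
    IF STR([Custom5 (Quality Metric)]) = "A" THEN 1
    ELSEIF STR([Custom5 (Quality Metric)]) = "B" THEN 1
    ELSE 0
    END,
    NULL))
// Equivalent and easier to read, keeping Quality Calc as its own field:
// SUM(IIF([Date Calc - Previous] = 1 AND [Quality Calc] = "True", 1, NULL))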

Related

Is there a way to detect NaN and -NaN?

I want to save a Lua number to a string and handle the NaN case correctly.
Detecting any NaN is easy: x ~= x.
However, the only way I've found to tell whether it is NaN or -NaN is to use tostring(x) == 'nan'. Is there a better way to do it?
Instead of tostring(x) == 'nan', which is not portable, you can do the comparison with the actual tostring call: tostring(x) == tostring(0/0) or tostring(x) == tostring(-(0/0)) depending on what you need. If you need to do multiple comparisons, you can save the result of tostring and reuse it.
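A minimal sketch of that comparison, with the tostring results cached (note: on platforms where both print the same text, e.g. plain "nan", this cannot tell them apart):
local pos_nan_s = tostring(0/0)       -- cache once; the text is platform-dependent
local neg_nan_s = tostring(-(0/0))
local function classify_nan(x)
  if x == x then return nil end       -- not a NaN at all
  local s = tostring(x)
  if s == pos_nan_s then return "like 0/0" end
  if s == neg_nan_s then return "like -(0/0)" end
  return "some other NaN"             -- see the next answer for why this happens
end
print(classify_nan(0/0), classify_nan(-(0/0)))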
More than two NaNs exist (in fact, IEEE 754 doubles have 2^52 - 1 NaN bit patterns for each sign).
Their tostring-ed representations are platform-dependent.
This is an example of how to get three different NaNs (I'm using Lua 5.3 built with Visual Studio):
n = string.unpack(">d", string.pack(">d", 0/0):sub(1, -2).."#")
print(0/0, -(0/0), n) --> -1.#IND 1.#QNAN -1.#QNAN
So, it would be more correct to not distinguish between different variants of NaN.
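That said, if it is specifically the sign you are after, a sketch for Lua 5.3+ (which the string.pack example above already assumes) can read the IEEE sign bit directly instead of comparing strings:
local function is_negative_nan(x)
  if x ~= x then                                         -- only meaningful for NaNs
    local hi = string.unpack("B", string.pack(">d", x))  -- most significant byte
    return hi >= 0x80                                    -- sign bit set
  end
  return false
end
print(is_negative_nan(0/0), is_negative_nan(-(0/0)))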

R - how to exclude pennystocks from environment before calculating adjusted stock returns

In my current research I'm trying to find out how big the impact of ad-hoc sentiment on daily stock returns is.
The calculations have worked quite well so far and the results are plausible.
The calculations to date, using the quantmod package and Yahoo Finance data, look like this:
getSymbols(c("^CDAXX",Symbols) , env = myenviron, src = "yahoo",
from = as.Date("2007-01-02"), to = as.Date("2016-12-30")
Returns <- eapply(myenviron, function(s) ROC(Ad(s), type="discrete"))
ReturnsDF <- as.data.table(do.call(merge.xts, Returns))
# adjust column names
colnames(ReturnsDF) <- gsub(".Adjusted", "", colnames(ReturnsDF), fixed = TRUE)
ReturnsDF <- as.data.table(ReturnsDF)
However, to make it more robust against the noisy influence of penny-stock data, I wonder how it's possible to exclude stocks that at any point in the period fall below a certain value x, let's say 1€.
I guess the best approach would be to exclude them before calculating the returns and merging the xts results, or even better, before downloading them with the getSymbols command.
Does anybody have an idea how this could best be done? Thanks in advance.
Try this:
build a price frame of the Adj. closing prices of your symbols
(I use the PF function of the quantmod add-on package qmao, which has lots of other useful functions for this type of analysis: install.packages("qmao", repos="http://R-Forge.R-project.org"))
check by column if any price is below your minimum trigger price
select only columns which have no closings below the trigger price
To stay more flexible I would suggest taking a sub-period, let's say no price below 5 during the last 21 trading days. The toy example below may illustrate my point.
I use AAPL, FB and MSFT as the symbol universe.
> symbols <- c('AAPL','MSFT','FB')
> getSymbols(symbols, from='2018-02-01')
[1] "AAPL" "MSFT" "FB"
> prices <- PF(symbols, silent = TRUE)
> prices
AAPL MSFT FB
2018-02-01 167.0987 93.81929 193.09
2018-02-02 159.8483 91.35088 190.28
2018-02-05 155.8546 87.58855 181.26
2018-02-06 162.3680 90.90299 185.31
2018-02-07 158.8922 89.19102 180.18
2018-02-08 154.5200 84.61253 171.58
2018-02-09 156.4100 87.76771 176.11
2018-02-12 162.7100 88.71327 176.41
2018-02-13 164.3400 89.41000 173.15
2018-02-14 167.3700 90.81000 179.52
2018-02-15 172.9900 92.66000 179.96
2018-02-16 172.4300 92.00000 177.36
2018-02-20 171.8500 92.72000 176.01
2018-02-21 171.0700 91.49000 177.91
2018-02-22 172.5000 91.73000 178.99
2018-02-23 175.5000 94.06000 183.29
2018-02-26 178.9700 95.42000 184.93
2018-02-27 178.3900 94.20000 181.46
2018-02-28 178.1200 93.77000 178.32
2018-03-01 175.0000 92.85000 175.94
2018-03-02 176.2100 93.05000 176.62
Let's assume you would like any instrument which traded below 175.40 during the last 6 trading days to be excluded from your analysis :-).
As you can see, that will exclude AAPL and MSFT.
apply and the base function any, applied(!) to a 6-day subset of prices, will give us exactly what we want. Showing the last 3 days of prices, excluding the instruments which did not meet our condition:
> tail(prices[,apply(tail(prices),2, function(x) any(x < 175.4)) == FALSE],3)
FB
2018-02-28 178.32
2018-03-01 175.94
2018-03-02 176.62
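Carried over to the question's setting, a minimal sketch (the 1€ trigger and the whole-period check are assumptions; myenviron is the environment from the question) would filter before computing returns:
trigger <- 1  # exclude any symbol that ever closes below this
keep <- Filter(function(s) !any(Ad(s) < trigger, na.rm = TRUE),
               eapply(myenviron, identity))
# returns only for the survivors, merged as before
Returns <- lapply(keep, function(s) ROC(Ad(s), type = "discrete"))
ReturnsDF <- as.data.table(do.call(merge.xts, Returns))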

Project Euler #3 Ruby Solution - What is wrong with my code?

This is my code:
def is_prime(i)
  j = 2
  while j < i do
    if i % j == 0
      return false
    end
    j += 1
  end
  true
end
i = (600851475143 / 2)
while i >= 0 do
  if (600851475143 % i == 0) && (is_prime(i) == true)
    largest_prime = i
    break
  end
  i -= 1
end
puts largest_prime
Why is it not returning anything? Is it too large a calculation to go through all the numbers? Is there a simple way of doing it without using the Ruby prime library (which would defeat the purpose)?
All the solutions I found online were too advanced for me; does anyone have a solution that a beginner would be able to understand?
"premature optimization is (the root of all) evil". :)
You go right away for the (1) biggest, (2) prime factor. How about finding all the factors, prime or not, and then taking the last (biggest) of them that is prime? When we solve that, we can start optimizing it.
A factor a of a number n is such that there exists some b (we assume a <= b to avoid duplication) with a * b = n. But that means that for a <= b we also have a*a <= a*b == n.
So, for each b = n/2, n/2-1, ... the potential corresponding factor is known automatically as a = n / b; there's no need to test a for divisibility at all ... and perhaps you can figure out which of the as don't have to be tested for primality as well.
Lastly, if p is the smallest prime factor of n, then the prime factors of n are p and all the prime factors of n / p. Right?
Now you can complete the task.
update: you can find more discussion and a pseudocode of sorts here. Also, search for "600851475143" here on Stack Overflow.
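For reference, a minimal Ruby sketch of that last observation (keep dividing out the smallest factor, which is always prime; what remains at the end is the largest prime factor):
def largest_prime_factor(n)
  f = 2
  while f * f <= n
    if n % f == 0
      n /= f    # f is the smallest factor of n, hence prime
    else
      f += 1
    end
  end
  n             # what remains is the largest prime factor
end
puts largest_prime_factor(600851475143)  #=> 6857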
I'll address not so much the answer, but how YOU can pursue the answer.
The most elegant troubleshooting approach is to use a debugger to get insight as to what is actually happening: How do I debug Ruby scripts?
That said, I rarely use a debugger -- I just stick in puts here and there to see what's going on.
Start with adding puts "testing #{i}" as the first line inside the loop. While the screen I/O will be a million times slower than a silent calculation, it will at least give you confidence that it's doing what you think it's doing, and perhaps some insight into how long the whole problem will take. Or it may reveal an error, such as the counter not changing, incrementing in the wrong direction, overshooting the break conditional, etc. Basic sanity check stuff.
If that doesn't set off a lightbulb, go deeper and puts inside the if statement. No revelations yet? Next puts inside is_prime(), then inside is_prime()'s loop. You get the idea.
Also, there's no reason in the world to start with 600851475143 during development! 17, 51, 100 and 1024 will work just as well. (And don't forget edge cases like 0, 1, 2, -1 and such, just for fun.) These will all complete before your finger is off the enter key -- or demonstrate that your algorithm truly never returns and send you back to the drawing board.
Use these two approaches and I'm sure you'll find your answers in a minute or two. Good luck!
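For example, a small-number test run with that first puts in place (a sketch, not the asker's exact code; it reuses is_prime from the question):
n = 1024                # tiny test input, as suggested above
i = n / 2
while i >= 1 do         # stop before 0: n % 0 would raise ZeroDivisionError
  puts "testing #{i}"   # confirms the counter moves, and in which direction
  if (n % i == 0) && is_prime(i)
    puts "largest prime factor: #{i}"
    break
  end
  i -= 1
end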
Do you know you can solve this with one line of code in Ruby?
require 'prime'
Prime.prime_division(600851475143).flatten.max
=> 6857

Determine consecutive video clips

I have a long video stream, but unfortunately it's in the form of 1000 15-second, randomly-named clips. I'd like to reconstruct the original video based on some measure of "similarity" of two such 15s clips, something answering the question "does the activity in clip 2 seem like an extension of clip 1?". There are small gaps between clips, a few hundred milliseconds or so each. I can also manually fix up the results if they're sufficiently good, so the results needn't be perfect.
A very simplistic approach can be:
(a) Create an automated process to extract the first and last frame of each video-clip in a known image format (e.g. JPG) and name them according to video-clip names, e.g. if you have the video clips:
clipA.avi, clipB.avi, clipC.avi
you may create the following frame-images:
clipA_first.jpg, clipA_last.jpg, clipB_first.jpg, clipB_last.jpg, clipC_first.jpg, clipC_last.jpg
(b) The sorting "algorithm":
1. Create a 'Clips' list of Clip-Records, each containing:
(a) clip-name (string)
(b) prev-clip-name (string)
(c) prev-clip-diff (float)
(d) next-clip-name (string)
(e) next-clip-diff (float)
2. Apply the following processing:
for Each ClipX having ClipX.next-clip-name == "" do:
{
ClipX.next-clip-diff = <a big enough number>;
for Each ClipY having ClipY.prev-clip-name == "" do:
{
float dif = ImageDif(ClipX.last-frame.jpg, ClipY.first-frame.jpg);
if (dif < ClipX.next-clip-diff)
{
ClipX.next-clip-name = ClipY.clip-name;
ClipX.next-clip-diff = dif;
}
}
Clips[ClipX.next-clip-name].prev-clip-name = ClipX.clip-name;
Clips[ClipX.next-clip-name].prev-clip-diff = ClipX.next-clip-diff;
}
3. Scan the Clips list to find the record(s) with no <prev-clip-name>, or
(if all records have a <prev-clip-name>) the record with the max <prev-clip-diff>.
These are good candidates to be the first clip of a sequence.
4. Begin from the clip(s) found in step (3) and rename the clip-files by adding
a 5-digit number (00001, 00002, etc.) at the beginning of each filename, going
from aClip to aClip.next-clip-name and removing each clip from the list.
5. Repeat steps 3 and 4 until there are no clips left in the list.
6. Voila! You have your sorted clips list in the form of sorted video filenames!
...or you may end up with more than one sorted list (if you have big enough
'time gaps' between your video clips).
Very simplistic... but I think it can be effective...
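A rough Python rendering of steps 2-4 (the names are hypothetical; image_dif stands for the frame-difference function described in PS1 below):
def order_clips(clips, image_dif, limit):
    """clips maps clip-name -> (first_frame_path, last_frame_path).
    Returns one or more ordered lists of clip names (chains)."""
    next_of = {}      # ClipX -> its best-matching successor
    has_prev = set()  # clips already claimed as someone's successor
    for x, (_, last_x) in clips.items():
        best_name, best_dif = None, limit
        for y, (first_y, _) in clips.items():
            if y == x or y in has_prev:
                continue
            d = image_dif(last_x, first_y)
            if d < best_dif:
                best_name, best_dif = y, d
        if best_name is not None:
            next_of[x] = best_name
            has_prev.add(best_name)
    # step 3: chains start at clips that are nobody's successor
    chains = []
    for start in clips:
        if start not in has_prev:
            chain, cur = [], start
            while cur is not None:    # step 4: walk the next-of links
                chain.append(cur)
                cur = next_of.get(cur)
            chains.append(chain)
    return chains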
PS1: Regarding the ImageDif() function: you can create a new DifImage, which is the difference of the images ClipX.last-frame.jpg and ClipY.first-frame.jpg, and then sum all pixels of DifImage into a single floating-point ImageDif value. You can also optimize the process to abort the difference (or the sum) once the running sum exceeds some limit: you are really only interested in small differences. An ImageDif value larger than an (experimental) limit means that the two images differ so much that the two clips cannot be next to each other.
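A sketch of that ImageDif idea (Pillow and NumPy are assumed; limit is the experimental cutoff just mentioned):
import numpy as np
from PIL import Image

def image_dif(path_a, path_b, limit=float("inf")):
    # sum of absolute pixel differences between two equally sized frames
    a = np.asarray(Image.open(path_a).convert("L"), dtype=np.int32)
    b = np.asarray(Image.open(path_b).convert("L"), dtype=np.int32)
    total = 0
    for row_a, row_b in zip(a, b):    # row by row, so we can abort early
        total += int(np.abs(row_a - row_b).sum())
        if total > limit:
            return limit              # too different: clips can't be adjacent
    return total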
PS2: The sorting algorithm's complexity should be roughly O(n*log(n)), so for 1000 video clips it will perform about 3000 image comparisons (or a few more if you optimize the algorithm and allow it to not find a match for some clips).

splitting space delimited entries into new columns in R

I am coding a survey that outputs a .csv file. Within this csv I have some entries that are space delimited, representing multi-select questions (i.e. questions with more than one response). In the end I want to parse these space-delimited entries into their own columns and create headers for them so I know where they came from.
For example I may start with this (note that the multiselect columns have an _M after them):
Q1, Q2_M, Q3, Q4_M
6, 1 2 88, 3, 3 5 99
6, , 3, 1 2
and I want to go to this:
Q1, Q2_M_1, Q2_M_2, Q2_M_88, Q3, Q4_M_1, Q4_M_2, Q4_M_3, Q4_M_5, Q4_M_99
6, 1, 1, 1, 3, 0, 0, 1, 1, 1
6, , , , 3, 1, 1, 0, 0, 0
I imagine this is a relatively common issue to deal with but I have not been able to find it in the R section. Any ideas how to do this in R after importing the .csv ? My general thoughts (which often lead to inefficient programs) are that I can:
(1) pull column numbers that have the special suffix with grep()
(2) loop through (or use an apply) each of the entries in these columns and determine the levels of responses and then create columns accordingly
(3) loop through (or use an apply) and place indicators in appropriate columns to indicate presence of selection
I appreciate any help and please let me know if this is not clear.
I agree with ran2 and aL3Xa that you probably want to change the format of your data to have a different column for each possible response. However, if munging your dataset into a better format proves problematic, it is possible to do what you asked.
process_multichoice <- function(x) lapply(strsplit(x, " "), as.numeric)
q2 <- c("1 2 3 NA 4", "2 5")
processed_q2 <- process_multichoice(q2)
processed_q2
[[1]]
[1] 1 2 3 NA 4
[[2]]
[1] 2 5
The reason different columns for different responses are suggested is because it is still quite unpleasant trying to retrieve any statistics from the data in this form. Although you can do things like
# Number of reponses given
sapply(processed_q2, length)
#Frequency of each response
table(unlist(processed_q2), useNA = "ifany")
EDIT: One more piece of advice. Keep the code that processes your data separate from the code that analyses it. If you create any graphs, keep the code for creating them separate again. I've been down the road of mixing things together, and it isn't pretty. (Especially when you come back to the code six months later.)
I am not entirely sure what you are trying to do, or what your reasons are for coding it like this. My advice is thus more general, so feel free to clarify and I will try to give a more concrete response.
1) You say that you are coding the survey yourself, which is great because it means you have influence over your .csv file. I would NEVER use different kinds of separators in the same .csv file. Just do the naming from the very beginning, like you suggested in the second block.
Otherwise you might get into trouble with checkboxes, for example. Let's say someone checks 3 out of 5 possible answers and the next person only checks 1 (i.e. "don't know"). Now it will be much harder to create a spreadsheet (data.frame) type of results view, as opposed to having an empty field (which turns into an NA in R) that only needs to be recoded.
2) Another important question is whether you intend to do a panel survey (i.e. a longitudinal study asking the same participants over and over again). That (among many other things) would be a good reason to think about saving your data to a MySQL database instead of .csv. RMySQL can connect directly to the database and access its tables and, more importantly, its VIEWS.
Views really help with survey data since you can rearrange the data in different views, conditional on many different needs.
3) Besides all the personal / opinion and experience, here's some (less biased) literature to get started:
Complex Surveys: A Guide to Analysis Using R (Wiley Series in Survey Methodology)
The book is comparatively simple and leaves out panel surveys but gives a lot of R Code and examples which should be a practical start.
To prevent re-inventing the wheel you might want to check out LimeSurvey, a pretty decent (not speaking of the templates :) ) tool for survey conductors. Besides that, the TYPO3 CMS extensions pbsurvey and ke_questionnaire (should) work well too (I've only tested pbsurvey).
Multiple-choice items should always be coded as separate variables. That is, if you have 5 alternatives and multiple choice, you should code them as i1, i2, i3, i4, i5, i.e. each one a binary variable (0/1). I see that you have the values 3 5 99 for the Q4_M variable in the first example. Does that mean you have 99 alternatives for that item? Ouch...
First you should go on and create separate variables for each alternative in a multiple choice item. That is, do:
# note that I follow your example with Q4_M variable
dtf_ins <- as.data.frame(matrix(0, nrow = nrow(<initial dataframe>), ncol = 99))
# name vars appropriately
names(dtf_ins) <- paste("Q4_M_", 1:99, sep = "")
now you have a data.frame of 0s, so what you need to do is get 1s into the appropriate positions (this is a bit cumbersome); a function will do the job...
# first you gotta change spaces to commas and convert character variable to a numeric one
y <- paste("c(", gsub(" ", ", ", x), ")", sep = "")
z <- eval(parse(text = y))
# now you assign 1 according to the indexes in z
dtf_ins[1, z] <- 1
And that's pretty much it... basically, you would like to reconsider creating a data.frame with _M variables, so you can write a function that does this insertion automatically. Avoid for loops!
Or, even better, create a matrix with logicals, and just do dtf[m] <- 1, where dtf is your multiple-choice data.frame, and m is matrix with logicals.
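A tiny illustration of that logical-matrix idea (a made-up 2 x 3 example, not the asker's data):
dtf <- as.data.frame(matrix(0, nrow = 2, ncol = 3))
m <- rbind(c(TRUE, FALSE, TRUE),    # respondent 1 picked alternatives 1 and 3
           c(FALSE, TRUE, FALSE))   # respondent 2 picked alternative 2
dtf[m] <- 1
dtf
#   V1 V2 V3
# 1  1  0  1
# 2  0  1  0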
I would like to help you more on this one, but I'm recuperating after a looong night! =) Hope that I've helped a bit! =)
Thanks for all the responses. I agree with most of you that this format is kind of silly, but it is what I have to work with (the survey is coded and going into use next week). This is what I came up with from all the responses. I am sure it is not the most elegant or efficient way to do it, but I think it should work.
colnums <- grep("_M", colnames(dat))
responses <- nrow(dat)
for (i in colnums) {
  vec <- as.vector(dat[,i])                    # turn into vector
  b <- lapply(strsplit(vec, " "), as.numeric)  # split up and turn into numeric
  c <- sort(unique(unlist(b)))                 # which values were used
  newcolnames <- paste(colnames(dat[i]), "_", c, sep = "")  # column names
  e <- matrix(nrow = responses, ncol = length(c))  # new matrix for indicators
  colnames(e) <- newcolnames
  # next loop looks for responses and puts indicators in the correct places
  # (note j, not i, so the outer loop's index isn't clobbered)
  for (j in 1:responses) {
    e[j,] <- ifelse(c %in% b[[j]], 1, 0)
  }
  dat <- cbind(dat, e)
}
Suggestions for improvement are welcome.
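For what it's worth, here is a more compact variant of the same loop (a sketch; vapply replaces the inner loop, and it assumes the _M columns are read in as character):
for (cn in grep("_M", colnames(dat), value = TRUE)) {
  splits <- strsplit(as.character(dat[[cn]]), " ")
  vals <- sort(unique(as.numeric(unlist(splits))))   # levels actually used
  ind <- t(vapply(splits,
                  function(v) as.integer(vals %in% as.numeric(v)),
                  integer(length(vals))))            # one indicator row per response
  colnames(ind) <- paste(cn, vals, sep = "_")
  dat <- cbind(dat, ind)
}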
