Suppose I have a series of embedded or child documents that I'd like to search on, but return their parents as my results, like Buildings and Units:
Building A
- Unit 1F
- Unit 1R
- Unit 2F: 1200 sq ft
- Unit 2R: 2300 sq ft
Building B
- Unit 202: 500 sq ft
- Unit 203: 650 sq ft
Now suppose I want to return all buildings that have units >= 1000 sq ft. How would I do that?
Store the unit sizes as an Array:
class Building
  def search_data
    {
      # ... other fields
      unit_sq_ft: units.map(&:sq_ft)
    }
  end
end
and search with:
Building.search "pool", where: {unit_sq_ft: {gte: 1000}}
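With Searchkick (which sits on Elasticsearch), a where filter against an array field matches when any element satisfies the condition, so a building is returned as soon as at least one of its units is 1000 sq ft or more. The "pool" argument is just an example full-text query run alongside the filter.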
I could be doing this completely wrong, or I could be on the right path; I have no idea! I'm trying to grade a decision based on 3 criteria. The grades are AAA-A and BBB-B, etc., but for now I just need AAA-A and can figure out the rest.
Essentially, we want Col. J to populate based on what Cols. G-I say. In my head it's super easy, but I want to automate this step.
So I start with Col. I and check the pairing. AAA-A results are any of these: "G/G", "LG/G", "G/LG", or "R/R". If it is one of those 4 pairings, then we start at the AA grade.
Then I check Col. G (it doesn't matter now whether I check H or G first), and if G >= 0.5 we grade it higher, at AAA; if it's less than 0.5, do nothing and keep it at AA.
Then I look at Col. H (or G if we started at H), and if it is a "Y" we grade down, from AA to A or from AAA to AA. But if it is "N", do nothing.
What I have so far is attached. It technically works for 3/4 of these cells, but that could be a coincidence. The results column (J) should be: row 3 - AA, row 4 - AA, row 5 - AAA, row 6 - AA.
And for one additional test, imagine: Col. G = 0.64, Col. H = Y, Col. I = G/G -- then we want AA as the result.
Definitely the hardest test I've had in Excel/Sheets. I appreciate the help! Thanks in advance!
Formula I tried:
=Ifs((or(I3="G/G",I3="LG/G",I3="G/LG",I3="R/R"),"AA", and(or(I3="G/G",I3="LG/G",I3="G/LG",I3="R/R"),G3>0.5),"AAA",H3="Y","A")
Data Sample:
    G       H   I      J
3   -0.07   N   R/R    AA
4   -0.46   N   R/R    AA
5   0.64    N   G/G    AA
6   0.76    Y   LG/G   AA
As presented, your formula simply returns an error and looks like a misinterpretation of how IFS works. However, it suggests you're trying to nest IF statements, and from your description, I think that makes sense.
Assuming that's a valid interpretation, the following does what you want.
(At least as far as AAA-A is concerned).
=IF(OR(I3="G/G",I3="LG/G",I3="G/LG",I3="R/R"),IF(G3>=0.5,IF(H3="Y","AA","AAA"),IF(H3="Y","A","AA")),"Not an A")
The BBB-B logic would be the same (just nested in where "Not an A" is).
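For example, take row 6 (G = 0.76, H = "Y", I = "LG/G"): the pairing matches, so it starts at AA; G >= 0.5 lifts it to AAA; and H = "Y" grades it back down to AA, matching the expected result.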
I'm playing around with units of measure in F# and I'm currently trying to create compound units of measure for length and mass to reflect colloquial speech in the imperial system, e.g. "I'm 5 foot 10" or "She weighs 8 stone and 11 pounds" in the US and the UK.
I've defined a module for standard (non-compound) units like so:
module Units
// Mass
[<Measure>] type kg // Kilogram
[<Measure>] type g // Gram
[<Measure>] type lb // Pound (mass)
[<Measure>] type st // Stone (mass)
// Conversions
...
// Length
[<Measure>] type m // Metre
[<Measure>] type cm // Centimetre
[<Measure>] type inch // Inch
[<Measure>] type ft // Foot
// Conversions
...
And I've defined compound units in a different module:
module CompoundUnits

open Units

// Mass
type StonesAndPounds = {
    Stones: float<st>
    Pounds: float<lb>
}

// Length
type FeetAndInches = {
    Feet: float<ft>
    Inches: float<inch>
}
However, with the way I've currently written the compound mass and length types, there's room for illegal states (such as negative values) and states that are technically correct but not preferred:
// 39 lbs = 2 st 11 lbs
let eightStoneEleven: StonesAndPounds = { Stones = 6.0<st>; Pounds = 39.0<lb> }
// 22" = 1' 10"
let fiveFootTen: FeetAndInches = { Feet = 4.0<ft>; Inches = 22.0<inch> }
In his book "Domain Modeling made Functional" Scott Wlaschin talks about making illegal states unrepresentable, so I was wondering if there was a way to enforce some kind of restriction on my compound types so that 0<ft> <= Feet, 0<inch> <= Inches <= 12<inch> and 0<st> <= Stones, 0<lb> <= Pounds <= 14<lb>.
A common pattern is to create a module for the type, which contains its definition plus create functions and other validation logic.
Scott has some examples on his website, as part of his 'Designing with Types' series.
https://fsharpforfunandprofit.com/posts/designing-with-types-non-strings/
You can't enforce the restrictions on the units of measure themselves, but you could create dedicated types to represent your compound measurements as Scott does with SafeDate and NonNegativeInt etc.
These could still use the 'standard' units of measure for their component properties.
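A minimal sketch of that pattern for FeetAndInches, assuming the Units module above; the module name, the option-returning create function, and the accessors are illustrative choices, not taken from the article:
module FeetAndInches =
    open Units

    // The representation is private: values can only be built via create below.
    type T = private { Feet: float<ft>; Inches: float<inch> }

    // Smart constructor enforcing the bounds stated in the question:
    // 0<ft> <= Feet and 0<inch> <= Inches <= 12<inch>
    let create (feet: float<ft>) (inches: float<inch>) : T option =
        if feet >= 0.0<ft> && inches >= 0.0<inch> && inches <= 12.0<inch>
        then Some { Feet = feet; Inches = inches }
        else None

    let feet (t: T) = t.Feet
    let inches (t: T) = t.Inches

// FeetAndInches.create 5.0<ft> 10.0<inch> returns Some ...
// FeetAndInches.create 4.0<ft> 22.0<inch> returns None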
A quote from the article:
"Units of measure can indeed be used to avoid mixing up numeric values of different type, and are much more powerful than the single case unions we’ve been using.
On the other hand, units of measure are not encapsulated and cannot have constraints. Anyone can create a int with unit of measure say, and there is no min or max value."
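A one-line illustration of that point, using the st measure defined earlier:
// Nothing stops a caller from constructing a value that violates the intended constraints:
let negativeStones = -5.0<st>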
I'm trying to observe how Spark Streaming uses the RDDs inside a DStream to join two DStreams, but I'm seeing strange results that confuse me.
In my code, I collect data from a socket stream and split it into two paired DStreams by some simple logic. In order to have a few batches collected for the join, I created a window covering the last three batches. However, the results of the join make no sense to me. Please help me understand.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.util.Random

object Join extends App {
  val conf = new SparkConf().setMaster("local[4]").setAppName("KBN Streaming")
  val sc = new SparkContext(conf)
  sc.setLogLevel("ERROR")

  val BATCH_INTERVAL_SEC = 10
  val ssc = new StreamingContext(sc, Seconds(BATCH_INTERVAL_SEC))
  val lines = ssc.socketTextStream("localhost", 8091)
  //println(s"lines.slideDuration : ${lines.slideDuration}")
  //lines.print()
  val ds = lines.map(x => x)

  val randNums = List(1, 2, 3, 4, 5, 6)

  // lines of length <= 2, keyed by a random int
  val less = ds.filter(x => x.length <= 2)
  val lessPairs = less.map(x => (Random.nextInt(randNums.size), x))
  lessPairs.print

  // lines of length > 2, keyed the same way
  val greater = ds.filter(x => x.length > 2)
  val greaterPairs = greater.map(x => (Random.nextInt(randNums.size), x))
  greaterPairs.print

  val join = lessPairs.join(greaterPairs).window(Seconds(30), Seconds(30))
  join.print

  ssc.start
  ssc.awaitTermination
}
Test Results:
-------------------------------------------
Time: 1473344240000 ms
-------------------------------------------
(1,b)
(4,s)

-------------------------------------------
Time: 1473344240000 ms
-------------------------------------------
(5,333)

-------------------------------------------
Time: 1473344250000 ms
-------------------------------------------
(2,x)

-------------------------------------------
Time: 1473344250000 ms
-------------------------------------------
(4,the)

-------------------------------------------
Time: 1473344260000 ms
-------------------------------------------
(2,a)
(0,b)

-------------------------------------------
Time: 1473344260000 ms
-------------------------------------------
(2,ten)
(1,one)
(3,two)

-------------------------------------------
Time: 1473344260000 ms
-------------------------------------------
(4,(b,two))
When join is called, the two RDDs are recomputed, so they contain different (freshly generated) random keys than the ones shown when they were printed. So we need to cache both RDDs when they are computed for the first time; the same values are then reused when join is called later, instead of both RDDs being recomputed. I tried this on multiple examples and it works fine. I was missing a basic core concept of Spark.
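A minimal sketch of the fix against the code above, with everything else unchanged; caching the paired DStreams means the random keys are generated once per batch and reused by the join:
// cache() so Random.nextInt runs once per batch; the join then sees
// the same keys that were printed, rather than freshly recomputed ones
val lessPairs = less.map(x => (Random.nextInt(randNums.size), x)).cache()
lessPairs.print

val greaterPairs = greater.map(x => (Random.nextInt(randNums.size), x)).cache()
greaterPairs.print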
Excerpt from "Learning Spark" book:
Persistence (Caching)
As discussed earlier, Spark RDDs are lazily evaluated, and sometimes we may wish to use the same RDD multiple times. If we do this naively, Spark will recompute the RDD and all of its dependencies each time we call an action on the RDD.
I want to look at each row in a frame and construct multiple columns for a new frame based on values in that row.
The final result should be a frame that has the columns of the original frame plus the new columns.
I have a solution but I wonder if there is a better one. I think the best way to explain the desired behavior is with an example. I'm using Deedle's titanic data set:
#r #"F:\aolney\research_projects\braintrust\code\QualtricsToR\packages\Deedle.1.2.3\lib\net40\Deedle.dll";;
#r #"F:\aolney\research_projects\braintrust\code\QualtricsToR\packages\FSharp.Charting.0.90.12\lib\net40\FSharp.Charting.dll";;
#r #"F:\aolney\research_projects\braintrust\code\QualtricsToR\packages\FSharp.Data.2.2.2\lib\net40\FSharp.Data.dll";;
open System
open FSharp.Data
open Deedle
open FSharp.Charting;;
#load #"F:\aolney\research_projects\braintrust\code\QualtricsToR\packages\FSharp.Charting.0.90.12\FSharp.Charting.fsx";;
#load #"F:\aolney\research_projects\braintrust\code\QualtricsToR\packages\Deedle.1.2.3\Deedle.fsx";;
let titanic = Frame.ReadCsv(#"C:\Users\aolne_000\Downloads\titanic.csv");;
This is what that frame looks like:
val titanic : Frame<int,string> =
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 -> 1 False 3 Braund, Mr. Owen Harris male 22 1 0 A/5 21171 7.25 S
1 -> 2 True 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38 1 0 PC 17599 71.2833 C85 C
My approach grabs each row, uses some selection logic, and then returns a new row value as a dictionary. Then I use Deedle's expansion operation to convert the values in this dictionary to new columns.
titanic?test <- titanic |> Frame.mapRowValues( fun x -> if x.GetAs<int>("Pclass") > 1 then dict ["A", 1; "B", 2] else dict ["A", 2 ; "B", 1] );;
titanic |> Frame.expandCols ["test"];;
This gives the following new frame:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked test.A test.B
0 -> 1 False 3 Braund, Mr. Owen Harris male 22 1 0 A/5 21171 7.25 S 1 2
1 -> 2 True 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38 1 0 PC 17599 71.2833 C85 C 2 1
Note the last two columns are test.A and test.B. Effectively, this approach creates a new frame (with columns A and B) and then joins it to the existing frame.
This is fine for my use case, but it is probably confusing for others to read. It also forces a prefix, e.g. "test", onto the final columns, which isn't desirable.
Is there a way to append the new values to the end of the row series represented in the code above by x?
I find your approach quite elegant and clever. Because the new series shares the index with the original frame, it is also going to be pretty fast. So, I think your solution may actually be better than the alternative option (but I have not measured this).
Anyway, the other option would be to return new rows from your Frame.mapRowValues call - so for each row, we return the original row together with the additional columns.
titanic
|> Frame.mapRowValues (fun x ->
    let add =
        if x.GetAs<int>("Pclass") > 1 then series ["A", box 1; "B", box 2]
        else series ["A", box 2; "B", box 1]
    Series.merge x add)
|> Frame.ofRows
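Because the merged row series uses plain "A" and "B" keys, Frame.ofRows turns those directly into columns named A and B, so this version also avoids the "test." prefix.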
When we do not know the number of targets, can we create the targets at run time using Informatica PowerCenter?
Suppose we have below source:
Employee:
Dept_ID EmpName Sal
10 A 200
11 B 100
10 C 200
10 D 400
12 E 500
12 F 400
...
It can have any number of distinct Dept_IDs.
I want to load all EmpName and Sal values for a particular Dept_ID into a separate target table (i.e. the target name should be Tar_10 or Tar_11, where 10 and 11 are Dept_IDs).
You can achieve this with the following method:
While creating the target, check the "Include file name port" checkbox.
Use an Expression transformation to populate the file name port; something like 'Tar_' || Dept_ID should do.
Use a Sorter to sort your input by Dept_ID.
Use a Transaction Control transformation that commits whenever Dept_ID differs from the previous row's Dept_ID (TC_COMMIT_BEFORE, so the changed row starts the new file); this keeps changing the file name as the input moves through the departments, as sketched below.
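A sketch of that condition; v_PREV_DEPT_ID is a hypothetical variable port, maintained in an upstream Expression transformation to hold the previous row's Dept_ID:
IIF(Dept_ID <> v_PREV_DEPT_ID, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)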
Your output will look like this:
TAR_10
10 A 200
10 C 200
10 D 400
TAR_11
11 B 100
TAR_12
12 E 500
12 F 400
Yes, Sumit is right; you can achieve this by creating the file name port and using Transaction Control. Also, if your target is a flat file, you can write the whole record into a single port, so there is no need to worry about the target structure either.
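For the single-port idea, a hypothetical expression that flattens each record into one delimited string (TO_CHAR converts the numeric salary):
Dept_ID || ',' || EmpName || ',' || TO_CHAR(Sal)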