Is it possible to make a nested FOREACH without COGROUP in PigLatin?

Is it possible to make a nested FOREACH without COGROUP in PigLatin? - foreach

I want to use the FOREACH like:
a:{a_attr:chararray}
b:{b_attr:int}
FOREACH a {
res = CROSS a, b;
-- some processing
GENERATE res;
}
By this I mean to make for each element of a a cross-product with all the elements of b, then perform some custom filtering and return tuples.
==EDIT==
Custom filetering = res_filtered = FILTER res BY ...;
GENERATE res_filtered.
==EDIT-2==
How to do it with a nested CROSS no more no less inside a FOR loop without prior GROUP or COGROUP?

Depending on the specifics of your filtering, you may be able to design a limited set of disjoint classes of elements in a and b, and then JOIN on those. For example:
If your filtering rules are
if a_attr starts with "Foo" and b is 4, accept
if a_attr starts with "Bar" and b is greater than 17, accept
if a_attr begins with a letter in [m-z] and b is less than 0, accept
otherwise, reject
Then you can write a UDF that will return 1 for items satisfying the first rule, 2 for the second, 3 for the third, and NULL otherwise. Your CROSS/FILTER then becomes
res = JOIN a BY myUDF(a), b BY myUDF(b);
Pig drops null values in JOINs, so only pairs satisfying your filtering criteria will be passed.

CROSS generates a cross-product of all the tuples in each relation. So there is no need to have a nested FOREACH. Just do the CROSS and then FILTER:
a: {a_attr: chararray}
b: {b_attr: int}
crossed = CROSS a, b;
crossed: {a::a_attr: chararray,b::b_attr: int}
res = FILTER crossed BY ... -- your custom filtering
If you have the FILTER immediately after the CROSS, you should not have (unnecessary) excessive IO trouble from the CROSS writing the entire cross-product to disk before filtering. Records that get filtered will never be written at all.

Related

Performance of accessing table via reference vs ipairs loop

I'm modding a game. I'd like to optimize my code if possible for a frequently called function. The function will look into a dictionary table (consisting of estimated 10-100 entries). I'm considering 2 patterns a) direct reference and b) lookup with ipairs:
PATTERN A
tableA = { ["moduleName.propertyName"] = { some stuff } } -- the key is a string with dot inside, hence the quotation marks
result = tableA["moduleName.propertyName"]
PATTERN B
function lookup(type)
local result
for i, obj in ipairs(tableB) do
if obj.type == "moduleName.propertyName" then
result = obj
break
end
end
return result
end
***
tableB = {
[1] = {
type = "moduleName.propertyName",
... some stuff ...
}
}
result = lookup("moduleName.propertyName")
Which pattern should be faster on average? I'd expect the 'native' referencing to be faster (it is certainly much neater), but maybe this is a silly assumption? I'm able to sort (to some extent) tableB in a order of frequency of the lookups whereas (as I understand it) tableA will have in Lua random internal order by default even if I declare the keys in proper order.

A lookup table will always be faster than searching a table every time.
For 100 elements that's one indexing operation compared to up to 100 loop cycles, iterator calls, conditional statements...
It is questionable though if you would experience a difference in your application with so little elements.
So if you build that data structure for this purpose only, go with a look-up table right away.
If you already have this data structure for other purposes and you just want to look something up once, traverse the table with a loop.
If you have this structure already and you need to look values up more than once, build a look up table for that purpose.

Ordering items by sizename when it's not alphabetically logic?

In my app, the admin may add sizes to his products in this order.
Variant.create(size_name: "L")
Variant.create(size_name: "S")
Variant.create(size_name: "XXL")
Variant.create(size_name: "XL")
Sizes could also be (30,24, 33, 31, 29)
In my product view, the select tag display in the order it has been created.
I would like to sort from the smallest size to the biggest (S, M, L ...).
With the numerically sizes,I can order from the smallest to the biggest it's Okay
How I am supped to make sure that both sizes (the numerically and the alphabetically) could be sorted from the smallest to the biggest?

There are many ways to solve this, but at the core of any solution you need to define the order manually (or use a third party library which has already written this manual ordering for you?).
For example, you could somewhere define e.g.
SIZE_NAMES = %w[XS S M L XL XXL]
and then elsewhere in the code, use something like:
variants.sort_by { |variant| SIZE_NAMES.index(variant.size) }
For a more "advanced" solution, you could instead consider defining each size as a custom object rather than a regular String. Take a look at the Comparable module, and the <=> ("spaceship") operator.
By utilising this, you could potentially implement it in such a way that e.g. variants.sort will automatically compare variants by their "converted" size, and order them as you expect.

If you wish to do sorting on db side then you have two options:
Predefined sort like so:
Variant.order(
"CASE size_name
WHEN 'S' THEN 1
WHEN 'L' THEN 2
WHEN 'XL' THEN 3
WHEN 'XXL' THEN 4
ELSE 10
END, size, id"
)
You might want to move it to scope so in case you need to add another size_name there is only one place to change
With active record enums:
enum size_name: { s: 0, l: 1, xl: 2, xxl: 3 }
That way, you can still assign the field by the string/symbol, but the underlying data will actually be an integer, so you can just use order(:size_name, :size) to sort by size_name and size.
Also this way you can add index to speed up ordering

What is a better way than a for loop to implement an algorithm that involves sets?

I'm trying to create an algorithm along these lines:
-Create 8 participants
-Each participant has a set of interests
-Pair them with another participant with the least amount of interests
So what I've done so far is create 2 classes, the Participant and Interest, where the Interest is Hashable so that I can create a Set with it. I manually created 8 participants with different names and interests.
I've made an array of participants selected and I've used a basic for in loop to somewhat pair them together using the intersection() function of sets. Somehow my index always kicks out of range and I'm positive there's a better way of doing this, but it's just so messy and I don't know where to start.
for i in 0..<participantsSelected.count {
if participantsSelected[i].interest.intersection(participantsSelected[i+1].interest) == [] {
participantsSelected.remove(at: i)
participantsSelected.remove(at: i+1)
print (participantsSelected.count)
}
}
So my other issue is using a for loop for this specific algorithm seems a bit off too since what if they all have 1 similar interest, and it won't equal to [] / nil.
Basically the output I'm trying is to remove them from the participants selected array once they're paired up, and for them to be paired up they would have to be with another participant with the least amount of interests with each other.
EDIT: Updated code, here's my attempt to improve my algorithm logic
for participant in 0..<participantsSelected {
var maxInterestIndex = 10
var currentIndex = 1
for _ in 0..<participantsSelected {
print (participant)
print (currentIndex)
let score = participantsAvailable[participant].interest.intersection(participantsAvailable[currentIndex].interest)
print ("testing score, \(score.count)")
if score.count < maxInterestIndex {
maxInterestIndex = score.count
print ("test, \(maxInterestIndex)")
} else {
pairsCreated.append(participantsAvailable[participant])
pairsCreated.append(participantsAvailable[currentIndex])
break
// participantsAvailable.remove(at: participant)
// participantsAvailable.remove(at: pairing)
}
currentIndex = currentIndex + 1
}
}
for i in 0..<pairsCreated.count {
print (pairsCreated[i].name)
}

Here is a solution in the case that what you are looking for is to pair your participants (all of them) optimally regarding your criteria:
Then the way to go is by finding a perfect matching in a participants graph.
Create a graph with n vertices, n being the number of participants. We can denote by u_p the vertex corresponding to participant p.
Then, create weighted edges as follows:
For each couple of participants p, q (p != q), create the edge (u_p, u_q), and weight it with the number of interests these two participants have in common.
Then, run a minimum weight perfect matching algorithm on your graph, and the job is done. You will obtain an optimal result (meaning the best possible, or one among the best possible matchings) in polynomial time.
Minimum weight perfect matching algorithm: The problem is strictly equivalent to the maximum weight matching algorithm. Find the edge of maximum weight (let's denote by C its weight). Then replace the weight w of each edge by C-w, and run a maximum weight matching algorithm on the resulting graph.
I would suggest that yoy use Edmond's blossom algorithm to find a perfect matching in your graph. First because it is efficient and well documented, second because I believe you can find implementations in most existing languages, but also because it truly is a very, very beautiful algorithm (it ain't called blossom for nothing).
Another possibility, if you are sure that your number of participants will be small (you mention 8), you can also go for a brute-force algorithm, meaning to test all possible ways to pair participants.
Then the algorithm would look like:
find_best_matching(participants, interests, pairs):
if all participants have been paired:
return sum(cost(p1, p2) for p1, p2 in pairs), pairs // cost(p1, p2) = number of interests in common
else:
p = some participant which is not paired yet
best_sum = + infinite
best_pairs = empty_set
for all participants q which have not been paired, q != p:
add (p, q) to pairs
current_sum, current_pairs = find_best_matching(participants, interests, pairs)
if current_sum < best_sum:
best_sum = current_sum
best_pairs = current_pairs
remove (p, q) from pairs
return best_sum, best_pairs
Call it initially with find_best_matching(participants, interests, empty_set_of_pairs) to get the best way to match your participants.

Is it possible to create a variable and make its assignment based on certain conditions in a cypher query?

I am trying to create an array of values that will be assigned based on the outcome of a case test. This test will be inside a query that I already know works with a preset value in the query.
The query I am trying to embed in the case test is something like this:
WITH SPLIT (('07/28/2015'), '/' AS cd
MATCH (nodeA: NodeTypeA)-(r:ARelation)->(nodeB: NodeTypeB)
WITH cd, SPLIT (nodeA.ADate, '/') AS dd, nodeA, nodeB, r
WHERE
(TOINT(cd[2])> TOINT(dd[2])) OR (TOINT(cd[2]= TOINT(dd[2]) AND ((TOINT(cd[0])> TOINT(dd[0])) OR (TOINT(cd[0])= TOINT(dd[0]) AND (TOINT(cd[1])>= TOINT(dd[1])))))
RETURN nodeA, nodeB, r
I want to replace the current date with whatever date will be 6 months from the current date, and I came up with something like this, though I am not sure where I would put it in my query or if it would even work (do I initialize the new variable for instance somehow?):
WHEN ((TOINT(cd[0])> 6))
THEN
TOINT(fd[2])=TOINT(cd[2])+1, TOINT(fd[0])=TOINT(cd[0])-6, TOINT(fd[1])=TOINT(cd[1])
ELSE
TOINT(fd[2])=TOINT(cd[2]), TOINT(fd[0])=TOINT(cd[0])+6, TOINT(fd[1])=TOINT(cd[1])
fd would then replace the cd in the original query's WHERE segment. Where would my case test go, is it correctly written (and if not, what is wrong), and would I need something else added to make it all work?

Just use a WITH block to do a computation and bind it to a new variable, like this:
WITH 2 + 2 as y RETURN y;
That basically assigns the value 4 to y.
In your query, you already have a big WITH block. Just put your computations in those, bound to new variables, and you can then refer to those variables in subsequent expressions.
Don't try to modify these variables, just create new ones (with new WITH blocks) as needed. If you need variables that can actually change, then...well hey you're working with a database, the ultimate way to store and update information. Create a new node, and then update it as you see fit. :)

This is my proposed solution
Explanation: I have declared four variables in my query i.e. name1, name2, ken and lana and I am using these variables for creating MATCH pattern (in the MATCH clause) and filtering those in the Where clause.
WITH "Lau" AS name1,
"L" AS name2,
"Keanu Reeves" AS ken,
"Lana Wachowski" AS lana
MATCH(x:Person{ name: ken})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(y:Person),
(x1:Person{name: lana})-[:DIRECTED]->(m)<-[:DIRECTED]-(y1:Person)
WHERE y.name CONTAINS name1 OR
y.name CONTAINS name2 OR
(y.name CONTAINS name1 AND y.name CONTAINS name2)
RETURN x, m, y, x1;

Array replacement on relationship property in Neo4j

So, let's say I have relationship r, with property r.myarray:
[1,2,3,4,5,6,7]
and I need to write a query which will replace the items in the array - up to an including an arbitrary member guaranteed to be in the array (let's say 3 in this case) - with another array - let's say:
[6,12,13]
to get result:
[6,12,13,4,5,6,7]
I got as far as seeing that you can use RANGE or subset notation for the array (e.g. r.myarray[0..x]) to specify part of the array, and could theoretically do SET to replace the array with the first array plus the second subset (r.myarray[x..r.myarray.length], or something like that). I am about half a mile from a complete answer here, though.
edit: Final, interpolat-able query:
START r=relationship(726)
SET r.myarray = [1,2,3,4] + filter(y in r.ancestors where NOT (y IN [718]));

Range probably isn't what you want. Range produces a collection of numbers. It's good for looping, like if you want to go through all the numbers from 1-10, but it's not that useful with other array indexes. You probably want a combination of the + operator on collections, index operations, with possibly a dash of extract and filter. Combining those will let you do basically whatever you like. Here are some examples of the things you can do. I'm using a WITH clause just to show a data sample, you could of course do this on any node property:
/* Return only the first three items */
with [1,2,3,4,5,6,7] as arr return arr[0..3];
/* Cut out the 4th item, otherwise return everything */
with [1,2,3,4,5,6,7] as arr return arr[0..3] + arr[4..];
/* Return only the even numbers */
with [1,2,3,4,5,6,7] as arr
return filter(y in
extract(x in arr | case when (x % 2 = 0) then x end) where y > 0);

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Is it possible to make a nested FOREACH without COGROUP in PigLatin? - foreach

Related

Performance of accessing table via reference vs ipairs loop

Ordering items by sizename when it's not alphabetically logic?

What is a better way than a for loop to implement an algorithm that involves sets?

Is it possible to create a variable and make its assignment based on certain conditions in a cypher query?

Array replacement on relationship property in Neo4j

Categories

Resources