What is a better way than a for loop to implement an algorithm that involves sets? - ios

I'm trying to create an algorithm along these lines:
-Create 8 participants
-Each participant has a set of interests
-Pair each participant with the participant they share the fewest interests with
So what I've done so far is create 2 classes, the Participant and Interest, where the Interest is Hashable so that I can create a Set with it. I manually created 8 participants with different names and interests.
I've made an array of participants selected and I've used a basic for in loop to somewhat pair them together using the intersection() function of sets. Somehow my index always kicks out of range and I'm positive there's a better way of doing this, but it's just so messy and I don't know where to start.
for i in 0..<participantsSelected.count {
    if participantsSelected[i].interest.intersection(participantsSelected[i+1].interest) == [] {
        participantsSelected.remove(at: i)
        participantsSelected.remove(at: i+1)
        print (participantsSelected.count)
    }
}
My other issue is that a for loop seems wrong for this algorithm anyway: what if all participants share one interest? Then the intersection is never equal to [] and nobody gets paired.
Basically, I want to remove participants from the participantsSelected array once they're paired up, and each one should be paired with the participant they share the fewest interests with.
EDIT: Updated code, here's my attempt to improve my algorithm logic
for participant in 0..<participantsSelected {
    var maxInterestIndex = 10
    var currentIndex = 1
    for _ in 0..<participantsSelected {
        print (participant)
        print (currentIndex)
        let score = participantsAvailable[participant].interest.intersection(participantsAvailable[currentIndex].interest)
        print ("testing score, \(score.count)")
        if score.count < maxInterestIndex {
            maxInterestIndex = score.count
            print ("test, \(maxInterestIndex)")
        } else {
            pairsCreated.append(participantsAvailable[participant])
            pairsCreated.append(participantsAvailable[currentIndex])
            break
            // participantsAvailable.remove(at: participant)
            // participantsAvailable.remove(at: pairing)
        }
        currentIndex = currentIndex + 1
    }
}
for i in 0..<pairsCreated.count {
    print (pairsCreated[i].name)
}

Here is a solution, in case what you are looking for is to pair all of your participants optimally with respect to your criterion.
The way to go is then to find a minimum weight perfect matching in a graph of participants.
Create a graph with n vertices, n being the number of participants. We can denote by u_p the vertex corresponding to participant p.
Then, create weighted edges as follows:
For each pair of participants p, q (p != q), create the edge (u_p, u_q), and weight it with the number of interests these two participants have in common.
Then, run a minimum weight perfect matching algorithm on your graph, and the job is done. You will obtain an optimal result (meaning the best possible, or one among the best possible matchings) in polynomial time.
Minimum weight perfect matching algorithm: The problem is strictly equivalent to the maximum weight perfect matching problem. Find the edge of maximum weight (let's denote its weight by C). Then replace the weight w of each edge by C-w, and run a maximum weight perfect matching algorithm on the resulting graph.
I would suggest that you use Edmonds' blossom algorithm to find a perfect matching in your graph. First because it is efficient and well documented, second because I believe you can find implementations in most existing languages, but also because it truly is a very, very beautiful algorithm (it ain't called blossom for nothing).
Another possibility, if you are sure that your number of participants will be small (you mention 8), is a brute-force algorithm that tests all possible ways to pair participants.
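To make the graph construction and the C-w trick concrete, here is a minimal Python sketch. The participant names, the interests dictionary, and both helper functions are hypothetical illustrations, not part of the original question; the sketch only builds and flips the edge weights and leaves the matching itself to whatever algorithm you choose.

```python
from itertools import combinations

# Hypothetical sample data: participant name -> set of interests.
interests = {
    "Ann": {"chess", "golf", "jazz"},
    "Bob": {"chess", "golf"},
    "Cat": {"jazz", "yoga"},
    "Dan": {"yoga"},
}

def shared_interest_weights(interests):
    """Weight of edge (p, q) = number of interests p and q have in common."""
    return {
        (p, q): len(interests[p] & interests[q])
        for p, q in combinations(sorted(interests), 2)
    }

def flip_weights(weights):
    """Replace each weight w by C - w, where C is the largest weight."""
    c = max(weights.values())
    return {edge: c - w for edge, w in weights.items()}

weights = shared_interest_weights(interests)
flipped = flip_weights(weights)
for edge in weights:
    print(edge, weights[edge], flipped[edge])
```

Every perfect matching has the same number of edges, so the matching that maximizes the flipped total is the one that minimizes the original total.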
Then the algorithm would look like:
find_best_matching(participants, interests, pairs):
    if all participants have been paired:
        return sum(cost(p1, p2) for p1, p2 in pairs), copy of pairs // cost(p1, p2) = number of interests in common
    else:
        p = some participant which is not paired yet
        best_sum = + infinite
        best_pairs = empty_set
        for all participants q which have not been paired, q != p:
            add (p, q) to pairs
            current_sum, current_pairs = find_best_matching(participants, interests, pairs)
            if current_sum < best_sum:
                best_sum = current_sum
                best_pairs = current_pairs
            remove (p, q) from pairs
        return best_sum, best_pairs
Call it initially with find_best_matching(participants, interests, empty_set_of_pairs) to get the best way to match your participants.
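The brute-force idea can be sketched in Python as follows. This is one possible rendering, not the only one: it passes the list of still-unpaired participants instead of a shared set of pairs, it assumes an even number of participants, and the sample names and interests are made up for illustration.

```python
def find_best_matching(participants, interests):
    """Return (total shared interests, list of pairs) with the smallest total.

    participants: list of names still unpaired (assumed to have even length).
    interests: dict mapping each name to a set of interests.
    """
    if not participants:
        return 0, []
    p, rest = participants[0], participants[1:]
    best_sum, best_pairs = float("inf"), []
    for i, q in enumerate(rest):
        # Pair p with q, then solve the smaller problem for everyone else.
        remaining = rest[:i] + rest[i + 1:]
        sub_sum, sub_pairs = find_best_matching(remaining, interests)
        total = len(interests[p] & interests[q]) + sub_sum
        if total < best_sum:
            best_sum, best_pairs = total, [(p, q)] + sub_pairs
    return best_sum, best_pairs

# Hypothetical sample data.
interests = {
    "Ann": {"chess", "golf", "jazz"},
    "Bob": {"chess", "golf"},
    "Cat": {"jazz", "yoga"},
    "Dan": {"yoga"},
}
best_sum, best_pairs = find_best_matching(sorted(interests), interests)
print(best_sum, best_pairs)
```

Always pairing the first unpaired participant avoids visiting the same matching twice; with 8 participants that leaves 7 * 5 * 3 * 1 = 105 matchings to test.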

Finding the consecutive win in Cypher query language

From the figure we can see that Arsenal has won three matches in a row, but I could not write the query.
Here is a query that should return the maximum number of consecutive wins for Arsenal:
MATCH (a:Club {name:'Arsenal FC'})-[r:played_with]-(:Club)
WITH ((CASE a.name WHEN r.home THEN 1 ELSE -1 END) * (TOINT(r.score[0]) - TOINT(r.score[1]))) > 0 AS win, r
ORDER BY TOINT(r.time)
RETURN REDUCE(s = {max: 0, curr: 0}, w IN COLLECT(win) |
  CASE WHEN w
    THEN {
      max: CASE WHEN s.max < s.curr + 1 THEN s.curr + 1 ELSE s.max END,
      curr: s.curr + 1}
    ELSE {max: s.max, curr: 0}
  END
).max AS result;
The WITH clause sets the win variable to true iff Arsenal won a particular game. Notice that the ORDER BY clause converts the time property to an integer, because the ordering of numeric strings does not work properly if the strings could be of different lengths (I am being a bit picky here, admittedly). The REDUCE function is used to calculate the maximum number of consecutive wins.
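For readers less familiar with REDUCE, the same running-streak logic can be sketched in plain Python. The function name and the sample list are hypothetical; the input is assumed to be the win flags already sorted by time, as the ORDER BY clause does in the query.

```python
def max_consecutive_wins(results):
    """results: booleans in chronological order, True meaning a win."""
    best = current = 0
    for won in results:
        # A win extends the current streak; anything else resets it.
        current = current + 1 if won else 0
        best = max(best, current)
    return best

streak = max_consecutive_wins([True, False, True, True, True, False])
print(streak)
```

Here best and current play the roles of s.max and s.curr in the accumulator map.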
======
Finally, here are some suggestions for some improvements to your data model. For example:
It looks like your played_with relationship always points from the home team to the away team. If so, you can get rid of the redundant home and away properties, and you can also rename the relationship type to HOSTED to make the direction of the relationship more clear.
The scores and time should be stored as integers, not strings. That would make your queries more efficient, and easier to write and understand.
You could also consider splitting the scores property into two scalar properties, say homeScore and awayScore, which would make your code more clear. There seems to be no advantage to storing the scores in an array.
If you made all the above suggested changes, then you would just need to change the beginning of the above query to this:
MATCH (a:Club {name:'Arsenal FC'})-[r:HOSTED]-(:Club)
WITH ((CASE a WHEN STARTNODE(r) THEN 1 ELSE -1 END) * (r.homeScore - r.awayScore)) > 0 AS win, r
ORDER BY r.time
...

ios- thread 1 exc_bad_instruction error in app

I am writing an app to simulate the NBA lottery. I have already written the code to generate the random combinations and assign them to each team.
Here is my method to simulate the drawings and assign the draft positions to each team. standingsArray is an array of Team items of type ObjectWrapper, with values of name, seed, wins, losses, draft position, etc. for each team. Basically, I have 14 balls and randomly choose 4 of them, which constitutes a combination (order doesn't matter). So there are 1001 possible combinations in total, but one is thrown out. (You can ignore the first while loop; it is only there so that the thrown-out combination isn't selected.) A number of combinations is assigned to each of the 14 lottery teams based on record (250 for the worst team, 199 for the second worst, etc.). The standingsArray argument to my method already has the possibilities assigned to each team.
Next, I randomly pull 4 balls from the total possibilities, and the team with that combination gets the first pick. Because none of the selected team's combinations can be chosen again for the second pick, I would have to remove all of them, but that is very complicated. Instead, I make a new array called tempPossibilities, to which I append all the combinations of every team except the one just selected; this then lets me generate a new combination to select from.
However, I am getting an error at this line:
for j in 0...(standingsArray[i].possibilities?.count)!-1{
It is reported as a bad instruction error, and I cannot figure out why. What also doesn't make sense is that the for loop works and the tempPossibilities array is fully populated with the correct number of combinations (without the lottery team), even though the error happens at the for loop.
Code is below: any help is appreciated, thank you, and sorry for the really long paragraph
func setDraftPositions(var standingsArray: [Team])->[Team]{
    var lottery: [Team]=[]
    var totalPossibilities: [[Int]]=combosOfLength(14, m: 4)
    var tempPossibilities = []
    var rand = Int(arc4random_uniform(UInt32(totalPossibilities.count)))
    var draw = totalPossibilities[rand]
    while (draw==(unused?.first)!) {
        rand = Int(arc4random_uniform(UInt32(totalPossibilities.count)))
        draw = totalPossibilities[rand]
    }
    s: for x in 0...13{
        for a in 0...(standingsArray[x].possibilities?.count)!-1{
            if(draw==standingsArray[x].possibilities![a]){
                standingsArray[x].setDraftingPosition(1)
                standingsArray[x].isLottery=true;
                lottery.append(standingsArray[x])
                for i in 0...(standingsArray.count-1) {
                    if(standingsArray[i].firstName != standingsArray[x].firstName!) {
                        for j in 0... (standingsArray[i].possibilities?.count)!-1{ //ERROR is happening here
                            tempPossibilities.append(standingsArray[i].possibilities![j])
                        }
                    }
                }
                standingsArray.removeAtIndex(x)
                break s;
            }
        }
    }
(repeat this for the next 2 picks)
Try this:
for j in 0...(standingsArray[i].possibilities?.count)!-1{
should be written like this:
for j in 0...(standingsArray[i].possibilities?.count)! - 1{
it needs proper spacing.

Categorizing Hashtags based on similarities

I have different documents with a list of hashtags in each. I would like to group them under the most relevant hashtag (which would be present in the document itself).
E.g.: if there are #Eco, #Ecofriendly and #GoingGreen, I would like to group all of these under the most relevant and representative hashtag (say #Eco). How should I approach this, and what techniques and algorithms should I be looking at?
I would create a bipartite graph of documents-hashtags and use clustering on a bipartite graph:
http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_bipartite.pdf
This way I am not using the content of the document, but just clustering the hashtags, which is what you wanted.
Your question is not very strict, and as such may have multiple answers. However, if we assume that you literally want "I would like to group all these under the most common Hashtag", then simply loop through all hashtags, compute how often they come up, and then for each document select the one with the highest number of occurrences.
Something like
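# N[h] = number of times hashtag h occurs across all documents
N = {}
for D in documents:
    for h in D.hashtags:
        if h not in N: N[h] = 0
        N[h] += 1
# for each document, pick its hashtag with the highest overall count
for D in documents:
    best = None
    for h in D.hashtags:
        if best is None or N[best] < N[h]:
            best = h
    print('Document', D, 'should be tagged with', best)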
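A self-contained variant of the same counting idea, as a sketch only: the document names and tags below are made-up sample data, and it assumes every document has at least one hashtag.

```python
from collections import Counter

# Hypothetical sample data: document name -> list of hashtags.
documents = {
    "doc1": ["#Eco", "#Ecofriendly"],
    "doc2": ["#Eco", "#GoingGreen"],
    "doc3": ["#GoingGreen", "#Eco"],
}

def most_common_tag_per_document(documents):
    """Map each document to its hashtag with the highest overall count."""
    counts = Counter(tag for tags in documents.values() for tag in tags)
    return {
        name: max(tags, key=lambda tag: counts[tag])
        for name, tags in documents.items()
    }

labels = most_common_tag_per_document(documents)
for name, tag in labels.items():
    print(name, "should be tagged with", tag)
```

Ties are resolved in favour of the hashtag listed first in the document, because max returns the first maximal element.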

Is it possible to make a nested FOREACH without COGROUP in PigLatin?

I want to use the FOREACH like:
a:{a_attr:chararray}
b:{b_attr:int}
FOREACH a {
    res = CROSS a, b;
    -- some processing
    GENERATE res;
}
By this I mean to make for each element of a a cross-product with all the elements of b, then perform some custom filtering and return tuples.
==EDIT==
Custom filtering = res_filtered = FILTER res BY ...;
GENERATE res_filtered.
==EDIT-2==
How to do it with a nested CROSS no more no less inside a FOR loop without prior GROUP or COGROUP?
Depending on the specifics of your filtering, you may be able to design a limited set of disjoint classes of elements in a and b, and then JOIN on those. For example:
If your filtering rules are
if a_attr starts with "Foo" and b is 4, accept
if a_attr starts with "Bar" and b is greater than 17, accept
if a_attr begins with a letter in [m-z] and b is less than 0, accept
otherwise, reject
Then you can write a UDF that will return 1 for items satisfying the first rule, 2 for the second, 3 for the third, and NULL otherwise. Your CROSS/FILTER then becomes
res = JOIN a BY myUDF(a), b BY myUDF(b);
Pig drops null values in JOINs, so only pairs satisfying your filtering criteria will be passed.
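The classify-then-join idea is independent of Pig, so here is a small Python sketch of it using the three example rules above. The function names and sample rows are hypothetical; a_class and b_class stand in for the UDF, and None plays the role of NULL.

```python
from collections import defaultdict

def a_class(a_attr):
    """Class of an element of a under the example rules, or None."""
    if a_attr.startswith("Foo"):
        return 1
    if a_attr.startswith("Bar"):
        return 2
    if a_attr[:1] and "m" <= a_attr[0] <= "z":
        return 3
    return None

def b_class(b_attr):
    """Class of an element of b under the example rules, or None."""
    if b_attr == 4:
        return 1
    if b_attr > 17:
        return 2
    if b_attr < 0:
        return 3
    return None

def class_join(a_rows, b_rows):
    """Join a and b on their class; unclassified rows never match."""
    b_by_class = defaultdict(list)
    for b in b_rows:
        key = b_class(b)
        if key is not None:
            b_by_class[key].append(b)
    pairs = []
    for a in a_rows:
        key = a_class(a)
        if key is not None:
            pairs.extend((a, b) for b in b_by_class[key])
    return pairs

result = class_join(["Foobar", "Barista", "zebra", "Apple"], [4, 20, -3, 10])
print(result)
```

Only pairs whose classes agree are ever produced, so the full cross-product is never materialized.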
CROSS generates a cross-product of all the tuples in each relation. So there is no need to have a nested FOREACH. Just do the CROSS and then FILTER:
a: {a_attr: chararray}
b: {b_attr: int}
crossed = CROSS a, b;
crossed: {a::a_attr: chararray,b::b_attr: int}
res = FILTER crossed BY ... -- your custom filtering
If you have the FILTER immediately after the CROSS, you should not have (unnecessary) excessive IO trouble from the CROSS writing the entire cross-product to disk before filtering. Records that get filtered will never be written at all.

How to calculate mean based on number of votes/scores/samples/etc?

For simplicity say we have a sample set of possible scores {0, 1, 2}. Is there a way to calculate a mean based on the number of scores without getting into hairy lookup tables etc for a 95% confidence interval calculation?
dreeves posted a solution to this here: How can I calculate a fair overall game score based on a variable number of matches?
Now say we have 2 scenarios ...
Scenario A) 2 votes of value 2 result in SE=0 resulting in the mean to be 2
Scenario B) 10000 votes of value 2 result in SE=0 resulting in the mean to be 2
I wanted Scenario A to be some value less than 2 because of the low number of votes, but it doesn't seem like this solution handles that (dreeves's equations hold when you don't have all values in your set equal to each other). Am I missing something, or is there another algorithm I can use to calculate a better score?
The data available to me is:
n (number of votes)
sum (sum of votes)
{set of votes} (all vote values)
Thanks!
You could just give it a weighted score when ranking results, as opposed to just displaying the average vote so far, by multiplying with some function of the number of votes.
An example in C# (because that's what I happen to know best...) that could easily be translated into your language of choice:
double avgScore = (double)sum / n; // cast avoids integer division and keeps fractional averages
double rank = avgScore * Math.Log(n);
Here I've used the natural logarithm of n (which is what Math.Log(n) computes) as the weighting function, but it will only work well if the number of votes is neither too small nor too large (with a single vote, for instance, the weight is 0). Exactly how large is "optimal" depends on how much you want the number of votes to matter.
If you like the logarithmic approach, but the default base doesn't really work with your vote counts, you could easily use another base. For example, to do it in base 3 instead:
double rank = avgScore * Math.Log(n, 3);
Which function you should use for weighing is probably best decided by the order of magnitude of the number of votes you expect to reach.
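As a rough illustration of how the logarithmic weight separates the two scenarios from the question, here is the same formula as a Python sketch. The function name and the choice to return 0 when there are no votes are assumptions made for this example.

```python
import math

def log_weighted_rank(vote_sum, n, base=math.e):
    """Average score multiplied by the logarithm of the number of votes."""
    if n <= 0:
        return 0.0  # assumption: no votes means no rank
    return (vote_sum / n) * math.log(n, base)

rank_a = log_weighted_rank(4, 2)          # Scenario A: 2 votes of value 2
rank_b = log_weighted_rank(20000, 10000)  # Scenario B: 10000 votes of value 2
print(rank_a, rank_b)
```

Both scenarios have the same average of 2, but Scenario B ranks far higher because of the larger vote count. Note that the result is a ranking value, not a score on the original 0 to 2 scale.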
You could also use a custom weighting function by defining
double rank = avgScore * w(n);
where w(n) returns the weight value depending on the number of votes. You then define w(n) as you wish, for example like this:
double w(int n) {
    // caution! ugly example code ahead...
    // if you even want this approach, at least use a switch... :P
    if (n > 100) {
        return 10;
    } else if (n > 50) {
        return 8;
    } else if (n > 40) {
        return 6;
    } else if (n > 20) {
        return 3;
    } else if (n > 10) {
        return 2;
    } else {
        return 1;
    }
}
If you want to use the idea in my other referenced answer (thanks!) of using a pessimistic lower bound on the average then I think some additional assumptions/parameters are going to need to be injected.
To make sure I understand: With 10000 votes, every single one of which is "2", you're very sure the true average is 2. With 2 votes, each a "2", you're very unsure -- maybe some 0's and 1's will come in and bring down the average. But how to quantify that, I think is your question.
Here's an idea: Everyone starts with some "baggage": a single phantom vote of "1". The person with 2 true "2" votes will then have an average of (1+2+2)/3 = 1.67, whereas the person with 10000 true "2" votes will have an average of (1+20000)/10001 = 1.9999. That alone may satisfy your criteria. Or to add the pessimistic lower bound idea, the person with 2 votes would have a pessimistic average score of 1.333 and the person with 10k votes would be 1.99948.
(To be absolutely sure you'll never have the problem of zero standard error, use two different phantom votes. Or perhaps use as many phantom votes as there are possible vote values, one vote with each value.)
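The phantom-vote average is easy to try out. Below is a minimal Python sketch; the function name and the default of a single phantom vote of 1 are illustrative choices, and it computes only the plain average, not the pessimistic lower bound.

```python
def phantom_average(votes, phantom_votes=(1,)):
    """Mean of the real votes plus a fixed set of phantom votes."""
    all_votes = list(phantom_votes) + list(votes)
    return sum(all_votes) / len(all_votes)

few = phantom_average([2, 2])          # (1 + 2 + 2) / 3
many = phantom_average([2] * 10000)    # (1 + 20000) / 10001
spread = phantom_average([2, 2], phantom_votes=(0, 1, 2))
print(round(few, 2), round(many, 4), spread)
```

The last call uses one phantom vote per possible value, as suggested in the parenthetical above, which also guarantees that the votes are never all identical.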
