Okay this might be very basic but here goes:
I have 23 "campaign" types, each of which contains a number of locales (UK, India, etc.). Each of these, such as "Campaign 1, UK", is either "Open" or "Closed" and has a priority assigned to it. Each also has a required client number: "Campaign 1, UK" has a required client number of 5, "Campaign 1, India" has 7, "Campaign 2, UK" has 14, and so on.
If I have a list containing client numbers for different locales, such as 14 for "UK" and 54 for "India", is it possible to do the following: if "UK" is "open", gather all of the campaign types and assign the 14 UK clients to different campaigns based on their priority and required client number?
I understand this probably makes no sense, but I can clarify if needed.
Thanks in advance!
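One way to read the question is as a greedy fill: sort a locale's open campaigns by priority and hand out clients until they run out. The sketch below assumes a hypothetical record layout (name, locale, open flag, a priority where a lower number means higher priority, and a required client count); none of these names come from an actual system.

```python
# Greedy assignment sketch: fill open campaigns for one locale in
# priority order. The record layout here is an illustrative assumption.

def assign_clients(campaigns, locale, available):
    """Return {campaign name: clients assigned} for one locale."""
    assignments = {}
    eligible = sorted(
        (c for c in campaigns if c["locale"] == locale and c["open"]),
        key=lambda c: c["priority"],  # lower number = higher priority
    )
    for c in eligible:
        take = min(c["required"], available)
        if take == 0:
            break
        assignments[c["name"]] = take
        available -= take
    return assignments

campaigns = [
    {"name": "Campaign 1", "locale": "UK", "open": True, "priority": 1, "required": 5},
    {"name": "Campaign 2", "locale": "UK", "open": True, "priority": 2, "required": 14},
]
print(assign_clients(campaigns, "UK", 14))  # {'Campaign 1': 5, 'Campaign 2': 9}
```

With 14 UK clients, the highest-priority campaign is filled to its required 5 and the remaining 9 go to the next one.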
Is there a fast, scalable way to replace number values by mapped text labels in my visualisations?
Background
I often find myself with questionnaire data of the following format:
ID    Sex   Age class   Answer to question
001   1     2           5
002   2     3           2
003   1     3           1
004   2     5           1
The Sex, Age class and Answer column values actually map to text labels. For the example of Sex:
ID   Description
0    Unknown
1    Man
2    Woman
Similar mappings are possible for the other columns.
If I create visualisations of e.g. the distribution of sex in my respondent group, I'll get a visual showing that 50% of my data has sex 1 and 50% has sex 2.
The data itself often originates from an Excel or csv file.
What I have tried
To make that visualisation meaningful to other people I:
create a second table containing the mapping between the value and label
create a relationship between the source data and the mapping
use the Description column of my mapping table as a category in my visualisations.
I have to do this for several columns in my dataset, which makes this a tedious process.
Ideal solution
A method that allows me to define, per column, a mapping between values and corresponding text labels. SPSS' VALUE LABELS command comes to mind.
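As a point of comparison outside Power BI, the per-column value-to-label mapping described above can be sketched in pandas, where one dict per column plays the role of SPSS's VALUE LABELS. The column names and codes follow the example tables; this is an illustration, not a Power BI solution.

```python
import pandas as pd

# Survey rows from the example; values are numeric codes.
df = pd.DataFrame({"ID": ["001", "002", "003", "004"],
                   "Sex": [1, 2, 1, 2],
                   "Age class": [2, 3, 3, 5]})

# One mapping dict per column, analogous to SPSS VALUE LABELS.
labels = {"Sex": {0: "Unknown", 1: "Man", 2: "Woman"}}

for col, mapping in labels.items():
    # Unmapped codes fall back to "Unknown".
    df[col + " Label"] = df[col].map(mapping).fillna("Unknown")

print(df["Sex Label"].tolist())  # ['Man', 'Woman', 'Man', 'Woman']
```

Adding another column is one more entry in the `labels` dict rather than another table and relationship.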
You can simply create a calculated column on your table that defines how you want to map each ID value using a SWITCH function, and use that column in your visual. For example,
Sex Label =
SWITCH(
    [Sex],
    1, "Man",
    2, "Woman",
    "Unknown"
)
(Here, the last argument is an else value that gets returned if none of the previous values match.)
If you want to do a whole bunch at a time, you can create a new table from your existing table using ADDCOLUMNS, like this:
Test =
ADDCOLUMNS(
    Table1,
    "Sex Label", SWITCH([Sex], 1, "Man", 2, "Woman", "Unknown"),
    "Question 1 Label", SWITCH([Question 1], 1, "Yes", 2, "No", "Don't Know"),
    "Question 2 Label", SWITCH([Question 2], 1, "Yes", 2, "No", "Don't Know"),
    "Question 3 Label", SWITCH([Question 3], 1, "Yes", 2, "No", "Don't Know")
)
Well, I have a problem with my data.
This is my healthcare database
Name    Value1  Value2  Value3  Value4
Jhon    10      20      30      40
Jhon    9       12      21      33
Noah    8       22      18      10
Anna    9       19      29      32
Clark   11      4       17      20
In a healthcare database, one person can be ill two times, three times, or more. As you can see in the example, there are two Jhons: he has two records because he was ill twice.
The purpose of using k-means is to get two clusters (cluster 1: group 1, cluster 2: group 2) with their members, and I want to get output like this:
Group 1 : jhon, clark
Group 2 : noah, anna, jhon
As you can see, there are two Jhons, so one member can end up in both group 1 and group 2. How can I fix this problem?
K-means works by iterating between a pair of steps. You basically alternate between:
assuming you know the mapping of instances to clusters, and calculate the cluster centers
assuming you know the cluster centers, assign instances to clusters
Thus if you have constraints, e.g. that all "jhon" records should belong to the same cluster, you can incorporate this into step 2: you'll need to find the cluster for which the simultaneous assignment of all of them is the most likely.
See Constrained K-means Clustering with Background Knowledge for details.
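The two steps above, with the grouped assignment in step 2, can be sketched in NumPy. This is a minimal illustration, not the algorithm from the cited paper: names identify the must-link groups, and each name's rows are assigned together to the cluster whose center minimises their summed squared distance. The data and k=2 come from the question.

```python
import numpy as np

def constrained_kmeans(X, names, k, iters=20, seed=0):
    """k-means where all rows sharing a name land in one cluster."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    names = np.asarray(names)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Step 2: for each name, pick the single cluster minimising the
        # summed squared distance over all of that name's rows.
        for name in np.unique(names):
            mask = names == name
            costs = ((X[mask][:, None, :] - centers[None, :, :]) ** 2).sum(axis=(0, 2))
            labels[mask] = costs.argmin()
        # Step 1: recompute centers from the current assignment.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    # One label per name, since a name's rows always share a cluster.
    return {n: int(labels[names == n][0]) for n in np.unique(names)}

X = np.array([[10, 20, 30, 40], [9, 12, 21, 33], [8, 22, 18, 10],
              [9, 19, 29, 32], [11, 4, 17, 20]], dtype=float)
groups = constrained_kmeans(X, ["Jhon", "Jhon", "Noah", "Anna", "Clark"], k=2)
```

Because the assignment is per name rather than per row, "Jhon" gets exactly one group, which resolves the problem in the question.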
I use emergency tweets from the Netherlands for a project. There is sometimes more than one tweet regarding one event, varying slightly in timestamp and in the string of the tweet itself. I want to delete those "duplicates".
So, in my database I have rows which are quite alike but not exactly the same, like:
"2014-01-11 10:01:17";"HV 1 METINGEN (+Inc,net: 1+) (KLEIN OGS) (slachtoffers: ) , Van Ostadestraat 332 AMSTERDAM [ ] "
"2014-01-11 09:59:06";"HV 1 METINGEN (+Inc,net: 1+) (KLEIN OGS) (slachtoffers:1) , Van Ostadestraat 332 AMSTERDAM ] "
The problem is that I have to take the temporal aspect into account and can't just rely on the string, since the same text can occur multiple times.
Ideal would be an approach where I delete all rows within a temporal buffer of 10 minutes after the first tweet when the text similarity is over a threshold of 0.75.
For the string comparison I tried similarity(text, text); see
http://www.postgresql.org/docs/9.1/static/pgtrgm.html
For the time aggregation I used:
(extract(minute FROM timestamp_column)::int / 10)
in addition to the regular YYYY-MM-DD-HH24 time aggregation.
Any help is appreciated.
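The rule described above (drop a row when a kept row within the 10-minute window is at least 0.75 similar) can be sketched outside the database in Python. Note the assumption: `difflib.SequenceMatcher`'s ratio stands in for pg_trgm's trigram `similarity()`, so the two scores are not identical; the threshold and window come from the question.

```python
from datetime import datetime, timedelta
from difflib import SequenceMatcher

def dedupe(rows, window=timedelta(minutes=10), threshold=0.75):
    """rows: list of (timestamp, text); returns the rows to keep."""
    kept = []
    for ts, text in sorted(rows):  # process in time order
        duplicate = any(
            ts - kts <= window
            and SequenceMatcher(None, ktext, text).ratio() >= threshold
            for kts, ktext in kept
        )
        if not duplicate:
            kept.append((ts, text))
    return kept

rows = [
    (datetime(2014, 1, 11, 9, 59, 6),
     "HV 1 METINGEN (+Inc,net: 1+) (KLEIN OGS) (slachtoffers:1) , Van Ostadestraat 332 AMSTERDAM ] "),
    (datetime(2014, 1, 11, 10, 1, 17),
     "HV 1 METINGEN (+Inc,net: 1+) (KLEIN OGS) (slachtoffers: ) , Van Ostadestraat 332 AMSTERDAM [ ] "),
]
print(len(dedupe(rows)))  # 1
```

The two example tweets are about two minutes apart and nearly identical, so the later one is dropped. In PostgreSQL itself, the same per-row comparison against earlier kept rows would need a self-join or a procedural loop rather than the fixed 10-minute bucketing shown in the question, since a bucket boundary can split a pair of near-duplicates.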
I'm doing some text processing and I'm interested in finding and scoring paragraphs of text based on frequency of words and/or phrases, using Ruby ideally.
An example of the problem would be: I have "apple", "banana", "fruit salad", and "orange". This list is likely to be several thousand words and/or phrases long.
I have a body of text to search:
I have a set of apples, and apple computer, and an account on Apple.com but never a fruit salad. Why they never released an Apple Computer that doubled as an orange was beyond me.
This would spit out an array that said:
Apple 4
Orange 1
Banana 0
Fruit salad 1
Ideally, I'd be able to apply different weights, like the domain "apple.com" gets two points, etc.
Is there a library that is particularly useful for doing this?
text = <<_.downcase
I have a set of apples, and apple computer, and an account on Apple.com. Why they never released an Apple Computer that doubled as an orange was beyond me.
_
# Escape each term so phrases with regex metacharacters are matched literally.
["apple", "banana", "fruit salad", "orange"]
  .map { |w| [w, text.scan(/\b#{Regexp.escape(w)}\b/).length] }
# => [
#      ["apple", 3],
#      ["banana", 0],
#      ["fruit salad", 0],
#      ["orange", 1]
#    ]
A very easy way to do this is to keep a hash of counts, where the key is the word and the value is incremented on each occurrence. Once you have built the hash, you can easily print out the count of each word, such as Apple, Orange, or Banana. If case doesn't matter, convert each word to lower case before using it as the key.
It looks like you are trying to count term frequency; try this package: https://github.com/reddavis/TF-IDF
I need to create subscale scores for 4 subscales of the REI: REI_Appear, REI_Hlth, REI_Mood, and REI_Enjoy. The items comprising each subscale are as follows:
Appearance (9 items): 1, 5, 9, 13, 16, 17, 19, 21, 24
Health (8 items): 3, 6, 8, 15, 18, 20, 22, 23
Mood (4 items): 2, 7, 12, 14
Enjoyment (3 items): 4, 10, 25
For example, I have placed REI_Appear in the target variable, but I'm unsure of what to place in the numeric expression section for it to work.
There are several important issues.
Do you want means or sums or some other composite?
Do any items need reversing?
How do you want to handle missing data?
Assuming you want means, no items need reversing, and you want a participant to have at least 3 valid items to get a score, then you could use:
COMPUTE REI_appear = mean.3(item1, item5, ..., item24).
EXECUTE.
where you replace item1 etc. with the relevant variable names.
I have an existing post dedicated to the topic of computing scale scores for psychological tests which discusses some of these issues further.
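The mean.3 behaviour (a mean over non-missing items, returned only when at least 3 items are valid) can be sketched in pandas for comparison. The item names and responses below are placeholders, using the 3-item Enjoyment subscale as the example.

```python
import numpy as np
import pandas as pd

# Hypothetical responses; NaN marks a missing item.
df = pd.DataFrame({
    "item4":  [4, np.nan, 5],
    "item10": [3, np.nan, 4],
    "item25": [5, 2,      np.nan],
})

enjoy = ["item4", "item10", "item25"]  # Enjoyment subscale items
valid = df[enjoy].notna().sum(axis=1)
# Mirror SPSS mean.3: mean over non-missing items, kept only if >= 3 valid.
df["REI_Enjoy"] = df[enjoy].mean(axis=1).where(valid >= 3)
print(df["REI_Enjoy"].tolist())  # [4.0, nan, nan]
```

Only the first participant answered all three items, so only that row gets a score; the threshold in `valid >= 3` is the analogue of the ".3" suffix.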