Mahout PreferenceArray userID uniqueness?

Does the userID of a PreferenceArray need to be unique across all entries in a FastByIDMap?
I'm comparing two types of objects that share similar traits, but their IDs (primary keys) may not be unique because they come from two different database tables.

According to the Mahout user mailing list, user IDs in a FastByIDMap must be unique across the entire map.
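One way to get IDs that are unique across both tables is to fold the table into the ID itself. The following is only a minimal sketch of that idea in Python, not part of the Mahout API: the names to_unique_id, from_unique_id and NUM_TABLES are hypothetical, and it assumes non-negative integer primary keys that still fit in a 64-bit long after being combined.

```python
# Assumption: exactly two source tables, numbered 0 and 1.
NUM_TABLES = 2


def to_unique_id(table_index, local_id):
    """Interleave IDs so rows from different tables never share a value."""
    if not 0 <= table_index < NUM_TABLES:
        raise ValueError("unknown table index")
    if local_id < 0:
        raise ValueError("local_id must be non-negative")
    return local_id * NUM_TABLES + table_index


def from_unique_id(unique_id):
    """Recover (table_index, local_id) from a combined ID."""
    local_id, table_index = divmod(unique_id, NUM_TABLES)
    return table_index, local_id


# The same primary key 7 from both tables maps to two different IDs.
id_from_first_table = to_unique_id(0, 7)
id_from_second_table = to_unique_id(1, 7)
```

The mapping is reversible, so a recommendation result can be traced back to the table and row it came from.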


Create search index from a property list

I am building a database of genes in Neo4j, with the main gene name as each node's unique identifier. However, each gene can go by other names, and I want those names to be searchable in the database as well.
I saw there might be a way to do this by indexing the alternate names so that they relate to the primary name, but I'm not sure how to go about it. The two options I have seen are:
Create new :Alias nodes for each alternate gene name, relate them to the primary node, then index that relationship.
Add an array of the alternate names as a property of the primary node, and index it that way.
Is there a correct method for doing this? Thanks

How to count cases with the same ID but different variables in SPSS

I have a data set which has 4420 attendances to a medical department from 1120 people. Each person has a unique ID number, and the other columns are demographics and primary care provider. I want to filter the data so I can work out how many times each person attends the department and then analyse the data by demographics, e.g. primary care provider or age. It shows whether each attendance is primary or duplicate, but I can't figure out how to work out attendances per person.
If what you want to do is count the number of times each person has visited (assuming each visit is represented by a single row in the data), use the AGGREGATE command, breaking on the ID variable, to add the number of instances to the file as a new variable. In the menus, choose Data>Aggregate, move the ID variable into the box for Break Variable(s), check the box for Number of cases under Aggregated Variables, change the default N_BREAK to another name if you want, and click OK. That will add a new variable to the data with the number of instances for each unique ID.
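For readers who want to see the logic outside SPSS, here is a rough Python sketch of the same idea: count rows per ID, then attach that count to every row as a new variable. The sample rows and the names add_visit_counts and n_visits are made up for illustration; this is not SPSS syntax.

```python
from collections import Counter

# Hypothetical data: one row per attendance.
attendances = [
    {"id": 101, "provider": "A"},
    {"id": 102, "provider": "B"},
    {"id": 101, "provider": "A"},
    {"id": 103, "provider": "A"},
    {"id": 101, "provider": "A"},
]


def add_visit_counts(rows, id_field="id", count_field="n_visits"):
    """Return copies of the rows with the per-ID row count added."""
    counts = Counter(row[id_field] for row in rows)
    return [{**row, count_field: counts[row[id_field]]} for row in rows]


with_counts = add_visit_counts(attendances)
```

Every row keeps its original columns, so the count can then be analysed against provider, age, and so on.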

If each DynamoDB item has unique schema, how can I query it?

I'm using DynamoDB to store my data. Each item has a name (primary key) and then its own unique attributes. How can I query by primary key if the iOS DynamoDB SDK wants me to specify a model class (but each item is unique)? For example, I want to just input a name (primary key), and have the results tell me what attributes that item has. Looking at AWS's DynamoDB sample for iOS, you have to specify what these attributes are prior to the query, which I do not want to do. Is that the only way?
The examples you were looking at are for the Dynamo Mapper, which is just one of the abstractions you can use to work with Dynamo. In fact, it is a pretty high-level one, and it is convenient if all items have a limited set of known attributes.
But underneath, Dynamo is a document database that only requires items to have a key (which may optionally be composed of a partition key and a sort key). Other than that, you can definitely store and query each item with a different set of attributes.
Please have a look at the DynamoDB low-level API, which supports querying items by key and then iterating over each item's attributes. In fact, items are treated as a key-value map where the key is the attribute name and the value is whatever you want to store for each attribute.
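To make the "key-value map" point concrete: in the low-level API each attribute value is wrapped in a small map that names its type, such as {"S": "text"} or {"N": "12.5"}. The sketch below is plain Python with no SDK call; the two items and the describe_item helper are hypothetical, and it only shows how you might walk whatever attributes an item happens to have.

```python
# Hypothetical items in the low-level attribute-value format.
# Both share the key attribute "name" but otherwise differ.
item_a = {"name": {"S": "alpha"}, "color": {"S": "red"}}
item_b = {
    "name": {"S": "beta"},
    "weight": {"N": "12.5"},
    "tags": {"SS": ["x", "y"]},
}


def describe_item(item):
    """Return (attribute name, type code, value) triples, sorted by name."""
    described = []
    for attr_name, typed_value in item.items():
        # Each typed value is a one-entry map: {type code: value}.
        (type_code, value), = typed_value.items()
        described.append((attr_name, type_code, value))
    return sorted(described)


attributes_of_b = describe_item(item_b)
```

Nothing here needs a model class: the attribute names are discovered at read time.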

Entity Relationship Diagram: How to create a Yelp-kind of app with not just one price-range?

I'm new to Rails and I'm in the middle of sketching an ERD for my new app: a Yelp-like app where a Client is sorted by price.
So I want one Client to have many price ranges. One Client can have both price range $ and price range $$$$, for example. The price ranges are:
$ - $$ - $$$ - $$$$ - $$$$$
How would this look in a table? Would I create a table called PriceRange with Range1, Range2, Range3, Range4, Range5 to be booleans?
Doesn't the PriceRange-table need any foreign/primary keys?
Range1 (Boolean)
Range2 (Boolean)
Range3 (Boolean)
Range4 (Boolean)
Range5 (Boolean)
Look, I'm Brazilian and I'm not very familiar with Yelp-style applications. I do not quite know what Yelp is, but from what I saw, these are systems to assess/measure/evaluate (perhaps my translation is off here) things, in this case companies, right?
Following this logic, let's think...
By the description of your problem (context), you have clients (companies), and they can have price ranges, correct? If:
a price range is represented by a textual name, such as "$", "$$", and so on,
and the same price range may have different (numeric) values for different companies,
and the same price range (type) may or may not be assigned to different companies,
Then here is what we have:
By decomposing this conceptual model, you would end up with three tables:
Companies
Price Ranges
Price Ranges from Companies
The primary keys of Companies and Price Ranges are passed to Price Ranges from Companies as foreign keys. You can use them together as a composite primary key, or use a surrogate key. With a surrogate key alone, nothing stops a company from having the same kind of price range more than once (unless you also put a unique constraint on the pair of foreign keys), which I believe is not what you want.
Let's look at another, simpler situation. If:
there is no need to store prices,
and a company may or may not have one or more price ranges represented by "$", "$$", and so on,
Then here is what we have:
Similarly, we'll have the same 3 tables. Likewise, you still must pass the primary keys of Companies and Price Ranges to Price Ranges from Companies as foreign keys.
So I want one Client to have many priceranges - One Client can both
have pricerange $ and Pricerange $$$$ for example
Notice how N-N relationships allow us to create optional relationships between entities. This allows a company to have zero, one, two, (etc.) or all price ranges defined. Again, to prevent a company from having the same price range more than once, set the two foreign keys as a composite primary key in Price Ranges from Companies.
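As a rough illustration of the three tables and the composite primary key, here is a small sketch using Python's built-in sqlite3 module. The table and column names (companies, price_ranges, company_price_ranges) and the sample company are my own assumptions for the example, not something taken from your app; in Rails you would express the same thing with migrations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE companies (
    company_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE price_ranges (
    price_range_id INTEGER PRIMARY KEY,
    label          TEXT NOT NULL UNIQUE
);
-- Join table: the two foreign keys together form the primary key,
-- so the same (company, price range) pair cannot be stored twice.
CREATE TABLE company_price_ranges (
    company_id     INTEGER NOT NULL REFERENCES companies(company_id),
    price_range_id INTEGER NOT NULL REFERENCES price_ranges(price_range_id),
    PRIMARY KEY (company_id, price_range_id)
);
INSERT INTO companies VALUES (28, 'Example Co');
INSERT INTO price_ranges VALUES
    (1, '$'), (2, '$$'), (3, '$$$'), (4, '$$$$'), (5, '$$$$$');
""")


def assign(company_id, price_range_id):
    """Link a company to a price range; return False if already linked."""
    try:
        conn.execute(
            "INSERT INTO company_price_ranges VALUES (?, ?)",
            (company_id, price_range_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first = assign(28, 1)      # company 28 gets "$"
second = assign(28, 4)     # company 28 also gets "$$$$"
duplicate = assign(28, 1)  # rejected by the composite primary key
link_count = conn.execute(
    "SELECT COUNT(*) FROM company_price_ranges"
).fetchone()[0]
```

A company with no rows in the join table simply has no price range, which is the optional relationship described above.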
If you have any questions or anything I explained has nothing to do with your context, please do not hesitate to comment.
Is the Price Ranges from Companies what is called a Join table?
Yes. There are also other terms used, some in different areas of computer science, such as Link Table, or Intermediate Table.
Actually we do not have a table here in the diagram, but an entity. In the Conceptual Model there are no tables, but entities and relationships. Be careful with this terminology when developing the Conceptual Model, or else you may get confused (I say this from experience).
However, yes, once decomposed, we will have a table from this relationship. When decomposed, N-N relationships always become tables, no exception. In contrast, 1-1 and 1-N (or N-1) relationships do not become tables. These tables with special names (Join/Link/Intermediate Tables) serve to associate records from different tables, hence the name.
And is it necessary to have a column called Price Range Id? I mean
what is it there for?
Where? If you mean in the Price Ranges entity, then yes, it is necessary. Don't we need to identify the records in a table in some way? Here I defined what is called a Surrogate Key. If, on the other hand, you already have a column with unique values for each record in the table, you can use that column instead. I highly recommend that you consider using surrogate keys. Read the link I gave you.
In the Conceptual Model, we have to define the properties and also the primary keys. During the phase of the conceptual model, natural attributes of entities can become primary keys if you so desire. In this case, we have what is called a Natural Key.
If, on the other hand, you are referring to the Price Ranges from Companies entity, then the question ("And is it necessary to have a column called Price Range Id?") has a different answer. Here we have a table with two columns, as I told you, and both are foreign keys. You need them so you can relate rows from the two tables... I think you were not referring to that, were you? If so, no problem: you can comment and ask more questions. I don't mind answering. To be honest, I did not quite understand your question.
So that Company 28 can be identified in the Price Ranges (for instance
ID 40) Which would make it easier to call out the price ranges it has?
Maybe my English is not very good, but it seems to me that you have a beginner's question about the concept of tables and the relationships between them. If that is not the case, I apologize; maybe I did not understand. But let's see...
The tables in a database have rows (records). Each row has its own data. Even so, each row needs to be differentiated and identified somehow. That is why we attach an identifier to each row, known as the primary key. In summary, the primary key is how we identify, differentiate, separate and organize different records.
Even if all records have different values, you must select a field (column) that represents the primary key of the table. Every record MUST have a primary key. You can choose which field is the primary key, and you are allowed to choose more than one field to serve as the primary key. When more than one field serves as the primary key, the table has what is called a Composite Primary Key. It identifies records in the same way. Note that, because of this, primary key values must be unique; otherwise two records could not be told apart.
This is the basic concept that lets us relate tables to each other, that is, relate the records/rows of different tables. If we have a Company identified by the ID 28 (a row/record), and we want to relate it to a Price Range identified by the ID 40, then we need to store that relationship (28 <--> 40) somewhere. This is where intermediate/link/join tables come in (but only for N-N relationships! For 1-N or N-1 relationships it works similarly, but not identically).
My original question was whether it was necessary, and why a company
ID had to link up with a price range ID at all.
With this table storing records that relate other records (through their primary keys), we can perform a SQL join operation. Depending on how you perform this operation, you'll get:
All companies that have Price Ranges.
All companies that do not have Price Ranges.
All the Price Ranges of a given company.
All companies that have, or do not have, a given Price Range X.
All price ranges that are, or are not, assigned to companies.
Anyway, you get all this because of the established relationship.
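To show what a few of those joins could look like, here is a self-contained sketch using Python's sqlite3 module. The schema, the names (companies, price_ranges, company_price_ranges) and the two sample companies are assumptions made up for this example; only the join patterns matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE price_ranges (price_range_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE company_price_ranges (
    company_id     INTEGER NOT NULL REFERENCES companies(company_id),
    price_range_id INTEGER NOT NULL REFERENCES price_ranges(price_range_id),
    PRIMARY KEY (company_id, price_range_id)
);
INSERT INTO companies VALUES (28, 'Cafe A'), (29, 'Bistro B');
INSERT INTO price_ranges VALUES
    (1, '$'), (2, '$$'), (3, '$$$'), (4, '$$$$'), (5, '$$$$$');
INSERT INTO company_price_ranges VALUES (28, 1), (28, 4);
""")


def column(sql, params=()):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql, params)]


# All companies that have at least one price range (inner join).
with_ranges = column("""
    SELECT DISTINCT c.name
    FROM companies c
    JOIN company_price_ranges cpr ON cpr.company_id = c.company_id
    ORDER BY c.name
""")

# All companies that have no price range (left join, keep unmatched rows).
without_ranges = column("""
    SELECT c.name
    FROM companies c
    LEFT JOIN company_price_ranges cpr ON cpr.company_id = c.company_id
    WHERE cpr.company_id IS NULL
    ORDER BY c.name
""")

# All the price ranges of a given company.
ranges_of_28 = column("""
    SELECT pr.label
    FROM price_ranges pr
    JOIN company_price_ranges cpr ON cpr.price_range_id = pr.price_range_id
    WHERE cpr.company_id = ?
    ORDER BY pr.price_range_id
""", (28,))

# All price ranges that are not assigned to any company.
unassigned_ranges = column("""
    SELECT pr.label
    FROM price_ranges pr
    LEFT JOIN company_price_ranges cpr ON cpr.price_range_id = pr.price_range_id
    WHERE cpr.price_range_id IS NULL
    ORDER BY pr.price_range_id
""")
```

Without the join table holding the (company, price range) pairs, none of these questions could be answered from the data.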
If it could just be taken out and then the table of price ranges would
only involve Pricerange1-5.
I did not understand this sentence. What should be taken out? Could you please explain it better?

How unique are SurveyMonkey IDs returned from the API?

Are SurveyMonkey IDs globally unique? Unique within an account?
I'm storing SurveyMonkey results in a relational database and I'm wondering if I can use single IDs (e.g. answer_id) as primary keys, or whether I need to use composite keys (e.g. (survey_id, question_id, answer_id)).
The answer_id is globally unique among all SurveyMonkey answer_ids, so a composite key is not required if you use it as your unique key for answers. The answer_id from one survey will not collide with any other answer_id from any SurveyMonkey survey. As sysmod mentioned, each ID type is in its own domain, so you can't count on them being unique across types.
I asked this a while back; see the archives. As I recall, the answer is that IDs are unique within any given table, but you can't assume they are globally unique: for example, you can't assume a respondent ID will never duplicate a question ID. One sequence may run over into another's some day.
