Join two csv files based on name field

I have two tables with id and name. I want to join them based on the name field and output id and id.
File1:
id, name
a0N3000000A0JNaEAN,Guarda Val
a0Na000000G8CCfEAN,Bentleys House
a0Na000000EQVg6EAH,Alpine Lodge Resort
a0N30000007LwcaEAC,Kulm Hotel
File2:
id, name
a0BQ00000013OeSMAU,Guarda Val
a0BQ00000013OeBMAU,Bentleys House
a0BQ00000013OeVMAU,Alpine Lodge Resort
a0BQ0000001xlQoMAI,Kulm Hotel
What I wish to see is:
id.1, id.2
a0N3000000A0JNaEAN,a0BQ00000013OeSMAU
a0Na000000G8CCfEAN,a0BQ00000013OeBMAU
a0Na000000EQVg6EAH,a0BQ00000013OeVMAU
a0N30000007LwcaEAC,a0BQ0000001xlQoMAI
I have tried to scribble something, but the closest I've got was this:
join -t, -a1 -a2 -1 2 -2 2 -o '0,1.2' <(sort sandees.1.csv) <(sort prodees.1.csv)
Which just prints out the names. For the record, I am using OS X 10.8. I have seen that join behaviour might vary between different OSes.
Thanks

First, get the command working with plain input files; once that works, you can go back to the fancier process-substitution syntax.
Next, sort the files the way join requires: on the join field, which here is the second field rather than the first. You need to use:
sort -t, -k2 sandees1.csv >sandees1_sorted.csv
sort -t, -k2 prodees1.csv >prodees1_sorted.csv
Your output format, -o '0,1.2', specifies the join key plus the second field of the first file, so the name is printed twice. You said you wanted the first field of each file.
join -t, -a1 -a2 -1 2 -2 2 -o '1.1,2.1' sandees1_sorted.csv prodees1_sorted.csv
will produce the desired result.
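For comparison, the same name-based join can be sketched in Python with the standard csv module. This is only an illustration, not part of the join answer above: the function name join_ids_by_name is made up, the sample data is copied from the question, and it assumes names are unique in the second file. Unlike join with -a1 -a2, it drops rows that have no match.

```python
import csv
import io


def join_ids_by_name(left_text, right_text):
    """Return (left_id, right_id) pairs for rows whose names match."""

    def rows(text):
        # skipinitialspace tolerates the space in the "id, name" header
        reader = csv.reader(io.StringIO(text), skipinitialspace=True)
        next(reader)  # skip the header line
        return [row for row in reader if row]

    # Index the second file by name (assumes each name appears once).
    right_by_name = {name: id_ for id_, name in rows(right_text)}
    return [
        (id_, right_by_name[name])
        for id_, name in rows(left_text)
        if name in right_by_name
    ]


FILE1 = """id, name
a0N3000000A0JNaEAN,Guarda Val
a0Na000000G8CCfEAN,Bentleys House
"""

FILE2 = """id, name
a0BQ00000013OeBMAU,Bentleys House
a0BQ00000013OeSMAU,Guarda Val
"""

pairs = join_ids_by_name(FILE1, FILE2)
for left_id, right_id in pairs:
    print(f"{left_id},{right_id}")
```

Because the lookup goes through a dictionary, the inputs do not need to be sorted first.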

Model.group(:id) throws an error "SELECT list is not in GROUP BY clause and contains nonaggregated column 'id'"

I am using Model.group(:category) in order to get unique records based on category field in Rails 5 application.
Table data:
id  category  description
1   abc       test
2   abc       test1
3   abc       test2
4   xyz       test
5   xyz       testabc
I want records (1,4) as a result. Therefore I am using Model.group(:category), which works fine on a MySQL server whose sql_mode is "".
Unfortunately, on a server whose sql_mode is "only_full_group_by" it throws the error "SELECT list is not in GROUP BY clause and contains nonaggregated column which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by".
What's the best way to change the query to match the mode?
Perhaps try specifying which id you want? You could use MIN(id), MAX(id) etc.
MySQL supports a non-standard extension to SQL described here. To continue using that behavior, you could change the sql_mode to TRADITIONAL in config/database.yml.
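A minimal sketch of the MIN(id) suggestion, using the sample data from the question. It runs against an in-memory SQLite database only so that it is self-contained; the table name records is an assumption, and the real query would go through the Rails model against MySQL. The subquery selects nothing but an aggregate, so it does not depend on the non-standard GROUP BY extension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, category TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, "abc", "test"),
        (2, "abc", "test1"),
        (3, "abc", "test2"),
        (4, "xyz", "test"),
        (5, "xyz", "testabc"),
    ],
)

# Pick the lowest id per category, then join back to fetch the full rows.
first_per_category = conn.execute(
    """
    SELECT r.id, r.category, r.description
    FROM records r
    JOIN (SELECT MIN(id) AS min_id FROM records GROUP BY category) m
      ON r.id = m.min_id
    ORDER BY r.id
    """
).fetchall()

for row in first_per_category:
    print(row)
```

Swapping MIN for MAX would return the last record of each category instead.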

SPSS - Rank and Partition

In an SPSS Statistics syntax file, I am looking to create a variable that calculates rank based on a desired partitioned column (e.g. equivalent to SQL "rank over (partition by column_a order by column_b)" in Oracle SQL Developer).
Please see the example:
Initial data without any filters:
Final output after applying get_rank:
To create a rank variable as described, first sort your data and then use the LAG function.
SORT CASES BY column_a column_b .
compute rank=1 .
IF ($CASENUM>1 AND column_a=LAG(column_a)) rank=LAG(rank) + 1 .
EXE .
LAG looks at the value of column_a for the prior case. The syntax above first sets rank to 1 for every case, then checks whether the value in column_a is the same as that of the prior case.
If it is the same, the rank becomes the rank of the prior case plus 1. If it has changed, the rank stays at 1. Just make sure your data is properly sorted first.
From there, if you want to look only at records that are rank=1, you can either use FILTER BY or SELECT IF to do that.
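The sort-then-compare-with-previous-case logic can be sketched outside SPSS as follows. This is a Python illustration under assumptions, not SPSS syntax: the function name rank_within_groups and the sample rows are made up. Like the LAG approach, it numbers cases sequentially within each column_a group, so tied column_b values get distinct ranks.

```python
from itertools import groupby
from operator import itemgetter


def rank_within_groups(rows):
    """Take (column_a, column_b) pairs and return (column_a, column_b, rank)."""
    ranked = []
    # Same role as SORT CASES BY column_a column_b.
    ordered = sorted(rows, key=itemgetter(0, 1))
    # Each new column_a value restarts the counter at 1.
    for _, group in groupby(ordered, key=itemgetter(0)):
        for position, (a, b) in enumerate(group, start=1):
            ranked.append((a, b, position))
    return ranked


sample = [("x", 30), ("y", 5), ("x", 10), ("y", 7), ("x", 20)]
ranked = rank_within_groups(sample)
for row in ranked:
    print(row)

# Equivalent of filtering on rank=1: the first case of each group.
first_cases = [row for row in ranked if row[2] == 1]
```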
If you only need a key in order to filter for the first case in each group (key=1), you can use this:
SORT CASES BY column_a column_b .
match files /file=* /by column_a /first=key1.
Now variable key1 will have value 1 for every first occurrence of a column_a category, and you can use it to filter or select.
For a full ranking variable you can use this (don't even need to sort first):
RANK VARIABLES=column_b (A) BY column_a /RANK /TIES=MEAN.

Sqlite3: Selecting from multiple tables without duplicates

I've got three tables:
paper:            items:             attachments:
============      ==============     ==============
jkey | title      itemID | jkey*     itemID* | path
*foreign key from another table
I'm trying to retrieve the title of all papers and their associated attachment paths, for all papers that have attachments.
Current attempt is:
SELECT paper.title,attachments.path FROM paper,attachments
WHERE paper.jkey IN (
SELECT items.jkey FROM items,attachments
WHERE items.itemID = attachments.itemID
);
Unfortunately this just seems to print gibberish (the same path for different titles and vice versa).
What am I doing wrong?
If you want to join, you should use joins:
SELECT paper.title,
attachments.path
FROM paper
JOIN items USING (jkey)
JOIN attachments USING (itemID);
To omit duplicate rows, use SELECT DISTINCT ... instead.
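Here is a small self-contained check of that join, run against an in-memory SQLite database through Python's sqlite3 module. The table and column names come from the question; the sample rows (including the duplicated attachment and the paper with no attachment) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE paper (jkey INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE items (itemID INTEGER PRIMARY KEY, jkey INTEGER REFERENCES paper(jkey));
    CREATE TABLE attachments (itemID INTEGER REFERENCES items(itemID), path TEXT);

    INSERT INTO paper VALUES (1, 'Paper A'), (2, 'Paper B'), (3, 'Paper C');
    INSERT INTO items VALUES (10, 1), (11, 2), (12, 3);
    -- 'a.pdf' is attached twice on purpose; Paper C has no attachment.
    INSERT INTO attachments VALUES (10, 'a.pdf'), (10, 'a.pdf'), (11, 'b.pdf');
    """
)

rows = conn.execute(
    """
    SELECT DISTINCT paper.title, attachments.path
    FROM paper
    JOIN items USING (jkey)
    JOIN attachments USING (itemID)
    ORDER BY paper.title
    """
).fetchall()

for title, path in rows:
    print(title, path)
```

Each title is paired only with its own attachment paths, and papers without attachments drop out because these are inner joins.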

How to show same column in dbgrid with different criteria

I need your help to finish my Delphi homework.
I use an MS Access database and show all data in one DBGrid using SQL. I want to show the same columns a second time, but with a criterion (50 records per column).
I want a SELECT query that produces output like:
No | Name | No | Name |
1 | A | 51 | AA |
2 | B | 52 | BB |
3~50 | | 53~100| |
Is it possible?
I can foresee issues if you choose to return a dataset with duplicate column names. To fix this, you must change your query to enforce strictly unique column names, using as. For example...
select A.No as No, A.Name as Name, B.No as No2, B.Name as Name2 from TableA A
join TableB B on B.Something = A.Something
Just as a note, if you're using a TDBGrid, you can customize the column titles. Right-click on the grid control in design-time and select Columns Editor... and a Collection window will appear. When adding a column, link it to a FieldName and then assign a value to Title.Caption. This will also require that you set up all columns. When you don't define any columns here, it automatically returns all columns in the query.
On the other hand, a SQL query may contain duplicate field names in the output, depending on how you structure the query. I know this is possible in SQL Server, but I'm not sure about MS Access. In any case, I recommend always returning a dataset with unique column names and then customizing the DB Grid's column titles. After all, it is also possible to connect to an Excel spreadsheet, which can very likely have identical column names. The problem arises when you try to read from one of those columns for another use.
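As a rough sketch of the aliasing advice, the query below joins a table to itself so that the second pair of columns comes back under the unique names No2 and Name2. It assumes the asker wants row n shown next to row n + 50; the table name people and the page size of 2 are placeholders to keep the demo small, and it runs on SQLite through Python rather than on MS Access, whose SQL dialect may differ.

```python
import sqlite3

PAGE = 2  # the question uses 50; 2 keeps the demo small

conn = sqlite3.connect(":memory:")
# "No" is quoted because NO is also an SQL keyword.
conn.execute('CREATE TABLE people ("No" INTEGER PRIMARY KEY, Name TEXT)')
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "AA"), (4, "BB")],
)

cursor = conn.execute(
    """
    SELECT a."No" AS "No", a.Name AS Name, b."No" AS No2, b.Name AS Name2
    FROM people a
    LEFT JOIN people b ON b."No" = a."No" + :page
    WHERE a."No" <= :page
    ORDER BY a."No"
    """,
    {"page": PAGE},
)
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(column_names)
for row in rows:
    print(row)
```

Because every output column has its own name, a grid bound to this result can address each field unambiguously.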

rails user-defined custom columns

I am using Ruby on Rails 4 and MySQL. I have three types. One is Biology, one is Chemistry, and another is Physics. Each type has unique fields. So I created three tables in the database, each with unique column names. However, the unique column names may not be known beforehand. The user will need to create the column names associated with each type. I don't want to create a serialized hash, because that can become messy. I notice some other systems enable users to create user-defined columns named like column1, column2, etc.
How can I achieve these custom columns in Ruby on Rails and MySQL and still maintain all the ActiveRecord capabilities, e.g. validation, etc?
Well, you don't have many options; your best solution is to use a NoSQL database (at least for those classes).
Let's see how you can work around this using SQL. You can have a base Course model with a has_many :attributes association, in which an attribute is just a combination of a key and a value. (Since attributes is already an Active Record method, the association would need a different name in practice.)
# attributes table
| id | key | value |
| 10 | "column1" | "value" |
| 11 | "column1" | "value" |
| 12 | "column1" | "value" |
It's going to be difficult to determine datatypes and to write queries covering multiple attributes at the same time.
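To make the key/value idea concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is not Rails code: the table name custom_attributes, the course_id column, and the sample keys are all assumptions added for illustration. It also shows the datatype problem, since every value comes back as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE courses (id INTEGER PRIMARY KEY, kind TEXT);
    CREATE TABLE custom_attributes (
        id INTEGER PRIMARY KEY,
        course_id INTEGER REFERENCES courses(id),  -- hypothetical owner column
        key TEXT,
        value TEXT
    );
    INSERT INTO courses VALUES (1, 'Biology'), (2, 'Chemistry');
    INSERT INTO custom_attributes (course_id, key, value) VALUES
        (1, 'species', 'E. coli'),
        (1, 'sample_count', '12'),
        (2, 'compound', 'NaCl');
    """
)


def attributes_for(course_id):
    """Collect the user-defined columns of one course into a dict."""
    rows = conn.execute(
        "SELECT key, value FROM custom_attributes WHERE course_id = ? ORDER BY key",
        (course_id,),
    ).fetchall()
    return dict(rows)


biology = attributes_for(1)
print(biology)
# sample_count is the string '12', not an integer: the table cannot know its type.
```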
