Rails/Mongoid: Parent object not recognising child of has_many/belongs_to relation after mongoimport

A CSV containing the following rows is imported into MongoDB using the mongoimport tool:
object_1_id,field1,field2
52db7f026956b996a0000001,apples,oranges
52db7f026956b996a0000001,pears,plums
The fields are imported into the collection Object2.
After import, the rows are confirmed to exist via the console:
#<Object2 _id: 52e0713417bcabcb4d09ad12, _type: nil, field1: "apples", field2: "oranges", object_1_id: "52db7f026956b996a0000001">
#<Object2 _id: 52e0713517bcabcb4d09ad76, _type: nil, field1: "pears", field2: "plums", object_1_id: "52db7f026956b996a0000001">
Object2 can access Object1 via object_1_id:
> o = Object2.first
#<Object2 _id: 52e0713417bcabcb4d09ad12, _type: nil, field1: "apples", field2: "oranges", object_1_id: "52db7f026956b996a0000001">
> o1 = o.object_1
#<Object1 _id: "52db7f026956b996a0000001", other_fields: "text and stuff">
But Object1 cannot see any of the Object2 rows that were imported with mongoimport. It can see all rows that have been created via the console or other means:
> o1.object_2s.count
10
> o1.object_2s.find("52e0713417bcabcb4d09ad12")
Mongoid::Errors::DocumentNotFound:
Document not found for class Object2 with id(s) 52e0713417bcabcb4d09ad12.
TL;DR Object1 doesn't appear to recognise child models imported via mongoimport, despite the child correctly storing the parent ID and being able to identify its parent.

As per mu is too short's comment, the ids were being imported as Strings instead of BSON ObjectIds.
mongoexport and mongoimport (I was only using the latter) only support strings and numbers (see: https://stackoverflow.com/a/15763908/943833).
To import typed data from a CSV you have to use Extended JSON dumps, as explained in the link above.
Quick and dirty solution:
1) Export the collection you want to import as JSON using mongoexport:
mongoexport -d database -c collection -o output.json
2) Grab the first line of the export file. It should look something like this:
{ "_id" : { "$oid" : "52dfe0106956b9ee6e0016d8" }, "column2" : "oranges", "column1" : "apples", "object_1_id" : { "$oid" : "52dfe0106956b9ee6e0016d8" }, "updated_at" : { "$date" : 1390403600994 }, "created_at" : { "$date" : 1390403600994 } }
3) Remove the _id field as well as any other fields you don't want to import.
4) Use your language of choice to generate a JSON file, using the JSON snippet as a template for each line (see the sketch after these steps).
5) Import the new JSON file using mongoimport:
mongoimport -d database -c collection --type json --file modified.json
This preserves types in a way CSV cannot. I'm not sure whether it is as reliable as using mongodump and mongorestore, but those aren't an option for me since my CSV file comes from elsewhere.
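For step 4, a minimal Ruby sketch of the CSV-to-Extended-JSON conversion (input.csv is a hypothetical file name, and the column list is assumed from the CSV above; anything wrapped in "$oid" is imported as a BSON ObjectId):
require 'csv'
require 'json'

File.open('modified.json', 'w') do |out|
  CSV.foreach('input.csv', headers: true) do |row|
    doc = {
      'field1' => row['field1'],
      'field2' => row['field2'],
      # Wrap the parent id in Extended JSON so mongoimport stores a
      # BSON ObjectId rather than a plain string.
      'object_1_id' => { '$oid' => row['object_1_id'] }
    }
    out.puts doc.to_json
  end
end
Each output line is one document, which matches the format the mongoimport command in step 5 expects without --jsonArray.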

Related

Get Metadata Multiple Source File System Into Azure SQL Table

I have multiple folders and files coming from a FileSystem (linked service) in Azure Data Factory, and my activity is based on this article: https://www.sqlservercentral.com/articles/working-with-get-metadata-activity-in-azure-data-factory
For now I'm processing the FileName and LastModified metadata per file, and then calling a stored procedure from ADF like this:
ALTER PROCEDURE [dbo].[SP_FileSystemMonitoring]
(
    -- Add the parameters for the stored procedure here
    @FLAG int,
    @FILE_NAME nvarchar(100),
    @LAST_MODIFIED datetime
)
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;
    -- Insert statements for procedure here
    IF ( @FILE_NAME IS NOT NULL )
    BEGIN
        UPDATE [dwh].[FileSystemMonitoring]
        SET    STATUS        = @FLAG,
               PROCESS_DATE  = DATEADD(HH, 7, GETDATE()),
               REPORT_DATE   = DATEADD(HH, 7, DATEADD(DD, -1, GETDATE())),
               LAST_MODIFIED = @LAST_MODIFIED
        WHERE  FILE_NAME     = @FILE_NAME
    END
END
But I want one activity that can get the metadata for a folder and then insert every file in that folder into the Azure SQL database. For example, given:
folderA/file1.txt
folderA/file2.txt
the Azure SQL table should look like this:
--------------------------
File_Name | Last_Modified
--------------------------
file1.txt | 2021-12-19 13:45:56
file2.txt | 2021-12-18 10:23:32
I have no idea how to do this, because I'm confused about how to map the sink to the Azure SQL table. Thanks in advance.
I'm confused by your question: do you want to get the details of the file or folder from the Get Metadata activity, or do you want to enumerate/store the child items of a root folder?
If you simply want to reference the items from Get Metadata, add a dynamic expression that navigates the output value to the JSON property you seek. For example:
@activity('Get Metadata Activity Name').output.lastModified
@activity('Get Metadata Activity Name').output.itemName
You can pass each of the above expressions as values to your stored procedure parameters. NOTE: replace 'Get Metadata Activity Name' with the name of your activity.
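As an illustration, the Stored Procedure activity's typeProperties might look roughly like this (a sketch, not taken from the question; the activity name and the literal FLAG value are assumptions):
{
    "storedProcedureName": "[dbo].[SP_FileSystemMonitoring]",
    "storedProcedureParameters": {
        "FLAG": { "value": "1", "type": "Int32" },
        "FILE_NAME": {
            "value": "@activity('Get Metadata Activity Name').output.itemName",
            "type": "String"
        },
        "LAST_MODIFIED": {
            "value": "@activity('Get Metadata Activity Name').output.lastModified",
            "type": "DateTime"
        }
    }
}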
The output JSON of the Get Metadata activity looks like the following, and will grow depending on what you select to return. In my example I'm also including childItems.
{
    "exists": true,
    "lastModified": "2021-03-04T14:00:01Z",
    "itemName": "some-container-name",
    "itemType": "Folder",
    "childItems": [
        { "name": "someFilePrefix_1640264640062_24_12_2021_1640264640.csv", "type": "File" },
        { "name": "someFilePrefix_1640286000083_24_12_2021_1640286000.csv", "type": "File" }
    ],
    "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (Australia Southeast)",
    "executionDuration": 0,
    "durationInQueue": {
        "integrationRuntimeQueue": 0
    },
    "billingReference": {
        "activityType": "PipelineActivity",
        "billableDuration": [
            { "meterType": "AzureIR", "duration": 0.016666666666666666, "unit": "Hours" }
        ]
    }
}
If you want to store the child files, you can pass childItems to your stored procedure as an nvarchar JSON value and enumerate the JSON array in SQL.
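A minimal T-SQL sketch of that approach (an assumption on my part, not from the question; OPENJSON requires SQL Server 2016+ or Azure SQL, and the procedure name is hypothetical). You would pass @string(activity('Get Metadata Activity Name').output.childItems) as the parameter value:
CREATE OR ALTER PROCEDURE [dbo].[SP_InsertFolderFiles]
(
    @CHILD_ITEMS nvarchar(max)
)
AS
BEGIN
    SET NOCOUNT ON;
    -- childItems only carries name and type, so lastModified per file
    -- would need a separate Get Metadata call per item.
    INSERT INTO [dwh].[FileSystemMonitoring] (FILE_NAME)
    SELECT [name]
    FROM OPENJSON(@CHILD_ITEMS)
         WITH ([name] nvarchar(100) '$.name',
               [type] nvarchar(50)  '$.type')
    WHERE [type] = 'File';
END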
You could also use ADF to enumerate the same childItems property with a ForEach activity, one iteration per file. You simply iterate over:
@activity('Get Metadata Activity Name').output.childItems
You can then call the SP for each file referencing the nested item as:
@item().name
You'll also still be able to reference any of the root parameters from the original get metadata activity within the ForEach activity.

Apollo ios codegen generates optional values

I'm on the latest version of apollo-ios, but I'd like to solve one lingering problem: I keep getting optional values (see image below).
Here's what I've explored (but still can't find whyyy)
When I created the table, Nullable was false. Then I created a view for the public to access it.
With the apollo schema:download command, here's the generated JSON: schema.json
With the graphqurl command, here's the generated schema.graphql: schema.graphql. Here's the snippet:
"""
columns and relationships of "schedule"
"""
type schedule {
activity: String
end_at: timestamptz
id: Int
"""An array relationship"""
speakers(
"""distinct select on columns"""
distinct_on: [talk_speakers_view_select_column!]
"""limit the number of rows returned"""
limit: Int
"""skip the first n rows. Use only with order_by"""
offset: Int
"""sort the rows by one or more columns"""
order_by: [talk_speakers_view_order_by!]
"""filter the rows returned"""
where: talk_speakers_view_bool_exp
): [talk_speakers_view!]!
start_at: timestamptz
talk_description: String
talk_type: String
title: String
}
I suspect that id: Int missing the ! in the schema is what causes codegen to interpret it as optional, but I could be wrong. Here's the repo for complete reference: https://github.com/vinamelody/MyApolloTest/tree/test
It's because Postgres marks view columns as explicitly nullable, regardless of the underlying column nullability, for some unknown reason.
Vamshi (core Hasura server dev) explains it here in this issue:
https://github.com/hasura/graphql-engine/issues/1965
You don't need that view though -- it's the same as doing a query:
query {
  talks(
    where: { activity: { _like: "iosconfig21%" } },
    order_by: { start_at: asc }
  ) {
    id
    title
    start_at
    <rest of fields>
  }
}
Except now you have a view you need to manage in your Hasura metadata and create permissions for, like a regular table, on top of the table it's selecting from. My $0.02 anyways.
You can even use a GraphQL alias if you really insist on it being called "schedule" in the JSON response:
https://graphql.org/learn/queries/
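For instance, a quick sketch of that alias (illustrative only, reusing the fields from the schema above):
query {
  schedule: talks(where: { activity: { _like: "iosconfig21%" } }) {
    id
    title
    start_at
  }
}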

Jmeter ForEach Controller failing to write variables to file in order retrieved

I am executing a http request retrieving a json payload with an array of employees. For each record (employee) I need to parse the record for specific fields e.g. firstName, lastName, PersonId and write to a single csv file, incrementing a new row per record.
Unfortunately, the file created has two issues. The PersonId never gets written, and the sequence of the values is not consistent with the original returned values. Sometimes I get the same record's lastName with the wrong firstName and vice versa. I am not sure if the two issues are related; I suspect my regular expression extractor is wrong for a numeric field.
JMeter setup (5.2.1):
Thread group
+ HTTP Request
++ JSON JMESPath Extractor
+ ForEach Controller
++ Regular Expression Extractor: PersonId
++ Regular Expression Extractor: firstName
++ Regular Expression Extractor: lastName
++ BeanShell PostProcessor
getWorker returns the following payload, which I handle with a JSON JMESPath Extractor:
{
  "items" : [
    {
      "PersonId" : 398378,
      "firstName" : "Sam",
      "lastName" : "Shed"
    },
    {
      "PersonId" : 398379,
      "firstName" : "Bob",
      "lastName" : "House"
    }
  ],
  "count" : 2,
  "hasMore" : true,
  "limit" : 2,
  "offset" : 0,
  "links" : [
    {
      "rel" : "self",
      "href" : "https://a.site.on.the.internet.com/employees",
      "name" : "employees",
      "kind" : "collection"
    }
  ]
}
JSON JMESPath Extractor Configuration
Name of created variables: items
JMESPath expressions: items
Match No. -1
Default Values: Not Found
ForEach Controller
ForEach Controller Configuration
Input variable prefix: items
Start Index: Empty
End Index: Empty
Output variable name: items
Add "_"? Checked
Each of the Regular Expression Extracts follow the same pattern as below.
Extract PersonId with Regular Expression
Apply to: Main Sample Only
Field to check: Body
Name of created variable: PersonId
Regular Expression: "PersonId":"(.+?)"
Template: $1$
Match No. Empty
Default Value: PersonId
The final step in the thread is where I write out the parsed results.
BeanShell PostProcessor
PersonNumber = vars.get("PersonNumber");
DisplayName = vars.get("DisplayName");
f = new FileOutputStream("/Applications/apache-jmeter-5.2.1/bin/scripts/getWorker/responses/myText.csv", true);
p = new PrintStream(f);
this.interpreter.setOut(p);
print(PersonId+", "+ PersonNumber+ ", " + DisplayName);
f.close();
I am new to this and looking either for someone to tell me where I screwed up or direct me to a place I can read up on the appropriate topics. (Both are fine). Thank you.
The ForEach Controller doesn't know the structure of the items variable, since it is in JSON format; it only understands an array and traverses through it. I would suggest moving away from the ForEach Controller in your case and using the JSON extractor itself for all the values, like below:
(screenshots: one JSON extractor each for Person ID, First Name and Last Name, creating the variables personId_C, firstName_C and lastName_C with Match No. -1, as referenced in the code below)
Beanshell Sampler Code
import java.io.FileOutputStream; // streams used to append the extracted values to a CSV file
import java.io.PrintStream;

int matchNr = Integer.parseInt(vars.get("personId_C_matchNr"));
log.info("Match number is " + matchNr);
f = new FileOutputStream("myText.csv", true);
p = new PrintStream(f);
for (int i = 1; i <= matchNr; i++) {
    PersonId = vars.get("personId_C_" + i);
    FirstName = vars.get("firstName_C_" + i);
    LastName = vars.get("lastName_C_" + i);
    log.info("Iteration is " + i);
    log.info("Person ID is " + PersonId);
    log.info("First Name is " + FirstName);
    log.info("Last Name is " + LastName);
    p.println(PersonId + ", " + FirstName + ", " + LastName);
}
p.close();
f.close();
HOW THE ABOVE ACTUALLY WORKS
When you extract values using the matchNr, JMeter goes in the sequential order in which the response arrived. For example, in your case, Sam & Shed appear as the first occurrences and Bob & House as the subsequent occurrences. Hence JMeter captures them with the corresponding match number and stores them as 1st First Name = Sam, 2nd First Name = Bob, and so on.
GENERIC STUFF
The regular expression you used for capturing Person ID seems to be inaccurate. The appropriate one would be
"PersonId" :(.+?),
and not
"PersonId":"(.+?)"
Move to JSR223 processors instead of Beanshell as they are more performant. Source: Which one is efficient : Java Request, JSR223 or BeanShell Sampler for my script. The migration is pretty simple. Just copy the code that you have in Beanshell and paste it in JSR223.
Close any stream or writer that you open, otherwise it might cause issues when other users try to write to the file during a load test.
In case you are planning to use this file as a subsequent input within JMeter, please note that there is a space between the comma and the next element. For example, it is "Sam, Shed" and not "Sam,Shed". JMeter by default does not trim any spaces and will use the value just like that. Hence you might want to take a judicious call regarding that space.
Hope this helps!
Since JMeter 3.1 you shouldn't be using Beanshell; go for JSR223 Test Elements and the Groovy language for scripting.
Given Groovy has built-in JSON support you shouldn't need any extractors, you can write the data into a file in a single shot like:
new groovy.json.JsonSlurper().parse(prev.getResponseData()).items.each { item ->
new File('myText.csv') << item.get('PersonId') << ',' << item.get('firstName') << ',' << item.get('lastName') << System.getProperty('line.separator')
}
More information: Apache Groovy - Why and How You Should Use It

How to store each key or element from array to database in rails

I'm stuck on trying to store keys or elements from a returned array into the database. It looks something like:
{"id"=>"28898790358_10152709083080359",
"from"=>{"category"=>"Tv channel",
"category_list"=>[{"id"=>"169056916473899",
"name"=>"Broadcasting & Media Production"}],
"name"=>"WGRZ - Channel 2, Buffalo",
"id"=>"28898790358"}
I'm trying to grab stuff like 'id', 'category_list', all of their values and store it into a column of a database table.
I've done this before, but this time only certain values get in and I usually get an error:
undefined method `[]'
You're getting the undefined method [] error when you try to use the [] method on a class that doesn't have that method defined. My guess is that you are running into this problem while parsing the above hash (e.g. calling [] on NilClass and getting the error undefined method `[]' for nil:NilClass).
Long Story Short: You need to parse the JSON and save the items you need to the appropriate spots in the database
I'm a little unclear about the example you gave, so I'll try to use a simplified one.
Suppose you are querying for a TvShow, which belongs_to a TvChannel. TvChannel also has_many_and_belongs_to a certain Category. So perhaps this query you are doing returns the TvShow, along with the TvChannel and Category to which the TvShow belongs. Suppose you get back something like the json below:
json_response = {
  'name' => 'Desperate Housewives',
  'id' => '12345-67890',
  'tvChannel' => {
    'name' => 'ABC',
    'id' => '123-456-789',
    'categories' => [
      { 'name' => 'Comedy', 'id' => '000-000' },
      { 'name' => 'Drama',  'id' => '111-111' }
    ]
  }
}
So now you want to save the TvShow Desperate Housewives to the TvChannel ABC, and save ABC to the Categories Comedy and Drama. This is how I would go about parsing:
new_show = TvShow.new
new_show.name = json_response["name"]
new_show.external_id = json_response["id"] # assuming you want to save their identifier
channel = TvChannel.find_or_create_by(name: json_response["tvChannel"]["name"]) # you might want to do this by the id, but it's up to you
new_show.tv_channel = channel # this saves the TvChannel's id to the new show's tv_channel_id foreign key field
new_show.save # save your new TV show to the database, with the name, id, and tv_channel_id
# Next we parse the "categories" part of the JSON, finding each Category by name (creating a new Category if one doesn't exist by that name),
# and then adding our TvChannel to that Category
json_response["tvChannel"]["categories"].each do |category|
  c = Category.find_or_create_by(name: category["name"])
  channel.categories << c
  channel.save
end
I hope this helps show you how to parse JSON responses and how to use ActiveRecord to create or find records in your database. It is useful to store the JSON response in a variable (json_response in my example above) for easier parsing.
Let me know if I can clarify anything!

mongodb: converting object ID's to BSON::ObjectId

We have recently upgraded our application to Rails 3 and we are now using Mongoid, and we have a problem retrieving previous documents from MongoDB by _id.
Upon closer investigation, an old record (which I can't retrieve by _id) look as follows:
#<Audit::Log _id: 4d892bfe6bcaff4ffd000001,
failed: nil, request_id: "68ccb38e9e345bb7fc55331389a902a1",
session_id: "54940ff7e8c7336d813a872db7cb7bc0",
_id: "4d892bfe6bcaff4ffd000001", ... }>
and a good record has the following structure:
#<Audit::Log _id: 4d892bfe6bcaff4ffd000001,
failed: nil, request_id: "68ccb38e9e345bb7fc55331389a902a1",
session_id: "54940ff7e8c7336d813a872db7cb7bc0",
_id: BSON::ObjectId('4d892bfe6bcaff4ffd000001'), ... }>
As you can see, the _id field is different. For the old records it seems to be just a string, and for the new records it is a BSON::ObjectID.
I can't seem to be able to retrieve the records with the old format. Trying to find the records using
Audit::Log.where(:_id => BSON::ObjectId('4d892bfe6bcaff4ffd000001')).first
Audit::Log.where(:_id => '4d892bfe6bcaff4ffd000001').first
both return nil.
But for the good record, I can just do
Audit::Log.where(:'_id' => '4e14152d6bcaff26bb000039').first
I am guessing, but maybe Mongoid automatically converts the string to an ObjectId to find on _id? The only fix I see would be to convert all the _id fields to BSON::ObjectId if they are not already. But how do I do that?
Or do you have a better approach?
All of these will work, provided the record actually exists:
Account.where(:_id => "4e0a9c6142f5bc769f000008").first
Account.find(BSON::ObjectId("4e0a9c6142f5bc769f000008"))
Account.find("4e0a9c6142f5bc769f000008")
I'm interested in the JSON returned about an Audit::Log... why are there two _id fields returned?
#<Audit::Log _id: 4d892bfe6bcaff4ffd000001,
failed: nil, request_id: "68ccb38e9e345bb7fc55331389a902a1",
session_id: "54940ff7e8c7336d813a872db7cb7bc0",
_id: "4d892bfe6bcaff4ffd000001", ... }>
You may want to drop to the mongo driver and see if this log truly exists in the database. Unless you are declaring another "_id" field in the audit_log.rb, I believe this record does not exist.
Ha, I should have looked further at the Mongoid documentation. They have a section describing how to upgrade; in the section on upgrading to 2.0.0.BETA.11+ they describe that the _ids are no longer Strings, and they point to this gist to convert all your ids from String to ObjectId.
Ex: Here is an id
1.9.3-p125 :096 > profile_id
 => "4fe969dd79216d0af9000002"
1.9.3-p125 :099 > BSON::ObjectId.from_string(profile_id)
 => BSON::ObjectId('4fe969dd79216d0af9000002')
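Along the same lines as that gist, a minimal sketch of the conversion (an assumption on my part, written against the modern Mongo Ruby driver rather than the 2011-era one; back up the collection first):
coll = Audit::Log.collection
# BSON type 2 = string: find documents whose _id is still a plain string
coll.find('_id' => { '$type' => 2 }).each do |doc|
  fixed = doc.merge('_id' => BSON::ObjectId.from_string(doc['_id']))
  coll.insert_one(fixed)               # _id is immutable, so insert a converted copy...
  coll.delete_one('_id' => doc['_id']) # ...and then remove the old string-keyed document
end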
