MongoDB Export BinData - mongoexport

I need to export a collection that has a field "Data" like:
"Data" : BinData(0,"AAAAAAAAxFBQKFgBBvodksodfosdqgDw...")
When I export it with mongoexport to a JSON output file, the field becomes:
"Data":
{"$binary":
{"base64":"AAAAAAAAxFBQKFgBBvodksodfosdqgDw...","subType":"00"}
}
Is there a way, in the mongoexport command, to force the generated .json to output the "Data" field like this: BinData(0,"AAAAAAAAxFBQKFgBBvodksodfosdqgDw...")?
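One workaround, if mongoexport cannot produce that directly (it emits Extended JSON), is to post-process the exported file. A rough sketch in Python; the file names are placeholders, and it assumes the default one-document-per-line output and the key order shown above:
import re

# Matches the Extended JSON form {"$binary": {"base64": "...", "subType": "NN"}}
BINARY_RE = re.compile(
    r'\{\s*"\$binary"\s*:\s*\{\s*"base64"\s*:\s*"([^"]*)"\s*,'
    r'\s*"subType"\s*:\s*"([0-9a-fA-F]{1,2})"\s*\}\s*\}')

def to_bindata(match):
    # subType is a hex string in Extended JSON, a decimal number in BinData()
    return 'BinData(%d,"%s")' % (int(match.group(2), 16), match.group(1))

with open('export.json') as src, open('export_bindata.json', 'w') as dst:
    for line in src:
        dst.write(BINARY_RE.sub(to_bindata, line))
Note that the result is mongo shell syntax rather than valid JSON, so this only helps if whatever consumes the file expects the BinData(...) form.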

Related

Get Metadata Multiple Source File System Into Azure SQL Table

I have multiple folders and files from a FileSystem linked service in Azure Data Factory, and my activity is based on this link: https://www.sqlservercentral.com/articles/working-with-get-metadata-activity-in-azure-data-factory
For now I'm retrieving the FileName and LastModified metadata per file, and then I'm calling a stored procedure from ADF like this:
ALTER PROCEDURE [dbo].[SP_FileSystemMonitoring]
(
    -- Add the parameters for the stored procedure here
    @FLAG int,
    @FILE_NAME nvarchar(100),
    @LAST_MODIFIED datetime
)
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON

    -- Insert statements for procedure here
    IF ( @FILE_NAME IS NOT NULL )
    BEGIN
        UPDATE [dwh].[FileSystemMonitoring]
        SET    STATUS        = @FLAG,
               PROCESS_DATE  = DATEADD(HH, 7, GETDATE()),
               REPORT_DATE   = DATEADD(hh, 7, (DATEADD(dd, -1, GETDATE()))),
               LAST_MODIFIED = @LAST_MODIFIED
        WHERE  FILE_NAME = @FILE_NAME
    END
END
But I want a single activity that gets the metadata of one folder and then inserts that folder's files into the Azure SQL database. For example, for
folderA/file1.txt
folderA/file2.txt
the Azure SQL table should look like this:
--------------------------
File_Name | Last_Modified
--------------------------
file1.txt | 2021-12-19 13:45:56
file2.txt | 2021-12-18 10:23:32
I have no idea how to map that to the Azure SQL table sink. Thanks in advance...
Confused by your question, is it that you want to get the details of the file or folder from the get metadata activity? Or do you want to enumerate/store the child items of a root folder?
If you simply want to reference the items from Get Metadata, add a dynamic expression that navigates the output value to the JSON property you seek. For example:
@activity('Get Metadata Activity Name').output.lastModified
@activity('Get Metadata Activity Name').output.itemName
You can pass each of the above expressions as values to your stored procedure parameters. NOTE: 'Get Metadata Activity Name' should be renamed to the name of your activity.
The output JSON of this activity is like so and will grow depending on what you select to return in the Get Metadata activity. In my example I'm also including childItems.
{
    "exists": true,
    "lastModified": "2021-03-04T14:00:01Z",
    "itemName": "some-container-name",
    "itemType": "Folder",
    "childItems": [{
            "name": "someFilePrefix_1640264640062_24_12_2021_1640264640.csv",
            "type": "File"
        }, {
            "name": "someFilePrefix_1640286000083_24_12_2021_1640286000.csv",
            "type": "File"
        }
    ],
    "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (Australia Southeast)",
    "executionDuration": 0,
    "durationInQueue": {
        "integrationRuntimeQueue": 0
    },
    "billingReference": {
        "activityType": "PipelineActivity",
        "billableDuration": [{
                "meterType": "AzureIR",
                "duration": 0.016666666666666666,
                "unit": "Hours"
            }
        ]
    }
}
If you want to store the child files, you can pass childItems as an nvarchar JSON value into your stored procedure and then enumerate the JSON array in SQL, as sketched below.
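For example, a minimal sketch of that option; the procedure name and target columns are assumptions, not from your pipeline. The childItems array can be passed as a single string parameter, e.g. @string(activity('Get Metadata Activity Name').output.childItems), and shredded with OPENJSON (SQL Server 2016+ / compatibility level 130+):
CREATE PROCEDURE [dbo].[SP_InsertChildItems]
(
    @CHILD_ITEMS nvarchar(max)   -- the childItems JSON array, passed as a string
)
AS
BEGIN
    SET NOCOUNT ON

    -- One row per element of the childItems array; only files are kept
    INSERT INTO [dwh].[FileSystemMonitoring] (FILE_NAME, PROCESS_DATE)
    SELECT j.[name],
           DATEADD(HH, 7, GETDATE())
    FROM OPENJSON(@CHILD_ITEMS)
         WITH ([name] nvarchar(400) '$.name',
               [type] nvarchar(50)  '$.type') AS j
    WHERE j.[type] = 'File'
END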
You could also stay in ADF and enumerate the same childItems property with a ForEach activity, one iteration per file. You simply iterate over:
@activity('Get Metadata Activity Name').output.childItems
You can then call the SP for each file referencing the nested item as:
@item().name
You'll also still be able to reference any of the root parameters from the original get metadata activity within the ForEach activity.

Mysql 8 - load geojson values using load data infile

How can I import GeoJSON values into a MySQL column using LOAD DATA INFILE? For example, I have a file as follows (simplified for brevity):
{"type":"FeatureCollection", "features": [
{"type":"Feature","geometry":{"type":"Polygon","coordinates":
[
[
[31.287890625000017,-22.40205078125001],
[31.429492187500017,-22.298828125],
[32.37109375,-21.33486328125001]
]
] },"properties":{"NAME":"Zimbabwe"}
},
{"type":"Feature","geometry":{"type":"Polygon","coordinates":
[
[
[30.39609375,-15.64306640625],
[30.3505859375,-15.349707031250006],
[30.231835937500023,-14.990332031250006]
]
]},"properties":{"NAME":"Zambia"}
}
]
}
Currently when I do the following:
LOAD DATA LOCAL INFILE 'C:/Users/Downloads/countries.geojson' INTO TABLE countries (geo_json);
I get the error:
Invalid JSON text: "Invalid value." at position 2668 in value for column 'countries.geo_json'
How can I load each feature into a table that has a JSON column called geo_json? I then want to extract the name of each feature and add it to a name column.
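LOAD DATA INFILE is line- and field-oriented, so a multi-line JSON document tends to be split or mangled before it reaches the JSON column. One possible approach, as a sketch (it assumes the file is readable by the MySQL server, that secure_file_priv allows reading it, and that countries has the geo_json and name columns described above), is to read the whole document with LOAD_FILE and split it into features with JSON_TABLE:
-- 1) Read the entire FeatureCollection into one JSON value
SET @doc = LOAD_FILE('C:/Users/Downloads/countries.geojson');

-- 2) Insert one row per feature, pulling the name out of properties.NAME
INSERT INTO countries (geo_json, name)
SELECT jt.feature,
       JSON_UNQUOTE(JSON_EXTRACT(jt.feature, '$.properties.NAME'))
FROM JSON_TABLE(
       @doc,
       '$.features[*]' COLUMNS (feature JSON PATH '$')
     ) AS jt;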

Flume morphline interceptor-split command

Hi, I'm trying to use the morphline interceptor to convert my syslog to JSON.
To start, I tried to use the split command to split my string, but I'm getting the error below:
"" Source r1 has been removed due to an error during configuration
com.typesafe.config.ConfigException$WrongType: /root/flume.17/conf/morph.conf: 21: Cannot concatenate object or list with a non-object-or-list, ConfigString("split") and SimpleConfigObject({"outputFields":"substrings","inputField":"message","addEmptyStrings":false,"isRegex":false,"trim":true,"separator":" "}) are not compatible""
my morphline configuration file:
morphlines : [
{
# Name used to identify a morphline. E.g. used if there are multiple
# morphlines in a morphline config file
id : morphline1
# Import all morphline commands in these java packages and their
# subpackages. Other commands that may be present on the classpath are
# not visible to this morphline.
importCommands : ["org.kitesdk.**"]
commands : [
{
# Parse input attachment and emit a record for each input line
readLine {
charset : UTF-8
}
}
,split {
inputField : message
outputFields : "substrings"
separator : " "
isRegex : false
addEmptyStrings : false
trim : true }
}
}
]
}
]
What do I have to do? I'm new to this.
From the morphline documentation:
outputField - The name of the field to add output values to, i.e. a single string. Example: tokens. One of outputField or outputFields must be present, but not both.
outputFields - The names of the fields to add output values to, i.e. a list of strings. Example: [firstName, lastName, "", age]. An empty string in a list indicates omit this column in the output. One of outputField or outputFields must be present, but not both.
So you should just specify
outputField : substrings
instead of
outputFields : "substrings"
http://kitesdk.org/docs/1.1.0/morphlines/morphlines-reference-guide.html#split
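For reference, a sketch of the commands list with that change applied. Note that split sits in its own { } entry, the same way readLine does; the HOCON error about concatenating ConfigString("split") with an object points at those missing braces:
commands : [
  {
    readLine {
      charset : UTF-8
    }
  }
  {
    split {
      inputField : message
      outputField : substrings
      separator : " "
      isRegex : false
      addEmptyStrings : false
      trim : true
    }
  }
]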

ElasticSearch: Altering indexed version of text

Before the text in a field is indexed, I want to run code on it to transform it, basically what's going on here https://www.elastic.co/guide/en/elasticsearch/reference/master/gsub-processor.html (but that feature isn't out yet).
For example, I want to be able to transform all . in a field into - for the indexed version.
Any advice? Doing this in elasticsearch-rails.
Use a char_filter that replaces every . with -, but note this changes the characters of the indexed terms, not the _source itself. Something like this:
"char_filter" : {
"my_mapping" : {
"type" : "mapping",
"mappings" : [
". => -"
]
}
}
Or use Logstash with a mutate/gsub filter to pre-process the data before it is sent to Elasticsearch. Or do it in your own indexer (whatever that is).
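For completeness, a sketch of how the char_filter might be wired into an index, shown as a Dev Tools-style request; the index, analyzer, and field names are made up, and older Elasticsearch versions also need a mapping type under mappings:
PUT my-index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_mapping": {
          "type": "mapping",
          "mappings": [". => -"]
        }
      },
      "analyzer": {
        "dot_to_dash": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": ["my_mapping"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "analyzer": "dot_to_dash"
      }
    }
  }
}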

Rails/Mongoid: Parent object not recognising child of has_many/belongs_to relation after mongoimport

A CSV containing the following rows is imported into MongoDB using the mongoimport tool:
object_1_id,field1,field2
52db7f026956b996a0000001,apples,oranges
52db7f026956b996a0000001,pears,plums
The fields are imported into the collection Object2.
After import the rows are confirmed to exist via the console.
#<Object2 _id: 52e0713417bcabcb4d09ad12, _type: nil, field1: "apples", field2: "oranges", object_1_id: "52db7f026956b996a0000001">
#<Object2 _id: 52e0713517bcabcb4d09ad76, _type: nil, field1: "pears", field2: "plums", object_1_id: "52db7f026956b996a0000001">
Object2 can access Object1 via object_1_id:
> o = Object2.first
#<Object2 _id: 52e0713417bcabcb4d09ad12, _type: nil, field1: "apples", field2: "oranges", object_1_id: "52db7f026956b996a0000001">
> o1 = o.object_1
#<Object1 _id: "52db7f026956b996a0000001", other_fields: "text and stuff">
But Object1 cannot see any of the Object2 rows that were imported with mongoimport. It can see all rows that have been created via the console or other means:
> o1.object_2s.count
10
> o1.object_2s.find("52e0713417bcabcb4d09ad12")
Mongoid::Errors::DocumentNotFound:
Document not found for class Object2 with id(s) 52e0713417bcabcb4d09ad12.
TL;DR Object1 doesn't appear to recognise child models imported via mongoimport, despite the child correctly storing the parent ID and being able to identify its parent.
As per mu is too short's comment, the ids were being imported as Strings instead of BSON ObjectIds.
mongoexport and mongoimport (I was only using the latter) only support strings and numbers (See: https://stackoverflow.com/a/15763908/943833).
In order to import data with type from a CSV you have to use Extended JSON dumps as explained in the above link.
Quick and dirty solution:
1) Export the collection you want to import as JSON using mongoexport:
mongoexport -d database -c collection -o output.json
2) Grab the first line of the export file. It should look something like this:
{ "_id" : { "$oid" : "52dfe0106956b9ee6e0016d8" }, "column2" : "oranges", "column1" : "apples", "object_1_id" : { "$oid" : "52dfe0106956b9ee6e0016d8" }, "updated_at" : { "$date" : 1390403600994 }, "created_at" : { "$date" : 1390403600994 } }
3) Remove the _id field as well as any other fields you don't want to import.
4) Use your language of choice to generate a JSON file, using the JSON snippet as a template for each line (a sketch follows below).
5) Import the new JSON file using mongoimport:
mongoimport -d database -c collection --type json --file modified.json
This preserves types better than CSV can. I'm not sure whether it is as reliable as using mongodump and mongorestore, but those aren't an option for me since my CSV file comes from elsewhere.
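For step 4, a rough sketch in Python; the input file name is a placeholder, the field names follow the example CSV above, and modified.json is the file fed to mongoimport in step 5:
import csv
import json

# Wrap the parent id in {"$oid": ...} so mongoimport restores it as an ObjectId
with open('object2.csv', newline='') as src, open('modified.json', 'w') as dst:
    for row in csv.DictReader(src):
        doc = {
            'field1': row['field1'],
            'field2': row['field2'],
            'object_1_id': {'$oid': row['object_1_id']},
        }
        dst.write(json.dumps(doc) + '\n')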
