How to generate the pyarrow schema for the dynamic values - google-cloud-dataflow

I am trying to write a parquest schema for my json message that needs to be written back to a GCS bucket using apache_beam
My json is like below:
data = {
"name": "user_1",
"result": [
{
"subject": "maths",
"marks": 99
},
{
"subject": "science",
"marks": 76
}
],
"section": "A"
}
result array in the above example can have many value minimum is 1.

This is the schema you need:
import pyarrow as pa
schema = pa.schema(
[
pa.field("name", pa.string()),
pa.field(
"result",
pa.list_(
pa.struct(
[
pa.field("subject", pa.string()),
pa.field("marks", pa.int32()),
]
)
),
),
pa.field("section", pa.string()),
]
)
If you have a file containing one record per line:
{"name": "user_1", "result": [{"subject": "maths", "marks": 99}, {"subject": "science", "marks": 76}], "section": "A"}
{"name": "user_2", "result": [{"subject": "maths", "marks": 10}, {"subject": "science", "marks": 75}], "section": "A"}
You can load it using:
from pyarrow import json as pa_json
table = pa_json.read_json('filename.json', parse_options=pa_json.ParseOptions(explicit_schema=schema))

Related

Post HTTP Requests cURL with EmailOctopus

As a marketer, I'm going through the EmailOctopus (email service provider) API docs (https://emailoctopus.com/api-documentation) and have trouble combining multiple requests in one.
Goal: Get all campaign reports for all campaigns exported to a CSV.
Step 1: Get all campaign IDs. This works.
curl GET https://emailoctopus.com/api/1.5/campaigns?api_key={APIKEY}
Step 2: Get the report for a single campaign. This works too.
curl GET https://emailoctopus.com/api/1.5/campaigns/{CAMPAIGNID}/reports/summary?api_key={APIKEY}
Step 3: Combine step 1 and 2 and export to a CSV. No idea how to proceed here.
Output step 1:
{
"data": [
{
"id": "00000000-0000-0000-0000-000000000000",
"status": "SENT",
"name": "Foo",
"subject": "Bar",
"to": [
"00000000-0000-0000-0000-000000000001",
"00000000-0000-0000-0000-000000000002"
],
"from": {
"name": "John Doe",
"email_address": "john.doe#gmail.com"
},
"content": {
"html": "<html>Foo Bar<html>",
"plain_text": "Foo Bar"
},
"created_at": "2019-10-30T13:46:46+00:00",
"sent_at": "2019-10-31T13:46:46+00:00"
},
{
"id": "00000000-0000-0000-0000-000000000003",
"status": "SENT",
"name": "Bar",
"subject": "Foo",
"to": [
"00000000-0000-0000-0000-000000000004",
"00000000-0000-0000-0000-000000000005"
],
"from": {
"name": "Jane Doe",
"email_address": "jane.doe#gmail.com"
},
"content": {
"html": "<html>Bar Foo<html>",
"plain_text": "Bar Foo"
},
"created_at": "2019-11-01T13:46:46+00:00",
"sent_at": "2019-11-02T13:46:46+00:00"
}
],
"paging": {
"next": null,
"previous": null
}
}
Output step 2:
{
"id": "00000000-0000-0000-0000-000000000000",
"sent": 200,
"bounced": {
"soft": 10,
"hard": 5
},
"opened": {
"total": 110,
"unique": 85
},
"clicked": {
"total": 70,
"unique": 65
},
"complained": 50,
"unsubscribed": 25
}
How can I get all campaign reports in one go and exported to a CSV?
May be this URLs be helpful
Merging two json in PHP
How to export to csv file a PHP Array with a button?
https://www.kodingmadesimple.com/2016/12/convert-json-to-csv-php.html

Grouping from Json data, which contain array of dictionaries in Swift 4

I've local json file, which contains array of dictionaries. I want to group based on key name from below json. Means same name in one group. Please tell me how can I achieve that. Thank you.
Json Data:
[
{
"name": "Abc",
"number": 123,
"marks": 78
},
{
"name": "xyz",
"number": 456,
"marks": 50
},
{
"name": "Abc",
"number": 789,
"marks": 78
}
]
Code:
code and error message
init(grouping:by:)
You should return value of name key in the closure.
let arr = [ [ "name": "Abc", "number": 123, "marks": 78 ], [ "name": "xyz", "number": 456, "marks": 50 ], [ "name": "Abc", "number": 789, "marks": 78 ] ]
let dict = Dictionary(grouping: arr) { $0["name"] as! String }
print(dict)
//["Abc": [["name": "Abc", "number": 123, "marks": 78], ["name": "Abc", "number": 789, "marks": 78]], "xyz": [["name": "xyz", "number": 456, "marks": 50]]]

fill a JSON file with embedded arrays from Rails

I have a simple "rss" (ApplicationRecord) table indexed by an id. I would like to have a structured JSON that group each user from a family in an array structure. And then each family in a global array. How can I do that ?
my current plain code to put my data in a json file is :
json.rss #rss do |rs|
json.id rs.id
json.name rs.name
json.family rs.family
json.lastdate rs.lastdate
json.last rs.last
json.s1w rs.s1w
json.s2w rs.s2w
end
But the target file that I want is this one :
{
"rss": [
{
"familyname": "Smith",
"children": [
{
"id": "1",
"name": "bob",
"lastdate": "2010-09-23",
"last": "0.88",
"s1w": "0.83",
"s2w": "0.88"
},
{
"id": 2,
"name": "Mary",
"lastdate": "2011-09-23",
"last": "0.89",
"s1w": "0.83",
"s2w": "0.87"
}
]
},
{
"familyname": "Wesson",
"children": [
{
"id": "1",
"name": "john",
"lastdate": "2001-09-23",
"last": "0.88",
"s1w": "0.83",
"s2w": "0.88"
},
{
"id": 2,
"name": "Bruce",
"lastdate": "2000-09-23",
"last": "0.89",
"s1w": "0.83",
"s2w": "0.87"
}
]
}
]
}
The grouping you are trying to achieve can be done in Ruby with:
#rss.group_by(&:family).values
This is assuming #rss is an array-like collection of objects that have a .family method. The result: is an array of arrays of objects grouped by family.
Now it will be up to use to use Jbuilder's array! method to build the desired JSON output.

Avro java serialization as json not working correctly

I have a simple avro schema, from which I generated a java class using the avro-maven-plugin.
The avro schema is as follows:
{
"type": "record",
"name": "addressGeo",
"namespace": "com.mycompany",
"doc": "Best record address and list of geos",
"fields": [
{
"name": "version",
"type": "int",
"default": 1,
"doc": "version the class"
},
{
"name": "eventType",
"type": "string",
"default": "addressGeo",
"doc": "event type"
},
{
"name": "parcelId",
"type": "long",
"doc": "ParcelID of the parcel. Join parcelid and sequence with ParcelInfo"
},
{
"name": "geoCodes",
"type": {"type": "array", "items": "com.mycompany.geoCode"},
"doc": "Multiple Geocodes, with restrictions information"
},
{
"name": "brfAddress",
"type": ["null", "com.mycompany.address"],
"doc": "Address cleansed version of BRF"
}
]
}
If I construct a simple object using the builder, and serialize it using json, I get the following output:
{
"version": 1,
"eventType": {
"bytes": [
97,
100,
100,
114,
101,
115,
115,
71,
101,
111
],
"length": 10,
"string": null
},
"parcelId": 1,
"geoCodes": [
{
"version": 1,
"latitude": 1,
"longitude": 1,
"geoQualityCode": "g",
"geoSourceTypeID": 1,
"restrictions": "NONE"
}
],
"brfAddress": {
"version": 1,
"houseNumber": "1",
"houseNumberFraction": null,
"streetDirectionPrefix": null,
"streetName": "main",
"streetSuffix": "street",
"streetDirectionSuffix": null,
"fullStreetAddress": "1 main street, seattle, wa, 98101",
"unitPrefix": null,
"unitNumber": null,
"city": "seattle",
"state": "wa",
"zipCode": "98101",
"zipPlusFour": null,
"addressDPV": "Y",
"addressQualityCode": "good",
"buildingNumber": "1",
"carrierRoute": "t",
"censusTract": "c",
"censusTractAndBlock": "b",
"dataCleanerTypeID": 1,
"restrictions": "NONE"
}
}
Note the output of the eventType field. It is coming through as an array of bytes whereas the type of the field is a CharSequence.
Any idea why serialization is doing this? It works fine for other types that are strings.
I am using google-gson to serialize the object to json.
You might be working with a older version of avro, that uses CharSequence. Ideally string type should be java String type. I would suggest to update the avro version or have a look at this one - Apache Avro: map uses CharSequence as key

How to represent typical json information

Below JSON contains 4 items in an array. If you look at each item it is some what shows incomplete keys. I am unable to figure out how to represent the consistent data in iOS (of any UI design patterns). By looking at the below information somewhat they are interlinked to each other for example parent key value information is same as company key value and also in employees one value is same as name key value. This seems to be very typical.
{
"arays": [
{
"company": "Microchip",
"parent": "File",
"employees": [
"John",
"Mike"
]
},
{
"company": "Apple",
"mobiles": [
"111-111-1111",
"121-121-1212"
]
},
{
"company": "File",
"parent": "Apple",
"addresses": [
"2600 space center blvd",
"2700 university dr"
]
},
{
"name": "John",
"mobiles": [
"222-222-2222"
],
"addresses": [
"Time Square, NY"
]
}
]
}

Resources