Categorical aggregation in Histogrammar - histogrammar

This is a follow-up to the SO question "Two-dimensional aggregation in Histogrammar" (Jim Pivarski created this entry from a private email question):
Given:
data = [{"item": 'ball', "qty": 3.0},
{"item": 'whistle', "qty": 2.0},
{"item": 'ball', "qty": 5.0}]
I want to obtain a sum aggregation using Histogrammar, i.e.:
ball: 8.0
whistle: 2.0
Following http://histogrammar.org/docs/tutorials/python-numpy/#histogrammar-in-numpy and Jim's advice in the SO question mentioned above, I tried:
import histogrammar as hg
data = [{"item": 'ball', "qty": 3.0}, {"item": 'whistle', "qty": 2.0}, {"item": 'ball', "qty": 5.0}]
h = hg.Categorize(quantity=lambda d: d.item, value=hg.Sum(lambda d: d.qty))
for datum in data:
    h.fill(datum)
print(h.toJson())
I get:
AttributeError: 'dict' object has no attribute 'item'

This is just a Python issue: since each datum in your example has the form
{"item": X, "qty": Y}
the way to access it is with d["item"] and d["qty"], rather than d.item and d.qty.
So
h = hg.Categorize(quantity=lambda d: d["item"], value=hg.Sum(lambda d: d["qty"]))
for datum in data:
    h.fill(datum)
print(h.toJsonString())
results in
{"data": {"bins:type": "Sum", "bins": {"whistle": {"sum": 2.0, "entries": 1.0},
"ball": {"sum": 8.0, "entries": 2.0}}, "entries": 3.0}, "version": "1.0",
"type": "Categorize"}
If you change the way your data are represented, you'd have to change the way they're extracted from each datum.
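To make it concrete what the aggregator computes, here is a minimal plain-Python sketch with no Histogrammar dependency. It produces the same per-category "sum" and "entries" values as the JSON above. `categorize_sum` is a hypothetical helper, not part of the library.

```python
from collections import defaultdict


def categorize_sum(data, quantity, value):
    """Group records by quantity(datum) and total value(datum) per group.

    Hypothetical helper: mimics the "sum"/"entries" shape of the
    Categorize(Sum) JSON above, without using Histogrammar.
    """
    bins = defaultdict(lambda: {"sum": 0.0, "entries": 0.0})
    total = 0.0
    for datum in data:
        b = bins[quantity(datum)]
        b["sum"] += value(datum)
        b["entries"] += 1.0
        total += 1.0
    return {"entries": total, "bins": dict(bins)}


data = [{"item": "ball", "qty": 3.0},
        {"item": "whistle", "qty": 2.0},
        {"item": "ball", "qty": 5.0}]

result = categorize_sum(data, lambda d: d["item"], lambda d: d["qty"])
print(result["bins"]["ball"])     # {'sum': 8.0, 'entries': 2.0}
print(result["bins"]["whistle"])  # {'sum': 2.0, 'entries': 1.0}
```

The lambdas play the same role as the `quantity` and `value` arguments to Categorize and Sum. If you change how a datum is represented, only those two functions need to change.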
Incidentally, Histogrammar-Python has a string-based shortcut that extracts fields as attributes (as you were trying to do) or as items (as I did above). The following would work with either kind of data:
h = hg.Categorize("item", hg.Sum("qty"))
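The idea behind the string shortcut can be sketched as a getter that tries item access first and falls back to attribute access. This is an assumption about the behaviour described above, not Histogrammar's source code, and `field_getter` is a hypothetical name.

```python
from collections import namedtuple


def field_getter(name):
    """Return a function that reads `name` as an item, else as an attribute.

    Hypothetical illustration only; not Histogrammar's implementation.
    """
    def get(datum):
        try:
            return datum[name]            # dict-like data: d["item"]
        except (TypeError, KeyError):
            return getattr(datum, name)   # object-like data: d.item
    return get


Row = namedtuple("Row", ["item", "qty"])
get_item = field_getter("item")
print(get_item({"item": "ball", "qty": 3.0}))  # ball
print(get_item(Row("whistle", 2.0)))           # whistle
```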
This string-based method also works if data is a dictionary of 1D NumPy arrays (or, equivalently, a NumPy record array; I don't remember whether there's a Pandas hook in there, too). In that case, you'd declare the histogram exactly as above but fill it like this:
h.fill.numpy(data)
It's the different fill method that interprets the strings differently.
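With columnar data, the same strings name whole columns rather than per-record fields. The sketch below shows that interpretation. It uses plain lists in place of NumPy arrays so that it has no dependencies (an assumption made for the example), and `categorize_sum_columns` is a hypothetical helper.

```python
def categorize_sum_columns(columns, quantity, value):
    """Aggregate a dict of equal-length columns, looking fields up by name.

    The strings are column names, as with a dict of 1D arrays.
    """
    keys, values = columns[quantity], columns[value]
    if len(keys) != len(values):
        raise ValueError("columns must have the same length")
    bins = {}
    for key, val in zip(keys, values):
        b = bins.setdefault(key, {"sum": 0.0, "entries": 0.0})
        b["sum"] += val
        b["entries"] += 1.0
    return bins


columns = {"item": ["ball", "whistle", "ball"], "qty": [3.0, 2.0, 5.0]}
print(categorize_sum_columns(columns, "item", "qty"))
# {'ball': {'sum': 8.0, 'entries': 2.0}, 'whistle': {'sum': 2.0, 'entries': 1.0}}
```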

Related

Why is GeoDjango not returning my GeoJSON in SRID 4326?

I have a model with point data stored in SRID 2953.
When I serialize this data, I assumed that GeoDjango would produce valid GeoJSON by converting the coordinates to SRID 4326.
Maybe I need to tell it explicitly to do this conversion?
From what I have read, CRS has been deprecated in GeoJSON, and GeoJSON is only valid in SRID 4326?
class Hpnrecord(models.Model):
    ...
    geom = models.PointField(srid=2953, null=True)
Later in a serializer I have:
class HpnrecordSerializer(serializers.GeoFeatureModelSerializer):
    class Meta:
        fields = "__all__"
        geo_field = "geom"
        model = Hpnrecord
When I view the returned data I am getting this:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
2594598.985,
7425392.375
]
},
"properties": {
}
},
As you can see, the coordinates are displayed as easting and northing (the same values stored in the model) rather than being converted to SRID 4326. My endpoint expects to receive them in SRID 4326.
How do I specify that I want the serialization to be in SRID 4326?
As you might have noticed, SRID transformations are not done automatically. I have 2 suggestions for you:
Suggestion 1: Store the data in the desired SRID
Before storing the data, convert it to your desired SRID, 4326. Your model would change to:
class Hpnrecord(models.Model):
    ...
    geom = models.PointField(srid=4326, null=True)
Storing the data would look like this:
from django.contrib.gis.geos import Point
...
point = Point(x, y, srid=2953)
point.transform(4326)
model_instance.geom = point
model_instance.save()
Suggestion 2: Use the serializer's to_representation()
You keep your models as they are and you convert the SRID on the fly using the serializer's to_representation() method, see the docs. Note that converting it on the fly will result in a speed penalty, but you can leave the models as they are.
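class HpnrecordSerializer(serializers.GeoFeatureModelSerializer):
    class Meta:
        fields = "__all__"
        geo_field = "geom"
        model = Hpnrecord

    def to_representation(self, instance):
        """Serialize `geom` in SRID 4326."""
        # super() returns an already-serialized GeoJSON feature (the geometry
        # sits under "geometry", not "geom"), so reproject before serializing.
        # transform(..., clone=True) returns a copy; nothing is saved to the DB.
        if instance.geom is not None:
            instance.geom = instance.geom.transform(4326, clone=True)
        return super().to_representation(instance)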
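A quick way to check that the endpoint now emits WGS 84 is to test that each point's coordinates fall within longitude/latitude ranges. RFC 7946 GeoJSON positions are longitude, latitude in WGS 84, and the easting/northing values in the question are far outside those ranges. This is a heuristic sketch, and `has_wgs84_coordinates` is a hypothetical helper. A small projected coordinate could still pass the check.

```python
def has_wgs84_coordinates(feature):
    """Heuristic: does a GeoJSON Point feature look like lon/lat degrees?"""
    lon, lat = feature["geometry"]["coordinates"][:2]
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0


# Coordinates copied from the output in the question (SRID 2953 values).
projected = {"type": "Feature",
             "geometry": {"type": "Point",
                          "coordinates": [2594598.985, 7425392.375]},
             "properties": {}}
print(has_wgs84_coordinates(projected))  # False
```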

OPA masking a dynamic array field

I'm trying to mask an input field and a result field that are part of an array, and the size of the array is dynamic. The documentation says to provide an absolute array index, which is not possible in this use case. Is there an alternative?
For example, how would one mask the age field of every student in the input document?
Input:
"students" : [
{
"name": "Student 1",
"major": "Math",
"age": "18"
},
{
"name": "Student 2",
"major": "Science",
"age": "20"
},
{
"name": "Student 3",
"major": "Entrepreneurship",
"age": "25"
}
]
If you want to just generate a copy of input that has a field (or set of fields) removed from the input, you can use json.remove. The trick is to use a comprehension to compute the list of paths to remove. For example:
paths_to_remove := [sprintf("/students/%v/age", [x]) | some x; input.students[x]]
result := json.remove(input, paths_to_remove)
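The Python sketch below illustrates the same idea as the comprehension: build one path per array index, then delete those keys from a copy of the document. `remove_paths` is a hypothetical stand-in for OPA's json.remove, not its implementation. It handles only simple slash-separated paths that end in an object key.

```python
import copy


def remove_paths(doc, paths):
    """Return a deep copy of doc with the object key at each path removed.

    Hypothetical stand-in for OPA's json.remove, limited to simple
    "/a/0/b" paths (no "~0"/"~1" escaping). Missing paths are ignored.
    """
    result = copy.deepcopy(doc)
    for path in paths:
        *parents, last = path.strip("/").split("/")
        node = result
        try:
            for part in parents:
                node = node[int(part)] if isinstance(node, list) else node[part]
        except (KeyError, IndexError, ValueError, TypeError):
            continue  # path does not exist in this document
        if isinstance(node, dict):
            node.pop(last, None)
    return result


doc = {"students": [
    {"name": "Student 1", "major": "Math", "age": "18"},
    {"name": "Student 2", "major": "Science", "age": "20"},
    {"name": "Student 3", "major": "Entrepreneurship", "age": "25"},
]}

# Same idea as the Rego comprehension: one path per array index.
paths = ["/students/%d/age" % i for i in range(len(doc["students"]))]
masked = remove_paths(doc, paths)
print(masked["students"][0])  # {'name': 'Student 1', 'major': 'Math'}
```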
If you are trying to mask fields from the input document in the decision log using the Decision Log Masking feature then you would write something like:
package system.log
mask[x] {
    some i
    input.input.students[i]
    x := sprintf("/input/students/%v/age", [i])
}

How to get only matched data from nested class using query builder in Elasticsearch?

I am trying to get only the matched data from a nested array in an Elasticsearch document, but I can't: the whole nested array is returned in the output.
This is my query:
QueryBuilders.nestedQuery("questions",
        QueryBuilders.boolQuery()
                .must(QueryBuilders.matchQuery("questions.questionTypeId", quesTypeId)), ScoreMode.None)
        .innerHit(new InnerHitBuilder());
I am using QueryBuilders to get data from the nested field. The query works, but it does not return only the matched data.
Request body:
{
"questionTypeId" : "MCMC"
}
When questionTypeId = "MCMC", this is the output I get. I want to exclude the entries for which questionTypeId = "SCMC".
Output:
{
"id": "46",
"subjectId": 1,
"topicId": 1,
"subtopicId": 1,
"languageId": 1,
"difficultyId": 4,
"isConceptual": false,
"examCatId": 3,
"examId": 1,
"usedIn": 1,
"questions": [
{
"id": "46_31",
"pid": 31,
"questionId": "QID41336691",
"childId": "CID1",
"questionTypeId": "MCMC",
"instruction": "This is a single correct multiple choice question.",
"question": "Who holds the most english premier league titles?",
"solution": "Manchester United",
"status": 1000,
"questionTranslation": []
},
{
"id": "46_33",
"pid": 33,
"questionId": "QID41336677",
"childId": "CID1",
"questionTypeId": "SCMC",
"instruction": "This is a single correct multiple choice question.",
"question": "Who holds the most english premier league titles?",
"solution": "Manchester United",
"status": 1000,
"questionTranslation": []
}
]
}
As you have tagged this with spring-data-elasticsearch:
Support for returning inner hits was recently added in version 4.1.M1, so it will be included in the next release. In a SearchHit you will then get the complete top-level document, while the innerHits property will contain only the matching inner hits.
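For reference, here is a Python sketch of roughly the REST request body that the Java builder above expresses, plus a helper that reads the matching nested objects out of the inner hits in a response. Two points are assumptions added for illustration and are not part of the answer: the `_source` exclusion of "questions", which stops the full nested array from being returned, and the helper names. The sample response is hand-trimmed to show its shape and is not real cluster output.

```python
def nested_question_query(question_type_id):
    """Search body: nested match on questions that also returns inner hits.

    Assumption: "questions" is excluded from _source so the only question
    data returned is the matching inner hits.
    """
    return {
        "_source": {"excludes": ["questions"]},
        "query": {
            "nested": {
                "path": "questions",
                "query": {"match": {"questions.questionTypeId": question_type_id}},
                "score_mode": "none",
                "inner_hits": {},  # default inner-hit name is the path, "questions"
            }
        },
    }


def matched_questions(response):
    """Map each top-level document id to its matching nested questions."""
    matches = {}
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get("questions", {})
        matches[hit["_id"]] = [h["_source"] for h in inner.get("hits", {}).get("hits", [])]
    return matches


# Hand-trimmed response shape for illustration; not real cluster output.
sample_response = {"hits": {"hits": [{
    "_id": "46",
    "_source": {"id": "46", "subjectId": 1},
    "inner_hits": {"questions": {"hits": {"hits": [
        {"_source": {"id": "46_31", "questionTypeId": "MCMC"}},
    ]}}},
}]}}

print(matched_questions(sample_response))
# {'46': [{'id': '46_31', 'questionTypeId': 'MCMC'}]}
```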

How to create Neo4J relationship between nodes yelp dataset

I am new to Neo4j and am trying to load the Yelp dataset into it. I am interested in three of the JSON files it provides:
user.json
{
"user_id": "-lGwMGHMC_XihFJNKCJNRg",
"name": "Gabe",
"review_count": 277,
"yelping_since": "2014-10-31",
"friends": ["Oa84FFGBw1axX8O6uDkmqg", "SRcWERSl4rhm-Bz9zN_J8g", "VMVGukgapRtx3MIydAibkQ", "8sLNQ3dAV35VBCnPaMh1Lw", "87LhHHXbQYWr5wlo5W7_QQ"],
"useful": 45,
"funny": 4,
"cool": 55,
"fans": 17,
"elite": [],
"average_stars": 4.72,
"compliment_hot": 5,
"compliment_more": 1,
"compliment_profile": 0,
"compliment_cute": 1,
"compliment_list": 0,
"compliment_note": 11,
"compliment_plain": 20,
"compliment_cool": 15,
"compliment_funny": 15,
"compliment_writer": 1,
"compliment_photos": 8
}
I have omitted several entries from the friends array to keep the output readable.
business.json
{
"business_id": "YDf95gJZaq05wvo7hTQbbQ",
"name": "Richmond Town Square",
"neighborhood": "",
"address": "691 Richmond Rd",
"city": "Richmond Heights",
"state": "OH",
"postal_code": "44143",
"latitude": 41.5417162,
"longitude": -81.4931165,
"stars": 2.0,
"review_count": 17,
"is_open": 1,
"attributes": {
"RestaurantsPriceRange2": 2,
"BusinessParking": {
"garage": false,
"street": false,
"validated": false,
"lot": true,
"valet": false
},
"BikeParking": true,
"WheelchairAccessible": true
},
"categories": ["Shopping", "Shopping Centers"],
"hours": {
"Monday": "10:00-21:00",
"Tuesday": "10:00-21:00",
"Friday": "10:00-21:00",
"Wednesday": "10:00-21:00",
"Thursday": "10:00-21:00",
"Sunday": "11:00-18:00",
"Saturday": "10:00-21:00"
}
}
review.json
{
"review_id": "VfBHSwC5Vz_pbFluy07i9Q",
"user_id": "-lGwMGHMC_XihFJNKCJNRg",
"business_id": "YDf95gJZaq05wvo7hTQbbQ",
"stars": 5,
"date": "2016-07-12",
"text": "My girlfriend and I stayed here for 3 nights and loved it.",
"useful": 0,
"funny": 0,
"cool": 0
}
As the sample files show, the relationship between a user and a business is established through review.json. How can I create a relationship edge between user and business using review.json?
I have also seen Mark Needham's tutorial where he loads StackOverflow data, but in that case a relationship file was already present in the sample data. Do I need to build a similar file? If so, how should I approach the problem? Or is there another way to build the relationship between user and business?
It depends on your model, but you could do three imports:
//Create Users - does assume the data is unique
CALL apoc.load.json('file:///c://temp//SO//user.json') YIELD value AS user
CREATE (u:User)
SET u = user
Then add the businesses:
CALL apoc.load.json('file:///c://temp//SO//business.json') YIELD value AS business
CREATE (b:Business {
business_id : business.business_id,
name : business.name,
neighborhood : business.neighborhood,
address : business.address,
city : business.city,
state : business.state,
postal_code : business.postal_code,
latitude : business.latitude,
longitude : business.longitude,
stars : business.stars,
review_count : business.review_count,
is_open : business.is_open,
categories : business.categories
})
For the businesses, we can't just do SET b = business because the JSON has nested maps (attributes and hours), and Neo4j property values can't be maps. Decide whether you need those fields; if you do, you might have to take a different route.
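If you would rather not list the business fields by hand, one option is to pre-filter each record in Python before the import. This is an assumption and not part of the answer. Neo4j property values must be primitives or arrays of primitives, so the hypothetical helper below drops nested maps and lists that contain maps or lists. It does not preserve the dropped data, and it does not check that list elements share a single type.

```python
def neo4j_safe_properties(record):
    """Keep only fields Neo4j can store directly as node properties."""
    def ok(value):
        if isinstance(value, dict):
            return False  # nested maps such as "attributes" or "hours"
        if isinstance(value, list):
            return all(not isinstance(v, (dict, list)) for v in value)
        return value is not None
    return {k: v for k, v in record.items() if ok(v)}


business = {"business_id": "YDf95gJZaq05wvo7hTQbbQ",
            "name": "Richmond Town Square",
            "categories": ["Shopping", "Shopping Centers"],
            "attributes": {"BikeParking": True},
            "hours": {"Monday": "10:00-21:00"}}
print(sorted(neo4j_safe_properties(business)))
# ['business_id', 'categories', 'name']
```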
Lastly, the reviews, which is where we join it all up.
CALL apoc.load.json('file:///c://temp//SO//review.json') YIELD value AS review
CREATE (r:Review)
SET r = review
WITH r
//Match user to a review
MATCH (u:User {user_id: r.user_id})
CREATE (u)-[:HAS_REVIEW]->(r)
WITH r, u
//Match business to a review, and a user to a business
MATCH (b:Business {business_id: r.business_id})
//Merge here in case of multiple reviews
MERGE (u)-[:HAS_REVIEWED]->(b)
CREATE (b)-[:HAS_REVIEW]->(r)
Change the labels and relationship types to whatever suits your model. Depending on the size of the data, the import may need tuning; for large files you might need apoc.periodic.iterate to run it in batches.
APOC is a plugin you may need to install (and you should use it!)
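As the third import shows, review.json already serves as the relationship file, because each review names both the user and the business. The plain-Python sketch below derives the same edges the Cypher creates. It assumes review.json holds one JSON object per line, and `relationships_from_reviews` is a hypothetical helper. The set mirrors the MERGE, so repeat reviews do not add duplicate HAS_REVIEWED edges.

```python
import json


def relationships_from_reviews(lines):
    """Derive the edges the Cypher import creates from review records.

    Returns (user_id, review_id) HAS_REVIEW pairs, (business_id, review_id)
    HAS_REVIEW pairs, and de-duplicated (user_id, business_id) HAS_REVIEWED pairs.
    """
    user_review, business_review, reviewed = [], [], set()
    for line in lines:
        if not line.strip():
            continue
        review = json.loads(line)
        user_review.append((review["user_id"], review["review_id"]))
        business_review.append((review["business_id"], review["review_id"]))
        # A set plays the role of MERGE: repeat reviews add no extra edge.
        reviewed.add((review["user_id"], review["business_id"]))
    return user_review, business_review, reviewed


# Hypothetical ids: one user reviewing the same business twice.
sample = ['{"review_id": "r1", "user_id": "u1", "business_id": "b1"}',
          '{"review_id": "r2", "user_id": "u1", "business_id": "b1"}']
ur, br, rv = relationships_from_reviews(sample)
print(len(ur), len(br), len(rv))  # 2 2 1
```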

How to describe AttributeMappings with object that has no key in RestKit?

I have JSON like this:
{
"response":
[
8236,
{
"pid": 1234,
"lat": 56,
"long": 30
},
{
"pid": 123,
"lat": 56,
"long": 29
}
]
}
How do I describe this in an RKEntityMapping? How do I describe an object that has no key? What attributes should go in AttributeMappingsFromDictionary?
Do I need to create two classes with a relationship, like this:
the first describes the root object, with a counter variable and a relationship to the second class, which has pid, lat and long?
I tried the approach described above with two classes and a relationship, but RestKit crashes.
You would need to use a response descriptor with a dynamic mapping and a key path of response. The dynamic mapping is passed each item in the array in turn, and you can then decide which mapping to return to handle it.
To map an individual value that has no key (such as the leading number), you would need a mapping with a nil key path.
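The per-item decision that the dynamic mapping makes can be illustrated in Python: inspect each array element and choose a mapping for it. This only sketches the idea. RestKit does this in Objective-C with a dynamic mapping, not with this function. The names `map_response_item`, "Counter", "Place" and "count" are hypothetical.

```python
def map_response_item(item):
    """Pick a mapping for one element of the "response" array.

    Illustration of the dynamic-mapping idea only; not RestKit code.
    """
    if isinstance(item, int):
        # The bare number has no key, so it maps from the value itself
        # (the "nil key path" case).
        return ("Counter", {"count": item})
    if isinstance(item, dict):
        return ("Place", {"pid": item["pid"], "lat": item["lat"], "long": item["long"]})
    raise TypeError("unexpected item: %r" % (item,))


response = {"response": [8236,
                         {"pid": 1234, "lat": 56, "long": 30},
                         {"pid": 123, "lat": 56, "long": 29}]}
mapped = [map_response_item(x) for x in response["response"]]
print(mapped[0])  # ('Counter', {'count': 8236})
```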
