Half node in a Sankey chart in Highcharts

Is there any way to draw a half node in a Sankey diagram using Highcharts? I have searched a lot but didn't find a solution. It can be done with d3, but I want to achieve the same thing with Highcharts. For reference, I have attached an image. If you look at the nodes Terminated before Institution, Terminated after Institution, Pending Final Decision, Pending Institution Decision, and Other, all of these are half nodes. So, is there any way I can do the same with Highcharts?
[1]: https://i.stack.imgur.com/BeQGC.png

Currently, the only easy way to do that seems to be to use a combination of column and offset node options:
series: [{
    keys: ['from', 'to', 'weight'],
    data: [
        ['China', 'EU', 94],
        ['China', 'US', 53]
    ],
    nodes: [{
        id: 'US',
        column: 1,
        offset: 110
    }, {
        id: 'EU',
        offset: -70,
        column: 2
    }],
    ...
}]
Live demo: https://jsfiddle.net/BlackLabel/x593vrtj/
API Reference:
https://api.highcharts.com/highcharts/series.sankey.nodes.column
https://api.highcharts.com/highcharts/series.sankey.nodes.offset

Related

Determine differences between incoming CSV data and existing Mongo collection for large data sets

I have an incoming CSV that I am trying to compare with an existing collection of mongo documents (Note objects) to determine additions, deletions, and updates. The incoming CSV and mongo collection are quite large at around 500K records each.
ex. csv_data
[
    { id: 1, text: "zzz" },
    { id: 2, text: "bbb" },
    { id: 4, text: "ddd" },
    { id: 5, text: "eee" }
]
Mongo collection of Note objects:
[
    { id: 1, text: "aaa" },
    { id: 2, text: "bbb" },
    { id: 3, text: "ccc" },
    { id: 4, text: "ddd" }
]
As a result I would want to get:
an array of additions
[
    { id: 5, text: "eee" }
]
an array of removals
[
    { id: 3, text: "ccc" }
]
an array of updates
[
    { id: 1, text: "zzz" }
]
I tried using select statements to filter for each particular difference, but it fails or takes hours when run against the real data set with all 500K records.
additions = csv_data.select { |record| !Note.where(id: record[:id]).exists? }
deletions = Note.all.select { |note| !csv_data.any? { |row| row[:id] == note.id } }
updates = csv_data.select do |record|
  note = Note.where(id: record[:id])
  note.exists? && note.first.text != record[:text]
end
How would I better optimize this?
Assumption: the CSV file is a snapshot of the data in the database taken at some other time, and you want a diff.
In order to get the answers you want, you need to read every record in the DB. Right now you are effectively doing this three times, once to obtain each statistic, which is roughly 1.5 million DB calls, and possibly more if there are significantly more notes in the DB than there are in the file. I'd follow these steps:
Read the CSV data into a hash keyed by ID
Read each record in the database, and for each record:
If the DB ID is found in the CSV hash, remove it from the hash; if its text differs from the DB record, add it to the updates
If the DB ID isn't found in the CSV hash, add it to the deletes
When you reach the end of the DB, anything still left in the CSV hash must therefore be an addition
While it's still not super-slick, at least you only get to do the database I/O once instead of three times...
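A minimal sketch of that single pass, assuming csv_data is already loaded as the array of hashes shown in the question and that the CSV id values compare equal to Note#id:
# Key the CSV rows by id so each lookup is O(1) (plain Ruby, no ActiveSupport needed).
csv_by_id = csv_data.each_with_object({}) { |row, h| h[row[:id]] = row }

deletions = []
updates   = []

# One pass over the collection; the driver streams the cursor in batches.
Note.all.each do |note|
  row = csv_by_id.delete(note.id)
  if row.nil?
    deletions << note              # in the DB but not in the CSV
  elsif row[:text] != note.text
    updates << row                 # in both, but the text changed
  end
end

# Anything still left in the hash never matched a DB record, so it is an addition.
additions = csv_by_id.values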

How to use a multi-layer map with countries and lakes with Highmaps?

Is there any way to use a multi-layer map in Highcharts? In my case, I need three layers: One for the countries, one for the borders (which show the disputed ones differently than the normal ones) and one for the lakes, like this:
At the moment, I don't see how this could be achieved. Or could I export the three layers from the shapefile to JSON and then stitch the three together? But would a »join« used to color the countries still work?
Thanks for any hints.
According to the comments, something like what is shown in the image can be achieved based on this official demo: https://jsfiddle.net/gh/get/library/pure/highcharts/highcharts/tree/master/samples/maps/demo/mapline-mappoint
After some attempts, #luftikus143 ran into the issue that geometry can't be set to null in his custom JSON file; my solution is to assign it an object with an empty coordinates array. Demo: jsfiddle.net/BlackLabel/06xvrs8m/1
{
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            []
        ]
    },
    "properties": {
        "OBJECTID": 1,
        "NAME": "Great Bear Lake",
        "Shape_Leng": 35.7525061159,
        "Shape_Area": 6.12829979344
    }
},

Predix sum query on non-negative numbers only

I have a timeseries dataset which has both negative and non-negative numbers. There is a value (-999) which indicates NaN values in the cloud. What I want to do is use a sum query that does not take these negative values into account. Is there a way to omit negative numbers while querying?
If I understand your question correctly, you are looking for a Predix Time Series query that will return the sum of all tag readings but exclude any -999 values from the result.
If so, the query body might look like this:
{"start": "1w-ago",
"tags": [{
"name": "STACK",
"filters": {"measurements": {"values": -999, "condition": "gt"}},
"aggregations": [{"type": "sum", "sampling": {"datapoints": 1}}]
}]
}
I wrote a small test script with the PredixPy SDK to demonstrate the scenario and result if that's helpful for you.
# Run this in a new space to create services
import predix.admin.app
app = predix.admin.app.Manifest()
app.create_uaa('stack-admin-secret')
app.create_client('stack-client', 'stack-client-secret')
app.create_timeseries()

# Populate some test data into time series
tag = 'STACK'
values = [-999, -5, 10, 20, 30]
ts = app.get_timeseries()
for val in values:
    ts.send(tag, val)

# Query and compare against expected result
expected = sum(values[1:])
response = ts.get_datapoints(tag, measurement=('gt', -999), aggregations='sum')
result = response['tags'][0]['results'][0]['values'][0][1]
print(expected, result)
You may also want to consider, in the future, using the quality attribute when data is ingested, so that instead of filtering on values greater than -999 you could query for quality GOOD or UNCERTAIN.
{"start": "1w-ago",
"tags": [{"name": "STACK",
"filters": {"qualities": {"values": ["3"]}},
"aggregations": [{"type": "sum", "sampling": {"datapoints": 1}}]
}]
}
Hope that helps.

How can I join 2 or more mapData in my highmaps?

I want to plot Asia-Pacific (Asia and Africa). How can I achieve this?
series: [{
    data: data,
    mapData: Highcharts.maps['custom/asia'],
    joinBy: 'hc-key',
    name: 'Random data',
}],
It is not possible to simply join the two maps you mentioned, because they have different base coordinates/properties. You would end up with two overlapping maps - demo #1, demo #2.
You could use a bigger map with more than you need (e.g. a continents world map). Then don't provide data for the areas you don't want to show and set allAreas to false.
Example: http://jsfiddle.net/8wsezjqy/
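For reference, a minimal sketch of that approach (not the exact fiddle above), assuming the 'custom/world-continents' map from the Highcharts map collection is loaded and that 'as' and 'af' are its hc-key values for Asia and Africa:
Highcharts.mapChart('container', {
    chart: {
        map: 'custom/world-continents'   // assumed map collection key
    },
    series: [{
        // With allAreas set to false, only areas that receive data are drawn.
        allAreas: false,
        joinBy: 'hc-key',
        name: 'Random data',
        data: [
            ['as', 10],   // Asia
            ['af', 20]    // Africa
        ]
    }]
});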
Another option is to create a custom map as explained in the Docs.

Is it not good to use huge "documents" in MongoDB?

Since we can structure a MongoDB document any way we want, we can do it this way:
{
    products: [
        { date: "2010-09-08", data: { pageviews: 23, timeOnPage: 178 } },
        { date: "2010-09-09", data: { pageviews: 36, timeOnPage: 202 } }
    ],
    brands: [
        { date: "2010-09-08", data: { pageviews: 123, timeOnPage: 210 } },
        { date: "2010-09-09", data: { pageviews: 61, timeOnPage: 876 } }
    ]
}
so as we add data to it day after day, the products array and the brands array will become bigger and bigger. After 3 years, there will be about a thousand elements in products and in brands. Is that bad for MongoDB? Should we instead break it down into 4 documents:
{ type: 'products', date: "2010-09-08", data: { pageviews: 23, timeOnPage: 178 }}
{ type: 'products', date: "2010-09-09", data: { pageviews: 36, timeOnPage: 202 }}
{ type: 'brands', date: "2010-09-08", data: { pageviews: 123, timeOnPage: 210 }}
{ type: 'brands', date: "2010-09-08", data: { pageviews: 61, timeOnPage: 876 }}
So that after 3 years, there will be just 2000 "documents"?
Assuming you're using Mongoid (you tagged it), you wouldn't want to use your first schema idea. It would be very inefficient for Mongoid to pull out those huge documents each time you wanted to look up a single little value.
What would probably be a much better model for you is:
class Log
  include Mongoid::Document
  field :type
  field :date
  field :pageviews, :type => Integer
  field :time_on_page, :type => Integer
end
This would give you documents that look like:
{_id: ..., date: '2010-09-08', type: 'products', pageviews: 23, time_on_page: 178}
Don't worry about the number of documents - Mongo can handle billions of these. And you can index on type and date to easily find whatever figures you want.
Furthermore, this way it's a lot easier to update the records through the driver, without even pulling the record from the database. For example, on each pageview you could do something like:
Log.collection.update({'type' => 'products', 'date' => '2010-09-08'}, {'$inc' => {'pageviews' => 1}})
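For the type/date index mentioned above, a declaration along these lines could be added to the model (a sketch; the index syntax shown assumes a recent Mongoid version, older releases use a different macro):
class Log
  include Mongoid::Document
  # ... fields as above ...

  # Compound index so lookups by type and date don't scan the whole collection.
  index({ type: 1, date: 1 })
end

# Indexes are only built when requested, e.g. with:
# Log.create_indexes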
I'm not a MongoDB expert, but 1000 isn't "huge". Also I would seriously doubt any difference between 1 top-level document containing 4000 total subelements, and 4 top-level documents each containing 1000 subelements -- one of those six-of-one vs. half-dozen-of-another issues.
Now if you were talking about 1 document with 1,000,000 elements vs. 1000 documents each with 1000 elements, that's a different order of magnitude, and there might be advantages to one vs. the other, in storage time and/or query time.
You have talked about how you are going to update the data, but how do you plan to query it? It probably makes a difference on how you should structure your docs.
The problem with using embedded elements in arrays is that each time you add to the array, the document may no longer fit in the space currently allocated for it. This will cause the (new) document to be reallocated and moved (and that move will require re-writing any of the indexes for the doc).
I would generally suggest the second form you suggested, but it depends on the questions above.
Note: 4MB is an arbitrary limit and will be raised soon; you can recompile the server for any limit you want in fact.
It seems your design closely resembles the relational table schema.
So every document added will be a separate entry in a collection, having its own identifier. Though a Mongo document is limited to 4 MB in size, that's mostly enough to accommodate plain text documents. And you don't have to worry about a growing number of documents in Mongo; that's the essence of document-based databases.
The only thing you need to worry about is the size of the database. It's limited to 2 GB on 32-bit systems, because MongoDB uses memory-mapped files, which are tied to the available memory addressing. This is not a problem on 64-bit systems.
Hope this helps
Again, this depends on how you plan to query the data. If you really only care about a single item, such as products per day:
{ type: 'products', date: "2010-09-08", data: { pageviews: 23, timeOnPage: 178 }}
then you could include multiple days in one document:
{ type: 'products', "2010-09-08": { data: { pageviews: 23, timeOnPage: 178 } } }
We use something like this:
{ type: 'products', "2010" : { "09" : { "08" : data: { pageviews: 23, timeOnPage: 178 }} } } }
So we can increment by day: { "$inc" : { "2010.09.08.data.pageviews" : 1 } }
It may seem complicated, but the advantage is that you can store all the data about a 'type' in one record, so you can retrieve a single record and have all the information.
